public class PDFMarkedContentExtractor extends PDFStreamEngine
Modifier and Type | Field | Description |
---|---|---|
protected java.lang.String | outputEncoding | Encoding that text will be written in (or null). |
Constructor | Description |
---|---|
PDFMarkedContentExtractor() | Instantiate a new PDFMarkedContentExtractor object. |
PDFMarkedContentExtractor(java.lang.String encoding) | Instantiate a new PDFMarkedContentExtractor object that writes its output in the given encoding. |
PDFMarkedContentExtractor(java.util.Properties props) | Instantiate a new PDFMarkedContentExtractor object using the given mapping of operators to PDFOperator classes. |
Modifier and Type | Method | Description |
---|---|---|
void | beginMarkedContentSequence(COSName tag, COSDictionary properties) | |
void | endMarkedContentSequence() | |
java.util.List<PDMarkedContent> | getMarkedContents() | |
protected void | processTextPosition(TextPosition text) | This will process a TextPosition object and add the text to the list of characters on a page. |
void | xobject(PDXObject xobject) | |

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from class PDFStreamEngine: getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getTotalCharCnt, getValidCharCnt, getXObjects, inspectFontEncoding, isForceParsing, processEncodedText, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, resetEngine, setColorSpaces, setFonts, setForceParsing, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix

protected java.lang.String outputEncoding

public PDFMarkedContentExtractor() throws java.io.IOException
Throws: java.io.IOException - If there is an error loading the properties.

public PDFMarkedContentExtractor(java.util.Properties props) throws java.io.IOException
Parameters: props - The properties containing the mapping of operators to PDFOperator classes.
Throws: java.io.IOException - If there is an error reading the properties.

public PDFMarkedContentExtractor(java.lang.String encoding) throws java.io.IOException
Parameters: encoding - The encoding that the output will be written in.
Throws: java.io.IOException - If there is an error reading the properties.

public void beginMarkedContentSequence(COSName tag, COSDictionary properties)

public void endMarkedContentSequence()

public void xobject(PDXObject xobject)

protected void processTextPosition(TextPosition text)
Overrides: processTextPosition in class PDFStreamEngine
Parameters: text - The text to process.

public java.util.List<PDMarkedContent> getMarkedContents()
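
The page above does not describe how the begin, end, and text callbacks fit together. The following is a minimal, library-free sketch of one plausible arrangement: a stack of open sequences, with text attached to the innermost one. The class `MarkedContentSketch`, its `Node` type, and the methods `begin`, `end`, and `text` are hypothetical stand-ins for `beginMarkedContentSequence`, `endMarkedContentSequence`, and `processTextPosition`. None of this is PDFBox code, and the real class may behave differently, for example in how it treats text outside any sequence.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of a marked-content collector; not the PDFBox implementation. */
class MarkedContentSketch {

    /** Simplified node: a tag plus the text and nested nodes seen inside it. */
    static final class Node {
        final String tag;
        final List<Object> contents = new ArrayList<>();

        Node(String tag) {
            this.tag = tag;
        }
    }

    // Sequences that were opened while no other sequence was open.
    private final List<Node> topLevel = new ArrayList<>();
    // Sequences that have been opened but not yet closed, innermost first.
    private final Deque<Node> open = new ArrayDeque<>();

    /** Stand-in for beginMarkedContentSequence: open a new, possibly nested, sequence. */
    void begin(String tag) {
        Node node = new Node(tag);
        if (open.isEmpty()) {
            topLevel.add(node);
        } else {
            open.peek().contents.add(node);
        }
        open.push(node);
    }

    /** Stand-in for endMarkedContentSequence: close the innermost open sequence. */
    void end() {
        if (!open.isEmpty()) {
            open.pop();
        }
    }

    /** Stand-in for processTextPosition: attach text to the innermost open sequence. */
    void text(String s) {
        // Assumption of this sketch: text outside any sequence is dropped.
        if (!open.isEmpty()) {
            open.peek().contents.add(s);
        }
    }

    /** Stand-in for getMarkedContents: the top-level sequences in document order. */
    List<Node> getMarkedContents() {
        return topLevel;
    }
}
```

In this model, calling `begin("P")`, `text("Hello")`, `begin("Span")`, `text("World")`, `end()`, `end()` leaves a single top-level `P` node whose contents are the string `Hello` followed by a nested `Span` node. A stack is used because marked-content sequences in a PDF content stream can nest.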