public class PDFTextStripperByArea extends PDFTextStripper
Fields inherited from class PDFTextStripper: charactersByArticle, document, output, outputEncoding, systemLineSeparator
Constructor | Description |
---|---|
PDFTextStripperByArea() | Constructor. |
PDFTextStripperByArea(java.lang.String encoding) | Instantiate a new PDFTextStripperByArea object that writes its output in the given encoding. |
PDFTextStripperByArea(java.util.Properties props) | Instantiate a new PDFTextStripperByArea object using the given mapping of operators to PDFOperator classes. |
Modifier and Type | Method | Description |
---|---|---|
void | addRegion(java.lang.String regionName, java.awt.geom.Rectangle2D rect) | Add a new region to group text by. |
void | extractRegions(PDPage page) | Process the page to extract the region text. |
java.util.List<java.lang.String> | getRegions() | Get the list of regions that have been set up. |
java.lang.String | getTextForRegion(java.lang.String regionName) | Get the text for the region. Call this after extractRegions(). |
protected void | processTextPosition(TextPosition text) | Process a TextPosition object and add the text to the list of characters on a page. |
void | removeRegion(java.lang.String regionName) | Delete a region to group text by. |
protected void | writePage() | Print the processed page text to the output stream. |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from class PDFStreamEngine: getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getTotalCharCnt, getValidCharCnt, getXObjects, isForceParsing, processEncodedText, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, setColorSpaces, setFonts, setForceParsing, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix
Methods inherited from class PDFTextStripper: endArticle, endDocument, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageSeparator, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getText, getWordSeparator, handleLineSeparation, inspectFontEncoding, isParagraphSeparation, matchListItemPattern, matchPattern, processPage, processPages, resetEngine, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageSeparator, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePageEnd, writePageSeperator, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeText, writeWordSeparator
public PDFTextStripperByArea() throws java.io.IOException
Throws:
java.io.IOException - If there is an error loading properties.

public PDFTextStripperByArea(java.util.Properties props) throws java.io.IOException
Parameters:
props - The properties containing the mapping of operators to PDFOperator classes.
Throws:
java.io.IOException - If there is an error reading the properties.

public PDFTextStripperByArea(java.lang.String encoding) throws java.io.IOException
Parameters:
encoding - The encoding that the output will be written in.
Throws:
java.io.IOException - If there is an error reading the properties.

public void addRegion(java.lang.String regionName, java.awt.geom.Rectangle2D rect)
Parameters:
regionName - The name of the region.
rect - The rectangle area to retrieve the text from.

public void removeRegion(java.lang.String regionName)
Parameters:
regionName - The name of the region to delete.

public java.util.List<java.lang.String> getRegions()
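
The rect argument of addRegion accepts any java.awt.geom.Rectangle2D. The following is a minimal sketch of building such a rectangle from measurements in inches, assuming the region is expressed in default PDF user-space units (1/72 inch). The RegionRects class and its fromInches helper are hypothetical and not part of the library. Which corner of the page serves as the origin depends on how the stripper reports text positions, so check that against the library version in use.

```java
import java.awt.geom.Rectangle2D;

/** Hypothetical helper for building region rectangles; not part of the library. */
public class RegionRects {
    // Default PDF user space uses 72 units per inch.
    static final double POINTS_PER_INCH = 72.0;

    /** Converts an (x, y, width, height) box given in inches to a Rectangle2D in points. */
    static Rectangle2D fromInches(double x, double y, double width, double height) {
        return new Rectangle2D.Double(
                x * POINTS_PER_INCH,
                y * POINTS_PER_INCH,
                width * POINTS_PER_INCH,
                height * POINTS_PER_INCH);
    }

    public static void main(String[] args) {
        // A band 7.5 inches wide and 1 inch tall, offset half an inch on both axes.
        Rectangle2D header = fromInches(0.5, 0.5, 7.5, 1.0);
        System.out.println(header);
        // The result could then be passed as the rect argument, e.g. addRegion("header", header).
    }
}
```

Keeping the conversion in one place means region definitions can be written in the units used to measure the page layout.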
public java.lang.String getTextForRegion(java.lang.String regionName)
Parameters:
regionName - The name of the region to get the text from.

public void extractRegions(PDPage page) throws java.io.IOException
Parameters:
page - The page to extract the regions from.
Throws:
java.io.IOException - If there is an error while extracting text.

protected void processTextPosition(TextPosition text)
Overrides:
processTextPosition in class PDFTextStripper
Parameters:
text - The text to process.

protected void writePage() throws java.io.IOException
Overrides:
writePage in class PDFTextStripper
Throws:
java.io.IOException - If there is an error writing the text.
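
The methods documented here imply a sequence: register named regions, run an extraction pass over a page, then read the text collected for each region. The following is a simplified, standard-library-only model of that sequence. It assumes that a character belongs to a region when the region's rectangle contains the character's (x, y) position. The class RegionGroupingSketch and its beginPage and processGlyph methods are hypothetical stand-ins; this is not the library's code, and the library's actual behavior may differ.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of grouping characters by named region; not library code. */
public class RegionGroupingSketch {
    // Region name -> rectangle, kept in insertion order.
    private final Map<String, Rectangle2D> regionArea = new LinkedHashMap<>();
    // Region name -> characters collected during the current pass.
    private final Map<String, StringBuilder> regionText = new LinkedHashMap<>();

    public void addRegion(String regionName, Rectangle2D rect) {
        regionArea.put(regionName, rect);
    }

    public void removeRegion(String regionName) {
        regionArea.remove(regionName);
        regionText.remove(regionName);
    }

    public List<String> getRegions() {
        return new ArrayList<>(regionArea.keySet());
    }

    /** Starts a new extraction pass with an empty buffer for every region. */
    public void beginPage() {
        regionText.clear();
        for (String name : regionArea.keySet()) {
            regionText.put(name, new StringBuilder());
        }
    }

    /** Appends the glyph to every region whose rectangle contains (x, y). */
    public void processGlyph(String glyph, double x, double y) {
        for (Map.Entry<String, Rectangle2D> entry : regionArea.entrySet()) {
            if (entry.getValue().contains(x, y)) {
                regionText.computeIfAbsent(entry.getKey(), k -> new StringBuilder())
                          .append(glyph);
            }
        }
    }

    /** Returns the collected text, or null if nothing has been collected for the region. */
    public String getTextForRegion(String regionName) {
        StringBuilder text = regionText.get(regionName);
        return text == null ? null : text.toString();
    }

    public static void main(String[] args) {
        RegionGroupingSketch sketch = new RegionGroupingSketch();
        sketch.addRegion("header", new Rectangle2D.Double(0, 0, 200, 50));
        sketch.addRegion("body", new Rectangle2D.Double(0, 50, 200, 300));

        sketch.beginPage();
        sketch.processGlyph("H", 10, 20);   // falls inside "header"
        sketch.processGlyph("i", 18, 20);   // falls inside "header"
        sketch.processGlyph("x", 10, 120);  // falls inside "body"

        System.out.println(sketch.getTextForRegion("header")); // prints "Hi"
        System.out.println(sketch.getTextForRegion("body"));   // prints "x"
    }
}
```

In this model, reading a region before any pass has run returns null, which illustrates why getTextForRegion is documented as something to call after extractRegions().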