Class MCParser


  • public class MCParser
    extends Object
    This class will parse page content streams and add Do operators in a marked-content sequence for every field that needs to be flattened.
    • Field Detail

      • LOGGER

        protected static final Logger LOGGER
        The Logger instance
      • RASFACTORY

        protected static final RandomAccessSourceFactory RASFACTORY
        Factory that will help us build a RandomAccessSource.
      • TSTAR

        public static final PdfLiteral TSTAR
        The T* text operator, which moves to the start of the next line.
      • items

        protected StructureItems items
        The list with structure items.
      • annots

        protected PdfArray annots
        The annotations of the page that is being processed.
      • structParents

        protected PdfNumber structParents
        The StructParents of the page that is being processed.
      • xobjects

        protected PdfDictionary xobjects
        The XObject dictionary of the page that is being processed.
      • btWrite

        protected boolean btWrite
        Did we postpone writing a BT operator?
      • etExtra

        protected boolean etExtra
        Did we add an extra ET operator?
      • inText

        protected boolean inText
        Are we inside a BT/ET sequence?
      • text

        protected StringBuffer text
        A buffer containing text state.
    • Constructor Detail

      • MCParser

        public MCParser(StructureItems items)
        Creates an MCParser object.
        Parameters:
        items - a list of StructureItem objects
    • Method Detail

      • populateOperators

        protected void populateOperators()
        Populates the operators variable.
      • parse

        public void parse(PdfDictionary page,
                          PdfIndirectReference pageref)
                   throws IOException,
                          DocumentException
        Parses the content of a page, inserting the normal (/N) appearances (/AP) of annotations into the content stream as Form XObjects.
        Parameters:
        page - a page dictionary
        pageref - the reference to the page dictionary
        Throws:
        IOException
        DocumentException
      • dealWithXObj

        protected void dealWithXObj(PdfName xobj)
        When an XObject with a StructParent is encountered, we want to remove it from the stack.
        Parameters:
        xobj - the name of an XObject
      • dealWithMcid

        protected void dealWithMcid(PdfNumber mcid)
                             throws IOException,
                                    DocumentException
        When an MCID is encountered, the parser checks the list of structure items and turns an annotation into an XObject if necessary.
        Parameters:
        mcid - the MCID that was encountered in the content stream
        Throws:
        IOException
        DocumentException
      • printOperator

        protected void printOperator(PdfLiteral operator,
                                     List<PdfObject> operands)
                              throws IOException
        Adds an operator and its operands (if any) to baos.
        Parameters:
        operator - the operator
        operands - its operands
        Throws:
        IOException
      • printTextOperator

        protected void printTextOperator(PdfLiteral operator,
                                         List<PdfObject> operands)
                                  throws IOException
        Adds an operator and its operands (if any) to baos, keeping track of the text state.
        Parameters:
        operator - the operator
        operands - its operands
        Throws:
        IOException
      • printsp

        protected void printsp(PdfObject o)
                        throws IOException
        Writes a PDF object to the OutputStream, followed by a space character.
        Parameters:
        o - a PdfObject
        Throws:
        IOException
      • println

        protected void println(PdfObject o)
                        throws IOException
        Writes a PDF object to the OutputStream, followed by a newline character.
        Parameters:
        o - a PdfObject
        Throws:
        IOException
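
        The output helpers described above (printOperator, printsp, println) can be illustrated with a minimal, library-free sketch. It is not the iText implementation: the class name OperatorWriterSketch is hypothetical, plain strings stand in for PdfObject and PdfLiteral, and it assumes the operands list does not contain the operator itself. It only shows the postfix layout of a content stream, where each operand is followed by a space and the operator by a newline.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical stand-in for the parser's output side; NOT the iText class.
class OperatorWriterSketch {
    // Plays the role of the "baos" buffer mentioned in the method descriptions.
    private final ByteArrayOutputStream baos = new ByteArrayOutputStream();

    // Appends the raw bytes of one token to the buffer.
    private void write(String token) {
        byte[] bytes = token.getBytes(StandardCharsets.ISO_8859_1);
        baos.write(bytes, 0, bytes.length);
    }

    // Like printsp: one token followed by a space.
    void printsp(String token) {
        write(token);
        baos.write(' ');
    }

    // Like println: one token followed by a newline.
    void println(String token) {
        write(token);
        baos.write('\n');
    }

    // Like printOperator: operands first, then the operator (postfix notation).
    // Assumption: the operands list does not include the operator.
    void printOperator(String operator, List<String> operands) {
        for (String operand : operands) {
            printsp(operand);
        }
        println(operator);
    }

    // Returns everything written so far as a string.
    String content() {
        return new String(baos.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

        Calling printOperator("Tf", ...) with the operands /F1 and 12 appends the line "/F1 12 Tf" to the buffer; an operator without operands, such as T*, is written on a line of its own.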
      • checkBT

        protected void checkBT()
                        throws IOException
        Checks if a BT operator is waiting to be added.
        Throws:
        IOException
      • setInText

        protected void setInText(boolean inText)
        Informs the parser that we're inside or outside a text object. Also sets a parameter indicating that BT needs to be written.
        Parameters:
        inText - true if we're inside a text object, false otherwise.
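
        The interplay between setInText and checkBT can be sketched as a deferred write: entering a text object only records that a BT is pending, and the BT is emitted when the first text operator is actually printed. The following is a simplified model under stated assumptions, not the iText implementation. The class DeferredBtSketch, the btPending flag and the endText helper are hypothetical, and the sketch ignores the etExtra flag and the text-state buffer.

```java
// Hypothetical, simplified model of deferred BT handling; NOT the iText code.
class DeferredBtSketch {
    private final StringBuilder out = new StringBuilder();
    private boolean inText;     // are we between BT and ET?
    private boolean btPending;  // is a BT still waiting to be written?

    // Records whether we are inside a text object; entering marks BT as pending.
    void setInText(boolean inText) {
        this.inText = inText;
        this.btPending = inText;
    }

    // Writes the postponed BT, if there is one.
    void checkBT() {
        if (btPending) {
            out.append("BT\n");
            btPending = false;
        }
    }

    // Makes sure BT has been written before any text operator is emitted.
    void printTextOperator(String line) {
        checkBT();
        out.append(line).append('\n');
    }

    // Hypothetical helper: closes the text object only if BT was really written.
    void endText() {
        if (inText && !btPending) {
            out.append("ET\n");
        }
        setInText(false);
    }

    // Returns everything written so far.
    String content() {
        return out.toString();
    }
}
```

        In this model, a text object that contains no operators produces no output at all, while one with operators is wrapped in exactly one BT/ET pair.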