Class CompositeBreakIterator


  • final class CompositeBreakIterator
    extends java.lang.Object
    An internal BreakIterator for multilingual text, following the recommendations of UAX #29: Unicode Text Segmentation (http://unicode.org/reports/tr29/).

    See http://unicode.org/reports/tr29/#Tailoring for the motivation of this design.

    Text is first divided at script boundaries into runs of a single script. Each run is then delegated to the break iterator appropriate for that script.
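
    The two steps described above can be sketched with JDK classes only. This is a minimal illustration, not the class's actual code: `ScriptRunSketch`, `scriptRunEnds`, and `wordBreaks` are hypothetical names, `java.lang.Character.UnicodeScript` and `java.text.BreakIterator` stand in for the ICU classes, and keeping COMMON and INHERITED characters in the run that is already open is an assumption of the sketch.

```java
import java.lang.Character.UnicodeScript;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: split text into script runs, then delegate each run. */
final class ScriptRunSketch {

  /**
   * Returns the end offset of each script run. Characters in the COMMON or
   * INHERITED script (spaces, digits, combining marks) stay in the open run.
   */
  static List<Integer> scriptRunEnds(String text) {
    List<Integer> ends = new ArrayList<>();
    UnicodeScript runScript = null; // script of the open run, null until known
    int i = 0;
    while (i < text.length()) {
      int cp = text.codePointAt(i);
      UnicodeScript s = UnicodeScript.of(cp);
      boolean neutral = s == UnicodeScript.COMMON || s == UnicodeScript.INHERITED;
      if (!neutral) {
        if (runScript != null && s != runScript) {
          ends.add(i); // script changed: close the open run here
        }
        runScript = s;
      }
      i += Character.charCount(cp);
    }
    if (!text.isEmpty()) {
      ends.add(text.length()); // close the final run
    }
    return ends;
  }

  /** Runs a JDK word iterator over each script run and collects absolute offsets. */
  static List<Integer> wordBreaks(String text) {
    List<Integer> breaks = new ArrayList<>();
    int runStart = 0;
    for (int runEnd : scriptRunEnds(text)) {
      // A real implementation would pick an iterator per script; the sketch
      // uses the same JDK word iterator for every run.
      BreakIterator delegate = BreakIterator.getWordInstance(Locale.ROOT);
      delegate.setText(text.substring(runStart, runEnd));
      delegate.first();
      for (int b = delegate.next(); b != BreakIterator.DONE; b = delegate.next()) {
        breaks.add(runStart + b); // translate run-relative offset to text offset
      }
      runStart = runEnd;
    }
    return breaks;
  }

  public static void main(String[] args) {
    // "hello " followed by three Cyrillic letters (written as escapes).
    String text = "hello \u043c\u0438\u0440";
    System.out.println("run ends:    " + scriptRunEnds(text));
    System.out.println("word breaks: " + wordBreaks(text));
  }
}
```

    Because every run is handed to its own delegate, a script boundary is always also a break position in this sketch.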

    This break iterator also allows you to retrieve the ISO 15924 script code associated with a piece of text.

    See also UAX #29, UTR #24

    • Method Detail

      • next

        int next()
        Retrieve the next break position. If the underlying RuleBasedBreakIterator (RBBI) is exhausted within the current script boundary, examine the next script boundary.
        Returns:
        the next break position or BreakIterator.DONE
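
        The fall-through behaviour of next() can be sketched as a small stateful iterator. This is an illustration under stated assumptions rather than the class's actual code: `FallThroughSketch` is a hypothetical name, the script run ends are passed in by the caller, and a JDK `java.text.BreakIterator` stands in for the per-script RuleBasedBreakIterator.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Hypothetical sketch: when the delegate is exhausted, move to the next script run. */
final class FallThroughSketch {
  private final String text;
  private final int[] runEnds; // ascending end offsets of non-empty script runs
  private final BreakIterator delegate = BreakIterator.getWordInstance(Locale.ROOT);
  private int run = -1;        // index of the open run, -1 before the first call
  private int runStart = 0;    // text offset where the open run begins
  private int current = 0;     // last position returned by next()

  FallThroughSketch(String text, int[] runEnds) {
    this.text = text;
    this.runEnds = runEnds.clone();
  }

  int next() {
    while (true) {
      if (run >= 0) {
        int b = delegate.next();
        if (b != BreakIterator.DONE) {
          current = runStart + b; // delegate offsets are relative to the run
          return current;
        }
      }
      // Delegate exhausted (or never started): open the next run, if any.
      if (run + 1 >= runEnds.length) {
        current = BreakIterator.DONE;
        return current;
      }
      runStart = (run < 0) ? 0 : runEnds[run];
      run++;
      delegate.setText(text.substring(runStart, runEnds[run]));
      delegate.first();
    }
  }

  int current() {
    return current;
  }

  public static void main(String[] args) {
    // "hello " (Latin run, ends at 6) followed by three Cyrillic letters (ends at 9).
    FallThroughSketch it = new FallThroughSketch("hello \u043c\u0438\u0440", new int[] {6, 9});
    for (int b = it.next(); b != BreakIterator.DONE; b = it.next()) {
      System.out.println("break at " + b);
    }
  }
}
```

        Once the last run is exhausted, the sketch keeps returning BreakIterator.DONE from both next() and current().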
      • current

        int current()
        Retrieve the current break position.
        Returns:
        the current break position or BreakIterator.DONE
      • getRuleStatus

        int getRuleStatus()
        Retrieve the rule status code (token type) from the underlying break iterator.
        Returns:
        rule status code (see RuleBasedBreakIterator constants)
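
        A caller might bucket the returned status code by range, along the lines of the sketch below. The constants mirror the word-break status ranges published by ICU4J (WORD_NONE, WORD_NUMBER, WORD_LETTER, WORD_KANA, WORD_IDEO) but are redefined locally so that the example compiles without ICU4J. `RuleStatusSketch` and `describe` are hypothetical names, and a tailored rule set may use other codes.

```java
/** Hypothetical sketch: map a word-break rule status code to a coarse token type. */
final class RuleStatusSketch {
  // Local copies of the ICU4J word status range starts; each range spans 100 codes.
  static final int WORD_NONE = 0;
  static final int WORD_NUMBER = 100;
  static final int WORD_LETTER = 200;
  static final int WORD_KANA = 300;
  static final int WORD_IDEO = 400;
  static final int WORD_IDEO_LIMIT = 500;

  static String describe(int status) {
    if (status < WORD_NONE || status >= WORD_IDEO_LIMIT) {
      return "unknown"; // outside the standard ranges
    }
    if (status >= WORD_IDEO) {
      return "ideographic";
    }
    if (status >= WORD_KANA) {
      return "kana";
    }
    if (status >= WORD_LETTER) {
      return "letter";
    }
    if (status >= WORD_NUMBER) {
      return "number";
    }
    return "none"; // spaces, punctuation, and other non-word text
  }

  public static void main(String[] args) {
    for (int status : new int[] {0, 100, 200, 300, 400}) {
      System.out.println(status + " -> " + describe(status));
    }
  }
}
```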
      • getScriptCode

        int getScriptCode()
        Retrieve the UScript script code for the current token. This code can be decoded with UScript into a name or ISO 15924 code.
        Returns:
        UScript script code for the current token.
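
        With ICU4J on the classpath, the returned int can be decoded with UScript.getName(int) or UScript.getShortName(int). The sketch below shows a comparable idea using only the JDK, whose `Character.UnicodeScript.forName` accepts ISO 15924 codes such as "Latn". `ScriptCodeSketch`, `scriptOf`, and `hasIsoCode` are hypothetical names, and taking the first non-neutral character as the token's script is an assumption of the sketch.

```java
import java.lang.Character.UnicodeScript;

/** Hypothetical sketch: find a token's script and compare it to an ISO 15924 code. */
final class ScriptCodeSketch {

  /** Returns the script of the first character that is not COMMON or INHERITED. */
  static UnicodeScript scriptOf(String token) {
    int i = 0;
    while (i < token.length()) {
      int cp = token.codePointAt(i);
      UnicodeScript s = UnicodeScript.of(cp);
      if (s != UnicodeScript.COMMON && s != UnicodeScript.INHERITED) {
        return s;
      }
      i += Character.charCount(cp);
    }
    return UnicodeScript.COMMON; // only neutral characters, e.g. digits
  }

  /** True if the token's script matches the given ISO 15924 code, e.g. "Cyrl". */
  static boolean hasIsoCode(String token, String iso15924) {
    // forName throws IllegalArgumentException for an unrecognised name or code.
    return scriptOf(token) == UnicodeScript.forName(iso15924);
  }

  public static void main(String[] args) {
    String token = "\u043c\u0438\u0440"; // three Cyrillic letters
    System.out.println(scriptOf(token));
    System.out.println(hasIsoCode(token, "Cyrl"));
  }
}
```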
      • setText

        void setText(char[] text,
                     int start,
                     int length)
        Set a new region of text to be examined by this iterator.
        Parameters:
        text - buffer of text
        start - offset into buffer
        length - maximum length to examine
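
        The (text, start, length) convention restricts analysis to a window of a larger buffer. The sketch below illustrates that idea with the JDK's `java.text.BreakIterator`. `RegionSketch` and `breaksInRegion` are hypothetical names, and reporting offsets relative to `start` rather than to the whole buffer is an assumption of the sketch, not a documented property of this class.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: examine only a window of a char[] buffer. */
final class RegionSketch {

  /** Returns break offsets within the window, relative to {@code start}. */
  static List<Integer> breaksInRegion(char[] text, int start, int length) {
    if (start < 0 || length < 0 || start > text.length - length) {
      throw new IllegalArgumentException("region out of bounds");
    }
    BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
    // Copying the window keeps the sketch simple; characters outside it are never seen.
    it.setText(new String(text, start, length));
    List<Integer> breaks = new ArrayList<>();
    for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
      breaks.add(b);
    }
    return breaks;
  }

  public static void main(String[] args) {
    char[] buffer = "xx hello yy".toCharArray();
    // Examine only "hello": offset 3, length 5.
    System.out.println(breaksInRegion(buffer, 3, 5));
  }
}
```

        Passing a window avoids reallocating the buffer when a tokenizer refills and reuses the same char[] between calls.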