Package org.apache.lucene.analysis.ckb
Class SoraniNormalizer
java.lang.Object
  org.apache.lucene.analysis.ckb.SoraniNormalizer

public class SoraniNormalizer extends java.lang.Object
Normalizes the Unicode representation of Sorani text.
Normalization consists of:
- Alternate forms of 'y' (064A, 0649) are converted to 06CC (FARSI YEH)
- Alternate form of 'k' (0643) is converted to 06A9 (KEHEH)
- Alternate forms of vowel 'e' (0647+200C, word-final 0647, 0629) are converted to 06D5 (AE)
- Alternate (joining) form of 'h' (06BE) is converted to 0647 (HEH)
- Alternate forms of 'rr' (0692, word-initial 0631) are converted to 0695 (REH WITH SMALL V BELOW)
- Harakat, tatweel, and formatting characters such as directional controls are removed.
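
The rules above can be sketched as a single in-place pass over a character buffer. This is a minimal stand-alone illustration, not the Lucene source: the class name `SoraniRulesSketch` and the `delete` helper are hypothetical, the code points are the standard Unicode values for the named characters, and the sketch assumes the buffer holds exactly one word, so index 0 is "word-initial" and index `len - 1` is "word-final".

```java
/** Stand-alone illustration of the rules listed above; not the Lucene source. */
final class SoraniRulesSketch {
  // Names mirror the field list of this class; values are standard Unicode code points.
  static final char YEH = '\u064A';
  static final char DOTLESS_YEH = '\u0649';
  static final char FARSI_YEH = '\u06CC';
  static final char KAF = '\u0643';
  static final char KEHEH = '\u06A9';
  static final char HEH = '\u0647';
  static final char AE = '\u06D5';
  static final char ZWNJ = '\u200C';
  static final char HEH_DOACHASHMEE = '\u06BE';
  static final char TEH_MARBUTA = '\u0629';
  static final char REH = '\u0631';
  static final char RREH = '\u0695';
  static final char RREH_ABOVE = '\u0692';
  static final char TATWEEL = '\u0640';
  static final char FATHATAN = '\u064B'; // first harakat code point
  static final char SUKUN = '\u0652';    // last harakat code point

  /** Hypothetical helper: removes s[pos] by shifting the tail left; returns the new length. */
  static int delete(char[] s, int pos, int len) {
    if (pos < len - 1) {
      System.arraycopy(s, pos + 1, s, pos, len - pos - 1);
    }
    return len - 1;
  }

  /** Normalizes the first len chars of s in place and returns the new length. */
  static int normalize(char[] s, int len) {
    for (int i = 0; i < len; i++) {
      char c = s[i];
      switch (c) {
        case YEH:
        case DOTLESS_YEH:
          s[i] = FARSI_YEH;
          break;
        case KAF:
          s[i] = KEHEH;
          break;
        case ZWNJ:
          // HEH followed by ZWNJ spells the vowel 'e'; the ZWNJ itself is always dropped.
          if (i > 0 && s[i - 1] == HEH) {
            s[i - 1] = AE;
          }
          len = delete(s, i, len);
          i--;
          break;
        case HEH:
          if (i == len - 1) { // word-final HEH (assumes one word per buffer)
            s[i] = AE;
          }
          break;
        case TEH_MARBUTA:
          s[i] = AE;
          break;
        case HEH_DOACHASHMEE:
          s[i] = HEH;
          break;
        case REH:
          if (i == 0) { // word-initial REH (assumes one word per buffer)
            s[i] = RREH;
          }
          break;
        case RREH_ABOVE:
          s[i] = RREH;
          break;
        default:
          // Remove tatweel, harakat, and Unicode format characters (category Cf).
          boolean harakat = c >= FATHATAN && c <= SUKUN;
          if (c == TATWEEL || harakat || Character.getType(c) == Character.FORMAT) {
            len = delete(s, i, len);
            i--;
          }
      }
    }
    return len;
  }

  /** Convenience wrapper for experiments: copies the string, normalizes, and trims. */
  static String normalize(String in) {
    char[] buf = in.toCharArray();
    return new String(buf, 0, normalize(buf, buf.length));
  }
}
```

Because deletions shorten the text, the sketch decrements `i` after each removal so the character that slid into the current slot is examined on the next iteration.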
-
Field Summary
Modifier and Type                Field
(package private) static char    AE
(package private) static char    DAMMA
(package private) static char    DAMMATAN
(package private) static char    DOTLESS_YEH
(package private) static char    FARSI_YEH
(package private) static char    FATHA
(package private) static char    FATHATAN
(package private) static char    HEH
(package private) static char    HEH_DOACHASHMEE
(package private) static char    KAF
(package private) static char    KASRA
(package private) static char    KASRATAN
(package private) static char    KEHEH
(package private) static char    REH
(package private) static char    RREH
(package private) static char    RREH_ABOVE
(package private) static char    SHADDA
(package private) static char    SUKUN
(package private) static char    TATWEEL
(package private) static char    TEH_MARBUTA
(package private) static char    YEH
(package private) static char    ZWNJ
-
Constructor Summary
Constructor           Description
SoraniNormalizer()
-
Method Summary
Modifier and Type    Method                          Description
int                  normalize(char[] s, int len)    Normalize an input buffer of Sorani text
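
The summary does not spell out what the `int` result means. The signature suggests the common in-place pattern: the first `len` chars of `s` are rewritten and the new, possibly shorter, length is returned. The sketch below illustrates that calling convention under this assumption. `InPlaceNormalizer` and `DROP_TATWEEL` are hypothetical stand-ins so the example runs without Lucene on the classpath; they are not part of this package.

```java
/** Illustrates an assumed "normalize in place, return new length" contract. */
final class BufferContractSketch {
  /** Hypothetical stand-in with the same shape as normalize(char[] s, int len). */
  interface InPlaceNormalizer {
    int normalize(char[] s, int len);
  }

  /** Toy normalizer for illustration only: drops TATWEEL (U+0640) and nothing else. */
  static final InPlaceNormalizer DROP_TATWEEL = (s, len) -> {
    int out = 0;
    for (int in = 0; in < len; in++) {
      if (s[in] != '\u0640') {
        s[out++] = s[in]; // compact the kept chars toward the front
      }
    }
    return out;
  };

  /** Runs the normalizer and keeps only the chars below the returned length. */
  static String apply(InPlaceNormalizer n, char[] buffer, int len) {
    int newLen = n.normalize(buffer, len);
    // Anything at index >= newLen is stale and must be ignored by the caller.
    return new String(buffer, 0, newLen);
  }

  public static void main(String[] args) {
    char[] buffer = new char[16];       // reusable buffer, larger than the term
    String term = "\u0628\u0640\u0628"; // BEH, TATWEEL, BEH
    term.getChars(0, term.length(), buffer, 0);
    String result = apply(DROP_TATWEEL, buffer, term.length());
    System.out.println(result.length());
  }
}
```

The point for callers is to carry the returned length forward rather than the original `len`, since the buffer is not resized and its tail may still hold old characters.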
-
Field Detail
-
YEH
static final char YEH
- See Also:
- Constant Field Values
-
DOTLESS_YEH
static final char DOTLESS_YEH
- See Also:
- Constant Field Values
-
FARSI_YEH
static final char FARSI_YEH
- See Also:
- Constant Field Values
-
KAF
static final char KAF
- See Also:
- Constant Field Values
-
KEHEH
static final char KEHEH
- See Also:
- Constant Field Values
-
HEH
static final char HEH
- See Also:
- Constant Field Values
-
AE
static final char AE
- See Also:
- Constant Field Values
-
ZWNJ
static final char ZWNJ
- See Also:
- Constant Field Values
-
HEH_DOACHASHMEE
static final char HEH_DOACHASHMEE
- See Also:
- Constant Field Values
-
TEH_MARBUTA
static final char TEH_MARBUTA
- See Also:
- Constant Field Values
-
REH
static final char REH
- See Also:
- Constant Field Values
-
RREH
static final char RREH
- See Also:
- Constant Field Values
-
RREH_ABOVE
static final char RREH_ABOVE
- See Also:
- Constant Field Values
-
TATWEEL
static final char TATWEEL
- See Also:
- Constant Field Values
-
FATHATAN
static final char FATHATAN
- See Also:
- Constant Field Values
-
DAMMATAN
static final char DAMMATAN
- See Also:
- Constant Field Values
-
KASRATAN
static final char KASRATAN
- See Also:
- Constant Field Values
-
FATHA
static final char FATHA
- See Also:
- Constant Field Values
-
DAMMA
static final char DAMMA
- See Also:
- Constant Field Values
-
KASRA
static final char KASRA
- See Also:
- Constant Field Values
-
SHADDA
static final char SHADDA
- See Also:
- Constant Field Values
-
SUKUN
static final char SUKUN
- See Also:
- Constant Field Values
-