Package org.apache.lucene.analysis.ko
Class KoreanTokenizerFactory
- java.lang.Object
  - org.apache.lucene.analysis.AbstractAnalysisFactory
    - org.apache.lucene.analysis.TokenizerFactory
      - org.apache.lucene.analysis.ko.KoreanTokenizerFactory
- All Implemented Interfaces:
ResourceLoaderAware
public class KoreanTokenizerFactory extends TokenizerFactory implements ResourceLoaderAware
Factory for KoreanTokenizer.
<fieldType name="text_ko" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KoreanTokenizerFactory"
               decompoundMode="discard"
               userDictionary="user.txt"
               userDictionaryEncoding="UTF-8"
               outputUnknownUnigrams="false"
               discardPunctuation="true"
    />
  </analyzer>
</fieldType>
Supports the following attributes:
- userDictionary: User dictionary path.
- userDictionaryEncoding: User dictionary encoding.
- decompoundMode: Decompound mode. One of 'none', 'discard', or 'mixed'. Default is 'discard'. See KoreanTokenizer.DecompoundMode.
- outputUnknownUnigrams: If true, outputs unigrams for unknown words.
- discardPunctuation: If true, punctuation tokens are dropped from the output.
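The attributes listed above reach the factory as string key/value pairs. The following is a minimal sketch of assembling such a map using only the JDK. `KoreanTokenizerArgs` and its `build` method are hypothetical names introduced for illustration and are not part of Lucene. The helper accepts only the three lower-case mode spellings documented above, which is a choice made for this sketch rather than a statement about how the factory parses the value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not part of Lucene) that assembles factory arguments. */
final class KoreanTokenizerArgs {
    // Decompound modes named in the attribute list of this page.
    private static final Set<String> MODES = Set.of("none", "discard", "mixed");

    private KoreanTokenizerArgs() {}

    /**
     * Builds a mutable argument map keyed by the documented attribute names.
     * Null arguments are left out so that the factory's own defaults apply.
     */
    static Map<String, String> build(String decompoundMode,
                                     String userDictionary,
                                     String userDictionaryEncoding,
                                     boolean outputUnknownUnigrams,
                                     boolean discardPunctuation) {
        Map<String, String> args = new LinkedHashMap<>();
        if (decompoundMode != null) {
            if (!MODES.contains(decompoundMode)) {
                throw new IllegalArgumentException(
                    "decompoundMode must be one of " + MODES + ", got: " + decompoundMode);
            }
            args.put("decompoundMode", decompoundMode);
        }
        if (userDictionary != null) {
            args.put("userDictionary", userDictionary);
            // The encoding is only meaningful when a dictionary path is given.
            if (userDictionaryEncoding != null) {
                args.put("userDictionaryEncoding", userDictionaryEncoding);
            }
        }
        args.put("outputUnknownUnigrams", Boolean.toString(outputUnknownUnigrams));
        args.put("discardPunctuation", Boolean.toString(discardPunctuation));
        return args;
    }

    public static void main(String[] unused) {
        // Same settings as the Solr field type example on this page.
        Map<String, String> args = build("discard", "user.txt", "UTF-8", false, true);
        System.out.println(args);
        // With the Lucene module that contains this class on the classpath,
        // the map would then be handed to the factory:
        //   KoreanTokenizerFactory factory = new KoreanTokenizerFactory(args);
    }
}
```

The resulting map is meant for the KoreanTokenizerFactory(java.util.Map<java.lang.String,java.lang.String> args) constructor listed in the Constructor Summary. Passing a mutable map is the conservative choice here, on the assumption that the factory consumes entries as it parses them.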
- Since:
7.4.0
Field Summary
Fields
Modifier and Type                       Field                    Description
private static java.lang.String         DECOMPOUND_MODE
private static java.lang.String         DISCARD_PUNCTUATION
private boolean                         discardPunctuation
private KoreanTokenizer.DecompoundMode  mode
static java.lang.String                 NAME                     SPI name
private static java.lang.String         OUTPUT_UNKNOWN_UNIGRAMS
private boolean                         outputUnknownUnigrams
private static java.lang.String         USER_DICT_ENCODING
private static java.lang.String         USER_DICT_PATH
private UserDictionary                  userDictionary
private java.lang.String                userDictionaryEncoding
private java.lang.String                userDictionaryPath
-
Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
-
Constructor Summary
Constructors
Constructor                                                                    Description
KoreanTokenizerFactory()                                                       Default ctor for compatibility with SPI
KoreanTokenizerFactory(java.util.Map<java.lang.String,java.lang.String> args)  Creates a new KoreanTokenizerFactory
-
Method Summary
Modifier and Type  Method                            Description
KoreanTokenizer    create(AttributeFactory factory)  Creates a TokenStream of the specified input using the given AttributeFactory
void               inform(ResourceLoader loader)     Initializes this component with the provided ResourceLoader (used for loading classes, files, etc).
-
Methods inherited from class org.apache.lucene.analysis.TokenizerFactory
availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers
-
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
-
Field Detail
-
NAME
public static final java.lang.String NAME
SPI name
- See Also:
- Constant Field Values
-
USER_DICT_PATH
private static final java.lang.String USER_DICT_PATH
- See Also:
- Constant Field Values
-
USER_DICT_ENCODING
private static final java.lang.String USER_DICT_ENCODING
- See Also:
- Constant Field Values
-
DECOMPOUND_MODE
private static final java.lang.String DECOMPOUND_MODE
- See Also:
- Constant Field Values
-
OUTPUT_UNKNOWN_UNIGRAMS
private static final java.lang.String OUTPUT_UNKNOWN_UNIGRAMS
- See Also:
- Constant Field Values
-
DISCARD_PUNCTUATION
private static final java.lang.String DISCARD_PUNCTUATION
- See Also:
- Constant Field Values
-
userDictionaryPath
private final java.lang.String userDictionaryPath
-
userDictionaryEncoding
private final java.lang.String userDictionaryEncoding
-
userDictionary
private UserDictionary userDictionary
-
mode
private final KoreanTokenizer.DecompoundMode mode
-
outputUnknownUnigrams
private final boolean outputUnknownUnigrams
-
discardPunctuation
private final boolean discardPunctuation
-
-
Method Detail
-
inform
public void inform(ResourceLoader loader) throws java.io.IOException
Description copied from interface: ResourceLoaderAware
Initializes this component with the provided ResourceLoader (used for loading classes, files, etc).
- Specified by:
inform in interface ResourceLoaderAware
- Throws:
java.io.IOException
-
create
public KoreanTokenizer create(AttributeFactory factory)
Description copied from class: TokenizerFactory
Creates a TokenStream of the specified input using the given AttributeFactory
- Specified by:
create in class TokenizerFactory