Class WordSegmenter
- java.lang.Object
  - org.apache.lucene.analysis.cn.smart.WordSegmenter
class WordSegmenter extends java.lang.Object
Segment a sentence of Chinese text into words.
-
-
Field Summary
Fields
Modifier and Type         Field            Description
private HHMMSegmenter     hhmmSegmenter
private SegTokenFilter    tokenFilter
-
Constructor Summary
Constructors
Constructor        Description
WordSegmenter()
-
Method Summary
Modifier and Type, Method, Description
SegToken convertSegToken(SegToken st, java.lang.String sentence, int sentenceStartOffset)
    Process a SegToken so that it is ready for indexing.
java.util.List<SegToken> segmentSentence(java.lang.String sentence, int startOffset)
    Segment a sentence into words with HHMMSegmenter.
-
-
-
Field Detail
-
hhmmSegmenter
private HHMMSegmenter hhmmSegmenter
-
tokenFilter
private SegTokenFilter tokenFilter
-
-
Method Detail
-
segmentSentence
public java.util.List<SegToken> segmentSentence(java.lang.String sentence, int startOffset)
Segment a sentence into words with HHMMSegmenter.
- Parameters:
  sentence - input sentence
  startOffset - start offset of sentence
- Returns:
  List of SegToken
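To illustrate what the startOffset parameter represents: a caller that feeds text one sentence at a time has to know where each sentence begins in the full text. The following is a minimal sketch under stated assumptions, not Lucene code. The class SentenceOffsetSketch, the method sentenceStarts, and the use of '。' as the only sentence delimiter are all hypothetical and chosen only for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: finds where each sentence starts
// so that the value could be passed along as a sentence start offset.
class SentenceOffsetSketch {

    // Assumption: a sentence ends at the ideographic full stop '。'.
    static List<Integer> sentenceStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '。') {
                starts.add(start);   // the sentence ending here began at 'start'
                start = i + 1;       // the next sentence begins after the delimiter
            }
        }
        if (start < text.length()) {
            starts.add(start);       // trailing text without a final delimiter
        }
        return starts;
    }

    public static void main(String[] args) {
        // Prints the start offset of each sentence in the sample text.
        for (int offset : sentenceStarts("我是中国人。你好。")) {
            System.out.println(offset);
        }
    }
}
```

Each value returned by the sketch plays the role that startOffset plays in segmentSentence: the position of the sentence within the larger text.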
-
convertSegToken
public SegToken convertSegToken(SegToken st, java.lang.String sentence, int sentenceStartOffset)
Process a SegToken so that it is ready for indexing.
This method calculates offsets and normalizes the token with SegTokenFilter.
-
-