- java.lang.Object
-
- org.apache.lucene.search.spell.SpellChecker
-
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
public class SpellChecker extends java.lang.Object implements java.io.Closeable
Spell Checker class (main class), initially inspired by the David Spencer code.
Example Usage:
    SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
    // To index a field of a user index
    // (analyzer is any Analyzer instance; each call gets its own IndexWriterConfig):
    spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field), new IndexWriterConfig(analyzer), true);
    // To index a file containing words:
    spellchecker.indexDictionary(new PlainTextDictionary(Paths.get("myfile.txt")), new IndexWriterConfig(analyzer), true);
    String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);
-
-
Field Summary
Fields
Modifier and Type | Field | Description
private float | accuracy |
private float | bEnd |
private float | bStart | Boost value for start and end grams
private boolean | closed |
private java.util.Comparator<SuggestWord> | comparator |
static float | DEFAULT_ACCURACY | The default minimum score to use, if not specified by calling setAccuracy(float).
static java.lang.String | F_WORD | Field name for each word in the ngram index.
private java.lang.Object | modifyCurrentIndexLock |
private StringDistance | sd |
private IndexSearcher | searcher |
private java.lang.Object | searcherLock |
(package private) Directory | spellIndex | the spell index
-
Constructor Summary
Constructors
Constructor | Description
SpellChecker(Directory spellIndex) | Use the given directory as a spell checker index with a LevenshteinDistance as the default StringDistance.
SpellChecker(Directory spellIndex, StringDistance sd) | Use the given directory as a spell checker index.
SpellChecker(Directory spellIndex, StringDistance sd, java.util.Comparator<SuggestWord> comparator) | Use the given directory as a spell checker index with the given StringDistance measure and the given Comparator for sorting the results.
-
Method Summary
Modifier and Type | Method | Description
private static void | add(BooleanQuery.Builder q, java.lang.String name, java.lang.String value) | Add a clause to a boolean query.
private static void | add(BooleanQuery.Builder q, java.lang.String name, java.lang.String value, float boost) | Add a clause to a boolean query.
private static void | addGram(java.lang.String text, Document doc, int ng1, int ng2) |
void | clearIndex() | Removes all terms from the spell check index.
void | close() | Close the IndexSearcher used by this SpellChecker.
private static Document | createDocument(java.lang.String text, int ng1, int ng2) |
(package private) IndexSearcher | createSearcher(Directory dir) | Creates a new read-only IndexSearcher.
private void | ensureOpen() |
boolean | exist(java.lang.String word) | Check whether the word exists in the index.
private static java.lang.String[] | formGrams(java.lang.String text, int ng) | Form all ngrams for a given word.
float | getAccuracy() | The accuracy (minimum score) to be used, unless overridden in suggestSimilar(String, int, IndexReader, String, SuggestMode, float), to decide whether a suggestion is included or not.
java.util.Comparator<SuggestWord> | getComparator() | Gets the comparator in use for ranking suggestions.
private static int | getMax(int l) |
private static int | getMin(int l) |
StringDistance | getStringDistance() | Returns the StringDistance instance used by this SpellChecker instance.
void | indexDictionary(Dictionary dict, IndexWriterConfig config, boolean fullMerge) | Indexes the data from the given Dictionary.
(package private) boolean | isClosed() |
private IndexSearcher | obtainSearcher() |
private void | releaseSearcher(IndexSearcher aSearcher) |
void | setAccuracy(float acc) | Sets the accuracy 0 < minScore < 1; default DEFAULT_ACCURACY.
void | setComparator(java.util.Comparator<SuggestWord> comparator) | Sets the Comparator for the SuggestWordQueue.
void | setSpellIndex(Directory spellIndexDir) | Use a different index as the spell checker index or re-open the existing index if spellIndex is the same value as given in the constructor.
void | setStringDistance(StringDistance sd) | Sets the StringDistance implementation for this SpellChecker instance.
java.lang.String[] | suggestSimilar(java.lang.String word, int numSug) | Suggest similar words.
java.lang.String[] | suggestSimilar(java.lang.String word, int numSug, float accuracy) | Suggest similar words.
java.lang.String[] | suggestSimilar(java.lang.String word, int numSug, IndexReader ir, java.lang.String field, SuggestMode suggestMode) |
java.lang.String[] | suggestSimilar(java.lang.String word, int numSug, IndexReader ir, java.lang.String field, SuggestMode suggestMode, float accuracy) | Suggest similar words (optionally restricted to a field of an index).
private void | swapSearcher(Directory dir) |
-
-
-
Field Detail
-
DEFAULT_ACCURACY
public static final float DEFAULT_ACCURACY
The default minimum score to use, if not specified by calling setAccuracy(float).
- See Also:
Constant Field Values
-
F_WORD
public static final java.lang.String F_WORD
Field name for each word in the ngram index.
- See Also:
Constant Field Values
-
spellIndex
Directory spellIndex
the spell index
-
bStart
private float bStart
Boost value for start and end grams
-
bEnd
private float bEnd
-
searcher
private IndexSearcher searcher
-
searcherLock
private final java.lang.Object searcherLock
-
modifyCurrentIndexLock
private final java.lang.Object modifyCurrentIndexLock
-
closed
private volatile boolean closed
-
accuracy
private float accuracy
-
sd
private StringDistance sd
-
comparator
private java.util.Comparator<SuggestWord> comparator
-
-
Constructor Detail
-
SpellChecker
public SpellChecker(Directory spellIndex, StringDistance sd) throws java.io.IOException
Use the given directory as a spell checker index. The directory is created if it doesn't exist yet.
- Parameters:
spellIndex - the spell index directory
sd - the StringDistance measurement to use
- Throws:
java.io.IOException - if the SpellChecker cannot open the directory
-
SpellChecker
public SpellChecker(Directory spellIndex) throws java.io.IOException
Use the given directory as a spell checker index with a LevenshteinDistance as the default StringDistance. The directory is created if it doesn't exist yet.
- Parameters:
spellIndex - the spell index directory
- Throws:
java.io.IOException - if the SpellChecker cannot open the directory
-
SpellChecker
public SpellChecker(Directory spellIndex, StringDistance sd, java.util.Comparator<SuggestWord> comparator) throws java.io.IOException
Use the given directory as a spell checker index with the given StringDistance measure and the given Comparator for sorting the results.
- Parameters:
spellIndex - The spelling index
sd - The distance
comparator - The comparator
- Throws:
java.io.IOException - if there is a problem opening the index
-
-
Method Detail
-
setSpellIndex
public void setSpellIndex(Directory spellIndexDir) throws java.io.IOException
Use a different index as the spell checker index or re-open the existing index if spellIndex is the same value as given in the constructor.
- Parameters:
spellIndexDir - the spell directory to use
- Throws:
AlreadyClosedException - if the SpellChecker is already closed
java.io.IOException - if the SpellChecker cannot open the directory
-
setComparator
public void setComparator(java.util.Comparator<SuggestWord> comparator)
Sets the Comparator for the SuggestWordQueue.
- Parameters:
comparator - the comparator
-
getComparator
public java.util.Comparator<SuggestWord> getComparator()
Gets the comparator in use for ranking suggestions.
- See Also:
setComparator(Comparator)
-
setStringDistance
public void setStringDistance(StringDistance sd)
Sets the StringDistance implementation for this SpellChecker instance.
- Parameters:
sd - the StringDistance implementation for this SpellChecker instance
-
getStringDistance
public StringDistance getStringDistance()
Returns the StringDistance instance used by this SpellChecker instance.
- Returns:
the StringDistance instance used by this SpellChecker instance.
-
setAccuracy
public void setAccuracy(float acc)
Sets the accuracy (the minimum score), where 0 < minScore < 1; the default is DEFAULT_ACCURACY.
- Parameters:
acc - The new accuracy
-
getAccuracy
public float getAccuracy()
The accuracy (minimum score) to be used, unless overridden in suggestSimilar(String, int, IndexReader, String, SuggestMode, float), to decide whether a suggestion is included or not.
- Returns:
The current accuracy setting
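The role of the accuracy setting can be illustrated with a minimal, dependency-free sketch. It is not the library's implementation: the class `AccuracySketch`, the method `keep`, the precomputed scores, and the starting value `0.5f` are all hypothetical, and it assumes a suggestion is kept when its score is at least the minimum score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that mimics only the threshold behaviour described above.
final class AccuracySketch {
    // Illustrative starting value only; the real default is the DEFAULT_ACCURACY constant.
    private float accuracy = 0.5f;

    void setAccuracy(float acc) {
        this.accuracy = acc;
    }

    float getAccuracy() {
        return accuracy;
    }

    /** Filters with the instance-wide minimum score. */
    List<String> keep(Map<String, Float> scored) {
        return keep(scored, accuracy);
    }

    /** Filters with a per-call minimum score that overrides the instance-wide one. */
    List<String> keep(Map<String, Float> scored, float minScore) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Float> e : scored.entrySet()) {
            // Assumption: a score equal to the threshold still qualifies.
            if (e.getValue() >= minScore) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }
}
```

Passing an explicit threshold to the two-argument `keep` mirrors how the `suggestSimilar` overloads with an `accuracy` parameter take precedence over the value set through `setAccuracy(float)`.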
-
suggestSimilar
public java.lang.String[] suggestSimilar(java.lang.String word, int numSug) throws java.io.IOException
Suggest similar words.
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best-matching spell-checked word from the hits that Lucene found, so one usually has to request several suggestions (a larger numSug) in order to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one. Set this value to at least 5 for a good suggestion.
- Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
- Returns:
String[]
- Throws:
java.io.IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed
- See Also:
suggestSimilar(String, int, IndexReader, String, SuggestMode, float)
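The gap between retrieval order and edit-distance order can be shown with a small, self-contained sketch. It is a simplified model, not the class's actual logic: `RerankSketch`, `suggest`, and the hard-coded retrieval order are hypothetical, and the similarity formula (one minus edit distance divided by the longer length) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RerankSketch {
    /** Classic dynamic-programming Levenshtein edit distance using two rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalisation: 1.0 means identical, 0.0 means nothing in common. */
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / longest;
    }

    /** Takes the first numSug hits in retrieval order, then re-ranks them by similarity. */
    static List<String> suggest(final String word, List<String> retrievalOrder, int numSug) {
        int n = Math.min(numSug, retrievalOrder.size());
        List<String> top = new ArrayList<>(retrievalOrder.subList(0, n));
        top.sort(Comparator.comparingDouble((String c) -> similarity(word, c)).reversed());
        return top;
    }
}
```

With the hypothetical retrieval order `peeling, spelling, spewing` for the input `speling`, asking for one suggestion yields `peeling` (edit distance 2), while asking for three lets the re-ranking step surface `spelling` (edit distance 1) first.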
-
suggestSimilar
public java.lang.String[] suggestSimilar(java.lang.String word, int numSug, float accuracy) throws java.io.IOException
Suggest similar words.
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best-matching spell-checked word from the hits that Lucene found, so one usually has to request several suggestions (a larger numSug) in order to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one. Set this value to at least 5 for a good suggestion.
- Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
accuracy - The minimum score a suggestion must have in order to qualify for inclusion in the results
- Returns:
String[]
- Throws:
java.io.IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed
- See Also:
suggestSimilar(String, int, IndexReader, String, SuggestMode, float)
-
suggestSimilar
public java.lang.String[] suggestSimilar(java.lang.String word, int numSug, IndexReader ir, java.lang.String field, SuggestMode suggestMode) throws java.io.IOException
- Throws:
java.io.IOException
-
suggestSimilar
public java.lang.String[] suggestSimilar(java.lang.String word, int numSug, IndexReader ir, java.lang.String field, SuggestMode suggestMode, float accuracy) throws java.io.IOException
Suggest similar words (optionally restricted to a field of an index).
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best-matching spell-checked word from the hits that Lucene found, so one usually has to request several suggestions (a larger numSug) in order to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one. Set this value to at least 5 for a good suggestion.
- Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
ir - the IndexReader of the user index (can be null; see the field parameter)
field - the field of the user index: if field is not null, the suggested words are restricted to the words present in this field.
suggestMode - (NOTE: if indexReader==null and/or field==null, then this is overridden with SuggestMode.SUGGEST_ALWAYS)
accuracy - The minimum score a suggestion must have in order to qualify for inclusion in the results
- Returns:
String[]: the list of suggested words, sorted by two criteria: first, the edit distance; second (only in restricted mode), the popularity of the suggested word in the field of the user index
- Throws:
java.io.IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed
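The two-criteria ordering described under Returns can be sketched as a plain comparator. This is only an illustration under stated assumptions: `SuggestionOrderSketch` and `Candidate` are hypothetical types, the edit distance is represented as a similarity score where higher is better, and the frequency tie-break is applied unconditionally (in unrestricted mode every frequency could simply be 0).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SuggestionOrderSketch {
    /** Hypothetical value type: a suggestion, its similarity score, and its frequency in the user field. */
    static final class Candidate {
        final String word;
        final float score;
        final int freq;

        Candidate(String word, float score, int freq) {
            this.word = word;
            this.score = score;
            this.freq = freq;
        }
    }

    /** Higher score first; among equal scores, higher frequency first. */
    static final Comparator<Candidate> BEST_FIRST =
            Comparator.comparingDouble((Candidate c) -> c.score).reversed()
                    .thenComparing(Comparator.comparingInt((Candidate c) -> c.freq).reversed());

    /** Returns the candidate words in best-first order without modifying the input list. */
    static List<String> order(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(BEST_FIRST);
        List<String> words = new ArrayList<>();
        for (Candidate c : sorted) {
            words.add(c.word);
        }
        return words;
    }
}
```

A custom ordering of this kind is what the constructor taking a `Comparator<SuggestWord>` and `setComparator` allow callers to swap in.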
-
add
private static void add(BooleanQuery.Builder q, java.lang.String name, java.lang.String value, float boost)
Add a clause to a boolean query.
-
add
private static void add(BooleanQuery.Builder q, java.lang.String name, java.lang.String value)
Add a clause to a boolean query.
-
formGrams
private static java.lang.String[] formGrams(java.lang.String text, int ng)
Form all ngrams for a given word.
- Parameters:
text - the word to parse
ng - the ngram length, e.g. 3
- Returns:
an array of all ngrams in the word; note that duplicates are not removed
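A minimal sketch of what forming all ngrams of a word means, written against the parameter and return descriptions above rather than the private implementation. The class name `NGramSketch` is hypothetical, and returning an empty array for a word shorter than `ng` is an assumption.

```java
import java.util.Arrays;

final class NGramSketch {
    /** Returns every contiguous substring of length ng, in order; duplicates are kept. */
    static String[] formGrams(String text, int ng) {
        int count = text.length() - ng + 1;
        if (ng <= 0 || count <= 0) {
            return new String[0]; // assumption: no grams when the word is shorter than ng
        }
        String[] grams = new String[count];
        for (int i = 0; i < count; i++) {
            grams[i] = text.substring(i, i + ng);
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [ban, ana, nan, ana]
        System.out.println(Arrays.toString(formGrams("banana", 3)));
    }
}
```

The repeated `ana` in the output shows why the Returns note about duplicates matters: a word of length n always yields n - ng + 1 grams.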
-
clearIndex
public void clearIndex() throws java.io.IOException
Removes all terms from the spell check index.
- Throws:
java.io.IOException - If there is a low-level I/O error.
AlreadyClosedException - if the SpellChecker is already closed
-
exist
public boolean exist(java.lang.String word) throws java.io.IOException
Check whether the word exists in the index.
- Parameters:
word - word to check
- Returns:
true if the word exists in the index
- Throws:
java.io.IOException - If there is a low-level I/O error.
AlreadyClosedException - if the SpellChecker is already closed
-
indexDictionary
public final void indexDictionary(Dictionary dict, IndexWriterConfig config, boolean fullMerge) throws java.io.IOException
Indexes the data from the given Dictionary.
- Parameters:
dict - Dictionary to index
config - IndexWriterConfig to use
fullMerge - whether or not the spellcheck index should be fully merged
- Throws:
AlreadyClosedException - if the SpellChecker is already closed
java.io.IOException - If there is a low-level I/O error.
-
getMin
private static int getMin(int l)
-
getMax
private static int getMax(int l)
-
createDocument
private static Document createDocument(java.lang.String text, int ng1, int ng2)
-
addGram
private static void addGram(java.lang.String text, Document doc, int ng1, int ng2)
-
obtainSearcher
private IndexSearcher obtainSearcher()
-
releaseSearcher
private void releaseSearcher(IndexSearcher aSearcher) throws java.io.IOException
- Throws:
java.io.IOException
-
ensureOpen
private void ensureOpen()
-
close
public void close() throws java.io.IOException
Close the IndexSearcher used by this SpellChecker.
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Throws:
java.io.IOException - if the close operation causes an IOException
AlreadyClosedException - if the SpellChecker is already closed
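The closed-state contract listed here (a volatile `closed` flag, a private `ensureOpen()` check, and an exception on use after close) can be sketched without any Lucene dependency. This is a hypothetical illustration, not the class's source: `CloseGuardSketch` is an invented name, `IllegalStateException` stands in for `AlreadyClosedException`, and the single lock object is an assumption.

```java
import java.io.Closeable;

final class CloseGuardSketch implements Closeable {
    private final Object lock = new Object();
    private volatile boolean closed = false;

    /** Throws if this instance was already closed. */
    private void ensureOpen() {
        if (closed) {
            // Stand-in for AlreadyClosedException, to keep the sketch dependency-free.
            throw new IllegalStateException("this sketch has already been closed");
        }
    }

    /** Example of a guarded operation; the lookup itself is a stub. */
    boolean exist(String word) {
        ensureOpen();
        return false;
    }

    @Override
    public void close() {
        synchronized (lock) {
            ensureOpen();  // a second close() is reported rather than ignored
            closed = true; // a real implementation would also release its searcher here
        }
    }

    boolean isClosed() {
        return closed;
    }
}
```

Because a second `close()` throws in this model, callers that may close from more than one place would check `isClosed()` first or rely on a single try-with-resources block.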
-
swapSearcher
private void swapSearcher(Directory dir) throws java.io.IOException
- Throws:
java.io.IOException
-
createSearcher
IndexSearcher createSearcher(Directory dir) throws java.io.IOException
Creates a new read-only IndexSearcher.
- Parameters:
dir - the directory used to open the searcher
- Returns:
a new read-only IndexSearcher
- Throws:
java.io.IOException - if there is a low-level I/O error
-
isClosed
boolean isClosed()
- Returns:
true if and only if the SpellChecker is closed, otherwise false.
-
-