Class SpellChecker

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public class SpellChecker
    extends java.lang.Object
    implements java.io.Closeable
    Spell checker class (main class), initially inspired by David Spencer's code.

    Example Usage:

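      SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
      // To index a field of a user index (analyzer is assumed to be any Analyzer instance;
      // each call gets its own IndexWriterConfig, and true requests a full merge):
      spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field), new IndexWriterConfig(analyzer), true);
      // To index a file containing words:
      spellchecker.indexDictionary(new PlainTextDictionary(Paths.get("myfile.txt")), new IndexWriterConfig(analyzer), true);
      // Ask for several suggestions; with numSug == 1 the single result may not be the best match.
      String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);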
     
    • Field Detail

      • F_WORD

        public static final java.lang.String F_WORD
        Field name for each word in the ngram index.
        See Also:
        Constant Field Values
      • spellIndex

        Directory spellIndex
        The directory that holds the spell index.
      • bStart

        private float bStart
        Boost value for start and end grams
      • bEnd

        private float bEnd
      • searcherLock

        private final java.lang.Object searcherLock
      • modifyCurrentIndexLock

        private final java.lang.Object modifyCurrentIndexLock
      • closed

        private volatile boolean closed
      • accuracy

        private float accuracy
      • comparator

        private java.util.Comparator<SuggestWord> comparator
    • Constructor Detail

      • SpellChecker

        public SpellChecker​(Directory spellIndex,
                            StringDistance sd)
                     throws java.io.IOException
        Use the given directory as a spell checker index. The directory is created if it doesn't exist yet.
        Parameters:
        spellIndex - the spell index directory
        sd - the StringDistance measurement to use
        Throws:
        java.io.IOException - if the SpellChecker cannot open the directory
      • SpellChecker

        public SpellChecker​(Directory spellIndex)
                     throws java.io.IOException
        Use the given directory as a spell checker index with a LevenshteinDistance as the default StringDistance. The directory is created if it doesn't exist yet.
        Parameters:
        spellIndex - the spell index directory
        Throws:
        java.io.IOException - if the SpellChecker cannot open the directory
      • SpellChecker

        public SpellChecker​(Directory spellIndex,
                            StringDistance sd,
                            java.util.Comparator<SuggestWord> comparator)
                     throws java.io.IOException
        Use the given directory as a spell checker index with the given StringDistance measure and the given Comparator for sorting the results.
        Parameters:
        spellIndex - the spell index directory
        sd - the StringDistance measure to use
        comparator - the Comparator used to sort the suggestions
        Throws:
        java.io.IOException - if there is a problem opening the index
    • Method Detail

      • setSpellIndex

        public void setSpellIndex​(Directory spellIndexDir)
                           throws java.io.IOException
        Use a different index as the spell checker index, or re-open the existing index if spellIndexDir is the same directory that was given in the constructor.
        Parameters:
        spellIndexDir - the spell directory to use
        Throws:
        AlreadyClosedException - if the SpellChecker is already closed
        java.io.IOException - if the SpellChecker cannot open the directory
      • setComparator

        public void setComparator​(java.util.Comparator<SuggestWord> comparator)
        Sets the Comparator for the SuggestWordQueue.
        Parameters:
        comparator - the comparator
      • setAccuracy

        public void setAccuracy​(float acc)
        Sets the accuracy, that is, the minimum score a suggestion must reach, with 0 < acc < 1. The default is DEFAULT_ACCURACY.
        Parameters:
        acc - The new accuracy
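
To make the idea of a minimum score concrete, here is a minimal, self-contained sketch. It assumes a similarity normalised to the range 0 to 1 by dividing the Levenshtein edit distance by the length of the longer word; the class AccuracySketch and its methods are hypothetical and do not call the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

class AccuracySketch {
    /** Classic two-row dynamic-programming Levenshtein edit distance. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]; 1 means the two words are identical (assumed normalisation). */
    static float similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0f;
        }
        return 1.0f - (float) editDistance(a, b) / longest;
    }

    /** Keeps only the candidates whose similarity to word is at least minScore. */
    static List<String> filter(String word, List<String> candidates, float minScore) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (similarity(word, candidate) >= minScore) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

Under this assumed scoring, raising the threshold from 0.6 to 0.8 for the word "misspelt" drops "misspelled" (three edits over ten characters) but keeps "misspell" (one edit over eight characters).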
      • suggestSimilar

        public java.lang.String[] suggestSimilar​(java.lang.String word,
                                                 int numSug)
                                          throws java.io.IOException
        Suggest similar words.

        The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best-matching word from the hits that Lucene found. As a result, more than one suggestion usually has to be retrieved in order to get the true best match.

        That is, if numSug == 1, do not count on that suggestion being the best one. Set numSug to at least 5 for a good suggestion.

        Parameters:
        word - the word you want a spell check done on
        numSug - the number of suggested words
        Returns:
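        String[] - the suggested words, ordered as described for suggestSimilar(String, int, IndexReader, String, SuggestMode, float)
        Throws:
        java.io.IOException - if the underlying index throws an IOException
        AlreadyClosedException - if the SpellChecker is already closed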
        See Also:
        suggestSimilar(String, int, IndexReader, String, SuggestMode, float)
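
The mismatch between the fetch step and the ranking step can be sketched without Lucene. The example below is a simplified illustration under stated assumptions: candidates are fetched by counting shared bigrams (a stand-in for the Lucene similarity) and then re-ranked by Levenshtein edit distance. TwoStageSketch, suggest, and the fetch parameter are hypothetical names, not part of the SpellChecker API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TwoStageSketch {
    /** Distinct n-grams of a word. */
    static Set<String> grams(String w, int n) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + n <= w.length(); i++) {
            out.add(w.substring(i, i + n));
        }
        return out;
    }

    /** Number of distinct n-grams the two words have in common. */
    static int sharedGrams(String a, String b, int n) {
        Set<String> shared = grams(a, n);
        shared.retainAll(grams(b, n));
        return shared.size();
    }

    /** Two-row dynamic-programming Levenshtein edit distance. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Stage 1: fetch the top candidates by bigram overlap. Stage 2: re-rank them by edit distance. */
    static List<String> suggest(String word, List<String> dictionary, int fetch, int numSug) {
        List<String> byOverlap = new ArrayList<>(dictionary);
        byOverlap.sort(Comparator.comparingInt((String c) -> -sharedGrams(word, c, 2)));
        List<String> fetched = new ArrayList<>(byOverlap.subList(0, Math.min(fetch, byOverlap.size())));
        fetched.sort(Comparator.comparingInt((String c) -> editDistance(word, c)));
        return new ArrayList<>(fetched.subList(0, Math.min(numSug, fetched.size())));
    }
}
```

With the dictionary from, formation, farm, forum and the word "form", fetching a single candidate returns "formation" (three shared bigrams, but five edits away), whereas fetching three candidates and re-ranking returns "forum" (one edit away). This is the effect the note on numSug warns about.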
      • suggestSimilar

        public java.lang.String[] suggestSimilar​(java.lang.String word,
                                                 int numSug,
                                                 float accuracy)
                                          throws java.io.IOException
        Suggest similar words.

        The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best-matching word from the hits that Lucene found. As a result, more than one suggestion usually has to be retrieved in order to get the true best match.

        That is, if numSug == 1, do not count on that suggestion being the best one. Set numSug to at least 5 for a good suggestion.

        Parameters:
        word - the word you want a spell check done on
        numSug - the number of suggested words
        accuracy - The minimum score a suggestion must have in order to qualify for inclusion in the results
        Returns:
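        String[] - the suggested words, ordered as described for suggestSimilar(String, int, IndexReader, String, SuggestMode, float)
        Throws:
        java.io.IOException - if the underlying index throws an IOException
        AlreadyClosedException - if the SpellChecker is already closed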
        See Also:
        suggestSimilar(String, int, IndexReader, String, SuggestMode, float)
      • suggestSimilar

        public java.lang.String[] suggestSimilar​(java.lang.String word,
                                                 int numSug,
                                                 IndexReader ir,
                                                 java.lang.String field,
                                                 SuggestMode suggestMode,
                                                 float accuracy)
                                          throws java.io.IOException
        Suggest similar words (optionally restricted to a field of an index).

        The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best-matching word from the hits that Lucene found. As a result, more than one suggestion usually has to be retrieved in order to get the true best match.

        That is, if numSug == 1, do not count on that suggestion being the best one. Set numSug to at least 5 for a good suggestion.

        Parameters:
        word - the word you want a spell check done on
        numSug - the number of suggested words
        ir - the IndexReader of the user index (can be null; see the field parameter)
        field - the field of the user index: if field is not null, the suggested words are restricted to the words present in this field.
        suggestMode - the SuggestMode to apply (NOTE: if ir == null and/or field == null, this is overridden with SuggestMode.SUGGEST_ALWAYS)
        accuracy - The minimum score a suggestion must have in order to qualify for inclusion in the results
        Returns:
        String[] - the suggested words, sorted by two criteria: first, the edit distance; second (only in restricted mode), the popularity of the suggested word in the field of the user index
        Throws:
        java.io.IOException - if the underlying index throws an IOException
        AlreadyClosedException - if the SpellChecker is already closed
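
The two sort criteria described above can be sketched as a comparator. This is an illustration only: Suggestion is a hypothetical stand-in class (not Lucene's SuggestWord), score is assumed to be a similarity where higher is better, and freq is assumed to be the word's frequency in the user index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RankingSketch {
    /** Hypothetical suggestion record: the word, its similarity score, and its frequency. */
    static final class Suggestion {
        final String word;
        final float score;
        final int freq;

        Suggestion(String word, float score, int freq) {
            this.word = word;
            this.score = score;
            this.freq = freq;
        }
    }

    /** Best first: higher score wins; equal scores are ordered by higher frequency. */
    static final Comparator<Suggestion> BEST_FIRST = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(b.freq, a.freq);
    };

    /** Returns the words of the given suggestions, best first. */
    static List<String> rank(List<Suggestion> suggestions) {
        List<Suggestion> sorted = new ArrayList<>(suggestions);
        sorted.sort(BEST_FIRST);
        List<String> words = new ArrayList<>();
        for (Suggestion s : sorted) {
            words.add(s.word);
        }
        return words;
    }
}
```

A custom Comparator passed to the three-argument constructor or to setComparator would play the role of BEST_FIRST here, if a different ordering is wanted.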
      • add

        private static void add​(BooleanQuery.Builder q,
                                java.lang.String name,
                                java.lang.String value,
                                float boost)
        Add a clause to a boolean query.
      • add

        private static void add​(BooleanQuery.Builder q,
                                java.lang.String name,
                                java.lang.String value)
        Add a clause to a boolean query.
      • formGrams

        private static java.lang.String[] formGrams​(java.lang.String text,
                                                    int ng)
        Form all ngrams for a given word.
        Parameters:
        text - the word to parse
        ng - the ngram length e.g. 3
        Returns:
        an array of all ngrams in the word; note that duplicates are not removed
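
A minimal sketch of n-gram formation matching the description above: a sliding window of length ng over the word, with duplicates kept. The class name NGramSketch and the handling of words shorter than ng (an empty array) are assumptions made for this illustration.

```java
import java.util.Arrays;

class NGramSketch {
    /** Returns every substring of length ng, left to right; duplicates are kept. */
    static String[] formGrams(String text, int ng) {
        int count = text.length() - ng + 1;
        if (ng <= 0 || count <= 0) {
            return new String[0]; // assumption: input shorter than ng yields no grams
        }
        String[] grams = new String[count];
        for (int i = 0; i < count; i++) {
            grams[i] = text.substring(i, i + ng);
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [ba, an, na, an, na]
        System.out.println(Arrays.toString(formGrams("banana", 2)));
    }
}
```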
      • clearIndex

        public void clearIndex()
                        throws java.io.IOException
        Removes all terms from the spell check index.
        Throws:
        java.io.IOException - If there is a low-level I/O error.
        AlreadyClosedException - if the SpellChecker is already closed
      • exist

        public boolean exist​(java.lang.String word)
                      throws java.io.IOException
        Check whether the word exists in the index.
        Parameters:
        word - word to check
        Returns:
        true if the word exists in the index
        Throws:
        java.io.IOException - If there is a low-level I/O error.
        AlreadyClosedException - if the SpellChecker is already closed
      • indexDictionary

        public final void indexDictionary​(Dictionary dict,
                                          IndexWriterConfig config,
                                          boolean fullMerge)
                                   throws java.io.IOException
        Indexes the data from the given Dictionary.
        Parameters:
        dict - Dictionary to index
        config - IndexWriterConfig to use
        fullMerge - whether or not the spellcheck index should be fully merged
        Throws:
        AlreadyClosedException - if the SpellChecker is already closed
        java.io.IOException - If there is a low-level I/O error.
      • getMin

        private static int getMin​(int l)
      • getMax

        private static int getMax​(int l)
      • createDocument

        private static Document createDocument​(java.lang.String text,
                                               int ng1,
                                               int ng2)
      • addGram

        private static void addGram​(java.lang.String text,
                                    Document doc,
                                    int ng1,
                                    int ng2)
      • releaseSearcher

        private void releaseSearcher​(IndexSearcher aSearcher)
                              throws java.io.IOException
        Throws:
        java.io.IOException
      • ensureOpen

        private void ensureOpen()
      • close

        public void close()
                   throws java.io.IOException
        Closes the IndexSearcher used by this SpellChecker.
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException - if the close operation causes an IOException
        AlreadyClosedException - if the SpellChecker is already closed
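
The closed flag, ensureOpen(), and isClosed() members listed on this page suggest a guard pattern along the following lines. This is a sketch of the general shape only, not the class's actual implementation: GuardedResourceSketch and use() are hypothetical, and IllegalStateException stands in for AlreadyClosedException.

```java
import java.io.Closeable;

class GuardedResourceSketch implements Closeable {
    private final Object lock = new Object();
    // volatile so that readers see the flag without taking the lock
    private volatile boolean closed = false;

    /** Fails fast when the resource has already been closed. */
    private void ensureOpen() {
        if (closed) {
            throw new IllegalStateException("resource is closed");
        }
    }

    /** Any public operation checks the flag before doing work. */
    String use() {
        ensureOpen();
        return "ok";
    }

    /** Marks the resource closed; a second call is reported as an error, as documented above. */
    @Override
    public void close() {
        synchronized (lock) {
            ensureOpen();
            closed = true;
        }
    }

    boolean isClosed() {
        return closed;
    }
}
```

Because the class implements Closeable, an instance can also be managed with try-with-resources, which calls close() exactly once.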
      • swapSearcher

        private void swapSearcher​(Directory dir)
                           throws java.io.IOException
        Throws:
        java.io.IOException
      • createSearcher

        IndexSearcher createSearcher​(Directory dir)
                              throws java.io.IOException
        Creates a new read-only IndexSearcher
        Parameters:
        dir - the directory used to open the searcher
        Returns:
        a new read-only IndexSearcher
        Throws:
        java.io.IOException - if there is a low-level I/O error
      • isClosed

        boolean isClosed()
        Returns true if and only if the SpellChecker is closed, otherwise false.
        Returns:
        true if and only if the SpellChecker is closed, otherwise false.