Class LatvianStemmer


  • public class LatvianStemmer
    extends java.lang.Object
    Light stemmer for Latvian.

    This is a light version of the algorithm in Karlis Kreslin's PhD thesis A stemming algorithm for Latvian with the following modifications:

    • Only noun and adjective morphology is stemmed explicitly.
    • Stricter length/vowel checks are applied to the resulting stems (suffix stripping for verbs and other parts of speech is removed).
    • Only the primary inflectional suffixes are removed: case and number for nouns; case, number, gender, and definiteness for adjectives.
    • Palatalization is only handled when a declension II, V, or VI noun suffix is removed.
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      (package private) static class  LatvianStemmer.Affix  
    • Constructor Summary

      Constructors 
      Constructor Description
      LatvianStemmer()  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      private int numVowels​(char[] s, int len)
      Counts the vowels in the string; at least one vowel must remain in the stem for it to be accepted.
      int stem​(char[] s, int len)
      Stems a Latvian word.
      private int unpalatalize​(char[] s, int len)
      Most cases are handled except for the ambiguous ones: s -> š, t -> š, d -> ž, z -> ž.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • LatvianStemmer

        public LatvianStemmer()
    • Method Detail

      • stem

        public int stem​(char[] s,
                        int len)
        Stems a Latvian word and returns the new, adjusted length.
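
        The calling convention (a char[] buffer plus a length in, a new length out) can be illustrated with a minimal sketch. The class name, the five-entry suffix list, and the minimum stem length of 3 below are assumptions made for illustration; they are not the suffix table or thresholds used by LatvianStemmer. Non-ASCII letters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
// Illustrative sketch only: NOT the Lucene implementation or its suffix table.
final class LightSuffixStripperSketch {

    // Hypothetical sample of endings, longest first so the longest match wins.
    // "\u0101m" is a-macron followed by m.
    private static final String[] SUFFIXES = {"iem", "\u0101m", "as", "a", "s"};

    // Assumed minimum number of characters that must remain in the stem.
    private static final int MIN_STEM_LENGTH = 3;

    // a, e, i, o, u and the long vowels (a-, e-, i-, u-macron).
    private static final String VOWELS = "aeiou\u0101\u0113\u012B\u016B";

    /** Returns the new length of the word held in s[0..len). */
    static int stem(char[] s, int len) {
        for (String suffix : SUFFIXES) {
            int stemLen = len - suffix.length();
            if (stemLen >= MIN_STEM_LENGTH
                    && endsWith(s, len, suffix)
                    && hasVowel(s, stemLen)) {
                return stemLen;
            }
        }
        return len; // no acceptable suffix: the word is left unchanged
    }

    private static boolean endsWith(char[] s, int len, String suffix) {
        int start = len - suffix.length();
        if (start < 0) {
            return false;
        }
        for (int i = 0; i < suffix.length(); i++) {
            if (s[start + i] != suffix.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    private static boolean hasVowel(char[] s, int len) {
        for (int i = 0; i < len; i++) {
            if (VOWELS.indexOf(s[i]) >= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        char[] buf = "gr\u0101matas".toCharArray(); // 8 characters
        int newLen = stem(buf, buf.length);
        // The stem is the prefix buf[0..newLen); the buffer is not reallocated.
        System.out.println(new String(buf, 0, newLen));
    }
}
```

        Because only a length is returned, the caller decides whether to copy the prefix into a new String or keep working on the same buffer.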
      • unpalatalize

        private int unpalatalize​(char[] s,
                                 int len)
        Most cases are handled except for the ambiguous ones:
        • s -> š
        • t -> š
        • d -> ž
        • z -> ž
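
        The ambiguity comes from the reverse direction: a stem-final š may go back to either s or t, and ž to either d or z, so no single replacement is safe. The sketch below shows the idea with a handful of reversals that are unambiguous in general Latvian consonant alternation (č to c, ļ to l, ņ to n, and a labial followed by j). This rule selection is an assumption made for illustration; the private method's actual rule set is not listed in this documentation.

```java
// Illustrative sketch only: a hypothetical subset of rules, not the private method.
final class UnpalatalizeSketch {

    /**
     * Reverses a few unambiguous alternations at the end of s[0..len)
     * in place and returns the (possibly shorter) length.
     */
    static int unpalatalize(char[] s, int len) {
        if (len < 2) {
            return len;
        }
        char last = s[len - 1];
        char prev = s[len - 2];
        // Labial consonant + j (pj, bj, mj, vj): drop the j.
        if (last == 'j' && (prev == 'p' || prev == 'b' || prev == 'm' || prev == 'v')) {
            return len - 1;
        }
        switch (last) {
            case '\u010D': s[len - 1] = 'c'; break; // c-caron -> c
            case '\u013C': s[len - 1] = 'l'; break; // l-cedilla -> l
            case '\u0146': s[len - 1] = 'n'; break; // n-cedilla -> n
            default:
                // s-caron (U+0161) may come from s or t, and z-caron (U+017E)
                // from d or z, so both are deliberately left untouched.
                break;
        }
        return len;
    }

    public static void main(String[] args) {
        char[] a = "br\u0101\u013C".toCharArray(); // ends in l-cedilla
        int n = unpalatalize(a, a.length);
        System.out.println(new String(a, 0, n));
    }
}
```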
      • numVowels

        private int numVowels​(char[] s,
                              int len)
        Counts the vowels in the string; at least one vowel must remain in the stem for it to be accepted.
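
        A vowel count over the first len characters might look like the following sketch. The vowel set (a, e, i, o, u and the long vowels ā, ē, ī, ū) and the helper name acceptStem are assumptions for illustration, since the method is private and its vowel set is not documented here.

```java
// Illustrative sketch only: vowel set and helper names are assumptions.
final class VowelCountSketch {

    // a, e, i, o, u and the long vowels (a-, e-, i-, u-macron).
    private static final String VOWELS = "aeiou\u0101\u0113\u012B\u016B";

    /** Counts vowels in s[0..len); characters at or beyond len are ignored. */
    static int numVowels(char[] s, int len) {
        int count = 0;
        for (int i = 0; i < len; i++) {
            if (VOWELS.indexOf(s[i]) >= 0) {
                count++;
            }
        }
        return count;
    }

    /** A candidate stem s[0..stemLen) is accepted only if it keeps a vowel. */
    static boolean acceptStem(char[] s, int stemLen) {
        return numVowels(s, stemLen) >= 1;
    }

    public static void main(String[] args) {
        char[] word = "m\u0101ja".toCharArray(); // m, a-macron, j, a
        System.out.println(numVowels(word, word.length));
        System.out.println(acceptStem(word, 1)); // candidate stem "m" has no vowel
    }
}
```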