Module org.apache.lucene.core
Class Lucene90BlockTreeTermsReader
java.lang.Object
  org.apache.lucene.index.Fields
    org.apache.lucene.codecs.FieldsProducer
      org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsReader

All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable, java.lang.Iterable<java.lang.String>
public final class Lucene90BlockTreeTermsReader extends FieldsProducer
A block-based terms index and dictionary that assigns terms to variable-length blocks according to how they share prefixes. The terms index is a prefix trie whose leaves are term blocks. The advantage of this approach is that seekExact can often determine that a term cannot exist without doing any I/O, and intersection with Automata is very fast. Note that this terms dictionary has its own fixed terms index (i.e., it does not support a pluggable terms index implementation).
NOTE: this terms dictionary supports min/maxItemsPerBlock during indexing to control how much memory the terms index uses.
The data structure used by this implementation is very similar to a burst trie (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499), but with added logic to break up too-large blocks of all terms sharing a given prefix into smaller ones.
Use CheckIndex with the -verbose option to see summary statistics on the blocks in the dictionary.
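To make the prefix-trie idea above concrete, here is a minimal in-memory sketch. It is a toy model, not Lucene's on-disk format or its FST-based index: the class name ToyBlockTree, the blockLoads counter (standing in for I/O), and the single maxItemsPerBlock parameter are all assumptions made for illustration. It shows the two behaviours the description mentions: a too-large group of terms sharing a prefix is broken into smaller blocks, and seekExact can reject a term from the index alone when no trie path matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model only: a prefix trie whose leaves are term blocks. */
final class ToyBlockTree {
  private static final class Node {
    final Map<Character, Node> children = new TreeMap<>();
    List<String> block = new ArrayList<>(); // terms stored at this node
  }

  private final Node root;
  private final int maxItemsPerBlock; // hypothetical stand-in for the real setting
  int blockLoads = 0; // counts simulated block reads ("I/O")

  ToyBlockTree(List<String> terms, int maxItemsPerBlock) {
    if (maxItemsPerBlock < 1) {
      throw new IllegalArgumentException("maxItemsPerBlock must be >= 1");
    }
    this.maxItemsPerBlock = maxItemsPerBlock;
    this.root = build(new ArrayList<>(new TreeSet<>(terms)), 0);
  }

  // All terms passed in are distinct and share the same prefix of length depth.
  private Node build(List<String> terms, int depth) {
    Node node = new Node();
    if (terms.size() <= maxItemsPerBlock) {
      node.block = terms; // small enough: keep as one leaf block
      return node;
    }
    // Too large: break the group up by the next character.
    Map<Character, List<String>> groups = new TreeMap<>();
    for (String term : terms) {
      if (term.length() == depth) {
        node.block.add(term); // the term equal to the prefix stays at this node
      } else {
        groups.computeIfAbsent(term.charAt(depth), c -> new ArrayList<>()).add(term);
      }
    }
    for (Map.Entry<Character, List<String>> e : groups.entrySet()) {
      node.children.put(e.getKey(), build(e.getValue(), depth + 1));
    }
    return node;
  }

  boolean seekExact(String term) {
    Node node = root;
    for (int depth = 0; ; depth++) {
      if (node.children.isEmpty() || depth == term.length()) {
        if (node.block.isEmpty()) {
          return false; // nothing stored here, so no block read is needed
        }
        blockLoads++; // simulated block read
        return node.block.contains(term);
      }
      node = node.children.get(term.charAt(depth));
      if (node == null) {
        return false; // the index alone rules the term out
      }
    }
  }

  public static void main(String[] args) {
    ToyBlockTree tree = new ToyBlockTree(
        List.of("apple", "apply", "apt", "banana", "band", "bandit", "cat"), 2);
    System.out.println("dog found: " + tree.seekExact("dog"));
    System.out.println("apple found: " + tree.seekExact("apple"));
    System.out.println("simulated block reads: " + tree.blockLoads);
  }
}
```

In this sketch a lookup for a term whose first character has no trie child never touches a block, which mirrors the "no I/O" case; a lookup that reaches a leaf costs exactly one simulated block read.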
-
-
Field Summary
Fields
Modifier and Type  Field  Description
private FieldInfos  fieldInfos
private java.util.List<java.lang.String>  fieldList
private IntObjectHashMap<FieldReader>  fieldMap
(package private) static Outputs<BytesRef>  FST_OUTPUTS
(package private) IndexInput  indexIn
(package private) static BytesRef  NO_OUTPUT
(package private) static int  OUTPUT_FLAG_HAS_TERMS
(package private) static int  OUTPUT_FLAG_IS_FLOOR
(package private) static int  OUTPUT_FLAGS_MASK
(package private) static int  OUTPUT_FLAGS_NUM_BITS
(package private) PostingsReaderBase  postingsReader
(package private) java.lang.String  segment
(package private) static java.lang.String  TERMS_CODEC_NAME
(package private) static java.lang.String  TERMS_EXTENSION  Extension of terms file
(package private) static java.lang.String  TERMS_INDEX_CODEC_NAME
(package private) static java.lang.String  TERMS_INDEX_EXTENSION  Extension of terms index file
(package private) static java.lang.String  TERMS_META_CODEC_NAME
(package private) static java.lang.String  TERMS_META_EXTENSION  Extension of terms meta file
(package private) IndexInput  termsIn
(package private) int  version
static int  VERSION_CURRENT  Current terms format.
static int  VERSION_FST_CONTINUOUS_ARCS  The version that specializes arc storage for continuous labels in the FST.
static int  VERSION_MSB_VLONG_OUTPUT  The version that encodes outputs as MSB VLong for better output sharing in the FST; see GITHUB#12620.
static int  VERSION_START  Initial terms format.
Fields inherited from class org.apache.lucene.index.Fields
EMPTY_ARRAY
-
-
Constructor Summary
Constructors
Constructor  Description
Lucene90BlockTreeTermsReader(PostingsReaderBase postingsReader, SegmentReadState state)  Sole constructor.
-
Method Summary
Modifier and Type  Method  Description
void  checkIntegrity()  Checks consistency of this reader.
void  close()
java.util.Iterator<java.lang.String>  iterator()  Returns an iterator that will step through all field names.
private static BytesRef  readBytesRef(IndexInput in)
int  size()  Returns the number of fields or -1 if the number of distinct field names is unknown.
private static java.util.List<java.lang.String>  sortFieldNames(IntObjectHashMap<FieldReader> fieldMap, FieldInfos fieldInfos)
Terms  terms(java.lang.String field)  Get the Terms for this field.
java.lang.String  toString()
-
Methods inherited from class org.apache.lucene.codecs.FieldsProducer
getMergeInstance
-
-
-
-
Field Detail
-
NO_OUTPUT
static final BytesRef NO_OUTPUT
-
OUTPUT_FLAGS_NUM_BITS
static final int OUTPUT_FLAGS_NUM_BITS
- See Also:
- Constant Field Values
-
OUTPUT_FLAGS_MASK
static final int OUTPUT_FLAGS_MASK
- See Also:
- Constant Field Values
-
OUTPUT_FLAG_IS_FLOOR
static final int OUTPUT_FLAG_IS_FLOOR
- See Also:
- Constant Field Values
-
OUTPUT_FLAG_HAS_TERMS
static final int OUTPUT_FLAG_HAS_TERMS
- See Also:
- Constant Field Values
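The four OUTPUT_FLAG constants above suggest that a small number of flag bits are packed into the low bits of a per-block output value. The sketch below illustrates that kind of packing. The numeric values, the encode helper, and the idea that the remaining bits hold a block file pointer are assumptions for illustration; this page lists the constants but not their values, so check the Constant Field Values page before relying on them.

```java
/** Sketch of packing two flag bits below a file pointer; values are assumed. */
final class BlockOutputFlags {
  // Assumed values, not taken from this page.
  static final int OUTPUT_FLAGS_NUM_BITS = 2;
  static final int OUTPUT_FLAGS_MASK = 0x3;
  static final int OUTPUT_FLAG_IS_FLOOR = 0x1;
  static final int OUTPUT_FLAG_HAS_TERMS = 0x2;

  /** Hypothetical helper: shifts the pointer left and ORs in the flags. */
  static long encode(long filePointer, boolean hasTerms, boolean isFloor) {
    return (filePointer << OUTPUT_FLAGS_NUM_BITS)
        | (hasTerms ? OUTPUT_FLAG_HAS_TERMS : 0)
        | (isFloor ? OUTPUT_FLAG_IS_FLOOR : 0);
  }

  static long filePointer(long code) {
    return code >>> OUTPUT_FLAGS_NUM_BITS; // drop the flag bits
  }

  static int flags(long code) {
    return (int) (code & OUTPUT_FLAGS_MASK); // keep only the flag bits
  }

  static boolean hasTerms(long code) {
    return (code & OUTPUT_FLAG_HAS_TERMS) != 0;
  }

  static boolean isFloor(long code) {
    return (code & OUTPUT_FLAG_IS_FLOOR) != 0;
  }

  public static void main(String[] args) {
    long code = encode(5L, true, false);
    System.out.println("pointer=" + filePointer(code)
        + " hasTerms=" + hasTerms(code)
        + " isFloor=" + isFloor(code));
  }
}
```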
-
TERMS_EXTENSION
static final java.lang.String TERMS_EXTENSION
Extension of terms file
- See Also:
- Constant Field Values
-
TERMS_CODEC_NAME
static final java.lang.String TERMS_CODEC_NAME
- See Also:
- Constant Field Values
-
VERSION_START
public static final int VERSION_START
Initial terms format.
- See Also:
- Constant Field Values
-
VERSION_MSB_VLONG_OUTPUT
public static final int VERSION_MSB_VLONG_OUTPUT
The version that encodes outputs as MSB VLong for better output sharing in the FST; see GITHUB#12620.
- See Also:
- Constant Field Values
-
VERSION_FST_CONTINUOUS_ARCS
public static final int VERSION_FST_CONTINUOUS_ARCS
The version that specializes arc storage for continuous labels in the FST.
- See Also:
- Constant Field Values
-
VERSION_CURRENT
public static final int VERSION_CURRENT
Current terms format.
- See Also:
- Constant Field Values
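Version constants like the four above are typically used in two ways: to reject files whose version falls outside the supported range, and to gate format-specific decoding. The sketch below shows that pattern only. The numeric values, the TermsVersionCheck class, and both helper methods are assumptions for illustration; the actual values are on the Constant Field Values page, and the reader's real header check is not shown on this page.

```java
/** Sketch of version range checking and feature gating; values are assumed. */
final class TermsVersionCheck {
  // Assumed values, not taken from this page.
  static final int VERSION_START = 0;
  static final int VERSION_MSB_VLONG_OUTPUT = 1;
  static final int VERSION_FST_CONTINUOUS_ARCS = 2;
  static final int VERSION_CURRENT = VERSION_FST_CONTINUOUS_ARCS;

  /** Hypothetical helper: rejects versions outside [VERSION_START, VERSION_CURRENT]. */
  static int checkVersion(int version) {
    if (version < VERSION_START || version > VERSION_CURRENT) {
      throw new IllegalArgumentException(
          "unsupported terms format version " + version
              + " (supported: " + VERSION_START + " to " + VERSION_CURRENT + ")");
    }
    return version;
  }

  /** Hypothetical helper: newer files use the MSB VLong output encoding. */
  static boolean usesMsbVLongOutput(int version) {
    return version >= VERSION_MSB_VLONG_OUTPUT;
  }

  public static void main(String[] args) {
    int version = checkVersion(VERSION_START);
    System.out.println("MSB VLong outputs: " + usesMsbVLongOutput(version));
  }
}
```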
-
TERMS_INDEX_EXTENSION
static final java.lang.String TERMS_INDEX_EXTENSION
Extension of terms index file
- See Also:
- Constant Field Values
-
TERMS_INDEX_CODEC_NAME
static final java.lang.String TERMS_INDEX_CODEC_NAME
- See Also:
- Constant Field Values
-
TERMS_META_EXTENSION
static final java.lang.String TERMS_META_EXTENSION
Extension of terms meta file
- See Also:
- Constant Field Values
-
TERMS_META_CODEC_NAME
static final java.lang.String TERMS_META_CODEC_NAME
- See Also:
- Constant Field Values
-
termsIn
final IndexInput termsIn
-
indexIn
final IndexInput indexIn
-
postingsReader
final PostingsReaderBase postingsReader
-
fieldInfos
private final FieldInfos fieldInfos
-
fieldMap
private final IntObjectHashMap<FieldReader> fieldMap
-
fieldList
private final java.util.List<java.lang.String> fieldList
-
segment
final java.lang.String segment
-
version
final int version
-
-
Constructor Detail
-
Lucene90BlockTreeTermsReader
public Lucene90BlockTreeTermsReader(PostingsReaderBase postingsReader, SegmentReadState state) throws java.io.IOException
Sole constructor.
- Throws:
java.io.IOException
-
-
Method Detail
-
readBytesRef
private static BytesRef readBytesRef(IndexInput in) throws java.io.IOException
- Throws:
java.io.IOException
-
sortFieldNames
private static java.util.List<java.lang.String> sortFieldNames(IntObjectHashMap<FieldReader> fieldMap, FieldInfos fieldInfos)
-
close
public void close() throws java.io.IOException
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Specified by:
close in class FieldsProducer
- Throws:
java.io.IOException
-
iterator
public java.util.Iterator<java.lang.String> iterator()
Description copied from class: Fields
Returns an iterator that will step through all field names. This will not return null.
-
terms
public Terms terms(java.lang.String field) throws java.io.IOException
Description copied from class: Fields
Get the Terms for this field. This will return null if the field does not exist.
-
size
public int size()
Description copied from class: Fields
Returns the number of fields or -1 if the number of distinct field names is unknown. If >= 0, Fields.iterator() will return as many field names.
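The contracts of iterator(), terms(String) and size() described above imply two checks on the caller's side: treat a null return from terms as "no such field", and fall back to counting via the iterator when size() returns -1. The sketch below illustrates those checks against a hypothetical stand-in interface, SimpleFields, which is not a Lucene type; it only mirrors the documented return conventions so the example can run without Lucene on the classpath.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FieldsContractSketch {
  /** Hypothetical stand-in mirroring the documented contract; not a Lucene type. */
  interface SimpleFields extends Iterable<String> {
    List<String> terms(String field); // null if the field does not exist

    int size(); // -1 if the number of fields is unknown
  }

  /** Uses size() when it is known, otherwise counts the field names. */
  static int countFields(SimpleFields fields) {
    int size = fields.size();
    if (size >= 0) {
      return size;
    }
    int count = 0;
    for (String ignored : fields) {
      count++;
    }
    return count;
  }

  /** Treats a null return as "field absent" instead of dereferencing it. */
  static int termCount(SimpleFields fields, String field) {
    List<String> terms = fields.terms(field);
    return terms == null ? 0 : terms.size();
  }

  /** Builds an in-memory instance for the demonstration. */
  static SimpleFields of(Map<String, List<String>> data, boolean sizeKnown) {
    Map<String, List<String>> sorted = new TreeMap<>(data);
    return new SimpleFields() {
      @Override
      public Iterator<String> iterator() {
        return sorted.keySet().iterator();
      }

      @Override
      public List<String> terms(String field) {
        return sorted.get(field);
      }

      @Override
      public int size() {
        return sizeKnown ? sorted.size() : -1;
      }
    };
  }

  public static void main(String[] args) {
    SimpleFields fields = of(Map.of("title", List.of("a", "b"), "body", List.of("c")), false);
    System.out.println("fields: " + countFields(fields));
    System.out.println("terms in missing field: " + termCount(fields, "missing"));
  }
}
```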
-
checkIntegrity
public void checkIntegrity() throws java.io.IOException
Description copied from class: FieldsProducer
Checks consistency of this reader.
Note that this may be costly in terms of I/O, e.g. may involve computing a checksum value against large data files.
- Specified by:
checkIntegrity in class FieldsProducer
- Throws:
java.io.IOException
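The cost noted above comes from the fact that verifying a checksum means reading every byte it covers. The sketch below shows the general technique with java.util.zip.CRC32 over an in-memory byte array. The layout used here (payload followed by an 8-byte big-endian CRC32 value) and the ChecksumSketch class are assumptions for illustration, not Lucene's actual footer format or the implementation of checkIntegrity.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Sketch of whole-file checksum verification; the layout is hypothetical. */
final class ChecksumSketch {
  /** Appends an 8-byte big-endian CRC32 of the payload (assumed layout). */
  static byte[] withChecksum(byte[] payload) {
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    return ByteBuffer.allocate(payload.length + Long.BYTES)
        .put(payload)
        .putLong(crc.getValue())
        .array();
  }

  /** Recomputes the CRC32 over the whole payload and compares it to the stored value. */
  static boolean verify(byte[] file) {
    if (file.length < Long.BYTES) {
      return false; // too short to contain a checksum
    }
    int payloadLength = file.length - Long.BYTES;
    CRC32 crc = new CRC32();
    crc.update(file, 0, payloadLength); // touches every payload byte
    long stored = ByteBuffer.wrap(file, payloadLength, Long.BYTES).getLong();
    return stored == crc.getValue();
  }

  public static void main(String[] args) {
    byte[] file = withChecksum(new byte[] {1, 2, 3, 4});
    System.out.println("intact: " + verify(file));
    file[0] ^= 1; // simulate a single corrupted bit
    System.out.println("after corruption: " + verify(file));
  }
}
```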
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
-