Class STUniformSplitTermsWriter
java.lang.Object
  org.apache.lucene.codecs.FieldsConsumer
    org.apache.lucene.codecs.uniformsplit.UniformSplitTermsWriter
      org.apache.lucene.codecs.uniformsplit.sharedterms.STUniformSplitTermsWriter
All Implemented Interfaces: java.io.Closeable, java.lang.AutoCloseable
public class STUniformSplitTermsWriter extends UniformSplitTermsWriter
Extends UniformSplitTermsWriter by sharing the terms of all fields in the same dictionary and by writing all the fields of a term in the same block line.
The block file contains all the term blocks for all fields. Each block line, for a single term, may hold the TermState of multiple fields. The block file also contains the fields metadata at the end of the file.
The dictionary file contains a single trie (FST bytes) for all fields.
This structure is well suited to indexes with many fields. In this case the shared-terms dictionary trie is much smaller.
This FieldsConsumer requires a custom merge(MergeState, NormsProducer) method for efficiency. The regular merge would scan the fields sequentially, which internally would scan the whole shared-terms dictionary as many times as there are fields. The custom merge instead directly scans the internal shared-terms dictionary of all the segments to merge, so the dictionary is scanned only once regardless of the number of fields.
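The shared block-line layout described above can be pictured with a small standalone sketch. It is not Lucene code: the class SharedTermsSketch, its FieldCursor helper, and the use of plain strings for terms are all assumptions made for illustration. It merges the sorted term lists of several fields through a priority queue and emits one line per distinct term, listing every field that contains that term.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these names are Lucene APIs.
final class SharedTermsSketch {

  /** Cursor over the sorted, distinct terms of one field. */
  private static final class FieldCursor {
    final String field;
    final Iterator<String> terms;
    String current;

    FieldCursor(String field, Iterator<String> terms) {
      this.field = field;
      this.terms = terms;
    }

    /** Moves to the next term; returns false when the field is exhausted. */
    boolean advance() {
      current = terms.hasNext() ? terms.next() : null;
      return current != null;
    }
  }

  /**
   * Groups per-field term lists into shared "block lines": one entry per
   * distinct term, mapped to the fields that contain it. Each input list is
   * assumed to be sorted ascending with no duplicates.
   */
  static Map<String, List<String>> groupByTerm(Map<String, List<String>> sortedTermsPerField) {
    // Order cursors by current term, then by field name to make ties deterministic.
    PriorityQueue<FieldCursor> queue = new PriorityQueue<>((a, b) -> {
      int cmp = a.current.compareTo(b.current);
      return cmp != 0 ? cmp : a.field.compareTo(b.field);
    });
    for (Map.Entry<String, List<String>> e : sortedTermsPerField.entrySet()) {
      FieldCursor cursor = new FieldCursor(e.getKey(), e.getValue().iterator());
      if (cursor.advance()) {
        queue.add(cursor);
      }
    }

    Map<String, List<String>> lines = new LinkedHashMap<>();
    while (!queue.isEmpty()) {
      String term = queue.peek().current;
      List<String> fields = new ArrayList<>();
      List<FieldCursor> grouped = new ArrayList<>();
      // Pop every cursor positioned on the same term: they share one line.
      while (!queue.isEmpty() && queue.peek().current.equals(term)) {
        FieldCursor cursor = queue.poll();
        fields.add(cursor.field);
        grouped.add(cursor);
      }
      lines.put(term, fields);
      // Advance the grouped cursors and re-queue those that still have terms.
      for (FieldCursor cursor : grouped) {
        if (cursor.advance()) {
          queue.add(cursor);
        }
      }
    }
    return lines;
  }
}
```

With fields "body" and "title" both containing "apple", the sketch yields a single "apple" line that lists both fields, which is the property that lets a single trie serve all fields.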
-
-
Nested Class Summary
Nested Classes
Modifier and Type         Class
private static class      STUniformSplitTermsWriter.FieldsIterator
private class             STUniformSplitTermsWriter.FieldTerms
private class             STUniformSplitTermsWriter.MergingFieldTerms
(package private) class   STUniformSplitTermsWriter.SegmentPostings
private class             STUniformSplitTermsWriter.SegmentTerms
private static interface  STUniformSplitTermsWriter.SharedTermsWriter
private class             STUniformSplitTermsWriter.TermIterator<T>
private class             STUniformSplitTermsWriter.TermIteratorQueue<T>
-
Field Summary
-
Fields inherited from class org.apache.lucene.codecs.uniformsplit.UniformSplitTermsWriter
blockEncoder, blockOutput, DEFAULT_DELTA_NUM_LINES, DEFAULT_TARGET_NUM_BLOCK_LINES, deltaNumLines, dictionaryOutput, fieldInfos, fieldMetadataWriter, MAX_NUM_BLOCK_LINES, maxDoc, postingsWriter, targetNumBlockLines
-
-
Constructor Summary
Constructors
Modifier    Constructor
            STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, int targetNumBlockLines, int deltaNumLines, BlockEncoder blockEncoder)
protected   STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, int targetNumBlockLines, int deltaNumLines, BlockEncoder blockEncoder, FieldMetadata.Serializer fieldMetadataWriter, java.lang.String codecName, int versionCurrent, java.lang.String termsBlocksExtension, java.lang.String dictionaryExtension)
            STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, BlockEncoder blockEncoder)
-
Method Summary
-
Methods inherited from class org.apache.lucene.codecs.uniformsplit.UniformSplitTermsWriter
close, validateSettings, writeDictionary, writeEncodedFieldsMetadata, writeFieldsMetadata, writeFieldTerms, writePostingLine, writeUnencodedFieldsMetadata
-
Constructor Detail
-
STUniformSplitTermsWriter
public STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, BlockEncoder blockEncoder) throws java.io.IOException
- Throws:
java.io.IOException
-
STUniformSplitTermsWriter
public STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, int targetNumBlockLines, int deltaNumLines, BlockEncoder blockEncoder) throws java.io.IOException
- Throws:
java.io.IOException
-
STUniformSplitTermsWriter
protected STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, int targetNumBlockLines, int deltaNumLines, BlockEncoder blockEncoder, FieldMetadata.Serializer fieldMetadataWriter, java.lang.String codecName, int versionCurrent, java.lang.String termsBlocksExtension, java.lang.String dictionaryExtension) throws java.io.IOException
- Throws:
java.io.IOException
-
-
Method Detail
-
write
public void write(Fields fields, NormsProducer normsProducer) throws java.io.IOException
Description copied from class: FieldsConsumer
Write all fields, terms and postings. This is the "pull" API, allowing you to iterate more than once over the postings, somewhat analogous to using a DOM API to traverse an XML tree.
Notes:
- You must compute index statistics, including each Term's docFreq and totalTermFreq, as well as the summary sumTotalTermFreq, sumTotalDocFreq and docCount.
- You must skip terms that have no docs and fields that have no terms, even though the provided Fields API will expose them; this typically requires lazily writing the field or term until you've actually seen the first term or document.
- The provided Fields instance is limited: you cannot call any methods that return statistics/counts; you cannot pass non-null live docs when pulling docs/positions enums.
- Overrides:
write in class UniformSplitTermsWriter
- Throws:
java.io.IOException
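The second note above (skip empty terms and fields by writing lazily) can be sketched without any Lucene types. Everything here is a hypothetical stand-in: LazyFieldWriteSketch is not part of the codec, postings are modelled as plain int arrays of doc IDs, and the "writes" are just log lines. The point is only the control flow, where a field header is emitted when the first term with at least one document is seen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not how the Lucene writer is implemented.
final class LazyFieldWriteSketch {

  /**
   * Walks field -> term -> docIds and returns a log of what would be written.
   * Terms with no docs are skipped, and a field is started lazily, so a field
   * whose terms are all empty never appears in the output.
   */
  static List<String> write(Map<String, Map<String, int[]>> fields) {
    List<String> written = new ArrayList<>();
    for (Map.Entry<String, Map<String, int[]>> field : fields.entrySet()) {
      boolean fieldStarted = false;
      for (Map.Entry<String, int[]> term : field.getValue().entrySet()) {
        // docFreq is computed by the writer itself, not read from the input.
        int docFreq = term.getValue().length;
        if (docFreq == 0) {
          continue; // term exposed by the input but with no documents
        }
        if (!fieldStarted) {
          written.add("field " + field.getKey());
          fieldStarted = true;
        }
        written.add("term " + term.getKey() + " docFreq=" + docFreq);
      }
    }
    return written;
  }
}
```

A field named "empty" whose only term has zero documents produces no output at all, while a field with one live term produces its header followed by that term.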
-
writeSegment
private void writeSegment(STUniformSplitTermsWriter.SharedTermsWriter termsWriter) throws java.io.IOException
Writes the new segment with the provided STUniformSplitTermsWriter.SharedTermsWriter, which can be either a single-segment writer or a multiple-segment merging writer.
- Throws:
java.io.IOException
-
writeSingleSegment
private java.util.Collection<FieldMetadata> writeSingleSegment(Fields fields, NormsProducer normsProducer, STBlockWriter blockWriter, IndexDictionary.Builder dictionaryBuilder) throws java.io.IOException
- Throws:
java.io.IOException
-
createFieldMetadataList
private java.util.List<FieldMetadata> createFieldMetadataList(java.util.Iterator<FieldInfo> fieldInfos, int maxDoc)
-
createFieldTermsQueue
private STUniformSplitTermsWriter.TermIteratorQueue<STUniformSplitTermsWriter.FieldTerms> createFieldTermsQueue(Fields fields, java.util.List<FieldMetadata> fieldMetadataList) throws java.io.IOException
- Throws:
java.io.IOException
-
groupByTerm
private <T> void groupByTerm(STUniformSplitTermsWriter.TermIteratorQueue<T> termIteratorQueue, STUniformSplitTermsWriter.TermIterator<T> topTermIterator, java.util.List<STUniformSplitTermsWriter.TermIterator<T>> groupedTermIterators)
-
writePostingLines
private void writePostingLines(BytesRef term, java.util.List<? extends STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.FieldTerms>> groupedFieldTerms, NormsProducer normsProducer, java.util.List<FieldMetadataTermState> termStates) throws java.io.IOException
- Throws:
java.io.IOException
-
nextTermForIterators
private <T> void nextTermForIterators(java.util.List<? extends STUniformSplitTermsWriter.TermIterator<T>> termIterators, STUniformSplitTermsWriter.TermIteratorQueue<T> termIteratorQueue) throws java.io.IOException
- Throws:
java.io.IOException
-
writeFieldMetadataList
private int writeFieldMetadataList(java.util.Collection<FieldMetadata> fieldMetadataList) throws java.io.IOException
- Throws:
java.io.IOException
-
writeDictionary
protected void writeDictionary(int fieldsNumber, IndexDictionary.Builder dictionaryBuilder) throws java.io.IOException
- Throws:
java.io.IOException
-
merge
public void merge(MergeState mergeState, NormsProducer normsProducer) throws java.io.IOException
Description copied from class: FieldsConsumer
Merges in the fields from the readers in mergeState. The default implementation skips and maps around deleted documents, and calls FieldsConsumer.write(Fields,NormsProducer). Implementations can override this method for more sophisticated merging (bulk-byte copying, etc).
- Overrides:
merge in class FieldsConsumer
- Throws:
java.io.IOException
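The class description justifies this override by the number of dictionary scans. The toy model below makes that cost difference concrete; it is only a sketch under stated assumptions. MergeScanSketch, its visited counter, and the representation of the shared dictionary as a sorted map from term to the set of fields containing it are all hypothetical and do not mirror the actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the Lucene merge code.
final class MergeScanSketch {

  /** Number of dictionary entries visited by the most recent scan. */
  static long visited;

  /** Field-by-field approach: the whole shared dictionary is scanned once per field. */
  static Map<String, List<String>> scanPerField(Map<String, Set<String>> dictionary, List<String> fields) {
    visited = 0;
    Map<String, List<String>> termsPerField = new LinkedHashMap<>();
    for (String field : fields) {
      List<String> terms = new ArrayList<>();
      for (Map.Entry<String, Set<String>> line : dictionary.entrySet()) {
        visited++;
        if (line.getValue().contains(field)) {
          terms.add(line.getKey());
        }
      }
      termsPerField.put(field, terms);
    }
    return termsPerField;
  }

  /** Shared-terms approach: one pass, dispatching each term to all of its fields. */
  static Map<String, List<String>> scanOnce(Map<String, Set<String>> dictionary, List<String> fields) {
    visited = 0;
    Map<String, List<String>> termsPerField = new LinkedHashMap<>();
    for (String field : fields) {
      termsPerField.put(field, new ArrayList<>());
    }
    for (Map.Entry<String, Set<String>> line : dictionary.entrySet()) {
      visited++;
      for (String field : line.getValue()) {
        List<String> terms = termsPerField.get(field);
        if (terms != null) {
          terms.add(line.getKey());
        }
      }
    }
    return termsPerField;
  }
}
```

Both methods return the same per-field term lists; in this model the per-field scan visits (number of fields) x (number of dictionary entries) entries, while the single pass visits each entry once.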
-
mergeSegments
private java.util.Collection<FieldMetadata> mergeSegments(MergeState mergeState, NormsProducer normsProducer, java.util.List<STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.SegmentTerms>> segmentTermsList, STBlockWriter blockWriter, IndexDictionary.Builder dictionaryBuilder) throws java.io.IOException
- Throws:
java.io.IOException
-
createMergingFieldTermsMap
private java.util.Map<java.lang.String,STUniformSplitTermsWriter.MergingFieldTerms> createMergingFieldTermsMap(java.util.List<FieldMetadata> fieldMetadataList, int numSegments)
-
createSegmentTermsQueue
private STUniformSplitTermsWriter.TermIteratorQueue<STUniformSplitTermsWriter.SegmentTerms> createSegmentTermsQueue(java.util.List<STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.SegmentTerms>> segmentTermsList) throws java.io.IOException
- Throws:
java.io.IOException
-
combineSegmentsFields
private void combineSegmentsFields(java.util.List<STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.SegmentTerms>> groupedSegmentTerms, java.util.Map<java.lang.String,java.util.List<STUniformSplitTermsWriter.SegmentPostings>> fieldPostingsMap)
-
combinePostingsPerField
private void combinePostingsPerField(BytesRef term, java.util.Map<java.lang.String,STUniformSplitTermsWriter.MergingFieldTerms> fieldTermsMap, java.util.Map<java.lang.String,java.util.List<STUniformSplitTermsWriter.SegmentPostings>> fieldPostingsMap, java.util.List<STUniformSplitTermsWriter.MergingFieldTerms> groupedFieldTerms)