java.lang.Object
  org.apache.lucene.index.TermsHashPerField

All Implemented Interfaces:
java.lang.Comparable<TermsHashPerField>
- Direct Known Subclasses:
FreqProxTermsWriterPerField, TermVectorsConsumerPerField
abstract class TermsHashPerField extends java.lang.Object implements java.lang.Comparable<TermsHashPerField>
This class stores streams of information per term without knowing the size of the stream ahead of time. Each stream typically encodes one level of information, like term frequency per document or term proximity. Internally this class allocates a linked list of slices that can be read by a ByteSliceReader for each term. Terms are first deduplicated in a BytesRefHash; once this is done, internal data structures point to the current offset of each stream that can be written to.
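The slice idea can be pictured with a self-contained model in plain Java (no Lucene types; the slice sizes and the read-back loop are simplifications for illustration — the real pools pack slices into shared blocks and store forwarding addresses in the last bytes of each slice):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a per-term stream: a chain of byte[] slices of
// increasing size, so the stream can grow without knowing its final
// length up front. Illustration only, not Lucene's actual layout.
class SliceStream {
    private static final int[] LEVEL_SIZES = {5, 14, 20, 30, 40, 80, 200};
    private final List<byte[]> slices = new ArrayList<>();
    private int level = 0;
    private int upto = 0; // write position within the current (last) slice

    SliceStream() {
        slices.add(new byte[LEVEL_SIZES[0]]);
    }

    void writeByte(byte b) {
        byte[] current = slices.get(slices.size() - 1);
        if (upto == current.length) { // slice full: link a bigger one
            level = Math.min(level + 1, LEVEL_SIZES.length - 1);
            slices.add(new byte[LEVEL_SIZES[level]]);
            upto = 0;
        }
        slices.get(slices.size() - 1)[upto++] = b;
    }

    // Walk the chain front to back, the way a ByteSliceReader would.
    byte[] readAll() {
        int total = upto;
        for (int i = 0; i < slices.size() - 1; i++) total += slices.get(i).length;
        byte[] out = new byte[total];
        int pos = 0;
        for (int i = 0; i < slices.size() - 1; i++) {
            byte[] s = slices.get(i);
            System.arraycopy(s, 0, out, pos, s.length);
            pos += s.length;
        }
        System.arraycopy(slices.get(slices.size() - 1), 0, out, pos, upto);
        return out;
    }
}
```

Small first slices keep memory low for rare terms, while the growing sizes bound the number of hops needed to read back a long stream for a frequent term.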
-
-
Nested Class Summary
Nested Classes
private static class TermsHashPerField.PostingsBytesStartArray
-
Field Summary
Fields
(package private) ByteBlockPool bytePool
private BytesRefHash bytesHash
private boolean doNextCall
private java.lang.String fieldName
private static int HASH_INIT_SIZE
(package private) IndexOptions indexOptions
private IntBlockPool intPool
private int lastDocID
private TermsHashPerField nextPerField
(package private) ParallelPostingsArray postingsArray
private ByteSlicePool slicePool
private int[] sortedTermIDs
private int streamAddressOffset
private int streamCount
private int[] termStreamAddressBuffer
-
Constructor Summary
Constructors
TermsHashPerField(int streamCount, IntBlockPool intPool, ByteBlockPool bytePool, ByteBlockPool termBytePool, Counter bytesUsed, TermsHashPerField nextPerField, java.lang.String fieldName, IndexOptions indexOptions)
streamCount: how many streams this field stores per term.
-
Method Summary
Methods
private void add(int textStart, int docID)
(package private) void add(BytesRef termBytes, int docID)
    Called once per inverted token.
(package private) abstract void addTerm(int termID, int docID)
    Called when a previously seen term is seen again.
private boolean assertDocId(int docId)
int compareTo(TermsHashPerField other)
(package private) abstract ParallelPostingsArray createPostingsArray(int size)
    Creates a new postings array of the specified size.
(package private) void finish()
    Finish adding all instances of this field to the current document.
(package private) java.lang.String getFieldName()
(package private) TermsHashPerField getNextPerField()
(package private) int getNumTerms()
(package private) int[] getSortedTermIDs()
    Returns the sorted term IDs.
(package private) void initReader(ByteSliceReader reader, int termID, int stream)
private void initStreamSlices(int termID, int docID)
    Called when we first encounter a new term.
(package private) abstract void newPostingsArray()
    Called when the postings array is initialized or resized.
(package private) abstract void newTerm(int termID, int docID)
    Called when a term is seen for the first time.
private int positionStreamSlice(int termID, int docID)
(package private) void reinitHash()
(package private) void reset()
(package private) void sortTerms()
    Collapse the hash table and sort in place; also sets this.sortedTermIDs to the results. This method must not be called twice unless reset() or reinitHash() was called.
(package private) boolean start(IndexableField field, boolean first)
    Start adding a new field instance; first is true if this is the first time this field name was seen in the document.
(package private) void writeByte(int stream, byte b)
(package private) void writeBytes(int stream, byte[] b, int offset, int len)
(package private) void writeVInt(int stream, int i)
-
-
-
Field Detail
-
HASH_INIT_SIZE
private static final int HASH_INIT_SIZE
- See Also:
- Constant Field Values
-
nextPerField
private final TermsHashPerField nextPerField
-
intPool
private final IntBlockPool intPool
-
bytePool
final ByteBlockPool bytePool
-
slicePool
private final ByteSlicePool slicePool
-
termStreamAddressBuffer
private int[] termStreamAddressBuffer
-
streamAddressOffset
private int streamAddressOffset
-
streamCount
private final int streamCount
-
fieldName
private final java.lang.String fieldName
-
indexOptions
final IndexOptions indexOptions
-
bytesHash
private final BytesRefHash bytesHash
-
postingsArray
ParallelPostingsArray postingsArray
-
lastDocID
private int lastDocID
-
sortedTermIDs
private int[] sortedTermIDs
-
doNextCall
private boolean doNextCall
-
-
Constructor Detail
-
TermsHashPerField
TermsHashPerField(int streamCount, IntBlockPool intPool, ByteBlockPool bytePool, ByteBlockPool termBytePool, Counter bytesUsed, TermsHashPerField nextPerField, java.lang.String fieldName, IndexOptions indexOptions)
streamCount: how many streams this field stores per term. For example, doc(+freq) is one stream, and prox+offset is a second.
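The per-term bookkeeping for multiple streams can be sketched as an address layout (a hypothetical simplification for illustration; the real class keeps these write addresses in an IntBlockPool via termStreamAddressBuffer and streamAddressOffset):

```java
// Hypothetical illustration: each term owns streamCount consecutive int
// slots, one per stream, each holding that stream's current write address.
class StreamSlots {
    static int slot(int termID, int streamCount, int stream) {
        return termID * streamCount + stream;
    }
}
```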
-
-
Method Detail
-
reset
void reset()
-
initReader
final void initReader(ByteSliceReader reader, int termID, int stream)
-
sortTerms
final void sortTerms()
Collapse the hash table and sort in place; also sets this.sortedTermIDs to the results. This method must not be called twice unless reset() or reinitHash() was called.
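The sort step can be pictured with a self-contained sketch in plain Java (no Lucene types; a hypothetical model in which the term bytes stay where they are and only the ID array is reordered by the bytes each ID refers to, compared unsigned as BytesRef comparison is):

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration of sortTerms(): sort term IDs by the term
// bytes they point to, without moving the term data itself.
class SortTermsDemo {
    static Integer[] sortedTermIDs(byte[][] termBytesByID) {
        Integer[] ids = new Integer[termBytesByID.length];
        for (int i = 0; i < ids.length; i++) ids[i] = i;
        Arrays.sort(ids, Comparator.<Integer, byte[]>comparing(
                id -> termBytesByID[id],   // key: the term's bytes
                Arrays::compareUnsigned)); // unsigned lexicographic order
        return ids;
    }
}
```

Sorting IDs rather than the terms themselves keeps the byte pools untouched, which matters because other structures hold offsets into them.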
-
getSortedTermIDs
final int[] getSortedTermIDs()
Returns the sorted term IDs. sortTerms() must be called first.
-
reinitHash
final void reinitHash()
-
add
private void add(int textStart, int docID) throws java.io.IOException
- Throws:
java.io.IOException
-
initStreamSlices
private void initStreamSlices(int termID, int docID) throws java.io.IOException
Called when we first encounter a new term. We must allocate slices to store the postings (vInt compressed doc/freq/prox), and also the int pointers to where (in our ByteBlockPool storage) the postings for this term begin.
- Throws:
java.io.IOException
-
assertDocId
private boolean assertDocId(int docId)
-
add
void add(BytesRef termBytes, int docID) throws java.io.IOException
Called once per inverted token. This is the primary entry point (for the first TermsHash); postings use this API.
- Throws:
java.io.IOException
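The dedup-then-dispatch flow of add can be sketched without Lucene types (a hypothetical model: a HashMap stands in for BytesRefHash, and the two callbacks stand in for the abstract newTerm/addTerm):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of add(BytesRef, docID): deduplicate the term,
// then dispatch to newTerm on first occurrence or addTerm otherwise.
class AddFlowDemo {
    private final Map<String, Integer> hash = new HashMap<>(); // stands in for BytesRefHash
    final List<String> events = new ArrayList<>();

    void add(String termBytes, int docID) {
        Integer existing = hash.get(termBytes);
        if (existing == null) {          // first time this term is seen
            int termID = hash.size();
            hash.put(termBytes, termID);
            newTerm(termID, docID);
        } else {
            addTerm(existing, docID);    // previously seen term
        }
    }

    void newTerm(int termID, int docID) { events.add("new:" + termID + "@" + docID); }
    void addTerm(int termID, int docID) { events.add("add:" + termID + "@" + docID); }
}
```

In the real class the subclass callbacks (FreqProxTermsWriterPerField, TermVectorsConsumerPerField) decide what to write into the term's slices at each of these two points.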
-
positionStreamSlice
private int positionStreamSlice(int termID, int docID) throws java.io.IOException
- Throws:
java.io.IOException
-
writeByte
final void writeByte(int stream, byte b)
-
writeBytes
final void writeBytes(int stream, byte[] b, int offset, int len)
-
writeVInt
final void writeVInt(int stream, int i)
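writeVInt writes a variable-length integer into the given stream using Lucene's VInt encoding: 7 data bits per byte, with the high bit set on every byte except the last. A minimal standalone sketch of the encoding:

```java
import java.io.ByteArrayOutputStream;

// Standalone sketch of VInt encoding: 7 data bits per byte,
// high bit set as a continuation flag on all bytes but the last.
class VIntDemo {
    static byte[] encodeVInt(int i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((i & ~0x7F) != 0) {        // more than 7 bits remain
            out.write((i & 0x7F) | 0x80); // low 7 bits + continuation bit
            i >>>= 7;
        }
        out.write(i);                     // final byte, high bit clear
        return out.toByteArray();
    }
}
```

Values below 128 take a single byte, which is why small deltas (doc gaps, frequencies) compress well in the postings slices.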
-
getNextPerField
final TermsHashPerField getNextPerField()
-
getFieldName
final java.lang.String getFieldName()
-
compareTo
public final int compareTo(TermsHashPerField other)
- Specified by:
compareTo
in interface java.lang.Comparable<TermsHashPerField>
-
finish
void finish() throws java.io.IOException
Finish adding all instances of this field to the current document.
- Throws:
java.io.IOException
-
getNumTerms
final int getNumTerms()
-
start
boolean start(IndexableField field, boolean first)
Start adding a new field instance; first is true if this is the first time this field name was seen in the document.
-
newTerm
abstract void newTerm(int termID, int docID) throws java.io.IOException
Called when a term is seen for the first time.
- Throws:
java.io.IOException
-
addTerm
abstract void addTerm(int termID, int docID) throws java.io.IOException
Called when a previously seen term is seen again.
- Throws:
java.io.IOException
-
newPostingsArray
abstract void newPostingsArray()
Called when the postings array is initialized or resized.
-
createPostingsArray
abstract ParallelPostingsArray createPostingsArray(int size)
Creates a new postings array of the specified size.
-
-