java.lang.Object
    org.apache.lucene.index.OrdinalMap

All Implemented Interfaces:
Accountable
public class OrdinalMap extends java.lang.Object implements Accountable
Maps per-segment ordinals to/from global ordinal space, using a compact packed-ints representation. NOTE: this is a costly operation, as it must merge sort all terms, and it may require non-trivial RAM once done. It's better to operate in segment-private ordinal space instead when possible.

Nested Class Summary
Modifier and Type       Class
private static class    OrdinalMap.SegmentMap
private static class    OrdinalMap.TermsEnumPriorityQueue

Field Summary
Modifier and Type                         Field and Description
private static long                       BASE_RAM_BYTES_USED
(package private) LongValues              firstSegments
(package private) LongValues              globalOrdDeltas
IndexReader.CacheKey                      owner
                                          Cache key of whoever asked for this awful thing
(package private) long                    ramBytesUsed
(package private) OrdinalMap.SegmentMap   segmentMap
(package private) LongValues[]            segmentToGlobalOrds
(package private) long                    valueCount

Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE

Constructor Summary
Constructor and Description
OrdinalMap(IndexReader.CacheKey owner, TermsEnum[] subs, OrdinalMap.SegmentMap segmentMap, float acceptableOverheadRatio)
    Here is how the OrdinalMap encodes the mapping from global ords to local segment ords.

Method Summary
Modifier and Type                    Method and Description
static OrdinalMap                    build(IndexReader.CacheKey owner, SortedDocValues[] values, float acceptableOverheadRatio)
    Create an ordinal map that uses the number of unique values of each SortedDocValues instance as a weight.
static OrdinalMap                    build(IndexReader.CacheKey owner, SortedSetDocValues[] values, float acceptableOverheadRatio)
    Create an ordinal map that uses the number of unique values of each SortedSetDocValues instance as a weight.
static OrdinalMap                    build(IndexReader.CacheKey owner, TermsEnum[] subs, long[] weights, float acceptableOverheadRatio)
    Creates an ordinal map that allows mapping ords to/from a merged space from subs.
java.util.Collection<Accountable>    getChildResources()
    Returns nested resources of this class.
int                                  getFirstSegmentNumber(long globalOrd)
    Given a global ordinal, returns the index of the first segment that contains this term.
long                                 getFirstSegmentOrd(long globalOrd)
    Given a global ordinal, returns the ordinal of that term in the first segment that contains it (that is, the segment returned by getFirstSegmentNumber(long)).
LongValues                           getGlobalOrds(int segmentIndex)
    Given a segment number, returns a LongValues instance that maps segment ordinals to global ordinals.
long                                 getValueCount()
    Returns the total number of unique terms in global ord space.
long                                 ramBytesUsed()
    Return the memory usage of this object in bytes.

Field Detail

BASE_RAM_BYTES_USED
private static final long BASE_RAM_BYTES_USED

owner
public final IndexReader.CacheKey owner
Cache key of whoever asked for this awful thing

valueCount
final long valueCount

globalOrdDeltas
final LongValues globalOrdDeltas

firstSegments
final LongValues firstSegments

segmentToGlobalOrds
final LongValues[] segmentToGlobalOrds

segmentMap
final OrdinalMap.SegmentMap segmentMap

ramBytesUsed
final long ramBytesUsed

Constructor Detail

OrdinalMap
OrdinalMap(IndexReader.CacheKey owner, TermsEnum[] subs, OrdinalMap.SegmentMap segmentMap, float acceptableOverheadRatio) throws java.io.IOException
Here is how the OrdinalMap encodes the mapping from global ords to local segment ords. Assume we have the following global mapping for a doc values field:
bar -> 0, cat -> 1, dog -> 2, foo -> 3
And our index is split into 2 segments with the following local mappings for that same doc values field:
Segment 0: bar -> 0, foo -> 1
Segment 1: cat -> 0, dog -> 1
We will then encode the delta between the local and global mapping (globalOrd - segmentOrd) in a packed 2D array keyed by (segmentIndex, segmentOrd). So OrdinalMap will create the following 2D array:
[[0, 2], [1, 1]]
The general algorithm for creating an OrdinalMap (skipping over some implementation details and optimizations) is as follows:
[1] Create and populate a PQ with (TermsEnum, index) tuples, where index is the position of the TermsEnum in an array of TermsEnums sorted by descending size. The PQ itself is ordered by TermsEnum.term().
[2] We will now iterate through every term in the index, starting with the term at the top of the PQ. We keep track of a global ord, and track the difference between the global ord and TermsEnum.ord() in ordDeltas, which maps:
(segmentIndex, TermsEnum.ord()) -> globalTermOrdinal - TermsEnum.ord()
We then call BytesRefIterator.next() and update the PQ (remember that the PQ maintains an order based on TermsEnum.term(), which changes on each next() call). If the current term also exists in some other segment, the top of the queue will now contain that segment. If not, the top of the queue will contain a segment with the next term in the index, and the global ord is incremented.
[3] We use some information gathered in the previous step to optimize memory usage and build time in the following steps; for details, see the code.
[4] We then populate segmentToGlobalOrds, which maps (segmentIndex, segmentOrd) -> globalOrd. Using the information tracked in ordDeltas, this is relatively easy to construct (globalOrd = segmentOrd + delta).
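Steps [1], [2] and [4] can be modelled in plain Java to see where the delta array comes from. This is a toy sketch, not Lucene's implementation: sorted String arrays stand in for TermsEnums, java.util.PriorityQueue stands in for OrdinalMap.TermsEnumPriorityQueue, plain long[][] arrays replace the packed structures, step [3]'s optimizations are omitted, and the names OrdinalMergeSketch, Cursor, buildOrdDeltas and toGlobalOrds are hypothetical. Ties on equal terms are broken by segment index, which is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Hypothetical toy model of the OrdinalMap merge; not Lucene code. */
final class OrdinalMergeSketch {

    /** Cursor over one segment's sorted, unique terms; the array index is the segment ord. */
    static final class Cursor {
        final int segmentIndex;
        final String[] terms;
        int ord = 0;

        Cursor(int segmentIndex, String[] terms) {
            this.segmentIndex = segmentIndex;
            this.terms = terms;
        }

        String term() {
            return terms[ord];
        }
    }

    /** Returns ordDeltas[segmentIndex][segmentOrd] = globalOrd - segmentOrd. */
    static long[][] buildOrdDeltas(String[][] segments) {
        long[][] ordDeltas = new long[segments.length][];
        // [1] PQ ordered by each cursor's current term (ties: lower segment index first).
        PriorityQueue<Cursor> pq = new PriorityQueue<>((a, b) -> {
            int cmp = a.term().compareTo(b.term());
            return cmp != 0 ? cmp : Integer.compare(a.segmentIndex, b.segmentIndex);
        });
        for (int i = 0; i < segments.length; i++) {
            ordDeltas[i] = new long[segments[i].length];
            if (segments[i].length > 0) {
                pq.add(new Cursor(i, segments[i]));
            }
        }
        // [2] Pop terms in sorted order; bump the global ord only for a new unique term.
        long globalOrd = -1;
        String lastTerm = null;
        while (!pq.isEmpty()) {
            Cursor top = pq.poll();
            String term = top.term();
            if (!term.equals(lastTerm)) {
                globalOrd++;
                lastTerm = term;
            }
            ordDeltas[top.segmentIndex][top.ord] = globalOrd - top.ord;
            top.ord++; // analogous to next()
            if (top.ord < top.terms.length) {
                pq.add(top); // re-insert so the PQ re-sorts on the cursor's new term
            }
        }
        return ordDeltas;
    }

    /** [4] segmentToGlobalOrds: globalOrd = segmentOrd + delta. */
    static long[][] toGlobalOrds(long[][] ordDeltas) {
        long[][] global = new long[ordDeltas.length][];
        for (int s = 0; s < ordDeltas.length; s++) {
            global[s] = new long[ordDeltas[s].length];
            for (int ord = 0; ord < ordDeltas[s].length; ord++) {
                global[s][ord] = ord + ordDeltas[s][ord];
            }
        }
        return global;
    }

    public static void main(String[] args) {
        String[][] segments = {{"bar", "foo"}, {"cat", "dog"}};
        long[][] deltas = buildOrdDeltas(segments);
        System.out.println(Arrays.deepToString(deltas));               // [[0, 2], [1, 1]]
        System.out.println(Arrays.deepToString(toGlobalOrds(deltas))); // [[0, 3], [1, 2]]
    }
}
```

Run on the two example segments, the sketch reproduces the delta array [[0, 2], [1, 1]] shown above. A term present in several segments receives a single global ord, because the global ord is only incremented when the popped term differs from the previous one.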
Parameters:
owner - For caching purposes
subs - A TermsEnum[], where each index corresponds to a segment
segmentMap - Provides two maps: newToOld, which lists segments in descending 'weight' order (see OrdinalMap.SegmentMap for more details), and oldToNew, which maps each original segment index to its position in newToOld
acceptableOverheadRatio - Acceptable overhead memory usage for some packed data structures
Throws:
java.io.IOException - throws IOException
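To make the segmentMap parameter concrete, here is a hypothetical sketch (SegmentMapSketch is not a Lucene class) that derives newToOld by sorting segment indices by descending weight, keeping ties in their original order, and computes oldToNew as the inverse permutation. It models only what the parameter description states; the real OrdinalMap.SegmentMap may differ in its details.

```java
import java.util.Arrays;

/** Hypothetical model of the newToOld / oldToNew maps; not Lucene code. */
final class SegmentMapSketch {
    final int[] newToOld; // position in weight order -> original segment index
    final int[] oldToNew; // original segment index -> position in newToOld

    SegmentMapSketch(long[] weights) {
        Integer[] order = new Integer[weights.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Descending weight; Arrays.sort on objects is stable, so ties keep original order.
        Arrays.sort(order, (a, b) -> Long.compare(weights[b], weights[a]));
        newToOld = new int[weights.length];
        oldToNew = new int[weights.length];
        for (int newIndex = 0; newIndex < order.length; newIndex++) {
            newToOld[newIndex] = order[newIndex];
            oldToNew[order[newIndex]] = newIndex; // inverse permutation
        }
    }
}
```

For weights {3, 10, 5}, this gives newToOld = [1, 2, 0] and oldToNew = [2, 0, 1]: segment 1 has the largest weight, so it moves to position 0.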

Method Detail

build
public static OrdinalMap build(IndexReader.CacheKey owner, SortedDocValues[] values, float acceptableOverheadRatio) throws java.io.IOException
Create an ordinal map that uses the number of unique values of each SortedDocValues instance as a weight.
Throws:
java.io.IOException
See Also:
build(IndexReader.CacheKey, TermsEnum[], long[], float)

build
public static OrdinalMap build(IndexReader.CacheKey owner, SortedSetDocValues[] values, float acceptableOverheadRatio) throws java.io.IOException
Create an ordinal map that uses the number of unique values of each SortedSetDocValues instance as a weight.
Throws:
java.io.IOException
See Also:
build(IndexReader.CacheKey, TermsEnum[], long[], float)

build
public static OrdinalMap build(IndexReader.CacheKey owner, TermsEnum[] subs, long[] weights, float acceptableOverheadRatio) throws java.io.IOException
Creates an ordinal map that allows mapping ords to/from a merged space from subs.
Parameters:
owner - a cache key
subs - TermsEnums that support TermsEnum.ord(). They need not be dense (e.g. they can be FilteredTermsEnums).
weights - a weight for each sub. This is ideally correlated with the number of unique terms that each sub introduces compared to the other subs
Throws:
java.io.IOException - if an I/O error occurred.

getGlobalOrds
public LongValues getGlobalOrds(int segmentIndex)
Given a segment number, returns a LongValues instance that maps segment ordinals to global ordinals.
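Conceptually, the LongValues returned here applies the delta encoding from the constructor example: globalOrd = segmentOrd + delta. The sketch below models that lookup with java.util.function.LongUnaryOperator standing in for Lucene's LongValues. GlobalOrdsSketch is a hypothetical name, and routing the segment index through oldToNew assumes the per-segment deltas are stored in the weight-sorted newToOld order described under the constructor's segmentMap parameter.

```java
import java.util.function.LongUnaryOperator;

/** Hypothetical model of getGlobalOrds(int); not Lucene code. */
final class GlobalOrdsSketch {
    private final long[][] ordDeltas; // [position in newToOld][segmentOrd] -> globalOrd - segmentOrd
    private final int[] oldToNew;     // original segment index -> position in newToOld

    GlobalOrdsSketch(long[][] ordDeltas, int[] oldToNew) {
        this.ordDeltas = ordDeltas;
        this.oldToNew = oldToNew;
    }

    /** Maps a segment ord of the given (original) segment to its global ord. */
    LongUnaryOperator getGlobalOrds(int segmentIndex) {
        long[] deltas = ordDeltas[oldToNew[segmentIndex]];
        return segmentOrd -> segmentOrd + deltas[(int) segmentOrd];
    }

    public static void main(String[] args) {
        // Deltas from the constructor example, with an identity segment order.
        GlobalOrdsSketch map = new GlobalOrdsSketch(new long[][] {{0, 2}, {1, 1}}, new int[] {0, 1});
        System.out.println(map.getGlobalOrds(0).applyAsLong(1)); // 3 ("foo")
        System.out.println(map.getGlobalOrds(1).applyAsLong(0)); // 1 ("cat")
    }
}
```

Returning a function per segment, rather than a lookup taking both arguments, mirrors the shape of getGlobalOrds: callers resolve the segment once and then map many ords from it.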

getFirstSegmentOrd
public long getFirstSegmentOrd(long globalOrd)
Given a global ordinal, returns the ordinal of that term in the first segment that contains it (that is, the segment returned by getFirstSegmentNumber(long)).

getFirstSegmentNumber
public int getFirstSegmentNumber(long globalOrd)
Given a global ordinal, returns the index of the first segment that contains this term.
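getFirstSegmentNumber and getFirstSegmentOrd both read per-global-ord data. A hypothetical model (FirstSegmentSketch, not Lucene code) stores, for each global ord, the first segment containing the term (firstSegments) and globalOrdDeltas = globalOrd - ordInFirstSegment, so the first-segment ord is recovered as globalOrd - globalOrdDeltas[globalOrd]. The sketch treats "first" as the lowest segment index in the order given; in OrdinalMap that order is the weight-sorted one from segmentMap, which this sketch ignores.

```java
import java.util.TreeMap;

/** Hypothetical model of the per-global-ord "first segment" data; not Lucene code. */
final class FirstSegmentSketch {
    final long[] firstSegments;   // globalOrd -> index of the first segment containing the term
    final long[] globalOrdDeltas; // globalOrd -> globalOrd - ord of the term in that segment

    FirstSegmentSketch(String[][] segments) {
        // term -> {segmentIndex, segmentOrd} of its first occurrence, scanning segments in order
        TreeMap<String, long[]> first = new TreeMap<>();
        for (int s = 0; s < segments.length; s++) {
            for (int ord = 0; ord < segments[s].length; ord++) {
                first.putIfAbsent(segments[s][ord], new long[] {s, ord});
            }
        }
        firstSegments = new long[first.size()];
        globalOrdDeltas = new long[first.size()];
        int globalOrd = 0;
        for (long[] location : first.values()) { // TreeMap iterates terms in sorted order
            firstSegments[globalOrd] = location[0];
            globalOrdDeltas[globalOrd] = globalOrd - location[1];
            globalOrd++;
        }
    }

    int getFirstSegmentNumber(long globalOrd) {
        return (int) firstSegments[(int) globalOrd];
    }

    long getFirstSegmentOrd(long globalOrd) {
        return globalOrd - globalOrdDeltas[(int) globalOrd];
    }
}
```

With the constructor's example segments, global ord 3 ("foo") maps to segment 0 and segment ord 1, while global ord 1 ("cat") maps to segment 1 and segment ord 0.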

getValueCount
public long getValueCount()
Returns the total number of unique terms in global ord space.

ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
Specified by:
ramBytesUsed in interface Accountable

getChildResources
public java.util.Collection<Accountable> getChildResources()
Description copied from interface: Accountable
Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
Specified by:
getChildResources in interface Accountable
See Also:
Accountables