Class OrdinalMap

  • All Implemented Interfaces:
    Accountable

    public class OrdinalMap
    extends java.lang.Object
    implements Accountable
    Maps per-segment ordinals to/from global ordinal space, using a compact packed-ints representation.

    NOTE: this is a costly operation, as it must merge sort all terms, and may require non-trivial RAM once done. It's better to operate in segment-private ordinal space instead when possible.

    • Field Detail

      • BASE_RAM_BYTES_USED

        private static final long BASE_RAM_BYTES_USED
      • valueCount

        final long valueCount
      • globalOrdDeltas

        final LongValues globalOrdDeltas
      • segmentToGlobalOrds

        final LongValues[] segmentToGlobalOrds
      • ramBytesUsed

        final long ramBytesUsed
    • Constructor Detail

      • OrdinalMap

        OrdinalMap​(IndexReader.CacheKey owner,
                   TermsEnum[] subs,
                   OrdinalMap.SegmentMap segmentMap,
                   float acceptableOverheadRatio)
            throws java.io.IOException
        This is how OrdinalMap encodes the mapping between local segment ords and global ords. Assume we have the following global mapping for a doc values field:
        bar -> 0, cat -> 1, dog -> 2, foo -> 3
        And our index is split into 2 segments with the following local mappings for that same doc values field:
        Segment 0: bar -> 0, foo -> 1
        Segment 1: cat -> 0, dog -> 1
        We then encode the delta between the local and global mapping (globalOrd - segmentOrd) in a packed 2D array keyed by (segmentIndex, segmentOrd). OrdinalMap therefore creates the following 2D array:
        [[0, 2], [1, 1]]
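
        The delta encoding above can be reproduced with a minimal sketch in plain Java. It does not use the Lucene API: the class name OrdDeltaExample and its methods are hypothetical, terms are modeled as sorted lists of strings, and the deltas are held in an ordinary long[][] rather than a packed structure.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
final class OrdDeltaExample {

    /** Returns deltas[segment][segmentOrd] = globalOrd - segmentOrd. */
    static long[][] computeDeltas(List<String> globalTerms, List<List<String>> segmentTerms) {
        long[][] deltas = new long[segmentTerms.size()][];
        for (int seg = 0; seg < segmentTerms.size(); seg++) {
            List<String> terms = segmentTerms.get(seg);
            deltas[seg] = new long[terms.size()];
            for (int segOrd = 0; segOrd < terms.size(); segOrd++) {
                // A term's position in the sorted global list is its global ordinal.
                long globalOrd = globalTerms.indexOf(terms.get(segOrd));
                deltas[seg][segOrd] = globalOrd - segOrd;
            }
        }
        return deltas;
    }

    /** Decodes a global ordinal from a segment ordinal and its stored delta. */
    static long toGlobalOrd(long[][] deltas, int segment, int segOrd) {
        return segOrd + deltas[segment][segOrd];
    }

    public static void main(String[] args) {
        List<String> global = List.of("bar", "cat", "dog", "foo");
        List<List<String>> segments = List.of(List.of("bar", "foo"), List.of("cat", "dog"));
        long[][] deltas = computeDeltas(global, segments);
        System.out.println(Arrays.deepToString(deltas)); // [[0, 2], [1, 1]]
        System.out.println(toGlobalOrd(deltas, 0, 1));   // 3, the global ord of "foo"
    }
}
```

        Storing deltas rather than absolute global ords keeps the values small, which is what makes a packed representation compact.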

        The general algorithm for creating an OrdinalMap (skipping over some implementation details and optimizations) is as follows:

        [1] Create and populate a PQ (priority queue) with (TermsEnum, index) tuples, where index is the position of the TermsEnum in an array of TermsEnums sorted by descending size. The PQ itself is ordered by TermsEnum.term().

        [2] Iterate through every term in the index, starting with the first term at the top of the PQ. We keep track of a global ord, and record the difference between the global ord and TermsEnum.ord() in ordDeltas, which maps:
        (segmentIndex, TermsEnum.ord()) -> globalTermOrdinal - TermsEnum.ord()
        We then call BytesRefIterator.next() and update the PQ to continue iterating (remember that the PQ maintains an order based on TermsEnum.term(), which changes on each next() call). If the current term exists in some other segment, the top of the queue will contain that segment. If not, the top of the queue will contain a segment with the next term in the index, and the global ord is incremented.

        [3] Information gathered in the previous step is used to optimize memory usage and build time in the following steps. For more detail on those optimizations, see the code.

        [4] We will then populate segmentToGlobalOrds, which maps (segmentIndex, segmentOrd) -> globalOrd. Using the information we tracked in ordDeltas, we can construct this information relatively easily.
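
        Steps [1], [2] and [4] can be sketched in plain Java as follows. This is a simplified model under stated assumptions, not the Lucene implementation: the class OrdinalMergeSketch and its Cursor type are hypothetical stand-ins for TermsEnum, terms are compared with String.compareTo rather than by byte order, the optimizations of step [3] are omitted, and the result is stored directly as global ords in a long[][] instead of packed deltas.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical illustration only; not part of Lucene.
final class OrdinalMergeSketch {

    /** Cursor over one segment's sorted terms; a stand-in for a TermsEnum. */
    private static final class Cursor {
        final int segment;
        final String[] terms;
        int ord = 0;

        Cursor(int segment, String[] terms) {
            this.segment = segment;
            this.terms = terms;
        }

        String term() {
            return terms[ord];
        }
    }

    /** Returns segmentToGlobal[segment][segmentOrd] = globalOrd. */
    static long[][] buildSegmentToGlobal(String[][] segments) {
        long[][] segmentToGlobal = new long[segments.length][];
        // Order cursors by current term; break ties by segment index.
        PriorityQueue<Cursor> pq = new PriorityQueue<>((a, b) -> {
            int c = a.term().compareTo(b.term());
            return c != 0 ? c : Integer.compare(a.segment, b.segment);
        });
        for (int i = 0; i < segments.length; i++) {
            segmentToGlobal[i] = new long[segments[i].length];
            if (segments[i].length > 0) {
                pq.add(new Cursor(i, segments[i]));
            }
        }
        long globalOrd = -1;
        String previous = null;
        while (!pq.isEmpty()) {
            Cursor top = pq.poll();
            String term = top.term();
            // A term not seen before gets the next global ordinal.
            if (!term.equals(previous)) {
                globalOrd++;
                previous = term;
            }
            segmentToGlobal[top.segment][top.ord] = globalOrd;
            // Advance the cursor and re-insert it so the queue is re-ordered.
            top.ord++;
            if (top.ord < top.terms.length) {
                pq.add(top);
            }
        }
        return segmentToGlobal;
    }

    public static void main(String[] args) {
        String[][] segments = {{"bar", "foo"}, {"cat", "dog"}};
        // Prints [[0, 3], [1, 2]]
        System.out.println(Arrays.deepToString(buildSegmentToGlobal(segments)));
    }
}
```

        When the same term appears in several segments, each of those cursors reaches the top of the queue in turn and receives the same global ord, which is the behavior described in step [2].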

        Parameters:
        owner - For caching purposes
        subs - A TermsEnum[], where each index corresponds to a segment
        segmentMap - Provides two maps: newToOld, which lists segments in descending 'weight' order (see OrdinalMap.SegmentMap for more details), and oldToNew, which maps each original segment index to its position in newToOld
        acceptableOverheadRatio - Acceptable overhead memory usage for some packed data structures
        Throws:
        java.io.IOException - if an I/O error occurred.
    • Method Detail

      • build

        public static OrdinalMap build​(IndexReader.CacheKey owner,
                                       TermsEnum[] subs,
                                       long[] weights,
                                       float acceptableOverheadRatio)
                                throws java.io.IOException
        Creates an ordinal map that allows mapping ords to/from a merged space from subs.
        Parameters:
        owner - a cache key
        subs - TermsEnums that support TermsEnum.ord(). They need not be dense (e.g. they can be FilteredTermsEnums).
        weights - a weight for each sub. This is ideally correlated with the number of unique terms that each sub introduces compared to the other subs
        Throws:
        java.io.IOException - if an I/O error occurred.
      • getGlobalOrds

        public LongValues getGlobalOrds​(int segmentIndex)
        Given a segment number, return a LongValues instance that maps segment ordinals to global ordinals.
      • getFirstSegmentOrd

        public long getFirstSegmentOrd​(long globalOrd)
        Given a global ordinal, returns the segment ordinal of this term in the first segment that contains it (the segment returned by getFirstSegmentNumber(long)).
      • getFirstSegmentNumber

        public int getFirstSegmentNumber​(long globalOrd)
        Given a global ordinal, returns the index of the first segment that contains this term.
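
        The relationship between getFirstSegmentNumber and getFirstSegmentOrd can be illustrated with a small plain-Java sketch. It is not the Lucene implementation: the class FirstSegmentSketch is hypothetical, it takes an already-built segment-to-global table as a long[][], and it assumes "first" means the lowest segment index, ignoring any weight-based reordering of segments that the real class may apply.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Lucene.
final class FirstSegmentSketch {
    private final int[] firstSegment;     // globalOrd -> first segment containing the term
    private final long[] firstSegmentOrd; // globalOrd -> the term's ordinal in that segment

    FirstSegmentSketch(long[][] segmentToGlobal, int valueCount) {
        firstSegment = new int[valueCount];
        firstSegmentOrd = new long[valueCount];
        Arrays.fill(firstSegment, -1);
        // Visit segments in index order so the first one to claim a global ord wins.
        for (int seg = 0; seg < segmentToGlobal.length; seg++) {
            for (int segOrd = 0; segOrd < segmentToGlobal[seg].length; segOrd++) {
                int globalOrd = (int) segmentToGlobal[seg][segOrd];
                if (firstSegment[globalOrd] == -1) {
                    firstSegment[globalOrd] = seg;
                    firstSegmentOrd[globalOrd] = segOrd;
                }
            }
        }
    }

    int getFirstSegmentNumber(long globalOrd) {
        return firstSegment[(int) globalOrd];
    }

    long getFirstSegmentOrd(long globalOrd) {
        return firstSegmentOrd[(int) globalOrd];
    }

    public static void main(String[] args) {
        // Segment 0 holds global ords {0, 2}; segment 1 holds {0, 1, 2}.
        long[][] segmentToGlobal = {{0, 2}, {0, 1, 2}};
        FirstSegmentSketch map = new FirstSegmentSketch(segmentToGlobal, 3);
        System.out.println(map.getFirstSegmentNumber(1)); // 1
        System.out.println(map.getFirstSegmentOrd(2));    // 1
    }
}
```

        Together the two lookups let a caller resolve a global ordinal back to a concrete (segment, segment ordinal) pair, for example to fetch the term bytes from that segment.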
      • getValueCount

        public long getValueCount()
        Returns the total number of unique terms in global ord space.
      • ramBytesUsed

        public long ramBytesUsed()
        Description copied from interface: Accountable
        Return the memory usage of this object in bytes. Negative values are illegal.
        Specified by:
        ramBytesUsed in interface Accountable
      • getChildResources

        public java.util.Collection<Accountable> getChildResources()
        Description copied from interface: Accountable
        Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
        Specified by:
        getChildResources in interface Accountable
        See Also:
        Accountables