Class BKDWriter

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public class BKDWriter
    extends java.lang.Object
    implements java.io.Closeable
    Recursively builds a block KD-tree to assign all incoming points in N-dim space to smaller and smaller N-dim rectangles (cells) until the number of points in a given rectangle is <= config.maxPointsInLeafNode. The tree is partially balanced, which means the leaf nodes will have the requested config.maxPointsInLeafNode except one that might have less. Leaf nodes may straddle the two bottom levels of the binary tree. Values that fall exactly on a cell boundary may be in either cell.

    The number of dimensions can be 1 to 8, but every byte[] value is fixed length.

    This consumes heap during writing: it allocates a Long[numLeaves], a byte[numLeaves*(1+config.bytesPerDim)] and then uses up to the specified maxMBSortInHeap heap space for writing.

    NOTE: This can write at most Integer.MAX_VALUE * config.maxPointsInLeafNode / config.bytesPerDim total points.

    • Field Detail

      • VERSION_LEAF_STORES_BOUNDS

        public static final int VERSION_LEAF_STORES_BOUNDS
        See Also:
        Constant Field Values
      • VERSION_SELECTIVE_INDEXING

        public static final int VERSION_SELECTIVE_INDEXING
        See Also:
        Constant Field Values
      • VERSION_LOW_CARDINALITY_LEAVES

        public static final int VERSION_LOW_CARDINALITY_LEAVES
        See Also:
        Constant Field Values
      • SPLITS_BEFORE_EXACT_BOUNDS

        private static final int SPLITS_BEFORE_EXACT_BOUNDS
        Number of splits before we compute the exact bounding box of an inner node.
        See Also:
        Constant Field Values
      • DEFAULT_MAX_MB_SORT_IN_HEAP

        public static final float DEFAULT_MAX_MB_SORT_IN_HEAP
        Default maximum heap to use, before spilling to (slower) disk
        See Also:
        Constant Field Values
      • config

        protected final BKDConfig config
        BKD tree configuration
      • tempFileNamePrefix

        final java.lang.String tempFileNamePrefix
      • maxMBSortInHeap

        final double maxMBSortInHeap
      • scratchDiff

        final byte[] scratchDiff
      • scratch

        final byte[] scratch
      • scratchBytesRef1

        final BytesRef scratchBytesRef1
      • scratchBytesRef2

        final BytesRef scratchBytesRef2
      • commonPrefixLengths

        final int[] commonPrefixLengths
      • finished

        private boolean finished
      • maxPointsSortInHeap

        private final int maxPointsSortInHeap
      • minPackedValue

        protected final byte[] minPackedValue
        Minimum per-dim values, packed
      • maxPackedValue

        protected final byte[] maxPackedValue
        Maximum per-dim values, packed
      • pointCount

        protected long pointCount
      • totalPointCount

        private final long totalPointCount
        An upper bound on how many points the caller will add (includes deletions)
      • maxDoc

        private final int maxDoc
    • Constructor Detail

      • BKDWriter

        public BKDWriter​(int maxDoc,
                         Directory tempDir,
                         java.lang.String tempFileNamePrefix,
                         BKDConfig config,
                         double maxMBSortInHeap,
                         long totalPointCount)
    • Method Detail

      • verifyParams

        private static void verifyParams​(double maxMBSortInHeap,
                                         long totalPointCount)
      • initPointWriter

        private void initPointWriter()
                              throws java.io.IOException
        Throws:
        java.io.IOException
      • add

        public void add​(byte[] packedValue,
                        int docID)
                 throws java.io.IOException
        Throws:
        java.io.IOException
      • writeField

        public IORunnable writeField​(IndexOutput metaOut,
                                     IndexOutput indexOut,
                                     IndexOutput dataOut,
                                     java.lang.String fieldName,
                                     MutablePointTree reader)
                              throws java.io.IOException
        Write a field from a MutablePointTree. This way of writing points is faster than regular writes with add(byte[], int) since there is opportunity for reordering points before writing them to disk. This method does not use transient disk in order to reorder points.
        Throws:
        java.io.IOException
      • computePackedValueBounds

        private void computePackedValueBounds​(MutablePointTree values,
                                              int from,
                                              int to,
                                              byte[] minPackedValue,
                                              byte[] maxPackedValue,
                                              BytesRef scratch)
      • merge

        public IORunnable merge​(IndexOutput metaOut,
                                IndexOutput indexOut,
                                IndexOutput dataOut,
                                java.util.List<MergeState.DocMap> docMaps,
                                java.util.List<PointValues> readers)
                         throws java.io.IOException
        More efficient bulk-add for incoming PointValuess. This does a merge sort of the already sorted values and currently only works when numDims==1. This returns -1 if all documents containing dimensional values were deleted.
        Throws:
        java.io.IOException
      • getNumLeftLeafNodes

        private int getNumLeftLeafNodes​(int numLeaves)
      • checkMaxLeafNodeCount

        private void checkMaxLeafNodeCount​(int numLeaves)
      • finish

        public IORunnable finish​(IndexOutput metaOut,
                                 IndexOutput indexOut,
                                 IndexOutput dataOut)
                          throws java.io.IOException
        Writes the BKD tree to the provided IndexOutputs and returns a Runnable that writes the index of the tree if at least one point has been added, or null otherwise.
        Throws:
        java.io.IOException
      • makeWriter

        private IORunnable makeWriter​(IndexOutput metaOut,
                                      IndexOutput indexOut,
                                      byte[] splitDimensionValues,
                                      long[] leafBlockFPs,
                                      long dataStartFP)
      • packIndex

        private byte[] packIndex​(BKDWriter.BKDTreeLeafNodes leafNodes)
                          throws java.io.IOException
        Packs the two arrays, representing a semi-balanced binary tree, into a compact byte[] structure.
        Throws:
        java.io.IOException
      • appendBlock

        private int appendBlock​(ByteBuffersDataOutput writeBuffer,
                                java.util.List<byte[]> blocks)
        Appends the current contents of writeBuffer as another block on the growing in-memory file
      • recursePackIndex

        private int recursePackIndex​(ByteBuffersDataOutput writeBuffer,
                                     BKDWriter.BKDTreeLeafNodes leafNodes,
                                     long minBlockFP,
                                     java.util.List<byte[]> blocks,
                                     byte[] lastSplitValues,
                                     boolean[] negativeDeltas,
                                     boolean isLeft,
                                     int leavesOffset,
                                     int numLeaves)
                              throws java.io.IOException
        lastSplitValues is per-dimension split value previously seen; we use this to prefix-code the split byte[] on each inner node
        Throws:
        java.io.IOException
      • writeIndex

        private void writeIndex​(IndexOutput metaOut,
                                IndexOutput indexOut,
                                int countPerLeaf,
                                int numLeaves,
                                byte[] packedIndex,
                                long dataStartFP)
                         throws java.io.IOException
        Throws:
        java.io.IOException
      • writeLeafBlockDocs

        private void writeLeafBlockDocs​(DataOutput out,
                                        int[] docIDs,
                                        int start,
                                        int count)
                                 throws java.io.IOException
        Throws:
        java.io.IOException
      • writeLeafBlockPackedValues

        private void writeLeafBlockPackedValues​(DataOutput out,
                                                int[] commonPrefixLengths,
                                                int count,
                                                int sortedDim,
                                                java.util.function.IntFunction<BytesRef> packedValues,
                                                int leafCardinality)
                                         throws java.io.IOException
        Throws:
        java.io.IOException
      • writeLowCardinalityLeafBlockPackedValues

        private void writeLowCardinalityLeafBlockPackedValues​(DataOutput out,
                                                              int[] commonPrefixLengths,
                                                              int count,
                                                              java.util.function.IntFunction<BytesRef> packedValues)
                                                       throws java.io.IOException
        Throws:
        java.io.IOException
      • writeHighCardinalityLeafBlockPackedValues

        private void writeHighCardinalityLeafBlockPackedValues​(DataOutput out,
                                                               int[] commonPrefixLengths,
                                                               int count,
                                                               int sortedDim,
                                                               java.util.function.IntFunction<BytesRef> packedValues,
                                                               int compressedByteOffset)
                                                        throws java.io.IOException
        Throws:
        java.io.IOException
      • writeActualBounds

        private void writeActualBounds​(DataOutput out,
                                       int[] commonPrefixLengths,
                                       int count,
                                       java.util.function.IntFunction<BytesRef> packedValues)
                                throws java.io.IOException
        Throws:
        java.io.IOException
      • computeMinMax

        private static BytesRef[] computeMinMax​(int count,
                                                java.util.function.IntFunction<BytesRef> packedValues,
                                                int offset,
                                                int length)
        Return an array that contains the min and max values for the [offset, offset+length] interval of the given BytesRefs.
      • writeLeafBlockPackedValuesRange

        private void writeLeafBlockPackedValuesRange​(DataOutput out,
                                                     int[] commonPrefixLengths,
                                                     int start,
                                                     int end,
                                                     java.util.function.IntFunction<BytesRef> packedValues)
                                              throws java.io.IOException
        Throws:
        java.io.IOException
      • runLen

        private static int runLen​(java.util.function.IntFunction<BytesRef> packedValues,
                                  int start,
                                  int end,
                                  int byteOffset)
      • writeCommonPrefixes

        private void writeCommonPrefixes​(DataOutput out,
                                         int[] commonPrefixes,
                                         byte[] packedValue)
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • close

        public void close()
                   throws java.io.IOException
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException
      • verifyChecksum

        private java.lang.Error verifyChecksum​(java.lang.Throwable priorException,
                                               PointWriter writer)
                                        throws java.io.IOException
        Called on exception, to check whether the checksum is also corrupt in this source, and add that information (checksum matched or didn't) as a suppressed exception.
        Throws:
        java.io.IOException
      • split

        protected int split​(byte[] minPackedValue,
                            byte[] maxPackedValue,
                            int[] parentSplits)
        Pick the next dimension to split.
        Parameters:
        minPackedValue - the min values for all dimensions
        maxPackedValue - the max values for all dimensions
        parentSplits - how many times each dim has been split on the parent levels
        Returns:
        the dimension to split
      • switchToHeap

        private HeapPointWriter switchToHeap​(PointWriter source)
                                      throws java.io.IOException
        Pull a partition back into heap once the point count is low enough while recursing.
        Throws:
        java.io.IOException
      • build

        private void build​(int leavesOffset,
                           int numLeaves,
                           MutablePointTree reader,
                           int from,
                           int to,
                           IndexOutput out,
                           byte[] minPackedValue,
                           byte[] maxPackedValue,
                           int[] parentSplits,
                           byte[] splitPackedValues,
                           byte[] splitDimensionValues,
                           long[] leafBlockFPs,
                           int[] spareDocIds)
                    throws java.io.IOException
        Throws:
        java.io.IOException
      • computePackedValueBounds

        private void computePackedValueBounds​(BKDRadixSelector.PathSlice slice,
                                              byte[] minPackedValue,
                                              byte[] maxPackedValue)
                                       throws java.io.IOException
        Throws:
        java.io.IOException
      • build

        private void build​(int leavesOffset,
                           int numLeaves,
                           BKDRadixSelector.PathSlice points,
                           IndexOutput out,
                           BKDRadixSelector radixSelector,
                           byte[] minPackedValue,
                           byte[] maxPackedValue,
                           int[] parentSplits,
                           byte[] splitPackedValues,
                           byte[] splitDimensionValues,
                           long[] leafBlockFPs,
                           int[] spareDocIds)
                    throws java.io.IOException
        The point writer contains the data that is going to be splitted using radix selection. /* This method is used when we are merging previously written segments, in the numDims > 1 case.
        Throws:
        java.io.IOException
      • computeCommonPrefixLength

        private void computeCommonPrefixLength​(HeapPointWriter heapPointWriter,
                                               byte[] commonPrefix,
                                               int from,
                                               int to)
      • valuesInOrderAndBounds

        private static boolean valuesInOrderAndBounds​(BKDConfig config,
                                                      int count,
                                                      int sortedDim,
                                                      byte[] minPackedValue,
                                                      byte[] maxPackedValue,
                                                      java.util.function.IntFunction<BytesRef> values,
                                                      int[] docs,
                                                      int docsOffset)
      • valueInOrder

        private static boolean valueInOrder​(BKDConfig config,
                                            long ord,
                                            int sortedDim,
                                            byte[] lastPackedValue,
                                            byte[] packedValue,
                                            int packedValueOffset,
                                            int doc,
                                            int lastDoc)
      • valueInBounds

        private static boolean valueInBounds​(BKDConfig config,
                                             BytesRef packedValue,
                                             byte[] minPackedValue,
                                             byte[] maxPackedValue)