Class OfflineSorter


  • public class OfflineSorter
    extends java.lang.Object
    On-disk sorting of byte arrays. Each byte array (entry) is composed of the following fields:
    • (two bytes) length of the following byte array,
    • exactly the above count of bytes for the sequence to be sorted.
    See Also:
    sort(String)
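The entry layout described above can be sketched in plain Java. This is a minimal illustration, not Lucene's own writer: the class `EntryLayoutSketch` and its methods are hypothetical, and the big-endian byte order of the two-byte length is an assumption, since the description above does not state it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: shows one length-prefixed entry.
class EntryLayoutSketch {

  /** Encodes one entry: a two-byte length followed by exactly that many payload bytes. */
  static byte[] encode(byte[] payload) {
    if (payload.length > 0xFFFF) {
      throw new IllegalArgumentException("payload does not fit a two-byte length: " + payload.length);
    }
    // ByteBuffer is big-endian by default; the byte order is an assumption of this sketch.
    return ByteBuffer.allocate(2 + payload.length)
        .putShort((short) payload.length)
        .put(payload)
        .array();
  }

  /** Decodes the payload of the first entry found at the start of the buffer. */
  static byte[] decode(byte[] entry) {
    ByteBuffer buffer = ByteBuffer.wrap(entry);
    int length = buffer.getShort() & 0xFFFF; // read the length as an unsigned 16-bit value
    byte[] payload = new byte[length];
    buffer.get(payload);
    return payload;
  }

  public static void main(String[] args) {
    byte[] entry = encode("abc".getBytes(StandardCharsets.UTF_8));
    System.out.println(Arrays.toString(entry));
    System.out.println(new String(decode(entry), StandardCharsets.UTF_8));
  }
}
```

A two-byte length field limits each entry to at most 65535 payload bytes, which is why the sketch rejects longer inputs.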
    • Field Detail

      • MIN_BUFFER_SIZE_MB

        public static final long MIN_BUFFER_SIZE_MB
        Minimum recommended buffer size for sorting.
        See Also:
        Constant Field Values
      • ABSOLUTE_MIN_SORT_BUFFER_SIZE

        public static final long ABSOLUTE_MIN_SORT_BUFFER_SIZE
        Absolute minimum required buffer size for sorting.
        See Also:
        Constant Field Values
      • MIN_BUFFER_SIZE_MSG

        private static final java.lang.String MIN_BUFFER_SIZE_MSG
        See Also:
        Constant Field Values
      • MAX_TEMPFILES

        public static final int MAX_TEMPFILES
        Maximum number of temporary files before doing an intermediate merge.
        See Also:
        Constant Field Values
      • valueLength

        private final int valueLength
      • tempFileNamePrefix

        private final java.lang.String tempFileNamePrefix
      • exec

        private final java.util.concurrent.ExecutorService exec
      • partitionsInRAM

        private final java.util.concurrent.Semaphore partitionsInRAM
      • maxTempFiles

        private final int maxTempFiles
      • comparator

        private final java.util.Comparator<BytesRef> comparator
      • DEFAULT_COMPARATOR

        public static final java.util.Comparator<BytesRef> DEFAULT_COMPARATOR
        Default comparator: sorts in binary (codepoint) order.
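As an illustration of what binary order means in practice, the sketch below compares plain `byte[]` values byte by byte as unsigned numbers, with a shorter array sorting before any longer array it prefixes. It assumes that this is the ordering the default comparator applies to `BytesRef` contents; the class `BinaryOrderSketch` is hypothetical and uses `java.util.Arrays.compareUnsigned` (Java 9 or later) rather than any Lucene type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a binary-order comparator, operating on byte[] instead of BytesRef.
class BinaryOrderSketch {

  /** Lexicographic comparison that treats every byte as an unsigned value (0..255). */
  static final Comparator<byte[]> UNSIGNED_BYTE_ORDER = (a, b) -> Arrays.compareUnsigned(a, b);

  public static void main(String[] args) {
    List<byte[]> values = new ArrayList<>();
    values.add(new byte[] {(byte) 0x80}); // 128 unsigned, but -128 as a signed Java byte
    values.add(new byte[] {0x7F});
    values.add(new byte[] {0x01, 0x00});
    values.add(new byte[] {0x01});
    values.sort(UNSIGNED_BYTE_ORDER);
    for (byte[] value : values) {
      System.out.println(Arrays.toString(value));
    }
  }
}
```

For UTF-8 encoded text, unsigned byte order coincides with Unicode code point order, which is consistent with the "(codepoint)" remark in the field description.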
    • Constructor Detail

      • OfflineSorter

        public OfflineSorter​(Directory dir,
                             java.lang.String tempFileNamePrefix,
                             java.util.Comparator<BytesRef> comparator)
                      throws java.io.IOException
        Constructor that uses default settings together with a custom comparator.
        Throws:
        java.io.IOException
        See Also:
        OfflineSorter.BufferSize.automatic()
      • OfflineSorter

        public OfflineSorter​(Directory dir,
                             java.lang.String tempFileNamePrefix,
                             java.util.Comparator<BytesRef> comparator,
                             OfflineSorter.BufferSize ramBufferSize,
                             int maxTempfiles,
                             int valueLength,
                             java.util.concurrent.ExecutorService exec,
                             int maxPartitionsInRAM)
        All-details constructor. If valueLength is -1 (the default), values may have different lengths; otherwise, every value has exactly the specified length. If exec is a non-null ExecutorService, it is used to run the sorting operations that can proceed concurrently, and maxPartitionsInRAM is the maximum number of concurrent in-memory partitions. The maximum possible RAM used by this class while sorting is therefore maxPartitionsInRAM * ramBufferSize.
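The memory bound stated above is a simple product, which the following sketch works through. The class `SortMemoryBoundSketch` and its method are hypothetical, and the buffer size is expressed as a plain byte count rather than as an `OfflineSorter.BufferSize` instance.

```java
// Hypothetical helper, not part of Lucene: computes the upper bound on sort memory.
class SortMemoryBoundSketch {

  /** Returns maxPartitionsInRAM * ramBufferSize in bytes, failing loudly on overflow. */
  static long maxSortRamBytes(int maxPartitionsInRAM, long ramBufferSizeBytes) {
    if (maxPartitionsInRAM < 1 || ramBufferSizeBytes < 0) {
      throw new IllegalArgumentException("invalid sizes");
    }
    return Math.multiplyExact((long) maxPartitionsInRAM, ramBufferSizeBytes);
  }

  public static void main(String[] args) {
    long bufferBytes = 64L * 1024 * 1024; // a 64 MB buffer, chosen only for illustration
    long bound = maxSortRamBytes(4, bufferBytes);
    System.out.println(bound + " bytes"); // 4 partitions * 64 MB = 268435456 bytes (256 MB)
  }
}
```

When sizing an executor for concurrent sorting, this product rather than ramBufferSize alone is the figure to compare against the available heap.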
    • Method Detail

      • getDirectory

        public Directory getDirectory()
        Returns the Directory we use to create temp files.
      • sort

        public java.lang.String sort​(java.lang.String inputFileName)
                              throws java.io.IOException
        Sort input to a new temp file, returning its name.
        Throws:
        java.io.IOException
      • verifyChecksum

        private void verifyChecksum​(java.lang.Throwable priorException,
                                    OfflineSorter.ByteSequencesReader reader)
                             throws java.io.IOException
        Called on exception, to check whether the checksum is also corrupt in this source, and add that information (checksum matched or didn't) as a suppressed exception.
        Throws:
        java.io.IOException
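The pattern described for verifyChecksum, attaching the checksum verdict to an exception that is already in flight, can be sketched with `Throwable.addSuppressed`. This is an illustration under assumptions: the class `ChecksumOnFailureSketch`, the use of CRC32, and the message wording are all hypothetical and do not reflect how Lucene stores or verifies its checksums.

```java
import java.util.zip.CRC32;

// Hypothetical illustration of recording a checksum verdict on a prior exception.
class ChecksumOnFailureSketch {

  /** Computes a CRC32 over the data; the checksum algorithm is an assumption of this sketch. */
  static long crc32(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    return crc.getValue();
  }

  /** Adds a suppressed exception to priorException saying whether the checksum matched. */
  static void noteChecksum(Throwable priorException, byte[] data, long expectedChecksum) {
    long actual = crc32(data);
    String verdict = actual == expectedChecksum
        ? "checksum passed (" + Long.toHexString(actual) + ")"
        : "checksum failed: expected=" + Long.toHexString(expectedChecksum)
            + " actual=" + Long.toHexString(actual);
    priorException.addSuppressed(new IllegalStateException(verdict));
  }

  public static void main(String[] args) {
    byte[] data = {1, 2, 3};
    RuntimeException prior = new RuntimeException("read failed");
    noteChecksum(prior, data, crc32(data) + 1); // deliberately wrong expected value
    System.out.println(prior.getSuppressed()[0].getMessage());
  }
}
```

Recording the verdict as a suppressed exception keeps the original failure as the primary error while still telling the caller whether on-disk corruption is a likely cause.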
      • mergePartitions

        void mergePartitions​(Directory trackingDir,
                             java.util.List<java.util.concurrent.Future<OfflineSorter.Partition>> segments)
                      throws java.io.IOException
        Merge the most recent maxTempFiles partitions into a new partition.
        Throws:
        java.io.IOException
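The idea of merging the most recent partitions into one can be sketched with in-memory lists standing in for on-disk partitions. This is a generic k-way merge using a priority queue, offered as an illustration only: the class `MergeSketch` and its methods are hypothetical, and nothing here is claimed to match the internal merge strategy of OfflineSorter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical in-memory sketch; each inner list plays the role of one sorted partition.
class MergeSketch {

  /** Merges already-sorted partitions into one sorted list with a k-way merge. */
  static <T> List<T> merge(List<List<T>> partitions, Comparator<? super T> cmp) {
    // Each queue element is {partition index, position inside that partition}.
    PriorityQueue<int[]> heads = new PriorityQueue<>(
        (a, b) -> cmp.compare(partitions.get(a[0]).get(a[1]), partitions.get(b[0]).get(b[1])));
    for (int p = 0; p < partitions.size(); p++) {
      if (!partitions.get(p).isEmpty()) {
        heads.add(new int[] {p, 0});
      }
    }
    List<T> merged = new ArrayList<>();
    while (!heads.isEmpty()) {
      int[] head = heads.poll();
      List<T> partition = partitions.get(head[0]);
      merged.add(partition.get(head[1]));
      if (head[1] + 1 < partition.size()) {
        heads.add(new int[] {head[0], head[1] + 1}); // advance within the same partition
      }
    }
    return merged;
  }

  /** Replaces the last maxTempFiles partitions (or fewer, if not available) with their merge. */
  static <T> void mergeMostRecent(List<List<T>> partitions, int maxTempFiles, Comparator<? super T> cmp) {
    int from = Math.max(0, partitions.size() - maxTempFiles);
    List<List<T>> mostRecent = partitions.subList(from, partitions.size());
    List<T> merged = merge(new ArrayList<>(mostRecent), cmp);
    mostRecent.clear(); // removes the merged partitions from the backing list
    partitions.add(merged);
  }

  public static void main(String[] args) {
    List<List<String>> partitions = new ArrayList<>();
    partitions.add(Arrays.asList("a", "d"));
    partitions.add(Arrays.asList("b", "e"));
    partitions.add(Arrays.asList("c", "f"));
    mergeMostRecent(partitions, 2, Comparator.<String>naturalOrder());
    System.out.println(partitions);
  }
}
```

Merging only a bounded number of partitions at a time keeps the number of simultaneously open inputs limited, which matches the purpose given for MAX_TEMPFILES.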
      • readPartition

        OfflineSorter.Partition readPartition​(OfflineSorter.ByteSequencesReader reader)
                                       throws java.io.IOException,
                                              java.lang.InterruptedException
        Read in a single partition of data, setting isExhausted[0] to true if there are no more items.
        Throws:
        java.io.IOException
        java.lang.InterruptedException
      • getWriter

        protected OfflineSorter.ByteSequencesWriter getWriter​(IndexOutput out,
                                                              long itemCount)
                                                       throws java.io.IOException
        Subclasses can override to change how byte sequences are written to disk.
        Throws:
        java.io.IOException
      • getComparator

        public java.util.Comparator<BytesRef> getComparator()
        Returns the comparator used to sort entries.