Class BytesRefHash

  • All Implemented Interfaces:
    Accountable

    public final class BytesRefHash
    extends java.lang.Object
    implements Accountable
    BytesRefHash is a special-purpose, hash-map-like data structure optimized for BytesRef instances. It maps byte arrays to ids (conceptually Map<BytesRef,int>) and stores the hashed bytes efficiently in contiguous storage. The mapping to the id is encapsulated inside BytesRefHash, and the id is guaranteed to increase with each newly added BytesRef.

    Note: A BytesRef instance passed to add(BytesRef) must not be longer than ByteBlockPool.BYTE_BLOCK_SIZE-2. The internal storage is limited to 2GB of total byte storage.
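
    The mapping described above can be illustrated with a small, self-contained model that uses only the JDK. This is a sketch under stated assumptions, not Lucene's implementation: the class ToyBytesHash and all of its members are hypothetical, values are kept in a plain list instead of a ByteBlockPool, and the hash function is simply Arrays.hashCode. The -(id + 1) return value for duplicates is assumed to follow the convention of add(BytesRef).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; not Lucene's BytesRefHash.
final class ToyBytesHash {
    private int[] ids = new int[16];                        // open-addressing table, -1 marks an empty slot
    private final List<byte[]> values = new ArrayList<>();  // position in this list is the id

    ToyBytesHash() {
        Arrays.fill(ids, -1);
    }

    /** Returns the newly assigned id, or -(existingId + 1) if the bytes are already present. */
    int add(byte[] bytes) {
        int slot = findSlot(bytes);
        if (ids[slot] != -1) {
            return -(ids[slot] + 1);
        }
        if (2 * (values.size() + 1) > ids.length) {         // keep the table at most half full
            grow();
            slot = findSlot(bytes);
        }
        int id = values.size();                              // ids increase by one per new value
        values.add(bytes.clone());
        ids[slot] = id;
        return id;
    }

    /** Returns the id of the given bytes, or -1 if there is no mapping. */
    int find(byte[] bytes) {
        return ids[findSlot(bytes)];
    }

    /** Returns a copy of the bytes stored for the given id. */
    byte[] get(int id) {
        return values.get(id).clone();
    }

    int size() {
        return values.size();
    }

    // Linear probing: stop at the slot holding these bytes or at the first empty slot.
    private int findSlot(byte[] bytes) {
        int mask = ids.length - 1;
        int slot = Arrays.hashCode(bytes) & mask;
        while (ids[slot] != -1 && !Arrays.equals(values.get(ids[slot]), bytes)) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    // Doubles the table and re-inserts every id by re-hashing its stored bytes.
    private void grow() {
        int[] old = ids;
        ids = new int[old.length * 2];
        Arrays.fill(ids, -1);
        for (int id : old) {
            if (id != -1) {
                ids[findSlot(values.get(id))] = id;
            }
        }
    }
}
```

    In this model the first value added receives id 0, the second id 1, and so on; adding the first value again returns -1, which encodes "already present with id 0".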

    • Field Detail

      • BASE_RAM_BYTES

        private static final long BASE_RAM_BYTES
      • bytesStart

        int[] bytesStart
      • hashSize

        private int hashSize
      • hashHalfSize

        private int hashHalfSize
      • hashMask

        private int hashMask
      • count

        private int count
      • lastCount

        private int lastCount
      • ids

        private int[] ids
      • bytesUsed

        private final Counter bytesUsed
    • Method Detail

      • get

        public BytesRef get(int bytesID,
                            BytesRef ref)
        Populates and returns a BytesRef with the bytes for the given bytesID.

        Note: the given bytesID must be a non-negative integer less than the current size (size()).

        Parameters:
        bytesID - the id
        ref - the BytesRef to populate
        Returns:
        the given BytesRef instance populated with the bytes for the given bytesID
      • compact

        public int[] compact()
        Returns the ids array in arbitrary order. Valid ids start at offset 0 and end at offset size() - 1.

        Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
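
        The compaction step can be pictured as packing the occupied slots of a hash table to the front of the array. The following is a minimal sketch, assuming that empty slots are marked with -1; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; assumes -1 marks an empty slot.
final class CompactSketch {
    /** Moves every non-empty entry to the front of ids and returns the number of entries. */
    static int compact(int[] ids) {
        int upto = 0;
        for (int i = 0; i < ids.length; i++) {
            if (ids[i] != -1) {
                ids[upto++] = ids[i];   // upto <= i, so no unread entry is overwritten
            }
        }
        Arrays.fill(ids, upto, ids.length, -1);  // everything past the packed region is empty
        return upto;
    }
}
```

        After such a pass an id no longer sits in the slot its hash points to, so lookups cannot work on the array any more. This is consistent with the operation being described as destructive.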

      • sort

        public int[] sort()
        Returns the values array sorted by the referenced byte values.

        Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
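
        Sorting ids "by the referenced byte values" means that the ids themselves are reordered while the comparison looks at the bytes each id points to. The sketch below assumes unsigned byte-wise ordering and keeps the values in a plain byte[][] indexed by id; the class and method names are hypothetical, and a simple comparator sort stands in for whatever algorithm the real class uses.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; values[id] holds the bytes for that id.
final class SortIdsSketch {
    /** Returns the ids 0..values.length-1 ordered by the unsigned byte order of values[id]. */
    static int[] sortedIds(byte[][] values) {
        Integer[] boxed = new Integer[values.length];
        for (int id = 0; id < boxed.length; id++) {
            boxed[id] = id;
        }
        // Compare the referenced bytes, not the ids; bytes are treated as 0..255.
        Arrays.sort(boxed, (a, b) -> Arrays.compareUnsigned(values[a], values[b]));
        int[] ids = new int[boxed.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = boxed[i];
        }
        return ids;
    }
}
```

        With the values {0x80}, {0x01, 0x02} and {0x01} stored under ids 0, 1 and 2, the sketch returns the ids in the order 2, 1, 0, because 0x80 is treated as 128 rather than -128.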

      • shrink

        private boolean shrink(int targetSize)
      • clear

        public void clear(boolean resetPool)
        Clears all entries from this BytesRefHash. If resetPool is true, the underlying ByteBlockPool is reset as well.
      • clear

        public void clear()
      • close

        public void close()
        Closes the BytesRefHash and releases all internally used memory.
      • find

        public int find(BytesRef bytes)
        Returns the id of the given BytesRef.
        Parameters:
        bytes - the bytes to look for
        Returns:
        the id of the given bytes, or -1 if there is no mapping for the given bytes.
      • findHash

        private int findHash(BytesRef bytes)
      • addByPoolOffset

        public int addByPoolOffset(int offset)
        Adds an "arbitrary" int offset instead of a BytesRef term. This is used in the indexer to hold the hash for term vectors, because they do not redundantly store the byte[] term directly and instead reference the byte[] term already stored by the postings BytesRefHash. See add(int textStart) in TermsHashPerField.
      • rehash

        private void rehash(int newSize,
                            boolean hashOnData)
        Called when the hash is too small (> 50% occupied) or too large (< 20% occupied).
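
        The two occupancy thresholds can be written as a small decision function. This sketch only mirrors the percentages quoted above; the class, enum and method names are hypothetical, and it does not show where or when the real class performs these checks.

```java
// Hypothetical helper for illustration; mirrors only the 50% and 20% thresholds.
final class RehashPolicySketch {
    enum Action { GROW, SHRINK, NONE }

    /** Decides what to do for a table with hashSize slots of which count are occupied. */
    static Action check(int count, int hashSize) {
        if (2L * count > hashSize) {
            return Action.GROW;     // more than 50% occupied: table is too small
        }
        if (5L * count < hashSize) {
            return Action.SHRINK;   // less than 20% occupied: table is too large
        }
        return Action.NONE;
    }
}
```

        A production policy would also enforce a minimum table size so that an empty table is not shrunk indefinitely; that detail is left out here.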
      • doHash

        static int doHash(byte[] bytes,
                          int offset,
                          int length)
      • reinit

        public void reinit()
        Reinitializes the BytesRefHash after a previous clear() call. If clear() has not been called previously, this method has no effect.
      • byteStart

        public int byteStart(int bytesID)
        Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.
        Parameters:
        bytesID - the id to look up
        Returns:
        the bytesStart offset into the internally used ByteBlockPool for the given id
      • ramBytesUsed

        public long ramBytesUsed()
        Description copied from interface: Accountable
        Return the memory usage of this object in bytes. Negative values are illegal.
        Specified by:
        ramBytesUsed in interface Accountable