- java.lang.Object
-
- org.apache.lucene.util.hnsw.HnswGraphBuilder
-
- All Implemented Interfaces:
HnswBuilder
- Direct Known Subclasses:
HnswConcurrentMergeBuilder.ConcurrentMergeWorker
,InitializedHnswGraphBuilder
public class HnswGraphBuilder extends java.lang.Object implements HnswBuilder
Builder for HNSW graph. SeeHnswGraph
for a gloss on the algorithm and the meaning of the hyper-parameters.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
HnswGraphBuilder.GraphBuilderKnnCollector
A restricted, specialized knnCollector that can be used when building a graph.
-
Field Summary
Fields Modifier and Type Field Description private HnswGraphBuilder.GraphBuilderKnnCollector
beamCandidates
static int
DEFAULT_BEAM_WIDTH
Default number of the size of the queue maintained while searching during a graph construction.static int
DEFAULT_MAX_CONN
Default number of maximum connections per nodeprivate static long
DEFAULT_RAND_SEED
Default random seed for level generation *private HnswGraphBuilder.GraphBuilderKnnCollector
entryCandidates
private HnswGraphSearcher
graphSearcher
protected OnHeapHnswGraph
hnsw
static java.lang.String
HNSW_COMPONENT
A name for the HNSW component for the info-stream *private InfoStream
infoStream
private int
M
private double
ml
private java.util.SplittableRandom
random
static long
randSeed
Random seed for level generation; public to expose for testing *private RandomVectorScorerSupplier
scorerSupplier
-
Constructor Summary
Constructors Modifier Constructor Description protected
HnswGraphBuilder(RandomVectorScorerSupplier scorerSupplier, int M, int beamWidth, long seed, int graphSize)
Reads all the vectors from vector values, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph.protected
HnswGraphBuilder(RandomVectorScorerSupplier scorerSupplier, int M, int beamWidth, long seed, OnHeapHnswGraph hnsw)
protected
HnswGraphBuilder(RandomVectorScorerSupplier scorerSupplier, int M, int beamWidth, long seed, OnHeapHnswGraph hnsw, HnswGraphSearcher graphSearcher)
Reads all the vectors from vector values, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description private void
addDiverseNeighbors(int level, int node, NeighborArray candidates)
void
addGraphNode(int node)
Inserts a doc with vector value to the graphprivate void
addVectors(int maxOrd)
protected void
addVectors(int minOrd, int maxOrd)
add vectors in range [minOrd, maxOrd)OnHeapHnswGraph
build(int maxOrd)
Adds all nodes to the graph up to the providedmaxOrd
.static HnswGraphBuilder
create(RandomVectorScorerSupplier scorerSupplier, int M, int beamWidth, long seed)
static HnswGraphBuilder
create(RandomVectorScorerSupplier scorerSupplier, int M, int beamWidth, long seed, int graphSize)
private boolean
diversityCheck(int candidate, float score, NeighborArray neighbors)
OnHeapHnswGraph
getGraph()
private static int
getRandomGraphLevel(double ml, java.util.SplittableRandom random)
private static void
popToScratch(HnswGraphBuilder.GraphBuilderKnnCollector candidates, NeighborArray scratch)
private long
printGraphBuildStatus(int node, long start, long t)
private boolean[]
selectAndLinkDiverse(NeighborArray neighbors, NeighborArray candidates, int maxConnOnLevel)
This method will select neighbors to add and return a mask telling the caller which candidates are selectedvoid
setInfoStream(InfoStream infoStream)
Set info-stream to output debugging information
-
-
-
Field Detail
-
DEFAULT_MAX_CONN
public static final int DEFAULT_MAX_CONN
Default number of maximum connections per node- See Also:
- Constant Field Values
-
DEFAULT_BEAM_WIDTH
public static final int DEFAULT_BEAM_WIDTH
Default number of the size of the queue maintained while searching during a graph construction.- See Also:
- Constant Field Values
-
DEFAULT_RAND_SEED
private static final long DEFAULT_RAND_SEED
Default random seed for level generation *- See Also:
- Constant Field Values
-
HNSW_COMPONENT
public static final java.lang.String HNSW_COMPONENT
A name for the HNSW component for the info-stream *- See Also:
- Constant Field Values
-
randSeed
public static long randSeed
Random seed for level generation; public to expose for testing *
-
M
private final int M
-
ml
private final double ml
-
random
private final java.util.SplittableRandom random
-
scorerSupplier
private final RandomVectorScorerSupplier scorerSupplier
-
graphSearcher
private final HnswGraphSearcher graphSearcher
-
entryCandidates
private final HnswGraphBuilder.GraphBuilderKnnCollector entryCandidates
-
beamCandidates
private final HnswGraphBuilder.GraphBuilderKnnCollector beamCandidates
-
hnsw
protected final OnHeapHnswGraph hnsw
-
infoStream
private InfoStream infoStream
-
-
Constructor Detail
-
HnswGraphBuilder
protected HnswGraphBuilder(RandomVectorScorerSupplier scorerSupplier, int M, int beamWidth, long seed, int graphSize) throws java.io.IOException
Reads all the vectors from vector values, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph.- Parameters:
scorerSupplier
- a supplier to create vector scorer from ordinals.M
- – graph fanout parameter used to calculate the maximum number of connections a node can have – M on upper layers, and M * 2 on the lowest level.beamWidth
- the size of the beam search to use when finding nearest neighbors.seed
- the seed for a random number generator used during graph construction. Provide this to ensure repeatable construction.graphSize
- size of graph, if unknown, pass in -1- Throws:
java.io.IOException
-
HnswGraphBuilder
protected HnswGraphBuilder(RandomVectorScorerSupplier scorerSupplier, int M, int beamWidth, long seed, OnHeapHnswGraph hnsw) throws java.io.IOException
- Throws:
java.io.IOException
-
HnswGraphBuilder
protected HnswGraphBuilder(RandomVectorScorerSupplier scorerSupplier, int M, int beamWidth, long seed, OnHeapHnswGraph hnsw, HnswGraphSearcher graphSearcher) throws java.io.IOException
Reads all the vectors from vector values, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph.- Parameters:
scorerSupplier
- a supplier to create vector scorer from ordinals.M
- – graph fanout parameter used to calculate the maximum number of connections a node can have – M on upper layers, and M * 2 on the lowest level.beamWidth
- the size of the beam search to use when finding nearest neighbors.seed
- the seed for a random number generator used during graph construction. Provide this to ensure repeatable construction.hnsw
- the graph to build, can be previously initialized- Throws:
java.io.IOException
-
-
Method Detail
-
create
public static HnswGraphBuilder create(RandomVectorScorerSupplier scorerSupplier, int M, int beamWidth, long seed) throws java.io.IOException
- Throws:
java.io.IOException
-
create
public static HnswGraphBuilder create(RandomVectorScorerSupplier scorerSupplier, int M, int beamWidth, long seed, int graphSize) throws java.io.IOException
- Throws:
java.io.IOException
-
build
public OnHeapHnswGraph build(int maxOrd) throws java.io.IOException
Description copied from interface:HnswBuilder
Adds all nodes to the graph up to the providedmaxOrd
.- Specified by:
build
in interfaceHnswBuilder
- Parameters:
maxOrd
- The maximum ordinal (excluded) of the nodes to be added.- Throws:
java.io.IOException
-
setInfoStream
public void setInfoStream(InfoStream infoStream)
Description copied from interface:HnswBuilder
Set info-stream to output debugging information- Specified by:
setInfoStream
in interfaceHnswBuilder
-
getGraph
public OnHeapHnswGraph getGraph()
- Specified by:
getGraph
in interfaceHnswBuilder
-
addVectors
protected void addVectors(int minOrd, int maxOrd) throws java.io.IOException
add vectors in range [minOrd, maxOrd)- Throws:
java.io.IOException
-
addVectors
private void addVectors(int maxOrd) throws java.io.IOException
- Throws:
java.io.IOException
-
addGraphNode
public void addGraphNode(int node) throws java.io.IOException
Description copied from interface:HnswBuilder
Inserts a doc with vector value to the graph- Specified by:
addGraphNode
in interfaceHnswBuilder
- Throws:
java.io.IOException
-
printGraphBuildStatus
private long printGraphBuildStatus(int node, long start, long t)
-
addDiverseNeighbors
private void addDiverseNeighbors(int level, int node, NeighborArray candidates) throws java.io.IOException
- Throws:
java.io.IOException
-
selectAndLinkDiverse
private boolean[] selectAndLinkDiverse(NeighborArray neighbors, NeighborArray candidates, int maxConnOnLevel) throws java.io.IOException
This method will select neighbors to add and return a mask telling the caller which candidates are selected- Throws:
java.io.IOException
-
popToScratch
private static void popToScratch(HnswGraphBuilder.GraphBuilderKnnCollector candidates, NeighborArray scratch)
-
diversityCheck
private boolean diversityCheck(int candidate, float score, NeighborArray neighbors) throws java.io.IOException
- Parameters:
candidate
- the vector of a new candidate neighbor of a node nscore
- the score of the new candidate and node n, to be compared with scores of the candidate and n's neighborsneighbors
- the neighbors selected so far- Returns:
- whether the candidate is diverse given the existing neighbors
- Throws:
java.io.IOException
-
getRandomGraphLevel
private static int getRandomGraphLevel(double ml, java.util.SplittableRandom random)
-
-