Class UnifiedHighlighter
- java.lang.Object
-
- org.apache.lucene.search.uhighlight.UnifiedHighlighter
-
public class UnifiedHighlighter extends java.lang.Object
A Highlighter that can get offsets from either postings (IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS), term vectors (FieldType.setStoreTermVectorOffsets(boolean)), or via re-analyzing text.
This highlighter treats the single original document as the whole corpus, and then scores individual passages as if they were documents in this corpus. It uses a BreakIterator to find passages in the text; by default it breaks using getSentenceInstance(Locale.ROOT). It then iterates in parallel (merge sorting by offset) through the positions of all terms from the query, coalescing those hits that occur in a single passage into a Passage, and then scores each Passage using a separate PassageScorer. Passages are finally formatted into highlighted snippets with a PassageFormatter.
You can customize the behavior by calling some of the setters, or by subclassing and overriding some methods. Some important hooks:
- getBreakIterator(String): Customize how the text is divided into passages.
- getScorer(String): Customize how passages are ranked.
- getFormatter(String): Customize how snippets are formatted.
- getPassageSortComparator(String): Customize how passages are finally sorted.
This is thread-safe, notwithstanding the setters.
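The passage-finding flow described above can be sketched with JDK classes alone. This is a minimal sketch under stated assumptions, not Lucene's implementation: PassageSketch and its methods are hypothetical names, the per-term offsets are supplied by hand instead of being read from postings or term vectors, and a passage is reduced to a hit count keyed by its start offset.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical illustration class; it is not part of Lucene.
class PassageSketch {

  /** Splits text into sentence passages, returned as {start, end} offset pairs (end exclusive). */
  static List<int[]> splitIntoPassages(String text) {
    BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
    bi.setText(text);
    List<int[]> passages = new ArrayList<>();
    int start = bi.first();
    for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
      passages.add(new int[] {start, end});
    }
    return passages;
  }

  /** Merges the sorted per-term offsets and counts the hits that fall into each passage. */
  static Map<Integer, Integer> countHitsPerPassage(List<int[]> passages, int[][] offsetsPerTerm) {
    // Each queue entry is {offset, termIndex, indexWithinTerm}; the smallest offset is polled first.
    PriorityQueue<int[]> queue = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
    for (int t = 0; t < offsetsPerTerm.length; t++) {
      if (offsetsPerTerm[t].length > 0) {
        queue.add(new int[] {offsetsPerTerm[t][0], t, 0});
      }
    }
    Map<Integer, Integer> hitsByPassageStart = new TreeMap<>();
    int p = 0; // current passage; only moves forward because hits arrive in offset order
    while (!queue.isEmpty()) {
      int[] hit = queue.poll();
      while (p < passages.size() && hit[0] >= passages.get(p)[1]) {
        p++;
      }
      if (p == passages.size()) {
        break; // the hit lies beyond the last passage
      }
      hitsByPassageStart.merge(passages.get(p)[0], 1, Integer::sum);
      int next = hit[2] + 1;
      if (next < offsetsPerTerm[hit[1]].length) {
        queue.add(new int[] {offsetsPerTerm[hit[1]][next], hit[1], next});
      }
    }
    return hitsByPassageStart;
  }
}
```

A real PassageScorer would weigh term frequency and passage length rather than use a raw hit count; the count here only shows where coalescing happens.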
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
static class
UnifiedHighlighter.Builder
Builder for UnifiedHighlighter.
static class
UnifiedHighlighter.HighlightFlag
Flags for controlling highlighting behavior.
protected static class
UnifiedHighlighter.LimitedStoredFieldVisitor
Fetches stored fields for highlighting.
static class
UnifiedHighlighter.OffsetSource
Source of term offsets; essential for highlighting.
private static class
UnifiedHighlighter.TermVectorReusingLeafReader
Wraps an IndexReader that remembers/caches the last call to TermVectors.get(int) so that if the next call has the same ID, then it is reused.
-
Field Summary
Fields (Modifier and Type, Field, Description)
private java.util.function.Supplier<java.text.BreakIterator>
breakIterator
private int
cacheFieldValCharsThreshold
private static java.util.function.Supplier<java.text.BreakIterator>
DEFAULT_BREAK_ITERATOR
static int
DEFAULT_CACHE_CHARS_THRESHOLD
private static boolean
DEFAULT_ENABLE_HIGHLIGHT_PHRASES_STRICTLY
private static boolean
DEFAULT_ENABLE_MULTI_TERM_QUERY
private static boolean
DEFAULT_ENABLE_RELEVANCY_OVER_SPEED
private static boolean
DEFAULT_ENABLE_WEIGHT_MATCHES
private static int
DEFAULT_MAX_HIGHLIGHT_PASSAGES
static int
DEFAULT_MAX_LENGTH
private static PassageFormatter
DEFAULT_PASSAGE_FORMATTER
private static PassageScorer
DEFAULT_PASSAGE_SCORER
private static java.util.Comparator<Passage>
DEFAULT_PASSAGE_SORT_COMPARATOR
protected FieldInfos
fieldInfos
private java.util.function.Predicate<java.lang.String>
fieldMatcher
private java.util.Set<UnifiedHighlighter.HighlightFlag>
flags
private PassageFormatter
formatter
private boolean
handleMultiTermQuery
private boolean
highlightPhrasesStrictly
protected Analyzer
indexAnalyzer
private java.util.function.Function<java.lang.String,java.util.Set<java.lang.String>>
maskedFieldsFunc
private int
maxLength
private int
maxNoHighlightPassages
protected static char
MULTIVAL_SEP_CHAR
private boolean
passageRelevancyOverSpeed
private java.util.Comparator<Passage>
passageSortComparator
private PassageScorer
scorer
protected IndexSearcher
searcher
private boolean
weightMatches
protected static LabelledCharArrayMatcher[]
ZERO_LEN_AUTOMATA_ARRAY
-
Constructor Summary
Constructors (Constructor, Description)
UnifiedHighlighter(IndexSearcher indexSearcher, Analyzer indexAnalyzer)
Deprecated.
UnifiedHighlighter(UnifiedHighlighter.Builder builder)
Constructs the highlighter with the given UnifiedHighlighter.Builder.
-
Method Summary
Methods (Modifier and Type, Method, Description)
private DocIdSetIterator
asDocIdSetIterator(int[] sortedDocIds)
static UnifiedHighlighter.Builder
builder(IndexSearcher searcher, Analyzer indexAnalyzer)
static UnifiedHighlighter.Builder
builderWithoutSearcher(Analyzer indexAnalyzer)
Creates a UnifiedHighlighter.Builder object in which you can only use highlightWithoutSearcher(String, Query, String, int) for highlighting.
private int
calculateOptimalCacheCharsThreshold(int numTermVectors, int numPostings)
When cacheCharsThreshold is 0, loadFieldValues() only fetches one document at a time.
private void
copyAndSortDocIdsWithIndex(int[] docIdsIn, int[] docIds, int[] docInIndexes)
private void
copyAndSortFieldsWithMaxPassages(java.lang.String[] fieldsIn, int[] maxPassagesIn, java.lang.String[] fields, int[] maxPassages)
protected java.util.Set<UnifiedHighlighter.HighlightFlag>
evaluateFlags(boolean shouldHandleMultiTermQuery, boolean shouldHighlightPhrasesStrictly, boolean shouldPassageRelevancyOverSpeed, boolean shouldEnableWeightMatches)
Returns the set of UnifiedHighlighter.HighlightFlags that will be applied to the UH object.
protected java.util.Set<UnifiedHighlighter.HighlightFlag>
evaluateFlags(UnifiedHighlighter uh)
Deprecated.
protected java.util.Set<UnifiedHighlighter.HighlightFlag>
evaluateFlags(UnifiedHighlighter.Builder uhBuilder)
Evaluate the highlight flags and set the flags variable.
protected static java.util.Set<Term>
extractTerms(Query query)
Extracts matching terms.
protected static BytesRef[]
filterExtractedTerms(java.util.function.Predicate<java.lang.String> fieldMatcher, java.util.Set<Term> queryTerms)
protected LabelledCharArrayMatcher[]
getAutomata(java.lang.String field, Query query, java.util.Set<UnifiedHighlighter.HighlightFlag> highlightFlags)
protected java.text.BreakIterator
getBreakIterator(java.lang.String field)
Returns the BreakIterator to use for dividing text into passages.
int
getCacheFieldValCharsThreshold()
Limits the amount of field value pre-fetching until this threshold is passed.
protected FieldHighlighter
getFieldHighlighter(java.lang.String field, Query query, java.util.Set<Term> allTerms, int maxPassages)
protected FieldInfo
getFieldInfo(java.lang.String field)
Called by the default implementation of getOffsetSource(String).
protected java.util.function.Predicate<java.lang.String>
getFieldMatcher(java.lang.String field)
Returns the predicate to use for extracting the query part that must be highlighted.
protected java.util.Set<UnifiedHighlighter.HighlightFlag>
getFlags(java.lang.String field)
Returns the UnifiedHighlighter.HighlightFlags applicable for the current UH instance.
protected PassageFormatter
getFormatter(java.lang.String field)
Returns the PassageFormatter to use for formatting passages into highlighted snippets.
protected UHComponents
getHighlightComponents(java.lang.String field, Query query, java.util.Set<Term> allTerms)
Analyzer
getIndexAnalyzer()
...
IndexSearcher
getIndexSearcher()
...
protected java.util.Set<java.lang.String>
getMaskedFields(java.lang.String field)
int
getMaxLength()
The maximum content size to process.
protected int
getMaxNoHighlightPassages(java.lang.String field)
Returns the number of leading passages (as delineated by the BreakIterator) when no highlights could be found.
protected UnifiedHighlighter.OffsetSource
getOffsetSource(java.lang.String field)
Determine the offset source for the specified field.
protected FieldOffsetStrategy
getOffsetStrategy(UnifiedHighlighter.OffsetSource offsetSource, UHComponents components)
protected UnifiedHighlighter.OffsetSource
getOptimizedOffsetSource(UHComponents components)
protected java.util.Comparator<Passage>
getPassageSortComparator(java.lang.String field)
Returns the Comparator to use for finally sorting passages.
protected PhraseHelper
getPhraseHelper(java.lang.String field, Query query, java.util.Set<UnifiedHighlighter.HighlightFlag> highlightFlags)
protected PassageScorer
getScorer(java.lang.String field)
Returns the PassageScorer to use for ranking passages.
protected boolean
hasUnrecognizedQuery(java.util.function.Predicate<java.lang.String> fieldMatcher, Query query)
java.lang.String[]
highlight(java.lang.String field, Query query, TopDocs topDocs)
Highlights the top passages from a single field.
java.lang.String[]
highlight(java.lang.String field, Query query, TopDocs topDocs, int maxPassages)
Highlights the top-N passages from a single field.
java.util.Map<java.lang.String,java.lang.String[]>
highlightFields(java.lang.String[] fieldsIn, Query query, int[] docidsIn, int[] maxPassagesIn)
Highlights the top-N passages from multiple fields, for the provided int[] docids.
java.util.Map<java.lang.String,java.lang.String[]>
highlightFields(java.lang.String[] fields, Query query, TopDocs topDocs)
Highlights the top passages from multiple fields.
java.util.Map<java.lang.String,java.lang.String[]>
highlightFields(java.lang.String[] fields, Query query, TopDocs topDocs, int[] maxPassages)
Highlights the top-N passages from multiple fields.
protected java.util.Map<java.lang.String,java.lang.Object[]>
highlightFieldsAsObjects(java.lang.String[] fieldsIn, Query query, int[] docIdsIn, int[] maxPassagesIn)
Expert: highlights the top-N passages from multiple fields, for the provided int[] docids, to custom Object as returned by the PassageFormatter.
java.lang.Object
highlightWithoutSearcher(java.lang.String field, Query query, java.lang.String content, int maxPassages)
Highlights text passed as a parameter.
protected java.util.List<java.lang.CharSequence[]>
loadFieldValues(java.lang.String[] fields, DocIdSetIterator docIter, int cacheCharsThreshold)
Loads the String values for each docId by field to be highlighted.
protected FieldHighlighter
newFieldHighlighter(java.lang.String field, FieldOffsetStrategy fieldOffsetStrategy, java.text.BreakIterator breakIterator, PassageScorer passageScorer, int maxPassages, int maxNoHighlightPassages, PassageFormatter passageFormatter, java.util.Comparator<Passage> passageSortComparator)
protected UnifiedHighlighter.LimitedStoredFieldVisitor
newLimitedStoredFieldsVisitor(java.lang.String[] fields)
protected java.util.Collection<Query>
preSpanQueryRewrite(Query query)
When highlighting phrases accurately, we may need to handle custom queries that aren't supported in the WeightedSpanTermExtractor as called by the PhraseHelper.
protected java.lang.Boolean
requiresRewrite(SpanQuery spanQuery)
When highlighting phrases accurately, we need to know which SpanQuery instances need to have Query.rewrite(IndexSearcher) called on them.
void
setBreakIterator(java.util.function.Supplier<java.text.BreakIterator> breakIterator)
Deprecated.
void
setCacheFieldValCharsThreshold(int cacheFieldValCharsThreshold)
Deprecated.
void
setFieldMatcher(java.util.function.Predicate<java.lang.String> predicate)
Deprecated.
void
setFormatter(PassageFormatter formatter)
Deprecated.
void
setHandleMultiTermQuery(boolean handleMtq)
Deprecated.
void
setHighlightPhrasesStrictly(boolean highlightPhrasesStrictly)
Deprecated.
void
setMaxLength(int maxLength)
Deprecated.
void
setMaxNoHighlightPassages(int defaultMaxNoHighlightPassages)
Deprecated.
void
setPassageRelevancyOverSpeed(boolean passageRelevancyOverSpeed)
Deprecated.
void
setScorer(PassageScorer scorer)
Deprecated.
void
setWeightMatches(boolean weightMatches)
Deprecated.
protected boolean
shouldHandleMultiTermQuery(java.lang.String field)
Deprecated.
protected boolean
shouldHighlightPhrasesStrictly(java.lang.String field)
Deprecated.
protected boolean
shouldPreferPassageRelevancyOverSpeed(java.lang.String field)
Deprecated.
-
-
-
Field Detail
-
MULTIVAL_SEP_CHAR
protected static final char MULTIVAL_SEP_CHAR
- See Also:
- Constant Field Values
-
DEFAULT_MAX_LENGTH
public static final int DEFAULT_MAX_LENGTH
- See Also:
- Constant Field Values
-
DEFAULT_CACHE_CHARS_THRESHOLD
public static final int DEFAULT_CACHE_CHARS_THRESHOLD
- See Also:
- Constant Field Values
-
ZERO_LEN_AUTOMATA_ARRAY
protected static final LabelledCharArrayMatcher[] ZERO_LEN_AUTOMATA_ARRAY
-
DEFAULT_ENABLE_MULTI_TERM_QUERY
private static final boolean DEFAULT_ENABLE_MULTI_TERM_QUERY
- See Also:
- Constant Field Values
-
DEFAULT_ENABLE_HIGHLIGHT_PHRASES_STRICTLY
private static final boolean DEFAULT_ENABLE_HIGHLIGHT_PHRASES_STRICTLY
- See Also:
- Constant Field Values
-
DEFAULT_ENABLE_WEIGHT_MATCHES
private static final boolean DEFAULT_ENABLE_WEIGHT_MATCHES
- See Also:
- Constant Field Values
-
DEFAULT_ENABLE_RELEVANCY_OVER_SPEED
private static final boolean DEFAULT_ENABLE_RELEVANCY_OVER_SPEED
- See Also:
- Constant Field Values
-
DEFAULT_BREAK_ITERATOR
private static final java.util.function.Supplier<java.text.BreakIterator> DEFAULT_BREAK_ITERATOR
-
DEFAULT_PASSAGE_SCORER
private static final PassageScorer DEFAULT_PASSAGE_SCORER
-
DEFAULT_PASSAGE_FORMATTER
private static final PassageFormatter DEFAULT_PASSAGE_FORMATTER
-
DEFAULT_MAX_HIGHLIGHT_PASSAGES
private static final int DEFAULT_MAX_HIGHLIGHT_PASSAGES
- See Also:
- Constant Field Values
-
DEFAULT_PASSAGE_SORT_COMPARATOR
private static final java.util.Comparator<Passage> DEFAULT_PASSAGE_SORT_COMPARATOR
-
searcher
protected final IndexSearcher searcher
-
indexAnalyzer
protected final Analyzer indexAnalyzer
-
fieldInfos
protected volatile FieldInfos fieldInfos
-
fieldMatcher
private java.util.function.Predicate<java.lang.String> fieldMatcher
-
maskedFieldsFunc
private final java.util.function.Function<java.lang.String,java.util.Set<java.lang.String>> maskedFieldsFunc
-
flags
private java.util.Set<UnifiedHighlighter.HighlightFlag> flags
-
handleMultiTermQuery
private boolean handleMultiTermQuery
-
highlightPhrasesStrictly
private boolean highlightPhrasesStrictly
-
weightMatches
private boolean weightMatches
-
passageRelevancyOverSpeed
private boolean passageRelevancyOverSpeed
-
maxLength
private int maxLength
-
breakIterator
private java.util.function.Supplier<java.text.BreakIterator> breakIterator
-
scorer
private PassageScorer scorer
-
formatter
private PassageFormatter formatter
-
maxNoHighlightPassages
private int maxNoHighlightPassages
-
cacheFieldValCharsThreshold
private int cacheFieldValCharsThreshold
-
passageSortComparator
private java.util.Comparator<Passage> passageSortComparator
-
-
Constructor Detail
-
UnifiedHighlighter
@Deprecated public UnifiedHighlighter(IndexSearcher indexSearcher, Analyzer indexAnalyzer)
Deprecated. Constructs the highlighter with the given index searcher and analyzer.
- Parameters:
indexSearcher - Usually required, unless highlightWithoutSearcher(String, Query, String, int) is used, in which case this needs to be null.
indexAnalyzer - Required, even if in some circumstances it isn't used.
-
UnifiedHighlighter
public UnifiedHighlighter(UnifiedHighlighter.Builder builder)
Constructs the highlighter with the given UnifiedHighlighter.Builder.
- Parameters:
builder - a UnifiedHighlighter.Builder object.
-
-
Method Detail
-
setHandleMultiTermQuery
@Deprecated public void setHandleMultiTermQuery(boolean handleMtq)
Deprecated.
-
setHighlightPhrasesStrictly
@Deprecated public void setHighlightPhrasesStrictly(boolean highlightPhrasesStrictly)
Deprecated.
-
setPassageRelevancyOverSpeed
@Deprecated public void setPassageRelevancyOverSpeed(boolean passageRelevancyOverSpeed)
Deprecated.
-
setMaxLength
@Deprecated public void setMaxLength(int maxLength)
Deprecated.
-
setBreakIterator
@Deprecated public void setBreakIterator(java.util.function.Supplier<java.text.BreakIterator> breakIterator)
Deprecated.
-
setScorer
@Deprecated public void setScorer(PassageScorer scorer)
Deprecated.
-
setFormatter
@Deprecated public void setFormatter(PassageFormatter formatter)
Deprecated.
-
setMaxNoHighlightPassages
@Deprecated public void setMaxNoHighlightPassages(int defaultMaxNoHighlightPassages)
Deprecated.
-
setCacheFieldValCharsThreshold
@Deprecated public void setCacheFieldValCharsThreshold(int cacheFieldValCharsThreshold)
Deprecated.
-
setFieldMatcher
@Deprecated public void setFieldMatcher(java.util.function.Predicate<java.lang.String> predicate)
Deprecated.
-
setWeightMatches
@Deprecated public void setWeightMatches(boolean weightMatches)
Deprecated.
-
shouldHandleMultiTermQuery
@Deprecated protected boolean shouldHandleMultiTermQuery(java.lang.String field)
Deprecated. Returns whether MultiTermQuery derivatives will be highlighted. By default it's enabled. MTQ highlighting can be expensive, particularly when using offsets in postings.
-
shouldHighlightPhrasesStrictly
@Deprecated protected boolean shouldHighlightPhrasesStrictly(java.lang.String field)
Deprecated. Returns whether position sensitive queries (e.g. phrases and SpanQuery instances) should be highlighted strictly based on query matches (slower) versus any/all occurrences of the underlying terms. By default it's enabled, but there's no overhead if such queries aren't used.
-
shouldPreferPassageRelevancyOverSpeed
@Deprecated protected boolean shouldPreferPassageRelevancyOverSpeed(java.lang.String field)
Deprecated.
-
builder
public static UnifiedHighlighter.Builder builder(IndexSearcher searcher, Analyzer indexAnalyzer)
- Parameters:
searcher - an IndexSearcher object.
indexAnalyzer - an Analyzer object.
- Returns:
- a UnifiedHighlighter.Builder object
-
builderWithoutSearcher
public static UnifiedHighlighter.Builder builderWithoutSearcher(Analyzer indexAnalyzer)
Creates a UnifiedHighlighter.Builder object in which you can only use highlightWithoutSearcher(String, Query, String, int) for highlighting.
- Parameters:
indexAnalyzer - an Analyzer object.
- Returns:
- a UnifiedHighlighter.Builder object
-
extractTerms
protected static java.util.Set<Term> extractTerms(Query query)
Extracts matching terms
-
evaluateFlags
protected java.util.Set<UnifiedHighlighter.HighlightFlag> evaluateFlags(boolean shouldHandleMultiTermQuery, boolean shouldHighlightPhrasesStrictly, boolean shouldPassageRelevancyOverSpeed, boolean shouldEnableWeightMatches)
Returns the set of UnifiedHighlighter.HighlightFlags that will be applied to the UH object. The output depends on the values provided to UnifiedHighlighter.Builder.withHandleMultiTermQuery(boolean), UnifiedHighlighter.Builder.withHighlightPhrasesStrictly(boolean), UnifiedHighlighter.Builder.withPassageRelevancyOverSpeed(boolean) and UnifiedHighlighter.Builder.withWeightMatches(boolean), or to setHandleMultiTermQuery(boolean), setHighlightPhrasesStrictly(boolean), setPassageRelevancyOverSpeed(boolean) and setWeightMatches(boolean).
- Parameters:
shouldHandleMultiTermQuery - flag for adding multi-term query highlighting
shouldHighlightPhrasesStrictly - flag for adding strict phrase highlighting
shouldPassageRelevancyOverSpeed - flag for adding passage relevancy
shouldEnableWeightMatches - flag for enabling weight matches
- Returns:
- a set of UnifiedHighlighter.HighlightFlags.
-
evaluateFlags
protected java.util.Set<UnifiedHighlighter.HighlightFlag> evaluateFlags(UnifiedHighlighter.Builder uhBuilder)
Evaluate the highlight flags and set the flags variable. This is called only once when the Builder object is used to create a UH object.
- Parameters:
uhBuilder - UnifiedHighlighter.Builder object.
- Returns:
- UnifiedHighlighter.HighlightFlags.
-
evaluateFlags
@Deprecated protected java.util.Set<UnifiedHighlighter.HighlightFlag> evaluateFlags(UnifiedHighlighter uh)
Deprecated. Evaluate the highlight flags and set the flags variable. This is called every time the getFlags(String) method is called. This is used in the builder and has been marked deprecated since it is used only for the mutable initialization of a UH object.
- Parameters:
uh - UnifiedHighlighter object.
- Returns:
- UnifiedHighlighter.HighlightFlags.
-
getFieldMatcher
protected java.util.function.Predicate<java.lang.String> getFieldMatcher(java.lang.String field)
Returns the predicate to use for extracting the query part that must be highlighted. By default only queries that target the current field are kept. (AKA requireFieldMatch)
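The default field-matching behavior described above can be sketched with a plain java.util.function.Predicate. This is an illustrative sketch only: FieldMatchSketch, its methods, and the {field, text} pair representation of a term are hypothetical and stand in for Lucene's Term type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration class; it is not part of Lucene.
class FieldMatchSketch {

  /** Default-style matcher: accept only the field currently being highlighted. */
  static Predicate<String> requireFieldMatch(String currentField) {
    return currentField::equals;
  }

  /** Keeps the text of every term whose field passes the matcher. Each term is a {field, text} pair. */
  static List<String> keepMatchingTerms(Predicate<String> fieldMatcher, List<String[]> queryTerms) {
    List<String> kept = new ArrayList<>();
    for (String[] term : queryTerms) {
      if (fieldMatcher.test(term[0])) {
        kept.add(term[1]);
      }
    }
    return kept;
  }
}
```

Passing a predicate that always returns true would highlight terms from every field of the query, which is the opposite of requiring a field match.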
-
getMaskedFields
protected java.util.Set<java.lang.String> getMaskedFields(java.lang.String field)
-
getFlags
protected java.util.Set<UnifiedHighlighter.HighlightFlag> getFlags(java.lang.String field)
Returns the UnifiedHighlighter.HighlightFlags applicable for the current UH instance.
-
getMaxLength
public int getMaxLength()
The maximum content size to process. Content will be truncated to this size before highlighting. Typically snippets closer to the beginning of the document better summarize its content.
-
getBreakIterator
protected java.text.BreakIterator getBreakIterator(java.lang.String field)
Returns the BreakIterator to use for dividing text into passages. This returns BreakIterator.getSentenceInstance(Locale) by default; subclasses can override to customize.
Note: this highlighter will call BreakIterator.preceding(int) and BreakIterator.next() many times on it. The default generic JDK implementation of preceding performs poorly.
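To show why preceding(int) and next() are called so often, here is a minimal sketch of locating the passage that contains a given hit offset. PassageLocator and passageAround are hypothetical names, and the sketch assumes 0 <= offset < text.length(); it is not the highlighter's actual code.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical illustration class; it is not part of Lucene.
class PassageLocator {

  /** Returns {start, end} (end exclusive) of the sentence containing the given offset. */
  static int[] passageAround(String text, int offset) {
    BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
    bi.setText(text);
    // preceding(offset + 1) yields the last boundary at or before offset.
    int start = Math.max(bi.preceding(offset + 1), 0);
    // next() advances from that boundary to the end of the same sentence.
    int end = bi.next();
    if (end == BreakIterator.DONE) {
      end = text.length();
    }
    return new int[] {start, end};
  }
}
```

One such pair of calls is needed per passage that contains a hit, so the cost of preceding(int) matters on long documents.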
-
getScorer
protected PassageScorer getScorer(java.lang.String field)
Returns the PassageScorer to use for ranking passages.
-
getFormatter
protected PassageFormatter getFormatter(java.lang.String field)
Returns the PassageFormatter to use for formatting passages into highlighted snippets.
-
getPassageSortComparator
protected java.util.Comparator<Passage> getPassageSortComparator(java.lang.String field)
Returns the Comparator to use for finally sorting passages.
-
getMaxNoHighlightPassages
protected int getMaxNoHighlightPassages(java.lang.String field)
Returns the number of leading passages (as delineated by the BreakIterator) when no highlights could be found. If it's less than 0 (the default) then this defaults to the maxPassages parameter given for each request. If this is 0 then the resulting highlight is null (not formatted).
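The fallback rule above can be written out as a small helper. This is a sketch of the documented rule only: NoHighlightFallbackSketch and its methods are hypothetical, and joining the leading passages by plain concatenation is an assumption rather than what a PassageFormatter would do.

```java
import java.util.List;

// Hypothetical illustration class; it is not part of Lucene.
class NoHighlightFallbackSketch {

  /** A negative configured value means "use the per-request maxPassages". */
  static int effectiveNoHighlightPassages(int configured, int maxPassages) {
    return configured < 0 ? maxPassages : configured;
  }

  /** Returns the leading passages concatenated, or null when the effective count is 0. */
  static String fallbackSummary(List<String> passages, int configured, int maxPassages) {
    int count = Math.min(effectiveNoHighlightPassages(configured, maxPassages), passages.size());
    if (count == 0) {
      return null; // no unhighlighted summary is produced
    }
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < count; i++) {
      sb.append(passages.get(i));
    }
    return sb.toString();
  }
}
```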
-
getCacheFieldValCharsThreshold
public int getCacheFieldValCharsThreshold()
Limits the amount of field value pre-fetching until this threshold is passed. The highlighter internally highlights in batches of documents sized on the sum field value length (in chars) of the fields to be highlighted (bounded by getMaxLength() for each field). By setting this to 0, you can force documents to be fetched and highlighted one at a time, which you usually shouldn't do. The default is 524288 chars, which at two bytes per Java char translates to about a megabyte. However, note that the highlighter sometimes ignores this and highlights one document at a time (without caching a bunch of documents in advance) when it can detect there's no point in it, for example when all fields will be highlighted via re-analysis.
-
getIndexSearcher
public IndexSearcher getIndexSearcher()
... as passed in from constructor.
-
getIndexAnalyzer
public Analyzer getIndexAnalyzer()
... as passed in from constructor.
-
getOffsetSource
protected UnifiedHighlighter.OffsetSource getOffsetSource(java.lang.String field)
Determine the offset source for the specified field. The default algorithm is as follows:
- This calls getFieldInfo(String). Note this returns null if there is no searcher or if the field isn't found there.
- If there's a field info and it has IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS then UnifiedHighlighter.OffsetSource.POSTINGS is returned.
- If there's a field info and FieldInfo.hasVectors() then UnifiedHighlighter.OffsetSource.TERM_VECTORS is returned (note we can't check here if the TV has offsets; if there isn't then an exception will get thrown down the line).
- Fall-back: UnifiedHighlighter.OffsetSource.ANALYSIS is returned.
Note that the highlighter sometimes switches to something else based on the query, such as if you have UnifiedHighlighter.OffsetSource.POSTINGS_WITH_TERM_VECTORS but in fact don't need term vectors.
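The listed steps can be sketched as a small decision function. This follows only the steps as documented above and is not Lucene's source: OffsetSourceSketch, its local Source enum, and the three boolean inputs are hypothetical stand-ins for FieldInfo lookups, and the combined POSTINGS_WITH_TERM_VECTORS case mentioned in the note is deliberately left out.

```java
// Hypothetical illustration class; it is not part of Lucene.
class OffsetSourceSketch {

  /** Local stand-in for the offset sources named in the documented steps. */
  enum Source { POSTINGS, TERM_VECTORS, ANALYSIS }

  /**
   * hasFieldInfo: a field info was found (false when there is no searcher or no such field).
   * offsetsInPostings: the field was indexed with DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.
   * hasTermVectors: the field has term vectors (whether they carry offsets is unknown here).
   */
  static Source choose(boolean hasFieldInfo, boolean offsetsInPostings, boolean hasTermVectors) {
    if (hasFieldInfo && offsetsInPostings) {
      return Source.POSTINGS;
    }
    if (hasFieldInfo && hasTermVectors) {
      return Source.TERM_VECTORS;
    }
    return Source.ANALYSIS; // fall-back: re-analyze the text
  }
}
```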
-
getFieldInfo
protected FieldInfo getFieldInfo(java.lang.String field)
Called by the default implementation of getOffsetSource(String). If there is no searcher then we simply always return null.
-
highlight
public java.lang.String[] highlight(java.lang.String field, Query query, TopDocs topDocs) throws java.io.IOException
Highlights the top passages from a single field.
- Parameters:
field - field name to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
topDocs - TopDocs containing the summary result documents to highlight.
- Returns:
- Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence for the field will be returned.
- Throws:
java.io.IOException - if an I/O error occurred during processing
java.lang.IllegalArgumentException - if field was indexed without IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
-
highlight
public java.lang.String[] highlight(java.lang.String field, Query query, TopDocs topDocs, int maxPassages) throws java.io.IOException
Highlights the top-N passages from a single field.
- Parameters:
field - field name to highlight. Must have a stored string value.
query - query to highlight.
topDocs - TopDocs containing the summary result documents to highlight.
maxPassages - The maximum number of top-N ranked passages used to form the highlighted snippets.
- Returns:
- Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
- Throws:
java.io.IOException - if an I/O error occurred during processing
java.lang.IllegalArgumentException - if field was indexed without IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
-
highlightFields
public java.util.Map<java.lang.String,java.lang.String[]> highlightFields(java.lang.String[] fields, Query query, TopDocs topDocs) throws java.io.IOException
Highlights the top passages from multiple fields.
Conceptually, this behaves as a more efficient form of:
  Map<String, String[]> m = new HashMap<>();
  for (String field : fields) {
    m.put(field, highlight(field, query, topDocs));
  }
  return m;
- Parameters:
fields - field names to highlight. Must have a stored string value.
query - query to highlight.
topDocs - TopDocs containing the summary result documents to highlight.
- Returns:
- Map keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence from the field will be returned.
- Throws:
java.io.IOException - if an I/O error occurred during processing
java.lang.IllegalArgumentException - if field was indexed without IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
-
highlightFields
public java.util.Map<java.lang.String,java.lang.String[]> highlightFields(java.lang.String[] fields, Query query, TopDocs topDocs, int[] maxPassages) throws java.io.IOException
Highlights the top-N passages from multiple fields.
Conceptually, this behaves as a more efficient form of:
  Map<String, String[]> m = new HashMap<>();
  for (int i = 0; i < fields.length; i++) {
    // maxPassages is parallel to fields: one limit per field
    m.put(fields[i], highlight(fields[i], query, topDocs, maxPassages[i]));
  }
  return m;
- Parameters:
fields - field names to highlight. Must have a stored string value.
query - query to highlight.
topDocs - TopDocs containing the summary result documents to highlight.
maxPassages - The maximum number of top-N ranked passages per-field used to form the highlighted snippets.
- Returns:
- Map keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
- Throws:
java.io.IOException - if an I/O error occurred during processing
java.lang.IllegalArgumentException - if field was indexed without IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
-
highlightFields
public java.util.Map<java.lang.String,java.lang.String[]> highlightFields(java.lang.String[] fieldsIn, Query query, int[] docidsIn, int[] maxPassagesIn) throws java.io.IOException
Highlights the top-N passages from multiple fields, for the provided int[] docids.
- Parameters:
fieldsIn - field names to highlight. Must have a stored string value.
query - query to highlight.
docidsIn - containing the document IDs to highlight.
maxPassagesIn - The maximum number of top-N ranked passages per-field used to form the highlighted snippets.
- Returns:
- Map keyed on field name, containing the array of formatted snippets corresponding to the documents in docidsIn. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
- Throws:
java.io.IOException - if an I/O error occurred during processing
java.lang.IllegalArgumentException - if field was indexed without IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
-
highlightFieldsAsObjects
protected java.util.Map<java.lang.String,java.lang.Object[]> highlightFieldsAsObjects(java.lang.String[] fieldsIn, Query query, int[] docIdsIn, int[] maxPassagesIn) throws java.io.IOException
Expert: highlights the top-N passages from multiple fields, for the provided int[] docids, to custom Object as returned by the PassageFormatter. Use this API to render to something other than String.
- Parameters:
fieldsIn - field names to highlight. Must have a stored string value.
query - query to highlight.
docIdsIn - containing the document IDs to highlight.
maxPassagesIn - The maximum number of top-N ranked passages per-field used to form the highlighted snippets.
- Returns:
- Map keyed on field name, containing the array of formatted snippets corresponding to the documents in docIdsIn. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
- Throws:
java.io.IOException - if an I/O error occurred during processing
java.lang.IllegalArgumentException - if field was indexed without IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
-
calculateOptimalCacheCharsThreshold
private int calculateOptimalCacheCharsThreshold(int numTermVectors, int numPostings)
When cacheCharsThreshold is 0, loadFieldValues() only fetches one document at a time. We override it to be 0 in two circumstances:
-
copyAndSortFieldsWithMaxPassages
private void copyAndSortFieldsWithMaxPassages(java.lang.String[] fieldsIn, int[] maxPassagesIn, java.lang.String[] fields, int[] maxPassages)
-
copyAndSortDocIdsWithIndex
private void copyAndSortDocIdsWithIndex(int[] docIdsIn, int[] docIds, int[] docInIndexes)
-
highlightWithoutSearcher
public java.lang.Object highlightWithoutSearcher(java.lang.String field, Query query, java.lang.String content, int maxPassages) throws java.io.IOException
Highlights text passed as a parameter. This requires that the IndexSearcher provided to this highlighter is null. This use-case is rarer. Naturally, the mode of operation will be UnifiedHighlighter.OffsetSource.ANALYSIS. The result of this method is whatever the PassageFormatter returns. For the DefaultPassageFormatter and assuming content has non-zero length, the result will be a non-null string, so it's safe to call Object.toString() on it in that case.
- Parameters:
field - field name to highlight (as found in the query).
query - query to highlight.
content - text to highlight.
maxPassages - The maximum number of top-N ranked passages used to form the highlighted snippets.
- Returns:
- result of the PassageFormatter -- probably a String. Might be null.
- Throws:
java.io.IOException - if an I/O error occurred during processing
-
getFieldHighlighter
protected FieldHighlighter getFieldHighlighter(java.lang.String field, Query query, java.util.Set<Term> allTerms, int maxPassages)
-
newFieldHighlighter
protected FieldHighlighter newFieldHighlighter(java.lang.String field, FieldOffsetStrategy fieldOffsetStrategy, java.text.BreakIterator breakIterator, PassageScorer passageScorer, int maxPassages, int maxNoHighlightPassages, PassageFormatter passageFormatter, java.util.Comparator<Passage> passageSortComparator)
-
getHighlightComponents
protected UHComponents getHighlightComponents(java.lang.String field, Query query, java.util.Set<Term> allTerms)
-
hasUnrecognizedQuery
protected boolean hasUnrecognizedQuery(java.util.function.Predicate<java.lang.String> fieldMatcher, Query query)
-
filterExtractedTerms
protected static BytesRef[] filterExtractedTerms(java.util.function.Predicate<java.lang.String> fieldMatcher, java.util.Set<Term> queryTerms)
-
getPhraseHelper
protected PhraseHelper getPhraseHelper(java.lang.String field, Query query, java.util.Set<UnifiedHighlighter.HighlightFlag> highlightFlags)
-
getAutomata
protected LabelledCharArrayMatcher[] getAutomata(java.lang.String field, Query query, java.util.Set<UnifiedHighlighter.HighlightFlag> highlightFlags)
-
getOptimizedOffsetSource
protected UnifiedHighlighter.OffsetSource getOptimizedOffsetSource(UHComponents components)
-
getOffsetStrategy
protected FieldOffsetStrategy getOffsetStrategy(UnifiedHighlighter.OffsetSource offsetSource, UHComponents components)
-
requiresRewrite
protected java.lang.Boolean requiresRewrite(SpanQuery spanQuery)
When highlighting phrases accurately, we need to know which SpanQuery instances need to have Query.rewrite(IndexSearcher) called on them. It helps performance to avoid it if it's not needed. This method will be invoked on all SpanQuery instances recursively. If you have custom SpanQuery queries then override this to check instanceof and provide a definitive answer. If the query isn't your custom one, simply return null to have the default rules apply, which govern the ones included in Lucene.
-
preSpanQueryRewrite
protected java.util.Collection<Query> preSpanQueryRewrite(Query query)
When highlighting phrases accurately, we may need to handle custom queries that aren't supported in the WeightedSpanTermExtractor as called by the PhraseHelper. Should custom query types be needed, this method should be overridden to return a collection of queries if appropriate, or null if there is nothing to do. If the query is not custom, simply returning null will allow the default rules to apply.
- Parameters:
query - Query to be highlighted
- Returns:
- A Collection of Query object(s) if the query needs to be rewritten, otherwise null.
-
asDocIdSetIterator
private DocIdSetIterator asDocIdSetIterator(int[] sortedDocIds)
-
loadFieldValues
protected java.util.List<java.lang.CharSequence[]> loadFieldValues(java.lang.String[] fields, DocIdSetIterator docIter, int cacheCharsThreshold) throws java.io.IOException
Loads the String values for each docId by field to be highlighted. By default this loads from stored fields by the same name as given, but a subclass can change the source. The returned Strings must be identical to what was indexed (at least for postings or term-vectors offset sources). This method must load fields for at least one document from the given DocIdSetIterator but need not return all of them; by default the character lengths are summed and this method will return early when cacheCharsThreshold is exceeded. Specifically if that number is 0, then only one document is fetched no matter what. Values in the array of CharSequence will be null if no value was found.
- Throws:
java.io.IOException
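The early-return rule described for loadFieldValues can be sketched as a batching loop. This is a simplified sketch under stated assumptions, not the Lucene implementation: FieldValueBatchSketch and loadBatch are hypothetical, each document is reduced to a single String value, and an Iterator<String> stands in for the DocIdSetIterator plus stored-field lookup.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration class; it is not part of Lucene.
class FieldValueBatchSketch {

  /**
   * Pulls values until the summed length (each value capped at maxLength) exceeds
   * cacheCharsThreshold. A threshold of 0 always stops after the first document.
   */
  static List<String> loadBatch(Iterator<String> docValues, int maxLength, int cacheCharsThreshold) {
    List<String> batch = new ArrayList<>();
    long sumChars = 0;
    while (docValues.hasNext()) {
      String value = docValues.next();
      batch.add(value); // at least one document is loaded when any remain
      sumChars += Math.min(value.length(), maxLength);
      if (cacheCharsThreshold == 0 || sumChars > cacheCharsThreshold) {
        break; // remaining documents stay in the iterator for the next batch
      }
    }
    return batch;
  }
}
```

Calling loadBatch repeatedly on the same iterator yields successive batches, which mirrors how the highlighter processes documents in groups rather than all at once.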
-
newLimitedStoredFieldsVisitor
protected UnifiedHighlighter.LimitedStoredFieldVisitor newLimitedStoredFieldsVisitor(java.lang.String[] fields)
-
-