With index-time boosts gone, length encoding is more accurate, and in particular the first nine length values (1 to 9) are distinct.

BooleanQuery has long exposed a confusing scoring feature called the coordination factor (coord), to reward hits containing a higher percentage of the search terms. However, this hack is only necessary for scoring models like TF/IDF which have "weak" term saturation, such that many occurrences of a single term in a document would be more powerful than adding a single occurrence of another term from the query. Since this was specific to one scoring model, TFIDFSimilarity, and since Lucene has now switched to the better Okapi BM25 scoring model by default, we have now fully removed coordination factors in 7.0 from both BooleanQuery and Similarity. Likewise, the query normalization phase of scoring will be removed. This phase tried to equalize scores across different queries and indices so that they are more comparable, but didn't alter the sort order of hits, and was also TF/IDF specific.
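To make the term saturation argument concrete, here is a minimal sketch in plain Python, not Lucene code. It assumes a two-term query, an idf of 1.0 for both terms, and documents of average length, and it compares only the term-frequency part of each model: the square root used by classic TF/IDF against the BM25 formula with the usual defaults k1 = 1.2 and b = 0.75. The helper names (`classic_tf`, `bm25_tf`, `coord`, `score`) are invented for this illustration.

```python
import math

def classic_tf(freq):
    """Term-frequency component of classic TF/IDF: sqrt(freq)."""
    return math.sqrt(freq)

def bm25_tf(freq, k1=1.2, b=0.75, doc_len=1.0, avg_doc_len=1.0):
    """Term-frequency component of BM25; it saturates below k1 + 1."""
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return freq * (k1 + 1.0) / (freq + length_norm)

def coord(matched_terms, query_terms):
    """Coordination factor: fraction of query terms the document matched."""
    return matched_terms / query_terms

def score(term_freqs, tf, use_coord=False):
    """Sum per-term contributions (idf assumed to be 1.0 for every term)."""
    total = sum(tf(f) for f in term_freqs if f > 0)
    if use_coord:
        matched = sum(1 for f in term_freqs if f > 0)
        total *= coord(matched, len(term_freqs))
    return total

# Per-term frequencies for a two-term query.
doc_repeat = [9, 0]  # first term nine times, second term missing
doc_both = [1, 1]    # each term once

# Classic TF/IDF without coord: the repetitive document wins (3.0 vs 2.0).
classic_plain = (score(doc_repeat, classic_tf), score(doc_both, classic_tf))

# Classic TF/IDF with coord: the document matching both terms wins (1.5 vs 2.0).
classic_coord = (score(doc_repeat, classic_tf, use_coord=True),
                 score(doc_both, classic_tf, use_coord=True))

# BM25 without coord: the document matching both terms already wins here.
bm25_plain = (score(doc_repeat, bm25_tf), score(doc_both, bm25_tf))
```

Under these assumptions the square root keeps growing with each repetition, so classic TF/IDF needs coord to rank the two-term match first, while the BM25 contribution of a single term can never exceed k1 + 1 = 2.2 however often the term repeats. Real Lucene scores include further factors (actual idf values, length normalization) that this sketch leaves out.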
With these changes, you finally only pay for what you actually use with doc values, in index size, indexing performance, etc. This is the same as other parts of the index like postings, stored fields, term vectors, etc., and it means users with very sparse doc values no longer see merges taking an unreasonably long time or the index becoming unexpectedly huge while merging. Our nightly sparse benchmarks, based on the NYC Trip Data corpus, show the impressive gains each of the above changes (and more!) accomplished.

Index-time boosting, which lets you increase the a-priori score for a particular document versus other documents, is now deprecated and will be removed in 7.0. This has always been a fragile feature: it was encoded, along with the field's length, into a single byte value, and thus had very low precision. Furthermore, it is now straightforward to write your custom boost into your own doc values field and use function queries to apply the boost at search time.
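The idea of keeping a custom boost in its own doc values field and applying it at search time can be pictured with a toy model. The sketch below is plain Python, not the Lucene API: a dictionary stands in for the per-document doc values column, and a small function stands in for the function query. All names (`boost_by_doc`, `apply_boost`, `base_hits`) and all numbers are hypothetical.

```python
# Toy stand-in for a doc values field: one full-precision boost per document id.
boost_by_doc = {0: 1.0, 1: 2.5, 2: 0.75}

def apply_boost(hits, boosts, default=1.0):
    """Multiply each hit's base score by its stored boost, then re-sort.

    hits is a list of (doc_id, base_score) pairs; documents without a
    stored boost fall back to the default of 1.0 (no change).
    """
    boosted = [(doc_id, base_score * boosts.get(doc_id, default))
               for doc_id, base_score in hits]
    return sorted(boosted, key=lambda hit: hit[1], reverse=True)

# Base relevance scores as a query might produce them (made-up values).
base_hits = [(0, 3.0), (1, 1.5), (2, 3.2)]

# Document 1 moves to the top because of its 2.5x boost.
ranked = apply_boost(base_hits, boost_by_doc)
```

Because the boost lives in its own column rather than sharing a single byte with the field length, it keeps its full precision, and the way it is combined with the score (multiplied here) is a search-time decision rather than something fixed at index time.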
This is only a subset of the new 7.0-only features; for the full list, please see the 7.0.0 section in the upcoming CHANGES.txt.
Of course, with every major release, we also do more mundane things like remove deprecated 6.x APIs, and drop support for old indices (written with Lucene 5.x or earlier). Remember that Lucene developers generally try hard to backport new features for the next non-major (feature) release, and the upcoming 6.5 already has many great changes, so a new major release is exciting because it means the 7.0-only features, which I now describe, are the particularly big ones that we felt could not be backported for 6.5. The Apache Lucene project will likely release its next major release, 7.0, in a few