Optimize FieldExistsQuery.count() when all docs have the field by iprithv · Pull Request #16111 · apache/lucene

iprithv · 2026-05-23T01:56:02Z

Optimize FieldExistsQuery.count() when all docs have the field.
Vectors return numDocs() directly.
Doc values only optimize with deletions when DocValuesSkipper is available.
Points and terms are not used as proxies, since their coverage may not match doc values coverage exactly.

jainankitk · 2026-06-03T22:45:16Z

+              if (pointValues != null) {
+                return pointValues.getDocCount();
+              }


The new logic falls through to super.count(context), which returns -1 and forces IndexSearcher to do a full scan to arrive at the same 0. Not a correctness regression, but it does turn a constant-time path into a linear one for these edge cases. Probably fine, since the case is rare in practice?

Yeah... losing the zero case for this and the Terms check is not ideal.

Yes, good point. Updated, thanks. When the value is null, it no longer does a full scan.
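The difference discussed in this thread can be sketched with a toy model. Everything below is hypothetical and is not Lucene code: `countFallingThrough` imitates a count that gives up with -1 when the per-field structure is null, `countWithZeroCase` imitates the version that keeps the zero shortcut, and `resolve` stands in for a caller that only scans when it receives -1. The sketch assumes a leaf without deletions, so a non-null doc count can be returned as is.

```java
import java.util.function.IntSupplier;

/** Toy model of the "-1 means unknown" contract; names are illustrative only. */
final class CountFallbackSketch {

  /** Null structure falls through to "unknown" (-1). */
  static int countFallingThrough(Integer docCountOrNull) {
    return docCountOrNull != null ? docCountOrNull : -1;
  }

  /** Null structure means the field is absent on this leaf, so the count is 0. */
  static int countWithZeroCase(Integer docCountOrNull) {
    return docCountOrNull == null ? 0 : docCountOrNull;
  }

  /** Hypothetical caller: runs the linear scan only when the cheap count is -1. */
  static int resolve(int cheapCount, IntSupplier linearScan) {
    return cheapCount != -1 ? cheapCount : linearScan.getAsInt();
  }

  public static void main(String[] args) {
    int[] scans = {0};
    // The scan of a leaf without the field finds nothing, but it still costs a pass.
    IntSupplier scan = () -> {
      scans[0]++;
      return 0;
    };

    int a = resolve(countFallingThrough(null), scan);
    System.out.println(a + " after " + scans[0] + " scan(s)"); // 0 after 1 scan(s)

    scans[0] = 0;
    int b = resolve(countWithZeroCase(null), scan);
    System.out.println(b + " after " + scans[0] + " scan(s)"); // 0 after 0 scan(s)
  }
}
```

Both paths arrive at 0; only the first one pays for a scan to get there.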

msfroh

I noticed that you did the optimization just for the DocValuesSkipper case, but I was thinking you might be able to do something like:

// Don't do the `hasDeletions` check yet.
int count = -1;
if (fieldInfo.getPointDimensionCount() > 0) {
  PointValues pointValues = reader.getPointValues(field);
  count = (pointValues == null ? 0 : pointValues.getDocCount());
} else if (fieldInfo.getIndexOptions() != IndexOptions.NONE) {
  Terms terms = reader.terms(field);
  count = (terms == null ? 0 : terms.getDocCount());
} else if (fieldInfo.docValuesSkipIndexType() != DocValuesSkipIndexType.NONE) {
  DocValuesSkipper docValuesSkipper = reader.getDocValuesSkipper(field);
  count = (docValuesSkipper == null ? 0 : docValuesSkipper.docCount());
}
if (count == 0) {
  // One of the above cases shows the field is not present on this leaf
  return 0;
} else if (count == reader.maxDoc()) {
  // All docs in the leaf (live or deleted) have the field. Return the count of live docs.
  return reader.numDocs();
} else if (count >= 0 && reader.hasDeletions() == false) {
  // No deleted docs. The computed count can be trusted.
  return count;
}
// Some docs don't have the field and some docs are deleted. 
// Need to scan to get the correct intersection between field exists docs and live docs.
return super.count(context);

Honestly, I don't know if there's any reason why we couldn't extend this to cover the norms and vectors cases too. I think those three special cases (count == 0, count == maxDoc, no deletions) apply to any of the field types.

@iprithv -- what do you think?
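To make the three special cases concrete, here is a small worked example with plain integers instead of Lucene readers. The class and method names are hypothetical, `docsWithField` stands for whatever doc count the points, terms, or skipper structure reports (with -1 meaning no such structure is available), and "has deletions" is modeled as `numDocs < maxDoc`.

```java
/** Worked example of the three shortcuts; illustrative names, not Lucene code. */
final class FieldCountDecision {
  static final int UNKNOWN = -1;

  static int count(int docsWithField, int maxDoc, int numDocs) {
    boolean hasDeletions = numDocs < maxDoc;
    if (docsWithField == 0) {
      return 0; // field is not present on this leaf
    } else if (docsWithField == maxDoc) {
      return numDocs; // every doc, live or deleted, has the field
    } else if (docsWithField >= 0 && !hasDeletions) {
      return docsWithField; // no deleted docs, so the stored count is exact
    }
    return UNKNOWN; // partial coverage plus deletions: a scan is needed
  }

  public static void main(String[] args) {
    System.out.println(count(0, 10, 8));   // 0
    System.out.println(count(10, 10, 8));  // 8
    System.out.println(count(7, 10, 10));  // 7
    System.out.println(count(7, 10, 8));   // -1
  }
}
```

Only the last call needs a scan: 7 of 10 docs have the field and 2 docs are deleted, and the stored count alone cannot say how many of the 7 are still live.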

msfroh · 2026-06-03T23:08:00Z

+            // DocValuesSkipper provides an exact doc count for doc values, so we can use it
+            // reliably even in the presence of deletions.


This comment is a bit confusing.

I interpreted it as "The DocValuesSkipper knows how many live docs have the field", which confused me, because it doesn't have access to the live docs (the DocValuesProducer's SegmentReadState has a SegmentInfo, but not a SegmentCommitInfo).

Actually reading the code, it's clear that we're saying "If every document in the segment, live or deleted, has the field, then the count is the number of live docs."

ah yeah, updated the comment. thanks!
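The reading in this thread ("every document, live or deleted, has the field, so the count is the number of live docs") can be checked on a toy segment. This is a sketch using `java.util.BitSet` as a stand-in for both the set of docs with the field and the live docs; the class and method names are made up for illustration.

```java
import java.util.BitSet;

/** Toy segment: bit i is set when doc i has the field / is live. Not Lucene code. */
final class LiveDocsIntersection {

  /** The exact answer: docs that both have the field and are live. */
  static int countLiveWithField(BitSet hasField, BitSet live) {
    BitSet both = (BitSet) hasField.clone();
    both.and(live);
    return both.cardinality();
  }

  public static void main(String[] args) {
    int maxDoc = 6;
    BitSet live = new BitSet(maxDoc);
    live.set(0, maxDoc);
    live.clear(2); // docs 2 and 5 are deleted
    live.clear(5);

    // Full coverage: the intersection is just the live docs.
    BitSet all = new BitSet(maxDoc);
    all.set(0, maxDoc);
    System.out.println(countLiveWithField(all, live)); // 4
    System.out.println(live.cardinality());            // 4

    // Partial coverage: the stored count (3) overstates the live count (2).
    BitSet some = new BitSet(maxDoc);
    some.set(0, 3);
    System.out.println(some.cardinality());             // 3
    System.out.println(countLiveWithField(some, live)); // 2
  }
}
```

With full coverage the stored doc count is never used as the answer; it only proves that intersecting with the live docs changes nothing. With partial coverage that proof is gone, which is why deletions force a scan.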

msfroh · 2026-06-03T23:09:39Z

+              if (pointValues != null) {
+                return pointValues.getDocCount();
+              }


Yeah... losing the zero case for this and the Terms check is not ideal.

iprithv · 2026-06-04T16:54:28Z

@iprithv -- what do you think?

Makes sense. I kept norms as a special case, though: getDocCount() for norms actually returns the postings count, not the norms count (docs can have norms without postings when the analyzer produces an empty token stream). So the unified logic works for vectors, doc values, points, and terms, but norms still only use the maxDoc shortcut. Thanks!
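The norms caveat can be illustrated with a toy model, assuming (as stated above) that a doc whose analyzer emits no tokens still gets a norm but no postings. Nothing here is Lucene API: each doc is represented by its token list for the field, with `null` meaning the field is absent from the doc.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of norms coverage vs. postings coverage; illustrative only. */
final class NormsVsPostingsSketch {

  /** Docs with at least one token: what a postings-based doc count sees here. */
  static int postingsDocCount(List<List<String>> tokensPerDoc) {
    int n = 0;
    for (List<String> tokens : tokensPerDoc) {
      if (tokens != null && !tokens.isEmpty()) {
        n++;
      }
    }
    return n;
  }

  /** Docs where the field was indexed at all, even with zero tokens. */
  static int normsDocCount(List<List<String>> tokensPerDoc) {
    int n = 0;
    for (List<String> tokens : tokensPerDoc) {
      if (tokens != null) {
        n++;
      }
    }
    return n;
  }

  public static void main(String[] args) {
    // Doc 1 has the field, but its token stream is empty.
    List<List<String>> docs =
        Arrays.asList(List.of("quick", "fox"), List.<String>of(), List.of("lazy"));
    System.out.println(normsDocCount(docs));    // 3
    System.out.println(postingsDocCount(docs)); // 2
  }
}
```

In this model, using the postings count as the "field exists" count on a leaf without deletions would report 2 where the norms-based answer is 3, which matches the reason given for leaving norms out of the unified logic.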


github-actions Bot added the module:core/search label May 23, 2026

iprithv force-pushed the optimize-fieldexists-count branch 2 times, most recently from 5ff0197 to 10b7760 Compare May 23, 2026 01:59

github-actions Bot added this to the 11.0.0 milestone May 23, 2026

iprithv force-pushed the optimize-fieldexists-count branch from 10b7760 to 0ac5a3b Compare May 28, 2026 23:43

github-actions Bot modified the milestones: 11.0.0, 10.5.0 May 28, 2026

iprithv force-pushed the optimize-fieldexists-count branch from 0ac5a3b to 2b4a435 Compare June 1, 2026 21:10

Optimize FieldExistsQuery.count() when all docs have the field

dc362c8

iprithv force-pushed the optimize-fieldexists-count branch from 2b4a435 to dc362c8 Compare June 2, 2026 07:42

jainankitk approved these changes Jun 3, 2026

msfroh reviewed Jun 3, 2026

iprithv added 2 commits June 4, 2026 22:15

review changes

b4482fd

Merge branch 'main' into optimize-fieldexists-count

f684cec

iprithv requested review from jainankitk and msfroh June 5, 2026 04:49

msfroh approved these changes Jun 16, 2026

msfroh added 2 commits June 16, 2026 12:20

Merge branch 'main' into optimize-fieldexists-count

9794d8a

Remove whitespace from CHANGES.txt

fec17de

I mistakenly left a whitespace when resolving conflicts.

msfroh merged commit 0be06f9 into apache:main Jun 16, 2026
13 checks passed

msfroh pushed a commit that referenced this pull request Jun 16, 2026

Optimize FieldExistsQuery.count() when all docs have the field (#16111)

6bbd645

gaobinlong pushed a commit to gaobinlong/lucene that referenced this pull request Jun 17, 2026

Optimize FieldExistsQuery.count() when all docs have the field (apache#16111)

f025d46

vijaykriishna pushed a commit to vijaykriishna/lucene that referenced this pull request Jun 20, 2026

Optimize FieldExistsQuery.count() when all docs have the field (apache#16111)

a7d3279

msfroh merged 5 commits into apache:main from iprithv:optimize-fieldexists-count
