Accepted
The BM25 search engine previously required all query terms to appear in a
document before scoring it. This implicit AND behaviour was good for precision
but limited recall: searching for err := root.Execute() with five tokens
would only return files containing every single token, missing files that were
clearly relevant but used slightly different phrasing.
Users with exploratory queries — e.g. looking for files related to two loosely-coupled concepts — had no way to broaden the result set short of running multiple independent searches and merging results manually.
Even after introducing explicit AND and OR, a second precision-oriented
workflow remained awkward: very specific AND queries with four or more terms
would often return zero results because the last one or two tokens were too
literal, even when the repository contained files matching the earlier, more
structural portion of the query. Users wanted a way to keep AND-style ranking
semantics while progressively relaxing the least important trailing tokens.
Two additional problems were identified with the pre-existing AND-only implementation:
Score ties broke alphabetically — BM25 scores are normalised per
directory to [0, 1], so the highest-scoring file in each directory always
receives 1.0. When several files tied at 1.0, the tiebreaker was
lexicographic path order. This meant cmd/idx/main.go could beat
internal/adapters/handlers/cli/command_runner.go even when the latter
contained the exact phrase being searched.
Partial-match penalty for OR was missing — a naive OR that simply unions the document sets would let a file matching only one of five terms (but with a very high TF for that term) outscore a file matching all five terms.
--operator flagAdd a --operator flag to idx search with two accepted values:
AND (default) — document must contain all query terms.OR — document must contain at least one query term.The flag is validated at the CLI layer; unsupported values return a descriptive
error. The value is propagated through ports.SearchOptions.Operator to the
core service and scoring functions, keeping the CLI adapter decoupled from
ranking logic.
To prevent high-TF single-term documents from outranking full-match documents in OR mode, each document’s BM25 score is multiplied by its term coverage fraction:
finalScore = bm25Score × (matchedTermCount / totalQueryTermCount)
A document matching all N terms receives a multiplier of 1.0 (no penalty).
A document matching only k < N terms receives k/N, ensuring full-match
documents always rank above partial-match documents with equal or lower raw
BM25. This multiplier is only applied in OR mode; AND mode is unchanged
because every matched document already satisfies full coverage by definition.
After BM25 + coverage, documents within the same directory still normalise to
1.0. A second tiebreaker was added: term concentration — the maximum
number of distinct query terms that co-occur on a single matched line.
Sort key priority (descending):
This ensures that a file containing the exact phrase err := root.Execute()
on one line ranks above files where the same tokens appear on separate,
unrelated lines.
Add an optional --relaxation >N flag for idx search that is only valid
with --operator AND.
When enabled for queries with more than three unique terms, search evaluates a sequence of decreasing AND prefixes by removing terms from right to left:
t1 t2 t3 t4 t5t1 t2 t3 t4t1 t2 t3Fallback stops once the candidate prefix would have N or fewer terms.
For example, --relaxation '>2' allows prefixes of length 3+ only.
Results from all evaluated prefixes are merged per document, keeping the best variant for that document. Ranking prioritises:
This preserves a precision-first search experience while still surfacing relevant near-miss results when the strict full query would otherwise return nothing.
The existing proximity bonus assumed all matched documents contained every
query term. With OR, a document may be matched through only one term, so
index.Terms[term].Docs[filePath] could be nil for the other terms.
A nil guard was added to minimumDistanceForTermPair to skip pairs where
either term is absent from the document.
N terms or fewer.