Accepted
BM25 scoring is based entirely on term frequency and inverse document
frequency within a file’s content. A file’s name is not a factor in the
raw BM25 score, so a query like func main may rank files that mention the
word “main” frequently in their bodies above main.go or main_test.go,
even though those files are almost certainly the most relevant results.
This is a precision problem: developers intuitively expect files whose names match query tokens to appear near the top of results, especially when the name is an exact lexical match.
The challenge is that file names may use several naming conventions:
- bm25_index_service.go → tokens: bm25, index, service, go
- SearchScoringService → tokens: search, scoring, service
- main.go → tokens: main, go

A simple substring check on the raw file name would fail for CamelCase and
produce false positives for short common tokens like go.
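The three conventions above can be handled by a tokeniser along the following lines. This is a minimal sketch, not the project's implementation: `tokenizeFileName` is a hypothetical stand-in for domain.TokenizeFileName, and the rule of splitting only on lower-to-upper (or digit-to-upper) transitions is an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenizeFileName is a hypothetical stand-in for domain.TokenizeFileName.
// It splits on _, ., - and / and on lower-to-upper (or digit-to-upper)
// CamelCase transitions, then lowercases every token.
func tokenizeFileName(name string) []string {
	var tokens []string
	var cur []rune

	// flush emits the token collected so far, if any.
	flush := func() {
		if len(cur) > 0 {
			tokens = append(tokens, strings.ToLower(string(cur)))
			cur = cur[:0]
		}
	}

	var prev rune
	for _, r := range name {
		switch {
		case r == '_' || r == '.' || r == '-' || r == '/':
			// Separator: close the current token and drop the separator.
			flush()
		case unicode.IsUpper(r) && (unicode.IsLower(prev) || unicode.IsDigit(prev)):
			// CamelCase boundary: close the current token, start a new one.
			flush()
			cur = append(cur, r)
		default:
			cur = append(cur, r)
		}
		prev = r
	}
	flush()
	return tokens
}

func main() {
	fmt.Println(tokenizeFileName("bm25_index_service.go")) // [bm25 index service go]
	fmt.Println(tokenizeFileName("SearchScoringService"))  // [search scoring service]
	fmt.Println(tokenizeFileName("main.go"))               // [main go]
}
```

Under this assumed rule, runs of capitals such as an acronym prefix stay in one token; a real tokeniser may need to treat them differently.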
After BM25 scoring and normalisation, apply an additive filename match bonus to each result’s score before final sorting.
The bonus values are:
| Match type | Bonus |
|---|---|
| Query term equals the full file stem (e.g. main matches main.go) | +1.0 |
| Query term equals exactly one filename token after splitting | +1.0 |
| Query term is a substring of one filename token | +0.5 |
| No match | 0 |
Filename tokenisation is performed by domain.TokenizeFileName, which:

- splits on _, ., -, and / boundaries.

The exact-token check is evaluated before the substring check so that a query
term that fully equals a token always receives 1.0 rather than being
incorrectly matched as a substring of itself with 0.5.
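The bonus table and the evaluation order can be sketched as follows. The names `termBonus` and `bestBonus` are hypothetical, the filename tokens are passed in already split, and taking the largest bonus across query terms is an assumption; the decision above does not say how bonuses from multiple terms combine.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	exactBonus     = 1.0 // full stem or exact token match
	substringBonus = 0.5 // term is a substring of one token
)

// termBonus scores a single query term against a file stem and its
// already-tokenised (lowercase) filename tokens. Hypothetical helper.
func termBonus(term, stem string, tokens []string) float64 {
	term = strings.ToLower(term)
	if term == "" {
		// An empty term is a substring of everything; never reward it.
		return 0
	}
	if term == strings.ToLower(stem) {
		return exactBonus
	}
	// Exact-token check runs first so that a full match is never
	// downgraded to the substring bonus.
	for _, tok := range tokens {
		if term == tok {
			return exactBonus
		}
	}
	for _, tok := range tokens {
		if strings.Contains(tok, term) {
			return substringBonus
		}
	}
	return 0
}

// bestBonus returns the largest bonus over all query terms.
// Assumption: the source does not specify how terms are combined.
func bestBonus(terms []string, stem string, tokens []string) float64 {
	best := 0.0
	for _, t := range terms {
		if b := termBonus(t, stem, tokens); b > best {
			best = b
		}
	}
	return best
}

func main() {
	mainTokens := []string{"main", "go"}
	svcTokens := []string{"search", "scoring", "service", "go"}

	fmt.Printf("%.1f\n", bestBonus([]string{"func", "main"}, "main", mainTokens))
	fmt.Printf("%.1f\n", bestBonus([]string{"scor"}, "SearchScoringService", svcTokens))
	fmt.Printf("%.1f\n", bestBonus([]string{"parser"}, "main", mainTokens))
}
```

Because the exact comparisons return early, the substring loop is only reached by terms that match no token in full.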
The bonus is applied in search_command_service.buildSearchResult:
score: score + fileNameMatchBonus(terms, fileName),
The final score is not re-normalised after adding the bonus, so the bonus can
push a result above 1.0. This is intentional: a file whose name is a
strong match should rank above files whose high BM25 score derives solely
from repeated term occurrences in content.
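A small worked example of the effect on ordering, as a sketch only. The `result` type, the `rankWithBonus` helper, and all scores are illustrative assumptions rather than values from the real index; the bonuses are supplied precomputed.

```go
package main

import (
	"fmt"
	"sort"
)

// result is a hypothetical, simplified search hit.
type result struct {
	FileName string
	Score    float64 // normalised BM25 score in [0, 1] before the bonus
}

// rankWithBonus adds a precomputed filename bonus to each score and
// sorts descending. Scores are deliberately not re-normalised.
func rankWithBonus(results []result, bonus map[string]float64) []result {
	out := make([]result, len(results))
	for i, r := range results {
		// A missing map entry yields 0, i.e. no bonus.
		out[i] = result{FileName: r.FileName, Score: r.Score + bonus[r.FileName]}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	// Illustrative scores for the query "func main".
	hits := []result{
		{"handler.go", 1.00}, // mentions "main" many times in its body
		{"main.go", 0.35},
		{"main_test.go", 0.20},
	}
	bonus := map[string]float64{"main.go": 1.0, "main_test.go": 1.0}

	for _, r := range rankWithBonus(hits, bonus) {
		fmt.Printf("%-14s %.2f\n", r.FileName, r.Score)
	}
}
```

With these numbers both main.go and main_test.go end above 1.0 and overtake handler.go, which is the behaviour the decision intends.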
- main.go and main_test.go now rank near the top for a query of func main.
- Files such as SearchScoringService.go receive a bonus when querying scoring or search.
- The bonus logic is contained in one function (fileNameMatchBonus) with clear test coverage.
- The bonus scale (1.0 / 0.5) was chosen heuristically; different corpora may need tuning.
- domain.TokenizeFileName is shared with the indexing pipeline (ADR 0010) to keep tokenisation consistent between retrieval and ranking.
- Tests: internal/core/services/search/search_output_scoring_internal_test.go.