
ADR 0009: Filename Partial-Match Bonus for Relevance Ranking

Status

Accepted

Context

BM25 scoring is based entirely on a file’s content: term frequency, inverse document frequency, and document length. A file’s name plays no part in the raw BM25 score, so for a query such as “func main”, files that mention the word “main” frequently in their bodies may rank above main.go or main_test.go, even though those files are almost certainly the most relevant results.
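For reference, the textbook Okapi BM25 score of a document D for query terms q_1 to q_n is shown below. This ADR does not state the project's k_1, b, or IDF variant, so treat it as the standard form rather than the exact implementation. Here f(q_i, D) is the frequency of q_i in the file body, |D| is the body length in terms, and avgdl is the average length across the corpus. No input depends on the file name.

$$
\operatorname{score}(D, Q) = \sum_{i=1}^{n} \operatorname{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}
$$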

This is a precision problem: developers intuitively expect files whose names match query tokens to appear near the top of results, especially when the name is an exact lexical match.

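The challenge is that file names follow several naming conventions, including snake_case (main_test.go), kebab-case, dot-separated names, and CamelCase.

A simple substring check on the raw file name would fail for CamelCase names, where word boundaries are marked only by capitalisation, and would produce false positives for short, common tokens such as go, which also occurs inside unrelated words.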
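The failure modes of the rejected approach can be reproduced in a few lines. naiveMatch is a hypothetical helper written only for this illustration; it performs a case-sensitive substring check on the unsplit file name, and the file names are made-up examples.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveMatch is a hypothetical raw substring check on the unsplit file name,
// the approach this ADR rejects; it exists only to show the failure modes.
func naiveMatch(term, fileName string) bool {
	return strings.Contains(fileName, term)
}

func main() {
	// False positives: the short term "go" occurs inside unrelated words.
	fmt.Println(naiveMatch("go", "algorithm.py")) // true
	fmt.Println(naiveMatch("go", "cargo.toml"))   // true
	// CamelCase: word boundaries are marked only by capitalisation, so a
	// case-sensitive check misses "Service" for the query term "service".
	fmt.Println(naiveMatch("service", "SearchService.go")) // false
}
```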

Decision

After BM25 scoring and normalisation, apply an additive filename match bonus to each result’s score before final sorting.

The bonus values are:

Match type                                                        Bonus
Query term equals the full file stem (e.g. main matches main.go)  +1.0
Query term exactly equals a filename token after splitting        +1.0
Query term is a substring of a filename token                     +0.5
No match                                                          0

Filename tokenisation is performed by domain.TokenizeFileName, which:

  1. Splits the filename on _, ., -, and / boundaries.
  2. Further splits each part by CamelCase word boundaries using Unicode upper/lower transitions.
  3. Lowercases all tokens before comparison.

The exact-token check is evaluated before the substring check so that a query term that fully equals a token always receives 1.0 rather than being incorrectly matched as a substring of itself with 0.5.
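A per-term sketch of this ordering is below. termBonus is a hypothetical helper, not the project's fileNameMatchBonus. It takes one query term, the file stem, and tokens already produced by a tokeniser like the one described above. This ADR does not say how per-term bonuses are combined for a multi-term query, so that step is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// termBonus is a hypothetical per-term version of the bonus table.
// stem is the file name without its extension; tokens are assumed to be
// the lowercased output of the filename tokeniser.
func termBonus(term, stem string, tokens []string) float64 {
	term = strings.ToLower(term)
	if term == "" {
		return 0 // "" is a substring of everything; never reward it
	}
	// Full-stem match, e.g. "main" against main.go.
	if term == strings.ToLower(stem) {
		return 1.0
	}
	// Exact-token pass runs to completion first: every string is a substring
	// of itself, so checking substrings first would under-score exact matches.
	for _, tok := range tokens {
		if tok == term {
			return 1.0
		}
	}
	// Substring fallback, e.g. "frame" inside "mainframe".
	for _, tok := range tokens {
		if strings.Contains(tok, term) {
			return 0.5
		}
	}
	return 0
}

func main() {
	tokens := []string{"mainframe", "main", "go"} // e.g. from mainframe_main.go
	fmt.Println(termBonus("main", "mainframe_main", tokens))  // 1
	fmt.Println(termBonus("frame", "mainframe_main", tokens)) // 0.5
	fmt.Println(termBonus("func", "mainframe_main", tokens))  // 0
}
```

The two separate loops are the point of the sketch: a single loop testing equality and then substring on each token would stop at mainframe and award main only +0.5.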

The bonus is applied in search_command_service.buildSearchResult:

```go
score: score + fileNameMatchBonus(terms, fileName),
```

The final score is not re-normalised after adding the bonus, so the bonus can push a result above 1.0. This is intentional: a file whose name is a strong match should rank above files whose high BM25 score derives solely from repeated term occurrences in content.
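The effect on ordering can be shown with a small example. The result type and rerank function are hypothetical stand-ins for the search service, the scores are illustrative rather than measured, and the bonus is passed in directly instead of being computed from the file name.

```go
package main

import (
	"fmt"
	"sort"
)

// result is a hypothetical, trimmed-down search result.
type result struct {
	path  string
	score float64 // normalised BM25 score, before any bonus
}

// rerank adds each file's name bonus to its score without re-normalising,
// then sorts by descending score.
func rerank(results []result, bonus map[string]float64) []result {
	out := make([]result, len(results))
	copy(out, results)
	for i := range out {
		out[i].score += bonus[out[i].path] // a missing key yields 0: no match
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].score > out[j].score })
	return out
}

func main() {
	// Illustrative numbers for the query "func main", not measured data.
	results := []result{
		{"internal/app/server.go", 0.9}, // body mentions "main" many times
		{"cmd/tool/main.go", 0.4},       // stem equals the term "main"
	}
	ranked := rerank(results, map[string]float64{"cmd/tool/main.go": 1.0})
	for _, r := range ranked {
		fmt.Printf("%.2f %s\n", r.score, r.path)
	}
	// Prints:
	// 1.40 cmd/tool/main.go
	// 0.90 internal/app/server.go
}
```

If normalised scores lie in [0, 1], as the remark about exceeding 1.0 suggests, a +1.0 bonus places a full name match at or above every result without one, while a +0.5 substring bonus only reorders results whose BM25 gap is smaller than 0.5.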

Decision Drivers

Consequences

Positive

Negative

Operational Notes