idx

ADR 0003: Separate Metadata Filters from BM25 Content

Status

Accepted

Context

The index stores document path metadata. Users want to search by path, but adding metadata tokens to the same BM25 corpus as file content would change document length, term frequency, and inverse document frequency (IDF), and with them the relevance scores of content queries.
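A minimal Python sketch makes the effect concrete by comparing corpus statistics with and without path tokens. The `tokenize` helper, the sample files, and the IDF variant (the non-negative form ln(1 + (N - df + 0.5) / (df + 0.5)) used by Lucene's BM25) are assumptions for illustration, not this project's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, split on runs of non-alphanumerics.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def corpus_stats(docs: dict[str, list[str]]) -> tuple[dict[str, int], dict[str, float]]:
    """Return per-document lengths and per-term IDF for a tokenized corpus."""
    n = len(docs)
    lengths = {doc_id: len(tokens) for doc_id, tokens in docs.items()}
    df: Counter = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per document
    # Assumed IDF variant (non-negative, Lucene-style): ln(1 + (N - df + 0.5) / (df + 0.5)).
    idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}
    return lengths, idf


files = {
    "src/parser.py": "def parse(tokens): return tokens",
    "docs/guide.md": "The parser guide explains tokens",
}

# Corpus as ADR 0001 defines it: file contents only.
content_only = {p: tokenize(c) for p, c in files.items()}
# Rejected alternative: path tokens mixed into the same BM25 corpus.
mixed = {p: tokenize(p) + tokenize(c) for p, c in files.items()}

len_content, idf_content = corpus_stats(content_only)
len_mixed, idf_mixed = corpus_stats(mixed)

print(len_content["src/parser.py"], len_mixed["src/parser.py"])  # 5 8
print(round(idf_content["parser"], 3), round(idf_mixed["parser"], 3))  # 0.693 0.182
```

Mixing in the path `src/parser.py` lengthens that document from 5 to 8 tokens, gives it a term frequency of 1 for `parser`, and lowers the IDF of `parser` from about 0.693 to about 0.182, so content queries for `parser` rank differently even though no file content changed.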

The project already relies on BM25 for ranked content search, and ADR 0001 defines the corpus as file contents. Path metadata serves a different purpose: it constrains the result set, but it should not influence relevance statistics.

Decision

Path metadata is indexed in a separate metadata term map.

The BM25 corpus remains based on file content only.
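The two structures can be sketched in Python as below, assuming documents are keyed by path. The `Index` class, its field names, and the `tokenize` helper are hypothetical; only the separation between content statistics and the metadata term map comes from this ADR.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, split on runs of non-alphanumerics.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class Index:
    """Sketch: BM25 inputs from content only, plus a separate path term map."""

    def __init__(self) -> None:
        # BM25 inputs, derived from file content only.
        self.term_freqs: dict[str, Counter] = {}
        self.doc_lengths: dict[str, int] = {}
        self.doc_freqs: Counter = Counter()
        # Metadata term map: path token -> paths of documents containing it.
        # It is never consulted when computing length, TF, or IDF.
        self.path_terms: defaultdict = defaultdict(set)

    def add(self, path: str, content: str) -> None:
        tokens = tokenize(content)
        tf = Counter(tokens)
        self.term_freqs[path] = tf
        self.doc_lengths[path] = len(tokens)
        self.doc_freqs.update(tf.keys())  # +1 per distinct term in this document
        for term in set(tokenize(path)):
            self.path_terms[term].add(path)


index = Index()
index.add("src/parser.py", "def parse(tokens): return tokens")
index.add("docs/guide.md", "The parser guide explains tokens")
```

After these two calls, `index.path_terms["parser"]` holds only `src/parser.py`, while `index.doc_freqs["parser"]` is 1 because only the content of `docs/guide.md` contains that word; the path token does not leak into the BM25 statistics.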

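Search accepts two independent inputs: a content query, which is scored with BM25 against the content-only corpus, and metadata filters, which are matched against the metadata term map.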

Metadata filters reduce the candidate document set but do not change BM25 score, document length, term frequency, or IDF.
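The filtering rule can be illustrated with the Python sketch below. The `search` and `bm25_scores` functions, the parameters `k1 = 1.2` and `b = 0.75`, and the sample corpus are hypothetical. Corpus statistics (N, average document length, document frequency) are always computed over the full content corpus, so a filter can only remove documents from the result and never changes a surviving document's score.

```python
import math
import re
from collections import Counter

K1, B = 1.2, 0.75  # common BM25 defaults; assumed here, not set by this ADR


def bm25_scores(query, corpus, candidates):
    """Score `candidates`; N, avgdl and df always come from the full `corpus`."""
    n = len(corpus)
    avgdl = sum(len(toks) for toks in corpus.values()) / n
    df: Counter = Counter()
    for toks in corpus.values():
        df.update(set(toks))
    scores = {}
    for doc_id in candidates:
        tf = Counter(corpus[doc_id])
        dl = len(corpus[doc_id])
        score = 0.0
        for term in query:
            if tf[term] == 0:  # Counter returns 0 for missing terms
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (K1 + 1) / (tf[term] + K1 * (1 - B + B * dl / avgdl))
        scores[doc_id] = score
    return scores


def search(query, path_filters, corpus, path_terms):
    """Metadata filters narrow the candidates; they never touch BM25 statistics."""
    candidates = set(corpus)
    for term in path_filters:
        candidates &= path_terms.get(term, set())  # AND across filter terms
    scores = bm25_scores(query, corpus, candidates)
    hits = [(doc_id, s) for doc_id, s in scores.items() if s > 0]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Content-only corpus, already tokenized.
corpus = {
    "src/parser.py": ["def", "parse", "tokens", "return", "tokens"],
    "docs/parser.md": ["the", "parser", "reads", "tokens"],
    "src/lexer.py": ["def", "lex", "text", "yield", "tokens"],
}
# Separate metadata term map built from the paths.
path_terms: dict[str, set[str]] = {}
for path in corpus:
    for term in set(re.split(r"[^a-z0-9]+", path.lower())) - {""}:
        path_terms.setdefault(term, set()).add(path)

unfiltered = dict(search(["tokens"], [], corpus, path_terms))
filtered = dict(search(["tokens"], ["src"], corpus, path_terms))
```

Here the `src` filter drops `docs/parser.md`, and `filtered["src/parser.py"]` equals `unfiltered["src/parser.py"]`. For brevity the sketch recomputes document frequencies on every query; a real index would compute them once at indexing time.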

Decision Drivers

Consequences

Positive

Negative

Operational Notes