96.20% token-native R@5.
99.00% fusion R@5.

ContextFit is scored against the same public LongMemEval-S retrieval surface used by gbrain-evals: 500 questions, top-5 session retrieval, and a hit whenever any ground-truth answer_session_ids entry appears in the retrieved sessions.
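
The hit rule above can be sketched in a few lines. This is a minimal illustration, not the benchmark harness itself: the function names `is_hit_at_k` and `recall_at_k` and the toy session IDs are assumptions made for the example. Only `answer_session_ids` and the top-5 cutoff come from the description.

```python
from typing import Iterable, List, Sequence, Tuple


def is_hit_at_k(retrieved_session_ids: Sequence[str],
                answer_session_ids: Iterable[str],
                k: int = 5) -> bool:
    """True if any ground-truth session appears in the top-k retrieved sessions."""
    top_k = set(retrieved_session_ids[:k])
    return any(session_id in top_k for session_id in answer_session_ids)


def recall_at_k(results: List[Tuple[Sequence[str], Iterable[str]]],
                k: int = 5) -> float:
    """Fraction of questions with at least one ground-truth session in the top k."""
    if not results:
        return 0.0
    hits = sum(is_hit_at_k(retrieved, answers, k) for retrieved, answers in results)
    return hits / len(results)


# Toy data (hypothetical session IDs, not taken from LongMemEval-S).
toy_results = [
    (["s3", "s9", "s1", "s4", "s7", "s2"], ["s1"]),        # hit: s1 is ranked 3rd
    (["s8", "s5", "s6", "s4", "s7", "s2"], ["s2", "s0"]),  # miss: s2 is ranked 6th
]
print(f"R@5 = {recall_at_k(toy_results, k=5):.2%}")
```

Because a single matching session is enough, a question with several ground-truth sessions still counts as one hit, which is why this metric measures retrieval recall rather than answer quality.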

96.20% Token-native R@5
99.00% Optional fusion R@5
97.60% gbrain-hybrid published R@5
96.60% MemPalace raw published R@5
500 LongMemEval-S questions
This is retrieval recall, not answer accuracy. There is no answer-generation model and no LLM judge in this metric.

Result

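| System | R@1 | R@3 | R@5 | R@10 | Hits@5 | Embeddings | Vector store |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gbrain-hybrid | - | - | 97.60% | - | 488/500 | yes | local |
| MemPalace raw | - | - | 96.60% | - | 483/500 | yes | local |
| ContextFit token-native | 81.80% | 90.40% | 96.20% | 97.80% | 481/500 | no | no |
| ContextFit + OpenAI fusion | 84.60% | 95.20% | 99.00% | 99.60% | 495/500 | yes | no |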
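
As a quick sanity check, the R@5 percentages follow directly from the Hits@5 counts over 500 questions. The sketch below only redoes that arithmetic with the numbers from the table; the variable and function names are chosen for illustration.

```python
TOTAL_QUESTIONS = 500

# Hits@5 counts copied from the table above.
hits_at_5 = {
    "gbrain-hybrid": 488,
    "MemPalace raw": 483,
    "ContextFit token-native": 481,
    "ContextFit + OpenAI fusion": 495,
}


def recall_percent(hits: int, total: int = TOTAL_QUESTIONS) -> float:
    """Convert a hit count into a recall percentage, rounded to two decimals."""
    return round(hits / total * 100, 2)


for system, hits in hits_at_5.items():
    print(f"{system}: {hits}/{TOTAL_QUESTIONS} = {recall_percent(hits):.2f}%")
```

At this sample size each question is worth 0.2 percentage points, so the 1.4-point gap between token-native ContextFit and gbrain-hybrid corresponds to 7 questions.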

Per-Type R@5

| Question type | Token-native | Fusion |
| --- | --- | --- |
| knowledge-update | 98.7% | 100.0% |
| multi-session | 95.5% | 100.0% |
| single-session-assistant | 100.0% | 100.0% |
| single-session-preference | 90.0% | 86.7% |
| single-session-user | 95.7% | 100.0% |
| temporal-reasoning | 95.5% | 99.2% |
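
A per-type breakdown like the one above could be computed by grouping rows by question type before counting hits. The sketch below assumes each row carries a `question_type` key next to `best_rank`; that key name is hypothetical and may not match the artifact schema, and the rows shown are toy data, not benchmark results.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def per_type_recall(rows: List[dict], k: int = 5,
                    type_key: str = "question_type") -> Dict[str, float]:
    """Return R@k as a percentage for each question type.

    `type_key` is an assumed field name; adjust it to the real schema.
    """
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for row in rows:
        qtype = row[type_key]
        totals[qtype] += 1
        rank: Optional[int] = row["best_rank"]
        # A missing rank (None) means no ground-truth session was retrieved.
        if rank is not None and rank <= k:
            hits[qtype] += 1
    return {qtype: hits[qtype] / totals[qtype] * 100 for qtype in totals}


# Toy rows for illustration only.
toy_rows = [
    {"question_type": "multi-session", "best_rank": 1},
    {"question_type": "multi-session", "best_rank": 7},
    {"question_type": "knowledge-update", "best_rank": 2},
    {"question_type": "knowledge-update", "best_rank": None},
]

for qtype, pct in sorted(per_type_recall(toy_rows).items()):
    print(f"{qtype}: {pct:.1f}%")
```

Small categories move in large steps under this calculation, so a one-question change in a small type shifts its percentage far more than the same change in a large type.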

Artifacts

Token-native artifact: benchmarks/longmemeval_token_native_certificate_promotion_v5_typed_rescue_tight_20260524.json, SHA-256 c0e7ebc5d925549e1e3058b6100ab0786654c4d8c1bd4a99fe920c57f3ff2ea6.

Fusion artifact: benchmarks/longmemeval_fusion_selective_chunk_promotion_v5_typed_rescue_20260524.json, SHA-256 ababf7387cb18c9310e82c35a57594aeef3cd40a2b60d9a1f336f69b65d3dcd2.
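
Before rescoring, it may be worth confirming that the local artifact files match the published digests. The following is one possible check using Python's standard `hashlib`; the helper names `sha256_of_file` and `verify_artifacts` are illustrative, and the paths assume the script runs from the repository root.

```python
import hashlib
from pathlib import Path
from typing import Dict

# Paths and digests copied from the artifact list above.
EXPECTED_SHA256 = {
    "benchmarks/longmemeval_token_native_certificate_promotion_v5_typed_rescue_tight_20260524.json":
        "c0e7ebc5d925549e1e3058b6100ab0786654c4d8c1bd4a99fe920c57f3ff2ea6",
    "benchmarks/longmemeval_fusion_selective_chunk_promotion_v5_typed_rescue_20260524.json":
        "ababf7387cb18c9310e82c35a57594aeef3cd40a2b60d9a1f336f69b65d3dcd2",
}


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts are not loaded into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(expected: Dict[str, str] = EXPECTED_SHA256) -> Dict[str, bool]:
    """Map each path to True when the file exists and its digest matches."""
    return {
        path: Path(path).is_file() and sha256_of_file(path) == want
        for path, want in expected.items()
    }


if __name__ == "__main__":
    for path, ok in verify_artifacts().items():
        print("OK  " if ok else "FAIL", path)
```

A missing file is reported as a failure rather than raising, so the check can run in a fresh checkout without a traceback.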

Scoring Command

python3 - <<'PY'
import json
from pathlib import Path

files = [
    ("ContextFit token-native", "benchmarks/longmemeval_token_native_certificate_promotion_v5_typed_rescue_tight_20260524.json"),
    ("ContextFit fusion", "benchmarks/longmemeval_fusion_selective_chunk_promotion_v5_typed_rescue_20260524.json"),
]

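for name, path in files:
    # Each artifact stores one entry per question under the "rows" key.
    rows = json.loads(Path(path).read_text())["rows"]
    print(name)
    for k in (1, 3, 5, 10):
        # A row counts as a hit at k when best_rank is set and is at most k.
        hits = sum(r["best_rank"] is not None and r["best_rank"] <= k for r in rows)
        print(f"R@{k}: {hits}/{len(rows)} = {hits / len(rows) * 100:.2f}%")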
PY