96.20% token-native R@5.
99.00% fusion R@5.
ContextFit is scored on the same public LongMemEval-S retrieval setup that gbrain-evals uses: 500 questions, top-5 session retrieval, and a hit whenever any session ID in a question's ground-truth `answer_session_ids` appears among the retrieved sessions.
| Headline figure | Value |
|---|---|
| ContextFit token-native R@5 | 96.20% |
| ContextFit optional fusion R@5 | 99.00% |
| gbrain-hybrid published R@5 | 97.60% |
| MemPalace raw published R@5 | 96.60% |
| LongMemEval-S questions | 500 |
This is retrieval recall, not answer accuracy. There is no answer-generation model and no LLM judge in this metric.
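The hit rule can be sketched in a few lines of Python. This is a minimal illustration, assuming each question comes with a ranked list of retrieved session IDs and its ground-truth `answer_session_ids`. The function names `best_rank` and `recall_at_k` are hypothetical and not taken from the ContextFit or gbrain-evals code. A question is a hit at k exactly when its best-ranked ground-truth session sits at rank k or better, which is the same test the scoring command applies to each row's `best_rank` field.

```python
from typing import Optional, Sequence


def best_rank(retrieved: Sequence[str], answer_session_ids: Sequence[str]) -> Optional[int]:
    """Return the 1-based rank of the first retrieved session that is a
    ground-truth answer session, or None if none was retrieved."""
    gold = set(answer_session_ids)
    for rank, session_id in enumerate(retrieved, start=1):
        if session_id in gold:
            return rank
    return None


def recall_at_k(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of questions whose best rank is within the top k (a hit at k)."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


if __name__ == "__main__":
    # Toy data (hypothetical): one question answered at rank 2, one missed.
    ranks = [
        best_rank(["s7", "s3", "s9"], ["s3"]),        # s3 is at rank 2
        best_rank(["s1", "s2", "s4"], ["s5", "s6"]),  # no gold session retrieved
    ]
    print(ranks)                  # [2, None]
    print(recall_at_k(ranks, 1))  # 0.0
    print(recall_at_k(ranks, 5))  # 0.5
```

Note that only the best-ranked gold session matters: retrieving two of a question's answer sessions still counts as a single hit.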
Result
| System | R@1 | R@3 | R@5 | R@10 | Hits@5 | Embeddings | Vector store |
|---|---|---|---|---|---|---|---|
| gbrain-hybrid | - | - | 97.60% | - | 488/500 | yes | local |
| MemPalace raw | - | - | 96.60% | - | 483/500 | yes | local |
| ContextFit token-native | 81.80% | 90.40% | 96.20% | 97.80% | 481/500 | no | no |
| ContextFit + OpenAI fusion | 84.60% | 95.20% | 99.00% | 99.60% | 495/500 | yes | no |
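R@5 is Hits@5 divided by the 500 questions. As a worked check of that arithmetic, the snippet below recomputes the R@5 column from the Hits@5 column; the counts are copied from the table, and the helper name `r_at_5` is illustrative only.

```python
# Hits@5 counts copied from the results table; every system uses the same 500 questions.
QUESTIONS = 500
HITS_AT_5 = {
    "gbrain-hybrid": 488,
    "MemPalace raw": 483,
    "ContextFit token-native": 481,
    "ContextFit + OpenAI fusion": 495,
}


def r_at_5(hits: int, total: int = QUESTIONS) -> float:
    """Recall@5 as a percentage: share of questions with a ground-truth session in the top 5."""
    return hits / total * 100


for system, hits in HITS_AT_5.items():
    print(f"{system}: {hits}/{QUESTIONS} = {r_at_5(hits):.2f}%")
```

At 500 questions each hit is worth 0.2 percentage points, so the 1.4-point R@5 gap between ContextFit fusion and gbrain-hybrid is 7 questions (495 vs 488), and token-native trails gbrain-hybrid by the same 7 questions (481 vs 488).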
Per-Type R@5
| Question type | Token-native | Fusion |
|---|---|---|
| knowledge-update | 98.7% | 100.0% |
| multi-session | 95.5% | 100.0% |
| single-session-assistant | 100.0% | 100.0% |
| single-session-preference | 90.0% | 86.7% |
| single-session-user | 95.7% | 100.0% |
| temporal-reasoning | 95.5% | 99.2% |
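A per-type breakdown like this one can be produced by grouping rows before applying the R@k test. The sketch below is an assumption-laden illustration: the scoring command only confirms that artifact rows carry `best_rank`, so the `question_type` key used here is a hypothetical field name that may differ in the real artifacts, and `per_type_recall` is not a function from the project.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Iterable, Mapping, Optional


def per_type_recall(rows: Iterable[Mapping], k: int = 5) -> dict[str, float]:
    """Group rows by question type and return R@k (as a percentage) per type.

    ASSUMPTION: each row has a "question_type" key (hypothetical name);
    only "best_rank" is confirmed by the scoring command.
    """
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        qtype = row["question_type"]  # hypothetical field name
        rank: Optional[int] = row["best_rank"]
        totals[qtype] += 1
        if rank is not None and rank <= k:
            hits[qtype] += 1
    # Types with no hits still appear, because hits is a defaultdict(int).
    return {t: hits[t] / totals[t] * 100 for t in sorted(totals)}


if __name__ == "__main__":
    path = Path("benchmarks/longmemeval_token_native_certificate_promotion_v5_typed_rescue_tight_20260524.json")
    if path.is_file():
        rows = json.loads(path.read_text())["rows"]
        for qtype, pct in per_type_recall(rows).items():
            print(f"{qtype}: {pct:.1f}%")
```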
Artifacts
Token-native artifact: `benchmarks/longmemeval_token_native_certificate_promotion_v5_typed_rescue_tight_20260524.json`, SHA-256 `c0e7ebc5d925549e1e3058b6100ab0786654c4d8c1bd4a99fe920c57f3ff2ea6`.
Fusion artifact: `benchmarks/longmemeval_fusion_selective_chunk_promotion_v5_typed_rescue_20260524.json`, SHA-256 `ababf7387cb18c9310e82c35a57594aeef3cd40a2b60d9a1f336f69b65d3dcd2`.
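Before scoring, the artifacts can be checked against the published digests. The following is a minimal sketch using only the standard library `hashlib`, assuming it runs from the directory that contains `benchmarks/`; the helper names `sha256_of` and `check_artifacts` are hypothetical. Missing files are reported rather than raising, so the check is safe to run anywhere.

```python
import hashlib
from pathlib import Path

# Paths and digests copied from the artifact list.
EXPECTED = {
    "benchmarks/longmemeval_token_native_certificate_promotion_v5_typed_rescue_tight_20260524.json":
        "c0e7ebc5d925549e1e3058b6100ab0786654c4d8c1bd4a99fe920c57f3ff2ea6",
    "benchmarks/longmemeval_fusion_selective_chunk_promotion_v5_typed_rescue_20260524.json":
        "ababf7387cb18c9310e82c35a57594aeef3cd40a2b60d9a1f336f69b65d3dcd2",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large artifacts are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_artifacts(expected: dict[str, str]) -> dict[str, str]:
    """Map each path to "ok", "MISMATCH", or "missing"."""
    results = {}
    for rel_path, want in expected.items():
        path = Path(rel_path)
        if not path.is_file():
            results[rel_path] = "missing"
        elif sha256_of(path) == want:
            results[rel_path] = "ok"
        else:
            results[rel_path] = "MISMATCH"
    return results


if __name__ == "__main__":
    for rel_path, status in check_artifacts(EXPECTED).items():
        print(f"{status:8} {rel_path}")
```

A `MISMATCH` means the scores computed from that file may not correspond to the published figures.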
Scoring Command
```bash
python3 - <<'PY'
import json
from pathlib import Path

files = [
    ("ContextFit token-native", "benchmarks/longmemeval_token_native_certificate_promotion_v5_typed_rescue_tight_20260524.json"),
    ("ContextFit fusion", "benchmarks/longmemeval_fusion_selective_chunk_promotion_v5_typed_rescue_20260524.json"),
]

for name, path in files:
    rows = json.loads(Path(path).read_text())["rows"]
    print(name)
    for k in (1, 3, 5, 10):
        # best_rank: 1-based rank of the best-ranked ground-truth session, or None on a miss.
        hits = sum(r["best_rank"] is not None and r["best_rank"] <= k for r in rows)
        print(f"R@{k}: {hits}/{len(rows)} = {hits / len(rows) * 100:.2f}%")
PY
```