Skip to content

feat: context compaction strategies for the react loop#996

Open
yelkurdi wants to merge 32 commits into
generative-computing:mainfrom
yelkurdi:context_compaction_for_react
Open

feat: context compaction strategies for the react loop#996
yelkurdi wants to merge 32 commits into
generative-computing:mainfrom
yelkurdi:context_compaction_for_react

Conversation

@yelkurdi

@yelkurdi yelkurdi commented May 1, 2026

Copy link
Copy Markdown
Contributor

Component PR

Use this template when adding or modifying components in mellea/stdlib/components/.

Description

Implementation Checklist

Protocol Compliance

  • parts() returns list of constituent parts (Components or CBlocks)
  • format_for_llm() returns TemplateRepresentation or string
  • _parse(computed: ModelOutputThunk) parses model output correctly into the specified Component return type

Content Blocks

  • CBlock used appropriately for text content
  • ImageBlock used for image content (if applicable)

Integration

  • Component exported in mellea/stdlib/components/__init__.py or, if you are adding a library of components, from your sub-module

Testing

  • Tests added to tests/components/
  • New code has 100% coverage
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

Summary

Adds a generic, sync Compactor protocol to mellea.stdlib.context for shrinking a running Context, with three implementations — WindowCompactor, ThresholdCompactor, LLMSummarizeCompactor — plus composable pin predicates that decide which leading components survive. ReACT gains an optional compactor= parameter; ChatContext gains compactor= / window_size= for per-append compaction.

Empirically on the BCP benchmark with Granite 4.1-8b, LLM-summary compaction cuts inference cost by 23.7% and raises accuracy by 3.5 pp — a dual win, not a cost/quality trade-off. Backwards compatible: the defaults preserve existing behavior exactly.

Motivation

Long agentic loops pile up tool responses. Each react iteration re-sends the full history, so prompt-token cost grows quadratically and the loop can exhaust the context window before finishing. Compaction trims that history, cutting both dollar cost and the chance of hitting the context / timeout wall.

On BrowseCompPlus (BCP) with Granite 4.1-8b (131K context, 830 questions, loop_budget=400, per_question_timeout_s=1800, mean of 3 runs/strategy):

Compaction Est. cost @ 80% cache hit Accuracy Mean wall-clock per Q
none $1801.0M 15.6% 724 s
clear_all $1499.2M (−16.8%) 18.5% (+2.9 pp) 980 s
llm_summarize $1373.9M (−23.7%) 19.1% (+3.5 pp) 709 s

llm_summarize is the clear winner — cost −23.7% and accuracy +3.5 pp (+23% relative) at similar wall-clock, since the smaller prompt makes each iteration faster. clear_all also helps (−16.8% cost, +2.9 pp) but its +35% wall-clock is less attractive when the budget allows the extra summarization call.

Benchmark labels are the external eval-harness strategy names: clear_allWindowCompactor(size=0, pin_predicate=pin_react_initiator), llm_summarizeLLMSummarizeCompactor(pin_predicate=pin_react_initiator), both gated via ThresholdCompactor.

Hardware / infrastructure
  • Model: ibm-granite/granite-4.1-8b (bf16, native 131K context)
  • Node: single 8× NVIDIA H100 80 GiB host (IBM Bluevela LSF cluster, exclusive GPU allocation)
  • Inference: vLLM 0.19.1, 7 instances × TP=1 (1 GPU each), --enable-prefix-caching on. GPU 0 reserved for the BCP local search service.
  • Agent: Mellea react via OpenAIBackend → local vLLM, concurrency=56, loop_budget=400, per_question_timeout_s=1800, --compaction_threshold 50000 (~38% of 131K), --compaction_keep_n 5.
  • Judge: openai/gpt-oss-120b on the same node after agent teardown.
  • All 830 BCP questions; 3 independently seeded runs per strategy (means reported).

Measured with a separate BCP eval harness running on Mellea's react framework; the feature itself has no external runtime dependency.

Design

class Compactor(Protocol):
    def compact(self, ctx: T, *, backend: Backend | None = None) -> T: ...

A Compactor returns a fresh, compacted copy of a context (never mutates the input). Sync and generic in T (a Context subtype), so a chat compactor narrows to ChatContext -> ChatContext. Backend-calling compactors hide the async work behind the sync method. Two usage patterns: per-append (ChatContext(compactor=...), runs on every add()) and per-turn / manual (react(compactor=...) or a direct compact(ctx, backend=...) between turns).

InlineCompactor markerChatContext.add() calls compact() without a backend on every append, so ChatContext(compactor=...) accepts only InlineCompactor instances (structural Compactor satisfaction isn't enough), steering backend-calling compactors to react(compactor=...), ThresholdCompactor, or manual compact().

Pin predicates — a PinPredicate returns the index after the protected prefix (components[:idx] preserved, components[idx:] is the body): pin_nothing, pin_system, pin_system_and_initial_user, and pin_react_initiator (keeps the react goal + tool registration).

Compactors:

  • WindowCompactor(size, pin_predicate=pin_system) (inline) — keep prefix + last size body components; size=0 drops the body ("clear all").
  • ThresholdCompactor(inner, threshold) (inline) — forwards to inner.compact only when measured token size exceeds threshold; the composable token-gate.
  • LLMSummarizeCompactor(default_backend, keep_n=5, pin_predicate=pin_nothing, ...) — summarize older body into one [CONTEXT SUMMARY] message, keep last keep_n verbatim. Not inline (calls the backend each time); use via react/ThresholdCompactor/manual.

Token size (_last_usage_tokens) is read from the most recent ModelOutputThunk.generation.usage total_tokens. Helper react_summary_prompt(goal, max_tokens_hint) builds a research-flavoured template for LLMSummarizeCompactor.

Integration — in react, compaction runs once per turn after the is_final check so terminal turns skip it (saves a wasted summary call). Custom compactors used with react must return a ChatContext.

Non-goals

No mellea/core/ changes. The single-module mellea/stdlib/context.py is refactored into a mellea/stdlib/context/ package (chat.py, simple.py, compactor.py); public imports from mellea.stdlib.context are unchanged.

Test plan

  • uv run pytest test/stdlib/test_compactor.py62 tests, ~12 s, no external services
  • uv run pytest test/stdlib/frameworks/test_react_framework.pycompactor= integration

Covers each compactor's output/prefix preservation, all pin predicates, threshold gating (below/at/above, <=0 disables, no-thunk-yet), window sizing incl. size=0 and the window_size= sugar, the InlineCompactor marker check, and LLMSummarizeCompactor rendering + best-effort error handling (backend failure → unchanged; programming errors propagate). Runnable examples in docs/examples/context/.

Pitfalls

  1. No generation.usage → token-gated compaction silently no-ops. All mainline backends populate it (AGENTS.md §5); a new backend without it disables ThresholdCompactor.
  2. The token count lags by one turn. The threshold is checked against the token usage reported by the last completed model call, so anything added to the context after that call — most notably the current turn's tool responses — isn't counted until the next model call runs. In practice this is negligible, unless a single tool response is large enough to blow past the threshold on its own.
  3. Sync→async bridge in LLMSummarizeCompactor (#1258). compact() is sync but must call an async backend, so it bridges through _run_coro_blocking (a worker-thread asyncio.run). Two known failure modes: (a) inside a running loop (e.g. react()) the worker blocks the calling thread, stalling the outer loop's other tasks — callbacks, telemetry flushers, cancellation — until the summary finishes; (b) a backend whose async client (e.g. httpx.AsyncClient) is bound to mellea's shared loop can't be used from the fresh worker loop, surfacing as RuntimeError: This event loop is already running or a hung request. Both are flagged in the docstrings but can't be fixed without a protocol change. The resolution tracked in Async Compactor protocol to replace sync↔async bridge in LLMSummarizeCompactor #1258 is an async acompact() on the Compactor protocol that react() awaits directly — no bridge, no loop blocking, no cross-loop client hazard.
  4. Absolute token threshold, not a % of max context — a fraction-of-context threshold is a possible follow-up once backends report the limit reliably.
  5. Compaction fires after the is_final check — easy to swap in a refactor; guarded by a comment, ideally a placement test later.

Files

 mellea/stdlib/context/compactor.py              613 +  Compactor protocol, pin predicates, 3 compactors
 mellea/stdlib/context/{__init__,chat,simple}.py 199 +  package split + compactor=/window_size= on ChatContext
 mellea/stdlib/context.py                          82 -  (replaced by the package above)
 mellea/stdlib/components/react.py                 96 +  pin_react_initiator, react_summary_prompt
 mellea/stdlib/frameworks/react.py                 22 ±  optional compactor= integration
 mellea/stdlib/{session,plugins/hooks/session}.py  35 ±  interaction_count tracked out-of-band
 test/stdlib/test_compactor.py                    824 +  (62 tests)
 test/stdlib/frameworks/test_react_framework.py   224 +
 docs/examples/context/*.py                              4 runnable examples + README

@yelkurdi yelkurdi requested a review from a team as a code owner May 1, 2026 19:59
@yelkurdi yelkurdi requested review from markstur and nrfulton May 1, 2026 19:59
@github-actions github-actions Bot added the enhancement New feature or request label May 1, 2026
@yelkurdi

yelkurdi commented May 1, 2026

Copy link
Copy Markdown
Contributor Author

please add @ramon-astudillo as an observer

@psschwei

psschwei commented May 4, 2026

Copy link
Copy Markdown
Member

(original PR body)

Summary

Adds an optional CompactionStrategy to mellea.stdlib.frameworks.react with three concrete implementations (ClearAll, KeepLastN, LLMSummarize) under a new module mellea.stdlib.compaction. Strategies fire when the running context's token count crosses a configurable threshold, measured from the provider-reported usage on the last ModelOutputThunk.

Empirically on the BCP benchmark with Granite 4.1-8b, llm_summarize cuts inference cost by 23.7% and raises accuracy by 3.5 pp — compaction is a dual win, not a quality/cost trade-off.

Backwards compatible: compaction=None (default) preserves existing react() behavior exactly.

Motivation

Long agentic loops — especially retrieval-heavy ones — pile up tool responses. Each react iteration re-sends the full history to the model, so prompt-token cost grows quadratically and the loop can exhaust the model's context window before reaching a final answer. Compaction trims that history, lowering both the dollar cost of inference and the likelihood of hitting the context / timeout wall.

On the BrowseCompPlus (BCP) benchmark with Granite 4.1-8b (131K context, 830 questions, loop_budget=400, per_question_timeout_s=1800, averaged across 3 runs per strategy):

Compaction Est. cost @ 80% cache hit Accuracy Mean wall-clock per Q
none $1801.0M 15.6% 724 s
clear_all $1499.2M (−16.8%) 18.5% (+2.9 pp) 980 s
llm_summarize $1373.9M (−23.7%) 19.1% (+3.5 pp) 709 s
  • Without compaction, inference cost on the 830-Q set is ~$1801M under an 80%-cache-hit pricing model, because each react iteration re-sends an ever-larger prompt (history + tool outputs) — the prompt-token total balloons to 5.4B before the 1800 s wall / context limit stops further progress.
  • llm_summarize is the clear winner: it cuts inference cost by −23.7% ($1801M → $1374M) and raises accuracy from 15.6% to 19.1% (+3.5 pp, +23% relative), at similar mean wall-clock. The summary shrinks the prompt enough that each iteration is actually faster, so both cost and quality improve.
  • clear_all reduces cost by −16.8% and lifts accuracy +2.9 pp, but its +35% wall-clock makes it less attractive than llm_summarize when the judge budget allows the extra summarization call.

Hardware / infrastructure for the table above:

  • Model: ibm-granite/granite-4.1-8b (bf16, native 131K context)
  • Node: single 8× NVIDIA H100 80 GiB host (IBM LSF cluster, exclusive GPU allocation)
  • Inference: vLLM 0.19.1, 7 instances × TP=1 (1 GPU each), with --enable-prefix-caching on (Granite-family default). GPU 0 reserved for the BCP local search service (tevatron + BM25 over the BCP corpus).
  • Agent: Mellea react loop via OpenAIBackend → local vLLM, concurrency=56 (8 × num_vllm), loop_budget=400, per_question_timeout_s=1800. Threshold: --compaction_threshold 50000 tokens (~38% of 131K), --compaction_keep_n 5 where applicable.
  • Questions: all 830 from the BCP parquet, decrypted at load.
  • Judge: openai/gpt-oss-120b on the same node after the agent phase teardown. "Correct" = judge verdict matches the reference answer.
  • 3 runs per strategy, numbers in the table are means.

Measurement was done with a separate BCP eval harness (forthcoming PR); the harness data is included here as motivating evidence — the compaction feature itself has no external runtime dependency.

Design

Protocol — CompactionStrategy

class CompactionStrategy(abc.ABC):
    def __init__(self, *, threshold: int = 0) -> None: ...
    def should_compact(self, context: ChatContext) -> bool: ...
    async def maybe_compact(
        self, context, *, backend=None, goal=None
    ) -> ChatContext: ...
    @abc.abstractmethod
    async def compact(
        self, context, *, backend=None, goal=None
    ) -> ChatContext: ...

Threshold is compared against total_tokens from the most recent ModelOutputThunk.usage — i.e. the prompt+completion of the last LLM call.

Concrete strategies

  • ClearAll(threshold) — discard everything after the first ReactInitiator, keeping only the system prefix. Cheapest, most aggressive; model must rebuild context each cycle.
  • KeepLastN(keep_n, threshold) — retain prefix + the last keep_n body components. Middle-ground; preserves recent tool outputs.
  • LLMSummarize(keep_n, threshold) — summarize the older body components via an additional LLM call, keep the last keep_n verbatim. Highest-fidelity, most expensive per-fire; empirically the best for Granite 4.1 on BCP (see table).

Integration site (react.py, +13 lines)

async def react(
    ..., compaction: CompactionStrategy | None = None,
) -> tuple[ComputedModelOutputThunk[str], ChatContext]:
    ...
    while turn_num < loop_budget or loop_budget == -1:
        step, next_context = await mfuncs.aact(...)
        ...
        if is_final:
            return step, context

        # Compact AFTER the final-answer check so terminal turns skip it.
        if compaction is not None:
            context = await compaction.maybe_compact(
                context, backend=backend, goal=goal
            )

Design decision — compact after the is_final check: a terminal turn has no next iteration to benefit from compaction, and for LLMSummarize this saves a full LLM call per question that would otherwise be wasted. Flagged inline so future refactors don't regress it.

Non-goals

  • No changes to mellea/core/ — the feature is purely additive under mellea/stdlib/, reusing the existing ChatContext + ModelOutputThunk surfaces.

Test plan

  • uv run pytest test/stdlib/test_compaction.py — 26 tests, 11 s locally, no external services required

The test file uses DummyThunk with synthetic usage dicts — no real backend needed. Coverage includes:

  • Each strategy's compact() output shape
  • Token-count comparison (below, at, above threshold)
  • threshold=0 disables compaction
  • Empty-context / no-thunk-with-usage returns False
  • Prefix preservation (first ReactInitiator never dropped)
  • LLMSummarize error handling when backend/goal omitted

Pitfalls (flagged here so reviewers know what to watch for)

  1. Backends that don't populate mot.usage silently disable compaction. All mainline Mellea backends set it (OpenAI, HF, Ollama, LiteLLM), and AGENTS.md §5 codifies this as a requirement. If a new backend lands without usage population, its users will see compaction become a no-op with no loud error.

  2. One-turn lag in the token-count measurement. The count reflects the prompt+completion of the LLM call that just completed — tool responses appended since are not yet counted. In practice negligible (a typical tool response is <5K tokens relative to a 50K+ threshold). Becomes visible only if a single tool response is very large (e.g. a raw document dump). Documented in the compaction.py docstring.

  3. _last_usage_tokens returns None before the first model call. should_compact then returns False — no-op, not an error. Matters for strategies with very low thresholds where the first LLM call itself crosses the bar.

  4. LLMSummarize needs backend + goal at compact() time. These are forwarded from the react call site. If a user constructs LLMSummarize and calls compact() directly (outside react), they must pass both. The docstring says so; reviewers may prefer a runtime check.

  5. Strategies that drop all body components can leave a thunk-less context. On the next check, _last_usage_tokens returns None and compaction correctly doesn't re-fire immediately — behavior we want, but worth verifying didn't regress.

  6. Token-count threshold is absolute, not a percentage of max context. A possible enhancement is to express the threshold as a percentage of the model's max context length rather than an absolute token count. This would require backends to reliably report that limit (e.g. 131K for Granite 4.1), after which --compaction_threshold 0.5 would read as "fire at 50% of context".

  7. Compaction firing point is after the is_final check, not before. Easy to accidentally swap in a future refactor. Code comment calls this out; ideally a regression test guards it, but the current tests focus on the strategies themselves rather than react-loop placement. Worth adding if the reviewer flags it.

Files

 mellea/stdlib/frameworks/react_compaction.py
 mellea/stdlib/frameworks/react.py
 test/stdlib/frameworks/test_react_compaction.py
 1 file modified, 2 added

@github-actions

github-actions Bot commented May 4, 2026

Copy link
Copy Markdown
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@psschwei

psschwei commented May 4, 2026

Copy link
Copy Markdown
Member

@yelkurdi I updated the PR body so that the update-pr-body check would pass (and copied your original body into a comment). Now that the check is passing, feel free to re-edit to include your original comments in the appropriate section.

@ramon-astudillo

Copy link
Copy Markdown

relevant related discussion #1099

@yelkurdi

Copy link
Copy Markdown
Contributor Author

Will update according to #1099

@github-actions

Copy link
Copy Markdown
Contributor

This comment is managed by a bot. Editing it is fine — checking off boxes, adding notes — but please leave the HTML comment marker on the first line alone, otherwise checklist updates will break.

Component PR Checklist

Use this checklist when adding or modifying components in mellea/stdlib/components/.

Protocol Compliance

  • parts() returns list of constituent parts (Components or CBlocks)
  • format_for_llm() returns TemplateRepresentation or string
  • _parse(computed: ModelOutputThunk) parses model output correctly into the specified Component return type

Content Blocks

  • CBlock used appropriately for text content
  • ImageBlock used for image content (if applicable)

Integration

  • Component exported in mellea/stdlib/components/__init__.py or, if you are adding a library of components, from your sub-module

@planetf1 planetf1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this — the Compactor architecture is really well thought through, and the test coverage in test_compactor.py is thorough. A few things to sort out before this lands, mostly around LLMSummarizeCompactor edge cases and a couple of quick doc fixes.

Comment thread test/stdlib/test_base_context.py Outdated
def test_context_construction():
context_construction(SimpleContext)
# ChatContext defaults to WindowCompactor(5); a single add stays well under
# the window so the linked-list shape is identical to the pre-compaction

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[WARNING] This comment says "ChatContext defaults to WindowCompactor(5)" but the actual default is compactor=None — no compaction at all. test_default_has_no_compactor in test_compactor.py asserts this directly. Looks like a leftover from an earlier design iteration.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@planetf1 you are correct it is a leftover design iteration, will be corrected.

Comment thread mellea/stdlib/context/compactor.py Outdated
directly between turns, e.g. when compaction is exposed to the model as a
tool.

See ``docs/rewrite/`` for full usage examples.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[WARNING] The module docstring points to docs/rewrite/ for usage examples, but that directory only has session_deepdive/ and streaming/ — nothing compaction-related. The actual examples are in docs/examples/context/. Worth fixing so the breadcrumb is useful.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, will be fixed to point to docs/examples/context/

return pool.submit(asyncio.run, coro).result()


class LLMSummarizeCompactor:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[WARNING] Worth calling out in the class docstring that this compactor can't be wired into ChatContext(compactor=...) directly. ChatContext.add() calls compact(new) without a backend, so once the body grows past keep_n every add() raises ValueError. The natural instinct after seeing ChatContext(compactor=WindowCompactor(...)) work is to try the same here, and the error message won't point you at the fix. A one-liner — "use via react(compactor=...) only" — would save a lot of confusion.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@planetf1 how about adding a hard block, such as limiting the types of compactors ChatContext can directly take e.g. 1. BasicCompactor (does not require backend). 2. ThresholdCompactor (algorithmic compactor wraps LLMSummaryCompactor) ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's better than a docstring.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I made the following changes to create a hard block at construction time:

  • Added InlineCompactor marker; ChatContext(compactor=...) rejects non-InlineCompactor instances with a TypeError pointing at react(compactor=...), ThresholdCompactor, or manual compact().
  • WindowCompactor and ThresholdCompactor inherit the marker; LLMSummarizeCompactor does not (would call backend every add()). ThresholdCompactor(LLMSummarizeCompactor(...)) is the recommended wiring — it gates by token usage, so backend calls fire sparsely rather than per-add().
  • LLMSummarizeCompactor now requires default_backend at construction; compact() falls back to it, with call-time backend= overriding.

from mellea.stdlib.context.compactor import Compactor


class ChatContext(Context):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[WARNING] ChatContext(window_size=N) used to keep the full history in as_list() and only apply the window in view_for_generation(). Now as_list() itself is truncated to N items. For the model, nothing changes — but callers inspecting as_list() directly (e.g. session.py:376 uses it for interaction_count) will silently get a capped value. The PR description says "backwards compatible" which is true for the model's view, but not for as_list(). Worth a migration note in the docstring at minimum.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added better tracking of interaction counts, required small modifications in session.py

  • ChatContext docstring: added a Note flagging that as_list() now reflects post-compaction state (used to keep full history); points callers at out-of-band turn tracking.
  • MelleaSession: turned ctx into a property whose setter increments _interaction_count; reset() / __init__ bypass via _ctx so lifecycle events don't pollute the count. cleanup() publishes self._interaction_count instead of len(self.ctx.as_list()).
  • SessionCleanupPayload.interaction_count: docstring rewritten to "turns committed" semantics (was "items in context").

Comment thread mellea/stdlib/context/compactor.py Outdated
if len(body) <= self.keep_n:
return ctx

return _run_coro_blocking(self._async_compact(ctx, backend))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[WARNING] If the backend call raises here (rate limit, network error, timeout), the exception propagates through _run_coro_blocking and kills the entire react loop. For a long-running research task, that's quite painful. Since compaction is best-effort by nature, wrapping this in a try/except, logging a warning, and returning ctx unchanged would be much more robust.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, compaction is wrapped in try/except and behaves as best effort by returning ctx and logs warning when backend fails, enabling the react loop to continue.

)


def _run_coro_blocking(coro): # type: ignore[no-untyped-def]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[WARNING] When called from inside react() (which is async), this blocks the event loop thread for the full duration of the summary LLM call. Nothing else on the loop — telemetry flushers, cancellation signals, other sessions — can make progress during that time. It's documented as "fine for a serial ReACT loop", but worth noting that backends using per-loop resources (e.g. httpx.AsyncClient) may behave unexpectedly. A stronger docstring warning here, and ideally an async variant on the protocol long-term, would be the right direction.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, beefed-up the warning message as you indicated.

Comment thread mellea/stdlib/context/compactor.py Outdated
elif isinstance(c, Message):
lines.append(f"{c.role}: {c.content}")
elif isinstance(c, ModelOutputThunk):
lines.append(f"assistant: {c.value}")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[WARNING] The rendering loop only handles c.content and c.value. Images, attached documents, and tool-call-only thunks (where value is None) are silently dropped — the latter render as "assistant: None". Users running multimodal or tool-heavy sessions will get incomplete summaries without any warning. Guarding with c.value or '' would at least fix the None case, and a docstring note about lossy summarisation for multimodal content would set expectations.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Augmented the block with the following:

  • Message: append "[N image(s) attached]" / "[M document(s) attached]" markers; bytes not reproduced.
  • ModelOutputThunk: render "assistant called tools: name({args}), ..." for tool-call-only thunks (value=None); empty thunks (no value, no tool calls) are skipped entirely. Eliminates "assistant: None".
  • Catch-all else: "<TypeName: content>" or "<TypeName>" instead of the default object repr (e.g. ReactInitiator when not pinned).

lines.append(str(getattr(c, "content", c)))

prompt = self.prompt_template.format(conversation="\n".join(lines))
result, _ = await mfuncs.aact(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[WARNING] The two equivalent mfuncs.aact() calls in react.py (lines 93 and 123) both pass silence_context_type_warning=True for internal framework calls. Without it here, every compaction fires a context-type-mismatch warning into the logs — potentially quite noisy with ThresholdCompactor triggering regularly in a long session.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added silence_context_type_warning=True parameter

a generic conversation-summary template.
"""

def __init__(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[WARNING] There's no model_options parameter here, so the summary call uses the backend's default max_tokens. On many local backends that's 256–512, which will silently truncate a summary of a long conversation. react_summary_prompt(max_tokens_hint=N) adds a text nudge to the prompt but doesn't set the actual API parameter. Adding model_options: dict | None = None to the constructor and forwarding it to mfuncs.aact would let users enforce a real token budget.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

model_options added, it is a good idea, I have been actually using it in my separate BCP eval harness.

@yelkurdi

yelkurdi commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

@planetf1 I have incorporated your feedback, I've also ran the tests and the full BCP deep search task tests (3 runs of each compaction method), all checks out.

@planetf1 planetf1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second pass after the latest fixes — most of the earlier thread was addressed cleanly, thanks. A few new things from this pass, plus one CI-blocker in the docs example.

Heads up — PR is currently in DIRTY merge state (conflicts against main). Worth a rebase before final review.

Findings inline; one separate top-level note on _run_coro_blocking as a follow-up suggestion (not reopening the resolved thread — your docstring update there is solid).

from mellea.stdlib.context.chat import _rebuild_chat_context


class TruncateOldest:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey — I think this example actually crashes when pattern_1_wired_into_context() runs: TruncateOldest doesn't inherit InlineCompactor, and the new guard in chat.py:52-58 raises TypeError on construction. The # pytest: unit annotation means it'll show up in the default CI run too, so worth catching before merge.

One-line fix:

from mellea.stdlib.context.compactor import InlineCompactor

class TruncateOldest(InlineCompactor):
    ...

Totally tangential thought, take it or leave it — the example imports _rebuild_chat_context and reads ctx._compactor, which is what WindowCompactor does internally too, so it's an honest reflection of reality. Just wondering whether it's worth promoting _rebuild_chat_context to public (rebuild_chat_context) so the "look how easy it is to write your own" framing holds together. Happy to file a follow-up if that's out of scope here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example is fixed, docstring also elaborated, detailed the two patterns, pattern1 for simple custom InlineCompactor and pattern2 for general custom compactors.
I agree with you that _rebuild_chat_context would be promoted to public (rebuild_chat_context), however I would recommend taking that as a separate PR since it was that way in the original context.py file.


try:
return _run_coro_blocking(self._async_compact(ctx, backend))
except Exception as exc:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small thing on the except Exception here — it'll catch the network/rate-limit cases the docstring mentions, but it'll also quietly swallow TypeError/AttributeError from genuine bugs (e.g. a pin_predicate returning the wrong type, or a component shape change). The user sees the same "warning logged, context unchanged" outcome either way, which can hide a real contract violation for a long time.

Could be worth either narrowing to the actual transient classes, or keeping the broad catch but re-raising on (TypeError, AttributeError, AssertionError, KeyError) after logging. Either way the "best-effort, retries next call" promise still holds for the cases that promise actually targets.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the narrower re-raising on programming-error classes, also docstring updated

)

summary_message = Message(
role="user", content=f"[CONTEXT SUMMARY]\n{result.value or ''}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question on the role choice for the summary message — was role="user" deliberate? The content's coming back from the backend, but on the next turn the chat template will render it as "user: [CONTEXT SUMMARY] …" which feels like a small fib to the model.

I noticed react.py does the same thing for goal restatement, so it might just be the established convention here, in which case ignore me. Asking partly because I've seen smaller chat-tuned models (granite/llama 3B–8B range) occasionally treat a summary-under-user-role as a fresh instruction. role="system" or a marker on the next genuine user turn would dodge that — but only worth changing if the convention isn't already settled.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — you're right that the role doesn't match the source. role="user" is deliberate though: the summary functions as user-supplied context for the next turn, same pattern as react.py's goal restatement. The [CONTEXT SUMMARY] content prefix is meant to be the disambiguator.
If you've seen smaller models conflate it with a fresh instruction in practice, I'd take that seriously — happy to sharpen the prefix (e.g., [CONTEXT SUMMARY — auto-generated, not a new user turn]) if that would help.
On extensibility: would you prefer exposed params (summary_prefix, summary_role, etc.) on LLMSummarizeCompactor, or is subclassing / wrapping with a custom compactor the right escape hatch? Leaning toward the latter to keep the default surface small, but happy either way.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked the code and I think this is fine — consistent with how the goal is surfaced as a user turn in react.py, and role="user" is the most portable choice across chat templates. Happy to leave it as-is.

Comment thread mellea/stdlib/context/chat.py Outdated
strategy, or ``window_size=`` as sugar for ``WindowCompactor(size=...)``.

Note:
Compaction is now applied at ``add()`` time and persists in the linked

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny doc nit, no rush: "Compaction is now applied at add() time…" reads like a release-note for me at the moment, but in six months "now" loses its anchor and just confuses a fresh reader. Could just be "Compaction is applied at add() time and persists in the linked list…" — same information, ages better.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in d6a421d.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4247614, thanks.

@planetf1

planetf1 commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Following up on your docstring update on _run_coro_blocking from earlier — wanted to flag this as a separate suggestion rather than reopening the old thread, since the docstring itself is solid.

Have you considered routing the coroutine through mellea.helpers.event_loop_helper._run_async_in_thread? It's mellea's shared sync↔async bridge — used by session.py, functional.py, mcp.py, tools.py, manager.py, ~15 call sites — and it dispatches onto the same event loop where backends create their httpx.AsyncClient.

The benefit is mostly for the sync-from-outside-loop case (someone calling compactor.compact(ctx, backend=…) from a plain script): routing through the shared loop sidesteps the "client bound to a different loop" failure your warning paragraph predicts. The inside-react case isn't fixed by either helper — both fall back to a fresh-loop child when called from the running loop — so your "async Compactor protocol long-term" call-out is still the real answer.

Totally fine as a follow-up issue if you'd rather not expand scope here.

yelkurdi and others added 15 commits June 1, 2026 19:10
Adds CompactionStrategy abstraction and KeepLastN implementation to
mellea/stdlib/compaction.py, wires an optional compaction parameter into
the react() loop, and adds full test coverage in test/stdlib/test_compaction.py.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
Switches `CompactionStrategy.threshold` from a component-count trigger to a
token-count trigger, read from the most recent `ModelOutputThunk.usage`
populated by the backend. This aligns compaction with the real constraint
(context size) and sidesteps per-backend tokenizer dependencies by using
provider-reported usage; the trade-off is a one-turn lag since usage is
recorded at the end of each model call.

Also reorders the react loop so compaction runs after the final-answer
check, skipping wasted work (and a wasted LLM call for LLMSummarize) on
terminal turns.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
Signed-off-by: ramon-astudillo <ramon@astudillo.com>
Move the compaction strategies alongside the react framework they serve:
- mellea/stdlib/compaction.py -> mellea/stdlib/frameworks/react_compaction.py
- test/stdlib/test_compaction.py -> test/stdlib/frameworks/test_react_compaction.py

Imports and module docstrings updated accordingly.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
The docstring quality gate (tooling/docs-autogen/audit_coverage.py
--quality --threshold 100) requires each documented symbol to have its
own Args/Returns sections — inheritance from the abstract parent is not
consulted. Six issues were reported against the compact() overrides on
ClearAll, KeepLastN, and LLMSummarize.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
Replaces the original async ``react_compaction`` strategies (ClearAll,
KeepLastN, LLMSummarize) with a generic, sync ``Compactor`` protocol
that operates on any ``Context``. ``ReACT`` and ``ChatContext`` are
rewired around the new protocol; sample callers, tests, and docs are
updated.

Squash of 29 Mellea-side commits from
context_compaction_for_react_2; the BCP eval harness commits in that
branch are intentionally excluded.

mellea/stdlib/context/ becomes a package
  - Compactor protocol: sync ``compact(ctx, *, backend=None) -> Context``
  - WindowCompactor(size, pin_predicate)
      keep last-N body components; ``size=0`` clears the body and
      retains only the pinned prefix
  - ThresholdCompactor(inner, threshold)
      token-gated wrapper that reads cumulative context size from the
      most recent ModelOutputThunk's ``generation.usage`` and forwards
      to ``inner.compact`` only above the gate
  - LLMSummarizeCompactor(keep_n, pin_predicate, prompt_template)
      summarizes old body components via the backend; the (async)
      backend call is hidden behind a sync ``compact()`` via
      ``_run_coro_blocking`` so the protocol stays sync
  - PinPredicate API: ``pin_nothing``, ``pin_system``,
    ``pin_system_and_initial_user``; chat compactors compose freely

mellea/stdlib/frameworks/react.py
  - ``react()`` gains a ``compactor: Compactor | None = None`` per-turn
    hook; invoked once after each tool observation
  - The old ``react_compaction`` module is removed

mellea/stdlib/components/react.py
  - ``pin_react_initiator``: a PinPredicate that pins everything up to
    and including the first ``ReactInitiator``
  - ``react_summary_prompt(goal=None, max_tokens_hint=None)``: factory
    that returns a research-flavoured summary prompt template (with the
    {conversation} placeholder LLMSummarizeCompactor expects). Optional
    ``GOAL: <goal>`` line and optional ``- Be at most ~N tokens`` bullet
    when callers want goal anchoring or length-cap hints.

mellea/stdlib/context/chat.py
  - ``ChatContext()`` defaults to no compactor (full history); pass
    ``compactor=`` or ``window_size=`` for opt-in compaction. Matches
    upstream main's window_size=None unbounded semantics.

Test coverage
  - test/stdlib/test_compactor.py (~500 LOC): protocol semantics;
    Window / Threshold / LLMSummarize behaviours; pin-predicate edge
    cases; ``size=0`` collapse; threshold gate edge cases
  - test/stdlib/frameworks/test_react_framework.py (~210 LOC):
    react() per-turn hook integration + react_summary_prompt
    (default, goal interpolation, brace escaping, max_tokens_hint
    bullet ordering, LLMSummarizeCompactor template-validation)
  - test/stdlib/test_base_context.py: pin-non-compacting ChatContext
    in the session-copy operations test (matches new opt-in default)

Net diff: 17 files, +381 / -896 lines (drops the old
react_compaction.py and its dedicated test file).

Backwards-compatible default behaviour preserved: bare ``ChatContext()``
retains full history; ``react()`` without ``compactor=`` behaves
identically to today; ``LLMSummarizeCompactor`` defaults to a generic
conversation-summary prompt unless callers opt in to the
research-flavoured variant via ``react_summary_prompt``.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
…ples

- Correct comment in test_base_context.py: ChatContext defaults to
  compactor=None, not WindowCompactor(5).
- Point compactor.py module docstring to docs/examples/context/ instead
  of the nonexistent docs/rewrite/.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
…n LLMSummarize

Restrict ChatContext(compactor=...) to compactors that inherit InlineCompactor.
Wiring a backend-requiring compactor (e.g. LLMSummarizeCompactor) directly
would invoke the backend on every add(); the new isinstance guard rejects
that with a TypeError pointing at react(compactor=...), ThresholdCompactor,
or manual compact() as alternatives. ThresholdCompactor remains accepted
regardless of its inner -- it gates by token usage, so backend calls are
sparse rather than per-add.

LLMSummarizeCompactor now takes a required default_backend at construction
and falls back to it when compact() is invoked without an explicit backend.
A backend kwarg passed to compact() still overrides the default for that
call. This makes ThresholdCompactor(LLMSummarizeCompactor(...)) work
end-to-end when attached to ChatContext: at trip time, the inner uses its
stored default_backend.

InlineCompactor carries the compact() signature (raising NotImplementedError)
so it's a usable static type without cast() workarounds. Specialized to
ChatContext rather than parameterized over Context -- ThresholdCompactor's
prior generic-T signature was unexercised, so the simpler shape applies.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
…antic shift

Reviewer flagged that ChatContext(window_size=N) used to keep full history
in as_list() and only window view_for_generation(); the per-turn compactor
work made as_list() itself reflect the post-compaction state, which silently
undercounted in MelleaSession.cleanup() (interaction_count = len(as_list())).

- ChatContext docstring: add a Note describing the semantic shift and
  pointing callers at out-of-band turn tracking when full counts matter.
- MelleaSession: turn ctx into a property; the setter increments
  _interaction_count, and reset() / __init__ bypass via _ctx so lifecycle
  events don't pollute the count. cleanup() now publishes
  self._interaction_count, stable under any compaction strategy.
- SessionCleanupPayload: rewrite the interaction_count field doc to
  match the new semantics ("turns committed" rather than "items in
  context").

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
Reviewer flagged that an exception from the summarisation backend call
(rate limit, network error, timeout) propagates through _run_coro_blocking
and kills the entire react loop. For long-running research tasks that's
quite painful, especially since compaction is best-effort by nature.

LLMSummarizeCompactor.compact() now wraps _run_coro_blocking in a
try/except Exception. On failure it logs a WARNING via MelleaLogger with
the exception type and message, then returns ctx unchanged. The next
compact() invocation retries; the conversation keeps growing in the
meantime. BaseException (KeyboardInterrupt, SystemExit) still propagates
so users can interrupt a stuck loop.

Added a regression test with a backend that raises RuntimeError on every
call: compact() returns the same ctx, original history is intact, and a
warning naming the exception type is logged.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
Reviewer flagged that _run_coro_blocking, when invoked from inside an async
caller like react(), blocks the entire event loop for the full duration of
the wrapped coroutine — not just the calling task. The previous "fine for a
serial ReACT loop" wording undersold the implications.

Beefed-up Warning: block now spells out:
- The loop, not just the thread, is stalled — callbacks, telemetry,
  cancellation signals, other sessions sharing the loop, keepalives are
  all blocked.
- Backends with per-loop resources (notably httpx.AsyncClient) may behave
  unexpectedly because the coroutine runs on a fresh loop in a worker
  thread; documents the typical failure signatures.
- Long-term direction is an async variant on the Compactor protocol so
  callers can await natively.

Docs-only; no behavior change.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
Reviewer flagged silent drops when rendering the slice fed to the
summariser:

- Message: append "[N image(s) attached]" / "[M document(s) attached]"
  markers; bytes not reproduced.
- ModelOutputThunk: render "assistant called tools: name({args}), ..."
  for tool-call-only thunks (value=None) and "assistant: <empty>" for
  empty thunks. Eliminates "assistant: None".
- Catch-all else: "<TypeName: content>" or "<TypeName>" instead of the
  default object repr (e.g. ReactInitiator when not pinned).

Docstring gains a Note: that summaries are text-only and lossy for
multimodal / heavy-tool sessions. Tests cover each branch.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
…ng "<empty>"

A "<empty>" marker for thunks with neither value nor tool_calls tended to
leak into the resulting summary verbatim. These turns carry no information
worth summarising, so drop them from the rendered slice entirely. Test
updated to assert no line is emitted for an empty thunk while neighbouring
turns still come through.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
…call

Reviewer flagged that aact's context-type warning could be noisy under
ThresholdCompactor-driven repeated compaction. Match react.py's pattern of
setting silence_context_type_warning=True on internal framework calls so
the warning stays quiet if the context argument is later changed to a
non-SimpleContext, and to self-document the intent.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
Reviewer flagged that the summary call uses the backend's default
max_tokens (often 256-512 on local backends), silently truncating long
summaries. react_summary_prompt(max_tokens_hint=N) is only a soft prompt-
side nudge, not a real API parameter.

Add model_options: dict | None to the constructor and forward it to
mfuncs.aact so callers can set a hard token budget (or any other backend
option). Default None preserves existing behaviour. Tests cover both
forwarded and default paths.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
yelkurdi added 3 commits June 3, 2026 23:33
The body slice was only consumed by len() before being discarded
(_async_compact recomputes its own body). Replace with body_len
arithmetic, matching the existing pattern in WindowCompactor.compact.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
LLMSummarizeCompactor.compact() and _async_compact were independently
calling ctx.as_list() and self.pin_predicate(full). Thread the values
through instead so the async core skips the duplicate work.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
Replace the while-with-index pattern in pin_system() with a cleaner
enumerate-based early-return loop. Behavior unchanged.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>

@planetf1 planetf1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second-pass review — one concrete fix needed (TruncateOldest prefix), plus a few nits.

# Per-turn compaction hook (terminal turns skip this since `is_final` returned).
if compactor is not None:
context = compactor.compact(context, backend=backend)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since compact's return type is T bound to Context, a custom compactor that returns e.g. SimpleContext type-checks fine but then breaks the ChatContext assumptions the next aact call relies on. Worth noting in the compactor= docstring that the compactor must return a ChatContext.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — added a paragraph to the compactor= docstring noting that the Compactor protocol is generic in T bound to Context, so a compactor returning a non-chat Context (e.g. SimpleContext) type-checks fine but breaks the ChatContext assumptions the next aact call relies on. Custom compactors used with react must return a ChatContext.

items = ctx.as_list()
if len(items) <= 1:
return ctx
return _rebuild_chat_context(items[1:], compactor=ctx._compactor)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One caveat worth flagging here: items[1:] drops element 0 unconditionally, so a leading system message gets evicted on the first add(). That's fine for this example since TruncateOldest is deliberately the smallest possible compactor, but it's worth a one-line note pointing readers at WindowCompactor / pin_system if they want the pinned-system variant rather than reaching for this as a starting point.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — added a Note to TruncateOldest's docstring spelling out that items[1:] evicts a leading system message on the first compaction, with a pointer to WindowCompactor + pin_system for the keep-the-system-prompt variant. Kept the body of the example as-is since the smallest-possible shape is the point.

Comment thread mellea/stdlib/context/compactor.py Outdated
mutate the input context; they must return a fresh instance and copy over
any data that should be preserved.

The protocol is generic in `T` (a `Context` subtype) so concrete

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The phrase "a chat-only compactor declares T = ChatContext" (line 120) is the misleading bit — there's no T = declaration anywhere. T is a per-call TypeVar on compact(), so a concrete chat compactor just annotates the override as compact(self, ctx: ChatContext) -> ChatContext. Could we reword that sentence to describe the override annotation rather than a class-level T assignment?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded. T is a per-call TypeVar on compact(), so the docstring now describes the override annotation directly: a chat-only compactor overrides as def compact(self, ctx: ChatContext, *, backend=None) -> ChatContext. No misleading "T = ChatContext" framing.

self, ctx: ChatContext, *, backend: Backend | None = None
) -> ChatContext:
"""Subclasses must override this with their concrete strategy."""
raise NotImplementedError("InlineCompactor subclasses must implement compact()")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: compact raises NotImplementedError but the docstring has no Raises: entry — minor gap against the project convention.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a Raises: NotImplementedError entry to InlineCompactor.compact()'s docstring noting that the marker base always raises and subclasses are expected to override.

yelkurdi added 5 commits June 4, 2026 12:51
The Compactor protocol is generic in T bound to Context, so a custom
compactor returning a non-chat Context (e.g. SimpleContext) type-checks
at the react() call site but breaks the ChatContext assumptions the
next aact call relies on. Document the constraint on the compactor=
parameter so callers don't get surprised at runtime.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
Reviewer pointed out that items[1:] drops index 0 unconditionally, so
a leading system message gets evicted on first compaction. Add a Note
to the class docstring flagging the caveat and pointing readers at
WindowCompactor + pin_system for the keep-the-system-prompt variant.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
Reviewer pointed out there's no class-level T = declaration — T is a
per-call TypeVar on compact(). Reword to describe the override
annotation directly: chat-only compactors override compact() with a
ChatContext signature.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
Reviewer flagged the docstring gap: compact() raises NotImplementedError
unconditionally on the marker base class, but the docstring had no
Raises: section. Add one noting the always-raises behavior and that
subclasses are expected to override.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
The build-and-validate CI job's --fail-on-quality threshold flagged 8
issues across pin_nothing, pin_system, pin_system_and_initial_user, and
InlineCompactor.compact (missing Args + Returns). Add the sections.
Also collapse a multi-line _run_coro_blocking call back to one line so
ruff format passes.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>

@planetf1 planetf1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Given the recent discussions in #1099 I'll defer to @nrfulton for final say, but I'm happy with the changes.

@planetf1 planetf1 self-requested a review June 11, 2026 15:19

@planetf1 planetf1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@markstur

Copy link
Copy Markdown
Contributor

Thanks for addressing or replying to everything. I'll accept your explanation for the event loop blocking concern, but I think we need a follow-up issue and need Nathan's approval.

@yelkurdi

yelkurdi commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

Thanks @planetf1 and @markstur , the follow issue is created to handle the loop blocking concern: #1258

@planetf1

Copy link
Copy Markdown
Contributor

Fix looks good, but can we rebase to current code and ensure no conflict issues.

@psschwei psschwei left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing against the summary of our previous discussion in #1099, in my opinion the PR conforms to that design. There's a couple of things we need to do before we can merge:

  • rebase / resolve the merge conflicts
  • update the PR description to match the new design

Resolve conflicts in session cleanup payload: keep out-of-band
interaction_count tracking (turns committed) from this branch, and
adopt session_id + exception fields added upstream.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
@psschwei

Copy link
Copy Markdown
Member

DCO check is failing on this commit: 1e28704

image

@ramon-astudillo I think this is yours.

@ramon-astudillo

Copy link
Copy Markdown

This commit will get squashed anyway, but let me check with @yelkurdi

@yelkurdi

yelkurdi commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

@psschwei I resolved the conflicts and now I'm in the process of updating the PR description.
Regarding the DCO error, we manually resolved that in the past. I can fix it by rebasing, or squashing, will handle it.

@yelkurdi yelkurdi force-pushed the context_compaction_for_react branch from 5b01e1e to 6daa892 Compare June 23, 2026 22:28
@yelkurdi

Copy link
Copy Markdown
Contributor Author

PR description updated.

@yelkurdi yelkurdi requested a review from planetf1 June 23, 2026 23:34
Comment thread test/stdlib/test_compactor.py Outdated
mot._generate_log = GenerateLog(is_final_result=True)
return mot, ctx.add(action).add(mot)

async def generate_from_raw(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Backend ABC was updated in #1264 — the abstract method is now _generate_from_raw (leading underscore); the public generate_from_raw is a concrete wrapper that fires hooks. These fakes override the wrapper but leave the abstract method unimplemented, so FakeBackend() raises TypeError: Can't instantiate abstract class at test setup and the whole TestLLMSummarizeCompactor class errors out. Same issue in OtherBackend, BrokenBackend, BuggyBackend, InterruptingBackend, and ScriptedBackend in react_compaction.py — a rename to _generate_from_raw in all six fixes it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching that, corrected.


# Per-turn compaction hook (terminal turns skip this since `is_final` returned).
if compactor is not None:
context = compactor.compact(context, backend=backend)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aact calls a few lines up both assert isinstance(next_context, ChatContext) to catch a wrong return type early — it'd be worth the same guard here, since the compact() docstring notes a non-ChatContext return would break the loop but nothing currently enforces it:

if compactor is not None:
    context = compactor.compact(context, backend=backend)
    assert isinstance(context, ChatContext), (
        "react() compactor must return a ChatContext"
    )

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check added

@planetf1 planetf1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third pass — the design work here is solid and the earlier findings are all cleaned up. One thing that needs fixing before merge: six fake Backend subclasses (five in test_compactor.py, one in react_compaction.py) override generate_from_raw instead of the abstract _generate_from_raw that #1264 introduced. The whole TestLLMSummarizeCompactor class errors at fixture setup as a result — details in the inline comment. A rename in each fake unblocks it.

@planetf1 planetf1 dismissed their stale review June 24, 2026 10:02

CI is now running and the failing quality check is the active gate — no need for a separate review block on top of it.

yelkurdi added 2 commits June 24, 2026 16:35
The Backend ABC's abstract method was renamed to _generate_from_raw
(leading underscore) in generative-computing#1264; the public generate_from_raw is now a
concrete hook-firing wrapper. The fakes in test_compactor.py overrode
the wrapper but left the abstract method unimplemented, so each backend
raised TypeError on instantiation.

Rename generate_from_raw -> _generate_from_raw in FakeBackend,
OtherBackend, BrokenBackend, BuggyBackend, and InterruptingBackend.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
Mirror the guard on the aact next_context returns: the LLMSummarizeCompactor
docstring notes a non-ChatContext return would break the loop, but nothing
enforced it. Add an isinstance assert after compact() to fail fast.

Assisted-by: Claude Code
Signed-off-by: Yousef El-Kurdi <yelkurdi@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants