
feat: model aware sliding window context #1270

Merged
AngeloDanducci merged 11 commits into generative-computing:main from AngeloDanducci:ad-108
Jun 24, 2026

@AngeloDanducci AngeloDanducci commented Jun 15, 2026

Contributor

Pull Request

Issue

Fixes #108

Description

Adds a model-aware sliding context window, as requested in the issue.

Note: HF_SMOLLM3_3B_no_ollama: ollama_name="" was changed to ollama_name=None — this is safe because _build_table already guards if name: before indexing, so the empty string was silently excluded from the lookup table anyway.
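Here is a minimal sketch of why the swap is harmless. It assumes a table builder that skips falsy names, the way the note says _build_table does. build_table and the placeholder model name are hypothetical and are not mellea's code:

```python
from __future__ import annotations


def build_table(entries: list[tuple[str | None, int]]) -> dict[str, int]:
    """Hypothetical stand-in for _build_table: map model names to context lengths."""
    table: dict[str, int] = {}
    for name, context_length in entries:
        # A truthiness check skips both None and the empty string.
        if name:
            table[name] = context_length
    return table


# "model-a" and the lengths are placeholder values, not real model data.
with_empty = build_table([("", 65_536), ("model-a", 8_192)])
with_none = build_table([(None, 65_536), ("model-a", 8_192)])
print(with_empty == with_none)  # True: neither "" nor None reaches the table
```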

Behavioral change: reset() now preserves model_id, window_size, and token_context_length_limit on the new context. Previously it returned a bare ChatContext() with no configuration. Callers that relied on getting a config-free context after reset must now clear those fields explicitly, for example by constructing a bare ChatContext() instead.
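The sketch below makes the behavioral change concrete. SketchChatContext is a hypothetical, simplified stand-in: the real ChatContext is a linked chain of nodes, not a Python list. Only the three field names come from the description above:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchChatContext:
    """Hypothetical, simplified stand-in for ChatContext (not mellea's class)."""

    model_id: str | None = None
    window_size: int | None = None
    token_context_length_limit: int | None = None
    history: list[str] = field(default_factory=list)

    def new_instance(self) -> SketchChatContext:
        # New reset() behavior: carry the configuration over, drop the history.
        return SketchChatContext(
            model_id=self.model_id,
            window_size=self.window_size,
            token_context_length_limit=self.token_context_length_limit,
        )


ctx = SketchChatContext(model_id="model-a", token_context_length_limit=8_192)
ctx.history.append("hello")
fresh = ctx.new_instance()  # empty history, same configuration
# Callers that want the old config-free result construct a bare context instead.
bare = SketchChatContext()
```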

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

Adding a new component, requirement, sampling strategy, or tool?

If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.

  • Component
  • Requirement
  • Sampling Strategy
  • Tool

NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.

@AngeloDanducci AngeloDanducci requested a review from a team as a code owner June 15, 2026 21:54
@github-actions github-actions Bot added the enhancement New feature or request label Jun 15, 2026

@ajbozarth ajbozarth left a comment

Contributor

Some feedback from Claude.

Blocking — feature does not implement token-aware truncation

ModelIdentifier.context_length is in tokens. ChatContext.window_size is a count of context items (CBlocks/Components). view_for_generation() passes the token count straight into as_list(last_n_components=...) as an item count.

The new docstring at context.py lines 34–42 candidly admits the ceiling is "never reached" because real conversations don't accumulate 131,072 items. So for the default path (no explicit window_size), this PR is a no-op — full history is always returned, just like before.

#108 asks for a sliding window that moves when the token budget would be exceeded. To do that the implementation needs to estimate per-item token counts (likely via the bound backend's tokenizer) and walk history popping oldest items until the running sum fits under context_length.
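Below is a minimal sketch of the walk described above. It assumes history is a plain oldest-first list. count_tokens is a placeholder for the bound backend's tokenizer. Both are illustrative assumptions, not the eventual implementation:

```python
from __future__ import annotations

from typing import Callable, Sequence


def truncate_to_budget(
    history: Sequence[str],
    token_budget: int,
    count_tokens: Callable[[str], int],
) -> list[str]:
    """Keep the newest items whose summed token cost fits in token_budget.

    history is oldest-first, and so is the returned list.
    """
    kept: list[str] = []
    spent = 0
    for item in reversed(history):  # walk newest to oldest
        cost = count_tokens(item)
        if spent + cost > token_budget:
            break  # this item and everything older are dropped
        kept.append(item)
        spent += cost
    kept.reverse()  # restore chronological order
    return kept


def word_count(text: str) -> int:
    """Placeholder tokenizer: one token per whitespace-separated word."""
    return len(text.split())


messages = ["a b c", "d e", "f g h i", "j"]
print(truncate_to_budget(messages, token_budget=5, count_tokens=word_count))
# ['f g h i', 'j']
```

Compare this with the original code path. Treating a 131,072-token limit as an item count is equivalent to history[-131072:], which returns the whole list for any realistic conversation.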

Two ways forward, either is fine:

  1. Implement the token-aware truncation here.
  2. Re-scope this PR to "context-length metadata + binding hook," explicitly defer truncation to a follow-up issue, and update the title / docstring so it doesn't read as if truncation already happens.

Comment thread mellea/stdlib/context.py Outdated
Comment thread mellea/stdlib/context.py
Comment thread mellea/backends/context_lengths.py Outdated
Comment thread mellea/backends/model_ids.py
Comment thread mellea/stdlib/session.py Outdated
@AngeloDanducci

Contributor Author

Thanks for the review. It should be fixed now: I was missing a batch of staged commits, and I've also incorporated your review into the newest changeset.

@planetf1 planetf1 left a comment

Contributor

session.py:377: reset() docstring (these lines aren't in the diff hunk, so I can't comment inline):

The docstring still says "replaces self.ctx with the result of ctx.reset_to_new()" but ChatContext now goes through new_instance(). Suggested update:

    def reset(self) -> None:
        """Reset the context state to a fresh, empty context of the same type.

        Fires the `SESSION_RESET` plugin hook if any plugins are registered, then
        replaces `self.ctx` with a fresh empty context, discarding all accumulated
        conversation history. For `ChatContext`, uses `new_instance()` so the
        `model_id` and `window_size` bindings are preserved; for all other context
        types, uses `reset_to_new()`.
        """

Comment thread mellea/stdlib/context.py
Comment thread mellea/stdlib/session.py Outdated
Comment thread mellea/stdlib/context.py Outdated
Comment thread mellea/backends/model_ids.py Outdated
Comment thread mellea/backends/context_lengths.py

@ajbozarth ajbozarth left a comment

Contributor

Thanks — the blocker from my previous review is addressed. _as_list_token_budget actually walks history now and the docstring is clear about token-vs-item semantics.

A few small nits below, plus +1 to Nigel's open comments:
  • headroom (packing to 100% of context_length will overflow once the action and response are appended)
  • hasattr → getattr(..., None) is not None
  • a debug log on truncation
  • the collision guard in _build_table
  • the # 8B+ comment on Qwen3
  • the note that str() falls back to repr()
  • the reset() docstring update

Those are all real and worth taking before merge.
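The example below shows why the hasattr → getattr(..., None) is not None suggestion matters. It uses a hypothetical FakeBackend whose attribute exists but was never set:

```python
class FakeBackend:
    """Hypothetical object: the attribute exists, but nothing was ever bound to it."""

    model_id = None


backend = FakeBackend()

# hasattr only checks that the attribute exists, so an unset (None) value passes.
print(hasattr(backend, "model_id"))  # True

# getattr(..., None) is not None rejects both a None value and a missing attribute.
print(getattr(backend, "model_id", None) is not None)  # False
print(getattr(backend, "no_such_attr", None) is not None)  # False
```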

Two test-coverage gaps worth filling:

  • Boundary test for _as_list_token_budget where history fits exactly at token_budget — locks the > vs >= choice in the truncation condition (currently if spent + cost > token_budget: break, which correctly allows equality).
  • If Nigel's _build_table collision guard lands, add a test that two ModelIdentifiers sharing a name with mismatched context_length raises.
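Here is a sketch of the two tests, written as plain functions so they run without pytest. fits_within_budget reuses the truncation condition quoted above. build_table, its collision guard, and the model names are hypothetical illustrations of what such a guard might look like, not the merged code:

```python
from __future__ import annotations


def fits_within_budget(costs: list[int], token_budget: int) -> int:
    """Return how many of the newest items fit; costs are listed oldest-first."""
    spent = 0
    kept = 0
    for cost in reversed(costs):
        if spent + cost > token_budget:  # strict >, so an exact fit is kept
            break
        spent += cost
        kept += 1
    return kept


def build_table(entries: list[tuple[str, int]]) -> dict[str, int]:
    """Hypothetical collision-guarded table builder (not mellea's _build_table)."""
    table: dict[str, int] = {}
    for name, context_length in entries:
        if not name:
            continue
        existing = table.get(name)
        if existing is not None and existing != context_length:
            raise ValueError(
                f"{name!r} maps to conflicting context lengths: "
                f"{existing} vs {context_length}"
            )
        table[name] = context_length
    return table


def test_exact_fit_keeps_everything() -> None:
    assert fits_within_budget([3, 2, 5], token_budget=10) == 3  # 10 == 10 is allowed
    assert fits_within_budget([3, 2, 5], token_budget=9) == 2  # one token short


def test_name_collision_with_mismatched_length_raises() -> None:
    try:
        build_table([("model-a", 8_192), ("model-a", 4_096)])
    except ValueError:
        return
    raise AssertionError("expected ValueError for conflicting context lengths")


test_exact_fit_keeps_everything()
test_name_collision_with_mismatched_length_raises()
```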

One thing still outstanding from the previous round: please add a one-line callout in the PR description for the ollama_name="" → None change in HF_SMOLLM3_3B_no_ollama — it's incidental to the title but reviewers shouldn't have to grep to confirm it's safe.

Comment thread mellea/backends/context_lengths.py
Comment thread mellea/stdlib/context.py
@ajbozarth

Contributor

It might also be worth noting that #1264 adds a Backend._model_id string value that may or may not be useful in this change. I'm hoping to merge that PR later today.

@ajbozarth ajbozarth left a comment

Contributor

Re-review. #1 (Ollama truncation) is the one real blocker — silent correctness gap. The rest are worth considering but not blocking.

Comment thread mellea/backends/context_lengths.py
Comment thread mellea/stdlib/context.py Outdated
Comment thread mellea/stdlib/session.py
Comment thread mellea/stdlib/context.py
Comment thread test/stdlib/test_base_context.py Outdated
Comment thread mellea/stdlib/context.py Outdated
Comment thread mellea/stdlib/context.py

@ajbozarth ajbozarth left a comment

Contributor

Looks good — all the blockers from the previous round are addressed, plus the bonus new_instance() move to the base class is a nice cleanup. One small doc-accuracy nit inline.

Comment thread mellea/stdlib/context.py

@planetf1 planetf1 left a comment

Contributor

LGTM; consider Alex's nit.

@AngeloDanducci AngeloDanducci requested a review from ajbozarth June 17, 2026 19:00
Comment thread mellea/stdlib/context.py
Comment thread mellea/stdlib/context.py Outdated
Comment thread mellea/stdlib/context.py Outdated
Comment on lines +170 to +183
def _as_list_token_budget(self, token_budget: int) -> list[Component | CBlock]:
"""Return history items that fit within *token_budget*, dropping oldest first.

Walks the linked list from newest to oldest, accumulating items until
adding the next item would exceed the budget. The returned list is in
chronological order (oldest-first), matching `as_list` behaviour.
Token count per item is estimated as `len(str(item)) // 4`; note that
`str()` falls back to `repr()` for `Component` subclasses, so the
estimate reflects repr boilerplate rather than rendered content.

A headroom factor of 0.55 is applied to absorb repr-vs-render skew and
to leave capacity for the current action and the model's generated
response. This is a conservative approximation; a tokenizer-backed
estimate is a known follow-up.

Contributor

I don't think this is a good estimate. I have a few issues:

  1. we aren't accounting for default system prompts. I'm fine if we want to leave that in the wiggle room / headroom factor. However, we could:
    a. store known lengths of system prompts (this helps but isn't 100% accurate because of tools / docs that might get inserted).
    b. use a "dead-reckoning" system for backends that return prompt token lengths
    c. and/or, include these sources of variability in the note
  2. A factor of 0.55 seems like it might be off by a large margin. I'm not sure it's within the helpful range.
  3. You should probably be calling the TemplateFormatter class to get the exact rendered text (and therefore length) of the object. The template formatter is tied to the model id, so you should be able to either create a new one or grab it from a backend, depending on where this actually gets called.
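The sketch below folds points 1 and 2 into a single budget. reserved_system_tokens is a hypothetical parameter standing in for a known or dead-reckoned system-prompt length. The default of 0.55 is simply the headroom factor quoted in the docstring above, not a validated value:

```python
def token_budget(
    context_length: int,
    *,
    reserved_system_tokens: int = 0,
    headroom: float = 0.55,
) -> int:
    """Tokens left for history after applying headroom and reserving the system prompt.

    headroom < 1 leaves space for the current action, the generated response,
    and estimation error; reserved_system_tokens covers a default system prompt
    (plus any tools or documents) that the history walk does not see.
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    usable = int(context_length * headroom) - reserved_system_tokens
    return max(0, usable)


print(token_budget(131_072, reserved_system_tokens=1_000))  # 71089
```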

Comment thread mellea/stdlib/session.py Outdated

@ajbozarth ajbozarth left a comment

Contributor

Re-checked after Jake's round. All four of his points are addressed cleanly and tests/lint are green — the previous approval still stands. One small note inline.

Comment thread mellea/stdlib/context.py Outdated
while not current.is_root_node:
item = current.node_data
assert item is not None
cost = max(1, len(formatter.print(item)) // 4)

Contributor

Minor behavior shift worth noting: formatter.print(item) raises ValueError("could not find template candidate...") for Components without a registered template, where the previous str(item) always returned a repr fallback. In practice the same item would fail at actual generation anyway (same formatter), so this is a latency-of-failure change rather than a new failure mode — but a try/except ValueError here that falls back to len(str(item)) // 4 would keep view_for_generation() from blowing up earlier than the rest of the pipeline. Not blocking; flagging in case you want to tighten it.
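Here is a sketch of the suggested fallback. Formatter is a hypothetical Protocol standing in for the object behind formatter.print(item). The // 4 estimate mirrors the cost line quoted above:

```python
from typing import Protocol


class Formatter(Protocol):
    """Hypothetical protocol: anything exposing print(item) -> str."""

    def print(self, item: object) -> str: ...


def item_cost(item: object, formatter: Formatter) -> int:
    """Estimate token cost from the rendered text, falling back to str(item)."""
    try:
        rendered = formatter.print(item)
    except ValueError:
        # e.g. no registered template for this Component: use the old
        # str()-based estimate instead of failing inside view_for_generation().
        rendered = str(item)
    return max(1, len(rendered) // 4)


class NoTemplateFormatter:
    """Hypothetical formatter that, like the case described above, has no template."""

    def print(self, item: object) -> str:
        raise ValueError("could not find template candidate")


print(item_cost("abcdefgh", NoTemplateFormatter()))  # 2, via len("abcdefgh") // 4
```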

Signed-off-by: AngeloDanducci <angelo.danducci.ii@ibm.com>
Comment thread mellea/stdlib/context.py
Comment on lines +65 to +68
self,
*,
window_size: int | None = None,
model_id: str | ModelIdentifier | None = None,

Contributor

Should we allow passing in an arbitrary token context length limit? How would a user currently set an ad hoc limit, or override a given one?

Contributor Author

I can see the argument. Something like this precedence order?

1. window_size (item count) — wins if set
2. token_limit (explicit token limit) — wins over model-derived
3. Model-derived limit via get_context_length
4. No limit

I don't think anything is currently exposed to allow adjustment of ad hoc limits or overrides.
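The proposed precedence could be expressed as a small resolver. This is a sketch only:
- token_limit is the proposed, not-yet-existing parameter from the list.
- get_context_length is passed in as a callable because its real signature isn't shown here.
- Limit is a hypothetical return type.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Limit:
    """Hypothetical result type: which kind of limit applies, and its value."""

    kind: str  # "items", "tokens", or "none"
    value: int | None = None


def resolve_limit(
    window_size: int | None,
    token_limit: int | None,
    model_id: str | None,
    get_context_length: Callable[[str], int | None],
) -> Limit:
    # 1. An explicit item-count window wins outright.
    if window_size is not None:
        return Limit("items", window_size)
    # 2. An explicit token limit overrides anything derived from the model.
    if token_limit is not None:
        return Limit("tokens", token_limit)
    # 3. Fall back to the model's known context length, if any.
    if model_id is not None:
        derived = get_context_length(model_id)
        if derived is not None:
            return Limit("tokens", derived)
    # 4. No limit at all.
    return Limit("none")


lookup = {"model-a": 8_192}.get  # placeholder lookup table
print(resolve_limit(None, 2_048, "model-a", lookup))
```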

Contributor

Yes. I'll approve the PR as is, but can you either open a follow-up PR or create an issue for this feature, @AngeloDanducci?

Comment thread mellea/stdlib/context.py Outdated
fits within `context_length`. Set `window_size` explicitly to enforce
an item-count limit instead of a token budget.

Per-item token count is estimated as ``len(rendered) // 4`` where

Contributor

Sorry, I should have commented on this earlier. I didn't quite realize what the // 4 heuristic was doing. I think we should explain the choice here: it assumes that 1 token is roughly 4 characters.
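Here is a worked example of the heuristic as it might be documented. Roughly 4 characters per token is a common rule of thumb for English text. Real ratios vary by tokenizer and by content (code and non-English text usually differ), which is why this remains an estimate. The max(1, ...) floor mirrors the cost line quoted earlier in the thread:

```python
CHARS_PER_TOKEN = 4  # rule of thumb: roughly 4 characters of English text per token


def estimate_tokens(rendered: str) -> int:
    """Approximate token count as len(rendered) // 4, never less than 1."""
    return max(1, len(rendered) // CHARS_PER_TOKEN)


print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 44 chars -> 11
```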

Contributor Author

Will do.

Signed-off-by: AngeloDanducci <angelo.danducci.ii@ibm.com>

@planetf1 planetf1 left a comment

Contributor

A couple of notes inline. One broader item: docs/docs/concepts/context-and-sessions.md still describes only window_size and frames context overflow as unmitigated — a short paragraph covering the new model-aware auto-sizing path and the Ollama num_ctx caveat would round it out.

Comment thread mellea/stdlib/context.py Outdated
prev = current.previous_node
assert prev is not None
current = prev
dropped = total - len(collected)

Contributor

dropped here will always be 0 or 1, regardless of how many items were actually truncated — total only increments for items examined before the break, so when the loop exits total == len(collected) + 1 at most. A history of 7 items truncated to 3 logs "dropped 1 item(s)" rather than 4.

A clean fix: walk the full chain once up front, then compute the real count at the end:

    # before the while loop
    chain_length = 0
    node = self
    while not node.is_root_node:
        chain_length += 1
        node = node.previous_node  # type: ignore[assignment]

    # replace the dropped = … line
    dropped = chain_length - len(collected)
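Below is a self-contained reproduction of the miscount and the suggested fix. It uses a hypothetical Node class in place of the real Context chain, and character length as a stand-in for token cost. With 7 items and a budget of 3, it reproduces the "7 items truncated to 3" case from the comment:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical singly linked history node; the root carries no data."""

    data: str | None = None
    previous: Node | None = None

    @property
    def is_root(self) -> bool:
        return self.previous is None


def build_chain(items: list[str]) -> Node:
    head = Node()  # root node
    for item in items:
        head = Node(item, head)
    return head


def truncate(head: Node, budget: int) -> tuple[list[str], int, int]:
    """Return (kept items oldest-first, buggy dropped count, correct dropped count)."""
    # The fix: count the whole chain up front.
    chain_length = 0
    node = head
    while not node.is_root:
        chain_length += 1
        node = node.previous  # type: ignore[assignment]

    collected: list[str] = []
    total = 0  # only counts items examined before the break
    spent = 0
    current = head
    while not current.is_root:
        item = current.data
        if item is None:
            raise RuntimeError("Malformed chain: data is None at a non-root node")
        total += 1
        cost = len(item)
        if spent + cost > budget:
            break
        collected.append(item)
        spent += cost
        current = current.previous  # type: ignore[assignment]

    collected.reverse()
    buggy_dropped = total - len(collected)  # always 0 or 1
    dropped = chain_length - len(collected)
    return collected, buggy_dropped, dropped


print(*truncate(build_chain(list("abcdefg")), budget=3))  # ['e', 'f', 'g'] 1 4
```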

Contributor Author

Fixed, should be accurate now.

Comment thread mellea/stdlib/context.py Outdated
current: Context = self
while not current.is_root_node:
item = current.node_data
assert item is not None

Contributor

assert is silently stripped by python -O. A corrupted chain would then produce a confusing downstream error rather than a clear diagnostic here. A guard would be safer:

Suggested change (replacing assert item is not None):

    if item is None:  # pragma: no cover
        raise RuntimeError(
            "Malformed context chain: node_data is None at a non-root node"
        )
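The sketch below demonstrates the -O behavior the reviewer describes. It runs the same one-line assertion in a child interpreter, with and without optimization. It needs nothing beyond the standard library, and it assumes PYTHONOPTIMIZE is not already set in the environment:

```python
import subprocess
import sys

# The same one-liner, run with and without -O.
code = "assert False, 'guard fired'; print('assert was skipped')"

normal = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)
optimized = subprocess.run(
    [sys.executable, "-O", "-c", code], capture_output=True, text=True
)

print(normal.returncode)  # 1: the AssertionError was raised
print(optimized.stdout.strip())  # "assert was skipped": -O removed the assert
```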

Contributor Author

Added.

Comment thread mellea/stdlib/context.py Outdated
collected.append(item)
spent += cost
prev = current.previous_node
assert prev is not None

Contributor

Same as above — assert is a no-op under python -O:

Suggested change (replacing assert prev is not None):

    if prev is None:  # pragma: no cover
        raise RuntimeError(
            "Malformed context chain: previous_node is None at a non-root node"
        )

Contributor Author

Added.

Comment thread mellea/stdlib/session.py
invoke_hook(HookType.SESSION_RESET, payload, backend=self.backend)
)
self.ctx = self.ctx.reset_to_new()
self.ctx = self.ctx.new_instance()

Contributor

The docstring covers the new behaviour, but worth a note in the PR description too: reset() previously returned a bare ChatContext() with no config; it now preserves model_id, window_size, and token_context_length_limit. Callers who relied on getting a config-free context after reset will see different behaviour silently.

Contributor Author

Added a note to the PR description.

Signed-off-by: AngeloDanducci <angelo.danducci.ii@ibm.com>
@AngeloDanducci AngeloDanducci added this pull request to the merge queue Jun 24, 2026
Merged via the queue into generative-computing:main with commit 878c98d Jun 24, 2026
9 checks passed
@AngeloDanducci AngeloDanducci deleted the ad-108 branch June 24, 2026 15:59

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Model-aware sliding window context

4 participants