Derive pagination LIMIT/OFFSET from a SQL parser #286

Open
debba wants to merge 4 commits into main from feat/pagination-sql-parser

debba commented Jun 4, 2026
Collaborator

Builds on #275.

Background

PR #275 fixed the grid pagination rewriter to fold the user's OFFSET into the
per-page offset, but it did so by extending the hand-rolled token scanner in
src-tauri/src/drivers/common/query.rs, which only recognises the trailing
… LIMIT <n> OFFSET <n> shape.

That heuristic is fragile. It does not understand:

  • MySQL's LIMIT <offset>, <count> syntax
  • OFFSET before LIMIT (valid in Postgres)
  • numeric expressions / placeholders

In those cases the values are misread (or dropped), and the stripped base can be
inconsistent with what was extracted, so the appended pagination yields a malformed query.

The change

  • Add the sqlparser crate (v0.62) and read Query.limit_clause from the AST,
    parsed with the correct per-driver dialect (PaginationDialect::{MySql,Postgres,Sqlite}
    threaded through build_paginated_query). MySQL's LIMIT <offset>, <count> is
    normalised to the same (limit, offset) shape.
  • Strip the trailing clause at its LIMIT/OFFSET keyword (reusing the existing
    position-aware tokenizer, which collapses parenthesised subqueries so inner
    LIMITs are never touched), consistent with what the parser saw.
  • Render the new pagination clause from a LimitClause AST node and concatenate it
    to the verbatim sliced base, so leading comments, inline /*+ hints */, and
    the body's formatting are preserved (no full-query reserialization).
  • Keep the original token scanner as a fallback for inputs the parser rejects,
    so behaviour never regresses. FETCH FIRST … ROWS is out of scope and defers to
    the fallback.
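
As a rough illustration of the arithmetic involved (not the code in this PR), the sketch below normalises the two clause shapes to one `(limit, offset)` pair and folds the user's offset into the per-page offset. `UserClause`, `normalise`, and `page_window` are hypothetical names, and clamping the page to the rows remaining under the user's own LIMIT is an assumption about the intended behaviour.

```rust
/// Hypothetical stand-in for the two clause shapes discussed above. The real
/// rewriter reads these from the parser's AST, not from this enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum UserClause {
    /// `LIMIT <limit> OFFSET <offset>`, in either order; each part may be absent.
    LimitOffset { limit: Option<u64>, offset: Option<u64> },
    /// MySQL's `LIMIT <offset>, <count>`.
    OffsetCommaLimit { offset: u64, limit: u64 },
}

/// Normalise both shapes to `(limit, offset)`; a missing offset counts as 0.
fn normalise(clause: UserClause) -> (Option<u64>, u64) {
    match clause {
        UserClause::LimitOffset { limit, offset } => (limit, offset.unwrap_or(0)),
        UserClause::OffsetCommaLimit { offset, limit } => (Some(limit), offset),
    }
}

/// Compute the `(limit, offset)` to render for a 1-based `page`.
/// Assumption: a page never reaches past the user's own LIMIT.
fn page_window(user: (Option<u64>, u64), page: u64, page_size: u64) -> (u64, u64) {
    let (user_limit, user_offset) = user;
    // Rows already shown on earlier pages of the user's result set.
    let skipped = page.saturating_sub(1).saturating_mul(page_size);
    let offset = user_offset.saturating_add(skipped);
    let limit = match user_limit {
        Some(l) => page_size.min(l.saturating_sub(skipped)),
        None => page_size,
    };
    (limit, offset)
}
```

With a page size of 10, a MySQL `LIMIT 10, 25` normalises to `(Some(25), 10)`; page 1 then renders as `LIMIT 10 OFFSET 10` and page 3 as `LIMIT 5 OFFSET 30`, since only 5 of the user's 25 rows remain.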

Tests

Existing build_paginated_query tests updated to pass a dialect; new cases added for
the MySQL comma form (pages 1 & 2), OFFSET before LIMIT, backtick identifiers,
inline-hint preservation, and the parse-error fallback. cargo test drivers::common
— 63 passed.

debba added 3 commits June 4, 2026 14:18
The grid pagination rewriter read the user's LIMIT/OFFSET with a
hand-rolled token scanner that only recognised the trailing
`LIMIT <n> OFFSET <n>` shape. It mis-handled MySQL's
`LIMIT <offset>, <count>`, `OFFSET` before `LIMIT`, and numeric
expressions, producing wrong values or a malformed appended query.

Parse the query with sqlparser using the driver's dialect and read
LIMIT/OFFSET from the AST instead, normalising the MySQL comma form.
The trailing clause is stripped at its keyword (consistent with what
the parser saw) and the new pagination clause is rendered from a
LimitClause AST node, then concatenated to the verbatim sliced base so
leading comments, inline hints, and formatting are preserved. The
token scanner is kept as a fallback for inputs the parser rejects, so
behaviour never regresses. FETCH FIRST ... ROWS is out of scope and
defers to the fallback.

Builds on #275.

kilo-code-bot Bot commented Jun 4, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (21 files)
  • src-tauri/Cargo.lock - dependency additions (sqlparser, recursive, stacker, etc.) and version bump to 0.13.0
  • src-tauri/Cargo.toml - adds sqlparser = "0.62" dependency
  • src-tauri/src/drivers/common.rs - re-exports new symbols
  • src-tauri/src/drivers/common/query.rs - AST-based pagination rewriter with dialect-aware parsing and fallback heuristics
  • src-tauri/src/drivers/common/tests.rs - updated existing tests + new coverage for MySQL comma syntax, offset-before-limit, backticks, hints, and parse-error fallback
  • src-tauri/src/drivers/mysql/mod.rs - wires MySQL dialect and error annotation
  • src-tauri/src/drivers/postgres/mod.rs - wires Postgres dialect and error annotation
  • src-tauri/src/drivers/sqlite/mod.rs - wires SQLite dialect and error annotation
  • src/components/notebook/SqlCell.tsx - passes originalQuery to result component
  • src/components/notebook/SqlCellResult.tsx - accepts and forwards originalQuery
  • src/components/ui/ErrorDisplay.tsx - displays original/executed query via collapsible panels
  • src/components/ui/ResultEntryContent.tsx - passes entry.query to ErrorDisplay
  • src/pages/Editor.tsx - passes activeTab.query to ErrorDisplay
  • src/i18n/locales/{de,en,es,fr,it,ja,ru,zh}.json - adds show/hide query translation keys

Reviewed by kimi-k2.6-20260420 · 174,393 tokens

debba requested a review from NewtTheWolf June 8, 2026 17:31

NewtTheWolf left a comment

Collaborator

Validated the parser-based rewriter end-to-end against a seeded perf_demo (50k rows, MySQL + Postgres). The headline cases are solid — MySQL LIMIT <off>, <cnt>, OFFSET before LIMIT, backtick identifiers and inline leading hints all paginate correctly across pages. 👍

But I hit four issues where the rewriter mishandles inputs — three confirmed directly via the new "executed query" the UI shows. Requesting changes for the two correctness ones (CTE + the strip desync); the FETCH item is smaller.

Confirmed at runtime (executed query in brackets):

  1. CTEs are never paginated. parse_pagination parses the whole statement and matches Statement::Query, which already includes WITH … SELECT. But the call-site gate is still is_select_query() — a starts_with("SELECT") string check — so a CTE falls through to the non-paginated branch and is hard-truncated at page_size with no paging controls. The new parser could handle it; the gate blocks it. (all three drivers)

  2. strip_at_limit_keyword desyncs from the parser for non-numeric clauses. When the parser reports has_limit_clause = true but the clause isn't plain digits, both the token heuristic and the strip_limit_offset fallback fail to strip it, so pagination is appended to the un-stripped query:

    • … LIMIT ALL → … LIMIT ALL LIMIT 501 OFFSET 0 → syntax error at or near "LIMIT"
    • … LIMIT 2000 -- note → … LIMIT 2000 -- note LIMIT 501 OFFSET 0 → the -- comments out the appended pagination, which is then silently dropped (the same query without the comment strips cleanly to … LIMIT 501 OFFSET 0).

    Root cause: once the parser has recognized the clause, stripping re-tokenizes heuristically instead of cutting at the span the parser already computed.

  3. FETCH FIRST … ROWS fallback produces a mixed clause. … FETCH FIRST 5000 ROWS ONLY → … FETCH FIRST 5000 ROWS ONLY LIMIT 501 OFFSET 0 → DB error. Out of scope per the PR, but the inline comment claims it defers "rather than producing a mixed clause", and it does produce one. (suggestion inline)
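
To make item 1 concrete, here is a minimal sketch of a prefix gate. It assumes the check is roughly a case-insensitive `starts_with("SELECT")` after trimming leading whitespace; `is_select_query_sketch` is a hypothetical name and not the actual `is_select_query` implementation.

```rust
/// Assumed shape of the call-site gate from item 1: a case-insensitive prefix
/// check. Hypothetical; the real `is_select_query` may differ in detail.
fn is_select_query_sketch(sql: &str) -> bool {
    sql.trim_start().to_ascii_uppercase().starts_with("SELECT")
}
```

Under that assumption, `WITH recent AS (SELECT 1) SELECT * FROM recent` and `VALUES (1), (2)` are both rejected by the gate even though each is a single query that returns rows, which is why they never reach the parser-based rewriter.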

Recommendations

  • (1) Derive paginatability from the parse, not the string prefix: paginate iff the query parses to a single Statement::Query — cleanly includes CTEs/VALUES, excludes SHOW/EXPLAIN/DDL. (Don't widen to returns_result_set, which would try to paginate SHOW/EXPLAIN and append an invalid LIMIT.)
  • (2) When has_limit_clause is true, cut at the clause's source span from the parse instead of re-scanning tokens, so LIMIT ALL, trailing comments, etc. can't desync.
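
A minimal sketch of recommendation (2), with assumptions stated: `clause_start` is taken to be a byte offset for the start of the trailing LIMIT/OFFSET clause, converted from whatever location the parser reports. Both function names are hypothetical, and `append_only` only reproduces the failure mode from item 2 for comparison.

```rust
/// Failure mode from item 2: the user's clause was not stripped, so the
/// pagination is appended after it (and after any trailing comment).
fn append_only(sql: &str, limit: u64, offset: u64) -> String {
    format!("{} LIMIT {} OFFSET {}", sql.trim_end(), limit, offset)
}

/// Span-based rewrite: keep everything before `clause_start` verbatim and
/// append the new clause. `clause_start` is an assumed byte offset derived
/// from the parse. Returns `None` if it is out of range or not on a char
/// boundary, so a caller could defer to a fallback.
fn cut_at_span(sql: &str, clause_start: usize, limit: u64, offset: u64) -> Option<String> {
    let base = sql.get(..clause_start)?;
    Some(format!("{} LIMIT {} OFFSET {}", base.trim_end(), limit, offset))
}
```

For `SELECT * FROM t LIMIT 2000 -- note`, `append_only` leaves the new clause behind the `--` comment, while `cut_at_span` at the position of `LIMIT` yields `SELECT * FROM t LIMIT 501 OFFSET 0`. The trade-off is that anything after the cut point, including the trailing comment, is discarded.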

Details inline.

Comment thread src-tauri/src/drivers/common/query.rs
Comment thread src-tauri/src/drivers/common/query.rs Outdated
Comment thread src-tauri/src/drivers/postgres/mod.rs
… span

Addresses review feedback on #286:

- Paginate iff the query parses to a single Statement::Query instead of
  a starts_with("SELECT") check, so CTEs (WITH ... SELECT) and VALUES
  reach the parser-based rewriter while SHOW/EXPLAIN/DDL stay out. The
  prefix check remains as fallback when the parser rejects the input.
- Strip a parsed LIMIT/OFFSET clause by cutting at its AST source span
  rather than re-scanning tokens, so clauses the heuristics don't
  recognise (LIMIT ALL, trailing comments, non-literal expressions)
  can no longer desync and produce mixed clauses or silently dropped
  pagination. A bare LIMIT ALL is folded into "no clause" by sqlparser
  and is located textually; queries with locking clauses defer to the
  fallback so FOR UPDATE is never silently dropped.
- Make the heuristic tokenizer comment-aware so the fallback strip and
  extract scans cannot be shielded by trailing comments either.
- Fix the misleading FETCH comment: the fallback does produce a mixed
  clause for FETCH FIRST queries.

NewtTheWolf left a comment

Collaborator

Re-reviewed 3ccec110..d96ec81 — both correctness blockers resolved, excellent work @debba.

CTEs (#1): is_paginatable_query now derives paginatability from the parse (single Statement::Query, fallback to the prefix check), and all three drivers gate on it. Verified on live Postgres: WITH recent AS (…) SELECT * FROM recent now paginates instead of being hard-truncated.

Strip desync (#2): cutting at the clause's AST source span (clause_start) instead of re-tokenizing fixes LIMIT ALL, trailing -- and /* */ comments, and expression limits. Confirmed the old … LIMIT ALL LIMIT 101 OFFSET 0 output errored on PG and the new … LIMIT 101 OFFSET 0 is clean. The LIMIT ALL textual special-case (parser folds it away) and the comment-skipping tokenizer are nicely handled.

👍 Bonus: deferring FOR UPDATE/SHARE to the fallback so a span-cut can't silently drop a lock.

drivers::common 79/79 green. The corrected comment on the FETCH FIRST mixed-clause is the honest call — one optional follow-up: for FETCH/locked queries it'd be strictly safer to skip auto-pagination than to append a clause the DB rejects. Non-blocking. LGTM 🚀

2 participants