Derive pagination LIMIT/OFFSET from a SQL parser #286
The grid pagination rewriter read the user's LIMIT/OFFSET with a hand-rolled token scanner that only recognised the trailing `LIMIT <n> OFFSET <n>` shape. It mis-handled MySQL's `LIMIT <offset>, <count>`, `OFFSET` before `LIMIT`, and numeric expressions, producing wrong values or a malformed appended query. Parse the query with sqlparser using the driver's dialect and read LIMIT/OFFSET from the AST instead, normalising the MySQL comma form. The trailing clause is stripped at its keyword (consistent with what the parser saw) and the new pagination clause is rendered from a LimitClause AST node, then concatenated to the verbatim sliced base so leading comments, inline hints, and formatting are preserved. The token scanner is kept as a fallback for inputs the parser rejects, so behaviour never regresses. FETCH FIRST ... ROWS is out of scope and defers to the fallback. Builds on #275.
Code Review Summary
Status: No Issues Found | Recommendation: Merge | Files Reviewed: 21
Reviewed by kimi-k2.6-20260420 · 174,393 tokens
NewtTheWolf left a comment:
Validated the parser-based rewriter end-to-end against a seeded perf_demo (50k rows, MySQL + Postgres). The headline cases are solid — MySQL LIMIT <off>, <cnt>, OFFSET before LIMIT, backtick identifiers and inline leading hints all paginate correctly across pages. 👍
But I hit four issues where the rewriter mishandles inputs — three confirmed directly via the new "executed query" the UI shows. Requesting changes for the two correctness ones (CTE + the strip desync); the FETCH item is smaller.
Confirmed at runtime (executed query in brackets):
- CTEs are never paginated. `parse_pagination` parses the whole statement and matches `Statement::Query`, which already includes `WITH … SELECT`. But the call-site gate is still `is_select_query()`, a `starts_with("SELECT")` string check, so a CTE falls through to the non-paginated branch and is hard-truncated at `page_size` with no paging controls. The new parser could handle it; the gate blocks it. (all three drivers)
- `strip_at_limit_keyword` desyncs from the parser for non-numeric clauses. When the parser reports `has_limit_clause = true` but the clause isn't plain digits, both the token heuristic and the `strip_limit_offset` fallback fail to strip it, so pagination is appended to the un-stripped query:
  - `… LIMIT ALL` → `… LIMIT ALL LIMIT 501 OFFSET 0` → `syntax error at or near "LIMIT"`
  - `… LIMIT 2000 -- note` → `… LIMIT 2000 -- note LIMIT 501 OFFSET 0` → the `--` comments out the appended pagination, which is then silently dropped (the same query without the comment strips cleanly to `… LIMIT 501 OFFSET 0`).

  Root cause: once the parser has recognized the clause, stripping re-tokenizes heuristically instead of cutting at the span the parser already computed.
- `FETCH FIRST … ROWS` fallback produces a mixed clause. `… FETCH FIRST 5000 ROWS ONLY` → `… FETCH FIRST 5000 ROWS ONLY LIMIT 501 OFFSET 0` → DB error. Out of scope per the PR, but the inline comment claims it defers "rather than producing a mixed clause", and it does produce one. (suggestion inline)
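The root-cause fix for the strip desync amounts to slicing the original text at the position the parser reported, rather than re-scanning tokens. A minimal std-only sketch, assuming the parser's source span has already been converted into a byte offset (`clause_start`); `cut_at_clause_span` is a hypothetical name, not the PR's code:

```rust
/// Hypothetical helper (not the PR's actual code): rebuild the paginated
/// query by cutting at `clause_start`, the byte offset where the parser's
/// source span says the LIMIT/OFFSET clause begins. Nothing is re-tokenized.
fn cut_at_clause_span(query: &str, clause_start: usize, pagination: &str) -> String {
    // Keep the base verbatim; drop the old clause and everything after it,
    // including a trailing `-- comment` that would otherwise swallow the
    // appended pagination. Panics if `clause_start` is not a char boundary.
    let base = query[..clause_start].trim_end();
    format!("{base} {pagination}")
}

fn main() {
    // Assumed: the parser's span for `LIMIT` has been converted to byte 16.
    for q in ["SELECT * FROM t LIMIT 2000 -- note", "SELECT * FROM t LIMIT ALL"] {
        // Both print: SELECT * FROM t LIMIT 501 OFFSET 0
        println!("{}", cut_at_clause_span(q, 16, "LIMIT 501 OFFSET 0"));
    }
}
```

Because everything after `clause_start` is discarded, a trailing `FOR UPDATE` would be lost as well, which is why the follow-up commit defers queries with locking clauses to the fallback.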
Recommendations
- (1) Derive paginatability from the parse, not the string prefix: paginate iff the query parses to a single `Statement::Query`. This cleanly includes CTEs/`VALUES` and excludes `SHOW`/`EXPLAIN`/DDL. (Don't widen to `returns_result_set`, which would try to paginate `SHOW`/`EXPLAIN` and append an invalid `LIMIT`.)
- (2) When `has_limit_clause` is true, cut at the clause's source span from the parse instead of re-scanning tokens, so `LIMIT ALL`, trailing comments, etc. can't desync.
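Recommendation (1) can be sketched with a simplified, hypothetical `Statement` enum standing in for the parser's output (the real code matches on sqlparser's `Statement::Query`). The prefix check is only consulted when parsing fails; treating it as case-insensitive is an assumption of this sketch:

```rust
/// Simplified stand-in for the parser's statement kinds (hypothetical; the
/// real code matches sqlparser's `Statement::Query`).
enum Statement {
    Query, // SELECT, WITH ... SELECT, VALUES
    Show,
    Explain,
    CreateTable,
}

/// Paginate iff the input parses to exactly one query. Only when the parser
/// rejects the input do we fall back to the old prefix check (assumed here
/// to be case-insensitive).
fn is_paginatable(parsed: Result<Vec<Statement>, String>, raw: &str) -> bool {
    match parsed {
        Ok(stmts) => matches!(stmts.as_slice(), [Statement::Query]),
        Err(_) => raw.trim_start().to_ascii_uppercase().starts_with("SELECT"),
    }
}

fn main() {
    let cte = "WITH recent AS (SELECT 1) SELECT * FROM recent";
    println!("{}", is_paginatable(Ok(vec![Statement::Query]), cte)); // true
    println!("{}", is_paginatable(Ok(vec![Statement::Show]), "SHOW TABLES")); // false
    println!("{}", is_paginatable(Ok(vec![Statement::Explain]), "EXPLAIN SELECT 1")); // false
    println!("{}", is_paginatable(Ok(vec![Statement::CreateTable]), "CREATE TABLE t (a INT)")); // false
}
```

Matching on the slice pattern `[Statement::Query]` also rejects multi-statement input, which a prefix check cannot distinguish.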
Details inline.
… span

Addresses review feedback on #286:
- Paginate iff the query parses to a single `Statement::Query` instead of a `starts_with("SELECT")` check, so CTEs (`WITH ... SELECT`) and `VALUES` reach the parser-based rewriter while `SHOW`/`EXPLAIN`/DDL stay out. The prefix check remains as fallback when the parser rejects the input.
- Strip a parsed LIMIT/OFFSET clause by cutting at its AST source span rather than re-scanning tokens, so clauses the heuristics don't recognise (`LIMIT ALL`, trailing comments, non-literal expressions) can no longer desync and produce mixed clauses or silently dropped pagination. A bare `LIMIT ALL` is folded into "no clause" by sqlparser and is located textually; queries with locking clauses defer to the fallback so `FOR UPDATE` is never silently dropped.
- Make the heuristic tokenizer comment-aware so the fallback strip and extract scans cannot be shielded by trailing comments either.
- Fix the misleading FETCH comment: the fallback does produce a mixed clause for `FETCH FIRST` queries.
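A comment-aware, position-aware tokenizer of the kind this commit describes can be sketched in plain Rust. This is a hypothetical illustration, not the PR's tokenizer: it skips `--` and `/* */` comments, collapses parenthesised groups so inner `LIMIT`s stay invisible at the top level, and deliberately ignores string literals and quoted identifiers to stay short:

```rust
/// Hypothetical sketch, not the PR's tokenizer: returns `(byte_offset, WORD)`
/// for each top-level token. `-- line` and `/* block */` comments are skipped,
/// and a parenthesised group is collapsed into one "(...)" token so LIMITs
/// inside subqueries never appear at the top level. String literals and
/// quoted identifiers are not handled (simplifying assumption).
fn top_level_tokens(sql: &str) -> Vec<(usize, String)> {
    let bytes = sql.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let c = bytes[i];
        if c == b'-' && bytes.get(i + 1) == Some(&b'-') {
            // Line comment: skip to the end of the line.
            while i < bytes.len() && bytes[i] != b'\n' {
                i += 1;
            }
        } else if c == b'/' && bytes.get(i + 1) == Some(&b'*') {
            // Block comment: skip past the closing `*/` (or to end of input).
            i += 2;
            while i < bytes.len() && !(bytes[i] == b'*' && bytes.get(i + 1) == Some(&b'/')) {
                i += 1;
            }
            i = (i + 2).min(bytes.len());
        } else if c == b'(' {
            // Collapse the balanced group (depth-counted) into one token.
            let start = i;
            let mut depth = 0usize;
            while i < bytes.len() {
                match bytes[i] {
                    b'(' => depth += 1,
                    b')' => depth -= 1,
                    _ => {}
                }
                i += 1;
                if depth == 0 {
                    break;
                }
            }
            out.push((start, "(...)".to_string()));
        } else if c.is_ascii_whitespace() || c == b',' || c == b')' {
            // Separators (and a stray `)`) produce no token.
            i += 1;
        } else {
            // Plain word: runs until whitespace, a comma, or a parenthesis.
            let start = i;
            while i < bytes.len()
                && !bytes[i].is_ascii_whitespace()
                && !matches!(bytes[i], b',' | b'(' | b')')
            {
                i += 1;
            }
            out.push((start, sql[start..i].to_ascii_uppercase()));
        }
    }
    out
}

fn main() {
    let sql = "SELECT * FROM (SELECT * FROM t LIMIT 5) s LIMIT 10 -- LIMIT 99";
    // Only the outer LIMIT (byte 42) is reported; the subquery's LIMIT and
    // the one inside the trailing comment are not.
    for (pos, word) in top_level_tokens(sql) {
        println!("{pos:>3} {word}");
    }
}
```

Positions are byte offsets into the original string, so a caller can slice the verbatim base at a keyword's position without reserializing anything.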
NewtTheWolf left a comment:
Re-reviewed 3ccec110..d96ec81 — both correctness blockers resolved, excellent work @debba.
✅ CTEs (#1): `is_paginatable_query` now derives paginatability from the parse (single `Statement::Query`, fallback to the prefix check), and all three drivers gate on it. Verified on live Postgres: `WITH recent AS (…) SELECT * FROM recent` now paginates instead of being hard-truncated.
✅ Strip desync (#2): cutting at the clause's AST source span (`clause_start`) instead of re-tokenizing fixes `LIMIT ALL`, trailing `--` and `/* */` comments, and expression limits. Confirmed the old `… LIMIT ALL LIMIT 101 OFFSET 0` output errored on PG and the new `… LIMIT 101 OFFSET 0` is clean. The `LIMIT ALL` textual special-case (the parser folds it away) and the comment-skipping tokenizer are nicely handled.
👍 Bonus: deferring FOR UPDATE/SHARE to the fallback so a span-cut can't silently drop a lock.
drivers::common 79/79 green. The corrected comment on the FETCH FIRST mixed-clause is the honest call — one optional follow-up: for FETCH/locked queries it'd be strictly safer to skip auto-pagination than to append a clause the DB rejects. Non-blocking. LGTM 🚀
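The optional follow-up amounts to a three-way decision per query. A hypothetical sketch (the types and the `plan` function are illustrative, not from the codebase) of the reviewer's suggestion, where FETCH and locked queries skip auto-pagination instead of reaching a fallback that appends a clause the database rejects:

```rust
/// Hypothetical outcome of parsing, simplified.
enum ParseOutcome {
    SingleQuery,    // exactly one `Statement::Query`
    OtherStatement, // SHOW, EXPLAIN, DDL, multiple statements, ...
    Rejected,       // the parser failed on the input
}

/// Hypothetical facts gathered about a grid query before rewriting.
struct QueryFacts {
    parse: ParseOutcome,
    has_fetch_first: bool,
    has_locking_clause: bool, // FOR UPDATE / FOR SHARE
}

#[derive(Debug, PartialEq)]
enum Plan {
    Rewrite,  // parser-based rewrite: cut at the span, append LIMIT/OFFSET
    Fallback, // token-scanner fallback for inputs the parser rejects
    Skip,     // leave the query to the non-paginated path
}

/// Models the reviewer's optional follow-up: FETCH/locked queries are skipped
/// rather than deferred to a fallback that appends a clause the DB rejects.
fn plan(facts: &QueryFacts) -> Plan {
    if facts.has_fetch_first || facts.has_locking_clause {
        return Plan::Skip;
    }
    match facts.parse {
        ParseOutcome::SingleQuery => Plan::Rewrite,
        ParseOutcome::OtherStatement => Plan::Skip,
        ParseOutcome::Rejected => Plan::Fallback,
    }
}

fn main() {
    let locked = QueryFacts {
        parse: ParseOutcome::SingleQuery,
        has_fetch_first: false,
        has_locking_clause: true,
    };
    println!("{:?}", plan(&locked)); // Skip
}
```

Checking FETCH and locking first means neither the span cut nor the fallback ever sees those queries.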
Builds on #275.
Background
PR #275 fixed the grid pagination rewriter to fold the user's `OFFSET` into the per-page offset, but it did so by extending the hand-rolled token scanner in `src-tauri/src/drivers/common/query.rs`, which only recognises the trailing `… LIMIT <n> OFFSET <n>` shape.

That heuristic is fragile. It does not understand:
- MySQL's `LIMIT <offset>, <count>` syntax
- `OFFSET` before `LIMIT` (valid in Postgres)

In those cases the values are read wrong (or dropped) and the stripped base can be inconsistent with what was extracted, producing a malformed appended query.
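For context, folding the user's OFFSET into the per-page offset is simple arithmetic once the values are read correctly. A sketch with a hypothetical `page_window` helper (not a name from the codebase), which additionally assumes that the user's own LIMIT caps the total rows paged through; the text above only mentions folding the OFFSET:

```rust
/// Hypothetical sketch: compute the (limit, offset) for one grid page inside
/// the user's own LIMIT/OFFSET window. Returns None once the page lies
/// entirely past the user's LIMIT, or if the arithmetic would overflow.
fn page_window(
    user_limit: Option<u64>,
    user_offset: u64,
    page: u64,
    page_size: u64,
) -> Option<(u64, u64)> {
    let consumed = page.checked_mul(page_size)?; // rows shown on earlier pages
    let offset = user_offset.checked_add(consumed)?; // fold user OFFSET in
    let limit = match user_limit {
        None => page_size,
        Some(l) if consumed >= l => return None, // past the user's LIMIT
        Some(l) => page_size.min(l - consumed), // last page may be short
    };
    Some((limit, offset))
}

fn main() {
    // User query: ... LIMIT 1200 OFFSET 100, grid page size 500.
    for page in 0..4 {
        println!("page {page}: {:?}", page_window(Some(1200), 100, page, 500));
    }
}
```

With the user window `LIMIT 1200 OFFSET 100`, pages 0 to 2 come out as `(500, 100)`, `(500, 600)` and `(200, 1100)`, and page 3 is `None`. Reading the wrong `(limit, offset)` pair, as the heuristic does for the shapes listed above, shifts every one of these windows.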
The change
- Parse the query with the `sqlparser` crate (v0.62) and read `Query.limit_clause` from the AST, parsed with the correct per-driver dialect (`PaginationDialect::{MySql,Postgres,Sqlite}` threaded through `build_paginated_query`). MySQL's `LIMIT <offset>, <count>` is normalised to the same `(limit, offset)` shape.
- Strip the trailing clause at its `LIMIT`/`OFFSET` keyword (reusing the existing position-aware tokenizer, which collapses parenthesised subqueries so inner `LIMIT`s are never touched), consistent with what the parser saw.
- Render the new pagination clause from a `LimitClause` AST node and concatenate it to the verbatim sliced base, so leading comments, inline `/*+ hints */`, and the body's formatting are preserved (no full-query reserialization).
- Keep the token scanner as a fallback for inputs the parser rejects, so behaviour never regresses.
- `FETCH FIRST … ROWS` is out of scope and defers to the fallback.
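The normalisation in the first bullet can be illustrated with a simplified, hypothetical enum standing in for the parsed clause. It is not sqlparser's actual `LimitClause` type, whose exact shape may differ between versions. Both shapes collapse to `(limit, offset)`, and a plain `format!` stands in for rendering a `LimitClause` node onto the verbatim base:

```rust
/// Simplified, hypothetical stand-in for the parsed limit clause; the real
/// code reads sqlparser's `Query.limit_clause`, whose exact shape may differ.
enum ParsedLimit {
    /// `LIMIT <n> [OFFSET <m>]` or `OFFSET <m> [LIMIT <n>]`
    LimitOffset { limit: Option<u64>, offset: Option<u64> },
    /// MySQL `LIMIT <offset>, <count>`
    OffsetCommaLimit { offset: u64, limit: u64 },
}

/// Normalise either shape to `(limit, offset)`; a missing OFFSET means 0.
fn normalise(clause: &ParsedLimit) -> (Option<u64>, u64) {
    match *clause {
        ParsedLimit::LimitOffset { limit, offset } => (limit, offset.unwrap_or(0)),
        ParsedLimit::OffsetCommaLimit { offset, limit } => (Some(limit), offset),
    }
}

/// Append the new clause to the verbatim base (the PR renders a
/// `LimitClause` AST node instead of using `format!`).
fn render(base: &str, limit: u64, offset: u64) -> String {
    format!("{} LIMIT {limit} OFFSET {offset}", base.trim_end())
}

fn main() {
    // User wrote MySQL's `... LIMIT 20, 10`: skip 20 rows, return 10.
    let (limit, offset) = normalise(&ParsedLimit::OffsetCommaLimit { offset: 20, limit: 10 });
    println!("{:?} {}", limit, offset); // Some(10) 20
    println!("{}", render("SELECT * FROM t ", limit.unwrap_or(500), offset));
}
```

Rendering `LIMIT … OFFSET …` for all three dialects works because MySQL, Postgres, and SQLite all accept that form, so only the reading side has to understand the comma syntax.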
Tests
Existing `build_paginated_query` tests updated to pass a dialect; new cases added for the MySQL comma form (pages 1 & 2), `OFFSET` before `LIMIT`, backtick identifiers, inline-hint preservation, and the parse-error fallback.
`cargo test drivers::common`: 63 passed.