[refactor](catalog) Catalog spi 07 paimon#64446

Draft
morningman wants to merge 47 commits into apache:branch-catalog-spi from morningman:catalog-spi-07-paimon

Conversation

@morningman

Contributor

No description provided.

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman force-pushed the branch-catalog-spi branch from b8d6426 to f09b6df on June 12, 2026 14:22
morningman and others added 28 commits June 12, 2026 22:23
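This session is research + design only. 14-agent code-grounded recon + cross-cut
adversarial re-review, covering the legacy-framework implementation of paimon's
5 functional areas (normal read / system tables / procedure / DDL / mtmv) ->
mapping onto the new catalog SPI -> aligning interface consistency with the
maxcompute connector.

Added:
- research/p5-paimon-migration-recon.md: legacy implementation of the 5 areas +
  E1–E10 SPI status + cross-cutting risks + 11 MC-consistency conventions +
  test baseline
- tasks/P5-paimon-migration.md: old->new mapping + 30 TODOs / batches B0–B9 +
  batch dependency graph + acceptance criteria

User-signed decisions:
- D-037 (P5-D1): flavor = single Catalog + createCatalog flavor switch
  (consistent with MC; no backend modules are built, because the 5 backend
  modules are empty shells)
- D-038 (P5-D2): the MTMV/MVCC bridge is implemented within P5 (fe-core
  PaimonPluginDrivenExternalTable); the cutover is gated on it, and a silent
  read-latest regression is forbidden

Three prior assumptions falsified: the backend modules are empty shells (the
connector goes through a single-Catalog stub) / the FE dispatch part is already
pre-wired (remaining = connector listPartitions) / Base64 is not a blocker (BE
has an STD fallback). Procedure area = zero migratable code, doc-only.

Doc sync: connectors/paimon.md (fixes 3 stale statements), decisions-log.md
(+D-037/D-038, 36->38), PROGRESS.md (header/§1/§2/§3/§4/§6/§7), HANDOFF.md
(overwritten, no folded history retained).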

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
T01: extract PaimonCatalogOps injection seam (5 read methods, B0 read-only)
over the paimon SDK Catalog; refactor PaimonConnectorMetadata to inject it
(6 call sites migrated, read path byte-for-byte unchanged); build the first
fe-connector-paimon test module (no-mockito recording fake, mirroring MC's
McStructureHelper): 9 metadata UTs pinning the databaseExists try/catch and
the getColumnHandles reload-fallback, FakePaimonTable (fail-loud on non-read
methods), and an env-gated live connectivity smoke.

T02: R-007 paimon.version 3-way pin invariant comment (FE connector + BE
paimon-scanner + preload-extensions already aligned at 1.3.1 via the single
fe/pom.xml property); offline FE->BE serialized-Table round-trip smoke (real
FileSystemCatalog -> connector encode -> BE-mirrored URL-first/STD-fallback
decode, asserts rowType/partition/primary keys); parity-baseline doc
inventorying the 41 existing regression suites as the after-cutover parity
gate plus the real connector-side gaps and the live-e2e hard gate.
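
The URL-first/STD-fallback decode that the T02 round-trip smoke mirrors can be sketched as follows. This is a minimal illustration using only `java.util.Base64`; the class name `SerializedTableDecoder` is hypothetical and is not taken from the connector or the BE scanner.

```java
import java.util.Base64;

final class SerializedTableDecoder {
    private SerializedTableDecoder() {}

    /** Tries the URL-safe Base64 alphabet first, then falls back to the standard one. */
    static byte[] decode(String encoded) {
        try {
            return Base64.getUrlDecoder().decode(encoded);
        } catch (IllegalArgumentException urlFailure) {
            // '+' and '/' are illegal in the URL-safe alphabet, so retry with the standard decoder.
            return Base64.getDecoder().decode(encoded);
        }
    }
}
```

With this shape, a payload produced by either encoder decodes to the same bytes, which is why the encoding choice on the FE side is not a blocker.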

Connector module: Tests run: 12, Failures: 0, Errors: 0, Skipped: 1 (the
skip is the env-gated live test); checkstyle 0; import-gate clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Single-Catalog flavor switch on paimon.catalog.type for all five flavors
(filesystem/hms/rest/jdbc/dlf), mirroring the legacy fe-core flavor
properties without importing fe-core/fe-common.

- New PaimonCatalogFactory: pure validate() + buildCatalogOptions()
  (paimon.catalog.type -> paimon `metastore` opt, per-flavor options,
  paimon.* passthrough excl storage prefixes) + buildHadoopConfiguration /
  buildHmsHiveConf / buildDlfHiveConf + requireOssStorageForDlf.
- PaimonConnector: thread ConnectorContext; createCatalog wires all 5
  flavors live (filesystem/jdbc with Hadoop Configuration, rest
  Options-only, hms/dlf with HiveConf), each wrapped in
  context.executeAuthenticated (Kerberos seam). JDBC DriverShim ported with
  driver-url resolution via getEnvironment() (replaces forbidden JdbcResource).
- PaimonConnectorProperties: all flavor key constants (multi-alias String[]).
- PaimonConnectorProvider: validateProperties override -> factory.validate.
- pom: add paimon-hive-connector-3.1 + hadoop-common + hive-common
  (hive-common over hive-catalog-shade to avoid the fastutil conflict).
- 31 new no-mockito unit tests (PaimonCatalogFactoryTest); module 43/0/0/1,
  checkstyle 0, import-gate clean.

hms/dlf live connection is gated on B7 cutover + live-e2e: the Thrift
metastore client is host-provided (not bundled) with a child-first
Configuration/HiveConf cross-loader hazard to verify; jdbc driver_url FE
security allow-list + external hive-site.xml file load are deferred. All
documented in code NOTEs and plan-doc. rest also requires warehouse
(legacy parity).
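
A rough sketch of the single-Catalog flavor switch described above. The class `FlavorSwitchSketch`, the `warehouse` key name, the `filesystem` default, and the mapping of both `hms` and `dlf` to paimon's `hive` metastore are assumptions made for illustration; the real PaimonCatalogFactory may differ.

```java
import java.util.Locale;
import java.util.Map;

final class FlavorSwitchSketch {
    private FlavorSwitchSketch() {}

    /** Validates the flavor and returns the value to use for paimon's `metastore` option. */
    static String metastoreFor(Map<String, String> props) {
        String warehouse = props.get("warehouse"); // assumed key name
        if (warehouse == null || warehouse.isBlank()) {
            throw new IllegalArgumentException("warehouse is required for every flavor");
        }
        // Assumption: filesystem is the default flavor when the key is absent.
        String flavor = props.getOrDefault("paimon.catalog.type", "filesystem")
                .toLowerCase(Locale.ROOT);
        switch (flavor) {
            case "filesystem":
                return "filesystem";
            case "hms":
            case "dlf":
                return "hive"; // assumption: both are served by a Hive-style metastore client
            case "rest":
                return "rest";
            case "jdbc":
                return "jdbc";
            default:
                throw new IllegalArgumentException("Unsupported paimon.catalog.type: " + flavor);
        }
    }
}
```

Keeping the validation pure (a map in, a string or an exception out) is what lets it be unit-tested without a live catalog.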

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Connector-side only; no fe-core / fe-connector-api / fe-connector-spi changes.
B2 and B3 were both uncommitted and are entangled in the same files
(PaimonConnectorMetadata, PaimonCatalogOps, PaimonConnector,
RecordingPaimonCatalogOps), so they are committed together.

B2 normal-read (T06-T10):
- T06 PaimonScanPlanProvider transient-Table reload fallback (planScan +
  getScanNodeProperties both guarded)
- T07 PaimonPredicateConverter parity-correct TZ (NTZ keeps UTC, LTZ not
  pushed) + supportsCastPredicatePushdown=false
- T08 listPartitionNames/listPartitions/listPartitionValues (legacy
  display-name parity) + seam listPartitions(Identifier)
- T09 doc-only pure-predicate pruning; T10 cache deferred to B8

B3 DDL metadata (T11-T15):
- T11 PaimonTypeMapping.toPaimonType (Doris->paimon, byte-parity with legacy
  DorisToPaimonTypeVisitor; narrow gap preserved)
- T12 PaimonSchemaBuilder (ConnectorCreateTableRequest -> paimon Schema)
- T13 createTable/dropTable + seam DDL methods + ConnectorContext threaded
  (D7=B: each DDL op wrapped in executeAuthenticated; read path un-wrapped)
- T14 supportsCreateDatabase/createDatabase (HMS-props gate) +
  dropDatabase(force) (enumerate-loop + native cascade)
- T15 offline UTs (no-mockito; WHY+MUTATION)

Verified: fe-connector-paimon Tests run: 96, Failures: 0, Errors: 0,
Skipped: 1 (live); checkstyle 0; connector import-gate 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Port paimon system tables and MVCC snapshots onto the plugin connector SPI.

- T16: greenfield E7 SPI on ConnectorTableOps — listSupportedSysTables +
  getSysTableHandle (default no-ops; MC/jdbc/es/trino unaffected).
- T17: PaimonConnectorMetadata implements E7 — names from
  SystemTableLoader.SYSTEM_TABLES; sys table loaded via the existing
  getTable seam with a 4-arg Identifier(db,table,"main",sysName); sys
  handle carries sysTableName + forceJni (binlog/audit_log); shared
  PaimonTableResolver gives metadata + scan one sys-aware reload rule.
- T18: generic fe-core glue — PluginDrivenExternalTable centralizes handle
  acquisition into resolveConnectorTableHandle and delegates
  getSupportedSysTables to the connector; new PluginDrivenSysExternalTable
  (reports PLUGIN_EXTERNAL_TABLE) + PluginDrivenSysTable reuse the live
  SysTableResolver/NativeSysTable machinery (reusable by future connectors).
- T19: forceJni gate so binlog/audit_log go JNI not native; buildTableDescriptor
  -> HIVE_TABLE (also fixes a latent normal-table SCHEMA_TABLE descriptor gap,
  DV-024); PluginDrivenScanNode fail-loud guard rejects scan-params/time-travel
  on system tables.
- T20: first E5 MVCC consumer — beginQuerySnapshot/getSnapshotAt/getSnapshotById
  (empty table -> -1; sys handle -> empty) + SUPPORTS_MVCC_SNAPSHOT/TIME_TRAVEL
  capabilities. Inert until B5 wires the fe-core MvccTable consumer.

Decisions: D-039 (E7 reuses the live SysTable machinery; RFC §10's
$-suffix-via-getTableHandle design was never implemented and is superseded,
DV-023). Deviations: DV-023, DV-024.

Verification: import-gate 0; connector 124 tests pass (1 live skipped);
fe-core PluginDriven*Test 100 pass; checkstyle 0; no cutover/B5 leakage
(paimon not in SPI_READY_TYPES; PluginDrivenExternalTable still not an MvccTable).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ridge + time-travel + procedure doc no-op

B5a (MTMV/MVCC bridge): source-agnostic PluginDrivenMvccExternalTable (MTMVRelatedTableIf+MTMVBaseTableIf+MvccTable, D-042) wiring the B4-inert E5 snapshot SPI; PluginDrivenMvccSnapshot; list-partitions-at-snapshot.
B5b (time-travel): scan-pin + AS-OF + tag + branch + @incr across connector (ConnectorTimeTravelSpec, PaimonIncrementalScanParams) and fe-core; holistic review fixes RD-1 (partitioned time-travel empty-universe scan-all guard in PluginDrivenScanNode) + RD-2 (@incr lists-latest partitions/schema).
B6/T26: procedure doc no-op — zero migratable code; closed-form reject verified (ExecuteActionFactory:59-62 / CallFunc:42-43).
All inert/gated until B7 cutover (paimon NOT yet in SPI_READY_TYPES). Excludes regression-conf.groovy (secrets) + scratch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eview fixes

Combines all previously-uncommitted P5 paimon work into one commit (per request).

8 fullpath-review fixes (BLOCKERs + key MAJORs) — connector + SPI + fe-core bridge:
- FIX-STORAGE-CREDS: applyStorageConfig translates canonical s3.*/oss.*/AWS_* ->
  fs.s3a./fs.oss. (+DLF region->OSS endpoint)
- FIX-NATIVE-PARTVAL: per-type serializePartitionValue + session TZ (LTZ only);
  binary/varbinary drops the partition map (no [B@hash garbage)
- FIX-TZ-ALIAS: full legacy ZoneId.SHORT_IDS + 4 Doris overrides alias map
  (CST/PST/EST now resolve for FOR TIME AS OF datetime strings)
- FIX-TABLE-STATS: getTableStatistics override + PaimonCatalogOps.rowCount seam
  (normal AND system tables, via the sys-aware resolveTable)
- FIX-CPP-READER: honor enable_paimon_cpp_reader -> native DataSplit.serialize so
  BE's PaimonCppReader can decode the split
- FIX-READ-NOTNULL: mapFields forces read-path columns nullable (legacy parity)
- FIX-HMS-CONFRES: new ConnectorContext.loadHiveConfResources hook + 2-arg
  buildHmsHiveConf file-base merge (external hive-site.xml reaches the metastore)
- FIX-REST-VENDED: new ConnectorContext.vendStorageCredentials hook + scan-props
  vended AWS_* overlay (REST per-table tokens reach BE)
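
To illustrate FIX-TZ-ALIAS from the list above: `ZoneId.of(String, Map)` accepts an alias map, so a copy of `ZoneId.SHORT_IDS` with a few overrides layered on top resolves short names such as CST or PST. The four override entries below are an assumption (the commit message does not list them), and `TimeZoneAliasSketch` is a hypothetical name.

```java
import java.time.ZoneId;
import java.util.HashMap;
import java.util.Map;

final class TimeZoneAliasSketch {
    // Start from the JDK's short-id table, then layer overrides on top.
    private static final Map<String, String> ALIASES = new HashMap<>(ZoneId.SHORT_IDS);

    static {
        // Assumed overrides, for illustration only.
        ALIASES.put("CST", "Asia/Shanghai");
        ALIASES.put("PRC", "Asia/Shanghai");
        ALIASES.put("UTC", "UTC");
        ALIASES.put("GMT", "UTC");
    }

    private TimeZoneAliasSketch() {}

    /** Resolves an alias through the map; ids not in the map are parsed as regular zone ids. */
    static ZoneId resolve(String id) {
        return ZoneId.of(id, ALIASES);
    }
}
```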

Also carries the previously-uncommitted B7 core cutover + D-045/D-046 restores.

Tests: fe-connector-paimon 213 pass / 0 fail / 1 skip (live-gated); fe-core compiles +
DefaultConnectorContextVendTest 2/0. Each fix's root-cause/patch/UT and impl-time
corrections are in plan-doc/tasks/designs/P5-fix-<id>-design.md.

Excluded from this commit: regression-test/conf/regression-conf.groovy (plaintext Aliyun
keys, pending scrub) and scratch dirs (.audit-scratch/, conf.cmy/, META-INF/, *.bak).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…canonical scheme

Root cause: the paimon connector sent native ORC/Parquet data-file paths and
deletion-vector (DV) paths to BE un-normalized. The paimon SDK emits
warehouse-native schemes (oss://, cos://, obs://, s3a://, or the OSS
bucket.endpoint authority form); BE's scheme-dispatched S3 file factory only
recognizes s3://. On S3-compatible (non-AWS) warehouses this breaks native reads
outright (B-7DF, data file) and silently drops the DV so DELETEd rows reappear
(B-7DV, merge-on-read corruption). Legacy PaimonScanNode normalized both via the
2-arg LocationPath.of; the cutover dropped it. The two paths reach BE via
different mechanisms (data-file through PluginDrivenSplit's single-arg
LocationPath.of -> FileQueryScanNode:568; DV baked into thrift by the connector's
populateRangeParams), so a fe-core-bridge-only fix cannot reach the DV path.

Solution: new ConnectorContext.normalizeStorageUri SPI hook (identity default,
mirroring vendStorageCredentials), implemented in DefaultConnectorContext via the
engine's 2-arg normalizing LocationPath.of with the catalog's static storage map
(threaded via a new lazy supplier + 4-arg ctor; PluginDrivenExternalCatalog wires
it). The connector routes BOTH the data-file and DV paths through it inside the
extracted, unit-testable buildNativeRange. JNI path untouched (carries its own
FileIO). Fail-loud on un-normalizable paths (legacy parity). Static-vs-vended map
scope noted in DV-025 (the pure-vended edge belongs to credential fixes #2/#3).
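
The scheme rewrite at the heart of this fix can be sketched as below. This is a simplification under stated assumptions: `StorageUriSketch` is a hypothetical name, it knows only the object-store schemes named above, it does not handle the OSS bucket.endpoint authority form, and the null/blank pass-through is a guess. The real hook resolves paths against the catalog's storage map.

```java
import java.util.Locale;
import java.util.Set;

final class StorageUriSketch {
    private static final Set<String> S3_COMPATIBLE = Set.of("s3", "s3a", "oss", "cos", "obs");

    private StorageUriSketch() {}

    /** Rewrites an S3-compatible scheme to s3:// and fails loudly on anything else. */
    static String normalize(String uri) {
        if (uri == null || uri.isBlank()) {
            return uri; // assumption: nothing to normalize
        }
        int sep = uri.indexOf("://");
        if (sep <= 0) {
            throw new IllegalArgumentException("No scheme in storage URI: " + uri);
        }
        String scheme = uri.substring(0, sep).toLowerCase(Locale.ROOT);
        if (!S3_COMPATIBLE.contains(scheme)) {
            throw new IllegalArgumentException("Cannot normalize scheme: " + scheme);
        }
        return "s3" + uri.substring(sep);
    }
}
```

Both the data-file path and the deletion-vector path would pass through the same function, so the two cannot drift apart.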

Tests: fe-core DefaultConnectorContextNormalizeUriTest (oss->s3, s3 idempotent,
null/blank, empty-map fail-loud); connector PaimonScanPlanProviderTest x3 (both
paths normalized + call count, DV-less, no-context raw). paimon module 216/0/0,
fe-core targeted green, checkstyle 0, import-gate clean. Live OSS+DV e2e CI-gated
(not run). SPI RFC section 21 (E13), deviations DV-025.

Also includes the round-2 review report + task list this fix derives from.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mark FIX-URI-NORMALIZE complete (commit 20b19d1) in the task list and update
HANDOFF: #1 summary + verification, next session starts at #2 (reuse the
normalizeStorageUri BE-scan-prop normalization seam), and the standing reminders
(regression-conf.groovy still holds a plaintext key -> path-whitelist only; P2
#8/#9 need user scope decision first).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…canonical AWS_*

Finding B-9 (BLOCKER, rereview2). The paimon connector copied static
catalog-level storage credentials/config verbatim into the BE scan-node
properties: PaimonScanPlanProvider.getScanNodeProperties iterated the raw
catalog properties and emitted location.<rawkey> for any s3./oss./cos./obs./
hadoop./fs./dfs./hive. prefix; the fe-core bridge only strips the location.
prefix. BE's native (FILE_S3) reader understands ONLY AWS_ACCESS_KEY/
AWS_SECRET_KEY/AWS_ENDPOINT/AWS_REGION/AWS_TOKEN, so static s3.access_key/
oss.access_key on a private bucket reached BE unintelligible -> no usable
credentials -> 403. This is the third credential seam (static->BE-scan),
missed by both the prior round and the 8 fixes (review §9.3); the catalog-
FileIO seam (FIX-STORAGE-CREDS) and the vended seam (FIX-REST-VENDED) were
already closed.

Root cause: legacy PaimonScanNode.getLocationProperties returns only
CredentialUtils.getBackendPropertiesFromStorageMap(storagePropertiesMap) (the
canonical AWS_*/hadoop/dfs map). The cutover replaced that single normalized
call with a raw prefix-copy loop; the connector cannot import fe-core's
StorageProperties so it had no access to the normalization.

Solution (D-048, user-signed full legacy-parity scope): new no-op-default SPI
ConnectorContext.getBackendStorageProperties(); DefaultConnectorContext returns
getBackendPropertiesFromStorageMap over the storagePropertiesSupplier already
wired in FIX-URI-NORMALIZE (no ctor change, CredentialUtils already imported).
The connector replaces its raw prefix-copy loop with a context-gated overlay of
that map; the vended overlay stays after it (vended wins on collision, legacy
precedence). Object-store creds -> AWS_*; HDFS -> canonical hadoop/dfs
(preserves user overrides + adds the legacy defaults, folding in the §211
MINOR); drops the non-parity hive.* passthrough. Investigated the
AWS_CREDENTIALS_PROVIDER_TYPE=ANONYMOUS two-step edge and confirmed via BE
s3_util.cpp (both providers prefer explicit ak/sk over cred_provider_type) that
it is harmless — no regression. Connector import-gate stays clean.
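
A minimal sketch of the overlay order described above: static catalog credentials are first translated to the canonical AWS_* keys, then the vended map is applied last so it wins on collision. `BackendCredsSketch` and its alias table are hypothetical; only `s3.access_key`/`oss.access_key` and the five AWS_* names come from the text, and the other raw suffixes are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BackendCredsSketch {
    // Assumed suffix table; only "access_key" is named in the commit message.
    private static final Map<String, String> SUFFIX_TO_AWS = Map.of(
            "access_key", "AWS_ACCESS_KEY",
            "secret_key", "AWS_SECRET_KEY",
            "endpoint", "AWS_ENDPOINT",
            "region", "AWS_REGION",
            "session_token", "AWS_TOKEN");

    private BackendCredsSketch() {}

    static Map<String, String> toBackendProps(Map<String, String> catalogProps,
                                              Map<String, String> vended) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : catalogProps.entrySet()) {
            String key = e.getKey();
            int dot = key.indexOf('.');
            if (dot <= 0) {
                continue;
            }
            String prefix = key.substring(0, dot);
            if (!prefix.equals("s3") && !prefix.equals("oss")) {
                continue; // raw keys are never copied verbatim
            }
            String canonical = SUFFIX_TO_AWS.get(key.substring(dot + 1));
            if (canonical != null) {
                out.put(canonical, e.getValue());
            }
        }
        out.putAll(vended); // vended credentials win on collision
        return out;
    }
}
```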

Tests: fe-core DefaultConnectorContextBackendStoragePropsTest (OSS static creds
-> AWS_*, raw alias absent; no-supplier -> empty); connector
PaimonScanPlanProviderTest (+getScanNodePropertiesNormalizesStaticCreds raw
alias not shipped; modified vended-overlay collision to canonical keys; renamed
no-context test -> emits no storage props). Fail-before/pass-after proven by
reverting the connector change (2/3 go red). Module 217/0/0 (1 CI-gated skip),
checkstyle clean, import-gate clean. Live private-bucket native-read e2e is
CI-gated (not run). SPI RFC §22 (E14).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Record FIX-STATIC-CREDS-BE commit d23d5df in the task-list and update
HANDOFF.md (HEAD, migration chain, completed/next sections). Next: #3
FIX-SCHEMA-EVOLUTION (B-1a+M-10) — the largest P0 SPI surface, independent of
#1/#2; recommend a fresh session.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ema_info from the connector

Root cause (rereview2 BLOCKER B-1a): on the native (ORC/Parquet) read path the
paimon connector emitted only the per-file TPaimonFileDesc.schema_id but never
set the scan-level TFileScanRangeParams.current_schema_id / history_schema_info.
BE (table_schema_change_helper.h:219-237) then took the !__isset branch and fell
back to NAME-based file<->table column matching, so a schema-evolved (renamed /
reordered) table read NULL/garbage for the renamed columns silently. JNI path is
unaffected; native is the default. (M-10, Column.uniqueId=-1, deferred — DV-026.)

Design C (user-signed D-049): BE's field-id matcher (table_schema_change_helper
.cpp:312-430) reads only TField.id/name and a nested-vs-scalar type.type tag — no
Doris Type, no tuple descriptor — and org.apache.doris.thrift.* is import-legal in
connectors, so the connector builds the TSchema dictionary directly from paimon
SchemaManager and ships it via the existing populateScanLevelParams hook (the seam
DV-006 anticipated for hudi). Zero new SPI surface; connector-only.
  - current_schema_id = -1; history_schema_info = the -1/current (pinned) schema +
    one entry per SchemaManager.listAllIds() so every native file schema_id is
    covered (BE fails loud on a missing entry, never silent).
  - transport: base64 TBinaryProtocol carrier (a throwaway TFileScanRangeParams)
    via a props key, because getScanPlanProvider() is per-call (no shared state).

Clean-room 3-lens review found 2 real BLOCKERs in the -1/current entry (both fixed
+ re-verified): (1) column-name casing — BE keys the table-side StructNode by the
-1 entry's name verbatim while the native reader queries the lowercase Doris slot
name, and current_schema_id=-1 never hits the ConstNode fast-path, so a mixed-case
column crashed (std::out_of_range) even on never-evolved tables; fix lowercases
ONLY top-level names (default-locale, matching the slot-name producer + legacy
parseSchema:507; nested stays paimon-cased per legacy PaimonUtil:302). (2) time
travel — the -1 entry used schemaManager.latest() (absolute latest) instead of the
snapshot-pinned schema the tuple uses; fix builds it from FileStoreTable.schema()
(pinned) and narrows the guard DataTable->FileStoreTable. Eager all-schemas read
accepted as a fail-loud deviation (DV-027).
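
Why field-id matching matters after a rename can be shown with a toy model. The sketch below is not the BE matcher; `FieldIdMatchSketch` and its map-based schemas (field id -> column name) are hypothetical stand-ins for the TSchema dictionary.

```java
import java.util.HashMap;
import java.util.Map;

final class FieldIdMatchSketch {
    private FieldIdMatchSketch() {}

    /** Maps each table column to the file column that stores it, matching by field id. */
    static Map<String, String> matchById(Map<Integer, String> tableFields,
                                         Map<Integer, String> fileFields) {
        Map<String, String> tableToFile = new HashMap<>();
        for (Map.Entry<Integer, String> e : tableFields.entrySet()) {
            String fileName = fileFields.get(e.getKey());
            if (fileName != null) {
                tableToFile.put(e.getValue(), fileName);
            }
        }
        return tableToFile;
    }

    /** Name-based fallback: a renamed column finds no partner and would read as NULL. */
    static Map<String, String> matchByName(Map<Integer, String> tableFields,
                                           Map<Integer, String> fileFields) {
        Map<String, String> tableToFile = new HashMap<>();
        for (String tableName : tableFields.values()) {
            if (fileFields.containsValue(tableName)) {
                tableToFile.put(tableName, tableName);
            }
        }
        return tableToFile;
    }
}
```

For a file written with columns {1: id, 2: name} and a table later renamed to {1: id, 2: full_name}, matching by id still pairs full_name with name, while matching by name leaves full_name unmatched.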

Tests: PaimonScanPlanProviderTest +5 (field-id/name carriage, nested ARRAY/MAP/
STRUCT shape + struct-child ids, scalar tag, rename round-trip apply, top-level
lowercase vs nested paimon-case, non-FileStoreTable skip). Module 222/0/0 (1
CI-gated skip), checkstyle clean, import-gate clean. e2e
test_paimon_full_schema_change.groovy is CI-gated (not run). Design doc + D-049 +
DV-026/DV-027 + SPI RFC §23 (no new SPI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…at CREATE (B-8a + B-8b)

rereview2 #4. JDBC-metastore-flavor paimon catalogs only. Connector-only, zero new SPI.

Root cause:
- B-8a (functional BLOCKER): PaimonScanPlanProvider.getBackendPaimonOptions forwarded
  driver_url to BE RAW and its `key.startsWith("jdbc.")` filter dropped the `paimon.jdbc.*`
  alias. A bare `jdbc.driver_url=mysql.jar` reached BE, where JdbcDriverUtils.registerDriver
  does `new URL(value)` -> MalformedURLException; a `paimon.jdbc.driver_url` alias was dropped
  outright. Legacy PaimonJdbcMetaStoreProperties.getBackendPaimonOptions emits
  `jdbc.driver_url=JdbcResource.getFullDriverUrl(driverUrl)` (resolved) + `jdbc.driver_class`.
- B-8b (security): driver_url was loaded into the FE JVM (URLClassLoader) and shipped to BE
  with no format / jdbc_driver_url_white_list / jdbc_driver_secure_path validation, plus a
  stale "paimon is not in SPI_READY_TYPES" disclaimer (false since the B7 cutover added paimon
  to CatalogFactory SPI_READY_TYPES).

Solution (reuses existing hooks; no new SPI surface):
- B-8a: getBackendPaimonOptions now reads driver_url via firstNonBlank(JDBC_DRIVER_URL) (honors
  both the jdbc.* and paimon.jdbc.* alias) and emits the canonical `jdbc.driver_url` RESOLVED to
  a scheme-bearing URL plus `jdbc.driver_class` (BE accepts both alias forms). Resolution is
  extracted to a shared static PaimonCatalogFactory.resolveDriverUrl(driverUrl, env) so FE driver
  registration and the BE-bound options resolve a given driver_url identically.
- B-8b: PaimonConnector overrides Connector.preCreateValidation to route a configured driver_url
  (either alias) through ConnectorValidationContext.validateAndResolveDriverPath at CREATE CATALOG
  (format/whitelist/secure-path; throws -> CREATE fails before the jar loads). Mirrors
  JdbcDorisConnector. Stale disclaimer replaced with an accurate note.
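
The B-8a resolution step can be sketched roughly as follows. `DriverUrlSketch`, the `driversDir` parameter, and the alias order passed to `firstNonBlank` are assumptions for illustration; the actual PaimonCatalogFactory.resolveDriverUrl takes an environment argument whose shape is not shown here.

```java
import java.util.Map;

final class DriverUrlSketch {
    private DriverUrlSketch() {}

    /** Returns the first non-blank value among the given alias keys, or null if none is set. */
    static String firstNonBlank(Map<String, String> props, String... keys) {
        for (String key : keys) {
            String value = props.get(key);
            if (value != null && !value.isBlank()) {
                return value.trim();
            }
        }
        return null;
    }

    /** Leaves scheme-bearing URLs alone; turns a bare jar name into a file:// URL. */
    static String resolve(String driverUrl, String driversDir) {
        if (driverUrl.contains("://")) {
            return driverUrl;
        }
        return "file://" + driversDir + "/" + driverUrl;
    }
}
```

Because both the FE driver registration and the BE-bound options would call the same `resolve`, a bare `mysql.jar` ends up as a parseable URL on both sides.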

Scope (user-signed D-050; see DV-028/DV-029): validation is CREATE-time only — parity with the
JDBC reference connector. The FE-restart-reload / ALTER-CATALOG / scan-time re-validation gap is a
pre-existing fe-core limitation shared by all plugin connectors (default config is permissive);
accepted, with a cross-connector follow-up filed. BE-side paimon.jdbc.{user,password,uri} alias-drop
is out of scope (BE deserializes the table from serialized_table; only driver_url/driver_class are
consumed by registerDriverIfNeeded).

Tests: PaimonScanPlanProviderTest +5 (resolve bare name, honor paimon.jdbc.* alias, both-aliases
priority+override, preserve scheme-bearing, non-jdbc empty); new PaimonConnectorPreCreateValidationTest
+5 (validate jdbc/alias, skip non-jdbc/no-driver_url, propagate rejection). Module 232/0/0 (1 CI-gated
skip); fail-before verified (5/9 new tests red when neutered); checkstyle 0; connector import-gate clean.
Live e2e (JDBC flavor + remote jar) is CI-gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#4 FIX-JDBC-DRIVER-URL committed as 2d15b1b (P0 BLOCKERs now all clear).
Fill the #4 task-list commit cell; rewrite HANDOFF to point at #5 (M-crit,
re-verify the dotted-vs-underscore type-mapping key facts before coding).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aimon type-mapping toggles

Root cause: after the SPI cutover the paimon connector reads the type-mapping
toggles from UNDERSCORE keys (enable_mapping_binary_as_varbinary /
enable_mapping_timestamp_tz; PaimonConnectorProperties:39,42 ->
PaimonConnectorMetadata.buildTypeMappingOptions), but fe-core only ever writes
the canonical DOTTED catalog keys (enable.mapping.varbinary /
enable.mapping.timestamp_tz; CatalogProperty:50,52, written/defaulted by
ExternalCatalog.setDefaultPropsIfMissing and hidden via HIDDEN_PROPERTIES).
PluginDrivenExternalCatalog.createConnectorFromProperties hands the connector
the raw catalog property map verbatim, so getOrDefault(underscore,"false") is
always false. Even when the user enables the mapping at CREATE CATALOG, Paimon
BINARY stays STRING and TIMESTAMP_WITH_LOCAL_TIME_ZONE stays DATETIMEV2 — a
silent cutover regression (legacy PaimonExternalTable:350 reads the dotted key
and honors it). The binary key is doubly drifted (separator . -> _ AND token
varbinary -> binary_as_varbinary), so a generic dot->underscore normalizer
would not fix it. Latent until the flag is enabled.

Re-confirmation: M-crit was critic-surfaced (not 3-lens-gated), so the finding
was independently re-verified by a 5-agent scout + adversarial synthesizer
(REAL_BUG, high confidence; false-positive steelman rejected — dotted is
canonical per the original feature PRs, every regression CREATE CATALOG, legacy
parity, and the JDBC connector which kept dotted in the same SPI PR).

Solution (connector-only, zero new SPI, no BE): re-point the two
PaimonConnectorProperties constants to the canonical dotted keys
(ENABLE_MAPPING_VARBINARY = "enable.mapping.varbinary", renamed from
ENABLE_MAPPING_BINARY_AS_VARBINARY to match the CatalogProperty/JDBC/iceberg
convention and fix both separator and token; ENABLE_MAPPING_TIMESTAMP_TZ =
"enable.mapping.timestamp_tz") and update the one reference in
PaimonConnectorMetadata. No logic change — the Options(mapBinaryToVarbinary,
mapTimestampTz) arg order is already correct. BE-side consistency verified:
PluginDrivenScanNode extends FileQueryScanNode and inherits the dotted-key read
for the BE scan param (FileQueryScanNode:192-193,635-678), so FE column type
and BE scan param now agree (they diverged before this fix).
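
The failure mode is easy to reproduce in isolation. In the sketch below (the class name `MappingKeySketch` is hypothetical; the two key strings are the ones quoted above), the catalog map carries only the dotted key, so a lookup on the underscore key always falls through to the default.

```java
import java.util.Map;

final class MappingKeySketch {
    static final String DOTTED = "enable.mapping.varbinary";
    static final String UNDERSCORE = "enable_mapping_binary_as_varbinary";

    private MappingKeySketch() {}

    /** Reads a boolean toggle the same way the text describes: getOrDefault(key, "false"). */
    static boolean readFlag(Map<String, String> catalogProps, String key) {
        return Boolean.parseBoolean(catalogProps.getOrDefault(key, "false"));
    }
}
```

Given a map containing only `enable.mapping.varbinary=true`, `readFlag` returns false for the underscore key and true for the dotted key, which is the regression and the fix in miniature.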

Scope: paimon-only (user-signed D-051). NEW hive + iceberg connectors share the
identical root cause; logged as a cross-connector follow-up (DV-030), not fixed
here. Rejected an fe-core dot->underscore normalizer (broader blast, breaks
JDBC which already reads dotted, and insufficient for paimon's renamed token).

Tests (PaimonConnectorMetadataTest): +2 UT. getTableSchemaHonorsDottedMappingKeys
(bug-catcher) sets the dotted keys true and asserts BINARY->VARBINARY /
LTZ->TIMESTAMPTZ; getTableSchemaDefaultsMappingFlagsOff (guard) asserts the
default-off STRING/DATETIMEV2. Module 234/0/0 (1 CI-gated skip), checkstyle 0,
import-gate clean. Fail-before verified: the bug-catcher reddens on the
underscore key (expected <VARBINARY> but was <STRING>) while the guard stays
green. E2E test_paimon_catalog_{varbinary,timestamp_tz}.groovy are CI-gated
(enablePaimonTest=false + external fixture) — not run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ROS-DOAS

- task-list #5 commit-cell filled with 9dcf6d1
- HANDOFF rewritten: #5 summary + #6 next (two scope questions for the user)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… all read RPCs in doAs (M-11)

Both are Kerberos-only (harmless on simple-auth: the no-op authenticator's
execute() == task.call()).

Root cause
- M-8 (fe-core): paimon filesystem/jdbc catalogs over Kerberized HDFS lost UGI
  doAs on the cutover path. The HDFS HadoopExecutionAuthenticator is built only
  inside initializeCatalog(), which is dead on the plugin path (only legacy
  PaimonExternalCatalog calls it), so PluginDrivenExternalCatalog read the base
  no-op from getExecutionAuthenticator(). HMS was unaffected — it wires the
  authenticator in initNormalizeAndCheckProps(), which always runs.
- M-11 (connector): metadata read RPCs (listDatabases/getDatabase/listTables/
  getTable[handle+sys+resolveTable]/listPartitions) ran without
  executeAuthenticated; only the 4 DDL ops were wrapped (signed D7=B read-vs-DDL
  asymmetry). On a Kerberos HMS catalog these reads ran outside the catalog
  principal. Legacy wrapped every read.

Fix
- M-8 (filesystem+jdbc only; DLF/REST/HMS excluded — DLF uses Aliyun STS not
  Kerberos, the review's "DLF" clause was overstated): new internal fe-core hook
  MetastoreProperties.initExecutionAuthenticator(List<StorageProperties>) (default
  no-op), invoked by PluginDrivenExternalCatalog.initPreExecutionAuthenticator from
  the already-built storage list; filesystem/jdbc override it to build the HDFS
  authenticator (shared AbstractPaimonProperties helper), mirroring HMS. No
  connector change; no connector SPI change.
- M-11 (full legacy parity, signed D-052, supersedes the D7=B read clause): wrap
  all 7 connector read RPCs in context.executeAuthenticated. A single resolveTable
  wrap covers all resolveTable callers (metadata + scan). Domain exceptions are
  caught INSIDE the lambda because Kerberos UGI.doAs wraps a thrown checked
  Catalog.*NotExistException in UndeclaredThrowableException.
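
Why the domain exception has to be caught inside the lambda can be shown without Hadoop. In this sketch, `executeAuthenticated` is a stand-in that mimics the wrapping behavior described above, and `DatabaseNotExistException`, `getDatabase`, and `AuthWrapSketch` are hypothetical names rather than the paimon or connector types.

```java
import java.lang.reflect.UndeclaredThrowableException;
import java.util.Set;
import java.util.concurrent.Callable;

final class AuthWrapSketch {
    static final class DatabaseNotExistException extends Exception {
        DatabaseNotExistException(String db) {
            super("Database not found: " + db);
        }
    }

    private AuthWrapSketch() {}

    /** Stand-in for an authenticator whose doAs wraps unexpected checked exceptions. */
    static <T> T executeAuthenticated(Callable<T> task) {
        try {
            return task.call();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new UndeclaredThrowableException(e);
        }
    }

    static void getDatabase(Set<String> known, String db) throws DatabaseNotExistException {
        if (!known.contains(db)) {
            throw new DatabaseNotExistException(db);
        }
    }

    /** The checked exception is handled inside the lambda, before any wrapping can occur. */
    static boolean databaseExists(Set<String> known, String db) {
        return executeAuthenticated(() -> {
            try {
                getDatabase(known, db);
                return true;
            } catch (DatabaseNotExistException e) {
                return false;
            }
        });
    }
}
```

If the try/catch sat outside `executeAuthenticated` instead, a `catch (DatabaseNotExistException e)` there would never match, because the caller would see the wrapper type.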

Tests
- M-11: PaimonConnectorMetadataReadAuthTest (12) + 2 scan-path tests assert each
  read runs inside executeAuthenticated (RecordingConnectorContext failAuth/
  authCount). Connector module 248/0/0 (1 CI-gated skip).
- M-8: Paimon{FileSystem,Jdbc}MetaStorePropertiesTest assert getExecutionAuthenticator()
  returns HadoopExecutionAuthenticator after wiring without initializeCatalog;
  fe-core metastore-props 21/0/0 (DLF/HMS regression-clean).
- fail-before verified red for both (M-8: stays base no-op AbstractPaimonProperties$1;
  M-11: authCount/log-empty).
- True end-to-end doAs is live-Kerberos-e2e only (no paimon-kerberos suite); DV-031.

Decisions D-052 (M-11) / D-053 (M-8); deviation DV-031; design
plan-doc/tasks/designs/P5-fix-KERBEROS-DOAS-design.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-SCANNER

#6 fix commit = 2b1442f. Fill task-list commit cell; roll HANDOFF to #7.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aimon connector scan path (M-1)

Root cause: the cutover (plugin) connector's split router read only the
name-derived handle flag paimonHandle.isForceJni() (the binlog/audit_log
NAME hatch) and never consulted the session var force_jni_scanner, so
ORC/Parquet always took the native reader — legacy's JNI escape hatch
(SET force_jni_scanner=true, used to dodge native-reader bugs incl. the
B2 schema-evolution class) was silently gone. The connector ported only
two of legacy's three native-gate conjuncts (PaimonScanNode.java:430:
!forceJniScanner && !forceJniForSystemTable && supportNativeReader); the
dropped !forceJniScanner conjunct is M-1.

Solution (pure connector; no SPI, no fe-core import, no BE param — legacy
serializes nothing for this var):
- new isForceJniScannerEnabled(session): byte-for-byte mirror of
  isCppReaderEnabled, reads key "force_jni_scanner" (byte-identical to
  SessionVariable.FORCE_JNI_SCANNER) from the same VariableMgr.toMap
  channel; null-guarded, default false (legacy default).
- Site A (correctness): shouldUseNativeReader gains an explicit
  forceJniScanner param (mirrors legacy's sibling boolean 1:1) ANDed into
  the native gate; planScan passes isForceJniScannerEnabled(session). The
  handle name-force is OR-sibling, never replaced (binlog/audit_log intact).
- Site B (correctness-neutral): getScanNodeProperties suppresses the
  native-only paimon.schema_evolution dict when force_jni_scanner routes
  every split to JNI (BE consumes it only on native ORC/Parquet ranges;
  JNI/cpp readers ignore it). Matches the connector's own documented contract.
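
The three-conjunct native gate and the session lookup can be sketched as below. `NativeGateSketch` is a hypothetical name and the session is modeled as a plain `Map<String, String>`, which is an assumption about the channel's shape; the key string and the boolean expression follow the text above.

```java
import java.util.Map;

final class NativeGateSketch {
    private NativeGateSketch() {}

    /** Null-guarded read of the session variable; absent or unparsable means false. */
    static boolean isForceJniScannerEnabled(Map<String, String> sessionProps) {
        if (sessionProps == null) {
            return false;
        }
        return Boolean.parseBoolean(sessionProps.get("force_jni_scanner"));
    }

    /** All three conjuncts must hold for a split to take the native reader. */
    static boolean shouldUseNativeReader(boolean forceJniScanner,
                                         boolean forceJniForSystemTable,
                                         boolean supportNativeReader) {
        return !forceJniScanner && !forceJniForSystemTable && supportNativeReader;
    }
}
```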

Tests (fail-before + pass-after both verified):
- isForceJniScannerEnabledReadsSessionProperty: pins the exact key,
  default-false, null-safety.
- forceJniScannerRoutesNativeEligibleSplitToJni: a native-eligible split
  must route to JNI when force_jni_scanner=true (legacy parity).
- 3 existing shouldUseNativeReader calls updated for the new param.
- Module 250/0/0 (+1 CI-gated live skip); connector import-gate + checkstyle clean.
- Real BE reader selection is a CI-gated live-e2e check (no offline coverage).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-COUNT-PUSHDOWN (P2, ask scope first)

- task-list: #7 row → ✅ design/impl/build(250/0/0)/commit `05132a42668` + DONE detail.
- HANDOFF: #7 summary (3rd-param overrides synthesizer call-site-OR per Rule 9;
  Site B correctness-neutral, no offline red test honestly noted); next = #8/#9
  P2 perf-parity → AskUserQuestion for scope (accept-or-defer) BEFORE implementing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…(*) on plugin paimon (M-2)

Root cause: after cutover, COUNT(*) over a plugin-driven paimon table is
result-correct but slow. The COUNT enum already reaches BE
(FileScanNode.toThrift:90; PhysicalPlanTranslator:873 sets it on the plugin
node, not excluded) and the per-range emit seam is already built
(PaimonScanRange.Builder.rowCount -> paimon.row_count -> setTableLevelRowCount,
byte-identical to legacy PaimonScanNode:303-308). The missing half is the
signal + compute: DataSplit.mergedRowCount() is paimon-SDK-only (connector),
and the getPushDownAggNoGroupingOp()==COUNT signal lives only on the fe-core
node and reached nobody. So every split carried table_level_row_count=-1 and
BE materialized the full post-merge row set just to count (file_scanner.cpp:
1298-1326) — costly on PK/MOR tables.

Not pure-connector: the signal must cross the SPI boundary. Threading it via
ConnectorSession (the FIX-FORCE-JNI precedent) was rejected — the agg-op is a
per-query planner output, not a SET-variable, and would be a silent untyped
channel.

Solution (3 files; user signed off, D-054):
- SPI (ConnectorScanPlanProvider): new default planScan overload carrying
  `boolean countPushdown`, delegating to the 6-arg variant — mirrors the
  limit/requiredPartitions extension chain; other connectors are no-op (E15).
- fe-core (PluginDrivenScanNode.getSplits): read
  getPushDownAggNoGroupingOp()==TPushAggOp.COUNT and forward the flag. No
  post-loop math.
- connector (PaimonScanPlanProvider): extract planScanInternal(...,countPushdown)
  (4-arg delegates false, new 7-arg delegates the flag); add the count
  short-circuit as the FIRST routing arm (a count-eligible split must not also
  emit a data range, else BE double-counts vs deletion vectors / PK merge);
  collapse-to-one — sum every count-eligible split's mergedRowCount and emit ONE
  JNI count range bearing the total (= legacy's <=10000 singletonList +
  assignCountToSplits case). New members: static isCountPushdownSplit + buildCountRange.
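
A minimal sketch of the collapse-to-one routing described above, under stated assumptions: `SplitSketch`, `CountPushdownSketch` and `Plan` are hypothetical stand-ins (not the paimon `DataSplit` API or the real `PaimonScanPlanProvider`), and an empty `OptionalLong` models a split with no precomputed merged count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical stand-in for a planned split; NOT the paimon DataSplit API. */
final class SplitSketch {
    final String name;
    final OptionalLong mergedRowCount; // empty = no precomputed merged count

    SplitSketch(String name, OptionalLong mergedRowCount) {
        this.name = name;
        this.mergedRowCount = mergedRowCount;
    }
}

final class CountPushdownSketch {
    /** Planning result: at most one collapsed count range plus remaining data ranges. */
    static final class Plan {
        final long countTotal;         // -1 means no count range was emitted
        final List<String> dataRanges; // splits that still need a real read

        Plan(long countTotal, List<String> dataRanges) {
            this.countTotal = countTotal;
            this.dataRanges = dataRanges;
        }
    }

    static Plan plan(List<SplitSketch> splits, boolean countPushdown) {
        long countSum = 0;
        boolean anyCounted = false;
        List<String> dataRanges = new ArrayList<>();
        for (SplitSketch split : splits) {
            // Count arm goes FIRST: a count-eligible split must not also emit a data range.
            if (countPushdown && split.mergedRowCount.isPresent()) {
                countSum += split.mergedRowCount.getAsLong();
                anyCounted = true;
                continue;
            }
            dataRanges.add(split.name);
        }
        return new Plan(anyCounted ? countSum : -1, dataRanges);
    }
}
```

With per-partition counts 2 and 3 the single count range carries 5; with the flag off every split stays a data range and no count is emitted.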

Param shape = boolean (BE only needs COUNT-vs-not), scope = paimon-only
(default no-op). Legacy's >10000 parallel-split trim is intentionally dropped
(connector has no numBackends, an fe-core-only concern) — perf-only divergence,
result identical (DV-032). No new thrift, no BE change.

Tests: connector PaimonScanPlanProviderTest +2 — isCountPushdownSplit eligibility
on a real split (true/2, disabled/false); end-to-end planScan over a PARTITIONED
PK table with asymmetric per-partition counts (2 + 3) asserting collapse-to-one
carrying the SUM (5, unreachable from any single split) and no row_count when the
flag is off. Connector 252/0/0 (1 CI-gated live skip), fe-core compile + checkstyle
0, import-gate clean. Fail-before verified: neuter isCountPushdownSplit->false ->
the count tests red; mutate `countSum +=` -> `=` -> the cross-split-sum assertion
red. Real BE CountReader selection / EXPLAIN = CI-gated live-e2e (existing legacy
paimon count regression covers the BE contract).

Adversarially reviewed (workflow wf_6ead7c2c-b58): one MAJOR caught and fixed
(the collapse/sum test was degenerate on a single-split fixture); two MINORs
refuted (batch-path signal moot for paimon; EXPLAIN count-line drop is cosmetic,
noted in DV-032).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…files for read parallelism (M-3)

Root cause: after cutover, a large native (ORC/Parquet) paimon data file gets
ONE scanner — no intra-file parallelism. The connector's native arm emitted
exactly one PaimonScanRange per RawFile (start=0, length=file.length()). Legacy
PaimonScanNode:434-465 sub-splits each large file via determineTargetFileSplitSize
+ fileSplitter.splitFile. Result is correct (BE reads the whole file either way);
only read parallelism regresses.

Recon (wf_ad764bf6-1c9) confirmed: it is a real gap (ORC/Parquet are
PLAIN/splittable, legacy does sub-split); DV x sub-split is SAFE (paimon
deletion-vector rowids are GLOBAL file row positions, BE native readers report
global positions even within a partial byte range, _kv_cache shares the DV bitmap
across sub-splits keyed by path+offset, iceberg uses the identical machinery on
routinely-split files); and it is pure-connector (the splitter math + 5 session
vars re-stated with plain longs — the connector cannot import fe-core
FileSplitter/SessionVariable).

Solution (pure connector, zero SPI, zero fe-core; D-055):
- Two pure statics: computeFileSplitOffsets(fileLength, targetSplitSize) ports
  FileSplitter.splitFile's specified-size branch byte-for-byte incl. the >1.1D
  tail guard (the last range absorbs a remainder up to 1.1x instead of a tiny
  tail split); determineTargetSplitSize(...) ports determineTargetFileSplitSize +
  applyMaxFileSplitNumLimit (the isBatchMode->0 branch omitted — paimon is never
  batch).
- sessionLong + lazy resolveTargetSplitSize read the 5 file-split session vars via
  the VariableMgr.toMap channel (like isCppReaderEnabled) and sum native-eligible
  file sizes once per scan.
- Native arm: emit one range per [start,length) sub-range via buildNativeRanges,
  attaching the SAME unmodified per-RawFile DeletionFile to EVERY sub-range (DV is
  global-row-position indexed; no offset re-basing). buildNativeRange gains
  (start, length); fileSize stays the whole file length.
- Under COUNT(*) pushdown a native split that is not count-eligible (no precomputed
  merged count, e.g. a DV with null cardinality) is kept WHOLE (target size 0 ->
  one whole-file range), mirroring legacy splittable=!applyCountPushdown.
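
The offset math can be sketched as follows. This is an illustration, not the committed code: `FileSplitSketch` is a hypothetical class, the loop shape is an assumption modeled on the 1.1 tail guard described above, and returning an empty list for an empty file is likewise assumed.

```java
import java.util.ArrayList;
import java.util.List;

final class FileSplitSketch {
    // The last range may absorb a remainder of up to 1.1x the target size.
    private static final double SPLIT_SLOP = 1.1;

    /** Returns {start, length} pairs that tile [0, fileLength). */
    static List<long[]> computeFileSplitOffsets(long fileLength, long targetSplitSize) {
        List<long[]> ranges = new ArrayList<>();
        if (fileLength <= 0) {
            return ranges; // assumption: an empty file yields no ranges
        }
        if (targetSplitSize <= 0) {
            ranges.add(new long[] {0, fileLength}); // target <= 0: keep the file whole
            return ranges;
        }
        long remaining = fileLength;
        while ((double) remaining / targetSplitSize > SPLIT_SLOP) {
            ranges.add(new long[] {fileLength - remaining, targetSplitSize});
            remaining -= targetSplitSize;
        }
        // Tail range: the remainder, never a tiny sliver below 0.1x the target.
        ranges.add(new long[] {fileLength - remaining, remaining});
        return ranges;
    }
}
```

For a 250MB file and a 64MB target this yields three 64MB ranges plus a 58MB tail; a 70MB file stays whole because 70/64 is below the 1.1 guard.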

The split-weight/target-size scheduling nicety is not ported (pre-existing native
path already omitted it; perf/scheduling-only, not correctness) -> DV-033.

Tests: connector PaimonScanPlanProviderTest +6 — computeFileSplitOffsets math
(250MB/64MB->4 with 58MB tail, exact-multiple, small-file-whole, empty, target<=0);
determineTargetSplitSize heuristic (file_split_size override, 32MB<->64MB threshold,
max_file_split_num floor); end-to-end append-only fixture (tiny file_split_size ->
>=2 contiguous sub-ranges tiling [0,fileLength); default -> 1 range); DV on every
sub-range; whole-file under count pushdown. Updated the 3 existing buildNativeRange
call sites to the new signature. Connector 258/0/0 (1 CI-gated live skip),
checkstyle 0, import-gate clean. Fail-before verified: neuter computeFileSplitOffsets
-> the 3 splitting tests red; attach DV only to the first sub-range -> the DV test
red. Real BE multi-range + DV read = CI-gated live-e2e (legacy paimon regression
covers the BE contract; no BE change).

Adversarially reviewed (workflow wf_4ac7479d-39d): 2 confirmed and fixed (the
count-pushdown sub-split parity gap + false comment; the missing DV-on-every-sub-range
test), 2 refuted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… hand off P3 coverage-gap verification

- FIX-COUNT-PUSHDOWN (#8, M-2) = 525be03; FIX-NATIVE-SUBSPLIT (#9, M-3) = 2f5f467.
- Both recon'd (multi-scout workflow) + adversarially reviewed before commit; each review
  caught a real finding (degenerate test / parity gap) that was fixed.
- P0/P1/P2 all clear. Next: P3 coverage gaps (verify, not fix) — FIX-HMS-CONFRES re-check,
  DDL write parity, ANALYZE/column-stats, split-count accounting, cross-connector follow-ups.
- task-list #9 commit hash finalized; HANDOFF overwritten.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rejection in PluginDrivenExternalCatalog.createTable

Root cause: the generic fe-core bridge PluginDrivenExternalCatalog.createTable
collapsed legacy PaimonMetadataOps.performCreateTable's ordered remote-then-local
existence probe into a single `exists` OR that was consumed ONLY by the IF NOT
EXISTS branch. The !IF NOT EXISTS path ignored it and unconditionally called
metadata.createTable. So a table present only in the local FE cache (a case-variant
folded onto an existing name under lower_case_meta_names, absent on a case-sensitive
remote) was CREATED remotely instead of rejected with ERR_TABLE_EXISTS_ERROR --
silent metadata corruption. Found by the P3 plugin-vs-legacy parity audit
(adversarially verified); narrow, backend-dependent trigger (filesystem/jdbc paimon;
HMS lowercases so both sides reject). Generic bridge -> also affects MaxCompute /
future iceberg/hudi.

Solution (fe-core bridge only; zero SPI/connector/BE): split the `exists` OR into
remoteExists/localExists; under !IF NOT EXISTS, when localExists is true throw
ERR_TABLE_EXISTS_ERROR (legacy local-arm parity). A remote-only conflict still falls
through to connector.createTable (case A unchanged). Option-2 surgical (D-056); the
residual case-A / all-DDL-op generic-error-code collapse is pre-existing and out of
scope (DV-034).
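
The split probe reduces to a small decision table, sketched here under assumptions: `CreateTableProbeSketch` and its `Action` enum are hypothetical, and the IF NOT EXISTS arm (skip when either side already has the table) is inferred from the description above rather than copied from the bridge.

```java
final class CreateTableProbeSketch {
    enum Action { SKIP, REJECT_TABLE_EXISTS, CALL_CONNECTOR_CREATE }

    static Action decide(boolean remoteExists, boolean localExists, boolean ifNotExists) {
        if (ifNotExists) {
            // Assumed: either hit makes CREATE TABLE IF NOT EXISTS a no-op.
            return (remoteExists || localExists) ? Action.SKIP : Action.CALL_CONNECTOR_CREATE;
        }
        if (localExists) {
            // The fix: a local-only hit is rejected instead of creating remotely.
            return Action.REJECT_TABLE_EXISTS;
        }
        // Remote-only conflict (case A) still falls through to the connector.
        return Action.CALL_CONNECTOR_CREATE;
    }
}
```

The regression case is `decide(false, true, false)`: before the fix it reached the connector; in this sketch it maps to `REJECT_TABLE_EXISTS`.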

Tests: new PluginDrivenExternalCatalogDdlRoutingTest
.testCreateTableLocalConflictWithoutIfNotExistsRejects (local-hit + remote-miss +
!IF NOT EXISTS -> asserts DdlException thrown + metadata.createTable never called +
no edit log). Fail-before: exactly 1 new test red ("Expected DdlException...nothing
was thrown"); pass-after: 26/0/0. fe-core checkstyle 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…P3-fix landed)

P3 "go check" done via adversarial audit wf_25450c36-b7a: HMS-CONFRES /
ANALYZE-stats / split-count all PARITY_HOLDS; DDL write surfaced one MAJOR
correctness divergence -> FIX-CREATE-TABLE-LOCAL-CONFLICT (67a9b9d).
Updates HANDOFF for next steps (P4 cleanup / B8 legacy removal /
cross-connector follow-up). No P0/P1/P2/P3 blockers remain.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…4 N10.1)

Root cause: the plugin read-direction type mapping
PaimonTypeMapping.toVarcharType used `len >= 65533` to overflow a paimon
VarCharType to STRING, while legacy PaimonUtil.paimonPrimitiveTypeToDorisType
uses `len > 65533`. 65533 == ScalarType.MAX_VARCHAR_LENGTH is the legal
exact-fit max VARCHAR, not the STRING wildcard, so the connector widened
VARCHAR(65533) to STRING — a DESCRIBE / SHOW CREATE TABLE reported-type
divergence (data and read correctness unaffected; STRING is a superset).

Fix: change the boundary `>= 65533` -> `> 65533` to match legacy byte-for-byte
(pure connector, 1 char). The unreachable `len <= 0` defensive guard is kept
untouched (paimon VarCharType min length is 1).
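
A sketch of the corrected boundary, for illustration only: `VarcharBoundarySketch` is a hypothetical class that returns type names as strings, and the `len <= 0` guard mentioned above is deliberately left out because its behavior is not described here.

```java
final class VarcharBoundarySketch {
    // Mirrors the value cited above for ScalarType.MAX_VARCHAR_LENGTH.
    static final int MAX_VARCHAR_LENGTH = 65533;

    static String toDorisTypeName(int len) {
        // Strictly greater: 65533 is the legal exact-fit max VARCHAR, not the STRING wildcard.
        if (len > MAX_VARCHAR_LENGTH) {
            return "STRING";
        }
        return "VARCHAR(" + len + ")";
    }
}
```

With `>=` in place of `>` the 65533 case would return `STRING`, which is the reported-type divergence being fixed.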

Tests: new read-direction PaimonTypeMappingReadTest pins the boundary intent
(65532 -> VARCHAR(65532); 65533 -> VARCHAR(65533) [the fix]; 65534 -> STRING).
Fail-before: exactly the 65533 assertion red ("expected VARCHAR but was STRING");
pass-after green. Full module 260/0/0 (1 CI-gated live skip), checkstyle 0,
connector import-gate clean. No BE/SPI change; reported-type parity otherwise
covered by the CI-gated legacy paimon DESCRIBE regression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tion value to NULL (P4)

Root cause: PaimonScanRange.populateRangeParams routed paimon partition values
through ConnectorPartitionValues.normalize, which applies Hive-directory
null-sentinel coercion (a value of "\N" or "__HIVE_DEFAULT_PARTITION__" -> isNull).
That coercion is correct for hudi (path-encoded partitions) but wrong for paimon:
paimon partition values are TYPED — serializePartitionValue returns Java-null for a
genuine null and the literal toString() otherwise — so a null is never a directory
sentinel, and the coercion only ever bites a genuine literal value. A string
partition column literally holding "\N" (which paimon does NOT reserve) or
"__HIVE_DEFAULT_PARTITION__" was materialized as SQL NULL instead of the literal on
the native ORC/Parquet read, diverging from legacy PaimonScanNode.setScanParams
(source/PaimonScanNode.java:323-326) and yielding wrong rows for WHERE col='\N' /
col IS NULL. The dominant genuine-NULL case is unaffected (both sides set isNull=true
and BE ignores the rendered value string when is_null==true,
partition_column_filler.h:40-44).

Fix (1 file): derive isNull from the Java null ONLY (render genuine null as "",
legacy-exact); drop the unused ConnectorPartitionValues import. ConnectorPartitionValues
itself is left untouched — hudi (HudiScanRange.java:226) legitimately needs the
Hive-directory coercion. The residual scan-vs-prune skew for a literal
"__HIVE_DEFAULT_PARTITION__" value lives in the generic fe-core prune bridge
(TablePartitionValues), is pre-existing and unchanged by this fix, and is logged as a
deviation.
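
The null derivation can be sketched as below. `PartitionValueSketch` is a hypothetical holder, not the real `PaimonScanRange` code; the input models what the serialized partition value looks like per the description above (Java null for a genuine null, the literal string otherwise).

```java
final class PartitionValueSketch {
    final boolean isNull;
    final String rendered;

    private PartitionValueSketch(boolean isNull, String rendered) {
        this.isNull = isNull;
        this.rendered = rendered;
    }

    static PartitionValueSketch of(String serialized) {
        // isNull comes from the Java null ONLY; no Hive-directory sentinel coercion.
        boolean isNull = serialized == null;
        // A genuine null renders as "", any literal is kept verbatim.
        return new PartitionValueSketch(isNull, isNull ? "" : serialized);
    }
}
```

A literal `\N` or `__HIVE_DEFAULT_PARTITION__` therefore stays a non-null string value in this sketch.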

Tests: new PaimonScanRangePartitionNullTest pins genuine-null -> (isNull=true, "");
literal "\N" -> (isNull=false, "\N"); literal "__HIVE_DEFAULT_PARTITION__" ->
(isNull=false, verbatim); ordinary -> kept. Fail-before (re-inlined coercion) reds the
literal + render rows; pass-after green. Full module 261/0/0 (1 CI-gated live skip),
checkstyle 0, import-gate clean. Adversarial review (5 angles) SAFE_TO_COMMIT: total
convergence of all 3 range builders on populateRangeParams; no query goes correct->wrong.
No BE/SPI change; native partition materialization otherwise covered by the CI-gated
legacy paimon partition regression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…035])

Records the P4 cleanup pass disposition (P0–P4 now all clear):
- FIX-VARCHAR-BOUNDARY (N10.1) `bcee91dcb52` + FIX-PARTITION-NULL-SENTINEL
  `4b2c2190dc2` landed as independent fix commits.
- 15 items accepted as deviations (M5.1 transient-only + 14
  display/perf/text/inert/connector-more-correct/false-premise) → [DV-035].
- D-057 logs the user-signed scope; DV-035 the accepted batch.
- task-list §P4 marked done; HANDOFF rolled to next session (B8 legacy
  deletion or cross-connector follow-up batch).

Read-only adversarial recon `wf_6884d37b-8ef` re-verified all ~17 review §5/§7
items against current code; the sentinel ACCEPT verdict was refuted by a
prune-path skeptic (converted to FIX) and M5.1's "cheap fallback" premise was
refuted at impl level (confirmed ACCEPT).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
morningman and others added 10 commits June 12, 2026 22:23
… injection)

Next session = a third independent adversarial review of every paimon
connector functional path (basic read, @incr, time travel, branch/tag,
sys-tables, metadata cache, deletion vectors, multi-metastore, multi-storage,
Parquet/ORC native read, type mapping, and a legacy-logic/fallback sweep),
checking design + implementation delivery and diffing each path against the
legacy datasource/paimon/* reference (kept in-tree for side-by-side).

Hard constraint per user: do NOT inject accumulated development priors during
the find-and-judge phase — reviewers judge from current code + legacy only;
decisions-log / deviations-log / prior review reports / catalog-spi-p5-* memory
are consulted ONLY in a final reconciliation phase and must not suppress a
finding. B8 legacy deletion deferred until after this review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rows on URI normalize (P9-1, BLOCKER)

Root cause: native ORC/Parquet reads on a Paimon REST catalog over object
storage (oss/cos/obs/s3a) threw during FE planning —
"StoragePropertiesException: No storage properties found for schema: oss".
PaimonScanPlanProvider.normalizeUri routed both the data-file path and the
deletion-vector path through ConnectorContext.normalizeStorageUri, which
normalizes via the catalog's STATIC storage map. That map is empty by design
for REST catalogs (vended creds are per-table/dynamic;
CatalogProperty.initStorageProperties seeds an empty map when vended creds are
enabled), so LocationPath.of(uri, {}) found no scheme entry and threw.
shouldUseNativeReader has no flavor gate, so every REST native read hit it;
the only escape was SET force_jni_scanner=true. DV-025 deferred this exact
corner to FIX-STATIC-CREDS-BE / FIX-REST-VENDED, but those fixed credential
down-flow to BE, not normalizeStorageUri — the deferral was never closed.

Legacy parity: PaimonScanNode.doInitialize computes a vended-overlay storage
map once (VendedCredentialsFactory.getStoragePropertiesMapWithVendedCredentials
— vended REPLACES the empty static map for REST) and uses it for
LocationPath.of at both the data-file (:443) and DV (:296) sites.

Solution: route the per-table vended token into native URI normalization,
replicating legacy precedence.
- SPI: add default overload ConnectorContext.normalizeStorageUri(uri, token)
  that ignores the token and delegates to the 1-arg form, so every non-paimon
  connector is unaffected.
- fe-core DefaultConnectorContext: extract the vended-typed-map build (filter
  cloud props -> StorageProperties.createAll -> index by Type) into a shared
  buildVendedStorageMap (single source of truth with vendStorageCredentials, no
  drift). The 2-arg override normalizes against the vended map when present and
  falls back to the static map otherwise (legacy "vended replaces static"); the
  1-arg form delegates with a null token (byte-identical to prior behavior).
  vendStorageCredentials keeps an outer try so its fail-soft boundary is
  preserved across the refactor.
- connector PaimonScanPlanProvider: extract the vended token ONCE per planScan
  (validToken() may refresh) and thread it through buildNativeRanges/
  buildNativeRange to both normalize sites. Empty for non-REST (FileIO gate) and
  offline -> folds to the static path, so non-REST reads are byte-unchanged.
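
The overload shape and the "vended replaces static" precedence might look roughly like this. Everything here is a toy model: `ContextSketch` and `VendedAwareContextSketch` are hypothetical, the token is assumed to be a `Map<String, String>` keyed by scheme, and real normalization is replaced by a scheme lookup that returns the URI unchanged.

```java
import java.net.URI;
import java.util.Map;

/** Hypothetical miniature of the SPI surface; names and token type are assumptions. */
interface ContextSketch {
    String normalizeStorageUri(String uri);

    /** Default overload: ignores the token, so other connectors are unaffected. */
    default String normalizeStorageUri(String uri, Map<String, String> vendedToken) {
        return normalizeStorageUri(uri);
    }
}

final class VendedAwareContextSketch implements ContextSketch {
    private final Map<String, String> staticStorage; // scheme -> storage id (toy model)

    VendedAwareContextSketch(Map<String, String> staticStorage) {
        this.staticStorage = staticStorage;
    }

    @Override
    public String normalizeStorageUri(String uri) {
        return normalizeStorageUri(uri, null); // 1-arg form delegates with a null token
    }

    @Override
    public String normalizeStorageUri(String uri, Map<String, String> vendedToken) {
        // A non-empty vended map replaces the static map; otherwise fall back to static.
        Map<String, String> effective =
                (vendedToken != null && !vendedToken.isEmpty()) ? vendedToken : staticStorage;
        String scheme = URI.create(uri).getScheme();
        if (!effective.containsKey(scheme)) {
            // Fail loud, as in the reported planning error.
            throw new IllegalStateException("No storage properties found for schema: " + scheme);
        }
        return uri; // real normalization is out of scope for this sketch
    }
}
```

An empty static map with an `oss` token normalizes; an empty static map with an empty token still throws, which is the fail-loud case the new fe-core test pins.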

Tests:
- fe-core DefaultConnectorContextNormalizeUriTest (+3): vended-REST normalize
  under an empty static map (the gap that hid the bug twice); fail-loud when the
  token is also empty (proves the fix is the token, not a swallow); static-map
  path unaffected by an empty token.
- connector PaimonScanPlanProviderTest (+1, 5 call sites updated): the per-table
  vended token is threaded verbatim to BOTH the data-file and DV normalize calls
  (RecordingConnectorContext now captures the 2-arg token).
- The positive RESTTokenFileIO token-extraction path needs a live REST stack and
  remains E2E-gated (enablePaimonTest=false), not run here.
Verified: connector 42/0/0; fe-core NormalizeUri 7/0, Vend 2/0, BackendStorageProps 2/0;
checkstyle 0 across spi/paimon/fe-core; connector import-gate clean.
Design + adversarial red-team: plan-doc/FIX-REST-VENDED-URI-NORMALIZE-design.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…stead of real format (P7-1, MAJOR)

Root cause: PaimonScanPlanProvider.buildJniScanRange and buildCountRange
hardcoded .fileFormat("jni") on PaimonScanRange.Builder. The real
defaultFileFormat (= table.options().getOrDefault(file.format,"parquet"),
computed in planScanInternal) was passed into buildJniScanRange and IGNORED,
and was not passed into buildCountRange at all. PaimonScanRange
.populateRangeParams then emitted fileDesc.file_format="jni". BE
paimon_cpp_reader.cpp backfills paimon FILE_FORMAT/MANIFEST_FORMAT from this
field (only when unset/empty, guarded !file_format.empty()) to avoid defaulting
manifest.format=avro — with the invalid "jni" it injects MANIFEST_FORMAT=jni
(and FILE_FORMAT=jni when unset) and the manifest read breaks.

Key mechanism: the JNI formatType routing is gated by the paimon.split property
(PaimonScanRange.populateRangeParams), NOT by the fileFormat string (that string
drives formatType only on the native branch, where it is already real). So
emitting the real orc/parquet leaves JNI routing intact and only corrects the
inner fileDesc.file_format BE consumes — matching legacy
PaimonScanNode.setPaimonParams, which sets setFormatType(FORMAT_JNI) AND
setFileFormat(getFileFormat(...)) = the real data-file format.

Solution (connector-only, no BE change):
- buildJniScanRange: .fileFormat("jni") -> .fileFormat(defaultFileFormat) (the
  already-passed, previously-ignored parameter). Covers the non-DataSplit
  metadata-split call and the DataSplit JNI call.
- buildCountRange: add a defaultFileFormat parameter, use it, and thread it from
  the call site in planScanInternal.
- PaimonScanRange.Builder default: "jni" -> "" (every production caller sets the
  format explicitly; empty is the safe default — BE skips its format backfill on
  empty rather than ever injecting an invalid value).

Tests: PaimonScanPlanProviderTest (+1) jniAndCountRangesCarryRealFileFormatNotJni
— a real FileSystemCatalog PK table created with explicit file.format=orc (so
the asserted value is the table option, distinct from the parquet fallback):
force_jni_scanner=true scan -> every JNI data range carries "orc" (not "jni");
count-pushdown scan -> the collapsed count range carries "orc". Reverting either
method to "jni", or dropping the threaded defaultFileFormat, turns the assertion red.
Verified: connector 262/0/1skip (PaimonScanPlanProviderTest 43/0); checkstyle 0;
import-gate clean. Design: plan-doc/FIX-JNI-FILE-FORMAT-design.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…AJOR) done, next FIX-3

- FIX-1 FIX-REST-VENDED-URI-NORMALIZE committed c376aba
- FIX-2 FIX-JNI-FILE-FORMAT committed 2e845e8
- HANDOFF now points the next session at FIX-3 (FIX-INCR-SCAN-RESET) → FIX-4 (FE-config parity)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ode reset (P2-1, MAJOR)

Root cause: PaimonIncrementalScanParams.validate stripped legacy's defensive null
reset of scan.snapshot-id/scan.mode (PaimonScanNode:842-846), justified by a wrong
"a fresh per-query Table can't inherit scan.*" rationale. A base table that PERSISTS
scan.snapshot-id/scan.mode (legal & mutable via ALTER TABLE SET / TBLPROPERTIES /
table-default.*) carries it on every fresh load. Without the reset, resolveScanTable's
Table.copy merges the stale scan.snapshot-id with incremental-between and paimon 1.3.1
either THROWS ("[incremental-between] must be null when you set [scan.snapshot-id,
scan.tag-name]") or silently downgrades the @incr read to FROM_SNAPSHOT at the stale id
(wrong rows). The connector dropped exactly the safeguard legacy relied on.

Solution (Option 2; design red-team wf_ffd11631-ed2, DESIGN-SOUND): keep validate()
emitting only the non-null incremental-between* keys so the shared ConnectorMvccSnapshot
SPI / handle stay null-free, and reapply the two null resets at the single Table.copy
chokepoint via new PaimonIncrementalScanParams.applyResetsIfIncremental(scanOptions),
called in PaimonScanPlanProvider.resolveScanTable. paimon copyInternal consumes a null
value as options.remove(k), clearing the stale pin. The one edit covers BOTH callers
(native/JNI scan planScanInternal + JNI serialized-table getScanNodeProperties). Gated
on incremental-between / incremental-between-timestamp presence, so a genuine
scan.snapshot-id / scan.tag-name pin passes through unchanged (no false positive). Strict
legacy parity: resets scan.snapshot-id + scan.mode only. Corrected the now-refuted
"byte-parity on a freshly-loaded base" rationale in the affected javadoc/comments.

Tests: PaimonIncrementalScanParamsTest +4 (helper seeds the null resets for snapshot and
timestamp windows; passes non-incremental pins through unchanged; no-op for empty/null)
and reworded the keep-null-free validate() test; PaimonScanPlanProviderTest +1 real-table
(FileSystemCatalog over a persisted scan.snapshot-id), proven fail-before (paimon throws)
/ pass-after; PaimonConnectorMetadataMvccTest WHY-comment reworded (assertions unchanged).
Connector suites 20/44/37 green; checkstyle 0; import-gate clean. Connector-only — no SPI,
no BE change. Live @incr-over-persisted-scan.snapshot-id E2E is CI-gated (enablePaimonTest
=false), noted as gated.
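
A sketch of the gated reset helper described in the solution above. The class name, signature and copy-on-write return are assumptions; only the option keys and the gating rule come from the text.

```java
import java.util.HashMap;
import java.util.Map;

final class IncrementalResetSketch {
    /** Seeds null resets for scan.snapshot-id / scan.mode only on an incremental read. */
    static Map<String, String> applyResetsIfIncremental(Map<String, String> scanOptions) {
        if (scanOptions == null || scanOptions.isEmpty()) {
            return scanOptions; // no-op for empty/null
        }
        boolean incremental = scanOptions.containsKey("incremental-between")
                || scanOptions.containsKey("incremental-between-timestamp");
        if (!incremental) {
            return scanOptions; // a genuine snapshot/tag pin passes through unchanged
        }
        Map<String, String> withResets = new HashMap<>(scanOptions);
        // A null value is what the copy step treats as "remove this option".
        withResets.put("scan.snapshot-id", null);
        withResets.put("scan.mode", null);
        return withResets;
    }
}
```

`HashMap` is used because it permits null values; an immutable `Map.of(...)` would reject them.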

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
FIX-3 FIX-INCR-SCAN-RESET committed f08bc22. Adds FIX-INCR-SCAN-RESET-summary.md,
marks FIX-3 done in the task-list, rolls HANDOFF to FIX-4 (FIX-FECONF-STORAGE-PARITY).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l legacy parity (P8-1..4, P9-2/3)

Root cause: the connector cannot import fe-core, so PaimonCatalogFactory rebuilds the FE-side Hadoop
Configuration/HiveConf from raw props with literal key logic. That reconstruction was incomplete vs the
legacy *Properties classes, so paimon catalogs on several storage backends failed FE-side catalog/metadata
access (the live FileSystemCatalog/HiveCatalog/JdbcCatalog could not resolve the storage FileIO).

Solution (connector-only; no fe-core/SPI/BE change):
- Extract a shared applyS3aBaseConfig helper (port of AbstractS3CompatibleProperties.appendS3HdfsProperties)
  taking caller-resolved creds AND the 4 tuning values, so each scheme passes its OWN aliases/defaults.
- 4a OSS: derive fs.oss.endpoint from region when blank (oss-<region>[-internal].aliyuncs.com, default
  -internal, publicAccess from dlf.access.public/dlf.catalog.accessPublic), MOVED from the DLF-local block
  into the shared OSS block (so filesystem+hms flavors get it too); also emit the S3A base for OSS.
  Removed the now-dead DLF-local derivation block.
- 4b S3: emit fs.s3a.path.style.access + connection.maximum/request.timeout/timeout. Tuning defaults are
  per-backend: S3=50/3000/1000 (incl AWS_* alias twins), OSS/COS/OBS=100/10000/10000 (a single shared
  default would silently mis-tune AWS S3).
- 4c COS/OBS: new applyCanonicalCosConfig/ObsConfig. Detection mirrors legacy guessIsMe (endpoint/warehouse
  PATTERN: myqcloud.com / myhuaweicloud.com) OR a cos./obs.-prefixed key, NOT scheme-key-only (a cosn://
  catalog configured with only s3.endpoint=cos...myqcloud.com would be missed otherwise). Each emits the
  S3A base (cosn/obs FS impl is S3AFileSystem, which reads fs.s3a.*) THEN the unconditional fs.cosn.* /
  fs.obs.* keys; OBS prefers the native OBSFileSystem when classpath-available.
- S3 endpoint-from-region (user-approved, same defect class as the OSS P8-1 fix): region-only AWS S3 derives
  https://s3.<region>.amazonaws.com.
- 4d HMS username: resolve hadoop.username from firstNonBlank(hive.metastore.username, hadoop.username)
  (alias priority), run AFTER the storage overlay so the raw hadoop.* passthrough cannot clobber it.
- 4e (folded in, pre-existing MAJOR found in impl review): the kerberos block forced
  hadoop.security.authentication=kerberos before applyStorageConfig, so a kerberized-HMS + simple-HDFS
  catalog had it clobbered back to simple by the raw hadoop.* passthrough (auth=simple but sasl=true ->
  broken GSSAPI). Relocated the kerberos block to run AFTER the overlay, mirroring legacy
  initHadoopAuthenticator-last ordering.
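
The endpoint-from-region and alias-priority rules above can be sketched as plain string helpers. `EndpointFromRegionSketch` is hypothetical, and whether the real `firstNonBlank` trims its result is not stated, so this version returns the value as given.

```java
final class EndpointFromRegionSketch {
    /** oss-<region>[-internal].aliyuncs.com; internal unless public access is requested. */
    static String ossEndpoint(String region, boolean publicAccess) {
        return "oss-" + region + (publicAccess ? "" : "-internal") + ".aliyuncs.com";
    }

    /** Region-only AWS S3: https://s3.<region>.amazonaws.com. */
    static String s3Endpoint(String region) {
        return "https://s3." + region + ".amazonaws.com";
    }

    /** Returns the first value that is neither null nor whitespace-only, else null. */
    static String firstNonBlank(String... values) {
        for (String value : values) {
            if (value != null && !value.trim().isEmpty()) {
                return value;
            }
        }
        return null;
    }
}
```

For the HMS username, the call would be `firstNonBlank(hiveMetastoreUsername, hadoopUsername)`, so the `hive.metastore.username` alias wins when both are set.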

Design red-team (wf_a6385c61-669, 5 skeptics + completeness critic) caught the divergent tuning defaults,
the endpoint-pattern detection gap, and the unconditional fs.cosn.*/fs.obs.* requirement before coding;
impl verification (wf_f90260cb-5e6) confirmed byte-for-byte legacy key/alias/default fidelity and found 4e.

Tests: PaimonCatalogFactoryTest +15 (S3 endpoint-from-region, S3 50/3000/1000 tuning, path-style, OSS
endpoint-from-region filesystem+hms, OSS S3A base, COS keys + pattern-detect + unconditional region, OBS
keys + pattern-detect, no-COS/OBS-for-plain-S3, HMS username alias + priority, kerberos-survives-simple-HDFS).
The priority + kerberos tests are RED on the pre-move ordering. Verified: connector 56/0/0 +
full module green; checkstyle 0; import-gate clean. Live e2e (paimon_base_filesystem/dlf/hms suites) CI-gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ne; all 4 round-3 fixes complete, next B8

- Mark FIX-4 done (commit f0210b5) in task-list-P5-rereview3-fixes.md; record the beyond-literal-scope
  items (user-approved S3 endpoint-from-region, per-backend tuning defaults, endpoint-pattern detection,
  unconditional fs.cosn.*/fs.obs.*, folded-in 4e kerberos-ordering MAJOR) and the known out-of-scope residual.
- Add FIX-FECONF-STORAGE-PARITY-summary.md.
- Roll HANDOFF: all 4 user-approved round-3 fixes (FIX-1..FIX-4) complete; next session = B8 legacy deletion
  (paimon/* + *Properties dead residue, now that FIX-4 no longer needs them as a literal-port reference)
  + round-3 follow-ups (D-057 re-scope, accepted-deviation sign-off, uncheckedFallbacks), gated on an
  AskUserQuestion scope check since B8 is a large change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…oop FS closure

Root cause: the Paimon connector plugin runs under a child-first ClassLoader with
org.apache.hadoop NOT parent-first, and bundled hadoop-common/hadoop-client-api but
NOT hadoop-aws. So FileSystem/SecurityUtil loaded child-first while S3AFileSystem
resolved from the parent 'app' loader -> cross-loader ClassCastException
('S3AFileSystem cannot be cast to FileSystem') and a permanent SecurityUtil.<clinit>
poison ('Could not initialize class ...SecurityUtil', 'DNSDomainNameResolver not
DomainNameResolver', 'ServiceConfigurationError: NullScanFileSystem not a subtype'),
cascading to 'Unknown database X'. ~39 of 42 external-regression suites failed on the
af2037 TeamCity run; not fixed by any later commit.

Solution (self-contained plugin — aligns with fe-core dropping hadoop/hive-catalog-shade
after full connector migration; does NOT lean on the parent):
- pom: add hadoop-aws (the only missing FS impl, S3AFileSystem; DistributedFileSystem
  already comes from the transitive hadoop-client-api). hive-common stays bundled.
- PaimonCatalogFactory.buildHadoopConfiguration: conf.setClassLoader(plugin loader) so
  Configuration.getClass("fs.<scheme>.impl") resolves the FS impl from the plugin loader.
- PaimonConnector.createCatalogFromContext (single chokepoint for all flavors): pin the
  thread-context classloader to the plugin loader around catalog creation so the
  FileSystem ServiceLoader and SecurityUtil static init resolve from the child. Mirrors
  JdbcConnectorClient / ThriftHmsClient.
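
Pinning the thread-context classloader around a call is a standard pattern; a minimal sketch follows. `ContextLoaderSketch` and `withContextClassLoader` are hypothetical names, not the helper used in `PaimonConnector`.

```java
import java.util.function.Supplier;

final class ContextLoaderSketch {
    /** Runs body with the given loader as thread-context classloader, then restores it. */
    static <T> T withContextClassLoader(ClassLoader loader, Supplier<T> body) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return body.get();
        } finally {
            // Always restore, even if catalog creation throws.
            thread.setContextClassLoader(previous);
        }
    }
}
```

A caller would wrap catalog creation in the supplier and pass the plugin's own loader, so ServiceLoader lookups and static initializers triggered inside resolve from the child loader.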

Tests: connector build SUCCESS + all connector UTs 0 fail/0 error; plugin lib/ now
contains hadoop-aws/S3AFileSystem; checkstyle + connector import-gate clean. The full
runtime proof is the docker external paimon suite (CI-gated, enablePaimonTest) — not run
locally. See plan-doc/FIX-PAIMON-HADOOP-CLASSLOADER-{design,summary}.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ROPERTIES to paimon

Root cause: branch commit 98a73bf (D-046 paimon parity) added LOCATION+PROPERTIES
emission to the SHARED PLUGIN_EXTERNAL_TABLE branch of Env.getDdlStmt, gated only on
!properties.isEmpty(). JDBC/ES/Trino catalogs are plugin-driven with non-empty
getTableProperties() (connection props incl. credentials), so SHOW CREATE TABLE on a JDBC
external table emitted LOCATION '' + PROPERTIES("password"=...) instead of the legacy
comment-only ENGINE=JDBC_EXTERNAL_TABLE; — a correctness regression
(test_nereids_refresh_catalog) and a JDBC credential leak. Still present on HEAD.

Solution: gate the LOCATION+PROPERTIES emission additionally on
TableType.PAIMON_EXTERNAL_TABLE.name().equals(getEngineTableTypeName()) — only the paimon
engine type (the sole plugin-driven connector whose legacy DDL carried LOCATION/PROPERTIES)
renders them. JDBC/ES/Trino/MaxCompute revert to comment-only; the credential leak is
closed. Did NOT rebaseline the .out (would entrench the leaked-credential output).

Tests: fe-core compile SUCCESS + checkstyle clean; adversarial static review SOUND (paimon
incl. sys-table unwrap still renders LOCATION/PROPERTIES; jdbc/es/trino/maxcompute match
committed comment-only .out; getTableProperties has no other DDL consumer). e2e:
external_table_p0/nereids_commands/test_nereids_refresh_catalog (CI external pipeline). See
plan-doc/FIX-SHOWCREATE-PLUGIN-PROPS-{design,summary}.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@morningman morningman force-pushed the catalog-spi-07-paimon branch from d6c93da to f7114a2 Compare June 12, 2026 14:23
morningman and others added 9 commits June 13, 2026 06:05
…ma-cache (CI 968828)

Root cause: PluginDrivenSysExternalTable did not override getSchemaCacheValue(), so it
inherited ExternalTable.getSchemaCacheValue() which routes through ExternalCatalog.getSchema()
and re-resolves the table by name in the db map. A transient system table (e.g. tbl$snapshots /
tbl$manifests) is never registered in that map, so the lookup failed with "failed to load schema
cache value for: ...$snapshots". Regression from the paimon SPI migration; legacy
PaimonSysExternalTable avoided it by overriding getSchemaCacheValue()/initSchema() to compute on
the transient instance.

Solution: override getSchemaCacheValue() (and initSchema(SchemaCacheKey)) to compute the schema
directly via the inherited PluginDrivenExternalTable.initSchema() (which honors this class's
resolveConnectorTableHandle that threads the sys-table handle), memoized with double-checked
locking — mirroring legacy PaimonSysExternalTable.
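
The memoization described above follows the usual double-checked-locking pattern. A minimal sketch, assuming a hypothetical `TransientSchemaHolder` whose `Supplier` stands in for the inherited `initSchema()` call and whose schema is simplified to a list of column names:

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch: compute a schema on the transient instance and memoize it. */
final class TransientSchemaHolder {
    private final Supplier<List<String>> schemaLoader; // stands in for initSchema()
    private volatile List<String> cachedSchema;        // volatile is required for safe publication

    TransientSchemaHolder(Supplier<List<String>> schemaLoader) {
        this.schemaLoader = schemaLoader;
    }

    /** Returns the schema, running the loader at most once even under concurrent callers. */
    List<String> getSchemaCacheValue() {
        List<String> local = cachedSchema;   // first check, no lock
        if (local == null) {
            synchronized (this) {
                local = cachedSchema;        // second check, under the lock
                if (local == null) {
                    local = List.copyOf(schemaLoader.get());
                    cachedSchema = local;
                }
            }
        }
        return local;
    }
}
```

Because the value lives on the instance itself, no by-name lookup in the db map is needed, which is the property a transient system table relies on.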

Tests: covered by existing e2e suites paimon_system_table ($manifests), paimon_time_travel
($snapshots), test_paimon_system_table_auth (re-run in CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…68828)

Root cause: PaimonConnectorMetadata.mapFields built ConnectorColumn via the 5-arg ctor, which
defaults isKey=false; ConnectorColumnConverter propagates it, so DESC showed Key=false for every
paimon column. Legacy PaimonExternalTable/PaimonSysExternalTable always set Column isKey=true (3rd
positional arg) for every column, so the .out files expect Key=true. Caused test_paimon_schema_change,
test_paimon_char_varchar_type, test_paimon_timestamp_with_time_zone DESC diffs.

Solution: pass isKey=true via the 6-arg ConnectorColumn ctor in mapFields (single chokepoint for
latest + at-snapshot + system-table schema paths; toSchemaCacheValue preserves isKey on remap).

Tests: extended PaimonConnectorMetadataTest.getTableSchemaForcesColumnsNullableForLegacyParity to
pin isKey=true for both a PK and a non-PK column.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… split (CI 968828)

Root cause: the paimon (and hudi) plugin-zip bundled org.apache.thrift:libthrift and loaded
org.apache.thrift.* child-first (not in the connector parent-first allowlist), while fe-thrift is
provided so org.apache.doris.thrift.TFileScanRangeParams resolves parent-first and implements the
PARENT's TBase. PaimonScanPlanProvider.encodeSchemaEvolution()'s TSerializer.serialize(carrier)
then mixes a child TSerializer with a parent-TBase carrier -> IncompatibleClassChangeError. Being an
Error (not Exception), it escaped catch(Exception) and the connection handler, killing the mysql
session. This was the dominant CI failure (~19 tests: 2 ANALYZE, the family-D connection drops, and
the predict/timestamp_tz/sql_block_rule explain failures).
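
The loader split in the root cause can be modelled as a small decision function. This is a simplified sketch under assumptions: the allowlist prefix, the bundled-package set, and the class `PluginResolutionSketch` are illustrative and are not taken from the Doris plugin classloader.

```java
import java.util.List;
import java.util.Set;

/** Simplified model of child-first plugin class resolution; an assumption, not Doris's loader. */
final class PluginResolutionSketch {
    enum Origin { PARENT, CHILD }

    /**
     * Decides which loader would define a class: allowlisted prefixes go parent-first,
     * packages bundled in the plugin go child-first, and anything the plugin does not
     * bundle falls back to the parent.
     */
    static Origin resolve(String className, List<String> parentFirstPrefixes, Set<String> bundledPackages) {
        for (String prefix : parentFirstPrefixes) {
            if (className.startsWith(prefix)) {
                return Origin.PARENT;
            }
        }
        int dot = className.lastIndexOf('.');
        String pkg = dot < 0 ? "" : className.substring(0, dot);
        return bundledPackages.contains(pkg) ? Origin.CHILD : Origin.PARENT;
    }
}
```

In this model, bundling `org.apache.thrift` puts `TSerializer` on the child side while `TFileScanRangeParams` stays on the parent side, which is the mixed pairing the root cause describes. Removing the package from the bundle makes both resolve from the parent.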

Solution:
- Exclude org.apache.doris:fe-thrift + org.apache.thrift:libthrift from the paimon and hudi
  plugin-zip assemblies, so org.apache.thrift.* resolves from the single parent fe-core copy that
  also owns org.apache.doris.thrift.* (matches the es/jdbc/hive/maxcompute assemblies).
- Defense-in-depth: broaden encodeSchemaEvolution's catch to Exception | LinkageError so any future
  linkage error surfaces as a clean per-query failure instead of an uncaught Error that kills the
  whole connection (this is what turned ~5 real failures into ~19 collateral ones).
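
Why the broadened catch matters can be shown in isolation. In the sketch below, `LinkageCatchSketch` and its methods are hypothetical; the simulated `serialize` simply throws the same kind of `Error` the root cause describes, and returning a message string is a stand-in for however the real code reports a per-query failure.

```java
/** Sketch: why an Error slips past catch (Exception) and how a multi-catch contains it. */
final class LinkageCatchSketch {
    /** Simulates a serializer call that hits a linkage problem. */
    static String serialize(boolean linkageBroken) {
        if (linkageBroken) {
            throw new IncompatibleClassChangeError("child TSerializer vs parent TBase");
        }
        return "encoded";
    }

    /** Narrow catch: IncompatibleClassChangeError is an Error, so it propagates to the caller. */
    static String encodeNarrow(boolean linkageBroken) {
        try {
            return serialize(linkageBroken);
        } catch (Exception e) {
            return "query failed: " + e.getMessage();
        }
    }

    /** Broadened catch: the linkage error becomes an ordinary per-query failure. */
    static String encodeBroad(boolean linkageBroken) {
        try {
            return serialize(linkageBroken);
        } catch (Exception | LinkageError e) {
            return "query failed: " + e.getMessage();
        }
    }
}
```

`IncompatibleClassChangeError` extends `LinkageError`, so the multi-catch covers it without widening to every `Throwable` (for example, `OutOfMemoryError` still propagates).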

Verified: rebuilt paimon and hudi plugin zips no longer contain libthrift/fe-thrift.
Tests: e2e re-run in CI (the native-path paimon suites).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ilter scans (CI 968828)

Root cause: on the SPI plugin scan path, PaimonScanPlanProvider.getScanNodeProperties emitted the
paimon.predicate property only when filter.isPresent() && !predicates.isEmpty(), and
populateScanLevelParams set the thrift field only when non-null. So a paimon read with no
pushed-down filter (e.g. force_jni_scanner=true `select *`) omitted paimon_predicate entirely; BE
then omitted the JNI key, and PaimonJniScanner.getPredicates() called PaimonUtils.deserialize(null)
-> NPE "encodedStr is null". Legacy PaimonScanNode.createScanRangeLocations always serialized the
(possibly empty) predicate list, so the field was always present. Caused test_paimon_catalog_varbinary,
paimon_tb_mix_format, paimon_partition_legacy, paimon_timestamp_types, test_paimon_partition_table.

Solution:
- getScanNodeProperties always serializes the predicate list (empty list -> non-null base64 string)
  and emits paimon.predicate unconditionally, restoring the legacy invariant.
- BE backstop: PaimonJniScanner.getPredicates() treats a null paimon_predicate param as "no filter"
  (returns emptyList) so the JNI reader never NPEs on a missing param.
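
Both halves of the fix can be illustrated with a self-contained round trip. This is a sketch under assumptions: `PredicateParamSketch` is hypothetical, plain strings stand in for Paimon predicate objects, and Java serialization plus Base64 is only a stand-in for whatever encoding the connector actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the "always emit, tolerate missing" predicate contract. */
final class PredicateParamSketch {
    /** Producer side: always returns a non-null string, even for an empty list. */
    static String encode(List<String> predicates) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(predicates));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Consumer side: a missing param means "no filter" instead of an NPE. */
    @SuppressWarnings("unchecked")
    static List<String> decode(String encoded) {
        if (encoded == null) {
            return Collections.emptyList();
        }
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (List<String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The producer-side guarantee restores the invariant; the consumer-side null check is only a backstop for callers that still omit the parameter.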

Tests: PaimonScanPlanProviderTest.getScanNodePropertiesAlwaysEmitsPredicateForNoFilterScan pins that
a no-filter scan emits paimon.predicate and it deserializes to an empty list.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
8-family root-cause analysis (adversarially verified) of the 37 external-regression failures.
7 in-scope paimon-SPI regressions + 2 out-of-scope (hive CTAS stale test; BE shutdown ASAN race).
RC-1/2/6/7 fixed (contained); RC-3/4/5 deferred to the docker-gated self-contained-classloader batch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…imon plugin (CI 968828)

Root cause: the connector sets fs.oss.impl=com.aliyun.jindodata.oss.JindoOssFileSystem, but that impl
ships only in the thirdparty jindofs jars (packaged by post-build.sh into fe/lib/jindofs, not a maven
artifact). The paimon plugin runs child-first, so JindoOssFileSystem resolves from the parent and
cannot be cast to the plugin's child-loaded org.apache.hadoop.fs.FileSystem -> "JindoOssFileSystem
cannot be cast to FileSystem" -> "Unknown database" on first OSS listing (paimon_base_filesystem,
test_paimon_deletion_vector_oss). The maven route is unbuildable (jindo-sdk/jindo-core are bound to an
undeclared jindodata repo -> "present but unavailable"; runtime jindofs is 6.10.4, not in maven).

Solution: after deploying the connector plugins, copy the jindofs jars (already placed in fe/lib/jindofs
by post-build.sh) into the paimon plugin lib so JindoOssFileSystem loads child-first alongside the
plugin's own hadoop FileSystem. Naturally gated (no-op unless --jindofs/DISABLE_BUILD_JINDOFS=OFF).

CAVEAT (docker-gated, enablePaimonTest=true): jindo-core ships a native lib that binds to one
classloader per JVM, so this is safe only while no concurrent non-paimon path loads jindo from
fe/lib/jindofs in the same FE process — must be confirmed by the docker paimon suite.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…on plugin (CI 968828)

Root cause: the prior fix (FIX-PAIMON-HADOOP-CLASSLOADER) bundled hadoop-aws into the plugin
(S3AFileSystem child-first) but NOT the AWS SDK v2 (hadoop-aws declares it as software.amazon.awssdk:bundle,
which fe/pom.xml excludes). So the plugin's S3AInternalAuditConstants.<clinit> registered an
ExecutionAttribute against the single PARENT-loaded sdk-core static, colliding with fe-core's S3A in
ExecutionAttribute.ensureUnique() -> ExceptionInInitializerError that permanently poisoned S3A for the
whole FE JVM (test_iceberg_jdbc_catalog/statistics/case_sensibility, test_paimon_statistics).

Solution: bundle the AWS SDK v2 (software.amazon.awssdk:s3 + apache-client, BOM-managed 2.29.52) into the
plugin child-first, so the plugin's S3A registers against its OWN ExecutionAttribute static. s3's compile
closure brings sdk-core (ExecutionAttribute); apache-client is explicit (hadoop-aws wires ApacheHttpClient).
software.amazon.awssdk stays child-first (not parent-first) — the separate child SDK copy is the point.
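
The collision and the fix can be modelled with a name-unique registry. This is a simplified model, not the AWS SDK class: `AttributeRegistrySketch` is hypothetical, and each registry instance stands in for the static state of one loaded copy of sdk-core.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified model of a name-unique attribute registry; not the AWS SDK implementation. */
final class AttributeRegistrySketch {
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    /** Registers a name for an owner; a second registration of the same name fails. */
    void register(String name, String owner) {
        String previous = owners.putIfAbsent(name, owner);
        if (previous != null) {
            throw new IllegalArgumentException(
                    "duplicate attribute name: " + name + " already registered by " + previous);
        }
    }
}
```

When two copies of S3A share one registry (one parent-loaded sdk-core), the second registration throws. When the plugin carries its own copy, each side registers against a separate registry and both succeed.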

Verified: rebuilt plugin zip bundles lib/sdk-core-2.29.52.jar containing
software/amazon/awssdk/core/interceptor/ExecutionAttribute.class. Runtime S3A read + assumed-role/STS
docker-gated (enablePaimonTest=true).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… client (CI 968828)

Root cause: paimon-hive-connector's RetryingMetaStoreClientFactory probes getProxy(HiveConf,...) via
reflection, but RetryingMetaStoreClient/HiveMetaHookLoader resolved from the parent hive-catalog-shade-3.1.1
whose getProxy overloads use the PARENT's Configuration/HiveConf Class objects -> exact Class-identity
mismatch across loaders -> all probes NoSuchMethodException -> "Failed to create the desired metastore
client" (test_create_paimon_table). The metastore itself is reachable.

Solution: bundle org.apache.hive:hive-metastore:2.3.7 (RetryingMetaStoreClient/HiveMetaStoreClient/
HiveMetaHookLoader + metastore api) child-first so its getProxy(HiveConf,...) overloads compile against the
SAME child-bundled hive-common-2.3.9 HiveConf the connector builds. 2.3.7 pairs with hive-common 2.3.9
(API-stable HiveConf) and is fastutil-CLEAN, so unlike hive-catalog-shade it does not reintroduce the
fastutil collision. libfb303 rides transitively; server-side datanucleus/derby/hbase/tephra, the stale
hadoop-2.7.2 trio + guava, and libthrift are excluded (libthrift stays parent-first like the other
connectors).

Verified: rebuilt plugin zip bundles lib/hive-metastore-2.3.7.jar (RetryingMetaStoreClient with 5
getProxy(HiveConf) overloads) + libfb303; 0 fastutil entries; no hadoop-2.7.2 leak. The thrift
0.9.3-vs-host-0.16.0 wire skew and the DLF ProxyMetaStoreClient path are docker-gated (enablePaimonTest=true).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
RC-3 AWS SDK (b5205c4), RC-5 HMS client (7841830), RC-4 jindo via build.sh (e881247).
Runtime behavior gated on the docker paimon suite (enablePaimonTest=true).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>