I was testing Comet in a multi-stage ETL process that contains multiple joins and native Iceberg scans. Some of the steps look like typical Comet execution.
However, when aggregating the final result as a Comet vs. Spark sanity check, the sum aggregation returns inflated values. The discrepancy is visible even without the aggregation: comparing individual per-user values (domain-specific) across the two result tables already shows mismatches.
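The per-user comparison described above can be sketched in plain Python. This is a minimal sketch that assumes each result table has already been collected into a dict mapping a user key to its numeric value. The helper name mismatched_users and the 1e-9 relative tolerance are hypothetical. At real scale, the same comparison would be a join between the Spark and Comet result tables.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical helper: report users whose values differ between the two runs.
def mismatched_users(
    spark_vals: Dict[str, float],
    comet_vals: Dict[str, float],
    rel_tol: float = 1e-9,
) -> List[Tuple[str, Optional[float], Optional[float]]]:
    """Return (user, spark_value, comet_value) for every user that is missing
    on one side or whose values differ beyond a floating-point tolerance."""
    diffs = []
    for user in sorted(spark_vals.keys() | comet_vals.keys()):
        s = spark_vals.get(user)
        c = comet_vals.get(user)
        # A user present in only one table always counts as a mismatch.
        if s is None or c is None or not math.isclose(s, c, rel_tol=rel_tol):
            diffs.append((user, s, c))
    return diffs

# Usage with made-up values: u2 is inflated on the Comet side.
spark = {"u1": 10.5, "u2": 3.0}
comet = {"u1": 10.5, "u2": 9.0}
print(mismatched_users(spark, comet))  # [('u2', 3.0, 9.0)]
```

Using a relative tolerance instead of exact equality keeps harmless rounding differences from being reported. Real inflation, by contrast, still shows up.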
Notes:
- I'm using "spark.comet.scan.icebergNative.enabled": "true", so the native Iceberg scan is enabled. Both iceberg-rust and iceberg-storage-opendal are taken from their main branches, because main contains a fix for reading .parquet files that don't contain a page index (fix(reader): graceful handling of missing column index, iceberg-rust#2693). Before that fix, the whole query failed. Now it completes, but its results differ from vanilla Spark.
- The inflated values are consistent and deterministic: every run returns the same wrong results. They also do not depend on whether spark.comet.exec.strictFloatingPoint is set to true or false.
With the native Iceberg scan disabled, the query runs as intended. The results are written to an Iceberg table and match between vanilla Spark and Comet. So the problem lies somewhere in Comet's iceberg-rust integration.
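Because the mismatch disappears when the native scan is turned off, the comparison comes down to running the same job twice with only that flag toggled. Below is a minimal sketch that assumes PySpark-style string config values. The helper run_conf and the empty BASE_CONF placeholder are hypothetical. Only spark.comet.scan.icebergNative.enabled comes from this report; the rest of the Comet and Iceberg setup is assumed to be configured elsewhere.

```python
from typing import Dict

# Placeholder for the settings shared by both runs (existing Comet/Iceberg
# configuration); intentionally left empty in this sketch.
BASE_CONF: Dict[str, str] = {}

def run_conf(native_iceberg_scan: bool) -> Dict[str, str]:
    """Hypothetical helper: build the Spark conf for one side of the A/B run,
    differing only in the native Iceberg scan flag."""
    conf = dict(BASE_CONF)
    conf["spark.comet.scan.icebergNative.enabled"] = (
        "true" if native_iceberg_scan else "false"
    )
    return conf

native_on = run_conf(True)
native_off = run_conf(False)
# Only the one flag differs between the two runs.
print({k for k in native_on if native_on[k] != native_off.get(k)})
```

Each dict can be applied entry by entry with SparkSession.builder.config(key, value) before getOrCreate(). The two result tables can then be compared user by user.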
Steps to reproduce
No response
Expected behavior
Spark and Comet should produce the same values, or nearly the same up to floating-point precision.
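Here is what "nearly the same up to floating-point precision" means in practice. Accumulating the same doubles differently (in a different order, partitioning, or rounding) can change the last few bits of a sum, and a relative tolerance absorbs that. An inflated total does not pass the same check. This small sketch uses made-up numbers, and the 1e-9 relative tolerance is an assumption rather than a Comet setting.

```python
import math

values = [0.1] * 10

naive = sum(values)          # left-to-right double addition: 0.9999999999999999
exact = math.fsum(values)    # correctly rounded sum: 1.0

# Rounding differences are tiny and pass a relative-tolerance check.
print(naive == exact)                            # False
print(math.isclose(naive, exact, rel_tol=1e-9))  # True

# A hypothetical inflated total (here twice the expected value) does not.
inflated = 2 * exact
print(math.isclose(inflated, exact, rel_tol=1e-9))  # False
```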
Additional context
No response