refactor: add v1alpha2 crd version for spark applications/templates#711

Draft
razvan wants to merge 6 commits into main from refactor/remove-job

razvan commented Jun 29, 2026

Member

Description

Fixes #659
Fixes #701

On-site meeting notes: https://app.nuclino.com/Stackable/Engineering/2026-06-On-Site-Spark-Operator-Rewrite-e8fc2be4-9c0c-41ce-bf83-8ac057ba9c7c

Changes in the handling of Spark applications:

  • Add a v1alpha2 CRD version with the changes below.
  • Deprecate the spec.mode field. Applications now run in deploy-mode: client.
  • Deprecate the spec.job field and ignore it. In client mode, the Job is now derived from the spec.driver field.
  • Remove pod_driver_controller.rs. The application status is now derived directly from the driver's Job.
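
To illustrate the last point, here is a rough sketch of deriving an application phase from a Job's status. It is not the operator's code (the operator is written in Rust). It only assumes the standard batch/v1 JobStatus fields `active` and `conditions`; the function name and the returned phase names are hypothetical.

```python
def phase_from_job_status(status: dict) -> str:
    """Map a batch/v1 JobStatus (as a plain dict) to a hypothetical application phase."""
    # Terminal conditions win: a Job reports "Complete" or "Failed" with status "True".
    for condition in status.get("conditions") or []:
        if condition.get("status") != "True":
            continue
        if condition.get("type") == "Complete":
            return "Succeeded"
        if condition.get("type") == "Failed":
            return "Failed"

    # No terminal condition yet: a positive count of active pods means the driver is running.
    if (status.get("active") or 0) > 0:
        return "Running"

    # Otherwise the Job exists but has not started a pod yet.
    return "Pending"
```

With this shape, no separate driver pod controller is needed to track state, because the Job already aggregates the pod outcome.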

Breaking changes:

  • spec.job.config is now ignored.
    • Applications must move spec.job.config.affinity to spec.driver.config.affinity.
    • spec.job.retryOnFailureCount is no longer supported. TODO: reconsider this, possibly together with streaming applications.
  • spec.sparkConf can no longer be used to set driver pod properties.
    • Applications must remove all spark.kubernetes.driver.* properties from spec.sparkConf.
    • Driver pod resources must be set via spec.driver.config.resources.
    • The driver pod name cannot be set at all.
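
The migration steps above could be scripted roughly as follows. This is a sketch for users updating their manifests, not the operator's conversion logic. It works on the `spec` of a SparkApplication loaded as a plain dict and assumes that any fields present are mappings. The helper name `migrate_spec` is hypothetical. Removed driver properties are only reported, so resources still have to be re-expressed by hand under spec.driver.config.resources.

```python
import copy

# Prefix of the Spark properties that must no longer appear in spec.sparkConf.
DRIVER_CONF_PREFIX = "spark.kubernetes.driver."


def migrate_spec(spec: dict) -> tuple[dict, list[str]]:
    """Return a migrated copy of the spec and a list of notes describing each change."""
    new_spec = copy.deepcopy(spec)  # never mutate the caller's manifest
    notes: list[str] = []

    # spec.mode is deprecated; drop it so the manifest carries no dead settings.
    if new_spec.pop("mode", None) is not None:
        notes.append("dropped spec.mode")

    # spec.job is ignored; rescue the affinity before dropping the field.
    job = new_spec.pop("job", None)
    if job is not None:
        affinity = (job.get("config") or {}).get("affinity")
        if affinity is not None:
            driver_config = new_spec.setdefault("driver", {}).setdefault("config", {})
            if "affinity" in driver_config:
                notes.append("kept existing spec.driver.config.affinity")
            else:
                driver_config["affinity"] = affinity
                notes.append("moved spec.job.config.affinity to spec.driver.config.affinity")
        notes.append("dropped spec.job")

    # Remove driver pod properties from spec.sparkConf and report them.
    spark_conf = new_spec.get("sparkConf")
    if isinstance(spark_conf, dict):
        for key in sorted(k for k in spark_conf if k.startswith(DRIVER_CONF_PREFIX)):
            del spark_conf[key]
            notes.append(f"removed {key} from spec.sparkConf")

    return new_spec, notes
```

Printing the returned notes gives a checklist of what changed, which helps when a removed property such as `spark.kubernetes.driver.request.cores` needs a manual replacement.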

Issues to sort through:

  • CRD Decision.
  • Add a smoke test for v1alpha2.
  • Streaming.
  • Should the driver service be set up by the listener operator?
    • It's always internal and should not be exposed to users.

Definition of Done Checklist

  • Not all of these items are applicable to all PRs; the author should update this template to leave in only the boxes that are relevant.
  • Please make sure all these things are done and tick the boxes.

Author

  • Changes are OpenShift compatible
  • CRD changes approved
  • CRD documentation for all fields, following the style guide.
  • Helm chart can be installed and deployed operator works
  • Integration tests passed (for non-trivial changes)
  • Changes need to be "offline" compatible
  • Links to generated (nightly) docs added
  • Release note snippet added

Reviewer

  • Code contains useful comments
  • Code contains useful logging statements
  • (Integration-)Test cases added
  • Documentation added or updated. Follows the style guide.
  • Changelog updated
  • Cargo.toml only contains references to git tags (not specific commits or branches)

Acceptance

  • Feature Tracker has been updated
  • Proper release label has been added
  • Links to generated (nightly) docs added
  • Release note snippet added
  • Add type/deprecation label & add to the deprecation schedule
  • Add type/experimental label & add to the experimental features tracker

razvan self-assigned this Jun 29, 2026
razvan added 3 commits June 29, 2026 15:24
* ensure pre and post hooks run in the driver pod by calling
run-spark.sh from the image.
* give driver pods a stable hostname '<app>-driver'. The hostname now
differs from the pod name, but it preserves backward compatibility,
which is relevant for logging.

razvan commented Jun 29, 2026

Member Author
--- PASS: kuttl (1153.67s)
    --- PASS: kuttl/harness (0.00s)
        --- PASS: kuttl/harness/spark-pi-private-s3_openshift-false_spark-4.1.1 (85.67s)
        --- PASS: kuttl/harness/overrides_openshift-false_spark-4.1.1 (125.94s)
        --- PASS: kuttl/harness/logging_openshift-false_spark-logging-4.1.1_ny-tlc-report-0.3.0 (236.19s)
        --- PASS: kuttl/harness/custom-log-directory_openshift-false_spark-4.1.1_hdfs-latest-3.5.0_zookeeper-latest-3.9.5 (148.77s)
        --- PASS: kuttl/harness/smoke_openshift-false_spark-4.1.1_s3-use-tls-true (191.21s)
        --- PASS: kuttl/harness/pyspark-ny-public-s3_openshift-false_spark-4.1.1 (103.65s)
        --- PASS: kuttl/harness/pyspark-ny-public-s3-image_openshift-false_spark-4.1.1_ny-tlc-report-0.3.0 (108.56s)
        --- PASS: kuttl/harness/spark-ny-public-s3_openshift-false_spark-4.1.1_s3-use-tls-true (163.34s)
        --- PASS: kuttl/harness/resources_openshift-false_spark-4.1.1 (22.61s)
        --- PASS: kuttl/harness/spark-history-server_openshift-false_spark-4.1.1_s3-use-tls-true (269.73s)
        --- PASS: kuttl/harness/spark-connect_openshift-false_iceberg-latest-1.11.0_hive-iceberg-4.0.0_spark-connect-4.1.1_s3-use-tls-true (154.22s)
        --- PASS: kuttl/harness/hbase-connector_openshift-false_spark-hbase-connector-3.5.8_hbase-2.6.4_hdfs-latest-3.5.0_zookeeper-latest-3.9.5 (155.82s)
        --- PASS: kuttl/harness/spark-examples_openshift-false_spark-4.1.1 (25.57s)
        --- PASS: kuttl/harness/spark-connect-kerberos_openshift-false_iceberg-latest-1.11.0_hive-iceberg-4.0.0_spark-connect-4.1.1_krb5-1.21.1_kerberos-realm-CLUSTER.LOCAL_s3-use-tls-true (164.87s)
        --- PASS: kuttl/harness/iceberg_openshift-false_spark-iceberg-4.1.1_iceberg-latest-1.11.0 (37.05s)
        --- PASS: kuttl/harness/delta-lake_openshift-false_spark-delta-lake-4.1.1 (238.50s)
PASS

Development

Successfully merging this pull request may close these issues.

Spark CR deletion leaves orphaned pods
SparkApplication CRD improvements