
HDDS-14812. Support datanode topology in MiniOzoneCluster#10598

Open
hevinhsu wants to merge 3 commits into apache:master from hevinhsu:HDDS-14812


hevinhsu commented Jun 24, 2026

Contributor

What changes were proposed in this pull request?

Support rack and host topology awareness in MiniOzoneCluster and MiniOzoneHAClusterImpl. This enables integration tests to properly exercise topology-sensitive pipeline placement.

Key changes:

  • Add FixedHostMapping, a CachedDNSToSwitchMapping implementation that resolves hostnames to rack locations via a static map, bypassing DNS lookups to handle synthetic or unresolvable test hostnames.
  • Pre-populate DatanodeDetails via the DN yaml file mechanism to bypass validateDatanodeIpAddress() during DatanodeService startup, which performs a native DNS lookup and would fail for synthetic hostnames.
  • Trigger pipeline recreation after all DataNodes are registered to ensure PipelinePlacementPolicy observes the full DN pool and produces rack-diverse pipelines.
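
The static-map idea behind the first bullet can be sketched in plain Java. This is a minimal, dependency-free illustration and not the PR's FixedHostMapping: the class name `StaticRackResolver`, its constructor, and the sample hostnames are hypothetical, and the `/default-rack` fallback is assumed to match the value of NetworkTopology.DEFAULT_RACK. The real class plugs into Hadoop's DNSToSwitchMapping machinery, which this sketch does not attempt to reproduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for a static hostname-to-rack resolver.
 * It only mirrors the idea of answering rack lookups from a fixed
 * map, so synthetic hostnames never reach a DNS lookup.
 */
final class StaticRackResolver {
  // Assumed to equal NetworkTopology.DEFAULT_RACK; copied here to keep the sketch self-contained.
  static final String DEFAULT_RACK = "/default-rack";

  private final Map<String, String> hostToRack;

  StaticRackResolver(Map<String, String> hostToRack) {
    // Defensive, immutable copy: the mapping is fixed for the lifetime of the resolver.
    this.hostToRack = Map.copyOf(hostToRack);
  }

  /** Returns one rack per input name, in input order; unknown names fall back to DEFAULT_RACK. */
  List<String> resolve(List<String> names) {
    List<String> racks = new ArrayList<>(names.size());
    for (String name : names) {
      racks.add(hostToRack.getOrDefault(name, DEFAULT_RACK));
    }
    return racks;
  }
}
```

For example, a resolver built from `Map.of("dn1.test", "/rack1")` would answer `/rack1` for `dn1.test` and the default rack for any other name, without the hostname ever needing to be resolvable.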

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14812

How was this patch tested?

CI was run on the changes in this PR: MiniOzoneCluster topology tests

Additionally, three integration test classes covering rack-aware scenarios were added and verified in a personal fork (not included in this PR). The tests are structured around three topology configurations: racks+hosts, racks only, and hosts only.

TestRackAwarePlacement — verifies that rack and hostname topology configured
via setRacks() / setHosts() is correctly propagated to SCM's NodeManager
through the real cluster startup path, covering three topology configurations
(racks + hosts, racks only, hosts only):

  • testDatanodesHaveCorrectRack: verifies each datanode's networkLocation matches
    the configured rack
  • testRatisPipelineSpansMultipleRacks: verifies all open RATIS THREE pipelines
    span ≥ 2 racks
  • testDatanodesAllInDefaultRack (hosts only): verifies all nodes fall back to
    NetworkTopology.DEFAULT_RACK when no rack is configured
  • testDatanodesHaveCorrectHostname: verifies SCM-registered hostnames match
    setHosts()
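
The rack-span check described in testRatisPipelineSpansMultipleRacks can be illustrated with a small, dependency-free helper. This is only a sketch under stated assumptions: `RackSpanCheck` and its methods are hypothetical names, and each pipeline is modelled as a plain list of rack strings, whereas the actual tests would read each member datanode's networkLocation from the running cluster.

```java
import java.util.HashSet;
import java.util.List;

/** Hypothetical helper: each pipeline is modelled as the list of its members' rack locations. */
final class RackSpanCheck {
  private RackSpanCheck() {
  }

  /** Counts the distinct racks among one pipeline's members. */
  static int distinctRacks(List<String> memberRacks) {
    return new HashSet<>(memberRacks).size();
  }

  /**
   * Returns true only if every pipeline spans at least minRacks racks.
   * An empty pipeline list passes vacuously, so a test should also
   * assert separately that at least one open pipeline exists.
   */
  static boolean allSpanAtLeast(List<List<String>> pipelines, int minRacks) {
    for (List<String> memberRacks : pipelines) {
      if (distinctRacks(memberRacks) < minRacks) {
        return false;
      }
    }
    return true;
  }
}
```

A three-node pipeline on `/rack1`, `/rack1`, `/rack2` would pass a minimum of 2, while one placed entirely on `/rack1` would fail.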

TestScmHAFinalizationWithRacks — verifies that SCM HA upgrade finalization succeeds under different topology configurations. Modelled after TestScmHAFinalization, with setRacks() / setHosts() added to the cluster builder:

  • testFinalizationWithLeaderChange*: verifies finalization completes correctly after a leader change mid-finalization
  • testFinalizationWithRestart*: verifies finalization resumes correctly after all SCMs are restarted
  • testSnapshotFinalization*: verifies a lagging SCM correctly catches up via snapshot installation

TestSecretKeySnapshotWithRack — verifies that symmetric secret keys are correctly synchronized from leader to follower during snapshot installation under different topology configurations. Modelled after TestSecretKeySnapshot, with setRacks() / setHosts() added to the cluster builder:

  • testInstallSnapshot*: verifies the follower receives and applies the leader's secret keys correctly after snapshot installation

Tests were validated with green CI.

peterxcli left a comment

Member


I remember you said these tests clone the behaviour of the existing topology test cases from the unit tests. Could you also add them to the description? Thanks!

@hevinhsu

Contributor Author

Thanks @peterxcli for the reminder! I've updated the PR description.

Originally I intended to port the unit test cases directly, but the unit test logic relies on mocked components that don't translate to an integration test context, so I instead validated that rack topology works correctly at a high level across the three test classes.

Let me know if you'd like more test scenarios covered or if the tests should be included in this PR.


ivandika3 commented Jun 25, 2026

Contributor

Thanks @hevinhsu for the patch, I'll review this soon.

> Let me know if you'd like more test scenarios covered or if the tests should be included in this PR.

@hevinhsu I think it's good to include the rack placement test, and we can add a test for each of the pipeline and container placement policies. As for TestScmHAFinalizationWithRacks and TestSecretKeySnapshotWithRack, I don't think they are affected by the topology, so they might not be needed.
