HDDS-14812. Support datanode topology in MiniOzoneCluster #10598
hevinhsu wants to merge 3 commits into
peterxcli
left a comment
I remember you said this clones the behaviour of the existing topology test cases from the unit tests. Could you also mention that in the description? Thanks!
Thanks @peterxcli for the reminder! I've updated the PR description. Originally I intended to port the unit test cases directly, but the unit test logic relies on mocked components that don't translate to an integration test context, so I instead validated that rack topology works correctly at a high level across the three test classes. Let me know if you'd like more test scenarios covered or if the tests should be included in this PR.
Thanks @hevinhsu for the patch, I'll review this soon.
@hevinhsu I think it's good to include the rack placement test, and we can add a test for each of the pipeline and container placement policies. As for TestScmHAFinalizationWithRacks and TestSecretKeySnapshotWithRack, I don't think they are affected by the topology, so they might not be needed.
What changes were proposed in this pull request?
Support rack and host topology awareness in MiniOzoneCluster and MiniOzoneHAClusterImpl. This enables integration tests to properly exercise topology-sensitive pipeline placement.

Key changes:
- FixedHostMapping, a CachedDNSToSwitchMapping implementation that resolves hostnames to rack locations via a static map, bypassing DNS lookups to handle synthetic or unresolvable test hostnames.
- DatanodeDetails via the DN yaml file mechanism to bypass validateDatanodeIpAddress() during DatanodeService startup, which performs a native DNS lookup and would fail for synthetic hostnames.
- PipelinePlacementPolicy observes the full DN pool and produces rack-diverse pipelines.

What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14812
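For reviewers who want a concrete picture of the static rack resolution listed under the key changes, here is a minimal, self-contained sketch. It is not the PR's FixedHostMapping: the class name StaticRackResolver, the hostnames, and the rack paths are all hypothetical, and the sketch deliberately avoids any Hadoop or Ozone dependency. It only mirrors the list-in, list-out shape of a resolve call and assumes that unmapped hosts should fall back to "/default-rack".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in that illustrates static hostname-to-rack lookup. */
final class StaticRackResolver {
  // Assumption: unmapped hosts fall back to this rack path.
  static final String DEFAULT_RACK = "/default-rack";

  private final Map<String, String> hostToRack;

  StaticRackResolver(Map<String, String> hostToRack) {
    // Defensive copy so later changes to the caller's map have no effect.
    this.hostToRack = new HashMap<>(hostToRack);
  }

  /** Resolves each name from the static map; no DNS lookup is performed. */
  List<String> resolve(List<String> names) {
    List<String> racks = new ArrayList<>(names.size());
    for (String name : names) {
      racks.add(hostToRack.getOrDefault(name, DEFAULT_RACK));
    }
    return racks;
  }

  public static void main(String[] args) {
    Map<String, String> mapping = new HashMap<>();
    mapping.put("dn1.test", "/rack1");
    mapping.put("dn2.test", "/rack2");
    StaticRackResolver resolver = new StaticRackResolver(mapping);
    // prints [/rack1, /rack2, /default-rack]
    System.out.println(
        resolver.resolve(Arrays.asList("dn1.test", "dn2.test", "unknown.test")));
  }
}
```

Because the lookup never touches the network, synthetic hostnames such as dn1.test resolve the same way on any machine, which is the property the description relies on.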
How was this patch tested?
CI was run on the changes in this PR: MiniOzoneCluster topology tests
Additionally, three integration test classes covering rack-aware scenarios were added and verified in a personal fork (not included in this PR). The tests are structured around three topology configurations: racks+hosts, racks only, and hosts only.
- TestRackAwarePlacement: verifies that rack and hostname topology configured via setRacks()/setHosts() is correctly propagated to SCM's NodeManager through the real cluster startup path, covering three topology configurations (racks + hosts, racks only, hosts only):
  - testDatanodesHaveCorrectRack: verifies each datanode's networkLocation matches the configured rack
  - testRatisPipelineSpansMultipleRacks: verifies all open RATIS THREE pipelines span ≥ 2 racks
  - testDatanodesAllInDefaultRack (hosts only): verifies all nodes fall back to NetworkTopology.DEFAULT_RACK when no rack is configured
  - testDatanodesHaveCorrectHostname: verifies SCM-registered hostnames match setHosts()
- TestScmHAFinalizationWithRacks: verifies that SCM HA upgrade finalization succeeds under different topology configurations. Modelled after TestScmHAFinalization, with setRacks()/setHosts() added to the cluster builder:
  - testFinalizationWithLeaderChange*: verifies finalization completes correctly after a leader change mid-finalization
  - testFinalizationWithRestart*: verifies finalization resumes correctly after all SCMs are restarted
  - testSnapshotFinalization*: verifies a lagging SCM correctly catches up via snapshot installation
- TestSecretKeySnapshotWithRack: verifies that symmetric secret keys are correctly synchronized from leader to follower during snapshot installation under different topology configurations. Modelled after TestSecretKeySnapshot, with setRacks()/setHosts() added to the cluster builder:
  - testInstallSnapshot*: verifies the follower receives and applies the leader's secret keys correctly after snapshot installation

Tests were validated with green CI.
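
The rack-diversity condition described for testRatisPipelineSpansMultipleRacks (every pipeline spans at least two racks) can be sketched independently of the cluster. This is only an illustration of the check, not the test from the fork: the RackDiversityCheck class, the node names, and the rack paths are hypothetical, and the real test would read pipelines and rack locations from SCM rather than from plain maps.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper that counts the racks covered by one pipeline. */
final class RackDiversityCheck {

  /** Returns the number of distinct racks among the pipeline's nodes. */
  static int distinctRacks(List<String> pipelineNodes, Map<String, String> nodeToRack) {
    Set<String> racks = new HashSet<>();
    for (String node : pipelineNodes) {
      String rack = nodeToRack.get(node);
      if (rack == null) {
        // A node without a known rack would make the check meaningless.
        throw new IllegalArgumentException("No rack known for " + node);
      }
      racks.add(rack);
    }
    return racks.size();
  }

  /** True when the pipeline's nodes sit on at least two racks. */
  static boolean spansMultipleRacks(List<String> pipelineNodes, Map<String, String> nodeToRack) {
    return distinctRacks(pipelineNodes, nodeToRack) >= 2;
  }

  public static void main(String[] args) {
    Map<String, String> nodeToRack = new HashMap<>();
    nodeToRack.put("dn1", "/rack1");
    nodeToRack.put("dn2", "/rack1");
    nodeToRack.put("dn3", "/rack2");
    // Three replicas on two racks: prints true
    System.out.println(spansMultipleRacks(Arrays.asList("dn1", "dn2", "dn3"), nodeToRack));
    // Two replicas on the same rack: prints false
    System.out.println(spansMultipleRacks(Arrays.asList("dn1", "dn2"), nodeToRack));
  }
}
```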