SDSTOR-22293: handle logdev recovery corner case#894
Conversation
|
@JacksonYao287 Since there isn’t a clean, fully compatible way to handle this case right now (for example, the two sbs in logstore are not atomic), I only changed logstream recovery in this PR. PTAL. |
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## stable/v7.x #894 +/- ##
==============================================
Coverage ? 48.30%
==============================================
Files ? 110
Lines ? 12972
Branches ? 6232
==============================================
Hits ? 6266
Misses ? 2564
Partials ? 4142 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
xiaoxichen
left a comment
There was a problem hiding this comment.
TL;DR
Instead of fighting with the corner cases, it seems like we dont need the dynamic feature at all, we knows the journal vdev size and we know our #PG per SM, any possibility of static allocation / pre-fill works with minimal modification ?
|
it`s a big change, will review it later this week. |
I don't think static allocation/pre-fill would be a minimal modification. It would fundamentally change the chunk chain structure for each PG into a pre-allocated ring, requiring significant redesign across several areas:
If we want to handle this issue, this PR is a quick fix. If we want to revisit the chunk allocation model, that should be treated as a standalone architectural initiative — worth looping in all affected members to evaluate and plan for the right release. |
Thanks! raft.md and raft_repl_dev_log_dev.md just provide related knowledge (future usage for AI triage more issues), you can skip them. |
also add relevant knowledge to docs/structures to support AI in future triaging
| m_log_idx.store(hint->head_start_idx, std::memory_order_release); | ||
| m_vdev_jd->update_data_start_offset(hint->head_start_offset); | ||
| device_cursor = hint->head_start_offset; | ||
| do_load(device_cursor); |
There was a problem hiding this comment.
not sure do we need to update m_logdev_meta and persist with the correct offset in private data?
also add relevant knowledge to docs/structures to support AI in future triaging