debezium/dbz#2040 feat: add core SyncManager for routing CDC operations #7

Open
KMohnishM wants to merge 3 commits into gsoc-week-2-transformation from gsoc-week-3-sync

@KMohnishM KMohnishM commented Jun 10, 2026

Contributor

Fixes debezium/dbz#2040
This pull request introduces the core Synchronization Layer (SyncManager) for Week 3 of the GSoC CDC project. It establishes the event routing engine, retry mechanism, and Dead Letter Queue (DLQ).

Key Changes

  • Core SyncManager (pydebeziumai/sync/manager.py):
    • Implemented event routing logic for CDC operation mapping:
      • Create/Read (c/r) $\rightarrow$ upsert
      • Update (u) $\rightarrow$ delete then upsert (avoiding dangling metadata or duplicate embeddings)
      • Delete (d) $\rightarrow$ delete / soft-delete update.
    • Added support for soft_delete=True: instead of removing the document, the manager updates its metadata with _is_deleted=True, built from the deleted row's before payload.
  • Retry Logic: Implemented RetryConfig providing exponential backoff retries with randomized jitter for transient connection errors.
  • Dead Letter Queue (DLQ): Implemented a thread-safe DeadLetterQueue to hold failed events when maximum retries are exceeded.
  • Unit Tests (tests/unit/test_sync_manager.py): Added unit tests for the manager class covering operation routing, soft-delete configuration, retry policies (with sleep mocked), and DLQ routing.
  • CI Trigger Adjustments: Removed the branches: [main] filter from the CI and commit-signoff workflows, allowing these validation checks to run on pull requests targeting stacked branches.
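
A minimal sketch of the routing and retry behavior described in the list above. It is not the code from pydebeziumai/sync/manager.py: the names plan_actions, backoff_delay, and call_with_retry, the action labels, and the default delay values are all assumptions made for illustration. Only the op-to-action mapping and the idea of exponential backoff with randomized jitter come from the description.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def plan_actions(op: str, soft_delete: bool = False) -> list[str]:
    """Map a Debezium op code to ordered vector-store actions (hypothetical labels)."""
    if op in ("c", "r"):
        return ["upsert"]
    if op == "u":
        # Delete first so stale metadata or duplicate embeddings do not survive the update.
        return ["delete", "upsert"]
    if op == "d":
        # With soft delete the document is kept and flagged with _is_deleted=True.
        return ["soft_delete_update"] if soft_delete else ["delete"]
    raise ValueError(f"unsupported op: {op!r}")


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0, jitter: float = 0.1) -> float:
    """Delay before retry number `attempt` (0-based): capped exponential plus random jitter."""
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0.0, jitter * delay)


def call_with_retry(
    fn: Callable[[], T],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run fn, retrying on ConnectionError up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_retries:
                # Retries exhausted: a caller would hand the event to the DLQ here.
                raise
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

Passing sleep as a parameter is one way to keep tests fast; mocking time.sleep, as the PR's tests are described as doing, achieves the same thing.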

Verification Results

All checks passed locally under WSL:

  • Ruff: Passed
  • MyPy: Passed (Success: no issues found in 25 source files)
  • PyTest: 52 passed (including 6 new unit tests for SyncManager)

@KMohnishM KMohnishM changed the title from "Gsoc week 3 sync" to "debezium/dbz#2040 feat: add core SyncManager for routing CDC operations" Jun 10, 2026
@KMohnishM KMohnishM force-pushed the gsoc-week-2-transformation branch from 7fe87d1 to c7314b2 Compare June 11, 2026 10:21
@KMohnishM KMohnishM force-pushed the gsoc-week-2-transformation branch from c7314b2 to d5eb1f7 Compare June 11, 2026 10:28
@KMohnishM KMohnishM force-pushed the gsoc-week-2-transformation branch from d5eb1f7 to 85f8f5b Compare June 11, 2026 11:21
@KMohnishM KMohnishM force-pushed the gsoc-week-2-transformation branch from 85f8f5b to f615937 Compare June 12, 2026 09:00
@KMohnishM KMohnishM force-pushed the gsoc-week-2-transformation branch from f615937 to 9af708f Compare June 12, 2026 12:07
@KMohnishM KMohnishM force-pushed the gsoc-week-2-transformation branch from 9af708f to 43611d6 Compare June 12, 2026 14:21
@KMohnishM KMohnishM force-pushed the gsoc-week-2-transformation branch from 43611d6 to edbff15 Compare June 12, 2026 14:28
@KMohnishM KMohnishM force-pushed the gsoc-week-2-transformation branch from edbff15 to 52fc55a Compare June 18, 2026 12:41

@vjuranek vjuranek left a comment

Member

Two minor comments, otherwise LGTM

Comment thread pydebeziumai/sync/manager.py Outdated
    """Thread-safe Dead Letter Queue (DLQ) to hold failed events."""

    def __init__(self) -> None:
        self._queue: queue.Queue[tuple[DebeziumEventModel, Exception]] = queue.Queue()

Member

I'd suggest putting a limit on the queue so that it cannot consume an unbounded amount of memory.

Contributor Author

I've added a configurable max_size parameter (defaulting to 1000) to the DeadLetterQueue constructor. If the queue fills up, new failed events are safely dropped with a warning logged to prevent out-of-memory errors on high-volume CDC failure streams. Added a test_dlq_max_size_limit unit test to cover this behavior.
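
For readers following the thread, the bounded behavior described in this reply could look roughly like the sketch below. It is an assumption-laden illustration, not the PR's DeadLetterQueue: the class name BoundedDeadLetterQueue, the put and drain methods, the boolean return value, and the log message are hypothetical. Only the max_size parameter, its default of 1000, and the drop-with-warning behavior come from the reply.

```python
import logging
import queue
from typing import Any

logger = logging.getLogger(__name__)


class BoundedDeadLetterQueue:
    """Hypothetical stand-in for a size-limited, thread-safe DLQ."""

    def __init__(self, max_size: int = 1000) -> None:
        # queue.Queue is thread-safe; maxsize bounds how many items it holds.
        self._queue: queue.Queue[tuple[Any, Exception]] = queue.Queue(maxsize=max_size)

    def put(self, event: Any, error: Exception) -> bool:
        """Store a failed event; return False if it was dropped because the queue is full."""
        try:
            self._queue.put_nowait((event, error))
            return True
        except queue.Full:
            logger.warning(
                "DLQ full (max_size=%d); dropping failed event", self._queue.maxsize
            )
            return False

    def drain(self) -> list[tuple[Any, Exception]]:
        """Remove and return everything currently queued."""
        items: list[tuple[Any, Exception]] = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items

    def __len__(self) -> int:
        return self._queue.qsize()
```

Dropping the newest event keeps the earliest failures, which are often the most useful for diagnosis; evicting the oldest entry instead would be an equally reasonable policy.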

Comment thread pydebeziumai/sync/manager.py Outdated
Comment on lines +173 to +191
if self.soft_delete:
    page_content, row_metadata = self.document_builder.projection_policy.project(event)
    system_meta: dict[str, Any] = {
        "_table": event.table_name,
        "_schema": event.schema_name,
        "_op": op,
        "_doc_id": doc_id,
        "_is_deleted": True,
    }
    if event.payload.ts_ms is not None:
        system_meta["_ts_ms"] = event.payload.ts_ms

    metadata = {**row_metadata, **self.document_builder.extra_metadata, **system_meta}
    metadata = self.document_builder._sanitize_metadata(metadata)

    document = Document(
        page_content=page_content,
        metadata=metadata,
        id=doc_id,

Member

Possibly move this into DocumentBuilder?
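
One way the suggested move might look is sketched below. Everything here is hypothetical: DocumentBuilderSketch, build_soft_deleted, and the stand-in Document dataclass are invented for illustration and are not the project's API. The sketch also leaves out the projection and metadata-sanitizing steps from the reviewed snippet, keeping only the merge order in which system keys override row and extra metadata.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Document:
    """Stand-in for the document class used in the reviewed snippet."""

    page_content: str
    metadata: dict[str, Any]
    id: str


@dataclass
class DocumentBuilderSketch:
    """Hypothetical builder that owns soft-delete document construction."""

    extra_metadata: dict[str, Any] = field(default_factory=dict)

    def build_soft_deleted(
        self,
        *,
        page_content: str,
        row_metadata: dict[str, Any],
        table: str,
        schema: str,
        op: str,
        doc_id: str,
        ts_ms: Optional[int] = None,
    ) -> Document:
        system_meta: dict[str, Any] = {
            "_table": table,
            "_schema": schema,
            "_op": op,
            "_doc_id": doc_id,
            "_is_deleted": True,
        }
        if ts_ms is not None:
            system_meta["_ts_ms"] = ts_ms
        # Same merge order as the snippet: later dicts win, so system keys take precedence.
        metadata = {**row_metadata, **self.extra_metadata, **system_meta}
        return Document(page_content=page_content, metadata=metadata, id=doc_id)
```

With a method like this on the builder, the manager's soft-delete branch would reduce to a single call, and it would no longer need to reach into the builder's private sanitizer.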

Signed-off-by: Mohnish <kmohnishm@gmail.com>
…iggers

Signed-off-by: Mohnish <kmohnishm@gmail.com>
…sanitizer API

Signed-off-by: KMohnishM <kmohnishm@gmail.com>
@KMohnishM KMohnishM force-pushed the gsoc-week-2-transformation branch from 52fc55a to 2f3f212 Compare June 18, 2026 15:40