debezium/dbz#2074 feat: add real-time RAG chatbot demonstration example #10

Open
KMohnishM wants to merge 1 commit into gsoc-week-4-retriever from gsoc-week-4-chatbot

@KMohnishM

Contributor

Description

This Pull Request implements the real-time RAG Chatbot demonstration example for PyDebeziumAI (Week 4, PR 2).

This serves as the first fully functional end-to-end demonstration of the library's capabilities: streaming change data capture (CDC) events from PostgreSQL logical replication through an embedded Debezium engine, transforming them and synchronizing them into Chroma DB in real time, and querying the updated records with a LangGraph agent.

Resolves: debezium/dbz#2074
Stacked on: PR 1 (gsoc-week-4-retriever)

Key Components Added

  1. Local Infrastructure (examples/rag_chatbot/docker-compose.yml & setup_db.sql):

    • PostgreSQL service with logical replication enabled (wal_level=logical), exposed on port 5432.
    • Seeding script containing initial mock data for product inventory.
  2. Sync Daemon (examples/rag_chatbot/stream_sync.py):

    • Background sync script that runs embedded Debezium, translates logical-replication row changes, and synchronizes them into Chroma DB in real time using ChromaAdapter and SyncManager.
    • Logs each change event as it arrives, so incoming CDC transactions can be followed live.
  3. Interactive LangGraph Chatbot (examples/rag_chatbot/chatbot.py):

    • An interactive console application that queries Chroma DB in real time.
    • Integrates the create_retriever_tool helper from PR 1.
    • Multi-model support: automatically selects, in order of preference, an OpenAI GPT model (if OPENAI_API_KEY is set), a local Ollama model (such as llama3.2 or smollm running natively), or a local mock RAG model as the fallback.
    • Verbose pipeline visualisation: during tool execution, prints the raw vector DB search query, the matching database records, and their complete CDC metadata (e.g. the _op flag, price value, and record keys), illustrating the exact RAG flow.
  4. Guide (examples/rag_chatbot/README.md):

    • Detailed, step-by-step Quickstart guide covering environment setup, testing the real-time updates, and handling timeline resets/LSN desyncs.
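
The Sync Daemon's core step, applying one CDC event to the vector store, can be sketched as follows. This is a minimal illustration under stated assumptions, not the ChromaAdapter or SyncManager implementation: it assumes the standard Debezium change-event envelope (before, after, op), uses a plain dict as a stand-in for the Chroma collection, and the names apply_change_event, render_row, and the id key column are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Hypothetical in-memory stand-in for a Chroma collection: doc id -> entry.
Store = dict[str, dict[str, Any]]


def render_row(row: dict[str, Any]) -> str:
    """Turn a table row into the text that would be embedded (hypothetical format)."""
    return ", ".join(f"{col}: {val}" for col, val in row.items())


def apply_change_event(
    store: Store, event: dict[str, Any] | None, key_field: str = "id"
) -> None:
    """Apply one Debezium change event (envelope with before/after/op) to the store."""
    if event is None:
        # Tombstone record emitted after a delete: nothing to apply.
        return
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, update: upsert the new row state from 'after'.
        row = event["after"]
        doc_id = str(row[key_field])
        store[doc_id] = {
            "document": render_row(row),
            # Keep the operation flag next to the row values, like the _op metadata.
            "metadata": {**row, "_op": op},
        }
    elif op == "d":
        # delete: 'before' carries at least the primary-key columns.
        row = event["before"]
        store.pop(str(row[key_field]), None)
    # Other ops (e.g. "t" for truncate) are ignored in this sketch.


store: Store = {}
apply_change_event(store, {"op": "c", "before": None, "after": {"id": 1, "name": "Lamp", "price": 25.0}})
apply_change_event(store, {"op": "u", "before": None, "after": {"id": 1, "name": "Lamp", "price": 19.99}})
print(store["1"]["metadata"])
apply_change_event(store, {"op": "d", "before": {"id": 1}, "after": None})
apply_change_event(store, None)
print(store)
```

Against a real Chroma collection, the upsert branch would map to collection.upsert(ids=..., documents=..., metadatas=...) and the delete branch to collection.delete(ids=...). For PostgreSQL, the before field of a delete event carries only the primary-key values unless the table uses REPLICA IDENTITY FULL, which is why the sketch keys deletes on the primary key alone.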

Verification Results

All tests and code-quality checks passed in the WSL environment:

  • Unit & Integration Tests: 68 passed (pytest)
  • Ruff check & formatting: Passed (All checks passed!)
  • MyPy strict type-checking: Passed (Success: no issues found)

KMohnishM force-pushed the gsoc-week-4-chatbot branch from 277eaeb to 4d436fe on June 17, 2026 14:27
KMohnishM force-pushed the gsoc-week-4-retriever branch from bb30493 to 805e8ff on June 18, 2026 12:46
KMohnishM force-pushed the gsoc-week-4-chatbot branch from 4d436fe to 1ae91ff on June 18, 2026 12:46
KMohnishM force-pushed the gsoc-week-4-retriever branch from 805e8ff to d13e308 on June 18, 2026 15:40
KMohnishM force-pushed the gsoc-week-4-chatbot branch from 1ae91ff to 279505c on June 18, 2026 15:40
KMohnishM force-pushed the gsoc-week-4-retriever branch from d13e308 to 3042183 on June 26, 2026 13:32
KMohnishM force-pushed the gsoc-week-4-chatbot branch from 279505c to 1384ebd on June 26, 2026 13:32

vjuranek left a comment

Member

A few minor comments regarding code structure; otherwise LGTM.

Comment thread tools/setup_jars.py Outdated
<version>${debezium.version}</version>
</dependency>"""
</dependency>""",
""" <dependency>

Member

Formatting?

Comment thread examples/rag_chatbot/chatbot.py Outdated
)

# 3. Create a Verbose Retriever Tool from Adapter
from langchain_core.tools import Tool

Member

If the module is always expected to be available (i.e. there is no fallback, unlike the HuggingFace one above), the imports should be at the beginning of the file.
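
For reference, the verbose retriever step can be written as a plain wrapper function, with all imports at the top of the module. This is a hedged sketch, not the PR's code: Record, make_verbose_search, and fake_search are hypothetical stand-ins, and a real version would wrap the adapter's search instead of fake_search.

```python
from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Record:
    """Hypothetical stand-in for a retrieved document (text plus CDC metadata)."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def make_verbose_search(
    search: Callable[[str], list[Record]],
) -> Callable[[str], str]:
    """Wrap a search function so each call logs the query, matches, and metadata."""

    def verbose_search(query: str) -> str:
        print(f"[retriever] query: {query!r}")
        records = search(query)
        print(f"[retriever] {len(records)} match(es)")
        for i, rec in enumerate(records, start=1):
            print(f"[retriever] #{i}: {rec.page_content}")
            print(f"[retriever]     metadata: {rec.metadata}")
        # The joined text is what the LLM would receive as the tool result.
        return "\n".join(rec.page_content for rec in records)

    return verbose_search


def fake_search(query: str) -> list[Record]:
    # Hypothetical search returning one record that carries CDC metadata.
    return [Record("name: Lamp, price: 19.99", {"id": 1, "price": 19.99, "_op": "u"})]


tool_func = make_verbose_search(fake_search)
print(tool_func("How much is the lamp?"))
```

In the chatbot, the wrapped function would then be registered as a LangChain tool, for example Tool(name="product_search", func=tool_func, description="..."), with from langchain_core.tools import Tool among the top-level imports; the tool name and description are placeholders.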

Comment thread examples/rag_chatbot/chatbot.py Outdated
# 4. Decide Agent Mode (OpenAI vs. Local Ollama vs. Mock Fallback)
api_key = os.getenv("OPENAI_API_KEY")
agent_mode: str | None = None
from collections.abc import Callable

Member

same here
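
A sketch of the import layout the review describes: unconditional imports live at the top of the module, and only an optional dependency with a fallback is wrapped in a guarded import. The langchain_huggingface package is used as the optional dependency because the review mentions a HuggingFace fallback; the exact package the script uses is an assumption, and embedding_backend and AgentFactory are hypothetical names.

```python
# Unconditional imports: stdlib and required dependencies, at the top of the module.
# (In chatbot.py, "from langchain_core.tools import Tool" also belongs here;
# it is omitted so this sketch runs without LangChain installed.)
from collections.abc import Callable

# Optional dependency with a fallback: the case where a guarded import is fine.
try:
    from langchain_huggingface import HuggingFaceEmbeddings
except ImportError:  # package not installed, so a fallback backend is used
    HuggingFaceEmbeddings = None

# Hypothetical alias for the agent factory functions defined later in the script.
AgentFactory = Callable[[], object]


def embedding_backend() -> str:
    """Report which embedding backend this run would use."""
    return "huggingface" if HuggingFaceEmbeddings is not None else "fallback"


print(embedding_backend())
```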

Comment thread examples/rag_chatbot/chatbot.py Outdated

# Try OpenAI first
if api_key:
    try:

Member

This is getting too long. I'd propose splitting it into functions like "create_openai_agent()"

Comment thread examples/rag_chatbot/chatbot.py Outdated
print(f"Failed to load LangChain/OpenAI packages: {e}. Checking local alternatives...")

# If no OpenAI, check if local Ollama is running
if not agent_mode:

Member

and create_ollama_agent()

Comment thread examples/rag_chatbot/chatbot.py Outdated
pass

# Fallback to local Mock Agent
if not agent_mode:

Member

and create_mock_agent()
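
The split proposed in this review could look roughly like this: each create_*_agent() function returns an agent or None, and a small build_agent() tries them in the same order as the original script (OpenAI, then Ollama, then the mock). This is a hedged sketch: the Agent type, build_agent, ollama_is_running, and the model names are assumptions; the LLMs are called directly rather than through the LangGraph agent and retriever tool; and the Ollama probe simply requests Ollama's default local endpoint, http://localhost:11434/api/tags. Imports of the optional LangChain integrations stay inside the factories because each one has a fallback.

```python
from __future__ import annotations

import os
import urllib.request
from collections.abc import Callable, Mapping

# Hypothetical agent interface: takes a user question, returns an answer.
Agent = Callable[[str], str]

OLLAMA_URL = "http://localhost:11434/api/tags"  # Ollama's default local endpoint


def ollama_is_running(timeout: float = 1.0) -> bool:
    """Return True if a local Ollama server answers on its default port."""
    try:
        with urllib.request.urlopen(OLLAMA_URL, timeout=timeout):
            return True
    except OSError:  # URLError, refused connections, and timeouts are OSErrors
        return False


def create_openai_agent(env: Mapping[str, str]) -> Agent | None:
    """Return an OpenAI-backed agent, or None if no key or package is available."""
    if not env.get("OPENAI_API_KEY"):
        return None
    try:
        from langchain_openai import ChatOpenAI
    except ImportError as e:
        print(f"Failed to load LangChain/OpenAI packages: {e}. Checking local alternatives...")
        return None
    llm = ChatOpenAI(model="gpt-4o-mini")  # model name is an assumption
    return lambda question: str(llm.invoke(question).content)


def create_ollama_agent(probe: Callable[[], bool] = ollama_is_running) -> Agent | None:
    """Return a local Ollama-backed agent, or None if Ollama is not reachable."""
    if not probe():
        return None
    try:
        from langchain_ollama import ChatOllama
    except ImportError:
        return None
    llm = ChatOllama(model="llama3.2")
    return lambda question: str(llm.invoke(question).content)


def create_mock_agent() -> Agent:
    """Always-available fallback that answers without an LLM."""
    return lambda question: f"[mock] No LLM configured; received: {question}"


def build_agent(
    env: Mapping[str, str] = os.environ,
    ollama_probe: Callable[[], bool] = ollama_is_running,
) -> tuple[str, Agent]:
    """Try OpenAI, then Ollama, then the mock agent, in that order."""
    agent = create_openai_agent(env)
    if agent is not None:
        return "openai", agent
    agent = create_ollama_agent(ollama_probe)
    if agent is not None:
        return "ollama", agent
    return "mock", create_mock_agent()


# Demo with no API key and no Ollama server, which selects the mock agent.
mode, agent = build_agent(env={}, ollama_probe=lambda: False)
print(mode, agent("What is in stock?"))
```

With this structure, the inline if/try chain in chatbot.py would reduce to mode, agent = build_agent(). Injecting env and ollama_probe also makes the selection logic testable without an API key or a running Ollama server.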

@vjuranek

Member

@KMohnishM could you please also check the CI failures?

@KMohnishM

KMohnishM commented Jun 26, 2026

Contributor Author

@KMohnishM could you please also check the CI failures?

Yes @vjuranek, I investigated the Python 3.12 CI failures. They were caused by a parsing conflict: MyPy targeted Python 3.10 globally, but the Python 3.12 runner installed newer third-party stubs (such as numpy's) that contain Python 3.12-specific syntax.

I have resolved this in the latest downstream branch (gsoc-week-5-milvus) by dynamically matching MyPy's target version to the runner's Python version, i.e. running mypy --python-version ${{ matrix.python-version }} . in CI.
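
The same idea can be applied when running MyPy locally: derive the target version from the running interpreter instead of hard-coding 3.10. A minimal sketch, where mypy_args is a hypothetical helper:

```python
import sys
from collections.abc import Sequence


def mypy_args(paths: Sequence[str] = (".",)) -> list[str]:
    """Build MyPy arguments that target the running interpreter's version."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return ["--python-version", version, *paths]


print(mypy_args())  # e.g. ['--python-version', '3.12', '.'] on Python 3.12
```

The list can be passed to MyPy's programmatic entry point, mypy.api.run(mypy_args()), which returns a (stdout, stderr, exit_status) tuple.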

KMohnishM force-pushed the gsoc-week-4-chatbot branch from 1384ebd to 089627a on June 27, 2026 07:17
Signed-off-by: KMohnishM <kmohnishm@gmail.com>
KMohnishM force-pushed the gsoc-week-4-chatbot branch from 089627a to f364c36 on June 27, 2026 07:19
