Add PubSub and BigTable support for Central hosted network controllers by glimberg · Pull Request #2526 · zerotier/ZeroTierOne

glimberg · 2025-10-06T23:53:53Z

This is a big one that goes along with other internal upcoming changes.

Refactors the existing Redis and Postgres NOTIFY message-passing systems into the NotificationListener interface, and adds a third method for message passing via GCP PubSub. Which one to use is configurable through the controller settings in local.conf.
Adds GCP BigTable as a third option for writing member status updates, alongside Postgres and Redis. This is also configurable via local.conf.
Sets up CMake for building Central controllers. This can (and hopefully will) be extended to general builds of ZeroTier One in the future, though more work will be needed to finish that out.
Uses miniconda + CMake for external dependency management instead of dumping everything in ext/. For the time being, this is likely more useful for Central controller builds than for end-user builds.
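
A minimal sketch of how the configurable listener choice described above might look. Only the NotificationListener name comes from this PR; the subclass names, the `makeListener` factory, the `transport()` method, and the mode strings are hypothetical stand-ins, and the real listeners take connection parameters that are omitted here.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Interface name taken from the PR description. The single method below is a
// hypothetical placeholder, not the real API.
class NotificationListener {
  public:
	virtual ~NotificationListener() = default;
	virtual std::string transport() const = 0;
};

// Hypothetical subclasses, one per message-passing backend.
class PostgresListenerSketch : public NotificationListener {
  public:
	std::string transport() const override { return "pgsql"; }
};

class RedisListenerSketch : public NotificationListener {
  public:
	std::string transport() const override { return "redis"; }
};

class PubSubListenerSketch : public NotificationListener {
  public:
	std::string transport() const override { return "pubsub"; }
};

// Hypothetical factory: picks a listener from a configuration string such as
// one read from local.conf. Unknown values are rejected up front.
std::unique_ptr<NotificationListener> makeListener(const std::string& mode)
{
	if (mode == "pgsql") {
		return std::unique_ptr<NotificationListener>(new PostgresListenerSketch());
	}
	if (mode == "redis") {
		return std::unique_ptr<NotificationListener>(new RedisListenerSketch());
	}
	if (mode == "pubsub") {
		return std::unique_ptr<NotificationListener>(new PubSubListenerSketch());
	}
	throw std::invalid_argument("unknown listener mode: " + mode);
}
```

Callers hold only the base-class pointer, so the rest of the controller does not need to know which backend was configured.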

glimberg · 2025-11-12T03:24:24Z

OK not sure why the github action is failing now but I'll look at it tomorrow

Publish CTL_NONCE_UPDATE to PubSub when nonces are created or reused in getSSOAuthInfo(), with the network's frontend as a message attribute so only the correct CV frontend receives it. Listen for ZT1_AUTH_UPDATE messages and update sso_expiry.authentication_expiry_time accordingly, with a network existence check before applying.

- Add sso_send_topic/sso_recv_topic to PubSubConfig
- Add PubSubWriter::publishSSONonceUpdate() with frontend param
- Add PubSubSSOListener class for inbound auth updates
- Rename CV1_AUTH_UPDATE to ZT1_AUTH_UPDATE in sso.proto
- Fix pre-existing connection pool leak in getSSOAuthInfo() catch block
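
To illustrate the attribute-based routing this commit describes, here is a small model of a message tagged with its frontend and a check that mimics what filtering on that attribute would achieve. The struct, both function names, and the attribute key `"frontend"` are assumptions for this sketch; it does not call the GCP client library and is not the code in PubSubWriter::publishSSONonceUpdate().

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for an outbound PubSub message: a payload plus string attributes.
struct OutboundMessageSketch {
	std::string data;
	std::map<std::string, std::string> attributes;
};

// Builds a nonce-update message tagged with the network's frontend.
// The attribute key "frontend" is an assumption made for this sketch.
OutboundMessageSketch makeNonceUpdate(const std::string& payload, const std::string& frontend)
{
	OutboundMessageSketch msg;
	msg.data = payload;
	msg.attributes["frontend"] = frontend;
	return msg;
}

// Models the effect of filtering on the attribute: a frontend only sees
// messages whose "frontend" attribute matches its own name.
bool deliveredTo(const OutboundMessageSketch& msg, const std::string& subscriberFrontend)
{
	auto it = msg.attributes.find("frontend");
	return it != msg.attributes.end() && it->second == subscriberFrontend;
}
```

A message without the attribute is delivered to no frontend in this model, which is one possible policy and not necessarily the one the controller uses.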

Allow controllers to advertise which central version (cv1, cv2, or all) they are assigned to handle via a new configurable field. The value is persisted to the database on each heartbeat and validated at startup against the DB CHECK constraint.
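
A sketch of the kind of startup validation described here, assuming the three accepted values are exactly the ones listed (cv1, cv2, all). The function names are hypothetical, and the real check in the controller may be structured differently.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Returns true for the values the commit message lists as valid.
bool isValidCentralVersion(const std::string& value)
{
	return value == "cv1" || value == "cv2" || value == "all";
}

// Hypothetical startup check: fail fast on a bad setting instead of letting
// the first heartbeat write trip the database CHECK constraint.
std::string requireCentralVersion(const std::string& value)
{
	if (! isValidCentralVersion(value)) {
		throw std::invalid_argument("invalid central version: " + value);
	}
	return value;
}
```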

cspiegel · 2026-04-09T19:15:27Z

+
+	std::set<std::string> networksUpdated;
+	uint64_t updateCount = 0;
+	for (const auto& entry : _pending) {


_pending here should be toWrite.

cspiegel · 2026-04-09T19:31:03Z

+RedisStatusWriter::RedisStatusWriter(std::shared_ptr<sw::redis::Redis> redis, std::string controller_id)
+	: _redis(redis)
+	, _mode(REDIS_MODE_STANDALONE)
+{
+}
+
+RedisStatusWriter::RedisStatusWriter(std::shared_ptr<sw::redis::RedisCluster> cluster, std::string controller_id)
+	: _cluster(cluster)
+	, _mode(REDIS_MODE_CLUSTER)
+{
+}


These constructors don't set _controller_id.
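
For illustration, a self-contained sketch of a constructor that does store the controller ID. `FakeRedis` stands in for sw::redis::Redis and the class name is hypothetical; only the `_redis` and `_controller_id` member names follow the diff.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Stand-in for sw::redis::Redis so the sketch compiles on its own.
struct FakeRedis {};

class RedisStatusWriterSketch {
  public:
	RedisStatusWriterSketch(std::shared_ptr<FakeRedis> redis, std::string controller_id)
		: _redis(std::move(redis))
		, _controller_id(std::move(controller_id))   // the initializer missing from the diff
	{
	}

	const std::string& controllerId() const { return _controller_id; }

  private:
	// Declared in the same order as the initializer list above.
	std::shared_ptr<FakeRedis> _redis;
	std::string _controller_id;
};
```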

cspiegel · 2026-04-09T19:48:05Z

+		throw std::runtime_error("controller config required");
+	}
+
+	if (_path.length() > 9 && (_path.substr(0, 9) != "postgres:")) {


This test isn't quite complete enough: it allows shorter, non-postgres strings through. You can just flip the size check:

if (_path.length() < 10 || (_path.substr(0, 9) != "postgres:"))
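
The difference between the two conditions can be checked in isolation. The sketch below assumes the branch rejects the path when the condition is true; the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Condition as written in the diff: true means "reject this path".
bool rejectsAsWritten(const std::string& path)
{
	return path.length() > 9 && (path.substr(0, 9) != "postgres:");
}

// Condition with the size check flipped, as suggested above.
bool rejectsWithFlippedCheck(const std::string& path)
{
	return path.length() < 10 || (path.substr(0, 9) != "postgres:");
}
```

With the original condition, a short string such as `redis:x` is never rejected because the length test fails first. With the flipped check, anything shorter than 10 characters is rejected, including a bare `postgres:` with nothing after it.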

cspiegel · 2026-04-09T19:51:48Z

+			}
+		case LISTENER_MODE_PUBSUB:


Missing break here.
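
A small standalone illustration of what a missing break does in a switch like this one. Only LISTENER_MODE_PUBSUB appears in the diff; the other enumerators, the choice of Redis as the preceding case, and the function names are assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum ListenerMode { LISTENER_MODE_PGSQL, LISTENER_MODE_REDIS, LISTENER_MODE_PUBSUB };

// Without a break, selecting Redis also runs the PubSub case.
std::vector<std::string> startedWithoutBreak(ListenerMode mode)
{
	std::vector<std::string> started;
	switch (mode) {
		case LISTENER_MODE_REDIS:
			started.push_back("redis");
			// falls through
		case LISTENER_MODE_PUBSUB:
			started.push_back("pubsub");
			break;
		default:
			break;
	}
	return started;
}

// With the break in place, each mode runs only its own case.
std::vector<std::string> startedWithBreak(ListenerMode mode)
{
	std::vector<std::string> started;
	switch (mode) {
		case LISTENER_MODE_REDIS:
			started.push_back("redis");
			break;
		case LISTENER_MODE_PUBSUB:
			started.push_back("pubsub");
			break;
		default:
			break;
	}
	return started;
}
```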

cspiegel · 2026-04-10T17:58:57Z

+								nlohmann::json oldMember;
+								nlohmann::json newMember = config;
+								if (! isNewMember) {
+									oldMember = _getNetworkMember(w, networkId, memberId);


Above was a w.commit(), after which any queries on w should be raising exceptions, since the transaction is complete. I'm wondering if this is a code path that has never been hit in testing (i.e. pubsub, null/controller change source, and it's not a new member).

cspiegel · 2026-04-10T18:00:22Z

+																   frontend);
+						}
+
+						w.commit();


Similar to elsewhere, this commit will be "invalidating" the handle (since the transaction is complete). That would mean a call such as on line 541 shouldn't properly work. Is this a path that isn't occurring or is my understanding of libpqxx's transaction handling obsolete?

cspiegel · 2026-04-10T18:01:17Z

+{
+	fprintf(stderr, "%s: commitThread start\n", _myAddressStr.c_str());
+	_queueItem qitem;
+	while (_commitQueue.get(qitem) & (_run == 1)) {


This should be && instead of &.
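
For two boolean operands the resulting value is the same either way, so this is mainly about intent and short-circuiting: `&` always evaluates both sides, while `&&` skips the right side when the left is false. A minimal demonstration, with hypothetical helper names:

```cpp
#include <cassert>

// Records that it was evaluated, then returns the given value.
bool touch(int& counter, bool value)
{
	++counter;
	return value;
}

// Bitwise AND: both operands are always evaluated.
int evaluationsWithBitwiseAnd()
{
	int count = 0;
	bool result = touch(count, false) & touch(count, true);
	(void)result;
	return count;
}

// Logical AND: the right operand is skipped once the left is false.
int evaluationsWithLogicalAnd()
{
	int count = 0;
	bool result = touch(count, false) && touch(count, true);
	(void)result;
	return count;
}
```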

cspiegel · 2026-04-10T18:06:51Z

+		// if (memberId == "a10dccea52" && networkId == "8056c2e21c24673d") {
+		//	fprintf(stderr, "invalid authinfo for grant's machine\n");
+		//	info.version=1;
+		//	return info;
+		// }
+		//  fprintf(stderr, "CentralDB::updateMemberOnLoad: %s-%s\n", networkId.c_str(), memberId.c_str());


Leftover commented code.

cspiegel · 2026-04-10T18:10:49Z

+// void create_bigtable_table(std::string project_id, std::string instance_id)
+// {
+// 	auto bigtableAdminClient =
+// 		bigtable_admin::BigtableTableAdminClient(bigtable_admin::MakeBigtableTableAdminConnection());
+
+// 	std::string table_id = "member_status";
+// 	std::string table_name = "projects/" + project_id + "/instances/" + instance_id + "/tables/" + table_id;
+
+// 	// Check if the table exists
+// 	auto table = bigtableAdminClient.GetTable(table_name);
+// 	if (! table.ok()) {
+// 		if (table.status().code() == google::cloud::StatusCode::kNotFound) {
+// 			google::bigtable::admin::v2::Table table_config;
+// 			table_config.set_name(table_id);
+// 			auto families = table_config.mutable_column_families();
+// 			// Define column families
+// 			// Column family "node_info" with max 1 version
+// 			// google::bigtable::admin::v2::ColumnFamily* node_info = table_config.add_column_families();
+// 			// Column family "check_in" with max 1 version
+
+// 			auto create_result = bigtableAdminClient.CreateTable(
+// 				"projects/" + project_id + "/instances/" + instance_id, table_id, table_config);
+
+// 			if (! create_result.ok()) {
+// 				fprintf(
+// 					stderr, "Failed to create Bigtable table member_status: %s\n",
+// 					create_result.status().message().c_str());
+// 				throw std::runtime_error("Failed to create Bigtable table");
+// 			}
+// 			fprintf(stderr, "Created Bigtable table: member_status\n");
+// 		}
+// 		else {
+// 			fprintf(stderr, "Failed to get Bigtable table member_status: %s\n", table.status().message().c_str());
+// 			throw std::runtime_error("Failed to get Bigtable table");
+// 		}
+// 	}
+// }


Is this leftover code, or intended to stay as documentation?

The 10-second session.cancel() loop raced with in-flight acks: when cancel fired while the GCP client was processing messages, acks were lost before reaching the server. With message ordering enabled, an unacked message blocks all subsequent messages on that ordering key, causing silent stalls with no error output.

Two fixes:
- Replace the cancel/reconnect timer with a blocking session.get(), storing the session future so the destructor can cancel on shutdown.
- Always ack messages even when onNotification fails. Permanent errors (bad protobuf, missing fields) will never succeed on retry and would otherwise poison the ordering key indefinitely.
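
The "always ack" rule from the second fix can be sketched without the GCP client. `FakeMessage` and `handleMessage` are hypothetical stand-ins; the callback plays the role of onNotification and may return false or throw.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Stand-in for a received message with an ack handle.
struct FakeMessage {
	std::string payload;
	bool acked = false;
	void ack() { acked = true; }
};

// Runs the handler, then acks regardless of the outcome, so a message that
// can never be processed does not block later messages on the same ordering key.
bool handleMessage(FakeMessage& msg, const std::function<bool(const std::string&)>& onNotification)
{
	bool ok = false;
	try {
		ok = onNotification(msg.payload);
	}
	catch (const std::exception&) {
		ok = false;   // treated as a permanent failure in this sketch
	}
	msg.ack();
	return ok;
}
```

This sketch treats every failure as permanent. A handler that can hit transient errors might need a separate retry path, which is not shown here.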

glimberg added 30 commits August 5, 2025 15:52

reorganize rustybits into a single library with smeeclient behind the ztcontroller feature flag

e822811

fix dependencies when temporal isn't needed

f9500ca

gcloud-pubsub only if ztcontroller is flagged

44d0e81

fix macos build

a552163

fix windows build

4b3b847

tokio is needed by both temporal & gcloud pubsub, so make just one instance for the whole library, add init/shutdown functions for it exposed to C

837f15d

cleanup warnings

f079f8b

WIP: pubsub wrapper in Rust

06b2ce9

plumb through callbacks

0e94891

function naming cleanup.

bf8c9d0

match other things already written

updates & tests.

ccb0fa7

Tests currently need to be run with --test-threads=1. Seems like the instances of the pubsub emulator stomp on each other without that

expose change_handler to C via FFI

3a209e2

remove ztcontroller from default feature list

a842ad8

feature cleanup

d4ee95e

fix build script

e4147f7

fix calling into async functions from non-async via the FFI

4fab227

revert default feature list once again

4237830

more calling async from non-async changes

b90ad51

Add C++ wrapper around pubsub listeners

650fc0c

rename classes

ebe8fdb

Refactor Redis & Postgres notification listeners into listener subclasses in new CentralDB class

9522437

This allows us to interchangeably use different listeners (pgsql, redis, pubsub) depending on configuration values passed into the constructor.

make whether SSO is enabled a switchable config value

2833d0e

rework protobuf messages

f8a4a5d

PubSub allows us to do schema validation, however it only allows one top level message at a time. Move other sub-message declarations under the main message declaration so that we can enable schema validation in the pubsub stream directly

Update central controller build to use CMake + conda

7f3b150

Muuuuch easier to use external dependencies now. Also tried out conan and vcpkg. Ran into dependency issues when solving for packages to install with conan. vcpkg is just obtuse as all hell to install and not easy to integrate.

spacing

9bdd564

clang format change (not applied project-wide yet)

3187c2f

add a StatusWriter class hierarchy for writing member status updates

85f2335

* Postgres Direct
* Redis
* BigTable

wire up status writers

d119547

add sleep at end of online notification loop

e347f89

Merge branch 'dev' into gl/ctl-pubsub

17b48d7

glimberg added 24 commits November 12, 2025 16:17

set --provenance false on docker build to try and fix docker image creation issue

2ba50f4

Merge branch 'dev' into gl/ctl-pusub

68a9634

Remove extra verbose logging from controller

ccb9a45

disable peer metrics in Central controller

35f7bf2

WIP: Update sso info retrieval method

c653e76

add a linked_id column to the oidc_config table.

0f0e6b3

Required to get the list of configs for a particular org that the controller has. Named it `linked_id` rather than `org_id` since we don't know what it will be linked to in CV2

fix db migrations

ae7ee51

drop index

0ad6b19

woops. out of order here

38f4d12

set network member frontend based on the network it's a member of

03aa33b

Undo change to old migration that shouldn't have been made

d9507dd

another fix

7faf30d

plumb through config changes for sso pubsub

e49b347

configure assigned central version in startup script

7ec4246

print a message when the SSO PSK is configured

b047038

temporary logging

895b060

update settings to enable SSO networks

78b25f4

sso query fix in controller

dd6e69f

Skip redundant nonce sending with an expiry time of 0

20f7311

Remove smee from CentralDB.

ea5c91b

Now handled in CV1 on new member join via pubsub integration when a new member comes through

Added a little bit more logging for the node checkin/bigtable write process

af7eae5

periodic queue size logging, and fix some db connection leaks

1f3a04f

cspiegel reviewed Apr 10, 2026

glimberg added 4 commits April 11, 2026 07:40

fix poison pill blocking processing

4ca5c9b

logging

d2361a9

make sure subscribe pulls stay running

0e8ec66

glimberg wants to merge 195 commits into dev from gl/ctl-pubsub

3 participants
