
Repeated OpenSyn -> Close(INVALID) with usrpwd in a peer-to-peer duplicate connect scenario #2577

@BOURBONCASK

Description


Describe the bug

Hi, thanks for maintaining zenoh.

We are seeing repeated OpenSyn -> Close(INVALID) errors when two zenoh peers are both configured to:

  • run in peer mode,
  • listen on a TCP endpoint,
  • connect to each other,
  • and use transport.auth.usrpwd.
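To show why this setup produces duplicate attempts, the sketch below models only the listen/connect part of each peer's settings. The `Peer` struct, the numeric zids, and the endpoints are hypothetical stand-ins. They are not zenoh types and not the exact repro config. The sketch only shows that when both peers listen and also dial each other, a single pair of peers ends up with two independent link establishments, one started from each side.

```rust
/// Hypothetical, simplified model of the settings that matter here.
/// These are illustrative stand-ins, not zenoh's config or ZenohId types.
struct Peer {
    zid: u128,                  // stand-in for a peer's ZenohId
    listen: &'static str,       // TCP endpoint the peer listens on
    connect: Vec<&'static str>, // endpoints the peer dials
}

/// Two peers that listen on their own endpoint and dial each other
/// (hypothetical addresses).
fn symmetric_pair() -> Vec<Peer> {
    vec![
        Peer { zid: 0xa, listen: "tcp/127.0.0.1:7447", connect: vec!["tcp/127.0.0.1:7448"] },
        Peer { zid: 0xb, listen: "tcp/127.0.0.1:7448", connect: vec!["tcp/127.0.0.1:7447"] },
    ]
}

/// Returns one (dialer zid, listener zid) entry per outgoing connection attempt.
fn outgoing_attempts(peers: &[Peer]) -> Vec<(u128, u128)> {
    let mut attempts = Vec::new();
    for dialer in peers {
        for ep in &dialer.connect {
            // Resolve the dialed endpoint to the peer listening on it, if any.
            if let Some(listener) = peers.iter().find(|p| p.listen == *ep) {
                attempts.push((dialer.zid, listener.zid));
            }
        }
    }
    attempts
}

fn main() {
    let attempts = outgoing_attempts(&symmetric_pair());
    // Both peers dial, so the single pair {0xa, 0xb} gets two attempts.
    println!("{} attempts: {:?}", attempts.len(), attempts);
}
```

In the real system each of those attempts runs the full handshake, including usrpwd authentication. That is where one of them is presumably rejected as a duplicate.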

To make this easier to investigate, I prepared a minimal repro branch based on upstream/main:

With that repro, whichever peer is started first keeps printing errors like:

ERROR zenoh_transport::unicast::establishment::open:
Received a close message (reason INVALID) in response to an OpenSyn on:
TransportLinkUnicast { ... }

If I reverse the startup order, the repeated error moves to the other peer instead.

Expected behavior:

  • duplicate/bidirectional peer connection attempts should be handled cleanly,
  • and they should not keep producing repeated Close(INVALID) responses.

Observed behavior:

  • the peer started first keeps retrying and repeatedly logs "Received a close message (reason INVALID) in response to an OpenSyn",
  • while the peer started second does not emit the same repeated error.

As a workaround, we changed peer-to-peer auto-connect to use greater-zid (for example autoconnect_strategy: { peer: { to_peer: "greater-zid" } }). With that configuration change, we have not been able to reproduce this issue in our peer-to-peer deployment so far.
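As a rough model of why the workaround helps, the sketch below compares the two strategies as a pure tie-break rule. It assumes that "greater-zid" means a node only dials a peer whose zid is smaller than its own. The exact direction is an assumption, but either direction gives the same result here: exactly one side of each pair dials. The `Strategy` enum and the `should_connect`/`dialers` functions are hypothetical and do not reflect zenoh's internal implementation.

```rust
/// Hypothetical model of the two peer-to-peer autoconnect strategies.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Strategy {
    Always,
    GreaterZid,
}

/// Should the node with zid `own` initiate a connection to `other`?
/// Assumption: under GreaterZid only the side with the greater zid dials.
fn should_connect(strategy: Strategy, own: u128, other: u128) -> bool {
    match strategy {
        Strategy::Always => own != other, // never dial yourself
        Strategy::GreaterZid => own > other,
    }
}

/// Counts how many of the two peers would dial the other one.
fn dialers(strategy: Strategy, a: u128, b: u128) -> usize {
    [should_connect(strategy, a, b), should_connect(strategy, b, a)]
        .iter()
        .filter(|&&dial| dial)
        .count()
}

fn main() {
    let (a, b) = (0x1f, 0x2e); // hypothetical zids
    println!("always: {} dialer(s)", dialers(Strategy::Always, a, b));
    println!("greater-zid: {} dialer(s)", dialers(Strategy::GreaterZid, a, b));
}
```

Because the comparison is deterministic and both peers evaluate it the same way, the bidirectional case collapses to a single establishment attempt. This matches the observation that the repeated Close(INVALID) disappears with this setting. It does not explain why the duplicate is rejected with INVALID under usrpwd in the first place.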

To reproduce

  1. Check out the repro branch:

    • git checkout repro/usrpwd-duplicate-transport
  2. Check that the example compiles:

    • cargo check -p zenoh-examples --example z_repro_usrpwd_peer
  3. In terminal A, run:

    RUST_LOG=zenoh_transport::unicast::manager=trace,zenoh_transport::unicast::establishment::accept=trace,zenoh_transport::unicast::establishment::open=trace,zenoh::net::runtime::orchestrator=debug ./repro/usrpwd-duplicate-transport/run-peer-a.sh
  4. In terminal B, run:

    RUST_LOG=zenoh_transport::unicast::manager=trace,zenoh_transport::unicast::establishment::accept=trace,zenoh_transport::unicast::establishment::open=trace,zenoh::net::runtime::orchestrator=debug ./repro/usrpwd-duplicate-transport/run-peer-b.sh
  5. Observe that the peer started first prints repeated Close(INVALID) / OpenSyn errors.

You can also start peer-b first and then peer-a; the repeated error follows the peer that was started first.

System info

  • Minimal repro platform: macOS arm64 (Apple Silicon)
  • Kernel: Darwin 25.3.0
  • Zenoh base commit used for the minimal repro: d12952d8599b83fef49a5a1b290b6b039ea028e0
  • Repro branch: origin/repro/usrpwd-duplicate-transport
  • Repro commit: 1b92bc932c5e55ecfb90a6c1650240a2ba8cccd7
  • Original environment where we first noticed this behavior: ROS 2 Humble with rmw_zenoh on its humble branch, at the commit titled "Bump zenoh to 1.8.0 - 2nd attempt"
