fix: record workflow gRPC call latency for monitoring#2702
Conversation
Summary by CodeRabbit
WalkthroughA new ChangesgRPC Call Latency Instrumentation
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Adds latency measurement and logging to gRPC calls in workflow activities. Introduces a shared helper function to standardize how gRPC call durations are logged across activities, enabling better observability of workflow performance. This addresses issue NVIDIA#2649 where workflow latency metrics were defined but not recorded. The duration is now captured and logged for each gRPC call, providing visibility into API response times for troubleshooting and performance monitoring. Fixes NVIDIA#2649 Signed-off-by: rsd-darshan <poudeldarshan44@gmail.com>
b4fbd86 to
0ae41ea
Compare
|
@rsd-darshan I'm only seeing one file getting changed. Intentional? |
Extends the logGrpcCallLatency helper to every mutating Site Controller gRPC call (Create/Update/Delete/Add/Remove/Insert) across the workflow activities, not just the two introduced earlier. Each call now measures its duration and logs it alongside the operation name and error state. Read-only inventory lookups (Find/Get/Search) are left untouched since they're invoked through a different, generic code path used by the inventory collection helpers. Calls that go to the Flow gRPC client (AddComponent, CreateExpectedRackOnFlow) are also left untouched since they target a different downstream service. Signed-off-by: rsd-darshan <poudeldarshan44@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
rest-api/site-workflow/pkg/activity/tenant.go (2)
54-62: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valuePrefer explicit
return nilfor consistency with other activities.After successful operations, line 62 returns
err(which isnilat that point) instead of explicitly returningnil. While functionally correct, this pattern is inconsistent with all other activity files in this PR and the broader codebase, which use explicitreturn nilon success paths.♻️ Proposed change for consistency
start := time.Now() _, err = grpcServiceClient.CreateTenant(ctx, request) duration := time.Since(start) logGrpcCallLatency(&logger, "CreateTenant", duration, err) if err != nil { return swe.WrapErr(err) } - return err + return nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rest-api/site-workflow/pkg/activity/tenant.go` around lines 54 - 62, In the CreateTenant function, the final return statement at line 62 returns the err variable (which is nil at that point) instead of explicitly returning nil. Replace the return statement after the error handling block with an explicit return nil to maintain consistency with other activity files in the codebase and follow the established pattern for successful operation completion.
91-99: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valuePrefer explicit
return nilfor consistency with other activities.After successful operations, line 99 returns
err(which isnilat that point) instead of explicitly returningnil. While functionally correct, this pattern is inconsistent with all other activity files in this PR and the broader codebase, which use explicitreturn nilon success paths.♻️ Proposed change for consistency
start := time.Now() _, err = grpcServiceClient.UpdateTenant(ctx, request) duration := time.Since(start) logGrpcCallLatency(&logger, "UpdateTenant", duration, err) if err != nil { return swe.WrapErr(err) } - return err + return nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rest-api/site-workflow/pkg/activity/tenant.go` around lines 91 - 99, In the UpdateTenant activity function, change the final return statement from `return err` to `return nil` on line 99. After the successful UpdateTenant operation when no error has occurred, explicitly return nil instead of returning the err variable (which is nil at that point) to maintain consistency with other activity files in the codebase.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@rest-api/site-workflow/pkg/activity/latency.go`:
- Around line 15-17: The successful operation latency logging in the else block
is currently using logger.Debug() which will not appear in typical production
log configurations, preventing real-time SLA monitoring and latency analysis.
Change the log level from Debug to Info in the else block where the duration and
"Completed %s" operation message are logged to ensure these latency measurements
are visible in production environments and enable proper SLA compliance
monitoring.
---
Nitpick comments:
In `@rest-api/site-workflow/pkg/activity/tenant.go`:
- Around line 54-62: In the CreateTenant function, the final return statement at
line 62 returns the err variable (which is nil at that point) instead of
explicitly returning nil. Replace the return statement after the error handling
block with an explicit return nil to maintain consistency with other activity
files in the codebase and follow the established pattern for successful
operation completion.
- Around line 91-99: In the UpdateTenant activity function, change the final
return statement from `return err` to `return nil` on line 99. After the
successful UpdateTenant operation when no error has occurred, explicitly return
nil instead of returning the err variable (which is nil at that point) to
maintain consistency with other activity files in the codebase.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1c52a219-69d7-415a-9ef0-bec7d3ff658b
📒 Files selected for processing (19)
rest-api/site-workflow/pkg/activity/dpuextensionservice.gorest-api/site-workflow/pkg/activity/expectedmachine.gorest-api/site-workflow/pkg/activity/expectedpowershelf.gorest-api/site-workflow/pkg/activity/expectedrack.gorest-api/site-workflow/pkg/activity/expectedswitch.gorest-api/site-workflow/pkg/activity/ibpartition.gorest-api/site-workflow/pkg/activity/instance.gorest-api/site-workflow/pkg/activity/instancetype.gorest-api/site-workflow/pkg/activity/latency.gorest-api/site-workflow/pkg/activity/machine.gorest-api/site-workflow/pkg/activity/networksecuritygroup.gorest-api/site-workflow/pkg/activity/nvlinklogicalpartition.gorest-api/site-workflow/pkg/activity/operatingsystem.gorest-api/site-workflow/pkg/activity/sshkeygroup.gorest-api/site-workflow/pkg/activity/subnet.gorest-api/site-workflow/pkg/activity/tenant.gorest-api/site-workflow/pkg/activity/vpc.gorest-api/site-workflow/pkg/activity/vpcpeering.gorest-api/site-workflow/pkg/activity/vpcprefix.go
| } else { | ||
| logger.Debug().Dur("duration", duration).Msgf("Completed %s", operation) | ||
| } |
There was a problem hiding this comment.
🚀 Performance & Scalability | 🟠 Major | ⚡ Quick win
Success logging at Debug level prevents production SLA monitoring.
The PR objectives state this change "enables monitoring of workflow SLA compliance and identification of slow operations." However, logging successful operations at Debug level means these latency measurements won't appear in typical production log configurations (which run at Info or Warn level). This prevents real-time SLA monitoring and latency analysis for the success path, which is critical for performance observability.
Consider changing line 16 to log at Info level to ensure successful operation latencies are visible in production:
📊 Proposed fix to enable production monitoring
func logGrpcCallLatency(logger *zerolog.Logger, operation string, duration time.Duration, err error) {
if err != nil {
logger.Warn().Err(err).Dur("duration", duration).Msgf("Failed to %s using Site Controller API", operation)
} else {
- logger.Debug().Dur("duration", duration).Msgf("Completed %s", operation)
+ logger.Info().Dur("duration", duration).Msgf("Completed %s", operation)
}
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@rest-api/site-workflow/pkg/activity/latency.go` around lines 15 - 17, The
successful operation latency logging in the else block is currently using
logger.Debug() which will not appear in typical production log configurations,
preventing real-time SLA monitoring and latency analysis. Change the log level
from Debug to Info in the else block where the duration and "Completed %s"
operation message are logged to ensure these latency measurements are visible in
production environments and enable proper SLA compliance monitoring.
|
@ajf Good catch, that was incomplete. Pushed an update that wires the Read-only lookups ( |
|
@coderabbitai full review |
✅ Action performedFull review finished. |
|
✅ Action performedFull review finished. |
|
/ok to test 8427f7c |
🔍 Container Scan SummaryNo Grype artifacts were found to aggregate. |
|
🌿 Preview your docs: https://nvidia-preview-pull-request-2702.docs.buildwithfern.com/infra-controller |
🔐 TruffleHog Secret Scan✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉 🕐 Last updated: 2026-06-23 23:05:46 UTC | Commit: 8427f7c |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
rest-api/site-workflow/pkg/activity/latency.go (1)
12-17: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winUse structured
operationfields instead ofMsgfinterpolation in latency logs.Line 14 and Line 16 currently embed
operationinside the log message, which is harder to query and aggregate than explicit fields.♻️ Proposed refactor
func logGrpcCallLatency(logger *zerolog.Logger, operation string, duration time.Duration, err error) { if err != nil { - logger.Warn().Err(err).Dur("duration", duration).Msgf("Failed to %s using Site Controller API", operation) + logger.Warn(). + Err(err). + Str("operation", operation). + Dur("duration", duration). + Msg("Site Controller API call failed") } else { - logger.Debug().Dur("duration", duration).Msgf("Completed %s", operation) + logger.Debug(). + Str("operation", operation). + Dur("duration", duration). + Msg("Site Controller API call completed") } }As per coding guidelines, "When writing log messages, prefer placing common fields as attributes passed to tracing function, instead of using string interpolation."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rest-api/site-workflow/pkg/activity/latency.go` around lines 12 - 17, In the logGrpcCallLatency function, the operation parameter is currently embedded in log messages using Msgf string interpolation on both the error path (Warn log) and success path (Debug log). Instead, add operation as a structured field by chaining .Str("operation", operation) before calling .Msg() with a static message string. Replace both the Msgf calls with Msg calls that use the structured operation field, which will improve log queryability and aggregation.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@rest-api/site-workflow/pkg/activity/expectedswitch.go`:
- Around line 286-289: The logGrpcCallLatency helper function in
rest-api/site-workflow/pkg/activity/latency.go logs successful gRPC calls at
Debug level, which suppresses them in production deployments running at Info or
Warn level. Modify the logGrpcCallLatency function to log success cases at Info
level instead of Debug level, and use structured fields for consistency with
logging conventions. This single change will fix all 40+ call sites across the
activity package (including the AddExpectedSwitch call and others in
expectedswitch, vpc, tenant, subnet, instance, machine, and other files) by
ensuring successful latency records are visible in production for proper
observability and SLA visibility.
In `@rest-api/site-workflow/pkg/activity/networksecuritygroup.go`:
- Around line 59-62: The logGrpcCallLatency function in latency.go is logging
success-case latency metrics at Debug level, which makes them invisible at the
default Info log level used in production. Locate the logGrpcCallLatency
function definition in latency.go and change the logging level for successful
gRPC call completions from Debug to Info level, while keeping error logs at
their current level if appropriate. This will ensure that operation latency
metrics are visible in production logs without requiring debug logging to be
explicitly enabled. The function is called from three locations (around lines
59-62, 99-102, and 139-142 in networksecuritygroup.go), but you only need to fix
it once in the function definition itself.
---
Nitpick comments:
In `@rest-api/site-workflow/pkg/activity/latency.go`:
- Around line 12-17: In the logGrpcCallLatency function, the operation parameter
is currently embedded in log messages using Msgf string interpolation on both
the error path (Warn log) and success path (Debug log). Instead, add operation
as a structured field by chaining .Str("operation", operation) before calling
.Msg() with a static message string. Replace both the Msgf calls with Msg calls
that use the structured operation field, which will improve log queryability and
aggregation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 874a244a-c631-4712-8460-907c9dfbf798
📒 Files selected for processing (19)
rest-api/site-workflow/pkg/activity/dpuextensionservice.gorest-api/site-workflow/pkg/activity/expectedmachine.gorest-api/site-workflow/pkg/activity/expectedpowershelf.gorest-api/site-workflow/pkg/activity/expectedrack.gorest-api/site-workflow/pkg/activity/expectedswitch.gorest-api/site-workflow/pkg/activity/ibpartition.gorest-api/site-workflow/pkg/activity/instance.gorest-api/site-workflow/pkg/activity/instancetype.gorest-api/site-workflow/pkg/activity/latency.gorest-api/site-workflow/pkg/activity/machine.gorest-api/site-workflow/pkg/activity/networksecuritygroup.gorest-api/site-workflow/pkg/activity/nvlinklogicalpartition.gorest-api/site-workflow/pkg/activity/operatingsystem.gorest-api/site-workflow/pkg/activity/sshkeygroup.gorest-api/site-workflow/pkg/activity/subnet.gorest-api/site-workflow/pkg/activity/tenant.gorest-api/site-workflow/pkg/activity/vpc.gorest-api/site-workflow/pkg/activity/vpcpeering.gorest-api/site-workflow/pkg/activity/vpcprefix.go
| start := time.Now() | ||
| _, err = grpcServiceClient.AddExpectedSwitch(ctx, request) | ||
| duration := time.Since(start) | ||
| logGrpcCallLatency(&logger, "AddExpectedSwitch", duration, err) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify how site-workflow log level is configured/defaulted.
# Expected: if default/effective level is Info or higher, Debug success-latency logs are not emitted.
rg -n --type go -C3 'SetGlobalLevel|ParseLevel|WithLevel|LOG_LEVEL|zerolog' rest-api/site-workflow rest-api/common
# Show all current helper call sites touched by this PR cohort.
rg -n --type go 'logGrpcCallLatency\(' rest-api/site-workflow/pkg/activityRepository: NVIDIA/infra-controller
Length of output: 47687
🏁 Script executed:
# Read the latency.go file to see the actual log level used in success path
sed -n '12,20p' rest-api/site-workflow/pkg/activity/latency.goRepository: NVIDIA/infra-controller
Length of output: 480
Successful gRPC latency logs are suppressed in production deployments.
The logGrpcCallLatency helper in rest-api/site-workflow/pkg/activity/latency.go logs success cases at Debug level. In production deployments running at Info or Warn (standard configurations), these successful latency records are discarded, undermining observability and SLA visibility for normal traffic. Error cases remain visible at Warn level, creating an asymmetry where failures are instrumented but successes are not.
The fix elevates success logging to Info and uses structured fields for consistency with logging conventions:
Suggested change in `rest-api/site-workflow/pkg/activity/latency.go`
func logGrpcCallLatency(logger *zerolog.Logger, operation string, duration time.Duration, err error) {
if err != nil {
- logger.Warn().Err(err).Dur("duration", duration).Msgf("Failed to %s using Site Controller API", operation)
+ logger.Warn().
+ Err(err).
+ Str("operation", operation).
+ Dur("duration", duration).
+ Msg("Site Controller API call failed")
} else {
- logger.Debug().Dur("duration", duration).Msgf("Completed %s", operation)
+ logger.Info().
+ Str("operation", operation).
+ Dur("duration", duration).
+ Msg("Site Controller API call completed")
}
}This affects 40+ call sites across rest-api/site-workflow/pkg/activity/*.go (expectedswitch, vpc, tenant, subnet, instance, machine, and others).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| start := time.Now() | |
| _, err = grpcServiceClient.AddExpectedSwitch(ctx, request) | |
| duration := time.Since(start) | |
| logGrpcCallLatency(&logger, "AddExpectedSwitch", duration, err) | |
| func logGrpcCallLatency(logger *zerolog.Logger, operation string, duration time.Duration, err error) { | |
| if err != nil { | |
| logger.Warn(). | |
| Err(err). | |
| Str("operation", operation). | |
| Dur("duration", duration). | |
| Msg("Site Controller API call failed") | |
| } else { | |
| logger.Info(). | |
| Str("operation", operation). | |
| Dur("duration", duration). | |
| Msg("Site Controller API call completed") | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@rest-api/site-workflow/pkg/activity/expectedswitch.go` around lines 286 -
289, The logGrpcCallLatency helper function in
rest-api/site-workflow/pkg/activity/latency.go logs successful gRPC calls at
Debug level, which suppresses them in production deployments running at Info or
Warn level. Modify the logGrpcCallLatency function to log success cases at Info
level instead of Debug level, and use structured fields for consistency with
logging conventions. This single change will fix all 40+ call sites across the
activity package (including the AddExpectedSwitch call and others in
expectedswitch, vpc, tenant, subnet, instance, machine, and other files) by
ensuring successful latency records are visible in production for proper
observability and SLA visibility.
| start := time.Now() | ||
| _, err = rpcServiceClient.CreateNetworkSecurityGroup(ctx, request) | ||
| duration := time.Since(start) | ||
| logGrpcCallLatency(&logger, "CreateNetworkSecurityGroup", duration, err) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🔴 Critical
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Locate latency helper usage and definition
rg -n "func logGrpcCallLatency|logGrpcCallLatency\(" rest-api/site-workflow/pkg/activity
# 2) Inspect configured log levels in site-workflow and shared logging/config code
rg -n "SetGlobalLevel|GlobalLevel|LOG_LEVEL|log_level|zerolog\\.(DebugLevel|InfoLevel|WarnLevel)" \
rest-api/site-workflow rest-api/common
# 3) Find deployment/runtime env defaults for log level (if any)
rg -n "LOG_LEVEL|log-level|log_level" rest-api/deploy rest-api/helm rest-api/dockerRepository: NVIDIA/infra-controller
Length of output: 7842
🏁 Script executed:
# Inspect the latency helper implementation
cat -n rest-api/site-workflow/pkg/activity/latency.go
# Search for log level initialization in site-workflow
rg -n "zerolog|SetGlobalLevel|logger\.SetLevel" rest-api/site-workflow/pkg --max-depth=5
# Check main.go or similar entrypoint for log setup
find rest-api/site-workflow -name "main.go" -o -name "init.go" | head -5Repository: NVIDIA/infra-controller
Length of output: 6104
🏁 Script executed:
# Find site-workflow main entry point
find rest-api/site-workflow -name "main.go" -o -name "cmd" -type d
# Check what's in the cmd or main area
ls -la rest-api/site-workflow/cmd/ 2>/dev/null || ls -la rest-api/site-workflow/ | grep -E "main|cmd"
# Search for log initialization in common package
rg -n "log\.|Logger|zerolog" rest-api/common/pkg/config --max-depth=3 -A 3
# Check site-workflow main directly
find rest-api/site-workflow -maxdepth 3 -name "*.go" -path "*/cmd/*" -o -name "main.go" | xargs head -50Repository: NVIDIA/infra-controller
Length of output: 161
🏁 Script executed:
# Find where site-workflow package is imported/used
rg -n "site-workflow|site_workflow" rest-api --glob="*.go" | grep -v "test" | head -20
# Check the main site-workflow directory structure
ls -la rest-api/site-workflow/
# Search for logging setup in workflow initialization
rg -n "log\.Level|SetLevel|GlobalLevel" rest-api/workflow rest-api/site-workflow -A 2
# Check Helm/Kustomize for LOG_LEVEL env var defaults
rg -n "LOG_LEVEL|log.level" rest-api/helm rest-api/deploy -iRepository: NVIDIA/infra-controller
Length of output: 3813
🏁 Script executed:
# Check workflow's log level configuration
sed -n '250,280p' rest-api/workflow/internal/config/config.go
# Look for log level default in workflow config
rg -n "InfoLevel|DebugLevel|log.*level|LOG_LEVEL" rest-api/workflow/internal/config -B 2 -A 2
# Check site-workflow package to see if it sets up logging
rg -n "SetGlobalLevel|log\.Level|DEBUG|INFO" rest-api/site-workflow/pkg -i | head -20
# Check how site-agent (which runs site-workflow) initializes logging
rg -n "log\.|zerolog" rest-api/site-agent/cmd rest-api/site-agent/internal -i | grep -i "level\|init\|setup" | head -10Repository: NVIDIA/infra-controller
Length of output: 5375
🏁 Script executed:
# Look for default log level in config defaults
rg -n "log\.level|ConfigLogLevel" rest-api/workflow/internal/config -B 5 -A 5 | head -50
# Check viper defaults or config files
rg -n "log.*level|info|debug" rest-api/common/pkg/config --glob="*.go" -i
# Check if there's a default in GetLogLevel
rg -n "GetLogLevel" rest-api/workflow/internal/config -A 5
# Look for environment defaults
rg -n "log.*level.*info|log.*level.*debug|LOG_LEVEL.*=" rest-api -i | head -20Repository: NVIDIA/infra-controller
Length of output: 5665
Success-latency logs for gRPC calls now log at Debug level, making them invisible at the default Info log level.
Lines 59–62, 99–102, and 139–142 route completion latency through logGrpcCallLatency, which logs success at Debug level (latency.go:16). The workflow service defaults to log.level: "info" (workflow/internal/config/config.go:129), so these latency metrics will not appear in production logs unless debug logging is explicitly enabled—a configuration most deployments do not use.
Promote success-latency logs to Info level to preserve runtime visibility of operation performance, or document that debug logging must be enabled for latency observability.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@rest-api/site-workflow/pkg/activity/networksecuritygroup.go` around lines 59
- 62, The logGrpcCallLatency function in latency.go is logging success-case
latency metrics at Debug level, which makes them invisible at the default Info
log level used in production. Locate the logGrpcCallLatency function definition
in latency.go and change the logging level for successful gRPC call completions
from Debug to Info level, while keeping error logs at their current level if
appropriate. This will ensure that operation latency metrics are visible in
production logs without requiring debug logging to be explicitly enabled. The
function is called from three locations (around lines 59-62, 99-102, and 139-142
in networksecuritygroup.go), but you only need to fix it once in the function
definition itself.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
rest-api/site-workflow/pkg/activity/latency.go (1)
12-17: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winUse structured log fields instead of formatted messages in the latency helper.
Msgfembeds operation context in free text, which makes querying less reliable. Emitoperationas a structured field and useMsg(...)for both branches.♻️ Proposed refinement
func logGrpcCallLatency(logger *zerolog.Logger, operation string, duration time.Duration, err error) { if err != nil { - logger.Warn().Err(err).Dur("duration", duration).Msgf("Failed to %s using Site Controller API", operation) + logger.Warn(). + Err(err). + Str("operation", operation). + Dur("duration", duration). + Msg("Site Controller API call failed") } else { - logger.Debug().Dur("duration", duration).Msgf("Completed %s", operation) + logger.Debug(). + Str("operation", operation). + Dur("duration", duration). + Msg("Site Controller API call completed") } }As per coding guidelines, “When writing log messages, prefer placing common fields as attributes passed to tracing function, instead of using string interpolation.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rest-api/site-workflow/pkg/activity/latency.go` around lines 12 - 17, The logGrpcCallLatency function currently embeds the operation context in free-text messages using Msgf, which reduces log query reliability. Refactor this function to emit operation as a structured field instead. Add a `.Str("operation", operation)` field to both the error and success log entries, then replace both Msgf calls with Msg calls using plain text messages that do not interpolate the operation variable. This applies to both branches of the if-else statement.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@rest-api/site-workflow/pkg/activity/latency.go`:
- Around line 12-17: The logGrpcCallLatency function currently embeds the
operation context in free-text messages using Msgf, which reduces log query
reliability. Refactor this function to emit operation as a structured field
instead. Add a `.Str("operation", operation)` field to both the error and
success log entries, then replace both Msgf calls with Msg calls using plain
text messages that do not interpolate the operation variable. This applies to
both branches of the if-else statement.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d8113186-8387-4296-8a0e-c024bb35d427
📒 Files selected for processing (19)
rest-api/site-workflow/pkg/activity/dpuextensionservice.gorest-api/site-workflow/pkg/activity/expectedmachine.gorest-api/site-workflow/pkg/activity/expectedpowershelf.gorest-api/site-workflow/pkg/activity/expectedrack.gorest-api/site-workflow/pkg/activity/expectedswitch.gorest-api/site-workflow/pkg/activity/ibpartition.gorest-api/site-workflow/pkg/activity/instance.gorest-api/site-workflow/pkg/activity/instancetype.gorest-api/site-workflow/pkg/activity/latency.gorest-api/site-workflow/pkg/activity/machine.gorest-api/site-workflow/pkg/activity/networksecuritygroup.gorest-api/site-workflow/pkg/activity/nvlinklogicalpartition.gorest-api/site-workflow/pkg/activity/operatingsystem.gorest-api/site-workflow/pkg/activity/sshkeygroup.gorest-api/site-workflow/pkg/activity/subnet.gorest-api/site-workflow/pkg/activity/tenant.gorest-api/site-workflow/pkg/activity/vpc.gorest-api/site-workflow/pkg/activity/vpcpeering.gorest-api/site-workflow/pkg/activity/vpcprefix.go
Fixes #2649
Summary
Adds latency measurement and logging to gRPC calls in workflow activities. Workflow latency metrics were previously defined in site-agent's coregrpc manager but never actually recorded. This change instruments the activities to measure and log the duration of each gRPC call to NICo.
Changes
latency.gowith a shared helperlogGrpcCallLatency()to standardize duration logging across activitiesvpcprefix.goactivities (Create, Update, Delete) to measure gRPC call durationinstance.goUpdateInstanceConfig activity with latency measurementBenefits
Testing
All existing tests pass. Verified that duration logging appears in test output with actual measured latencies.