fix: more changes to support lenovo GB300s#2809
Conversation
Signed-off-by: Krish Dandiwala <kdandiwala@nvidia.com>
Summary by CodeRabbit
WalkthroughBumps the Changesnv-redfish upgrade, error wiring, regex fix, password fallback, and chassis-based BlueField detection
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
crates/api-model/src/site_explorer/mod.rs (1)
1368-1371: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winConsider validating the single-PciRoot constraint.
The unanchored regex now matches the first occurrence of
PciRoot(...)if multiple exist in the path. While unlikely with compliant firmware, malformed paths could contain multiplePciRootnodes, leading to silent incorrect parsing.Adding validation that exactly one
PciRootnode exists would strengthen the parser and provide clearer error messages for malformed input. This could be achieved by checking thatPCI_ROOT_REGEX.captures_iter(st).count() == 1before proceeding.🛡️ Proposed validation enhancement
let mut push_group = |group: &str| -> Result<(), String> { // ... existing code ... }; +// Validate exactly one PciRoot exists +let pci_root_count = PCI_ROOT_REGEX.captures_iter(st).count(); +if pci_root_count != 1 { + return Err(format!( + "Expected exactly one PciRoot node, found {} in: {s}", + pci_root_count + )); +} + let root = PCI_ROOT_REGEX .captures(st) .and_then(|c| c.get(1)) .ok_or_else(|| format!("Could not match regex in PCI Device Path {s}."))?;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/api-model/src/site_explorer/mod.rs` around lines 1368 - 1371, The current code using PCI_ROOT_REGEX.captures() only matches the first PciRoot occurrence, which could silently accept malformed paths with multiple PciRoot nodes. Add validation before the captures call to ensure exactly one PciRoot node exists by checking that PCI_ROOT_REGEX.captures_iter(st).count() == 1, returning a clear error message if the count is not exactly 1. This strengthens the parser to reject malformed input instead of silently parsing only the first match.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/api-model/src/site_explorer/mod.rs`:
- Around line 1335-1341: The `PCI_ROOT_REGEX` modification in the site_explorer
module lacks test coverage for the parsing behavior change that now accommodates
GB300/Grace device paths with vendor/MMIO prefixes. Add table-driven tests using
carbide-test-support in a tests module that verify the following scenarios:
GB300/Grace paths with VenHw and MemoryMapped prefixes (e.g.,
`VenHw(...)/MemoryMapped(...)/PciRoot(0x16)/Pci(0x0,0x0)`), original unadorned
paths (e.g., `PciRoot(0x8)/Pci(0x2,0xa)/Pci(0x0,0x0)`), paths with optional MAC
suffixes, and invalid paths (missing Pci nodes, malformed PciRoot, no PciRoot
nodes). This ensures the regex change functions correctly and prevents
regressions in the hardware discovery path.
In `@crates/site-explorer/src/redfish.rs`:
- Around line 247-250: The PasswordChangeRequired error handler is applying an
AMI-specific fallback using hardcoded account ID "2" to all vendors, including
LenovoGB300, but GB300 admin accounts are at IDs "1" or "4", not "2". In the
Err(libredfish::RedfishError::PasswordChangeRequired) branch, add
vendor-specific logic to check whether the client is LenovoGB300 or another
vendor before applying the account ID "2" fallback. For LenovoGB300, either
return a clear error indicating the account cannot be resolved safely, or use
GB300-specific account IDs if a validated fallback path exists. This ensures the
fallback only applies to vendors where account "2" is appropriate and prevents
attempts to change the wrong account.
---
Nitpick comments:
In `@crates/api-model/src/site_explorer/mod.rs`:
- Around line 1368-1371: The current code using PCI_ROOT_REGEX.captures() only
matches the first PciRoot occurrence, which could silently accept malformed
paths with multiple PciRoot nodes. Add validation before the captures call to
ensure exactly one PciRoot node exists by checking that
PCI_ROOT_REGEX.captures_iter(st).count() == 1, returning a clear error message
if the count is not exactly 1. This strengthens the parser to reject malformed
input instead of silently parsing only the first match.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b81cf9ab-8659-48cf-858e-ad912853dfd2
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
Cargo.tomlcrates/api-model/src/site_explorer/mod.rscrates/bmc-mock/src/test_support/axum_http_client.rscrates/site-explorer/src/redfish.rs
| // Not anchored at start: GB300/Grace UEFI device paths prefix the PciRoot | ||
| // node with vendor/MMIO nodes, e.g. | ||
| // VenHw(<guid>)/MemoryMapped(0xB,...)/PciRoot(0x16)/Pci(0x0,0x0)/Pci(0x0,0x0) | ||
| // An `^PciRoot` anchor never matches those and aborts the whole exploration | ||
| // (`Could not match regex in PCI Device Path`). Match PciRoot wherever it appears. | ||
| static ref PCI_ROOT_REGEX: Regex = | ||
| Regex::new(r"^PciRoot\(([^)]*)\)").expect("must always compile"); | ||
| Regex::new(r"PciRoot\(([^)]*)\)").expect("must always compile"); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift
Add comprehensive test coverage for the regex parsing change.
This modification to PCI_ROOT_REGEX alters parsing behavior to accommodate GB300/Grace device paths with vendor/MMIO prefixes, yet no tests were added to verify correctness. According to the style guide, parsing functions are ideal candidates for table-driven tests using carbide-test-support.
The following scenarios should be covered:
- GB300/Grace paths with vendor/MMIO prefixes:
VenHw(...)/MemoryMapped(...)/PciRoot(0x16)/Pci(0x0,0x0) - Original paths without prefixes:
PciRoot(0x8)/Pci(0x2,0xa)/Pci(0x0,0x0) - Paths with optional
/MAC(...)suffix - Invalid paths: missing Pci nodes, malformed PciRoot, multiple PciRoot nodes
Without test coverage, we cannot verify the fix works as intended or prevent regressions in this critical hardware discovery path. As per path instructions and STYLE_GUIDE.md testing guidance, add table-driven tests before merging.
📋 Example test structure using carbide-test-support
#[cfg(test)]
mod tests {
use super::*;
use carbide_test_support::{scenarios, Outcome::*};
#[test]
fn parse_uefi_device_path() {
scenarios!(UefiDevicePath::from_str:
"GB300/Grace paths with vendor/MMIO prefix" {
"VenHw(D3987D4B-971A-435F-8CAF-4967EB627241)/MemoryMapped(0xB,0x3FFBFE00000,0x3FFBFEFFFFF)/PciRoot(0x16)/Pci(0x0,0x0)"
=> Yields(UefiDevicePath("22.0.0".into())),
"VenHw(...)/PciRoot(0x8)/Pci(0x2,0xa)/Pci(0x0,0x0)/MAC(...)"
=> Yields(UefiDevicePath("8.2.10.0.0".into())),
}
"original paths without prefix" {
"PciRoot(0x8)/Pci(0x2,0xa)/Pci(0x0,0x0)"
=> Yields(UefiDevicePath("8.2.10.0.0".into())),
"PciRoot(0x7)/Pci(0x0,0x0)"
=> Yields(UefiDevicePath("7.0.0".into())),
}
"invalid paths" {
"PciRoot(0x8)" => Fails, // missing Pci nodes
"Pci(0x0,0x0)" => Fails, // missing PciRoot
"" => Fails,
}
);
}
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/api-model/src/site_explorer/mod.rs` around lines 1335 - 1341, The
`PCI_ROOT_REGEX` modification in the site_explorer module lacks test coverage
for the parsing behavior change that now accommodates GB300/Grace device paths
with vendor/MMIO prefixes. Add table-driven tests using carbide-test-support in
a tests module that verify the following scenarios: GB300/Grace paths with VenHw
and MemoryMapped prefixes (e.g.,
`VenHw(...)/MemoryMapped(...)/PciRoot(0x16)/Pci(0x0,0x0)`), original unadorned
paths (e.g., `PciRoot(0x8)/Pci(0x2,0xa)/Pci(0x0,0x0)`), paths with optional MAC
suffixes, and invalid paths (missing Pci nodes, malformed PciRoot, no PciRoot
nodes). This ensures the regex change functions correctly and prevents
regressions in the hardware discovery path.
Sources: Coding guidelines, Path instructions
| Err(libredfish::RedfishError::PasswordChangeRequired) => { | ||
| client | ||
| .change_password_by_id("2", new_password.as_str()) | ||
| .await |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Do not apply the AMI /Accounts/2 fallback to LenovoGB300.
This branch includes RedfishVendor::LenovoGB300, but the fallback still PATCHes account ID "2" when account reads are blocked. The PR objective says GB300 admin accounts are at ID 1 or 4, so this can either preserve the GB300 404 failure or rotate the wrong account while the caller proceeds with curr_user/new_password.
Split the PasswordChangeRequired fallback by vendor and use only a validated GB300-specific fallback path, or return a clear error for GB300 when the account cannot be resolved safely.
As per path instructions, BMC-facing code should account for “credential handling” and “vendor differences.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/site-explorer/src/redfish.rs` around lines 247 - 250, The
PasswordChangeRequired error handler is applying an AMI-specific fallback using
hardcoded account ID "2" to all vendors, including LenovoGB300, but GB300 admin
accounts are at IDs "1" or "4", not "2". In the
Err(libredfish::RedfishError::PasswordChangeRequired) branch, add
vendor-specific logic to check whether the client is LenovoGB300 or another
vendor before applying the account ID "2" fallback. For LenovoGB300, either
return a clear error indicating the account cannot be resolved safely, or use
GB300-specific account IDs if a validated fallback path exists. This ensures the
fallback only applies to vendors where account "2" is appropriate and prevents
attempts to change the wrong account.
Source: Path instructions
🔍 Container Scan Summary
Per-CVE detail lives in the per-service |
Signed-off-by: Krish Dandiwala <kdandiwala@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/site-explorer/src/lib.rs`:
- Around line 1214-1226: The condition on line 1214 that checks
`chassis_is_bluefield && chassis_has_serial` is too permissive and causes
adapter serials to be skipped even when the chassis serial cannot actually pair.
Modify the condition to only take the chassis path (calling
record_host_dpu_device and skipping the adapter iteration in the else block) if
the chassis serial actually exists as a key in the dpu_sn_to_endpoint map.
Currently it skips adapters whenever chassis_has_serial is true, but it should
only skip them when the chassis serial can actually be paired based on what
exists in dpu_sn_to_endpoint. Check if chassis.serial_number.as_deref() exists
in the dpu_sn_to_endpoint map before deciding to take the chassis path and skip
the adapter iteration.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 705c09d0-6d8d-4c21-a397-4ca39e73db41
📒 Files selected for processing (2)
crates/site-explorer/src/lib.rscrates/site-explorer/tests/site_explorer.rs
| if chassis_is_bluefield && chassis_has_serial { | ||
| self.record_host_dpu_device( | ||
| network_adapter.part_number.as_deref(), | ||
| network_adapter.serial_number.as_deref(), | ||
| chassis.part_number.as_deref(), | ||
| chassis.serial_number.as_deref(), | ||
| &dpu_sn_to_endpoint, | ||
| host_dpu_mode, | ||
| &ep, | ||
| &mut dpu_exploration, | ||
| metrics, | ||
| ) | ||
| .await; | ||
| } else { | ||
| for network_adapter in chassis.network_adapters.iter() { |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Do not skip adapter serials unless the chassis serial can actually pair.
Line 1214 selects the chassis path for any non-empty BlueField chassis serial, but record_host_dpu_device only pairs if that serial exists in dpu_sn_to_endpoint. If the chassis serial is an enclosure/card serial and the nested adapter has the DPU system serial, this skips the adapter and prevents pairing.
Proposed fix
- let chassis_has_serial = chassis
+ let chassis_serial = chassis
.serial_number
.as_deref()
.map(str::trim)
- .is_some_and(|serial| !serial.is_empty());
- if chassis_is_bluefield && chassis_has_serial {
+ .filter(|serial| !serial.is_empty());
+ let chassis_serial_matches_discovered_dpu = chassis_serial
+ .is_some_and(|serial| dpu_sn_to_endpoint.contains_key(serial));
+ if chassis_is_bluefield && chassis_serial_matches_discovered_dpu {
self.record_host_dpu_device(
chassis.part_number.as_deref(),
- chassis.serial_number.as_deref(),
+ chassis_serial,
&dpu_sn_to_endpoint,
host_dpu_mode,
&ep,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/site-explorer/src/lib.rs` around lines 1214 - 1226, The condition on
line 1214 that checks `chassis_is_bluefield && chassis_has_serial` is too
permissive and causes adapter serials to be skipped even when the chassis serial
cannot actually pair. Modify the condition to only take the chassis path
(calling record_host_dpu_device and skipping the adapter iteration in the else
block) if the chassis serial actually exists as a key in the dpu_sn_to_endpoint
map. Currently it skips adapters whenever chassis_has_serial is true, but it
should only skip them when the chassis serial can actually be paired based on
what exists in dpu_sn_to_endpoint. Check if chassis.serial_number.as_deref()
exists in the dpu_sn_to_endpoint map before deciding to take the chassis path
and skip the adapter iteration.
This PR introduces more changes to enable exploration and provisioning of Lenovo GB300 trays (AMI host BMC) in NICo, and fixes a compile break from the
nv-redfish 0.10.3bump.Changes
site-explorer): rotate the AMI/GB300 admin account by username instead of the hardcoded/Accounts/2(GB300's admin id is1/4, so the old PATCH 404'd). Falls back to id2only when account reads are blocked byPasswordChangeRequired(Viking factory state); any other error propagates.api-model): un-anchorPCI_ROOT_REGEXso GB300/Grace boot paths that prefixPciRootwithVenHw(...)/MemoryMapped(...)parse instead of aborting the whole exploration.bmc-mock): implement the newRequestErrortrait (added bynv-redfish 0.10.3) for theAxumRouterHttpClienttest error soServiceRoot<TestBmc>compiles.Type of Change
Breaking Changes
Testing
Additional Notes