Add retry for extension register #733
Thanks for the fix. This is the right direction for the Datadog extension issue. A few comments before merge.
Requested before merge
1. Document the new env var
Commit 2912565 set the expectation that env vars get documented in lockstep. Please add a row to both with the default (20 ms) and a one-line description of when to tune it.
2. Guard against
        tracing::warn!(error = %e, "extension registration attempt failed, will retry");
        Err(Error::from(e))
    }
    Ok(res) if res.status() != StatusCode::OK => {
It seems all non-200 statuses will now be retried five times. That includes a deterministic 4xx (e.g., a malformed request shape from a future bug), which would also be retried five times before exiting. We should consider short-circuiting on 4xx (except 408/429) to fail fast on non-transient errors:
Ok(res) if res.status().is_client_error()
    && res.status() != StatusCode::REQUEST_TIMEOUT
    && res.status() != StatusCode::TOO_MANY_REQUESTS => Ok(res),
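To make the intended split between transient and deterministic failures concrete, here is a minimal std-only sketch. The helper name `is_retryable_status` is hypothetical and not part of this PR, and it takes a plain `u16` rather than `StatusCode` so it stands alone. Treating everything outside 2xx and 4xx as retryable is an assumption that mirrors the PR's current "retry every non-200" behavior.

```rust
/// Hypothetical helper (not in the PR): decides whether a registration
/// response with this HTTP status code is worth retrying.
fn is_retryable_status(status: u16) -> bool {
    match status {
        // Success: nothing to retry.
        200..=299 => false,
        // 408 Request Timeout and 429 Too Many Requests are transient
        // even though they are in the 4xx range.
        408 | 429 => true,
        // Any other 4xx is deterministic: resending the same request
        // will fail the same way, so fail fast.
        400..=499 => false,
        // Assumption: everything else (for example 5xx) is treated as
        // transient, matching the PR's retry-on-non-200 behavior.
        _ => true,
    }
}
```

With something like this, the match arm could return early whenever the status is not retryable and fall through to the retry path otherwise.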
// Extensions proxying AWS_LAMBDA_RUNTIME_API may not have bound their socket when LWA registers, retry the POST with exponential backoff to allow them to bind.
let base_ms: u64 = env::var(ENV_EXTENSION_REGISTER_BASE_MS)
    .ok()
    .and_then(|s| s.parse().ok())
I think this would swallow typos too. Can we add a warning on parse failures? It could be something like:
let base_ms: u64 = match env::var(ENV_EXTENSION_REGISTER_BASE_MS) {
Ok(raw) => match raw.parse::<u64>() {
Ok(n) => n,
Err(e) => {
tracing::warn!(
value = %raw,
error = %e,
"invalid {ENV_EXTENSION_REGISTER_BASE_MS}, using default {EXTENSION_REGISTER_BASE_MS}ms"
);
EXTENSION_REGISTER_BASE_MS
}
},
Err(_) => EXTENSION_REGISTER_BASE_MS, // unset is fine, no warning
};
This issue is addressed in #737. Closing.
Issue #, if available:
Datadog Lambda Extension crashing with v1.0.0 (GitHub issue)
Description of changes:
LWA 1.0.0 changed to read AWS_LWA_LAMBDA_RUNTIME_API_PROXY on startup:
aws-lambda-web-adapter/src/main.rs
Lines 7 to 9 in 2912565
LWA then immediately POSTs /extension/register to that proxy address. The Datadog extension is not ready that early and has not yet bound the port given in AWS_LWA_LAMBDA_RUNTIME_API_PROXY, so the request fails and LWA exits the process.

This PR adds retry to the extension registration POST so LWA can connect once the extension's proxy port is ready. It uses exponential backoff with jitter, with a default base of 20 ms, configurable via AWS_LWA_EXTENSION_REGISTER_BASE_MS. Added tests covering retry on extension registration failure.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
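For readers unfamiliar with the retry schedule mentioned in the description, the following is a rough sketch of exponential backoff with jitter, not the code in this PR. The function name `backoff_delay`, the `MAX_DELAY_MS` cap, and the choice of "full jitter" (a delay drawn from zero up to the exponential ceiling) are all assumptions for illustration. The jitter value is passed in by the caller so the function stays deterministic and easy to test; in practice it would come from a random number generator.

```rust
use std::time::Duration;

/// Hypothetical upper bound on a single delay; not taken from the PR.
const MAX_DELAY_MS: u64 = 1_000;

/// Delay before retry number `attempt` (0-based).
/// The ceiling is `base_ms * 2^attempt`, capped at `MAX_DELAY_MS`, and
/// `jitter` in [0.0, 1.0] selects a point between zero and that ceiling.
fn backoff_delay(base_ms: u64, attempt: u32, jitter: f64) -> Duration {
    // checked_shl returns None once the shift reaches the bit width,
    // so very large attempt counts saturate instead of overflowing.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    let ceiling_ms = base_ms.saturating_mul(factor).min(MAX_DELAY_MS);
    // Out-of-range jitter values are clamped rather than rejected.
    let jitter = jitter.clamp(0.0, 1.0);
    Duration::from_millis((ceiling_ms as f64 * jitter) as u64)
}
```

With the default base of 20 ms, the ceilings for the first attempts are 20 ms, 40 ms, 80 ms, and so on. The jitter spreads concurrent cold starts apart so they do not all hit the extension's proxy port at the same instant.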