Description
When the inference backend (Ollama) is stopped and a request is made to inference.local from inside the sandbox, the OpenShell gateway silently drops the connection with no HTTP response body. The user sees no error message — just a curl exit code 6 after ~5 seconds. The gateway should return a clear HTTP error response indicating the backend is unreachable.
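A minimal sketch of the desired gateway behavior, in Python for illustration. The function name, the 502 status, and the JSON error shape are assumptions for this sketch, not OpenShell's actual code: on a failed upstream connection, synthesize an explicit HTTP 502 with a JSON body instead of dropping the connection.

```python
import json
import urllib.error
import urllib.request

def forward_to_backend(backend_url: str, payload: bytes) -> tuple[int, bytes]:
    """Forward a chat-completion request to the inference backend.

    Sketch of the desired gateway behavior: when the backend cannot be
    reached, return an explicit 502 with a JSON error body instead of
    silently dropping the connection.
    """
    req = urllib.request.Request(
        backend_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as exc:
        # Backend answered with an HTTP error: pass it through unchanged.
        return exc.code, exc.read()
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: backend is unreachable.
        body = json.dumps(
            {"error": {"message": "inference backend unreachable",
                       "type": "bad_gateway"}}
        ).encode()
        return 502, body

# With nothing listening (here, port 1 on localhost), the caller gets a
# clear 502 rather than a dropped connection.
status, body = forward_to_backend("http://127.0.0.1:1/v1/chat/completions", b"{}")
```

This keeps genuine backend HTTP errors intact and reserves 502 for the unreachable case the ticket describes.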
Working State
With Ollama running, inference through the gateway works correctly:
# Inside sandbox — returns valid completion
curl -s https://inference.local/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.2:1b","messages":[{"role":"user","content":"say hello"}]}'
Response:
{"id":"chatcmpl-100","object":"chat.completion","created":1773620673,"model":"llama3.2:1b","choices":[{"index":0,"message":{"role":"assistant","content":"Hello. How can I assist you today?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":27,"completion_tokens":10,"total_tokens":37}}
Steps to reproduce
- Complete nemoclaw setup successfully
- Set inference route to ollama-local:
  openshell inference set --provider ollama-local --model llama3.2:1b
- Verify inference works inside sandbox (pre-condition above)
- Stop Ollama on the host:
  pkill ollama
- Verify Ollama is down:
  curl -s http://localhost:11434/api/tags returns connection refused
- Inside sandbox, re-run inference:
  curl -s https://inference.local/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"llama3.2:1b","messages":[{"role":"user","content":"say hello"}]}'
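The "Verify Ollama is down" step can also be checked programmatically. A small sketch in Python (backend_is_down is an illustrative helper, not part of openshell; host and port mirror the defaults in the steps above):

```python
import socket

def backend_is_down(host: str = "localhost", port: int = 11434) -> bool:
    """Return True when no TCP listener accepts connections on the Ollama
    port -- the connection-refused state the repro steps depend on."""
    try:
        with socket.create_connection((host, port), timeout=2):
            return False  # something answered on the port
    except OSError:
        return True  # refused or timed out: nothing reachable
```

After pkill ollama this should return True, matching the connection-refused result of the curl check.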
Actual Result
curl exits with code 6 after ~5 seconds. No HTTP response body is returned and no error message is shown to the user.
[NVB#5982629]