
Discrepancies in ONNX Runtime Inference Results on RISC-V #22530

@sarmentow

Description


Describe the issue

We have successfully built ONNX Runtime Python wheels targeting the RISC-V architecture, using both the cross-compilation process outlined in the documentation and an emulated RISC-V Docker container running Ubuntu 22.04. Both builds completed without errors.

However, after training a model in PyTorch and exporting it to the ONNX format, we observed that the inference results from the ONNX Runtime Python package vary significantly across platforms. Specifically, the results from the RISC-V wheels we built (both the cross-compiled and the emulated versions) do not match the expected outputs seen from running inference in PyTorch before the ONNX export, nor do they match the outputs produced by the ONNX Runtime x64 wheel on the same model.

This leads us to believe that the issue lies in ONNX Runtime's RISC-V support.

Example Outputs

To illustrate the discrepancy, after training a PyTorch model, we get the following outputs for the input [0] when using the pre-built ONNX Runtime wheels for x64:

[array([[ -9.021126,  17.9599  , -18.350208, -11.425449]], dtype=float32)]

In contrast, the output from the RISC-V wheel for the same model and input is:

[array([[  5.5013514 , -13.528254  ,  -8.2745905 ,  -0.89257914]], dtype=float32)]

Both outputs are from the same model, using the same input, highlighting the inconsistency.

Investigation

Through extensive troubleshooting, we have identified that the discrepancy occurs specifically with torch.nn.Linear layers; basic arithmetic operators (e.g., +, -, *, /) do not cause any issues. Furthermore, saving the model in PyTorch's .pth format and running inference with PyTorch in the same RISC-V environment works as expected, which reinforces our belief that the issue lies in ONNX Runtime's handling of the RISC-V architecture rather than in the model or the environment.
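For reference, ONNX exports a torch.nn.Linear layer as a Gemm node computing y = x @ W.T + b, where W has shape (out_features, in_features). A small NumPy sketch of that computation (with placeholder weights; the real values live in the exported model's initializers) can serve as a hand-computed ground truth when checking what an execution provider returns:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder shapes matching the first hidden layer of the model (1 -> 20).
in_features, out_features = 1, 20
W = rng.standard_normal((out_features, in_features)).astype(np.float32)  # (20, 1)
b = rng.standard_normal(out_features).astype(np.float32)                 # (20,)

x = np.array([[0.0]], dtype=np.float32)  # the input [0] used in this report

# What the exported Gemm (alpha=1, beta=1, transB=1) should compute for nn.Linear:
y = x @ W.T + b

print(y.shape)  # (1, 20); with x = 0 the result equals the bias vector
```

Replacing the placeholder W and b with the initializer tensors from winner.onnx would let one compare a backend's Gemm output against this reference directly.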

Reproduction

Below, we include the PyTorch training code, the Dockerfile for the build environment, and the scripts used to compare inference results between platforms.

Model Training Code

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np

np.random.seed(69)
torch.manual_seed(69)

# Define the dataset class
class CustomDataset(Dataset):
    def __init__(self, inputs, outputs):
        self.inputs = inputs
        self.outputs = outputs

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        input_val = self.inputs[idx]
        output_val = self.outputs[idx]
        return input_val, output_val

# Define the model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(1, 20)  # input layer (1) -> hidden layer (20)
        self.fc2 = nn.Linear(20, 20)  # hidden layer (20) -> hidden layer (20)
        self.fc3 = nn.Linear(20, 4)  # hidden layer (20) -> output layer (4)

    def forward(self, x):
        x = torch.relu(self.fc1(x))  # activation function for hidden layer
        x = torch.relu(self.fc2(x))  # activation function for hidden layer
        x = self.fc3(x)
        return x

# Define the inputs and outputs
inputs = np.array([0, 4, 8, 9, 10, 14, 15])
outputs = np.array([1, 1, 2, 2, 1, 2, 2])

# Create the dataset and data loader
dataset = CustomDataset(inputs.reshape(-1, 1), outputs)
data_loader = DataLoader(dataset, batch_size=7, shuffle=False)

# Initialize the model, loss function, and optimizer
model = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Train the model
for epoch in range(1000):  # loop over the dataset multiple times
    for i, data in enumerate(data_loader, 0):
        inputs, labels = data
        inputs = inputs.float()  # DataLoader already yields tensors; re-wrapping with torch.tensor triggers a copy warning
        labels = labels.long()

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

    # print statistics
    if epoch % 100 == 0:
        print('Epoch %d, Loss: %.3f' % (epoch+1, loss.item()))

# Evaluate the model
def evaluate_model(model, inputs, expected_outputs):
    inputs = torch.tensor(inputs.reshape(-1, 1), dtype=torch.float32)
    outputs = model(inputs)
    _, predicted = torch.max(outputs, dim=1)
    print('Expected Outputs: ', expected_outputs)
    print('Predicted Outputs: ', predicted.detach().numpy())

# Test the model
test_inputs = np.array([0, 4, 8, 9, 10, 14, 15])
test_outputs = np.array([1, 1, 2, 2, 1, 1, 2])
evaluate_model(model, test_inputs, test_outputs)

# Export the model to ONNX
dummy_input = torch.randn(1, 1)
torch.onnx.export(model, dummy_input, 'winner.onnx', export_params=True, opset_version=17)

Dockerfile for Build Environment

FROM --platform=linux/riscv64 ubuntu:22.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    python3 \
    python3-pip \
    python3-dev \
    python3-numpy \
    libssl-dev \
    wget \
    && apt-get clean
WORKDIR /workspace
CMD ["/bin/bash"]

We also built CMake from source to meet ONNX Runtime's minimum CMake version requirement. To save you the hassle, I have pushed an image with CMake 3.30 installed to Docker Hub: docker pull sarmentow/onnxruntime-build-env-with-cmake.

ONNX Runtime Inference Comparison Code

import onnxruntime as ort
import numpy as np
MODEL_INPUT_SHAPE = (1,1)
model_path = "winner.onnx"  # matches the filename exported by the training script above
session = ort.InferenceSession(model_path)
input_names = [session.get_inputs()[0].name]
output_names = [session.get_outputs()[0].name]

for i in [0, 4, 8, 9, 10, 14, 15]:
    print("Input:", i)
    outputs = session.run(output_names, {input_names[0]: np.array([i], dtype=np.float32).reshape(1, 1)})
    print(outputs)
    print(np.argmax(outputs))
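Rather than eyeballing the printed arrays, a small check using the two outputs reported above makes the mismatch concrete: the logits disagree entirely, and even the argmax (the predicted class) differs.

```python
import numpy as np

# Outputs reported above for input [0]
x64_out = np.array([[-9.021126, 17.9599, -18.350208, -11.425449]], dtype=np.float32)
riscv_out = np.array([[5.5013514, -13.528254, -8.2745905, -0.89257914]], dtype=np.float32)

print(np.allclose(x64_out, riscv_out, atol=1e-4))  # False: logits disagree entirely
print(np.argmax(x64_out), np.argmax(riscv_out))    # 1 vs 0: the predicted class differs too
```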

Build Process

We used the following command to build the ONNX Runtime wheel for RISC-V (using the build.py script at tools/ci_build/build.py):

python3 build.py --parallel 6 --config MinSizeRel --skip_tests --cmake_extra_defines CMAKE_CXX_FLAGS="-pthread -latomic ${CMAKE_CXX_FLAGS}" CMAKE_C_FLAGS="-pthread -latomic ${CMAKE_C_FLAGS}" CMAKE_EXE_LINKER_FLAGS="-pthread -latomic" --enable_pybind --build_wheel --wheel_name_suffix=riscv --build_dir=/workspace/onnxruntime/build/older --compile_no_warning_as_error --allow_running_as_root

Testing Environment

We utilized the following Dockerfile for the testing environment:

FROM --platform=linux/riscv64 ubuntu:22.04

ARG DEBIAN_FRONTEND=noninteractive
RUN <<EOF
set -e
apt-get update
apt-get install -y --no-install-recommends \
  busybox-static=1:1.30.1-7ubuntu3 \
  python3 python3-pip python3-dev \
  libatomic1 \
  libopenblas-dev
EOF

WORKDIR /workspace
COPY ./requirements.txt .
COPY ./winner.onnx .

RUN <<EOF
set -e
pip install -r requirements.txt
EOF


COPY ./simple_inference.py .

ENTRYPOINT ["bash"]

Dependencies

We installed ONNX Runtime and other necessary packages as listed in the requirements.txt file:

https://github.com/sarmentow/riscv-wheels/raw/refs/heads/main/onnxruntime_riscv-1.19.2-cp310-cp310-linux_riscv64.whl
-i https://think-and-dev.github.io/riscv-python-wheels/pip-index/
numpy == 1.26.2

We installed NumPy from an alternative pip index that provides RISC-V wheels; we do not believe those wheels are the source of the issue.

Conclusion

Based on our testing, the issue appears to be specific to ONNX Runtime's RISC-V support, particularly in layers such as torch.nn.Linear; a single linear layer is enough to reproduce the discrepancy between platforms. We hope this information helps in diagnosing the problem, and we are happy to assist further if needed.
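One hypothesis that may be worth checking (purely speculative on our part) is the handling of Gemm attributes such as transB in the RISC-V kernels: nn.Linear is exported as Gemm with transB=1, and ignoring that transpose yields outputs that are plausible-looking in magnitude yet completely unrelated to the correct ones, which is the pattern we observe. A NumPy illustration with made-up weights for a square hidden layer:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical square hidden layer (20 -> 20); weights are made up for illustration.
W = rng.standard_normal((20, 20)).astype(np.float32)
b = rng.standard_normal(20).astype(np.float32)
x = rng.standard_normal((1, 20)).astype(np.float32)

correct = x @ W.T + b  # Gemm with transB=1, as nn.Linear is exported
wrong = x @ W + b      # same shapes, but the transpose attribute ignored

print(np.allclose(correct, wrong))  # False: similar scale, entirely different values
```

This is only an illustration of the failure pattern, not a claim about the actual bug.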

Thank you for your attention to this matter. We look forward to your insights.

Urgency

The issue is urgent as my team depends on this functionality to ship a project this week. We'd be extremely grateful for some attention on this.

Target platform

RISC-V

Build script

We used the following command to build the ONNX Runtime wheel for RISC-V (using the build.py script at tools/ci_build/build.py):

python3 build.py --parallel 6 --config MinSizeRel --skip_tests --cmake_extra_defines CMAKE_CXX_FLAGS="-pthread -latomic ${CMAKE_CXX_FLAGS}" CMAKE_C_FLAGS="-pthread -latomic ${CMAKE_C_FLAGS}" CMAKE_EXE_LINKER_FLAGS="-pthread -latomic" --enable_pybind --build_wheel --wheel_name_suffix=riscv --build_dir=/workspace/onnxruntime/build/older --compile_no_warning_as_error --allow_running_as_root

This command was run inside a container using the Docker image sarmentow/onnxruntime-build-env-with-cmake.

Error / output

See the Example Outputs section above: for the same model and input [0], the x64 wheel returns [-9.021126, 17.9599, -18.350208, -11.425449] while the RISC-V wheel returns [5.5013514, -13.528254, -8.2745905, -0.89257914].

Visual Studio Version

No response

GCC / Compiler Version

11.4.0
