
Add x86_64-via-Apple-Rosetta translator support #41

Merged: jserv merged 1 commit into main from rosetta on May 24, 2026

Conversation

@jserv
Contributor

@jserv jserv commented May 23, 2026

This enables elfuse to load and run statically-linked x86_64-linux ELF binaries by hosting Apple's Rosetta Linux translator inside the guest. The translator lives in the primary buffer at a low GPA but is exposed at its statically-linked virtual address (0x800000000000) via a non-identity page-table mapping. This works around the macOS HVF 36-bit Stage-2 IPA cap on M1 while preserving the translator's link-time addresses, and broadens application coverage to pre-compiled x86_64 Linux software without paying box64's per-instruction translation tax.

Bootstrap path:

  • src/core/rosetta.{c,h} loads the Rosetta ELF, places its image and the TTBR1 kbuf inside the primary buffer, and builds a binfmt_misc argv [ROSETTA_PATH, binary, original argv[1..]].
  • src/core/guest.{c,h} gains guest_install_kbuf_user_alias, which mirrors the kbuf at KBUF_USER_VA under TTBR0 so that the translator's tagged-pointer extraction (which strips bits 63:48) still lands on mapped memory, and guest_install_va_pages, which writes 4 KiB L3 PTEs directly with the correct backing GPA.
  • src/runtime/forkipc.c + fork-state.h bump IPC to v10 to propagate rosetta_guest_base, rosetta_va_base, rosetta_size, kbuf_gpa, ttbr1, and TTBR1_EL1 across fork.
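The binfmt_misc argv layout above works as follows: the interpreter becomes argv[0], the binary's path becomes argv[1], and the original argv[0] is dropped in favor of the original arguments from argv[1] onward. Here is a minimal sketch of that construction. `build_rosetta_argv` is a hypothetical helper and not the function in src/core/rosetta.c. It borrows the strings and allocates only the pointer array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: build [rosetta_path, binary, argv[1..argc-1], NULL].
 * The result borrows the strings; the caller frees only the array.
 * Returns NULL on allocation failure. */
static char **build_rosetta_argv(const char *rosetta_path, const char *binary,
                                 int argc, char *const argv[], int *out_argc)
{
    assert(rosetta_path && binary && argv && argc >= 1 && out_argc);
    int n = argc + 1;                     /* interpreter + binary + (argc - 1) args */
    char **v = calloc((size_t)n + 1, sizeof *v);
    if (!v)
        return NULL;
    v[0] = (char *)rosetta_path;
    v[1] = (char *)binary;
    for (int i = 1; i < argc; i++)        /* original argv[0] is dropped */
        v[i + 1] = argv[i];
    v[n] = NULL;
    *out_argc = n;
    return v;
}
```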

Runtime gates:

  • src/syscall/io.c traps the VZ ioctl trio (CHECK 0x80456125, CAPS 0x80806123, ACTIVATE 0x6124) and synthesises the responses the translator expects. The gate keys on g->is_rosetta plus an F_GETPATH match against the rosetta translator path, so it fires once rosetta has opened /proc/self/exe, which resolves to the translator path.
  • src/runtime/procemu.c redirects /proc/self/exe (open and readlink) to the rosetta path when active, matching the binfmt_misc convention that exposes the interpreter as the running image.
  • src/core/rosetta.c implements the rosettad SCM_RIGHTS bridge. Socket interception in src/syscall/net.c forwards translation requests to a host-side handler thread that hashes the input binary, looks up a SHA-256-keyed AOT cache, and returns the translated artifact fd via SCM_RIGHTS.
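The three request numbers follow the standard Linux ioctl encoding used on arm64 and x86_64: nr in bits 7:0, type in bits 15:8, size in bits 29:16, and direction in bits 31:30. Decoding them shows that all three share type 'a'. CHECK and CAPS are read-direction ioctls with 69- and 128-byte payloads, and ACTIVATE carries no payload. The sketch below shows a possible shape for the gate. `vz_classify` and its flags are hypothetical. The caller is assumed to have already performed the F_GETPATH comparison, and the payload contents are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* asm-generic ioctl field layout (arm64 and x86_64). */
#define IOC_NR(r)    ((unsigned)(r) & 0xffu)
#define IOC_TYPE(r)  (((unsigned)(r) >> 8) & 0xffu)
#define IOC_SIZE(r)  (((unsigned)(r) >> 16) & 0x3fffu)
#define IOC_DIR(r)   (((unsigned)(r) >> 30) & 0x3u)
#define IOC_DIR_READ 2u   /* _IOC_READ: data flows kernel -> user */

/* Request numbers quoted in the PR. */
#define VZ_IOCTL_CHECK    0x80456125u
#define VZ_IOCTL_CAPS     0x80806123u
#define VZ_IOCTL_ACTIVATE 0x6124u

static_assert(IOC_TYPE(VZ_IOCTL_CHECK) == 'a' && IOC_TYPE(VZ_IOCTL_CAPS) == 'a' &&
              IOC_TYPE(VZ_IOCTL_ACTIVATE) == 'a', "all three use ioctl type 'a'");
static_assert(IOC_SIZE(VZ_IOCTL_CHECK) == 69 && IOC_SIZE(VZ_IOCTL_CAPS) == 128 &&
              IOC_SIZE(VZ_IOCTL_ACTIVATE) == 0, "payload sizes encoded in the numbers");

enum vz_ioctl { VZ_NONE, VZ_CHECK, VZ_CAPS, VZ_ACTIVATE };

/* Hypothetical gate: classify only when the guest is running rosetta and
 * the fd already resolved (via F_GETPATH) to the translator path. */
static enum vz_ioctl vz_classify(int is_rosetta, int fd_is_translator, unsigned long req)
{
    if (!is_rosetta || !fd_is_translator)
        return VZ_NONE;
    switch (req) {
    case VZ_IOCTL_CHECK:    return VZ_CHECK;
    case VZ_IOCTL_CAPS:     return VZ_CAPS;
    case VZ_IOCTL_ACTIVATE: return VZ_ACTIVATE;
    default:                return VZ_NONE;  /* everything else passes through */
    }
}
```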

High-VA mmap:

  • src/syscall/mem.c adds sys_mmap_fixed_high_va for MAP_FIXED requests above guest_size. On each iteration, before calling guest_map_va_range, it checks whether the enclosing 2 MiB block is fresh. In a fresh block, the L3 entries inherited from the block split are zeroed so gap pages do not inherit permissions. A pre-existing block is left untouched, which preserves earlier mmaps in the same block. guest_install_va_pages then installs L3 PTEs for the requested range with the correct backing GPA.
  • Fail-path rollback is narrowed to [addr, addr+length) so neighboring mappings survive a partial failure.
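The per-block logic can be sketched in two steps. First, split the request into per-2 MiB spans of L3 indices. Second, for each span, zero the whole L3 table only if the block is fresh, then write page descriptors for the requested indices. All names here are hypothetical, and freshness detection (guest_va_block_mapped in the PR) is out of scope, so the caller passes `fresh`. The `attrs` argument is assumed to carry the access flag and memory attributes that a real descriptor needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_4K     0x1000ULL
#define BLOCK_2M    0x200000ULL
#define L3_ENTRIES  512u                    /* 4 KiB pages per 2 MiB block */
#define PTE_PAGE    0x3ULL                  /* AArch64 L3 descriptor type: page */
#define PTE_OA_MASK 0x0000fffffffff000ULL   /* output address bits [47:12] */

/* Hypothetical work item: the part of a request inside one 2 MiB block. */
struct block_span {
    uint64_t block_va;      /* 2 MiB-aligned base of the enclosing block */
    unsigned first, last;   /* inclusive L3 indices touched by the request */
};

/* Split page-aligned [addr, addr + len) into per-block spans (up to max). */
static size_t split_by_block(uint64_t addr, uint64_t len,
                             struct block_span *out, size_t max)
{
    assert(addr % PAGE_4K == 0 && len % PAGE_4K == 0 && len > 0);
    size_t n = 0;
    uint64_t end = addr + len;
    while (addr < end && n < max) {
        uint64_t block = addr & ~(BLOCK_2M - 1);
        uint64_t stop = block + BLOCK_2M < end ? block + BLOCK_2M : end;
        out[n].block_va = block;
        out[n].first = (unsigned)((addr - block) / PAGE_4K);
        out[n].last = (unsigned)((stop - block) / PAGE_4K) - 1;
        n++;
        addr = stop;
    }
    return n;
}

/* Install one span. A fresh block drops every entry inherited from the
 * block split first, so gap pages stay unmapped; a pre-existing block
 * keeps its other entries (earlier mmaps in the same block). */
static void install_span(uint64_t l3[L3_ENTRIES], bool fresh,
                         const struct block_span *s, uint64_t gpa, uint64_t attrs)
{
    if (fresh)
        memset(l3, 0, L3_ENTRIES * sizeof l3[0]);
    for (unsigned i = s->first; i <= s->last; i++, gpa += PAGE_4K)
        l3[i] = (gpa & PTE_OA_MASK) | attrs | PTE_PAGE;
}
```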

CLI surface:

  • src/main.c adds --no-rosetta and ELFUSE_NO_ROSETTA=1 to opt out (default-on with ELF-header auto-detect).
  • --gdb is refused for x86_64 guests; the current GDB stub serves the aarch64 view rosetta produces, not the original x86_64 state.
  • A 'rosettad translate' subcommand rewrites argv to invoke the translator directly for AOT cache priming.
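ELF-header auto-detection needs only the first 20 bytes of the binary: the magic, EI_CLASS (2 = 64-bit), EI_DATA (1 = little-endian), and e_machine at offset 18 (62 = x86-64). Below is a minimal sketch that combines it with the two opt-outs. `want_rosetta` is hypothetical, and treating only the exact value "1" as an opt-out is an assumption. A caller would pass `getenv("ELFUSE_NO_ROSETTA")` as the last argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EM_X86_64_ID 62   /* ELF e_machine value for x86-64 */

/* Hypothetical auto-detect: true for a 64-bit little-endian x86-64 ELF
 * unless the user opted out with --no-rosetta or ELFUSE_NO_ROSETTA=1. */
static bool want_rosetta(const unsigned char *hdr, size_t n,
                         bool no_rosetta_flag, const char *env_no_rosetta)
{
    assert(hdr != NULL || n == 0);
    if (no_rosetta_flag)
        return false;
    if (env_no_rosetta && strcmp(env_no_rosetta, "1") == 0)
        return false;
    /* "\x7f" "ELF" is split so 'E' is not read as part of the hex escape. */
    if (n < 20 || memcmp(hdr, "\x7f" "ELF", 4) != 0)
        return false;
    if (hdr[4] != 2 /* ELFCLASS64 */ || hdr[5] != 1 /* ELFDATA2LSB */)
        return false;
    uint16_t machine = (uint16_t)(hdr[18] | (hdr[19] << 8)); /* e_machine, LE */
    return machine == EM_X86_64_ID;
}
```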

execve fidelity:

  • src/syscall/exec.c clears proc_rosetta_active, resets HVF TCR_EL1 to TCR_EL1_VALUE, and clears TTBR1_EL1 on every image transition so a rosetta-to-aarch64 execve sees the right MMU state.

Known limits:

  • Mid-process aarch64-to-x86_64 execve is still rejected with -LINUX_ENOEXEC; the re-bootstrap path is not yet wired (initial load only).
  • Dynamic x86_64 binaries fail at interpreter mmap with 'failed to mmap segment: 12'; static binaries work.
  • ROSETTA_CAPS_BINARY_PATH_LEN is 42 bytes; longer paths truncate before rosettad lookup. Test scripts stage /tmp/elfuse-*/ symlink farms to stay inside the cap.
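The later summary and commit message describe how this cap is worked around. If the host path does not fit, /proc/self/fd/3 is published instead, the full path is kept aside for the host-side translator, and both buffers are read under a mutex so concurrent execves cannot observe a torn string. Below is a minimal sketch of that scheme. Every name is hypothetical, and the assumption that the 42 bytes must also hold the terminating NUL is not confirmed by the PR.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define CAPS_PATH_LEN 42                /* ROSETTA_CAPS_BINARY_PATH_LEN */
#define FD3_PATH "/proc/self/fd/3"      /* fallback path published to the guest */
#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

static pthread_mutex_t path_lock = PTHREAD_MUTEX_INITIALIZER;
static char caps_path[CAPS_PATH_LEN];   /* what the VZ_CAPS reply would carry */
static char full_path[PATH_MAX];        /* what the host-side translator would use */

/* Hypothetical setter: publish the host path if it fits with its NUL,
 * otherwise publish the fd-3 proc path; always keep the full path. */
static void set_binary_path(const char *host_path)
{
    assert(host_path != NULL);
    pthread_mutex_lock(&path_lock);
    snprintf(full_path, sizeof full_path, "%s", host_path);
    if (strlen(host_path) < CAPS_PATH_LEN)
        snprintf(caps_path, sizeof caps_path, "%s", host_path);
    else
        snprintf(caps_path, sizeof caps_path, "%s", FD3_PATH);
    pthread_mutex_unlock(&path_lock);
}

/* Snapshot into the caller's buffer under the lock, so a concurrent
 * set_binary_path cannot hand out a half-written string. */
static void snapshot_caps_path(char out[CAPS_PATH_LEN])
{
    pthread_mutex_lock(&path_lock);
    memcpy(out, caps_path, CAPS_PATH_LEN);
    pthread_mutex_unlock(&path_lock);
}
```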

Summary by cubic

Add x86_64‑via‑Apple‑Rosetta support to run statically linked x86_64 Linux ELFs on macOS through elfuse. Includes high‑VA mapping, TTBR1 kbuf, runtime gates, CLI controls, resilient slab sizing, and a long‑path fd fallback; supports rosetta‑to‑rosetta execve.

  • New Features

    • Load Rosetta at low GPA and expose it at 0x800000000000 via a non‑identity mapping; enable TTBR1 kbuf plus a TTBR0 user alias (new TCR_EL1_VALUE_KBUF).
    • High‑VA mmap above guest_size with safe 4 KiB L3 installs and narrow rollback; fixes split‑block aliasing.
    • Gate Rosetta’s VZ ioctl trio; redirect /proc/self/exe; add a rosettad SCM_RIGHTS bridge with SHA‑256 AOT cache and socket shims; add fuse_materialize_fd.
    • CLI: default‑on Rosetta with ELF auto‑detect; --no-rosetta and ELFUSE_NO_ROSETTA=1 to opt out; refuse --gdb for x86_64; add rosettad translate.
    • Mid‑process execve: support rosetta‑to‑rosetta re‑entry; keep correct TTBR0/TCR/TTBR1 state.
    • Resilient slab sizing: decouple VM IPA width from slab size and bisect on hv_vm_map failure (1 TiB → 256 GiB → 64 GiB); keep vm_ipa=48 for Rosetta; fork path inherits placement and kbuf (validator accepts ipa_bits up to 48).
    • Tests/tooling: new x86_64 static, Alpine, CLI, and failure‑mode suites plus a bench harness; CI‑safe busybox wget; futex canonical user‑VA check; improved per‑thread CPU clock IDs.
  • Migration

    • Requires Apple Rosetta for Linux; preflight shows an install hint if missing.
    • Supported: static x86_64 ELFs (initial load and rosetta‑to‑rosetta execve). Not supported: dynamic x86_64, mid‑process aarch64→x86_64 execve, --gdb on x86_64.
    • Long binary paths are handled via a pre‑opened fd 3 (/proc/self/fd/3) fallback to bypass Rosetta’s 42‑byte caps field; short paths still work. Use --no-rosetta or ELFUSE_NO_ROSETTA=1 to disable.
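The rosettad bridge returns the translated artifact as a file descriptor over SCM_RIGHTS. For readers unfamiliar with fd passing, below is the generic POSIX pattern as a minimal sketch. `send_fd` and `recv_fd` are hypothetical helper names, not elfuse functions, and the bridge's own request framing and SHA-256 cache lookup are not shown.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd over a connected AF_UNIX socket with a 1-byte payload. */
static int send_fd(int sock, int fd)
{
    assert(sock >= 0 && fd >= 0);
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
    memset(&ctrl, 0, sizeof ctrl);
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl.buf, .msg_controllen = sizeof ctrl.buf };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd; returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl.buf, .msg_controllen = sizeof ctrl.buf };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (!c || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS ||
        c->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof fd);
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file, which is why the translated artifact can be handed across without exposing its path.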

Written for commit c12dd42.

@jserv
Contributor Author

jserv commented May 23, 2026

@devarajabc: Please help validate on Apple M2- and M5-based machines.
@Max042004: Please help validate on Apple M4-based machines.

Steps:

make distclean
make test-rosetta-all

@jserv jserv requested a review from Max042004 May 23, 2026 10:37
cubic-dev-ai[bot]

This comment was marked as resolved.

@devarajabc

devarajabc commented May 23, 2026

Still the same on the Mac M5:

FAIL rosetta-default: stderr did not contain requires the Rosetta Linux translator\|translate produced empty/missing output\|Translation failed, invalid path or invalid executable\|VMAllocationTracker\|Rosetta is only intended to run on Apple Silicon
22:16:23 ERROR src/core/guest.c:382: guest: hv_vm_map failed: -85377021
22:16:23 ERROR src/core/bootstrap.c:293: failed to initialize guest
make: *** [test-rosetta-cli] Error 1
devaraja@Chi-Kuans-MacBook-Air-3 elfuse % git log
commit c437f8e8af19c2eab9eaae92327caf5cda0a7804 (HEAD -> rosetta, origin/rosetta)

@jserv
Contributor Author

jserv commented May 23, 2026

> Still the same:

Always describe the hardware configurations properly.
See if commit 6beb553 helps.

@devarajabc

All tests passed on the Mac M5.

devaraja@Chi-Kuans-MacBook-Air-3 elfuse % make test-rosetta-all
  GEN     build/dispatch.h
PASS rosetta-disabled-flag
PASS rosetta-disabled-env
PASS rosetta-gdb
PASS rosetta-default
   PASS: no-rosetta-flag (rc=1)
   PASS: no-rosetta-env (rc=1)
   PASS: gdb-x86_64 (rc=1)
   PASS: dynamic-x86_64-segment-mmap (rc=133)
   PASS: mid-process-execve-x86_64 (rc=126)

Results: 5 passed, 0 failed, 0 skipped (of 5)
elfuse:    /Users/devaraja/Work/elfuse/build/elfuse
fixtures:  /tmp/elfuse-r/bin -> /Users/devaraja/Work/elfuse/externals/test-fixtures/x86_64-musl/staticbin/bin
rosetta:   /Library/Apple/usr/libexec/oah/RosettaLinux/rosetta

   PASS: echo
   PASS: true
   PASS: false
   PASS: printenv
   PASS: expr-zero
   PASS: expr-mul
   PASS: basename
   PASS: dirname
   PASS: stat-self
   PASS: factor
   PASS: seq
   PASS: sha256sum
   PASS: md5sum
   PASS: uname-m
   PASS: date-utc
   PASS: id-u
   PASS: nproc
   PASS: env-execve-rejects (rc=126)

Results: 18 passed, 0 failed, 0 skipped (of 18)
elfuse:    /Users/devaraja/Work/elfuse/build/elfuse
fixtures:  /tmp/elfuse-ra/bin -> /Users/devaraja/Work/elfuse/externals/test-fixtures/x86_64-musl/staticbin/bin
rosetta:   /Library/Apple/usr/libexec/oah/RosettaLinux/rosetta

   PASS: cat-fruits-first-line
   PASS: wc-l-fruits
   PASS: wc-l-lines
   PASS: wc-c-lines
   PASS: ls-data
   PASS: stat-data
   PASS: find-by-name
   PASS: du-sk-data
   PASS: sha256-fruits
   PASS: sha256-lines-matches-host
   PASS: sha512-lines
   PASS: md5-fruits
   PASS: cksum-fruits
   PASS: sort-first
   PASS: sort-reverse-first
   PASS: pipe-sort-wc
   PASS: pipe-tr-uppercase
   PASS: pipe-cat-grep
   PASS: pipe-sed-subst
   PASS: pipe-awk-field
   PASS: head-n3
   PASS: tail-n3
   PASS: pipe-sort-uniq
   PASS: pipe-cut-field
   PASS: pipe-rev
   PASS: tac-reverse-first-line
   PASS: seq-1-5
   PASS: seq-step
   PASS: factor-prime
   PASS: factor-composite
   PASS: diff-identical
   PASS: diff-differs (rc=1)
   PASS: pipe-base64-decode

Results: 33 passed, 0 failed, 0 skipped (of 33)

cubic-dev-ai[bot]

This comment was marked as resolved.

@Max042004
Collaborator

All tests pass on M4.

This enables elfuse to load and run statically-linked x86_64-linux ELF
binaries by hosting Apple's Rosetta Linux translator inside the guest.
The translator lives in the primary buffer at a low GPA but is exposed
at its statically-linked virtual address (0x800000000000) via a
non-identity page-table mapping. This works around the macOS HVF 36-bit
Stage-2 IPA cap on M1 while preserving the translator's link-time
addresses, and broadens application coverage to pre-compiled x86-64 Linux
software without paying box64's per-instruction translation tax.

Runtime layers landed
- Initial-load + bootstrap: src/core/rosetta.{c,h} loads the Rosetta
  ELF, places its image and the TTBR1 kbuf in the primary buffer,
  builds binfmt-misc argv. src/core/guest.{c,h} gains
  guest_install_kbuf_user_alias and guest_install_va_pages.
  src/core/bootstrap.{c,h} wires want_rosetta detection and exposes
  guest_bootstrap_rosetta_post_reset so both initial-load and execve
  share the rosetta setup sequence.
- VZ ioctl gate: src/syscall/io.c traps CHECK (0x80456125), CAPS
  (0x80806123), and ACTIVATE (0x6124). The gate keys on
  g->is_rosetta plus F_GETPATH match against ROSETTA_PATH so
  rosetta's /proc/self/exe open triggers it.
- /proc/self/exe redirection: src/runtime/procemu.c redirects open
  and readlink to ROSETTA_PATH when proc_rosetta_active() is set.
  Matches the binfmt-misc convention that exposes the interpreter as
  the running image.
- rosettad SCM_RIGHTS bridge: src/core/rosetta.c implements the
  socket bridge, AOT cache (SHA-256-keyed under
  $HOME/.cache/elfuse-rosettad), and a single-bridge guard via
  atomic_compare_exchange_strong on a process-global atomic. Socket
  interception in src/syscall/net.c forwards translator requests.
- High-VA mmap: src/syscall/mem.c sys_mmap_fixed_high_va detects
  fresh vs pre-existing 2 MiB blocks via guest_va_block_mapped.
  Fresh blocks get split-inherited L3 entries zeroed; pre-existing
  blocks survive so prior mmaps in the same block keep their PTEs.
  Post-loop guest_install_va_pages installs the requested-range L3
  entries with the correct backing GPA. Fail rollback narrowed to
  [addr, addr+length) so partial-failure neighbors survive.
- Process fork: src/runtime/forkipc.c removes the early reject. IPC
  v10 already carried rosetta state, kbuf_gpa, ttbr1, and (via
  ipc_registers_t) ttbr1_el1 / tcr_el1. guest_get_used_regions
  snapshots the rosetta image and kbuf so a fork child inherits them
  through the region-copy IPC path. The CoW shm fast-path stays
  disabled for rosetta because HVF caches host VA-to-PA at
  hv_vm_map time and the parent's MAP_SHARED slab cannot be remapped
  under a live vCPU.
- Mid-process execve: src/syscall/exec.c drops the prior -ENOEXEC
  rejection. guest_clear_rosetta_state is gated on leaving rosetta;
  rosetta-to-rosetta keeps placement so rosetta_prepare's execve
  re-entry branch reuses guest_base + ttbr1 + kbuf_gpa. The new
  branch drains the prior bridge via rosettad_wait_for_idle, calls
  guest_bootstrap_rosetta_post_reset (host vs guest path threaded
  explicitly), then writes TTBR0 + TCR_EL1_VALUE_KBUF +
  TTBR1_EL1=g->ttbr1 for kernel-VA execution.
- Long-path workaround: rosettad_set_binary_path publishes
  /proc/self/fd/3 to the 42-byte VZ_CAPS field when the host path is
  longer. rosetta_finalize pre-opens the target at guest fd 3 so
  rosetta can reopen via that proc path regardless of original
  length. The full path is preserved in a separate buffer for the
  host-side translator subprocess. Both buffers are accessed under
  a shared pthread_mutex via snapshot-into-caller-buffer helpers so
  a multi-vCPU guest doing concurrent execves cannot observe a torn
  string in the VZ_CAPS payload.
- M5 host support: HVF on newer Apple Silicon rejects the 1 TiB
  primary slab with HV_BAD_ARGUMENT even when max_ipa >= 40.
  guest_init now bisects the slab size (1 TiB -> 256 GiB -> 64 GiB)
  on hv_vm_map failure, decoupled from the VM IPA width which stays
  at 48 for rosetta so high-VA Stage-2 still resolves. forkipc.c
  widens the fork-child ipa_bits range to [36, 48].
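The error in the M5 log above, -85377021, is 0xfae94003 when read as an unsigned 32-bit value. That is Hypervisor.framework's HV_BAD_ARGUMENT, which matches this description. The fallback can be pictured as a fixed ladder of slab sizes tried in order, each a quarter of the previous one, so it is closer to a stepped retry than to a textbook bisection. Below is a minimal sketch that assumes a caller-supplied mapping callback. `map_fn`, `pick_slab_size`, and the `fake_map_limit` test double are hypothetical and are not the actual guest_init code. The VM's IPA width is chosen separately and is not touched here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Candidate primary-slab sizes from the commit message, largest first. */
static const uint64_t slab_sizes[] = { 1024 * GiB, 256 * GiB, 64 * GiB };

/* Stand-in for the real mapping call (hv_vm_map on macOS):
 * returns 0 on success, nonzero error otherwise. Hypothetical signature. */
typedef int (*map_fn)(uint64_t size, void *ctx);

/* Try each size in turn; return the first that maps, or 0 if none do,
 * storing the last error in *last_err. */
static uint64_t pick_slab_size(map_fn try_map, void *ctx, int *last_err)
{
    assert(try_map != NULL);
    int err = 0;
    for (size_t i = 0; i < sizeof slab_sizes / sizeof slab_sizes[0]; i++) {
        err = try_map(slab_sizes[i], ctx);
        if (err == 0)
            return slab_sizes[i];
    }
    if (last_err)
        *last_err = err;
    return 0;
}

/* Test double: accepts sizes up to *(uint64_t *)ctx, else fails with -1. */
static int fake_map_limit(uint64_t size, void *ctx)
{
    return size <= *(const uint64_t *)ctx ? 0 : -1;
}
```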

CLI surface
- src/main.c adds --no-rosetta and ELFUSE_NO_ROSETTA=1 opt-out;
  rosetta is default-on with ELF-header auto-detect. --gdb is
  refused for x86_64 because the GDB stub serves the aarch64 view
  rosetta produces, not the original x86_64 architectural state.
  A 'rosettad translate' subcommand re-execs into Apple's translator
  for AOT cache priming.

Side fixes folded in
- src/syscall/time.c: dynamic CPU clock decoder accepts self-process
  and self-thread encoded ids (additive); foreign ids and the
  reserved 0b11 type-bit pattern return -EINVAL.
- src/syscall/fuse.c: fuse_materialize_fd opens a guest FUSE fd to a
  host path so the high-VA mmap path can back FUSE-backed mappings.
- Remove tests/haskell scratch files.

Known limits
- Dynamic x86_64 binaries (PT_INTERP) fail at interpreter mmap with
  "failed to mmap segment: 12"; covered as a known-fail probe.
- Acceptance corpus (Alpine x86_64 + pandoc + glibc x86_64 sysroot
  on M2/M3+ hosts) is wired but needs operator action.
@jserv jserv merged commit b066a0f into main May 24, 2026
4 checks passed
@jserv jserv deleted the rosetta branch May 24, 2026 06:19