@devarajabc: Help validate on Apple M2- and M5-based machines. Steps:
make distclean
make test-rosetta-all

Still the same on the Mac M5:

Always describe the hardware configurations properly.

All tests passed on the Mac M5.

All tests pass on M4.
This enables elfuse to load and run statically-linked x86_64-linux ELF
binaries by hosting Apple's Rosetta Linux translator inside the guest.
The translator lives in the primary buffer at a low GPA but is exposed
at its statically-linked virtual address (0x800000000000) via a
non-identity page-table mapping. This works around the macOS HVF 36-bit
Stage-2 IPA cap on M1 while preserving the translator's link-time
addresses, and broadens application coverage to pre-compiled x86_64 Linux
software without paying box64's per-instruction translation tax.
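The non-identity mapping described above can be illustrated with a little index arithmetic. This is a minimal sketch, assuming a 4 KiB translation granule with 48-bit virtual addresses (four table levels, 9 index bits each), which is consistent with the 2 MiB blocks and L3 entries mentioned in this description but is not confirmed by it. The helper names `va_table_index` and `va_to_gpa` are hypothetical and are not elfuse functions.

```c
#include <stdint.h>

/* Hypothetical helper: table index of `va` at translation level 0..3.
 * Assumes a 4 KiB granule: 12 offset bits, then 9 index bits per level,
 * so level 3 uses bits 20:12 and level 0 uses bits 47:39. */
unsigned va_table_index(uint64_t va, int level)
{
    unsigned shift = 12u + 9u * (unsigned)(3 - level);
    return (unsigned)((va >> shift) & 0x1FFu);
}

/* Hypothetical helper: a non-identity mapping keeps the offset inside
 * the mapped region but swaps the base, so a high link-time VA can be
 * backed by a low guest-physical address. */
uint64_t va_to_gpa(uint64_t va, uint64_t va_base, uint64_t gpa_base)
{
    return gpa_base + (va - va_base);
}
```

Under these assumptions, 0x800000000000 is bit 47 alone, so it selects L0 index 256 and index 0 at every lower level, while the backing GPA can sit anywhere the primary buffer allows.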
Runtime layers landed
- Initial-load + bootstrap: src/core/rosetta.{c,h} loads the Rosetta
ELF, places its image and the TTBR1 kbuf in the primary buffer,
builds binfmt-misc argv. src/core/guest.{c,h} gains
guest_install_kbuf_user_alias and guest_install_va_pages.
src/core/bootstrap.{c,h} wires want_rosetta detection and exposes
guest_bootstrap_rosetta_post_reset so both initial-load and execve
share the rosetta setup sequence.
- VZ ioctl gate: src/syscall/io.c traps CHECK (0x80456125), CAPS
(0x80806123), and ACTIVATE (0x6124). The gate keys on
g->is_rosetta plus F_GETPATH match against ROSETTA_PATH so
rosetta's /proc/self/exe open triggers it.
- /proc/self/exe redirection: src/runtime/procemu.c redirects open
and readlink to ROSETTA_PATH when proc_rosetta_active() is set.
Matches the binfmt-misc convention that exposes the interpreter as
the running image.
- rosettad SCM_RIGHTS bridge: src/core/rosetta.c implements the
socket bridge, AOT cache (SHA-256-keyed under
$HOME/.cache/elfuse-rosettad), and a single-bridge guard via
atomic_compare_exchange_strong on a process-global atomic. Socket
interception in src/syscall/net.c forwards translator requests.
- High-VA mmap: src/syscall/mem.c sys_mmap_fixed_high_va detects
fresh vs pre-existing 2 MiB blocks via guest_va_block_mapped.
Fresh blocks get split-inherited L3 entries zeroed; pre-existing
blocks survive so prior mmaps in the same block keep their PTEs.
Post-loop guest_install_va_pages installs the requested-range L3
entries with the correct backing GPA. Failure rollback is narrowed to
[addr, addr+length) so neighbors of a partial failure survive.
- Process fork: src/runtime/forkipc.c removes the early reject. IPC
v10 already carried rosetta state, kbuf_gpa, ttbr1, and (via
ipc_registers_t) ttbr1_el1 / tcr_el1. guest_get_used_regions
snapshots the rosetta image and kbuf so a fork child inherits them
through the region-copy IPC path. The CoW shm fast-path stays
disabled for rosetta because HVF caches host VA-to-PA at
hv_vm_map time and the parent's MAP_SHARED slab cannot be remapped
under a live vCPU.
- Mid-process execve: src/syscall/exec.c drops the prior -ENOEXEC
rejection. guest_clear_rosetta_state is gated on leaving rosetta;
rosetta-to-rosetta keeps placement so rosetta_prepare's execve
re-entry branch reuses guest_base + ttbr1 + kbuf_gpa. The new
branch drains the prior bridge via rosettad_wait_for_idle, calls
guest_bootstrap_rosetta_post_reset (host vs guest path threaded
explicitly), then writes TTBR0 + TCR_EL1_VALUE_KBUF +
TTBR1_EL1=g->ttbr1 for kernel-VA execution.
- Long-path workaround: rosettad_set_binary_path publishes
/proc/self/fd/3 to the 42-byte VZ_CAPS field when the host path is
longer. rosetta_finalize pre-opens the target at guest fd 3 so
rosetta can reopen via that proc path regardless of original
length. The full path is preserved in a separate buffer for the
host-side translator subprocess. Both buffers are accessed under
a shared pthread_mutex via snapshot-into-caller-buffer helpers so
a multi-vCPU guest doing concurrent execves cannot observe a torn
string in the VZ_CAPS payload.
- M5 host support: HVF on newer Apple Silicon rejects the 1 TiB
primary slab with HV_BAD_ARGUMENT even when max_ipa >= 40.
guest_init now bisects the slab size (1 TiB -> 256 GiB -> 64 GiB)
on hv_vm_map failure, decoupled from the VM IPA width which stays
at 48 for rosetta so high-VA Stage-2 still resolves. forkipc.c
widens the fork-child ipa_bits range to [36, 48].
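The three request numbers listed under the VZ ioctl gate can be unpacked with the generic Linux ioctl encoding (8-bit number, 8-bit type, 14-bit size, 2-bit direction). The sketch below only decodes the fields; it does not reproduce the gate in src/syscall/io.c, and `ioc_fields_t` and `ioc_decode` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical container for the four fields of a Linux ioctl request. */
typedef struct {
    unsigned dir;  /* 0 = none, 1 = write, 2 = read (generic encoding) */
    unsigned size; /* argument size in bytes */
    unsigned type; /* "magic" byte */
    unsigned nr;   /* command number */
} ioc_fields_t;

/* Split a request number using the generic Linux bit layout. */
ioc_fields_t ioc_decode(uint32_t req)
{
    ioc_fields_t f;
    f.nr   = req & 0xFFu;
    f.type = (req >> 8) & 0xFFu;
    f.size = (req >> 16) & 0x3FFFu;
    f.dir  = (req >> 30) & 0x3u;
    return f;
}
```

Decoded this way, all three requests share type 0x61; CHECK (nr 0x25) reads 69 bytes, CAPS (nr 0x23) reads 128 bytes, and ACTIVATE (nr 0x24) carries no argument.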
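The single-bridge guard named in the rosettad bullet relies on atomic_compare_exchange_strong over a process-global atomic. A minimal sketch of that pattern follows; the flag and function names (`bridge_active`, `bridge_try_acquire`, `bridge_release`) are assumptions for illustration, and the real guard in src/core/rosetta.c may track more state.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical process-global flag: true while a bridge is running. */
static atomic_bool bridge_active = false;

/* Returns true only for the one caller that flips the flag from false
 * to true; every concurrent or later caller sees false until release. */
bool bridge_try_acquire(void)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&bridge_active, &expected, true);
}

/* Clears the flag so a later bridge can start. */
void bridge_release(void)
{
    atomic_store(&bridge_active, false);
}
```

The strong variant is used because it cannot fail spuriously, so a false return always means another bridge already holds the flag.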
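The long-path workaround boils down to a length check against the 42-byte field. The sketch below assumes the field must also hold the terminating NUL, which the description does not state; `caps_select_path` and `CAPS_PATH_FIELD` are hypothetical names, and the mutex-protected snapshot helpers are not modeled here.

```c
#include <string.h>

#define CAPS_PATH_FIELD 42 /* field width quoted in the description */

/* Hypothetical helper: copy into `out` either the host path, when it
 * fits in the caps field (NUL included, by assumption), or the
 * /proc/self/fd/3 alias otherwise. Returns 0 on success, -1 if `out`
 * is too small for the chosen string. */
int caps_select_path(const char *host_path, char *out, size_t out_len)
{
    const char *fallback = "/proc/self/fd/3";
    const char *chosen =
        strlen(host_path) < CAPS_PATH_FIELD ? host_path : fallback;
    size_t need = strlen(chosen) + 1;

    if (need > out_len)
        return -1;
    memcpy(out, chosen, need);
    return 0;
}
```

Because the target is pre-opened at guest fd 3, the alias resolves to the same file whatever the original path length.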
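The slab-size fallback for M5 hosts can be sketched as a loop over the three sizes named above. To keep the example self-contained, the mapping call is abstracted behind a callback rather than calling hv_vm_map; `map_fn`, `slab_pick_size`, `fake_map`, and `fake_map_limit` are all hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real mapping call: returns 0 on success. */
typedef int (*map_fn)(uint64_t size);

/* Try 1 TiB, then 256 GiB, then 64 GiB; return the first size that
 * maps, or 0 if every attempt fails. */
uint64_t slab_pick_size(map_fn try_map)
{
    static const uint64_t candidates[] = {
        1ULL << 40, 256ULL << 30, 64ULL << 30
    };
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        if (try_map(candidates[i]) == 0)
            return candidates[i];
    }
    return 0;
}

/* Demo mapper that rejects anything above fake_map_limit, mimicking a
 * host that refuses the largest slab. */
uint64_t fake_map_limit = 256ULL << 30;

int fake_map(uint64_t size)
{
    return size <= fake_map_limit ? 0 : -1;
}
```

With the demo mapper at its default limit, `slab_pick_size(fake_map)` settles on 256 GiB. The slab size chosen this way is independent of the VM IPA width, which the description says stays at 48 for rosetta.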
CLI surface
- src/main.c adds --no-rosetta and ELFUSE_NO_ROSETTA=1 opt-out;
rosetta is default-on with ELF-header auto-detect. --gdb is
refused for x86_64 because the GDB stub serves the aarch64 view
rosetta produces, not the original x86_64 architectural state.
A 'rosettad translate' subcommand re-execs into Apple's translator
for AOT cache priming.
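The opt-out logic can be sketched as follows. This is an illustration only: the real parsing in src/main.c is not shown in this description, `rosetta_enabled` is a hypothetical name, and stopping the scan at the first non-option argument is an assumption made so that the guest program's own arguments are not interpreted.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: rosetta stays on unless the environment value
 * is exactly "1" or --no-rosetta appears before the guest program.
 * `env_value` is whatever getenv("ELFUSE_NO_ROSETTA") returned. */
bool rosetta_enabled(int argc, char **argv, const char *env_value)
{
    if (env_value && strcmp(env_value, "1") == 0)
        return false;
    for (int i = 1; i < argc; i++) {
        if (argv[i][0] != '-')
            break; /* first non-option: guest program starts here */
        if (strcmp(argv[i], "--no-rosetta") == 0)
            return false;
    }
    return true;
}
```

A caller would pass `getenv("ELFUSE_NO_ROSETTA")` as the third argument; taking the value as a parameter keeps the helper easy to test.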
Side fixes folded in
- src/syscall/time.c: dynamic CPU clock decoder accepts self-process
and self-thread encoded ids (additive); foreign ids and the
reserved 0b11 type-bit pattern return -EINVAL.
- src/syscall/fuse.c: fuse_materialize_fd opens a guest FUSE fd to a
host path so the high-VA mmap path can back FUSE-backed mappings.
- Remove tests/haskell scratch files.
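The dynamic CPU clock fix in src/syscall/time.c concerns Linux's encoded clock ids, where the id is stored bit-inverted above a per-thread bit and a 2-bit clock type. The sketch below shows one way to classify such an id; treating an encoded id of 0 as "self" follows the Linux convention, while `cpu_clock_classify` and the `CLK_SELF_*` constants are hypothetical names, not the decoder in the tree.

```c
#include <errno.h>
#include <stdint.h>

enum { CLK_SELF_PROCESS = 1, CLK_SELF_THREAD = 2 };

/* Hypothetical classifier for a dynamic CPU clock id.
 * Layout: bits 1:0 = clock type, bit 2 = per-thread flag,
 * bits 31:3 = bitwise-inverted pid or tid. */
int cpu_clock_classify(int32_t clockid, int32_t self_pid, int32_t self_tid)
{
    if (clockid >= 0)
        return -EINVAL; /* static clocks are not handled here */

    uint32_t raw = (uint32_t)clockid;
    if ((raw & 3u) == 3u)
        return -EINVAL; /* reserved 0b11 type-bit pattern */

    int per_thread = (raw & 4u) != 0;
    /* clockid is negative, so OR in the top three bits to emulate an
     * arithmetic right shift before inverting back to the id. */
    int32_t id = (int32_t)~((raw >> 3) | 0xE0000000u);

    if (per_thread)
        return (id == 0 || id == self_tid) ? CLK_SELF_THREAD : -EINVAL;
    return (id == 0 || id == self_pid) ? CLK_SELF_PROCESS : -EINVAL;
}
```

For example, -6 and -2 are the encodings for "calling process" and "calling thread" with the scheduler clock type, and any id naming a different process falls through to -EINVAL.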
Known limits
- Dynamic x86_64 binaries (PT_INTERP) fail at interpreter mmap with
"failed to mmap segment: 12"; covered as a known-fail probe.
- Acceptance corpus (Alpine x86_64 + pandoc + glibc x86_64 sysroot
on M2/M3+ hosts) is wired but needs operator action.
Summary by cubic
Add x86_64‑via‑Apple‑Rosetta support to run statically linked x86_64 Linux ELFs on macOS through elfuse. Includes high‑VA mapping, TTBR1 kbuf, runtime gates, CLI controls, resilient slab sizing, and a long‑path fd fallback; supports rosetta‑to‑rosetta execve.
New Features
- … TCR_EL1_VALUE_KBUF).
- … /proc/self/exe; add a rosettad SCM_RIGHTS bridge with SHA-256 AOT cache and socket shims; add fuse_materialize_fd.
- … --no-rosetta and ELFUSE_NO_ROSETTA=1 to opt out; refuse --gdb for x86_64; add rosettad translate.
- … hv_vm_map failure (1 TiB → 256 GiB → 64 GiB); keep vm_ipa=48 for Rosetta; fork path inherits placement and kbuf (validator accepts ipa_bits up to 48).
Migration
- … execve, --gdb on x86_64.
- … (/proc/self/fd/3) fallback to bypass Rosetta's 42-byte caps field; short paths still work. Use --no-rosetta or ELFUSE_NO_ROSETTA=1 to disable.
Written for commit c12dd42.