
Halve engineLoadTXD memory by moving textures to D3DPOOL_DEFAULT (#4062) #4898

Open
Zephkek wants to merge 1 commit into multitheftauto:master from Zephkek:fix/4062-d3d-pool-conversion


Zephkek (Contributor) commented May 6, 2026

Summary

GTA SA's RenderWare loader hardcodes D3DPOOL_MANAGED for every script-loaded TXD texture. That creates a system-memory copy for every texture byte already in VRAM, which is the memory doubling reported in #4062.

This PR converts script-loaded TXD textures to D3DPOOL_DEFAULT after CRenderWareSA::ReadTXD finishes. That removes the managed shadow copy and adds device-reset recovery so textures survive alt-tab and display resets.

The memory saving happens as soon as engineLoadTXD returns.

| Path | Master (per 10 MB TXD) | This PR | Saving |
|---|---|---|---|
| File path, common for vehicle and skin packs | ~10 MB sysmem shadow + 10 MB VRAM | 0 MB sysmem + 10 MB VRAM | ~50% per TXD |
| Raw-data path, engineLoadTXD with a buffer | ~10 MB m_FileData + 10 MB sysmem shadow + 10 MB VRAM | ~10 MB m_FileData + 10 MB VRAM | ~33% per TXD |
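The percentages in the table follow from counting resident copies per TXD. The following is a minimal C++ sketch of that arithmetic. It assumes the managed shadow, the VRAM copy, and the raw buffer (m_FileData) are each about the decoded TXD size S (10 MB in the table). `TxdFootprint` and the helper functions are illustrative names, not MTA code.

```cpp
#include <cassert>

// Hypothetical model of resident memory for one script-loaded TXD, in MB.
// Every copy is assumed to be roughly the decoded TXD size s.
struct TxdFootprint
{
    double fileData;       // raw buffer kept by the raw-data path (m_FileData)
    double managedShadow;  // system-memory copy kept by D3DPOOL_MANAGED
    double vram;           // GPU copy, present in both builds

    double Total() const { return fileData + managedShadow + vram; }
};

// Fraction of the master footprint that the PR removes.
double SavingFraction(const TxdFootprint& master, const TxdFootprint& pr)
{
    assert(master.Total() > 0.0);
    return (master.Total() - pr.Total()) / master.Total();
}

// File path: master keeps shadow + VRAM, the PR keeps only VRAM.
TxdFootprint FilePathMaster(double s) { return {0.0, s, s}; }
TxdFootprint FilePathPr(double s) { return {0.0, 0.0, s}; }

// Raw-data path: m_FileData stays alive in both builds.
TxdFootprint RawPathMaster(double s) { return {s, s, s}; }
TxdFootprint RawPathPr(double s) { return {s, 0.0, s}; }
```

With s = 10, the file path goes from 20 MB to 10 MB (a saving of 0.5). The raw-data path goes from 30 MB to 20 MB (about 0.33), because the buffer is present on both sides.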

Coverage in this PR includes regular 2D rasters and cube maps.

Palettised rasters and rasters using D3DUSAGE_AUTOGENMIPMAP stay managed by design.

Motivation

Resolves #4062.

The reporter measured a 10 MB TXD costing about 20 MB of working set. The loader path explains why: D3DResourceSystem::CreateTexture at 0x730510 always passes D3DPOOL_MANAGED to IDirect3DDevice9::CreateTexture. The cube branch inside _rwD3D9NativeTextureRead at 0x4CD982 does the same with CreateCubeTexture.

Per the D3D9 spec, managed resources keep a system-memory copy alongside their VRAM copy so the runtime can restore them automatically after a device loss. For replacement TXDs this is wasteful, because these textures are static for the lifetime of the script and MTA can rebuild them during reset.

How It Works

After RwTexDictionaryGtaStreamRead decodes the TXD, this PR walks every raster and creates a new D3DPOOL_DEFAULT texture.

For 2D rasters, it allocates a fresh IDirect3DTexture9.

For cube rasters, it allocates a fresh IDirect3DCubeTexture9.

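Each mip level is locked on the managed texture and copied into a temporary D3DPOOL_SYSTEMMEM scratch texture. IDirect3DDevice9::UpdateTexture then uploads the scratch texture to the default-pool texture. UpdateTexture requires a D3DPOOL_SYSTEMMEM source and a D3DPOOL_DEFAULT destination, so this is the documented way to fill a default-pool texture without involving the managed pool.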

Once the copy succeeds, the original managed texture is released and rasterExt->texture is replaced with the new default-pool texture.
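The ordering of the steps above matters. The managed texture is released only after every level has been copied and uploaded, so any earlier failure leaves the raster exactly as RenderWare created it. The following hedged C++ sketch shows that control flow. It deliberately does not use the real IDirect3DDevice9 API. `FakeDevice`, `Texture`, `RasterExt`, and `ConvertRasterToDefaultPool` are hypothetical stand-ins that only track which pool each copy lives in, so the fallback and pointer-swap logic can be exercised without a GPU.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

enum class Pool { Managed, Default, SystemMem };

// Stand-in for a D3D9 texture: a pool tag plus one byte buffer per mip level.
struct Texture
{
    Pool pool = Pool::Managed;
    std::vector<std::vector<unsigned char>> levels;
};

// Stand-in for the device. failNextDefault simulates a DEFAULT-pool allocation failure.
struct FakeDevice
{
    bool failNextDefault = false;

    std::unique_ptr<Texture> Create(Pool pool, const Texture& shape)
    {
        if (pool == Pool::Default && failNextDefault)
            return nullptr;
        auto tex = std::make_unique<Texture>();
        tex->pool = pool;
        for (const auto& level : shape.levels)
            tex->levels.emplace_back(level.size(), 0);  // same size, zero-filled
        return tex;
    }

    // Mirrors UpdateTexture's rule: SYSTEMMEM source, DEFAULT destination.
    bool Update(const Texture& src, Texture& dst)
    {
        if (src.pool != Pool::SystemMem || dst.pool != Pool::Default)
            return false;
        dst.levels = src.levels;
        return true;
    }
};

// Hypothetical raster extension: the texture slot plus cached lock state.
struct RasterExt
{
    std::unique_ptr<Texture> texture;
    int lockedLevel = -1;
    const void* lockedSurface = nullptr;
};

// Returns true if the raster now points at a DEFAULT-pool texture.
// Every early return leaves the original managed texture untouched.
bool ConvertRasterToDefaultPool(FakeDevice& dev, RasterExt& ext)
{
    if (!ext.texture || ext.texture->pool != Pool::Managed)
        return false;
    auto dst = dev.Create(Pool::Default, *ext.texture);        // step 1
    if (!dst)
        return false;
    auto scratch = dev.Create(Pool::SystemMem, *ext.texture);  // step 2
    if (!scratch)
        return false;
    scratch->levels = ext.texture->levels;                     // step 3: lock + copy each mip
    if (!dev.Update(*scratch, *dst))                            // step 4
        return false;
    scratch.reset();                                            // step 5
    ext.texture = std::move(dst);                               // step 6: old managed texture freed
    ext.lockedLevel = -1;                                       // drop stale lock state
    ext.lockedSurface = nullptr;
    return true;
}
```

In the real code, step 6 must release the managed texture without going through D3DResourceSystem::DestroyTexture, for the cache reason described next. The `unique_ptr` move only models "old pointer freed, new pointer installed".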

Important details:

  • Managed-only texture cache: D3DResourceSystem::DestroyTexture feeds gD3DTextureBuffer, which recycles managed textures by width, height, format, and mip count. A default-pool texture must not enter that cache. The destroy intercept in CRenderWareSA::DestroyTexture nulls rasterExt->texture before RwTextureDestroy runs, so _rwD3D9RasterDestroy hits its existing null early-out at 0x4CBC1A.
  • Shader matching: ScriptAddedTxd keys m_D3DDataTexInfoMap by the current IDirect3DTexture9 pointer. Conversion runs before ScriptAddedTxd, so the map stores the final default-pool pointer that the renderer will bind. Custom shaders applied through engineApplyShaderToWorldTexture continue matching the replaced texture.
  • Cube maps: Cube conversion follows the same pipeline with CreateCubeTexture and a face-by-level copy loop. The RwD3D9Raster::texture field is already used by GTA as a union slot for cube textures at 0x4CD982, so the conversion swaps that same slot.
  • OOM behavior: CreateTexture and CreateCubeTexture retry once after IDirect3DDevice9::EvictManagedResources when D3DERR_OUTOFVIDEOMEMORY is returned. This mirrors the behavior already used by CDirect3DEvents9::CreateTexture.
  • Safe fallback: Anything that cannot be safely converted stays managed. That includes palettised rasters, autogen-mipmap rasters, unknown formats, lock failures, driver oddities, and out-of-VRAM failures after retry.
  • Stale lock state: When replacing the texture pointer, the code clears lockedLevel, lockedSurface, and lockedRect on the raster extension so no later code sees a pointer to a freed surface.
  • Device reset: Default-pool resources do not survive device loss. D3D9 requires them to be released before IDirect3DDevice9::Reset and recreated afterward. CRenderWareSA now has OnDeviceLost and OnDeviceReset hooks called from CGraphics::OnDeviceInvalidate and CGraphics::OnDeviceRestore. CClientTXD implements CRwReplacementOwner and rebuilds TXDs on reset. File-path TXDs are read again from disk. Raw-data TXDs rebuild from the existing m_FileData buffer.
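The managed-only cache rule can be illustrated with a toy model. This is a hypothetical sketch, not GTA's code. `TextureCache` stands in for gD3DTextureBuffer, and `RasterDestroy` stands in for _rwD3D9RasterDestroy with its null early-out. `InterceptDestroy` stands in for the CRenderWareSA::DestroyTexture hook, which releases a converted texture itself and nulls the slot first.

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <vector>

enum class Pool { Managed, Default };

struct Texture
{
    int width, height, format, levels;
    Pool pool;
};

// Stand-in for gD3DTextureBuffer: recycles textures keyed by
// (width, height, format, mip count) and assumes every entry is MANAGED.
struct TextureCache
{
    std::map<std::tuple<int, int, int, int>, std::vector<Texture*>> entries;

    void Recycle(Texture* tex)
    {
        entries[std::make_tuple(tex->width, tex->height, tex->format, tex->levels)].push_back(tex);
    }
};

struct RasterExt
{
    Texture* texture = nullptr;
};

// Stand-in for _rwD3D9RasterDestroy: a null slot hits the early-out.
void RasterDestroy(TextureCache& cache, RasterExt& ext)
{
    if (ext.texture == nullptr)
        return;
    cache.Recycle(ext.texture);
    ext.texture = nullptr;
}

// Stand-in for the destroy intercept: DEFAULT-pool textures are freed here
// and the slot is nulled, so the cache never sees them.
void InterceptDestroy(TextureCache& cache, RasterExt& ext)
{
    if (ext.texture != nullptr && ext.texture->pool == Pool::Default)
    {
        delete ext.texture;  // real code would call IDirect3DTexture9::Release
        ext.texture = nullptr;
    }
    RasterDestroy(cache, ext);
}
```

A converted texture therefore leaves the cache empty, and a texture that stayed managed is still recycled as it is on master.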

Skipped Cases

These remain D3DPOOL_MANAGED intentionally:

  • Palettised rasters: D3DFMT_P8 textures with the 1024-byte palette buffer are not reliable as default-pool textures on modern hardware.
  • Autogen-mipmap rasters: D3DUSAGE_AUTOGENMIPMAP is incompatible with D3DPOOL_SYSTEMMEM, which breaks the staging-copy path.
  • Unknown formats: Any format outside the explicit byte-count table is skipped instead of guessing row pitch.
  • Per-texture failures: If a texture cannot be converted safely, only that texture stays managed. The TXD still loads and renders normally.
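The skip rules and the explicit byte-count table can be sketched as a pure function. The format list below covers only the formats named in the test plan (DXT1, DXT3, DXT5, A8R8G8B8, R5G6B5, L8), so the PR's actual table may differ. `RasterInfo`, `RowBytes`, and `CanConvert` are hypothetical names. Block-compressed formats are measured in 4x4 blocks: DXT1 uses 8 bytes per block, and DXT3/DXT5 use 16.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

enum class Fmt { DXT1, DXT3, DXT5, A8R8G8B8, R5G6B5, L8, P8, Other };

struct RasterInfo
{
    Fmt format;
    bool hasPalette;   // P8 with a 1024-byte palette buffer
    bool autoMipMaps;  // would need D3DUSAGE_AUTOGENMIPMAP
};

// Bytes in one row of a mip level (one row of 4x4 blocks for DXT formats).
// Returns std::nullopt for formats outside the table, meaning "do not guess".
std::optional<std::uint32_t> RowBytes(Fmt format, std::uint32_t width)
{
    std::uint32_t blocksWide = (width + 3) / 4;
    if (blocksWide == 0)
        blocksWide = 1;
    switch (format)
    {
        case Fmt::DXT1: return blocksWide * 8;
        case Fmt::DXT3:
        case Fmt::DXT5: return blocksWide * 16;
        case Fmt::A8R8G8B8: return width * 4;
        case Fmt::R5G6B5: return width * 2;
        case Fmt::L8: return width * 1;
        default: return std::nullopt;
    }
}

// True if the raster may move to D3DPOOL_DEFAULT; false keeps it managed.
bool CanConvert(const RasterInfo& r)
{
    if (r.hasPalette || r.format == Fmt::P8)
        return false;  // palettised: stays managed by design
    if (r.autoMipMaps)
        return false;  // AUTOGENMIPMAP cannot use a SYSTEMMEM scratch
    return RowBytes(r.format, 1).has_value();  // unknown format: stays managed
}
```

Returning "unknown" instead of a default pitch keeps the failure mode conservative. A format the table does not know about renders exactly as it does on master.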

Memory Profile

Trace from a single engineLoadTXD on test1.txd, captured with an instrumented build:

engineLoadTXD: test1.txd
  process private working set delta after call returns
    master  +3.8 MB sysmem  +3.8 MB vram
    PR       0.0 MB sysmem  +3.8 MB vram

The 3.8 MB drop is the managed shadow copy reported in #4062. VRAM usage is unchanged because the texture still needs to live on the GPU.

There is a short per-texture peak during conversion:

Figure: sysmem cost during conversion of one N-byte texture (x-axis: step 0 to 6; y-axis: multiples of N).
| Step | Action | Sysmem |
|---|---|---|
| 0 | Managed texture exists after RenderWare load | N |
| 1 | Create default-pool texture | N |
| 2 | Create system-memory scratch texture | 2N (peak) |
| 3 | Lock and copy mips from managed to system memory | 2N |
| 4 | Upload system-memory texture to default-pool texture | 2N |
| 5 | Release system-memory scratch texture | N |
| 6 | Release managed texture | 0 (permanent) |

The peak is per texture, not per TXD. The conversion loop processes rasters one at a time, so earlier textures have already released their managed shadows by the time later textures are converted.

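For the texture being converted, the worst case is about twice its size (managed shadow plus scratch copy), lasting for steps 2 to 4 of the table above. Textures not yet converted still hold their managed shadows, exactly as on master. As a result, sysmem during a single engineLoadTXD never exceeds the master footprint by more than the size of the largest individual texture in the TXD. Once engineLoadTXD returns, the sysmem cost is gone.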
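A small simulation makes the per-texture peak concrete. It also shows that the transient overhead, relative to master's permanent footprint, is at most one texture's size. The sketch relies on simplifying assumptions: each texture's managed shadow and scratch copy are both exactly its size, and nothing else in the process allocates. `SimulatePeak` is a hypothetical helper, not MTA code.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

struct PeakResult
{
    std::uint64_t master;     // sysmem kept permanently on master (all shadows)
    std::uint64_t peak;       // highest sysmem seen during conversion
    std::uint64_t afterLoad;  // sysmem left once every raster is converted
};

// Converts textures one at a time, following the step table:
// allocate scratch (+n), copy and upload, release scratch (-n),
// release the managed shadow (-n).
PeakResult SimulatePeak(const std::vector<std::uint64_t>& sizes)
{
    std::uint64_t sysmem = std::accumulate(sizes.begin(), sizes.end(), std::uint64_t{0});
    PeakResult result{sysmem, sysmem, 0};
    for (std::uint64_t n : sizes)
    {
        sysmem += n;               // step 2: scratch allocated
        if (sysmem > result.peak)
            result.peak = sysmem;  // steps 2 to 4 hold 2n for this texture
        sysmem -= n;               // step 5: scratch released
        sysmem -= n;               // step 6: managed shadow released
    }
    result.afterLoad = sysmem;
    return result;
}
```

For sizes {4, 1, 2}, master holds 7 units permanently. The conversion peaks at 11 while the first (largest) texture is being copied and ends at 0.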

For cube maps, the same pattern applies across 6 faces. In practice, cube replacement textures are usually small environment maps.

Test Plan

Reviewers and testers, please run through the checklist below.

Memory Baseline
  • Load a real vehicle pack through engineLoadTXD with a file path, then apply it with engineImportTXD. Compare process working set against an unmodified master build with the same resource. Expected: the PR build's working set is lower by roughly the decoded TXD size.
  • Repeat with the raw-data path: fileOpen, fileRead, and engineLoadTXD(buffer). The expected reduction is still roughly the decoded TXD size: the managed shadow is gone, while m_FileData remains as it already did on master.
  • Load several large TXDs at once. A pack of 100 TXDs at 4 MB each should save roughly 400 MB of sysmem compared with master.
  • Test a cube-heavy TXD, such as vehicle reflection environment maps. Memory should drop the same way as 2D textures, with a smaller absolute saving.
Visual Correctness
  • Custom vehicle skin: spawn the vehicle, drive it, alt-tab out and back.
  • Custom ped skin: spawn a player and inspect it in third person and first person.
  • Custom object texture: spawn the object and view it from multiple angles.
  • One engineLoadTXD followed by several engineImportTXD calls onto different model IDs. All target models should show the replacement.
  • Vehicle with a custom env-map cube TXD, such as a replacement for vehicleenvmap128. Reflections should look correct from all angles.
  • Mix one 2D-only TXD and one cube TXD in the same resource. Both should render correctly.
Custom Shaders
  • Apply a custom shader with engineApplyShaderToWorldTexture to a texture that is also replaced by a script TXD. The shader should still bind to the replaced texture after conversion.
  • Repeat with a cube replacement texture if you have a setup for it.
Device Reset
  • With custom TXDs applied, alt-tab out of fullscreen and back in. File-path TXDs should restore transparently.
  • Repeat with raw-data TXDs whose m_FileData buffer is still alive. They should restore too.
  • Change display mode at runtime to force a reset.
  • Reset with converted cube TXDs in use. All 6 faces should return correctly.
  • TXDs that cannot be rebuilt should leave affected models on original textures instead of rendering garbage.
  • Stop a resource between OnDeviceLost and OnDeviceReset. This should not crash or leak.
Lifecycle
  • Stop the resource. TXD destruction should not crash, models should restream their original textures, and process memory should return to baseline.
  • Restart the resource repeatedly. There should be no leaks. m_DefaultPoolRasters should return to zero between cycles.
  • Clothes path: resources using CLOTHES_TEX_ID_FIRST..LAST model IDs should still work.
  • Call engineLoadTXD, import first to a clothes ID, then import to a model ID afterward. The lazy reload and reconversion path should not get confused by clothes-path cleanup.
Edge Formats
  • DXT1, DXT3, DXT5, A8R8G8B8, R5G6B5, and L8 should convert cleanly.
  • A cube TXD with DXT5 faces should convert and render.
  • A cube TXD with uncompressed A8R8G8B8 faces should convert and render.
  • A palettised TXD using D3DFMT_P8 should load and render normally. It should stay managed by design.
  • A TXD with autoMipMaps set in the native header should load and render normally. It should stay managed by design.
  • A TXD with an unsupported exotic format should stay managed with no visible breakage.
Right-Sizing
  • Enable RightSizeTxd in core settings and load a TXD large enough to trigger shrinking. The shrink path uses RwTexDictionaryGtaStreamRead directly, not ReadTXD, so the shrunk output should still load through the converted path normally.
OOM And Low VRAM
  • Force a low-VRAM scenario with a small GPU or many preloaded models, then load a heavy TXD pack. The conversion should retry after EvictManagedResources and either succeed or cleanly fall back to managed for that texture. No half-converted or broken TXDs should appear.

Risks

Device reset now rebuilds script-loaded default-pool TXDs sequentially. If a server has hundreds of replacement TXDs loaded, alt-tab or a display reset can cause a few seconds of stutter. That cost is acceptable on a rare path in exchange for removing the permanent memory cost.

Cube conversion follows the same native flow GTA uses: same CreateCubeTexture arguments, same union-slot pointer handling, and the same destroy path. Cube replacements are less common than 2D replacements, so extra real-world cube TXDs are useful for testing.

DFFs and COLs are intentionally out of scope. COLs do not allocate D3D resources. DFF vertex and index buffers are managed too, but some are locked at runtime for animation and skinning. That is a separate, riskier change that needs its own PR and test plan.

Checklist

  • Code follows the coding guidelines, with clang-format applied.
  • Smaller pull requests are easier to review. If your pull request is beefy, your pull request should be reviewable commit-by-commit.

Zephkek force-pushed the fix/4062-d3d-pool-conversion branch from 782358f to 02c820b on May 7, 2026 12:44
GTA SA's RenderWare loader (D3DResourceSystem::CreateTexture @ 0x730510
and the inline CreateCubeTexture branch in _rwD3D9NativeTextureRead @
0x4CD982) hardcodes D3DPOOL_MANAGED for every texture in a TXD. MANAGED
textures are mirrored 1:1 in system memory so D3D9 can auto-restore them
on a device reset, which is exactly what makes a 10 MB DXT TXD count as
~20 MB of working set in MTA's process.

After RW finishes decoding a script-loaded TXD in CRenderWareSA::ReadTXD
we now walk every raster, allocate a fresh IDirect3DTexture9 (or
IDirect3DCubeTexture9) in D3DPOOL_DEFAULT, copy each mip through a
transient SYSTEMMEM scratch + UpdateTexture, release the original
MANAGED texture, and swap the pointer in rasterExt->texture. The release
deliberately bypasses D3DResourceSystem::DestroyTexture so the
gD3DTextureBuffer cache (MANAGED-only, keyed by w/h/format/levels) never
sees DEFAULT-pool entries; the destroy intercept in
CRenderWareSA::DestroyTexture NULLs rasterExt->texture before
RwTextureDestroy so _rwD3D9RasterDestroy hits its existing early-out.

Conversion happens before ScriptAddedTxd so the shader-matching map
(m_D3DDataTexInfoMap) is keyed against the new IDirect3DTexture9
pointer the renderer will see when GTA later calls SetTexture.

Coverage:
  - Regular 2D rasters (RwRaster::type == 4, no cube flag, no palette,
    no D3DUSAGE_AUTOGENMIPMAP): converted via CreateTexture.
  - Cube-map rasters (cubeTextureFlags & 0x01): converted via
    CreateCubeTexture; every face and every sub-level is explicitly
    locked on the SYSTEMMEM scratch so UpdateTexture copies the full mipchain
    of every face. EdgeLength == desc.Width == desc.Height (D3D9
    enforces square cube faces).
  - Palettised rasters (P8 + 1024-byte palette buffer) and rasters
    flagged with auto-mipmap generation are skipped: D3DUSAGE_AUTOGENMIPMAP
    is incompatible with D3DPOOL_SYSTEMMEM and palette-format DEFAULT
    textures aren't supported by modern drivers; both paths leave the
    raster MANAGED so behaviour matches today.
  - Unrecognised formats (anything not in our explicit byte-count table)
    stay MANAGED.
  - CreateTexture / CreateCubeTexture for the DEFAULT pool retry once
    after IDirect3DDevice9::EvictManagedResources on
    D3DERR_OUTOFVIDEOMEMORY, mirroring CDirect3DEvents9::CreateTexture.
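The retry-once policy can be written as a small generic helper. In this hedged sketch, `CreateWithEvictRetry` is a hypothetical name. The two callables stand in for IDirect3DDevice9::CreateTexture / CreateCubeTexture and IDirect3DDevice9::EvictManagedResources. `kOutOfVideoMemory` is a local placeholder rather than the real D3DERR_OUTOFVIDEOMEMORY constant from d3d9.h, so the sketch compiles without the DirectX SDK.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

using HResultLike = std::int32_t;
constexpr HResultLike kOk = 0;
constexpr HResultLike kOutOfVideoMemory = -1;  // placeholder for D3DERR_OUTOFVIDEOMEMORY

// Calls create(). On out-of-video-memory, evicts managed resources once and
// retries once. Any other failure, or a second OOM, goes back to the caller,
// which then leaves the raster in D3DPOOL_MANAGED.
HResultLike CreateWithEvictRetry(const std::function<HResultLike()>& create,
                                 const std::function<void()>& evictManaged)
{
    HResultLike hr = create();
    if (hr == kOutOfVideoMemory)
    {
        evictManaged();
        hr = create();
    }
    return hr;
}
```

Capping the retry at one eviction keeps a genuinely full GPU from looping. The per-texture fallback to MANAGED handles the rest.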

Because DEFAULT-pool resources do not survive a D3D9 device loss (the
application must release them before Reset), CRenderWareSA gains
OnDeviceLost / OnDeviceReset hooks called from
CGraphics::OnDeviceInvalidate / OnDeviceRestore around the existing
CRenderItemManager handling. CClientTXD implements a new
CRwReplacementOwner interface and re-decodes on Reset: file-path TXDs
re-read from disk, raw-data TXDs use the m_FileData buffer that
LoadFromBuffer already keeps for the clothes system.
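The lost/reset hand-off can be sketched as an owner registry. The PR names CRwReplacementOwner and the OnDeviceLost / OnDeviceReset hooks, but their real signatures are not shown here. Every method name below, and the `ReplacementRegistry` class itself, is therefore hypothetical. The sketch shows one plausible shape: on loss, each owner releases its DEFAULT-pool textures. On reset, owners rebuild one after another, and an owner that cannot rebuild puts its models back on the original textures.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical interface loosely modelled on the PR's CRwReplacementOwner.
class IReplacementOwner
{
public:
    virtual ~IReplacementOwner() = default;
    virtual void ReleaseDefaultPoolTextures() = 0;  // called on device loss
    virtual bool RebuildFromSource() = 0;           // re-read file or m_FileData
    virtual void RestoreOriginalTextures() = 0;     // fallback if rebuild fails
};

class ReplacementRegistry
{
public:
    void Add(IReplacementOwner* owner) { m_Owners.push_back(owner); }

    // A resource stopped between loss and reset simply unregisters.
    void Remove(IReplacementOwner* owner)
    {
        m_Owners.erase(std::remove(m_Owners.begin(), m_Owners.end(), owner), m_Owners.end());
    }

    void OnDeviceLost()
    {
        for (IReplacementOwner* owner : m_Owners)
            owner->ReleaseDefaultPoolTextures();
    }

    // Sequential rebuild: why many loaded TXDs can make a reset stutter.
    void OnDeviceReset()
    {
        for (IReplacementOwner* owner : m_Owners)
            if (!owner->RebuildFromSource())
                owner->RestoreOriginalTextures();
    }

private:
    std::vector<IReplacementOwner*> m_Owners;
};
```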

Memory impact for a 10 MB TXD:
  - file path (the common case for vehicle/skin packs):
    ~10 MB sysmem dropped, VRAM usage unchanged. ~50% total saving.
  - raw-data path (rare; engineLoadTXD with a buffer):
    m_FileData was already kept by master for clothes compatibility, so
    the ~10 MB MANAGED shadow is still dropped (~33% total saving) and
    reset recovery reuses that buffer at no extra memory cost.

For texture-heavy servers (large vehicle / skin / map packs) this
typically reclaims hundreds of megabytes of sysmem with zero per-frame
cost (DEFAULT and MANAGED render identically once bound).
Zephkek force-pushed the fix/4062-d3d-pool-conversion branch from 02c820b to f643640 on May 7, 2026 12:56

Development

Successfully merging this pull request may close these issues.

TXD loading uses double memory
