Halve engineLoadTXD memory by moving textures to D3DPOOL_DEFAULT (#4062) #4898
Open
Zephkek wants to merge 1 commit into multitheftauto:master
GTA SA's RenderWare loader (D3DResourceSystem::CreateTexture @ 0x730510
and the inline CreateCubeTexture branch in _rwD3D9NativeTextureRead @
0x4CD982) hardcodes D3DPOOL_MANAGED for every texture in a TXD. MANAGED
textures are mirrored 1:1 in system memory so D3D9 can auto-restore them
on a device reset, which is exactly what makes a 10 MB DXT TXD count as
~20 MB of working set in MTA's process.
After RW finishes decoding a script-loaded TXD in CRenderWareSA::ReadTXD
we now walk every raster, allocate a fresh IDirect3DTexture9 (or
IDirect3DCubeTexture9) in D3DPOOL_DEFAULT, copy each mip through a
transient SYSTEMMEM scratch + UpdateTexture, release the original
MANAGED texture, and swap the pointer in rasterExt->texture. The release
deliberately bypasses D3DResourceSystem::DestroyTexture so the
gD3DTextureBuffer cache (MANAGED-only, keyed by w/h/format/levels) never
sees DEFAULT-pool entries; the destroy intercept in
CRenderWareSA::DestroyTexture NULLs rasterExt->texture before
RwTextureDestroy so _rwD3D9RasterDestroy hits its existing early-out.
Conversion happens before ScriptAddedTxd so the shader-matching map
(m_D3DDataTexInfoMap) is keyed against the new IDirect3DTexture9
pointer the renderer will see when GTA later calls SetTexture.
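A minimal sketch of why the ordering matters, assuming a map keyed by texture pointer. `FakeTexture`, `Raster`, `TexInfoMap`, and the three helper functions are hypothetical stand-ins for illustration, not MTA's or D3D9's actual types:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for IDirect3DTexture9; only its address matters here.
struct FakeTexture {};

// Hypothetical stand-in for a map keyed by the bound texture pointer.
using TexInfoMap = std::unordered_map<const FakeTexture*, std::string>;

struct Raster {
    const FakeTexture* texture;  // plays the role of rasterExt->texture
};

// Swap the raster's pointer to the replacement (the DEFAULT-pool texture).
inline void ConvertToDefaultPool(Raster& raster, const FakeTexture* replacement) {
    assert(replacement != nullptr);
    raster.texture = replacement;
}

// Register the raster under whatever pointer it holds right now.
inline void RegisterTexture(TexInfoMap& map, const Raster& raster, std::string name) {
    map[raster.texture] = std::move(name);
}

// Look up by the pointer the renderer binds; nullptr means no match.
inline const std::string* Lookup(const TexInfoMap& map, const FakeTexture* bound) {
    const auto it = map.find(bound);
    return it == map.end() ? nullptr : &it->second;
}
```

Registering after the swap stores the replacement pointer, so a lookup by the pointer bound at draw time hits. Registering before the swap would leave the map keyed by a pointer that is never bound again.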
Coverage:
- Regular 2D rasters (RwRaster::type == 4, no cube flag, no palette,
no D3DUSAGE_AUTOGENMIPMAP): converted via CreateTexture.
- Cube-map rasters (cubeTextureFlags & 0x01): converted via
CreateCubeTexture; every face and every mip level is explicitly locked
on the SYSTEMMEM scratch so UpdateTexture copies the full mipchain
of every face. EdgeLength == desc.Width == desc.Height (D3D9
enforces square cube faces).
- Palettised rasters (P8 + 1024-byte palette buffer) and rasters
flagged with auto-mipmap generation are skipped: D3DUSAGE_AUTOGENMIPMAP
is incompatible with D3DPOOL_SYSTEMMEM and palette-format DEFAULT
textures aren't supported by modern drivers; both paths leave the
raster MANAGED so behaviour matches today.
- Unrecognised formats (anything not in our explicit byte-count table)
stay MANAGED.
- CreateTexture / CreateCubeTexture for the DEFAULT pool retry once
after IDirect3DDevice9::EvictManagedResources on
D3DERR_OUTOFVIDEOMEMORY, mirroring CDirect3DEvents9::CreateTexture.
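The coverage rules above can be sketched as a small classifier plus a byte-count table. This is an illustration under assumptions, not the PR's code: `Format`, `Route`, `RasterInfo`, and all function names are hypothetical, the table holds only the formats named in the test plan, and applying the `type == 4` check to cube rasters is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for illustration; not the real D3D9 or RenderWare types.
enum class Format { DXT1, DXT3, DXT5, A8R8G8B8, R5G6B5, L8, P8, Unknown };
enum class Route { Texture2D, Cube, StayManaged };

struct RasterInfo {
    int           rwType;         // RwRaster::type; 4 marks a texture raster
    bool          isCube;         // cubeTextureFlags & 0x01
    bool          hasPalette;     // P8 with a palette buffer
    bool          autoGenMipmap;  // D3DUSAGE_AUTOGENMIPMAP set
    Format        format;
    std::uint32_t width, height, levels;
};

// Bytes in one mip level, or nullopt when the format is not in the table.
inline std::optional<std::uint64_t> LevelBytes(Format f, std::uint32_t w, std::uint32_t h) {
    const std::uint64_t blocks = std::uint64_t{(w + 3) / 4} * ((h + 3) / 4);  // 4x4 DXT blocks
    const std::uint64_t texels = std::uint64_t{w} * h;
    switch (f) {
        case Format::DXT1:     return blocks * 8;
        case Format::DXT3:
        case Format::DXT5:     return blocks * 16;
        case Format::A8R8G8B8: return texels * 4;
        case Format::R5G6B5:   return texels * 2;
        case Format::L8:       return texels;
        default:               return std::nullopt;
    }
}

// Total bytes across the mip chain; each level halves, clamped at 1 texel.
inline std::optional<std::uint64_t> MipChainBytes(Format f, std::uint32_t w, std::uint32_t h,
                                                  std::uint32_t levels) {
    assert(levels >= 1);
    std::uint64_t total = 0;
    for (std::uint32_t i = 0; i < levels; ++i) {
        const auto bytes = LevelBytes(f, w, h);
        if (!bytes) return std::nullopt;
        total += *bytes;
        w = std::max<std::uint32_t>(1, w / 2);
        h = std::max<std::uint32_t>(1, h / 2);
    }
    return total;
}

// Decide which conversion path a raster takes; anything doubtful stays MANAGED.
inline Route Classify(const RasterInfo& r) {
    if (r.rwType != 4 || r.hasPalette || r.autoGenMipmap) return Route::StayManaged;
    if (!LevelBytes(r.format, r.width, r.height)) return Route::StayManaged;
    return r.isCube ? Route::Cube : Route::Texture2D;
}
```

Returning `std::nullopt` for unknown formats makes "stay MANAGED" the default whenever the table has no entry, which matches the conservative fallback described in the list.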
Because DEFAULT-pool resources do not survive a D3D9 device loss and
must be released before the device can be reset, CRenderWareSA gains
OnDeviceLost / OnDeviceReset hooks called from
CGraphics::OnDeviceInvalidate / OnDeviceRestore around the existing
CRenderItemManager handling. CClientTXD implements a new
CRwReplacementOwner interface and re-decodes on Reset: file-path TXDs
re-read from disk, raw-data TXDs use the m_FileData buffer that
LoadFromBuffer already keeps for the clothes system.
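The lost/reset flow can be sketched as a registry of owners that release on loss and rebuild on reset. The PR names `CRwReplacementOwner` but this text does not show its methods, so the interface, method names, and `ReplacementRegistry` below are assumptions made for illustration only:

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical owner interface; the method names are assumed, not taken from the PR.
class ReplacementOwner {
public:
    virtual ~ReplacementOwner() = default;
    virtual void ReleaseDefaultPoolTextures() = 0;  // must happen before the device reset
    virtual void RebuildFromSource() = 0;           // re-read from disk or a kept buffer
};

// Hypothetical registry that a renderer could notify around a device reset.
class ReplacementRegistry {
public:
    void Add(ReplacementOwner* owner) {
        assert(owner != nullptr);
        m_owners.insert(owner);
    }
    void Remove(ReplacementOwner* owner) { m_owners.erase(owner); }

    // Device lost: every owner drops its DEFAULT-pool textures.
    void OnDeviceLost() {
        for (ReplacementOwner* owner : m_owners) owner->ReleaseDefaultPoolTextures();
    }
    // Device restored: every owner rebuilds its textures from its source data.
    void OnDeviceReset() {
        for (ReplacementOwner* owner : m_owners) owner->RebuildFromSource();
    }

private:
    std::unordered_set<ReplacementOwner*> m_owners;
};

// Toy owner that only counts, standing in for a TXD object.
class CountingOwner : public ReplacementOwner {
public:
    int liveTextures = 3;
    int rebuilds = 0;
    void ReleaseDefaultPoolTextures() override { liveTextures = 0; }
    void RebuildFromSource() override {
        liveTextures = 3;
        ++rebuilds;
    }
};
```

An owner that unregisters itself in its destructor is never called after it is gone, which is the case the test plan exercises by destroying a TXD between the two hooks.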
Memory impact for a 10 MB TXD:
- file path (the common case for vehicle/skin packs):
~10 MB sysmem dropped, VRAM usage unchanged. ~50% total saving.
- raw-data path (rare; engineLoadTXD with a buffer):
m_FileData was already kept by master for clothes compatibility, so
it takes over the MANAGED shadow's role as the restore source at no
extra memory cost; the ~10 MB shadow itself is still dropped.
For texture-heavy servers (large vehicle / skin / map packs) this
typically reclaims hundreds of megabytes of sysmem with zero per-frame
cost (DEFAULT and MANAGED render identically once bound).
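The arithmetic behind these figures can be written out as a small model. It assumes a MANAGED texture costs its decoded size once in sysmem and once in VRAM, and a DEFAULT texture costs it only in VRAM; `Footprint` and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ull * 1024ull;

struct Footprint {
    std::uint64_t sysmem;
    std::uint64_t vram;
    std::uint64_t Total() const { return sysmem + vram; }
};

// decodedBytes: decoded texture data; keptFileBytes: size of a retained file
// buffer (0 for the file-path case, the m_FileData size for the raw-data case).
inline Footprint ManagedFootprint(std::uint64_t decodedBytes, std::uint64_t keptFileBytes) {
    return {decodedBytes + keptFileBytes, decodedBytes};  // sysmem shadow + VRAM copy
}

inline Footprint DefaultFootprint(std::uint64_t decodedBytes, std::uint64_t keptFileBytes) {
    return {keptFileBytes, decodedBytes};  // no shadow, VRAM copy only
}

// Fraction of the combined sysmem + VRAM footprint that the conversion removes.
inline double SavingFraction(const Footprint& before, const Footprint& after) {
    assert(before.Total() > 0 && after.Total() <= before.Total());
    return static_cast<double>(before.Total() - after.Total()) /
           static_cast<double>(before.Total());
}
```

For a 10 MB file-path TXD the model gives 20 MB before and 10 MB after, the ~50% figure quoted above. In the raw-data case the same 10 MB of sysmem goes away, but the retained buffer makes the percentage smaller.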
Summary
GTA SA's RenderWare loader hardcodes D3DPOOL_MANAGED for every script-loaded TXD texture. That creates a system-memory copy for every texture byte already in VRAM, which is the memory doubling reported in #4062.
This PR converts script-loaded TXD textures to D3DPOOL_DEFAULT after CRenderWareSA::ReadTXD finishes. That removes the managed shadow copy; the PR also adds device-reset recovery so textures survive alt-tab and display resets.
The memory saving happens as soon as engineLoadTXD returns.
engineLoadTXD with a buffer: m_FileData + 10 MB sysmem shadow + 10 MB VRAM before, m_FileData + 10 MB VRAM after.
Coverage in this PR includes regular 2D rasters and cube maps.
Palettised rasters and rasters using D3DUSAGE_AUTOGENMIPMAP stay managed by design.
Motivation
Resolves #4062.
The reporter measured a 10 MB TXD costing about 20 MB of working set. The loader path explains why:
- D3DResourceSystem::CreateTexture at 0x730510 always passes D3DPOOL_MANAGED to IDirect3DDevice9::CreateTexture.
- The cube branch inside _rwD3D9NativeTextureRead at 0x4CD982 does the same with CreateCubeTexture.
Per the D3D9 spec, managed resources keep a system-memory copy alongside their VRAM copy so the runtime can restore them automatically after a device loss. For replacement TXDs this is wasteful, because these textures are static for the lifetime of the script and MTA can rebuild them during reset.
How It Works
After RwTexDictionaryGtaStreamRead decodes the TXD, this PR walks every raster and creates a new D3DPOOL_DEFAULT texture.
- For 2D rasters, it allocates a fresh IDirect3DTexture9.
- For cube rasters, it allocates a fresh IDirect3DCubeTexture9.
Each mip is copied through a temporary D3DPOOL_SYSTEMMEM scratch texture using IDirect3DDevice9::UpdateTexture. That is the documented path for copying into default-pool resources without involving the managed pool.
Once the copy succeeds, the original managed texture is released and rasterExt->texture is replaced with the new default-pool texture.
Important details:
- D3DResourceSystem::DestroyTexture feeds gD3DTextureBuffer, which recycles managed textures by width, height, format, and mip count. A default-pool texture must not enter that cache. The destroy intercept in CRenderWareSA::DestroyTexture nulls rasterExt->texture before RwTextureDestroy runs, so _rwD3D9RasterDestroy hits its existing null early-out at 0x4CBC1A.
- ScriptAddedTxd keys m_D3DDataTexInfoMap by the current IDirect3DTexture9 pointer. Conversion runs before ScriptAddedTxd, so the map stores the final default-pool pointer that the renderer will bind. Custom shaders applied through engineApplyShaderToWorldTexture continue matching the replaced texture.
- Cube maps are converted with CreateCubeTexture and a face-by-level copy loop. The RwD3D9Raster::texture field is already used by GTA as a union slot for cube textures at 0x4CD982, so the conversion swaps that same slot.
- CreateTexture and CreateCubeTexture retry once after IDirect3DDevice9::EvictManagedResources when D3DERR_OUTOFVIDEOMEMORY is returned. This mirrors the behavior already used by CDirect3DEvents9::CreateTexture.
- […] lockedLevel, lockedSurface, and lockedRect on the raster extension so no later code sees a pointer to a freed surface.
- CRenderWareSA now has OnDeviceLost and OnDeviceReset hooks called from CGraphics::OnDeviceInvalidate and CGraphics::OnDeviceRestore. CClientTXD implements CRwReplacementOwner and rebuilds TXDs on reset. File-path TXDs are read again from disk. Raw-data TXDs rebuild from the existing m_FileData buffer.
Skipped Cases
These remain D3DPOOL_MANAGED intentionally:
- D3DFMT_P8 textures with the 1024-byte palette buffer are not reliable as default-pool textures on modern hardware.
- D3DUSAGE_AUTOGENMIPMAP is incompatible with D3DPOOL_SYSTEMMEM, which breaks the staging-copy path.
Memory Profile
Trace from a single engineLoadTXD on test1.txd, captured with an instrumented build:
[…]
The 3.8 MB drop is the managed shadow copy reported in #4062. VRAM usage is unchanged because the texture still needs to live on the GPU.
There is a short per-texture peak during conversion:
Chart "sysmem cost during conversion of one N-byte texture": x-axis "step" from 0 to 6, y-axis "multiples of N" from 0 to 2, values 1, 1, 2, 2, 2, 1, 0.
The peak is per texture, not per TXD. The conversion loop processes rasters one at a time, so earlier textures have already released their managed shadows by the time later textures are converted.
Worst case during a single engineLoadTXD is about twice the size of the largest individual texture in the TXD, lasting for one UpdateTexture call. Once engineLoadTXD returns, the sysmem cost is gone.
For cube maps, the same pattern applies across 6 faces. In practice, cube replacement textures are usually small environment maps.
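The per-texture peak can be worked through numerically. The sketch below takes the chart values at face value (sysmem held for one texture at each step, in multiples of its size N) and evaluates the worst case stated above; the function names are hypothetical and the model is only as accurate as that curve.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Chart values from the PR description: sysmem for one texture, in multiples of N.
constexpr std::array<int, 7> kCurve = {1, 1, 2, 2, 2, 1, 0};

// Sysmem attributed to one N-byte texture at a given conversion step.
inline std::uint64_t SysmemAtStep(std::uint64_t n, std::size_t step) {
    assert(step < kCurve.size());
    return n * static_cast<std::uint64_t>(kCurve[step]);
}

// Highest point of the curve for one texture.
inline std::uint64_t PeakForTexture(std::uint64_t n) {
    std::uint64_t peak = 0;
    for (std::size_t step = 0; step < kCurve.size(); ++step)
        peak = std::max(peak, SysmemAtStep(n, step));
    return peak;
}

// Rasters are converted one at a time, so the worst per-texture peak in a TXD
// is set by its largest texture.
inline std::uint64_t WorstCasePeak(const std::vector<std::uint64_t>& textureBytes) {
    std::uint64_t worst = 0;
    for (const std::uint64_t n : textureBytes) worst = std::max(worst, PeakForTexture(n));
    return worst;
}
```

With textures of 4, 16, and 8 MB, the model's peak is 32 MB, twice the largest texture, and the final step of the curve leaves nothing in sysmem for a converted texture.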
Test Plan
Reviewers and testers, please run through the checklist below.
Memory Baseline
- […] engineLoadTXD with a file path, then apply it with engineImportTXD. Compare process working set against an unmodified master build with the same resource. Expected delta is roughly the decoded TXD size.
- […] fileOpen, fileRead, and engineLoadTXD(buffer). Expected delta is still roughly the decoded TXD size. The managed shadow is gone, while m_FileData remains as it already did on master.
Visual Correctness
- […] engineLoadTXD followed by several engineImportTXD calls onto different model IDs. All target models should show the replacement.
- […] vehicleenvmap128. Reflections should look correct from all angles.
Custom Shaders
- […] engineApplyShaderToWorldTexture to a texture that is also replaced by a script TXD. The shader should still bind to the replaced texture after conversion.
Device Reset
- […] m_FileData buffer is still alive. They should restore too.
- […] OnDeviceLost and OnDeviceReset. This should not crash or leak.
Lifecycle
- […] m_DefaultPoolRasters should return to zero between cycles.
- […] CLOTHES_TEX_ID_FIRST..LAST model IDs should still work.
- […] engineLoadTXD, import first to a clothes ID, then import to a model ID afterward. The lazy reload and reconversion path should not get confused by clothes-path cleanup.
Edge Formats
- […] DXT1, DXT3, DXT5, A8R8G8B8, R5G6B5, and L8 should convert cleanly.
- […] DXT5 faces should convert and render.
- […] A8R8G8B8 faces should convert and render.
- […] D3DFMT_P8 should load and render normally. It should stay managed by design.
- […] autoMipMaps set in the native header should load and render normally. It should stay managed by design.
Right-Sizing
- […] RightSizeTxd in core settings and load a TXD large enough to trigger shrinking. The shrink path uses RwTexDictionaryGtaStreamRead directly, not ReadTXD, so the shrunk output should still load through the converted path normally.
OOM And Low VRAM
- […] EvictManagedResources and either succeed or cleanly fall back to managed for that texture. No half-converted or broken TXDs should appear.
Risks
Device reset now rebuilds script-loaded default-pool TXDs sequentially. If a server has hundreds of replacement TXDs loaded, alt-tab or display reset can cause a few seconds of stutter. That is acceptable for a rare path and avoids the permanent memory cost.
Cube conversion follows the same native flow GTA uses: same CreateCubeTexture arguments, same union-slot pointer handling, and the same destroy path. Cube replacements are less common than 2D replacements, so extra real-world cube TXDs are useful for testing.
DFFs and COLs are intentionally out of scope. COLs do not allocate D3D resources. DFF vertex and index buffers are managed too, but some are locked at runtime for animation and skinning. That is a separate, riskier change that needs its own PR and test plan.
Checklist