
Speed up Image.blend() #9649

Open
akx wants to merge 3 commits into python-pillow:main from akx:faster-blend

Conversation

Contributor

@akx akx commented Jun 2, 2026

I was working on a project that uses Image.blend(), and it showed up pretty red in a VizTracer timeline, so I decided to investigate a little.

Very small patch, but allows the C compiler to optimize the loops better:

  • we know ysize and linesize can't change during iteration (but the compiler can't statically know that), so they are hoisted into locals
  • we know imOut is a fresh allocation and can't alias another pointer, so its row pointer is marked restrict to tell the compiler that.
    • restrict should be available on MSVC in C99 mode without having to use the MS extension name __restrict; we'll see how it pans out in CI and adjust accordingly.
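The two changes listed above can be sketched roughly as follows. This is a minimal illustration, not the actual diff: the `Img` struct and both function names are hypothetical stand-ins for Pillow's real types, and the loop body assumes an alpha between 0 and 1 so that no clipping is needed.

```c
#include <stdint.h>

/* Hypothetical stand-in for an image struct; NOT Pillow's Imaging type. */
typedef struct {
    int ysize;      /* number of rows */
    int linesize;   /* bytes per row */
    uint8_t **rows; /* one pointer per row */
} Img;

/* Baseline: loop bounds are read through the struct pointer. Stores through
   a uint8_t pointer may alias any object, so the compiler has to assume
   that writing out[x] could change a->linesize and may reload it. */
void blend_baseline(const Img *a, const Img *b, Img *dst, float alpha) {
    for (int y = 0; y < a->ysize; y++) {
        uint8_t *in1 = a->rows[y];
        uint8_t *in2 = b->rows[y];
        uint8_t *out = dst->rows[y];
        for (int x = 0; x < a->linesize; x++) {
            out[x] = (uint8_t)((int)in1[x] + alpha * ((int)in2[x] - (int)in1[x]));
        }
    }
}

/* Same arithmetic, with the bounds hoisted into locals and the output row
   marked restrict (valid only because dst never shares memory with a or b). */
void blend_hoisted(const Img *a, const Img *b, Img *dst, float alpha) {
    const int ysize = a->ysize;
    const int linesize = a->linesize;
    for (int y = 0; y < ysize; y++) {
        const uint8_t *in1 = a->rows[y];
        const uint8_t *in2 = b->rows[y];
        uint8_t *restrict out = dst->rows[y];
        for (int x = 0; x < linesize; x++) {
            out[x] = (uint8_t)((int)in1[x] + alpha * ((int)in2[x] - (int)in1[x]));
        }
    }
}
```

Both functions compute the same result; the second only gives the optimizer more freedom, for example to vectorize the inner loop. Whether it actually does so depends on the compiler and flags.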

In addition, I added a small fast path for when the same image object is passed in twice.
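The reasoning behind such a fast path is that blending an image with itself gives in + alpha * (in - in) = in for any finite alpha, so the result is just a copy. A rough sketch on flat byte buffers follows; `blend_buffers` and its return value are hypothetical, and where the check lives in the real patch is not shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat-buffer blend. Returns 1 if the identity fast path was
   taken, 0 otherwise. The slow path assumes 0 <= alpha <= 1 (no clipping). */
int blend_buffers(const uint8_t *in1, const uint8_t *in2,
                  uint8_t *restrict out, size_t n, float alpha) {
    if (in1 == in2) {
        /* Same source twice: every output byte equals the input byte. */
        memcpy(out, in1, n);
        return 1;
    }
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint8_t)((int)in1[i] + alpha * ((int)in2[i] - (int)in1[i]));
    }
    return 0;
}
```

The pointer comparison only catches the case where the very same buffer is passed twice; two distinct buffers with equal contents still go through the regular loop.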

On my machine, this makes Image.blend() about 4.4x faster (+338% ops/s, mean time down from 9.79 ms to 2.24 ms), with blend_bench.py being a pytest-benchmark suite that runs benchmark(Image.blend, im1, im2, 0.5) on two 2048x2048 RGB images.

(main) $ uv pip install -e . && uv run --no-sync pytest --benchmark-autosave blend_bench.py
[...]
Name (time in ms)        Min      Max    Mean  StdDev  Median     IQR  Outliers       OPS  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------
test_blend_perf       9.1552  11.0795  9.7906  0.2351  9.7934  0.1972      17;5  102.1389      90           1

(faster-blend) $ uv pip install -e . && uv run --no-sync pytest --benchmark-autosave blend_bench.py
[...]
Name (time in ms)        Min     Max    Mean  StdDev  Median     IQR  Outliers       OPS  Rounds  Iterations
------------------------------------------------------------------------------------------------------------
test_blend_perf       2.1012  3.2935  2.2357  0.1044  2.2116  0.1040      43;7  447.2908     284           1
------------------------------------------------------------------------------------------------------------

(faster-blend) $ pytest-benchmark compare .benchmarks/Darwin-CPython-3.14-64bit/*.json --between=ops

----------------- benchmark: 1 tests, 2 sources ------------------
Name (time in ms)     0001_2d4bc04 OPS  0002_9da64fc OPS      ΔOPS
------------------------------------------------------------------
test_blend_perf               102.1389          447.2908   +337.9%
------------------------------------------------------------------

I also looked at using the integer math from AlphaComposite.c, but that only made things a little slower.
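For readers unfamiliar with the idea, an integer-only blend could look something like the sketch below. It is an assumption-laden illustration of the general fixed-point technique, not the code from AlphaComposite.c and not what was benchmarked: `div255`, `blend_fixed_point` and the 0..255 alpha scale are all hypothetical choices.

```c
#include <stddef.h>
#include <stdint.h>

/* Rounded x / 255 for 0 <= x <= 255 * 255, using shifts instead of a
   division (a common fixed-point trick). */
static inline uint32_t div255(uint32_t x) {
    x += 128;
    return (x + (x >> 8)) >> 8;
}

/* Blend with alpha expressed as an integer weight in 0..255 instead of a
   float in 0..1. The weights w1 + w2 always sum to 255, so the weighted
   sum never exceeds 255 * 255. */
void blend_fixed_point(const uint8_t *in1, const uint8_t *in2,
                       uint8_t *restrict out, size_t n, uint8_t alpha255) {
    const uint32_t w2 = alpha255;
    const uint32_t w1 = 255u - w2;
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint8_t)div255(in1[i] * w1 + in2[i] * w2);
    }
}
```

Note that this variant rounds to nearest while a float-to-UINT8 cast truncates, so the two approaches can differ by one in some pixels, and the alpha resolution drops to 256 steps.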

Comment thread src/libImaging/Blend.c Outdated
@akx akx requested a review from radarhere June 2, 2026 07:35
Member

hugovk commented Jun 2, 2026

Please can you share blend_bench.py or a similar benchmark?

Contributor Author

akx commented Jun 2, 2026

@hugovk Sure thing:

import pytest

from PIL import Image


@pytest.fixture(scope="module")
def images() -> tuple[Image.Image, Image.Image]:
    size = (2048, 2048)
    im1 = Image.new("RGB", size)
    im2 = Image.new("RGB", size)
    im1.frombytes(bytes(i % 256 for i in range(im1.width * im1.height * 3)))
    im2.frombytes(bytes((i * 7 + 31) % 256 for i in range(im2.width * im2.height * 3)))
    return im1, im2


def test_blend_perf(benchmark, images: tuple[Image.Image, Image.Image]) -> None:
    im1, im2 = images
    result = benchmark(Image.blend, im1, im2, 0.5)
    assert result.size == im1.size

Contributor Author

akx commented Jun 2, 2026

Btw, if this is merged, I can take a look at other similar loops and apply the same optimizations - looks like there could be lots of easy perf pickings 😊


3 participants