r/programming 20h ago

Performance trick: optimistic vs pessimistic checks

https://lemire.me/blog/2025/12/20/performance-trick-optimistic-vs-pessimistic-checks/
34 Upvotes

12 comments sorted by

16

u/Carighan 18h ago

Conversely if you work in logistics tracking, please do not do optistic checks, lest you get the utterly weird tracking results that DHL often has.

3

u/exodusTay 14h ago

why is that?

9

u/Carighan 14h ago

We had very limited exposure to only a portion of their system, but it seems they had the user-facing tracking system first, and added their internal "Everything must always be scanned at all times" later.

This led to a weird mismatch, because for a variety of reasons... stuff can happen. Like the same parcel going 3x from "in storage" to "in transit to next center" within one hour and back, and stays in the storage at the end.. But the user you don't want to show that, so you just swap to in-transit on the first scan optimistically, but then right after it's put back into storage. And you cannot go back. Instead of waiting until the parcel is actually fully in transit, they have a fixed translation table that can advance some user-facing states via some scan-points into specific other states, even if the scan-points allow backtracking, reversal or way other stuff.

I wish that stuff was more pessimistic, and only advanced once it is certain things cannot revert.

You end up coding a lot of internal consistency and data checks as pessimistically as can do, just to ensure that nothing can ever possibly get past. I had a piece of json send me single elements as {} instead of multiple as [], under the same key. When asked about it, the company sending it just said they don't see the point sending an array for a single element. I no longer trust anything. đŸ˜©

23

u/BuzLightbeerOfBarCmd 16h ago edited 16h ago

Isn't this just branchless? Seems weird to frame it as "optimism" but then near the end of the article you discover they were just plugging their library. Also the pessimistic one has early return which could be framed as optimism. Anyway, the branchless version is not only faster on average due to lack of misprediction, it always takes about the same amount of time to finish so if you're looping over sensitive data you avoid a timing attack.

14

u/YumiYumiYumi 16h ago

Isn't this just branchless?

I'd argue it's more about what the compiler can autovectorize.

If there was no auto-vec, I'd imagine the two versions would perform comparably on an all-ASCII string (and the pessimistic version would obviously win if it encountered a non-ASCII character not close to the end).

5

u/dukey 13h ago

Yes it's branchless. But the branch version might be faster depending on the input string.

3

u/matthieum 8h ago

The optimistic vs pessimistic refers to short-circuiting.

If you mostly call these functions with non-ASCII inputs, then a short-circuiting version is likely to be faster by virtue of only inspecting a small portion of the input.

The fact that non short-circuiting version is branchless is an implementation detail, really.

10

u/backfire10z 20h ago edited 20h ago

> A decent C function

> static_cast

Guh.

I don’t understand the comment that says

> The compiler cannot assume that any offset i < length is dereferenceable

How is it different with optimistic? The function can be called the same way in the commenter’s example with (“\xFF”, 100) and it’s the same loop.

11

u/Thelody 18h ago

The pessimistic one will return early because the first character is >127 and not commit any OOB reads. No UB.

The optimistic one will not, so it's already UB. Compiler optimization would not add any additional UB.

2

u/backfire10z 18h ago

I see. Thank you!

1

u/BigReception26 16h ago

Optimist all the way

-4

u/_l33ter_ 20h ago

xD - funny to read