this post was submitted on 28 Aug 2023

54 points (96.6% liked)

Programming

17374 readers

253 users here now

Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!

Cross posting is strongly encouraged in the instance. If you feel your post or another person's post makes sense in another community cross post into it.

Hope you enjoy the instance!

Rules

Follow the programming.dev instance rules
Keep content related to programming in some way
If you're posting long videos try to add in some form of tldr for those who don't want to watch videos

Wormhole

Follow the wormhole through a path of communities !webdev@programming.dev

founded 1 year ago

MODERATORS

snowe@programming.dev

Ategon@programming.dev

MaungaHikoi@lemmy.nz

The Little Things: The Missing Performance in std::vector (codingnest.com)

submitted 1 year ago by MarekKnapek@programming.dev to c/programming@programming.dev

11 comments fedilink hide all child comments

top 11 comments

sorted by: hot top controversial new old

[–] MarekKnapek@programming.dev 12 points 1 year ago

std::vector::reserve + std::vector::push_back in loop is sub-optimal, because push_back needs to check for re-allocation, but that never comes.

std::vector::resize + std::vector::operator[] in loop is also sub-optimal, because resize default-initializes all elements only to be overwritten soon anyway.

This article's author suggests push_back_unchecked.

I suggest std::vector::insert with pair of random access iterators with custom dereference operator that does the "transform element" or "generate element" functionality. The standard will have resize_and_overwrite hopefully soon.

Moar discussion:

https://codingnest.com/the-little-things-the-missing-performance-in-std-vector/

https://twitter.com/horenmar_ctu/status/1695823724673466532

https://twitter.com/horenmar_ctu/status/1695331079165489161

https://www.reddit.com/r/cpp/comments/162tohr/the_little_things_the_missing_performance_in/

https://www.reddit.com/r/cpp/comments/162tohr/the_little_things_the_missing_performance_in/jy21hgd/

https://twitter.com/basit_ayantunde/status/1644895468399337473

https://twitter.com/MarekKnapek/status/1645272474517422081

https://www.reddit.com/r/cpp/comments/cno9ep/improving_stdvector/

[–] robinm@programming.dev 6 points 1 year ago (2 children)

Couldn't this be solved by having push_back being an inline function (or at least the check on capacity being inlined and the rest of the non-trivial part being in a sub non-inline function)?

[–] kornel@programming.dev 6 points 1 year ago* (last edited 1 year ago) (1 children)

I don't know about C++, but in Rust the push is inline, and still doesn't always optimize checks away due to an annoying edge case: integer overflow. Reserving (old_len + new_len) could give you a smaller buffer than new_len. The optimizer sees it and is pedantic about it.

[–] robinm@programming.dev 1 points 1 year ago (1 children)

In C++ integer overflow is UB so this edge case cannot exist

[–] kornel@programming.dev 3 points 1 year ago (1 children)

Only signed overflow. size_t is unsigned.

[–] robinm@programming.dev 1 points 1 year ago* (last edited 1 year ago)

That's totally right but I thought you were talking about signed numbers since you said “integer overflow”. I forgot that len is usually unsigned in C++.

[–] Gladaed@feddit.de 0 points 1 year ago (1 children)

Ok, but why did you not use emplace?

[–] robinm@programming.dev 1 points 1 year ago

emplace controls the construction of the object added to the collection. It's also important but not related to the problem exposed by OP which is “how to remove the capacity check when we know statically that there is enough space”.

[–] CasualTee@beehaw.org 4 points 1 year ago (1 children)

The benchmark looks off. The msvc one may be the only one vaguely reliable. I suspect clang and GCC were able to optimize the synthetic benchmark to a little more than a loop doing additions. At 96ns for 1000 iteration, you are looking at 10G iterations per seconds. Which can only be achieved by a loop of two instructions executing at 2 inst/s on a 5GHz processor. And you will not get a 5x just for removing a highly predictable branch.

So yes, std::vector leaves performance on the table, but no more than 10~15% for trivial loops that are not that uncommon but are rarely a bottleneck.

Then you have to ask yourself, is it worth it to add yet another function that can crash your program if misused just for that 10% in a situation where they might not even matter. I mean, I know, it's c++, zero cost abstraction, yadi yada, but if you're looking for consistent performance you should have moved away from the STL already. As this post shows, your STL vendor already has a huge impact on the performance, and there are widely available options to optimize specific cases.

So I'd rather keep the STL fairly simple. Add one function to work well with generators/iterators that have a known size if you want, but adding unchecked versions of every insertion function of every STL container is not worth it IMO.

[–] potterman28wxcv@beehaw.org 1 points 1 year ago* (last edited 1 year ago) (1 children)

Then you have to ask yourself, is it worth it to add yet another function that can crash your program if misused just for that 10% in a situation where they might not even matter

C/C++ already exposes a ton of undefined behaviors: it is part of the language to give full control to the programmer. If you want a language that minimizes the number of undefined behaviors you can get into, C/C++ is not the right candidate anyway. Something like Ada or Rust is much more relevant for that.

So I would say yes, just as long as it is properly documented.

[–] CasualTee@beehaw.org 3 points 1 year ago

I disagree. The question is not really "should we give programmer more power at the cost of yet another UB" but more "should we grow the API and add another UB for the select few for whom it might matter". When you consider choices made on other parts of the STL, such as std::unordered_map, then you realize the STL is not about being the most performant things around, but rather a collection of reliable tools covering basic usage for 80% of the user base.

With that in mind, I am against adding yet another function, which has its pitfalls, for minimal benefits. Again, such a function would be made almost entirely obsolete by a safe function that works with iterators/generators of known sizes. So I see even less benefit in adding a function that will just become yet another liability down the line.