2
lorentz
1y

A beautiful but fundamentally flawed Rust one-liner I found in my own code during refactoring today:

iter.size_hint().1.map_or_else(Vec::new, Vec::with_capacity)

Comments
  • 0
    How about iterator -> next ...

    Or am I missing sth?
  • 1
    @IntrusionCM This preallocates space based on the iterator's upper bound if one exists, or defaults to empty. The problem is that whether the upper bound is available is fairly arbitrary and this code has vastly different performance characteristics depending on it. It's pretty much only useful if CPU throughput is very valuable but neither latency nor memory matters, because it opportunistically but unpredictably trades a potentially very large amount of memory for CPU.
  • 0
    @lorentz ah...

    Isn't this essentially against the whole idea of an iterable xD

    I don't know rust, but usually there's either a chunk or take call available to create an iterable with top of n elements.
  • 0
    @IntrusionCM The elements stay in the iterator. Rust iterators have a size_hint function which returns a lower and optional upper bound for the number of elements in the iterator. This is forwarded by iterator transformers and used by consumers for various optimizations.
  • 0
    @IntrusionCM Subsequent code uses the vector produced by this code as a stack while consuming the iterator, and the most likely use case is to push an element corresponding to each element of the iterator. There are other use cases though, and the iterator may be defined in an odd way (i.e. with filter) which could make the upper bound known but very large, wasting a lot of memory.
  • 0
    Iterators by definition do not need to know the size of the container they come from.

    This is a big code smell from rust itself IMO.

    Then again, from what seems to be your use case I don't understand why it can't be solved with a reduce...
  • 0
    @CoreFusionX The purpose of this line doesn't overlap with reduce. This just creates an empty vector. The overall strategy also uses a reduce.

    Iterators don't need to know their size, but they might, and when they do a lot of operations can be optimized, eg. by preallocating space.
  • 0
    Shouldn't that be fine though? It the upper bound is Some, I would assume that it would be somewhat reasonable, although maybe clamping it to a big enough size would probably not be a bad idea

    Something like

    Vec::with_capacity(iter.size_hint().1.unwrap_or(0) .min(128*1024*1024))
  • 0
    @12bitfloat Most of the time, yeah, but I'm worried about the case where eg. a filter_map discards the majority of elements.
  • 1
    @lorentz Ah, fair enough. Although in that case, unless you know more about the data, it would probably still be better to allocate the size hint and then do shrink_to_fit afterwards instead of having reallocate all the time
  • 1
    @12bitfloat I hadn't thought of shrinking the vector, I'll see if there's a good point to insert one.
  • 1
    Iter? I hardly knew her
  • 1
    @12bitfloat shrink won, for this recursive function and this one alone of its size in the whole app I can predict the number of allocations fairly well.
  • 1
    I kinda feel like this should be a bit more widespread to be useful, but also I can benchmark it anytime.
Add Comment