for loops

shinobiultra

Comments

BigBoo

What language are you talking about?

j4cobgarby

Maybe threading, or GPU utilization?

BigBoo

@j4cobgarby GPU is slower for arrays than CPU though.

shinobiultra

@BigBoo Not a particular one. I've just seen it as a general concept used in everything from C++ to Python.

@NickyBones That's what I think too. The underlying code, written almost at the assembly level, gives it so much power.

So pound for pound, are these libraries really only meant to simplify the code? Andrew Ng said in one of his videos that using for loops is actually SLOWER, so I'm just confused now.

BigBoo

@shinobiultra Check the implementation you use. Generally it's the same speed. Writing something in assembly does not automatically make it faster than C++; C++ compilers usually optimize very well.
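
One way to "check the implementation you use" is simply to measure it. The sketch below is an illustrative Python example (not from the thread): it times a hand-written for loop against the built-in sum(), which CPython implements in C. The function names are made up for this example, and the speed gap you see depends on your interpreter and machine. In an interpreted language, per-iteration interpreter overhead is a big part of why library calls tend to beat explicit loops; in compiled C++ that overhead largely disappears.

```python
import timeit


def loop_sum(values):
    """Add the values one at a time in an interpreted Python for loop."""
    total = 0
    for v in values:
        total += v
    return total


def builtin_sum(values):
    """Hand the same loop to the built-in sum(), which CPython runs in C."""
    return sum(values)


data = list(range(100_000))

# Both versions compute the same result; only where the loop runs differs.
assert loop_sum(data) == builtin_sum(data)

if __name__ == "__main__":
    for fn in (loop_sum, builtin_sum):
        # Best of 5 repeats of 20 calls each; absolute numbers are machine-specific.
        seconds = min(timeit.repeat(lambda: fn(data), number=20, repeat=5))
        print(f"{fn.__name__}: {seconds:.4f} s for 20 calls")
```

Running it prints one timing per function; treat the numbers as a measurement of your setup, not a general rule.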

But using other people's implementations is nice for one reason, which becomes apparent if you watch the CppCon talk about Facebook's strings.

Other people's implementations can cover more use cases than you usually would on your own, unless you have a really well-optimized structure.

For example, keeping things on the stack is good for speed, but the stack is small.
So one approach is to store short strings directly inside the string object (on the stack, when the object itself lives there) and allocate longer strings on the heap.

This makes small strings faster, but it does not make all strings faster.
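
The small-string idea described above can be modeled roughly as follows. This is a toy Python sketch, not how any real library (Facebook's or otherwise) is written: Python gives you no control over stack versus heap, so it only models the branching logic. The class name SmallString and the threshold INLINE_CAPACITY = 15 are hypothetical choices for illustration.

```python
class SmallString:
    """Toy model of the small-string optimization (SSO)."""

    # Hypothetical threshold: how many bytes fit inside the object itself.
    INLINE_CAPACITY = 15

    def __init__(self, text: str) -> None:
        data = text.encode("utf-8")
        self._length = len(data)
        if self._length <= self.INLINE_CAPACITY:
            # Short string: copy into a fixed-size buffer owned by the object,
            # standing in for storage embedded in the object (no extra allocation).
            self._inline = bytearray(self.INLINE_CAPACITY)
            self._inline[: self._length] = data
            self._heap = None
        else:
            # Long string: keep the bytes in a separate buffer,
            # standing in for a heap allocation.
            self._inline = None
            self._heap = bytes(data)

    @property
    def is_inline(self) -> bool:
        """True when the text is stored in the embedded buffer."""
        return self._heap is None

    def __str__(self) -> str:
        if self.is_inline:
            return bytes(self._inline[: self._length]).decode("utf-8")
        return self._heap.decode("utf-8")
```

For example, SmallString("hello").is_inline is True, while a 40-character string takes the separate-buffer path. That is the trade-off from the comment: short strings skip the extra allocation, long ones still pay for it.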

There is no easy answer. It all comes down to the specific implementation in the specific library. Some cases might perform better, but overall it's the same.

psukys

SIMD optimizations

shinobiultra

@NickyBones Yeah, that's what I meant: matrix operations are actually faster than for loops even though the result is the same.

aritzh

Modern CPUs can operate on registers wider than 64 bits. Let's say your CPU has 256-bit operations. If you wanted to add two vectors of four 64-bit integers each, doing it manually would take four add instructions. Vectorization libraries, however, abstract away the low-level, non-portable instructions your CPU uses to operate on 256 bits at a time, which lets you add the two vectors in a single instruction. In a perfect world you'd get a 4x speedup in this particular case, although it is rarely that high.
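
Python can't issue real 256-bit SIMD instructions, so the sketch below only simulates the idea with a "SIMD within a register" (SWAR) trick. It is an illustration, not what a vectorization library does internally. Four 64-bit lanes are packed into one 256-bit integer and added with one wide addition, with masking so that carries never spill from one lane into the next. The names pack, unpack, add_lanes_scalar and add_lanes_packed are made up for this example, and Python's arbitrary-precision add is itself not a single machine instruction.

```python
LANES = 4
LANE_BITS = 64
LANE_MASK = (1 << LANE_BITS) - 1

# Top bit of every 64-bit lane, and every other bit of the 256-bit word.
HIGH_BITS = sum(1 << (LANE_BITS * i + LANE_BITS - 1) for i in range(LANES))
ALL_BITS = (1 << (LANES * LANE_BITS)) - 1
LOW_BITS = ALL_BITS ^ HIGH_BITS


def pack(values):
    """Pack four unsigned 64-bit integers into one 256-bit integer (lane 0 lowest)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (LANE_BITS * i)
    return word


def unpack(word):
    """Split a 256-bit integer back into its four 64-bit lanes."""
    return [(word >> (LANE_BITS * i)) & LANE_MASK for i in range(LANES)]


def add_lanes_scalar(a, b):
    """The 'manual' way: one add per lane, wrapping modulo 2**64."""
    return [(x + y) & LANE_MASK for x, y in zip(a, b)]


def add_lanes_packed(wa, wb):
    """Add all four lanes with one wide add, keeping carries inside each lane.

    Summing only the low 63 bits of each lane can never carry into the next
    lane; each lane's top bit is then fixed up with XOR (a ^ b ^ carry_in).
    """
    low_sum = (wa & LOW_BITS) + (wb & LOW_BITS)
    return low_sum ^ ((wa ^ wb) & HIGH_BITS)
```

For example, unpack(add_lanes_packed(pack([1, 2, 3, 4]), pack([10, 20, 30, 40]))) gives [11, 22, 33, 44], the same result as four separate adds.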

Tags: question, vectorization, faster?