			Comments
		
- 
				
				@BigBoo Not a particular one. I just saw this as a general concept used everywhere from C++ to Python.
 
 @NickyBones This is what I think. The underlying code, written almost in assembly, gives it so much power.
 
 So pound for pound, are these libraries really only meant to simplify the code? Andrew Ng said in one of his videos that using for loops is actually SLOWER, so I'm just confused now.
- 
				
				 BigBoo 2267 7y @shinobiultra Check the implementation you use. Generally it's the same speed. Writing something in assembly does not automatically make it faster than C++. C++ compilers are usually well optimized.
 
 But using others' implementations is nice for one reason, and it's apparent if you watch the CppCon talk about Facebook's strings.
 
 Others' implementations can cover more use cases than you would usually handle on your own, unless you have a really well-optimized structure.
 
 For example, it's beneficial for speed to have things on the stack. But the stack is small.
 So one approach is, for example, to keep strings below a certain size on the stack and allocate longer strings on the heap.
 
 This will make smaller strings faster but it does not mean that all strings would be faster.
 
 There is no easy answer. It all comes down to the specific implementation of the specific library. Some cases might perform better, but overall it's the same.
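
 To make the short-string idea above concrete, here is a minimal C++ sketch. `SmallString`, `kInlineCapacity` and the 15-character threshold are all made up for illustration; real string implementations are laid out differently and are far more careful.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical class for illustration only; not the layout of std::string
// or of any real library.
class SmallString {
public:
    // Assumed threshold: text up to this length is stored inside the object.
    static constexpr std::size_t kInlineCapacity = 15;

    explicit SmallString(const char* text) {
        assert(text != nullptr);
        size_ = std::strlen(text);
        if (size_ > kInlineCapacity) {
            // Long text: pay for one heap allocation.
            heap_ = std::make_unique<char[]>(size_ + 1);
        }
        // Copy the characters plus the terminating '\0' into whichever
        // buffer is active.
        std::memcpy(data(), text, size_ + 1);
    }

    bool is_inline() const { return heap_ == nullptr; }
    std::size_t size() const { return size_; }
    const char* c_str() const { return heap_ ? heap_.get() : inline_; }

private:
    char* data() { return heap_ ? heap_.get() : inline_; }

    std::size_t size_ = 0;
    // Lives inside the object, so it is on the stack when the object is a
    // local variable.
    char inline_[kInlineCapacity + 1] = {};
    std::unique_ptr<char[]> heap_;  // only set for long text
};
```

 A `SmallString` declared as a local variable keeps short text inside the object itself, so no heap allocation happens; only longer text pays for one. Copying is left out to keep the sketch short.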
- 
				
				@NickyBones Yeah that's what I meant, that matrix operations are actually faster than for loops even tho the result is the same.
- 
				
				 aritzh 733 7y Modern CPUs are able to operate on values wider than 64 bits. Let's say your CPU has 256-bit operations. If you wanted to add two vectors of four 64-bit integers each, doing it manually would take 4 add instructions. Vectorization libraries, however, abstract the low-level, non-portable instructions your CPU provides for operating on 256 bits, which lets you add the two vectors in a single instruction. In a perfect world you'd get a 4x speedup in this particular case, although it is rarely that high.
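
 As a rough sketch of what such a library might do internally, in C++: the function below uses AVX2 intrinsics to add four 64-bit integers per instruction when the compiler targets AVX2 (for example when built with `-mavx2`), and otherwise falls back to a plain loop. `add_vectors` is a hypothetical name, not any real library's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical helper: element-wise sum of two equally sized vectors.
std::vector<std::int64_t> add_vectors(const std::vector<std::int64_t>& a,
                                      const std::vector<std::int64_t>& b) {
    assert(a.size() == b.size());
    std::vector<std::int64_t> out(a.size());
    std::size_t i = 0;

#if defined(__AVX2__)
    // 256-bit path: each iteration adds four 64-bit integers at once.
    for (; i + 4 <= a.size(); i += 4) {
        __m256i va = _mm256_loadu_si256(
            reinterpret_cast<const __m256i*>(a.data() + i));
        __m256i vb = _mm256_loadu_si256(
            reinterpret_cast<const __m256i*>(b.data() + i));
        __m256i sum = _mm256_add_epi64(va, vb);
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(out.data() + i), sum);
    }
#endif

    // Scalar path: handles everything without AVX2, or the leftover
    // elements when the size is not a multiple of four.
    for (; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```

 The caller writes the same code either way; whether the 256-bit path is compiled in depends on the build flags. An optimizing compiler can often vectorize the plain loop on its own, which fits the point above that hand-written loops and library code frequently end up the same speed.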

Why is a vectorization library faster than hand-written for loops? I mean, somewhere down the line, the matrices/vectors must be multiplied (or have some other operation applied) and thus be calculated and stored one by one (in a for loop?).
Why is it then faster to use these libraries than to just write for loops manually all over the place?
I guess some low-level magic (OpenBLAS?) goes on there, but I just don't see it...
P.S. [I would have posted this on Stack Overflow, but I'd be ripped apart, so I'm pioneering new ways]
question
for loops
vectorization
faster?