RememberMe · 29d
You considering a job in high performance computing by any chance? Because that "peak performance" bit, taken literally, is exactly what HPC is about.
And it's a ton of fun
RememberMe · 28d
@OmerFlame fair enough, have fun :p

If you ever feel a compulsion to, as my advisor likes to say, suffer for performance, do come to uni for computer engineering. These bois get pretty extreme when talking about performance.
Just to put it into perspective with other kinds of dev, this is a field in which DRAM access is considered slow as molasses and to be strictly avoided unless necessary, and when needed only done using a DMA to prefetch large chunks in a carefully controlled access pattern so that it maximises row buffer hits and bank parallelism. SSDs give these guys nightmares and network latency is the first sign of madness.
RememberMe · 28d
@OmerFlame yup, caches are where it's at. RAM can take hundreds of CPU cycles to access; caches take one to a few tens. You really, really twist your code around to make sure you're using caches properly, because they're something like 50% or more of your CPU die area and the speedup is enormous.

On fancier devices like FPGAs, that's replaced by on-chip block RAMs that you manage manually (think manual caches): you have full control, but that also means you need to do it properly. On GPUs, you need to keep your computation in local/shared memory that threads can access directly, and only touch VRAM in long, coalesced loads that bring in a ton of data at once. On special accelerators it varies with the architecture of the device.
Then there's exploiting superscalar, out-of-order execution in modern CPUs - they have many functional units with deep-ish pipelines and WAY more physical registers than you think they do, or than the ISA (x86, armv8, risc-v etc. standards) says they do. Keeping that execution engine fed at more than ~2 instructions per clock (the average parallelism in everyday code) is a whole challenge in itself, especially when your arithmetic intensity (roughly, number of arithmetic ops done per memory op) is low, because memory is high latency, high throughput.
If you're interested, check out something simple like tiled matrix multiplication (importantly, *why* it's much faster)