
Sometimes I just don't know what to say anymore

I'm working on my engine and I really wanna push high triangle counts. I'm using a pretty cool technique called visibility rendering, and it's great because it mitigates some known causes of bad GPU performance (namely that pixels are always rasterized in 2x2 quads, which is especially wasteful for small triangles)
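
The quad thing, roughly (just a back-of-the-envelope sketch; the function and numbers are mine, not from the post):

    // Hardware rasterizers shade pixels in 2x2 quads, so every touched quad
    // launches 4 fragment invocations even if only 1 pixel is actually covered.
    // Rough overshading estimate, purely illustrative:
    float overshade_factor(int covered_pixels, int quads_touched)
    {
        // 4 invocations per quad, divided by the pixels you actually wanted
        return (4.0f * quads_touched) / (float)covered_pixels;
    }
    // e.g. a sliver triangle covering 2 pixels in 2 different quads:
    // overshade_factor(2, 2) == 4.0, i.e. ~4x the expected fragment work.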

So then I come across this post https://tellusim.com/compute-raster... which shows some fantastic results, and just for the fun of it I implement it. Nothing optimized, just a quick and dirty toy demo to see what sort of performance I can get

... I just don't know what to say. Using actual hardware-accelerated rasterization, which GPUs are literally designed to be good at, I render about 37 million triangles in 3.6 ms. Eh, fine but not great. Then I implement this guy's unoptimized(!) software rasterizer and I render the same scene in 0.5 ms?!

IT'S LITERALLY A COMPUTE SHADER. I rasterize the triangles manually IN SOFTWARE and write them out with 64-bit atomic image stores. HOW IS THIS FASTER THAN ACTUAL HARDWARE!???
AND BY LIKE AN ORDER OF MAGNITUDE AT THAT???
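
For reference, the core write is dead simple. Here's a CUDA-flavored sketch of the general 64-bit atomic trick (not my actual shader; the buffer name and packing are just illustrative):

    #include <cstdint>

    // One 64-bit atomic does the depth test and the write in a single step:
    // depth goes in the high 32 bits, the payload (e.g. a packed triangle /
    // visibility id) in the low 32 bits, and atomicMax keeps whichever value
    // has the winning depth.
    __device__ void write_pixel(unsigned long long* vis_buffer, int width,
                                int x, int y, float depth, uint32_t payload)
    {
        // For non-negative depths the raw float bit pattern compares the same
        // way as the float itself, so it works directly as the sort key.
        uint32_t depth_bits = __float_as_uint(depth);

        unsigned long long packed =
            ((unsigned long long)depth_bits << 32) | payload;

        // With reverse-Z (bigger = closer) atomicMax keeps the closest surface;
        // flip the key or use atomicMin for the opposite convention.
        atomicMax(&vis_buffer[y * width + x], packed);
    }

That single atomic stands in for the depth test, depth write and output write that the fixed-function blend/ROP hardware would normally do, which is basically the whole trick.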

Like I even tried doing some optimizations like backface cone culling on the meshlets, but doing that makes it slower. HOW. I'm rendering 37 million triangles without ANY fancy tricks. No hi-z depth culling, which a GPU would normally do. No backface culling, which a GPU would normally do. Not even damn clipping of triangles. I render ALL of them ALL the time. At 0.5 ms
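
(For the curious, the cone culling I tried is the usual per-meshlet check, roughly like this sketch; the struct layout and cutoff convention are just illustrative, meshoptimizer-style:)

    // Per-meshlet backface cone: the meshlet stores an average normal (axis),
    // a representative apex point, and a cutoff precomputed offline such that
    // passing the test means every triangle in the cluster faces away.
    struct MeshletCone {
        float3 apex;
        float3 axis;
        float  cutoff;
    };

    __device__ bool cone_backfaced(MeshletCone c, float3 cam_pos)
    {
        // Direction from the camera toward the cluster
        float dx = c.apex.x - cam_pos.x;
        float dy = c.apex.y - cam_pos.y;
        float dz = c.apex.z - cam_pos.z;
        float len = sqrtf(dx * dx + dy * dy + dz * dz);

        // If the view direction lies deep enough inside the normal cone,
        // no triangle in the meshlet can be front-facing, so skip it entirely
        float cos_view = (dx * c.axis.x + dy * c.axis.y + dz * c.axis.z) / len;
        return cos_view >= c.cutoff;
    }

In theory that should reject a good chunk of the meshlets up front, but in my toy demo the extra per-meshlet work apparently costs more than the triangles it skips.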

Comments
  • 5
    Oh and the worst thing is, even if I remove the call to the rasterize function, at which point the shader has no externally visible side effects and thus should be optimized to basically a no-op, it still takes 0.25 ms... of 0.6 ms total... without doing literally anything

    Does Jensen Huang have a deal with the devil or something, sacrificing babies in order for these graphics cards to be able to go backwards in time and compute things before they even exist??
  • 1
    If this is generalized then games are going to get really interesting really quick. Is this method patented? Are we going to be soft locked for 20 years?
  • 1
    As far as what to say: if you can use it, say thank you to them.
  • 6
    iirc GPUs now have more overhead cost to do simple functionality because they're optimized to do complicated functionality

    so basically if you do something simple it still goes through the pipeline built for complex things, and both end up costing about the same

    this way they didn't have to put different pipelines in the GPU and could stuff more raw power into it for advanced games without wasting die real-estate on basic old video game functionality... basically power over adaptability
  • 2
    @Demolishun No, it's basically what UE5's Nanite is doing. It's pretty crazy though, because "compute shader all the things" has been a meme for quite a while now, but holy shit, I didn't know that "don't use the literal built-in hardware at all" was also a thing
  • 4
    @jestdotty I feel like there's an xkcd about this... the more features you have, the slower it gets. And then some newcomer comes along and is insanely fast... until they have feature parity, at which point they are just as slow lol
  • 2
    @12bitfloat this is really exciting then! I hope this makes its way into Godot.

    This also reminds me of a dude that was doing some sort of voxel projection stuff years ago. I cannot pretend I understood what he was doing. But it was able to do high res stuff on shitty hardware in 3D. Not sure how fast it was. It just sorta disappeared. I figured the guy prob got hired by a hardware company.