Details
About: AAAAAAAAAAAAAAAAAAA
Skills: Rust and other things
Location: here
Joined devRant on 12/8/2018
Man, going from Rust to other languages is making me go insane
Why does no other language have a high-quality, standard documentation tool!? I just want to know what classes and functions you have 😭
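(For reference, this is all Rust asks of you; `dot` is a made-up function just for illustration. `cargo doc --open` turns these comments into a browsable, searchable site, and examples in them even run as doctests.)

```rust
/// Returns the dot product of two 3D vectors.
///
/// Shows up in the generated docs with its full signature, module path and search support.
pub fn dot(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}
```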
Just had the realization that the reason why the internet is so toxic isn't really because of anonymity
It's because if you're a massive asshole to someone, that person can't punch you in the face
I mean this for real, and it's kinda counterintuitive, but the underlying threat of violence is what keeps society civil and polite
The license for assets on Epic's new asset store is insane:
For any Content licensed to you under a Standard License, you may not:
i. attempt to reverse engineer, decompile, translate, disassemble, or derive source code or data from the Content;
HOW IN GOD'S NAME AM I SUPPOSED TO USE ASSETS I BOUGHT WHEN I CAN'T EVEN DERIVE FUCKING DATA FROM THEM???? Like how in hell am I supposed to load textures and meshes when the only thing I'm allowed to do is maybe look at them
They DO know graphics cards can do sweet fuck all with a JPG texture, right? Like I /genuinely/ have to translate that thing into a proper format for it to have any use
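(For what it's worth, the "translation" in question is literally just a decode step. A hedged sketch using the `image` crate, which is my choice here, not anything the store mandates; path and function name are illustrative:)

```rust
/// Decode a JPG into raw RGBA8 pixels, i.e. the "derived data" a GPU upload actually needs.
fn load_texture_rgba8(path: &str) -> Result<(Vec<u8>, u32, u32), image::ImageError> {
    let img = image::open(path)?.to_rgba8(); // decode + convert to a GPU-friendly layout
    let (width, height) = img.dimensions();
    Ok((img.into_raw(), width, height))
}
```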
PSA: All Quixel assets are completely free on the new Fab store until the end of the year!
If you need any high-quality models, make sure to check it out
(why does this read like an ad lol)
Okay so my last idea has one big problem: I need to project vertices into a single space which encompasses an entire hemisphere. AND straight lines need to remain straight when projected.
That's not something a typical projection matrix can do. Damn. I'm thinking maybe something like octahedral projection? [1]
But I'm not sure there's an answer. Else I would have to chop up the hemisphere into parts and try rastering each tri for each view. Ugh, that sucks
[1] https://researchgate.net/figure/...
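(For context, the standard octahedral encoding I mean looks roughly like this. Sketch only, and note the catch: lines only stay straight within a single octant face, they kink at the fold, which is exactly my problem:)

```rust
/// Map a normalized direction onto the unit square via octahedral projection (sketch).
fn octahedral_encode(d: [f32; 3]) -> [f32; 2] {
    // Project onto the octahedron |x| + |y| + |z| = 1.
    let inv_l1 = 1.0 / (d[0].abs() + d[1].abs() + d[2].abs());
    let (x, y) = (d[0] * inv_l1, d[1] * inv_l1);
    if d[2] >= 0.0 {
        [x, y]
    } else {
        // Fold the lower hemisphere over the outer triangles of the square.
        [
            (1.0 - y.abs()) * x.signum(),
            (1.0 - x.abs()) * y.signum(),
        ]
    }
}
```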
New idea: Fuck raytracing for global illumination because you just need too many rays for it to converge
What if we do surfels (to keep the number of probes down and relevant to our scene) and update the 4x4-ish hemisphere irradiance maps not by tracing a single ray per frame per surfel, but by rasterizing? I have a fast-as-shit compute shader rasterizer... what if I just raster each surfel each frame? Should be around the same number of pixels as primary visibility, so totally feasible...
Each frame just jitter the projection a bit and voilà. Should give extremely high-quality diffuse global illumination at well below 1 ms. Holy shit, this might just work
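(Quick back-of-the-envelope with completely made-up surfel counts, just to sanity-check the pixel budget:)

```rust
fn main() {
    // Assumed numbers, purely for illustration.
    let surfels = 100_000u64;           // active surfels in the scene
    let probe_res = 4 * 4;              // 4x4 hemisphere irradiance map per surfel
    let rastered = surfels * probe_res; // pixels shaded for GI per frame

    let primary = 1920u64 * 1080;       // primary visibility at 1080p
    println!(
        "GI raster: {} px vs primary: {} px ({:.2}x)",
        rastered,
        primary,
        rastered as f64 / primary as f64
    );
    // ~1.6M vs ~2.07M pixels, i.e. the same order of magnitude as primary visibility.
}
```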
It just hit me. If you wanna achieve something great you have to dream big. If you aim for something humble, you'll just get lost in the endless land of diminishing returns where you take forever to make little progress
If you aim high, you are naturally inclined to do big strides since you feel that there is still so much left to be done
Just a thought I had while working on my engine
Tangent space normal maps are going to drive me insane, I swear to god
Why can't you just work??? 😭😭😭
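(For my own future sanity, the whole thing boils down to this. Sketch using glam for brevity, which is just my pick here; the usual suspects when it "doesn't work" are the missing *2-1 unpack, a flipped green channel, or the handedness sign on mirrored UVs:)

```rust
use glam::Vec3;

/// Unpack a tangent-space normal-map sample and rotate it into world space (sketch).
/// `tangent_w` is the handedness sign (+1/-1) stored with the vertex tangent (glTF convention).
fn apply_normal_map(sample_rgb: Vec3, normal: Vec3, tangent: Vec3, tangent_w: f32) -> Vec3 {
    // Unpack from the [0,1] texture range to [-1,1].
    let n_ts = sample_rgb * 2.0 - Vec3::ONE;
    // Rebuild the bitangent; the sign is where mirrored UVs usually bite you.
    let bitangent = normal.cross(tangent) * tangent_w;
    // n_world = TBN * n_ts
    (tangent * n_ts.x + bitangent * n_ts.y + normal * n_ts.z).normalize()
}
```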
There's been talk that UE5's Nanite isn't actually all that efficient (sometimes slower than the alternative) and that kind of got me thinking.
You give developers very high-end machines so that they can move quickly. But that doesn't always translate to lower-end machines. When benchmarking, how would you even target lower-end machines in a simple way? Like for me, I have two GPUs in my system, but one is passed through to a Windows VM. I'd love to test on that GPU but it's just not feasible
All the great test results I (and others) have been seeing might just be a result of the newest cards being insanely fast when it comes to cache. Is visibility rendering really faster on a card a few generations old? I don't know! Nvidia MASSIVELY beefed up the L2 cache on the 4000 series. Does that play a role? Maybe even a big one...
My dumbass thinking it would be easy to get a string value of an exported symbol in a .so and now I'm manually parsing and applying symbol relocations
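(The "easy" version I was hoping for only exists if you can actually dlopen the thing and let the dynamic linker apply the relocations for you. Sketch with the libloading crate; the symbol name is made up and it assumes the string is exported as a plain char array:)

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Read a NUL-terminated string exported by a shared library, e.g. `const char MY_EXPORTED_STRING[]`.
fn read_exported_string(path: &str) -> Result<String, Box<dyn std::error::Error>> {
    unsafe {
        let lib = libloading::Library::new(path)?;
        // dlsym gives us the address of the array, i.e. the first character.
        let sym = lib.get::<*const c_char>(b"MY_EXPORTED_STRING\0")?;
        Ok(CStr::from_ptr(*sym).to_string_lossy().into_owned())
    }
}
```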
Diffie-Hellman is actual magic. You can exchange keys over an unencrypted channel and end up with a shared secret key that an eavesdropper can't derive, on which you can start a secure channel
Like how??
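(For anyone who wants to see the magic with their own eyes, here's a toy version with comically small numbers; real DH uses ~2048-bit groups or elliptic curves, and all the names and values here are just illustrative:)

```rust
/// Square-and-multiply modular exponentiation.
fn mod_pow(mut base: u64, mut exp: u64, modulus: u64) -> u64 {
    let mut result = 1;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * base % modulus;
        }
        base = base * base % modulus;
        exp >>= 1;
    }
    result
}

fn main() {
    let (p, g) = (23u64, 5u64); // public: prime modulus and generator
    let (a, b) = (6u64, 15u64); // private keys, never transmitted
    let (big_a, big_b) = (mod_pow(g, a, p), mod_pow(g, b, p)); // exchanged in the clear
    let shared_alice = mod_pow(big_b, a, p);
    let shared_bob = mod_pow(big_a, b, p);
    assert_eq!(shared_alice, shared_bob); // both sides derive g^(ab) mod p
    println!("shared secret: {}", shared_alice);
}
```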
PSA: The smaller the compute shader workgroups, the more efficient they are, down to the wave size (32 on nvidia). Not exactly sure why, but it looks like if you don't need group shared memory, you should always have your workgroups be wave-sized
Just this alone gave me a 30%+ performance increase, and combined with a few other changes it got me from 50 µs to 10 µs, yay!
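(Host-side this just means hard-coding the workgroup size and doing the dispatch math; trivial sketch, numbers made up:)

```rust
/// Number of workgroups needed to cover `n` items with wave-sized (32-wide) workgroups.
fn dispatch_size(n: u32) -> u32 {
    const WORKGROUP_SIZE: u32 = 32; // warp size on nvidia; GCN-era AMD runs 64-wide waves
    (n + WORKGROUP_SIZE - 1) / WORKGROUP_SIZE
}

fn main() {
    // e.g. one thread per element of a 1,000,000-element buffer
    assert_eq!(dispatch_size(1_000_000), 31_250);
}
```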
Name one thing more fun than atomically writing values into a GPU buffer and having them mysteriously vanish into the aether immediately after the compute shader invocation
I can literally see them in the buffer using RenderDoc, and then as soon as I go to the next command the buffer is completely filled with zeros again, as if the values never existed
?? like how ??
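(If anyone else hits this: the classic culprit for exactly this symptom is a missing memory barrier between the dispatch and whatever reads the buffer next. Rough sketch assuming Vulkan through the ash crate, which is an assumption on my part, and not necessarily what's actually wrong here:)

```rust
use ash::vk;

/// Make compute-shader writes to `buffer` visible to subsequent shader reads / transfers.
/// Sketch only; stages and access masks depend on what actually consumes the data.
fn barrier_after_compute(device: &ash::Device, cmd: vk::CommandBuffer, buffer: vk::Buffer) {
    let barrier = vk::BufferMemoryBarrier {
        src_access_mask: vk::AccessFlags::SHADER_WRITE,
        dst_access_mask: vk::AccessFlags::SHADER_READ | vk::AccessFlags::TRANSFER_READ,
        src_queue_family_index: vk::QUEUE_FAMILY_IGNORED,
        dst_queue_family_index: vk::QUEUE_FAMILY_IGNORED,
        buffer,
        offset: 0,
        size: vk::WHOLE_SIZE,
        ..Default::default()
    };
    unsafe {
        device.cmd_pipeline_barrier(
            cmd,
            vk::PipelineStageFlags::COMPUTE_SHADER,
            vk::PipelineStageFlags::COMPUTE_SHADER | vk::PipelineStageFlags::TRANSFER,
            vk::DependencyFlags::empty(),
            &[],
            &[barrier],
            &[],
        );
    }
}
```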
Visibility rendering using traditional vertex/fragment shaders does 39 million tris in about 3.6 ms
With my newest renderer I can push 314 million triangles in about 6 ms right now
And this is just visibility; factoring in the material evaluation of traditional deferred, it would be at least like 10x worse. Meanwhile, everything expensive about materials is completely independent of geometric complexity in my renderer
Literally me rn: https://youtube.com/watch/...
(can't include image because devRant doesn't want to)
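(Doing the math on those numbers, with the 10x material factor being the rough guess from above:)

```rust
fn main() {
    let old = 39.0e6 / 3.6; // tris per ms, traditional vertex/fragment visibility
    let new = 314.0e6 / 6.0; // tris per ms, compute-shader visibility renderer
    println!(
        "{:.1}M vs {:.1}M tris/ms -> {:.1}x faster",
        old / 1e6,
        new / 1e6,
        new / old
    );
    // ~10.8M vs ~52.3M tris/ms, i.e. ~4.8x on raw visibility alone,
    // before the ~10x material-evaluation gap even enters the picture.
}
```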
Guess who has to finally swallow his pride and implement traditional deferred rendering with a traditional gbuffer even though he swore to never do that
This guy right here
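(For reference, "traditional gbuffer" means something in this ballpark; the formats here are one common choice I'm using for illustration, not necessarily what I'll end up with:)

```rust
use ash::vk;

/// One plausible deferred G-buffer layout (illustrative; attachment choices vary per engine).
const GBUFFER_ATTACHMENTS: &[(&str, vk::Format)] = &[
    ("albedo.rgb + metallic", vk::Format::R8G8B8A8_UNORM),
    ("world-space normal (octahedral encoded)", vk::Format::R16G16_UNORM),
    ("roughness + ao + material id", vk::Format::R8G8B8A8_UNORM),
    ("emissive", vk::Format::B10G11R11_UFLOAT_PACK32),
    ("depth", vk::Format::D32_SFLOAT),
];
```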
Nothing says "punk" and "against the system" more than having the same political worldview as the WEF and BlackRock
The artificial character limit in Google notes is so damn annoying
They put actual effort into an anti-feature with a "convert into Google Doc" button. Like, they clearly understand one might want to type more than 50k characters, they just don't allow you to do that in notes for no reason besides fuck you
Hey Google, if I wanted to use Google Docs, I would have used Google Docs. Now I just have to split my notes in two because clearly I'm doing it wrong and Google knows better >.>
Since when is Blender utterly unusable for meshes > 500k tris? I have 32 gigs of RAM and it's literally unusable. You try to do anything and it fills up your entire RAM and dies. No matter what you do
Like, fucking really? I can't add a subdivision surface modifier to a mesh with 800k tris? Is that too much to ask for?
I'm so fucking pissed off right now. I've already wasted an hour trying to export ANY high-res model and zero luck so far. Either Blender just crashes. Or the exported model doesn't contain any geometry. Or the exported model doesn't contain tangents (even though I explicitly enabled them). Or I try to enter edit mode and it crashes. And then every damn time I have to renavigate to the Blender folder (because of course you can't just start Blender normally, no no, that doesn't work) and when Blender crashes it nukes my terminal as well. And then I have to reload the stupid model. And then I have to do what I'm trying to do, hoping it doesn't crash. And then it crashes anyway
Just when I thought that surely a single SIMD dot product instruction must be the fastest way to calculate a damn dot product on a processor, Agner Fog's instruction tables come flying out, hitting me over the head and telling me that manually calculating a dot product may actually be faster sometimes
Why must computers be like this?
I just came out of a bad relationship with hardware rasterization being horribly slow and now I can't even trust my processor to do things properly
This is how people develop trust issues
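(For the curious, the two contenders look roughly like this on x86_64. dpps needs SSE4.1 and, per those same tables, decodes into several µops on most cores, which is why the manual shuffle/add version can win; everything below is a sketch, not a benchmark:)

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// The "one instruction" version: a single dpps.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
unsafe fn dot4_dpps(a: __m128, b: __m128) -> f32 {
    _mm_cvtss_f32(_mm_dp_ps::<0xF1>(a, b))
}

/// The "manual" version: mul + two shuffles + two adds, which can schedule better.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse3")]
unsafe fn dot4_manual(a: __m128, b: __m128) -> f32 {
    let m = _mm_mul_ps(a, b);           // (a0b0, a1b1, a2b2, a3b3)
    let shuf = _mm_movehdup_ps(m);      // (a1b1, a1b1, a3b3, a3b3)
    let sums = _mm_add_ps(m, shuf);     // (a0b0+a1b1, _, a2b2+a3b3, _)
    let hi = _mm_movehl_ps(sums, sums); // move the upper pair down
    _mm_cvtss_f32(_mm_add_ss(sums, hi)) // (a0b0+a1b1) + (a2b2+a3b3)
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    unsafe {
        if is_x86_feature_detected!("sse4.1") {
            let a = _mm_set_ps(4.0, 3.0, 2.0, 1.0);
            let b = _mm_set_ps(8.0, 7.0, 6.0, 5.0);
            assert_eq!(dot4_dpps(a, b), dot4_manual(a, b)); // both 70.0
        }
    }
}
```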
Is writing hand-optimized SIMD code even still worth it? Thinking about writing my own little math library for my game engine, but I've tried writing a hand-optimized `dot(normalize(b - a), foo) >= bar` and somehow it's actually slower than writing the same thing using a math lib which is implemented exclusively with scalar math and auto-vectorized by LLVM
LLVM... I kneel
So I got my compute shader rasterizer working pretty well now which is great. I now also have a fallback to hardware rasterization for triangles which are a bit sussy (mostly just too large) and getting that implemented without tanking performance (gazillion threads hitting the same atomic variable at the same time) involved some tricky workgroup/subgroup hackery but I'm happy with it
Only problem... I have like 90%+ SM occupancy (which is great) but I also have 90%+ SM occupancy which means the nvidia drivers think I'm mining cryptocurrency and start bottlenecking my compute performance at random. It slowly goes up to 3x, then it slowly goes down again, then it slowly goes up again... argh
Thanks, miners 😐
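(The contention trick, shown CPU-side because it's easier to read than the subgroup version: aggregate within the group first, then have one representative touch the shared atomic exactly once. The numbers and the filter predicate are made up; in the shader this is the subgroup-reduction equivalent:)

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

fn main() {
    // Hypothetical workload: count "surviving" items across many threads.
    static GLOBAL_COUNT: AtomicU64 = AtomicU64::new(0);
    let items: Vec<u32> = (0..1_000_000).collect();

    thread::scope(|s| {
        for chunk in items.chunks(125_000) {
            s.spawn(move || {
                // Aggregate locally first (the "workgroup/subgroup" part)...
                let local = chunk.iter().filter(|&&x| x % 3 == 0).count() as u64;
                // ...then hit the shared atomic once per group instead of once per item.
                GLOBAL_COUNT.fetch_add(local, Ordering::Relaxed);
            });
        }
    });

    println!("kept {} items", GLOBAL_COUNT.load(Ordering::Relaxed));
}
```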
Sometimes I just don't know what to say anymore
I'm working on my engine and I really wanna push high triangle counts. I'm doing a pretty cool technique called visibility rendering and it's great because it kind of balances out some known causes of bad performance on GPUs (namely that pixels are always rasterized in quads, which is especially bad for small triangles)
So then I come across this post https://tellusim.com/compute-raster... which shows some fantastic results and just for the fun of it I implement it. Like not optimized or anything just a quick and dirty toy demo to see what sort of performance I can get
... I just don't know what to say. Using actual hardware-accelerated rasterization, which GPUs are literally designed to be good at, I render about 37 million triangles in 3.6 ms. Eh, fine but not great. Then I implement this guy's unoptimized(!) software rasterizer and I render the same scene in 0.5 ms?!
IT'S LITERALLY A COMPUTE SHADER. I rasterize the triangles manually IN SOFTWARE and write them out with 64-bit atomic image stores. HOW IS THIS FASTER THAN ACTUAL HARDWARE!???
AND BY LIKE AN ORDER OF MAGNITUDE AT THAT???
Like I even tried doing some optimizations like backface cone culling on the meshlets, but doing that makes it slower. HOW. I'm rendering 37 million triangles without ANY fancy tricks. No hi-z depth culling, which a GPU would normally do. No backface culling, which a GPU would normally do. Not even damn clipping of triangles. I render ALL of them ALL the time. At 0.5 ms
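(The core trick behind those 64-bit atomic stores, for reference: pack depth into the high bits and the triangle/meshlet id into the low bits, then let a 64-bit atomic min do the depth test for free. This is a CPU-side mirror of the idea with made-up values; the real thing is an atomic image store in the shader:)

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Pack (depth, payload) so that comparing the u64 compares depth first.
/// Uses the raw float bits, which order correctly for non-negative depths.
fn pack(depth: f32, payload: u32) -> u64 {
    ((depth.to_bits() as u64) << 32) | payload as u64
}

/// One "pixel" of the visibility buffer: closest depth wins, its payload rides along.
fn write_pixel(pixel: &AtomicU64, depth: f32, payload: u32) {
    pixel.fetch_min(pack(depth, payload), Ordering::Relaxed);
}

fn main() {
    let pixel = AtomicU64::new(u64::MAX); // "cleared" to farthest depth
    write_pixel(&pixel, 0.75, 42);
    write_pixel(&pixel, 0.25, 7);  // closer triangle overwrites
    write_pixel(&pixel, 0.90, 99); // farther one loses the atomic min
    let v = pixel.load(Ordering::Relaxed);
    println!("winning payload: {}, depth bits: {:#x}", v as u32, v >> 32);
}
```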
Fuck you AMD for being too lazy to implement VK_EXT_fragment_shader_interlock even though your hardware supports it [1]
It's literally *the* best way to implement any sort of order-independent transparency ( https://web.archive.org/web/... )
But noo, not enough people are using it, so too bad. Now you just have to render transparent objects all fucked up and bad-looking on AMD hardware because "we don't feel like it"
[1] https://github.com/GPUOpen-Drivers/...
As an anti-aliasing nerd it's sad to say, but there really is only one* good choice left: just use DLSS
(* and if you aren't on nvidia then sucks to be you, FSR for you I guess)