
Currently working on:

Conversion of existing models to full closed-form fusion representations. Built a toy example with just the dense layers; got it up to 65M parameters with minimal accuracy loss and almost an 80% reduction in memory requirements.
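The post doesn't share the actual CFF math, so here's only the simplest flavor of "closed-form fusion" of dense layers I know of: two consecutive affine layers with no nonlinearity between them collapse exactly into one precomputed layer. A hypothetical sketch, not the author's implementation; the shapes and the ~78% figure are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two consecutive dense layers without an intervening nonlinearity:
# y = W2 @ (W1 @ x + b1) + b2
W1, b1 = rng.standard_normal((256, 512)), rng.standard_normal(256)
W2, b2 = rng.standard_normal((64, 256)), rng.standard_normal(64)

# Closed-form fusion: the composition of two affine maps is one affine map.
W_fused = W2 @ W1            # shape (64, 512)
b_fused = W2 @ b1 + b2       # shape (64,)

x = rng.standard_normal(512)
y_twostep = W2 @ (W1 @ x + b1) + b2
y_fused = W_fused @ x + b_fused
assert np.allclose(y_twostep, y_fused)

# Memory: the two layers store 256*512 + 64*256 weights; the fused
# layer stores only 64*512 of them.
orig = 256 * 512 + 64 * 256
fused = 64 * 512
print(f"reduction: {1 - fused / orig:.0%}")  # prints "reduction: 78%"
```

With these particular layer widths the fused matrix is ~78% smaller, in the same ballpark as the ~80% cited above, though real networks have nonlinearities between layers, which is presumably where the nontrivial part of the technique lives.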

Figured out how to also handle convolution layers in this new representation, and I'm figuring out skip connections this week so I can implement an example of the full closed-form variety. All the math checks out.
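Convolutions are linear too, so they admit the same kind of collapse: applying one kernel and then another is identical to applying a single pre-combined kernel. A generic sketch of that identity (not the author's method; real conv layers with strides, padding, channels, and nonlinearities are messier):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100)   # a 1-D signal
k1 = rng.standard_normal(5)    # first conv kernel
k2 = rng.standard_normal(3)    # second conv kernel

# Convolution is associative: conv(conv(x, k1), k2) == conv(x, conv(k1, k2)),
# so two passes over the data fuse into one pass with a combined kernel.
two_pass = np.convolve(np.convolve(x, k1), k2)
k_fused = np.convolve(k1, k2)  # length 5 + 3 - 1 = 7
one_pass = np.convolve(x, k_fused)
assert np.allclose(two_pass, one_pass)
```

This uses NumPy's default "full" mode, where associativity holds exactly; with "same"-style padding as used in most conv layers, the edges need care.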

Full CFF should let me cut memory requirements to anywhere from a tenth to a fiftieth of what's typical, with similar improvements in inference time and compute.

The partial implementation also still allows for training.

I'm working toward a demonstration of GPT-2 training and running on a consumer desktop, without quantization.

Whether it will work out, who knows ¯\_(ツ)_/¯

But if it does, we're greenlit for launching the research lab in late 2026. And when it launches, I'll have to steal a couple of you as 'entourage'/VIP hires, because devRanters are really the only group of people I like to hang out and do things with anymore.

Comments
  • 3
    Fuck, I need a job now. lol

    Great job bro!
  • 3
    @YourMom "Fuck, I need a job now. lol"

    You're speaking the truth!

I've had to do a bunch of shit side gigs to raise any money at all for this, and I'm still doing as many, because no one would fund it or give me the time of day.

    They're all gonna wish they had.
  • 1
    Honestly, if anyone is going to figure out AGI, it will be @Wisecrack.
  • 3
    That's amazing stuff!

By the way, to keep up with mentions of you on dR, please consider using the tool that @SoldierOfCode wrote. It's a notification system that works perfectly on Linux at least (he was working on Windows support). But I think you're probably too cool for Windows anyway.

Otherwise, consider this HTML file, updated every 5 minutes: https://static.molodetz.nl/dr.menti...

If you're on iPhone, consider using joyRant from @lensflare, which still has a working mentions system. The built-in notifications are just broken.

All these systems detect mentions in a different way than the broken devRant notification system does.

I'm telling you now because you're not a super frequent user anymore, so you can keep track :) Friendly advi(s/c?)e.
  • 2
@YourMom one can dream. I could explain it, or post the sketch proof of how exactly replicating the output of an amortized beam search, through reuse of a common metric, gives a much more generalist framework. But in the first case it just sounds like handwavium, and in the second, people's eyes would glaze over, so it's a lose-lose proposition.
  • 2
@retoor thank you, and it's good to see you're still around, retoor.

    I'm not super frequent any more because I'm pulling 16-20 hour days in 3-4 day stretches.

    No one told me being self-employed was a synonym for torture.

    Literally this is the shit on my plate:

- An FHE-based compute platform, with improvements to the underlying math and methods: solving both the clock-reset issue and the re-imaging issue that allow piracy on some software, and improving the noise-propagation issues

- Running unpirateable, commercially-sized, stateless SOTA CFF ML models on a fraction of existing hardware, with built-in limits on inference counts based on licensing

- Implementing on closed-platform custom ASICs for secure uses like finance and military

- Actual cryptography improvements after I changed my approach (we're running in the same complexity class as GNFS now): 1. secure all this against quantum improvements, 2. win RSA prize money for bootstrapping.

    What is this myth called sleep?
  • 1
    That's really cool :D

    Is it going fast enough that bandwidth is becoming a problem?
  • 1
Nice, but mind the AI bubble, it can explode (right inside your head).
  • 0
@BordedDev it is pretty cool, thanks, and I'm glad you enjoyed it.

    Bandwidth isn't a problem yet.

The bigger problem is that instead of training from scratch, as I scale up I have to transfer existing models, because I don't have the compute to train larger ones at this time.

Which means doing ablations and model surgery if I want very neutral models. And while I'm familiar with fine-tuning, I'm new to ablations.

I'm implementing techniques that normalize scores across tasks, to stabilize the loss across tasks rather than having it be task-specific, because that makes it easier to study which network components contribute to performance overall versus per task.
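The specific normalization technique isn't named, but one common way to put per-task losses on a comparable scale is to track a running mean and variance per task and standardize against them. A generic sketch under that assumption (class name, momentum, and the example tasks are all hypothetical):

```python
import numpy as np

class TaskLossNormalizer:
    """Track a running mean/variance of the loss per task and return a
    standardized loss, so no single task's scale dominates training."""

    def __init__(self, momentum=0.99, eps=1e-8):
        self.momentum, self.eps = momentum, eps
        self.stats = {}  # task name -> (running_mean, running_var)

    def __call__(self, task, loss):
        # First sighting of a task seeds the mean at the observed loss.
        mean, var = self.stats.get(task, (loss, 1.0))
        mean = self.momentum * mean + (1 - self.momentum) * loss
        var = self.momentum * var + (1 - self.momentum) * (loss - mean) ** 2
        self.stats[task] = (mean, var)
        return (loss - mean) / np.sqrt(var + self.eps)

rng = np.random.default_rng(0)
norm = TaskLossNormalizer()
for step in range(100):
    # Hypothetical losses from two tasks with wildly different scales.
    big = norm("lm", 100.0 + rng.standard_normal())
    small = norm("classify", 0.1 + 0.01 * rng.standard_normal())
# After warm-up, both normalized losses sit on a comparable O(1) scale,
# so gradients (and ablation-driven loss deltas) are cross-task comparable.
```

The design point is that the normalized losses, not the raw ones, feed the combined objective, which is what makes "this component helps overall vs. only on task X" a meaningful comparison.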
  • 0
    @afaIk actually a big part of it is democratizing models.

There's a small vocal minority that really fucking hates AI and tries to project its voice to seem larger than it is, but a lot of the criticism coming out of that crowd is valid concerns and real worries.

The thinking and approach are twofold:

1. By creating models that are small but can compete with cutting-edge, commercial-scale SOTA offerings, anyone can run them on their desktops or laptops, defeating the corporate moat and popping the bubble.

2. At the same time, I'm building highly performant, technically-enforced licensing for big black-box models: guaranteed non-replicable, non-transferable outside their environment, with guarantees about cost, compute, memory, and latency, and strong privacy guarantees. Pharmaceutical companies, military, finance, and .gov won't be able to resist.

That second group then loses to the open market through Embrace, Extend, Extinguish.

    It's using evil for good.