Search - "variance"
-
Here's some research into a new LLM architecture I recently built and have had actual success with.
The idea is simple. You do the standard thing of generating random vectors for your dictionary of tokens; we'll call these numbers your 'weights'. Then, for whatever sentence you want to use as input, you generate a context embedding by looking up those tokens and putting them into a list.
Next, you do the same for the output you want to map to; let's call it the decoder embedding.
You then loop: for each vector (individual token) in the context embedding, you generate a 'noise embedding' and subtract that token's noise value from that token's embedding value, i.e. its specific weight.
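A minimal sketch of the steps so far, under loud assumptions: each token gets a single random scalar weight in [0, 1) (the post says "vectors", but the later digit comparison suggests scalars), and the noise is a small uniform perturbation. The names `build_weights`, `embed`, and `encoder_embedding`, and the noise range, are hypothetical and not taken from the pastebin code.

```python
import random

def build_weights(vocab, seed=0):
    """Assign each token one random scalar weight in [0, 1) (assumed scalar)."""
    rng = random.Random(seed)
    return {tok: rng.random() for tok in vocab}

def embed(tokens, weights):
    """Context (or decoder) embedding: look up each token's weight, in order."""
    return [weights[t] for t in tokens]

def encoder_embedding(context, noise):
    """Subtract each token's noise value from that token's embedding value."""
    return [c - n for c, n in zip(context, noise)]

vocab = ["the", "cat", "sat", "mat"]
weights = build_weights(vocab)
context = embed(["the", "cat", "sat"], weights)   # input sentence
decoder = embed(["the", "mat"], weights)          # desired output

# Hypothetical noise: one small perturbation per input token.
rng = random.Random(1)
noise = [rng.uniform(-0.05, 0.05) for _ in context]
encoder = encoder_embedding(context, noise)
```

Because the noise can be negative here, it does not matter much for the sketch whether it is added or subtracted.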
You find the weight index in the weight dictionary (one entry per word or token in your token dictionary) that's closest to this embedding. You use a version of cuckoo hashing where similar values are stored near each other, and the canonical weight values are actually the keys of the key:value pairs in your token dictionary. When doing this you align all the random-numbered keys in the dictionary (a uniform sample from 0 to 1) and look at the Hamming distance between the context embedding + noise embedding (called the encoder embedding) and the canonical keys, with each digit from left to right being penalized by some factor f (because digits further left carry larger magnitudes). You then penalize or reward based on the numeric closeness of each individual digit of the encoder embedding to the digit at the same index of any given weight i.
You then substitute the canonical weight in place of this encoder embedding. In my earliest version you look up that weight's index, and then use that index to look up the word|token in the token dictionary and compare it to the word at the current index of the training output you're matching against.
Of course by switching to the hash version the lookup is significantly faster, but I digress.
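One possible reading of the digit-wise comparison, as a sketch only. It does a plain linear scan rather than the cuckoo-style hash, and the digit count `n`, the factor `f = 0.5`, and the function names are assumptions. Each decimal digit position i is weighted by f**i, so mismatches further left cost more, and the cost of a mismatch grows with how far apart the two digits are.

```python
def digits(x, n=6):
    """First n decimal digits of x, with x clamped into [0, 1)."""
    x = min(max(x, 0.0), 1.0 - 10.0 ** -n)
    return [int(ch) for ch in f"{x:.{n}f}".split(".")[1]]

def digit_distance(a, b, n=6, f=0.5):
    """Position-weighted digit distance: identical digits cost nothing,
    differing digits cost their numeric gap scaled by f**position."""
    return sum((f ** i) * abs(x - y)
               for i, (x, y) in enumerate(zip(digits(a, n), digits(b, n))))

def nearest_token(value, table, n=6, f=0.5):
    """table maps canonical weight (key) -> token (value), as in the post."""
    key = min(table, key=lambda w: digit_distance(value, w, n, f))
    return key, table[key]

# Hypothetical canonical weights for three tokens.
table = {0.12: "the", 0.47: "cat", 0.83: "sat"}
canonical, token = nearest_token(0.46, table)
print(canonical, token)  # 0.47 cat
```

The token returned here is what would be compared against the word at the current index of the training output.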
That introduces a problem.
If each input token matches one output token, how do we get variable-length outputs? How do we do n-to-m mappings of input and output?
One of the things I explored was using pseudo-Markovian processes, where there's one node, A, with two links to itself, B and C.
B is a transition matrix, and A holds its own state. At any given timestep, A may use either the default transition matrix (training data encoder embeddings) with B, or it may generate new ones, using C and a context window of A's prior states.
C can be used to modify A, or it can be used as a noise embedding to modify B.
A can take on the state of both A and C or A and B. In fact we do both, and measure which is closest to the correct output during training.
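A rough sketch of one training step for this node, with most details assumed rather than taken from the post: the state A is a small vector, the B path applies the transition matrix to it, and the C path (entirely hypothetical here) perturbs A by the mean of a window of its prior states. Both candidates are computed and the one closer to the target is kept.

```python
def step(a, B, history, target, window=3):
    """Try both self-links and keep whichever lands closer to the target."""
    n = len(a)
    # Path via B: apply the default transition matrix to the current state.
    via_b = [sum(B[i][j] * a[j] for j in range(n)) for i in range(n)]
    # Path via C (assumed form): add the mean of the last `window` prior states.
    recent = history[-window:]
    if recent:
        c = [sum(s[i] for s in recent) / len(recent) for i in range(n)]
    else:
        c = [0.0] * n
    via_c = [x + y for x, y in zip(a, c)]

    def err(v):
        # Squared error against the training target.
        return sum((x - t) ** 2 for x, t in zip(v, target))

    return ("B", via_b) if err(via_b) <= err(via_c) else ("C", via_c)

a = [1.0, 0.0]
B = [[0.0, 1.0], [1.0, 0.0]]          # swaps the two components
history = [[0.5, 0.5]]
print(step(a, B, history, target=[0.0, 1.0]))  # ('B', [0.0, 1.0])
print(step(a, B, history, target=[1.5, 0.5]))  # ('C', [1.5, 0.5])
```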
What this *doesn't* do is give us variable length encodings or decodings.
So I thought a while and said, if we're using noise embeddings, why can't we use multiple?
And if we're doing multiple, what if we used a middle layer, let's call it the 'key', took its mean over *many* training examples, and used it to map from the variance of an input (query) to the variance and mean of a training or inference output (value)?
But how does that tell us when to stop or continue generating tokens for the output?
Posted on pastebin if you want to read the whole thing (DR wouldn't post for some reason).
In any case I wasn't sure if I was dreaming or if I was off in left field, so I went and built the damn thing, the autoencoder part, wasn't even sure I could, but I did, and it just works. I'm still scratching my head.
https://pastebin.com/xAHRhmfH -
After months and months of waiting for the devRant mousepad to become available again in their store ... it turns out it's going to be ducking expensive to get that item (shipping costs as much as the product itself... and it could take 6 more weeks to arrive!) Come on, 1-6 weeks ... the variance of the estimation is huge ... I have lost the motivation :(
-
Rust should support explicit variance declarations. Explicit declarations are like the main feature of the language, variance is a critically important part of a type's public interface, and &mut-s that are never reassigned and should thus inherit the referent's variance are extremely common. If the language can't recognize this, I should be able to declare it with a single unsafe rather than constantly casting to and from 'static.
-
!rant
So got into a small debate (actually a civil one, surprise surprise) about the final project for a class. Basically the final project involves a team of 3-4 coders making a website for an actual client that they either find themselves or are assigned by the professor.
The exact point of conflict was that the work is pro bono. The student argued that the work should be paid since, after all, it's real work for a real client. My argument is that because the clients don't exactly choose the designers (or have little to no knowledge of most of their work), there will be high variance in quality, and contract work would cause more conflict if done in class.
So just wondering, what do people think about this? Logistical issues aside (earning money for what is technically school property/ownership, and money for what is essentially learning) -
Question for the electrically minded.
I have a laptop with a 19v input.
I have a portable UPC with 2 voltage options in this range. I can undervolt at 16v (the laptop battery voltage), which works with a small firmware correction to ignore a board sensor. The other option is to slightly overvolt to 19.5v, which I assume the laptop could handle through its input regulation.
Can anyone confirm if a .5v variance at the charger is within tolerance? It would be an overvolt of about 2.6% (0.5 / 19) -
Any advice on how to deal with gatekeeping developers? How to deal with red tape?
I work with people that are resistant to code and process change. There's continuous pedantic pushback on nearly anything; one raised a fuss that a 5% alerting threshold wasn't satisfactory because a 4.99% metrics variance wouldn't trigger an alert.
It's genuinely as though my coworkers are all scared of code based on the way they behave. They don't seem to code very often either.
I'm someone that codes quickly but I have to constantly write proposals for quite literally any change to the codebase. Even IF there were issues we could always roll back (and even then we have metrics, alerts, canary rollouts, feature flags, etc etc). As a quick aside, my pace isn't related to the pushback nor to experience/skill level. It just affects my morale and mental health to be blocked.
I can communicate effectively and I try to be as clear as possible in my proposals but this is absolutely driving me up the wall and killing my motivation.
This is a FAANG-level company and I would've expected better.
Any advice on how to best navigate this? Is this the norm??? -
I find myself thinking that lack of boredom related to
Unfulfilled relationship quality is what is killing the world
We require interaction as humans to spawn variance in our lives