devRant - A fun community for developers to connect over code, tech & life as a programmer

Search - "variance"

8

Wisecrack

9365

112d

Adaptive Latent Hypersurfaces

The idea is that, rather than adjusting embedding latents directly, we learn a model that takes the context tokens as input and generates an efficient adapter or transform of the latents, so that when the latents are looked up for that same input, they produce outputs with much lower perplexity and loss.

This can be trained autoregressively.
This is similar in some respects to hypernetworks, but applied to embeddings.

The thinking is that we shouldn't change latents directly, because any given vector will generally be orthogonal to any other, and changing the latents introduces variance for some subset of other inputs, over some distribution that is partially or fully out-of-distribution relative to the current training and verification data sets. That ultimately leads to a plateau in loss-drop.

Therefore, by autoregressively taking an input, and learning a model that produces a transform on the latents of a token dictionary, we can avoid this ossification of global minima, by finding hypersurfaces that adapt the embeddings, rather than changing them directly.

The result is a network that essentially acts as a compressor of all relevant use cases, without leading to overfitting on in-distribution data and underfitting on out-of-distribution data.
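To make the shape of the idea concrete, here is a minimal sketch, not the actual implementation. It assumes the adapter is a per-dimension scale and shift generated from the mean of the context's frozen embeddings. The names `BASE`, `W_SCALE`, `W_SHIFT`, `generate_adapter` and `adapted_embeddings` are all hypothetical, and no training loop is shown.

```python
import math
import random

random.seed(0)
DIM = 4
VOCAB = ["the", "cat", "sat", "mat"]

# Frozen base embeddings: one random vector per token, never updated.
BASE = {tok: [random.gauss(0.0, 1.0) for _ in range(DIM)] for tok in VOCAB}


def make_linear(rows, cols):
    """Small random matrix standing in for learned adapter parameters."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical adapter parameters; these (not BASE) would be the trained part.
W_SCALE = make_linear(DIM, DIM)
W_SHIFT = make_linear(DIM, DIM)


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def context_summary(tokens):
    """Mean of the frozen embeddings of the context tokens."""
    vecs = [BASE[t] for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def generate_adapter(tokens):
    """Map the context to a per-dimension scale and shift (assumed adapter form)."""
    summary = context_summary(tokens)
    scale = [1.0 + math.tanh(s) for s in matvec(W_SCALE, summary)]
    shift = matvec(W_SHIFT, summary)
    return scale, shift


def adapted_embeddings(tokens):
    """Apply the context-generated transform without touching BASE."""
    scale, shift = generate_adapter(tokens)
    return [[g * x + b for g, x, b in zip(scale, BASE[t], shift)] for t in tokens]
```

The point of the sketch is only that the same token gets a different effective embedding depending on its context, while the shared table stays untouched.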

random machine learning

13
6

Wisecrack

9365

308d

Here's some research into a new LLM architecture I recently built and have had actual success with.

The idea is simple: you do the standard thing of generating random vectors for your dictionary of tokens; we'll call these numbers your 'weights'. Then, for whatever sentence you want to use as input, you generate a context embedding by looking up those tokens and putting them into a list.

Next, you do the same for the output you want to map to; let's call it the decoder embedding.

You then loop and generate a 'noise embedding' for each vector (individual token) in the context embedding, then subtract that token's noise value from that token's embedding value (its specific weight).

You find the weight index in the weight dictionary (one entry per word or token in your token dictionary) that's closest to this embedding. You use a version of cuckoo hashing where similar values are stored near each other, and the canonical weight values are actually the keys of the key:value pairs in your token dictionary. When doing this, you align all the randomly numbered keys in the dictionary (each a uniform sample from 0 to 1) and look at the Hamming distance between the encoder embedding (the context embedding combined with the noise embedding) and the canonical keys. Each digit, from left to right, is penalized by some factor f (because digits further left carry larger magnitudes), and you then penalize or reward based on the numeric closeness of each individual digit of the encoder embedding to the digit at the same index of any given weight i.

You then substitute the canonical weight in place of this encoder embedding, look up that weight's index (in my earliest version), and then use that index to look up the word|token in the token dictionary and compare it to the word at the current index of the training output to match against.
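The lookup described above can be sketched roughly like this. It is a simplified reading, not the original code: each token's weight is assumed to be a single scalar key in [0, 1), the digit comparison uses a decaying factor `F` per position, and a plain linear scan stands in for the hashed lookup. `DIGITS`, `F`, `digit_distance`, `nearest_key` and `decode` are hypothetical names.

```python
import random

random.seed(1)
DIGITS = 6   # decimal digits compared per key (assumption)
F = 0.5      # per-position decay factor f (assumption)

TOKENS = ["hello", "world", "foo", "bar"]
# Canonical weights: one uniform sample in [0, 1) per token, used as the key.
KEY_TO_TOKEN = {round(random.random(), DIGITS): tok for tok in TOKENS}


def digits_of(x):
    """Fractional decimal digits of x (clamped to [0, 1)), most significant first."""
    x = min(max(x, 0.0), 1.0 - 10 ** -DIGITS)
    return [int(c) for c in f"{x:.{DIGITS}f}"[2:]]


def digit_distance(a, b):
    """Position-weighted digit mismatch: leftmost digits cost the most."""
    total = 0.0
    for i, (da, db) in enumerate(zip(digits_of(a), digits_of(b))):
        if da != db:
            # Hamming-style mismatch penalty plus a term for how far apart the digits are.
            total += (F ** i) * (1.0 + abs(da - db) / 9.0)
    return total


def nearest_key(value):
    """Linear scan over the canonical keys (a hash-based lookup would replace this)."""
    return min(KEY_TO_TOKEN, key=lambda k: digit_distance(value, k))


def decode(context_weights, noise):
    """Subtract per-token noise, snap to the closest canonical key, return the tokens."""
    out = []
    for w, n in zip(context_weights, noise):
        encoder_value = w - n
        out.append(KEY_TO_TOKEN[nearest_key(encoder_value)])
    return out
```

With zero noise every key snaps back to itself, so `decode(list(KEY_TO_TOKEN), [0.0] * len(KEY_TO_TOKEN))` returns the tokens in dictionary order. Training would then amount to finding noise values that move each input token onto the desired output token.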

Of course by switching to the hash version the lookup is significantly faster, but I digress.

That introduces a problem.
If each input token matches one output token, how do we get variable-length outputs? How do we do n-to-m mappings of input and output?

One of the things I explored was using pseudo-markovian processes, where there's one node, A, with two links to itself, B and C.
B is a transition matrix, and A holds its own state. At any given timestep, A may use either the default transition matrix (training data encoder embeddings) with B, or it may generate new ones, using C and a context window of A's prior states.

C can be used to modify A, or it can be used as a noise embedding to modify B.

A can take on the state of both A and C or A and B. In fact we do both, and measure which is closest to the correct output during training.
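One possible reading of a single training step for that node is sketched below. The details are assumptions: B is taken as a plain matrix applied to A's state, C is taken as the mean of a short window of A's prior states added to the current state, and "closest" is squared Euclidean distance. The names `step`, `window_size` and `sq_dist` are hypothetical.

```python
def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def step(state, B, history, target, window_size=3):
    """Try both links from A back to itself and keep the candidate closer to target.

    state   : A's current state vector
    B       : default transition matrix
    history : list of A's prior states (the context window that C reads)
    target  : the correct output for this timestep during training
    """
    # Link B: apply the default transition matrix.
    via_b = matvec(B, state)

    # Link C: derive an offset from A's recent states (here their mean; an assumption).
    window = history[-window_size:] or [state]
    offset = [sum(col) / len(window) for col in zip(*window)]
    via_c = [s + o for s, o in zip(state, offset)]

    # Do both, then measure which is closest to the correct output.
    candidates = [("B", via_b), ("C", via_c)]
    return min(candidates, key=lambda c: sq_dist(c[1], target))
```

For example, with an identity matrix for B, state `[1.0, 0.0]`, history `[[0.0, 1.0]]` and target `[1.0, 1.0]`, the C link wins because it lands exactly on the target.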

What this *doesn't* do is give us variable length encodings or decodings.

So I thought a while and said, if we're using noise embeddings, why can't we use multiple?

And if we're doing multiple, what if we used a middle layer, let's call it the 'key', and took its mean over *many* training examples, and used it to map from the variance of an input (query) to the variance and mean of a training or inference output (value)?

But how does that tell us when to stop or continue generating tokens for the output?

Posted on pastebin if you want to read the whole thing (DR wouldn't post for some reason).

In any case I wasn't sure if I was dreaming or if I was off in left field, so I went and built the damn thing, the autoencoder part, wasn't even sure I could, but I did, and it just works. I'm still scratching my head.

https://pastebin.com/xAHRhmfH

random llm machine learning

33
5

embeddedmaikel

336

6y

After months and months of waiting for the devRant mousepad to become available again in their store ... it turns out it's going to be ducking expensive to get that item (shipping costs as much as the product itself... and it could take 6 more weeks to arrive!) Come on, 1-6 weeks ... the variance of the estimation is huge ... I have lost the motivation :(

devrant

8
5

nikmanG

1521

7y

!rant

So I got into a small debate (actually a civil one, surprise surprise) about the final project for a class. Basically, the final project involves a team of 3-4 coders making a website for an actual client, one that they either find themselves or that the professor provides.
The exact point of conflict was that the work is pro bono. The student argued that the work should be paid since after all, real work, real client. My argument is that because the clients don’t exactly choose the designers (or have little to no knowledge of most of their work) there will be high variance in quality and contract work would cause more conflict if done in class.

So just wondering, what do people think about this? Logistical issues aside (earning money for technically school property/ownership and money for learning essentially)

rant university paid web development

6
4

lorentz

15750

1y

Rust should support explicit variance declarations. Explicit declarations are like the main feature of the language, variance is a critically important part of a type's public interface, and &mut-s that are never reassigned and should thus inherit the referent's variance are extremely common. If the language can't recognize this, I should be able to declare it with a single unsafe rather than constantly casting to and from 'static.

rant variance inference bullshit rust

3
3

seraphimsystems

3728

7y

Question for the electrically minded.

I have a laptop with a 19v input.

I have a portable UPC with 2 voltage options in this range. I can undervolt at 16v (the laptop battery voltage), which works with a small firmware correction to ignore a board sensor. The other option is to slightly overvolt to 19.5v, which I assume the laptop could handle through its input regulation.

Can anyone confirm if a 0.5v variance at the charger is within tolerance? It would be an overvolt of about 2.6%.
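For reference, the two options can be compared against the 19v nominal input with a quick calculation. This only does the arithmetic; it says nothing about what the laptop's input stage actually tolerates. The helper name `percent_deviation` is made up for this example.

```python
def percent_deviation(nominal_v, supplied_v):
    """Signed deviation of the supplied voltage from nominal, in percent."""
    return (supplied_v - nominal_v) / nominal_v * 100.0


over = percent_deviation(19.0, 19.5)   # the 19.5v option
under = percent_deviation(19.0, 16.0)  # the 16v option

print(f"{over:+.2f}%")   # +2.63%
print(f"{under:+.2f}%")  # -15.79%
```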

rant electrical engineering hardware power

5
3

fffrrraaannnkkk

88

3y

Any advice on how to deal with gatekeeping developers? How to deal with red tape?

I work with people that are resistant to code and process change. Continuous pedantic pushback on nearly anything; one raised a fuss over metrics not being satisfactory at a 5% threshold for alerting stating that 4.99% metrics variance wouldn't trigger an alert.

It's genuinely as though my coworkers are all scared of code based on the way they behave. They don't seem to code very often either.

I'm someone that codes quickly but I have to constantly write proposals for quite literally any change to the codebase. Even IF there were issues we could always roll back (and even then we have metrics, alerts, canary rollouts, feature flags, etc etc). As a quick aside, my pace isn't related to the pushback nor experience/skill level. It just affects my morale and mental health to be blocked.

I can communicate effectively and I try to be as clear as possible in my proposals but this is absolutely driving me up the wall and killing my motivation.

This is a faang-level company and I would've expected better.

Any advice on how to best navigate this? Is this the norm???

question faang questions motivation fear gatekeeping developer advice help

4
2

AvatarOfKaine

3902

4y

I find myself thinking that lack of boredom related to
Unfulfilled relationship quality is what is killing the world
We require interaction as humans to spawn variance in our lives

random

4
0

AvatarOfKaine

3902

3y

No variance detected
Defaulting

BRAWNDO HAS WHAT PLANTS NEED GODDAMN IT !

random


devRant © 2021 Hexical Labs LLC