Details
About: Installing skynet. Please wait...
Skills: UI Design (3 years), Javascript, Python, and levels of shitposting that aren't even supposed to be possible.
Location: 28.5° N, 80.63° W
Joined devRant on 5/5/2019
-
I discussed using page-rank for ML a while back here - https://devrant.com/rants/11237909/...
I talk about something vaguely similar in "scoring the matches" here though - https://pastebin.com/YwjCMvRp
Incidentally the machine learning community finally caught up and did something similar on a RAG
https://news.ycombinator.com/item/... -
If you're using random in python, and need arbitrary precision, use mpmath.
If you're using it with the decimal module, it doesn't automatically convert just so you know.
Instead, convert the output of arb_uniform to a string before passing it to the decimal module. -
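A minimal sketch of the string round-trip described above, assuming mpmath is installed. The `arb_uniform` function from the post isn't reproduced here; `mpmath.rand()` stands in for it, and the 50-digit precision and the `random_decimal` name are illustrative choices only.

```python
from decimal import Decimal, getcontext

import mpmath

DIGITS = 50                    # illustrative precision, not taken from the post
mpmath.mp.dps = DIGITS         # mpmath working precision, in decimal digits
getcontext().prec = DIGITS     # matching precision for later decimal arithmetic


def random_decimal(digits=DIGITS):
    """Draw a uniform value in [0, 1) with mpmath and hand it to decimal as a string."""
    x = mpmath.rand()                        # mpf with a full-precision random mantissa
    return Decimal(mpmath.nstr(x, digits))   # convert to str first, then to Decimal


value = random_decimal()
print(value)
```

Going through `mpmath.nstr` keeps all the requested digits; converting by way of `float` would cut the value down to double precision.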
Turns out you can treat a function mapping parameters to outputs as a product that acts as a *scaling* of continuous inputs to outputs, and that this sits somewhere between neural nets and regression trees.
Well, that's what I did, and the MAE (mean absolute error) of this works out to about 0.5%, half a percentage point. Did training and a little validation, but the training set is only 2.5k samples, so it may just be overfitting.
The idea is you have X, y, and z.
z is your parameters. And for every row in y, you have an entry in z. You then try to find a set of z such that the product, multiplied by the value of yi, yields the corresponding value at Xi.
Naturally I gave it the ridiculous name of a 'zcombiner'.
Well, fucking turns out, this beautiful bastard of a paper just dropped in my lap, and it's been around since 2020:
https://mimuw.edu.pl/~bojan/papers/...
which does the exact god damn thing.
I mean they didn't realize it applies to ML, but it's the same fucking math I did.
z is the monoid that finds some identity that creates an isomorphism between all the elements of all the rows of y, and all the elements of all the indexes of X.
And I just got to say it feels good. -
To build a good game follow the opposite advice for building a successful SaaS product:
https://slimsaas.com/blog/... -
I FINALLY comprehend list comprehensions.
I can write an unlimited amount of nested loops on a single line and make other less experienced people hate me for fun and profit.
Also learned about map() (I hate it), zip() (which is awesome), and the utility of lambdas (they're okay).
Enumerate is pretty nifty too, only thing I lose is setting the initial value of the iterator index. -
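A short sketch of the pieces mentioned above, with made-up data. On the point about the initial index: `enumerate()` takes an optional `start` argument, so the counter does not have to begin at 0.

```python
names = ["ada", "grace", "linus"]
scores = [90, 85, 70]

# enumerate() accepts a start value, so the index can begin at 1 (or anywhere).
numbered = [f"{i}. {name}" for i, name in enumerate(names, start=1)]

# zip() pairs items positionally; the comprehension filters as it goes.
passed = [name for name, score in zip(names, scores) if score >= 80]

# A nested comprehension reads left to right, like the equivalent nested loops.
pairs = [(a, b) for a in range(3) for b in range(a)]

# map() with a lambda, next to the comprehension that does the same thing.
squares_map = list(map(lambda n: n * n, scores))
squares_comp = [n * n for n in scores]

print(numbered, passed, pairs, squares_map, sep="\n")
```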
Wherein I disprove Goldbach's Conjecture (in one specific case)
Goldbach's conjecture:
every even number greater than 2 is the sum of two primes
let's call the primes p and q
let's call our even number n, so p+q=n
we can go further by establishing two additional variables
u=p-1, v=q-1
therefore every even number is the sum of u+v+2, according to Goldbach's own reasoning.
in the simplest case...
p=2, q=2, p+q=4
u=1, v=1, u+v+2 = 4
We can therefore make a further conjecture: in the simplest case, every sum of two primes, less 2, is the sum of two composites. This likely has connections to the abc conjecture for a variety of reasons. But leaving ancillary discussion aside for a moment...
We can generalize to a statement that every even number is the sum of two odd numbers. And every odd number greater than 1 is the sum of an odd number and an even number.
Finding an even number that is not the sum of (p-1)+(q-1) would therefore be equivalent to disproving the Goldbach conjecture. Likewise proving every even number is the sum of (p-1)+(q-1) would be the equivalent of proving it.
Proving all even numbers greater than 2 are the sum of two composites + 2 would be proof of Goldbach's conjecture, and finding any example or an equation that proves an example exists such that *some* subset of even numbers are NOT the sum of two composites + 2, would disprove the conjecture.
Let's start with a simple example:
2+2=4
because 4-2=2, and two is not the sum of two composite numbers, Goldbach's conjecture must ipso facto be false.
QED
If I've wildly misapprehended the math, please, somebody who is better at it, correct me.
Honestly if this is actually anything, I'd be floored to discover no one has stumbled on this line of reasoning before. -
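Since the post asks for a check, here is a small sketch of one. Goldbach's conjecture only says that an even n greater than 2 can be written as p+q with both p and q prime; it says nothing about p-1 and q-1. For 4 = 2+2 the conjecture holds, and u = v = 1, which is neither prime nor composite. So "two composites + 2" is a separate, stronger claim, and its failure at 4 doesn't touch the conjecture. The helper names and the tested range are illustrative.

```python
def is_prime(n):
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def is_composite(n):
    """Composite means greater than 1 and not prime; 1 is neither."""
    return n > 1 and not is_prime(n)


def goldbach_pairs(n):
    """All (p, q) with p <= q, both prime, and p + q == n."""
    return [(p, n - p) for p in range(2, n // 2 + 1)
            if is_prime(p) and is_prime(n - p)]


# Print each decomposition with u = p-1, v = q-1 and whether both are composite.
for n in range(4, 21, 2):
    for p, q in goldbach_pairs(n):
        u, v = p - 1, q - 1
        print(n, (p, q), (u, v), is_composite(u) and is_composite(v))
```

The last column is False for several rows (for example 6 = 3+3 gives u = v = 2, which is prime), while every even number in the range still has at least one prime pair.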
Apparently inverse sigmoid is how logits are calculated.
Here I am reinventing the fucking wheel. -
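For reference, a tiny sketch of that relationship: the logit function is the inverse of the sigmoid, so applying one after the other gets you back where you started (up to floating point error).

```python
import math


def sigmoid(x):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Inverse of sigmoid: the log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


print(logit(sigmoid(2.0)))
```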
https://simulator.io/board
Lets you place clocks, full and half adders, D latches, RS and JK flip-flops, shift registers, demultiplexers, multiplexers, and decoders, as well as all the standard gates. It also has buttons, switches, and individual LEDs.
Pretty close to what I would make myself. -
So I realized if done correctly, an autoencoder is really just a bootleg token dictionary.
If we take some input, and pass it through a custom hash function that strictly produces hashes with only digits as output, then we can train a network, store the weights and biases, and then train a decoder on top of that.
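A rough sketch of what a digits-only hash could look like. The post doesn't say what the custom hash function is, so SHA-256 is a stand-in here, and the `digit_hash` and `digits_to_vector` names and the 16-digit width are assumptions for illustration.

```python
import hashlib


def digit_hash(text, width=16):
    """Hash text to a fixed-width string containing only the digits 0-9."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    number = int.from_bytes(digest, "big") % (10 ** width)
    return str(number).zfill(width)


def digits_to_vector(code):
    """Scale each digit into [0, 1] so the code can be fed to a network."""
    return [int(ch) / 9.0 for ch in code]


code = digit_hash("hello world")
print(code, digits_to_vector(code))
```

A cryptographic hash scatters similar inputs far apart, so anything trained on these codes has to memorize the mapping rather than generalize from it; a hash designed to keep similar inputs close would behave differently.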
Using random dropout on the input-output pairs, we can do distillation of the weights and biases to find subgraphs that further condense this embedding.
Why have a token dictionary at all? -
research 10.09.2024
I successfully wrote a model verifier for xor. So now I know it is in fact working, and the thing is doing what was previously deemed impossible, calculating xor on a single hidden layer.
Also made it generalized, so I can verify it for any type of binary function.
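A minimal sketch of what a generalized verifier along these lines might look like: it runs a model over every binary input and compares against a reference function. The actual verifier isn't shown in the post, so `verify` and `toy_xor_model` are hypothetical stand-ins, with a plain Python callable in place of a trained network.

```python
from itertools import product


def verify(model, target, n_inputs=2):
    """Check model against target on every binary input of length n_inputs.

    Returns the list of inputs where they disagree; empty means verified.
    """
    failures = []
    for bits in product((0, 1), repeat=n_inputs):
        if model(*bits) != target(*bits):
            failures.append(bits)
    return failures


# Stand-in "model": any callable mapping bits to 0 or 1 would do here.
def toy_xor_model(a, b):
    return int(a + b == 1)


print(verify(toy_xor_model, lambda a, b: a ^ b))   # checked against XOR
print(verify(toy_xor_model, lambda a, b: a | b))   # checked against OR
```

Because the truth table is enumerated exhaustively, the same function covers any binary operation, and raising `n_inputs` covers functions of more than two bits.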
The next step would be to see if I can either train for combinations of logical operators (or+xor, and+not, or+not, xor+and+..., etc) or chain the verifiers.
If I can it means I can train models that perform combinations of logical operations with only one hidden layer.
Also wrote a version that can sum a binary vector every time, but I still have to write a verification table for that.
If chaining verifiers or training a model to perform compound functions of multiple operations is possible, I want to see about writing models that can do neighborhood max pooling themselves in the hidden layer, or other nontrivial operations.
Lastly I need to adapt the algorithm to work with values other than binary, so that means divorcing the clamp function from the entire system. In fact I want to turn the clamp and activation into a type of bias, so a network that can learn to do binary operations can also automatically learn to do non-binary functions as well. -
Had an individual financial advisor worth 7-8 figures and with hundreds of thousands of followers, spontaneously follow me on twitter and start a conversation. He only follows a few hundred others.
Is this what it is like to meet a celebrity?
What does this person even want?
I don't know whether to be annoyed or flattered.
We're in completely different financial classes, and have nothing in common other than being trapped like rats in a cage by our own circumstances. -
By simply making the bias random on the second input for a two-bit binary input during activation calculation, it's possible to train a neural net to calculate the XOR function in one layer.
I know for a fact. I just did it. -
Let me arrogantly brag for a moment, and let us never forget that I front-ran GPT's o1 development by more than a week, posted here:
https://devrant.com/rants/11257717/...
And I know what their next big development will be too. I just haven't shared it yet because it blows backpropagation out of the fucking water.
I may not be super competent at anything but I'm a god damn autistic accidental oracle when it comes to knowing what comes next in the industry.
relevant youtube video and screenshot:
https://youtu.be/6xlPJiNpCVw/... -
This shit is fascinating, especially reading about the variations in function of the various Brodmann areas:
https://en.wikipedia.org/wiki/...
My favorite schizo-interpretation of this is "your head is full of bees."
i.e. Brodmann area 10, which is thought to be responsible for memory recall strategies (basically an adaptive memory allocator and heap), has about 250 million neurons in a pinprick of a volume.
A bee has about 1 million neurons.
In other words: the part of your brain that decides how memory is managed has only the equivalent brainpower of 250 bees, lol.
Obviously a simplification-to-the-level-of-absurdity but it's fun to intentionally interpret something to the level of distortion. -
Chinese remainder theorem
So the idea is that a partial or zero knowledge proof is used not just for encryption but also for a sort of distributed ledger or proof-of-membership. It is also used to add new members, with additional layers of distributive proofs added on top, so that rollbacks can be performed on a network to remove members or revoke content.
Data is NOT automatically distributed throughout a network, rather sharing is the equivalent of replicating and syncing data to your instance.
Therefore if you don't like something on a network or think it's a liability (hate speech for the left, violent content for the right for example), the degree to which it is not shared is the degree to which it is censored.
By automatically not showing images posted by people you're subscribed to or following, infiltrators or state level actors who post things like calls to terrorism or csam to open platforms, in order to justify shutting down platforms they don't control, are cut off at the knees. There may also be a case for tools built on AI that automatically determine if something like a thumbnail should be censored, or that give the user an NSFW warning before clicking a link that may appear innocuous but is actually malicious.
Server nodes may be virtual in that they are merely a graph of people connected in a group by each person in the group having a piece of a shared key.
Because Chinese remainder theorem only requires a subset of all the info in the original key, it also acts as a voting mechanism to decide whether a piece of content is allowed to be synced to an entire group or remain permanently.
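A toy sketch of the "subset of the key" idea, using a Mignotte-style threshold scheme built on the Chinese remainder theorem: each member holds the key modulo a different number, and any three of five members can rebuild it while two cannot. The moduli, the secret, and the function names are illustrative assumptions, not the post's design, and a scheme this small gives no real security.

```python
from math import prod


def crt(residues, moduli):
    """Chinese remainder theorem for pairwise coprime moduli."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        x += r * partial * pow(partial, -1, m)   # modular inverse (Python 3.8+)
    return x % total


# Illustrative numbers only: 5 members, any 3 can rebuild the key.
MODULI = [11, 13, 17, 19, 23]   # pairwise coprime, increasing
SECRET = 1234                   # above 19*23 = 437, below 11*13*17 = 2431

# Each member's share is the secret reduced modulo that member's number.
shares = [(SECRET % m, m) for m in MODULI]


def recover(subset):
    """Rebuild a candidate secret from a subset of (residue, modulus) shares."""
    residues, moduli = zip(*subset)
    return crt(residues, moduli)


print(recover(shares[:3]))   # three members: enough to rebuild
print(recover(shares[3:]))   # two members: yields a different number
```

Read as a vote, the threshold is the quorum: content tied to the key only becomes recoverable once enough members contribute their piece.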
Data that hasn't been verified yet may go into a cache for a given cluster of users who are mutually subscribed or following in a small world graph, but at the same time it doesn't get shared out of that subgraph and may expire if enough users don't hit a like button or a retain button or a share or "verify" button.
The algorithm here then is no algorithm at all but merely the natural association process between people and their likes and dislikes directly affecting the outcome of what they see via that process of association to begin with.
We can even go so far as to dogfood content that's already been synced to a graph into evolutions of the existing key, such that the retention of new generations of key, dependent on the previous key, also acts as a store of the data that's been synced to the members of the node.
Therefore members that continually post content that doesn't get verified slowly fall out of the node, such that eventually their content becomes merely temporary in the caches or index of the node members, driving index and node subgraph membership in an organic and natural process based purely on affiliation and identification.
Here I've sort of butchered the idea of the Chinese remainder theorem and shoehorned it into the idea of zero knowledge proofs, but you can see where I'm going with this if you squint at the idea mentally and look at it at just the right angle.
The big idea was to remove the influence of centralized algorithms to begin with, and implement mechanisms such that third-party organizations that exist to discredit or shut down small platforms are hindered by the design of the platform itself.
I think if you look over the ideas here you'll see that's what the general design thrust achieves or could achieve if implemented into a platform.
The addition of indexes in a node or "server" or "room" (being a set of users mutually subscribed to a particular tag or topic or each other) would also be useful, where the index is an index of text, audio, videos, and other media, including user posts, that are available on the given node, and the index consists of titled but blind links (no pictures/media, or media verified as safe through an automatic tool). -
My own little version of Moore's law:
In 1986 the connectome (the brain) of C. elegans, a small worm, was mapped. It would take decades before the research caught up to the point where we had the hardware to simulate it.
In 2024, we have successfully mapped, and fully simulated (to matching observed behavioral data) the brain of a fruit fly, a total of 139,255 neurons and corresponding connections.
That's a 38 year period.
If the period is roughly 40 years, and the leap in successful neurons mapped *and simulated* is by an average of 461 times the prior number of neurons (139,255 against the 302 neurons of C. elegans), then by 2062-2064 we will be simulating box jellyfish, fruit flies, zebrafish, bees, ants, honey bees, cockroaches, coconut crabs, geckos, guppies, sand lizards, snakes, skinks, tortoises, frogs, iguanas, shrews, bats, and even moles.
By the dozens or hundreds in any given simulation.
By the year 2100-2104 we'll be fully simulating the brains of mice, quail, crocodiles, birds such as doves, rats, zebra finches, guinea pigs, lemurs, ducks, ferrets, cockatiels, squirrels, mongooses, prairie dogs, rabbits, octopi, house cats, buzzards, parakeets, grey parrots, snowy owls, raccoons, and even domestic pigs.
And in the years between 2100 and 2140, starting immediately with domestic dogs, we will ramp up and end with the capacity to simulate human brains in full, probably by the dozens or hundreds.
This assumes we can break the quantum barrier of course. -
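The arithmetic behind the projection, as a quick sketch. The 302-neuron count for C. elegans and the rough 86 billion figure for a human brain are not stated in the post and are assumptions here; the fruit fly count and the years are the post's.

```python
C_ELEGANS_NEURONS = 302        # assumption: not stated in the post
FRUIT_FLY_NEURONS = 139_255    # figure from the post
PERIOD_YEARS = 2024 - 1986     # 38
HUMAN_NEURONS = 86e9           # assumption: rough, commonly cited figure

# Growth factor per period, which comes out to about 461.
growth = FRUIT_FLY_NEURONS / C_ELEGANS_NEURONS


def project(steps):
    """Extrapolate (year, neuron count) pairs forward from the 2024 data point."""
    return [(2024 + k * PERIOD_YEARS, FRUIT_FLY_NEURONS * growth ** k)
            for k in range(1, steps + 1)]


for year, neurons in project(3):
    print(year, f"{neurons:.3g}")
```

Under these numbers the human-scale count falls between the second and third steps, which lines up with the 2100 to 2140 window above. It is still an extrapolation from two data points.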
Wherein I bait reddit into over 70k views with some bullshit about AI:
https://reddit.com/r/Futurology/...
I almost wonder if their viewcount/usercount is real.
The model is real, my consultant background is not. -
A kind of verbose discussion of my earliest ideas and discussion with Nous LLM (Claude) about my new NAS/CL LLM model:
https://pastebin.com/YwjCMvRp -
You know the old adage "cost, speed, quality, pick two."
I've come up with my own.
Shitty jobs you'll forever be stuck in with no way out: unreasonable demands and coworkers that drive you insane, pay below the poverty line, no sleep.
Pick two.
More likely, pick three. -
Some notes from prior to developing my current language model:
https://miro.com/app/board/...
Started with ngrams, moved on from that, and the whole thing got away from me fast.
Working on building and training it on rgb-to-color categorization this week. Experiments are designed, just gotta implement it now. -
Remember my LLM post about 'ephemeral' tokens that aren't visible but change how tokens are generated?
Now GPT has them in the form of 'hidden reasoning' tokens:
https://simonwillison.net/2024/Sep/...
Something I came up with a year prior and put in my new black book, and they just got to the idea a week after I posted it publicly.
Just wanted to brag a bit. Someone at OpenAI has the same general vision I do. -
So apparently I own land in Dubai. Like three separate mortgages based on the email I received.
Your request (Mortgage Registration)
with request number xxxxx / 2024
has been completed
and you can print your issued certificate from this [link]
I've stripped out the numbers and link.
After confirming it was safe I followed through on an old spare cellphone, and yep, I own three mortgages for properties in Dubai.
Except obviously I don't.
Someone used my name, an American, to register mortgages in Dubai. *Nice* properties according to the pictures.
What started out as a scam email, or what looked like a scam email, went to an actual government of Dubai website, with real mortgage registrations.
How in the fuck does that happen?
The only thing I can think of is someone committed identity fraud, and/or an alphabet agency went through the list of known political dissidents, set up a bullshit mortgage in a questionable territory, and are now using that as a pretext to monitor 'extremists with foreign ties.'
All that for some guy on the west coast that hasn't attended a political rally in his entire life.
Must have been that sign I held at sixteen years old by the side of the road that said "Bush lied us into a war, and people died."
Or maybe it was that time I told a really enthusiastic Obama-supporting police officer that it amazed me Obama had time to win the Nobel Peace Prize what with all the bombings he carried out against foreign civilians. -
After a lot of work I figured out how to build the graph component of my LLM. Figured out the basic architecture, how to connect it in, and how to train it. The design and how-to is 100%.
Ironically generating the embeddings is slower than I expect the training itself to take.
A few extensions of the design will also allow bootstrapped and transfer learning, and as a reach, unsupervised learning but I still need to work out the fine details on that.
Right now, because of the design of the embeddings (different from standard transformers in a key aspect), they're slow. Like 10 tokens per minute on an i5 (python, no multithreading, no optimization at all, no training on gpu). I've come up with a modification that takes the token embeddings and turns them into hash keys, which should be significantly faster for a variety of reasons. Essentially I generate a tree of all weights, where the parent nodes are the mean of their immediate child nodes, split the tree on lesser-than-greater-than values, and then convert the node values to keys in a hashmap to make lookup very fast.
Weight comparison can be done either directly through tree traversal, or using normalized hamming distance between parent/child weight keys and the lookup weight.
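A small sketch of the second option, comparing keys by normalized Hamming distance. The tree and the actual key format aren't described in enough detail to reproduce, so the fixed-width digit keys, the helper names, and the sample weights below are assumptions for illustration.

```python
def weight_key(weight, digits=8):
    """Fixed-width digit key for a weight in [0, 1), e.g. 0.5 -> '50000000'."""
    n = min(int(round(weight * 10 ** digits)), 10 ** digits - 1)
    return str(max(n, 0)).zfill(digits)


def normalized_hamming(a, b):
    """Fraction of positions at which two equal-length keys differ."""
    if len(a) != len(b):
        raise ValueError("keys must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest_key(table, query):
    """Stored key closest to query under normalized Hamming distance."""
    return min(table, key=lambda k: normalized_hamming(k, query))


weights = [0.12, 0.5, 0.87]
table = {weight_key(w): w for w in weights}   # key -> canonical weight

query = weight_key(0.503)
print(table[nearest_key(table, query)])
```

Plain Hamming distance treats every digit position the same, so a mismatch in the first digit costs no more than one in the last; whether that is acceptable depends on how the keys are laid out.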
That last bit is designed already and just needs to be implemented, but it is completely doable.
The design itself is 100% attention free incidentally.
I'm outlining the step by step, only the essentials to train a word boundary detector, noun detector, verb detector, as I already considered prior. But now I'm actually able to implement it.
The hard part was figuring out the *graph* part of the model, not the NN part (if you could even call it an NN, which it doesn't fit the definition of, but I don't know what else to call it). Determining what the design would look like, the necessary graph token types, what function they should have, *how* they use the context, how that's calculated, how loss is to be calculated, and how to train it.
I'm happy to report all that is now settled.
I'm hoping to get more work done on it on my day off, but that's seven days away, 9-10 hour shifts, working fucking Burger King and all I want to do is program.
And all because no one takes me seriously due to not having a degree.
Fucking aye. What is life.
If I had a laptop and insurance and taxes weren't a thing, I'd go live in my car and code in a fucking mcdonalds or a park all day and not have to give a shit about any of these other externalities like earning minimum wage to pay 25% of it in rent a month and 20% in taxes and other government bullshit. -
Here's some research into a new LLM architecture I recently built and have had actual success with.
The idea is simple: you do the standard thing of generating random vectors for your dictionary of tokens, we'll call these numbers your 'weights'. Then, for whatever sentence you want to use as input, you generate a context embedding by looking up those tokens, and putting them into a list.
Next, you do the same for the output you want to map to, let's call it the decoder embedding.
You then loop and generate a 'noise embedding' for each vector or individual token in the context embedding, then subtract that token's noise value from that token's embedding value or specific weight.
You find the weight index in the weight dictionary (one entry per word or token in your token dictionary) that's closest to this embedding. You use a version of cuckoo hashing where similar values are stored near each other, and the canonical weight values are actually the key of each key:value pair in your token dictionary. When doing this you align all random numbered keys in the dictionary (a uniform sample from 0 to 1), and look at hamming distance between the context embedding+noise embedding (called the encoder embedding) versus the canonical keys, with each digit from left to right being penalized by some factor f (because digits further left are larger magnitudes), and then penalize or reward based on the numeric closeness of any given individual digit of the encoder embedding at the same index of any given weight i.
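One way to read the digit comparison above, as a hedged sketch: weights are scalars in [0, 1) turned into fixed-width digit strings, and the distance at each position is the digit difference scaled by f**i, so the leftmost digits count most. The scalar weights, the value of f, the noise size, and all names are assumptions; the cuckoo-hashing layout isn't modeled at all.

```python
def digit_key(value, digits=6):
    """Fixed-width digit string for a value in [0, 1)."""
    n = min(int(round(value * 10 ** digits)), 10 ** digits - 1)
    return str(max(n, 0)).zfill(digits)


def weighted_digit_distance(a, b, f=0.5):
    """Digit-wise distance where position i is scaled by f**i (leftmost counts most)."""
    return sum((f ** i) * abs(int(x) - int(y))
               for i, (x, y) in enumerate(zip(a, b)))


def lookup(token_weights, encoder_value, f=0.5):
    """Token whose canonical weight key is closest to the encoder value."""
    query = digit_key(encoder_value)
    return min(token_weights,
               key=lambda tok: weighted_digit_distance(
                   digit_key(token_weights[tok]), query, f))


# Stand-ins for uniform random draws, fixed so the example is repeatable.
token_weights = {"the": 0.844422, "cat": 0.757954, "sat": 0.420572}

noise = 0.0001                                  # hypothetical small noise value
encoder_value = token_weights["cat"] - noise    # weight with the noise applied
print(lookup(token_weights, encoder_value))
```

With small noise the lookup lands back on the token the value came from; larger noise would push it toward a neighboring weight, and a mismatch in an early digit outweighs any number of mismatches in later ones.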
You then substitute the canonical weight in place of this encoder embedding, look up that weight's index (in my earliest version), and then use that index to look up the word|token in the token dictionary and compare it to the word at the current index of the training output to match against.
Of course by switching to the hash version the lookup is significantly faster, but I digress.
That introduces a problem.
If each input token matches one output token how do we get variable length outputs, how do we do n-to-m mappings of input and output?
One of the things I explored was using pseudo-markovian processes, where there's one node, A, with two links to itself, B, and C.
B is a transition matrix, and A holds its own state. At any given timestep, A may use either the default transition matrix (training data encoder embeddings) with B, or it may generate new ones, using C and a context window of A's prior states.
C can be used to modify A, or it can be used as a noise embedding to modify B.
A can take on the state of both A and C or A and B. In fact we do both, and measure which is closest to the correct output during training.
What this *doesn't* do is give us variable length encodings or decodings.
So I thought a while and said, if we're using noise embeddings, why can't we use multiple?
And if we're doing multiple, what if we used a middle layer, let's call it the 'key', and took its mean over *many* training examples, and used it to map from the variance of an input (query) to the variance and mean of a training or inference output (value).
But how does that tell us when to stop or continue generating tokens for the output?
Posted on pastebin if you want to read the whole thing (DR wouldn't post for some reason).
In any case I wasn't sure if I was dreaming or if I was off in left field, so I went and built the damn thing, the autoencoder part, wasn't even sure I could, but I did, and it just works. I'm still scratching my head.
https://pastebin.com/xAHRhmfH -
I wasn't gone, I was just working.
Anyway, I had some fun and wrote a simple 10 minute little precursor to an ngram implementation:
(when your comments are as long as the code, lol)
https://pastebin.com/bZVh8YSP
It obviously doesn't do type checking, or valid value checking, or any of that, and there may be an old comment or two adding cruft but whatever -
I want to remake hunger games.
But it's just that last 2nd or 3rd scene where catpiss everclear is ugly crying and neurotically yelling at her sister's cat, for like two hours straight.
RIP Finnick. The only good character. Otherwise it would have been a completely unwatchable series of movies.
Sometimes I wake up and just choose violence. Fight me. -
I had the idea that part of the problem of NN and ML research is we all use the same standard loss and nonlinear functions. In theory most NN architectures are universal approximators. But there's a big gap between symbolic and numeric computation.
But some of our bigger leaps in improvement weren't just from new architectures, but entire new approaches to how data is transformed, and how we calculate loss, for example KL divergence.
And it occurred to me all we really need is training/test/validation data, and with the right approach we can let the system discover not just the architecture (been done before), but also the nonlinear and loss functions themselves, and see what pops out the other side as a result.
If a network can instrument its own code as it were, maybe it'd find new and useful nonlinear functions and losses. Networks wouldn't just specify a conv layer here, or a maxpool there, but derive implementations of these all on their own.
More importantly with a little pruning, we could even use successful examples for bootstrapping smaller more efficient algorithms, all within the graph itself, and use genetic algorithms to mix and match nodes at training time to discover what works or doesn't, or do training, testing, and validation in batches, to anneal a network in the correct direction.
By generating variations of successful nodes and graphs, and using substitution, we can use comparison to minimize error (for some measure of error over accuracy and precision), and select the best graph variations, without strictly having to do much point mutation within any given node, minimizing deleterious effects, sort of like how gene expression leads to unexpected but fitness-improving results for an entire organism, while point-mutations typically cause disease.
It might seem like this wouldn't work out the gate, just on the basis of intuition, but I think the benefit of working through node substitutions or entire subgraph substitution, is that we can check test/validation loss before training is even complete.
If we train a network to specify a known loss, we can even have that evaluate the networks themselves, and run variations on our network loss node to find better losses during training time, and at some point let nodes refer to these same loss calculation graphs, within themselves, switching between them dynamically, via variation and substitution.
I could even envision probabilistic lists of jump addresses, or mappings of value ranges to jump addresses, or having await() style opcodes on some nodes that upon being encountered, queue-up ticks from upstream nodes whose calculations the await()ed node relies on, to do things like emergent convolution.
I've written all the classes and started on the interpreter itself, just a few things that need to be fleshed out now.
Here's my shitty little partial sketch of the opcodes and ideas.
https://pastebin.com/5yDTaApS
I think I'll teach it to do convolution, color recognition, maybe try mnist, or teach it step by step how to do sequence masking and prediction, dunno yet. -
pandas can suck my balls.
N
I
H
I'd rather roll my own.
edit: but also xgboost can suck my balls.
Treating every OBVIOUSLY continuous-valued entry as a 'category'.
All searches for this problem turn up tutorials and documentation on how to CONVERT continuous and numeric values into classes or categories.
Not a single fucking document addresses the problem of when pandas or xgboost refuses to treat numeric inputs as numerics and insists on pushing an error that your data is categorical when every fucking inspection shows the type as numeric.
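A hedged sketch of one common cause, which may or may not be the one hit here: a column that looks numeric when printed but is stored as strings, so its dtype is not a numeric one. The column names and values are made up, and this assumes pandas is installed.

```python
import pandas as pd

df = pd.DataFrame({
    "price": ["1.5", "2.25", "n/a"],   # numbers that arrived as strings
    "qty": [1, 2, 3],
})

print(df.dtypes)   # price is reported as a string/object column, not float

# Coerce anything that should be numeric; unparseable entries become NaN.
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# List whatever is still not numeric or boolean before handing it to a model.
non_numeric = df.select_dtypes(exclude=["number", "bool"]).columns.tolist()
print(df.dtypes)
print(non_numeric)
```

A single stray string in an otherwise numeric column is enough to flip the whole column's dtype, so checking `df.dtypes` column by column usually narrows it down faster than inspecting individual values.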