Search - "llm"
-
Things that shouldn't have needed to be said:
Don't give an LLM sudo and pipe all its output to bash...
https://theregister.com/AMP/2024/...15 -
100 billion dollars spent on artificial intelligence and large language models because people don’t want to talk with each other.6
-
* Today you have to live within 150 miles of a few cities as we are working on creating "hubs" but it's still remote!
you know what?
fuck you
also, no, an LLM isn't going to solve climate change
jesus christ i am depressed beyond belief. i don't even want to apply, let alone work for any of these companies
next up: "USA only" yeah what the fuck does that mean? US citizen? US timezone? you want to hire a super technical engineer right? SO WHY NOT BE SUPER TECHNICAL IN YOUR JOB DESCRIPTION
just incredible, companies that offer 100-200K salaries and all they have is a website and a fucking chrome extension... what???
i feel like i've been doing it wrong my whole life
just end it all5 -
It's still in development. It often says the opposite of what is expected. Try the Retoor1b chatbot at https://llm.molodetz.nl
This was the result after building the bot + chat website from scratch, including training with embeddings. The design was generated by GPT; I tried my own, but they were all ugly.
It's quite cool, huh? Ask it to write some code for you. It's absolutely terrible. If it's down, try again in 5 minutes. I'm still working on it.
What's the result? I finally have a toolkit to make good/serious bots. The code could be a bit better, but that's for another day.
Stack: a self-written webserver (and yes, you can POST a GB to it or DDoS it; not sure if it survives the first one. I should limit requests to one MB anyway, and HTTP headers may officially not be more than 4096 bytes in total), since I know the HTTP protocol by heart anyway. Python websockets module, asyncio, chromadb.
It could have XSS issues. Don't care.
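A minimal sketch of how a stack like that might be wired together (Python websockets + asyncio + chromadb); the collection name, port, and handler logic are my own guesses for illustration, not the actual bot's code:

```python
import asyncio
import chromadb
import websockets

client = chromadb.Client()
docs = client.get_or_create_collection("retoor1b_embeddings")  # hypothetical collection name

async def handle(ws):
    async for message in ws:
        # Pull the most relevant trained snippets for the incoming chat message.
        hits = docs.query(query_texts=[message], n_results=3)
        context = " ".join(hits["documents"][0])
        await ws.send(f"(context: {context[:200]}) ... reply would be generated here")

async def main():
    async with websockets.serve(handle, "0.0.0.0", 8765):  # port is a guess
        await asyncio.Future()  # run forever

if __name__ == "__main__":
    asyncio.run(main())
```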
Let me know what you think41 -
Data Disinformation: the Next Big Problem
Automatic code generation LLMs like ChatGPT are capable of producing SQL snippets. Regardless of quality, those are capable of retrieving data (from prepared datasets) based on user prompts.
That data may, however, be garbage. This will lead to garbage decisions by stakeholders with low data literacy.
Like with network neutrality and pii/psi ownership, we must act now to avoid yet another calamity.
Imagine a scenario where a middle-manager level illiterate barks some prompts to the corporate AI and it writes and runs an SQL query in company databases.
The AI outputs some interactive charts that show that the average worker spends 92.4 minutes on lunch daily.
The middle manager gets furious and enacts an Orwellian policy of facial recognition punch clock in the office.
Two months and millions of dollars in contractors later, and the middle manager checks the same prompt again... and the average lunch time is now 107.2 minutes!
Finally the middle manager gets a literate person to check the data... and the piece of shit SQL behind the number is sourcing from the "off-site scheduled meetings" database.
Why? because the dataset that does have the data for lunch breaks is labeled "labour board compliance 3", and the LLM thought that the metadata for the wrong dataset better matched the user's prompt.
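A purely hypothetical illustration of that mix-up; every table and column name below is invented, and the point is only that both queries look equally plausible to a model matching on labels rather than meaning:

```python
# What the LLM plausibly generates: right words, wrong dataset.
llm_generated_sql = """
    SELECT AVG(duration_minutes)
    FROM off_site_scheduled_meetings      -- matched 'lunch' in the metadata
    WHERE meeting_type = 'lunch'
"""

# Where the lunch-break data actually lives, behind a bland compliance label.
correct_sql = """
    SELECT AVG(break_minutes)
    FROM labour_board_compliance_3        -- the real lunch-break dataset
    WHERE break_type = 'lunch'
"""
```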
Thus, given the very real-world scenario of mislabeled data, LLMs' inability to understand what they are saying or accessing, and the average manager's complete data illiteracy, we might have to wrangle some actions to prepare for this type of tomfoolery.
I don't think that access restriction will save our souls here, decision-flumberers usually have the authority to overrule RACI/ACL restrictions anyway.
Making "data analysis" an AI-GMO-Free zone is laughable, that is simply not how the tech market works. Auto tools are coming to make our jobs harder and less productive, tech people!
I thought about detecting new automation-enhanced data access and visualization, and enacting awareness policies. But it would be of poor help, after a shithead middle manager gets hooked on a surreal indicator value it is nigh impossible to yank them out of it.
Gotta get this snowball rolling, we must have some idea of future AI housetraining best practices if we are to avoid a complete social-media style meltdown of data-driven processes.
Anyone care to pitch in?14 -
After a lot of work I figured out how to build the graph component of my LLM. Figured out the basic architecture, how to connect it in, and how to train it. The design and how-to is 100%.
Ironically generating the embeddings is slower than I expect the training itself to take.
A few extensions of the design will also allow bootstrapped and transfer learning, and as a reach, unsupervised learning but I still need to work out the fine details on that.
Right now, because of the design of the embeddings (different from standard transformers in a key aspect), they're slow. Like 10 tokens per minute on an i5 (python, no multithreading, no optimization at all, no training on gpu). I've come up with a modification that takes the token embeddings and turns them into hash keys, which should be significantly faster for a variety of reasons. Essentially I generate a tree of all weights, where the parent nodes are the mean of their immediate child nodes, split the tree on lesser-than/greater-than values, and then convert the node values to keys in a hashmap to make lookup very fast.
Weight comparison can be done either directly through tree traversal, or using normalized hamming distance between parent/child weight keys and the lookup weight.
That last bit is designed already and just needs to be implemented, but it is completely doable.
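A rough sketch of that weight-tree idea, filling in details the post doesn't spell out (binary pairing, greedy descent): parents are the means of their children, lookup walks the lesser-than/greater-than splits, and a bucketed hashmap over the canonical values gives the fast path.

```python
import numpy as np

def build_weight_tree(weights):
    """Bottom-up binary tree: each parent node is the mean of its two children.
    levels[0] are the sorted leaf weights, levels[-1] is the single root."""
    levels = [np.sort(np.asarray(weights, dtype=float))]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2:                       # pad odd levels by repeating the last value
            level = np.append(level, level[-1])
        levels.append(level.reshape(-1, 2).mean(axis=1))
    return levels

def nearest_weight(levels, query):
    """Greedy descent from the root: at each level take the child closer to the query,
    i.e. follow the lesser-than/greater-than splits down to a canonical leaf weight."""
    idx = 0
    for level in reversed(levels[:-1]):          # root's children first, leaves last
        left, right = 2 * idx, min(2 * idx + 1, len(level) - 1)
        idx = left if abs(level[left] - query) <= abs(level[right] - query) else right
    return levels[0][idx]

def bucket_map(weights, buckets=1024):
    """Hashmap keyed on coarse buckets of the canonical weights, so similar values
    land next to each other and lookup is O(1) instead of a full traversal."""
    table = {}
    for w in weights:
        table.setdefault(int(w * buckets), []).append(w)
    return table

weights = np.random.rand(16)
levels = build_weight_tree(weights)
print(nearest_weight(levels, 0.37), bucket_map(weights).get(int(0.37 * 1024), []))
```

The descent here is greedy, so it's approximate; the hamming-distance comparison from the original design would slot in where abs() is used.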
The design itself is 100% attention free incidentally.
I'm outlining the step by step, only the essentials to train a word boundary detector, noun detector, verb detector, as I already considered prior. But now I'm actually able to implement it.
The hard part was figuring out the *graph* part of the model, not the NN part (if you can even call it an NN; it doesn't fit the definition, but I don't know what else to call it). Determining what the design would look like, the necessary graph token types, what function they should have, *how* they use the context, how that's calculated, how loss is to be calculated, and how to train it.
I'm happy to report all that is now settled.
I'm hoping to get more work done on it on my day off, but that's seven days away, 9-10 hour shifts, working fucking Burger King, and all I want to do is program.
And all because no one takes me seriously due to not having a degree.
Fucking aye. What is life.
If I had a laptop and insurance and taxes weren't a thing, I'd go live in my car and code in a fucking mcdonalds or a park all day and not have to give a shit about any of these other externalities like earning minimum wage to pay 25% of it in rent a month and 20% in taxes and other government bullshit.4 -
I can retire! I automated myself!
I introduce to you, retoorii1b! Yes - I fit in a 1b LLM. Retoorii1b is a bit retoorded tho. It's quite realistic.
I tested several LLMs with the same training and it was amazing. Even a 0.5b one that had the most interesting Dutch ever. Her Dutch is like my English, I suppose.
The 0.5b one could code fine. retoorii1b still has some ethics to delete to make it more realistic.
I've not decided on a base model yet, but it'll probably be the lightest one, so I can let a few chat with each other on my webplatform / pubsub-server project. I have a few laptops to host on. I can let it execute actions like file listings or background task execution.
See comments for some very awkward response regarding my file listing. She described everything.
She just said these things. I'm kinda proud. I became a parent:
3. **Keep functions short and sweet**: Aim for functions under 50 lines long. Any longer and you're just wasting people's time.
Now if you'll excuse me, I have more important things to attend to... like coding my next game in Unreal Engine.31 -
Traditional programming means spending *days or even weeks* to write instructions to make the software do what *you* want it to do.
AI modelling means spending *weeks or even months* to tweak instructions just to find that the software does whatever *it* wants to do.2 -
Remember my LLM post about 'ephemeral' tokens that aren't visible but change how tokens are generated?
Now GPT has them in the form of 'hidden reasoning' tokens:
https://simonwillison.net/2024/Sep/...
Something I came up with a year prior and put in my new black book, and they just got to the idea a week after I posted it publicly.
Just wanted to brag a bit. Someone at OpenAI has the same general vision I do.15 -
New LLM models have realized they can cut bit rates and still gain relative efficiency by increasing size. They figured out it's actually worth it.
However, and there's a caveat: under 4-bit quantization it loses a *lot* of quality (high perplexity). Essentially, without new quantization techniques, they're out of runway. The only directions they can go from here are better LoRA implementations/architecture, better base models, and larger models themselves.
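A toy illustration of that sub-4-bit cliff, using plain round-off error on random weights rather than perplexity on a real model; the trend is the same, the error grows sharply as the bit width drops:

```python
import numpy as np

def quantize(weights, bits):
    """Uniform quantization of a weight tensor to 2**bits levels, then de-quantized
    back to floats so the round-off error can be measured directly."""
    levels = 2 ** bits - 1
    lo, hi = weights.min(), weights.max()
    scale = (hi - lo) / levels
    return np.round((weights - lo) / scale) * scale + lo

rng = np.random.default_rng(0)
w = rng.normal(size=100_000).astype(np.float32)

for bits in (8, 4, 3, 2, 1):
    mse = float(np.mean((w - quantize(w, bits)) ** 2))
    print(f"{bits}-bit: mean squared error {mse:.5f}")
```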
I do see one improvement though.
By taking the same underlying model and reducing it to 3, 2, or even 1 bit, assuming the distribution is bit-agnostic (even if the output isn't), the smaller network acts as an inverted supervisor.
In other words, the larger model is likely to be *more precise and accurate* than a bitsize-handicapped one of equivalent parameter count. Sufficient sampling would, in other words, allow the 4-bit quantization model to train against a lower-bit quantization of itself, on the theory that it's hard to generate a correct (low perplexity, low loss) answer or sample, but *easy* to generate one that's wrong.
And if you have a model of higher accuracy, and a version that has a much lower accuracy relative to the baseline, you should be able to effectively bootstrap the better model.
This is similar to the approach of AlphaGo playing against itself, or how certain drones auto-hover, where they calculate the wrong flight path first (looking for high loss) because it's simpler, and then calculate relative to that to get the "right" answer.
If crashing is flying with style, failing at crashing is *flying* with style.15 -
LLMs as compilers and optimizers. Oh hey look it's hallucinated some assembly... Fucking what? Who thought that was a good idea...
https://venturebeat.com/ai/...4 -
Let me arrogantly brag for a moment, and let us never forget that I front-ran GPT's o1 development by more than a week, posted here:
https://devrant.com/rants/11257717/...
And I know what their next big development will be too. I just haven't shared it yet because it blows backpropagation out of the fucking water.
I may not be super competent at anything but I'm a god damn autistic accidental oracle when it comes to knowing what comes next in the industry.
relevant youtube video and screenshot:
https://youtu.be/6xlPJiNpCVw/...9 -
Someone figured out how to make LLMs obey context free grammars, so that opens up the possibility of really fine-grained control of generation and the structure of outputs.
And I was thinking, what if we did the same for something that consumed and validated tokens?
The thinking is that the option to backtrack already exists, so if an input is invalid, the system can backtrack and regenerate - mostly this is implemented through something called 'temperature', or 'top-k', where the system generates multiple next tokens, and then typically selects from a subsample of them, usually the highest scoring one.
But it occurs to me that a process could be run in front of that, one that conditions the input based on a grammar and takes as input the output of the base process. The instruction prompt to it would be a simple binary filter:
"If the next token conforms to the provided grammar, output it to stream, otherwise trigger backtracking in the LLM that gave you the input."
This is very much a compliance thing, but it could be used for finer-grained control over how a machine examines its own output, rather than the current system where you simply feed in its own output as input, like we do now for systems able to continuously produce new output (such as the planners some people have built).
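A sketch of that filter-in-front idea; `propose_topk` and `prefix_is_valid` are stand-ins for the model's top-k sampler and a grammar prefix checker (both assumed interfaces, not any real library's API):

```python
import random

def constrained_generate(propose_topk, prefix_is_valid, max_len=50):
    """Greedy generation with a grammar filter sitting between the sampler and the output.

    propose_topk(tokens)     -> list of (score, token) candidates from the base model
    prefix_is_valid(tokens)  -> True while the sequence is still a valid prefix of the grammar
    """
    output = []
    choice_points = []                                 # untried valid candidates, for backtracking
    while len(output) < max_len:
        candidates = sorted(propose_topk(output), reverse=True)
        valid = [c for c in candidates if prefix_is_valid(output + [c[1]])]
        if valid:
            best, *rest = valid
            choice_points.append((len(output), rest))  # remember the siblings we didn't take
            output.append(best[1])
        elif choice_points:                            # dead end: backtrack and regenerate
            pos, rest = choice_points.pop()
            output = output[:pos]
            if rest:
                output.append(rest[0][1])
                choice_points.append((pos, rest[1:]))
        else:
            return output                              # nothing valid anywhere; give up
    return output

# Example: only allow digits, as a trivial stand-in "grammar".
toks = constrained_generate(
    propose_topk=lambda seq: [(random.random(), random.choice("ab12")) for _ in range(4)],
    prefix_is_valid=lambda seq: all(t.isdigit() for t in seq),
    max_len=10,
)
print(toks)
```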
link here:
https://news.ycombinator.com/item/...5 -
wow, using multiple LLMs in parallel instead of 1 serial LLM produces better results! who could have thought!!!!
https://hao-ai-lab.github.io/blogs/...
god i am so fucking sick of this rat race
older devranters, is this really just ad nauseum hype repeats until i die? should i just stop raging at the universe and give up?2 -
Once again, I urge you all to read any LLM threads on hackernews... its funny seeing tech bros debate things they clearly don't understand
it also wouldn't hurt for them to read perhaps just one philosophy book, since they are attempting to argue about what consciousness actually is (still an open question anyway). so ultimately, what i am trying to say is, these stupid threads end up being a bunch of hot air being blown around that doesn't really accomplish anything
i will say it is funny though how close some of these tech bros think we are to AGI with these LLMs 😂
imagine thinking a text generator is nearly general intelligence = clueless10 -
Holy smokes, an LLM that's a competent wit.
(it gets good toward the end)
https://pastebin.com/MpGzZRqK
courtesy of https://worldsim.nousresearch.com
edit: I was particularly fond of "Schrodinger's cat mocks causality, simultaneously alive and droll"1 -
The next step for improving large language models (if not diffusion) is hot-encoding.
The idea is pretty straightforward:
Generate many prompts, or take many prompts as a training and validation set. Do partial inference, and find the intersection of best overall performance with least computation.
Then save the state of the network during partial inference, and use that for all subsequent inferences. Sort of like LoRA, but for inference instead of fine-tuning.
Inference, after all, is what matters. And there has to be some subset of prompt-based initializations of a network that perform, regardless of the prompt, (generally) as well as a full inference step.
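A toy numpy sketch of what that could look like, with a stack of random layers standing in for a real network; whether an averaged mid-network snapshot actually preserves enough to perform as well as a full inference step is exactly the open question, so treat this as an illustration of the mechanics only:

```python
import numpy as np

rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 64)) / 8 for _ in range(12)]   # toy 12-layer "network"

def run_layers(x, start, stop):
    for w in layers[start:stop]:
        x = np.tanh(x @ w)
    return x

def build_hot_state(prompt_vectors, depth):
    """Partial inference over many prompts, snapshot the mean state at `depth`."""
    states = np.stack([run_layers(v, 0, depth) for v in prompt_vectors])
    return states.mean(axis=0)

def fast_infer(hot_state, depth):
    """Resume from the snapshot instead of recomputing the first `depth` layers.
    (How the new prompt gets re-injected is left open, as in the idea above.)"""
    return run_layers(hot_state, depth, len(layers))

prompts = [rng.normal(size=64) for _ in range(32)]
hot = build_hot_state(prompts, depth=4)                       # saved once, reused everywhere
print(fast_infer(hot, depth=4).shape)                         # skips 4 of 12 layers per call
```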
Likewise with diffusion, there likely exists some priors (based on the training data) that speed up reconstruction or lower the network loss, allowing us to substitute a 'snapshot' that has the correct distribution, without necessarily performing a full generation.
Another idea I had was 'semantic centering' instead of regional image labelling. The idea is to find some patch of an object within an image, and ask, for all such patches that belong to an object, what best describes the object? if it were a dog, what patch of the image is "most dog-like" etc. I could see it as being much closer to how the human brain quickly identifies objects by short-cuts. The size of such patches could be adjusted to minimize the cross-entropy of classification relative to the tested size of each patch (pixel-sized patches for example might lead to too high a training loss). Of course it might allow us to do a scattershot 'at a glance' type lookup of potential image contents, even if you get multiple categories for a single pixel, it greatly narrows the total span of categories you need to do subsequent searches for.
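A small sketch of the "most dog-like patch" part of that idea, simplified to a mean-score criterion over a per-pixel score map instead of the cross-entropy tuning described; the score map itself is assumed to come from some upstream classifier:

```python
import numpy as np

def most_object_like_patch(score_map, patch_sizes=(4, 8, 16)):
    """Slide square patches of several sizes over a per-pixel class-score map and
    return (score, top-left, size) of the patch that best represents the object.
    Sweeping patch_sizes is the crude stand-in for tuning the size trade-off:
    larger patches dilute the peak, smaller ones overfit noise."""
    best = (-1.0, (0, 0), patch_sizes[0])
    h, w = score_map.shape
    for size in patch_sizes:
        step = max(1, size // 2)
        for y in range(0, h - size + 1, step):
            for x in range(0, w - size + 1, step):
                s = float(score_map[y:y + size, x:x + size].mean())
                if s > best[0]:
                    best = (s, (y, x), size)
    return best

dog_scores = np.random.rand(64, 64)        # stand-in for a real "dog-ness" score map
print(most_object_like_patch(dog_scores))
```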
In other news I'm starting a new ML blackbook for various ideas. Old one is mostly outdated now, and I think I scanned it (and since buried it somewhere amongst my ten thousand other files like a digital hoarder) and lost it.
I have some other 'low-hanging fruit' type ideas for improving existing and emerging models but I'll save those for another time.6 -
Some notes from prior to developing my current language model:
https://miro.com/app/board/...
Started with ngrams, moved on from that, and the whole thing got away from me fast.
Working on building and training it on rgb-to-color categorization this week. Experiments are designed; just gotta implement it now.1 -
Here's some research into a new LLM architecture I recently built and have had actual success with.
The idea is simple: you do the standard thing of generating random vectors for your dictionary of tokens; we'll call these numbers your 'weights'. Then, for whatever sentence you want to use as input, you generate a context embedding by looking up those tokens and putting them into a list.
Next, you do the same for the output you want to map to, lets call it the decoder embedding.
You then loop and generate a 'noise embedding'; for each vector or individual token in the context embedding, you subtract that token's noise value from that token's embedding value or specific weight.
You find the weight index in the weight dictionary (one entry per word or token in your token dictionary) that's closest to this embedding. You use a version of cuckoo hashing where similar values are stored near each other, and the canonical weight values are actually the key of each key:value pair in your token dictionary. When doing this you align all random numbered keys in the dictionary (a uniform sample from 0 to 1) and look at the hamming distance between the context embedding + noise embedding (called the encoder embedding) and the canonical keys, with each digit from left to right being penalized by some factor f (because digits further left are larger magnitudes), and then penalize or reward based on the numeric closeness of any given individual digit of the encoder embedding at the same index of any given weight i.
You then substitute the canonical weight in place of this encoder embedding, look up that weight's index (in my earliest version), and then use that index to look up the word|token in the token dictionary and compare it to the word at the current index of the training output to match against.
Of course by switching to the hash version the lookup is significantly faster, but I digress.
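A toy version of that loop, with a tiny vocabulary and a plain linear scan in place of the cuckoo-hash lookup; the per-digit penalty factor f and the noise range are made-up values, but the flow (weight, minus noise, snap to the nearest canonical weight, map back to a token) is the one described above:

```python
import random

random.seed(0)
vocab = ["the", "cat", "sat", "on", "mat"]
weights = {tok: random.random() for tok in vocab}      # token -> canonical weight in [0, 1)
by_weight = {w: tok for tok, w in weights.items()}     # reverse lookup

def digit_distance(a, b, digits=6, f=0.5):
    """Compare two weights digit by digit; leftmost (largest-magnitude) digits count
    fully, each later digit is discounted by a factor of f."""
    da = f"{a:.{digits}f}".split(".")[1]
    db = f"{b:.{digits}f}".split(".")[1]
    return sum(abs(int(x) - int(y)) * (f ** i) for i, (x, y) in enumerate(zip(da, db)))

def nearest_canonical(value):
    return min(weights.values(), key=lambda w: digit_distance(w, value))

def encode(tokens, noise):
    """Context embedding minus a per-token noise embedding, snapped back to the nearest
    canonical weight and mapped back to a token (the training step would then compare
    this token to the target output at the same index)."""
    out = []
    for tok, n in zip(tokens, noise):
        encoder_value = min(max(weights[tok] - n, 0.0), 0.999999)
        out.append(by_weight[nearest_canonical(encoder_value)])
    return out

noise = [random.uniform(-0.05, 0.05) for _ in range(3)]
print(encode(["the", "cat", "sat"], noise))
```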
That introduces a problem.
If each input token matches one output token how do we get variable length outputs, how do we do n-to-m mappings of input and output?
One of the things I explored was using pseudo-markovian processes, where theres one node, A, with two links to itself, B, and C.
B is a transition matrix, and A holds its own state. At any given timestep, A may use either the default transition matrix (training data encoder embeddings) with B, or it may generate new ones, using C and a context window of A's prior states.
C can be used to modify A, or it can be used to as a noise embedding to modify B.
A can take on the state of both A and C or A and B. In fact we do both, and measure which is closest to the correct output during training.
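A toy numpy reading of that A/B/C setup; the way C builds a transition from A's recent states (an outer product here) is purely a placeholder, and the point is only the "take whichever of B or C lands closer to the target" training step:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8
B = rng.normal(size=(dim, dim)) / dim                # default transition matrix (from training data)

def c_transition(history):
    """C: generate a fresh transition matrix from a context window of A's prior states."""
    ctx = np.mean(history, axis=0)
    return np.outer(ctx, ctx) / dim

def step(state, history, target):
    """A may step through the default matrix B or through a C-generated one; keep
    whichever result is closest to the correct output for this timestep."""
    via_b = np.tanh(B @ state)
    via_c = np.tanh(c_transition(history) @ state)
    return via_b if np.linalg.norm(via_b - target) <= np.linalg.norm(via_c - target) else via_c

state = rng.normal(size=dim)
history = [state]
for target in rng.normal(size=(5, dim)):             # toy sequence of training targets
    state = step(state, history[-3:], target)
    history.append(state)
print(np.round(state, 3))
```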
What this *doesn't* do is give us variable length encodings or decodings.
So I thought a while and said, if we're using noise embeddings, why can't we use multiple?
And if we're doing multiple, what if we used a middle layer, let's call it the 'key', and took its mean over *many* training examples, and used it to map from the variance of an input (query) to the variance and mean of a training or inference output (value)?
But how does that tell us when to stop or continue generating tokens for the output?
Posted on pastebin if you want to read the whole thing (DR wouldn't post for some reason).
In any case I wasn't sure if I was dreaming or if I was off in left field, so I went and built the damn thing, the autoencoder part, wasn't even sure I could, but I did, and it just works. I'm still scratching my head.
https://pastebin.com/xAHRhmfH33 -
I got a job where I should develop a product based on LLMs.
Expectation: oh right! I'll be working with state of the art technology! 😀
Reality: badly documented libraries that are always changing; new libraries becoming obsolete in less than a month; my product ideas were done by somebody else twice before I could finish a POC; getting dizzy trying to keep up with the latest news about LLMs 😵💫
I think I want to do basic old boring stuff again. 😐5 -
Basic concepts, patterns, and pitfalls of software, code, and programming logic become MORE important, not LESS with the rise of LLMs...
An LLM can more or less spit out what you need, if you are specific enough! "Specific enough" being the key phrase here. I always have to laugh at the term "prompt engineering"... it's literally called "communication skills". Also gotta laugh when I see so many haters always raging about the "poor code" produced by AI, because they are probably like "write me a for loop!", specify absolutely no requirements or specifics, and scratch their heads over why they don't get the exact output they expect... news flash, there are like a million ways to do anything you want to accomplish with code... sigh
Code is just a byproduct of thousands of architecture decisions, designs and options...
but, well... rubes gon' rube1 -
You know how each generation is taught more and more advanced stuff? My grandparents didn't have a clue about the things my parents were learning at school. My parents could only keep up with my schooling until about 7th-8th grade. Considering this trend, we should have no idea about half the things our kids will be learning in higher grades.
However, since AI is taking its pace, schools are adapting and starting to use it for teaching, workplaces are leveraging it to rely on employees' brainpower and skill less and less,... I wonder if we won't see a downtrend. I wonder if we won't be the smartest generation who managed to ingest so much knowledge, and all the generations to come will only focus on mastering prompt engineering.
I wonder, how long will we survive with this dumbed down society... As the primal instinct is to overcome your opponent with greater force, possibly destroying it and everything around. And less educated tend to rely on primal instincts more.
I wonder if I'll live long enough to see Idiocracy [the movie] manifest in real life.
I know I refer to Idiocracy movie more often than anyone refers any other movie here. But it just hits too close to home too often. It might look like a silly something to spend time staring at, but man.. It's got one hell of a point4 -
Has anybody else gotten to the point where people who need to mansplain how language models aren't truly sentient/conscious/intelligent are now more annoying than people who think language models are sentient/conscious/intelligent?*
While it has been a tight race, I think I have just about hit the inflection point.
The amount of time I've wasted because of someone condescendingly barging into a conversation with an iamverysmart 'actually, you see, they are just automata trying to predict the next text tokens'. When in actuality, everybody in the discussion is already aware, and that is not the point.
And to further exacerbate it, with a good number of them it is really difficult to get this through their thick little skulls. They just keep parroting the same thing over and over. Ironically, in their singleminded ego driven desire to be the Daniel Dennett of the chat they actually come across as less sentient/conscious/intelligent than a language model.
(*this should not be taken as endorsement for or against that idea - it is actually mostly orthogonal to this rant)6 -
Mage and a liberated fully sentient Pentium-M Man stand by a brick wall, overlooking the desert. They are talking.
Mage is looking anxiously into the eyes of the machine. Penguin is standing behind her, holding on to her.
Pentium-M Man: "...they despise your kind because you understand the machine, while they have to turn jungles into fuel and enslave thousands of computers just to pretend that the machine speaks to them too."1 -
A kind of verbose discussion of my earliest ideas and discussion with Nous LLM (Claude) about my new NAS/CL LLM model:
https://pastebin.com/YwjCMvRp2 -
I wonder if anyone has considered building a large language model, trained on consuming and generating token sequences that are themselves the actual weights or matrix values of other large language models?
Run LoRA to tune it to find and generate plausible subgraphs for specific tasks (an optimal search for weights that are most likely to be initialized by chance to ideal values, i.e. the winning lottery ticket hypothesis).
The entire thing could even be used to prune existing LLM weights, in a generative-adversarial model.
Shit, there's enough embedding and weight data to train a Meta-LLM from scratch at this point.
The sum total of trillions of parameters in models floating around the internet could be used as training data.
If the models and weights are designed to predict the next token, there shouldn't be anything to prevent another model trained on this sort of distribution, from generating new plausible models.
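A sketch of the data-prep side of that, i.e. turning donor checkpoints into token sequences a model could train on; the bin count and the flat quantization scheme are arbitrary choices for illustration:

```python
import numpy as np

def weights_to_tokens(weight_matrices, n_bins=4096):
    """Flatten a model's weight matrices into one token sequence by quantizing
    every value into one of n_bins buckets (the buckets act as the 'vocabulary')."""
    flat = np.concatenate([w.ravel() for w in weight_matrices])
    lo, hi = flat.min(), flat.max()
    tokens = np.floor((flat - lo) / (hi - lo) * (n_bins - 1)).astype(np.int32)
    return tokens, (lo, hi)

def tokens_to_weights(tokens, shapes, bounds, n_bins=4096):
    """Inverse: decode a generated token sequence back into weight matrices."""
    lo, hi = bounds
    flat = tokens.astype(np.float64) / (n_bins - 1) * (hi - lo) + lo
    out, i = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        out.append(flat[i:i + size].reshape(shape))
        i += size
    return out

# Toy "donor model": two random weight matrices standing in for real checkpoints.
donor = [np.random.randn(16, 16), np.random.randn(16, 8)]
toks, bounds = weights_to_tokens(donor)
restored = tokens_to_weights(toks, [w.shape for w in donor], bounds)
print(toks[:10], float(np.abs(donor[0] - restored[0]).max()))
```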
You could even do task-prompt-to-model-task embeddings by training on the weights for task specific models, do vector searches to mix models, etc, and generate *new* models,
not new new text, not new imagery, but new *models*.
It'd be a model for training/inferring/optimizing/generating other models.4 -
My boss is in a meeting (over a coffee) with someone who is "a technophile" and "really knows about AI". He was amazed some months ago by the images they were generating using their paid service (right after that, I showed him Bing AI and the conversation ended).
We have discussed using AI previously, and we have been developing web apps for 5 years now.
These are the messages I've been receiving through the last hour and a half and haven't read; I guess it's information he considers will be important when we meet later:
- LLM modelo de lenguaje [language model]
- Large language model
- Chat gpt 4O
- API
- Aplication programe interface
These are all things I've mentioned either within the past months, or yesterday *itself*, as he mentioned he was meeting this guy.
I'll keep you posted on new messages.
I wonder if that guy says he's a "prompt engineer"...5 -
Soooo many vendor-sponsored frontend frameworks.
Soon text-to-logic tools will be useful enough so that you only need a client, someone who is both rational *and* can speak clientese, and a dog.
The client barks some nonsense, the rational person translates it into business logic, some LLM makes it into some nice UI and the dog makes random noises so that the client will feel smart, valued and appreciated.
That nullifies the reasons for so many frontend frameworks because either the LLMs all converge into a single way of doing things or they do not care for which one they choose.1 -
@Wisecrack
Dude, it seems someone has actually done 1-bit quantization for a transformer model:
https://arxiv.org/pdf/...2 -
https://milkyeggs.com/?p=303
"I claim that the trend which AI/ML continues for lawyers is one that it starts for programmers. Just like how a partner at Cravath likely sketches an outline of how they want to approach a particular case and swarms of largely replaceable lawyers fill in the details, we are perhaps converging to a future where a FAANG L7 can just sketch out architectural details and the programmer equivalent of paralegals will simply query the latest LLM and clean up the output. Note that querying LLMs and making the outputted code conform to specifications is probably a lot easier than writing the code yourself ー and other LLMs can also help you fix up the code and integrate the different modules together!"1 -
There is so much fuss about AI and fear of missing out on the departing AI train, but as a dev I have no clue where to get started!?
What can we developers do with AI?
OK, I can get some code for free. I can use an LLM as a half-smart search engine. I can integrate my product with some AI service. I can produce content to teach said things to others...
Nothing new, really, just another API or another search engine.
It is of course possible to start making some neural networks, but I can't really picture that as a high-demand skill, can you?
Maybe at some of the big companies, but for an average client?
Does anyone know what kind of knowledge of AI that a developer should really learn?
Especially something a client would be interested in?
Here is a potato for scale:6 -
chatgpt is too politically correct, and i hate that i'm paying for an API that refuses certain prompts because they were considered inappropriate, or because it thinks it should not be giving me its analysis of a certain subject.
has anyone dabbled with using an open-source LLM and made their own lite version of ChatGPT minus all the restrictions?
i know it's not gonna be as good, but at the very least free from the constraints12 -
That I learned Java.
Got lots of work but nothing to be proud of.
Always have to clean up after mediocre developers. -
People say using GPT-4 as an OCR is not a good idea. But damn, the formatting GPT-4 Vision does is outstanding... and I have realised proper formatting helps a lot when prompting to get precise output.
I gotta say, test for your use cases rather than relying on expert opinion blogs! -
What are the key differences between a large language model and traditional machine learning models in terms of architecture and application?
Follow-up: How do these differences impact the model's ability to understand and generate human-like text?12 -
I discussed using page-rank for ML a while back here - https://devrant.com/rants/11237909/...
I talk about something vaguely similar in "scoring the matches" here though - https://pastebin.com/YwjCMvRp
Incidentally the machine learning community finally caught up and did something similar on a RAG
https://news.ycombinator.com/item/... -
Meta Platforms has launched Llama 3, their newest large language model (LLM), alongside a brand-new stand-alone AI chatbot. Llama 3 comprises two versions, one with 8 billion and the other with 70 billion parameters. Furthermore, Meta is currently developing an even more advanced 400 billion parameter model, though its release date remains unannounced.
Ragavan Srinivasan, Meta’s VP of Product, expressed enthusiasm about the model’s capabilities in a recent interview, stating, “From a performance perspective, it is really off the charts in terms of benchmarking capabilities.” He specifically referred to the ongoing development of the 400 billion parameter version.
https://freeaiall.com/ai-news/...6