Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "stable diffusion"
-
I setup stable diffusion today. Still figuring it out but I'm like an artist now right? Right?
Next step is figuring out how to train models.
Then I have to make some samples of various words in spectrogram form for training.
After that we'll see if stable diffusion can reconstruct phonemes.
I'll train using both my voice and a couple others, and apply them as styles.
And then finally, I can accomplish my lifes goal.
To have the voice of morgan freeman with me at all times, everywhere I go.5 -
When Everybody Is Digging for Gold, It’s Good To Be in the Pick and Shovel Business
- ai is just another squeeze of money to cloud from our pockets, no matter what you do as long as you’re not selling/renting hardware or have high profit customers your product will die
I don’t believe in any ai product right now that can’t be self hosted and opensource and many of them are not.
I use mac, like 64GB m1 mac book pro so I can host load of things like llama, wizzard-lm, mistral, any yolo, whisper, gpt, fucking midjourney or other stable diffusion for me is no drama.
I’d say there is no consumer product for ai right now. OpenAI is scam given what we got from mistral.
We are very early in this new but old technology and my worries are that we are not there yet. We will need to wait for another iteration that is approximately 10 years to achieve what we have in mind because current hardware is 10 years behind software.
We don’t have an affordable computing power to go for our dreams.
Sad but true.5 -
The next step for improving large language models (if not diffusion) is hot-encoding.
The idea is pretty straightforward:
Generate many prompts, or take many prompts as a training and validation set. Do partial inference, and find the intersection of best overall performance with least computation.
Then save the state of the network during partial inference, and use that for all subsequent inferences. Sort of like LoRa, but for inference, instead of fine-tuning.
Inference, after-all, is what matters. And there has to be some subset of prompt-based initializations of a network, that perform, regardless of the prompt, (generally) as well as a full inference step.
Likewise with diffusion, there likely exists some priors (based on the training data) that speed up reconstruction or lower the network loss, allowing us to substitute a 'snapshot' that has the correct distribution, without necessarily performing a full generation.
Another idea I had was 'semantic centering' instead of regional image labelling. The idea is to find some patch of an object within an image, and ask, for all such patches that belong to an object, what best describes the object? if it were a dog, what patch of the image is "most dog-like" etc. I could see it as being much closer to how the human brain quickly identifies objects by short-cuts. The size of such patches could be adjusted to minimize the cross-entropy of classification relative to the tested size of each patch (pixel-sized patches for example might lead to too high a training loss). Of course it might allow us to do a scattershot 'at a glance' type lookup of potential image contents, even if you get multiple categories for a single pixel, it greatly narrows the total span of categories you need to do subsequent searches for.
In other news I'm starting a new ML blackbook for various ideas. Old one is mostly outdated now, and I think I scanned it (and since buried it somewhere amongst my ten thousand other files like a digital hoarder) and lost it.
I have some other 'low-hanging fruit' type ideas for improving existing and emerging models but I'll save those for another time.6 -
1/2 dev and a fair warning: do not go into the comments.
You're going anyway? Good.
I began trying to figure out how to use stable diffusion out of boredom. Couldn't do shit at first, but after messing around for a few days I'm starting to get the hang of it.
Writing long prompts gets tiresome, though. Think I can build myself a tool to help with this. Nothing fancy. A local database to hold trees of tokens, associate each tree to an ID, like say <class 'path'> or some such. Essentially, you use this to save a description of any size.
The rest is textual substitution, which is trivial in devil-speak. Off the top of my head:
my $RE=qr{\< (?<class> [^\s]+) \s+ ' (?<path>) [^'] '\>}x;
And then? match |> fetch(validate) |> replace, recurse. Say:
while ($in =~ $RE) {
my $tree=db->fetch $+{class},$+{path};
$in=~ s[$RE][$tree];
};
Is that it? As far the substitution goes, then yeah, more or less. We have to check that a tree's definition does not recurse for this to work though, but I would do that __before__ dumping the tree to disk, not after.
There is most likely an upper limit to how much abstraction can be achieved this way, one can only get so specific before the algorithm starts tripping balls I reckon, the point here is just reaching that limit sooner.
So pasting lists of tokens, in a nutshell. Not a novel idea. I'd just be making it easier for myself. I'd rather reference things by name, and I'd rather not define what a name means more than once. So if I've already detailed what a Nazgul is, for instance, then I'd like to reuse it. Copy, paste, good times.
Do promise to slay me in combat should you ever catch me using the term "prompt engineering" unironically, what a stupid fucking joke.
Anyway, the other half, so !dev and I repeat the warning, just out of courtesy. I don't think it needs to be here, as this is all fairly mild imagery, but just in case.
I felt disappointed that a cursed image would scare me when I've seen far worse shit. So I began experimenting, seeing if I could replicate the result. No luck yet, but I think we're getting somewhere.
Our mission is clearly the bronwning of pants, that much is clear. But how do we come to understand fear? I don't know. "Scaring" seems fairly subjective.
But I fear what I know to be real,
And I believe my own two eyes.11 -
Anyone tried converting speech waveforms to some type of image and then using those as training data for a stable diffusion model?
Hypothetically it should generate "ultrarealistic" waveforms for phonemes, for any given style of voice. The training labels are naturally the words or phonemes themselves, in text format (well, embedding vectors fwiw)
After that it's a matter of testing text-to-image, which should generate the relevant phonemes as images of waveforms (or your given visual representation, however you choose to pack it)
I would have tried this myself but I only have 3gb vram.
Even rudimentary voice generation that produces recognizable words from text input, would be interesting to see implemented and maybe a first for SD.
In other news:
Implementing SQL for an identity explorer. Basically the system generates sets of values for given known identities, and stores the formulas as strings, along with the values.
For any given value test set we can then cross reference to look up equivalent identities. And then we can test if these same identities hold for other test sets of actual variable values. If not, the identity string cam be removed, or gophered elsewhere in the database for further exploration and experimentation.
I'm hoping by doing this, I can somewhat automate the process of finding identities, instead of relying on logs and using the OS built-in text search for test value (which I can then look up in the files that show up, and cross reference the logged equations that produced those values), which I use to find new identities.
I was even considering processing the logs of equations and identities as some form of training data perhaps for a ML system that generates plausible new identities but that's a little outside my reach I think.
Finally, now that I know the new modular function converts semiprimes into numbers with larger factor trees, I'm thinking of writing a visual browser that maps the connections from factor tree to factor tree, making them expandable and collapsible, andallowong adjusting the formula and regenerating trees on the fly.7