Search - "diffusion"
-
I set up Stable Diffusion today. Still figuring it out, but I'm like an artist now, right? Right?
Next step is figuring out how to train models.
Then I have to make some samples of various words in spectrogram form for training.
After that we'll see if stable diffusion can reconstruct phonemes.
I'll train using both my voice and a couple others, and apply them as styles.
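For anyone curious, the spectrogram step itself is only a few lines. A rough sketch, assuming librosa and a folder of per-word wav clips (the paths and parameters here are just placeholders):

# sketch: turn short spoken-word clips into mel-spectrogram PNGs for training
# assumes librosa, numpy and Pillow; uses a fixed -80..0 dB -> 0..255 packing
import librosa
import numpy as np
from PIL import Image

def wav_to_mel_png(wav_path, png_path, sr=22050, n_mels=128):
    y, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)                        # roughly -80..0 dB
    img = np.clip((mel_db + 80.0) / 80.0 * 255.0, 0, 255).astype(np.uint8)
    Image.fromarray(img).save(png_path)                                  # caption/label = the word itself

wav_to_mel_png("samples/hello.wav", "dataset/hello.png")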
And then finally, I can accomplish my life's goal:
To have the voice of Morgan Freeman with me at all times, everywhere I go.
-
I would have never considered it, but several people thought: why not train our diffusion models on mappings between latent spaces themselves, instead of on, say, raw data like pixels?
It's a palm-to-face moment because of how obvious it is in hindsight.
Details in the following link (or just google 'latent diffusion models')
https://huggingface.co/docs/...
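The whole trick really does fit in a few lines. A toy sketch of the idea, where vae and denoiser are stand-ins rather than any particular library's API:

# toy sketch of latent diffusion: add noise and denoise in a VAE's latent space, not pixel space
import torch

def training_step(vae, denoiser, images, T=1000):
    with torch.no_grad():
        z = vae.encode(images)                             # pixels -> compact latents
    t = torch.randint(0, T, (z.size(0),))                  # random timestep per sample
    noise = torch.randn_like(z)
    alpha = (1 - t / T).view(-1, *([1] * (z.dim() - 1)))   # crude linear schedule, illustration only
    z_noisy = alpha.sqrt() * z + (1 - alpha).sqrt() * noise
    pred = denoiser(z_noisy, t)                            # model learns to predict the injected noise
    return torch.nn.functional.mse_loss(pred, noise)

Everything downstream (sampling, text conditioning) stays the same; the only change is that the expensive diffusion happens in a much smaller space.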
-
This is gonna be a long post, and inevitably DR will mutilate my line breaks, so bear with me.
Also I cut out a bunch because the length was over the limit, so I'll post the second half later.
I'm annoyed because it appears the current Stable Diffusion trend has thrown the baby out with the bathwater. I'll explain that in a moment.
As you all know I like to make extraordinary claims with little proof, sometimes
for shits and giggles, and sometimes because I'm just delusional apparently.
One of my legit 'claims to fame' is, on the theoretical level, I predicted
most of the developments in AI over the last 10+ years, down to key insights.
I've never had the math background for it, but I understood the ideas I
was working with at a conceptual level. Part of this flowed from powering
through literal (god I hate that word) hundreds of research papers a year, because I'm an obsessive like that. And I had to power through them, because
a lot of the technical low-level details were beyond my reach, but architecturally
I started to see a lot of patterns, and begin to grasp the general thrust
of where research and development *needed* to go.
In any case, I'm looking at Stable Diffusion and what occurs to me is that we've almost entirely thrown out GANs. As some or most of you may know, a GAN is
where networks compete, one to generate outputs that look real, another
to discern which is real, and by the process of competition, improve the ability
to generate a convincing fake, and to discern one. Imagine a self-sharpening knife and you get the idea.
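For contrast, the competition is easy to write down. A bare-bones sketch of the classic setup, with G and D as placeholder torch modules:

# bare-bones GAN step: G forges, D judges, and each sharpens the other
import torch
import torch.nn.functional as F

def gan_step(G, D, real, opt_g, opt_d, z_dim=64):
    ones = torch.ones(real.size(0), 1)
    zeros = torch.zeros(real.size(0), 1)
    fake = G(torch.randn(real.size(0), z_dim))
    # discriminator: call real real and fake fake
    d_loss = (F.binary_cross_entropy_with_logits(D(real), ones)
              + F.binary_cross_entropy_with_logits(D(fake.detach()), zeros))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # generator: fool the discriminator into calling fakes real
    g_loss = F.binary_cross_entropy_with_logits(D(fake), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()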
Well, when we went to the diffusion method, upscaling noise (essentially a form of controlled pareidolia using autoencoders over seq2seq models), we threw out GANs.
We also threw out online learning. The models only grow on the backend.
This doesn't help anyone but those corporations that have massive funding
to create and train models. They get to decide how the models 'think', what their
biases are, and what topics or subjects they cover. This is no good in the long run,
but that's more of an ideological argument. That's not the real problem.
The problem is they've once again gimped the research, chosen a suboptimal
trap for the direction of development.
What interested me early on in the lottery ticket theory was the implications.
The lottery ticket theory says that part of the reason *some* RANDOM initializations of a network train/predict better than others is essentially
down to a small pool of subgraphs that happened, by pure luck, to chance on
initializations that just so happened to be the right 'lottery numbers', as it were, for training quickly.
The first implication of this is that the bigger the network, the greater the chance of these lucky subgraphs occurring. Whether their density grows
faster than the density of the 'unlucky' or average subgraphs is another matter.
From this though, they realized what they could do was search out these subgraphs, and prune many of the worst or average performing neighbor graphs, without meaningful loss in model performance. Essentially they could *shrink down* things like chatGPT and BERT.
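The 'shrink it down' part is mechanically simple, even if finding the winning ticket isn't. A sketch using torch's built-in unstructured pruning (the rewind-to-initialization step from the original lottery ticket work is left out):

# rough sketch: mask out the lowest-magnitude weights, then bake the mask in
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_linear_layers(model, amount=0.5):
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)  # drop the smallest 50%
            prune.remove(module, "weight")                               # make the pruning permanent
    return model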
The second implication was more subtle and overlooked, and still is.
The existence of lucky subnetworks might suggest nothing additional, in which case the implication is that *any* subnet could *technically*, by transfer learning, be 'lucky' and train fast or be particularly good for some unknown task.
INSTEAD, however, what has happened is that we haven't really seen that. What this means is actually pretty startling. It has two possible implications, either of which will have significant effects on the research sooner or later:
1. There is an 'island' of network size, beyond what we've currently achieved,
where networks that are currently state-of-the-art at some things rapidly converge to state-of-the-art *generalists* in nearly *all* tasks, regardless of input. What this would look like at first is a gradual drop-off in the gains of the current approach, characterized as a potential new "ai winter", or a "limit to the current approach", which wouldn't actually be the limit, but a saddle point in its utility across domains and its intelligence (for some measure and definition of 'intelligence').
-
There's a method for speeding up diffusion generation (things like Stable Diffusion) by several orders of magnitude. It's related to particle physics simulations.
I'm just waiting for the researchers to figure it out for themselves.
It's like watching kids break toys.
-
Anyone tried converting speech waveforms to some type of image and then using those as training data for a Stable Diffusion model?
Hypothetically it should generate "ultrarealistic" waveforms for phonemes, for any given style of voice. The training labels are naturally the words or phonemes themselves, in text format (well, embedding vectors fwiw)
After that it's a matter of testing text-to-image, which should generate the relevant phonemes as images of waveforms (or your given visual representation, however you choose to pack it)
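The decode side is the fiddly bit. A rough sketch of getting audio back out of a generated spectrogram image, assuming the obvious 0..255 <-> -80..0 dB packing (filenames are placeholders, and Griffin-Lim has to guess the phase, so it will sound rough):

# sketch: generated mel-spectrogram image -> audible waveform
import librosa
import numpy as np
import soundfile as sf
from PIL import Image

img = np.asarray(Image.open("generated/hello.png").convert("L"), dtype=np.float32)
mel_db = img / 255.0 * 80.0 - 80.0                             # undo the 0..255 packing back to dB
mel = librosa.db_to_power(mel_db)
audio = librosa.feature.inverse.mel_to_audio(mel, sr=22050)    # Griffin-Lim phase reconstruction
sf.write("generated/hello.wav", audio, 22050)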
I would have tried this myself but I only have 3gb vram.
Even rudimentary voice generation that produces recognizable words from text input, would be interesting to see implemented and maybe a first for SD.
In other news:
Implementing SQL for an identity explorer. Basically the system generates sets of values for given known identities, and stores the formulas as strings, along with the values.
For any given value test set we can then cross-reference to look up equivalent identities. And then we can test whether these same identities hold for other test sets of actual variable values. If not, the identity string can be removed, or gophered elsewhere in the database for further exploration and experimentation.
I'm hoping that by doing this I can somewhat automate the process of finding identities, instead of relying on logs and the OS's built-in text search for test values (which I can then look up in the files that show up, and cross-reference the logged equations that produced those values), which is how I currently find new identities.
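A minimal sketch of the identity store, with sqlite3 from the stdlib; the table layout is just one way to cut it:

# sketch: formulas stored as strings, values alongside, cross-referenced by value
import sqlite3

db = sqlite3.connect("identities.db")
db.execute("CREATE TABLE IF NOT EXISTS identity_values (formula TEXT, test_set TEXT, value REAL)")

def record(formula, test_set, value):
    db.execute("INSERT INTO identity_values VALUES (?, ?, ?)", (formula, test_set, value))
    db.commit()

def equivalents(value, test_set, eps=1e-9):
    # every stored formula that produced (about) this value on this test set
    cur = db.execute(
        "SELECT formula FROM identity_values WHERE test_set = ? AND ABS(value - ?) < ?",
        (test_set, value, eps))
    return [row[0] for row in cur]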
I was even considering processing the logs of equations and identities as some form of training data, perhaps for an ML system that generates plausible new identities, but that's a little outside my reach I think.
Finally, now that I know the new modular function converts semiprimes into numbers with larger factor trees, I'm thinking of writing a visual browser that maps the connections from factor tree to factor tree, making them expandable and collapsible, and allowing the formula to be adjusted and the trees regenerated on the fly.
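A quick sketch of the tree side, using sympy's factorint; the modular-function mapping between trees and the expand/collapse UI are left out:

# sketch: a number expanded into a nested factor tree
from sympy import factorint, isprime

def factor_tree(n):
    if n < 2 or isprime(n):
        return n                                                     # leaf: a prime (or a unit)
    children = [p for p, k in factorint(n).items() for _ in range(k)]
    return {n: [factor_tree(c) for c in children]}

print(factor_tree(3 * 5 * 49))   # e.g. {735: [3, 5, 7, 7]}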