How do they parse and arrange the input and then generate the output with neural nets for talking bots?
question
Comments
Hazarth (2y): Talking bots aren't typically done using plain NNs. But text generators like GPT-3 use recurrent neural networks (RNNs) and generate the output letter by letter, with the network remembering the context. So really the output could be just 27+n units, each corresponding to a letter of the alphabet. I think.
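(For illustration, here's a minimal character-level generator sketch in PyTorch. It's my own toy example, not Replika's or GPT-3's actual setup: the output layer has one unit per character, and you sample the next character from a softmax over them.)

import torch
import torch.nn as nn

class CharRNN(nn.Module):
    # Toy character-level generator: one output unit per character.
    def __init__(self, vocab_size, hidden_size=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)  # e.g. 27+n outputs

    def forward(self, x, state=None):
        h, state = self.rnn(self.embed(x), state)
        return self.head(h), state  # logits over the character set

vocab = list("abcdefghijklmnopqrstuvwxyz ")          # the "27 outputs" case
model = CharRNN(len(vocab))
idx = torch.tensor([[vocab.index("h")]])             # seed character
logits, state = model(idx)
probs = torch.softmax(logits[0, -1], dim=-1)
next_char = vocab[int(torch.multinomial(probs, 1))]  # sampled next character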
Hazarth (2y): @AvatarOfKaine Aight, so I looked at Replika and it's using GPT-3 as its base.
However, I was wrong about GPT-3: it doesn't use an RNN but a Transformer model (lol, it's in the name: GPT -> Generative Pre-trained Transformer).
So really it's working on a word-to-word basis. I'm not particularly well versed with Transformers yet, but it seems the gist of it is as follows:
First you need a Word -> Vector encoder of some sort (perhaps Word2Vec would work well here instead of using a custom one)
This output vector is of size N, e.g. ["i", "am", "a", "human"] could map to vectors of size 4: (1, 0, 0, 0), (0, 1, 0, 0) and so on (though one-hot encoding is stupid for NLP, so Word2Vec would be a better approach; its output vector size is configurable, commonly a few hundred dimensions).
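(To make the Word -> Vector step concrete, here's a toy example of my own, not Replika's actual preprocessing: one-hot just indexes an identity matrix, while a Word2Vec-style embedding indexes a learned dense matrix.)

import numpy as np

sentence = ["i", "am", "a", "human"]
vocab = {w: i for i, w in enumerate(sentence)}

# One-hot: each word becomes a sparse vector of size len(vocab)
one_hot = np.eye(len(vocab))[[vocab[w] for w in sentence]]   # 4x4, one row per word

# Word2Vec-style: each word indexes a row of a learned dense embedding matrix
embed_dim = 8                                        # in practice ~100-300
embedding = np.random.randn(len(vocab), embed_dim)   # stand-in for trained weights
dense = embedding[[vocab[w] for w in sentence]]      # shape (4, embed_dim)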
This is then additionally encoded using a technique called positional encoding, which injects each word's position in the sequence into its vector; this maps N -> N'.
N' is your GPT-3 Transformer's input layer size now.
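(Here's what positional encoding can look like. GPT models actually learn their position embeddings, but the classic sinusoidal scheme from the original Transformer paper shows the idea: a position-dependent pattern gets added onto each word vector.)

import numpy as np

def sinusoidal_positions(seq_len, dim):
    # Sinusoidal positional encoding from "Attention Is All You Need"
    pos = np.arange(seq_len)[:, None]                     # (seq_len, 1)
    i = np.arange(dim)[None, :]                           # (1, dim)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

word_vectors = np.random.randn(4, 8)                      # stand-in word embeddings
encoded = word_vectors + sinusoidal_positions(4, 8)       # N -> N' with position info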
Hazarth (2y): Afterwards, the Transformer magic happens, which in GPT-3's case is mostly a stack of alternating dense and sparse attention layers. I know very little about Transformers, so you'd have to do your own research here, but back to your question:
The output then is once again of size N, an encoded word vector. You then need to take this vector and run it through a Vector -> Word mapping, or a reverse Word2Vec, to get the predicted word back.
You would then take this word, append it to the sequence, encode it positionally as the next position, and feed the whole thing back into the network to once again get the next word, and then the next and the next, and so on...
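(A minimal sketch of that feed-back loop; the "model" here is a random stand-in purely to show the data flow, not GPT-3 itself.)

import numpy as np

vocab = ["i", "am", "a", "human", "not", "robot", "<end>"]
rng = np.random.default_rng(0)

def fake_model(token_ids):
    # Stand-in for the Transformer: logits over the vocab for the next word
    return rng.normal(size=len(vocab))

tokens = [vocab.index("i")]            # seed word
for _ in range(5):
    logits = fake_model(tokens)
    next_id = int(np.argmax(logits))   # Vector -> Word (greedy decode)
    tokens.append(next_id)             # feed it back as the next position
    if vocab[next_id] == "<end>":
        break
print(" ".join(vocab[t] for t in tokens))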
so the arch is something like this
Input = "Word 1"
Preprocessing:
input -[Encode]> Vector[N] -[Positional Encode]> Vector[N']
Actual Network:
Vector[N'] -> OutputVector[N]
Postprocessing:
OutputVector[N] -[Decode]> "Next String"
input = "Next String"
Repeat from Preprocessing with new input
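(In practice all of these steps come bundled with a pretrained model. A sketch using the Hugging Face transformers library with GPT-2 as a freely available stand-in for GPT-3: the tokenizer handles the text -> vector mapping, and generate() runs the predict-append-repeat loop.)

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("i am a", return_tensors="pt").input_ids   # text -> token ids
output_ids = model.generate(input_ids, max_length=20, do_sample=True, top_k=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))  # tokens -> text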