trekhleb

1y

For learning purposes, I made a minimal TensorFlow.js re-implementation of Karpathy’s minGPT (Generative Pre-trained Transformer). One nice side effect of having the 300-lines-of-code model in a .ts file is that you can train it on a GPU in the browser.

https://github.com/trekhleb/...

The Python and PyTorch version still seems much more elegant and easier to read, though...

devrant

gpt

ml

transformers

js

ai

Ranter

Comments

typosaurus

10598

1y

Great, I've been on his page before. But for some reason I didn't try it. Don't remember why. I will try your GitHub thingy tho. Probably tomorrow. I'm playing with Ollama models currently. The 0.5B and 1.5B ones are quite useful. I need those since my VPS can't handle anything bigger.

trekhleb

210

1y

@retoor I've tried training a GPT with ~80M parameters on a single GPU in the browser so far. Pretty heavy. It would be interesting to see how 1.5B parameters would behave...

typosaurus

10598

1y

@trekhleb that's the thing, I only have a CPU. So I can prolly forget about training it myself. It will never get into the neighborhood of 1.5B?

typosaurus

10598

1y

I've already been training an RNN on a spare laptop for four days. It's training on eight books. It can already create some sentences. The method I use has its limits. If you train too long, it'll make up its own language, it gets too creative. This RNN was also written from scratch by someone. It's very small and written in C, but a Python version exists too.

trekhleb

210

1y

@retoor I'm not sure, it probably depends on the model configuration/implementation and the hardware. But in the browser, for that "homemade GPT", I see that training on WebGPU is around 100x to 1000x faster than on CPU.

typosaurus

10598

1y

@trekhleb is there anything specific you want to make? I'm trying to make a chatbot clone of myself. It's working out nicely with a 1B model. It's called retoor1b :D I chose a weaker model for a few reasons:

- hardware limitation (laptop / vps, vps is faster)

- easier to train than a model that already has a 'full personality'

- it only has to do creative question answering. It doesn't have to solve complex questions. It can code quite well tho, maybe I'll delete that feature


devRant © 2021 Hexical Labs LLC