So I decided to run mozilla deep speech against some of my local language dataset using transfer learning from existing english model.
I adjusted alphabet and begin the learning.
I have pc with gtx1080 laying around so I utilized that but I recommend to use at least newest rtx 3080 to not waste time ( you can read about how much time it took below ).

Waited for 3 days and error goes to about ~30 so I switched the dataset and error went to about ~1 after a week.
Yeah I waited whole got damn week cause I don’t use this computer daily.

So I picked some audio from youtube to translate speech to text and it works a little. It’s not a masterpiece and I didn’t tested it extensively also didn’t fine tuned it but it works as I expected. It recognizes some words perfectly, other recognize partially, other don’t recognize.

I stopped test at this point as I don’t have any business use or plans for this but probably I’m one of the couple of companies / people right now who have my native language speech to text machine learning model.

I was doing transfer learning for the first time, also first time training from audio and waiting for results for such long time. I can say I’m now convinced that ML is something big.

To sum up, probably with right amount of money and time - about 1-3 months you can make decent speech to text software at home that will work good with your accent and native language.

Add Comment