Search - "speech to text"
My friend left their MacBook unlocked, so we pasted the entire text of Moby Dick into text to speech and left it playing in the background at full volume. Never seen such a confused face in my life.
I was on vacation when my employer’s new fiscal year started. My manager let me take vacation because it’s not like anything critical was going to happen. Well, joke was on us because we didn’t foresee the stupidity of others…
I had to update a few product codes in the website’s web config and deploy those changes. I was only going to be logged in for 30 minutes to complete that.
I get messaged by one of our database admins. He was doing testing and was unable to complete a payment on the website. That was strange. There was a change pushed by our offsite dev agency, but that was all frontend changes (just updating text) and wouldn’t affect payments.
We don’t want to enlist the dev agency for debugging work, especially when it’s not likely that it’s a code issue. But I was on vacation and I couldn’t stay online past the time I had budgeted for. So my employer enlists the dev agency for help. It’s going to be costly because the agency is in Lithuania, it was past their business hours, and it was emergency support.
Dev agency looks at error logs. There are Apple Pay errors, but that doesn’t explain why non-Apple Pay transactions aren’t going through. They roll back my deployment and theirs, but no change. They tell my employer to contact our payment processor.
My manager and the Product Manager contact Payroll, who is the stakeholder for our payment gateways. Payroll contacts our payment gateway and finds out a service called Decision Manager was recently configured for our account. Decision Manager was declining all payments. Payroll was not the person who had Decision Manager installed, and the news that our account used this service came as a surprise to her.
Payroll works with our payment processor to get payments working again. The damage is pretty severe. Online payments were down for at least 12 hours. Our call center had logged reports from customers the night before.
At our post mortem, we had to find out who ok’d Decision Manager without telling anyone. Luckily, it was quick work. The first stakeholder up was for the Fundraising Dept. She said it wasn’t her or anyone on her team. Our VP of Analytics broke it to her that our payment processor gave us the name of the person who ok’d Decision Manager and it was someone on the Fundraising team. Fundraising then starts backtracking and says that oh yes she knew about it but transactions were still working after the Decision Manager had been configured. WTAF.
Everyone is dumbfounded by this. How could you make a big change to our payment processor and not tell anyone? How did our payment processor allow you to make this change when you’re not the account admin (you’re just a user)?
Our company head had to give an awkward speech about communication and how it’s important. The web team can’t figure out issues if you don’t tell us what you did. The company head was pissed because it was a shitty way to start off the new fiscal year. Our bill for the dev agency must have been over $1000 for debugging work that wasn’t helpful.
Amazingly, no one was fired.
The secretaries at my university had to scan documents in the master's students' lab. So one day, when the lab was empty save for one of our secretaries, I remote into my machine, write a text-to-speech app, and have the computer announce "Hello Selma, you really know how to push my buttons"
It took a while, but we are friends again :D
That moment when you work the whole day to write a Discord bot from scratch. No discord.py or other wrappers. Pure websockets, OAuth2, HTTPS, JSON loads here and there. Understanding how the Discord API works was a real challenge, but I did it :).
Most of my time was spent on Discord's gateway connection and identification system.
The bot can renew its token, get all the guilds it is part of, all the channels and users of these guilds, send message and communicate with the gateway.
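A minimal sketch of the identify/heartbeat part of that work, assuming the Discord gateway's documented JSON frame shapes (op 10 Hello carrying `heartbeat_interval` in milliseconds, op 2 Identify, op 1 Heartbeat carrying the last sequence number). The class name, the placeholder token, the example intents value and the `properties` values are hypothetical; the sketch only builds and parses frames and leaves the websocket itself to you.

```python
import json

# Gateway opcodes as documented by Discord (assumed here, not taken from the rant).
OP_HEARTBEAT = 1
OP_IDENTIFY = 2
OP_HELLO = 10


class GatewaySession:
    """Tracks the state needed to identify and keep a gateway session alive.

    Only builds and parses JSON frames; sending them over a websocket is
    left to whatever client (or hand-rolled socket) you use.
    """

    def __init__(self, token: str, intents: int):
        self.token = token
        self.intents = intents
        self.last_sequence = None       # last "s" value seen on any frame
        self.heartbeat_interval = None  # seconds, taken from the Hello frame

    def handle(self, raw: str):
        """Parse one incoming frame and return the frame to send back, if any."""
        frame = json.loads(raw)
        if frame.get("s") is not None:
            self.last_sequence = frame["s"]
        if frame["op"] == OP_HELLO:
            self.heartbeat_interval = frame["d"]["heartbeat_interval"] / 1000.0
            return self.identify()
        return None

    def identify(self) -> str:
        return json.dumps({
            "op": OP_IDENTIFY,
            "d": {
                "token": self.token,
                "intents": self.intents,
                # Placeholder client properties; the values are arbitrary labels.
                "properties": {"os": "linux", "browser": "my_bot", "device": "my_bot"},
            },
        })

    def heartbeat(self) -> str:
        # Carries the last sequence number, or null before any event arrived.
        return json.dumps({"op": OP_HEARTBEAT, "d": self.last_sequence})
```

In a real client you would send the returned Identify frame right after Hello, then send `session.heartbeat()` on a timer every `heartbeat_interval` seconds.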
Tomorrow I will start connecting it to a voice channel and let it "speak". I'm thinking of combining text-to-speech with it, but I am not sure how well they are going to harmonize.
If you don't know how to explain your software, but you want to be featured in Forbes (or other shitty sites) as quickly as possible, copy this:
I am proud that this software used high-tech technology and algorithms such as blockchain, AI (artificial intelligence), ANN (Artificial Neural Network), ML (machine learning), GAN (Generative Adversarial Network), CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), DNN (Deep Neural Network), TA (text analysis), Adversarial Training, Sentiment Analysis, Entity Analysis, Syntactic Analysis, Entity Sentiment Analysis, Factor Analysis, SSML (Speech Synthesis Markup Language), SMT (Statistical Machine Translation), RBMT (Rule Based Machine Translation), Knowledge Discovery System, Decision Support System, Computational Intelligence, Fuzzy Logic, GA (Genetic Algorithm), EA (Evolutionary Algorithm), and CNTK (Computational Network Toolkit).
🤣 🤣 🤣 🤣 🤣
!rant
Coding is like having superpowers.
For instance: for school I have to read 8 books, and I have limited time and motivation. What did I do? I wrote a program that extracts the text from a PDF or EPUB and converts it to speech with gTTS (Google Text-to-Speech).
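For EPUB files the extraction half of such a program can be sketched with the standard library alone, since an EPUB is a ZIP archive of XHTML documents. This is an assumption-laden sketch, not the author's program: the function names are hypothetical, it reads documents in archive order instead of following the reading order in the OPF spine, and PDF extraction would need a third-party library.

```python
import io
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes from an (X)HTML document, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def epub_text(source) -> str:
    """Plain text of every (X)HTML document in an EPUB, given a path, file object or bytes."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    chunks = []
    with zipfile.ZipFile(source) as book:
        # Archive order, not the reading order declared in the OPF spine.
        for name in book.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                collector = _TextCollector()
                collector.feed(book.read(name).decode("utf-8", errors="replace"))
                collector.close()
                chunks.append(" ".join(collector.parts))
    return "\n\n".join(chunk for chunk in chunks if chunk)
```

The resulting string could then be handed to gTTS, e.g. `gTTS(text, lang="en").save("book.mp3")`; gTTS calls Google's service over the network, which is why it is left out of the sketch.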
Now all I have to do is listen to the story and relax.
I've actually had mostly good instructors for CS. Or at least mediocre. The worst teacher I had was actually my Algebra II teacher in high school. She taught by reading, word for word, from our textbook. She would copy the example problems from that chapter onto the whiteboard. And then give us the rest of class to work on homework. She was basically a Text-to-Speech program for our textbook.
We all joked that she was drunk and the one locked cabinet in her classroom contained liquor. A year after I had her class she was fired. For drinking on the job. The joke turned out to be 100% true and they actually did find alcohol in the locked cabinet.
I have been working for my current employer about 3 years now. When I first got to work I was asked by another employee to work on an editor for certain types of files. We will call this employee Ed. Because his name is Ed.
Ed is a verifiable genius, and a genuinely great guy to work with. He is amazing with hardware and math. Ed has a need, or shall I say fetish. He wants an editor for some of our proprietary files called "Settings files". They are just XML. Nothing special.
However, I have always had other priorities. We actually had a tense moment when I had to tell Ed my boss doesn't want me to work on the editor. I had started looking into working on the editor when my boss said stop working on this file. So since then it had become a running joke between Ed and myself. Well, I think it is funny, Ed smiles, but I know he wants this editor bad. Our boss even suggested at one time that Ed write this editor. He looked into it, but "other priorities" trumped this effort.
Okay, so now it has been 3 years and we still don't have this editor. Then I had an epiphany. Since Ed wants this editor, I came up with a name for the program. "Settings Editor" is just too mundane. I now think it should be called "Mr. Edit". I also found that the library we use for most of our development has text to speech built in. So when the program starts, I can have it say: "Hello, I am Mr. Edit, the talking Settings Editor". I have never wanted to write this program so badly before. Muahahahahaha!
Short: Still supported after 10 years
First Chromebook sent to testers.
Long: Lost my voice recently, so I need to use text to speech on my phone, but too many typos make it slow...
If only I had a keyboard... Wait a laptop... Hm... Chromebook...
Looks for cheapest on Amazon
But go.... Wait I already have one.... That I installed Ubuntu on and then somehow corrupted.
And never got around to fixing it because I didn't really need it... or have an 8GB USB stick around at the time.
Well, now I saved $100, assuming the battery still works too.
I am sitting at Starbucks trying to focus and finish some posts for my side project. The lady at the next table has been loudly 😁 talking about her Disney trip in excruciating detail. Do we really have to know she will store her bagel and tuna fish in the hotel room fridge? Really!
30 minutes in and she is just getting to the hotel - yikes 😫
No other seats open. I am trapped. Project deliverable delay - reason loud lady.
I was going to do this as a speech-to-text post, but I'm too nice to do that. HELP!
Peeve of the week: Youtube videos with robot voices (text-to-speech).
Youtube needs detection and a filter option to let users remove those vids from search results.
So yesterday I discussed how I am using speech to text to do approximately 50% of my rants. I am now doing a growing percentage of my Outlook emails by voice, as the human-computer voice interaction is pleasing and very natural. I have even named my iPhone 'little jumpshot' today.
Today I experimented with text to speech so that my rants are automatically read back to me before I send them. Some decent results.
In Settings > General > Accessibility you will find VoiceOver (not recommended, be careful). Below that is Speech, with the Speak Selection and Speak Screen options.
Speak Selection allows you to highlight text to be spoken. That is too much manual interaction for my purpose, which is walking (hopefully without tripping) instead of looking down. Using up my nine lives 😐
Below VoiceOver is Speak Screen, which lets you swipe down from the top of the screen with two fingers to have what is on the screen spoken. This will read the rant, or if there are multiple rants on the screen, it will read those as well.
It works but it will take a bit of getting used to. It also requires a few clicks here and there.
My goal is to interact with devRant fluidly 100% by voice. Just talking to 'little jumpshot' and him creating and posting all of my rants and reading all the other rants developers post.
For a few days experimenting I am satisfied with the progress but there is a long way to go.
Hopefully, in the end, this may help some people. Any ideas are very welcome.
“An engineer?!… An open, shining mind, easy and inoffensive humour, this wide reach, they’re switching from one engineering realm to another, and really, from tech problems to society, then — to art. Those manners, that fine taste, good speech, coherent and free of filler words. One engineer is also a musician, another one — an artist, but all of them have those smart eyes…”
INCREASE SALES
this text is not for managers like you.
I wonder if they have speech to text for code.
Var cars equals left bracket quote Saab quote comma quote Volvo quote comma quote BMW quote right bracket semi-colon
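Mapping dictated words back to code is mostly a vocabulary lookup plus spacing rules. A toy sketch, using a hypothetical spoken vocabulary (`SPOKEN_SYMBOLS`) that covers only the words in the example above; it lowercases words outside string literals and treats symbol words as symbols even inside strings, both of which a real tool would need smarter rules for.

```python
# Hypothetical spoken vocabulary; extend it for whatever you dictate.
SPOKEN_SYMBOLS = {
    "equals": "=",
    "left bracket": "[",
    "right bracket": "]",
    "comma": ",",
    "semi-colon": ";",
    "semicolon": ";",
    "quote": '"',
}


def dictation_to_code(utterance: str) -> str:
    """Turn a dictated line such as 'var x equals quote hi quote' into code."""
    words = utterance.split()
    out = ""
    in_string = False
    just_opened = False
    i = 0
    while i < len(words):
        pair = " ".join(words[i:i + 2]).lower()
        if pair in SPOKEN_SYMBOLS:  # two-word symbols like "left bracket"
            symbol, i = SPOKEN_SYMBOLS[pair], i + 2
        elif words[i].lower() in SPOKEN_SYMBOLS:
            symbol, i = SPOKEN_SYMBOLS[words[i].lower()], i + 1
        else:
            word = words[i]
            i += 1
            if in_string:
                # Keep the speaker's casing inside string literals.
                out += word if just_opened else " " + word
                just_opened = False
            else:
                word = word.lower()  # keywords/identifiers outside strings
                out += word if not out or out[-1] in " [(" else " " + word
            continue
        if symbol == '"':
            in_string = not in_string
            just_opened = in_string
            out += symbol
        elif symbol == "=":
            out = out.rstrip() + " = "
        elif symbol == ",":
            out = out.rstrip() + ", "
        else:
            out += symbol
    return out.rstrip()
```

Fed the spoken line above, it produces `var cars = ["Saab", "Volvo", "BMW"];`.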
Quite amazingly, yes!
As a matter of fact, one of my parents is also in information technology or a related field, so they are very much aware of how in demand the job is and how difficult it is as well. And the best part is that a lot of my engineering friends are also switching to computer science, just because it is the better choice given how oversaturated the engineering field is. So yeah, I think I have a better career choice than most of my peers.
(PS: I used speech to text here, so forgive the grammar errors)
Suggestions for a good speech to text program for someone who mumbles and talks too fast?
I sliced my thumb while washing a knife, and typing on my computer without it is getting annoying
Sweet, my motivation for coding my personal projects has started to come back.
Last night I set up my Personal Assistant project with text to speech and voice recognition.
Now I just have to get it to react to commands.
So I decided to run Mozilla DeepSpeech against a dataset in my local language, using transfer learning from the existing English model.
I adjusted the alphabet and began the training.
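Adjusting the alphabet usually means listing every character that appears in the new language's transcripts. A sketch, assuming DeepSpeech-style training CSVs with a `transcript` column and an alphabet file with one label per line, `#` for comment lines and `\#` for a literal hash; the function names are hypothetical, and the format should be checked against the DeepSpeech version you actually train with.

```python
import csv
import io
import unicodedata


def transcripts_from_csv(csv_text: str) -> list:
    """Read the transcript column of a CSV laid out as wav_filename,wav_filesize,transcript."""
    return [row["transcript"] for row in csv.DictReader(io.StringIO(csv_text))]


def build_alphabet(transcripts) -> str:
    """Alphabet-file text: one distinct character per line, '#' lines are comments."""
    chars = set()
    for line in transcripts:
        # NFC so a letter typed as base letter + combining accent counts once.
        chars.update(unicodedata.normalize("NFC", line.lower()))
    chars -= {"\n", "\r", "\t"}
    lines = ["# Generated from the training transcripts."]
    lines += ["\\#" if ch == "#" else ch for ch in sorted(chars)]
    return "\n".join(lines) + "\n"
```

The space character ends up as its own label line, which is what lets the model emit word boundaries.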
I have a PC with a GTX 1080 lying around, so I used that, but I recommend using at least a newer card such as an RTX 3080 so you don't waste time (you can read about how long it took below).
I waited 3 days and the error went down to ~30, so I switched the dataset, and the error went down to ~1 after a week.
Yeah, I waited a whole goddamn week, 'cause I don't use this computer daily.
So I picked some audio from YouTube to transcribe, and it works a little. It's not a masterpiece, and I didn't test it extensively or fine-tune it, but it works as I expected. It recognizes some words perfectly, others partially, and others not at all.
I stopped testing at this point, as I don't have any business use or plans for this, but I'm probably one of only a few companies or people right now who have a speech-to-text machine learning model for my native language.
I was doing transfer learning for the first time, also first time training from audio and waiting for results for such long time. I can say I’m now convinced that ML is something big.
To sum up, with the right amount of money and time (about 1-3 months), you can probably make decent speech-to-text software at home that will work well with your accent and native language.
Eavesdropping via the phone's microphone and speech recognition to serve targeted ads by Google? Has anyone here had a feeling this happened to them, or does anyone know whether this is already a thing?
This happened to me on my Android phone multiple times over the last year on different subjects: I was talking in person with someone, for example about how someone had eyelid surgery (my phone was locked in my pocket the whole time, and I didn't google what that is or make any text input into the device whatsoever), and a couple of minutes later an ad came up on my phone for exactly what we had been discussing. Weird coincidence or something more? 🤔
For those of you scared of the ZOMG imminent threat of AI.....
In Spanish, in particular to the way it is spoken in Mexico, we know curly hair to be called "chino" or "chinos" in certain places. This is funny because Chino is actually what we call Chinese people.
So. The other day I commented "pinches chinos" on a friend's post, about the pain of having curly hair (which I also have) on windy days.
FB being the retarded piece of shit that it is took it as hate speech. "Pinches chinos" can be roughly translated to "fucking curly hair" in this context, but because FB is retarded as all fuck, it took it as me spewing some hate speech against their Chinese overlords.
I normally wouldn't give a fuck, if it weren't for the fact that one of my friends is celebrating his birthday today and I can't post shit on his wall because I'm in Facebook jail.
I have known this dude since I was 6 (I'm currently 29), but no, FB decided that I was some racist prick, and because of that I can't post something to him. It's fine, I was still able to call him and celebrate with my boy, but still.
An AI will not be able to detect the difference between a fucking cat and a lion; it is shitty technology. It is interesting because of the math behind it, but seriously, it's not something to be scared of. Skynet is far from coming into existence.
Fuck FB and fuck people scared about AI and deep learning
So I'm pretty sure that when I went all in to Apple's Siri shitfest and turned on "Hey, Siri" that I had to accept some kind of privacy notice or some shit. Essentially, being an iPhone user and turning on that service, I've agreed to allow them to listen to me.
But what about the guy who sits next to me? Are his rights to privacy being infringed because my phone can also hear him? If so, who is the one infringing? Me or Apple?
Apple is just an example, but imagine if someone has an Echo in an office or a Google Home. Or what if you're unknowingly standing next to someone with a Google or Apple device that's always listening?
I know my old Android phone had picked up people at the grocery store before. I never turned on "Ok Google" but I used the speech to text of the keyboard a few times. When people showed how you could go see what Google had "heard", I was surprised to find how many OTHER people it picked up.
Anyway, just some thinking.
Been thinking about whether I want voice acting in my games. Potentially it would be really expensive. So I wondered if there were AI libraries for training your own voices. There are a few, and some are even trying to do it in real time, or at least fast enough for website use.
I found this library:
https://github.com/coqui-ai/TTS
Looks interesting, but I need training data. I found that you can license a limited number of words from voice actors. The licensing doesn't seem to care how you use the data, as long as it stays inside your production (video, game, etc.). There are even some free ones out there. I think it might be kinda fun to learn how to do this.
Yes, there are a bunch of AI websites, but the voices almost always seem conversational, not voices suited to a game setting. Most of those are stupid subscription bullshit. I also looked at text-to-speech services, and most of those are subscriptions too. I really, really hate the SaaS business model. I avoid companies that use it as much as I can.
Damn happy to see this much traffic in my repo...
Title: Audio book generator
GitHub link:
https://github.com/globefire/...
Demonstration:
https://youtu.be/xhMvGg1dAsg
Star if you like it. :)
iPhone speech to text has come a long way and has definitely improved. It does real-time dictation rather than batching it.
I am currently doing approximately 50 percent of my rants by voice. In fact, I did the rant you are reading by voice.
You can easily do punctuation such as a period, new paragraph, new line, caps and lower case. The speech recognition is excellent even with my New York accent and it learns the more you use it. Rarely does it get a word wrong.
Editing still has to be done manually and is a pain, but that may change, as Dragon already allows you to do in-line editing. iOS speech to text has already surpassed Dragon in some facets.
At this time I still have to press the Add New and Post buttons to post my rants. But that may change, as Enhanced Dictation on the Mac gives you access to specific commands.
I will keep you informed of progress, and I will be testing on Android over the next few days as well.
Ability to understand all machine learning models, to modify their code and the models directly, and to create better ones every time.
I would take an existing ML model, modify it by hand to create a better one, win some multimillion-dollar competitions, and make the models open source.
Eventually all recommendation systems, text to speech, speech to text, music generation, movie generation, etc. would be open source.
This would either destroy or boost the whole modern economy, but it would surely harm corporations and make them cry.
That would be fun to see.
Next personal fail ...
previous rant
https://devrant.com/rants/2060249/...
It turned out that WaveNet is sequential, so it needs the previous step to predict the next one.
Quite obvious when you look at how people speak sentences: they hardly ever stop in the middle of a word.
🤔
Need to think about how to proceed next and how to cut sentences.
Watched Deep Voice 3 and some accent models from Baidu.
I can generate 8 sentences at a time, and each takes 8 minutes, so if I cut between words and get the last mels between words right, I can get 1 minute, but I need to store the model somewhere.
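Because WaveNet-style models are autoregressive (each audio sample is predicted from the ones before it), sentence boundaries are the natural place to cut long text into independent synthesis jobs. A naive sketch of that cutting step, grouping sentences into batches of 8 to match the number the rant says fits in one run; the function names are hypothetical and the regex split is an assumption that will mis-split abbreviations like "Dr.".

```python
import re

# Split after '.', '!' or '?' followed by whitespace (naive on purpose).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list:
    """Cut text into sentences so each one can be synthesized independently."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def batches(items, size=8):
    """Yield consecutive groups of at most `size` items for one synthesis run."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each batch's audio can then be concatenated in order; cutting at sentence ends avoids having to get the mels right mid-word.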
I forgot my machine learning and speech synthesis skills from a previous life; time to load more skills...
I am building a web application that is multimedia-centric (mostly video chat), with text to speech and vice versa.
I have chosen Node with MongoDB as the backend API, with a React frontend.
What stack would you suggest for such an application?
Some professors at my university just come to the class and read out the pdf/slides.
Now I know where the idea of audiobooks and text-to-speech PDF readers came from!!!
Several years ago I spent over two months working out how to integrate text to speech and speech to text (TTS/STT) into any Windows program I wrote in Delphi, originally for a powerful flat-file search engine. Does anyone know whether TTS/STT is useful on Windows 10+, or whether it has any use?
I was thinking about redeveloping the search engine into a standalone program that can be used as a fast and light query tool with trigger functions; it can be made into a "reply bot" or used with a server like Apache, but without the old IBM mainframe mentality being readopted as "AI" and "social media" everywhere today. Low-level, independent and secure droid-like systems sound more fun to develop.
Coding a voice-controlled IoT project is all fun and games in research until you notice no frameworks support your native language...
HTML Writers Guidelines
When designing your web site you want to make the visiting experience as enjoyable as possible and at the same time make it so that if the site needs to be changed in any way, the changes are not too difficult to make. You want the look to be as appealing as possible for all browsers and also make the site accessible to users with disabilities. In order to accomplish all this there are some general guidelines when creating your HTML code.
1. The first thing that will really make your life easier is the use of Cascading Style Sheets (CSS) - CSS is used to maintain the look of the document, such as the fonts, margins and colors. HTML directly on the page is not a good choice for handling these aspects because if, say, the font color you are using for certain paragraphs needs to be changed from blue to red, you would have to go in and change each color tag manually. By using CSS you can designate the color for all of those paragraphs just once in the CSS file. That way, if you have to change the font color from blue to red, you make one change instead of the countless changes you might otherwise have to make, especially if your web site contains hundreds of pages. This is a big time saver and a must for all professionally designed web sites.
2. Don't use the FONT tag directly in your HTML code - This becomes a problem when using some cheap authoring tools that try to mimic what a web page should look like by using excessive FONT tags and &nbsp; entities. These tools end up creating web pages that are impossible to maintain. If you've created one of these disaster pages, there is a program called HTML Tidy that you can download and run on it. It will clean up your code as well as possible.
3. You want your web pages readable to people who have disabilities - Some people who surf the Internet depend on speech synthesizers or Braille readers to interpret the text on the page. If your HTML markup is sloppy or your presentation isn't kept in CSS, the software these people use to read pages has a difficult time interpreting them. You should also include descriptions (alt text) for each image on your page. Also, don't use server-side image maps. If you are using tables, you should include a summary of the table's structure and associate table data with the correct headers. This gives non-visual browsers a chance to follow the page as they go from one cell to another. And finally, for forms, make sure you include labels for form fields.
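Several of these guidelines can be checked mechanically. A small sketch using Python's built-in `html.parser`, flagging `<font>` tags, images without an `alt` attribute, server-side image maps (`ismap`), tables without `<th>` header cells, and form fields that no `<label>` points to or wraps. The class and function names are hypothetical, and this is no substitute for a full accessibility audit.

```python
from html.parser import HTMLParser

# Input types that do not need a visible <label>.
UNLABELLED_OK = {"hidden", "submit", "button", "reset", "image"}


class AccessibilityLinter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.problems = []
        self.labelled_ids = set()  # ids referenced by <label for="...">
        self.fields = []           # (id, line) of fields not wrapped in a <label>
        self._label_depth = 0
        self._tables = []          # stack of [start_line, header_cell_count]

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        line = self.getpos()[0]
        if tag == "font":
            self.problems.append(f"line {line}: <font> tag, move the styling to CSS")
        elif tag == "img":
            if "alt" not in a:
                self.problems.append(f"line {line}: <img> without alt text")
            if "ismap" in a:
                self.problems.append(f"line {line}: server-side image map (ismap)")
        elif tag == "label":
            self._label_depth += 1
            if a.get("for"):
                self.labelled_ids.add(a["for"])
        elif tag in ("input", "select", "textarea"):
            kind = (a.get("type") or "text").lower()
            if self._label_depth == 0 and not (tag == "input" and kind in UNLABELLED_OK):
                self.fields.append((a.get("id"), line))
        elif tag == "table":
            self._tables.append([line, 0])
        elif tag == "th" and self._tables:
            self._tables[-1][1] += 1

    def handle_endtag(self, tag):
        if tag == "label" and self._label_depth:
            self._label_depth -= 1
        elif tag == "table" and self._tables:
            start, headers = self._tables.pop()
            if headers == 0:
                self.problems.append(f"line {start}: <table> without <th> header cells")


def lint(html: str) -> list:
    linter = AccessibilityLinter()
    linter.feed(html)
    linter.close()
    for field_id, line in linter.fields:
        if field_id is None or field_id not in linter.labelled_ids:
            linter.problems.append(f"line {line}: form field without a <label>")
    return linter.problems
```

An empty `alt=""` is deliberately accepted, since it is the usual way to mark purely decorative images.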
By following just these three guidelines you give your visitors, especially disabled visitors, the best chance of having an enjoyable visit to your site, while at the same time making it so that if you have to make changes to your site, those changes can be made easily and quickly.
Developed this project "Audio Book Generator"
Implementing speech synthesis(📖 to 🗣) on eBooks
Bored with writing notes in a lecture? How about we convert the notes dictated by the lecturer into text? Use the speechtotext.py script to get the text format of spoken notes, which saves the text in a .txt file.
Too lazy to read a novel? Get an Ebook version of the novel and run the finalAudioBookGenerator.py script. It will generate an mp3(audio) format of the book. Enjoy book listening :)
You can also convert your single images using the singleImageReader.py script.
Demonstration:
https://youtu.be/xhMvGg1dAsg
Project:
https://github.com/globefire/...
Star if you liked it. :)
Anyone tried converting speech waveforms to some type of image and then using those as training data for a stable diffusion model?
Hypothetically it should generate "ultrarealistic" waveforms for phonemes, for any given style of voice. The training labels are naturally the words or phonemes themselves, in text format (well, embedding vectors fwiw)
After that it's a matter of testing text-to-image, which should generate the relevant phonemes as images of waveforms (or your given visual representation, however you choose to pack it)
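The usual way to turn speech into an image is a spectrogram: slice the waveform into overlapping windowed frames, take the magnitude of each frame's Fourier transform, and treat time × frequency as pixels. A dependency-free sketch of that conversion (naive DFT, periodic Hann window, log scaling to 0-255); the function names, frame size and hop are arbitrary choices, and whether a diffusion model can learn from such images is the open question above, not something this shows. Going back from a magnitude image to audio also needs the discarded phase to be estimated, for example with Griffin-Lim.

```python
import cmath
import math


def spectrogram(samples, frame_size=256, hop=128):
    """Magnitude spectrogram: one row per frame, one column per frequency bin.

    Naive O(n^2) DFT so only the standard library is needed; numpy.fft
    would be the practical choice for real audio.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        bins = []
        for k in range(frame_size // 2 + 1):  # non-negative frequencies only
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            bins.append(abs(acc))
        rows.append(bins)
    return rows


def to_grayscale(rows):
    """Log-compress magnitudes and rescale them to 0-255 pixel values."""
    logs = [[math.log1p(v) for v in row] for row in rows]
    peak = max(max(row) for row in logs) or 1.0
    return [[round(255 * v / peak) for v in row] for row in logs]
```

With 8 kHz audio and 256-sample frames each column is 31.25 Hz wide, so a 500 Hz tone lights up bin 16 in every row.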
I would have tried this myself but I only have 3gb vram.
Even rudimentary voice generation that produces recognizable words from text input would be interesting to see implemented, and maybe a first for SD.
In other news:
Implementing SQL for an identity explorer. Basically the system generates sets of values for given known identities, and stores the formulas as strings, along with the values.
For any given value test set, we can then cross-reference to look up equivalent identities. And then we can test whether these same identities hold for other test sets of actual variable values. If not, the identity string can be removed, or gophered elsewhere in the database for further exploration and experimentation.
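A minimal sketch of that cross-referencing with SQLite, assuming a single table of `(formula, test_set, value)` rows; the schema, function names and tolerance are hypothetical. Formulas that agree on one test set become candidate identities, and each candidate is then re-checked on another test set.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS results (
    formula  TEXT NOT NULL,  -- identity expression stored as a string
    test_set TEXT NOT NULL,  -- label of the variable values it was evaluated with
    value    REAL NOT NULL,
    PRIMARY KEY (formula, test_set)
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record(conn, formula, test_set, value):
    conn.execute("INSERT OR REPLACE INTO results VALUES (?, ?, ?)", (formula, test_set, value))


def candidates(conn, test_set, tol=1e-9):
    """Pairs of distinct formulas whose values agree (within tol) on one test set."""
    return conn.execute(
        """SELECT a.formula, b.formula
           FROM results a JOIN results b
             ON a.test_set = b.test_set AND a.formula < b.formula
           WHERE a.test_set = ? AND abs(a.value - b.value) <= ?
           ORDER BY a.formula, b.formula""",
        (test_set, tol),
    ).fetchall()


def holds_on(conn, pair, test_set, tol=1e-9):
    """True if both formulas were evaluated on test_set and still agree there."""
    row = conn.execute(
        """SELECT abs(a.value - b.value) <= ?
           FROM results a JOIN results b ON a.test_set = b.test_set
           WHERE a.formula = ? AND b.formula = ? AND a.test_set = ?""",
        (tol, pair[0], pair[1], test_set),
    ).fetchone()
    return bool(row and row[0])
```

Candidates that fail `holds_on` are the ones to delete or move aside for later exploration.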
I'm hoping by doing this, I can somewhat automate the process of finding identities, instead of relying on logs and using the OS built-in text search for test value (which I can then look up in the files that show up, and cross reference the logged equations that produced those values), which I use to find new identities.
I was even considering processing the logs of equations and identities as some form of training data perhaps for a ML system that generates plausible new identities but that's a little outside my reach I think.
Finally, now that I know the new modular function converts semiprimes into numbers with larger factor trees, I'm thinking of writing a visual browser that maps the connections from factor tree to factor tree, making them expandable and collapsible, and allowing the formula to be adjusted and the trees regenerated on the fly.
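The modular function isn't specified here, so this sketch covers only the tree part: building a factor tree for a given integer as nested `(value, children)` tuples, which maps naturally onto expandable and collapsible nodes in a browser. The function names are hypothetical, and splitting off the smallest prime factor at each step is one convention among several, since factor trees are not unique.

```python
def smallest_factor(n: int) -> int:
    """Smallest prime factor of n >= 2, by trial division (fine for small n)."""
    if n % 2 == 0:
        return 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return f
        f += 2
    return n


def factor_tree(n: int):
    """Nested (value, children) tuples: n splits into its smallest prime and the cofactor."""
    if n < 2:
        raise ValueError("factor trees start at 2")
    p = smallest_factor(n)
    if p == n:
        return (n, [])  # prime leaf
    return (n, [factor_tree(p), factor_tree(n // p)])


def render(tree, depth=0):
    """Indented text view, one node per line; a GUI would collapse children instead."""
    value, children = tree
    lines = ["  " * depth + str(value)]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines
```

A semiprime shows up as a root with exactly two prime leaves, e.g. `render(factor_tree(15))` gives `["15", "  3", "  5"]`.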
Good day all
This is a Text Detector app I created using the Google API and Firebase ML Kit
https://play.google.com/store/apps/...
Text to speech
Translate up to 60 languages
Download and give a review
Do you want to use text to speech huh? Ha ha ha here’s a low battery pop up to completely derail what you were saying and make you repeat the last 10 seconds.
An actually good design would have waited until the text-to-speech function was complete before popping up the message, or at least not stopped recording what's being said. But I guess I don't understand innovation.
Think different indeed
Chinese remainder theorem
So the idea is that a partial or zero knowledge proof is used for not just encryption but also for a sort of distributed ledger or proof-of-membership, in addition to being used to add new members where additional layers of distributive proofs are at it, so that rollbacks can be performed on a network to remove members or revoke content.
Data is NOT automatically distributed throughout a network, rather sharing is the equivalent of replicating and syncing data to your instance.
Therefore if you don't like something on a network or think it's a liability (hate speech for the left, violent content for the right for example), the degree to which it is not shared is the degree to which it is censored.
By automatically not showing images posted by people you're subscribed to or following, infiltrators or state-level actors who post things like calls to terrorism or CSAM to open platforms in order to justify shutting down platforms they don't control are cut off at the knees. There may also be a case for AI-based tools that automatically determine whether something like a thumbnail should be censored, or give the user an NSFW warning before clicking a link that may appear innocuous but is actually malicious.
Server nodes may be virtual in that they are merely a graph of people connected in a group by each person in the group having a piece of a shared key.
Because the Chinese remainder theorem only requires a subset of all the info in the original key, it also acts as a voting mechanism to decide whether a piece of content is allowed to be synced to an entire group or to remain permanently.
Data that hasn't been verified yet may go into a cache for a given cluster of users who are mutually subscribed or following in a small-world graph, but at the same time it doesn't get shared out of that subgraph, and it may expire if not enough users hit a like button, a retain button, a share button, or a "verify" button.
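The standard CRT-based way to make "any subset of size k" sufficient is a threshold secret-sharing scheme such as Mignotte's, which is an existing, named scheme and not the design described here: each member holds the secret modulo their own pairwise-coprime modulus, any k members can rebuild it with the CRT, and fewer than k only get a residue. A sketch with arbitrary example moduli and hypothetical function names; note that Mignotte's scheme leaks partial information to smaller groups (Asmuth-Bloom is the usual fix), so treat it as an illustration only.

```python
from math import prod


def crt(residues, moduli):
    """Solve x = r_i (mod m_i) for pairwise-coprime moduli; result is mod prod(moduli)."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(.., -1, m) is the modular inverse (Python 3.8+)
    return x % M


def make_shares(secret, moduli, k):
    """Mignotte (k, n) threshold shares: share i is (moduli[i], secret mod moduli[i])."""
    ms = sorted(moduli)
    alpha = prod(ms[:k])                          # smallest product of any k moduli
    beta = prod(ms[-(k - 1):]) if k > 1 else 1    # largest product of any k-1 moduli
    if not beta < secret < alpha:
        raise ValueError("secret must lie strictly between beta and alpha")
    return [(m, secret % m) for m in moduli]


def recover(shares):
    """Combine any subset of shares; correct once the subset has at least k members."""
    return crt([r for _, r in shares], [m for m, _ in shares])
```

With moduli 11, 13, 17, 19, 23 and k = 3, any three shares rebuild a secret of 1000, while the two largest shares alone only yield 1000 mod 437 = 126.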
The algorithm here then is no algorithm at all but merely the natural association process between people and their likes and dislikes directly affecting the outcome of what they see via that process of association to begin with.
We can even go so far as to dog food content that's already been synced to a graph into evolutions of the existing key such that the retention of new generations of key, dependent on the previous key, also act as a store of the data that's been synced to the members of the node.
Therefore, remember that members who continually post content that doesn't get verified slowly fall out of the node, such that eventually their content becomes merely temporary in the caches or index of the node members, driving index and node subgraph membership in an organic and natural process based purely on affiliation and identification.
Here I've sort of butchered the idea of the Chinese remainder theorem and shoehorned it into the idea of zero-knowledge proofs, but you can see where I'm going with this if you squint at the idea mentally and look at it from just the right angle.
The big idea was to remove the influence of centralized algorithms to begin with, and implement mechanisms such that third-party organizations that exist to discredit or shut down small platforms are hindered by the design of the platform itself.
I think if you look over the ideas here you'll see that's what the general design thrust achieves or could achieve if implemented into a platform.
The addition of indexes in a node or "server" or "room" (being a set of users mutually subscribed to a particular tag or topic or to each other), where the index is an index of text, audio, videos and other media, including user posts, that are available on the given node, with the index consisting of titled but blind links (no pictures/media, or only media verified as safe by an automatic tool), would also be useful.
Does anybody know a good open source speech-to-text engine?
I googled, but there are tons of them, and I don't have much time right now to try each of them out.
What I actually want is just to convert the audio (in English) to text and also note the times those sentences were spoken in the audio, like a subtitle file.
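Whichever engine is chosen, the subtitle part is simple once it reports segment start and end times (for example, openai-whisper's `transcribe()` result includes a `segments` list whose entries carry `start`, `end` and `text`). A sketch that formats such segments as SRT; the function names and the `(start, end, text)` tuple input are assumptions about how you would collect the engine's output.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """segments: iterable of (start_seconds, end_seconds, text) from a speech-to-text engine."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    # SRT separates cues with a blank line.
    return "\n".join(blocks)
```

Writing the returned string to a `.srt` file next to the audio is enough for most players to pick it up.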
For my final project in my first year of middle school (that's before university), I had to do an experiment and measure it using a circuit connected to the computer. In the end I couldn't finish it, but I made a program to explain what the circuit was expected to do, using one of the Microsoft Office assistants (Merlin the wizard). Merlin moved around the screen talking about the experiment and what the circuit measured, over and over. I almost forgot to mention that I had to show it at a science fair to anybody who came to the school. Nobody asked about the experiment or the circuit; all the questions were about how I made the program, how the program could speak in Spanish, and how it explained the experiment.
At the beginning of that day I was so nervous, but by the end I could say fuck yeah.
And the program was a macro in Basic with Loquendo-like text-to-speech; I only recorded the movements and entered the text.
That's one of the reasons I like programming: it saved my ass.
That was more than ten years ago; I didn't have a computer except at school, and internet access wasn't that common.
I am really psyched about the tech for creating voices for generated speech. I'm excited for the day this tech is small enough to ship with a game or OS. Then much more interactive games could be built with generated text. It would be so cool to license voices for this kind of work.
It will probably end up with artists creating unique and interesting voices for game developers to pick and choose from. So voice artists will be a thing as well as graphics artists. The tricky part will be finding a way to add mood states to the generated voices. Right now this could be done with different voice profiles for different kinds of speech.
Right now the tech is "large", but it will rapidly become smaller and more efficient as it gets developed further.
-
So happy!
I made my first project (or at least started) using my iPad (with some help from my laptop).
I am trying to make it possible for web comic artists to upload their comics without any text in the speech bubbles and then load the text for the specific locale using JavaScript.
It's in an early stage (a few hours old), and the editor and the viewer share data only through cookies and local storage instead of a server, but it's still just a concept.
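The core idea (bubbles drawn empty, text looked up per locale) doesn't depend on the language, so here it is sketched in Python rather than the project's JavaScript. It assumes a hypothetical layout of bubble ID, then locale, then text, and a fallback chain from a regional tag like `pt-BR` to its base language `pt` and then to a default locale; `bubble_text` and `locale_candidates` are illustrative names, not from the repo.

```python
from __future__ import annotations

# Hypothetical layout: each speech bubble has an ID and one text per locale tag.
BubbleTexts = dict[str, dict[str, str]]


def locale_candidates(locale: str, default: str = "en") -> list[str]:
    """Return the lookup order: exact locale, base language, then the default."""
    candidates = [locale]
    base = locale.replace("_", "-").split("-")[0]  # "pt-BR" or "pt_BR" -> "pt"
    if base != locale:
        candidates.append(base)
    if default not in candidates:
        candidates.append(default)
    return candidates


def bubble_text(texts: BubbleTexts, bubble_id: str, locale: str, default: str = "en") -> str:
    """Pick the best available translation for one bubble, or "" if none exists."""
    translations = texts.get(bubble_id, {})
    for candidate in locale_candidates(locale, default):
        if candidate in translations:
            return translations[candidate]
    return ""


if __name__ == "__main__":
    page = {
        "panel1-bubble1": {"en": "Hello!", "pt": "Olá!", "pt-BR": "Oi!"},
        "panel1-bubble2": {"en": "Where are we?", "pt": "Onde estamos?"},
    }
    for bubble_id in page:
        print(bubble_id, "->", bubble_text(page, bubble_id, "pt-BR"))
```

Where the table lives (cookies, local storage, or eventually a server) doesn't change the lookup; only the loading step would differ.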
What do you think?
GitHub: https://github.com/konstantintuev/...
-
Inspired by an overheard conversation (partial) among some of my co-workers:
I'm going to make an app that takes a speech sample, either text or an audio file, and accurately gauges the speaker's age based on the number of times per minute the word "restaurant" is used.
-
A medical device that you can attach to employees to excruciatingly kill them as soon as they say things like the following (please note that the list is not exhaustive, and we should use a speech-to-text API plus NLP to detect the meaning; I want to catch all false negatives!! Kill them all!!!!):
- It works on my machine
- I tested it before!
- Haskell is a terrible language
- Big data and actionable insights
- why do you need unit tests here?
- I am a recruiter
- Anything with the following construction as well: "I don't have anything against X, but..."
Any other phrase suggestions?
-
I feel really lost in neural network theory.
The MNIST sample made sense, but now I'm looking at GANs and CNNs... and all of a sudden I'm lost.
The same goes for the examples I'm finding for something I know I once got working, back when I was more at peace: WaveNet for text-to-speech.
However, I used the ONNX model, which was very easy to implement. I quickly get lost looking at the TensorFlow and PyTorch code; even though it's very short, I feel intimidated.
The SSD MobileNet documentation is also pretty straightforward, but when I look for WaveNet information about what format to provide the input in and how to interpret the output, I'm having some trouble.
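On the WaveNet input/output question: the original WaveNet paper compands raw audio with μ-law (μ = 255), quantizes each sample to one of 256 classes, and has the network output a softmax over those 256 classes for the next sample. Implementations vary (some use other encodings or mixture outputs), so this is only a sketch of the paper's scheme, not of any particular repository's format; `most_likely_class` is a simplification, since sampling from the distribution is also common.

```python
import math

MU = 255  # 8-bit mu-law: 256 quantization levels, as described in the WaveNet paper


def mu_law_encode(samples: list[float], mu: int = MU) -> list[int]:
    """Map audio samples in [-1, 1] to integer classes 0..mu (network input/target)."""
    classes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip to the valid range
        companded = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
        classes.append(int((companded + 1) / 2 * mu + 0.5))  # round to nearest class
    return classes


def mu_law_decode(classes: list[int], mu: int = MU) -> list[float]:
    """Invert the quantization: classes 0..mu back to samples in [-1, 1]."""
    samples = []
    for q in classes:
        y = 2 * q / mu - 1  # back to [-1, 1] in the companded domain
        samples.append(math.copysign(((1 + mu) ** abs(y) - 1) / mu, y))
    return samples


def most_likely_class(probabilities: list[float]) -> int:
    """Interpret one output step: pick the class with the highest softmax probability."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


if __name__ == "__main__":
    audio = [-1.0, -0.5, 0.0, 0.25, 0.9, 1.0]
    encoded = mu_law_encode(audio)
    print(encoded)
    print([round(x, 3) for x in mu_law_decode(encoded)])
```

So at generation time, each step's 256-way output is turned into a class (argmax or a random draw), and `mu_law_decode` turns the class sequence back into a waveform.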
It's frustrating.
I'm tense, I'm poorly rested, I'm sick of having to redo crap, and I'm surrounded by people who make me hypervigilant, skin-crawly, and tense.
How do I overcome these things when I'm not at peace at all?
I don't know. Pushing through it isn't compatible with the mindset I've been forced into.
-
Part 3
https://devrant.com/rants/9881158/...
I dropped subtitles and started extracting audio from movies; after that, I use Whisper to convert speech to text.
I parse the SRT from Whisper and adjust the timestamps so each chunk has at least some arbitrary number of seconds of voice. I put the text into a vector database with the timestamps and movie file name.
I query the database with, e.g., "I don't know", take the first n results, then walk through the movies and extract the parts with the matching text.
I normalize and merge the parts into one movie.
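The parse-and-regroup steps could look roughly like this. It's a sketch, not the author's code: it reads "adjust timestamps to get >= arbitrary amount of voice seconds" as greedily merging consecutive cues until their summed durations reach `min_voice_seconds`, and `Cue`, `parse_srt`, and `merge_cues` are hypothetical names. The vector-database and video-cutting steps are left out.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Standard SRT timing line: "HH:MM:SS,mmm --> HH:MM:SS,mmm"
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt: str) -> list[Cue]:
    """Parse SRT text into cues; cue blocks are separated by blank lines."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def merge_cues(cues: list[Cue], min_voice_seconds: float) -> list[Cue]:
    """Greedily join consecutive cues until their summed speech time reaches the minimum."""
    merged: list[Cue] = []
    current: Cue | None = None
    voice = 0.0
    for cue in cues:
        if current is None:
            current, voice = Cue(cue.start, cue.end, cue.text), 0.0  # copy, don't mutate input
        else:
            current.end = cue.end
            current.text = f"{current.text} {cue.text}"
        voice += cue.end - cue.start
        if voice >= min_voice_seconds:
            merged.append(current)
            current = None
    if current is not None:  # keep a short trailing chunk instead of dropping it
        merged.append(current)
    return merged
```

Each merged chunk's text and start/end times, together with the movie file name, is what would then be embedded and stored for the "I don't know"-style queries.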
The results are satisfying, so now I've decided to try to find a common dialogue I can watch by combining multiple people speaking from multiple movies.
I might also try to extract a person from one movie and put them into another movie.
-
Looking for a speech-to-text library in Python for a home automation project. SpeechRecognition doesn't really work out for me, and Google won't give me any other good alternatives. Thanks!
-
Are you out of your free Medium articles?😢 My Scrapy is here to the rescue.💸
This is a simple application of web scraping: it scrapes Medium articles and lets you read or listen to them. If you use it on a computer, there will be a number of accents to choose from.
The audio feature is only available to premium Medium users, so here comes My Scrapy to save you $5/month. 💸
.
Tech stack used:
Python, Beautiful Soup, Django, speech synthesis
.
PS: This application was built for educational purposes, and its source code is not open-sourced anywhere.
Fun fact: you can still read any Medium article when they ask you to upgrade. You must be wondering how? Well, copy the link of the article and open it in incognito mode in any browser.😂🤣
Try the app and lemme know if you liked it:
https://mymediumscraper.herokuapp.com/...
-
Android text-to-speech output: "English (Germany) is not supported" (my default language)... but no alternative language option is offered.
-
I am not completely sure of the method, but it sounds like they ran a song through Google's speech-to-text? The result is epic:
https://youtu.be/ur560pZKRfg
Is the speech-to-text a neural network?