88
donuts
7y

A sound recognition system

Being deaf is one major factor that's kept me from moving out. I can't hear sounds like doorbells, fire alarms, etc

I want to build a client-server solution that can be trained to recognize specific sounds

Project Type
Project idea

Description
So this is going on infinite hold... Moved into an apartment but use cases are pretty much nonexistent... Also the limited sounds I do want to pick up aren't very loud (at least at a distance) so the mic has a hard time even detecting them...

---------

General setup I'm thinking would be:
-an input device (cellphone, Raspberry Pi?)
-a PC/server that accepts the sounds and can be trained to detect certain noises, as well as configured to send text alerts
-an output (push alert, text message, email)

THIS IS CURRENTLY JUST AN IDEA

I've tried a few times to get Google to pick it up for their Home device but to no avail, so it's now DIY time. (Though no set start date yet.)

I'm also thinking it could be trained to detect "abnormal noises" and then send these snippets to my parents. And maybe with photos too. So say I'm sleepwalking then fall down (it actually happened once)
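The three-part setup above (input, server, output) could be sketched roughly like this in Python. Everything here is hypothetical: the names `SoundEvent`, `Recognizer`, `Notifier`, `LoudnessStubRecognizer`, `ListNotifier` and `handle_clip` are made up for illustration, and the "recognizer" is only a loudness stub standing in for real trained sound recognition.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol, Sequence


@dataclass
class SoundEvent:
    label: str         # e.g. "doorbell"
    confidence: float  # 0.0 .. 1.0


class Recognizer(Protocol):
    """Server side: turns a clip of samples into an event, or None."""
    def recognize(self, samples: Sequence[float]) -> Optional[SoundEvent]: ...


class Notifier(Protocol):
    """Output side: push alert, text message, email, ..."""
    def notify(self, event: SoundEvent) -> None: ...


class LoudnessStubRecognizer:
    """Placeholder only: reports any clip whose peak exceeds a threshold."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold

    def recognize(self, samples: Sequence[float]) -> Optional[SoundEvent]:
        peak = max((abs(s) for s in samples), default=0.0)
        if peak < self.threshold:
            return None
        return SoundEvent(label="loud sound", confidence=min(peak, 1.0))


class ListNotifier:
    """Collects alerts in memory; a real one might send an SMS or email."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def notify(self, event: SoundEvent) -> None:
        self.sent.append(f"Heard: {event.label} ({event.confidence:.0%})")


def handle_clip(samples: Sequence[float],
                recognizer: Recognizer,
                notifier: Notifier) -> bool:
    """Run one clip from the input device through recognizer and notifier."""
    event = recognizer.recognize(samples)
    if event is None:
        return False
    notifier.notify(event)
    return True
```

Because `handle_clip` only depends on the two small interfaces, the stub recognizer or the notifier could later be swapped for something smarter without touching the rest.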
Tech Stack
PC, listening device, Java, Tensorflow, React Native (GUI)
Comments
  • 7
    Sounds like a good idea. But to reduce costs for server and stuff I would think about a decentralized network of devices which notify each other. I don't know whether you wanted to use neural nets or something like this. If you do, you could also teach one device a sound and its meaning/priority, and thanks to the decentralized network only one device has to "know" it. Anyways, to reduce network traffic you could also synchronize them onto each device... I would love to help
  • 3
    @hyvte Well first server would be my PC, only cost is electricity.

    I'm thinking the parts will be designed to implement API/interfaces so they can be swapped out/replaced as it gets more advanced.

    The server also hosts the sound recognition algorithms so I don't think decentralized will work. Need a "powerful" machine for that, it will be like the brains of the system.

    I'm probably gonna be doing an MVP/POC first but yes I don't know any ML so yes help would be great.

    Although first gotta figure out how to get the audio to the server, and some way to splice I think based on the amplitude? i.e. actually detecting a sound vs just background noise/silence
  • 2
    @billgates Maybe making it possible to run client & server on the same device would solve this problem?

    Anyways you could analyse the amplitude on the "client" and only then send the sound to the server. This would save a lot of traffic. Especially outdoors, where you have no WiFi connection, it would save a lot of high-speed data volume.

    Tag me when you need help.

    I would first develop the algorithm and see whether it really needs a server or if it also could run on a client device...
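A minimal sketch of the "check the amplitude on the client first" idea: split the audio into small frames, compute each frame's RMS level, and keep only the runs of frames above a threshold. The function names, the frame size and the threshold are assumptions for illustration; real values would have to be tuned against the actual microphone and room.

```python
import math
from typing import List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_loud_segments(samples: Sequence[float],
                       frame_size: int = 256,
                       threshold: float = 0.1) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of loud runs; end is exclusive."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i in range(0, len(samples), frame_size):
        loud = rms(samples[i:i + frame_size]) >= threshold
        if loud and start is None:
            start = i                    # a loud run begins here
        elif not loud and start is not None:
            segments.append((start, i))  # the run ended at this frame
            start = None
    if start is not None:                # audio ended while still loud
        segments.append((start, len(samples)))
    return segments
```

Only the slices returned by `find_loud_segments` would then need to be sent to the server; silence and low background noise stay on the client.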
  • 1
    @hyvte This would be for the home, so all components would be communicating via WiFi. I guess it's sort of IoT 😁

    Not sure how it would be useful outside. I mean, if you see everyone running away from something, you should prolly run too 😉
  • 0
    @billgates Ah then this issue is solved. But outdoor would make sense too. E.g. cars honking
  • 3
    @hyvte hm... A deaf guy moving a 2 ton object at 50mph.... 🤔🤔🤔

    Don't think their honking will save either of us 😂
  • 1
    @billgates I didn't mean as a driver :D As a pedestrian... I don't know, because you said you don't wanna go outside because you can't hear such things
  • 2
    I guess I'd get a nice nickname though:

    Public Enemy #1
  • 1
    @hyvte oh no, I go outside lots as long as there are traffic lights and sidewalks.

    I mean this setup I don't think would be much use outside.
  • 0
    @billgates ah okay sorry. Then I misunderstood you :/ But indoors, wouldn't it be easy to modify your doorbell and things like this to give an optical signal? Okay, that would cost much more, I noticed. Do you want to use your smartphone as input? You maybe always carry it with you, and static microphones would only be optional?
  • 1
    @hyvte $$$$$$$$

    Whole reason I tried to get Google to do it... I even wrote about it.

    https://medium.com/@c41feb25d8dd/...

    They got the parts... Just need to hack it together... but now I gotta do it myself...
  • 1
    @billgates As said before I would like to help you. These were only first thoughts which helped me to understand your idea.
  • 1
    I could help with some Java (ughh Java) code if needed 8)
  • 0
    @dontPanic still thinking about design... Didn't get far today tho... N too tired now... Maybe this weekend...

    Yes guess I should do the server first and just feed it some dummy data... Figure out how to do ML on the audio...
  • 2
    I'd be interested in lending a hand in design if you're interested in a GUI, and I'm not averse to the IoT idea either
  • 1
    @optravox thanks, for GUI I'm thinking bootstrap + React. Write once, run anywhere.

    But prolly will be the last part. Need to get the server and processing logic down first.

    Not really sure how I will do it at this point.

    I just need "something that can be trained to recognize specific audio patterns even with background"

    Need to think about that and do some learning and build some MVPs first
  • 1
    @billgates sounds dope I'll go check some things on web after this cool beer ;)
  • 1
    Concerning the transmission of the audio.
    I've looked into it a bit and it seems either sending amplitude data or spectrograms as images is recommended.

    Also google did an experiment about sound classification. https://aiexperiments.withgoogle.com/...
    It has some similarities with your project.
    The most important words for you should be t-SNE and librosa.

    Short summary of the terms:
    t-SNE is a dimensionality-reduction algorithm for visualizing high-dimensional data (like audio features), so similar sounds end up close together
    Librosa is an open-source Python library that helps you analyze and visualize audio data.

    Would love to help you with this project, but I'm still not confident enough with tensorflow and ML in general.

    Hope this helps 😉
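To make the "spectrogram" part concrete, here is a dependency-free sketch that turns amplitude data into a list of per-frame magnitude spectra. It uses a plain O(n²) DFT with no window function, so it is only meant for understanding; in practice a library such as librosa or NumPy's FFT would do this much faster. The function names and default sizes are assumptions.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0 .. n/2 for one frame (naive DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc))
    return spectrum


def spectrogram(samples: Sequence[float],
                frame_size: int = 64,
                hop: int = 32) -> List[List[float]]:
    """One magnitude spectrum per (overlapping) frame of the signal."""
    return [
        magnitude_spectrum(samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, hop)
    ]


def bin_frequency(k: int, frame_size: int, sample_rate: float) -> float:
    """Centre frequency in Hz of DFT bin k."""
    return k * sample_rate / frame_size
```

Each inner list is one column of the spectrogram image: index is frequency bin, value is how strong that frequency was in that time slice.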
  • 1
    @Awlex thanks! neither am i, need to learn all this stuff.

    Was planning to start looking into it this weekend but life happened.... Things with higher priorities.... :(
  • 1
    @Awlex just checked out the demo, wow this is pretty much what I need.

    But looks like the project code is not as complete as their demo, just drums. So trying to reach out to the guy now see if can get more info on how to capture and convert the sounds
  • 1
    Check out microsoft cognitive api for natural language processing.
  • 0
    @andyk thanks but not quite what I'm looking for. Speech recognition has already been solved.
  • 0
    @Awlex fck so just spent all night re-setting up my Windows Python env... but can't get these libs to work..... Seems like I need to go with Linux but... I don't have the space for the VM.... n not sure if Cygwin or Win 10 Ubuntu shell can handle it... Would have to install Python n Java again too....

    don't have enough space... And I ain't switching my main OS from Windows to Linux...
  • 1
    Awesome idea..
    Have played around with transferring data via barely human-audible, high-frequency sound; mostly no other sounds at that range to filter out.

    Without having an exact target to listen for, it would be difficult, at first, to interpret much reliably.

    I'd start with an Android device using HTML5's audio API wrapped in a WebView (as it would be an Android WebView app, you would have access to Android's libraries and could also run it as a background service, I'd imagine)

    Here's a link to instructions for pulling audio from the phone's microphone if you wanted to try a test run

    https://developer.mozilla.org/en-US...

    The API is compatible with Android WebView 47, I believe. There is also a function in the Web Audio API that provides frequency data and other stream-analysis-related information.
  • 1
    Once you get a microphone listening and frequency info displayed onscreen, you can try to get a tolerance range for whatever sounds you want recognized (doorbell would probably be easiest for a demo).

    Have it listen for that sound and when it does, send a notification/toast message/email/SMS/flicker the flashlight; whichever is best for you.

    Hope that gives you a start and best of luck on your project!
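One way to sketch the "tolerance range" check, written in Python rather than the Web Audio API mentioned above: the Goertzel algorithm measures how much energy a clip has at one frequency, so probing a few frequencies around the expected doorbell pitch tells you whether a tone in that range dominates the clip. The function names, the 10 Hz probing step and the 0.5 cut-off are assumptions, and a real doorbell with several harmonics would need more than one target frequency.

```python
import math
from typing import Sequence


def goertzel_power(samples: Sequence[float],
                   sample_rate: float,
                   freq_hz: float) -> float:
    """Squared spectral magnitude of the clip at freq_hz (Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def tone_present(samples: Sequence[float],
                 sample_rate: float,
                 target_hz: float,
                 tolerance_hz: float,
                 step_hz: float = 10.0,
                 min_fraction: float = 0.5) -> bool:
    """True if a tone within target_hz +/- tolerance_hz dominates the clip."""
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return False  # pure silence
    steps = int(tolerance_hz // step_hz)
    best = max(
        goertzel_power(samples, sample_rate, target_hz + i * step_hz)
        for i in range(-steps, steps + 1)
    )
    # For a pure sine wave this ratio is close to 1.0; noise scores far lower.
    fraction = 2.0 * best / (len(samples) * energy)
    return fraction >= min_fraction
```

When `tone_present` returns True, that is the point where the notification, toast, email or flashlight flicker would be triggered.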
  • 1
    @billgates Since I've gone through this (for too long) I understand how you feel. It's possible to setup everything in windows.

    For environment:
    Add
    <Path to python> and <Path to python>\Scripts to the system PATH variable.

    For Python: Make sure your Python version is 3.5.x (IMPORTANT!) since TensorFlow only supports that. You can test that with "python --version"

    For TensorFlow:
    Make sure to get the most out of your hardware by installing tensorflow-gpu. There will be errors about not being able to install dependencies. You'll have to manually download precompiled libraries for Windows. (You can get them here: www.lfd.uci.edu/~gohlke/pythonlibs)

    Choose the versions that match your environment and install them with pip.
    In the install docs, TensorFlow tells you which stuff you have to install to use your graphics card.

    If you have any other questions feel free to ask
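A small sketch for double-checking the environment steps above from Python itself. The helper names are hypothetical, and the 3.5 check simply mirrors the version requirement stated in this comment. `ntpath` is used so the Windows path rules apply regardless of where the script is run.

```python
import ntpath
from typing import List, Sequence


def missing_path_entries(python_dir: str, path_value: str) -> List[str]:
    """Which of <python_dir> and <python_dir>\\Scripts are absent from PATH?"""
    def norm(p: str) -> str:
        # Windows paths are case-insensitive; also drop trailing slashes.
        return ntpath.normcase(ntpath.normpath(p.strip()))

    wanted = [python_dir, ntpath.join(python_dir, "Scripts")]
    present = {norm(p) for p in path_value.split(";") if p.strip()}
    return [w for w in wanted if norm(w) not in present]


def is_python_35(version_info: Sequence[int]) -> bool:
    """True for any 3.5.x version tuple such as sys.version_info."""
    return tuple(version_info[:2]) == (3, 5)
```

On the actual machine this could be called as `missing_path_entries(os.path.dirname(sys.executable), os.environ["PATH"])` and `is_python_35(sys.version_info)`.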
  • 0
    @Awlex oh TensorFlow is Python too? Guess I need to change Java to Python.

    The problem I had actually was trying to get the AI Experiments app working.... It does look like it has most of the parts needed but... can't get it to work properly in Windows....
  • 1
    @billgates Can you specify what you can't get to work on Windows? Maybe I can help
  • 0
    @mohammed I tried setting up the demo

    https://aiexperiments.withgoogle.com/...

    And the underlying project, I think.

    https://github.com/kylemcdonald/...

    But that needs some SampleRates library not available on Windows.

    Also seems Anaconda and Jupyter Notebook aren't working but I think it's a PATH issue... Jupyter can't "import utils" but the Anaconda installer says it's not recommended to hard code the Python location in PATH
  • 1
    Umh, using smartband/smartwatch integration you could configure different vibrations/push notifications directly on those. Tactile feeling. How that would work outside your home/work environment though...
  • 1
    @-eth you mean there's already an app or it could be another output device?

    I guess if it gets to Production that could make me a lot of $$$... hm... But still need a 24/7 listening device... that could freak ppl out just like Google Glass...
  • 1
    I didn't do the homework, i was proposing it as an output device, i feel it might be more real-timish...

    For the Google Glass scare, you could release the listening and signal processing portion of the code as open source, so it would be clear that there is no storage of the audio recorded
  • 2
    Alright so got some audio samples and the Python audio analysis thingy sorta working.... Didn't need to get the demo working or install Linux.

    Just needed Librosa and actually reading the ReadMe.md....

    Now to build my sound library and see how to classify all these doorbells and other sounds with TensorFlow
  • 1
    @billgates just found this, might be interesting for your project:
    https://reddit.com/r/raspberry_pi/...
  • 1
    @-eth this is what I was thinking. These are all separate IoT projects. Ideally, one wants a doorbell or fire alarm to send a signal to a wearable to vibrate a code. While capturing sound, like a doorbell, is a great idea, it's eminently more practical to send a signal to a wearable.
  • 0
    Note to self
  • 0
    What is the algorithm approach? Pattern recognition, fast Fourier transform with harmonics analysis, or neural networks?
  • 0
    @localghost no idea, that's what I need to learn... But haven't gotten around to it yet these few weeks. I was thinking TensorFlow does it.
  • 0
    Recognition could become a major problem since you need an enormous amount of data to train sound recognition (unless you use one-shot learning) for many different sounds.

    If you need help or advice, just ask.
  • 0
    @coretool but there are only specific sounds needed?

    Like for doorbell just record the doorbell ringing a few times and feed it to the system.

    I guess in ML terms, the main problem is classification, assuming you can extract/clean the sound from an almost continuous audio stream?
  • 0
    @billgates Didn't get that... But that makes everything easier, I guess. ^^
  • 0
    @coretool Basically I'm thinking of something like "OK Google", listening 24/7 and in IFTTT terms:

    -If hears a sound
    -Then classify it based on tagged sound samples/training data
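The if/then above could be sketched as a nearest-centroid classifier over tagged samples, with a distance cut-off so that unfamiliar sounds come back as `None` instead of being forced into a known label. This is a swapped-in simple technique, not what TensorFlow would do; the feature vectors are assumed to come from some earlier step (for example spectrum values), and all names and the cut-off value are made up for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a list of equally long feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(tagged_samples: Dict[str, Sequence[Vector]]) -> Dict[str, List[float]]:
    """One centroid per label, e.g. {"doorbell": [...], "alarm": [...]}."""
    return {label: centroid(vs) for label, vs in tagged_samples.items()}


def classify(features: Vector,
             centroids: Dict[str, List[float]],
             max_distance: float) -> Optional[str]:
    """Closest label, or None if nothing is within max_distance."""
    best_label = None
    best_distance = float("inf")
    for label, centre in centroids.items():
        distance = math.dist(features, centre)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label if best_distance <= max_distance else None
```

Recording the doorbell a few times and passing those feature vectors to `train` is the "feed it to the system" step; `classify` is then called for every detected sound.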
  • 1
    @billgates IMHO Keras with a TensorFlow backend is easier to use than TensorFlow itself, and it has a great learning curve. You don't need to build the layers yourself; there are presets such as Dense (fully connected layer), Convolution2D (also 1D and 3D for convolutional layers), Flatten, Dropout and MaxPooling, which are the commonly used ones, so you can build an NN in a few lines of Python. You can save the model and the weights, or use TensorBoard, in just one line of code, and finally there is a nice project, keras-js, to load your trained NN in the browser :)
  • 1
    @billgates

    Okay, that makes sense. So it does only know certain sounds and when hearing other sounds it will just do nothing.

    I would use a neural network to learn the 'features' and couple it with Fourier Analysis. This way you can use a simple, single layered NN and there are even tutorials out there on how to do this.
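A rough, dependency-free sketch of that combination: a Fourier step turns each audio frame into normalized spectrum magnitudes, and a single sigmoid unit (which is the same thing as logistic regression) is trained on them to separate one target sound from everything else. The function names, learning rate and epoch count are assumptions, and the naive DFT is only practical for short frames.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def spectrum_features(frame: Sequence[float]) -> List[float]:
    """Normalized DFT magnitudes (bins 0 .. n/2) of one frame."""
    n = len(frame)
    mags = [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]
    total = sum(mags) or 1.0  # avoid dividing by zero on silence
    return [m / total for m in mags]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Probability (0..1) that x belongs to the target class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_single_layer(examples: Sequence[Sequence[float]],
                       labels: Sequence[int],
                       epochs: int = 200,
                       learning_rate: float = 1.0) -> Tuple[List[float], float]:
    """Stochastic gradient descent on the log loss of one sigmoid unit."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            error = predict(x, weights, bias) - y
            weights = [w - learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias
```

Labels are 1 for the sound to recognize and 0 for anything else; a prediction above 0.5 would count as a match.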
  • 0
    @coretool does nothing... Or it uploads it to the internet and crowdsources an answer... Though given how successful Tay had been I'm a bit skeptical that will work...
  • 1
    @billgates

    We could train a model against a big audio dataset as a fallback. If yours doesn't know the sound, it passes it back and hopes for a response. If you want, I can provide you with such a model.
  • 0
    @coretool thanks, at the moment I'm still kinda thinking in idea space. Not too familiar with ML actually, at least not the current kind. Took a basic Data Mining course years ago but nothing stuck. None of the Fourier transform, algo stuff, just how to use DM software like Weka.

    Any books you'd recommend to get started with some practical hands-on knowledge, get up to speed (ie. not academic theoretical textbook BS) and related to what I'd like to do?
  • 1
    @billgates I'll have to look some up so give me a day :)
  • 1
    @billgates look at Machine Learning in Action, great light book that's not a dry read but also teaches a lot
  • 0
    @Axis but aren't all the... In Action long and dry like a reference manual?

    I think I tried Scala In Action before but gave up pretty quick...
  • 2
    @billgates I cannot speak for any of the other in action books but basically this one provided a little math lesson (linear algebra and probability) and every chapter afterwards is a different ml technique that is taught with an example that you are supposed to follow along with. It isn't the most fun book ever but i wouldn't say it is dry
  • 0
    @billgates After looking up some books, I think Machine Learning in Action, which @Axis proposed, is probably a pretty good one to begin with.

    I'd look into the Medium series "Machine Learning is Fun" by Adam Geitgey too, since it is an introduction to the many topics of ML.
  • 1
    Yo easy on that shiny new word "TensorFlow". You can't just use that to solve any ML problem. Training a deep neural network is very costly and it's not necessary for every problem. There are several lightweight ML frameworks out there.
    https://blogs.nvidia.com/blog/2017/...

    Btw the idea sounds great. Let's collaborate. Python/Node/Android/Firebase - Anything and I am in.
  • 1
    I read something in Popular Science about a vest you could wear that turns sound into vibrations on your back that your brain eventually learns to interpret as sound. Maybe that can help you
  • 0
    This screams Python, TensorFlow, & Twilio, on an RPi 3.
  • 0
    Python: collect the training samples for the needed sounds.
    Train a classification model with audio features like MFCC.
    Classification algorithms: depends on the data volume. SVM would work well with less data. Could also try random forest and XGBoost.
    Put the model on a Raspberry Pi with a running loop that records audio for some given time and then classifies it as music, car honks, people talking, etc. using the trained classification model.
    Flash the output (the classified label) on screen.
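The running-loop part of these steps could look roughly like the sketch below. The feature extraction (MFCC) and the trained classifier (SVM, random forest, ...) are deliberately left as plug-in functions, since they would come from libraries such as librosa and scikit-learn; the names `windows` and `run_loop` and their parameters are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Sequence


def windows(stream: Iterable[float], window_size: int) -> Iterator[List[float]]:
    """Cut a (possibly endless) sample stream into fixed-size windows."""
    buffer: List[float] = []
    for sample in stream:
        buffer.append(sample)
        if len(buffer) == window_size:
            yield buffer
            buffer = []
    # A trailing partial window is dropped.


def run_loop(stream: Iterable[float],
             window_size: int,
             extract_features: Callable[[Sequence[float]], object],
             classify: Callable[[object], str],
             display: Callable[[str], None]) -> None:
    """Record a window, classify it, show the label, repeat."""
    for window in windows(stream, window_size):
        label = classify(extract_features(window))
        display(label)
```

On the Pi, `stream` would be fed by the microphone, `extract_features` would compute the MFCCs, `classify` would wrap the trained model's prediction, and `display` could be `print` or whatever drives the screen.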
  • 0
    1. there already are solutions like this precisely to help deaf people (sorry, can't remember specific names)

    2. from what i've read from the devs, the sound recognition is easy

    3. what is extremely hard is to make/train the system to successfully filter out background noise. has to be trained/tailored for every single location's soundscape specifically, where even each room in a house is considered a separate location (meaning currently it's impossible to have a mobile phone be the microphone, even in ideal conditions)
  • 0
    @Midnigh-shcode if you can find #1, that'd be helpful

    #3 is my problem... Being deaf I can't really say which sounds are significant... Most noise it picked up is just cars I think
  • 1
    @billgates
    1. sorry, can't find the specific thing i came across back then even though it was the only one that looked most advanced and most similar to what you are talking about. it was a tech video explaining how it works, they mentioned the name only in passing, i'll look some more, but for now, maybe useless, but i found things like this: http://lssproducts.com/category/...

    3. visual pairing might help, at least for cars? set up a camera with a directional microphone pointed at the road, to a) be able to kinda learn how the soundwaves of passing cars look, b) be able to cross-reference these recordings with background noise stuff in your main microphone, the one you're trying to clean up background noise from?

    there's a lot of discussion in here already and i haven't read it, so i'm sorry if i'm just repeating stuff that was already discussed
  • 1
    @billgates
    #1 got it! i remembered it was a computerphile video about the topic, i recommend watching it, even though it's on the border of "layman explanations", but there's still things that are interesting&relevant&informative even for us hardcore techies ;)

    https://youtube.com/watch/...

    guy talking in the video is CEO of the company working on this stuff, their page is
    https://audioanalytic.com/people/...

    there's also some more interesting/possibly useful links in the video description. hope this helps at least a bit
  • 0
    @Midnigh-shcode actually this is on hold... Mostly was just an idea but my POC ran into the roadblock you mentioned.

    For existing products here's my problem... They're like 10 yrs old and so goddamn expensive. It's like 500$ for a vibrating microphone...
  • 1
    there's also another approach you could take at filtering relevant/irrelevant sounds (to a degree), based largely on where they're coming from (outside vs. inside). look at this, sound reconstruction from video (visual microphone):
    http://news.mit.edu/2014/...

    https://youtube.com/watch/...

    the software they use is opensource.

    now imagine: two visual microphones, one pointed at an exterior window, another pointed at "bag of chips" (receptive surface) inside, what's picked up by the window pane but not the "bag of chips" is from outside, what's picked up by both (or just the chips) is inside, first rough filter pass done. but maybe this just adds an unnecessary middleman, as the same can probably be done with normal microphones and I don't know if it adds any advantages...
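The rough first filter pass described above, written for two ordinary microphones (one at the window, one inside the room) as a simple loudness comparison. The function names and the threshold are assumptions, and a real room would likely need per-microphone calibration since outside sounds also leak inside at lower levels.

```python
import math
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def locate_source(window_frame: Sequence[float],
                  indoor_frame: Sequence[float],
                  threshold: float = 0.05) -> str:
    """Classify one time slice as "inside", "outside" or "silence"."""
    window_loud = rms(window_frame) >= threshold
    indoor_loud = rms(indoor_frame) >= threshold
    if indoor_loud:
        return "inside"   # picked up by both, or just the indoor mic
    if window_loud:
        return "outside"  # picked up by the window mic only
    return "silence"
```

Slices labelled "outside" could then be dropped before any recognition is attempted.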
  • 1
    @billgates yes, i noticed the "on hold" part, but I bet it doesn't stop you from still thinking about it, and... that's what I'm kinda doing and good at - providing some new food for thought (hopefully) and inspiration ;)