Search - "fuzzy search"
SQL gives me a hard on right now.
Two tables, 954 rows and 9414, connected via foreign keys and shit.
SELECT a_id,name,shit FROM table1 a JOIN table2 b ON a.id = b.a_id WHERE ((lower(name) LIKE '%lorem%') OR MATCH(name) AGAINST('lorem' WITH QUERY EXPANSION) OR name SOUNDS LIKE 'lorem')
and you got fuzzy search with resolved keys to another table in 0.047 sec, fuck me dude.
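For anyone who wants to poke at the join-plus-substring part of that query without a MySQL box, here is a minimal sketch using Python's built-in sqlite3. Assumptions: `name` lives in table1, the sample rows are made up, and the helper names `demo_connection` and `fuzzy_join_search` are hypothetical. The `MATCH ... AGAINST` and `SOUNDS LIKE` branches are MySQL-specific (the first needs a FULLTEXT index), so they are left out here.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory stand-in for the two tables (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE table2 (a_id INTEGER REFERENCES table1(id), shit TEXT);
        INSERT INTO table1 VALUES (1, 'Lorem Ipsum'), (2, 'Dolor Sit'), (3, 'dolorem');
        INSERT INTO table2 VALUES (1, 'x'), (1, 'y'), (2, 'z'), (3, 'w');
        """
    )
    return conn


def fuzzy_join_search(conn, term):
    """Case-insensitive substring match on name, joined to the child table."""
    sql = (
        "SELECT b.a_id, a.name, b.shit "
        "FROM table1 a JOIN table2 b ON a.id = b.a_id "
        "WHERE lower(a.name) LIKE ? "
        "ORDER BY b.a_id, b.shit"
    )
    # The search term is bound as a parameter instead of being pasted into the SQL.
    return conn.execute(sql, (f"%{term.lower()}%",)).fetchall()
```

Calling `fuzzy_join_search(demo_connection(), "LOREM")` returns the joined rows for both 'Lorem Ipsum' and 'dolorem', since each contains the substring.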
Person A: "Add a search box, one where you can type anything in. You know, a hairy search or something"
Person B: "You mean a fuzzy search..."
When I wrote my first algorithm that learns...
So in order to onboard our customers onto our software we have to link the products in their database to the products in ours. This seems easy enough, but when you actually start looking at their data you find it's a fuck up of duplications and bad naming conventions, and only 10% or so have distinct identifiers like a supplier code, model number or barcode. After a week or two they find they can't do it and ask for our help, and we take over. On average it took two of our staff 1-2 weeks to complete the task, manually searching one record of theirs against our db at a time. This was a big problem since we only had enough resources to onboard 2-4 customers a month, meaning slow growth.
I realized when looking at different customers' databases that although the data was badly captured, it was consistently badly captured, similar to how crap file names will usually contain the letters 'asd' because it's typed with the left hand.
I then wrote an algorithm that fuzzy matched against our data and the past matches of other customers' data, creating a ranking algorithm similar to Google search. After auto matching the majority of results, the top 10 ranked search results for each remaining product on their db are shown to a human one at a time, and they either click the correct result or select "no match", repeating until it is done, at which point the algo will include the captured data in ranking future results.
It now takes a single staff member 1-2 hours to fully onboard a customer with 10-15k products, and it will continue to get faster and adapt to changes in language and naming conventions. Making it learn wasn't really my intention at the time and more a side effect of what I was trying to achieve. Completely blew my mind.
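The rant doesn't show any code, so the following is only a rough sketch of the idea as described: rank our products against a customer's product name, and let previously confirmed matches feed back into the ranking. It assumes plain string similarity from Python's `difflib`; the function names, the `past_matches` structure and the 0.9 auto-match threshold are all hypothetical, not the author's implementation.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two product names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def rank_candidates(customer_name, catalog, past_matches, top_n=10):
    """Rank our products for one customer product name.

    catalog:      {product_id: our product name}
    past_matches: {product_id: [customer names a human confirmed earlier]}
    """
    scored = []
    for product_id, our_name in catalog.items():
        known_names = [our_name] + past_matches.get(product_id, [])
        # A product scores as well as its best-matching known name.
        score = max(similarity(customer_name, name) for name in known_names)
        scored.append((score, product_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]


def auto_match(ranked, threshold=0.9):
    """Return the top product id if it is confident enough, else None."""
    if ranked and ranked[0][0] >= threshold:
        return ranked[0][1]
    return None


def record_match(past_matches, product_id, customer_name):
    """Store a human-confirmed match so future rankings can use it."""
    past_matches.setdefault(product_id, []).append(customer_name)
```

Anything `auto_match` rejects would go to the human with the ranked list, and whatever they click gets passed to `record_match`, which is where the "learning" side effect comes from in this sketch.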
This is probably a standard pattern/algorithm, but I feel pretty good about myself figuring this out.
I was doing a programming challenge and found myself with 2 lists of integer points (x,y). I needed to see where the points converged and identify those locations. Of course I started with a brute force approach and did nested loops to find these locations. This was taking WAY TOO LONG. These lists were 200K each, so checking with naive looping is 200K * 200K operations, about 40 billion comparisons. Which is a lot.
Then I thought, well, I am checking equality, so I will create a third structure, a map. The key to the map will be the point, and the value will be an integer. I then go through each list once, incrementing the integer for each point that exists in each respective list. Any point with a value greater than 1 is a point convergence.
Like I said, this has got to be a standard thing, so can someone tell me what algorithm this is? I am not sure how to search for this.
I am fuzzy on complexity notation but I think the complexity started at O(n^2) and was reduced to O(n), since each list is cycled over once.
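A minimal sketch of the counting-map approach described above, in Python. This is usually described as a hash-based intersection of two collections. One assumption is added here that the rant doesn't mention: each list is de-duplicated first, otherwise a point repeated within a single list would also end up with a count greater than 1. The function name is hypothetical.

```python
from collections import Counter


def converging_points(first, second):
    """Return the points that appear in both lists, in roughly linear time."""
    counts = Counter()
    for points in (first, second):
        # set() so a point repeated inside one list is only counted once.
        counts.update(set(points))
    return {point for point, seen in counts.items() if seen > 1}
```

With tuples as points this gives the same result as `set(first) & set(second)`, which is the shorter way to write it when no per-point counts are needed.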
@dfox Tag searches are too fuzzy. Someone tagged "cake" and I was curious how many other rants contained cake, but the search just had a bunch of results for "make," "cave," "wake," etc.
!!rant
Elasticsearch! First time touching it and need to find out on my own how to build an index that allows a weighted multi field fuzzy search on four fields, where two need to be full ngram, one ngram on the words and one standard search + not index any other field. The documentation is horrible! Just realizing that this is what I need took me 2 days!
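For what it's worth, here is one possible reading of that setup, written as plain Python dicts that would be sent as the JSON bodies for index creation and search. Everything specific is an assumption: the field names (`field_a` to `field_d`), the gram sizes, the boosts, and the interpretation of "full ngram" as an `ngram` tokenizer over the whole string versus "ngram on the words" as a `standard` tokenizer followed by an `ngram` token filter. No client library is used, and this is a sketch to adapt, not a verified configuration.

```python
def build_index_body():
    """Index settings and mappings for four searchable fields (hypothetical names)."""
    return {
        "settings": {
            "analysis": {
                # ngram tokenizer: grams are cut from the whole string, spaces included
                "tokenizer": {
                    "full_ngram": {"type": "ngram", "min_gram": 2, "max_gram": 3}
                },
                # ngram token filter: grams are cut inside each word after tokenizing
                "filter": {
                    "word_ngram": {"type": "ngram", "min_gram": 2, "max_gram": 3}
                },
                "analyzer": {
                    "full_ngram_analyzer": {
                        "type": "custom",
                        "tokenizer": "full_ngram",
                        "filter": ["lowercase"],
                    },
                    "word_ngram_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "word_ngram"],
                    },
                },
            }
        },
        "mappings": {
            # dynamic=false: fields not listed below stay in _source but are not indexed
            "dynamic": False,
            "properties": {
                "field_a": {"type": "text", "analyzer": "full_ngram_analyzer"},
                "field_b": {"type": "text", "analyzer": "full_ngram_analyzer"},
                "field_c": {"type": "text", "analyzer": "word_ngram_analyzer"},
                "field_d": {"type": "text", "analyzer": "standard"},
            },
        },
    }


def build_search_query(text):
    """Weighted fuzzy search across the four fields (boost values are made up)."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["field_a^3", "field_b^2", "field_c", "field_d"],
                "fuzziness": "AUTO",
            }
        }
    }
```

The `^3` and `^2` suffixes are per-field boosts, which is how the weighting part would be expressed in a `multi_match` query under these assumptions.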