random algo thought

Ranter

donuts

23247

Comments

2

Root

77194

6y

Downcase the data (and input) in your comparator?
0

melezorus34

3191

6y

Well, to be able to talk about these *cases*, they have to be letters. They don't have to be latin. Once you have a lexicon of all alphabets (with the unicode codes) it will be easy to implement.
0

melezorus34

3191

6y

Also the more accurate the casing is the more relevant the result is. Dunno how u gonna calculate that but a for loop could do it easily
0

donuts

23247

6y

@Root yes but is there an algorithm solution? How do DBs and well Google do it or even Word, Chrome, notepad? Even Regex allows case insensitive matching?
0

donuts

23247

6y

@melezorus34 Ok so say I enter all permutations of "aVeryLongString" into a "database" along with some other values.

I will search the same, but it should return all matches in their original casing.
0

melezorus34

3191

6y

@billgates
if you are searching for "aSd" case insensitive the regexp would be
/[Aa][Ss][Dd]/
You need the lexicon bro. Maybe get one with klington.
0

donuts

23247

6y

I'm just wondering how Google, databases, text editors do it. Chrome, you Find any text and it will pretty quickly highlight all matches
0

melezorus34

3191

6y

@billgates well, in js we can just do this:

var regexr= new Regexr(input,"i"),
arrayOfObjectsWithNames=[(...)];

//We need the ones matching so
return arrayOfObjectsWithNames
.filter(object=>regexr.test(object.name))

//Is it too complex? Heck yeah, I can't help it.
0

donuts

23247

6y

@melezorus34 ok but that's my question? The only way I can think of is it runs a finite state machine where the next match is either A or a ... But that doesn't work for Google or databases?

It would take forever given how much data they hold
0

melezorus34

3191

6y

@billgates
You can have outputs with regexr search bubbles.
SomeHTMLText.replace(/(the thin)/gi ,
matched=>`<mark>${matched}</mark>`);
Will match any "the thin" in any text, case insensitive, and wrap them around highlighting tags.
0

melezorus34

3191

6y

@billgates Sir, you are having a conversation with a child now. I can't optimise my thinking speed to code and debug at the same time. I don't know what trick they have at their sleeves. Probs docker swarm/k8s.
0

melezorus34

3191

6y

Also, my second comment states that there could be title/text relevance to sort, e.g. 500, which would be picken after they
SORT BY audit_score
Or something I don't know. Its 3 am.
1

melezorus34

3191

6y

I wish you wisdom through this question, while asking nicely to ping me too when you find a proper solution. Mine is a hypotetical solution. **Do not take an advice from a child on coffee²**
0

nibor

4602

6y

At a guess, when Google builds their indexes of all of the Internet, they do something clever to store the search keys in lower case.
2

SortOfTested

19199

6y

There's graduate programs on that subject.

https://elastic.co/elasticon/conf/...
5

Voxera

10883

6y

@billgates one thing Google does is split the index up over thousands of machines.

The query is sent to each which returns the local matches to an aggregator that gets results from many index servers and that then returns to the searcher.

Thats a simplified solution.

Its quite possible that the index servers store a lower case version for first match and then only do case sensitive on the resulting matches but google does not always explain everything.

They used to have some very good papers on it though but its several years since I took time reading them.
0

vane

10486

6y

database use btree index
big data use map reduce
ngram - key value approach where you store each part of word in key value index ex from at least 2 characters for devrant it will be:
de
dev
devr
devra
devran
devrant

you can combine all above techniques + add some rank and/or machine learning, geo awarness to improve results
0

NoToJavaScript

4506

6y

At database level you can use a language coalition which is not case sensitive.
0

donuts

23247

6y

@NoToJavaScript you mean collation? But I'm not sure how that works.

Like even if you tell it, A=a, B=b, etc. The only way I can think of would be to create a index where the keys are all lowercase and then map them to the original values?
0

NoToJavaScript

4506

6y

@billgates Yep, sorry :)

An example, we have this :

[Name] NVARCHAR (MAX) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL,

That works for search : not case sensetive, also ignores frenchs acents.

But I agree it is not "universal". It works for our case (french/english, not case sensetive)

Add Comment

question