Comments
-
hitko: Let's say you're trying to autocomplete country names, and the user types "Samoa". Do you want autocomplete to offer "American Samoa" as an option or not?
If yes, use the standard analyser. If no, use the keyword analyser.
-
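A rough Python sketch of the trade-off behind this rule of thumb. It is a simplified model, not Atlas Search's implementation: the helpers `edge_grams`, `index_grams` and `autocomplete` are hypothetical, the standard analyzer is approximated by a regex word split, both modes lowercase their input (a simplification), minGrams=2 and maxGrams=15 are assumed values, and a document "matches" when the lowercased query equals one of its indexed grams.

```python
import re


def edge_grams(token: str, min_grams: int, max_grams: int) -> list[str]:
    """Hypothetical helper: prefixes of `token` with length in [min_grams, max_grams]."""
    upper = min(max_grams, len(token))
    return [token[:n] for n in range(min_grams, upper + 1)]


def index_grams(text: str, analyzer: str, min_grams: int = 2, max_grams: int = 15) -> set[str]:
    """Hypothetical model of what gets indexed: analyze first, then edgeGram each token."""
    text = text.lower()  # simplification: case-insensitive in both modes
    if analyzer == "standard":
        tokens = re.findall(r"\w+", text)  # rough stand-in for the standard analyzer
    elif analyzer == "keyword":
        tokens = [text]  # the whole value is a single token
    else:
        raise ValueError(f"unknown analyzer: {analyzer}")
    grams: set[str] = set()
    for tok in tokens:
        grams.update(edge_grams(tok, min_grams, max_grams))
    return grams


def autocomplete(query: str, docs: list[str], analyzer: str) -> list[str]:
    """Hypothetical matcher: keep docs whose indexed grams contain the query."""
    q = query.lower()
    return [d for d in docs if q in index_grams(d, analyzer)]


countries = ["American Samoa", "Samoa", "San Marino", "Sweden"]
print(autocomplete("Samoa", countries, "standard"))  # ['American Samoa', 'Samoa']
print(autocomplete("Samoa", countries, "keyword"))   # ['Samoa']
```

In this model, the standard analyser lets "Samoa" hit the second word of "American Samoa", while the keyword analyser only matches from the start of the whole value.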
IntrusionCM: I'm a bit confused by @hitko's answer.
When an edgeGram tokenizer is used, it creates tokens from the input string, usually bounded by a minimum and maximum number of characters.
The analyzer then searches the **resulting** tokens for matches.
So... "American Samoa" is broken down into tokens of between min and max characters, as far as I know.
The resulting tokens are what gets analyzed, meaning the analyzer isn't looking at "Samoa" but rather at "Sam", "oa", etc. (the tokens).
At least this is what I would expect.
The question now is what you expect regarding search terms, e.g. a single search term vs. many search terms.
-
hitko: @IntrusionCM You've got the order wrong.
When using keyword + edgeGram, the whole string is treated as a single token, and edgeGram then creates tokens like "Am", "Ame", ... , "American Sa"; obviously none of those tokens will match "Samoa".
When using standard + edgeGram, each word of the string is treated as a separate token, so the final tokens would be "Am", "Sa", "Ame", "Sam", "Amer", "Samo", ... , and those will match "Samoa".
-
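A small Python sketch of the two pipelines described above (analyzer first, then edgeGram) applied to "American Samoa". The regex word split is only a rough stand-in for the standard analyser, `edge_grams` is a hypothetical helper, and minGrams=2 / maxGrams=15 are assumed values.

```python
import re


def edge_grams(token, min_grams, max_grams):
    """Hypothetical helper: prefixes of `token` with length in [min_grams, max_grams]."""
    return [token[:n] for n in range(min_grams, min(max_grams, len(token)) + 1)]


text = "American Samoa"
MIN, MAX = 2, 15  # assumed values; 15 covers this whole 14-character string

# keyword + edgeGram: the whole string is a single token
keyword_tokens = edge_grams(text, MIN, MAX)

# standard + edgeGram: split into words first, then edgeGram each word
standard_tokens = [g for word in re.findall(r"\w+", text) for g in edge_grams(word, MIN, MAX)]

print(keyword_tokens[-3:])  # ['American Sam', 'American Samo', 'American Samoa']
print(standard_tokens)      # ['Am', 'Ame', 'Amer', 'Ameri', 'Americ', 'America', 'American', 'Sa', 'Sam', 'Samo', 'Samoa']
print("Samoa" in keyword_tokens, "Samoa" in standard_tokens)  # False True
```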
hitko: @hitko If the order were switched, edgeGram would produce "Am", "Ame", ... , "American Sam", "American Samoa" in both cases, but the standard analyser would then split them into words, giving "Am", "Ame", "Amer", ... , "S", "Sa", "Sam", ... Notice that this wouldn't respect edgeGram minGrams=2, and edgeGram would need a large maxGrams for any of this to work.
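To illustrate the hypothetical reversed order (edgeGram over the whole string first, then splitting each gram into words), here is a Python sketch. `edge_grams` and `reversed_pipeline` are hypothetical helpers, the regex word split stands in for the standard analyser, and the maxGrams values 8 and 15 are arbitrary assumptions chosen to show the effect.

```python
import re


def edge_grams(token, min_grams, max_grams):
    """Hypothetical helper: prefixes of `token` with length in [min_grams, max_grams]."""
    return [token[:n] for n in range(min_grams, min(max_grams, len(token)) + 1)]


def reversed_pipeline(text, min_grams, max_grams):
    """Hypothetical reversed order: edgeGram the whole string, then split each gram into words."""
    words = set()
    for gram in edge_grams(text, min_grams, max_grams):
        words.update(re.findall(r"\w+", gram))
    return words


text = "American Samoa"
short = reversed_pipeline(text, 2, 8)    # grams stop at "American"
long_ = reversed_pipeline(text, 2, 15)   # grams cover the whole string

print("Samoa" in short)  # False: the grams never reach the second word
print("S" in long_)      # True: a single-letter token despite minGrams=2
print("Samoa" in long_)  # True, but only because maxGrams spans the whole string
```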
-
IntrusionCM: @hitko Interesting.
I didn't have a MongoDB database to test with, sadly.
But it makes more sense your way xD
And yes, exactly the second comment / conclusion is what made me doubt my sanity...
Thanks for the longer explanation.
Question for someone who uses Mongo Atlas Search:
If I'm only interested in autocomplete from the start of the text, which is more performant?
1) standard analyzer + edgeGram tokenizer
2) keyword analyzer + edgeGram tokenizer
I don't see why I should index separate words if I don't care about matches at arbitrary positions :/
Thank you
Tags: question, atlas, mongo, search, development, database
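For option 2, a sketch of what the Atlas Search index definition and a matching query might look like, written as Python dicts so they can be serialized to JSON. The field name `name`, the index name `default`, and the minGrams/maxGrams values are assumptions; `autocomplete`, `lucene.keyword`, `tokenization: edgeGram`, `minGrams` and `maxGrams` are Atlas Search options. This says nothing about which option is faster; that would need measuring on real data.

```python
import json

# Hypothetical field "name"; minGrams/maxGrams values are assumptions.
index_definition = {
    "mappings": {
        "dynamic": False,
        "fields": {
            "name": {
                "type": "autocomplete",
                "analyzer": "lucene.keyword",  # whole value is one token (option 2)
                "tokenization": "edgeGram",    # prefixes of that single token
                "minGrams": 2,
                "maxGrams": 15,
            }
        },
    }
}

# Aggregation pipeline using the autocomplete operator on that field.
pipeline = [
    {"$search": {"index": "default", "autocomplete": {"query": "Sam", "path": "name"}}},
    {"$limit": 10},
    {"$project": {"_id": 0, "name": 1}},
]

print(json.dumps(index_definition, indent=2))
# Against an Atlas cluster with a recent PyMongo (untested here), one could call:
# collection.create_search_index({"name": "default", "definition": index_definition})
# results = list(collection.aggregate(pipeline))
```

Swapping `lucene.keyword` for `lucene.standard` would give option 1, which, as discussed in the comments, also makes later words such as "Samoa" in "American Samoa" matchable.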