Search - "lucene"
-
It works.
How I hate that sentence.
Whenever that sentence pops up, I wanna take a frying pan, make some bacon, eat the bacon and slam the still hot pan with grease through someone's face till the skull breaks.
Why has he so many anger issues, one might ask.
Usually the sentence "It works" means that after looking at "working thing" it works wrong in 95 % of all cases, but hey - for 5 % it at least does *something* right. Not everything, don't get ya hope up.
We had this fun topic happening again today and I'm still too angry to sleep.
Lucene analysis of texts in Elasticsearch.
Stopword list? Multiple word n-grams per line, duplicates, not lower cased, not properly encoded.
Tokenizers? Duh. Why should one put them in proper order.... Or, more realistically: "There is a required order for tokenizers?" *devs with shocked faces*.
Language specific details... UHM. Wait. Languages are different? There are edge cases in languages? *more shocked faces*.
Even more shocking: if a text processing pipeline is implemented horribly wrong, it delivers wrong results. *mind blown*.
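To make the stopword and ordering complaints concrete, here is a minimal pure-Python sketch, not Elasticsearch or Lucene code. The function names `clean_stopwords` and `analyze` are made up for illustration, and splitting multi-word lines into single terms is an assumption about what the list should have looked like. Unicode normalization here only unifies equivalent forms; it does not repair a file saved in the wrong encoding.

```python
import unicodedata

def clean_stopwords(lines):
    """Return one lowercased, NFC-normalized term per entry, without duplicates."""
    seen = set()
    cleaned = []
    for line in lines:
        # Assumption: a line holding several words is split into single terms.
        for term in line.split():
            term = unicodedata.normalize("NFC", term).lower()
            if term not in seen:
                seen.add(term)
                cleaned.append(term)
    return cleaned

def analyze(text, stopwords, lowercase_first=True):
    """Whitespace-tokenize, then apply lowercasing and stopword removal in the given order."""
    tokens = text.split()
    if lowercase_first:
        tokens = [t.lower() for t in tokens]
        return [t for t in tokens if t not in stopwords]
    # Wrong order: the stopword filter sees "The", which is not in the lowercased list.
    tokens = [t for t in tokens if t not in stopwords]
    return [t.lower() for t in tokens]

raw = ["The the", "AND and of", "of"]          # messy list: mixed case, duplicates, several words per line
stopwords = set(clean_stopwords(raw))

print(analyze("The Art of War", stopwords))                         # ['art', 'war']
print(analyze("The Art of War", stopwords, lowercase_first=False))  # ['the', 'art', 'war']
```

With the filters in the wrong order, "The" slips past the stopword filter and only gets lowercased afterwards, which is exactly the kind of quiet wrong result a happy-path test never notices.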
But our unit tests (this goes out to @kiki) were working.
Yeah. You dumb nuggets who even an amoeba would be ashamed of, when you only do positive tests in unit tests with the most obvious working examples, then your unit tests are just a useless waste of nibbles.
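For illustration, this is roughly what non-happy-path cases could look like. The `tokenize` function is a hypothetical stand-in, not the team's code and not a Lucene tokenizer. The point is only that empty input, punctuation, hyphens and apostrophes get pinned down alongside the obvious example.

```python
import re

def tokenize(text):
    """Hypothetical stand-in tokenizer: lowercase, then split on non-word characters."""
    return re.findall(r"\w+", text.lower())

# Each entry is (input, expected tokens). Only one of these is the "obvious" case.
EDGE_CASES = [
    ("", []),                         # empty input
    ("   \t\n", []),                  # whitespace only
    ("?!...", []),                    # punctuation only
    ("It WORKS", ["it", "works"]),    # the obvious positive case
    ("e-mail", ["e", "mail"]),        # hyphen splits the word, which may or may not be wanted
    ("don't", ["don", "t"]),          # apostrophe splits the word as well
]

def run_edge_cases():
    """Return a list of (input, expected, actual) for every case that fails."""
    failures = []
    for text, expected in EDGE_CASES:
        actual = tokenize(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures

print(run_edge_cases())  # [] when every case behaves as written down
```

Whether "e-mail" should really become two tokens is a product decision. The test just forces someone to make it on purpose.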
Some of the devs are really a fucking waste of genetic information, should have probably ended better in a sock.
If this sounds too harsh, they had 2 weeks.
In just 3 hours I found out that they can redo that with supervision.
-.-
I'm getting too old for that shit. Seriously.
-
Another case of "devs too stupid to poop" TM.
We had a funny discussion today.
Topic came up that a project using Lucene was incredibly slow.
Then came the yadda yadda of Java bad, Java sucks, Java bla Java blub in the gossip mill.
Both things irritated me, last thing was just the usual "I want to use new stuff cause I wanna be a cool jackass" trouble.
So. Today meeting. We did quick analysis by pair programming.
If I tell you that a whole team managed to review a PR, give it the green light...
Despite the PR using the thread safe Lucene IndexWriter in a non-parallel fashion for large bulk inserts?
The whole problem screamed parallelization.
Yeah. If you ignore that scream and implement it in a sequential fashion, it is slow.
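A rough sketch of the shape such a fix usually takes: split the bulk into batches and let a pool of threads feed one shared, thread-safe writer. This is Python, and `SharedWriter` is a hypothetical stand-in, not the Lucene `IndexWriter` API. In the real project the equivalent would be Java threads sharing a single `IndexWriter`. The sketch only shows the structure and makes no claim about speedup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class SharedWriter:
    """Hypothetical stand-in for a thread-safe index writer (not the Lucene API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._docs = []

    def add_documents(self, batch):
        # Stand-in for per-document analysis work, done outside the lock.
        analyzed = [doc.lower() for doc in batch]
        with self._lock:
            self._docs.extend(analyzed)

    def count(self):
        with self._lock:
            return len(self._docs)

def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def bulk_insert(writer, docs, workers=4, batch_size=100):
    """Feed batches to one shared writer from several worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every batch and re-raises any worker exception.
        list(pool.map(writer.add_documents, chunked(docs, batch_size)))

writer = SharedWriter()
bulk_insert(writer, [f"Doc {i}" for i in range(1000)])
print(writer.count())  # 1000
```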
Congrats Jimmy, your retard level is off the charts.
-
Follow up to this:
https://devrant.com/rants/6403741/
So today we had a meeting....
To restart the project, as the current state is garbage.
Turns out the whole team has, after two weeks of being left alone with it, kinda like the rant says, zero clue how Lucene works, what it does, what it's for.
In case anyone of you wonders why some managers are micromanaging biatches, there you have it.
The whole meeting had more "oooh"... "ehm".... "eh"... and other filler words just to cover the shame of not having any clue at all.
I'm really disappointed that a team of up to 5 people really thought they could pull a stunt of "fake it till you make it". Collectively. Really no one had a real clue.
Now to an interesting discussion: How would you devs reprimand them?
:)
Just curious. Firing is not an option, for several reasons, e.g. law.
Serious answers, I would be really curious. :)
I'm feeling sad for the sock metaphor in the last rant btw.
Even a cum sock deserves more dignity than them imho.
-
A very long rant.. but I'm looking to share some experiences, maybe a different perspective.. huge changes at the company.
So my company is starting our microservices journey (we have 359 retail websites at this moment)
First question was: What to build first?
The first thing we had to do was to decide what we wanted to build as our first microservice. We went looking for a microservice that could be used read-only, that consumers could easily adopt without overhauling production software, and that is isolated from other processes.
We ended up building a catalog service as our first microservice. That catalog service provides consumers with information about our catalog and the most essential information about the items in it.
By starting with the catalog service, the team could focus on building the microservice without any time pressure. The initial functionality of the catalog service was created to replace existing functionality which was working fine.
Because we chose such an isolated piece of functionality, we were able to introduce the new catalog service into production step by step. Instead of replacing the search functionality of the webshops using a big-bang approach, we chose A/B split testing to measure our changes and gradually increase the load on the microservice.
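One common way to do this kind of gradual ramp-up is to hash each user into a stable bucket and compare it against a rollout percentage. The sketch below is in Python and is only an assumption about how such a split could work, not a description of what we actually run. `bucket`, `use_new_service` and the user IDs are invented for the example.

```python
import hashlib

def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100

def use_new_service(user_id: str, rollout_percent: int) -> bool:
    """True if this user should be routed to the new catalog service."""
    return bucket(user_id) < rollout_percent

# Synthetic user IDs, purely for demonstration.
users = [f"user-{i}" for i in range(1000)]
share = sum(use_new_service(u, 25) for u in users) / len(users)
print(f"share routed to the new service at 25%: {share:.2%}")
```

Because the bucket depends only on the user ID, a user who lands on the new service at 10% stays on it when the percentage is raised, which keeps the A/B measurements comparable between steps.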
Next step: Choosing a datastore
The search engine that was in production when we started this project was making use of Solr. Due to the use of Lucene it was performing very well as a search engine, but from an engineering perspective it lacked some functionality. It fell short if you wanted to run it in a cluster environment, configuring it was hard and not user friendly and, last but not least, development of Solr seemed to have ground to a halt.
Elasticsearch started entering the scene as a competitor to Solr and brought interesting features. Still using Lucene, which we were happy with, it was built with clustering in mind, provided out of the box. Managing Elasticsearch was easy since there are REST APIs for configuration and, as a fallback, YAML configuration files are available.
We decided to use Elasticsearch since it provides us the strengths and capabilities of Lucene with the added joy of easy configuration, clustering and a lively community driving the project.
Even bigger challenge? Which programming language will we use?
The team responsible for developing this first microservice consists of a group of web developers. So when looking for a programming language for the microservice, we went searching for a language close to their hearts and expertise. At that time a typical web developer at least had knowledge of PHP and JavaScript.
What we’ve noticed during researching various languages is that almost all actions done by the catalog service will boil down to the following paradigm:
- Execute an HTTP call to fetch some JSON
- Transform JSON to a desired output
- Respond with the transformed JSON
These actions can easily be done in a parallel and asynchronous manner and mainly consist of transforming JSON from the source into a desired output. The programming language used for the catalog service should be well suited to those kinds of actions.
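The three steps can be sketched like this. The sketch uses Python's `asyncio` as an analogue of an event-driven, non-blocking runtime, so it is not the Node.js code of the catalog service. `fetch_json` is a stand-in that fakes the HTTP call, and the field names (`title`, `internal_cost`, `name`) are invented.

```python
import asyncio
import json

async def fetch_json(item_id):
    """Stand-in for an HTTP call to the data source; returns a JSON string."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return json.dumps({"id": item_id, "title": f"Item {item_id}", "internal_cost": 1.0})

def transform(raw):
    """Turn the source JSON into the output shape, dropping internal fields."""
    source = json.loads(raw)
    return {"id": source["id"], "name": source["title"]}

async def handle_request(item_ids):
    """Fetch all items concurrently, transform them, and respond with JSON."""
    raw_items = await asyncio.gather(*(fetch_json(i) for i in item_ids))
    return json.dumps([transform(raw) for raw in raw_items])

print(asyncio.run(handle_request([1, 2, 3])))
```

`asyncio.gather` returns results in the order the calls were passed in, so the response order matches the request order even though the fetches overlap.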
Another thing to note is that some functionality built on the catalog service will result in a high level of concurrent requests. For example, the type-ahead functionality will trigger several requests to the catalog service each time a user uses it.
To us, PHP and .NET at that time weren't sufficient for building the catalog service based on the requirements we had set. Eventually we decided to use Node.js, which is better suited to the things we are looking for, as described earlier. Node.js provides a non-blocking I/O model, and being event driven helps us develop a high performance microservice.
The leap to start programming in Node.js is relatively small since it basically is JavaScript, a language that was familiar to the developers at that time. While Node.js introduces some new concepts, it is relatively easy for a developer to start using it.
The beauty of microservices, and the isolation they provide, is that you can choose the best tool for that particular microservice. Not all microservices will be developed using Node.js and Elasticsearch. All kinds of combinations might arise and this is what makes the microservices architecture so flexible.
Even if Node.js or Elasticsearch turns out to be a bad choice for the catalog service, it is relatively easy to switch that choice for magic ‘X’ or component ‘Z’. By focussing on creating a solid API, the components that are driving that API don’t matter that much. They should do what you ask of them, and when one is lacking you just replace it.
Many more headaches to come later this year ;)
-
So I have inherited a crappy Symfony 1.4 application and I need to rebuild the Lucene indexes. Anyone know if it's safe to do this while users are in the application?