Search - "elasticsearch"
While reading through the Elasticsearch (Java search engine) source code a while ago I found this gem:
return i == -1? -1: i;
I think someone should stop drinking while coding.
Some other nice lines:
int i = 0;
return j + 1000 * i;
Are these guys high?
DevOps required skillset:
* Frontend engineering
* Backend services
* Database administrator
* Security consultant
* Project management
* 3rd party contract negotiator
* Build system monitor
* Build system hostage negotiator
* Paging, alerting, monitoring
* Search server admin
* Old search server admin
* Old-old-new search server admin
* Redis, ElasticSearch, MySQL, PostGres, owner
* Agile coach
* No you shouldn't do that coach
* Oh, you did that anyway coach
* DNS: (Optional) It'll replicate when it wants, and how it wants to anyway
* Multi-Cloud deployment strategist
* Must be able to translate Klingon to YAML, and YAML to MySQL
* Cost analyzer, reducer, and justifier
* Complex documentation generation in markdown that we should have done years ago anyway
* Marketing's email went to spam analyzer
* Wordpress is broke fixer
* Where the fuck does Wordpress run anyway?
* Ability to fix MySQL running Wordpress on marketing's dusty laptop
Contenders for arseholes this week
- Elasticsearch, because the product identification check they built into client libraries like the Python one to exclude OpenSearch made a lot of things very painful. Yay....
- Microsoft decided to integrate kill switches in Exchange. Yeah.... Great stuff.
- Atlassian has another week of dumbness - after they botch release after release, they killed Slack with DNS
- Adoptium still hasn't managed to provide repositories after fucking up its transition from AdoptOpenJDK
- No, a project with JDK 8 makes no sense anymore, take that shit and burn it. JDK 11 the same, would be great if we had a Repository working for JDK 17 Adoptium....
- unwanking a TLS setup by integrating an intermediary load balancer to deal with several outdated TLS implementations is the kind of thing that's really scary...
(TLS 1.3 in, TLS 1.1 - TLS 1.3 out... Theoretically all solutions have TLS 1.2… most of them non working. Solutions is a wild bunch from different vendors)
- If you buy a fucking new Apple with an Arm Chipset, ram it up so far up your arse it gets dissolved in stomach acid.
It's an arm. There's tons of compatibility problems of course. No you shouldn't listen to what the marketing says. No I cannot shit rainbows and make it work.
- German election. No politics I know, but still.
- New neighbors decided to move in. Friendly people. Except I wanted to murder them since they chose 22 o'clock for moving time.
- I forgot putting the heater on. Ever woken up frozen like fuck and having a hard week... It's a good combo to break any form of motivation.
The company next to me is renovating. Waking up to the feeling of an earth quake because they demolish their old building is another thing that makes me unhappy.
It's Friday. I survived.
There are so many cool technologies to look into nowadays, but so little time.
Where do I download more hours to add to my days?
<just got out of this meeting>
Mgr: “Can we log the messages coming from the services?”
Me: “Absolutely, but it could be a lot of network traffic and create a lot of noise. I’m not sure if our current logging infrastructure is the right fit for this.”
Senior Dev: “We could use Log4Net. That will take care of the logging.”
Mgr: “Log4Net?…Yea…I’ve heard of it…Great, make it happen.”
Me: “Um…Log4Net is just the client library, I’m talking about the back-end, where the data is logged. For this issue, we want to make sure the data we’re logging is as concise as possible. We don’t want to cause a bottleneck inside the service logging informational messages.”
Mgr: “Oh, no, absolutely not, but I don’t know the right answer, which is why I’ll let you two figure it out.”
Senior Dev: “Log4Net will take care of any threading issues we have with logging. It’ll work.”
Me: “Um..I’m sure…but we need to figure out what we need to log before we decide how we’re logging it.”
Senior Dev: “Yea, but if we log to SQL database, it will scale just fine.”
Mgr: “A SQL database? For logging? That seems excessive.”
Senior Dev: “No, not really. Log4Net takes care of all the details.”
Me: “That’s not going to happen. We’re not going to set up an entire sql database infrastructure to log data.”
Senior Dev: “Yea…probably right. We could use ElasticSearch or even Redis. Those are lightweight.”
Mgr: “Oh..yea…I’ve heard good things about Redis.”
Senior Dev: “Yea, and it runs on Linux and Linux is free.”
Mgr: “I like free, but I’m late for another meeting…you guys figure it out and let me know.”
Me: “So..Linux…um…know anything about administrating Redis on Linux?”
Senior Dev: ”Oh no…not a clue.”
It was all I could do to keep from doing physical harm to another human being.
I really hate people playing buzzword bingo with projects I’m responsible for.
Only good piece is he’s not changing any of the code.
Why is it that pretty much zero package & framework maintainers understand semantic versioning?
1. If you do a complete rewrite of your package, but the resulting API is identical, you don't need to bump to the next major version. As a user, I'm thankful for your increased performance or cleaner internal code, but it doesn't really affect my update process.
2. If your package required some-framework 6.0.0, and now ALSO supports some-framework 7.0.0 but is still compatible with 6.0.0, you don't need to bump to the next major version. As a user, I can now upgrade the framework, and know that the package will keep working, but otherwise it doesn't really affect me.
3. Following your versioning along with the framework/language version is super annoying, especially if your library really doesn't need to differentiate between framework versions because it's not actually utilizing new framework functionality.
4. On the other hand, if you stop supporting a certain language, framework or shared library version, or change the public methods, exceptions, fields, etc, you MUST bump to a new major version.
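The four rules above can be written down as a tiny decision function. This is only a sketch of the rant's own rules, not an official SemVer tool; the names `Bump`, `required_bump` and `next_version` are made up for illustration.

```python
from enum import Enum


class Bump(Enum):
    PATCH = "patch"
    MINOR = "minor"
    MAJOR = "major"


def required_bump(public_api_changed=False, dropped_support=False, added_support=False):
    """Pick the version bump according to the rules listed above (hypothetical helper)."""
    # Rule 4: changed public methods/exceptions/fields or dropped support -> major.
    if public_api_changed or dropped_support:
        return Bump.MAJOR
    # Rule 2: additionally supporting a newer framework is backward compatible -> minor.
    if added_support:
        return Bump.MINOR
    # Rule 1: an internal rewrite with an identical API is not a breaking change.
    return Bump.PATCH


def next_version(version, bump):
    """Apply a bump to a plain MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump is Bump.MAJOR:
        return f"{major + 1}.0.0"
    if bump is Bump.MINOR:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


print(next_version("2.3.1", required_bump()))                      # rewrite, same API
print(next_version("2.3.1", required_bump(added_support=True)))    # now also supports framework 7
print(next_version("2.3.1", required_bump(dropped_support=True)))  # framework 6 dropped
```

Only the last case forces users to touch their own version constraints.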
Yet everyone gets this wrong.
For example, many of Laravel's underlying subpackages (for collections, filesystem, database, config, http, mail, etc) do not change their code in a breaking way, or do not even change at all between major framework versions.
Yet they follow along with the major framework version.
Now if someone makes a library "laravel-elasticsearch" which uses the support libraries and collections from laravel, they need to update their package to move along with the versions as well, and often they choose to number their library along with the framework in turn.
This means that to update the framework, you also need to update over 9000 dependencies.
FOR NO FUCKING REASON. THE ONLY CHANGE IN THOSE FUCKING DEPENDENCIES IS TO UPDATE COMPOSER.JSON TO BE COMPATIBLE WITH THE FUCKING FRAMEWORK.
Meanwhile, Laravel itself breaks repeatedly on minor/patch version updates, because breaking changes slip through their review process.
This day I have received the most glorious news in epistolary form. For some years, I was suffering in support of a client who was, well, insufferable. My presence there paralleled the divine comedy in both essence and fact.
I opened the missive, expecting another plea to bail them out of whatever clusterfuck they found themselves in. Instead, what I found was something truly magical.
I hope this finds you well. I'm not sure if you remember a few years back, we were trying to decide between IBM Cloud and AWS. Well, after years of battling FF*, we're finally moving ahead with AWS. He failed one too many times to deliver anything visibly. After you left, there was no one left he could use to steal credit, ideas, and work.
FF is still pushing to have them use IBM cloud as a "warm backup" in the event "AWS fails." We will see where that goes.
I figured you'd like to know; you were the voice in the wilderness for a long time. I don't want to think about how much time we could have saved if we had just listened.
This event represents a personal victory, albeit belated, over a few people's absurd amount of privilege. Towards the end, I was vicious in contesting the insanity of adopting a desperate hedge-attempt-as-cloud offering from a failing company. Some examples:
// cloud 'strategy meeting'
Moi: What cloud platform are we looking at using?
FF: We're looking at IBM cloud and AWS as a second.
Moi: Why is that? I understand you're obligated to rep your offering first, but that decision doesn't seem to have the customer's best interest at heart.
FF: IBM cloud is a market leader; AWS isn't as good.
Moi: I see. I mean, that's the tech equivalent of the company's fleet management considering monkeys on tricycles as a strong competitor to service trucks, but I get what you mean.
// steering meeting
Director: Who can we look to as an example? Who is currently using the IBM cloud?
Moi: No one; they account for a single-digit portion of the actual cloud market. Their long game is to sell you a "Hybrid Cloud," which means put some front end payload in a CDN, and buy n-frame units of IBM z servers for the DC with IBM gateway appliances acting as connective tissue. So it's not the cloud at all, really.
Director: How does it compare in cost?
Moi: It's generally 40% more expensive than other clouds, and it only goes higher as you option their software.
Director: What about Watson? I hear Watson is good?
Moi: It's a brand name. Most of the "Watson" product is just a facade on top of FOSS products like Spark, Hadoop, Elasticsearch, etc.
Director: Those were words. They sounded good. FF say it's good tho so we'll believe him because we're from the same city.
Moi: *deletes Director from LinkedIn*
Moral of the story: Never trust a vendor that only recommends their products.
*FF = FatFuck - an embarrassingly rotund individual whose girth is roughly equivalent to his height. He shit his way into an IBM architect position in his mid-20s purely due to winning the visa lottery. He had fake hair glued to his head for his wedding to hide his male pattern baldness; his arrange-married wife undoubtedly cries herself to sleep after sex.
**PeeEm - the then project manager, now portfolio manager of some satellite projects. An overall decent human being, capable.
Manager: let's use elasticsearch for performing relational queries. PostgreSQL performance is not great.
Me: Say what? 👿
Elasticsearch queries are FUCKING ugly
Elasticsearch documentation is FUCKING bare-minimum.
Kibana only shows data when it FUCKING feels like it.
Elastic stack is FUCKING annoying me
Has anyone installed Elasticsearch on Linux, CentOS to be specific?
Trying to work out why the fucker won't install. Setting up a proof of concept, so I don't want to use it as SaaS currently.
From what I can tell, it only needs Java (check) and to be run as a user other than root (check), but running ./bin/elasticsearch hangs after a while and starts powering up 100-odd threads with no progress.
Is it just my random madness...
Or do you sometimes picture yourself in a fictional comic / movie / whateva...
Had this feeling today.
Burned a database down, grilled 2 terabyte of data, deleted ~ 500 elasticsearch indices.
Then I chopped an haproxy loadbalancer into 6 separate machines, because no one likes to read ~ 2.5 to 3 k of lines.
And I guess now I'm doing some backups of elasticsearch before the second round of flamethrower madness starts.
It's somehow very satisfying to just destroy everything.
I find it weird that some developers never ask why when facing a problem. "What do you mean, never ask why?" Here's a story.
Let's say a developer works on a simple app, with Laravel as the backend and PostgreSQL as the database. He faces a problem: the app is very slow when searching data.
To solve that, he implements a cache using Redis, but finds that it is only fast occasionally. To solve that, he implements Elasticsearch because he thinks Elasticsearch is very good for search, but then finds another problem: sometimes the data in PostgreSQL is out of sync with the data in Elasticsearch. To solve that, he implements cron jobs to fix the out-of-sync data, but finds yet another problem: cron jobs cannot fix out-of-sync data in real time. And so on...
Do you see the problem? He never asks why the app is slow. Which part searches the data, the backend or the database? (Searching in the backend is mostly slower than in the database, because the backend has to fetch all the data from the database first.) Has the query been optimized (limit/offset, indexing)? How about the internet connection? Etc.
For me it's important to ask why when facing a problem, and to try to solve the problem as simply as possible.
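One cheap way to "ask why" before adding Redis or Elasticsearch is to time each stage of the slow request. The sketch below is a generic Python helper, not taken from the story; `timed` and the stage labels are made-up names, and the work inside each block is a placeholder.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings, clock=time.perf_counter):
    """Record how long the wrapped block takes under `label` (hypothetical helper)."""
    start = clock()
    try:
        yield
    finally:
        timings[label] = clock() - start


timings = {}
with timed("database query", timings):
    rows = list(range(1000))            # placeholder for the real query
with timed("backend filtering", timings):
    hits = [r for r in rows if r % 7 == 0]  # placeholder for the in-app search

# Print the stages, slowest first.
for label, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {seconds:.6f}s")
```

Whichever stage dominates is the one worth fixing first.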
Around 2 years ago, I had first discovered DevRant.
I was an intern in a startup then, and I was working on ElasticSearch. I remember making rants about it. The internship ended. So did my relationship with ElasticSearch.
This week, a new intern joined our organisation (a different organisation). He was assigned the task of deploying ElasticSearch, with me as his mentor. All was going good, we migrated data from MongoDB to ElasticSearch and all.
Back then, I used to curse the team lead (leading a team of interns mostly), for not helping me properly...
I wanted a publicly accessible dashboard, since we can't really see the Kibana dashboard with SSH :P... So, we implemented user authentication using X-Pack security. And here we are, stuck... Again... I'm unable to help the intern. The World has come to a full circle.
PS: I have to just guide him while doing my own User Stories.
My boss did not care about making things secure in our early development stage, even though I told him several times.
After 1 day our Elasticsearch cluster was filled with random crappy data.
Fix: Apply security schemes provided by AWS
Note to self: Pointing your tests at a non disposable DB will cause very very bad things to happen. No idea what the flying fuck I was thinking - but praise to the data gods it wasn't a production elastic!
Today marks the day that I finally get to do stuff on a production server.
It's just installing the Elasticsearch cluster. But I still feel honored by the trust I'm given even though I'm still an apprentice.
Job frustrated me again today.
The shit just keeps on committing suicide...
Cannot talk much about it, but essentially it's faulty software randomly killing anywhere from one up to N servers running Elasticsearch...
Conversation between me and a good friend:
Me: No gaming today, work to do.
Me: Yes...... Could u go buy some groceries? Household help is sick.
Him: maybe...what u need?
Me: coffee. I need frigging fucking coffee.
Him: ok. How bad is it.......
Me: empty today.
Him: will be at your house in hour. DON'T DO ANYTHING STUPID.
It's funny how good friends immediately sense danger and become very attentive when the lack of coffee and myself is mentioned in one sentence.
kimchy ≒ kimchi (????)
Kimchi : A traditional fermented Korean dish made of vegetables with varied seasonings.
I wonder what Elasticsearch meant to say. LOL
Elasticsearch, from the bottom of my heart...
How can one ecosystem be so batshit crazy inconsistent?
Seemingly every agent does the same (e.g. filebeat vs journalbeat vs packetbeat)… yet there are subtle changes in configuration everywhere.
Plus YML. The most shitty markup language one can use and the cockslubbing durps used it fucking everywhere.
It's real fun to have complex stuff that requires a Python Jinja to JSON to YML converter just to be able to write it without getting a fucking migraine from counting whitespace with both hands like a stupid 4 year old...
To make it even more absurd: the ingest pipelines which contain a lot of regular expressions / grok and are thus very prone to quoting issues... Yes. Let's do this in YML too.
If you need to add a fucking manual section on how to debug YML errors you should have realized what a fucking stupid idea it was, morons.
Now I have the joy of having a python script regex quoting the shit for a Jinja template which then generates JSON which then generates YML.
Why the JSON part?
Yeah... Because ECS and changes in the upstream YML files / GitHub.
To be able to run diffs in a sane way, because in YML distinguishing things is pretty much impossible, so JSON is an intermediary format solely for the purpose of converting upstream YML to JSON to diff it against modified JSON ingest pipelines downstream.
I fucking hate elasticsearch
Am I the only one who just loves when I do "curl 127.0.0.1:9200/_template" and get back entire 32" screenful of compressed JSON?
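For what it's worth, Elasticsearch will indent the response itself if you append `?pretty` to the request. If the wall of JSON has already been saved somewhere, a few lines of Python re-indent it. This is just a sketch; the `sample` template below is invented for illustration.

```python
import json


def pretty(compact_json: str) -> str:
    """Re-indent a compact JSON string so a human can actually read it."""
    return json.dumps(json.loads(compact_json), indent=2, sort_keys=True)


# Made-up example payload, not a real template from any cluster.
sample = '{"logs":{"order":0,"index_patterns":["logs-*"]}}'
print(pretty(sample))
```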
# Retrospective as Backend engineer
Once upon a time, I was rejected by a startup that tried to poach me from the company I was working at.
They were looking for a Senior / Supervisor level backend engineer and my profile looked like a fit for them.
So they contacted me and arranged a technical test, a system design test, and an interview with their lead backend engineer, who also happened to be a co-founder of the startup.
## The Interview
As usual, they asked me what my contributions to my previous workplaces were.
I answered with the achievements that I think were the best at each company I worked for, and how they were achieved technically.
One of them was designing and implementing a `CQRS+ES` system in the backend,
with the complete capability of what I `brag` about as a `Time Machine` through replaying events.
## The Rejection
And of course I was rejected by the startup, maybe specifically by the co-founder. I asked an insider about the reason for the rejection.
They insisted I am a guy who overengineers things that are not needed by doing `CQRS+ES`, and that I am only suitable for R&D, non-production stuff.
Nobody needs that kind of `Time Machine`.
After switching jobs (to another company), becoming a fullstack developer, and learning about React and Redux,
I can reflect back on this past experience and say this:
The same company that says `CQRS+ES` is overengineering also uses `React+Redux`.
Never did they realize that the concept behind `React+Redux` is very similar to `CQRS+ES`.
- Separation of concern
  - CQRS: `Command` is separated from `Query`
  - Redux: Side effect / `Action` in `Thunk` separated from the presentation
- Managing State of Application
  - ES: Through sequence of `Event` produced by `Command`
  - Redux: Through action data produced / dispatched by `Action`
  - ES: Through replaying `Event` into the `Applier`
  - Redux: Through replay `Action` which trigger dispatch to `Reducer`
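The parallel in the list above fits in a few lines: state is never stored directly, it is rebuilt by folding events (or actions) through a pure function. This is a minimal sketch with invented names (`Event`, `apply_event`, `replay`) and an invented bank-balance domain; it is not the system described in the interview.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    kind: str
    amount: int


def apply_event(balance: int, event: Event) -> int:
    """The 'Applier' (or Redux 'Reducer'): old state + event -> new state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    return balance  # unknown events leave the state untouched


def replay(events, upto=None):
    """Rebuild state from scratch; `upto` is the 'Time Machine' knob."""
    return reduce(apply_event, events[:upto], 0)


log = [Event("deposited", 100), Event("deposited", 20), Event("withdrawn", 50)]
print(replay(log))          # state after all events
print(replay(log, upto=2))  # state as it was after the first two events
```

Redux devtools' time-travel debugging is the same fold over a recorded action list.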
The same company that says `CQRS` is overengineering also uses `ElasticSearch+MySQL`.
Never did they realize that separating the `WRITE` database into `MySQL` as their `Single Source Of Truth` and the `READ` database into `ElasticSearch` is also in line with the `CQRS` principle.
## Value as Backend Engineer
These are sad days for backend engineers. At least in the country I live in.
Being a backend engineer often seems under-appreciated.
Companies (or people) seem to think of a backend engineer as the guy who ONLY makes `CRUD` API endpoints for a database.
- I've heard a Fullstack engineer who comes from a React background complain that Backend engineers have it easy, only doing CRUD without having to worry about the application.
- The same guy failed when given a Backend task to make a simple round-robin ticketing system.
- I've seen a company that only hires Fullstack engineers with strong Frontend experience fail to have a basic understanding of how SQL transactions and connection pools work.
- I've seen a company's Fullstack engineer rely on an ORM to do super complex queries instead of writing proper SQL, preferring to translate SQL into the ORM query language.
- I've seen a company's Fullstack engineer with a strong React background brag about Uncle Bob's clean code but not know how to do basic dependency injection.
- I've heard a company that makes a webapp criticize my way of handling the `session` through an HTTP secure cookie, saying it's bad practice and that local storage is better, despite my argument about the `secure` flag on the cookie and the ability to control the cookie from the backend.
Am I just stupid? It took me 3 whole weeks to finally come to the realisation today that the Elasticsearch "guide" and Elasticsearch "reference" were different, with different version numbers. I've been ignoring Google search results that say Elasticsearch 2.x for WEEKS and wondering why I couldn't find a solution to simple problems.
Turns out, the current Elasticsearch "guide" is on version 2.x while Elasticsearch itself is on version 6.x.
They even have almost identical URLs that go ../guide/../reference and ../guide/../guide.
WHY? Why would you do that? Am I just stupid? Am I still getting it wrong? What the heck is up with Elasticsearch documentation?
So I was setting up ELK (Elasticsearch, Logstash and Kibana) all in one EC2 on AWS today for demo purposes. I had everything prepared. Elastic IP, correct security group rules, etc.
I figured I would just do quick test before writing filters and templates if I can access Kibana. So I started service for it and tried to open it with Chrome.
Checked config file. Compared it to documentation. Seemed good but changed some things just for sake of change. Restarted service.
Reverted changes I've made in config. Restarted service. Curl on localhost. It works... OK. 😐
It took me half an hour but finally I figured it out after I took my phone and opened it from there. It was working from the beginning. Stupid company network was for some reason blocking this connection. Fuck! 😡 And I was restarting that poor service like crazy trying to fix something that wasn't broken.
I ran elasticsearch reindexing on production.
My manager asked why there's no item shown on the search page, and I slowly told him, I ran reindex on production.
This is so annoying, I had 9 diff. jobs the past 2 years and this is my 10th and if this doesn't change I might reconsider my options again.
I came to work at a company that pays me like a Junior and treats me as an intern. My 20yo "boss" who acts as a project owner/lead dev doesn't want to learn anything new and sees any improvement as a waste of money. The problem is he thinks he's a great programmer but he doesn't know shit. I'm mainly working on the Laravel installation because "I claimed I know Laravel". And it's absolute garbage. They haven't used a single Laravel feature besides routes and everything else is vanilla PHP. They write for loops that loop through $_REQUEST to remove a single character. They write 100 deep nested ifs and they abuse Elasticsearch to the point ES crashes because the program is using 1000 deep multidimensional arrays. It's only a webshop...
Every time I try to make a suggestion like making the master branch protected, doing code reviews etc. I get shut down because they don't want anything to change.
So, for almost a week I was trying to get familiar with Algolia, which was to be used as the API for the search feature in our App. But now, we are going with Elastic Search.
I just gave a simple API which fetches recent searches by pinging an index on elasticsearch to the UI developer.
She had just one job. And ended up calling the server every time it loads on every screen thereby reaching max limit of calls per second and giving 429.
QA are not even required to break your code. UI developers are more than enough :)
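A small client-side cache is one way to stop every screen load from hitting the API and running into the 429. The real client here was a UI app, so this Python version only illustrates the idea; `CachedFetcher`, the 60-second TTL and `fetch_recent_searches` are all made up.

```python
import time


class CachedFetcher:
    """Call `fetch` at most once per `ttl_seconds`; serve the cached value otherwise."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means nothing has been fetched yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value


calls = []


def fetch_recent_searches():
    # Stand-in for the real HTTP call to the recent-searches endpoint.
    calls.append(1)
    return ["elasticsearch", "kibana"]


recent = CachedFetcher(fetch_recent_searches, ttl_seconds=60)
for _ in range(10):   # ten "screen loads"
    recent.get()
print(len(calls))     # number of real requests made
```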
Manager said we need to use a Queue. Several meetings later I looked at the prototype built by 6 senior devs:
A QueueListener connects to RabbitMQ, checks for a payload, then *disconnects*;
A TaskProvider in ASP.Net.MVC.Core (whatever it is) listening on HTTP and dependency-injecting that QueuePoller;
A Visual Cron timer calls that HTTP URL every 5 minutes.
Wait for it: a set of database tables to store messages for another MessageProcessor.
It’s an XML to CSV file conversion project consisting of 43 unique projects under one solution. I did it in under 500 lines of Node with ElasticSearch and was told we don’t use fancy new stuff here.
So this web company I joined had a page load time in minutes. The free text search (inverted index search, based on Elasticsearch) queries would return results in 10-45 seconds (should always be milliseconds). The indexes had no schema. And they would crawl data and feed it into an MSSQL db, which had a 2 GB/db limit on the free version. So every time the db hit the limit, a new db was created and the name was incremented by one.
Had a very tough time cleaning up that mess. Plus the architect who had made this architecture was on his way out and unhelpful to the core.
What was worse was that most of the changes I did were very simple changes that should have been done long back. Basic sanity changes.
So a few days ago I sat down to write a redis adaptor to transfer data back and forth between redis and elasticsearch. I download the go-redis package and start writing a simple client.
I run the client and it gives me an error. So I'm stuck at it for about 30 mins and then I say to myself, "You dumb fuck you haven't started the redis-server". So I open up another terminal and type in `redis-server` and then I realise I don't even have redis installed on my machine.
I do such dumb things every weekend. If you have any dumb mistakes you made while writing code please share them in the comments. :-)
You can't have distributed free text search and not have elasticsearch in the same sentence.
A lot of analytics companies are running because of the elasticsearch aggregation framework. And search couldn't have been faster on such mass of data.
P.S. I used to be a Solr fanboy, then I met Elasticsearch. Kimchy knows best.
The 'lead' developer is unable to comprehend why sending an empty string when it doesn't exist (instead of not sending it at all and setting it later when it becomes available) is not the best idea to do. Instead, everything is the fault of ElasticSearch (which I oversee in some capacity) because it doesn't read stupid! And so any error being caused is due to ES. YOU DENSE MOTHERFUCKER!!! FUUUUUUUUUUUUUUUUUUUUUUUUUUUU
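The "don't send it at all" approach can be as small as a filter applied before the document goes out. A sketch with an invented helper name and invented field names; whether `None` should also be dropped is an assumption on my side.

```python
def drop_empty_fields(doc: dict) -> dict:
    """Return a copy of `doc` without keys whose value is missing ('' or None)."""
    return {key: value for key, value in doc.items() if value is not None and value != ""}


# Hypothetical document: 'shipped_at' is not known yet, so it is left out entirely
# and can be set by a later update once the value exists.
order = {"id": 42, "status": "new", "shipped_at": ""}
print(drop_empty_fields(order))
```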
I hate the elasticsearch backup API.
From beginning to end it's a painful experience.
I try to explain it, but I don't think I will be able to cover it all.
The core concept is:
- repository (storage for snapshots)
- snapshots (actual backup)
The first design flaw is that every backup in a repository is incremental. ES creates an incremental filesystem tree.
Some reasons why this is a bad idea:
- deletion of (older) backups is slow, as newer backups need to be checked for integrity
- you simply have to trust ES that it does the right thing (given the bugs it has... It seems like a very bad idea TM)
- you have no possibility of verification of snapshots
Workaround... Create many repositories, as each new repository forces a full backup.........
The second thing: ES scales. Many nodes / es instances form a cluster.
Usually backup APIs incorporate these in their design. ES does not.
If an index spans 12 nodes and you use network storage, yes: up to 12 nodes will each open an e.g. NFS connection and start backing up.
It might sound not so bad with 12 nodes and one index...
But it gets pretty bad with 100s of indexes and several dozen nodes...
And there is no real limiting in ES. You can plug a few holes, but all in all, when you don't plan your backups carefully, you'll get pretty f*cked up network congestion.
So traffic shaping must be manually added. Yay...
The last thing is the API itself.
It's a... very fragile thing.
Especially in older ES releases, the documentation is like handing you an angle grinder instead of toilet paper for a wipe.
Documentation != API != Reality.
Especially the fault handling left me more than once speechless...
gives you a state PARTIAL
gives you a state SUCCESS
Why? The first one is blocking and refers to the backup status itself. The second one shouldn't be blocking and refers to the backup operation.
And yes. The backup operation state is SUCCESS, while the backup state might be PARTIAL (hence no full backup was made, there were errors).
So we have now an additional API that we query that then wraps the API of elasticsearch. With all these shiny scary workarounds like polling, since some APIs are blocking which might lead to a gateway timeout...
Gateway timeout? Yes. Since some operations can run a LONG (multiple hours) time and you don't want to have a ton of open connections hogging resources... You let the loadbalancer kill it. Most operations simply run in ES in the background, while the connection was killed.
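The polling wrapper described above boils down to something like the sketch below. It is not our actual wrapper. `fetch_info` is a placeholder for whatever call returns the snapshot's info as a dict, and the field names (`state`, `shards.failed`) and the `IN_PROGRESS` value are assumptions modelled on the snapshot info response, so check them against the ES version you run.

```python
import time


def snapshot_is_complete(info: dict) -> bool:
    """Trust a snapshot only if its own state is SUCCESS and no shard failed."""
    shards = info.get("shards", {})
    return info.get("state") == "SUCCESS" and shards.get("failed", 0) == 0


def wait_for_snapshot(fetch_info, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Poll with short requests instead of holding one connection open for hours."""
    for _ in range(max_polls):
        info = fetch_info()
        if info.get("state") != "IN_PROGRESS":
            # Finished one way or another: PARTIAL and FAILED both count as "no backup".
            return snapshot_is_complete(info)
        sleep(poll_seconds)
    raise TimeoutError("snapshot still running after max_polls checks")


# Simulated responses standing in for the real API.
responses = iter([
    {"state": "IN_PROGRESS"},
    {"state": "PARTIAL", "shards": {"total": 12, "failed": 1, "successful": 11}},
])
print(wait_for_snapshot(lambda: next(responses), sleep=lambda s: None))
```

Each poll is a short request, so the loadbalancer timeout never comes into play.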
So much joy and fun, isn't it?
Now add the latest SMR scandal and a few faulty (as in SMR instead of CMR) HDDs in a hundred terabyte ZFS pool and you'll get my frustration level.
PS: The cluster has several dozen terabytes and a lot of nodes. If you have good advice, you're welcome - but please think carefully about this fact.
I might have accidentally vaporized people sending me links with solutions that don't work on large scale TM.
Today I wrote my first plugin for elasticsearch ... Was awesome feeling..it is cool to decorate the readme when you have written something of your own.
So I've been given a task to monitor a whole lot of logs of some servers (whole university ~ 10+ departments). The technologies are diverse so I'm cramming everything into elasticsearch via logstash (and filebeat), viewing it into kibana. Any recommendations for what should be the 'useful' stuff to be viewed into dashboard? I guess:
- Overall traffic with respect to previous days/weeks
- Most viewed domains
- Failed logins?
- Dropped connections?
- Critical-load of systems? 90%+
Dear AWS, your Elasticsearch service is a bogus pile of shit-engorged horse fly larvae. Not only do you give no useful visibility into what's happening with the cluster (making diagnosis a sadistic guessing game), you lock down the fucking settings API, making it impossible to debug!! But your excellent support is on it! I wonder if I'll hear back from them this week with another inane suggestion like "increase the node count". Meanwhile the rest of my system is limping along, sometimes getting data where it's supposed to be while I keep fake-smiling and reassuring management and customers that "I'm working on it". If you're going to offer a service either make sure it works or get the fuck out of my way. I'll be moving my cluster back to EC2 and you can go do a back flip off a skyscraper. I need a drink
Also, no one knows anything about it because the only dev who was supposed to maintain this app left 3 months ago due to unbearable management.
So, there were four judgement rounds, over a period of 36 hours.
During the 3rd judgement, the judge says we have a potentially winning project, we just need to put things together now.
During the fourth judgement round, my laptop's Network Interface Card crashes, while running Node server and ElasticSearch server (while another laptop was running a Django server)...
On top of that, the judge assumes that the probability distribution of having a chest disease that we were showing in the form of heatmap on a chest X-ray, was actually body heatmap... And we were saying wherever there is more heat, is the diseased part.
My only hackathon...
Pro tip by a Noobie: Whenever you use open source software and set it up using some tutorial, make sure you download the latest distribution.
Wasted 2 days fixing something while setting up KeyCloak, eventually downloaded the latest version and it worked fine. There was a bug in KeyCloak apparently.
The same happened 2 and a half years ago trying to write node scripts for ElasticSearch, using an older ES library -_-
A very long rant.. but I'm looking to share some experiences, maybe a different perspective.. huge changes at the company.
So my company is starting our microservices journey (we have 359 retail websites at this moment)
First question was: What to build first?
The first thing we had to do was to decide what we wanted to build as our first microservice. We looked for a microservice that could be used read-only, that consumers could easily adopt without overhauling production software, and that is isolated from other processes.
We ended up building a catalog service as our first microservice. It provides consumers with information about our catalog and the most essential information about the items in it.
By starting with the catalog service the team could focus on building the microservice without any time pressure. The initial features of the catalog service were built to replace existing functionality that was working fine.
Because we chose such an isolated piece of functionality we were able to introduce the new catalog service into production step by step. Instead of replacing the search functionality of the webshops using a big-bang approach, we chose A/B split testing to measure our changes and gradually increase the load on the microservice.
Next step: Choosing a datastore
The search engine that was in production when we started this project was making use of Solr. Thanks to Lucene it performed very well as a search engine, but from an engineering perspective it lacked some functionality. It fell short if you wanted to run it in a clustered environment, configuring it was hard and not user friendly and, last but not least, development of Solr seemed to have ground to a halt.
Elasticsearch started entering the scene as a competitor for Solr and brought interesting features. Still using Lucene, which we were happy with, it was built with clustering in mind, and clustering is provided out of the box. Managing Elasticsearch was easy since there are REST APIs for configuration and, as a fallback, YAML configuration files.
We decided to use Elasticsearch since it provides us the strengths and capabilities of Lucene with the added joy of easy configuration, clustering and a lively community driving the project.
Even bigger challenge? Which programming language will we use?
What we noticed while researching various languages is that almost all actions performed by the catalog service boil down to the following pattern:
- Execute an HTTP call to fetch some JSON
- Transform JSON to a desired output
- Respond with the transformed JSON
These actions can easily be done in a parallel and asynchronous manner, and they mainly consist of transforming JSON from the source into a desired output. The programming language used for the catalog service should be well suited to those kinds of actions.
Another thing to notice is that some functionality built on the catalog service will result in a high level of concurrent requests. For example, the type-ahead functionality will trigger several requests to the catalog service each time a user uses it.
To us, PHP and .NET at that time weren’t sufficient for building the catalog service based on the requirements we’d set. Eventually we decided to use Node.js, which is better suited for the things we are looking for as described earlier. Node.js provides a non-blocking I/O model and is event driven, which helps us develop a high-performance microservice.
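The fetch / transform / respond pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's asyncio rather than the Node.js the team chose, the "HTTP call" is simulated by a sleep so the example runs without a network, and `FAKE_BACKEND`, `sku`, `name` and the output field names are all made up.

```python
import asyncio
import json


async def fetch_json(source: dict, key: str) -> dict:
    """Stand-in for the HTTP call: waits briefly, then returns parsed JSON."""
    await asyncio.sleep(0.01)  # simulates non-blocking network latency
    return json.loads(source[key])


def transform(raw: dict) -> dict:
    """Keep only the fields a consumer needs (field names are hypothetical)."""
    return {"id": raw["sku"], "title": raw["name"].strip()}


async def handle_request(source: dict, keys: list) -> str:
    """Fetch all items concurrently, transform each, respond with JSON."""
    raw_items = await asyncio.gather(*(fetch_json(source, k) for k in keys))
    return json.dumps([transform(item) for item in raw_items])


# Hypothetical upstream data, keyed by item id.
FAKE_BACKEND = {
    "a": '{"sku": 1, "name": " Red shoe ", "stock": 4}',
    "b": '{"sku": 2, "name": "Blue hat", "stock": 0}',
}

if __name__ == "__main__":
    print(asyncio.run(handle_request(FAKE_BACKEND, ["a", "b"])))
```

While one fetch is waiting, the event loop is free to start the others, which is the property that matters for something like type-ahead firing several requests at once.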
The beauty of microservices, and the isolation they provide, is that you can choose the best tool for each particular microservice. Not all microservices will be developed using Node.js and Elasticsearch. All kinds of combinations might arise and this is what makes the microservices architecture so flexible.
Even if Node.js or Elasticsearch turns out to be a bad choice for the catalog service, it is relatively easy to swap that choice for magic ‘X’ or component ‘Z’. By focussing on creating a solid API, the components that are driving that API don’t matter that much. It should do what you ask of it, and when it is lacking you just replace it.
Many more headaches to come later this year ;)
I feel it's a great time to be a developer; we have so many toys to play with
machine learning, scientific python, nodejs, frontend js frameworks, nosql, NLP, elasticsearch, mongodb, open source .net, big data with java, arduino..., VR, 3d printing
what toys are you playing with?
Hi, I am using a Wikipedia scraper in one of my open source projects. The data extracted from it is then stored in Elasticsearch... Now I have decided to create a library out of it so that other people can use it too... My question is: should I also include the Elasticsearch storage module in the library, or just the scraper? Please let me know your thoughts.
I've a whole new respect for ElasticSearch. Its codebase is so insanely complex that I'm seriously contemplating tracing out the flow on a big ass chart. Any suggestions on how you people work and debug so many asynchronous flows?
I have been working on a bug for almost 6 days (to be read as 3 consecutive weekends), and the best I've done is conceptually isolate where it's happening. I'm an open source noob, but I feel I've learnt a whole lot while sifting through ES' codebase. :)
Found a bug today that made me groan in frustration.
It appears that the official elasticsearch debian package checks if the system's init daemon is systemd by... Checking if systemctl binary is available.
Issue is... Systems might contain that binary while using a different init, as the binary is part of the "systemd" package.
To actually switch to systemd, however, the package systemd-sysv has to be installed, which creates a link from /sbin/init to systemd's main executable.
What happens when your system doesn't use systemd then? The postinstall/preremove scripts fail as systemctl fails to talk to the system bus, and thus the installation is marked as failed!
Oversights like this are exactly the reason behind my systemd dislike. We never wanted the systemd package, but another key package suddenly added it as a dependency one day...
Now to see if this is reported as a bug already, and if not, to report it myself...
(also, who checks for init by looking for the init's management utility?! It's like checking whether sysvinit is installed by checking whether update-rc.d is installed!
And it's not like figuring out the system's init daemon is hard anyway! Just check /sbin/init, or, better yet, check the process with pid 1!)
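A minimal sketch of detecting the running init from the system's state rather than from which binaries happen to be installed. It assumes a Linux box with /proc mounted; the function name `detect_init` and its parameters are made up for illustration (the path arguments exist only so the logic can be exercised against a fake directory tree). The /run/systemd/system check is the same test that systemd's documented sd_booted() performs.

```python
import os


def detect_init(proc_root: str = "/proc", run_root: str = "/run") -> str:
    """Guess the running init system from runtime state, not installed binaries."""
    # systemd creates this directory at boot when it is the running init;
    # a merely installed systemctl binary does not.
    if os.path.isdir(os.path.join(run_root, "systemd", "system")):
        return "systemd"
    try:
        # /proc/1/comm holds the command name of PID 1, e.g. "init".
        with open(os.path.join(proc_root, "1", "comm")) as fh:
            return fh.read().strip() or "unknown"
    except OSError:
        return "unknown"


if __name__ == "__main__":
    print(detect_init())
```

A package maintainer script would do the equivalent in shell (test for the directory before calling systemctl), but the logic is the same.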
Has anybody here used Solr or Elasticsearch for a big online shop? We’re implementing FACT-Finder and are not happy, and are wondering about Solr and stuff like that.
I kinda want more emotional input from other devs so I thought I'd ask here :)
Have you deployed ElasticSearch to production? If so, I've got a couple of questions for you.
How much complexity did ES add to the project overall from a developer's perspective?
How much did this differ in price from other solutions you used? In production loads that is.
Elasticsearch! First time touching it and I need to find out on my own how to build an index that allows a weighted multi-field fuzzy search on four fields, where two need to be full ngram, one ngram on the words and one standard search + not index any other field. The documentation is horrible! Just realizing that this is what I need took me 2 days!
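For what it's worth, one way such an index and query could be put together is sketched below as plain Python dicts (the JSON bodies you would send to Elasticsearch). This is a sketch under assumptions, not a tested recipe: it assumes typeless mappings (Elasticsearch 7 or later), the field names, analyzer names and gram sizes are made up, and whether fuzziness adds much on top of ngram fields is something to measure rather than take on faith.

```python
def build_index_body(full_ngram_fields, word_ngram_field, standard_field):
    """Index body: ngram analyzers for three fields, nothing else indexed."""
    ngram = {"type": "ngram", "min_gram": 2, "max_gram": 3}  # sizes are arbitrary
    properties = {
        name: {"type": "text", "analyzer": "full_ngram"}
        for name in full_ngram_fields
    }
    properties[word_ngram_field] = {"type": "text", "analyzer": "word_ngram"}
    properties[standard_field] = {"type": "text"}  # standard analyzer by default
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    # No token_chars: grams span the whole value, spaces included.
                    "full_ngram_tok": dict(ngram),
                    # token_chars splits on anything that is not a letter or
                    # digit first, so grams stay inside single words.
                    "word_ngram_tok": dict(ngram, token_chars=["letter", "digit"]),
                },
                "analyzer": {
                    "full_ngram": {"type": "custom", "tokenizer": "full_ngram_tok",
                                   "filter": ["lowercase"]},
                    "word_ngram": {"type": "custom", "tokenizer": "word_ngram_tok",
                                   "filter": ["lowercase"]},
                },
            }
        },
        # dynamic=False keeps unmapped fields in _source without indexing them.
        "mappings": {"dynamic": False, "properties": properties},
    }


def build_search_body(text, boosts):
    """Weighted multi-field fuzzy query; boosts maps field name -> weight."""
    fields = [f"{name}^{boost}" for name, boost in boosts.items()]
    return {"query": {"multi_match": {"query": text, "fields": fields,
                                      "fuzziness": "AUTO"}}}


if __name__ == "__main__":
    import json
    print(json.dumps(build_index_body(["title", "brand"], "description", "sku"), indent=2))
    print(json.dumps(build_search_body("shoe", {"title": 3, "sku": 1}), indent=2))
```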
I had a discussion - no, it was more a lobotomy - with one of our "experts"
I was kinda confused, as he had several grafana tabs open and a query editor...
He explained to me that he debugs and optimizes his query based on the grafana data....
Elasticsearch cluster with several hundred, different indices, > 20 TB data
I explained to him the scrape interval of 5secs, that he cannot distinguish his query from other queries, that there is far too much of an interference... Let alone that a 5 sec scrape interval is a very loooong time.....
Nope. It makes perfect sense to him and he'll continue to work like this.
OK. We've got this tiny little pet project of mine (work related)…
I rescued it from the git archive, simply put: someone hot glued an elasticsearch scroll + document processor (processing) together.
After a lot of refactoring, I had a simple, much improved (non-parallel) Akka Worker System without an Akka topology / hierarchy.
I left out the hierarchy at first, because I didn't know Akka at all.
I've worked with a lot of process workflows, and some systems that come very close to IPC, so I wasn't completely in the dark.
Topology requires knowledge / creation of a state machine / process workflow. And at that point of time I just had... Garbage. Partially working garbage.
Yesterday I finished the rewrite into several actors... Compared to before, there are 8 actors vs 2... And roughly 20 more classes. Mostly since I rewrote the Receive Methods of Akka as Command DTOs... And a lot of functions needed to be separated into layers (which were nonexistent before)
Since that felt more natural than the previous chaos of passing strings or other primitive types around, or in the worst case just object....
(Yes: Previously an Actor was essentially a class with one or more functions "doEverything" and maybe a few additional functions which did everything - from Rest Client to Processing).
Then I drew the actual state machine based on everything I've written in the last weeks and thought about how to create the actual topology and where / how parallelizing might make sense.
Innocent me stumbled in the Akka Docs on Akka Typed... (Didn't know it existed, since I'm very new to Java and Akka).
Hm, that sounds a lot like what I did. In a different way, yes. But not so different that it might be VERY hard to port to.... And I need to change (for implementation of hierarchy) a few classes....
[I should have known at this stage that my curiosity would get the best of me, but yeah. Curiosity killed the cat.]
Actually the documentation is not bad. It's just that upon reading the first more complex examples, my brain decided to go into panic state.
They've essentially combined all classes in one class in all source code examples [which makes more sense later], where it is fscking hard for a chaotic brain like mine to extract information....
The thing is: It's not hard to understand… actually very simple.
It was just my brain throwing a fuck you tantrum.
So I've opened more examples in other tabs and cross referenced what happened there and why...
Few frustrated hours later I got that part.... And the part why it's called Akka Typed. It was pretty simple....
Open the gates of hell, bloody satan that was too easy for fucks sake.
Nooooow.... I just need to port my stuff to Akka Typed.
Cause. Challenge accepted, bitch - eh brain. You throw tantrum, you work overtime. -.-
I just cannot decide whether to go FP or OOP.
Now... I'm curious whether FP is that hard... Hadn't dealt with it at large before.
Can someone please stop me... I'm far too curious again. -.- *cries*
Implement a rest API for elasticsearch.
Follow the client's index's mapping.
Generate json document from Java pojos, given by the client.
Jsons don't match the schema mapping, one (at least) field, for geographic coordinates, is in another format.
Ask the client for explanation.
Client response, after 6 hours:
"We build it in this shape so you have to convert them to another format before posting into ES".
What the hell is wrong with you?!
So what's with the whole Elastic open source licenses thing? Seems like a spat between them and Amazon?
Amazon now argues that doing this means Elasticsearch and Kibana will no longer be open source, that the Elastic License limits how the code can be used, and that the Server Side Public License is unacceptable to the open-source community.
Is there some sort of Query Builder for ElasticSearch?
I have ELK setup and in Kibana can generate all the aggregation visualizations but now I want the data to be usable in a program so it can generate reports like who are our top users.
But the aggregation queries seem to be very verbose... not sure how anyone can generate or understand it by hand vs telling Kibana I want a chart with X and Y axes using these terms.
Ideally I'd like to have Kibana then tell me what's the actual JSON/Elastic query it used to generate that, but I can't seem to find something like that.
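A "top users" report like the one asked about usually comes down to a single terms aggregation, which is small enough to build by hand. The sketch below only constructs the request body and flattens the response; the helper names and the example field names (`user.keyword`, `@timestamp`) are assumptions, so substitute whatever your mapping actually uses. Recent Kibana versions can also show the request behind a visualization in an Inspect panel, which is a handy starting point to copy from.

```python
def top_terms_query(field, size=10, time_field=None, since=None):
    """Request body for the `size` most frequent values of `field`."""
    body = {
        "size": 0,  # skip the hits, return only aggregation buckets
        "aggs": {"top": {"terms": {"field": field, "size": size}}},
    }
    if time_field and since:
        # Optional time filter, e.g. since="now-30d".
        body["query"] = {"range": {time_field: {"gte": since}}}
    return body


def buckets_to_rows(response):
    """Flatten a terms-aggregation response into (value, count) pairs."""
    return [(b["key"], b["doc_count"])
            for b in response["aggregations"]["top"]["buckets"]]


if __name__ == "__main__":
    print(top_terms_query("user.keyword", size=5,
                          time_field="@timestamp", since="now-30d"))
```

The body would be sent to the index's `_search` endpoint with whichever client you already use.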
What do you guys think about deploying Elasticsearch on App Engine Custom Runtime?
(Basically, an empty folder with an Elasticsearch Dockerfile.)
I think it's a good idea: you can now deploy your code and storage applications (Elasticsearch, Redis, etc.) as services on your cluster.
You can use GCP magic to auto scale those services, and you get so much good stuff that comes with it.
And it's inside the same network as your services running in the same App Engine project.
We were on track to provision Cassandra for logging and elasticsearch for business data store.
Now we are on track to provision mongodb and elasticsearch. That's right, two document stores.
Things go to hell when management thinks it is capable of making engineering decisions.
For the last week I've been sleeping just 2 hours a night, trying to solve a problem in Elasticsearch by writing a plugin... It's always fun; I never notice when it becomes 7 am.
This invite to an ElasticSearch webinar is epic:
Looking for feedback on Elastic's SSPL license type change. Is anybody else worried? Any other companies seek legal counsel already? What have your lawyers said about it?
You know how people rant about js frameworks; well the very same is true about nosql.
I thought let me broaden my horizon (pun intended) with a nosql db in my project.
So from Friday evening, I started off with ElasticSearch, which is pretty simple to get started, but apparently I need to understand it a lot better to use it as a primary data-store.
Then I stumbled upon orient-db, which was pretty exciting, and learnt the APIs/libraries, but after researching it a bit more to learn about the community: there is some bad blood there.
Now I'm onto something called ArangoDB, think I'll stick with this; any more time spent on this and I'll just give up on the project.
I want to keep 1 year of daily indexes, but with the ones older than 30 days unloaded from memory, yet still accessible when needed.
So like say there's a performance issue today and I want to compare all the activity against 2 months ago. I can open the old index and search it.
Can you do that? Does closing an index remove it from memory? Otherwise how would you do that?
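Closing is the usual answer here: a closed index stays on disk but is blocked for reads and writes and drops most of its memory overhead, and it has to be reopened (`POST /<index>/_open`) before it can be searched again. Below is a small sketch of deciding which daily indices to close or delete. The `logs-YYYY.MM.DD` naming, the `plan_retention` helper and its thresholds are assumptions for illustration; the function only builds the plan and does not talk to a cluster.

```python
from datetime import date, datetime


def plan_retention(index_names, today, prefix="logs-",
                   close_after=30, delete_after=365):
    """Sort daily indices into those to close and those to delete, by age."""
    plan = {"close": [], "delete": []}
    for name in sorted(index_names):
        if not name.startswith(prefix):
            continue  # not one of our daily indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave it alone
        age = (today - day).days
        if age > delete_after:
            plan["delete"].append(name)
        elif age > close_after:
            plan["close"].append(name)
    return plan


if __name__ == "__main__":
    names = ["logs-2021.02.20", "logs-2021.01.01", "logs-2020.01.01"]
    print(plan_retention(names, today=date(2021, 3, 1)))
```

Each name in the `close` list would then get a `POST /<index>/_close`, and comparing against two months ago means opening that one index, searching it, and closing it again.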
Any Elasticsearch gurus here? I have a box with too many young gen GCs (one per 2 or 3 seconds), and irregular, very long old gen GCs (One per several hours, taking around a minute and freeing about 2/3's of the old gen space) -- I was thinking changing the new gen ratio from 2/3 to something like 3/4 or 4/5.
However, after reading an elastic article about settings to never touch... I'm no longer so sure...
Only other option I was considering is going from CMS to G1GC to cut back on the old gen GC time... A minute long downtime for Elastic is rather problematic.
Any thoughts? The box is rather old - running Elastic 5.6 with 20 GBs of heap, 207 shards and 306k docs.
I did some of the front-end and the whole backend. I built and manage the SQL + Elasticsearch database. After all of this, only 17 lines of mother fu**er code ruined my life. The client is asking for code. And.... And... Can't say anymore.
input {
  file {
    path => "/home/rsa-key-20200528 /aslogger.log"
    type => "java"
    start_position => "beginning"
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch { hosts => ["localhost:9200"] index => "aslogger" }
}
How the fuck do you use and make a fields.yml for dynamic filebeat indexes?
Aka what if I don't want all the fields?
FUCK YOU AWS Elasticsearch!!
Fucking losing data on cluster upgrade. Fuck you! Now I have to rebuild the goddamn records from Postgres database entries.
Cunt AWS ES. Screw you!
Getting an ElasticSearch Developer training this week, anyone here been already? If so, what was it like?