Search - "prometheus"
-
When I ask for a raise -
Boss : Are u joking ? I can see u hardly working.
When I ask for leave -
Boss : Are u joking ? Who will do your work ?
-
An ancient legend goes that there exists sacred knowledge that lets anyone possessing it avoid turning their career into a constant uphill battle with management.
I sought this knowledge, I travelled the world, to no avail. Once upon a time, I climbed Mount Fuji and met the wizard in his pagoda on the mount. I beat him in a CSS-golf battle, and he revealed the sacred truth: one needs to choose companies that do business instead of constant backroom deals and dick-measuring contests.
Like Prometheus, I give this knowledge to you. An ancient scroll says that for this I’ll be chained to the mountain of PHP legacy code, and HRs will peck my brain for eternity, but I found Arachne, the queen of HRs, and exchanged the keto-diet secret for freedom.
-
Inherited a simple marketplace website that matches job seekers and hospitals in healthcare. Typically, all you need for this sort of thing is a web server and a database with search.
But the precious devs decided to go micro-services in a container-and-db-per-service fashion. They ended up with over 50 docker containers with 50ish databases. It was a nightmare to scale or maintain!
With 50 databases for a simple web application that clearly needs to share data, integration testing was impossible, data loss became common and very hard to pin down, debugging was a nightmare, and it was dangerous to change a service’s schema because the dependencies were all tangled up.
The obvious thing was to scale down the infrastructure, so we could scale up properly, in a resource-driven manner, rather than following the trend.
We made plans, but the CTO seemed worried about yet another architectural change, so he invested in more infrastructure services (kubernetes, zipkin, prometheus etc.) without any idea what problems those infra services would solve.
-
Me : There is a hotfix that needs to be done.
Boss : So what ?
Me : It's been assigned to you for the last 2 weeks and the deadline is near.
Boss : (speaking politely) oh.. is it? Can you take a look ?
Me : ..
-
Pun :
My C# developer friend Alfred is getting a divorce because his feminist wife didn't like him treating her as an "Object".
Now she's gonna "Dispose" him after "Using" him for her benefit.
😋
-
One of these days....
Where you want to do a tiny task
....
And suddenly an explosion nukes every service, related service and dependent service.
Chain reaction. Yaaaayyy........
(an ancient prometheus node led to a snapshot error, the snapshot error made the migration tool unhappy, an unhappy migration tool meant that my task failed - updating prometheus meant checking every target, exporter and so on...
Fuckity fuck it's gangbang time.)
-
Time to get going properly with ansible, consul and docker swarm.
Idea is first to convert tinc to a container, which automatically sets itself up based on the tinc nodes previously announced in consul.
Consul to keep track of all the nodes, with prometheus too, and hopefully auto attach to grafana.
Ansible to set up new nodes right with the DO API, announce to consul, pull docker images and join the docker swarm master.
-
Storytime.
The Prometheus Tales - Part V
This week I found some time to reinstall the Titan on Ubuntu 20.04.
And the fucker finally can reach all targets in our environment.
Targets don't spawn via active puppet run on the Titan anymore, but via bash/puppetdb magic, which is nice.
Progress!
Let's see how the storage behaves..
-
Here's a leak of our Prometheus dashboard during our last Friday deployment of our campus site
https://streamable.com/bfgtz1
-
Wanted to try a new alert based on a new Prometheus metric we added. To trigger the alert we killed the dev stage db of the service. The alert didn't fire. The reason was that the metrics endpoint suddenly needs exactly 60s to respond if the db is killed, and the prometheus timeout is 20s.
And to top it off, this behavior happens for each service we developed (that has a db).
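One way around this kind of problem is to put a hard upper bound on the db check inside the metrics handler, so the endpoint answers well inside the scrape timeout and reports the db as down. A minimal Python sketch, assuming a hypothetical `db_up` gauge and a caller-supplied `check` function (neither is from the services in the rant):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Shared worker pool. A hung check keeps its worker busy until it returns,
# so the pool size limits how many stuck checks can pile up.
_POOL = ThreadPoolExecutor(max_workers=4)


def db_up(check, timeout_s):
    """Run check() but stop waiting after timeout_s seconds.

    Returns 1.0 if check() finished without raising, otherwise 0.0.
    """
    future = _POOL.submit(check)
    try:
        future.result(timeout=timeout_s)
        return 1.0
    except FutureTimeout:
        return 0.0  # check is hanging: report "down" instead of blocking the scrape
    except Exception:
        return 0.0  # check failed fast, e.g. connection refused


def render_metrics(check, timeout_s=5.0):
    """Build a text-format payload containing the hypothetical db_up gauge."""
    return f"db_up {db_up(check, timeout_s)}\n"
```

With a 20s scrape timeout, a `timeout_s` of a few seconds leaves room for the rest of the handler. The alert can then fire on `db_up == 0` rather than depending on the scrape itself failing.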
Well, at least the new alerting already helped find a bug.
-
Using grafana together with tinc + prometheus has been a blast.
Initially I wanted to get into ELK with Kibana and all that, but that required 8G of ram, the instructions to get it running in the open source "mode" were nearly non-existent, and all the ready-made docker compose stacks out there simply didn't work or had broken images.
I'm sure I could've worked around most of those issues, but the fact that it is as hungry as gitlab made it a literal no-go for the usual server resources my clients host, or for my own recently scaled-down server.
Thankfully I remembered that there's grafana, and that I had experimented with tinc some time ago, so I can have very lightweight beat-esque prometheus agents deployed, listening on the tinc local net only, with the typical nginx auth and some whitelists, on all of the servers I host and all those of my clients.
The dashboard creation was especially great in grafana (tbf prometheus actually does most of it), literally what I always wanted out of those "complicated" solutions that do it all, but have no proper query language, complex documentation, heavy collectors with no properly named data points, expensive resource runtimes, ..
With grafana I can just easily put dashboards into folders, create users to look only at certain stats or even dashboards (opened up some interesting contracts actually, because now I can also offer proper monitoring for all things delivered), easily drag and drop stuff around to fit more information (most others fix you to a small 3x2 grid, a grid too big for a TV or simply non-resizable tiles, making that one counter take up an entire row) and resize to my heart's desire.
tinc of course allows me to easily create private networks that are resistant to failure across any region, and the routing is done for me, so I don't have to run around it all that much either.
P.S: a damn tiny fly went into one of my now 4 monitors and died right in the middle, because I thought it was just some dirt and I pressed it in while trying to wipe it off, so that monitor now serves as the topmost one on a vesa mount
-
Half a day wasted. FUCK!
I use grafana loki and mimir/prometheus for telemetry. A few days ago I queried loki to see if logging is still working. Yesterday I changed the datasource to mimir, changed the query parameters to get metrics from another env, ran the query, and... Querier [mimir] crashed.
Wtf.
Error says it got too much data to chew on.
So I spent 4 hours playing with the querier and grpc limits, balancing between limit errors and OOMKills [2G ram].
I got suspicious about oomk. Why would it...
Then I tried to shrink the timeframe to 15min. Still oomk. Down to 5min -- now it worked. But the number of different metrics returned was over 1k.
Then I looked once again at the query. And ofc it is `{env="prod"}`
Turns out, forgetting that you're querying metrics with a logs query is an expensive and frustrating mistake. Esp. at 3am.
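A possible explanation for the flood of series: in PromQL a selector without a metric name, such as `{env="prod"}`, is valid and selects every series carrying that label. A toy Python model of that matching (the series are made up, and this is not how Mimir evaluates queries internally):

```python
def matches(selector, labels):
    """True if every label=value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Made-up series. "__name__" holds the metric name, as in Prometheus.
series = [
    {"__name__": "http_requests_total", "env": "prod", "job": "api"},
    {"__name__": "process_cpu_seconds_total", "env": "prod", "job": "api"},
    {"__name__": "go_goroutines", "env": "prod", "job": "worker"},
    {"__name__": "http_requests_total", "env": "dev", "job": "api"},
]

# Bare selector: no metric name, so every prod series matches.
bare = [s for s in series if matches({"env": "prod"}, s)]

# Same selector narrowed to one metric name.
named = [
    s for s in series
    if matches({"__name__": "http_requests_total", "env": "prod"}, s)
]

print(len(bare), len(named))  # 3 1
```

On a real prod environment the bare form means every metric of every job over the whole time range, which would fit the 1k+ metrics and the OOMKills.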
idk why it even returned me anything...
-
Wanted to add alerting for systemd services in Prometheus today, which spontaneously turned out to be a huge pain in the lower human backend.
For some reason, on Ubuntu 16.04 systemd adds services without unit files for software that isn't even installed on the damn server (in this case for mysql-server; only mysql-common and mysql-client are installed) and lists them as "not-found" and "inactive". The prometheus node exporter that we use has a little bug in the systemd collector that makes sure that the states of *all* services are collected - even those without a unit file.
So those metrics are pulled by prometheus, and now I have to deal with those faulty metrics in the condition logic of the alert, because I'm trying to trigger it on a service which is listed with state "active" = 0 or "failed" = 1.
Now guess. Right! If the unit file doesn't exist, the systemd service in question is marked as "inactive", which is another possible state of the metric in the node exporter. The problem is that the value 1 for state "inactive" means that "active" has the value 0 (not even wrong) and the alert is triggered.
So systemd fucks up somehow, the node exporter collector fucks up because systemd fucked up and I have to unfuck this with some crazy horse shit logic. w.t.f. to that.
The only good news is that it works like a charm on Ubuntu 18.04, as far as I can tell.
While writing this little rant, I thought of a solution.
I could try to change the alert condition to state "active" = 0 AND "failed" = 1.. but that will wait till tomorrow.
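A small Python model of the alert logic above may help. It assumes the node exporter exposes one sample per unit and state (as `node_systemd_unit_state` does), with value 1 for the current state and 0 for the others; the unit names are invented.

```python
STATES = ("active", "activating", "deactivating", "failed", "inactive")


def unit_samples(name, current):
    """One sample per state: 1 for the unit's current state, 0 for the rest."""
    return {(name, state): int(state == current) for state in STATES}


samples = {}
samples.update(unit_samples("nginx.service", "active"))
samples.update(unit_samples("myapp.service", "failed"))
samples.update(unit_samples("mysql.service", "inactive"))  # the "not-found" unit


def fires_on_not_active(samples):
    """Units that trip a condition of the form: state "active" == 0."""
    return sorted(n for (n, s), v in samples.items() if s == "active" and v == 0)


def fires_on_failed(samples):
    """Units that trip a condition of the form: state "failed" == 1."""
    return sorted(n for (n, s), v in samples.items() if s == "failed" and v == 1)


print(fires_on_not_active(samples))  # ['myapp.service', 'mysql.service']
print(fires_on_failed(samples))      # ['myapp.service']
```

In this model a unit with "failed" = 1 always has "active" = 0, so the AND condition comes down to checking the failed state alone, and the phantom mysql unit no longer trips it.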
One does not simply patch monitoring conditions at midnight..
-
Storytime.
The Prometheus tales
Part IV - A new FUBAR.
A new and very fascinating problem emerged a few days after feeding some node definitions to the new titan instance.
It's a storage fuck-up. A major one.
If I'm informed correctly, the latest prometheus should have the same (or even better) log compression algorithms for metrics as the old one - because these fuckers are so damn good at what they are doing: compress some fucking logs.
The new instance is aggregating metrics as planned. Grafana works like a fucking charm.
Nevertheless, for very fascinating but unknown reasons, the new instance creates 50GB of metrics in under 4 fucking hours.
Am I missing something here? Some magic parameter that has to be passed to the titan, that enables the hardcore compress-them-fuckers-feature?
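A quick sanity check on those numbers, as a Python back-of-envelope. It assumes the rule of thumb from the Prometheus storage docs of roughly 1-2 bytes per sample once data is compressed, and takes 50GB as 50e9 bytes; both are assumptions, not measurements from this setup.

```python
def implied_samples_per_second(bytes_written, seconds, bytes_per_sample):
    """Ingestion rate that would explain the disk growth at a given sample size."""
    return bytes_written / seconds / bytes_per_sample


# 50 GB in 4 hours at an assumed 2 bytes per sample.
rate = implied_samples_per_second(50e9, 4 * 3600, 2.0)
print(round(rate))  # 1736111
```

Around 1.7 million samples per second would be a lot for a few hundred nodes, so the growth probably isn't explained by compressed samples alone. Fresh data in the write-ahead log may be less compact than compacted blocks, which could be one place to look.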
Debugging session is tomorrow.
To be continued.
-
FOMO on technology is very frustrating.
i have a few freelance and hobby projects i maintain. mostly small laravel websites, go apis, etc ..
i used to get a $24/month droplet from digital ocean that has 4 vCPUs and 8GB RAM
it was more than enough for everything i did.
but from time to time i get a few potential clients that want huge infrastructure work on kubernetes with monitoring stacks etc...
and i don't feel capable because i am not using this on the daily, i haven't managed a full platform with monitoring and everything on k8s.
sure u can practice on minikube but u won't get exposed to the tiny details that come up when deploying actual websites and trying to set up workflows and all that. from managing secrets to grafana and loki and Prometheus and all those.
so i ended up getting a k8s cluster on DO, and i'm paying $100 a month for it and moving everything to it.
but what i hate is i'm paying out of pocket, and everything just requires so many resources!!!!
-
Storytime.
Our prometheus node, one of our oldest systems (somehow fits the Titan reference..), is about to be relieved of its duties after several years of loyal service to the crew.
We decided to put another Prometheus node in the ring that will run simultaneously with the old one, so that the new one can start to collect the metrics that we need for alerting (some historic metrics are needed too..). Sort of a Prometheus cluster, without the cluster fun and with 2 different Prometheus versions.
The problems with this? Well it's not the new node or the latest shit versions of Prometheus per se.
1: The node exporter.
Those dudes decided to make some breaking changes in a minor update, so you need to run some magic bullshittery so that the latest Prometheus can make something out of the old metrics provided by the old node exporters.
2: The related puppet code.
The node definitions for Prometheus were built via exported resources on the target nodes.
The code worked like a charm with only one Prometheus node, but try that with two instances in the same way.
Still WIP, but some targets are already included in the new Prometheus instance.
Alerting works so far.
Can't wait to close this ticket for good..
-
After the conversation, the real good way was already provided:
Prometheus exporter: https://github.com/prometheus/... (https://blog.opstree.com/2018/12/... for more details)
Overview: https://devconnected.com/complete-m...
-
What do you use for performance monitoring on your infrastructure?
My company uses zabbix, OpenNMS and Nagios to monitor different parts of our infrastructure (from shared web hosting to OCCAS to IPTV to FutureVoice to Atlassian servers) but has no real-time performance checks.
I’ve set up a netdata master with prometheus backlog and grafana dashboards to monitor different metrics, however I am not sure whether there is a better approach. Any suggestions?
-
Storytime - The Prometheus tales - Part III (I think..).
Updated the node definitions on the old node today, just to keep it up to date. Nothing fancy.
I went to the new node and checked the setup again. I already had roughly 120 node definitions onboard for testing purposes.
So all firewalls should have been configured the right way, so that the wee one might finally celebrate the marriage with the rest of the gang.. and then I went with "puppet YOLO" on the new node. Added every fkn node definition to the new setup.
Every node turned out to be just fine.
Except for 137 little InstanceDown alerts (out of 600+).
It's a good thing that the little fella can send mails to me, myself and I only for the time being.
So debugging. Again. But at least it's not a problem related to prometheus itself, because the connections end with a timeout on the related nodes. Should be more like a firewall fubar.
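For telling a firewall drop apart from an exporter that simply isn't running, a plain TCP probe from the new node is usually enough: a silent drop tends to show up as a timeout, a missing listener as a refused connection. A minimal Python sketch; the host name in the usage note is hypothetical, and 9100 is the node exporter's default port.

```python
import socket


def probe(host, port, timeout_s=3.0):
    """Classify a TCP connect attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"
    except ConnectionRefusedError:
        return "refused"  # host answered, but nothing listens on the port
    except socket.timeout:
        return "timeout"  # no answer at all, typical for silently dropped packets
    except OSError:
        return "error"    # DNS failure, unreachable network, ...
```

Looping `probe("node01.example.internal", 9100)` over the 137 down targets and counting the results would show quickly whether they all time out the same way.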
We will see.
-
If anyone is looking for a great tutorial on getting started with a docker cluster check out https://dockerswarm.rocks/
I had a 4 node cluster up on Digital Ocean with Traefik + Let's Encrypt, Prometheus, Portainer, Grafana, all that good stuff, in under 2 hours. Not much longer to test a basic WP and Nextcloud container with full SSL. Neat stuff. Just burning through $100 credit for testing but it's been fun
-
How long does it take someone to master backend web development with either Spring or ASP.NET?
-
I am an ASP.NET MVC web developer and now I need to learn Spring Boot per new requirements. The books I am reading make me compare Spring Boot with ASP.NET, as the books mainly emphasize the Spring "magic".
Any suggestions for materials which put things in a different way?
-
!Dev
Fuck people using trace rifles in momentum control. How the hell am I supposed to kill someone who kills me in two rounds and also fires at 1000 rounds per minute? I was trying to get the catalyst, aka the upgrade, for the seasonal weapon, which is pretty bad (the upgrade makes it usable), but I am getting ripped apart after my first kill because someone can kill me with 2 bullets wherever he shoots me.
Yes, momentum control is supposed to be a gunfight mode and it comes around rarely, but that does not mean a broken weapon can roam around killing anybody in sight before they even know you fired a shot at them from some lane. Shotguns do the same but you need to get close. Shotguns are still a problem, but at least you can dodge or counter with a shotgun since your radar tells you someone is nearby, and snipers need a headshot. These weapons can fire at your toe and you are dead. Oh, the devs knew that such fast-firing weapons would be op, so they nerfed their damage and made them use the same ammo as shotguns, snipers and non-heavy grenade launchers. However, the game mode gives all weapons a damage buff, which is enough for trace rifles to be broken. Yes, you can use other primaries, but what are you gonna do when an auto rifle kills you with two shots to the toe? And since they burn ammo quickly and take more rounds to kill than their counterparts like shotguns, which use the same ammo as them, they spawn in with 50 in the mag, and anybody who is using shotguns, snipers or grenade launchers gives them ammo, and they only need two rounds to kill. Also, after I kill 50 PvP opponents I need to kill a few hundred opponents in PvE or PvP to actually apply the upgrade, and who you kill does not matter.
Seriously, and the second weapon I want to upgrade has tracking, but you need to aim down sights after hipfiring the tracking shots, which deal negligible damage, so they explode, or aim down sights and shoot, which deals more damage, but I am probably not going to have enough time before some random kills me again.
And this is just the first game. From what I heard it was supposed to be a fun game mode which focused on gunfights with your primary, not the infamous laser tag show of Prometheus Lens which happened a few years ago, but now all trace rifles can do that. Oh, and I still need to get 50 kills there for a seasonal challenge so I can get the free version of the premium currency, and I can only skip one challenge and I have already skipped one since it requires a DLC I don't own.
Seriously, why can't some actually good game come up to challenge this? All the competition seems to be third person shooters. Also most of the guns don't feel good and the lore is pretty lacking, but lore is not top priority. The only competition is Warframe, which is not my style; Titanfall 2, but I get insane pings from here, so no multiplayer, so after the story there's nothing to do unless I want to do air strafing, which is useless since I can't play multiplayer (granted, Titanfall 2 is not a looter shooter, but the guns feel good and the movement is too good); and Halo 1 - 3, since I heard 4 and 5 are pretty bad and I have only played Halo 1. I might complain about jackal snipers in Halo 2 but at least they have fixed spawns.
Maybe I am overreacting since it is my first game of momentum control.