Search - "logs of death"
So we have an API that my team is supposed to send messages to in a fire-and-forget kind of style.
We are dependent on it. If it fails, there is some annoying manual labor involved in cleaning that mess up. (If it even can be cleaned up, as sometimes it is also time-sensitive.)
Yet once in a while, that endpoint just crashes and lets the request vanish. No response, no error, nothing, it is just gone.
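For context, the sending side is nothing exotic; roughly this shape (endpoint URL and payload are placeholders, not our real setup), with the one safeguard that makes cleanup possible at all, a local record of failures:

    import logging
    import requests

    ENDPOINT = "https://messages-api.internal/messages"  # hypothetical URL

    def send_message(payload: dict) -> None:
        """Fire and forget, but keep a local breadcrumb when delivery fails."""
        try:
            resp = requests.post(ENDPOINT, json=payload, timeout=5)
            resp.raise_for_status()
        except requests.RequestException as exc:
            # Without this record, a swallowed message is simply gone.
            logging.error("message not delivered, manual cleanup needed: %s", exc)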
Digging through that API's log files, nothing pops up. But then I notice the size of the log files: about 30 GB of good old plain-text logs.
It turns out that the API has taken the LOG EVERYTHING approach so much to heart that it logs itself to the point of its own death.
Is circular logging such bleeding-edge technology? It's not like there are external solutions for it, like Loggly or Kibana. But oh, one might have to pay for those. Just dump it to the disk :/
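For the record, size-capped rotation is one stdlib import away. A minimal sketch in Python (file name and size limits invented, the API in question isn't mine to show):

    import logging
    from logging.handlers import RotatingFileHandler

    # Cap disk usage at ~60 MB: six files of 10 MB each, oldest overwritten.
    handler = RotatingFileHandler("api.log", maxBytes=10 * 1024 * 1024, backupCount=5)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

    logger = logging.getLogger("api")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)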
This is again a combination of developers thinking "I don't need to care about space! It's cheap!" and managers thinking "100 GB should be enough for that server cluster. Let's restrict its HDD to 100 GB and save some money!"
And then, here I stand, trying to keep my sanity :/
Stakeholder: Users are unable to buy tickets on the website. IT says Azure’s health check is showing an unhealthy status.
[It’s Sunday. Web Engineering is not on call so no one sees this right away.]
Stakeholder: IT restarted the Azure website twice, but users still can’t place orders.
Me: There was never an issue with the Azure site. That health check is inaccurate: there is a rewrite rule that sends the Azure-supplied domain to our custom domain, and the Azure health check doesn’t like that, so it returns an unhealthy status. The real problem is the ticketing server that the website has to communicate with. The ticketing server is overwhelmed and can’t handle more requests. IT should have checked the ticketing server’s logs. This has happened before, and it’s never been an Azure issue. It’s a ticketing server issue.
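The usual fix, for what it’s worth, is to exempt the probe path from the domain redirect so the health check gets a plain 200 instead of a 301. Our actual rule lives in the site’s rewrite config; here is the same idea as a WSGI middleware sketch (host and probe path are made up):

    CANONICAL_HOST = "tickets.example.com"  # hypothetical custom domain
    HEALTH_PATH = "/healthz"                # hypothetical probe path

    def canonical_redirect(app):
        """Redirect the Azure-supplied domain to the custom one, but let probes through."""
        def middleware(environ, start_response):
            host = environ.get("HTTP_HOST", "")
            path = environ.get("PATH_INFO", "")
            if host != CANONICAL_HOST and path != HEALTH_PATH:
                start_response("301 Moved Permanently",
                               [("Location", f"https://{CANONICAL_HOST}{path}")])
                return [b""]
            return app(environ, start_response)
        return middleware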
Stakeholder and IT: Oops 😅
---
JFC. Stop trying to make this web engineering’s problem. Stop trying to make it look like engineering dropped the ball. The ticketing server has experienced this issue multiple times, and it is maintained by a different team. The website’s symptoms are always the same, and there are steps you need to take before you decide to restart the website, which causes it to show a blue screen of death saying "503 Service Unavailable" for a few minutes. And we have a switch to shut off all transactions. Why do you not want to use it when it’s clear the website can’t process transactions???
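And the switch really is that boring: one flag check in front of checkout, flipped via an app setting, no restart and no 503 blue screen involved. A sketch with invented names:

    import os

    class ServiceUnavailable(Exception):
        """Rendered as a friendly 'sales paused' page, not a 503 blue screen."""

    def sales_enabled() -> bool:
        # Toggled via an app setting; takes effect without restarting the site.
        return os.environ.get("TICKET_SALES_ENABLED", "true").lower() == "true"

    def start_checkout(order) -> None:
        if not sales_enabled():
            raise ServiceUnavailable("Ticket sales are temporarily paused.")
        # ...hand the order off to the ticketing server...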