Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "splunk"
-
Worst thing you've seen another dev do? Long one, but has a happy ending.
Classic 'Dev deploys to production at 5:00PM on a Friday, and goes home.' story.
The web department was managed under the the Marketing department, so they were not required to adhere to any type of coding standards and for months we fought with them on logging. Pre-Splunk, we rolled our own logging/alerting solution and they hated being the #1 reason for phone calls/texts/emails every night.
Wanting to "get it done", 'Tony' decided to bypass the default logging and send himself an email if an exception occurred in his code.
At 5:00PM on a Friday, deploys, goes home.
Around 11:00AM on Sunday (a lot folks are still in church at this time), the VP of IS gets a call from the CEO (who does not go to church) about unable to log into his email. VP has to leave church..drive home and find out he cannot remote access the exchange server. He starts making other phone calls..forcing the entire networking department to drive in and get email back up (you can imagine not a group of happy people)
After some network-admin voodoo, by 12:00, they discover/fix the issue (know it was Tony's email that was the problem)
We find out Monday that not only did Tony deploy at 5:00 on a Friday, the deployment wasn't approved, had features no one asked for, wasn't checked into version control, and the exception during checkout cost the company over $50,000 in lost sales.
Was Tony fired? Noooo. The web is our cash cow and Tony was considered a top web developer (and he knew that), Tony decided to blame logging. While in the discovery meeting, Tony told the bosses that it wasn't his fault logging was so buggy and caused so many phone calls/texts/emails every night, if he had been trained properly, this problem could have been avoided.
Well, since I was responsible for logging, I was next in the hot seat.
For almost 30 minutes I listened to every terrible thing I had done to Tony ever since he started. I was a terrible mentor, I was mean, I was degrading, etc..etc.
Me: "Where is this coming from? I barely know Tony. We're not even in the same building. I met him once when he started, maybe saw him a couple of times in meetings."
Andrew: "Aren't you responsible for this logging fiasco?"
Me: "Good Lord no, why am I here?"
Andrew: "I'll rephrase so you'll understand, aren't you are responsible for the proper training of how developers log errors in their code? This disaster is clearly a consequence of your failure. What do you have to say for yourself?"
Me: "Nothing. Developers are responsible for their own choices. Tony made the choice to bypass our logging and send errors to himself, causing Exchange to lockup and losing sales."
Andrew: "A choice he made because he was not properly informed of the consequences? Again, that is a failure in the proper use of logging, and why you are here."
Me: "I'm done with this. Does John know I'm in here? How about you get John and you talk to him like that."
'John' was the department head at the time.
Andrew:"John, have you spoken to Tony?"
John: "Yes, and I'm very sorry and very disappointed. This won't happen again."
Me: "Um...What?"
John: "You know what. Did you even fucking talk to Tony? You just sit in your ivory tower and think your actions don't matter?"
Me: "Whoa!! What are you talking about!? My responsibility for logging stops with the work instructions. After that if Tony decides to do something else, that is on him."
John: "That is not how Tony tells it. He said he's been struggling with your logging system everyday since he's started and you've done nothing to help. This behavior ends today. We're a fucking team. Get off your damn high horse and help the little guy every once in a while."
Me: "I don't know what Tony has been telling you, but I barely know the guy. If he has been having trouble with the one line of code to log, this is the first I've heard of it."
John: "Like I said, this ends today. You are going to come up with a proper training class and learn to get out and talk to other people."
Over the next couple of weeks I become a powerpoint wizard and 'train' anyone/everyone on the proper use of logging. The one line of code to log. One line of code.
A friend 'Scott' sits close to Tony (I mean I do get out and know people) told me that Tony poured out the crocodile tears. Like cried and cried, apologizing, calling me everything but a kitchen sink,...etc. It was so bad, his manager 'Sally' was crying, her boss 'Andrew', was red in the face, when 'John' heard 'Sally' was crying, you can imagine the high levels of alpha-male 'gotta look like I'm protecting the females' hormones flowing.
Took almost another year, Tony released a change on a Friday, went home, web site crashed (losses were in the thousands of $ per minute this time), and Tony was not let back into the building on Monday (one of the best days of my life).10 -
We're having an ongoing credential stuffing attack right now. Hackers hit us hard over the weekend and the web team sent out an email congratulating themselves that they stopped the threat.
I decided to look to see how they "fixed" the issue.
They modified their code to stop logging the errors to prevent Splunk from sending the automated emails to management (how we have been able to spot/monitor the attack).
They literally just put their heads in the sand, stapled a sign to their ass that reads "Meteor? We see no meteor approaching. Everything is fine."5 -
I'm convinced code addiction is a real problem and can lead to mental illness.
Dev: "Thanks for helping me with the splunk API. Already spent two weeks and was spinning my wheels."
Me: "I sent you the example over a month ago, I guess you could have used it to save time."
Dev: "I didn't understand it. I tried getting help from NetworkAdmin-Dan, SystemAdmin-Jake, they didn't understand what you sent me either."
Me: "I thought it was pretty simple. Pass it a query, get results back. That's it"
Dev: "The results were not in a standard JSON format. I was so confused."
Me: "Yea, it's sort-of JSON. Splunk streams the result as individual JSON records. You only have to deserialize each record into your object. I sent you the code sample."
Dev: "Your code didn't work. Dan and Jake were confused too. The data I have to process uses a very different result set. I guess I could have used it if you wrote the class more generically and had unit tests."
<oh frack...he's been going behind my back and telling people smack about my code again>
Me: "My code wouldn't have worked for you, because I'm serializing the objects I need and I do have unit tests, but they are only for the internal logic."
Dev:"I don't know, it confused me. Once I figured out the JSON problem and wrote unit tests, I really started to make progress. I used a tuple for this ... functional parameters for that...added a custom event for ... Took me a few weeks, but it's all covered by unit tests."
Me: "Wow. The way you explained the project was; get data from splunk and populate data in SQLServer. With the code I sent you, sounded like a 15 minute project."
Dev: "Oooh nooo...its waaay more complicated than that. I have this very complex splunk query, which I don't understand, and then I have to perform all this parsing, update a database...which I have no idea how it works. Its really...really complicated."
Me: "The splunk query returns what..4 fields...and DBA-Joe provided the upsert stored procedure..sounds like a 15 minute project."
Dev: "Maybe for you...we're all not super geniuses that crank out code. I hope to be at your level some day."
<frack you ... condescending a-hole ...you've got the same seniority here as I do>
Me: "No seriously, the code I sent would have got you 90% done. Write your deserializer for those 4 fields, execute the stored procedure, and call it a day. I don't think the effort justifies the outcome. Isn't the data for a report they'll only run every few months?"
Dev: "Yea, but Mgr-Nick wanted unit tests and I have to follow orders. I tried to explain the situation, but you know how he is."
<fracking liar..Nick doesn't know the difference between a unit test and breathalyzer test. I know exactly what you told Nick>
Dev: "Thanks again for your help. Gotta get back to it. I put a due date of April for this project and time's running out."
APRIL?!! Good Lord he's going to drag this intern-level project for another month!
After he left, I dug around and found the splunk query, the upsert stored proc, and yep, in about 15 minutes I was done.1 -
Unaware that this had been occurring for while, DBA manager walks into our cube area:
DBAMgr-Scott: "DBA-Kelly told me you still having problems connecting to the new staging servers?"
Dev-Carl: "Yea, still getting access denied. Same problem we've been having for a couple of weeks"
DBAMgr-Scott: "Damn it, I hate you. I got to have Kelly working with data warehouse project. I guess I've got to start working on fixing this problem."
Dev-Carl: "Ha ha..sorry. I've checked everything. Its definitely something on the sql server side."
DBAMgr-Scott: "I guess my day is shot. I've got to talk to the network admin, when I get back, lets put our heads together and figure this out."
<Scott leaves>
Me: "A permissions issue on staging? All my stuff is working fine and been working fine for a long while."
Dev-Carl: "Yea, there is nothing different about any of the other environments."
Me: "That doesn't sound right. What's the error?"
Dev-Carl: "Permissions"
Me: "No, the actual exception, never mind, I'll look it up in Splunk."
<in about 30 seconds, I find the actual exception, Win32Exception: Access is denied in OpenSqlFileStream, a little google-fu and .. >
Me: "Is the service using Windows authentication or SQL authentication?"
Dev-Carl: "SQL authentication."
Me: "Switch it to windows authentication"
<Dev-Carl changes authentication...service works like a charm>
Dev-Carl: "OMG, it worked! We've been working on this problem for almost two weeks and it only took you 30 seconds."
Me: "Now that it works, and the service had been working, what changed?"
Dev-Carl: "Oh..look at that, Dev-Jake changed the connection string two weeks ago. Weird. Thanks for your help."
<My brain is screaming "YOU NEVER THOUGHT TO LOOK FOR WHAT CHANGED!!!"
Me: "I'm happy I could help."4 -
Introduced a ‘new’ logging framework for our web site. Web team is testing the integration and I get an email saying the logging wasn’t working. Instead of sending me how she is searching the logs, she sends me a screen shot of the code (which is ass-backwards of how I documented the logging library, but that’s another rant). OK, she wrote 5 lines of code that should be one line, but OK, the error still should have logged fine. I search the logs, and sure enough, there they are. Errors logged just as they should.
So I email back (with screenshot of the search query and results) asking how she searched for the errors.
Hour later she responds ..”I don’t know.”
That’s it.
WTF do you mean “I don’t know”?…WTF…you are a –bleep-ing developer too! This is not the first –bleep-ing splunk query you’ve written!
OK..I’m calm..feeling better. Wouldn’t be so bad if she emailed just me with the question (I’m not a splunk query expert either, we can figure it out together), but she was sure to cc 3 of the PMs involved in the integration, my boss, and other team members to make it sound like the problem was my code.3 -
Pro tip:
Make sure you can RECOVER from your backups.
It's all well and good backing this and that up, but make sure that when the shit really hits the fan you can recover.
I've now 4 days into recovering a raspberry pi that ran:
Pi-hole
Snort
DHCP
VSFTP
Logwatch
Splunk forwarder
Grafana
And serveral other things... I've learnt my lesson4 -
Inmates are trying to take over the asylum again.
Got a message from the web team manager deeply concerned because since switching to the new logging framework, the site is significantly slower.
She provided no proof or any data to what 'significantly slower' means.
#1 The 'new logging' has been in place and logging for 5 years. We only recently depreciated the ILogger interface ('new' ILogger interface only has 1 method instead of 5)
#2 The 'old logging' was modified 5 years ago, so even if you were using the 'old' interface, the underlying implementation is still the same.
She tried to push the 'it wasn't this slow before' argument, so I decided to do some fact based analysis.
Knowing they deployed their logging changes couple of weeks ago, I opened up AppDynamics, looked at the average call time to Splunk (along with a few other http calls they are doing)
- caching services - 5ms
- splunk - 30ms
- Order Service - 350ms
- Product Data Service -525ms
Then I look at the data they are logging, for the month of June, over 5 million messages. At 30ms each, that's almost 42 hours spent logging errors...yes errors. Null reference exceptions, Argument exceptions, easily fixable stuff.
So far for the month of July (using the 'new' logging), almost 2.5 million errors. Pretty close so far with June's numbers.
My only suggestion was to fix the bugs in their code so they don't log so many errors.
Her response.."Can we have one of our developers review your logging code? We believe we can find ways to optimize the http requests"
Oh good Lord. I'm not a drinking man..but ...I might start.1 -
Coming from Splunk to Sumo Logic, so far I haven't found anything I like about Sumo Logic. It's slow and constantly tries to limit you. Splunk is expensive but I can see why now.
-
Had an internet/network outage and the web site started logging thousands of errors and I see they purposely created a custom exception class just to avoid/get around our standard logging+data gathering (on SqlExceptions, we gather+log all the necessary details to Splunk so our DBAs can troubleshoot the problem).
If we didn't already know what the problem was, WTF would anyone do with 'There was a SQL exception, Query'? OK, what was the exception? A timeout? A syntax error? Value out of range? What was the target server? Which database? Our web developers live in a different world. I don't understand em.1 -
Just discovered 20+TiB of Splunk data in our AWS account today. We haven't used Splunk for almost a year and a half...6
-
All these super expensive and fancy enterprise tools. CloudWatch, AppDynamics, Grafana, Splunk and whatnot. Spent a month trying to figure out why the fuck the app does not perform well.
Took 1 day with tcpdump, awk and gnu utils to figure out why.
Should anyone need a tcpdump analyzer -- try my awk script. Shows response times of each network call w/o impacting app performance :)
https://gist.github.com/netikras/...14 -
We receive an email from Splunk when errors go above a certain threshold, and a particular service has been especially problematic this week (throwing hundreds of exceptions). Email response from the team mgr responsible for the service.
"We are working to address these errors. We don’t currently have a way to prevent a user who’s account is locked from logging into the service and performing work."
The exception? NullReferenceException: Object reference not set to an instance of an object.
The code? (paraphrasing)
var user = GetUser(request.Login);
if (user.CanPerformWork) ...
<facepalm>
I'm doing my best not to reply .."Really? No way? You do realize we can read code, right?"4 -
We are researching enhancing our current alerting system (we use Splunk) to be 'smarter' about who is emailed/texted/whatever when there are problems in our applications.
Currently, if there are over 50 errors logged within a 15 minute period, a email/phone/text blast to nearly 100 individuals ranging from developers, network admins, DBAs, and vice presidents.
Our plan is to group errors by team and let each team manage their own applications. Alert on 1 error, 5, 500...we don't care, let the team work out the particulars.
The trick was interfacing with Splunk's API (that's a long rant by itself)
In about a day or so I was able to use Splunk's WebHook feature to notify a WebAPI service I threw together to send myself an email with details about the underlying data (simulating the kind of alert we would send to the team)
I thought ...cool... it worked. Show it off to the team, most thought it was a good start, except one:
Dev: "The errors are not grouped by team."
Me: "No, I threw the webapi service together to demonstrate how we can extract the splunk bits to get access to the teams"
Dev: "Well...this won't work at all."
Me: "Um..what?"
Dev: "The specification c l e a r l y states the email will be team based. This email was only sent to you and has all the teams and their applications"
Me: "Um...uh...the service can, if we want to go using a service route. Grouping by team name is easy using a LINQ query. I just through this service together yesterday."
Dev: "I don't know. Sounds like I need to schedule a meeting to discuss what you are proposing. I don't think emailing all that to everyone is a good idea."
WTF! Did you not listen to what I said?!!!
Oh well..the dev's proposal is to use splunk's email notification and custom Exchange rules with callbacks into splunk that resend...oh good lord ...a fracking rube goldberg of a config nightmare ...
I suspect we'll go the service route once I finish the service before the meeting.1 -
Spent the weekend geeking out getting my head around a proper Docker based environment for my development env at home and for the team... 90% done and I couldn't figure out why I couldn't start my Splunk instance up.... I'd set the default logging to Splunk.... Chicken & Egg probs!
But how awesome is docker with portainer and app templates eh?! -
when github autocorrects my pull request message that contains the word splunk.
(or when devrant also decides to) -
So.. I'm doing this 2 week long internship at a pretty big corporation that was looking for high school students for a security training program, and they're having us utilize Splunk (which I've heard about, but never used). Any opinions and/or tips on this tool? :)1
-
Helping out a team, I was documenting some code/processes when I came across several classes that was logging a lot of, IMO, 'junk' that was unnecessary (and I knew wasn't being used in any Splunk alerts/reports)
I offer a refactoring suggestion, simplifying the data being logged, moving the duplicate code to a central location, maybe saving 10~20 lines of code. Didn't think it was a big deal because they were already actively working on the code and it was all new code (nothing deployed to production yet). Sent the suggestion to the lead developer and he responds:
Dev: "Yes, the changes looks fine, but not in scope of the project. Any out of scope work will need to be suggested at the end of the project, reviewed by the team, the project manager and approved by the vice president."
"Out of scope"? Logging data to Splunk needs a vice president's approval? WTF?
YOU PROBABLY HAVE THE PROJECT OPEN IN VISUAL STUDIO RIGHT NOW!!!
Along with the documentation the lead dev said they didn't have time to do, I send his boss and the dev team my suggested changes (before-after screen shots of the code) and offered to do the 2 minutes worth of work (again, this was new code, nothing in production and zero side affects to anything).
I even offered to create the splunk reporting/alerting against the data being logged (another item they said they would not have time to do)
About a minute later the lead dev responds..
Dev: "Those changes look good. I'll have Jake make those changes and we can test the logging when we deploy to dev on Monday. Thanks!"
Of course you will...fracking ass hat.
I'll bet my Battlestar Galactica DVD box set he was going to make the changes himself, brag to his boss how he refactored the code, saving X lines of code..blah blah blah to help *me* with documenting the logging portion. -
!rant
My phone died today...
I was following up on a production ramp-up... Open one of our tracking dashboards through corporate VPN and while loading... It crashed...
Reboot after non-stop reboot... Cleared cache, factory reset... Stabilized for an hour or two and it's crashing again...
I give up... Rest now my friend... You did good, you did good!1 -
Me post a lot of investigation in a slack thread and come to a conclusion
20 mins later engineers in thread post things like they didn't read anything I wrote and come to same conclusion, but involve other parties AGAIN making us all look dumb
Why do they just ignore what I wrote, literally linked the same splunk dashboards and same error numbers that I did. I don't get it.
Why should I care? I hoped I could use this as a way to convince manager once again that I do the things he asks me to, but it seems it's all useless.
Really want a new job but tough times, should be happy I even have a job I guess -
In my previous position I did mostly networking and helped out where I could setting up servers and workspaces. About a month ago our systems admin left the company so I got to spend all day troubleshooting network issues and configuring the proper NAT statements to connect a new hire to our customers networks. I was supposed to be working on migrating our api from the Splunk search head to the indexer to keep it from absolutely tanking the performance of our database.
-
So I'm interested in building a Raspberry Pi stack at home to continue securing and adding my smart home capabilities, 👍
Have ideas for 2/3 but what else could I look to add?
1. Pi. Hole with cloudflared argo proxy for all DNS
2. Home Automation server
3. IPS / IDS like Bro or snort? Or firewall like pfsense?
4. Log server with Splunk agent from other pi's and router....
5. What else?
Ideas in the comments -
Seems like I have to become Splunk developer. My first impression is not the greatest, I like to write stuff from scratch. Some of you guys worked with it before? Some tips?1