Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "failing all night"
-
My first job: The Mystery of The Powered-Down Server
I paid my way through college by working every-other-semester in the Cooperative-Education Program my school provided. My first job was with a small company (now defunct) which made some of the very first optical-storage robotic storage systems. I honestly forgot what I was "officially" hired for at first, but I quickly moved up into the kernel device-driver team and was quite happy there.
It was primarily a Solaris shop, with a smattering of IBM AIX RS/6000. It was one of these ill-fated RS/6000 machines which (by no fault of its own) plays a major role in this story.
One day, I came to work to find my team-leader in quite a tizzy -- cursing and ranting about our VAR selling us bad equipment; about how IBM just doesn't make good hardware like they did in the good old days; about how back when _he_ was in charge of buying equipment this wouldn't happen, and on and on and on.
Our primary AIX dev server was powered off when he arrived. He booted it up, checked logs and was running self-diagnostics, but absolutely nothing so far indicated why the machine had shut down. We blew a couple of hours trying to figure out what happened, to no avail. Eventually, with other deadlines looming, we just chalked it up be something we'll look into more later.
Several days went by, with the usual day-to-day comings and goings; no surprises.
Then, next week, it happened again.
My team-leader was LIVID. The same server was hard-down again when he came in; no explanation. He opened a ticket with IBM and put in a call to our VAR rep, demanding answers -- how could they sell us bad equipment -- why isn't there any indication of what's failing -- someone must come out here and fix this NOW, and on and on and on.
(As a quick aside, in case it's not clearly coming through between-the-lines, our team leader was always a little bit "over to top" for me. He was the kind of person who "got things done," and as long as you stayed on his good side, you could just watch the fireworks most days - but it became pretty exhausting sometimes).
Back our story -
An IBM CE comes out and does a full on-site hardware diagnostic -- tears the whole server down, runs through everything one part a time. Absolutely. Nothing. Wrong.
I recall, at some point of all this, making the comment "It's almost like someone just pulls the plug on it -- like the power just, poof, goes away."
My team-leader demands the CE replace the power supply, even though it appeared to be operating normally. He does, at our cost, of course.
Another weeks goes by and all is forgotten in the swamp of work we have to do.
Until one day, the next week... Yes, you guessed it... It happens again. The server is down. Heads are exploding (will at least one head we all know by now). With all the screaming going on, the entire office staff should have comped some Advil.
My team-leader demands the facilities team do a full diagnostic on the UPS system and assure we aren't getting drop-outs on the power system. They do the diagnostic. They also review the logs for the power/load distribution to the entire lab and office spaces. Nothing is amiss.
This would also be a good time draw the picture of where this server is -- this particular server is not in the actual server room, it's out in the office area. That's on purpose, since it is connected to a demo robotics cabinet we use for testing and POC work. And customer demos. This will date me, but these were the days when robotic storage was new and VERY exciting to watch...
So, this is basically a couple of big boxes out on the office floor, with power cables running into a special power-drop near the middle of the room. That information might seem superfluous now, but will come into play shortly in our story.
So, we still have no answer to what's causing the server problems, but we all have work to do, so we keep plugging away, hoping for the best.
The team leader is insisting the VAR swap in a new server.
One night, we (the device-driver team) are working late, burning the midnight oil, right there in the office, and we bear witness to something I will never forget.
The cleaning staff came in.
Anxious for a brief distraction from our marathon of debugging, we stopped to watch them set up and start cleaning the office for a bit.
Then, friends, I Am Not Making This Up(tm)... I watched one of the cleaning staff walk right over to that beautiful RS/6000 dev server, dwarfed in shadow beside that huge robotic disc enclosure... and yank the server power cable right out of the dedicated power drop. And plug in their vacuum cleaner. And vacuum the floor.
We each looked at one-another, slowly, in bewilderment... and then went home, after a brief discussion on the way out the door.
You see, our team-leader wasn't with us that night; so before we left, we all agreed to come in late the next day. Very late indeed.9 -
So I am finally plunging into continuous integration. If I make one more deploy script mistake, I've lost enough time to merit having learned a better solution than bash scripting calling git and rhc and py files I wrote. I have failing tests that are failing because they weren't updated after the million and a half urgent changes in the past 2 months, so it's time to act like I am a TDD fanatic and write the tests correctly. So much work. All from me listening to the constant req changes, listening to the urgency, letting non-devs get under my skin if you will. I'm optimistic in all the wrong places - I think I can write that by end of day let's try it. I'm lazy in the wrong places - I think that I can write that test later, because all I changed was XYZ (which took all night but I said I'd get it as close as possible didn't I?). And I think these handful of bash scripts are good enough to make sure I run tests? But remember, I didn't write the tests or I didn't go back and update them. Or the tests that fail, I'm too lazy. And so much of the tests, I would need to use, idk selenium for, and damnit if I really don't want to dig for element IDs to wait for every time I need an AJAX call.
Okay wow, I really did rant here. And discredited myself a bit lol I need to ignore the wrong lazy and embrace the right lazy. Protect myself from myself and from contributors. It really is, up to me now, to rescue myself from my bad habits. Bad habits perpetuated by clients urgency every day, to change things, that should have been finalized in November if we wanted a stable flipping system in January. It feels like the blind (client) leading the blind (me, when I do dumb shit like rush features out the door half tested).
Anyway all this came out, because I have been reading about continuous integration and stumbled upon this quote. And thought someone might laugh at the anachronism like I did2 -
!rant just a question. Sorry in advance for the long post.
I've been working in IT in Windows infrastructure and networking side of things for my entire career (5years) and recently was hired for a role working with AWS.
We use Macs and we use *nix distros for days. I've only ever dabbled for 'funsies' before with Linux because every previous job I held was a Windows house and f*** all else.
I'm just wondering if anyone here might have some insights as to a great way to learn the Linux environment and to learn it the right way. I'm not the best Windows admin ever and will never claim to be, but I have seen stuff that other people have done that makes me want to swing a brick at someone's head. And I feel that with all of the setup wizards and the "We'll just do it for you." approach that Windows has used since forever it allowed enough wiggle room for people that didn't know what they were doing to f*** sh*t up royally. I'm not familiar enough with Linux to know if this is also a common problem. I know that having literal full-access to every file in your OS can cause a n00b like myself to mess up royal, thus the question about learning Linux the right way.
I vaguely understand the organization of the folders and file structure within Linux, and I know some very basic commands.
sudo rm -rf /*
Just kidding
But All of my co-workers at my new job are like mighty oaks of knowledge while I'm a tiny sapling. And at times I've been intimidated by how little I know, but equally motivated to try and play catch-up.
In addition to all of this, I really want to start learning how to program. I've tried learning multiple times from places like codecademy.com, YouTube tutorials, and codeschool.com but I feel like I'm missing the lesson that explains why to use a certain operation instead of another. Example: if/else in lieu of a switch.
I'm also failing to get the concept of syntax in certain languages I've tried before. Java comes to mind real fast.
The first language I tried teaching myself was C++ from YouTube. I ended up having a fever dream that night about coding and woke up in a cold sweat. Literally, like brain overload or something. I was watching tutorials for like 9 hours straight.
Does anyone know of a training resource that will explain, in terms a 5 year old would understand, what the code is doing and why? I really want to learn but I'm starting to lose steam cause I'm just not getting it.
Thank you in advance for any tips guys and gals. I really appreciate it. Sorry for the ridiculously long questions.5 -
I wonder if you guys can help, nobody else seems to.
Nine months ago I packed my desktop away because I had to move.
I then turned it on a few weeks ago and noticed a curious problem: the audio and video stutters.
I don't know why this happens.
I tried booting into Linux, and didn't see any stuttering. I went back over to windows and updated all the drivers, but nothing worked.
Last night I did a full system wipe, and re-installed windows without leaving any of the old files.
The error still persists.
I have no clue what might be wrong with the computer.
For all I know it's the hardware that's failing, but I don't have any idea which component in the system is leading to this strange behaviour.
Maybe you guys know enough to shed some light on this?23 -
Wanted to get to bed early tonight, but ended up wasting two hours after I moved code from my development machine over to a test system and it was failing. After adding all kinds of logging to figure out where it was failing on the test machine i realized i fixed am error in an input file on my dev machine, but that error in the input fine was still there on the test machine. Another night with little sleep and tomorrow is Monday. 😭
-
After three months of development, my first contribution to the client is going live on their servers in less than 12 hours. And let me say, I shall never again be doing that much programming in one go, because the last week and a half has been a nightmare... Where to begin...
So last Monday, my code passed to our testing servers, for QA to review and give its seal of approval. But the server was acting up and wouldn't let us do much, giving us tons of timeouts and other errors, so we reported it to the sysadmin and had to put off the testing.
Now that's all fine and dandy, but last Wednesday we had to prepare the release for 4 days of regression testing on our staging servers, which meant that by Wednesday night the code had to be greenlight by QA. Tuesday the sysadmin was unable to check the problem on our testing servers, so we had to wait to Wednesday.
Wednesday comes along, I'm patching a couple things I saw, and around lunch time we deploy to the testing servers. I launch our fancy new Postman tests which pass in local, and I get a bunch of errors. Partially my codes fault, partially the testing env manipulating server responses and systems failing.
Fifteen minutes before I leave work on the day we have to leave everything ready to pass to staging, I find another bug, which is not really something I can ignore. My typing skills go to work as I'm hammering line after line of code out, trying to get it finished so we can deploy and test when I get home. Done just in time to catch the bus home...
So I get home. Run the tests. Still a couple failures due to the bug I tried to resolve. We ask for an extension till the following morning, thus delaying our deployment to staging. Eight hours later, at 1AM, after working a full 8 hours before, I push my code and leave it ready for deployment the following morning. Finally, everything works and we can get our code up to staging. Tests had to be modified to accommodate the shitty testing environment, but I'm happy that we're finally done there.
Staging server shits itself for half a day, so we end up doing regression tests a full day late, without a change in date for our upload to production (yay...).
We get to staging, I run my tests, all green, all working, so happy. I keep on working on other stuff, and the day that we were slated to upload to production, my coworkers find that throughout the development (which included a huge migration), code was removed which should not have. Team panics. Everyone is reviewing my commits (over a hundred commits) trying to see what we're missing that is required (especially legal requirements). Upload to production is delayed one day because of this. Ended up being one class missing, and a couple lines of code, which is my bad (but seriously, not bad considering I'm a Junior who was handed this project as his first task at his first job).
I swear to God, from here on out, one feature per branch and merge request. Never again shall I let this happen. I don't even know why it was allowed to happen, it breaks our branch policies. But ohel... I will now personally oppose crap like this too...
Now if you'll excuse me... I'm going to be highly unproductive and rest, because I might start balding otherwise after these weeks... -
I guess you know its time to go when i your print statement writes:
System.out.println("LAST CHANCE 2:" + output); -
#Suphle Rant 3: Road to PHP8, Flow travails
Some primer: Flows is a feature that causes the framework to bypass handling the request now but read it from cache. This cache entry is meant to be populated without warming, based on the preceding request. It's sort of like prefetching but done on the back end
While building Suphle, I made some notes on some chapters about caveats and gotchas I may forget while documenting. One such note was that when users make the Flow request, the framework will attempt to determine who user is, using authentication mechanism defined on the first module (of the modular monolith)
Now, I got to this point during documentation and started wondering whether it's impossible for the originating request to have used a different authentication mechanism, which would result in an empty entry for returning user. I *think* it's possible cuz I've got something else called "route mirroring", where web based routes can be converted to API routes. They'll then return JSON, get served under defined API path, use JWT, all automatically. But I just couldn't connect the dots for the life of me, regarding how any of this could impact authentication on the Flow request
While trying to figure out how to write the test for this or whether it was even necessary (since I had no use case), it struck me that since Flow requests are not triggered by an actual user, any code attempting to read authenticated user will see nothing!
I HATE it when I realize there's ambiguity or an oversight, after the amount of attention and suffering devoted. This, along with a chain of personal troubles set off despondency for a couple of days. No appetite for food or talk. Grudgingly refactored in this update over some days. Wrote some tests, not all passed. More pain. May have to convert them to unit tests
For clarity, my expectation is, I built this. Nothing should be impossible for me
Surprisingly, I caught a somewhat lucky break –an ex colleague referred me to the 1st gig I'm getting in 1+ year. It's about writing a plugin for some obscure forum software. I'm not too excited cuz it's poorly documented and I'll have to do a lot of groping, they use arrays instead of objects etc. There's no guarantee I'll find how to implement all client's requirements
While brooding last night, surfing the PHP subreddit, stumbled on a post about using Rector to downgrade a codebase. I've always been interested in the reverse but didn't have any incentive to fret over it. Randomly googled and saw a post promising a codebase can be upgraded with 3 commands in 5 minutes to PHP 8. Piqued my interest around 12:something AM. Stayed up all night upgrading it, replacing PHPSTAN with Psalm, initializing the guy's project, merging Flow auth with master etc. I think it may have taken 5 minutes without the challenge of getting local dev environment to PHP 8
My mood is much lighter than it was, although the battle is not won yet –image tests are failing. For some weird reason, PHP8 can't read generated test images. Hope I can ride on that newfound lease on life to study the forum and get the features working
I have some other rant but this is already a lot to digest in one sitting. See you in rant #4