Search - "diagnostic"
My first job: The Mystery of The Powered-Down Server
I paid my way through college by working every-other-semester in the Cooperative-Education Program my school provided. My first job was with a small company (now defunct) which made some of the very first optical-storage robotic storage systems. I honestly forgot what I was "officially" hired for at first, but I quickly moved up into the kernel device-driver team and was quite happy there.
It was primarily a Solaris shop, with a smattering of IBM AIX RS/6000. It was one of these ill-fated RS/6000 machines which (by no fault of its own) plays a major role in this story.
One day, I came to work to find my team-leader in quite a tizzy -- cursing and ranting about our VAR selling us bad equipment; about how IBM just doesn't make good hardware like they did in the good old days; about how back when _he_ was in charge of buying equipment this wouldn't happen, and on and on and on.
Our primary AIX dev server was powered off when he arrived. He booted it up, checked logs and was running self-diagnostics, but absolutely nothing so far indicated why the machine had shut down. We blew a couple of hours trying to figure out what happened, to no avail. Eventually, with other deadlines looming, we just chalked it up to something we'd look into more later.
Several days went by, with the usual day-to-day comings and goings; no surprises.
Then, next week, it happened again.
My team-leader was LIVID. The same server was hard-down again when he came in; no explanation. He opened a ticket with IBM and put in a call to our VAR rep, demanding answers -- how could they sell us bad equipment -- why isn't there any indication of what's failing -- someone must come out here and fix this NOW, and on and on and on.
(As a quick aside, in case it's not clearly coming through between-the-lines, our team leader was always a little bit "over the top" for me. He was the kind of person who "got things done," and as long as you stayed on his good side, you could just watch the fireworks most days - but it became pretty exhausting sometimes).
Back to our story -
An IBM CE comes out and does a full on-site hardware diagnostic -- tears the whole server down, runs through everything one part at a time. Absolutely. Nothing. Wrong.
I recall, at some point of all this, making the comment "It's almost like someone just pulls the plug on it -- like the power just, poof, goes away."
My team-leader demands the CE replace the power supply, even though it appeared to be operating normally. He does, at our cost, of course.
Another week goes by and all is forgotten in the swamp of work we have to do.
Until one day, the next week... Yes, you guessed it... It happens again. The server is down. Heads are exploding (well, at least one head we all know by now). With all the screaming going on, the entire office staff should have been comped some Advil.
My team-leader demands the facilities team do a full diagnostic on the UPS system and assure we aren't getting drop-outs on the power system. They do the diagnostic. They also review the logs for the power/load distribution to the entire lab and office spaces. Nothing is amiss.
This would also be a good time to draw the picture of where this server is -- this particular server is not in the actual server room, it's out in the office area. That's on purpose, since it is connected to a demo robotics cabinet we use for testing and POC work. And customer demos. This will date me, but these were the days when robotic storage was new and VERY exciting to watch...
So, this is basically a couple of big boxes out on the office floor, with power cables running into a special power-drop near the middle of the room. That information might seem superfluous now, but will come into play shortly in our story.
So, we still have no answer to what's causing the server problems, but we all have work to do, so we keep plugging away, hoping for the best.
The team leader is insisting the VAR swap in a new server.
One night, we (the device-driver team) are working late, burning the midnight oil, right there in the office, and we bear witness to something I will never forget.
The cleaning staff came in.
Anxious for a brief distraction from our marathon of debugging, we stopped to watch them set up and start cleaning the office for a bit.
Then, friends, I Am Not Making This Up(tm)... I watched one of the cleaning staff walk right over to that beautiful RS/6000 dev server, dwarfed in shadow beside that huge robotic disc enclosure... and yank the server power cable right out of the dedicated power drop. And plug in their vacuum cleaner. And vacuum the floor.
We each looked at one-another, slowly, in bewilderment... and then went home, after a brief discussion on the way out the door.
You see, our team-leader wasn't with us that night; so before we left, we all agreed to come in late the next day. Very late indeed.
Nearly had a crash today driving home and almost had a heart attack. Apparently my car had the heart attack for me and started doing. A speaker test.
So I'm contemplating what just happened and my car's speakers start going BEEEEP BEEEP BOOOOOOOOOOOOP (Subwoofer).
Then the radio came on and switched to a Spanish station.
I looked it up, apparently I had entered diagnostic mode on the infotainment system when I was fiddling with the wheel buttons as a stress relief.
Long story short, the diagnostic mode informed me that my car runs Windows ME!
I would like a new car please, kthxbye.
*reads JSON license*
"The Software shall be used for Good, not Evil."
Well that's actually a nice license.. if only nuclear research etc could be licensed like that.
Wait actually.. WanBLowS is using XML for its "diagnostic data", right? I always found it so weird that they don't use JSON for that.. but I guess that this is why 🙃
A couple of weeks ago, I asked the "brand manager" if he knew how to reset printers to their defaults before reconfiguring them, knowing full well that he did not. He assured me that he did. I smiled and let him leave.
He called me yesterday, frantic, because he didn't know how to reconfigure a printer that already had a password. After reminding him of the above, I told him how to put the printer in diagnostic mode and how to navigate the menus. Literally: "Turn the printer off, then hold down the feed paper button while turning the printer on. It will print out a bunch of diagnostics, and a menu at the bottom. Just follow the instructions at the bottom to use the menu"
Apparently following simple instructions is well outside of his abilities. After he spent five minutes fighting with it and complaining, I called him and walked him through powering the printer on while holding down the feed paper button. Terribly difficult.
The next step amounts to "hold down the feed paper button for more than 1 second." He spent ten minutes (ten!) on this unimaginably challenging step, and, frustrated at his inability to outsmart a simple button, he gave up completely.
He literally couldn't follow the instructions on the printout. I've attached a picture to show how ridiculous this is, and it saddens me terribly to report that I'm quite serious. He was literally unable to figure this out.
HE SPENT TEN MINUTES TRYING TO PUSH A BUTTON FOR >1 SECOND! TEN MINUTES!
That's what was too difficult for him! A button! With written instructions!
I can't even.
But the kicker?
Now he and the bossman want me to drive half an hour so I can push a button for ~1.2 seconds because they're utterly incapable.
I'm soo done.
NO I DON'T WANT TO GIVE UP ALL MY DATA JUST TO GET AN EXPLORER DARK THEME!!!
YES I DISABLED TELEMETRY PARTIALLY!!
YES I STILL WANT TO RECEIVE UPDATES REGARDLESS OF WHETHER I EXPRESS MY DESIRES TO NOT BE TRACKED IN FULL!!!
NO I REALLY DON'T WANT TO HEAR SHIT ABOUT "THIS FUCKING QUESTION HAS BEEN ANSWERED SOMEWHERE ELSE"!!!
(https://answers.microsoft.com/en-us... - certified Microshit MOTHERFUCKERS!!!!!)
AND NO I DON'T WANT TO HEAR FROM YOU THAT AFTER RE-ENABLING TELEMETRY THAT MY PRIVACY SETTINGS ARE STILL TOO LOW!!! AND CERTAINLY I DON'T WANT TO SEE YOUR WORTHLESS "FIX ME" SHIT UNABLE TO FIX JACK SHIT!!!
AND LIKE FUCKING HELL DO I WANT TO REINSTALL WANBLOWS, FUCKING KEEP MY SHITTY FILES THAT ARE FUCKING BACKED UP BUT LOSE ALL MY CUSTOM CONFIGURATIONS!!! LIKE FUCKING HELL!!! NOT BECAUSE YOU CAN'T FIX YOUR OWN BLOODY SYSTEM AFTER I DID MY PART TO GIVE MY DATA TO THE SHAFTLORDS AGAIN!!!
FUCK YOU MICROSOFT!!!!
Around 27 hours at new customer location.
They had a server failure due to incompetence.
They had fired their own IT guy and called us 6 months later because the server stopped responding.
First diagnostic. 2 drives are dead in a RAID 5 with one hot spare. RAID controller then proved to be broken once the disks were replaced.
Waiting for new RAID controller and installing.
Backup nonexistent, no one changed the DAT tape during the 6 months without IT. The tape was just a transparent plastic band, no media left.
Raid config is stored in static ram on controller, no backup!
Several hours in tech support to find out how to rebuild raid config from existing disks.
Proves to be impossible to rebuild raid set due to some checksum failures.
More hours with support to enable some diagnostic read only mode to mirror low level content to external drive.
Then many more hours to copy parts of the tree until it gets an error, restart after that and go on.
In the end we got around 70% back.
During this time I managed to be in contact with all of the RAID manufacturer's support centers: one in Europe, one in the US and one in Taiwan, switching each time one of them closed for the night.
The customer later declined a steady support contract due to us being too expensive ;)
Some just don’t want to learn.
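As background to the RAID story above: RAID 5 keeps a single parity block per stripe, so it can rebuild one lost drive at a time, and a hot spare only helps if the rebuild onto it finishes before another drive fails. The toy sketch below shows the XOR idea with made-up blocks; it is not how any particular controller lays out its data, and the block contents are purely hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks from one stripe, plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One block lost: XOR of the survivors and the parity restores it.
recovered = xor_blocks([data[0], data[2], parity])

# Two blocks lost: the same XOR only yields the two missing blocks
# combined, with no way to separate them again.
both_missing = xor_blocks([data[0], parity])
```

With one block gone, `recovered` equals the missing block. With two gone, `both_missing` is only the XOR of the pair, which is why a second failure in the same set is usually where the data recovery bills start.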
I love Linux, but its community can be so full of incompetent assholes..
Just now I asked in Freenode ##linux how to get the process ID of my current running process in bash. I got my answer - it's a shell built-in called "$$".
Then people start to nitpick some more - why do you need it? How is that different from an exit? - to which my response was.. well I know the whole idea behind exit codes, and I'd use it whenever possible, in all defined behavior that allows my program to terminate itself whenever it can. This pidfile however would be used to exit itself and provide diagnostic information whenever the program enters undefined behavior - a segfault in C language. Scenarios in which I don't have full control over the script's behavior anymore, such as the system entering an unworkable state where the system stalled, still got some binaries in RAM but the rootfs got unwritable, such as now - very helpfully, thanks HP! - when my laptop likely overheated and shat itself. I issued sudo reboot into it, but even that wouldn't issue properly anymore due to the /sbin/poweroff binary becoming inaccessible too. I had to issue a hard power cycle.. one of the few times in which I'm thankful to HP for actually causing shit like this, lol.
Point is, that undefined behavior is what I'm trying to mitigate against. I certainly can't let any files other than diagnostics remain in nonvolatile storage like that, especially when their state should be predictable in order to ensure good operation (like files expressing whether the script is already running or not, i.e. lock files).
Back to that IRC chat. Aside from the answer, I got ridicule from people who probably don't even know how to properly compile a kernel. Ubuntu users, overconfident scum. Sometimes I feel like I should ask questions in channels like #archlinux only, where such incompetency is ridiculed on its own.
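The pidfile idea from the rant above can be sketched roughly as follows. This is a minimal illustration in Python rather than bash (`os.getpid()` plays the role of `$$`), it assumes a POSIX system, and the function names and the pidfile location are hypothetical. It is also not race-free: two processes starting at the same instant could both claim the file.

```python
import os
import tempfile

def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def acquire_pidfile(path):
    """Claim the pidfile; return False if a live process already holds it."""
    try:
        with open(path) as f:
            old_pid = int(f.read().strip())
        if pid_is_running(old_pid):
            return False
        # Otherwise the file is stale (crash, hard power cycle): overwrite it.
    except (FileNotFoundError, ValueError):
        pass  # no pidfile yet, or unreadable contents
    with open(path, "w") as f:
        f.write(str(os.getpid()))
    return True

pidfile = os.path.join(tempfile.mkdtemp(), "demo.pid")  # hypothetical location
first = acquire_pidfile(pidfile)   # no file yet, so the claim succeeds
second = acquire_pidfile(pidfile)  # our own live PID is on record, so it is refused
```

The stale-file check is what covers the "laptop shat itself" case: a pidfile left behind by a dead process does not block the next start. Keeping such files on a volatile filesystem is another common way to make sure they do not survive a reboot.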
Weirdest thing happened to me today.
My teacher spots me browsing "Hack This Site" and I'm sent to the tech lab for a "diagnostic test" ?????
I used to work in a tech shop. Old lady brings her laptop in claiming viruses broke her Gmail. I do the diagnostic, it's relatively clean with a bit of browser adware and tracking cookies. I call her and let her know there was nothing wrong with her Gmail and that it's good to go (she approved a tune up). She comes in and gets it. She calls later saying Gmail is still broken. I invite her to bring it in so we can have a look together (knowing for sure she was the problem). So we open up Gmail together and she shows me what she's doing. She's clicking on the sender and getting the contact card instead of the email opening. I show her how to actually open the email. She doesn't understand. I spend twenty more minutes explaining how to open an email. And this is the wk13 kicker, she waits until after twenty minutes to ask what "click" means. I was so done. That lady was too old to be using a computer.
In the before time (late 90s) I worked for a company that worked for a company that worked for a company that provided software engineering services for NRC regulatory compliance. Fallout radius simulation, security access and checks, operational reporting, that sort of thing. Given that, I spent a lot of time around/at/in nuclear reactors.
One day, we're working on this system that uses RFID (before it was cool) and various physical sensors to do a few things, one of which is to determine if people exist at the intersection of hazardous particles, gasses, etc.
This also happens to be a system which, at that moment, is reporting hazardous conditions and people at the top of the outer containment shell. We know this is probably a red herring or faulty sensor because no one is present in the system vs the access logs and cameras, but we have to check anyways. A few building engineers climb the ladders up there and find that nothing is really visibly wrong and we have an all clear. They did not however know how to check the sensor.
Enter me, the only person from our firm on site that day. So in the next few minutes I am also in a monkey suit (bc protocol), climbing a 150 foot ladder that leads to another 150 foot ladder, all 110lbs of me + a 30lb diag "laptop" slung over my shoulder by a strap. At the top, I walk about a quarter of the way out, open the casing on the sensor module and find that someone had hooked up the line feed, but not the activity connection wire so it was sending a false signal. I open the diag laptop, plug it into the unit, write a simple firmware extension to intermediate the condition, flash, reload. I verify the error has cleared and an appropriate message was sent to the diagnostic system over the radio, run through an error test cycle, radio again, close it up. Once I returned to the ground, sweating my ass off, I also send a not at all passive aggressive email letting the boss know that the next shift will need to push the update to the other 600 air-gapped, unidirectional sensors around the facility.
I practice stress-debugging.
It means that while I'm debugging and still in the phase where I have lots of ideas about what possibly went wrong (thanks to my loyal diagnostic team of ducks), I intentionally increase my discomfort level, forcing my brain to work faster and solve the issue. I don't drink, eat, or go to the loo no matter how desperately I need to, until the bug is fixed.
My colleagues and managers think I'm insane or a closet masochist but nevertheless, they acknowledge it actually works for me.
How weird is that?
Ran Windows RAM diagnostic tools because I was too lazy to get my Linux USB-stick. Ran for 20 minutes, restarted - "There are hardware problems present."
NO SHIT. No info how many errors, no log file mentioned, no code or anything. Something happened. How retarded can a diagnostic tool be?
Guess laziness gets you punished immediately...
You know modern cars, they have these computer thingies that tell you when something isn't working with little warning lights.
How useful !
"Take me to repair shop!" it says, and even sets the SatNav route.
Of course, the place might be closed, but still, it's trying to help. :-)
Anyhow, by chance just happen to be there getting said car serviced..
Mention the several warning thingies that sprang up on the way in..
Which took twice as long as a normal service, so I was hopeful they were fixing things !
Though every time I go and ask how things are, magically it's just been finished and I haven't been waiting for no good reason because no one remembered I was waiting..
No, they didn't fix any of the faults...
Why I asked without getting angry..
Because the diagnostic computer said there wasn't any..
But there was !
Come back when the fault returns they said..
If the fault disappears before their computer gets plugged in, they will just say there isn't a fault..
Apparently on the car there is no fault logging, it's either a fault right now, or no fault at all..
This might explain why a few months ago all the brakes seized up ( It's less than 2 years old, it shouldn't do that ! ), if some computer part is playing up..
So, I'll get my own car diagnostic computer and wait for it to play up, and maybe get some more error codes/etc. to pass on to the car fixing place !
Today's lesson, logs are important !
Also, just because a computer says there isn't something wrong with something, doesn't mean there isn't, so go and check it physically !
And, the customer is always right !
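The "logs are important" lesson above can be illustrated with a small sketch: instead of only knowing which faults are active right now, keep a timestamped history of every fault that was raised or cleared. This is a hypothetical toy, not how any real car's diagnostics work; the class name, fault code and timestamps are all made up.

```python
import time

class FaultLog:
    """Remembers every fault seen, not just the ones active right now."""

    def __init__(self):
        self.active = set()
        self.history = []  # (timestamp, code, event) tuples

    def report(self, code, present, now=None):
        """Record one reading for a fault code; only state changes are logged."""
        now = time.time() if now is None else now
        if present and code not in self.active:
            self.active.add(code)
            self.history.append((now, code, "raised"))
        elif not present and code in self.active:
            self.active.discard(code)
            self.history.append((now, code, "cleared"))

log = FaultLog()
log.report("BRAKE_SENSOR", True, now=100.0)   # hypothetical fault code
log.report("BRAKE_SENSOR", False, now=160.0)  # fault vanishes before the garage visit

print(log.active)   # what a "fault right now or nothing" check would see
print(log.history)  # what a log would still be able to show
```

By the time the diagnostic computer gets plugged in, `active` is empty again, but `history` still holds the raised and cleared events to pass on to the car fixing place.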
Previously had an issue with a part that had worn out, asked for it to be replaced.
Went to pick up the car, asked if the part had been replaced.
No it hadn't !
They thought it wasn't worn out !
I asked, did they look at it ?
No they didn't was their reply..
I told them, if you take it off, you can see it's worn out.
I watched them take it off, ( After much struggling, to which I remarked that yes, when I took it off to look at it, I had similar trouble ! ) they then saw it was worn out and put a new one on !
They then struggled to put the new one on, which I also mentioned I had the same trouble.
Being as it was my first time taking off one of those parts, you could be forgiven to think I was just a beginner.
But you might think a professional would be able to do a better job..
You just can't get the staff these days !
I'm trying to deploy an Azure Function via Visual Studio. VS gives me this error:
Publish has encountered an error.
Publish has encountered an error. We were unable to determine the cause of the error. Check the output log for more details.
A diagnostic log has been written to the following location:
Let's check the log then!
"Microsoft.WebTools.Shared.Exceptions.WebToolsException: Publish has encountered an error. We were unable to determine the cause of the error. Check the output log for more details. "
Fucking piece of crap
✓ running server on windows 10
✓ running postgresql server
✓ running ngrok server
✓ running android studio
✓ running 15 chrome tabs
✓ running ubuntu virtualbox
✓ scraping a website with python in ubuntu
✓ laptop freezes
✓ gently slam the laptop so it can go to sleep mode
✓ try to login so it can unfreeze
✓ get a purple screen of death
✓ system crashed and has to restart
✓ get a blue screen of death for memory diagnostic tool
✓ all unsaved work is lost
✓ gtx 1060
✓ 8gb ram
✓ acer laptop of $2400
✓ regret buying acer laptop
(probably a stretch and only Aussies will understand half of what I'm saying with this one buuut)
Not at all, I did a certificate 3 from TAFE in information and technology with a prominent amount of the course being on software diagnostic and web development and to this day have used absolutely 0 of the knowledge I gained and half of it is now deprecated and obsolete anyway ¯\_(ツ)_/¯
PM: Page load times are up. It might be your API blocking requests.
Me: Possible, though most of my load testing was performed against a random sample of requests at nearly 5 times the expected average per minute rate. I can add some logs but I think this is a red herring theory.
PM: Yes add logs, and New Relic and get it released ASAP.
Me: To confirm, you want me to make a bunch of diagnostic changes to a mission-critical API the day before Holiday break...
I felt like that guy from the Apollo 13 team warning Gene Kranz that the LEM was not built for this and I can make no guarantees... Released an hour before we went home for the weekend.
You know you're doing a quick and dirty job for a diagnostic tool when every method you create in alternate namespaces is static...
Suppose I should probably stop doing that aye...
Windows diagnostic tool wants to search for a solution to my desktop's network problems on the internet :/
Plus, I discovered my Windows installation created 40+ ethernet connections on its own... Time to wipe Windows.. :/
Luckily I use Linux on my Notebook...
Windows Memory Diagnostic Tool did not bring me good news.... Hello 2017 you seem to be serving me a crap sandwich already..
An app/service which would help doctors and patients schedule consultation hours. A patient would also always have medically relevant documents available digitally. No more waiting for faxes, no more lost diagnostic sheets; everything is always within reach. No more searching for an MRI appointment and no more overfilled waiting rooms. Better programs for docs! The ones I've seen in hospitals and doctors' offices look horrific..
Build an automatic diagnostic system that sequences your DNA and runs blood tests in hours, then creates personalised medication for your illness.
I accidentally installed malware on my system; some Chinese software now appears with the devices on this PC. I am able to boot up to the Windows home screen, but when services are starting my PC shows a page fault BSOD. I ran diagnostic tools and they showed a 2100 error on HD 0. Can anyone help? I need to recover important files from the OS drive :(