Search - "parse"
I’m surrounded by idiots.
I’m continually reminded of that fact, but today I found something that really drives that point home.
Gather ‘round, everybody, it’s story time!
While working on a slow query ticket, I perused the code, finding several causes, and decided to run git blame on the files to see what dummy authored the mental diarrhea currently befouling my screen. As it turns out, the entire feature was written by mister legendary Apple golden boy “Finder’s Keeper” dev himself.
To give you the full scope of this mess, let me start at the frontend and work my way backward.
This function allows the user to better see the rows in the API Calls table, for which there is also a search feature, the very thing I’m tasked with fixing.
It’s worth noting that above the search feature are two inputs for a date range, with some helpful links like “last week” and “last month” … and “All”. It’s also worth noting that this table is for displaying search results of all the API requests and their responses for a given merchant… this table is enormous.
This search field for this table queries the backend on every character the user types. There’s no debouncing, no submit event, etc., so it triggers on every keystroke. The actual request runs through a layer of abstraction to parse out and log the user-entered date range, figure out where the request came from, and to map out some column names or add additional ones. It also does some hard to follow (and amazingly not injectable) orm condition building. It’s a mess of functional ugly.
The important columns in the table this query ultimately searches are not indexed, despite it only looking for “create_order” records — the largest of twenty-some types in the table. It also uses partial text matching (again: on. every. single. keystroke.) across two varchar(255)s that only ever hold <16 chars — and of which users only ever care about one at a time. After all of this, it filters the results based on some uncommented regexes, and worst of all: instead of fetching only one page’s worth of results like you’d expect, it fetches all of them at once and then discards what isn’t included by the paginator. So not only is this a guaranteed full table scan with partial text matching for every query (over millions to hundreds of millions of records), it’s that same full table scan for every single keystroke while the user types, and all but 25 records (user-selectable) get discarded — and then requeried when the user looks at the next page of results.
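For contrast, here is a rough sketch of what "fetch only one page" could look like, using SQLite from Python purely for illustration. The table name `api_calls`, its columns, the index, and the helper names are all made up for the example; the real schema and ORM are not shown above. It also assumes a prefix match is acceptable, since an index can only help when the pattern does not start with a wildcard.

```python
import sqlite3


def fetch_page(conn, merchant_id, order_ref_prefix, page, page_size=25):
    """Return a single page of create_order rows; the database does the paging."""
    offset = (page - 1) * page_size
    cur = conn.execute(
        """
        SELECT id, order_ref
        FROM api_calls
        WHERE merchant_id = ?
          AND call_type = 'create_order'
          AND order_ref LIKE ? || '%'   -- prefix match only (assumption)
        ORDER BY id
        LIMIT ? OFFSET ?
        """,
        (merchant_id, order_ref_prefix, page_size, offset),
    )
    return cur.fetchall()


def build_demo_db():
    """Create a tiny in-memory table with a hypothetical schema for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE api_calls ("
        "id INTEGER PRIMARY KEY, merchant_id INTEGER, "
        "call_type TEXT, order_ref TEXT)"
    )
    # Index the columns the search filters on (hypothetical column choice).
    conn.execute(
        "CREATE INDEX idx_calls ON api_calls (merchant_id, call_type, order_ref)"
    )
    rows = [(1, "create_order", f"ORD{i:04d}") for i in range(100)]
    conn.executemany(
        "INSERT INTO api_calls (merchant_id, call_type, order_ref) VALUES (?, ?, ?)",
        rows,
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(fetch_page(conn, 1, "ORD00", page=2))
```

Only `page_size` rows ever leave the database per request. Firing that request once per keystroke is a separate problem that a debounce or a submit button on the frontend would have to handle.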
What the bloody fucking hell? I’d swear this idiot is an intern, but his code does (amazingly) actually work.
No wonder this search field nearly crashed one of the servers when someone actually tried using it.
Interviewer: "Use this 2D array and calculate.."
Me: "This input isn't a 2D array though. Do you want me to parse or construct a 2D array then.."
"It is a 2D array."
"Uh.. ok..and if it's not what if we.."
"Look my notes say you must use this input, and treat it as a 2D Array"
"What if I wrote a function for a 2D array similar to this input, but actually a 2D array"
"You must use only the input provided"
Me: does rain dance code for 20 minutes.
Interviewer: "hmm, maybe it wasn't a 2D Array. I like your efforts but that's all the time we have today."
I promise I can code, sometimes. It does help to have correct questions to give correct answers.
4 hours! four fucking hours! f.o.u.r. h.o.u.r.s.!
It's the amount in the time domain this bug has cost me to fix. The cost in the sanity domain is immeasurable...
I swear, the god damn ass births of devs who coded this abomination should be slowly mutilated and then raped by their own severed limbs.
It took me 4 hours to figure out that the 12 year old binary CLI tool they used to generate PDFs from PHP could handle neither HTML5 nor some linebreaks at specific places. Some part of it is due to them using REGEX to find and replace HTML tags.
Yes, I am indeed very pissed. And I need a 🥃 or 3
What we learned:
- Don't use REGEX to "parse" HTML
- Don't call random compiled CLI tools from PHP if there are PHP packages to do the same shit
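To illustrate the first point, here is a minimal sketch using a real HTML parser instead of a regex. It is in Python with the standard library's `html.parser` rather than PHP, and the `TagCollector` class and `list_tags` helper are invented for the example; the original tool's code is not shown. The point is only that a parser copes with linebreaks inside tags, which is exactly where find-and-replace regexes tend to fall over.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect every start tag with its attributes, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.tags.append((tag, dict(attrs)))


def list_tags(html):
    """Return [(tag, {attr: value}), ...] for all start tags in the input."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


if __name__ == "__main__":
    # The linebreak inside <img ...> is no problem for a parser.
    print(list_tags('<p class="a">\n<img\n src="x.png"></p>'))
```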
My God is map development insane. I had no idea.
For starters did you know there are a hundred different satellite map providers?
Just kidding, it's more than that.
Second, there appear to be tens of thousands of people whose *entire* job is either analyzing map data, or making maps.
Hell this must be some people's whole *existence*. I am humbled.
I just got done grabbing basic land cover data for a neoscav style game spanning the U.S., when I came across the MRLC land cover data set.
One file was 17GB in size.
Worked out to 1px = 30 meters in their data set. I just need it at a one mile resolution, so I need it in 54px chunks, which I'll have to average, or find medians on, or do some sort of reduction.
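A rough sketch of that reduction step, assuming the raster is already loaded as a plain list of rows of integer land cover codes (reading the actual 17GB file is a separate problem and not shown). Since land cover codes are categories rather than measurements, this sketch takes the most common code per block instead of an average; `downsample_majority` is a name made up for the example.

```python
from collections import Counter


def downsample_majority(grid, block):
    """Collapse each block x block square of class codes to its most common code.

    Blocks at the right and bottom edges may be smaller than `block`.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, rows, block):
        out_row = []
        for c0 in range(0, cols, block):
            counts = Counter(
                grid[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            out_row.append(counts.most_common(1)[0][0])
        out.append(out_row)
    return out


if __name__ == "__main__":
    demo = [
        [1, 1, 2, 2],
        [1, 3, 2, 2],
        [4, 4, 5, 5],
        [4, 4, 5, 6],
    ]
    # For the real data the block size would be 54 (about one mile at 30 m per pixel).
    print(downsample_majority(demo, 2))
```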
Ecoregions.appspot.com actually has a pretty good data set but that's still manual. I ran it through gale and there are actually imperceptibly thin line borders that share a separate *shade* of their region colors with the region itself, so I ran it through a mosaic effect, to remove the vast bulk of extraneous border colors, but I'll still have to hand remove the oceans if I go with image sources.
It's not that I haven't done things involved like that before, naturally I'm insane. It's just involved.
The reason for editing out the oceans is because the oceans contain a metric boatload of shades of blue.
If I'm converting pixels to tiles, I have to break it down to one color per tile.
With the oceans, the boundary between the ocean and shore (not to mention depth information on the continental shelf) ends up sharing colors when I do a palette reduction, so that's a no-go. Of course I could build the palette by hand, from sampling the map, and then just measure the distance of each sampled rgb color to that of every color in the palette, to see what color it primarily belongs to, but as it stands ecoregions coloring of the regions has some of them *really close* in rgb value as it is.
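The hand-built palette idea could be sketched roughly like this, assuming plain squared Euclidean distance in RGB. The palette entries and their names below are invented for the example, not sampled from the ecoregions map.

```python
def nearest_palette_color(rgb, palette):
    """Return the name of the palette entry closest to `rgb`.

    `palette` maps a name to an (r, g, b) tuple. Distance is squared
    Euclidean distance in RGB space, which is enough for picking a minimum.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(palette, key=lambda name: dist2(rgb, palette[name]))


if __name__ == "__main__":
    # Hypothetical palette values, for illustration only.
    palette = {
        "forest": (34, 139, 34),
        "desert": (210, 180, 140),
        "ocean": (0, 105, 148),
    }
    print(nearest_palette_color((40, 130, 40), palette))
```

With region colors that sit very close together in RGB, two palette entries can end up nearly equidistant from a sample, so this would likely need a minimum-gap check or a different color space before being trusted.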
Now what I also could do is write a script to parse the shape files, construct polygons in sdl or love2d, and save it to a surface with simplified colors, and output that to bmp.
It's perfectly doable, but technically I'm on savings and supposed to be calling companies right now to see if I can get hired instead of being a bum :P
To finish my photography portfolio website and get it online. I've been putting this off for YEARS. Just started again (and from scratch) and I've been making some progress for the last couple of days. I don't want to even look at that old project I scrapped, or maybe I will once I finish (read: publish) this one.
My problem before was that I was always looking at the big picture and was trying to figure everything out in one go.
In contrast with that, I now figured out a relatively simple and straightforward way to start off with no back end at all and just use static resources instead (with some logic to parse them every time I "upload" new stuff), which should be fine even in the long run if I end up being too lazy and/or busy to do the back end. In general, I now try to tackle small tasks one by one (even if I don't always write them down and/or track them) and realise that it's better to be done (even not in the best way I imagine it) than to not be done at all. It's as if I learn how to do stuff properly for the first time. Oh, well...
Until today, I had assumed deploying stuff to prod would NOT be one of my responsibilities in this company. Apparently that's not the case.
Had to deploy my code and pray it didn't break anything. Why is this a big deal at all?
Well you see, there is no repository. At all. No git, no svn, not even duplicate folders. No tests, no pipeline. Just a bunch of CPanels.
Had to manually copy files and folders from the development site to the production site and partially copy a database. "Just drag and drop" were the instructions I was given.
As if using CakePHP2, PHP5 and having to parse fucking Excel files wasn't bad enough, now I have to deal with one of the worst ways to deploy code.
Fuck it, I'm switching on the looking-for-job flag on linkedin.
Fuck Apache TIKA.
It's supposed to be a "universal file reader" or some shit. I'm trying to use it as a PDF/image parser that does OCR when needed and yields a full-file string. It does so, but the text ends up being IN THE WRONG FUCKING ORDER.
WTF would I want to parse the text out of a PDF in any order that is not the one the text is supposed to be read?!?!
"It is more efficient to work in random ordering", says the docs. No shit, really? Wouldn't it be even more efficient to just spit out random strings? Just as useful and 100% CPU-bound.
"You can add a property to forcefully put the text in the right order". THEN WHY THE FUCK IS IT NOT THE DEFAULT SETTING?
Srsly, what's the use case for a parser that yields scrambled text?!?
Terraform + helm-chart ... I really need a break. Who the fuck invented this shit.
The HCL format sucks
The documentation sucks
The dev tools suck
The debug output sucks
But I'm ok with that, I can manage.
But today really it shot the bird ... I can't have a fucking comma in a string? Because idk why the fuck helm-release tries to parse that fucking string and wants to make an array or whatever out of it? Why, you fucking abomination?
Something in the docs? Nah, who reads them anyway.
Because you know it's totally not strange that a string is analysed and oh wait there's a comma in it, the dev surely wants me to make an array out of it, because you know ...
So now I have to escape my fucking comma to prevent it from parsing my fucking string. I just want to have a fucking string you hideous monstrosity ....
I completely *detest* that the MongoDB *shell* is just a fucking JS interpreter with extra API calls sprinkled on top and whoever came up with that idea should have all their commits reverted immediately, working with that thing is a punishment!
I don't even know a way to parse and chew through the json it spits out in my own json viewers, as it's "Extended", and none of my editors understand that!
Ugh, haven't been this frustrated with a tool for a while...
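One possible workaround, sketched with only the Python standard library: flatten the type wrappers into plain JSON before handing the text to a viewer. This assumes the output really is valid MongoDB Extended JSON (wrappers such as `$oid`, `$numberLong`, `$date`); shell output that prints `ObjectId(...)` literals is not JSON at all and would not load. Only a handful of wrappers are handled, and the function names are made up for the example.

```python
import json
from datetime import datetime, timezone


def _simplify(obj):
    """json object_hook: replace common Extended JSON wrappers with plain values."""
    if len(obj) != 1:
        return obj
    (key, value), = obj.items()
    if key == "$oid":
        return value
    if key in ("$numberLong", "$numberInt"):
        return int(value)
    if key == "$numberDouble":
        return float(value)
    if key == "$date":
        # Relaxed form: already an ISO-8601 string.
        if isinstance(value, str):
            return value
        # Canonical form: milliseconds since the epoch. The inner
        # {"$numberLong": ...} has already been turned into an int,
        # because object_hook runs on the innermost objects first.
        if isinstance(value, int):
            moment = datetime.fromtimestamp(value / 1000, tz=timezone.utc)
            return moment.isoformat()
    return obj


def to_plain_json(text):
    """Parse Extended JSON text and return pretty-printed plain JSON."""
    return json.dumps(json.loads(text, object_hook=_simplify), indent=2)


if __name__ == "__main__":
    sample = (
        '{"_id": {"$oid": "5f1a"}, "n": {"$numberLong": "42"}, '
        '"at": {"$date": {"$numberLong": "1609459200000"}}}'
    )
    print(to_plain_json(sample))
```

The conversion is lossy on purpose: an ObjectId becomes an ordinary string, so the result is for reading, not for loading back into the database.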