Do all the things like ++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatarSign Up
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple APILearn More
Search - "data integrity"
Good Morning!, its time for practiseSafeHex's most incompetent co-worker!
Todays contestant is a very special one.
*sitcom audience: WHY?*
Glad you asked, you see if you were to look at his linkedin profile, you would see a job title unlike any you've seen before.
*sitcom audience oooooooohhhhhh*
were not talking software developer, engineer, tech lead, designer, CTO, CEO or anything like that, No No our new entrant "G" surpasses all of those with the title ..... "Software extraordinaire".
*sitcom audience laughs hysterically*
I KNOW!, wtf does that even mean! as a previous dev-ranter pointed out does this mean he IS quality code? I'd say he's more like a trash can ... where his code belongs
*ba dum tsssss*
Ok ok, lets get on with the show, heres some reasons why "G" is on the show:
One of G's tasks was to build an analytics gathering library for iOS, similar to google analytics where you track pages and events (we couldn't use google's). G was SO good at this job he implemented 2 features we didn't even ask for:
- If the library was unable to load its config file (for any reason) it would throw an uncatchable system integrity error, crashing the app.
- If anything was passed into any of the functions that wasn't expected (null, empty array etc.) it would crash the app as it was "more efficient" to not do any sanity checks inside the library.
This caused a lot of issues as some of the data needed to come from the clients server. The day we launched the app, within the first 3 hours we had over 40k crash logs and a VERY angry client.
Now, what makes this story important is not the bugs themselves, come on how many times have we all done something stupid? No the issue here was G defended all of this as the right thing to do!
.. and no he wasn't stoned or drunk!
G claimed if he couldn't get the right settings / params he wouldn't be able to track the event and then our CEO wouldn't have our usage data. To which I replied:
"So your solution was to not give the client an app instead? ... which also doesn't give the CEO his data".
He got very angry and asked me "what would you do then?". I offered a solution something like why not have a default tag for "error" or "unknown" where if theres an issue, we send up whatever we have, plus the file name and store it somewhere else. I was told I was being ridiculous as it wasn't built to track anything like that and that would never work ... his solution? ... pull the library out of the app and forget it.
... once again giving everyone no data.
G later moved onto another cross-platform style project. Backend team were particularly unhappy as they got no spec of what needed to be done. All they knew was it was a single endpoint dealing with very complex model. There was no Java classes, super classes, abstract classes or even interfaces, just this huge chunk of mocked data. So myself and the lead sat down with him, and asked where the interfaces for the backend where, or designs / architecture for them etc.
His response, to this day frightens me ... not makes me angry, not bewilders me ... scares the living shit out of me that people like this exist in the world and have successful careers.
G: "hhhmmm, I know how to build an interface, but i've never understood them ... Like lets say I have an interface, what now? how does that help me in any way? I can't physically use it, does it not just use up time building it for no reason?"
us: "... ... how are the backend team suppose to understand the model, its types, integrate it into the other systems?"
G: "Can I not just tell them and they can write it down?"
I'll just pause here for a moment, as you'll likely need to read that again out of sheer disbelief
I've never seen someone die inside the way the lead did. He started a syllable and his face just dropped, eyes glazed over and he instantly lost all the will to live. He replied:
" wel ............... it doesn't matter ... its not important ... I have to go, good luck with the project"
*killed the screen share and left the room*
now I know you are all dying in suspense to know what happened to that project, I can drop the shocking bombshell that it was in fact cancelled. Thankfully only ~350 man hours were spent on it
... yep, not a typo.
G's crowning achievement however will go down in history. VERY long story short, backend got deployed to the server and EVERYTHING broke. Lead investigated, found mistakes and config issues on every second line, load balancer wasn't even starting up. When asked had this been tested before it was deployed:
G: "Yeah I tested it on my machine, it worked fine"
lead: "... and on the server?"
G: "no, my machine will do the same thing"
lead: "do you have a load balancer and multiple VM's?"
G: "no, but Java is Java"
... and with that its time to end todays episode. Will G be our most incompetent? ... maybe.
Tune in later for more practiceSafeHex's most incompetent co-worker!!!33
Oh, man, I just realized I haven't ranted one of my best stories on here!
So, here goes!
A few years back the company I work for was contacted by an older client regarding a new project.
The guy was now pitching to build the website for the Parliament of another country (not gonna name it, NDAs and stuff), and was planning on outsourcing the development, as he had no team and he was only aiming on taking care of the client service/project management side of the project.
Out of principle (and also to preserve our mental integrity), we have purposely avoided working with government bodies of any kind, in any country, but he was a friend of our CEO and pleaded until we singed on board.
Now, the project itself was way bigger than we expected, as the wanted more of an internal CRM, centralized document archive, event management, internal planning, multiple interfaced, role based access restricted monster of an administration interface, complete with regular user website, also packed with all kind of features, dashboards and so on.
Long story short, a lot bigger than what we were expecting based on the initial brief.
The development period was hell. New features were coming in on a weekly basis. Already implemented functionality was constantly being changed or redefined. No requests we ever made about clarifications and/or materials or information were ever answered on time.
They also somehow bullied the guy that brought us the project into also including the data migration from the old website into the new one we were building and we somehow ended up having to extract meaningful, formatted, sanitized content parsing static HTML files and connecting them to download-able files (almost every page in the old website had files available to download) we needed to also include in a sane way.
Now, don't think the files were simple URL paths we can trace to a folder/file path, oh no!!! The links were some form of hash combination that had to be exploded and tested against some king of database relationship tables that only had hashed indexes relating to other tables, that also only had hashed indexes relating to some other tables that kept a database of the website pages HTML file naming. So what we had to do is identify the files based on a combination of hashed indexes and re-hashed HTML file names that in the end would give us a filename for a real file that we had to then search for inside a list of over 20 folders not related to one another.
So we did this. Created a script that processed the hell out of over 10000 HTML files, database entries and files and re-indexed and re-named all this shit into a meaningful database of sane data and well organized files.
So, with this we were nearing the finish line for the project, which by now exceeded the estimated time by over to times.
We test everything, retest it all again for good measure, pack everything up for deployment, simulate on a staging environment, give the final client access to the staging version, get them to accept that all requirements are met, finish writing the documentation for the codebase, write detailed deployment procedure, include some automation and testing tools also for good measure, recommend production setup, hardware specs, software versions, server side optimization like caching, load balancing and all that we could think would ever be useful, all with more documentation and instructions.
As the project was built on PHP/MySQL (as requested), we recommended a Linux environment for production. Oh, I forgot to tell you that over the development period they kept asking us to also include steps for Windows procedures along with our regular documentation. Was a bit strange, but we added it in there just so we can finish and close the damn project.
So, we send them all the above and go get drunk as fuck in celebration of getting rid of them once and for all...
Next day: hung over, I get to the office, open my laptop and see on new email. I only had the one new mail, so I open it to see what it's about.
Lo and behold! The fuckers over in the other country that called themselves "IT guys", and were the ones making all the changes and additions to our requirements, were not capable enough to follow step by step instructions in order to deploy the project on their servers!!!
[Continues in the comments]26
A Month ago...
Me: when are you going to complete the report
Friend: we can do it in minutes
Me: you can't Ctrl + c and Ctrl +v as there is plagiarism check
Friend: we have spin bot
Me: you do that now itself . if something happens? You can join me .
Friend: just chill
Me: done with report
Friend: feeding it to spin bot!
Feeds text related to database security....
Garbage collector == city worker
SQL statements == SQL explanation
SQL queries == SQL interrogation
SQL injection == SQL infusion
Attack == assault
Malicious == noxious
Data integrity == information uprightness
Sensitive == touchy
Me: told you so...
**spin not == article rewriter3
Got in trouble today during a Data integrity meeting, everyone kept talking about data massaging. "Massage this data", "massage that data table", "insert massaged data".
Finally I just blurted out, "yeah massage it all you want but how do I get a data happy ending?"
I thought it was hilarious. The other DBA and backend devs thought it was jokes, my manager... Not so much.
Apparently, I need to keep "thoughts and comments about data happy endings to myself moving forward".
I've optimised so many things in my time I can't remember most of them.
Most recently, something had to be the equivalent off `"literal" LIKE column` with a million rows to compare. It would take around a second average each literal to lookup for a service that needs to be high load and low latency. This isn't an easy case to optimise, many people would consider it impossible.
It took my a couple of hours to reverse engineer the data and implement a few hundred line implementation that would look it up in 1ms average with the worst possible case being very rare and not too distant from this.
In another case there was a lookup of arbitrary time spans that most people would not bother to cache because the input parameters are too short lived and variable to make a difference. I replaced the 50000+ line application acting as a middle man between the application and database with 500 lines of code that did the look up faster and was able to implement a reasonable caching strategy. This dropped resource consumption by a minimum of factor of ten at least. Misses were cheaper and it was able to cache most cases. It also involved modifying the client library in C to stop it unnecessarily wrapping primitives in objects to the high level language which was causing it to consume excessive amounts of memory when processing huge data streams.
Another system would download a huge data set for every point of sale constantly, then parse and apply it. It had to reflect changes quickly but would download the whole dataset each time containing hundreds of thousands of rows. I whipped up a system so that a single server (barring redundancy) would download it in a loop, parse it using C which was much faster than the traditional interpreted language, then use a custom data differential format, TCP data streaming protocol, binary serialisation and LZMA compression to pipe it down to points of sale. This protocol also used versioning for catchup and differential combination for additional reduction in size. It went from being 30 seconds to a few minutes behind to using able to keep up to with in a second of changes. It was also using so much bandwidth that it would reach the limit on ADSL connections then get throttled. I looked at the traffic stats after and it dropped from dozens of terabytes a month to around a gigabyte or so a month for several hundred machines. The drop in the graphs you'd think all the machines had been turned off as that's what it looked like. It could now happily run over GPRS or 56K.
I was working on a project with a lot of data and noticed these huge tables and horrible queries. The tables were all the results of queries. Someone wrote terrible SQL then to optimise it ran it in the background with all possible variable values then store the results of joins and aggregates into new tables. On top of those tables they wrote more SQL. I wrote some new queries and query generation that wiped out thousands of lines of code immediately and operated on the original tables taking things down from 30GB and rapidly climbing to a couple GB.
Another time a piece of mathematics had to generate all possible permutations and the existing solution was factorial. I worked out how to optimise it to run n*n which believe it or not made the world of difference. Went from hardly handling anything to handling anything thrown at it. It was nice trying to get people to "freeze the system now".
I build my own frontend systems (admittedly rushed) that do what angular/react/vue aim for but with higher (maximum) performance including an in memory data base to back the UI that had layered event driven indexes and could handle referential integrity (overlay on the database only revealing items with valid integrity) or reordering and reposition events very rapidly using a custom AVL tree. You could layer indexes over it (data inheritance) that could be partial and dynamic.
So many times have I optimised things on automatic just cleaning up code normally. Hundreds, thousands of optimisations. It's what makes my clock tick.4
Coworker: so once the algorithm is done I will append new columns in the sql database and insert the output there
Me: I don't like that, can we put the output in a separate table and link it using a foreign key. Just to avoid touching the original data, you know, to avoid potential corruption.
C: Yes sure.
< Two days later - over text >
C: I finished the algo, i decided to append it to the original data in order to avoid redundancy and save on space. I think this makes more sense.
No. Learn this principal:
" The original data generated by the client, should be treated like the god damn Bible! DO NOT EVER CHANGE ITS SCHEMA FOR A 3RD PARTY CALCULATION! "
Put simply: D.F.T.T.O
Don't. Fucking. Touch. The. Origin!5
TLDR: Find a website that requires a subscription but doesn't check their cookies' integrity, now I'm on a website for free.
>wonder if it's possible to intercept browser data
>find that none of these really fit me
>go to youtube, search how to intercept POST data
>find something called BurpSuite
>Totally what I was looking for
>start testing BurpSuite on devrant
>I can see all the data that's being passed around
>wonder if I can use it on a website where my subscription recently ended.
>try changing my details without actually inputting anything into the website's form
>send the data to the server
>refresh the page
>Huh what's this?
>must be a userID
>increment it by 1 and change some more details
>refresh the page
>didn't work 😐
>Hmmm, let's try forwarding the data to the browser after incrementing the uid
>can see the details of a different user
>except I see his details are the details I had entered previously
>begin incrementing and decrementing the uid
>realize that the uid is hooked up to my browsers local cookie
>can see every user's details just by changing my cookie's uid
>Wonder if it's possible to make the uid persistent without having to enter it in every time
>look up cookie manipulator
>go back to website
>examine current uid
>it's my uid
>change it to a different number
>refresh the webpage
>IT FUCKING WORKED
>MFW I realize this website doesn't check for cookie integrity
>MFW I wonder if there are other websites that are this fucking lazy!!!
>MFW they won't fix it because it would require extra work.
>MFuckingFW they tell me not to do it again in the future
>realize that since they aren't going to fix it I'll just put myself on another person's subscription.5
If anyone has been keeping up with my data warehouse from hell stories, we're reaching the climax. Today I reached my breaking point and wrote a strongly worder email about the situation. I detailed 3 separate cases of violated referential integrity (this warehouse has no constraints) and a field pulling from THE WRONG FLIPPING TABLE. Each instance was detailed with the lying ER diagram, highlighted the violating key pairs, the dangers they posed, and how to fix it. Note that this is a financial document; a financial document with nondeterministic behavior because the previous contractors' laziness. I feel like the flipping harbinger of doom with a cardboard sign saying "the end is near" and keep having to self-validate that if I was to change anything about this code, **financial numbers would change**, names would swap, description codes would change, and because they're edge cases in a giant dataset, they'll be hard to find. My email included SQL queries returning values where integrity is violated 15+ times. There's legacy data just shoved in ignoring all constraints. There are misspellings where a new one was made instead of updating, leaving the pk the same.
Now I'd just put sorting and other algos, but the data is processed by a crystal report. It has no debugger. No analysis tools. 11 subreports. The thing takes an hour to run and 77k queries to the oracle backend. It's one of the most disgusting infrastructures I've ever seen. There's no other solution to this but to either move to a general programming language or get the contractor to fix the data warehouse. I feel like I've gotten nowhere trying to debug this for 2 months. Now that I've reached what's probably the root issue, the office beaucracy is resisting the idea of throwing out the fire hazard and keeping the good parts. The upper management wants to just install sprinklers, and I'm losing it.
Today, I had a segmentation fault in a data structure, so I wrote a function to test the integrity of the data structure. It worked, and I found and fixed the bug. However, the test still complained about the integrity. After debugging some very strange errors for 5 hours, I discover that there was a bug in the integrity test. The data structure works just fine. FML.1
What the actual fuck...
What kind of API does not do data integrity validation, and allows me to subscribe a user to a newsletter list with a non-existant list id ?
That's some fucking bullshit. fucktards at www.make.as1
Today we learned:
Don't run a backup integrity check on terabytes of data in multiple jobs your cpu can't handle.
And i cannot abort the process...
Guess i have to go outside, in the cold, brrrr.1
Recently at my work everyone got all hyped about blockchain. I've spent years studying it and especially crypto so I like to think I know a lot about it. Now every time some feature is discussed (like tracking video views) someone non tech shouts out "Oh we can use a blockchain!" And I have to spend half an hour convincing them why it's a bad idea in this particular case. Like we don't need to share that data, we don't need to ensure integrity without trust etc. I mean a blockchain is a great piece of tech but please stop trying to apply it everywhere...
It's like people think that the blockchain features at its most simplified level don't work anywhere else. E.g. as if stuff you put into MySQL suddenly changes because data integrity is not a 'feature' in that sense...2
How much do you earn for your skill set in your country vs your cost of living?
See how much I & others earn.
Recently I became aware of just how massive the gap in developers earnings are between countries. I'd love to calculate a fixed score for income vs cost of living.
I know this stuff is sensitive to some so if you prefer just post your score (avg income p/m after tax / cost of living).
I'm not shy so I'll go first:
Normal Rate (Long term): $23
Consulting / Short term: $30-$74
Pen Test: $1500 once off.
Pen Test Fixes: consulting rate.
Simple work/websites: min $400+
Family & Friends: Dev friends are usually free (when mutually beneficial). Family and others can fuck off, even if they can pay (I pass their info to dev friends with fair warning).
Experience: 9 years
Country: South Africa
Developer rareness in country: Very Rare (+-90 job openings per job seeker).
Middle class wage in country: $1550 p/m (can afford a new car, decent apartment & some luxuries like beer/eating out).
Employment type: Permanent though I can and do freelance occasionally.
Client Locality: Mostly local.
Developer Type: Web Developer (True web dev - I do anything web related from custom HTTP servers to sockets, services, advanced browser api's, apps & more).
STACKS / SKILLSETS
I'M PROFICIENT IN:
I DABBLE WITH:
ASP.net, C++, ruby, GO, nginx, tesseract
application architecture, automation, integrations, db's, real time data, advanced browser apps/extensions (webRTC, canvas etc).
Avg income p/m after tax: $2250
Cost of living (car+rent+food): $1200
*Note: For integrity when calculating my cost of living I excluded debt repayments and only kept my necessities which are transport, food & shelter.
I really hope you guy's post your results, it would be great to get an idea of which is really the worst / best country to be a developer in.20
When your company has data integrity issues and they expect hacky workarounds instead of fixing the data
"Our supplier asks that you double the number of php child processes for this fpm pool"
"Are you aware, that that would lead to about 100% of memory overcommit, taken the current limit of 128MB/child, and that if a lot of them started at once, the system would probably go for OOM-Kill, which would most probably kill your database, that still runs on 100% MyISAM tables that do not support transactions, and you'd have to kiss your data integrity goodbye, right?"
"Uh... Nevermind then"
I get that some people are not IT-versed, but really... Hire someone who knows what they are doing and doesn't live 20 years in the past, god damn it!
I'm gonna kill him...no don't defend him, you lot are meant to be on my side!
IT'S FUCKING HUNTING SEASON.
I know, I know, you want context but I just can't now. I'm completely annoyed. Fuck data integrity, just duplicate rows randomly cos fuck you.
Work is about to turn into a murder mystery...without the mystery 😒
He doesn't even work here anymore, hope he forgets his password and gets locked out of everything he has.
A prototype being used as production code written in procedural PHP with the code drawn using echo and MySQL (not MySQLi) all mixed together and the configuration with world readable database stored as config.inc.
All backed by a database with no foreign keys or data integrity of any kind.
My boss is being a stupid cunt. To give you a background we were facing issues with our Collections system. First week December 2019, I and a colleague of mine came up with a new efficient collections architecture. My colleague and I started to Code and create automation scripts mid December and completed it in First week of Jan 2020. This PoC version was supposed to be just between the Dev team(App Dev and Back end, also one from the Ops side to verify the data). I did not receive any feedback on the actual collections system and the data integrity but during this time all they’ve done is take meetings with no real outcome. I raised this and the only email I got is data is looking fine when I know it is not.Now in First week of Feb, he is stressing us to go ahead and deploy the architecture in Production and we have not done any Code Review, Static Code analysis, any real tests on Code and deployment scripts. Have not discussed any metrics for our dashboard and alerting. I have no idea how to handle this cunt. I have even asked for resources to atleast productionalize the code and move ahead the deployment and still no out come. I’ll go in a meeting with him in an hour, I will be very blunt and tell him that whatever he is doing is a foolish way and maybe resign in couple of weeks6
Is a masters in statistics worth it?
A bit of background:
I got my bachelor in actuarial math (statistics for insurance risk), then found machine learning and got a couple of gigs in software development and data engineering. I became my previous employers the go to guy for questions about data integrity and structure.
Now I am heading to a new job that specializes in ML for gambling. And while I love the math, I really see myself doing more software development and system architecture work (with some analysis). I already started this masters program, so I got less than a year to finish, but starting to feel like its a waste of my time, but also, I dont want to just quit it.
I had a discussion with my colleagues about my bachelor thesis.
Together we created within the last 18 month a REST-API where we use LDAP/LMDB as database (tree structured storage). Of course our data is relational and of course we have a high redundancy there. It's a 170 call API and I highly doubt that it's actually conforming REST.
Ensuring DB integrity is done in the backend and coding style there is "If we change it at one place, let's make sure to also change it everywhere else", so you get a good impression how much of spaghetti code we have there.
Now I proposed to code a solution in my bachelor thesis where we use a relational database (we even have an administrated Oracle DB with high availability) and have a write-only layer to also store the data in LDAP but my colleagues said that "it would add too much complexity to the system".
Instead I should write the relational layer myself and fetch the data somehow from the existing LDAP tree.
What the actual fuck, spaghetti code is what makes the system really unnecessarily complex so that no one will understand that code in 2 years.
Congratulations, you just created legacy code that went into production in 2018 while not accepting the opportunity to let that legacy code get eliminated.
Now good luck with running and maintaining that system and it's inconsistencies.1
As you have seen this week again we have a problem with data imports from <tool 1> and <tool 2>. Clearly the existing setup is not working. We are now encouraging more and more people from the project to start using our portal and we need a stable and up to date environment.
Can we have a call please to discuss the following topics:
1. Portal data integrity
3. Project Communication
4. Development efficiency
<Drama Queen PM>1
# ./symfony test:unit
Propel-Exception: Unable to execute DELTE ALL statement [...] Integrity constraint violation: 1451 Cannot delete or update a parent row: a foreign key constraint fails.
WHY ist a UNIT TEST reaching out to a REAL data base?
And who in their right mind would create a different data base schema for the tests?
This was with a clone of the real thing. Removing the FK results in double PK-errors...5