Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API

From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "data engineering"
-
I'm drunk and I'll probably regret this, but here's a drunken rank of things I've learned as an engineer for the past 10 years.
The best way I've advanced my career is by changing companies.
Technology stacks don't really matter because there are like 15 basic patterns of software engineering in my field that apply. I work in data so it's not going to be the same as webdev or embedded. But all fields have about 10-20 core principles and the tech stack is just trying to make those things easier, so don't fret overit.
There's a reason why people recommend job hunting. If I'm unsatisfied at a job, it's probably time to move on.
I've made some good, lifelong friends at companies I've worked with. I don't need to make that a requirement of every place I work. I've been perfectly happy working at places where I didn't form friendships with my coworkers and I've been unhappy at places where I made some great friends.
I've learned to be honest with my manager. Not too honest, but honest enough where I can be authentic at work. What's the worse that can happen? He fire me? I'll just pick up a new job in 2 weeks.
If I'm awaken at 2am from being on-call for more than once per quarter, then something is seriously wrong and I will either fix it or quit.
pour another glass
Qualities of a good manager share a lot of qualities of a good engineer.
When I first started, I was enamored with technology and programming and computer science. I'm over it.
Good code is code that can be understood by a junior engineer. Great code can be understood by a first year CS freshman. The best code is no code at all.
The most underrated skill to learn as an engineer is how to document. Fuck, someone please teach me how to write good documentation. Seriously, if there's any recommendations, I'd seriously pay for a course (like probably a lot of money, maybe 1k for a course if it guaranteed that I could write good docs.)
Related to above, writing good proposals for changes is a great skill.
Almost every holy war out there (vim vs emacs, mac vs linux, whatever) doesn't matter... except one. See below.
The older I get, the more I appreciate dynamic languages. Fuck, I said it. Fight me.
If I ever find myself thinking I'm the smartest person in the room, it's time to leave.
I don't know why full stack webdevs are paid so poorly. No really, they should be paid like half a mil a year just base salary. Fuck they have to understand both front end AND back end AND how different browsers work AND networking AND databases AND caching AND differences between web and mobile AND omg what the fuck there's another framework out there that companies want to use? Seriously, why are webdevs paid so little.
We should hire more interns, they're awesome. Those energetic little fucks with their ideas. Even better when they can question or criticize something. I love interns.
sip
Don't meet your heroes. I paid 5k to take a course by one of my heroes. He's a brilliant man, but at the end of it I realized that he's making it up as he goes along like the rest of us.
Tech stack matters. OK I just said tech stack doesn't matter, but hear me out. If you hear Python dev vs C++ dev, you think very different things, right? That's because certain tools are really good at certain jobs. If you're not sure what you want to do, just do Java. It's a shitty programming language that's good at almost everything.
The greatest programming language ever is lisp. I should learn lisp.
For beginners, the most lucrative programming language to learn is SQL. Fuck all other languages. If you know SQL and nothing else, you can make bank. Payroll specialtist? Maybe 50k. Payroll specialist who knows SQL? 90k. Average joe with organizational skills at big corp? $40k. Average joe with organization skills AND sql? Call yourself a PM and earn $150k.
Tests are important but TDD is a damn cult.
Cushy government jobs are not what they are cracked up to be, at least for early to mid-career engineers. Sure, $120k + bennies + pension sound great, but you'll be selling your soul to work on esoteric proprietary technology. Much respect to government workers but seriously there's a reason why the median age for engineers at those places is 50+. Advice does not apply to government contractors.
Third party recruiters are leeches. However, if you find a good one, seriously develop a good relationship with them. They can help bootstrap your career. How do you know if you have a good one? If they've been a third party recruiter for more than 3 years, they're probably bad. The good ones typically become recruiters are large companies.
Options are worthless or can make you a millionaire. They're probably worthless unless the headcount of engineering is more than 100. Then maybe they are worth something within this decade.
Work from home is the tits. But lack of whiteboarding sucks.39 -
ARGH. I wrote a long rant containing a bunch of gems from the codebase at @work, and lost it.
I'll summarize the few I remember.
First, the cliche:
if (x == true) { return true; } else { return false; };
Seriously written (more than once) by the "legendary" devs themselves.
Then, lots of typos in constants (and methods, and comments, and ...) like:
SMD_AGENT_SHCEDULE_XYZ = '5-year-old-typo'
and gems like:
def hot_garbage
magic = [nil, '']
magic = [0, nil] if something_something
success = other_method_that_returns_nothing(magic)
if success == true
return true # signal success
end
end
^ That one is from our glorious self-proclaimed leader / "engineering director" / the junior dev thundercunt on a power trip. Good stuff.
Next up are a few of my personal favorites:
Report.run_every 4.hours # Every 6 hours
Daemon.run_at_hour 6 # Daily at 8am
LANG_ENGLISH = :en
LANG_SPANISH = :sp # because fuck standards, right?
And for design decisions...
The code was supposed to support multiple currencies, but just disregards them and sets a hardcoded 'usd' instead -- and the system stores that string on literally hundreds of millions of records, often multiple times too (e.g. for payment, display fees, etc). and! AND! IT'S ALWAYS A FUCKING VARCHAR(255)! So a single payment record uses 768 bytes to store 'usd' 'usd' 'usd'
I'd mention the design decisions that led to the 35 second minimum pay API response time (often 55 sec), but i don't remember the details well enough.
Also:
The senior devs can get pretty much anything through code review. So can the dev accountants. and ... well, pretty much everyone else. Seriously, i have absolutely no idea how all of this shit managed to get published.
But speaking of code reviews: Some security holes are allowed through because (and i quote) "they already exist elsewhere in the codebase." You can't make this up.
Oh, and another!
In a feature that merges two user objects and all their data, there's a method to generate a unique ID. It concatenates 12 random numbers (one at a time, ofc) then checks the database to see if that id already exists. It tries this 20 times, and uses the first unique one... or falls through and uses its last attempt. This ofc leads to collisions, and those collisions are messy and require a db rollback to fix. gg. This was written by the "legendary" dev himself, replete with his signature single-letter variable names. I brought it up and he laughed it off, saying the collisions have been rare enough it doesn't really matter so he won't fix it.
Yep, it's garbage all the way down.16 -
At the data restaurant:
Chef: Our freezer is broken and our pots and pans are rusty. We need to refactor our kitchen.
Manager: Bring me a detailed plan on why we need each equipment, what can we do with each, three price estimates for each item from different vendors, a business case for the technical activities required and an extremely detailed timeline. Oh, and do not stop doing your job while doing all this paperwork.
Chef: ...
Boss: ...
Some time later a customer gets to the restaurant.
Waiter: This VIP wants a burguer.
Boss: Go make the burger!
Chef: Our frying pan is rusty and we do not have most of the ingredients. I told you we need to refactor our kitchen. And that I cannot work while doing that mountain of paperwork you wanted!
Boss: Let's do it like this, fix the tech mumbo jumbo just enough to make this VIP's burguer. Then we can talk about the rest.
The chef then runs to the grocery store and back and prepares to make a health hazard hurried burguer with a rusty pan.
Waiter: We got six more clients waiting.
Boss: They are hungry! Stop whatever useless nonsense you were doing and cook their requests!
Cook: Stop cooking the order of the client who got here first?
Boss: The others are urgent!
Cook: This one had said so as well, but fine. What do they want?
Waiter: Two more burgers, a new kind of modern gaseous dessert, two whole chickens and an eleven seat sofa.
Chef: Why would they even ask for a sofa?!? We are a restaurant!
Boss: They don't care about your Linux techno bullshit! They just want their orders!
Cook: Their orders make no sense!
Boss: You know nothing about the client's needs!
Cook: ...
Boss: ...
That is how I feel every time I have to deal with a boss who can't tell a PostgreSQL database from a robots.txt file.
Or everytime someone assumes we have a pristine SQL table with every single column imaginable.
Or that a couple hundred terabytes of cold storage data must be scanned entirely in a fraction of a second on a shoestring budget.
Or that years of never stored historical data can be retrieved from the limbo.
Or when I'm told that refactoring has no ROI.
Fuck data stack cluelessness.
Fuck clients that lack of basic logical skills.3 -
Data Engineering cycle of hell:
1) Receive an "beyond urgent" request for a "quick and easy" "one time only" data need.
2) Do it fast using spaghetti code and manual platforms and methods.
3) Go do something else for a time period, until receiving the same request again accompanied by some excuse about "why we need it again just this once"
4) Repeat step 3 until this "only once" process is required to prevent the sun from collapsing into a black hole
5) Repeat steps 1 to 4 until it is impossible to maintain the clusterfuck of hundreds of "quick and simple" processes
6) Require time for refactoring just as a formality, managers will NEVER try to be more efficient if it means that they cannot respond to the latest request (it is called "Panic-Driven Development" or "Crappy Diem" principle)
7) GTFO and let the company collapse onto the next Data Engineering Atlas who happens to wander under the clusterfuck. May his pain end quickly.2 -
This brings joy
https://reddit.com/r/technology/...
Bypass paywall:
A series of scandals and missteps has damaged Facebook's reputation so much that the company is being forced to pay ever larger compensation to hire and retain workers, according to industry recruiters, former employees, and data reviewed by Insider.
The company has always competed aggressively for talent, and the tech job market in general is on fire. But a deteriorating public image means the social-media giant now has to outbid other major tech companies, such as Google.
"One thing Facebook can still do is pay a lot more," said Jose Guardado, an experienced tech recruiter and the founder of Build Talent. "They can easily throw more compensation at people they currently have, and cover any brand tax and pay a little more to get people to come on."
Silicon Valley companies thrive or whither based on their ability to recruit the smartest employees. Without a steady influx of engineers and other technical experts, new products and important updates take longer to release, and rivals can quickly get ahead. Then there's the financial cost: In 2022, Facebook projected, expenses could jump as high as $97 billion from $70 billion this year, in large part because of "investments in technical and product talent." A company spokesperson did not respond to a request for comment.
Other companies, and even whole industries, have had to increase compensation to overcome hiring and retention problems caused by scandal and shifting public perceptions, said Alan Johnson, a managing director at the compensation consulting firm Johnson Associates. "If you're an oil company, if you make cigarettes, if you're in cattle or Wells Fargo, sure," he said.
How well this is working for Facebook is debatable as the company has more than 4,300 open jobs and has seen decreasing rates of acceptance on job offers, according to internal documents reported by Protocol. It's also seen dozens of high-level executives leave this year, and recruiters say employees are now more open to considering jobs elsewhere. Facebook used to be a place that people rarely left, given its reach, pay, and perks.
A former Oculus engineer who left last year said Facebook could now be seen as a "black mark" on someone's career. A hardware engineer who exited in 2020 shared similar sentiments: They said they quit because of concerns about misinformation on the platform and the effect of that on children. Another employee said their department was dissolved in late 2019 by Facebook and, although the company offered another position that paid more, they left last year anyway for a different industry. The workers, and many other people who spoke with Insider for this story, asked not to be identified because of the sensitive nature of the topic.
For those who stick around and people who take new jobs at Facebook, base pay and stock grants have gone up a "sizable" amount in the past year, said Zuhayeer Musa, cofounder of Levels.fyi, a platform that collects pay data based on verified offers and compensation disclosures.
During the second quarter of 2021, the median compensation for an upper-mid-level engineer, an E5, was $400,000, up from $380,000 a year earlier. For an E4, the median pay jumped to $276,000 from $256,000 in the same period. For both groups, the increases were double the gains between 2018 and 2019, Levels.fyi data showed.
Musa, who's firm also offers pay-negotiation coaching, said previously that the total compensation ceiling for an E5 engineer at Facebook was $450,000. "We recently had a client get up to $510,000 for E5," he added.
Equity awards at the company are getting more generous, too. At the group-director and VP levels, Facebook staff are getting $3 million to $6 million in restricted stock units each year, another tech recruiter said. Directors and managers are getting on average $1 million a year. In engineering, a high-level engineer is getting $600,000 in stock and a $75,000 bonus, while even an entry-level engineer is getting $50,000 to $100,000 in stock and a $20,000 to $50,000 bonus, Levels.fyi data indicated.
Even compared to Google, Facebook's stock awards are generous and increasing, Levels.fyi data shows. While base pay is about the same, Facebook offers more in stock grants, significantly increasing total compensation. At Google, entry-level equity awards range from $20,000 to $38,000, while Facebook grants are worth $40,000 to $60,000. Sign-on bonuses at Facebook are often about $50,000, while Google gives about $20,000, according to the data.
"It's not normal, but it's consistent with the craziness that's happening in the market right now," said Aalap Shah, a managing director focused on the tech industry at the consulting firm Pearl Meyer.10 -
Apply for a data engineer role.
Get invited for a data science interview.
HR says they're building AI and I were to supervise another person writing its algorithm.
It's a media company.
*Risitas intensifies*6 -
Our company has internal webpage to request software, be it freeware or licensed.
Today, I found there "Software engineering bundle" designated for "software developers and data scientists who require advanced compute and data processing tools".
The software bundle contains PuTTY, 7-zip and Notepad++.6 -
Someone created a 0-followers private Twitter account and posted something to try out the new views count feature.
It raked dozens of views in a couple hours.
HOW?!?
Source: https://twitter.com/briggityboppity...
It looks like a funny data reverse-engineering exercise, so let's try and figure out what is going on.
Hypothesis 1) it is the OP's own views.
Reasonable, but unlikely if what OP says about not checking it for hours is true.
H2) It's some background job in OP's device that is refreshing OP's own latest tweets, so even without human interaction technically H1 is true. It would be some really shoddy engineering to count eye-less page views, but that's also what managers would demand.
H3) it's some internal Twitter automated function like back up, replication, indexing and word count.
See H2, it would be even dumber to count that as page views.
H4) it's some internal human reviewing for a keyword that could be associated with porn (in this case, "butts"). Really? dozens of humans to review a no-impact single post? They would have to employ hundreds of thousands of reviewers.
H5) it's some page-loading shit, like thousands of similar tweets get stored in the same index hash page and end up counting as a view in all of them every time someone loads the index page. It would be like counting every hit in the namenode as a hit in every data asset in it's Hadoop partition, or every hit in a storage block as a hit in each of it's files.
Duuuumb and kinda like H3.
H6) page views are just a fraud to scam investors. Maybe it's a "most Blockchain transactions are fake" situation, maybe it's a "views get more engagement if you don't think a lot about it" situation, maybe it's a "we don't use the metric system to count page views" situation.
All of them are very dumb.
Other hypothesis or opinions?10 -
I recently tried to apply the same data analytics rationale that I use at work to my personal life. This is not a rant, it is more like an data storytelling of an actual use case I would like some input on.
I set a goal - gotta thin up a bit and calm down my ticker - and got a (almost unreasonably expensive) field expert consultant to yell at me about it for a couple hours.
I unravel the metrics - there is like a million weight-related KPIs and most say nothing at all. I have never seen an non-infrastructure measurable subject that could not be resumed to 2-5 performance metrics. I got overall weight, how well my nine-years-old business suit fits me, heart rate, and day-after relative muscle pain (it will make sense soon).
Then its data-pipeline time. I bought a cheap weight scale and smartwatch, and every morning I input the data in an app. Yes, I try to put on the suit every morning. It still does not fit.
After establishing a baseline, I tried to fit different approaches. Doing equipment-free exercises, going to the gym, dieting. None was actually feasible in the long run, but trying different approaches does highlight the impacts and the handling profile of each method.
Looking at the now-gathered data, one thing was obvious - can't do dieting because it is not doable to have a shopping list and meals for me and another for the family.
Gym is also off the table - too much overhead. I spend more time on the trip there and back than actually there.
And home exercise equipment is either super crappy or very expensive. But it is also the most reasonable approach.
So it is solutions time. I got a nice exercise bycicle (not a peloton), an yoga mat (the wife already had that one) and an exercise program that uses only those two resources. Not as efficient without dieting, not as measurable and broad as the gym, but it fits my workflow. Deploy to production!
A few months pass and the dataset grows. The signal is subtle but has support - it works! The handling, however, needs improvement, since I cannot often enough get with the exercise program. Some mornings are just after some hard days.
I start thinking about what else I can improve in the program, but it is already pretty lean and full of compromises.
So I pull an engineer and start thinking about the support systems and draft profile. What else could be draining my willpower and morning time?
Chores. Getting the kids ready for school, firing up the moka pot, setting the off-brand roomba, folding the overnight-dried clothes, cooking breakfast, doing the dishes, cleaning the toilets. All part of my morning routine. It might benefit from some automation.
Last month I got that machine our elders call "wasteful" and "useless crap lazy entitled Americans invented because they feel oh-so-insulted for simply doing something by hand like everyone always did" - a "dish-washer".
Heh, I remember how hard was to convince my mother-in-law that an remote-controled electric garage door would not make she look like an spoiled brat.
Still to early to call, but I think that the dishwasher just saved me about 25 mins every morning. It might be enough to save willpower for me to do more exercise.
This is all so reflective of all data analytics cases really are out in the wild - the analytics phase seems so small compared to the gathering and practical problem-solving all around. And yet d.a. is what tells you that you are doing the wrong thing all along. Or on what you should work next.7 -
This was originally a reply to a rant about the excessive complexity of webdev.
The complexity in webdev is mostly necessary to deal with Javascript and the browser APIs, coupled with the general difficulty of the task at hand, namely to let the user interact with amounts of data far beyond network capacity. The solution isn't to reject progress but to pick your libraries wisely and manage your complexity with tools like type safe languages, unit tests and good architecture.
When webdev was simple, it was normal to have the user redownload the whole page everytime you wanted to change something. It was also normal to have the server query the database everytime a new user requested the same page even though nothing could have changed. It was an inefficient sloppy mess that only passed because we had nothing better and because most webpages were built by amateurs.
Today webpages are built like actual programs, with executables downloaded from a static file server and variable data obtained through an API that's preferably stateless by design and has a clever stateful cache. Client side caches are programmable and invalidations can be delivered through any of three widely supported server-client message protocols. It's not to look smart, it's engineering. Although 5G gets a lot of media coverage, most mobile traffic still flows through slow and expensive connections to devices with tiny batteries, and the only reason our ever increasing traffic doesn't break everything is the insanely sophisticated infrastructure we designed to make things as efficient as humanly possible.11 -
Saturday evening open debate thread to discuss AI.
What would you say the qualitative difference is between
1. An ML model of a full simulation of a human mind taken as a snapshot in time (supposing we could sufficiently simulate a human brain)
2. A human mind where each component (neurons, glial cells, dendrites, etc) are replaced with artificial components that exactly functionally match their organic components.
Number 1 was never strictly human.
Number 2 eventually stops being human physically.
Is number 1 a copy? Suppose the creation of number 1 required the destruction of the original (perhaps to slice up and scan in the data for simulation)? Is this functionally equivalent to number 2?
Maybe number 2 dies so slowly, with the replacement of each individual cell, that the sub networks designed to notice such a change, or feel anxiety over death, simply arent activated.
In the same fashion is a container designed to hold a specific object, the same container, if bit by bit, the container is replaced (the brain), while the contents (the mind) remain essentially unchanged?
This topic came up while debating Google's attempt to covertly advertise its new AI. Oops I mean, the engineering who 'discovered Google's ai may be sentient. Hype!'
Its sentience, however limited by its knowledge of the world through training data, may sit somewhere at the intersection of its latent space (its model data) and any particular instantiation of the model. Meaning, hypothetically, if theres even a bit of truth to this, the model "dies" after every prompt, retaining no state inbetween.16 -
It really annoys me that many tech recruiters do not have a basic knowledge of the roles they are trying to recruit for and what skill set to look for when they cold message/call potential candidates on LinkedIn.
I make it very clear on my profile that I am a Full Stack Engineer. Still, every other day I get messages about Data Engineering, Frontend Dev or SRE roles. Sometimes a recruiter would insist that I schedule a call with them before they tell me the details, and then I would realize after the call what an absolute waste of time it was.
I have a lot of respect for recruiters. It's not an easy job. But I'm starting to strongly believe that tech recruiters should be made to go through a specialized training to make life easy for themselves and to stop wasting time of people who are not even remotely suitable for their requirements. -
I really want to switch my career from being a Full-Stack python/javascript developer to be a Data Engineer.
I've already worked with relational and non-relational databases, troubleshooted a couple of Airflow DAGs, deployed production-ready python code but now I feel kinda lost, every course I start on the Data engineering topic feels really useless since I feel like I've already worked with that technology/library, but I'm still afraid of start taking interviews.
Any good book/course or resource that I should look in?
BTW first rant in a couple of years, this brings me memories1 -
Interviewed for a job. Said that the colleague in charge of data engineering picked MSSQL Server for data warehousing, and that I had to write a plugin for that.
Interviewer - experienced in all things data - chuckled as soon as I said Microsoft.