Search - "etl"
Boss: I need you to start on this new project, how long will it take?
Me: well, hard to say with no specs whatsoever...
Boss: just your best guess
Me: 4 to 6 months, I guess?
Boss: so 3 months it is. When can you start?
Me: no specs, sir...and I said 4 to 6
Boss: the specs are almost ready, I know you can simplify it
Boss: just start with the basic infrastructure already
(4 months later)
Boss: here are the specs. They might change a little in behaviour and design, but all the main stuff is here
(Hands me an A3 sheet with a total of 21 pictures made in InDesign)
Me: O....kay. What happens when I click here?
Boss: oh, we should still talk about the app workflow, I'll keep you updated
(2 weeks and 16 total rewrites of the "specs" later)
Boss: you told me it was a 2 months job, why aren't you finished yet? We must deploy in 3 weeks!
HR project update meeting:
1) recap of the previous meeting
2) overview of what we will discuss in the next meeting
Boss: hey mech eng, we need a setup modification
Mech: no prob, boss, we can have it prototyped in 3 weeks, industrialized in 2 months
Boss: oh, right then, go on
Boss: hey, Soft Eng, we need a functional modification
Soft Eng: no prob boss, we can develop it in 4 hours, tested and documented in 2 hours and integrated and shipped to the client by tomorrow morning
Boss: what??? One day?? You just need to edit a couple of lines of code! I want it ready in ten minutes, tops!
My second job. I've been hired as a research specialist, not a developer, but they found out I could code during the interview.
Boss: hey, so we have our main product line that shares the control panel for all the models, right?
Me: unh, yeah
B: well, we need to know how it works.
B: yeah, I mean, we should have a manual with all the tech documentation so we know how everything works
M: ...and didn't you hand the tech docs to the developers?
B: uh... no, actually we request features from the devs (note: external company) with a phone call or an email... now we need the specs.
The other company (which is part of the same group) hands me the source code.
It is a huge, 25k-line plate of spaghetti, written by at least 7 people, one at a time, uncommented.
After a month, and after actually drafting my resignation letter 3 times, I produce a 50-page doc describing how everything works.
M: boss, here are the docs
B: fine, I'll take a look
15 mins later
B: this is not what we need! You cannot describe these algorithms like this!
(I described the algorithms with block flow diagrams and a detailed verbal description)
M: umh... so how do you need it?
B: we need an Excel table, with all the entry conditions in the rows, all the exit conditions in the columns, and a description of the working condition in each crossing cell!
M: are you even serious?
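What the boss asks for is essentially a decision table: entry conditions on the rows, exit conditions on the columns, and a behaviour description in each cell. A minimal Python sketch of that shape, assuming hypothetical condition names and descriptions, rendered as CSV so Excel can open it:

```python
import csv
import io

# Hypothetical entry/exit conditions, for illustration only.
ENTRY_CONDITIONS = ["idle", "running"]
EXIT_CONDITIONS = ["stop_pressed", "fault_detected"]

# Each cell: behaviour when in the entry condition (row) and the exit condition (column) occurs.
DECISION_TABLE = {
    ("idle", "stop_pressed"): "no action",
    ("idle", "fault_detected"): "raise alarm, stay idle",
    ("running", "stop_pressed"): "ramp down, go to idle",
    ("running", "fault_detected"): "emergency stop, raise alarm",
}


def to_csv(table, rows, cols):
    """Render the table as CSV: one row per entry condition, one column per exit condition."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["entry \\ exit"] + cols)
    for r in rows:
        # Cells with no defined behaviour are left empty rather than invented.
        writer.writerow([r] + [table.get((r, c), "") for c in cols])
    return buf.getvalue()


print(to_csv(DECISION_TABLE, ENTRY_CONDITIONS, EXIT_CONDITIONS))
```

The cell count grows as rows × columns, so even a modest number of conditions produces a large sheet.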
One month ago. By email.
Boss: so, this client A has a problem with one of our devices and he believes that it's a bug in the software.
Me: all right then, what happens?
Boss: well, he says that parameter P in the option menu does not change the device's behaviour as it is supposed to. I'll forward you his mail. You will find attached an Excel file with the results of his tests performed with and without the parameter active.
Me: < read mail, read Excel file > well, boss, his tests were performed in completely different conditions; how could he expect to infer meaningful results from this?
Boss: damn, you are right. Send him a test plan and follow up.
Me: < send detailed test plan >
No answer in a week. Then...
Client: hi there, I ran these tests and attached the Excel file with the results. Can you check the software now?
Me: < read another bullshit-filled Excel file with none of the suggested tests performed >
You know what? Just download the procedures you are using from the device and send them by mail, specifying the software version you are using, so we can perform some tests here in the lab and get you a solution asap.
No response. For a MONTH.
Super Boss: client A still has his problem. How can it possibly take more than A FUCKING MONTH to solve his issue??
C: hey mate, what's the best tool to open up this 31.1M rows x 106 cols CSV file?
M: Umh... Pandas DataFrame or R data.table, I guess?
C: all right, Excel will do, thanks!
M: erhm... yeah, anytime?
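For scale: a single Excel worksheet holds at most 1,048,576 rows, so a 31.1M-row file cannot fit in one sheet. With pandas, `pandas.read_csv(path, chunksize=...)` returns an iterator of DataFrame chunks for exactly this situation. Without extra libraries, a streaming pass with Python's standard `csv` module works too. A minimal sketch, assuming a header row and a hypothetical numeric column named `amount`:

```python
import csv
import io

EXCEL_MAX_ROWS = 1_048_576  # row limit of a single Excel worksheet


def stream_stats(f, column):
    """Count rows and sum one numeric column, reading one row at a time."""
    reader = csv.DictReader(f)
    count, total = 0, 0.0
    for row in reader:
        count += 1
        total += float(row[column])
    return count, total


# Tiny in-memory stand-in for the real file; in practice pass
# open("big.csv", newline="") instead.
sample = io.StringIO("id,amount\n1,2.5\n2,4.0\n3,1.5\n")
rows, total = stream_stats(sample, "amount")
print(rows, total, rows > EXCEL_MAX_ROWS)
```

Memory use stays flat regardless of file size, because only the current row is held at any time.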
The designer decides to hold a meeting with stakeholders about the UX/UI workflow for the control panel of our new embedded system (no framework, no library; the GUI is rendered bit by bit on the frame buffer).
A week later, still nothing on my table, not a mail, not a call. Meanwhile I wrote a framework, the control system, renderer, and messaging queues between tasks.
Wrote some widgets, a layout system, a view-switching mechanism, and a separate navigation stack to support a "back" button.
Now I am stuck because I do not know what should happen when clicking on various (non-obvious) items on the touchscreen.
Fine, I'll ask the designer.
"Oh, I will write the workflow next week" (ETA: 2 weeks. Seriously? You take a week to draw 20 screenshots with text in Adobe Illustrator, and I get another week to write it all from scratch in C?)
Ok, while you write it, just tell me what should happen when I click an active item.
"Well, we didn't talk about that. We just decided the colour of the icons on the screen..."
For fuck's sake...
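The view-switching mechanism with a separate stack for the "back" button can be modelled as a plain stack of view identifiers. This is a minimal Python model of the idea, not the author's C implementation; the view names are hypothetical:

```python
class ViewStack:
    """Navigation stack: the top entry is the view currently shown."""

    def __init__(self, root):
        self._stack = [root]

    @property
    def current(self):
        return self._stack[-1]

    def push(self, view):
        """Switch to a new view, remembering the previous one."""
        self._stack.append(view)

    def back(self):
        """Return to the previous view; the root view is never popped."""
        if len(self._stack) > 1:
            self._stack.pop()
        return self.current


nav = ViewStack("home")
nav.push("settings")
nav.push("network")
print(nav.back(), nav.back(), nav.back())  # the root view stays at the bottom
```

Keeping the root view permanently at the bottom means a "back" press on the home screen is a harmless no-op instead of an empty-stack error.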
How do you deal with open space offices?
I find it quite difficult to focus: the constant chatting, the constant questions, phones ringing, surprise meetings, more questions, arrays of interruptions, and questions again. I believe I would be a lot more productive if left alone in total, uninterrupted silence.
Have you found your escape, your zen, your inner focus? Please share, I need some ideas.
Me: What filters would you like on this report?
VP: Here's the logic for the filter I want.
Me: Great! Anything else?
... Days of DB, ETL, and Report refactoring later ...
Me: Here's the updated report!
VP: Can we add this other filter?
Me: (You're welcome...)
Looking for advice, serious advice.
I work in C.
Also, I work in Python.
I worked in C++ for a couple of years.
I have a fair knowledge of the Data Science workflow, and some experience in Machine Learning.
I have tinkered with some other languages (Java, Ruby, Go, and JS, among others; nothing serious or professional).
I'm the kind of person who needs constant problems to face in order to stay engaged, satisfied, and happy. And I need to learn new stuff, or refine my knowledge constantly, or I stagnate. I believe this is true for quite a few people here.
I would like to spend some spare time (which I seldom have) on a project. Personal projects are rarely good enough to improve one's CV, so I thought I could participate in some open source projects.
Does anyone here have suggestions for interesting and satisfying open source projects, or any general advice on the matter?
It would be much appreciated.
I was out sick the day an urgent ETL job I was building was due, so it got reassigned. When I returned, I found most of my code commented out and replaced.
The first step was rewritten, with a comment that reads "Made changes to run faster." What used to be a single execution lasting 30 seconds was now a 4 step process taking 5 minutes, and yielding identical results.
Since it was a one-time execution (not a recurring job), I'm left wondering why they thought execution speed was even an issue, let alone what about their redesign they felt was an improvement...
Since I'm back to working for myself again and haven't been able to find a reliable hire, I'm alone. In this bubble, no one cares about, sees, or appreciates my backend code, and I just realized that's why I've been slacking so badly on this ETL process. No one gives a shit about it but me. If I build an interface, I get kudos and everyone celebrates, but when I'm working on a three-server process with layers of abstraction, auto-scaling, etc., people just wonder if I'm jerking off all day.
Sometimes it sucks to be a lone ranger.
Time to switch to offline and hide in some dark corner to get work done. Tired of all the IMs and of one person coming over to my desk for “critical” work. If everything is critical, then nothing is truly critical. If you sit on the data for 2 months, then today is the day it becomes critical, and the compliance issue is because of your ineptitude, then it’s a you problem, not an IT problem. Then, on top of that, you submit your data to be loaded with the wrong request form and in the wrong spreadsheet format, so you can go fuck yourself for asking that this be done in an hour. It could be done in 15 minutes if you had it in the correct format, as specified in the 20 meetings over the past year that removed all manual analysis and automated the entire process, you idiot. Now I have to get it into the correct format within that hour so I don’t have to do the analysis for you.
I have other things to do besides your ETL tickets, like finding the actual problems in our actual critical applications. You know, the ones where the VPs of this giant corporation start calling if they go down.
Sorry for the rambling guys.
I’m not a web programmer; I’m an application and SQL developer. So when I was tasked with scraping a website for an ETL feed, I thought it would just be a ton of substring and POST/GET calls.
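Substring-based scraping tends to break as soon as the markup shifts slightly. One small alternative is Python's standard `html.parser`, which walks tags instead of character offsets. A minimal sketch that collects link targets; the HTML is a hypothetical inline sample, and fetching the page (for example with `urllib.request.urlopen`) is left out:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


page = '<ul><li><a href="/item/1">One</a></li><li><a class="x" href="/item/2">Two</a></li></ul>'
parser = LinkCollector()
parser.feed(page)
print(parser.links)  # ['/item/1', '/item/2']
```

Attribute order and extra attributes (like `class="x"` above) no longer matter, which is exactly where substring offsets usually fail.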
Used the wrong filter while loading a table in an ETL job. Didn't test it and migrated it to production. 80% of users had empty reports.
Had to stay awake till 4 AM to get it fixed.
Learned an important lesson:
' A test in time saves nine'
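A cheap guard against this kind of mistake is a check that runs the new filter on a sample before migration and fails loudly if it removes far more rows than expected. A minimal Python sketch; the row shape, the filter, and the 50% threshold are all hypothetical:

```python
def check_filter(rows, keep, min_kept_ratio=0.5):
    """Apply a row filter and raise if it drops more rows than the threshold allows."""
    kept = [r for r in rows if keep(r)]
    if rows and len(kept) / len(rows) < min_kept_ratio:
        raise ValueError(f"filter kept only {len(kept)} of {len(rows)} rows")
    return kept


sample = [{"region": "EU"}, {"region": "US"}, {"region": "EU"}, {"region": "APAC"}]

# Intended filter: exclude one region (keeps 3 of 4 rows).
ok = check_filter(sample, lambda r: r["region"] != "APAC")

# A wrong filter (== instead of !=) keeps 1 of 4 rows and is caught before deploy.
try:
    check_filter(sample, lambda r: r["region"] == "APAC")
except ValueError as e:
    print("caught:", e)
```

The right threshold depends on the data; the point is that the check runs before production, not after the reports come back empty.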
Interviewed with a company for a direct-hire SQL Dev/Analyst role (ETL, BI, etc.). Had three interviews in a row, all of which went great. We laughed, and I was able to answer every technical question with no problem. Each person clearly enjoyed the interview; I ended up going over the time set aside for it... Still didn't get the job. They said "There is no doubt he can do the job, but we don't think he's passionate enough about the position." What?!?! So confused. It's also odd to me because at every job before this, if I had an in-person interview, I was offered the job... I don't get it.
So I did this https://devrant.io/rants/797965/... which works fine up to medium-sized data.
However, for large data the ETL pegs a 6-core Xeon (2.2 GHz) with 50 GB of RAM, because it ends up doing six threaded compares, so 12 different data sets. Other than "pull less data", any tips?
Code (C#) is basically a LINQ multi-column join between two DataTables; when the compared columns don't match, it returns the result as a var, which is turned into a third DataTable to be SqlBulk-loaded into the DB.
Table1 is external API return data (no windowing) and Table2 is from our DW.
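One common alternative to a join over two in-memory tables is a hash-based comparison: index one side by its key columns in a dictionary, then stream the other side and emit only rows that are new or whose compared columns differ. The sketch below shows the idea in Python rather than C#; the key and compared column names are hypothetical:

```python
def diff_rows(api_rows, dw_rows, key_cols, compare_cols):
    """Return API rows that are new or whose compared columns differ from the DW copy."""
    # Index the warehouse side once: key tuple -> tuple of compared values.
    dw_index = {
        tuple(r[c] for c in key_cols): tuple(r[c] for c in compare_cols)
        for r in dw_rows
    }
    changed = []
    for r in api_rows:
        key = tuple(r[c] for c in key_cols)
        values = tuple(r[c] for c in compare_cols)
        if dw_index.get(key) != values:  # missing key or differing values
            changed.append(r)
    return changed


api = [
    {"id": 1, "site": "A", "qty": 5},
    {"id": 2, "site": "A", "qty": 7},
    {"id": 3, "site": "B", "qty": 1},
]
dw = [
    {"id": 1, "site": "A", "qty": 5},
    {"id": 2, "site": "A", "qty": 6},
]
print(diff_rows(api, dw, ["id", "site"], ["qty"]))
```

The returned list plays the role of the third table that gets bulk-loaded. Memory is dominated by the index, so building it over the smaller of the two sides is usually the better choice.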
Most hacky things I've ever done:
A Windows scheduled task that kicks off a massive as fuck ETL job, riddled with errors. The damn thing had a mind of its own and only ran whenever it felt like it. The client was happy, deadlines were met, and my boss moved me to another task.
Fuck this.. I have to tell you this annoying wtf..
Hi btw, this is my first rant so pls don't blame me :)
I am working on an ETL project for our company to connect data sources like NetBase, SimilarWeb, etc. to Alooma (a data pipeline).
Now I got the task of adding another data source called BrightEdge.
BUT WHAT THE ACTUAL FUCK.. IT TAKES 3 MONTHS TO GET AN ACCOUNT. AND U KNOW WHAT.. I DIDN'T EVEN GET THE API CREDENTIALS. THIS IS MY FINAL PROJECT FOR MY TRAINING TO BECOME AN IT SPECIALIST.
Finished writing an ETL job in dev and scheduled it right before I left.
About an hour later I got an email that it failed; now I get to wait till Monday to figure out why.
I studied a lot over the last year. I learned a lot about Machine Learning/Deep Learning, Data Gathering, Data Analysis, ETL, Model Architecture Design, Training, Fine Tuning, Backend Development, Databases, API Development, ORMs, REST, GraphQL, OAuth, CI/CD, Docker, Deployment to Production environments like Heroku, Git, and more stuff I don't remember while writing this. I built, and keep adding stuff to, my GitHub portfolio.
I'm not able to get a job. I started by looking for jobs as a Data Scientist: never any response. I looked at freelancer sites, and nothing seems to fit my skills. And when there is a minimal fit, they always want a Full Stack Web Developer; I don't know Frontend Development, and I don't like doing it.
Don't know what to do or how to land any job.
My options seem to be:
1. Learn Frontend Dev and work as a Full Stack dev in underpaying freelance jobs
2. Keep applying to remote-only startups, but they still want people with 3+ years of experience.
I can't work in my city; there aren't any companies or startups hiring here. We are 30 years in the past.
What would you do in my place?
Before my vacation I’d been chatting with one of our DBAs about an ETL tool we needed for a customer we’d already signed all the contracts with, promising we would provide one for a historical database of old data. They had been looking at one from SAP, but in typical fashion a license cost more than the actual contract.
Anyway, long story short: on the weekend before I went back to work, I rattled together a little Python proof of concept using a couple of Azure databases. When I got back, I demoed it to the PM and the DBA, and they loved it. We built on the PoC to get a working loader, and writing our own instead of buying the SAP product saved us about £30k.
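The general shape of such a loader can be sketched with Python's built-in `sqlite3` module standing in for the two Azure databases. The table names, columns, and the transformation step are hypothetical, and a real Azure SQL connection would use a different driver:

```python
import sqlite3


def run_etl(src, dst):
    """Extract rows from the source, transform them, and load them into the target."""
    dst.execute(
        "CREATE TABLE IF NOT EXISTS history (customer_id INTEGER, amount_pence INTEGER)"
    )
    rows = src.execute("SELECT customer_id, amount FROM legacy_orders")
    # Transform (hypothetical): store money as integer pence instead of float pounds.
    transformed = ((cid, round(amount * 100)) for cid, amount in rows)
    dst.executemany("INSERT INTO history VALUES (?, ?)", transformed)
    dst.commit()


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE legacy_orders (customer_id INTEGER, amount REAL)")
src.executemany("INSERT INTO legacy_orders VALUES (?, ?)", [(1, 12.5), (2, 3.99)])

dst = sqlite3.connect(":memory:")
run_etl(src, dst)
print(dst.execute("SELECT * FROM history ORDER BY customer_id").fetchall())
```

Feeding a generator into `executemany` streams rows from source to target without first building the whole result in memory.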
Getting out of bed at 11 PM to fix an ETL failure. I was just about to get off devRant and go to sleep... (the second sentence was !true)
The customer wants to migrate his old store into WooCommerce. Here's a MySQL dump with 130 tables and no documentation on how they're related.
You also have to scrape all of the couple of thousand product images off their site, because they don't want their old dev knowing, so you can't just have FTP access...
I had used a computer since the Windows 3.1 days, and I fooled around with VB on Windows 95 or 98. I didn't know it was going to be my passion until I wrote a whole data structures library in C++ based on the doubly linked list I wrote for a class. I called it the ETL, for Easy Template Library (like the STL was hard!!). That's when I knew I had a knack for it and began really learning.
Imagine enabling verbose logging for a complex ETL process that typically takes 8 hours to run but has been failing for some reason after running for about 7 hours. Naturally, you want to check the log file to find out what went wrong.
Now imagine not having read access to the log file.
My team has a Database Admin 2 position open on the Arvest Career site. We are looking for someone with a Data Warehousing/Data Integration background with SQL Server, ETL, SSIS, or equivalent. We are also looking for a physical DBA with a background in SQL Server, performance tuning, partitioning, DR/HA, database migrations, DB refreshes, DB restores, and building out clusters.
The company wants to implement a CRM with the cleansing rules built and managed by the business. What software did you buy, or what stack did you build on, for business-directed data cleansing? We are a Microsoft shop, but we can adapt if the solution is in the “magic quadrant”.
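One way to keep cleansing rules in the hands of the business is to store the rules as data (for example, a table the business edits) and have a small engine apply them, so changing a rule needs no code change. A minimal Python sketch of that pattern; the rule format, operation names, and fields are hypothetical and not taken from any particular product:

```python
import re

# Rules as data: in practice these could live in a table maintained by the business.
RULES = [
    {"field": "email", "op": "lower"},
    {"field": "phone", "op": "strip_non_digits"},
    {"field": "country", "op": "map", "mapping": {"UK": "GB", "U.K.": "GB"}},
]

# The engine knows a fixed set of operations; each rule picks one to apply.
OPS = {
    "lower": lambda v, rule: v.lower(),
    "strip_non_digits": lambda v, rule: re.sub(r"\D", "", v),
    "map": lambda v, rule: rule["mapping"].get(v, v),
}


def cleanse(record, rules):
    """Return a copy of the record with every applicable rule applied in order."""
    out = dict(record)
    for rule in rules:
        if rule["field"] in out:
            out[rule["field"]] = OPS[rule["op"]](out[rule["field"]], rule)
    return out


print(cleanse({"email": "Jo@Example.COM", "phone": "+44 (0)20 1234", "country": "U.K."}, RULES))
```

The split keeps IT in charge of the small set of operations while the business owns which rules exist and in what order they run.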
"No, the Client doesn't like stored procedures, so we have done all the T and L parts of the ETL using a bunch of views on top of views on top of views."
Wish I could have been here at the start so I could have pushed back, sigh.
Siiiggghhhh, yet the client is anal about performance and even consistency in SSIS packages.....siiiggghhhhhh, but we don't have SHOWPLAN permissions or even sp_who2 access...siiiigggghhhhhh.
If I expanded one of the final views, it would be like 1k lines. For the amount of data we move, there shouldn't be any noticeable processing time, but it can take anywhere from 10 mins to an hour.
So, the PowerQuery type system appears to be a joke.
For those of you who aren't familiar with PowerQuery, it's the ETL language that is used in PowerBI and some other parts of the MS Power Platform. It was formerly known as the M Language.
The language has a type system, that includes records (think hashes) and tables, which are, for practical purposes, a list of records.
The wonderful M language specification document states that:
"Any value that is a record conforms to the intrinsic type record, which does not place any restrictions on the field names or values within a record value. A record-type value is used to restrict the set of valid names as well as the types of values that are permitted to be associated with those names."
Except that the restriction applies only to the set of valid names: the language interpreter doesn't throw an error when I place a number into a text field, but it also doesn't do any sort of implicit conversion. This is all hunky-dory until you try to load the data into the Tabular Model that underlies the query engine, which does expect the values to be of the specified type, and it throws an error.
But PowerBI, in its infinite wisdom, doesn't actually *record* the error, it merely tells you the error exists, and tells you to go back to the query editor to list the errors thrown up by the powerquery engine. Which, as previously stated, doesn't throw up an error for this instance.
So I've spent all afternoon trying to work out why my queries aren't loading, because I have an error that doesn't exist. fml.
[You can follow this issue on the community feedback site here: https://community.powerbi.com/t5/... ]
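The failure mode described above (a type check that validates field names but not value types, followed by a stricter consumer downstream) can be modelled in a few lines. This is a Python analogy of the behaviour as the rant describes it, not Power Query or M code; the record type and field names are hypothetical:

```python
# A "record type": field name -> expected Python type.
CUSTOMER_TYPE = {"name": str, "age": int}


def conforms_names_only(record, rtype):
    """Permissive check: only the set of field names is validated."""
    return set(record) == set(rtype)


def load_strict(record, rtype):
    """Strict downstream load: every value must match its declared type."""
    for field, expected in rtype.items():
        value = record[field]
        if not isinstance(value, expected):
            raise TypeError(f"{field!r}: expected {expected.__name__}, got {type(value).__name__}")
    return record


bad = {"name": 42, "age": 30}  # a number placed in a text field
print(conforms_names_only(bad, CUSTOMER_TYPE))  # True: passes the first stage
try:
    load_strict(bad, CUSTOMER_TYPE)
except TypeError as e:
    print("load failed:", e)  # only the second stage notices
```

Validating value types in the first stage as well would surface the error where it is introduced rather than at load time.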
Question about PR best practices.
I work for an analytics company and often have to implement new ETL steps. The data transformations in these steps can be complex, and the major changes usually range from 500+ lines up to around 2,000 (the last one had 765 lines just for schemas).
What's the best way to split the changes into multiple PRs, bearing in mind that there's no guarantee a file won't change again as the work is built up?
* Canonical Data Models for Metrics and Reporting
* ETL, Table, and API designs to blend Legacy data into Cloud data via said Canonical Models
* Teaching n-Tier and Domain Driven Development models
Welcome to the Office of the CIO.
Incompetent third-party "services" that you have to integrate with, with their inconsistent API error codes and useless 'new, more optimized' half-broken ETL pipelines
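When a third-party API reports the same failure in inconsistent ways, a common defensive step is to normalize its responses into one internal error type at the integration boundary. A minimal Python sketch; the status codes, message fragments, and the assumption that some endpoints hide the real error in the message text are all hypothetical, not any real vendor's behaviour:

```python
from enum import Enum


class IntegrationError(Enum):
    AUTH = "auth"
    RATE_LIMITED = "rate_limited"
    NOT_FOUND = "not_found"
    UNKNOWN = "unknown"


# Hypothetical: the vendor signals the same conditions via status codes OR message text.
BY_STATUS = {401: IntegrationError.AUTH, 403: IntegrationError.AUTH,
             404: IntegrationError.NOT_FOUND, 429: IntegrationError.RATE_LIMITED}
BY_MESSAGE = {"token expired": IntegrationError.AUTH,
              "too many requests": IntegrationError.RATE_LIMITED,
              "no such record": IntegrationError.NOT_FOUND}


def normalize(status, body_message=""):
    """Map a (status code, message) pair onto one internal error category."""
    # Assumed: message text is more reliable than the status code, so check it first.
    msg = body_message.strip().lower()
    for fragment, err in BY_MESSAGE.items():
        if fragment in msg:
            return err
    return BY_STATUS.get(status, IntegrationError.UNKNOWN)


print(normalize(200, "Token expired"), normalize(429), normalize(500, "???"))
```

The rest of the pipeline then only ever deals with `IntegrationError`, so a vendor changing its codes means updating two mappings instead of every call site.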
Best: creating a fully customizable, performance-oriented ETL service from the ground up.
Bad: developing in Xamarin.Forms for Android.
Worse: porting said Xamarin.Forms app to iOS.
Sometimes I feel that I am the only person who knows the Type 2 trick in DWH design. Why don't you read a DWH 101 book before doing your work? I am not a genius, and it's not rocket science.
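Assuming the "type-2 trick" refers to a Type 2 slowly changing dimension, the idea is never to overwrite a changed attribute: the current row is closed with an end date and a new row becomes the current version, so history is preserved. A minimal in-memory Python sketch; the column names (`valid_from`, `valid_to`, `is_current`) follow a common convention but are assumptions here:

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # conventional "still valid" end date


def scd2_upsert(dim, key, attrs, as_of):
    """Apply one source record to a Type 2 dimension held as a list of dicts."""
    current = next((r for r in dim if r["key"] == key and r["is_current"]), None)
    if current is not None:
        if current["attrs"] == attrs:
            return dim  # no change: nothing to do
        # Close the old version instead of overwriting it.
        current["valid_to"] = as_of
        current["is_current"] = False
    dim.append({"key": key, "attrs": dict(attrs), "valid_from": as_of,
                "valid_to": OPEN_END, "is_current": True})
    return dim


dim = []
scd2_upsert(dim, "C1", {"city": "Leeds"}, date(2020, 1, 1))
scd2_upsert(dim, "C1", {"city": "Leeds"}, date(2020, 6, 1))  # unchanged: ignored
scd2_upsert(dim, "C1", {"city": "York"}, date(2021, 3, 1))   # changed: new version
for row in dim:
    print(row["attrs"]["city"], row["valid_from"], row["valid_to"], row["is_current"])
```

Facts recorded before 2021-03-01 keep pointing at the Leeds version, which is the whole point compared with simply updating the row in place.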
I was trying to figure out the seemingly unending chain of scripts that call scripts that call scripts, and I commented out the core functionality of a job in production. It stayed there for weeks. The script in question? It validates that the correct data is loading from an ETL job.
Walmart Retail Link has been down for five hours. No word on when it will be back up. All our work is halted until then.
Anyone with an ETL tool background?
Is it worth learning Talend?
Our workplace decided to use it for data integration, but I'm not sure whether it is still widely used.
I've also researched a bit about other alternatives but as I've no background in this area I'm unable to decide.