Search - "etl"
-
Boss: I need you to start on this new project, how long will it take?
Me: well, hard to say with no specs whatsoever...
Boss: just your best guess
Me: 4 to 6 months, I guess?
Boss: so 3 months it is. When can you start?
Me: no specs, sir...and I said 4 to 6
Boss: the specs are almost ready, I know you can simplify it
Me: ...
Boss: just start with the basic infrastructure already
(4 months later)
Boss: here are the specs. They might change a little in behaviour and design, but all the main stuff is here
(Hands me an A3 sheet with a total of 21 pictures made in InDesign)
Me: O...kay. What happens when I click here?
Boss: oh, we should still talk about the app workflow, I'll get you updated
(2 weeks and 16 total rewrites of the "specs" later)
Boss: you told me it was a 2-month job, why aren't you finished yet? We must deploy in 3 weeks!
Me: ... -
HR Project update meeting.
Duration: 1h
Content:
1) recap of previous meeting
2) overview of what we will discuss in the next meeting. -
Boss: hey mech eng, we need a setup modification
Mech: no prob, boss, we can have it prototyped in 3 weeks, industrialized in 2 months
Boss: oh, right then, go on
---
Boss: hey, Soft Eng, we need a functional modification
Soft Eng: no prob, boss, we can develop it in 4 hours, test and document it in 2 hours, and integrate and ship it to the client by tomorrow morning
Boss: what??? One day?? You just need to edit a couple of lines of code! I want it ready in ten minutes, tops! -
My second job. I've been hired as a research specialist, not a developer, but they found out I could code during the interview.
Boss: hey, so we have our main product line that shares the control panel for all the models, right?
Me: unh, yeah
B: well, we need to know how it works.
M: sorry?
B: yeah, I mean, we should have a manual with all the tech documentation so we know how everything works
M: ...and didn't you hand the tech docs over to the developers?
B: uh...no, actually we request features from the devs (note: external company) with a phone call, or email...now we need the specs.
Me: omg
...
The other company (which is part of the same group) hands me the source code.
It is a huge pile, 25k lines of spaghetti written by at least 7 people, one at a time, uncommented.
After a month I produce a 50-page doc on how everything works, after actually drafting my resignation letter 3 times.
M: boss, here are the docs
B: fine, I'll take a look
15 mins later
B: this is not what we need! You cannot describe those algorithms like this!
(I described the algorithms with their block flow, with a detailed verbal description)
M: umh.. So how do you need it?
B: we need an Excel table, with all the entry conditions on the rows and all the exit conditions in the columns, and the description of the working condition in the crossing cells!
M: are you even serious? -
One month ago. By email.
Boss: so, this client A has a problem with one of our devices and he believes that it's a bug in the software.
Me: all right then, what happens?
Boss: well, he says that the parameter P in the option menu does not change the device's behaviour as it is supposed to. I'll forward you his mail. You will find attached an Excel file with the results of his tests, performed with and without the parameter active.
Me: < read mail, read Excel file > well, boss, his tests were performed in completely different conditions, how could he expect to infer meaningful results from this?
Boss: damn, you are right. Send him a test plan and follow up.
Me: < send detailed test plan >
No answer in a week. Then...
Client: hi there, I ran these tests, I attached the Excel file with the results, can you check the software now?
Me: < read another bullshit-filled Excel file with none of the suggested tests performed >
You know what? Just download the procedures you are using from the device and send them by mail, specifying the software version you are using, so we can perform some tests here in the lab and get you a solution asap.
No response. For a MONTH.
Super Boss: client A still has his problem, how can it possibly take more than A FUCKING MONTH to solve his issue??
Me: ... -
C: hey mate, what's the best tool to open up this 31.1M rows x 106 cols CSV file?
M: Umh...Pandas DataFrame or R DataTable I guess?
C: all right, Excel will do, thanks!
M: erhm...yeah, anytime? -
!Rant
Designer decides to have a meeting with stakeholders about the UX/UI workflow for the control panel of our new embedded system (no framework, no library, the GUI is rendered bit by bit on the frame buffer).
A week later, still nothing on my table, not a mail, not a call. Meanwhile I wrote a framework, the control system, renderer, and messaging queues between tasks.
Wrote some widgets, a layout system and a view switching mechanism, and a separate stack control to use a "back" button.
Now I am stuck for I do not know what should happen when clicking on various (non obvious) items on the touchscreen.
Fine, I'll ask the designer.
"Oh, I will write the workflow next week" (ETA: 2 weeks. Seriously? You take a week to draw 20 screenshots with text in Adobe Illustrator and I have another week to write it from scratch in C?)
Ok, while you write it, just tell me what should happen when I click an active item.
"Well, we didn't talk about that. We just decided the colour of the icons on the screen..."
For fuck's sake... -
Me: What filters would you like on this report?
VP: Here's the logic for the filter I want.
Me: Great! Anything else?
VP: Nope!
... Days of DB, ETL, and Report refactoring later ...
Me: Here's the updated report!
VP: Can we add this other filter?
Me: (You're welcome...) -
!Rant
How do you deal with open space offices?
I find it quite difficult to focus: the constant chatting, the constant questions, phones ringing, surprise meetings, more questions, arrays of interruptions and questions again. I believe I would be a lot more productive if left alone in total, uninterrupted silence.
Have you found your escape, your zen, your inner focus? Please share, I need some ideas -
To all the data engineers in here: WTF is going on in your field?
I've worked closely with a dozen data engineers in the last 5 years (and talked to friends and internet strangers about this and got similar responses), and none of them seem to know how to use a computer!
They don't understand git, ORMs, best practices, how to use a terminal, DAGs (important for using modern ETL scheduling tools like Airflow and Prefect), etc.
Guys with 10 years of experience on their resume and they can't wrap a model into a Flask app with 1 endpoint. They'll reference local files on their machine in a Jupyter notebook and are shocked it won't work on other computers! -
!rant
Looking for advice, serious advice.
I work in C.
Also, I work in Python.
I have worked for a couple of years in C++.
I have a fair knowledge of the Data Science workflow, and some experience in Machine Learning.
I have tinkered with some other languages (Java, Ruby, Go, JS among the others, nothing serious nor professional)
I'm the kind of person who needs constant problems to face in order to stay engaged, satisfied, happy. And I need to learn new stuff, or refine my knowledge constantly, or I stagnate. I believe that this is true for quite a share of people here.
I would like to spend some spare time (which I seldom have) on a project. Personal projects are rarely good enough to improve one's CV, so I thought I could participate in some Open Source projects.
Does anyone here have suggestions for an interesting and satisfying OS project, or some general suggestions on the matter?
It would be so appreciated. -
I was out sick the day an urgent ETL job I was building would be due, so it got reassigned. When I return, I find most of my code commented out and replaced.
The first step was rewritten, with a comment that reads "Made changes to run faster." What used to be a single execution lasting 30 seconds was now a 4 step process taking 5 minutes, and yielding identical results.
Being a one-time execution (not a recurring job), I'm left wondering why they thought execution speed was even an issue, let alone what about their redesign they felt was an improvement... -
Since I'm back to working for myself again and haven't been able to find a reliable hire, I'm alone. In this bubble, no one cares/sees/appreciates my backend code and I just realized that's why I've been slacking so bad on this ETL process. No one gives a shit about it but me. If I build an interface, I get kudos and everyone celebrates, but working on a three server process with layers of abstraction, auto-scaling, etc...and people just wonder if I'm jerking off all day.
Sometimes it sucks to be a lone ranger. -
Used a wrong filter during loading of a table in ETL. Did not test and migrated to production. 80% of users had empty reports.
Had to stay awake till 4AM to get it fixed.
Realized an important lesson -
'A test in time saves nine' -
I’m not a web programmer; I’m an application and SQL developer. So when I was tasked with scraping a web site for an ETL feed, I thought it would just be a ton of substring and Post/Get calls.
Nope! There is this garbage called JBOSS.A4J where the page isn’t a page but a bunch of files that are merged together and then it isn’t “real” but like a bunch of Photoshop layers that “look” like a page. JavaScript functions based on key press and things like Select/Option that looks like an element but Selenium/PhantomJS (C#) can’t find it. Or my Google-Fu isn’t working. -
Interviewed with a company; it was a direct-hire SQL Dev/Analyst role (ETL, BI etc). Had three interviews in a row, all of which went great. We laughed, and I was able to answer every technical question with no problem. Each person clearly enjoyed the interview, and I ended up going over the amount of time set aside for it... Still didn't get the job. They said "There is no doubt he can do the job, but we don't think he's passionate enough about the position." What?!?! So confused. It's also odd to me because for every job before this, if I had an in-person interview I was offered the job... I don't get it.
-
So I did this https://devrant.io/rants/797965/... which works fine up to medium-sized data.
However, for large data the ETL pegs a 6 core Xeon (2.2GHz) with 50GB of RAM, because it ends up doing six threaded compares, so 12 different data sets. Other than "pull less data", any tips?
Code (C#) is basically a Linq multi column join between two DataTables and when the compared columns don't match it returns as a var which is turned into a third DataTable to be SqlBulk loaded into the DB.
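One possible direction, sketched in Python rather than C# and under assumptions: index only the key and compared columns of one table in a dictionary, then stream the other table past it, so neither full table has to sit in a join. The names `index_by_key` and `mismatched_rows` and the column names are hypothetical, and whether this actually relieves the memory pressure described here is untested.

```python
from typing import Dict, Iterable, Iterator, Sequence, Tuple

Row = Dict[str, object]


def _pick(row: Row, cols: Sequence[str]) -> Tuple[object, ...]:
    """Extract the given columns from a row as a hashable tuple."""
    return tuple(row[c] for c in cols)


def index_by_key(
    rows: Iterable[Row], key_cols: Sequence[str], cmp_cols: Sequence[str]
) -> Dict[Tuple[object, ...], Tuple[object, ...]]:
    """Map each join key to the values of the compared columns only."""
    return {_pick(row, key_cols): _pick(row, cmp_cols) for row in rows}


_MISSING = object()  # sentinel: key not present in the index


def mismatched_rows(
    source_rows: Iterable[Row],
    warehouse_index: Dict[Tuple[object, ...], Tuple[object, ...]],
    key_cols: Sequence[str],
    cmp_cols: Sequence[str],
) -> Iterator[Row]:
    """Yield source rows whose key exists in the index but whose
    compared columns differ (inner-join semantics, one row at a time)."""
    for row in source_rows:
        existing = warehouse_index.get(_pick(row, key_cols), _MISSING)
        if existing is _MISSING:
            continue  # no matching key, so an inner join would drop it
        if existing != _pick(row, cmp_cols):
            yield row
```

Because `mismatched_rows` is a generator, the differing rows could be handed to a bulk loader in batches instead of being collected into a third in-memory table first.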
Table1 is external API return data (no windowing) and Table2 is from our DW. -
Time to switch to offline and hide in some dark corner to get work done. Tired of all the IM’s and coming over to my desk from 1 person for “critical” work. If they’re all critical then none of them are truly critical. If you sit on the data for 2 months, and then today is the day it becomes critical, and the compliance issue is because of your ineptitude, then it’s a you problem, not an IT problem. Then on top of that you submit your data to be loaded in the incorrect request form and spreadsheet format; you can go fuck yourself asking for this to be done in an hour. It could be done in 15 minutes if you had it in the correct format as specified in the 20 meetings over the past year which removed all manual analysis and automated the entire process, you idiot. Now I have to get it into the correct format in that hour so I don’t have to do the analysis for you.
I have other things to do besides your etl tickets, like finding the actual problems in our actual critical applications. You know the ones where the VP’s of this giant corporation start calling if they go down.
Sorry for the rambling guys. -
I studied a lot over the last year. I learned a lot about Machine Learning/Deep Learning, Data Gathering, Data Analysis, ETL, Model Architecture Design, Training, Fine Tuning, Backend Development, Databases, API Development, ORMs, REST, GraphQL, OAuth, CI/CD, Docker, Deployment to Production environments like Heroku, Git and more stuff I don't remember while writing this. I built a GitHub portfolio and keep adding stuff to it.
I'm not able to get a job. I started looking for jobs as a Data Scientist: no response, ever. I took a look at freelancer sites, and nothing seems to fit my skills. And when there is a minimal fit, they always want a Full Stack Web Developer. I don't know Frontend Development and I don't like doing it.
I don't know what to do or how to land any job.
My options seem to be:
1. Learn Frontend Dev and work as Full Stack in underpaying freelance jobs
2. Keep applying to Remote-Only startups, but they still want people with 3+ years of experience.
I can't work in my city; no company or startup here is hiring anyone. We are 30 years in the past here.
What would you do in my place? -
Fuck this.. I have to tell you this annoying wtf..
Hi btw this is my first rant so pls dont blame me :)
I am working on an etl project for our company to connect data sources like netbase, similar web, etc. to alooma (a data pipeline).
Now I got the task to add another data source called BrightEdge to it.
All fine.
BUT WHAT THE ACTUAL FUCK.. IT TAKES 3 MONTHS TO GET AN ACCOUNT. AND U KNOW WHAT.. I DIDNT EVEN GET THE API CREDENTIALS. THIS IS MY FINAL PROJECT FOR MY TRAINING TO BECOME AN IT SPECIALIST. -
Getting out of bed at 11pm to fix an ETL failure. I was just about to get off devRant and go to sleep... (the second sentence was !true)
-
Finished writing an ETL job in dev and scheduled it right before I left.
About an hour later I get an email that it failed, now I get to wait till Monday to figure out why. -
Most hacky things I've ever done:
A windows scheduled task that kicks off a massive as fuck ETL job, riddled with errors. Damn thing had a mind of its own and only ran whenever it felt like it. Client was happy, deadlines were met, boss moved me to another task. -
Newtonsoft JSON
https://nuget.org/packages/...
CSV Helper
https://nuget.org/packages/...
With ETL these two cover 90% of file ingest. I’m still looking for a good XML “auto class” package. -
Data wrangling is messy
I'm doing the vegetation maps for the game today, maybe rivers if it all goes smoothly.
I could probably do it by hand, but there's something like 60-70 ecoregions to chart, each with their own species, both fauna and flora. And each has an elevation range it's found at in real life, so I want to use the heightmap to dictate that. Who has time for that? It's a lot of manual work.
And the night prior I'm thinking "oh this will be easy."
yeah, no.
(Also why does Devrant have to mangle my line breaks? -_-)
Laid out the requirements, how I could go about it, and the more I look the more involved it gets.
So what I think I'll do is automate it. I already automated some of the map extraction, so I don't see why I shouldn't just go the distance.
Also it means, later on, when I have access to better, higher resolution geographic data, updating it will be a smoother process. And even though I'm only interested in flora at the moment, there's no reason I can't reuse the same system to extract fauna information.
Of course, in game design there are some things you'll want to fudge. When the players are exploring outside the Rockies in a mountainous area, maybe I still want to spawn the occasional mountain lion as a mid-tier enemy, even though our survivor might be outside the cat's natural habitat. This could even be the prelude to a task you have to do: go take care of a dangerous creature outside its normal hunting range. And who knows why it is there? Wild fire? Hunted by something *more* dangerous? Poaching? Maybe a nuke plant exploded and drove all the wildlife from an adjoining region?
who knows.
Having the extraction mostly automated goes a long way to updating those lists down the road.
But for now, flora.
For deciding plants and other features of the terrain what I can do is:
* rewrite pixeltile to take file names as input,
* along with a series of colors as a key (which are put into a SET to check each pixel against)
* input each region, one at a time, as the key, and the heightmap as the source image
* output only the region in the heightmap that corresponds to the ecoregion in the key.
* write a function to extract the palette from the outputted heightmap. (is this really needed?)
* arrange colors on the bottom or side of the image by hand, along with (in text) the elevation in feet for reference.
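The masking steps above could be sketched roughly like this. It is a minimal sketch under assumptions, not the actual pixeltile code: images are plain nested lists instead of files, the biome map holds RGB tuples, the heightmap holds one integer elevation code per pixel, and the names `mask_heightmap` and `elevation_palette` are made up for illustration.

```python
from typing import List, Optional, Set, Tuple

Color = Tuple[int, int, int]


def mask_heightmap(
    region_map: List[List[Color]],
    heightmap: List[List[int]],
    key_colors: Set[Color],
) -> List[List[Optional[int]]]:
    """Keep heightmap values only where the biome map pixel is one of
    the key colors; everything else becomes None."""
    if len(region_map) != len(heightmap):
        raise ValueError("region map and heightmap differ in height")
    masked = []
    for region_row, height_row in zip(region_map, heightmap):
        if len(region_row) != len(height_row):
            raise ValueError("region map and heightmap differ in width")
        masked.append(
            [h if color in key_colors else None
             for color, h in zip(region_row, height_row)]
        )
    return masked


def elevation_palette(masked: List[List[Optional[int]]]) -> List[int]:
    """Sorted list of the distinct elevation codes left after masking."""
    return sorted({h for row in masked for h in row if h is not None})
```

Putting the key colors in a set keeps the per-pixel check cheap, and the palette step falls out of the masked output for free, which may answer the "is this really needed?" question either way.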
For automating this entire process I can go one step further:
* Do this entire process with the key colors I already snagged by hand, outputting region IDs as the file names.
* setup selenium
* selenium opens a link related to each elevation-map of a specific biome, and saves the text links (so I don't have to hand-open them)
* I'll save the species and text by hand (assuming elevation data isn't listed)
* once I have a list of species and other details, I save them to csv, or json, or another format
* then selenium opens this list, opens wikipedia for each, one at a time, and searches the text for elevation
* selenium saves out the species name (or an "unknown") for the species, and elevation, to a text file, along with the biome ID, and maybe the elevation code (from the heightmap) as a number or a color (probably a number, simplifies changing the heightmap later on)
Having done all this, I can start to assign species types, specific world tiles. The outputs for each region act as reference.
The only problem with the existing biome map (you can see it below, it's ugly) is that it has a lot of "inbetween" colors. There's a few things I can do here. I can treat those as a "mixing" between regions, dictating the chance of one biome's plants or the other's spawning. This seems a little complicated and dependent on a scraped together standard rather than actual data. So I'm thinking instead what I'll do is implement biome transitions in code, which makes more sense, and decouples it from relying on the underlying data. It also prevents species and terrain from generating in, say, towns on the borders of a region, where certain plants or terrain features would be unnatural. Part of what makes an ecoregion unique is that geography has led to relative isolation and evolutionary development of each region (usually thanks to mountains, rivers, and large impassable expanses like deserts).
Maybe I'll stuff it all into a giant bson file or maybe sqlite. Don't know yet.
As an entry level programmer I may not know what I'm doing, and I may be supposed to be looking for a job, but that won't stop me from procrastinating.
Data wrangling is fun. -
Before my vacation I’d been chatting with one of our DBAs about an ETL tool we needed for a customer. We’d already signed all the contracts saying we would provide one for a historical database of old data. They had been looking at one from SAP but, in typical fashion, a license was worth more than the actual contract.
Anyway, long story short: on the weekend before I went back to work I rattled together a little Python proof of concept using a couple of Azure databases. When I went back I demo’d it to the PM and DBA, and they loved it. We built on the PoC to get a working loader, which saved us about £30k because we wrote our own instead of buying the SAP product. -
I had used a computer since the Win 3.1 days and I fooled around with VB on Win 95 or 98. I didn't know it was going to be my passion until I wrote a whole data structures library in C++ based on a doubly linked list I wrote for a class. I called it the ETL, for easy template library (like the STL was hard!!). That's when I knew I had a knack for it and began really learning.
-
The customer wants to migrate his old store into WooCommerce. Here's a MySQL dump with 130 tables and no documentation on how they're related.
You also have to scrape all of the couple thousand product images off their site because they don't want their old dev knowing, so you can't just have FTP access... -
Imagine enabling verbose logging for a complex ETL process that typically takes 8 hours to run but has been failing for some reason after running for about 7 hours. Naturally, you want to check the log file to find out what went wrong.
Now imagine not having read access to the log file. -
Question about PR best practices.
I work for an analytics company and often have to implement new ETL steps. The data transformations in these steps can be complex and the major changes are usually 500+ lines up to around 2000 (the last one had 765 lines just for schemas).
What's the best way to split up the changes into multiple PRs, bearing in mind that it isn't guaranteed that a file won't change as the change is built up? -
My team has a Database Admin 2 position open on the Arvest Career site. We are looking for someone with Data Warehousing/Data Integration background with SQL Server, ETL, SSIS, or equivalent. Also looking for a physical DBA with background in SQL Server, performance tuning, partitioning, DR/HA, Database migrations, dB refresh, dB restore, building out clusters.
https://appone.com/MainInfoReq.asp/... -
The company wants to implement a CRM with the cleansing rules built and managed by the business. What software did you buy, or what stack did you build on, for business-directed data cleansing? We are a Microsoft shop but we can adapt if the solution is in the “magic quadrant”.
-
Was trying to figure out the seemingly unending line of scripts that call scripts and I commented out the core functionality of a job in production. It stayed there for weeks. The script in question? It validates that correct data is loading from an ETL job.
-
* Canonical Data Models for Metrics and Reporting
* ETL, Table, and API designs to blend Legacy data into Cloud data via said Canonical Models
* Teaching n-Tier and Domain Driven Development models
Welcome to the Office of the CIO. -
Best : creating a fully customizable, performance-oriented ETL service from ground up.
Bad : developing in Xamarin Forms on Android.
Worse : porting said xamarin forms app to ios. -
"No, the Client doesn't like stored procedures so we have done all our TL parts of the ETL using a bunch of views on top of views on top of views."
Wish I could have been here at the start so I could have pushed back, sigh.
Siiiggghhhh, yet the client is anal about performance and even consistency in SSIS packages.....siiiggghhhhhh but we don't have SHOWPLAN permissions or even sp_who2 access...siiiigggghhhhhh.
If I expanded one of the final views, it would be like 1k lines. For the amount of data we move, there shouldn't be any noticeable processing time, but it can take anywhere from 10 mins to an hour. -
Incompetent 3rd party "services" that you have to integrate with, with their inconsistent API error codes and useless 'new, more optimized' half-broken ETL pipelines
-
So, the PowerQuery type system appears to be a Joke.
For those of you who aren't familiar with PowerQuery, it's the ETL language that is used in PowerBI, and some other parts of the MS PowerPlatform. It was formerly known as the M Language.
The language has a type system, that includes records (think hashes) and tables, which are, for practical purposes, a list of records.
The wonderful M language specification document states that:
"Any value that is a record conforms to the intrinsic type record, which does not place any restrictions on the field names or values within a record value. A record-type value is used to restrict the set of valid names as well as the types of values that are permitted to be associated with those names."
Except that the restriction is only to the set of valid names, and the language interpreter doesn't throw an error when I place a number into a text field, but also doesn't do any sort of implicit conversion. This is all hunky-dory, until you then try to load the data into the Tabular Model that underlies the query engine, which does expect the values to be of the type that is specified, and it throws an error.
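To make the gap concrete, here is the kind of check one might expect to fail early. This is a Python illustration, not M and not anything PowerQuery provides: the function name `find_type_mismatches` and the sample field names are hypothetical, and it only shows the idea of comparing values against declared field types before handing rows to a stricter loader.

```python
from typing import Dict, List, Tuple


def find_type_mismatches(
    rows: List[Dict[str, object]], declared: Dict[str, type]
) -> List[Tuple[int, str, str]]:
    """Return (row index, field name, actual type name) for every value
    that does not match the type declared for its field.
    Missing or None values are treated as nulls and skipped."""
    problems = []
    for index, row in enumerate(rows):
        for field, expected in declared.items():
            value = row.get(field)
            if value is not None and not isinstance(value, expected):
                problems.append((index, field, type(value).__name__))
    return problems
```

Run against a table with a number sitting in a text column, this would list the offending row and field, which is the piece of information that is hard to get at in the situation described here.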
But PowerBI, in its infinite wisdom, doesn't actually *record* the error, it merely tells you the error exists, and tells you to go back to the query editor to list the errors thrown up by the powerquery engine. Which, as previously stated, doesn't throw up an error for this instance.
So I've spent all afternoon trying to work out why my queries aren't loading, because I have an error that doesn't exist. fml.
[You can follow this issue on the community feedback site here: https://community.powerbi.com/t5/... ] -
Working on a feature which relies heavily on a data pipeline. I noticed it is a couple of lambda functions calling each other (fuck you to the guy who made it). The best way to get sanity back is to build a proper ETL pipeline. Any suggestions for building a reliable ETL in Python?
Options already considered
1. Celery tasks - Worked well but no overview of the single task progress across celery tasks
2. Airflow - Gives a good overview but the docs make less sense than a 10-year-old talking. Mostly because they introduced a new syntax and not everything has migrated fully yet. Also no support for reusing DAGs -
Walmart Retail Link has been down for five hours. No word on when it will be back up. All our work is halted until then.
-
Sometimes I feel that I am the only person who knows the type-2 trick in DWH design. Why don't you read a DWH 101 book before doing your work? I am not a genius and it's not rocket science.
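Assuming the "type-2 trick" means a Type 2 slowly changing dimension (keep history by closing the old row and adding a new version instead of overwriting), a minimal Python sketch could look like this. The function name `apply_scd2`, the `valid_from`/`valid_to` column names and the in-memory list standing in for a dimension table are all assumptions for illustration.

```python
from datetime import date
from typing import Dict, List


def apply_scd2(
    dimension: List[dict],
    business_key: str,
    incoming: Dict[str, object],
    tracked: List[str],
    as_of: date,
) -> None:
    """Apply one incoming record to the dimension in place.
    If the tracked attributes changed, close the current version
    and append a new one; if nothing changed, do nothing."""
    current = next(
        (
            row
            for row in dimension
            if row[business_key] == incoming[business_key]
            and row["valid_to"] is None
        ),
        None,
    )
    if current is not None:
        if all(current[col] == incoming[col] for col in tracked):
            return  # no change, keep the current version open
        current["valid_to"] = as_of  # close the old version

    new_row = {business_key: incoming[business_key]}
    new_row.update({col: incoming[col] for col in tracked})
    new_row["valid_from"] = as_of
    new_row["valid_to"] = None  # None marks the current version
    dimension.append(new_row)
```

Facts can then join to whichever version was valid on their own date, which is the whole point of keeping the history.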
-
It wasn't an entirely solo project, but every part of it was completely solo. I felt very proud of the ETL, DICOM metadata search database and CI/CD pipelines that I built for an oil and gas company. They didn't understand the CI/CD parts so didn't take it anywhere after we'd finished.
-
Eh...
Any guy with ETL tool background?
Is it worth learning Talend?
Our workplace decided to use this for data integration but I'm not sure whether this thing is currently used or not.
I've also researched a bit about other alternatives but as I've no background in this area I'm unable to decide.