Search - "data warehouse"
-
Data scientist: we need to whitelist a pod to connect to a database
Me: Whitelist? We don't use whitelists on private databases
DS: It's the new data warehouse database
Me: is it on <X> VPC?
DS: I'm not sure what that means but its IP is <real world IPv4>
Me: Are you hosting a publicly accessible database with all our end users information?!
DS: ...
Me: There goes our SOC2 audit controls...
DS: How long until you can whitelist it?
Me: I won't be whitelisting it. You need to put it on a private VPC and peer it with the cluster; you'll have to rebuild all the Terraform and redeploy.
DS: We didn't use Terraform because it takes too long, just whitelist the pod's IP.
Me: No. I'm contacting the CISO and CTO...
-
Unaware that this had been occurring for a while, the DBA manager walks into our cube area:
DBAMgr-Scott: "DBA-Kelly told me you still having problems connecting to the new staging servers?"
Dev-Carl: "Yea, still getting access denied. Same problem we've been having for a couple of weeks"
DBAMgr-Scott: "Damn it, I hate you. I got to have Kelly working with data warehouse project. I guess I've got to start working on fixing this problem."
Dev-Carl: "Ha ha..sorry. I've checked everything. It's definitely something on the SQL Server side."
DBAMgr-Scott: "I guess my day is shot. I've got to talk to the network admin. When I get back, let's put our heads together and figure this out."
<Scott leaves>
Me: "A permissions issue on staging? All my stuff is working fine and been working fine for a long while."
Dev-Carl: "Yea, there is nothing different about any of the other environments."
Me: "That doesn't sound right. What's the error?"
Dev-Carl: "Permissions"
Me: "No, the actual exception, never mind, I'll look it up in Splunk."
<in about 30 seconds, I find the actual exception, Win32Exception: Access is denied in OpenSqlFileStream, a little google-fu and .. >
Me: "Is the service using Windows authentication or SQL authentication?"
Dev-Carl: "SQL authentication."
Me: "Switch it to Windows authentication."
<Dev-Carl changes authentication...service works like a charm>
Dev-Carl: "OMG, it worked! We've been working on this problem for almost two weeks and it only took you 30 seconds."
Me: "Now that it works, and the service had been working, what changed?"
Dev-Carl: "Oh..look at that, Dev-Jake changed the connection string two weeks ago. Weird. Thanks for your help."
<My brain is screaming "YOU NEVER THOUGHT TO LOOK FOR WHAT CHANGED!!!">
Me: "I'm happy I could help."
-
This is the fucking data warehouse............
10 FUCKING INDEXES IN THE ENTIRE THING!
Btw...that includes Primary Keys
-
So I joined this financial institution back in Nov. They sold themselves as looking for a developer to code micro-services for a Spring-based project deployed to the cloud. I packed my stuff, drove and moved to the big city 3500 km away. New start in life, I thought!
Turns out that the "micro-services" code is outdated 20-year-old JBoss code that was ported over to Spring 10 years ago, then left to rot and fester into a giant undocumented pile of spaghetti code. Microservices? Forget about that. And what's worse? This code is responsible for processing thousands of transactions every month and is currently deployed in PROD. Now it's your responsibility, and now you have to get new features compiled on the damn thing. What's even worse? They made 4 replicas of that project with different functionalities, and now you're responsible for all of them. Ma'am, this project needs serious refactoring, if not a total redesign/build. Nope! Not doing this! Now go work at it.
It took me 2-3 months just to wrap my mind around this thing and implement some form of working unit tests. I have to work on that whole code base by myself and deliver everything by myself! Naturally, I was delayed in my delivery, but I finally managed to deliver.
Time for relief, I thought! I won't be looking at this for a while. So they assign me the next project: automate the environment sync between the PROD and QA servers, which has been done manually so far. Easy beans, right? And sure enough, the automation process is simple and straightforward...except it isn't! Why? Because I am not allowed access to the user IDs and 3rd party software used in the sync process. The database and data warehouse data manipulation part is the same story. I ask for access and I get denied over and over again. I try to think of workarounds, and I managed two using a Jenkins pipeline and local scripts. But those processes that need 3rd party software access? I cannot do anything! How am I supposed to automate the job schedule import on AutoSys when I DON'T HAVE ACCESS!! But noo! I must think of plan B! There is no plan B! Rather than thinking of workarounds, how about getting my access privileges right, and getting them right the first time!!
They pay relatively well but damn, you will lose your sanity as a programmer.
God, oh god, please bless me with a better job soon so I can escape this programming hell hole.
I will never work in finance again. I don't recommend it, unless you're on the tail end of your career and you want something stable & don't give a damn about proper software engineering principles anymore.
-
Hired a new BI developer. She tested reasonably ok in SQL, and certainly showed good strengths in visualising data, plus had a good attitude in the interview. We hired her. She broke her laptop the first day. We got her another then she complained the camera didn't work but didn't realise the lever in front of the camera was to move the privacy shutter off and on.
Assigned her some work of taking queries that are used in a BI tool that targets the transactional database directly, and re-jigging them for Snowflake which we're using as a data warehouse now, aggregating all our data into one place. Yet, she's struggling to understand why the SQL query she's pasted in doesn't work as-is.
I go over it again; the source schemas and tables are this, but in Snowflake we've named them this. She then bemoans how much work that is to change them all - I say use find and replace. She then struggles with Snowflake syntax errors and asks for a guide on T-SQL to Snowflake. I show her Google and say "this is what I did when I hit these problems - search for 'Snowflake equivalent to T-SQL getdate()' or 'how to get current date in Snowflake'" but she still doesn't understand. I ask if she's ever had to work between T-SQL and MySQL or MySQL and PostgreSQL or Oracle and so on and she says yes. I say the syntax isn't the same, is it? And she goes oh, now I understand.
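The find-and-replace step can be scripted. Below is a minimal Python sketch, assuming a hand-maintained mapping from the transactional names to the warehouse names; every identifier in it (`dbo.Orders`, `ANALYTICS.SALES.ORDERS` and so on) is made up for illustration and does not come from the rant. It only renames table references; any function or syntax differences between T-SQL and Snowflake still have to be checked by hand.

```python
import re

# Hypothetical mapping from transactional names to warehouse names;
# none of these identifiers come from the original post.
NAME_MAP = {
    "dbo.Orders": "ANALYTICS.SALES.ORDERS",
    "dbo.Customers": "ANALYTICS.SALES.CUSTOMERS",
}


def rename_tables(sql: str, name_map: dict[str, str]) -> str:
    """Replace whole table references, ignoring case, longest names first."""
    for old in sorted(name_map, key=len, reverse=True):
        # The lookarounds stop "dbo.Orders" from matching inside
        # a longer identifier such as "dbo.OrdersArchive".
        pattern = re.compile(
            r"(?<![\w.])" + re.escape(old) + r"(?![\w.])", re.IGNORECASE
        )
        # A function replacement avoids any backslash handling in the new name.
        sql = pattern.sub(lambda _m, new=name_map[old]: new, sql)
    return sql


query = (
    "SELECT c.Name, o.Total FROM dbo.Orders o "
    "JOIN dbo.customers c ON c.Id = o.CustomerId"
)
print(rename_tables(query, NAME_MAP))
```

Keeping the mapping in one place means every pasted query gets the same renames, instead of each one being edited by hand.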
She scored reasonably in her SQL test but I'm now concerned there's something fundamental missing in her grasp of SQL. I gave her a detailed demo of the tools, I explained in the interview and on her start about our move to a data warehouse for all our apps, and put her through some training plus gave her time to work through our Confluence pages - not expecting she'll remember everything, but more to ensure she recalls they exist and what the general contents are.
Anyhow, that's my rant.
-
If anyone has been keeping up with my data warehouse from hell stories, we're reaching the climax. Today I reached my breaking point and wrote a strongly worded email about the situation. I detailed 3 separate cases of violated referential integrity (this warehouse has no constraints) and a field pulling from THE WRONG FLIPPING TABLE. Each instance was detailed with the lying ER diagram, the highlighted violating key pairs, the dangers they posed, and how to fix it. Note that this is a financial document; a financial document with nondeterministic behavior because of the previous contractors' laziness. I feel like the flipping harbinger of doom with a cardboard sign saying "the end is near" and keep having to self-validate that if I was to change anything about this code, **financial numbers would change**, names would swap, description codes would change, and because they're edge cases in a giant dataset, they'll be hard to find. My email included SQL queries returning values where integrity is violated 15+ times. There's legacy data just shoved in ignoring all constraints. There are misspellings where a new one was made instead of updating, leaving the PK the same.
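For readers who have not written one of those integrity queries: a common shape is an anti-join that returns child rows whose key has no parent. Here is a minimal sketch in Python, using an in-memory SQLite database so it runs anywhere. The `account` and `ledger` tables and their rows are invented for illustration; the real warehouse in the rant is on Oracle and its schema is not shown.

```python
import sqlite3


def find_orphans(conn, child, fk, parent, pk):
    """Return child rows whose foreign key has no matching parent row."""
    # Identifiers are interpolated directly, so only pass trusted names.
    sql = (
        f"SELECT c.* FROM {child} c "
        f"LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
        f"WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL"
    )
    return conn.execute(sql).fetchall()


# Hypothetical data: ledger entry 11 points at an account that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (account_id INTEGER, name TEXT);
    CREATE TABLE ledger (entry_id INTEGER, account_id INTEGER, amount REAL);
    INSERT INTO account VALUES (1, 'Cash'), (2, 'Payables');
    INSERT INTO ledger VALUES (10, 1, 50.0), (11, 3, 75.0), (12, 2, 20.0);
    """
)
orphans = find_orphans(conn, "ledger", "account_id", "account", "account_id")
print(orphans)
```

With no constraints declared, nothing stops row 11 from being inserted, so a query like this is the only way to find it after the fact.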
Now I'd just put in sorting and other algos, but the data is processed by a Crystal Report. It has no debugger. No analysis tools. 11 subreports. The thing takes an hour to run and makes 77k queries to the Oracle backend. It's one of the most disgusting infrastructures I've ever seen. There's no other solution to this but to either move to a general programming language or get the contractor to fix the data warehouse. I feel like I've gotten nowhere trying to debug this for 2 months. Now that I've reached what's probably the root issue, the office bureaucracy is resisting the idea of throwing out the fire hazard and keeping the good parts. The upper management wants to just install sprinklers, and I'm losing it.
-
When your boss is too cheap to upgrade the server your data warehouse is on. 8GB of RAM is not enough when you're regularly querying for 1m+ rows.
-
I was 1 hour into an 8 hour data warehouse build on my VM, when Windows popped up with a dialogue informing me that it would initiate Shut Down in 10 minutes. No button to Cancel or Delay, just “Okay”.
To Microsoft, I offer a hearty Dickensian “GOOD AFTERNOON, SIR!”
-
Last Week Friday:
PM: We'll be taking you off the one project on to another, we'll send the details later.
Me: Cool
*Hours Later*
PM: Ok cool, so you'll be looking at a script that one of our Pillar heads has scripted. You need to make sure it works and that it can run on the server.
Me: *I always thought this guy was useless now i get to see what he can do* Cool, just send the documentation and i'll take a look at it over the weekend. Just tell me when you've sent it.
PM: Cool.
Project Head: I'll inform you when i send the files and how to run them.
Me: *I know how to set up a database locally, i'm not an idiot* Cool.
Whole Weekend I don't get a single message.
Monday Morning:
Project Head(PH): Have you taken a look at it yet?
Me: Taken a look at what?
PH: The Database and the Script
Me: i didn't get any message over the weekend.
PH: I sent it yesterday, it should be in your inbox.
Me: There's Nothing. Sending anything on a Sunday is expecting me not to see it, especially at 10pm. Besides i can't retrieve any of the files in the attachment(Outlook tripping), rather send it in a zip file or upload it to onedrive.
PH sends the link. I get the files, set up the DB, glance at the script.
Me: This is actually interesting.
PH: You know what it does?
Me: My SQL knowledge is below average but I can read and understand it pretty well. So you're dynamically copying the database from the server to the warehouse, cool.
It's not going to work though.
PH: Check first.
I check it
Me: Doesn't work, but it sort of works.
PH: What do you mean?
Me: Some tables are populated but some aren't, somehow, and there's a shit ton of errors.
PH: So it does copy the data over.
Me: Some of the data.
PH: test it on the Server
Me: Not a good idea.
PH: Just try it.
PM: In the mean time i'll send you some documentation i need you to review and edit.
Me: *Idiots* Cool.
Tuesday:
Me: Have you checked it on the server yet?
PH: Not yet, busy.
Me: Where's the documentation again?
PM: I'll send it in a moment.
Me: In the mean time i'll write some script to fix that script that's definitely not going to work.
Wednesday:
Boss: I heard you're done with the script.
Me: It's not done, but we'll be testing it on the server later.
Boss: Then why are you running it on the server?
Me: Ask the PH and PM.
Boss: What are you doing now?
Me: Well, I'm supposed to do documentation *looks at PM* but I haven't received any yet, so I've been writing a script to fix the copy script.
PH: Ok we'll test when the boss leaves, after all the meetings.
PM: here's the documentation.
Me: Thanks
I start on documentation.
PH: It didn't work.
Me: I know.
PH: Fix it.
Thursday:
Meeting.
PM: What you doing?
Me: Fixing the script,
PM: Do the documentation first
Me: Cool.
End of the day:
PH: Why you doing the documentation? The script has highest priority.
Me: Ask the PM.
Friday(Today):
Boss: can we talk.
Me: Sure.
Boss: I thought you said the script was done?
Me: I said it sort of works, it just doesn't do the job 100%.
Boss: Monday I was told it's done.
Me: I only looked through it Monday to understand it. I'd done nothing before Tuesday, though I have been trying to create a script to fix it.
Boss: You're working really slow hey.
Me: *It's been a week, and stupid people are in charge* I was doing what I was told.
Boss: Cool. (He's upset)
Stupid FUCKEN people make stupid FUCKEN decisions. But hey, the boss only sees the final result. I am a human being, even I make mistakes. But there's a huge gap between stupidity and a mistake.
-
So for those of you keeping track, I've become a bit of a data munger of late, something that is both interesting and somewhat frustrating.
I work with a variety of enterprise data sources. Those of you who have done enterprise work will know what I mean. Forget lovely Web APIs with proper authentication and JSON fed by well-known open source libraries. No, I've got the output from an AS/400 to deal with (for the youngsters amongst you: AS/400 is a 1980s IBM mainframe-ish operating system that originally ran on 48-bit computers). I've got EDIFACT to deal with (for the youngsters amongst you: EDIFACT is the 1980s precursor to XML. It's all cryptic codes, + delimited fields and ' delimited lines) and I've got legacy databases to massage into newer formats, all for what is laughably called my "data warehouse".
But of course, the one system that actually gives me serious problems is the most modern one. It's web-based, on internal servers. It's got all the late-noughties buzzwords in web development, such as AJAX and jQuery. And it now has a "Web Service" interface at the request of the bosses, that I have to use.
The programmers of this system have based it on that very well-known database: Intersystems Caché. This is an Object Database, and doesn't have an SQL driver by default, so I'm basically required to use this "Web Service".
Let's put aside the poor security. I basically pass a hard-coded human readable string as password in a password field in the GET parameters. This is a step up from no security, to be fair, though not much.
It's the fact that the thing lies. All the files it spits out start with that fateful string: '<?xml version="1.0" encoding="ISO-8859-1"?>' and it lies.
It's all UTF-8, which has made some of my parsers choke, when they're expecting latin-1.
But no, the real lie is the fact that IT IS NOT WELL-FORMED XML. Let alone Valid.
THERE IS NO ROOT ELEMENT!
So now, I have to waste my time writing a proxy for this "web service" that rewrites the XML encoding string on these files, and adds a root element, just so I can spit it at an XML parser. This means added infrastructure for my data munging, and more potential bugs introduced or points of failure.
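The rewrite step of such a proxy might look like the following Python sketch. It is not the code from the rant, just one way to do it with the standard library: strip the misleading declaration, wrap the fragments in a single root, and parse the bytes as UTF-8. The root tag `records` and the sample `<item>` elements are assumptions made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Matches a leading XML declaration such as <?xml version="1.0" encoding="..."?>
DECL = re.compile(rb"^\s*<\?xml[^>]*\?>")


def repair(raw: bytes, root_tag: str = "records") -> ET.Element:
    """Drop the misleading declaration, wrap the fragments in one root, parse as UTF-8."""
    body = DECL.sub(b"", raw, count=1)
    wrapped = (
        b'<?xml version="1.0" encoding="UTF-8"?>'
        + f"<{root_tag}>".encode()
        + body
        + f"</{root_tag}>".encode()
    )
    return ET.fromstring(wrapped)


# Hypothetical payload: declared as ISO-8859-1, actually UTF-8, no root element.
raw = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    '<item name="café"/>\n<item name="thé"/>\n'
).encode("utf-8")

root = repair(raw)
print([item.get("name") for item in root])
```

Working on bytes rather than decoded text matters here: the declaration is replaced before any parser gets a chance to trust the wrong encoding.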
Let's just say that the developers of this system don't really cope with people wanting to integrate with them. It's amazing that they manage to integrate with third parties at all...
-
Completely fucked up replication of MySQL servers.
Remote: 2 different database Servers
--> made sense.
Except the misconfiguration. Or better: No configuration at all.
So how to solve the massive delays and make everything even more crazy?
2 remote servers - 2 readonly slaves for reading data remote (master - slave)
2 local (internal) servers.
Remote - Local Master Master.
Unfucking this cluster fuck was a real nightmare.
It had to be done at night, cause everything needed to be ripped apart.
And the servers were the backend of a warehouse with supply chain and multiple selling channels (Amazon, eBay etcetera).
So. It had to run by 05:00 the next day so the incoming orders could be packaged / prepared for shipping.
That was fun. Not.
And the clusterfuck died spectacularly on my first work day - the old DBA was gone (fired....)
:)
-
Got any more runtime exception handlers with that?
When a paid service that promises a million connection types can't even do an initial sync 🤦‍♂️. Back to bash scripting it is...
-
Swear work is where I go to fix other people's poor design decisions and clean up the bullshit that comes out of said decisions.
CANT!
BE!
FUCKED!
How do you have so many years' experience and still design in a way that ensures that maintenance / improvements / touching anything in the future is a huuugggeee clusterfuck?
Hey, I got an idea, let's make this whole data warehouse without a single index or primary key cos you know, that's the Kimball Method.
-
Just finished importing over 70,000 rows with a bunch of joins into a data warehouse for billing purposes. All done thanks to SQLAlchemy and Python. I feel like a boss.
-
So, I have been working for this company for 4 years now as a warehouse associate, but over time they finally realized I can code. I was given the opportunity to work on different projects (even though the first project was a setup for failure, I still prevailed and completed it).
Long story short, next year I plan on finishing my bachelor's degree in Software Development. Once I get the degree (or during the process) should I strive to try to work at the:
Tech position (at the current job)
or
Data Analyst department (current job) ,
since I would be the only developer for the data analysts and have impressed the team members at my current job,
or
should I try to find another job in software development in a new field when the opportunity comes up, for a fresh start in just programming and not warehouse associate work?
P.S. I'm close friends with the Tech department, am highly regarded there and have done some projects for them. They would love to see me join the team if it happens. When I am not working with the tech department during off season (management needs to approve work on these projects during off season) I am literally cutting boxes, wasting my skills and potential in auditing during the season.
-
*-- There's something kind of childlike and adorable about working for a client who spends THOUSANDS of dollars on their data infrastructure, yet finds it ever so difficult to provide ONE user to help reconcile and test the new data warehouse.
-
"Copy the file sent to our warehouse management system to another folder so we can keep track of the deliveries we sent data for."
Why these geniuses think that having a folder with file names containing timestamps (and not delivery numbers) would make their job easier is beyond me...
-
Our software outputs some xml and a client has another company loading this xml into some data warehouse and doing reporting on it. The other company are saying we are outputting duplicate records in the data.
I look and see something like this:
<foo name="test">
  <bar value="2" />
  <bar value="3" />
</foo>
They say there are two foo records with the name test..
We ask them to send the xml file they are looking at. They send an xlsx (Excel!) file which looks like this:
name value
test 2
test 3
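How the other company actually gets from the XML to the spreadsheet is unknown, but a naive flattening of the snippet above would produce exactly those rows. A minimal Python sketch of that assumption, with one output row per `<bar>` and the parent's `name` repeated on each:

```python
import xml.etree.ElementTree as ET

xml_text = """
<foo name="test">
  <bar value="2" />
  <bar value="3" />
</foo>
"""


def flatten(xml_text: str) -> list[tuple[str, str]]:
    """Emit one (name, value) row per <bar>, repeating the parent's name."""
    root = ET.fromstring(xml_text.strip())
    return [
        (foo.get("name"), bar.get("value"))
        for foo in root.iter("foo")  # iter() also yields the root if it is a <foo>
        for bar in foo.findall("bar")
    ]


rows = flatten(xml_text)
print(rows)
# The name repeats across rows, yet there is only one distinct <foo>.
print(len({name for name, _ in rows}))
```

The repeated name in the flat rows is an effect of denormalising a parent with two children, not a duplicate record in the source.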
We try asking them how they get xlsx from the xml but they just come back to our client asking to find what we changed because it was working before. Well, we didn't change anything. This foo has two bar inside it, which is valid data and valid xml. If you can't read xml just say so and we can output another format!
-
Today my project manager called Hadoop a data warehouse and a Big Data lake in a meeting. I couldn't decide whether to laugh my ass off or spend the next 30 mins explaining to her what Hadoop actually is.
-
!rant
"This vendor system is broken and we need the data in its DB. Here's a report we used to use from it. Build a new report in SSRS that we can use to pull out all the records."
+1 day tearing apart their data warehouse to find where things are.
+1 day duplicating their sample report
"Well, you did a good job duplicating what we had, but we want something that will pull every single bit of possibly related information that's in the system, and not just what was on the report we gave you."
ffs!
-
Is there a community about infrastructure or data warehousing? I'm trying to find solutions for current projects, but don't have anyone to ask.
-
So I've been working with heavily data-centric applications for 15 years or so. And I must say moving our data warehouse into the G Cloud was the worst idea ever. No tooling, everything is barely working, debugging is a nightmare.
If you are thinking about it, just don't.
-
Context: data warehouse
A colleague gets notified that his historisation produces duplicates. He checks the tables, sees that something has gone wrong ... and asks me for help.
I'm so sorry but I had to laugh so hard. Turns out everything crashed because the value to be historised had changed back to an earlier version (which can happen). And now I'm here wondering how someone who calls himself a senior dev can create a historisation without taking care of gaps and islands. Seriously, how can you not think about that?
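For anyone who has not met gaps and islands: the point is to group by consecutive run, not by value alone, so a value that comes back later opens a new validity interval instead of colliding with the old one. A minimal Python sketch under that assumption; the snapshot dates and the values `A` and `B` are made up, and a real historisation would usually do this in SQL with window functions.

```python
from itertools import groupby


def build_history(snapshots):
    """Collapse consecutive equal values into (value, valid_from, valid_to) rows.

    snapshots: list of (date, value) tuples sorted by date. Grouping is by
    consecutive run, so a value that reappears later starts a new interval.
    """
    history = []
    for value, run in groupby(snapshots, key=lambda s: s[1]):
        run = list(run)
        history.append((value, run[0][0], run[-1][0]))
    return history


# Hypothetical snapshots: the value changes to B and then back to A.
snapshots = [
    ("2024-01-01", "A"),
    ("2024-01-02", "A"),
    ("2024-01-03", "B"),
    ("2024-01-04", "A"),
    ("2024-01-05", "A"),
]

history = build_history(snapshots)
for row in history:
    print(row)
```

Keying the history on the value alone would merge both `A` runs into one interval that overlaps `B`, which is one way to end up with the duplicates described above.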