A 520MB CSV file with 29 columns and no headers.

If it's not an impertinent question, why in the name of Satan's magnificent testicles would anyone do that?

I hope their pig dies.

  • 2
    DB table backup?
  • 0
    Lol 😂

    Have my ++!
  • 1
    Holy fuck. My condolences xD
  • 8
    Data from a government agency?

    One agency we worked with had a similarly headerless CSV. Which was fine, because the columns were supposed to be Name, Address, and so on.

    Easy, right? Nope. Sometimes the first column was an address, sometimes a zip code, sometimes a phone number. Almost random.

    We complained that the data was completely unusable, and their response was: "The data is accurate, you are the first company to complain."

    Our requests to fix the data (because we suspect the CSV is generated by some 1950s-era COBOL program) were met with: "Any changes to our internal processes will have to be directed by a congressional decree. We recommend you contact your state senator."

    Yes, moron. A state senator will make it his first priority to make sure names are in the Name field.
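    A mess like that can at least be triaged mechanically before anyone calls a senator. A minimal sketch, assuming the regexes and the three-field layout below (they are illustrative, not the agency's actual format), that flags rows whose supposed Name field looks like a zip code or a phone number:

```python
import re

# Hypothetical patterns for things that should NOT appear in a Name field.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")
PHONE_RE = re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$")

def suspicious_name(field: str) -> bool:
    """Return True if the supposed name field looks like a zip or phone number."""
    field = field.strip()
    return bool(ZIP_RE.match(field) or PHONE_RE.match(field))

# Assumed layout: Name, Address, Zip. The second row is shifted, agency-style.
rows = [
    ["Jane Doe", "12 Main St", "90210"],
    ["90210", "Jane Doe", "12 Main St"],
]
bad = [i for i, row in enumerate(rows) if suspicious_name(row[0])]
# bad now lists the indices of rows needing manual review.
```

    It won't fix the data, but it turns "almost random" into a reviewable error report.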
  • 1
    @PaperTrail how do you word a contract to say data must be parse-able?
  • 4
    @Demolishun > "how do you word a contract to say data must be parse-able?"

    Can't say we ever had an issue with other companies.

    We were legally required to use this data as-is, so there wasn't anything we could really do.
  • 0
    make it 29 files and let it burn
  • 0
    Sounds like an inside job for GPT
  • 2
    Signed up in March. First post. Hasn't upvoted a single rant. What is this account?

    Good first rant though.
  • 0
    When I was young and naive, I thought people just didn't know any better than to do what they do.

    But now I know: it's almost always malicious compliance, done to tick some checkbox, and that's all.
  • 3
    you'd rather have the same data as a 5GB .xlsx?

    go count your blessings.
  • 4
    I've seen shitheads pipe out logs from parallel processes as header-less CSV rows, and gather the files in a very hadoop-resembling reduce operation.
    Then the reduced files are dumped and rotated on time buckets.
    Since no row yielded can be guaranteed to be the first, each file had no header.
    Seems like a no-brainer, right? Document it up and provide the header-only first file.

    Buuut... the fuckers forgot that production code is _alive_, man.
    And those many parallel processes were not workers of the same job, they were microservices piping out similarly-formatted logs as CSV rows.
    Soon enough there were changes to a microservice or another, and exotically formatted log rows started getting mixed together in the same file.
    If only there were a version-number column per row, we could do a second map-reduce and gather the similarly versioned/formatted data...

    Fuck everything.
  • 1
    Yeah, I got a 7.5 GB CSV from a government agency, and at least it had a header. But every now and then I come across a new kind of "NaN": sometimes it's "---", then an empty column, then a "nan". What the fuck, guys?
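    The zoo of "NaN" spellings can at least be normalized while streaming, so a 7.5 GB file never has to fit in memory. A minimal sketch, assuming the token set below covers the variants seen so far (it almost certainly won't cover the next one the agency invents):

```python
import csv
import io

# Assumed spellings of "missing" seen in the wild; extend as new ones appear.
NA_TOKENS = {"", "---", "nan", "NaN", "N/A"}

def clean_rows(fileobj):
    """Stream CSV rows, mapping every known 'missing' spelling to None."""
    for row in csv.reader(fileobj):
        yield [None if cell.strip() in NA_TOKENS else cell for cell in row]

sample = "id,value\n1,42\n2,---\n3,\n4,nan\n"
rows = list(clean_rows(io.StringIO(sample)))
# rows[2], rows[3], rows[4] now carry None instead of the agency's spellings.
```

    Streaming with a generator keeps memory flat regardless of file size; only the NA token set needs babysitting.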
  • 0
    Ask chatgpt to infer column names