A 520MB CSV file with 29 columns and no headers.

If it's not an impertinent question, why in the name of Satan's magnificent testicles would anyone do that?

I hope their pig dies.

  • 2
    DB table backup?
  • 0
    Lol 😂

    Have my ++!
  • 1
    Holy fuck. My condolences xD
  • 8
    Data from a government agency?

    One agency we worked with had a similarly headerless CSV. Which was fine, because the columns were supposed to be Name, Address, and so on.

    Easy, right? Nope. Sometimes the first column was an address, sometimes a zip code, sometimes a phone number. Almost random.

    We complained that the data was completely unusable, and their response was: "The data is accurate, you are the first company to complain."

    Our requests to fix the data (because we suspect the CSV is generated by some 1950s-era COBOL program) were met with: "Any changes to our internal processes will have to be directed by a congressional decree. We recommend you contact your state senator."

    Yes, moron. A state senator will make it his first priority to make sure names are in the Name field.
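    A mess like that can at least be triaged mechanically before anyone calls a senator. A minimal sketch, assuming the regexes and the three-field layout below (they are illustrative, not the agency's actual format), that flags rows whose supposed Name field looks like a zip code or a phone number:

```python
import re

# Hypothetical patterns for things that should NOT appear in a Name field.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")
PHONE_RE = re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$")

def suspicious_name(field: str) -> bool:
    """Return True if the supposed name field looks like a zip or phone number."""
    field = field.strip()
    return bool(ZIP_RE.match(field) or PHONE_RE.match(field))

# Assumed layout: Name, Address, Zip. The second row is shifted, agency-style.
rows = [
    ["Jane Doe", "12 Main St", "90210"],
    ["90210", "Jane Doe", "12 Main St"],
]
bad = [i for i, row in enumerate(rows) if suspicious_name(row[0])]
# bad now lists the indices of rows needing manual review.
```

    It won't fix the data, but it turns "almost random" into a reviewable error report.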
  • 1
    @PaperTrail how do you word a contract to say data must be parse-able?
  • 4
    @Demolishun > "how do you word a contract to say data must be parse-able?"

    Can't say we ever had an issue with other companies.

    We were legally required to use this data as-is, so there wasn't anything we could really do.
  • 0
    make it 29 files and let it burn
  • 0
    Sounds like an inside job for GPT
  • 2
    Signed up in March. First post. Hasn't upvoted a single rant. What is this account?

    Good first rant though.
  • 0
    When I was young and naive, I thought people just didn't know any better than to do what they do.

    But now I know: it's almost always malicious compliance, done to tick some checkbox, and that's all.
  • 3
    you'd rather have the same data as a 5GB .xlsx?

    go count your blessings.
  • 4
    I've seen shitheads pipe out logs from parallel processes as header-less CSV rows, and gather the files in a very hadoop-resembling reduce operation.
    Then the reduced files are dumped and rotated on time buckets.
    Since no row yielded can be guaranteed to be the first, each file had no header.
    Seems like a no-brainer, right? Document it up and provide the header-only first file.

    Buuut... the fuckers forgot that production code is _alive_, man.
    And those many parallel processes were not workers of the same job, they were microservices piping out similarly-formatted logs as CSV rows.
    Soon enough there were changes to a microservice or another, and exotically formatted log rows started getting mixed together in the same file.
    If only there were a version-number column per row, we could do a second map-reduce and gather the similarly versioned/formatted data...

    Fuck everything.
  • 1
    Yeah, I got a 7.5 GB CSV from a government agency, and at least it had a header. But every now and then I come across a new kind of "NaN": sometimes it's "---", then an empty column, then a "nan". What the fuck, guys?
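    The zoo of "NaN" spellings can at least be normalized while streaming, so a 7.5 GB file never has to fit in memory. A minimal sketch, assuming the token set below covers the variants seen so far (it almost certainly won't cover the next one the agency invents):

```python
import csv
import io

# Assumed spellings of "missing" seen in the wild; extend as new ones appear.
NA_TOKENS = {"", "---", "nan", "NaN", "N/A"}

def clean_rows(fileobj):
    """Stream CSV rows, mapping every known 'missing' spelling to None."""
    for row in csv.reader(fileobj):
        yield [None if cell.strip() in NA_TOKENS else cell for cell in row]

sample = "id,value\n1,42\n2,---\n3,\n4,nan\n"
rows = list(clean_rows(io.StringIO(sample)))
# rows[2], rows[3], rows[4] now carry None instead of the agency's spellings.
```

    Streaming with a generator keeps memory flat regardless of file size; only the NA token set needs babysitting.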
  • 0
    Ask chatgpt to infer column names