Random opinion question:

I'm working on a thing where the user provides a big CSV and we process it and put it in the database, or update existing records.

This data impacts other things, but it isn't front and center as a group in the application for them to notice / see again (well, they can query for it).

I'm thinking of taking the CSV and then presenting them with a table showing how we processed that data giving them a chance to review it before they commit it to the database...

I like this idea for two reasons:

1. If something goes sideways there's a chance someone will see it, and I'm not sure I can do enough validation on a big-ass CSV from god knows where to be sure we're going to process it right... (I'm going to do some validation, but I just can't cover it all)

2. It takes some of the mystery out of what happened / is happening for the user.

Anyone try this in the past? Seems reasonable, but lots of things do before they go sideways.

  • 9
    Load the CSV into a temp / staging table, present this to the user, and make them edit or confirm before pushing it through and processing the data.
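A minimal sketch of the staging-table idea, assuming SQLite and made-up table/column names (`staging_rows`, `items`, `name`, `qty`):

```python
import csv
import io
import sqlite3

def load_into_staging(conn, csv_text):
    """Parse the CSV into a staging table; the returned rows are what
    you'd render back to the user as the review table."""
    conn.execute("CREATE TABLE IF NOT EXISTS staging_rows (name TEXT, qty INTEGER)")
    rows = [(r["name"], int(r["qty"])) for r in csv.DictReader(io.StringIO(csv_text))]
    conn.executemany("INSERT INTO staging_rows VALUES (?, ?)", rows)
    return rows

def commit_staging(conn):
    """After the user confirms, move staging rows into the real table."""
    conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, qty INTEGER)")
    conn.execute("INSERT INTO items SELECT * FROM staging_rows")
    conn.execute("DELETE FROM staging_rows")
```

Nothing touches the real `items` table until the confirm step, so abandoning the upload is just dropping the staging rows.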
  • 1
    Yes I've done something very similar. I'd only do it if it's easy and you're sure they're actually gonna check it or else you're probably wasting your time.
  • 1
    It’s a good idea but it really depends on your user and the amount of data they are updating.
    My experience is that most users don’t really get why their data needs to be accurate and will fully expect you to clear it up if they take a shit in your db.
    However at least if you give them the option of validation then less of the responsibility is on you if/when they send in half arsed, inaccurate data.
    Or I might just be a bit old and cynical!
  • 1
    If the dataset is really huge, an excerpt of the records will suffice. A staging table is highly recommended even for processing / trimming / validation on your end. You can display erroneous records without having to fish them out of your live db. Also remember that you need to do a lot of sanitisation for security reasons.
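Splitting rows into good and erroneous ones (so the bad ones can be displayed without touching the live db) might look like this; the rules here (non-empty name, integer qty ≥ 0) are placeholders, since real validation depends on your schema:

```python
def validate_rows(rows):
    """Partition parsed CSV rows into (good, bad); each entry is
    (row_number, row, errors) so erroneous rows can be shown to the user."""
    good, bad = [], []
    for i, row in enumerate(rows, start=1):
        errors = []
        if not row.get("name", "").strip():
            errors.append("missing name")
        try:
            if int(row.get("qty", "")) < 0:
                errors.append("negative qty")
        except ValueError:
            errors.append("qty is not an integer")
        (bad if errors else good).append((i, row, errors))
    return good, bad
```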
  • 3
    Last I had to do this:
    1. Validate the data once loaded; fail fast ASAP to save resources;
    2. Push the csv to the queue for async processing;
    3. Return a 200 to the user;
    4. Process the entries from the queue asynchronously, not overloading the system;
    5. As soon as the last entry is processed - update the respective csv-uploads-history entry with a success status.

    We only had a handful of servers and we had to deal with huuuuge loads. That did the trick.
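The steps above can be sketched with a plain in-process queue and worker thread (a real setup would use a broker like RabbitMQ or a task runner like Celery; the names `handle_upload`, `upload_history`, and the statuses are made up):

```python
import queue
import threading

jobs = queue.Queue()
upload_history = {}  # upload_id -> status, the "csv-uploads-history" record

def handle_upload(upload_id, rows):
    """Fail fast on validation, enqueue the work, and return immediately
    (the "return a 200 to the user" step)."""
    if not rows:
        upload_history[upload_id] = "rejected"
        return 400
    upload_history[upload_id] = "processing"
    jobs.put((upload_id, rows))
    return 200

def worker():
    """Drain the queue one upload at a time so the system isn't overloaded."""
    while True:
        upload_id, rows = jobs.get()
        for row in rows:
            pass  # insert/update the real tables here
        upload_history[upload_id] = "success"  # last entry processed
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()
```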
  • 1
    @netikras This worked for us too. And add an email or system notification for errors or successful confirmation.
  • 1
    Thanks everyone for your input and examples. All good stuff to think about.

    Right now I think I'm going to go with:

    - Customer uploads spreadsheet.

    - I'm going to give them a sample (or all) of the entries to preview how we processed them (they can spot check if they like).

    - They hit "yeah that looks good" and then I'll send it to the server and handle it all async and such.

    On the preview page I toyed with and built a quick option for them to change data if it was wrong, BUT I thought about it and that just smelled like a bad idea. I'm really against it now.

    I didn't want the customer's CSV, the possibly edited preview, and then the DB all supposed to be the same thing but also... possibly different.

    I want the onus to be on the customer's CSV to be right, and if it's wrong, they fix it and then just run it again.
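A sketch of that upload → preview → confirm flow, keeping the original CSV as the single source of truth (the token scheme and function names are invented for illustration):

```python
import csv
import hashlib
import io

pending = {}  # token -> raw CSV text, held until the user confirms

def start_upload(csv_text, sample_size=5):
    """Store the raw CSV and return a token plus a sample of processed
    rows for the user to spot-check."""
    token = hashlib.sha256(csv_text.encode()).hexdigest()[:12]
    pending[token] = csv_text
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return token, rows[:sample_size]

def confirm_upload(token):
    """User hit "yeah that looks good": re-process from the original CSV,
    so the preview can never drift from what gets committed."""
    csv_text = pending.pop(token)
    return list(csv.DictReader(io.StringIO(csv_text)))  # hand off to async processing
```

Because there's no edit step, a wrong preview means the customer fixes their CSV and runs it again, matching the "onus on the CSV" decision above.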
  • 1
    @N00bPancakes inline editing only makes sense on small data sets like a Christmas greetings service where customers copy paste a list of addresses.
    If users complain about it you can always add it later. Easily uploading it again is a must, though; a lot of the time it will be a structural problem like a wrong column name or an encoding issue.