3
b2plane
14d

They increased a single file to over 1.6 million records of data and now the processing takes 12h to complete. They want me to improve the current bash scripts to decrease this processing time down to max 5 hours. Are you serious rn. Do i look like a magic fcking wizard 🧙🏿‍♂️🪄

Comments
  • 1
    You are not going to do it with bash. New problem new stack. Good luck!

    Yes you are a 🧙‍♂️ wizard!

    Make it happen captain.
  • 1
    Does the order matter in the 1.6 million records? If not split the file up and run in parallel.
  • 0
    @Grumpycat why would splitting the file be faster whatsoever?

    The file contains over 1.6 million rows of data
  • 1
If you have 10 processes processing 160,000 rows each, they will finish faster than one process processing 1.6 million rows. But order in the file must not matter.
  • 0
@Grumpycat so i should write a bash script that first takes the main 1.6m row file, splits it into 10 files with 1/10 of the data each, and processes all 10 files at the same time?
  • 0
    @b2plane Yeah. It has worked before. Now you are going to hammer the machine though and you will find out where in your code the bottlenecks are. What does each row represent?
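The split-and-parallelize idea above can be sketched in plain bash with `split`, background jobs, and `wait`. This is a toy version with made-up data and `awk` standing in for the real per-row work (the actual processing script isn't shown in the thread):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in data: 1,000 numbered rows (the real file has 1.6 million).
seq 1000 > big_file.txt

# Split into 10 chunks of 100 lines each: chunk_aa .. chunk_aj.
split -l 100 big_file.txt chunk_

# Run one worker per chunk in the background; awk doubling each value
# stands in for whatever the real per-row processing is.
for chunk in chunk_a?; do
    awk '{ print $1 * 2 }' "$chunk" > "$chunk.out" &
done
wait   # block until every background worker has finished

# split names chunks in order and the glob expands sorted, so
# concatenating the per-chunk outputs preserves the original row order.
cat chunk_a?.out > result.txt
```

For the real file you'd compute the chunk size as total lines / worker count (or use GNU `split -n l/10`), and cap workers at roughly the number of CPU cores.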
  • 1
    @Grumpycat no idea its some bullshit data
  • 0
    I agree with @Grumpycat parallelization is the way to go
  • 2
    I thought you were a god?
  • 1
    yesterday I found out bash has arrays and case statements and can read keyboard inputs

    but doesn't have 2d arrays 😔

    was kinda fun

    as a language, idk what's going on tho
  • 2
    ye parallelizing is what came to mind as well

    can bash do that? well it can't do 2d arrays so!

    I think there was some tool in the terminal already to do commands in parallel but I don't remember it

    it is faster because CPUs have multiple cores and applications just use one core. but you have like. 8-32 cores. so you could. just multiply your processing speed by that.

    I wondered if there was a way to inspect performance in a bash script though. seemed funner and not like cheating!

    if order in the file matters you can also just sort it after, maybe with some cleverness involved also. I'd assume it takes so long to process because it's doing a bunch of other stuff
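The "tool in the terminal already" the commenter half-remembers is probably GNU `parallel`, but `xargs -P` does the same job and is preinstalled almost everywhere. A minimal sketch, again with toy data and `awk` as a placeholder for the real work:

```shell
#!/usr/bin/env bash
set -euo pipefail

seq 1000 > big_file.txt
split -l 100 big_file.txt chunk_     # chunk_aa .. chunk_aj

# xargs -P 8 keeps up to 8 worker processes running at once,
# one chunk per sh -c invocation. (GNU parallel is the richer
# alternative, but it's often not preinstalled.)
printf '%s\n' chunk_a? |
    xargs -P 8 -I {} sh -c 'awk "{ print \$1 + 1 }" {} > {}.out'

# Per-chunk output files concatenated in glob order preserve row order;
# if they didn't, a final sort pass could restore it as suggested above.
cat chunk_*.out > result.txt
```

As for inspecting performance: wrapping the script in `time` gives the wall-clock total, and `set -x` with a timestamped `PS4` (bash 5's `$EPOCHREALTIME`) is a common trick for per-command timing.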
  • 1
Bash has join and fork commands from what I remember (it's been years since I was that involved with bash). I would use another language if you can. Bash will bash your head in.
  • 0
    @b2plane I meant to say fork and wait. join does something else.
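To be precise, bash has no `fork` command: appending `&` to a command is the fork, and the `wait` builtin collects the children. A small sketch (with `sleep` standing in for real work) that also keeps each worker's exit status, so a failed job isn't silently ignored:

```shell
#!/usr/bin/env bash
# & backgrounds a command (the "fork"); wait collects the children.

pids=()
for i in 1 2 3; do
    sleep "0.$i" &        # stand-in for real per-chunk work
    pids+=("$!")          # $! holds the PID of the last background job
done

status=0
for pid in "${pids[@]}"; do
    wait "$pid" || status=$?   # wait <pid> propagates that child's exit code
done
echo "all workers done, worst exit status: $status"
```

A bare `wait` with no arguments also blocks until all children finish, but it discards their exit codes, which matters if a chunk of those 1.6M rows fails to process.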
  • 3
@Grumpycat i suggested replacing the shit bash scripts with python automation scripts but their system, which was built in 1998, is so old they can't do it, and there are also restrictions when trying to install anything: any python 3rd party libraries get blocked instantly. Any external downloads within their private vm are blocked. I have to request approval for everything. Don't even have vscode here, i need to use notepad. Fucking shitty 90 year old grandpa bankers don't care about their bank having modern technology. All they care about is money. Which is ironic. And even more ironic that they generate trillions of dollars with a system as shitty as this one!
  • 0
    @b2plane oooh you're working for a bank

    it's not money that's the consideration but adding new libraries and network connections can open them up to security risks. banks are a prime target of criminals for obvious reasons
  • 1
    I sympathize with your plight. I worked for a bank for a couple of years. Like warfare. 99% boredom. 1% terror.