2
zymk
31d

Hello, I have a question for anyone familiar with multithreading!

I just started working with threading for the first time (I mostly write PowerShell scripts 😅), and I found that under certain conditions multithreading is an absolute time saver. And of course for some tasks it's not such a big deal.

I am currently working on a project that runs multiple threads and each thread might invoke one of my functions that also threads the work.

I'm a total newbie when it comes to this stuff, but if my main process runs 4 threads, and each one can spin up up to 4 more threads to run one of my functions, does the math work out to a possible total of 16 threads, or is it possible for the threading to go ape-shit bananas and utterly thrash the CPU with rampant threads getting created?

I've looked online, and based on some of the info that I've managed to come across on my own, the answers allude to this being safe because I'm creating pools for running the threads first and the pool is responsible for maintaining min/max threads, but I can't seem to find good info on running a pool plus threads inside another thread.
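
As a rough sketch of the arithmetic in question: the numbers below come from the post, but the variable names are made up for illustration, and the result assumes that every outer thread builds its own inner pool and that each pool honors its maximum. A pool only caps its own threads, so the ceilings add up across pools.

```powershell
$outerMax = 4   # threads in the main process (from the post)
$innerMax = 4   # max threads per inner pool (from the post)

# Worst case: every outer thread has its own inner pool running at its max.
$innerCeiling = $outerMax * $innerMax       # 16 inner worker threads
$totalCeiling = $outerMax + $innerCeiling   # 20, counting the 4 outer threads too

"Inner workers: $innerCeiling, total: $totalCeiling"
```

Under those assumptions the count is bounded rather than runaway; sharing a single pool across all callers would be one way to keep the total at the pool's max instead.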

Just to let you in on what the function does that requires threading in the first place: I basically need to query CloudTrail based on ARNs to find events, but I can only pass a single ARN to the find-ctevent cmdlet. So I'm essentially making 1500-ish really, really small calls to AWS just to get back event data for each ARN.

Serially, this takes almost 20 mins; on my laptop, using stupid settings like 24 threads, it completes in about 95 seconds. On the actual server that will be running this code, I'm going to limit it to 4 threads and try to figure out a way to cache the info locally and update it on a cron or schedule, so only the initial scrape takes forever and the updates can be done nightly or something.

Thank you in advance for your help. I'm not too sure if the question is dumb, but please let me know either way!

Comments
  • 2
    Threading in PowerShell? Hm.

    Anyway, most of the time your thread waits for AWS, right? While waiting, a thread does not do anything and, besides memory, does not consume CPU resources.
    Care has to be taken when many of these receive their responses at nearly the same time and suddenly all use the CPU, but I doubt a small number (10 to 20) will bring your computer down.
    Besides that, caching is a very good idea, as you already wrote.

    Do you mind telling me if that was what you wanted to know? While you have written a lot, I'm not sure what your problem is.
  • 0
    @sbiewald Yes, thank you! I'm specifically exposing these functions through a web front end so team members can basically run my scripts through a simple GET to a URL. Speeding up the process is the name of the game, so I'd like to query nightly and cache, then pull the info from the updated cache.
  • 2
    Creating each thread carries a significant up-front cost. Reusing threads through a queue lets you avoid paying that cost over and over, and keeping the thread count small also saves some of the cost associated with context switching.
  • 0
    No idea how you'd do that with goddamn PowerShell tho
  • 1
    @kescherRant This is actually a really interesting point you brought up. I didn't think about reusing threads; it looks like the tutorials I found could be improved upon.

    So I have created a pool with min/max threads, and I have a loop that iterates through my AWS data and passes the ARNs into a thread object. It looks like my code creates a new thread object, passes it my scriptblock and the AWS data as a param block, and adds all of those to an arraylist. At the end of each loop iteration the thread object gets invoked asynchronously. (I think this is so the next thread object can get created without waiting for the current one to finish processing.)

    Then, once the loop finishes creating and async-invoking the threads, I have a while loop that stays 'true' while any thread is still processing. Once every thread is in the 'completed' status, I fetch my results by iterating through the arraylist of threads again and collecting the results for processing/formatting.
  • 1
    Also, I'm not using the module PSJobs for this, because I wanted to get an understanding of runspaces (PowerShell's multithreading).
  • 1
    You know you can include arbitrary C# code in PowerShell scripts or call any .NET API? You might want to use lower-level functionality in your code.
  • 0
    @sbiewald Yup! I've been trying to do that whenever possible to keep things speedy. I've also found some other really nice little performance boosters here and there that make massive improvements in execution speed.

    Like instead of using a regular array
    $myVar = @()

    use this arraylist instead
    $myVar = [system.collections.arraylist]@()

    It's soooo much faster to append items to the list. Or this new one I found just last week: there's a small pipeline tax you pay when piping to Out-Null, but that can be completely avoided with this:

    [void]$myVar.add($psobject)
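
The runspace-pool loop described in the comments above (one pool, one thread object per ARN, async invoke, a wait loop, then result collection) might look roughly like the sketch below. This is a minimal illustration, not the poster's actual code: the `$work` scriptblock is a hypothetical stand-in for the real CloudTrail lookup, the ARNs are made-up placeholders, and the pool limits of 1 and 4 are assumptions based on the "4 threads" mentioned in the post. It also uses the ArrayList and `[void]` tricks from the last comment.

```powershell
# Hypothetical stand-in for the real per-ARN lookup (assumption, not the poster's code).
$work = {
    param($Arn)
    Start-Sleep -Milliseconds 100   # simulates waiting on the network
    "events for $Arn"
}

# Placeholder ARNs, made up for this example.
$arns = 1..8 | ForEach-Object { "arn:aws:iam::000000000000:role/example-$_" }

# One pool, created once; it caps concurrency at 4 runspaces and reuses them.
$pool = [runspacefactory]::CreateRunspacePool(1, 4)
$pool.Open()

$jobs = [System.Collections.ArrayList]@()
foreach ($arn in $arns) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddScript($work).AddArgument($arn)
    # BeginInvoke returns immediately, so the loop can queue the next item.
    [void]$jobs.Add([pscustomobject]@{ Shell = $ps; Handle = $ps.BeginInvoke() })
}

# Wait until every queued item reports completion.
while ($jobs.Handle.IsCompleted -contains $false) {
    Start-Sleep -Milliseconds 50
}

# Collect the output in submission order and release each thread object.
$results = foreach ($job in $jobs) {
    $job.Shell.EndInvoke($job.Handle)
    $job.Shell.Dispose()
}

$pool.Close()
$pool.Dispose()
```

Because all eight work items share one pool, at most four run at a time and the pool's runspaces are reused rather than recreated per ARN, which is the reuse point raised earlier in the thread.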