Practical PowerShell

Observations and examples of PowerShell in the real world

Split-Job: make your PC work harder

When you need to run a simple process or gather (WMI) data from many machines, you need a lot of patience. Or you can divide and conquer using multiple PowerShell runspaces. There are many ingenious scripts available on the web that let us launch and manage background processes (even for PS v1). For my purposes, I found the necessary inspiration in a blog post by Gaurhoth: his New-TaskPool script allows us to run multiple instances of a script block concurrently. Very cool!

The following script is a little different in that it feeds the data to each pipeline through a shared thread-safe queue rather than starting a new pipeline for each input object. This reduces overhead and allows scripts that have begin/process/end blocks to run more efficiently.
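
To make the queue idea concrete, here is a minimal sketch of the drain loop on its own, run in a single runspace. The server names are made-up sample input, and the foreach block merely stands in for real work. Note that the synchronized wrapper only makes each individual call thread-safe; the Count check followed by Dequeue is not atomic across runspaces, which this single-runspace sketch does not exercise.

```powershell
# Sketch only: hypothetical input, not taken from a real environment
$Items = @('Server1', 'Server2', 'Server3')

# Wrap a regular Queue in a synchronized (thread-safe) wrapper
$Queue = [System.Collections.Queue]::Synchronized([System.Collections.Queue]$Items)

# Emit items one at a time until the queue is empty and feed them to a
# single downstream pipeline (here a trivial foreach as a placeholder)
$Results = & { while ($Queue.Count) { $Queue.Dequeue() } } | foreach { "Processed $_" }

$Results
```

Because the downstream command is started once and then receives every dequeued object, any begin and end blocks it has run only once per pipeline instead of once per input object.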

For instance, take this simple data gathering exercise:

Get-Content machines.txt | foreach {Get-WmiObject Win32_ComputerSystem -ComputerName $_} | Export-Csv ComputerInfo.csv

If you have a few hundred machines, this can take forever (especially if some machines are offline). Now replace the foreach alias with the Split-Job function:

Get-Content machines.txt | Split-Job {Get-WmiObject Win32_ComputerSystem -ComputerName $_} | Export-Csv ComputerInfo.csv

By default it will create up to 10 runspaces (the -MaxPipelines parameter) and run the WMI queries concurrently, so this can be up to about 10x faster when most of the time is spent waiting on remote machines. Even if one of the pipelines stalls, the others will keep going. If you already have a data gathering script that accepts pipeline input, you can just drop Split-Job in:

Get-Content machines.txt | Split-Job .\MachineReport.ps1 | Export-Csv MachineReport.csv

Note that the position of Split-Job in the pipeline matters: the command preceding it should be quick (e.g. getting objects from a text file, AD, SQL etc.), because Split-Job collects all of its input into the queue before it starts any pipelines.
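
The reason is that a function without a process block only runs after the upstream command has finished, with everything it produced available in $Input. A small sketch of that behavior, using a hypothetical function name (Show-Collected) that is not part of Split-Job:

```powershell
# Sketch only: Show-Collected is a made-up name for illustration
function Show-Collected {
    # No begin/process/end blocks, so the body runs once, after all
    # upstream output has been gathered into $Input
    $All = @($Input)
    "Received $($All.Count) objects at once"
}

$Message = 1..5 | Show-Collected
$Message
```

Split-Job fills its queue the same way, so a slow command in front of it delays the start of every pipeline.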

This is a work in progress and I will post more about this in the following weeks. In the meantime, comments and suggestions are welcome!

Arnoud

[UPDATE] The latest version can be found here: http://www.jansveld.net/powershell/2008/06/split-job-092/

 

#requires -version 1.0
###################################################################################################
## Run commands in multiple concurrent pipelines
##   by Arnoud Jansveld
## Version History
## 0.9    Includes logic to distinguish between scriptblocks and cmdlets or scripts. If a ScriptBlock
##        is specified, a foreach {} wrapper is added
## 0.8    Adds a progress bar
## 0.7    Stop adding runspaces if the queue is already empty
## 0.6    First version. Inspired by Gaurhoth's New-TaskPool script
###################################################################################################

function Split-Job (
    $Scriptblock = $(throw 'You must specify a command or script block!'),
    [int]$MaxPipelines=10
) {
    # Create the shared thread-safe queue and fill it with the input objects
    $Queue = [System.Collections.Queue]::Synchronized([System.Collections.Queue]@($Input))
    $QueueLength = $Queue.Count
    # Set up the script to be run by each pipeline
    if ($Scriptblock -is [ScriptBlock]) {$Scriptblock = "foreach {$Scriptblock}"}
    $Script = '$Queue = $($Input); & {while ($Queue.Count) {$Queue.Dequeue()}} | ' + $Scriptblock
    # Create an array to keep track of the set of pipelines
    $Pipelines = New-Object System.Collections.ArrayList

    function Add-Pipeline {
        # This creates a new runspace and starts an asynchronous pipeline with our script.
        # It will automatically start processing objects from the shared queue.
        $Runspace = [System.Management.Automation.Runspaces.RunspaceFactory]::CreateRunspace($Host)
        $Runspace.Open()
        $Pipeline = $Runspace.CreatePipeline($Script)
        $Null = $Pipeline.Input.Write($Queue)
        $Pipeline.Input.Close()
        $Pipeline.InvokeAsync()
        $Null = $Pipelines.Add($Pipeline)
    }

    function Remove-Pipeline ($Pipeline) {
        # Remove a pipeline and runspace when it is done
        $Pipeline.RunSpace.Close()
        $Pipeline.Dispose()
        $Pipelines.Remove($Pipeline)
    }

    # Start the pipelines
    do {Add-Pipeline} until ($Pipelines.Count -ge $MaxPipelines -or $Queue.Count -eq 0)

    # Loop through the pipelines and pass their output to the pipeline until they are finished
    while ($Pipelines.Count) {
        # Guard against division by zero when there was no input at all
        Write-Progress 'Split-Job' "Queues: $($Pipelines.Count)" `
            -PercentComplete (100 - [Int]($Queue.Count / [Math]::Max($QueueLength, 1) * 100))
        foreach ($Pipeline in (New-Object System.Collections.ArrayList(,$Pipelines))) {
            if ( -not $Pipeline.Output.EndOfPipeline -or -not $Pipeline.Error.EndOfPipeline ) {
                $Pipeline.Output.NonBlockingRead()
                $Pipeline.Error.NonBlockingRead() | foreach {Write-Error $_}
            } else {
                if ($Pipeline.PipelineStateInfo.State -eq 'Failed') {
                    Write-Error $Pipeline.PipelineStateInfo.Reason
                }
                Remove-Pipeline $Pipeline
            }
        }
        Start-Sleep -Milliseconds 100
    }
}

5 Comments so far

  1. Stephen Mills June 6th, 2008 4:47 pm

    Looks like a good script. I'll probably use it some. Is there any way to duplicate your current environment in the runspaces? Sometimes you have variables you'd like to reference and profiles you need to have loaded.

    You might want to check the $Pipelines.Count again before the start-sleep. That way it wouldn't pause once more after it is done.

    Thanks

  2. karl prosser June 7th, 2008 12:42 am

    this looks like a keeper. will have to look into it more.

  3. Stephen Mills June 10th, 2008 2:50 pm

    How do you use Cmdlets with parameters? I've played with it for a while and realized that I can't easily use parameters on Cmdlets as a command for Split-Job. I created a simple test.ps1 to demonstrate the issue. The first example thinks that -Force is a parameter of Split-Job. The second one doesn't process correctly for the pipeline, it doesn't see it as a pipeline. The third one works, but is non-intuitive. Example 4 also works, but could cause major headaches if you are already using both types of quotes.

    One possibility that might work would be to require a script block; then at least the second example would work. You could also add a $Foreach switch parameter to do the same thing it currently does automatically if it is a scriptblock. I think I would just use script blocks and not use the $Foreach switch parameter. That way it is consistent with what it is replacing.

    File: C:\Test.ps1
    #### Begin File ###
    param ([switch]$Force)
    process
    { if ($Force) {$_} }
    #### End File ####

    C:\> "Server1","Server2","Server3" | Split-Job c:\test.ps1 -Force
    C:\> "Server1","Server2","Server3" | Split-Job { c:\test.ps1 -Force }
    C:\> "Server1","Server2","Server3" | Split-Job { $_ | c:\test.ps1 -Force }
    Server1
    Server2
    Server3
    C:\> "Server1","Server2","Server3" | Split-Job 'c:\test.ps1 -Force'
    Server1
    Server2
    Server3

    Thanks,

    Stephen

  4. admin June 10th, 2008 4:56 pm

    Thanks for your comments Stephen. Originally I did use a ScriptBlock as the input command but I was trying to make the syntax a bit cleaner when you need a foreach. In version 0.8, the usage looked like this:

    "Server1","Server2","Server3" | Split-Job { c:\test.ps1 -Force }
    "Server1","Server2","Server3" | Split-Job { % {gwmi Win32_ComputerSystem -ComputerName $_} }

    Please note that you cannot use "foreach" instead of "%" here. Let me know what you think – I will consider rolling this back for the next version.

    Regarding your earlier question about variables and profiles: I believe that is a common limitation when dealing with runspaces. I have not run into any situations where this was a problem (as long as you don't use custom aliases in your script, which you shouldn't anyway). I am planning to go into more detail about these pitfalls and how to avoid them in an upcoming post.

    Cheers,
    Arnoud

  5. Practical PowerShell » Split-Job 0.92 June 19th, 2008 9:48 pm

    [...] is an update to the Split-Job function. Based in part on some of the comments on the previous version, I made the following [...]
