
I wrote a simple script for an internship, which trawls through a provided directory and deletes any file older than a specified number of days. I've spent all my free time today attempting to tighten it up. Here's what I've got so far:

function delOld($dir, $numDays) {
    $timespan = New-TimeSpan -Days $numDays
    $curTime = Get-Date
    Get-ChildItem $dir -Recurse -File |
        Where-Object { ($curTime - $_.LastWriteTime) -gt $timespan } |
        Remove-Item -WhatIf
}

Here is an example of a call of the function:

delOld -dir "C:\Users\me\Desktop\psproject" -numDays 5

Sorry for the difficulty of reading: I found that condensing the operations into one pipeline was more efficient than assigning them to legible variables on each iteration. Remove-Item is -WhatIf'd at the moment for testing purposes. I'm aware that at this point I probably cannot speed it up much; however, I am running it on over a TB of files, so every operation counts.

Thanks in advance for any advice you have to offer!

  • That's as fast as you can make it to my eyes. I really don't know how it could be faster other than perhaps designing it to throw off jobs? But redesigning like that would negate the speed increase anyway – pointerless Jun 05 '17 at 21:40
  • Have you tried Log Parser? – Bill_Stewart Jun 05 '17 at 21:41
  • 99% of time is spent in `Get-ChildItem` reading the physical disk, so if any method to speed it up noticeably exists, it would be reading the disk's MFT directly by using [Everything's API](http://www.voidtools.com/support/everything/sdk/) (time/date indexing should be enabled) with a search query that might take just a few seconds! – wOxxOm Jun 05 '17 at 21:42
  • Chasing the seconds: Optimizing file enumeration https://www.youtube.com/watch?v=erwAsXZnQ58&list=PLDCEho7foSoruQ-gL5GJw-lRkASPJOukl&index=7 – Jaqueline Vanek Jun 05 '17 at 22:05
  • Possible duplicate of [How to speed up Powershell Get-Childitem over UNC](https://stackoverflow.com/questions/7196937/how-to-speed-up-powershell-get-childitem-over-unc) – TessellatingHeckler Jun 05 '17 at 22:25

3 Answers


Staying in the realm of PowerShell and .NET methods, here's how you can speed up your function:

  • Calculate the cut-off time stamp once, up front.

  • Use the [IO.DirectoryInfo] type's EnumerateFiles() method (PSv3+ / .NET4+) in combination with a foreach statement. Tip of the hat to wOxxOm.

    • EnumerateFiles() enumerates files one at a time, keeping memory use constant, similar to, but faster than Get-ChildItem.

      • Caveats:

        • EnumerateFiles() invariably includes hidden files, whereas Get-ChildItem excludes them by default, and only includes them if -Force is specified.

        • EnumerateFiles() is unsuitable if there's a chance of encountering inaccessible directories due to lack of permissions, because even if you enclose the entire foreach statement in a try / catch block, you'll only get partial output, given that the iteration stops on encountering the first inaccessible directory.

        • The enumeration order can differ from that of Get-ChildItem.

    • PowerShell's foreach statement is much faster than the ForEach-Object cmdlet, and also faster than the PSv4+ .ForEach() collection method.

  • Invoke the .Delete() method directly on each [System.IO.FileInfo] instance inside the loop body.

Note: For brevity, there are no error checks in the function below, such as for whether $numDays has a permissible value and whether $dir refers to an existing directory (if it's a path based on a custom PS drive, you'd have to resolve it with Convert-Path first).

function delOld($dir, $numDays) {
    $dtCutoff = [datetime]::Now - [timespan]::FromDays($numDays)
    # Make sure that the .NET framework's current dir. is the same as PS's:
    [System.IO.Directory]::SetCurrentDirectory($PWD.ProviderPath)
    # Enumerate all files recursively.
    # Replace $file.FullName with $file.Delete() to perform actual deletion.
    foreach ($file in ([IO.DirectoryInfo] $dir).EnumerateFiles('*', 'AllDirectories')) {
        if ($file.LastWriteTime -lt $dtCutoff) { $file.FullName }
    }
}

Note: The above simply outputs the paths of the files to delete; replace $file.FullName with $file.Delete() to perform actual deletion.
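As a rough illustration of the `foreach`-statement vs. `ForEach-Object` point above, here is a minimal timing sketch; the loop body and item count are arbitrary, and absolute numbers will vary by machine and PowerShell version:

```powershell
# Minimal sketch: time the foreach statement against the ForEach-Object cmdlet
# over the same trivial work. Only the relative difference is meaningful.
$items = 1..100000

$tStatement = Measure-Command {
    foreach ($i in $items) { $null = $i * 2 }
}

$tCmdlet = Measure-Command {
    $items | ForEach-Object { $null = $_ * 2 }
}

"foreach statement: {0:N2} ms" -f $tStatement.TotalMilliseconds
"ForEach-Object   : {0:N2} ms" -f $tCmdlet.TotalMilliseconds
```

On typical runs the statement form is noticeably faster, because the cmdlet pays per-item pipeline overhead.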

mklement0
  • @mklement0 I hadn't heard of EnumerateFiles(), and pre-generating a cutoff date makes me feel dumb for not thinking of it earlier! I'm reluctant to try foreach(), however, due to the large size of the directories I am working through. Is it not true that foreach() is really only effective when the size of the data is smaller than available memory? – Deusgiggity Jun 06 '17 at 16:02
  • @Deusgiggity: No, `foreach` is safe to use, because it processes items one at a time (similar to the `ForEach-Object` cmdlet, but unlike the `.ForEach()` collection operator, which operates on a preexisting entire collection). Since `EnumerateFiles()` also generates the file-info objects one at a time, this approach should work even with large directories. – mklement0 Jun 06 '17 at 17:31

Many of the PowerShell cmdlets are slower than their .NET equivalents. You could, for example, make a call to [System.IO.File]::Delete($_.FullName) instead, and see if there is a performance difference. Same goes for Get-ChildItem => [System.IO.Directory]::GetFiles(...).

To do that, I would write a small script that creates two temp folders with, say, 100,000 empty test files in each. Then call each version of the function wrapped in a [System.Diagnostics.Stopwatch].

Some sample code:

$stopwatch = New-Object 'System.Diagnostics.Stopwatch'
$stopwatch.Start()

Remove-OldItems1 ...

$stopwatch.Stop()
Write-Host $stopwatch.ElapsedMilliseconds

$stopwatch.Reset()
$stopwatch.Start()

Remove-OldItems2 ...

$stopwatch.Stop()
Write-Host $stopwatch.ElapsedMilliseconds
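A minimal sketch of the suggested test-data setup; the folder names and a smaller file count here are arbitrary assumptions for illustration:

```powershell
# Sketch: create two temp folders of empty test files to benchmark against.
# The root path, folder names, and file count are arbitrary assumptions.
$root = Join-Path ([IO.Path]::GetTempPath()) 'delOld-perf-test'
foreach ($name in 'set1', 'set2') {
    $dir = Join-Path $root $name
    $null = New-Item -ItemType Directory -Path $dir -Force
    foreach ($i in 1..1000) {
        $null = New-Item -ItemType File -Path (Join-Path $dir "file$i.txt") -Force
    }
}
```

Point each candidate function at one of the folders, time it, and remove the root folder afterwards.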

Further brownie points for PowerShell: run Get-Verb in a PowerShell window and you can see the list of approved verbs. Functions in PowerShell are conventionally Verb-Noun named, so something like Remove-OldItems would fit the bill.
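For example, here is the question's function renamed to follow the Verb-Noun convention; this is a sketch with illustrative parameter names, and -WhatIf is kept so it only previews deletions:

```powershell
# The question's function under an approved-verb name.
# -WhatIf keeps this a dry run; remove it to actually delete files.
function Remove-OldItems {
    param(
        [string]$Path,
        [int]$NumDays
    )
    $cutoff = (Get-Date).AddDays(-$NumDays)
    Get-ChildItem -Path $Path -Recurse -File |
        Where-Object { $_.LastWriteTime -lt $cutoff } |
        Remove-Item -WhatIf
}
```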

briantist
Dave Knise
  • Whether an equivalent .NET method is faster depends entirely on the usage. Many PowerShell cmdlets are written to accept pipeline input and operate on multiple items, but people instead pipe to `ForEach-Object` and then call the cmdlet inside the block on each individual item. The problem with that approach is that the setup/teardown code inside the cmdlet gets run for every item, whereas if the items were piped in, it would only get run once. That's just one example of how to really slow down a cmdlet, but it all depends on context, so testing is good. – briantist Jun 05 '17 at 21:52
  • This answer doesn't mention that non-SSD disk speed (random seek + read) is *multiple orders of magnitude slower* than the difference between PS cmdlets vs .NET methods. – wOxxOm Jun 05 '17 at 21:53
  • @briantist: Agreed. OP should write the quick perf test. Never know until you try, unless you actually really do know the internals of both functions. – Dave Knise Jun 05 '17 at 21:57
  • @wOxxOm: OP didn't ask how to improve the hardware of the machine they're on. They asked how to make the *code* run faster. – Dave Knise Jun 05 '17 at 21:57
  • @DavidKnise, ugh... the correct approach is to speed up the slowest part and it's possible to use a different solution e.g. Everything's API to read disk's MFT directly and execute a query that might take just a few seconds instead of minutes. – wOxxOm Jun 05 '17 at 21:59
  • @briantist Funny you should mention ForEach, removing that was actually one of my first fixes. Also, DavidKnise and wOxxOm, I've been informed that I'm limited to using the base materials, as it doesn't seem feasible to ensure every server has access to extra APIs. – Deusgiggity Jun 05 '17 at 22:16
  • @Deusgiggity, they're not extra APIs, PowerShell is built on .NET. – Dave Knise Jun 07 '17 at 17:12

This will delete everything using parallel processing.

workflow delOld([string]$dir, [int]$numDays) {
    $timespan = New-TimeSpan -Days $numDays
    $curTime = Get-Date
    $files = Get-ChildItem $dir -Recurse -File |
        Where-Object { ($curTime - $_.LastWriteTime) -gt $timespan }
    foreach -parallel ($file in $files) {
        # Pass the path rather than the (serialized) FileInfo object:
        Remove-Item -Path $file.FullName
    }
}

delOld -dir "C:\Users\AndrewD\Downloads" -numDays 8

Now if it's a lot of folders, try this

ArcSet