I am trying to split all PDF files in a folder and then move them into an output folder. The trouble is, the program I am using (PDFtk) only splits one file at a time.

This works great if there is only one file, but users frequently scan in multiple files at a time. When that happens, the PowerShell script shrugs and just moves the files to the output folder without splitting them. Am I using ForEach incorrectly?

$pdfPath = 'C:\Temp\Incoming'
$pdfoutPath = 'C:\Temp\Completed'
$pdfFile = Join-Path $pdfPath '*.pdf'
$SetsOfPages = 1
$Match = 'NumberOfPages: (\d+)'
$NumberOfPages = [regex]::Match((pdftk $pdfFile dump_data), $Match).Groups[1].Value
"{0,2} pages in {1}" -f $NumberOfPages, $pdfFile

Get-ChildItem $pdfFile | ForEach-Object {
    for ($Page=1; $Page -le $NumberOfPages; $Page+=$SetsOfPages) {
        $File = Get-Item $pdfFile
        $Range = "{1}" -f $page, [Math]::Min($Page+$SetsOfPages-1, $NumberOfPages)
        $OutFile = Join-Path $pdfoutPath ($File.BaseName + "_$Range.pdf")
        "processing: {0}" -f $OutFile
        pdftk $pdfFile cat $Range output $OutFile
    }
    Get-ChildItem $pdfoutPath '*.pdf' -Recurse | foreach {
        $new_folder_Year = Get-Date $_.LastWriteTime -Format yyyy
        $new_folder_Month = Get-Date $_.LastWriteTime -uformat %m
        $new_folder_Day = Get-Date $_.LastWriteTime -uformat %d
        $des_path = "${pdfoutPath}\${new_folder_Year}\${new_folder_Month}\${new_folder_Day}"

        if (Test-Path $des_path){ 
            Move-Item $_.FullName $des_path 
        } else {
            New-Item -ItemType Directory -Path $des_path
            Move-Item $_.FullName $des_path 
        }
    }
    Get-ChildItem $pdfPath '*.pdf' -Recurse | foreach {
        $new_folder_Year = Get-Date $_.LastWriteTime -Format yyyy
        $new_folder_Month = Get-Date $_.LastWriteTime -uformat %m
        $new_folder_Day = Get-Date $_.LastWriteTime -uformat %d
        $des_path = "${pdfoutPath}\${new_folder_Year}\${new_folder_Month}\${new_folder_Day}"

        if (Test-Path $des_path){ 
            Move-Item $_.FullName $des_path 
        } else {
            New-Item -ItemType Directory -Path $des_path
            Move-Item $_.FullName $des_path 
        }
    }
Ansgar Wiechers
Miek Pool
  • [1] why all the version tags? what version are you working with? [2] what `For-Each` --- not only do i not see any such text in your code ... but the cmdlet is `ForEach-Object` and the loop construct is `foreach ($Thing in $Collection)`. – Lee_Dailey Jan 18 '19 at 20:48
  • Part of the code [looked familiar](https://stackoverflow.com/questions/43805726/split-pdf-by-multiple-pages-using-pdftk) but that was a different task; splitting to single pages should be easier. See [this Q&A](https://stackoverflow.com/questions/6598937/set-output-location-for-pdftk-sample-pdf-burst) –  Jan 18 '19 at 23:19

1 Answer

If I interpret your attempt correctly, this script might do:

## Q:\Test\2019\01\18\SO_542610444.ps1
$pdfPath    = 'C:\Temp\Incoming'
$pdfoutBase = 'C:\Temp\Completed'
$pdfFile    = Join-Path $pdfPath '*.pdf'

Get-ChildItem $pdfFile | ForEach-Object {
    "processing: {0}" -f $_.FullName
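    # Dated target folder, e.g. C:\Temp\Completed\2018\10\27
    $pdfOutPath = Join-Path $pdfoutBase $_.LastWriteTime.ToString('yyyy\\MM\\dd')
    # -Force avoids an error when the folder already exists
    New-Item -ItemType Directory -Path $pdfOutPath -Force | Out-Null
    # pdftk replaces %03d with the zero-padded page number
    $OutFile = Join-Path $pdfOutPath ("{0}_%03d.pdf" -f $_.BaseName)
    & pdftk $_.FullName burst output $OutFile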
}

It builds the folder structure from the LastWriteTime (yyyy\MM\dd) and appends the page number, zero-padded to three digits, to the BaseName.
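
As a rough sketch of those two pieces, the helper below builds the dated folder and the `%03d` output pattern from a base name and a timestamp without calling pdftk, which makes it easy to check the paths before running a burst. The name `Get-BurstPattern` and its parameters are hypothetical and are not part of pdftk or the script above; the default output root is assumed from the question.

```powershell
# Hypothetical helper: builds the pdftk burst output pattern for one file.
function Get-BurstPattern {
    param(
        [string]  $BaseName,
        [datetime]$LastWriteTime,
        [string]  $OutBase = 'C:\Temp\Completed'   # assumed output root
    )
    # '\\' in a .NET custom date format is an escaped, literal backslash.
    $datedPart = $LastWriteTime.ToString('yyyy\\MM\\dd')
    # [IO.Path]::Combine only joins strings; the drive does not need to exist.
    $datedDir = [System.IO.Path]::Combine($OutBase, $datedPart)
    # %03d is left untouched here; pdftk substitutes the page number later.
    [System.IO.Path]::Combine($datedDir, ("{0}_%03d.pdf" -f $BaseName))
}

Get-BurstPattern -BaseName 'cc.18.01' -LastWriteTime ([datetime]'2018-10-27 14:17')
```

The returned string is what would be passed to pdftk after `output`; the folder itself still has to be created before the burst runs.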

With this input file:

> gci .\Temp\Incoming\

    Directory: C:\Temp\Incoming

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----       2018-10-27     14:17       17540001 cc.18.01.pdf

Sample tree after running the script:

tree /F
C:.
└───Temp
    ├───Completed
    │   └───2018
    │       └───10
    │           └───27
    │                   cc.18.01_001.pdf
    │                   cc.18.01_002.pdf
%<...snip...>%
    │                   cc.18.01_155.pdf
    │                   cc.18.01_156.pdf
    │
    └───Incoming
            cc.18.01.pdf
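
If the page count from the question is still needed, one option is to parse the `dump_data` output for each file inside the loop rather than once for the wildcard path. The sketch below reuses the question's regex; the function name `Get-PageCount` is hypothetical, and the sample lines are illustrative input in the `NumberOfPages: N` shape that regex expects, not real pdftk output.

```powershell
# Hypothetical helper: extracts the page count from pdftk dump_data text.
function Get-PageCount {
    param([string[]]$DumpData)
    # Join the lines so the regex sees a single string.
    $m = [regex]::Match(($DumpData -join "`n"), 'NumberOfPages: (\d+)')
    if ($m.Success) { [int]$m.Groups[1].Value } else { 0 }
}

# Illustrative input only; in the real loop this would come from:
#   pdftk $_.FullName dump_data
$sample = 'InfoKey: Creator', 'InfoValue: Scanner', 'NumberOfPages: 156'
Get-PageCount -DumpData $sample
```

Inside `ForEach-Object` the current file is `$_`, so the call would use `$_.FullName` rather than the wildcard `$pdfFile`.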