I have a text (.txt) file that looks like this:
Person Person Name Person Approval Supervisor Payroll Name Application Supplier Start Date End Date Archived Type Number Status Name Name Agency D'Cunha, Yionue 123456 NOT ENTERED Power, Projects CONTRACT Contractor Mehash SUPPLIER_1 10-DEC-16 16-DEC-16 No Employee Vughila, 132456 WORKING Miro, Company-abcde INPayroll 10-DEC-16 16-DEC-16 No Proshont Profal Monthly 10-DEC-16 16-DEC-16 No Employee Diiri, Maaor 113456 NOT ENTERED Kargannkir,Company-abcde INPayroll Bivnath Monthly 10-DEC-16 16-DEC-16 No Employee Kimit, Gongobhar111111 WORKING Chondorkor,Company-abcde INProjects 10-DEC-16 16-DEC-16 No Avissku Monthly Employee Kalvornu, 110077 WORKING Kindipur, Company-abcde INPayroll 10-DEC-16 16-DEC-16 No Churali Barinakir Monthly Agency Dhilorii, 100009 NOT ENTERED Nook, Projects CONTRACT ContractorBohishik Lurukont SUPPLIER_2
I get this file from a report generated by a software. I'd like to parse the file and export the data to CSV. I tried this, but that didn't help because the structure of my data is so different.
Then I tried this:
$input = Get-Content "C:\Users\user.name\Desktop\GBS\text_file.txt"
$data = $input[1..($input.Length - 1)]
$maxLength = 0
$objects = foreach ($record in $data) {
$split = $record -split "\s{2,}|\t+"
if ($split.Length -gt $maxLength) {
$maxLength = $split.Length
}
$props = @{}
for ($i=0; $i -lt $split.Length; $i++) {
$props.Add([String]($i+1), $split[$i])
}
New-Object -TypeName PSObject -Property $props
}
$headers = [String[]](1..$maxLength)
$objects |
Select-Object $headers |
Export-Csv -NoTypeInformation -Path "C:\Users\user.name\Desktop\GBS\out.csv"
But that messed up the second line of every row. Problem is that in the raw text file, every second line is also a part of the first line. And in some cases even the third line is a part of the first row of data.
If there's any information I can provide to phrase my question better please let me know.
After @Ansgar's comments I tried this:
# read text file into single string and remove header
$rawText = Get-Content 'C:\path\to\input.txt' | Out-String
# split string into individual records
$data = $rawText -replace "`r" -split '\n\n+' | Select-Object -Skip 1
$parsedData = foreach ($record in $data) {
$prop = @{}
$record -split '\n' | ForEach-Object {
$prop['PersonType'] += $_.Substring(0, 10).Trim()
$prop['PersonName'] += $_.Substring(10, 16).Trim()
$prop['PersonNumber'] += $_.Substring(26, 9).Trim()
$prop['ApprovalStatus'] += $_.Substring(35, 13).Trim()
$prop['Supervisor'] += $_.Substring(48, 11).Trim()
$prop['PayrollName'] += $_.Substring(59, 16).Trim()
$prop['ApplicationName'] += $_.Substring(75, 13).Trim()
$prop['Supplier'] += $_.Substring(88, 9).Trim()
$prop['StartDate'] += $_.Substring(97, 12).Trim()
$prop['EndDate'] += $_.Substring(109, 9).Trim()
$prop['Archived'] += $_.Substring(118, 8).Trim()
}
New-Object -Type PSObject -Property $prev
}
$parsedData | Export-Csv 'C:\path\to\output.txt' -NoType
But now I'm getting a blank output CSV file in my target folder. Am I getting missing something somewhere?