3

I have a text (.txt) file that looks like this:

Person    Person Name     Person   Approval     Supervisor Payroll Name    Application  Supplier Start Date  End Date Archived
Type                      Number   Status       Name                       Name


Agency    D'Cunha, Yionue 123456   NOT ENTERED  Power,                     Projects    CONTRACT
Contractor                                      Mehash                                 SUPPLIER_1
                                                                                                 10-DEC-16  16-DEC-16   No
Employee  Vughila,        132456   WORKING      Miro,      Company-abcde INPayroll               10-DEC-16  16-DEC-16   No
          Proshont                              Profal     Monthly
                                                                                                    10-DEC-16  16-DEC-16   No
Employee  Diiri, Maaor    113456   NOT ENTERED  Kargannkir,Company-abcde INPayroll
                                                Bivnath    Monthly
                                                                                                 10-DEC-16  16-DEC-16   No
Employee  Kimit, Gongobhar111111   WORKING      Chondorkor,Company-abcde INProjects              10-DEC-16  16-DEC-16   No
                                                Avissku    Monthly
Employee  Kalvornu,       110077   WORKING      Kindipur,  Company-abcde INPayroll               10-DEC-16  16-DEC-16   No
          Churali                               Barinakir  Monthly
Agency    Dhilorii,       100009   NOT ENTERED  Nook,                      Projects    CONTRACT
ContractorBohishik                              Lurukont                               SUPPLIER_2

I get this file from a report generated by a software. I'd like to parse the file and export the data to CSV. I tried this, but that didn't help because the structure of my data is so different.

Then I tried this:

$input = Get-Content "C:\Users\user.name\Desktop\GBS\text_file.txt"  

$data = $input[1..($input.Length - 1)]

$maxLength = 0

$objects = foreach ($record in $data) {
    $split = $record -split "\s{2,}|\t+"
    if ($split.Length -gt $maxLength) {
        $maxLength = $split.Length
    }
    $props = @{}
    for ($i=0; $i -lt $split.Length; $i++) {
        $props.Add([String]($i+1), $split[$i])
    }
    New-Object -TypeName PSObject -Property $props
}

$headers = [String[]](1..$maxLength)

$objects | 
    Select-Object $headers | 
    Export-Csv -NoTypeInformation -Path "C:\Users\user.name\Desktop\GBS\out.csv"

But that messed up the second line of every row. Problem is that in the raw text file, every second line is also a part of the first line. And in some cases even the third line is a part of the first row of data.

If there's any information I can provide to phrase my question better please let me know.


After @Ansgar's comments I tried this:

# read text file into single string and remove header
$rawText = Get-Content 'C:\path\to\input.txt' | Out-String

# split string into individual records
$data = $rawText -replace "`r" -split '\n\n+' | Select-Object -Skip 1

$parsedData = foreach ($record in $data) {
    $prop = @{}
    $record -split '\n' | ForEach-Object {
        $prop['PersonType'] += $_.Substring(0, 10).Trim()
        $prop['PersonName'] += $_.Substring(10, 16).Trim()
        $prop['PersonNumber'] += $_.Substring(26, 9).Trim()
        $prop['ApprovalStatus'] += $_.Substring(35, 13).Trim()
        $prop['Supervisor'] += $_.Substring(48, 11).Trim()
        $prop['PayrollName'] += $_.Substring(59, 16).Trim()
        $prop['ApplicationName'] += $_.Substring(75, 13).Trim()
        $prop['Supplier'] += $_.Substring(88, 9).Trim()
        $prop['StartDate'] += $_.Substring(97, 12).Trim()
        $prop['EndDate'] += $_.Substring(109, 9).Trim()
        $prop['Archived'] += $_.Substring(118, 8).Trim()
    }

    New-Object -Type PSObject -Property $prev
}

$parsedData | Export-Csv 'C:\path\to\output.txt' -NoType

But now I'm getting a blank output CSV file in my target folder. Am I getting missing something somewhere?

Community
  • 1
  • 1
  • You will be better off if you can get a hold of the raw data that the report was made from. That way, you won't have to undo what the report generator did to the data. The report generator is probably trying to save space in the report, and to make it more human readable. Both of these two goals are getting in your way. – Walter Mitty Dec 19 '16 at 15:00
  • Hi Walter, thank-you for the prompt response. That's one of the main issues. I can't get a hold of the report or modify the report in any way. It generates these reports and gives me a weekly dump of the file in .txt format. I've simply copy-pasted a part of the data in the .txt file here. – Rurojnikenshin_183 Dec 19 '16 at 15:17
  • I just realized that your individual records are not separated by blank lines, so my suggestion wouldn't work at all. Also, it looks like arbitrary fields can have values in the next line or be completely empty. What you need to do first is define criteria by which you can programmatically distinguish new from continued records, otherwise this is never going to work. – Ansgar Wiechers Dec 19 '16 at 19:09

1 Answers1

0

I've got a solution, but ...
It uses two splits, the first takes the words (Person|Agency|Employee)
to split records (there are flaws which required the if),
the second splits at newlines and parses offset+length afterwards.
This also isn't perfect due to inconsistencies in the sample data.

$InFile = 'Q:\Test\2016-12\19\41225200.txt'
$OutFile= 'C:\path\to\output.txt'

$Delimiter = '(Person|Agency|Employee)'
#'$Escaped   = [regex]::Escape($Delimiter)
$Split     = "(?!^)(?=$Delimiter)"

$parsedData = (Get-Content $InFile -Raw) -split $Split | 
    ForEach-Object {
        $prop = @{}
        If ($_.Length -ge 30 ) {
            ForEach ($Line in $_.split("`n")) {
                $Line+=" "*130
                $prop['PersonType']      += $Line.Substring( 0, 10).Trim()
                $prop['PersonName']      += $Line.Substring(10, 16).Trim()
                $prop['PersonNumber']    += $Line.Substring(26,  9).Trim()
                $prop['ApprovalStatus']  += $Line.Substring(35, 13).Trim()
                $prop['Supervisor']      += $Line.Substring(48, 11).Trim()
                $prop['PayrollName']     += $Line.Substring(59, 16).Trim()
                $prop['ApplicationName'] += $Line.Substring(75, 12).Trim()
                $prop['Supplier']        += $Line.Substring(87, 10).Trim()
                $prop['StartDate']       += $Line.Substring(97,  9).Trim()
                $prop['EndDate']         += $Line.Substring(108, 9).Trim()
                $prop['Archived']        += $Line.Substring(117, 8).Trim()
            }
        }
        New-Object -TypeName PSObject -Property $prop
}
$parsedData

Output

Supervisor      : ApplicatioName
ApplicationName : t Date End DName
Archived        :
PersonType      : Person   AType
PersonName      : pproval     Supe
Supplier        : ate Archiv
StartDate       : ed
ApprovalStatus  : yroll NameStatus
PayrollName     : n Supplier  Star
PersonNumber    : rvisor PaNumber
EndDate         :


Supervisor      : Power,Mehash
ApplicationName : Projects
Archived        : No
PersonType      : AgencyContractor
PersonName      : D'Cunha, Yionue
Supplier        : CONTRACTSUPPLIER_1
StartDate       : 10-DEC-16
ApprovalStatus  : NOT ENTERED
PayrollName     :
PersonNumber    : 123456
EndDate         : 16-DEC-16


Supervisor      : Miro,Profal
ApplicationName : Payroll
Archived        : NoNo
PersonType      : Employee
PersonName      : Vughila,Proshont
Supplier        :
StartDate       : 10-DEC-1610-DEC-16
ApprovalStatus  : WORKING
PayrollName     : Company-abcde INMonthly
PersonNumber    : 132456
EndDate         : 16-DEC-1616-DEC-16

My try to export-csv was also empty.

  • Thankyou! that was really helpful. Am still working on it though and will post an answer as soon as I find one. In the meantime, Thankyou so much! And I apologize for the late response, was out of town. – Rurojnikenshin_183 Dec 27 '16 at 09:30