Hope all is well! I came across a bit of a tricky issue with a flat file that is exported from Oracle PBCS with some carriage return issues. End users - when inputting data into PBCS - will often press in a specific data field input screen. When the data gets exported representing a specific record with all the data elements representing that data point (intersection) - think like a SQL record - the record element where the user pressed enter causes that record to break at that point - shifting the rest of the data elements in that record to the next line. This is very bad as each record must have the same amount of elements - causing downstream issues in a mapping. In effect one unique record becomes two broken records.
I need a Powershell script that looks at the improper CR LF (Windows system) and reforms each unique record. However, the majority of the records in the flat file are fine so the code will have to be able to discern the "mostly good" from the "very bad" cases.
My flat file is pipe delimited and has a header element. The header element may not need to be considered as I am simply trying to address the fix - a solution could potentially look at the amount of property values for the header record to determine how to format broken records based off a property count using the pipe delimiter - but not sure that is necessary.
I will be honest - there are Jython scripts I tried to no avail - so I felt given that I have employed a couple Powershell scripts for other reasons in the past that I would use this again. I have a basis of a script for a csv file - but this isn't quite working.
$file = Get-Content 'E:\EPM_Cloud\Exports\BUD_PLN\Data\EXPORT_DATA_BUD_PLN.txt'
$file| Foreach-Object {
foreach ($property in $_.PSObject.Properties) {
$property.Value = ($property.Value).replace("`r","").replace("`n","")
}
}
$file|out-file -append 'E:\EPM_Cloud\Exports\BUD_PLN\Data\EXPORT_DATA_BUD_PLN_FINAL.txt'
Here are a few examples of what the before and after case would be if I could get this code to work.
This is supposed to be one record - as you see beginning with "$43K from... the user pressed enter several times. As you see it is pipe delimited - I use the numeric line numbers to show you what I mean since this isn't notepad++. The idea is this should all just be on 1.
Contract TBD|#missing|#missing|#missing|#missing|ORNL to Perform Radio-Chemical (RCA) Measurements|#missing|#missing|#missing|#missing|"$43K from above
$92,903
$14,907
The current $150K to be reprogrammed to XXX, plus another $150K from Fuel Fac for this item to be reprogrammed to RES."|#missing|#missing|#missing|"Summary|X0200_FEEBASED|No_BOC|O-xxxx-B999|xx_xxx_xx_xxx|Plan|Active|FY19|BegBalance"|COMMIT
This is what the output should look like (I have attached screenshots instead). All in 1.
Contract TBD|#missing|#missing|#missing|#missing|ORNL to Perform Radio-Chemical (RCA) Measurements|#missing|#missing|#missing|#missing|"$43K from above $92,903 $14,907 The current $150K to be reprogrammed to XXX, plus another $150K from Fuel Fac for this item to be reprogrammed to RES."|#missing|#missing|#missing|"Summary|X0200_FEEBASED|No_BOC|O-xxxx-B999|xx_xxx_xx_xxx|Plan|Active|FY19|BegBalance"|COMMIT
In other cases the line breaks just once - all defined just by how many times the user presses enter.enter image description here
As you see in the data image - you see how the line splits - this is the point of the powershell. As you see next to that screenshot image - other lines are just fine.