I have a tab delimited data set as such (some columns removed but not relevant here):
XYZ 12345671 18884 18912 222.00 222.00 Standard 2 CMD.1
ABC 12345672 18875 397.00 Standard 1 CMD.2
ABC 12345673 18889 18919 18950 146.66 146.66 146.68 Standard 3 CMD.2
XYZ 12345674 18840 18871 18901 193.00 193.00 192.00 Standard 3 CMD.1
Unfortunately in the data set there are two multi-value fields (with 1 or more sub-values) using the same delimiter as every other column. All of the 5 digit numbers starting with 18 are one field, then all of the decimal numbers are a second field. The number of sub-values will always match between these fields (that is, if the first has 2 values the second will have 2 values). The lonae number by itself between "Standard" and "CMD.X" is the number of sub-values in each of those fields. Basically, what I'm trying to create is:
XYZ 12345671 18884;18912 222.00;222.00 Standard 2 CMD.1
ABC 12345672 18875 397.00 Standard 1 CMD.2
ABC 12345673 18889;18919;18950 146.66;146.66;146.68 Standard 3 CMD.2
XYZ 12345674 18840;18871;18901 193.00;193.00;192.00 Standard 3 CMD.1
Due to the filesize, I am trying to use this with System.IO.File (as many as 100,000k records):
$File = [System.IO.File]::ReadAllText('in_file.txt')
ForEach($RecordSet In $File){
$RecordSet = $RecordSet -CReplace '(?m)(/d/d/d/d/d)`t(/d/d/d/d/d)','(/d/d/d/d/d);(/d/d/d/d/d)'
$RecordSet = $RecordSet -CReplace '(?m)([0-9]*\.[0-9]+|[0-9]+)`t([0-9]*\.[0-9]+|[0-9]+)','([0-9]*\.[0-9]+|[0-9]+);([0-9]*\.[0-9]+|[0-9]+)'
Add-Content ('out_file.txt") $RecordSet
}
Yes, I realize the in_file.txt is a stupid format -- it's not changeable unfortunately.