0

I'm new in code so sorry if may question seems dummy

so I loop on csv datas to build an xml

my issue is that i can only pu 5000 files max in an xml and i need to put more than an million? So i'll to make multiple xml with 5000 files in each The first part of my code is done. Now i need to create a counter who would say in my first "for each" loop that as soon as i've reached 5000 lines (and so 5000 documents) i need to return to the beginnig of my first treatment I can't find a way to produce my multiple xml

do/while do/until and the way to produce it

Here is an example of my csv

    UCB63_DATENUM;U6618_FILENAME;UF6E8_CANAL;U65B8_IDRP
    7/8/19 22:27;457E6659_ZN_LIQRLVPR_A_V_ML.pdf;ML;1367091
    9/11/19 23:03;49453878_ZN_LIQRLVPR_A_V_ML.pdf;ML;106440
    9/24/19 21:04;497E585B_ZN_LIQRLVPR_A_V_CS.pdf;CS;1536658
    2/12/20 22:12;58453B75_ZN_LIQRLVPR_A_V_ML.pdf;ML;1406091

So i thought i could count the line of the first column and as soon as i reach 5000 I call my first loop

If you need further information just ask, i've got no idea how to do

Ok so i will put you the whole code

    $FREQUENCE_DECOMPTE = 'Index Nom="FREQUENCE_DECOMPTE" Valeur="MENS"'
    
    $LIBELLE_ORGANISME = 'Index Nom="LIBELLE_ORGANISME" Valeur="HUMANIS CCN OG"'
    
    $MONTANT_TOTAL = 'Index Nom="MONTANT_TOTAL" Valeur="0"'
    
    $POLE = 'Index Nom="POLE" Valeur="1ADP"'
    
    $CODE_ORGANISME = 'Index Nom="CODE_ORGANISME" Valeur="1ADP"'
    
    $RecupDateFinTraitement = Get-Date
    
    $2ndDate = $recupdatefintraitement.AddDays(30)
    
    $liste = Import-Csv -path C:\Temp\test.csv -Delimiter ';'
     
    Import-Csv -path C:\Temp\test.csv -Delimiter ';'
    
    $boucle = foreach ($list in $liste)
    
    {
    
        $list.U6618_FILENAME
    
        $FREQUENCE_DECOMPTE
    
        $LIBELLE_ORGANISME
    
        $MONTANT_TOTAL
    
        $list.UCB63_DATENUM
    
        $POLE
    
        $CODE_ORGANISME
    
        $list.U6618_FILENAME
    
        $list.U65B8_IDRP
    
        $RecupDateFinTraitement.ToString('dd/MM/yyyy')
    
        $2ndDate.ToString('dd/MM/yyyy')
    
        $ALERTEMAIL.UF6E8_CANAL
     
        $fin = $list.UF6E8_CANAL -match "ML"
    
        if ($list.UF6E8_CANAL -match "ML"){1}
    
        else {0}
    
        **## Compteur
    
        $Compte = Get-Content C:\Temp\test.csv | Measure-Object -Line
        $compte.lines
        if ($Compte.Lines -gt 2)
        {
             do{$boucle} until ($Compte -gt 5) 
        }**
     
    
    }
    
    $boucle | Out-File -FilePath C:\Temp\test2.txt

I don't think explainig the xml would help i just want to lool inside the first loop in order to relaunche the first loop called $boucle if i reach 5000 documents in the first output file (or maybe easier as soon as i reach the 5000th line of the csv) I still have work on the shape of the xml as you see by the name of the output file

Regards

Jonathon Anderson
  • 1,162
  • 1
  • 8
  • 24

2 Answers2

1

It sounds like you want to do 2 things:

  • Split an array of 1,000,000+ records from a csv file into batches of 5000
  • Create an xml document for each batch of 5000 records

We can do the first bit with a helper function something like this:

function ConvertTo-Batches
{
    param(
        [Parameter(ValueFromPipeline=$true)]
        $InputArray,
        $Size
    )
    BEGIN {
        $batch = @();
        $counter = 1;
    }
    PROCESS {
        $batch += $_;
        if( $batch.Length -eq $Size )
        {
            write-output @(, $batch);
            $batch = @();
            $counter += 1;
        }
    }
    END {
        if( $batch.Length -gt 0 )
        {
            write-output @(, $batch);
        }
    }
}

which you can use like this:

@( 1, 2, 3, 4, 5 ) | ConvertTo-Batches -Size 2 | ForEach-Object { write-host $_ }
# batches = 
# 1 2
# 3 4
# 5

Then all you need to do is pass each batch into a second function which outputs an xml document for that batch:

function Convert-BatchToXmlDocument
{
    param(
        [Parameter(ValueFromPipeline=$true)]
        $Batch
    )
    BEGIN {
        $counter = 1;
    }
    PROCESS {
        $filename = "myfile-$counter.xml";
        # ... convert a batch to xml here ...
        $counter += 1;
    }
}

Putting it together, your code becomes something like this:

$csv = @"
UCB63_DATENUM;U6618_FILENAME;UF6E8_CANAL;U65B8_IDRP
7/8/19 22:27;457E6659_ZN_LIQRLVPR_A_V_ML.pdf;ML;1367091
9/11/19 23:03;49453878_ZN_LIQRLVPR_A_V_ML.pdf;ML;106440
9/24/19 21:04;497E585B_ZN_LIQRLVPR_A_V_CS.pdf;CS;1536658
2/12/20 22:12;58453B75_ZN_LIQRLVPR_A_V_ML.pdf;ML;1406091
"@

$csv | ConvertFrom-Csv -Delimiter ";" | ConvertTo-Batches -Size 3 | Convert-BatchToXmlDocument
mclayton
  • 8,025
  • 2
  • 21
  • 26
  • I'm not sure this is the way i'd wouldlike to do it – Hadrien Beaujean Jul 03 '20 at 09:42
  • I will keep your solution as my last solution but it is not appropriate in my case This seems quite complicate for me – Hadrien Beaujean Jul 03 '20 at 09:43
  • i'd would like simply to say to my program do the loop unti line 5000 is reach. Then reloop in the first loop until the last line of my csv (or the first empty line if you rather) – Hadrien Beaujean Jul 03 '20 at 09:46
  • In order to give you a simpler answer, we need to understand what you consider to be simple. Different people have different ideas about simplicity when they first begin to code. – Walter Mitty Jul 04 '20 at 10:55
0

Not sure of course how you would like to format your XML, but the below code should give you a way to do it.

$FREQUENCE_DECOMPTE     = 'Nom="FREQUENCE_DECOMPTE" Valeur="MENS"'
$LIBELLE_ORGANISME      = 'Nom="LIBELLE_ORGANISME" Valeur="HUMANIS CCN OG"'
$MONTANT_TOTAL          = 'Nom="MONTANT_TOTAL" Valeur="0"'
$POLE                   = 'Nom="POLE" Valeur="1ADP"'
$CODE_ORGANISME         = 'Nom="CODE_ORGANISME" Valeur="1ADP"'

# read the csv file
$liste = Import-Csv -path 'D:\Test\test.csv' -Delimiter ';'

# get the total remaining records to process
$remaining = $liste.Count

# set a maximum value of items for each resulting XML file
$maxItemsPerXml = 3

# set a xml output file counter and an item index counter
$xmlFileCount = 1
$currentItem  = 0
# loop through all items
while ($remaining -gt 0) {
    $itemCount = [math]::Min($maxItemsPerXml, $remaining)
    $xmlItems = for ($i = 0; $i -lt $itemCount; $i++) {
        $item = $liste[$i + $currentItem]
        $fin  = if ($item.UF6E8_CANAL -eq "ML") {1} else {0}
        # parse the date from the 'UCB63_DATENUM' field
        $date = [datetime]::ParseExact($item.UCB63_DATENUM, 'M/d/yy HH:mm', $null)
        $RecupDateFinTraitement = $date.ToString('dd/MM/yyyy')
        $RecupDateFin30         = $date.AddDays(30).ToString('dd/MM/yyyy')

        # output each item in xml-style
        # you can change the element names to whatever you want of course
@"
    <Item>
        <U6618_FILENAME>$($item.U6618_FILENAME)</U6618_FILENAME>
        <Indices>
            <Index $FREQUENCE_DECOMPTE></Index>
            <Index $LIBELLE_ORGANISME></Index>
            <Index $MONTANT_TOTAL></Index>
            <Index $POLE></Index>
            <Index $CODE_ORGANISME></Index>
        </Indices>
        <UCB63_DATENUM>$($item.UCB63_DATENUM)</UCB63_DATENUM>
        <U65B8_IDRP>$($item.U65B8_IDRP)</U65B8_IDRP>
        <RecupDateFinTraitement>$RecupDateFinTraitement</RecupDateFinTraitement>
        <RecupDateFin30>$RecupDateFin30</RecupDateFin30>
        <ALERTEMAIL>$($item.UF6E8_CANAL)</ALERTEMAIL>
        <Fin>$fin</Fin>
    </Item>
"@  

    }

    # create a complete file path and name for the output xml
    $xmlFile = 'D:\Test\Test_{0:D8}.xml' -f $xmlFileCount
    # create the XML content, complete with declaration and root node and write it to file
@"
<?xml version="1.0" encoding="utf-8"?>
<root>
$($xmlItems -join "`r`n")
</root>
"@ | Set-Content -Path $xmlFile -Encoding UTF8

    # update the counters
    $xmlFileCount++
    $remaining -= $itemCount
    $currentItem += $itemCount
}

You may want to tinker with the way the XML is now built using Here-String templates. I simply took the header names from the CSV as element names, except for the indices variables you defined in all caps.

From those I took off the Index part and set that as element name.

Output using your example CSV and $maxItemsPerXml = 3, will be 2 files called Test_00000001.xml and Test_00000002.xml.

Test_00000001.xml

<?xml version="1.0" encoding="utf-8"?>
<root>
    <Item>
        <U6618_FILENAME>457E6659_ZN_LIQRLVPR_A_V_ML.pdf</U6618_FILENAME>
        <Indices>
            <Index Nom="FREQUENCE_DECOMPTE" Valeur="MENS"></Index>
            <Index Nom="LIBELLE_ORGANISME" Valeur="HUMANIS CCN OG"></Index>
            <Index Nom="MONTANT_TOTAL" Valeur="0"></Index>
            <Index Nom="POLE" Valeur="1ADP"></Index>
            <Index Nom="CODE_ORGANISME" Valeur="1ADP"></Index>
        </Indices>
        <UCB63_DATENUM>7/8/19 22:27</UCB63_DATENUM>
        <U65B8_IDRP>1367091</U65B8_IDRP>
        <RecupDateFinTraitement>08-07-2019</RecupDateFinTraitement>
        <RecupDateFin30>07-08-2019</RecupDateFin30>
        <ALERTEMAIL>ML</ALERTEMAIL>
        <Fin>1</Fin>
    </Item>
    <Item>
        <U6618_FILENAME>49453878_ZN_LIQRLVPR_A_V_ML.pdf</U6618_FILENAME>
        <Indices>
            <Index Nom="FREQUENCE_DECOMPTE" Valeur="MENS"></Index>
            <Index Nom="LIBELLE_ORGANISME" Valeur="HUMANIS CCN OG"></Index>
            <Index Nom="MONTANT_TOTAL" Valeur="0"></Index>
            <Index Nom="POLE" Valeur="1ADP"></Index>
            <Index Nom="CODE_ORGANISME" Valeur="1ADP"></Index>
        </Indices>
        <UCB63_DATENUM>9/11/19 23:03</UCB63_DATENUM>
        <U65B8_IDRP>106440</U65B8_IDRP>
        <RecupDateFinTraitement>11-09-2019</RecupDateFinTraitement>
        <RecupDateFin30>11-10-2019</RecupDateFin30>
        <ALERTEMAIL>ML</ALERTEMAIL>
        <Fin>1</Fin>
    </Item>
    <Item>
        <U6618_FILENAME>497E585B_ZN_LIQRLVPR_A_V_CS.pdf</U6618_FILENAME>
        <Indices>
            <Index Nom="FREQUENCE_DECOMPTE" Valeur="MENS"></Index>
            <Index Nom="LIBELLE_ORGANISME" Valeur="HUMANIS CCN OG"></Index>
            <Index Nom="MONTANT_TOTAL" Valeur="0"></Index>
            <Index Nom="POLE" Valeur="1ADP"></Index>
            <Index Nom="CODE_ORGANISME" Valeur="1ADP"></Index>
        </Indices>
        <UCB63_DATENUM>9/24/19 21:04</UCB63_DATENUM>
        <U65B8_IDRP>1536658</U65B8_IDRP>
        <RecupDateFinTraitement>24-09-2019</RecupDateFinTraitement>
        <RecupDateFin30>24-10-2019</RecupDateFin30>
        <ALERTEMAIL>CS</ALERTEMAIL>
        <Fin>0</Fin>
    </Item>
</root>

Test_00000002.xml

<?xml version="1.0" encoding="utf-8"?>
<root>
    <Item>
        <U6618_FILENAME>58453B75_ZN_LIQRLVPR_A_V_ML.pdf</U6618_FILENAME>
        <Indices>
            <Index Nom="FREQUENCE_DECOMPTE" Valeur="MENS"></Index>
            <Index Nom="LIBELLE_ORGANISME" Valeur="HUMANIS CCN OG"></Index>
            <Index Nom="MONTANT_TOTAL" Valeur="0"></Index>
            <Index Nom="POLE" Valeur="1ADP"></Index>
            <Index Nom="CODE_ORGANISME" Valeur="1ADP"></Index>
        </Indices>
        <UCB63_DATENUM>2/12/20 22:12</UCB63_DATENUM>
        <U65B8_IDRP>1406091</U65B8_IDRP>
        <RecupDateFinTraitement>12-02-2020</RecupDateFinTraitement>
        <RecupDateFin30>13-03-2020</RecupDateFin30>
        <ALERTEMAIL>ML</ALERTEMAIL>
        <Fin>1</Fin>
    </Item>
</root>
Theo
  • 57,719
  • 8
  • 24
  • 41