0

Kind of have a strange problem. I have a large JSON file that needs to be processed. Based on another question I need to stream the file since it will otherwise gets me problems because of memory: JSON Powershell memory issue

What I have is this:

get-content -Path largefile.json | ForEach-Object {
$row = $_ = $_.TrimStart('[').TrimEnd(']')
if ($_) { $_ | Out-String | ConvertFrom-Json }
New-Item -Path $($Row.Id).txt
Set-Content -Path $($Row.Id).txt -Value ($row.Body)
}

I can easily do $row to publish the last processed row in the Largefile.json. I want to create a file with the name of the Id in the row that is currently processed and add the body column to the file. But when I want to show a specific column using $row.Id, unfortunately this shows up empty.

The structure of the Largefile.json is as followed:

[{"Id":"1","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"data1"}
{"Id":"2","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"data2"}
{"Id":"3","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"data3"}
{"Id":"4","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"data4"}
{"Id":"5","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"data5"}
]

The end result should be that I have 5 files:

  • 1.txt - Value inside the file should be: data1

  • 2.txt - Value inside the file should be: data2

  • 3.txt - Value inside the file should be: data3

  • 4.txt - Value inside the file should be: data4

  • 5.txt - Value inside the file should be: data5

I use Powershell 7.1.3

Is there any way that I can use $row.Id and $row.ParentId just like a regular ForEach would do?

thanks for your help.

vulkoek
  • 23
  • 5
  • Ok, it took me a while to understand where you are looking for but I think this is the answer: In contrast to the [`ForEach` **statement**](https://learn.microsoft.com/powershell/module/microsoft.powershell.core/about/about_foreach) , the [`ForEach-Object` (alias `ForEach`) **cmdlet**](https://learn.microsoft.com/powershell/module/microsoft.powershell.core/foreach-object) has an ***automatic variable***: [**`$_`** or **`$PSItem`**](https://learn.microsoft.com/powershell/module/microsoft.powershell.core/about/about_automatic_variables) that represents the current item (`$Row`) – iRon May 04 '21 at 13:37
  • I know, but when I run $_.Id it should come up with Id only but it just shows empty – vulkoek May 04 '21 at 13:43
  • If I am still missing the point, please use the sample data in my answer below (with unique ids, etc), define the statement you would "regularly" use and what you would expect exactly as an outcome based on the example data. In other words, create an [mcve] in your question that looks like the actually situation. – iRon May 04 '21 at 13:43
  • I get invalid array for the json. It's missing commas. The code has 'invalid json primitive errors' as well. – js2010 May 04 '21 at 13:44

4 Answers4

1

The question has many errors. Assuming the json has the missing commas in, I would do it this way, if I understand the question. This should work with the new updates to the question. I also have a more unusual solution involving streaming json with jq here: Iterate though huge JSON in powershell Json streaming support may be added later: ConvertFrom-JSON high memory consumption #7698

[{"Id":"ID","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"*******"},
 {"Id":"ID","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"*******"},
 {"Id":"ID","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"*******"},
 {"Id":"ID","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"*******"},
{"Id":"ID","ParentId":"parent","Name":"filename","OwnerId":"owner","CreatedDate":"date","Body":"*******"}
]
get-content -Path largefile.json | ForEach-Object {
  $_ = $_.TrimStart('[').TrimEnd(']').TrimEnd(',')
  if ($_) {
    $row = $_ | ConvertFrom-Json
    Set-Content -Path ($Row.Id + '.txt') -Value $row.Body
  }
}
get-content ID.txt

*******
js2010
  • 23,033
  • 6
  • 64
  • 66
0

I am still not sure what you expect as an outcome.
But I think you want to do this:

@'
[{"Id":"1","ParentId":"parent1","Name":"1.txt","OwnerId":"owner","CreatedDate":"date","Body":"Data1"}
{"Id":"2","ParentId":"parent2","Name":"2.txt","OwnerId":"owner","CreatedDate":"date","Body":"Data2"}
{"Id":"3","ParentId":"parent3","Name":"3.txt","OwnerId":"owner","CreatedDate":"date","Body":"Data3"}
{"Id":"4","ParentId":"parent4","Name":"4.txt","OwnerId":"owner","CreatedDate":"date","Body":"Data4"}
{"Id":"5","ParentId":"parent5","Name":"5.txt","OwnerId":"owner","CreatedDate":"date","Body":"Data5"}
]
'@ | Set-Content .\largefile.json

Get-Content .\largefile.json | ForEach-Object {
    $_ = $_.TrimStart('[').TrimEnd(']')
    If ($_) { 
        $Row = ConvertFrom-Json $_
        Set-Content -Path ".\$($Row.Name)" -Value $Row.Body
    }
}
iRon
  • 20,463
  • 10
  • 53
  • 79
  • I've added more information in the question, it shows that I want to retrieve a value to create a file. let me know if this clarifies it. And apologies for the rookie questions. – vulkoek May 04 '21 at 13:19
0

It seems to me that this is what you're looking for:

Get-Content largefile.json | ForEach-Object {
    $row = $_.TrimStart('[').TrimEnd(']') | ConvertFrom-Json
    if ($null -ne $row) {
        Set-Content -Path ($row.Id) -Value ($row.Body)
    }
}
Bacon Bits
  • 30,782
  • 5
  • 59
  • 66
  • This only works for the first row, after that I get: ConvertFrom-Json: Line | 2 | $row = $_.TrimStart('[').TrimEnd(']') | ConvertFrom-Json | ~~~~~~~~~~~~~~~~ | Conversion from JSON failed with error: Additional text encountered after finished reading JSON content: ,. Path '', line 1, position 0. It will now process the same row over and over again – vulkoek May 04 '21 at 13:40
  • @vulkoek The error message indicates that the row has a trailing comma. Your example data is not formatted that way. You *may* just need to use `$_.TrimStart('[').TrimEnd(']').TrimEnd(',')`, but because I don't have a valid example of your JSON input this is a total shot in the dark. – Bacon Bits May 04 '21 at 13:49
0

As others already explained, your json example is invalid.

However, since this is a huge file to process, you can use switch for this.

switch -Regex -File D:\Test\largefile.json {
    '"Id":"(\d+)".*"Body":"(\w+)"' { 
        Set-Content -Path ('D:\Test\{0}.txt' -f $matches[1]) -Value $matches[2]
    }
}

Results using your example would be 5 files called 1.txt .. 5.txt, each having a single line data1 .. data5

Theo
  • 57,719
  • 8
  • 24
  • 41