
How can I read a file and count the consecutive occurrences of each string in it? My file contains the following lines:

1
1
1
0
0
0
0
1
1
1
0
1
1
0
0
0
1
0
1
1
1
0
0

I know next to nothing about PowerShell, but I do know bash, so for anybody who understands both, this is the effect I'm after:

[me@myplace aaa8]$ cat fule1|uniq -c
      3 1
      4 0
      3 1
      1 0
      2 1
      3 0
      1 1
      1 0
      3 1
      2 0

If possible, please also show the PowerShell equivalent of sort -hr :D

[me@myplace aaa8]$ cat fule1|uniq -c|sort -hr
      4 0
      3 1
      3 1
      3 1
      3 0
      2 1
      2 0
      1 1
      1 0
      1 0

Basically, this tells me that the longest streak in my file is 4 zeroes, and so on.

Is there a way to do this with PowerShell?

mklement0
Lha
  • Could be something like that: `[regex]::Matches('aaaaaaaaaaaaabbbbbbbbccc', '(.)\1+').Groups | Where-Object { $_.Length -gt 1 } | Sort-Object -Unique -Property Value` combined with `[RegexOptions]::Multiline` option for your task. `Measure-Object` command might be useful too. I'm not sure about your input data size and how fast regular expressions will work. – Rabash Mar 23 '19 at 23:50
  • @Rabash: `uniq -c` doesn't exclude single instances, so your solution won't work. In general, future readers benefit most from full-fledged answers, not (half-)solutions in comments – mklement0 Mar 24 '19 at 03:32

1 Answer


PowerShell's equivalent to the uniq utility, the Get-Unique cmdlet, unfortunately has no equivalent to the former's -c option for prepending the number of consecutive duplicate lines (as of PowerShell v6.2).

Note: Enhancing Get-Unique to support a -c-like feature and other features offered by the uniq POSIX utility is the subject of this feature request on GitHub.

Therefore, you must roll your own solution:

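function Get-UniqueWithCount {

  begin {
    # Length of the current run and the line it consists of.
    $instanceCount = 1; $prevLine = $null
  }

  process {
    if ($_ -eq $prevLine) {
      # Same line as before: extend the current run.
      ++$instanceCount
    } elseif ($null -ne $prevLine) {
      # A different line: output the completed run and start a new one.
      [pscustomobject] @{ InstanceCount = $instanceCount; Line = $prevLine }
      $instanceCount = 1
    }
    $prevLine = $_
  }

  end {
    # Output the final run, unless there was no input at all.
    if ($null -ne $prevLine) {
      [pscustomobject] @{ InstanceCount = $instanceCount; Line = $prevLine }
    }
  }

}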

The above function accepts input from the pipeline (object by object as $_ in the process { ... } block). It compares each object (line) to the previous one and, if they're equal, increments the instance count; once a different line is found, the previous line is output, along with its instance count, as an object with properties InstanceCount and Line. The end { ... } block outputs the final output object for the last block of identical consecutive lines. See about_Functions_Advanced.
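One detail about the comparison is worth keeping in mind: PowerShell's -eq operator compares strings case-insensitively by default, whereas uniq is case-sensitive unless you pass -i. For input consisting of 0s and 1s this makes no difference, but for general text you may prefer -ceq in the function. A minimal illustration (the variable names are just for this example):

```powershell
# -eq ignores case when comparing strings; -ceq does not.
$insensitive = 'abc' -eq 'ABC'    # $true
$sensitive   = 'abc' -ceq 'ABC'   # $false
```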

Then invoke it as follows:

Get-Content fule1 | Get-UniqueWithCount

which yields:

InstanceCount Line
------------- ----
            3 1
            4 0
            3 1
            1 0
            2 1
            3 0
            1 1
            1 0
            3 1
            2 0
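If you would rather have plain text lines that resemble the uniq -c output from the question, the -f format operator can right-align the count. The following is only a sketch: the two hand-made objects in $runs stand in for what Get-UniqueWithCount outputs, and the field width of 7 is an assumption chosen to match the sample output in the question.

```powershell
# Stand-in for: $runs = Get-Content fule1 | Get-UniqueWithCount
$runs = @(
  [pscustomobject] @{ InstanceCount = 3; Line = '1' }
  [pscustomobject] @{ InstanceCount = 4; Line = '0' }
)

# {0,7} right-aligns the count in a field 7 characters wide.
$text = $runs | ForEach-Object { '{0,7} {1}' -f $_.InstanceCount, $_.Line }
$text
```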

Since Get-UniqueWithCount conveniently outputs objects whose typed properties we can act on, the equivalent of sort -hr (compare human-readable numbers (-h), in descending (reverse) order (-r)) is easy; because InstanceCount is an integer, it is sorted numerically rather than as text:

Get-Content fule1 | Get-UniqueWithCount | Sort-Object -Descending InstanceCount

which yields:

InstanceCount Line
------------- ----
            4 0
            3 1
            3 1
            3 0
            3 1
            2 1
            2 0
            1 0
            1 1
            1 0
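If all you are after is the longest streak per distinct line, as mentioned in the question, you could group the output objects by Line and take the maximum count in each group. The following is a sketch only: $runs is recreated by hand from the output shown above rather than read from the file, and the LongestStreak property name is made up for this example.

```powershell
# Stand-in for: $runs = Get-Content fule1 | Get-UniqueWithCount
$runs = @(
  [pscustomobject] @{ InstanceCount = 3; Line = '1' }
  [pscustomobject] @{ InstanceCount = 4; Line = '0' }
  [pscustomobject] @{ InstanceCount = 3; Line = '1' }
  [pscustomobject] @{ InstanceCount = 1; Line = '0' }
  [pscustomobject] @{ InstanceCount = 2; Line = '1' }
  [pscustomobject] @{ InstanceCount = 3; Line = '0' }
  [pscustomobject] @{ InstanceCount = 1; Line = '1' }
  [pscustomobject] @{ InstanceCount = 1; Line = '0' }
  [pscustomobject] @{ InstanceCount = 3; Line = '1' }
  [pscustomobject] @{ InstanceCount = 2; Line = '0' }
)

# Group the runs by their line and keep the largest count per group.
$longest = $runs | Group-Object -Property Line | ForEach-Object {
  [pscustomobject] @{
    Line          = $_.Name
    LongestStreak = ($_.Group | Measure-Object -Property InstanceCount -Maximum).Maximum
  }
}
$longest
```

With the sample data, this reports a longest streak of 4 for 0 and of 3 for 1.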
mklement0