
I am trying to just remove the first line of about 5000 text files before importing them.

I am still very new to PowerShell, so I am not sure what to search for or how to approach this. My current concept, in pseudo-code:

set-content file (get-content unless line contains amount)

However, I can't seem to figure out how to do something like contains.

– Buddy Lindsey

11 Answers


While I really admire @hoge's answer, both for its very concise technique and for the wrapper function that generalizes it (and I encourage upvotes for it), I am compelled to comment on the other two answers that use temp files (it gnaws at me like fingernails on a chalkboard!).

Assuming the file is not huge, you can force the pipeline to operate in discrete sections, obviating the need for a temp file, with judicious use of parentheses: they force Get-Content to finish reading and close the file before Set-Content opens it for writing:

(Get-Content $file | Select-Object -Skip 1) | Set-Content $file

... or in short form:

(gc $file | select -Skip 1) | sc $file
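
One caveat: when Set-Content rewrites the file, it applies its own default encoding. If you know your files' encoding, a sketch that pins it explicitly (the ASCII choice here is an assumption about your data; see also Keith Hill's comment under the next answer):

(Get-Content $file | Select-Object -Skip 1) | Set-Content $file -Encoding Ascii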
– Michael Sorens

It is not the most efficient in the world, but this should work:

get-content $file |
    select -Skip 1 |
    set-content "$file-temp"
move "$file-temp" $file -Force
– Richard Berg
  • When I try to run this it seems that it errors out on the -Skip. Could that maybe be from a different version? – Buddy Lindsey Jan 15 '10 at 20:41
  • -Skip is new to Select-Object in PowerShell 2.0. Also, if the files are all ASCII then you might want to use set-content -enc ascii. If the encodings are mixed, then it gets trickier unless you don't care about the file encoding. – Keith Hill Jan 15 '10 at 20:51

Using variable notation, you can do it without a temporary file:

${C:\file.txt} = ${C:\file.txt} | select -skip 1

function Remove-Topline ( [string[]]$path, [int]$skip=1 ) {
  if ( -not (Test-Path $path -PathType Leaf) ) {
    throw "invalid filename"
  }

  # For each matching file, build a "${full-path} = ${full-path} | select -skip N"
  # assignment in variable notation and evaluate it with Invoke-Expression
  ls $path |
    % { iex "`${$($_.fullname)} = `${$($_.fullname)} | select -skip $skip" }
}
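
A hypothetical invocation (the wildcard path and skip count are illustrative):

Remove-Topline -path .\*.txt -skip 1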
– hoge

I just had to do the same task, and gc | select ... | sc took over 4 GB of RAM on my machine while reading a 1.6 GB file. It still hadn't finished 20 minutes after it had read in the whole file (as reported by Read Bytes in Process Explorer), at which point I had to kill it.

My solution was to use a more .NET approach: StreamReader + StreamWriter. See this answer for a great discussion of the performance: In Powershell, what's the most efficient way to split a large text file by record type?

Below is my solution. Yes, it uses a temporary file, but in my case, it didn't matter (it was a freaking huge SQL table creation and insert statements file):

(measure-command{
    $i = 0
    $ins = New-Object System.IO.StreamReader "in/file/pa.th"
    $outs = New-Object System.IO.StreamWriter "out/file/pa.th"
    while( !$ins.EndOfStream ) {
        $line = $ins.ReadLine();
        # Write every line except the first (index 0)
        if( $i -ne 0 ) {
            $outs.WriteLine($line);
        }
        $i = $i+1;
    }
    $outs.Close();
    $ins.Close();
}).TotalSeconds

It returned:

188.1224443
– AASoft
  • IIRC this is because the parentheses around the gc|select mean it reads the entire file into memory before piping it through. Otherwise the open stream causes set-content to fail. For big files I think your approach is probably best – Alex Mar 15 '13 at 15:58
  • Thank you, @AASoft, for your great solution! I've allowed myself to improve it slightly by dropping the comparison operation in every loop speeding up the process by something like 25% - see [my answer](http://stackoverflow.com/a/24746158/177710) for details. – Oliver Jul 14 '14 at 21:20

Inspired by AASoft's answer, I went out to improve it a bit more:

  1. Avoid the loop variable $i and the comparison with 0 in every loop
  2. Wrap the execution into a try..finally block to always close the files in use
  3. Make the solution work for an arbitrary number of lines to remove from the beginning of the file
  4. Use a variable $p to reference the current directory

These changes lead to the following code:

$p = (Get-Location).Path

(Measure-Command {
    # Number of lines to skip
    $skip = 1
    $ins = New-Object System.IO.StreamReader ($p + "\test.log")
    $outs = New-Object System.IO.StreamWriter ($p + "\test-1.log")
    try {
        # Skip the first N lines, but allow for fewer than N, as well
        for( $s = 1; $s -le $skip -and !$ins.EndOfStream; $s++ ) {
            $ins.ReadLine()
        }
        while( !$ins.EndOfStream ) {
            $outs.WriteLine( $ins.ReadLine() )
        }
    }
    finally {
        $outs.Close()
        $ins.Close()
    }
}).TotalSeconds

The first change brought the processing time for my 60 MB file down from 5.3 s to 4 s. The rest of the changes are more cosmetic.
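
For reuse, here is the same logic wrapped as a hypothetical helper function (the name and parameters are my own, not part of the answer above):

function Skip-TopLines( [string]$inPath, [string]$outPath, [int]$skip = 1 ) {
    # Pass absolute paths: .NET resolves relative paths against its own
    # current directory, which can differ from the PowerShell location
    # (that is why $p is used above)
    $ins = New-Object System.IO.StreamReader $inPath
    $outs = New-Object System.IO.StreamWriter $outPath
    try {
        # Discard the first $skip lines, allowing for a shorter file
        for( $s = 1; $s -le $skip -and !$ins.EndOfStream; $s++ ) {
            [void]$ins.ReadLine()
        }
        while( !$ins.EndOfStream ) {
            $outs.WriteLine( $ins.ReadLine() )
        }
    }
    finally {
        $outs.Close()
        $ins.Close()
    }
}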

– Oliver
  • You may want to add `-and !$ins.EndOfStream` to the `for` loop conditional to cover the cases where the file has fewer lines than `$skip`. – AASoft Nov 10 '17 at 07:11
  • Thanks for the heads up! That makes sense :-) – Oliver Nov 10 '17 at 11:32

$x = get-content $file
$x[1..$x.count] | set-content $file

Just that much. Long boring explanation follows. Get-Content returns an array. We can "index into" array variables, as demonstrated in this and other Scripting Guys posts.

For example, if we define an array variable like this,

$array = @("first item","second item","third item")

so $array returns

first item
second item
third item

then we can "index into" that array to retrieve only its 1st element

$array[0]

or only its 2nd

$array[1]

or a range of index values from the 2nd through the last.

$array[1..$array.count]
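
One subtlety: $array.count is 3, so the range 1..$array.count runs from 1 to 3, and index 3 is one past the last valid index (2). PowerShell silently drops out-of-range indexes when slicing, so the expression still returns just the 2nd and 3rd items:

$array[1..$array.count]   # indexes 1, 2, 3; index 3 yields nothing
second item
third item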
– noam

I just learned from a website:

Get-ChildItem *.txt | ForEach-Object { (Get-Content $_) | Where-Object { (1) -notcontains $_.ReadCount } | Set-Content -Path $_ }

Or you can use the aliases to make it short, like:

gci *.txt | % { (gc $_) | ? { (1) -notcontains $_.ReadCount } | sc -path $_ }
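
This works because Get-Content stamps each line it emits with a ReadCount property holding its 1-based line number, so excluding ReadCount 1 drops the first line. A sketch of extending the alias form to skip, say, the first three lines (the range is illustrative):

gci *.txt | % { (gc $_) | ? { (1..3) -notcontains $_.ReadCount } | sc -path $_ }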
– Luke Du

Another approach to removing the first line from a file is the multiple-assignment technique:

# The first line lands in $firstLine; the remaining lines land in $restOfDocument
$firstLine, $restOfDocument = Get-Content -Path $filename
$modifiedContent = $restOfDocument
$modifiedContent | Out-String | Set-Content $filename
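
A slightly leaner sketch of the same idea: Set-Content accepts an array of lines directly, so the intermediate variable and the Out-String step (which joins the lines and appends a trailing newline) can be dropped:

$firstLine, $restOfDocument = Get-Content -Path $filename
Set-Content -Path $filename -Value $restOfDocument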
– Venkataraman R

`-Skip` didn't work for me, so my workaround is:

$LinesCount = $(get-content $file).Count
get-content $file |
    select -Last $($LinesCount-1) | 
    set-content "$file-temp"
move "$file-temp" $file -Force
– Emperor XLII

Following on from Michael Sorens's answer, if you want to edit all .txt files in the current directory and remove the first line from each:

Get-ChildItem (Get-Location).Path -Filter *.txt | 
Foreach-Object {
    (Get-Content $_.FullName | Select-Object -Skip 1) | Set-Content $_.FullName
}

For smaller files you could use this:

& C:\windows\system32\more +1 oldfile.csv > newfile.csv | out-null

... but it's not very effective at processing my example file of 16 MB: it doesn't seem to terminate and release the lock on newfile.csv.