
I want to extract a few thousand lines from a giant CSV file (~15 GB, 6 million lines), from line number X to line number Y, without using a lot of RAM.

Using Powershell 2.0 from the command line interpreter, I was able to extract the first 2000 lines with:

PS> Get-Content -TotalCount 2000 file.csv > first_lines.csv

and the last 2,000 lines (skipping the first 5,998,000), from the cmd.exe interpreter itself, with:

more +5998000 file.csv > last_lines.csv

but now I want to extract, say, from line 3,000,001 to line 3,002,000, without having to create huge new files or put too much pressure on RAM.

Thanks in advance!

  • PowerShell 2.0 was released over a decade ago, in 2009. It would be worth the effort to get onto the current Windows PowerShell 5.1 or, better yet, PowerShell Core 7.1. https://github.com/PowerShell/PowerShell – lit Jun 18 '21 at 14:23

2 Answers


The -Index parameter to Select-Object can specify the range. Note that -Index is 0-based, so each 1-based line number must be shifted down by one.

Get-Content -Path .\file.csv | Select-Object -Index (3000000..3001999)

Using variables makes it more flexible.

$x = 3000001
$y = 3002000
Get-Content -Path .\file.csv | Select-Object -Index (($x - 1)..($y - 1))
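If Python happens to be available, the same streaming extraction can be sketched with itertools.islice, which reads the file line by line and never holds more than one line in memory (the function name extract_lines and the paths are illustrative, not from the question):

```python
from itertools import islice

def extract_lines(src_path, start, stop, out_path):
    """Copy lines start..stop (1-based, inclusive) from src_path to out_path,
    streaming one line at a time so RAM usage stays flat."""
    with open(src_path, "r", encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        # islice takes 0-based, half-open bounds, so shift start down by one
        dst.writelines(islice(src, start - 1, stop))

# e.g. extract_lines("file.csv", 3000001, 3002000, "extracted_lines.csv")
```

Like the PowerShell pipeline, this still scans the file from the top to reach the starting line, but it stops reading as soon as the requested range has been written out.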
lit

What about the following:

rem // Initialise counter:
set /A "COUNT=0"
rem // Write to output file:
> "extracted_lines.csv" (
    rem // Loop through lines, skipping everything before line 3,000,001:
    for /F usebackq^ skip^=3000000^ delims^=^ eol^= %%I in ("file.csv") do (
    rem // This is an alternative way:
    rem for /F delims^=^ eol^= %%I in ('more +3000000 "file.csv"') do (
        rem // Return currently iterated line:
        echo(%%I
        rem // Increment counter:
        set /A "COUNT+=1"
        rem // Check counter state and conditionally terminate loop:
        setlocal EnableDelayedExpansion
        if !COUNT! geq 2000 goto :NEXT
        endlocal
    )
)
:NEXT

Note that blank lines are skipped by for /F, and that lines are limited to about 8190 characters/bytes.

aschipfl