I am a vim user and know some basic awk and bash commands. I have a text (VCF) file larger than 20 GB. What I want is to move line #69 to just below line #66:

$less huge.vcf
...
    66 ##contig=<ID=9,length=124595110>
    67 ##contig=<ID=X,length=171031299>
    68 ##contig=<ID=Y,length=91744698>
    69 ##contig=<ID=MT,length=16299>
...

What I want is:

...
    66 ##contig=<ID=9,length=124595110>
    67 ##contig=<ID=MT,length=16299>
    68 ##contig=<ID=X,length=171031299>
    69 ##contig=<ID=Y,length=91744698>
...

I tried to open and edit it using vim (with the LargeFile plugin installed), but it still does not work very well.

codeforester
David Z
  • Just moving content around inside a small subset of the file, not changing the length of that section? That's very nice -- means you can actually do this efficiently! – Charles Duffy May 15 '17 at 20:06
  • (By contrast, adding new content at the beginning of a large file or deleting content in a way that modifies the overall file's length is only possible when rewriting the entire file after the spot where modifications take place if you're limited to standard UNIX syscalls. Modern Linux has some extensions that let you insert and remove sections matching block/page size -- typically, 4kb chunks -- at exact block/page boundaries when using a filesystem with the appropriate extensions, but that's typically only of limited use). – Charles Duffy May 15 '17 at 20:13
  • Part of the problem with `vim` is it tries to recalculate line numbers after an edit. I'm too lazy to create a 20GB file, but using an `ex` command like `:69m66` might help. – chepner May 16 '17 at 11:25

2 Answers

The easy approach is to copy the section you want to edit out of your file, modify it in-place, then copy it back in.

# extract the first hundred lines
head -n 100 huge.txt >start.txt

# modify that extracted subset
vim start.txt

# copy that section back into the beginning of larger file
dd if=start.txt of=huge.txt conv=notrunc

Note that this only works if your edits don't change the size of the section being modified. That is to say -- make sure that start.txt has the exact same size in bytes after being modified that it had before.
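The size check can be automated before anything is written back. The following is only a sketch under stated assumptions: the file name demo.vcf, its six-line contents, and the temporary directory are hypothetical stand-ins for the real 20 GB file, and the edit is done with awk instead of an interactive vim session so the example runs unattended.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical small demo file standing in for the huge VCF.
workdir=$(mktemp -d)
big="$workdir/demo.vcf"
printf '%s\n' '##contig=<ID=9>' '##contig=<ID=X>' '##contig=<ID=Y>' \
              '##contig=<ID=MT>' '#CHROM POS' '1 100' > "$big"

# Extract the region to edit (4 lines here; the answer above uses 100).
head -n 4 "$big" > "$workdir/start.txt"
size_before=$(wc -c < "$workdir/start.txt" | tr -d '[:space:]')

# Reorder: hold lines 2-3 back and emit them right after line 4.
awk 'NR>=2 && NR<=3 {buf = buf $0 ORS; next} {print} NR==4 {printf "%s", buf}' \
    "$workdir/start.txt" > "$workdir/edited.txt"
size_after=$(wc -c < "$workdir/edited.txt" | tr -d '[:space:]')

# Only overwrite the start of the big file if the byte count is unchanged.
if [ "$size_before" -eq "$size_after" ]; then
    dd if="$workdir/edited.txt" of="$big" conv=notrunc 2>/dev/null
else
    echo "size changed ($size_before -> $size_after); not writing back" >&2
    exit 1
fi

cat "$big"
```

Because reordering whole lines never changes the byte count, the check passes here; it exists to catch accidental edits (a stray character, a changed line ending) that would otherwise corrupt the data following the edited region.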

Charles Duffy
Here's an awk version:

$ awk 'NR>=3 && NR<=4{b=b (b==""?"":ORS) $0;next}1;NR==5 {print b}' file
...
    66 ##contig=<ID=9,length=124595110>
    69 ##contig=<ID=MT,length=16299>
    67 ##contig=<ID=X,length=171031299>
    68 ##contig=<ID=Y,length=91744698>
...

You need to change the line numbers in the code, though: 3 -> 67, 4 -> 68 and 5 -> 69, and redirect the output to a new file. If you'd like it to edit the file in place, use `-i inplace` with GNU awk.
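With those substitutions applied, the command could look like the sketch below. The 70-line demo file generated with seq is a hypothetical stand-in for the real VCF, and the temporary file names are assumptions made only so the example is runnable.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical 70-line demo input ("line 1" ... "line 70").
demo=$(mktemp)
out=$(mktemp)
seq 1 70 | sed 's/^/line /' > "$demo"

# Buffer lines 67-68, print every other line unchanged,
# and emit the buffered lines right after line 69.
awk 'NR>=67 && NR<=68 {b = b (b=="" ? "" : ORS) $0; next} 1; NR==69 {print b}' \
    "$demo" > "$out"

# Show the affected region of the new file.
sed -n '66,69p' "$out"
```

This prints line 66, line 69, line 67, line 68 in that order. Note that this approach reads and rewrites the whole file, so on a 20 GB input it needs both the time and the disk space for a full copy.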

James Brown