
I have a huge plain-text file (~500 GB) on a Linux machine. I want to replace a string in the header line (the first row of the file), but every method I know seems slow and inefficient.

example file:

foo apple cat
1    2    2
2    3    4
3    4    6
...

expected output:

bar apple cat
1    2    2
2    3    4
3    4    6
...

sed:

sed -i '1s/foo/bar/g' file

-i edits the file in place, but under the hood sed writes a temporary file to disk and then replaces the original with it; the extra I/O wastes time.


vim:

ex -c '1s/foo/bar/g' -c 'wq' file

ex doesn't generate a temp file, but it loads the whole file into memory, which also wastes a lot of time.


Is there a better solution that only reads the first row into memory and writes it back to the original file? I know that the Linux head command can extract the first row very fast.
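For reference, a head/tail-style pipeline along those lines is sketched below. It still has to stream the entire file once, because changing the header's length shifts every byte after it; the new header is taken from the expected output above, and temp_file is a placeholder name.

printf 'bar apple cat\n' > temp_file
tail -n +2 file >> temp_file
mv temp_file file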

Chang Ye
  • Please add sample input and your desired output for that sample input to your question. – Cyrus Oct 08 '17 at 11:09
  • This can only be done if `foo` and `bar` are the same length (in bytes); see the sketch after these comments. Otherwise, rewriting the entire file is your only option (although it can be done in-place if the tool is smart enough). – Thomas Oct 08 '17 at 11:11
  • @Cyrus I have added the example. – Chang Ye Oct 08 '17 at 11:11
  • Possible duplicate of [Reorder lines near the beginning of a huge text file (>20G)](https://stackoverflow.com/questions/43987897/reorder-lines-near-the-beginning-of-a-huge-text-file-20g) – Cyrus Oct 08 '17 at 11:14
  • @Thomas The new string and the old one differ in length. It seems rewriting the file is the only solution, but I still wonder which command is the fastest way to do it. – Chang Ye Oct 08 '17 at 11:25
  • @ChangYe, could you please let me know if you tried my awk solution? Just curious to know how it went. – RavinderSingh13 Oct 10 '17 at 01:50
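As Thomas points out, when the replacement has exactly the same byte length as the original, the header can be patched in place without touching the rest of the file. A minimal sketch with dd, assuming the file really begins with the three bytes foo (conv=notrunc stops dd from truncating the file after writing the block):

printf 'bar' | dd of=file bs=3 count=1 conv=notrunc

With strings of different lengths this trick doesn't apply: every byte after the header would have to shift, which means rewriting the whole file.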

1 Answer


Could you please try the following awk command and let me know if it helps? I couldn't test it, as I don't have a file as huge as 500 GB. Unlike sed -i, it doesn't create a hidden temp file behind the scenes; it writes to an explicit temp_file and then renames that over the Input_file.

awk 'FNR==1{$1="bar";print;next} 1' Input_file > temp_file && mv temp_file Input_file
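One caveat with the command above: assigning to $1 makes awk rebuild the first line with single spaces between fields. If the header's original spacing matters, a variant using sub() (a sketch using the same temp-file-and-rename approach) leaves the rest of the line untouched:

awk 'FNR==1{sub(/^foo/,"bar")} 1' Input_file > temp_file && mv temp_file Input_file

Either way the whole file is still copied once; only an equal-length in-place patch avoids that.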
RavinderSingh13
  • Thank you @RavinderSingh13. This command can deal with the problem, but there isn't any improvement in speed. As Thomas suggested, there may not be a better way to do this unless I keep the string length unchanged. – Chang Ye Oct 10 '17 at 08:37