I would like to delete the columns whose names contain a specific string, "Gtype.", from a tab-delimited .txt file. I have already tried this command in R: `df <- df[, -grep("Gtype.", colnames(df))]`. However, my matrix is too big (more than 13 GB), and R cannot handle it (Error: cannot allocate vector of size....).

My input file:

Log.NE122  Gtype.NE122  Log.NE144    Gtype.NE144
-0.33          AA          1.0           AB

My expected output:

Log.NE122  Log.NE144
-0.33      1.0

I am wondering whether this can be done in bash instead. Other options are welcome too.

Brian Tompsett - 汤莱恩
user3091668

2 Answers

Using awk:

awk 'NR==1{for (i=1; i<=NF; i++) if ($i ~ /Gtype/) a[i]; 
     else printf "%s%s", $i, OFS; print ""; next}
     {for (i=1; i<=NF; i++) if (!(i in a)) printf "%s%s", $i, OFS; print "" }' file
Log.NE122 Log.NE144
-0.33     1.0
anubhava
  • Hi Anubhava, I have a very similar problem. My string looks like `RT12-ABS-NSA` or `ADM_THO_CVL2000`. However, when I change the relevant part of your script to look for that string, nothing happens. Do you have any idea why? – Andy K Apr 17 '14 at 14:14
  • @AndyK: It is difficult to suggest anything without looking at your sample data and expected outcome. I suggest creating a question if possible with all the relevant details. – anubhava Apr 17 '14 at 14:34
  • Apologies, Anubhava. Your solution works. I've amended it for my purpose: `awk -F";" 'NR==1{for (i=1; i<=NF; i++) if ($i ~ /Gtype/) a[i]; else printf "%s%s", $i, OFS; print ""; next} {for (i=1; i<=NF; i++) if (!(i in a)) printf "%s%s", $i, OFS; print "" }'`, but it replaces my semicolons with spaces. – Andy K Apr 17 '14 at 14:34
  • My question is here. Would you mind having a look, please? http://stackoverflow.com/questions/23134450/remove-columns-with-string-match-bash? – Andy K Apr 17 '14 at 14:36
  • @anubhava, do you have to specify `FS` and `OFS` here? – DSTO Jul 28 '21 at 10:00
  • The default FS splits fields on runs of one or more spaces or tabs, while the default OFS is a single space. From the question, it appears that the default values would work well here. – anubhava Jul 28 '21 at 10:09
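On the `FS`/`OFS` question: with the defaults, awk joins output fields with a single space, which is why the semicolons in the comment above turned into spaces, and a tab-delimited input would likewise come out space-delimited. The `printf "%s%s", $i, OFS` loop also leaves a separator at the end of every line. A minimal sketch, assuming a tab-delimited file with a single header line, that sets both `FS` and `OFS` to a tab and joins only the kept fields (`drop_gtype_cols` is a hypothetical helper name, not part of the answer above):

```shell
# Hypothetical helper: drop_gtype_cols [FILE...]
# Assumes tab-delimited input whose first line is the header; reads the
# given files, or stdin when no file is given.
drop_gtype_cols() {
    awk -F'\t' -v OFS='\t' '
    NR == 1 {
        # Header line: remember the indices of the columns to keep.
        for (i = 1; i <= NF; i++)
            if ($i !~ /Gtype/)
                keep[++n] = i
    }
    {
        # Every line, header included: join the kept fields with OFS (a tab),
        # so no separator is left at the end of the line.
        line = ""
        for (j = 1; j <= n; j++)
            line = (j == 1 ? $keep[j] : line OFS $keep[j])
        print line
    }' "$@"
}
```

Usage: `drop_gtype_cols file > file.noGtype` (the output name is only an example). Like the original answer, it streams the file line by line, so memory use does not grow with the file size. For another delimiter, such as the semicolons in the comment above, change the `-F` and `-v OFS=` values together.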

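You can also try the `data.table` package and assign `NULL` to the columns you want to delete:

library(data.table)
dt <- data.table(df)
gtype_cols <- grep("Gtype.", names(dt), fixed = TRUE, value = TRUE)  # names of the columns to delete
dt[, (gtype_cols) := NULL]  # deletes them by reference

`data.table` tries to do most of its operations by reference, without making copies. Subsetting a `data.frame` the way you are doing it requires a copy to be made.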

A5C1D2H2I1M1N2O1R2T1
  • `setDT(df)` avoids even copying the `data.frame` into a `data.table`, because it converts it by reference :). `setDT(df)[, col_to_delete := NULL]` – Arun Apr 26 '14 at 00:57