I have text files with data that looks like this

chrom start end gene mutation
chr1 12756 12790 DVL1 T/C
chr1 12856 12890 DVL2 ./.
chr1 12956 12990 DVL3 T/C

I need to delete all the lines that contain ./. in them. The files are around 500 lines each, so I don't need anything super efficient.

I've tried a bunch of different approaches with no success, both to cut out the "./." and to cut out the lines that don't contain "./." in the final column.

grep -v "./." input.txt > output.txt

awk '/"./."/' input.txt > tmpfile && mv tmpfile output.txt

fgrep -xv "./." input.txt > output.txt

awk -F',' '$5 !~ "./." {print $0}' input.txt > output.txt

awk 'BEGIN { OFS=FS="\t" } $5 !~ /^('./.')/' input.txt > output.txt

awk '!"./." ' input.txt > output.txt

sed -i '"./."d' input.txt > output.txt

I feel like I'm close but just can't see what I'm missing. Any help is appreciated.

kellogg76
  • `grep -v '\./\.'`, escape the dots. Or use `grep -vF './.'`. See https://ideone.com/IcjM5Z – Wiktor Stribiżew Jun 10 '21 at 13:00
  • @WiktorStribiżew, the OP wants to exclude those lines; the proposed dupe is about finding the lines with dots, IMHO. – RavinderSingh13 Jun 10 '21 at 13:02
  • @RavinderSingh13 The only issue is escaping a special character, the dot, or using the `-F` option. Everything is covered in that post. In general, this is a dupe of [What special characters must be escaped in regular expressions?](https://stackoverflow.com/questions/399078/what-special-characters-must-be-escaped-in-regular-expressions), but I tried to find a `grep`-oriented post here – Wiktor Stribiżew Jun 10 '21 at 13:03
  • @WiktorStribiżew, I agree that some things are covered, but when we have a full-fledged answer, why go for a partial dupe? I am fine with marking this as a dupe if we use an exact dupe. – RavinderSingh13 Jun 10 '21 at 13:04
  • @kellogg76, with `awk`, you can use string comparison instead of regex `'$NF != "./."'` – Sundeep Jun 10 '21 at 13:05
  • This `both to cut out the "./." and to cut out the lines that don't contain "./." in the final column` means don't print the line if there is `./.` in it right? Regardless of it being in the last column, and the result should be the 1st and the 3rd line? – The fourth bird Jun 10 '21 at 13:14
  • With your artistic rendering of what the input text looks like, we can't tell which of these is correct. `awk -F ','` would work if the input is comma-separated; `awk -F '\t'` would be correct if it's tab-separated. Either of those should have worked if you had used a correct regex, but we can't tell without further details. – tripleee Jun 10 '21 at 13:27
  • You showed us all the things you tried, but you didn't explain what didn't work. – Andy Lester Jun 10 '21 at 13:48
  • [edit] your question to show concise, testable, plain-text sample input and expected output so we can help you. No images, no links, no artistic tables, just raw text that we can copy/paste as-is to test a potential solution with. – Ed Morton Jun 10 '21 at 15:08

2 Answers

Use grep with the -v option, which prints only the lines that do not match the given pattern. Escape the dots so they match literal dots:

grep -v '\./\.' Input_file

Or, in awk, try the following, which skips lines whose last field is exactly ./. and prints everything else:

awk '$NF=="./."{next} 1' Input_file
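As a quick check, here is a minimal sketch that rebuilds the question's sample in a temporary file and runs both commands on it. The tab delimiter and the variable names (`sample`, `grep_out`, `awk_out`) are assumptions for illustration; the question does not say how the columns are separated.

```shell
# Recreate the sample from the question in a temporary file.
# Tab-separated columns are an assumption; the question does not say.
sample=$(mktemp)
{
  printf 'chrom\tstart\tend\tgene\tmutation\n'
  printf 'chr1\t12756\t12790\tDVL1\tT/C\n'
  printf 'chr1\t12856\t12890\tDVL2\t./.\n'
  printf 'chr1\t12956\t12990\tDVL3\tT/C\n'
} > "$sample"

# Drop every line that contains a literal "./." anywhere.
grep_out=$(grep -v '\./\.' "$sample")

# Drop every line whose last whitespace-separated field is exactly "./.".
awk_out=$(awk '$NF=="./."{next} 1' "$sample")

# Show what survives the grep filter, then clean up.
printf '%s\n' "$grep_out"
rm -f "$sample"
```

On this sample both commands keep the header and the DVL1 and DVL3 rows. They differ only if ./. can appear outside the last column, in which case the awk version is the stricter test.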
RavinderSingh13

We can't tell from your input data what your input file looks like. If it's tab-delimited, tell Awk to split on a tab:

awk -F '\t' '$5 != "./."' input.txt >output.txt

If you have a comma-delimited input file, the corresponding command would look like

awk -F ',' '$5 != "./."' input.txt >output.txt

The != string inequality operator is simpler to use than a regular expression here. We are simply saying "print lines where the fifth column is not exactly this string."

The corresponding regex would look like

awk -F ',' '$5 !~ /^\.\/\.$/' input.txt >output.txt

but you would obviously like to avoid the leaning toothpicks syndrome here.
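To see the field-specific test in action, here is a small sketch under stated assumptions: the input is tab-separated, and the last row is a hypothetical one added only for illustration, with ./. in the gene column instead of the mutation column. The output file names (`by_string.txt`, `by_regex.txt`, `by_grep.txt`) are made up as well.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Tab-separated sample (the delimiter is an assumption). The last row is
# hypothetical: it has "./." in the gene column, not the mutation column.
{
  printf 'chrom\tstart\tend\tgene\tmutation\n'
  printf 'chr1\t12756\t12790\tDVL1\tT/C\n'
  printf 'chr1\t12856\t12890\tDVL2\t./.\n'
  printf 'chr1\t12956\t12990\tDVL3\tT/C\n'
  printf 'chr1\t13056\t13090\t./.\tA/G\n'
} > input.txt

# Exact string comparison on the fifth field.
awk -F '\t' '$5 != "./."' input.txt > by_string.txt

# The same test written as an anchored regex.
awk -F '\t' '$5 !~ /^\.\/\.$/' input.txt > by_regex.txt

# For contrast: a fixed-string grep looks anywhere on the line.
grep -vF './.' input.txt > by_grep.txt

cmp -s by_string.txt by_regex.txt && echo "string and regex tests agree"
wc -l < by_string.txt
wc -l < by_grep.txt
```

The two Awk variants produce identical output and keep the hypothetical A/G row, because only the fifth field is tested. The grep variant drops that row as well, which is why it leaves one line fewer.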

In some more detail,

  • grep -v "./." is wrong because it removes any line with a slash with any character at all on either side, which includes every T/C line in your sample. You can fix this by escaping the dots with a backslash or character class; grep -v '\./[.]' demonstrates both. This is still wrong in that it looks for the pattern anywhere, not just in the last field; but if you don't expect matches in other fields, maybe that's good enough.

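  • awk '/"./."/' is presumably meant to look for a literal double quote, any character, a slash, any character, and another literal double quote, which your data never contains. On top of that, the unescaped slash ends the regex early, so Awk will most likely reject the script with a syntax error.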

  • fgrep -xv "./." is otherwise good, but the -x option limits the expression to only match lines which contain nothing but the pattern, so no line in your data is ever removed.

  • awk -F',' '$5 !~ "./." {print $0}' would work for a comma-delimited file if you fix the regex. The { print $0 } is redundant but harmless.

  • awk 'BEGIN { OFS=FS="\t" } $5 !~ /^('./.')/' holds some promise for tab-delimited files, but the regex is hopelessly botched. The single quotes inside the regex are wrong, but will happily not break the syntax of the script; they will basically disappear before Awk processes the script because quotes are handled by the shell ... Long story short, read up on shell quoting.

  • awk '!"./." ' will not do anything; it says to print if the static string in the condition is empty, which it isn't.

  • sed -i '"./."d' input.txt > output.txt is wrong because the -i option will make changes to input.txt and not print anything to standard output; the regex is flawed both because of the quoting problems and because it needs to be surrounded by valid regex delimiters. sed '/\.\/\./d' input.txt > output.txt would work similarly to the first grep example above.
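The grep and sed points above can be checked with a short sketch. The three data rows come from the question (space-separated here, which is an assumption), the variable names are made up for illustration, and `grep -F` is used as the current spelling of `fgrep`.

```shell
# The three data rows from the question, one per line.
sample=$(printf '%s\n' \
  'chr1 12756 12790 DVL1 T/C' \
  'chr1 12856 12890 DVL2 ./.' \
  'chr1 12956 12990 DVL3 T/C')

# Unescaped dots match any character, so "T/C" matches "./." as well.
# grep exits non-zero when it prints nothing, hence the "|| true".
unescaped=$(printf '%s\n' "$sample" | grep -v './.' || true)

# -F treats the pattern as a fixed string: only the DVL2 row is removed.
fixed=$(printf '%s\n' "$sample" | grep -vF './.')

# sed with proper regex delimiters and escaped dots does the same.
sed_out=$(printf '%s\n' "$sample" | sed '/\.\/\./d')

# -x requires the whole line to equal "./.", so nothing is removed.
whole=$(printf '%s\n' "$sample" | grep -Fxv './.')

printf 'unescaped kept: [%s]\n' "$unescaped"
printf 'fixed-string kept:\n%s\n' "$fixed"
```

With these rows the unescaped pattern removes everything, the fixed-string and sed versions keep the two T/C rows, and the -x version keeps all three.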

tripleee