Removing entries from a file by matching a column with a column from another file

Question

I have two files, f1.txt and f2.txt. I want to remove rows from File 1 (f1.txt) if their first column has a matching entry in File 2 (f2.txt). f2.txt has only one column per line, whereas each row of f1.txt has two or more columns. Here is an example:

cat f1.txt

1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 1000
2, 100, 200, 300, 400
3, 100, 2000, 3000
4, 400, 500 
5, 500, 600, 700, 800, 900, 1000

cat f2.txt

2
4

Here is the desired output:

1, 10, 20, 30, 40
3, 100, 2000, 3000, 400
5, 500, 600, 700, 800

Read the column from f2.txt into a set, then for each line in f1.txt, split out the first column and see if it's in the set. We don't write your code, just suggest how to improve it. — tdelaney, Dec 18 '14 at 01:40
Where did the 6th and subsequent fields from lines 1 and 5 go? Where did the `400` at the end of line 3 come from? Put just a TINY bit of effort into asking the question. — Ed Morton, Dec 18 '14 at 05:03
Try this: `awk 'FNR==NR{a[$1];next} {p=1;c=+$1;for (i in a) if(c==i) p=0} p' f2.txt f1.txt` — Jotne, Dec 18 '14 at 07:29

buydadip · Answer 1 · 2014-12-18T17:35:28.760

2

Modify the pattern file f2.txt in place, like so:

sed -i -e 's/^/\^/;s/$/\\b/' f2.txt

f2.txt will look like

^2\b
^4\b
etc.

Then compare the files with grep:

grep -vf f2.txt f1.txt
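
The two steps above can be tried without touching the real files. The following is a minimal sketch, not a definitive implementation: it assumes GNU grep (for `\b` in a basic regular expression), writes the anchored patterns to a separate file whose name `patterns.txt` is made up for this example, and uses shortened sample rows plus one added row (`20, 7, 8`) to check that the key `2` does not also remove a row starting with `20`.

```shell
# Scratch directory so the real f1.txt and f2.txt are left alone.
workdir=$(mktemp -d)

# Shortened sample data; the row "20, 7, 8" is an added test case.
printf '%s\n' '1, 10, 20' '2, 100, 200' '3, 100, 2000' '4, 400, 500' '20, 7, 8' > "$workdir/f1.txt"
printf '%s\n' '2' '4' > "$workdir/f2.txt"

# Build the anchored patterns (^2\b and ^4\b) in a separate file
# (patterns.txt is a hypothetical name) instead of editing f2.txt in place.
sed -e 's/^/^/' -e 's/$/\\b/' "$workdir/f2.txt" > "$workdir/patterns.txt"

# -v keeps only the lines that match none of the patterns; -f reads one pattern per line.
result=$(grep -v -f "$workdir/patterns.txt" "$workdir/f1.txt")
printf '%s\n' "$result"

rm -r "$workdir"
```

Keeping the patterns in their own file means the command can be re-run safely; running the in-place `sed -i` twice would prepend and append the anchors a second time.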

edited Dec 18 '14 at 17:35

answered Dec 18 '14 at 03:33

Any time you write a shell loop to manipulate text you have the wrong approach. Also - what do you think will happen given your solution if f2.txt has a line with a 1 in it and f1.txt has a line that starts with, say, 10? – Ed Morton Dec 18 '14 at 05:05
@Ed Morton You're right I forgot the comma, assuming that the numbers are separated by commas. What would you suggest would be a better approach, using awk?. – buydadip Dec 18 '14 at 05:11
yes, awk is the tool that the guys who invented shell invented for shell to call to manipulate text. The whole script can be done concisely, efficiently, and robustly as just `awk -F, 'NR==FNR{a[$0];next} !($1 in a)' f2.txt f1.txt` – Ed Morton Dec 18 '14 at 05:52
@EdMorton check my edit, do you approve? It works fine if f2.txt is a single column, as OP stated. – buydadip Dec 18 '14 at 07:35
I initially said "Looks good to me." but now I realise it'll match on any field, not just the first field, so it won't work. – Ed Morton Dec 18 '14 at 14:15
@EdMorton OK, I'm sure my new solution works...its the best I can do without copying the awk solution you provided. – buydadip Dec 18 '14 at 17:36
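
For comparison, the awk one-liner Ed Morton gives in the comments can be run against the question's sample data. This is only a sketch under a few assumptions: the scratch directory and the array name `skip` are choices made for this example, and f2.txt is assumed to hold bare keys with no surrounding whitespace.

```shell
# Scratch directory holding copies of the question's sample files.
workdir=$(mktemp -d)

printf '%s\n' \
  '1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 1000' \
  '2, 100, 200, 300, 400' \
  '3, 100, 2000, 3000' \
  '4, 400, 500' \
  '5, 500, 600, 700, 800, 900, 1000' > "$workdir/f1.txt"
printf '%s\n' '2' '4' > "$workdir/f2.txt"

# While reading the first file (NR==FNR), store each line as a key.
# For the second file, print a row only if its first comma-separated
# field is not one of the stored keys.
result=$(awk -F, 'NR==FNR { skip[$0]; next } !($1 in skip)' "$workdir/f2.txt" "$workdir/f1.txt")
printf '%s\n' "$result"

rm -r "$workdir"
```

Because the lookup is an exact match on the first field, a key of `1` in f2.txt would not remove a row starting with `10`, which is the failure case raised in the comments.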
