How to delete rows from a data file for which other data files have no data

Question

Suppose that I have three tab-separated value data files: 2011.txt, 2012.txt, and 2013.txt. Each file has the same format, where rows are like this:

UserID    Data    Data    Data   ...

Each file only contains data for the year it is named after. I would like to throw out all data in these files for UserIDs that do not appear in either the preceding or the following year. That is, I only want to keep data for UserIDs that I can track for at least two years in a row. How can I go about doing this? My usual tools for manipulating data files like this are vim and simple Perl commands with regular expressions on the command line. If there is a way to do this using those tools, I'd like to do it that way, but I am open to learning new tools.

As an outline, I'm thinking:

run through each UserID in 2011.txt
  if UserID doesn't appear in 2012.txt, delete this row from 2011.txt
run through each UserID in 2012.txt
  if UserID doesn't appear in either 2011.txt or 2013.txt, delete this row from 2012.txt
run through each UserID in 2013.txt
  if UserID doesn't appear in 2012.txt, delete this row from 2013.txt

But I've never modified files in a way that accesses multiple files like this.

score 1 · Answer 1 · answered Oct 11 '14 at 06:11

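Use this script. It compares each file with the one that follows it on the command line and keeps the rows whose UserID also appears in that next file: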

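#!/bin/bash
# For each adjacent pair of files, keep the rows of the earlier file whose
# UserID (first tab-separated field) also appears in the later file.
while (( $# >= 2 )); do
  # Build one anchored pattern per UserID so that 1 does not also match 10
  # (assumes UserIDs contain no regex metacharacters)
  cut -f1 "$2" | sed 's/.*/^&([[:space:]]|$)/' > "p.$2"
  grep -E -f "p.$2" "$1" > "$1.new"
  rm -f "p.$2"
  shift
done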

Example:

$ cat 2011
1   d1  d2
2   d1  d2
3   d1  d2
4   d1  d2
5   d1  d2
6   d1  d2

$ cat 2012
1   d1  d2
3   d1  d2
4   d1  d2
6   d1  d2
7   d1  d2
8   d1  d2

$ cat 2013
1   d1  d2
2   d1  d2
4   d1  d2
5   d1  d2
6   d1  d2
8   d1  d2
10  d1  d2

Running the script:

./script 2011 2012 2013

This creates two new files:

$ cat 2011.new 
1   d1  d2
3   d1  d2
4   d1  d2
6   d1  d2

$ cat 2012.new 
1   d1  d2
4   d1  d2
6   d1  d2
8   d1  d2
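
The script above only compares each file with the one after it, so the middle file is never checked against the preceding year and the last file gets no output at all. A minimal sketch that checks both neighbours, as in the outline in the question, might look like the following. It assumes tab-separated files with the UserID in the first field. The name filter_adjacent is made up for this example, and awk is swapped in for the grep pattern files so that IDs are compared as exact strings.

```shell
# filter_adjacent FILE...  (hypothetical helper, not part of any tool)
# For each FILE, write FILE.new holding the rows whose first tab-separated
# field also occurs in the previous or the next file on the command line.
# Note: plain sh has no local variables, so prev, cur and nxt are global.
filter_adjacent() {
  prev=""
  while [ "$#" -gt 0 ]; do
    cur=$1
    nxt=${2-}
    shift
    # A file with no neighbour at all has nothing to be compared against
    if [ -n "$prev" ] || [ -n "$nxt" ]; then
      # awk reads the neighbouring files first and records their UserIDs,
      # then prints the rows of the current file whose UserID was recorded.
      # ${var:+"$var"} expands to nothing when the neighbour is missing.
      awk -F'\t' -v cur="$cur" '
        FILENAME != cur { seen[$1] = 1; next }
        $1 in seen
      ' ${prev:+"$prev"} ${nxt:+"$nxt"} "$cur" > "$cur.new"
    fi
    prev=$cur
  done
}
```

With the sample files above (assuming the columns are separated by tabs), running filter_adjacent 2011 2012 2013 should leave UserIDs 1, 3, 4, 6 in 2011.new, UserIDs 1, 3, 4, 6, 8 in 2012.new, and UserIDs 1, 4, 6, 8 in 2013.new. The originals are left untouched, so the .new files can be inspected before replacing anything.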

Please edit your answer to add an explanation of how your code works and how it solves the OP's problem. Many SO posters are newbies and will not understand the code you have posted. — i alarmed alien, Oct 11 '14 at 07:25
