I would like to filter a file so that I can obtain rows that match in column 1 and do not match in column 2. In the following example:
00b27c71-a833-4605-9fb3-a2714ac98092 ENST00000352983.6 157 60 16
00d77e65-466e-4fe6-ad0f-bc6b3f44af75 ENST00000367142.4 130 12 4
00d77e65-466e-4fe6-ad0f-bc6b3f44af75 ENST00000367142.4 8 60 0
00b27c71-a833-4605-9fb3-a2714ac98091 ENST00000258424.2 12 60 2048
00b27c71-a833-4605-9fb3-a2714ac98091 ENST00000352983.6 157 60 16
00d77e65-466e-4fe6-ad0f-bc6b3f44af74 ENST00000367142.5 130 12 4
00d77e65-466e-4fe6-ad0f-bc6b3f44af74 ENST00000367142.7 8 60 0
00d77e65-466e-4fe6-ad0f-bc6b3f44af74 ENST00000258424.2 8 60 0
I would like to find entries in column 1 that appear exactly twice and that do NOT match in column 2, i.e. duplicates of the combination column1,column2 should be ignored. So the expected output would be:
00b27c71-a833-4605-9fb3-a2714ac98091 ENST00000258424.2 12 60 2048
00b27c71-a833-4605-9fb3-a2714ac98091 ENST00000352983.6 157 60 16
What is in columns 3, 4, 5, etc. is not important for filtering, but I do need to retain that information.
I also need to feed the input in through a pipe from another command, which is necessary to read the file and retain the header. So I need something in the format:
samtools view -h file.bam | code that I need > results.bam
I have tried several variations of awk, but to no avail. Any help would be much appreciated.
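To make the rule concrete (a column 1 value that occurs on exactly two rows, with two different column 2 values), here is a rough sketch of the logic as a small awk wrapper. It is only a sketch under assumptions: filter_pairs is a hypothetical name, header lines are assumed to start with "@", all records are buffered in memory until the end of input, and the comparison column defaults to 2 as in the example above. In raw SAM output the reference name sits in column 3, so the argument may need to be 3 there.

```shell
# Hypothetical helper: keep rows whose column-1 value occurs exactly twice
# and whose comparison column (default: 2) differs between those two rows.
# Usage: filter_pairs [comparison_column]
filter_pairs() {
  awk -v c="${1:-2}" '
    /^@/ { print; next }          # assumed header lines: pass through unchanged
    {
      rows[++m] = $0              # buffer each record in input order
      id[m] = $1
      count[$1]++                 # number of rows per column-1 value
      if (!seen[$1, $c]++)
        distinct[$1]++            # number of distinct comparison values per column-1 value
    }
    END {
      for (i = 1; i <= m; i++)
        if (count[id[i]] == 2 && distinct[id[i]] == 2)
          print rows[i]
    }
  '
}
```

If this behaves as intended, it might slot into the pipeline roughly like this, with the final samtools call converting the text stream back to BAM:
samtools view -h file.bam | filter_pairs | samtools view -b -o results.bam -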