
I'm trying to compare two gene lists and extract the common genes. I sorted my .txt files and used the comm command:

comm gene_list1.txt gene_list2.txt

Strangely, when I check the output, many genes that are common to both files are not printed in the third column. Here is part of the output:

[Screenshot: part of the comm output]

As you can see, AAAS, AAGAB, and others exist in both files, but they are not printed as common lines! Any idea why this happens?

Thank you

Newcomer

1 Answer


$ comm file1.txt file2.txt

The output of the above command has three columns. The first column has no leading tab and contains lines present only in file1.txt.

The second column is indented by one tab and contains lines present only in file2.txt.

The third column is indented by two tabs and contains lines common to both files.

This is the default output layout of the comm command when no options are given.
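To make the layout concrete, here is a small sketch with two made-up, already-sorted lists (the file contents are hypothetical, chosen only for illustration). Piping the output through `sed -n l` prints each tab as `\t` and marks each line end with `$`, so the indentation of each column is visible:

```shell
# Hypothetical, already-sorted sample lists
printf 'AAAS\nAAGAB\nABCA1\n' > file1.txt
printf 'AAAS\nAAGAB\nZNF1\n' > file2.txt

# Default comm output, with tabs made visible by sed's "l" command
comm file1.txt file2.txt | sed -n l
# \t\tAAAS$    <- column 3: in both files (two leading tabs)
# \t\tAAGAB$   <- column 3: in both files
# ABCA1$       <- column 1: only in file1.txt (no leading tab)
# \tZNF1$      <- column 2: only in file2.txt (one leading tab)
```

In a terminal, the common lines therefore appear shifted far to the right rather than on a separate "third line".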

Assuming both input files are sorted, the command you need is:

$ comm -12 gene_list1.txt gene_list2.txt

The -1 and -2 options suppress columns 1 and 2 (the lines unique to each file), so only column 3, the lines common to both files, is printed. That is exactly what you need, since you are only interested in the genes common to both files.
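A minimal end-to-end sketch, assuming the lists start out unsorted: `genes1.txt` and `genes2.txt` are hypothetical file names with made-up contents, so substitute your own gene_list1.txt and gene_list2.txt. Running sort and comm under the same locale (here `LC_ALL=C`) keeps both tools in agreement about what "sorted" means:

```shell
# Hypothetical, unsorted gene lists
printf 'AAGAB\nZNF1\nAAAS\n' > genes1.txt
printf 'AAAS\nBRCA1\nAAGAB\n' > genes2.txt

# Sort both files using the same (C) collation order
LC_ALL=C sort genes1.txt > genes1.sorted.txt
LC_ALL=C sort genes2.txt > genes2.sorted.txt

# -1 and -2 hide the "only in file 1" and "only in file 2" columns,
# so only the lines common to both files are printed
LC_ALL=C comm -12 genes1.sorted.txt genes2.sorted.txt
# AAAS
# AAGAB
```

Because columns 1 and 2 are suppressed, the common lines are printed without any leading tabs.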

Ajay Kr Choudhary