
I have around 600 .txt files in this format:

Position    SRR7622449
chr1_944296 1
chr1_944307 1
chr1_946247 1
chr1_1014274    1
chr1_1401954    1
chr1_1541864    1

Each file has two columns: a Position column, which exists in every file, and a second column whose header is the sample identifier and is different in every file.

The number of rows in every file is different.

I wish to merge all 600+ files into one dataframe, summing the values wherever a Position is duplicated, so that in the end each Position appears in exactly one row.

This is what I tried first:

require(readr)
require(dplyr)
require(tidyr)
require(purrr)  # map() comes from purrr, not dplyr

# dir()'s pattern is a regular expression, so escape the dot and anchor it
files <- dir(pattern = "\\.txt$")
data <- files %>% map(read_tsv) %>% bind_rows()

This gave me a huge dataframe, and I found out that the Position column now contains duplicates. I want this result:

Position    SRR7622449  SRR7622450
chr1_944296 2   1

instead of

Position    SRR7622449  SRR7622450
chr1_944296 1   NA
chr1_944296 1   1
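
As far as I can tell this happens because bind_rows() simply stacks the rows and fills missing columns with NA instead of merging on Position. A small sketch with two made-up one-file tibbles (names taken from the example above):

a <- tibble(Position = c("chr1_944296", "chr1_944307"), SRR7622449 = c(1, 1))
b <- tibble(Position = "chr1_944296", SRR7622450 = 1)

bind_rows(a, b)
#> # A tibble: 3 x 3
#>   Position    SRR7622449 SRR7622450
#>   <chr>            <dbl>      <dbl>
#> 1 chr1_944296          1         NA
#> 2 chr1_944307          1         NA
#> 3 chr1_944296         NA          1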

When I try

data %>% group_by(Position) %>% summarise_each(funs(max))

I seem to be losing values. What do I do to fix this?
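
I suspect max() is where the values go: with the default na.rm = FALSE it returns NA as soon as a group contains an NA, and after bind_rows() almost every group does:

max(c(1, NA))                # NA, because na.rm defaults to FALSE
max(c(1, NA), na.rm = TRUE)  # 1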

  • Try `data %>% group_by(Position) %>% summarise(across(.fns = sum, na.rm = TRUE))`, or for `dplyr` < 1.0.0, `data %>% group_by(Position) %>% summarise_all(sum, na.rm = TRUE)`. – Ronak Shah Aug 03 '20 at 06:59
  • Thanks! That worked! How do I tag this as the correct answer so I can close this question? – Seigfried Aug 03 '20 at 07:10
  • Actually, this has been asked before so I marked this question as duplicate of original question. – Ronak Shah Aug 03 '20 at 07:12
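
Expanding on the fix suggested in the first comment, a minimal sketch of the whole pipeline for dplyr >= 1.0.0 (the lambda form of across() is used here so na.rm isn't passed through ...):

library(readr)
library(dplyr)
library(purrr)

files <- dir(pattern = "\\.txt$")

data <- files %>%
  map(read_tsv) %>%
  bind_rows() %>%
  group_by(Position) %>%
  # sum every non-grouping column per Position; na.rm = TRUE ignores the NA padding
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))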

0 Answers