
I have around 600 .txt files in this format:

Position    SRR7622449
chr1_944296 1
chr1_944307 1
chr1_946247 1
chr1_1014274    1
chr1_1401954    1
chr1_1541864    1

Each file has two columns: a Position column, which exists in every file, and a second column whose header is the sample identifier and is different in every file.

The number of rows in every file is different.

I wish to merge all 600+ files into one dataframe, summing the values wherever a Position is duplicated, so that in the end each Position appears in exactly one row.

This is what I tried first:

require(readr)
require(dplyr)
require(tidyr)
require(purrr)  # map() comes from purrr, not dplyr

# dir()'s pattern is a regular expression, so escape the dot and anchor it
files <- dir(pattern = "\\.txt$")
data <- files %>% map(read_tsv) %>% bind_rows()

This gave me a huge dataframe, and I found out that the Position column now contains duplicates. I want this result:

Position    SRR7622449  SRR7622450
chr1_944296 2   1

instead of

Position    SRR7622449  SRR7622450
chr1_944296 1   NA
chr1_944296 1   1
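
As far as I can tell this happens because bind_rows() simply stacks the rows and fills missing columns with NA instead of merging on Position. A small sketch with two made-up one-file tibbles (names taken from the example above):

a <- tibble(Position = c("chr1_944296", "chr1_944307"), SRR7622449 = c(1, 1))
b <- tibble(Position = "chr1_944296", SRR7622450 = 1)

bind_rows(a, b)
#> # A tibble: 3 x 3
#>   Position    SRR7622449 SRR7622450
#>   <chr>            <dbl>      <dbl>
#> 1 chr1_944296          1         NA
#> 2 chr1_944307          1         NA
#> 3 chr1_944296         NA          1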

When I try

data %>% group_by(Position) %>% summarise_each(funs(max))

I seem to be losing values. What do I do to fix this?
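
I suspect max() is where the values go: with the default na.rm = FALSE it returns NA as soon as a group contains an NA, and after bind_rows() almost every group does:

max(c(1, NA))                # NA, because na.rm defaults to FALSE
max(c(1, NA), na.rm = TRUE)  # 1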

  • Try `data %>% group_by(Position) %>% summarise(across(.fns = sum, na.rm = TRUE))`, or for `dplyr` < 1.0.0, `data %>% group_by(Position) %>% summarise_all(sum, na.rm = TRUE)`. – Ronak Shah Aug 03 '20 at 06:59
  • Thanks! That worked! How do I tag this as the correct answer so I can close this question? – Seigfried Aug 03 '20 at 07:10
  • Actually, this has been asked before so I marked this question as duplicate of original question. – Ronak Shah Aug 03 '20 at 07:12
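
Expanding on the fix suggested in the first comment, a minimal sketch of the whole pipeline for dplyr >= 1.0.0 (the lambda form of across() is used here so na.rm isn't passed through ...):

library(readr)
library(dplyr)
library(purrr)

files <- dir(pattern = "\\.txt$")

data <- files %>%
  map(read_tsv) %>%
  bind_rows() %>%
  group_by(Position) %>%
  # sum every non-grouping column per Position; na.rm = TRUE ignores the NA padding
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))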

0 Answers