How to use wildcards to define col_type when using readr?

Question

A few days ago I asked how to set a specific column type when using the readr package ("big integers when reading file with readr in r").

Is there a way to refer to columns by a wildcard pattern when defining their types? In my case, the files sometimes contain several columns whose names start with Intensity followed by a suffix that depends on the experiment. This makes it hard to use read_tsv inside a function when you don't know in advance which project names were used.

Something like col_types = cols('Intensity.*' = col_double()) would be ideal.
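As far as I know, readr's cols() treats argument names as literal column names, not as patterns, so the wished-for syntax does not work out of the box. A minimal sketch of a wrapper that emulates it is below. cols_by_pattern() is a hypothetical helper, not part of readr: it takes the column names (however you obtain them), treats each named argument as a regular expression, and expands it into one literal entry per matching column before calling cols().

```r
library(readr)

# Hypothetical helper (NOT part of readr): emulate cols('regex' = collector).
# `col_names` is the vector of column names in the file; every named argument
# in `...` is a regular expression paired with the collector to apply to the
# columns it matches. If a column matches several patterns, the last one wins.
cols_by_pattern <- function(col_names, ..., .default = col_guess()) {
  patterns <- list(...)
  spec <- list()
  for (pat in names(patterns)) {
    hits <- grep(pat, col_names, value = TRUE)
    for (nm in hits) spec[[nm]] <- patterns[[pat]]
  }
  # cols() accepts the expanded, literal names as ordinary named arguments
  do.call(cols, c(spec, list(.default = .default)))
}

# Example with made-up column names
spec <- cols_by_pattern(c("Protein", "Intensity", "Intensity pg", "LFQ A"),
                        "^Intensity" = col_double(),
                        "^LFQ" = col_double())
```

The resulting spec can be passed directly as col_types to read_tsv(); columns that match no pattern fall back to .default (col_guess() here).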

Does anyone have an idea how to achieve this?

EDIT: One approach might be to read the first two lines, grep for 'Intensity' in the column names, and then somehow build an argument like cols(Intensity=col_double(), 'Intensity pg'=col_double(), 'Intensity hs'=col_double()). But I have no idea how to create this argument value on the fly.
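The approach described in the EDIT can be sketched as follows: read only the header line, grep the matching names, and build the cols() call with do.call(). The file contents and the "^Intensity" pattern are made up for illustration, and splitting the header on tabs assumes the column names are not quoted.

```r
library(readr)

# Hypothetical example file whose "Intensity" suffixes vary by experiment
path <- tempfile(fileext = ".tsv")
writeLines(c("Protein\tIntensity\tIntensity pg\tIntensity hs",
             "P1\t1.5\t2.5\t3.5"), path)

# Read only the header line (assumes unquoted, tab-separated names)
header <- strsplit(readLines(path, n = 1), "\t", fixed = TRUE)[[1]]

# Pick the columns whose names start with "Intensity"
matches <- grep("^Intensity", header, value = TRUE)

# Equivalent to cols(Intensity = col_double(), `Intensity pg` = col_double(), ...)
spec <- do.call(cols, setNames(rep(list(col_double()), length(matches)), matches))

# Columns not listed in the spec (here "Protein") are guessed as usual
df <- read_tsv(path, col_types = spec)
```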

Maybe you can build something upon `txt <- "foo,bar1,bar2\n1,2,3";matches <- grep("^bar\\d+", strsplit(readLines(textConnection(txt), n=1),",",T)[[1]], value=T);read_csv(txt, col_types=setNames(rep(list(col_character()), length(matches)), matches))`. — lukeA, Aug 23 '16 at 11:44
Thanks, that works. I was not aware that I can just pass a list to `col_types`. Perfect. Would you write an answer so I can give you credit? — drmariod, Aug 23 '16 at 12:11
You can set a `.default` if you wrap the specifications in `cols`, which may be useful, depending on how many columns of what type you have. — alistaire, Aug 23 '16 at 23:32
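To illustrate the .default suggestion: when the generated list is passed through cols(), a .default collector can be added for every column that was not matched. A minimal sketch, with made-up column names and file contents, where unmatched columns are read as character:

```r
library(readr)

# Hypothetical example file
path <- tempfile(fileext = ".tsv")
writeLines(c("Protein\tIntensity\tIntensity pg\tCount",
             "P1\t1.5\t2.5\t7"), path)

# Hypothetical names found by grep() on the header
matches <- c("Intensity", "Intensity pg")

# Matched columns become doubles; every other column falls back to .default
spec <- do.call(cols, c(
  setNames(rep(list(col_double()), length(matches)), matches),
  list(.default = col_character())
))

df <- read_tsv(path, col_types = spec)
```

Note that "Count" is read as character here even though it looks numeric, which is why .default is only convenient when most unmatched columns share one type.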
The columns are pretty mixed, from integer to text, so I guess `.default` wouldn't help much. — drmariod, Aug 24 '16 at 09:08

score 3 · Accepted Answer · answered Aug 24 '16 at 09:12

I'm adding the answer that solved my problem, based on lukeA's comment:

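read_MQtsv <- function(file) {
  # Read the header plus one row just to get the column names
  # (check.names = FALSE keeps names like "Intensity pg" unchanged)
  jnk <- read.delim(file, nrows = 1, check.names = FALSE)
  # Columns containing Intensity, LFQ or iBAQ should be read as doubles
  matches <- grep('Intensity|LFQ|iBAQ', names(jnk), value = TRUE)
  # A named list of collectors is accepted as col_types;
  # all other columns are guessed as usual
  readr::read_tsv(file,
                  col_types = setNames(
                    rep(list(readr::col_double()), length(matches)),
                    matches))
}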

I adapted the one-liner from the comment into a function that I use to read the files produced by a program called MaxQuant.
