I have been trying to generate an ECDF graph for each file in a group of files. Each file contains a single column of numbers from which the graph should be generated.

The input files look like this:

122
34.5
566
...

I am able to generate the ECDF graph for one file at a time with this script, which I run in RStudio:

input <- read.table('/home/agalvez/data/domains/test_ecdf.txt', sep="\t", header=FALSE)
names(input) <- c("Values")

#Build the function
myecdf <- ecdf(input$Values)
#Plot the function
plot(myecdf, main = "CDF", xlab = "Bit score", ylab = "Probability")

I have been trying to apply this script to a whole directory of files (too many to process one by one), without success. From what I have read, a for loop could be a good solution, but I do not know how to implement it. Could someone give me some tips on this?

UPDATE-----

My last attempt, following your suggestions, was:

library(tidyverse)
plots <-
  list.files("/home/agalvez/data/domains/bits/", full.names = TRUE, recursive = TRUE, pattern = "") %>%
  map(~ {
    input <- read.table(.x, sep = "\t", header = FALSE)
    names(input) <- c("Values")
    myecdf <- ecdf(input$Values)
    
    p <- recordPlot()
    plot(myecdf, main = "CDF", xlab = "Bit score", ylab = "Probability")
    p
  })

# first plot
plots[[1]]

It produced the following errors:

'Error in recordPlot() : no current device to record from'
'Error in plots[[1]] : subscript out of bounds'
    you can use `list.files()` to get all the files in the directory and then use `lapply()` to read and plot every file – shs Mar 22 '22 at 12:34

1 Answer

library(tidyverse)

# "\\.txt$" matches only file names that end in ".txt"
plots <-
  list.files(".", full.names = TRUE, recursive = TRUE, pattern = "\\.txt$") %>%
  map(~ {
    input <- read.table(.x, sep = "\t", header = FALSE)
    names(input) <- c("Values")
    myecdf <- ecdf(input$Values)

    # Draw first, then record: recordPlot() captures what is on the current device
    plot(myecdf, main = "CDF", xlab = "Bit score", ylab = "Probability")
    recordPlot()
  })

# first plot
plots[[1]]
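
A base R variant of the same idea, along the lines of the `list.files()` plus `lapply()` suggestion in the comments on the question, might look like the sketch below. It is only an illustration under stated assumptions: the directory `demo_dir`, the two sample files, and the output name `ecdf_plots.pdf` are hypothetical and exist only so the code runs anywhere. Instead of recording plots, it writes one page per file into a single PDF, which avoids needing an open graphics device while looping.

```r
# Hypothetical demo directory with two small input files (assumption, for illustration).
demo_dir <- file.path(tempdir(), "ecdf_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("122", "34.5", "566"), file.path(demo_dir, "a.hmmsearch.bits.txt"))
writeLines(c("10", "20", "30", "40"), file.path(demo_dir, "b.hmmsearch.bits.txt"))

# Collect the input files; "\\.txt$" matches names ending in ".txt".
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
stopifnot(length(files) > 0)  # fail early if nothing matched

# Build one ECDF function per file and name each by its file name.
ecdfs <- lapply(files, function(f) {
  input <- read.table(f, sep = "\t", header = FALSE, col.names = "Values")
  ecdf(input$Values)
})
names(ecdfs) <- basename(files)

# Draw every ECDF into one multi-page PDF, one page per file.
out_pdf <- file.path(demo_dir, "ecdf_plots.pdf")
pdf(out_pdf)
for (nm in names(ecdfs)) {
  plot(ecdfs[[nm]], main = nm, xlab = "Bit score", ylab = "Probability")
}
dev.off()
```

For real data, `demo_dir` would be replaced by the directory holding the input files and the two `writeLines()` calls dropped. Using the file name as the plot title is a design choice that keeps the pages distinguishable.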
danlooo
  • Hi! Thanks for your help! I tried your approach but RStudio says 'could not find function "%>%"'. I will update my question with my attempt and error message. Thanks! – Alex galvez morante Mar 22 '22 at 14:53
  • Install the tidyverse or use `|>` in newer versions of R (4.1 or later) – danlooo Mar 22 '22 at 15:09
  • Thanks! I did, and now the error is "Error in plots[[1]] : subscript out of bounds". What could be causing this error? Can it be the pattern that we are using? – Alex galvez morante Mar 23 '22 at 15:45
  • To add extra info; my files are named specificname.hmmsearch.bits.txt – Alex galvez morante Mar 23 '22 at 16:17
  • I guess you have no files left. You can also remove a pipe step and only execute the stuff until any `%>%` to see the output. I suggest setting `pattern=""` and changing `"."` to e.g. `"/absolute/path/of/directory/containing/hmm_files"` – danlooo Mar 23 '22 at 18:50
  • Thanks for your tips, I updated my code, but the same error still persists and a new one appeared. – Alex galvez morante Mar 24 '22 at 10:36
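
The "subscript out of bounds" symptom discussed in these comments can be reproduced with a short sketch, assuming the cause is that `list.files()` found nothing. The path `missing_dir` is hypothetical and deliberately does not exist. Checking `length(files)` before looping is one way to tell an empty file listing apart from a plotting problem.

```r
# Hypothetical path that does not exist, used only to reproduce the symptom.
missing_dir <- file.path(tempdir(), "no_such_directory")

# list.files() returns an empty character vector when nothing is found.
files <- list.files(missing_dir, full.names = TRUE, recursive = TRUE)
print(length(files))

# Looping over an empty vector yields an empty list.
results <- lapply(files, function(f) f)
print(length(results))

# Asking for the first element of an empty list raises an error;
# tryCatch() captures its message instead of stopping the script.
msg <- tryCatch(results[[1]], error = function(e) conditionMessage(e))
print(msg)
```

If `length(files)` is zero on the real directory, the path or the `pattern` argument is the first thing to check.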