This is a very beginner question, so thanks in advance.

I was given an R script to align from fastq files to a genome. All I need to do is send this R script to my uni's cluster, but I want to make sure the script is running fine on my own computer before I send it into the void. I am trying to understand why my function won't run. It starts by loading a library, then the rest of the actions are included in one function. When I cmd+enter the function it only returns the text in blue in the console, but doesn't actually run anything. Therefore, I assume if sent to the cluster it would also do nothing. But...why?

For example, if I want the first part called "buildindex" to run I need to manually activate it. But if it is called solely through the function nothing happens. Please help me understand what I need to fix. This code was given to me by a postdoc who is too busy to help me with these problems.

library(Rsubread)

analyzeRNASeq <- function(){
  
  buildindex(basename="/Users/iRebecca/Box/BEC_FILES/GENOMES/GC_", reference="/Users/iRebecca/Box/BEC_FILES/GENOMES/GCF_000006845.1_ASM684v1_genomic.fna.gz")
  
  filePath <- "/Users/iRebecca/Box/BEC_FILES/GENOMES/file_list.csv"
  fileNames <- read.table(filePath, header=TRUE, sep=",", quote="", stringsAsFactors=FALSE, comment="")
  for(n in (1:nrow(fileNames))){

    align(index="/Users/iRebecca/Box/BEC_FILES/GENOMES/GC_",
      readfile1=fileNames[n,1],
      readfile2=fileNames[n,2],
      output_file=fileNames[n,3])

  outputData <- featureCounts(files=fileNames[n,3],annot.ext="/Users/iRebecca/Box/BEC_FILES/GENOMES/GCF_000006845.1_ASM684v1_genomic.gff.gz",
               isGTFAnnotationFile=TRUE,GTF.featureType="CDS",GTF.attrType="locus_tag")
  outputFilePath <- fileNames[n,4]
  write.table(outputData[1], file=outputFilePath, quote=FALSE, sep=",")
  }
}

This is what I see on the console when I cmd+enter the "analyzeRNASeq" function. What does the + on each line mean?

> analyzeRNASeq <- function(){
+   
+   buildindex(basename="/Users/iRebecca/Box/BEC_FILES/GENOMES/GC_", reference="/Users/iRebecca/Box/BEC_FILES/GENOMES/GCF_000006845.1_ASM684v1_genomic.fna.gz")
+   
+   filePath <- "/Users/iRebecca/Box/BEC_FILES/GENOMES/file_list.csv"
+   fileNames <- read.table(filePath, header=TRUE, sep=",", quote="", stringsAsFactors=FALSE, comment="")
+   for(n in (1:nrow(fileNames))){
+ 
+     align(index="/Users/iRebecca/Box/BEC_FILES/GENOMES/GC_",
+       readfile1=fileNames[n,1],
+       readfile2=fileNames[n,2],
+       output_file=fileNames[n,3])
+ 
+   outputData <- featureCounts(files=fileNames[n,3],annot.ext="/Users/iRebecca/Box/BEC_FILES/GENOMES/GCF_000006845.1_ASM684v1_genomic.gff.gz",
+                isGTFAnnotationFile=TRUE,GTF.featureType="CDS",GTF.attrType="locus_tag")
+   outputFilePath <- fileNames[n,4]
+   write.table(outputData[1], file=outputFilePath, quote=FALSE, sep=",")
+   }
+ }

In my mind, once I enter this function it should start running on my laptop but it isn't. Please help.

kefir
  • A function definition doesn't run the function. Perhaps add `analyzeRNASeq()` as the last line of your script to run the function you just created. – Gregor Thomas Jan 04 '21 at 20:11
  • @GregorThomas wow I think it worked. I'm going to do a test run. Thank you so much, this is due to my noob code skills – kefir Jan 04 '21 at 20:16
  • Hi @kefir, welcome to Stack Overflow. Glad you solved your problem. For any future bioinformatics-related questions you might have, you will very likely get a better response over at https://bioinformatics.stackexchange.com/ – jared_mamrot Jan 04 '21 at 23:09

1 Answer

As suggested in the comments, your code defines a function, that is, it explains what should be done whenever the function is used ("called"), but it doesn't actually use it.

The syntax to define a function in R is the following:

function_name <- function(<zero or more parameters>){<instructions>}

After these lines of code are executed, you have something new in your R session (for instance function_name), which you can access later using its name.

I assume you are using RStudio. When you "cmd-enter" in the editor, you just execute the creation of the function. What you see in the console is an echo of the unit of code being executed: the ">" marks the start of a new expression, and the "+" is R's continuation prompt, shown in front of each line that continues an expression that is not yet complete (here, everything up to the closing brace of the function).
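A small sketch of this behaviour, using a hypothetical function named `say_hello` (it is not part of the script in the question): running the definition prints nothing from the body and only creates an object in the session.

```r
# Hypothetical example function; the name say_hello is illustrative only.
say_hello <- function() {
  message("running the body")
  "done"
}

# The definition above created an object, but its body has not run yet:
# no "running the body" message has been printed at this point.
exists("say_hello")      # TRUE
is.function(say_hello)   # TRUE
```

The message inside the body would only appear once `say_hello()` is called.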

To execute the instructions inside the function, you should then "call" it, which is done as follows:

function_name(<arguments, if needed>)

In your case, the function has no parameters, so you just write an empty pair of parentheses after the function name, `analyzeRNASeq()`, to indicate that the instructions inside the function have to be executed.
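Applied to the script in the question, this amounts to adding a call after the definition. The following is a minimal sketch, not the real pipeline: the function body is replaced by a placeholder so that it runs without Rsubread or the genome files, and the `result` variable is an assumption added for illustration.

```r
# Stand-in for the real function: the body is a placeholder,
# not the buildindex/align/featureCounts calls from the question.
analyzeRNASeq <- function() {
  message("pipeline would run here")
  invisible(TRUE)
}

# The call. Without this line the script only defines the function
# and then ends without doing any work.
result <- analyzeRNASeq()
```

When the script is run non-interactively, for example with `Rscript` on a cluster, this final call is what actually starts the work.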

bli