How to plot specific data points in a column in R script

Question

Imagine there are two columns, one for p-value and the other representing slope. I want to find a way to plot only the slope data points that have a significant p-value. Here is my code:

print("State the file name (include .csv)")
filename <- readline()
file <- read.csv(filename)

print ("Only include trials with p-value < .05? (enter yes or no)")
pval_filter <- readline()
if (pval_filter == "yes"){
   i <- 0
   count <- 0
   filtered <- NULL
   while (i > length(file$pval)){
      if (file$pval[i] < .05){
         filtered[count] <- i
         count <- count + 1
      }
      i <- i + 1
   }

   x <- 0
   while (x != -1){
      print("State the variable to be plotted")
      temp_var <- readline()
      counter <- 0
      var <- NULL
      while (counter > length(filtered)){
         var[counter] = file [, temp_var][filtered[counter]]
         counter <- counter + 1
         }

      print ("State the title of the histogram")
      title <- readline()
      hist(var, main = title, xlab = var)
      print("Enter -1 to exit or any other number to plot another variable")
      x <- readline()
    }
}

score 4 · Accepted Answer · answered Oct 21 '13 at 21:07

Isn't this much shorter, while producing roughly the same result?

df = read.csv('file.csv')
df = df[df$pval < 0.05,]
hist(df$value)

This should at least get you started.

Some remarks regarding the code:

You use the names of existing R functions (var, file) as object names; that is a bad idea.
If you want the program to work with user input, you need to validate that input before doing anything with it.
There is no need to loop explicitly over the rows of a data.frame: R is vectorized (e.g. see how I subsetted df above). This style looks like Fortran, and there is no need for it in R.
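
The remarks about vectorization and input checking can be illustrated with a small sketch. The data frame trials, its columns pval and slope, and the variable requested are made up for illustration; they stand in for whatever read.csv and readline return in the real script.

```r
# Made-up data standing in for the CSV file (column names are assumptions).
trials <- data.frame(
  pval  = c(0.01, 0.20, 0.03, 0.50, 0.04),
  slope = c(1.2, -0.4, 0.8, 0.1, -1.5)
)

# Loop version, using R's 1-based indexing and a correct loop range.
keep_loop <- integer(0)
for (i in seq_along(trials$pval)) {
  if (trials$pval[i] < 0.05) {
    keep_loop <- c(keep_loop, i)
  }
}

# Vectorized version: one logical comparison over the whole column.
keep_vec <- which(trials$pval < 0.05)

# Both approaches select the same rows.
stopifnot(identical(keep_loop, keep_vec))

# Check user-supplied input before using it, e.g. a column name.
requested <- "slope"  # would come from readline() in an interactive script
if (!requested %in% names(trials)) {
  stop("No column named '", requested, "' in the data")
}
significant_slopes <- trials[[requested]][keep_vec]
```

The vectorized line replaces the whole loop, and the column check fails early with a readable message instead of an obscure error later on.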

score 2 · Answer 2 · answered Oct 21 '13 at 21:09

It is hard to tell exactly what you want. It is best if an example is reproducible (we can copy, paste and run it; we don't have your data, so yours cannot be run as it stands) and minimal (there is a lot in your code that I don't think deals with your question).

But some pointers that may help.

First, the readline function has a prompt argument that will give you better looking interaction than the print statements.
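
A minimal sketch of that idea, assuming a hypothetical helper called ask_yes_no (not part of the original code) that wraps readline and its prompt argument:

```r
# Hypothetical helper: ask a yes/no question via readline()'s prompt argument
# and normalise the reply, so that "Yes" and " yes " are also accepted.
ask_yes_no <- function(question) {
  reply <- readline(prompt = paste0(question, " (yes/no): "))
  tolower(trimws(reply)) == "yes"
}

# readline() only waits for input in an interactive session.
if (interactive()) {
  use_filter <- ask_yes_no("Only include trials with p-value < .05?")
}
```

The prompt appears on the same line as the cursor, without the [1] prefix and quotes that print adds.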

If all your data is in a data frame with columns p and b for p-value and slope then you can include only the b values for which p<=0.05 with simple subsetting like:

hist( mydataframe$b[ mydataframe$p <= 0.05 ] )

or

with( mydataframe, hist(b[p<=0.05]) )
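
For a version that can be copied and run without the original file, here is a sketch with made-up data. The values in mydataframe are simulated and purely illustrative; only the column names p and b follow the assumption above.

```r
set.seed(1)  # make the simulated slopes reproducible

# Simulated stand-in for the real data: p-values and slopes (assumed names p and b).
mydataframe <- data.frame(
  p = rep(c(0.01, 0.03, 0.20, 0.60), times = 50),
  b = rnorm(200)
)

# Slopes whose p-value is at most 0.05.
sig_b <- mydataframe$b[mydataframe$p <= 0.05]

# The with() form selects exactly the same values.
stopifnot(identical(sig_b, with(mydataframe, b[p <= 0.05])))

# plot = FALSE returns the bin counts without opening a graphics device;
# drop it to draw the histogram.
h <- hist(sig_b, plot = FALSE)
```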

Is that enough to answer your question?

yea. this is very similar to the answer above which works. Thanks! — kevin ko, Oct 23 '13 at 18:51

score 1 · Answer 3 · PascalVKooten · 2013-10-21T21:13:36.270

Given that data = cbind(slopes, pvalues) (so ncol(data) == 2)

Like this:

plot(data[data[ ,2] < 0.05 , ])

Explanation:

data[ ,2] < 0.05 returns a vector of TRUE/FALSE values with one element per row (i.e. as long as a column).

so then you will get:

data[c(TRUE, FALSE....), ]

From there on, only the rows where this vector is TRUE are selected.

You will thus plot only those x's and y's where the pvalue is lower than 0.05.
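
The steps in this explanation can be traced with a small sketch. The numbers in slopes and pvalues are made up for illustration, and drop = FALSE is an addition not present in the one-liner above.

```r
# Made-up slopes and p-values, for illustration only.
slopes  <- c(1.2, -0.4, 0.8, 0.1)
pvalues <- c(0.01, 0.20, 0.03, 0.50)
data <- cbind(slopes, pvalues)   # a 4 x 2 numeric matrix

# One TRUE/FALSE per row: here TRUE, FALSE, TRUE, FALSE.
keep <- data[, 2] < 0.05

# drop = FALSE keeps the result a matrix even if only one row matches.
significant <- data[keep, , drop = FALSE]
```

Calling plot(significant) then draws the first column (slopes) against the second (p-values) for the selected rows only. Without drop = FALSE, a single matching row would be simplified to a plain vector, which plot would draw against its index instead.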

edited Oct 21 '13 at 21:13

answered Oct 21 '13 at 21:07

PascalVKooten


score 0 · Answer 4 · answered Oct 22 '13 at 05:46

Here is code to plot only the slope data points with a significant p-value, assuming the columns in the file are named pval and slope.

# Prompt for the file name on the terminal
filename <- readline("Enter the name of the file that has p-values and slopes (include .csv): ")
# Read the CSV file into a data frame
file     <- read.csv(filename, header = TRUE)

# Prompt again on the terminal and read the user's choice
pval_filter <- readline("Only include trials with p-value < .05? (enter yes or no) ")

if (tolower(pval_filter) == "yes"){
   # Keep only the rows whose p-value is below the significance level of 0.05
   file.filtered <- file[file$pval < 0.05, ]

   # Get the title of the histogram to be drawn for the (filtered) slopes
   hist.title <- readline("State the title of the histogram: ")
   # Draw the histogram of the slopes with that title.
   #     las = 2 draws the axis tick labels perpendicular to their axis,
   #     so that the labels do not overlap and are easy to read.
   hist(file.filtered$slope, main = hist.title, xlab = "Slope", ylab = "Frequency", las = 2)
}

Hope this helps.
