I would like to select specific rows in a dataframe whenever some row reaches a given value. The selected rows (plus the initial selected row) must form a new dataframe, and the name of that dataframe must equal $Name of the initial selected row.
The logic:
1 - The initial selected row must have $FC >= 0.7.
2 - The rows selected to form the dataframe must have the same $chr as the initial selected row.
3 - The selected rows must have $Position inside a 5000 window (compared with $Position of the initial selected row).
*3a) In this example, the row with $Name = BD22 cannot be included in the BD13 dataframe because its $Position is outside the window (the 5000 window around 3000 runs from $Position = 500 to $Position = 5500).
A simplified example follows:
My input dataframe, input:
Name FC chr Position
BD10 0.1 chr1 1000
BD11 0.1 chr2 1000
BD12 0.2 chr3 2000
BD13 0.7 chr3 3000
BD14 0.4 chr3 4000
BD22 0.1 chr3 7000
BD23 0.2 chr4 1000
As output I expect a dataframe named after the initial selected row, in this example BD13:
Name FC chr Position
BD12 0.2 chr3 2000
BD13 0.7 chr3 3000
BD14 0.4 chr3 4000
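To make the three rules concrete, here is a minimal sketch in base R. It assumes the 5000 window is centred on the seed row (2500 on each side, which matches the 500 to 5500 range in 3a) and that both ends of the window are inclusive. The names `half_window`, `seeds` and `neighbours` are made up for illustration. The composed dataframes are kept in a named list keyed by $Name rather than as separate variables, which is only one possible design.

```r
# Example data copied from the question
input <- data.frame(
  Name = c("BD10", "BD11", "BD12", "BD13", "BD14", "BD22", "BD23"),
  FC = c(0.1, 0.1, 0.2, 0.7, 0.4, 0.1, 0.2),
  chr = c("chr1", "chr2", "chr3", "chr3", "chr3", "chr3", "chr4"),
  Position = c(1000, 1000, 2000, 3000, 4000, 7000, 1000),
  stringsAsFactors = FALSE
)

# Assumption: the 5000 window is centred on the seed Position
half_window <- 2500

# Rule 1: seed rows are those with FC >= 0.7
seeds <- input[input$FC >= 0.7, ]

# Rules 2 and 3: for each seed, keep the rows on the same chr
# whose Position lies within half_window of the seed Position
neighbours <- lapply(seq_len(nrow(seeds)), function(i) {
  seed <- seeds[i, ]
  input[input$chr == seed$chr &
          abs(input$Position - seed$Position) <= half_window, ]
})

# Name each composed dataframe after its seed row
names(neighbours) <- seeds$Name

neighbours[["BD13"]]
```

With the example data, `neighbours[["BD13"]]` should hold the BD12, BD13 and BD14 rows. If separate variables are really needed, `list2env(neighbours, envir = globalenv())` could create them, although keeping the list is usually easier to loop over.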
Afterwards, I would like to plot each composed dataframe like this:
pdf("BD13.pdf")
plot(BD13$Name, BD13$FC, main="BD13",
xlab="Name", ylab="FC")
dev.off()
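For several composed dataframes, the same plot could be produced in a loop. This is a rough sketch, assuming the dataframes are stored in a named list (the list `composed` and the directory `out_dir` are hypothetical stand-ins) and that $Name is stored as text, so FC is plotted against the row index and the axis is labelled with the names.

```r
# Hypothetical stand-in for the composed dataframes: a named list keyed by seed Name
composed <- list(
  BD13 = data.frame(
    Name = c("BD12", "BD13", "BD14"),
    FC = c(0.2, 0.7, 0.4),
    stringsAsFactors = FALSE
  )
)

# Assumption: write the PDFs to a temporary directory
out_dir <- tempdir()

for (nm in names(composed)) {
  df <- composed[[nm]]
  pdf(file.path(out_dir, paste0(nm, ".pdf")))
  # Name is text, so plot FC against the row index and suppress the default x axis
  plot(seq_along(df$FC), df$FC, xaxt = "n", main = nm,
       xlab = "Name", ylab = "FC")
  # Label each x position with the corresponding Name
  axis(1, at = seq_along(df$FC), labels = df$Name)
  dev.off()
}
```

Each PDF is named after the list element, so the file name and the plot title both follow $Name of the initial selected row.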
I have tried:
out <- subset(input, FC >= 0.7)
out$startw <- (out$Position - 2500)
out$endw <- (out$Position + 2500)
library(plyr)
lvl <- dlply(out, .(Name))
for (i in 1:length(lvl)) {
Neigh1 <- subset(input, input$Position >= lvl[i]$startw & lvl[i]$chr == input$chr)
Neigh2 <- subset(input, input$Position <= lvl[i]$endw & lvl[i]$chr == input$chr)
Neight <- rbind(Neigh1, Neigh2)
pdf(sprintf("%s.pdf", [i]))
boxplot(Neigh$Name, Neigh$FC, xlab=[i], ylab="FC", main="[i]")
dev.off()}
But Neigh1 and Neigh2 are empty...
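One likely cause is the list indexing: single brackets on a list return a sub-list, not the element itself, so `$startw` on that sub-list is NULL and the comparison has length zero. The sketch below illustrates this with a hand-built one-element list (a hypothetical stand-in for the `dlply` result, not its exact structure).

```r
# Hypothetical one-element list standing in for the result of dlply(out, .(Name))
lvl <- list(BD13 = data.frame(chr = "chr3", startw = 500, endw = 5500))

single <- lvl[1]    # still a list of length one, not the dataframe itself
double <- lvl[[1]]  # the dataframe stored in the first element

is.null(single$startw)  # TRUE: the outer list has no element named startw
double$startw           # 500
```

Under that reading, using `lvl[[i]]` instead of `lvl[i]` in the loop would give access to the startw, endw and chr columns.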
Thank you!