I have some transport data on which I would like to perform a row-wise comparison inside a for loop. The data looks something like this (using iris as a stand-in).

# Using the iris dataset 
> iris <- as.data.frame(iris)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

The result should record, within each species, every pair of sepal lengths whose rows have equal petal width (this is only an illustration and has no scientific significance). It would look something like this:

Species Petal.Width Sepal.Length1 Sepal.Length2
setosa          0.2         5.1             4.9
setosa          0.2         5.1             4.7
setosa          0.2         4.9             4.7
setosa          0.2         5.1             4.6
...

My initial Python-ish thought was to perform a for loop within a for loop, looking something like this:

for s in unique(Species):
  for i in 1:nrow(iris):
    for j in 1:nrow(iris):
      if iris$Petal.Width[i] == iris$Petal.Width[j]:
        Output$Species = iris$Species[i]
        Output$Petal.Width = iris$Petal.Width[i]
        Output$Sepal.Length1 = iris$Sepal.Length[i]
        Output$Sepal.Length2 = iris$Sepal.Length[j]
      end
    end
  end
end

I had thought about using group_by on Species to replace the outer loop, for s in unique(Species):. But I don't know how to compare each observation in the dataset row by row with the others, or how to store the result as in the second block above. I have seen questions on for loops in dplyr and on rowwise quantities. My apologies if the code above is not clear; this is my first time asking a question here.

HiChiu5493

1 Answer

Using dplyr:

library(dplyr)    

iris %>%
      group_by(Species,Petal.Width) %>%
      mutate(n = n()) %>%
      filter(n > 1) %>%
      mutate(Sepal.Length1 = Sepal.Length,
             Sepal.Length2 = Sepal.Length1 - Petal.Width) %>%
      arrange(Petal.Width) %>%
      select(Species, Petal.Width, Sepal.Length1, Sepal.Length2)

This groups by Species and Petal.Width, counts the rows in each group, and keeps only the groups with more than one row. It then copies Sepal.Length into Sepal.Length1 and creates a new variable Sepal.Length2 = Sepal.Length1 - Petal.Width.
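Note that Sepal.Length2 above is a difference, not the sepal length of a second row. If the goal is the pairs described in the question, one possible sketch is a self-join on Species and Petal.Width. The helper names `iris_id`, `id`, and `pairs` are made up for this illustration, and recent dplyr versions may warn about a many-to-many join, which is intended here.

```r
library(dplyr)

# Give every row an id so each unordered pair can be kept exactly once.
iris_id <- iris %>%
  mutate(id = row_number()) %>%
  select(id, Species, Petal.Width, Sepal.Length)

pairs <- iris_id %>%
  # Match every row with all rows of the same Species and Petal.Width;
  # non-key columns get the suffixes "1" and "2".
  inner_join(iris_id, by = c("Species", "Petal.Width"), suffix = c("1", "2")) %>%
  # Drop self-matches and mirrored duplicates (a,b) / (b,a).
  filter(id1 < id2) %>%
  select(Species, Petal.Width, Sepal.Length1, Sepal.Length2)

head(pairs)
```

The `id1 < id2` filter is a design choice: without it each pair would appear twice, and every row would also be paired with itself.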

To count Sepal.Length values for each Species within Petal.Width bins of a fixed width:

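# Range of observed petal widths, used to build the bin edges
minpw <- min(iris$Petal.Width)
maxpw <- max(iris$Petal.Width)

iris %>%
  # include.lowest = TRUE keeps the minimum width in the first bin instead of NA
  group_by(Sepal.Length, Species,
           petal_width_range = cut(Petal.Width,
                                   breaks = seq(minpw, maxpw, by = 0.2),
                                   include.lowest = TRUE)) %>%
  summarise(count = n())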
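Fixed bins do not pair rows that fall on opposite sides of a bin edge. If what is wanted is every pair within a species whose Petal.Width differs by at most 0.2, a rough sketch is to join on Species only and filter on the absolute difference. The names `tol`, `iris_id`, `id`, and `near_pairs` are assumptions for this example, and the small `1e-9` slack is there because differences such as 2.5 - 2.3 are not exactly 0.2 in floating point.

```r
library(dplyr)

tol <- 0.2  # assumed maximum allowed difference in Petal.Width

# Give every row an id so each unordered pair can be kept exactly once.
iris_id <- iris %>%
  mutate(id = row_number()) %>%
  select(id, Species, Petal.Width, Sepal.Length)

near_pairs <- iris_id %>%
  # Pair every row with every row of the same Species.
  inner_join(iris_id, by = "Species", suffix = c("1", "2")) %>%
  # Keep each pair once, and only if the widths are within tol of each other.
  filter(id1 < id2,
         abs(Petal.Width1 - Petal.Width2) <= tol + 1e-9) %>%
  select(Species, Petal.Width1, Petal.Width2, Sepal.Length1, Sepal.Length2)

head(near_pairs)
```

This builds all within-species row combinations before filtering, which is fine for 150 rows but may need a different approach on much larger data.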
Matt
  • Thanks @Matt for the answer! What if I would like instead of pairs of equal `Petal.Width`, I would apply a range (say 0.2). So now pairs with `Petal.Width` of 0.2 and 0.4; or 0.4 and 0.6 would also satisfy the conditions. – HiChiu5493 Apr 25 '20 at 02:05
  • Maybe just because it's late on Friday, but I'm having trouble following. You want to group `Petal.Width` that falls within a specified range of an absolute difference of 0.2? That would mean that a `Petal.Width` of .4 would satisfy both conditions, is that correct? – Matt Apr 25 '20 at 02:21
  • Exactly. My apologies for not making it clearer. So in a sense, I want to find pairs with similar `Petal.Width` (a range of 0.2 here) in each `Species` and record their `Sepal.Length`. – HiChiu5493 Apr 25 '20 at 03:49
  • I updated the post with an attempt, but not sure if it's exactly what you're looking for. – Matt Apr 25 '20 at 04:17
  • Thanks for the update. I get what you are doing here and technically it does answer my question, but it's not what I was looking for. Would you please look at the revised question here - https://stackoverflow.com/questions/61421335/rowwise-operation-with-adaptive-range-using-dplyr – HiChiu5493 Apr 25 '20 at 05:28