-1

Say I have a dataframe with 250 rows and 1000 columns. I want to find the 10 maximum values and the 10 minimum values, which I can easily to with tail and head function applying on the vectorised dataframe.

But how do I now which columns and rows the highest values are in?

Example:

df <- data.frame(c("house1", "house2", "house3"), c(2006, 2007, 2008), c(0.3, 0.5, 0.1), c(0.9, 0.4, 0.01))
colnames(df) <- c("house.num", "year", "data1", "data2")

> df
 house.num year data1 data2
1    house1 2006   0.3  0.90
2    house2 2007   0.5  0.40
3    house3 2008   0.1  0.01

I would like to have the 3 maximum values of the data1 and data2 columns and the information in which row and column these values are, so I know which house, which year and which datatype match the best.

Ezra
  • 159
  • 1
  • 10

3 Answers3

1

Edit: Removed the by-column grouping.

This may not be the format you're looking for, but here's a solution that will return results in "long" format: 20 rows, one for each of the top and bottom 10 records. Each row has the house number, year, row number, name of the column, and whether the record is in the top or bottom 10.

df = data.frame(house.num = paste("house", 1:250, sep = ""),
                year = sample(2000:2018, 250, replace = T))
for(i in 1:1000) {
  df[,paste("data", i, sep = "")] = rnorm(250)
}

library(dplyr)
library(tibble)
library(tidyr)
df.min.max = df %>%
  rownames_to_column("row.name") %>%
  gather(column.name, column.value, -row.name, -house.num, -year) %>%
  mutate(top.10 = rank(desc(column.value), "min") <= 10,
         bottom.10 = rank(column.value, "min") <= 10) %>%
  filter(top.10 | bottom.10)
A. S. K.
  • 2,504
  • 13
  • 22
1

Using data.table

library(data.table)
setDT(df)
# Add row number 
df[, row := .I]
columns <- c("row", paste0("data", 1:2))
df[, ..columns]
melt(df[, ..columns], id.vars = "row", variable.name = "column")[order(-value)][1:3]
   row column value
1:   1  data2   0.9
2:   2  data1   0.5
3:   2  data2   0.4
s_baldur
  • 29,441
  • 4
  • 36
  • 69
0

Try this:

library(data.table)
df <- data.frame(c("house1", "house2", "house3"), c(2006, 2007, 2008), c(0.3, 0.5, 0.1), c(0.9, 0.4, 0.01))
colnames(df) <- c("house.num", "year", "data1", data2")
setorder(df, -data1)

It will order your data.frame by data1 from largest to smallest and then you can use head and tail to get max/min 10 values. Do the same for data2

LocoGris
  • 4,432
  • 3
  • 15
  • 30
  • Thank you! The issue is, that I don't have 2 data columns, but a thousand. I don't need the max. values per column, but per data matrix. – Ezra Jan 22 '19 at 08:44