Tail and Head of whole dataframe in R with column and row number

Question

Say I have a dataframe with 250 rows and 1000 columns. I want to find the 10 maximum values and the 10 minimum values, which I can easily to with tail and head function applying on the vectorised dataframe.

But how do I now which columns and rows the highest values are in?

Example:

df <- data.frame(c("house1", "house2", "house3"), c(2006, 2007, 2008), c(0.3, 0.5, 0.1), c(0.9, 0.4, 0.01))
colnames(df) <- c("house.num", "year", "data1", "data2")

> df
 house.num year data1 data2
1    house1 2006   0.3  0.90
2    house2 2007   0.5  0.40
3    house3 2008   0.1  0.01

I would like to have the 3 maximum values of the data1 and data2 columns and the information in which row and column these values are, so I know which house, which year and which datatype match the best.

Please share your expected output. – s_baldur Jan 23 '19 at 08:17 — s_baldur, Jan 23 '19 at 08:17

A. S. K. · Answer 1 · 2019-01-23T15:15:58.783

Edit: Removed the by-column grouping.

This may not be the format you're looking for, but here's a solution that will return results in "long" format: 20 rows, one for each of the top and bottom 10 records. Each row has the house number, year, row number, name of the column, and whether the record is in the top or bottom 10.

df = data.frame(house.num = paste("house", 1:250, sep = ""),
                year = sample(2000:2018, 250, replace = T))
for(i in 1:1000) {
  df[,paste("data", i, sep = "")] = rnorm(250)
}

library(dplyr)
library(tibble)
library(tidyr)
df.min.max = df %>%
  rownames_to_column("row.name") %>%
  gather(column.name, column.value, -row.name, -house.num, -year) %>%
  mutate(top.10 = rank(desc(column.value), "min") <= 10,
         bottom.10 = rank(column.value, "min") <= 10) %>%
  filter(top.10 | bottom.10)

s_baldur · Answer 2 · 2019-01-23T08:35:56.303

1

Using data.table

library(data.table)
setDT(df)
# Add row number 
df[, row := .I]
columns <- c("row", paste0("data", 1:2))
df[, ..columns]
melt(df[, ..columns], id.vars = "row", variable.name = "column")[order(-value)][1:3]
   row column value
1:   1  data2   0.9
2:   2  data1   0.5
3:   2  data2   0.4

edited Jan 23 '19 at 08:35

answered Jan 23 '19 at 08:30

s_baldur

29,441
4
36
69

score 0 · Answer 3 · answered Jan 21 '19 at 17:46

0

Try this:

library(data.table)
df <- data.frame(c("house1", "house2", "house3"), c(2006, 2007, 2008), c(0.3, 0.5, 0.1), c(0.9, 0.4, 0.01))
colnames(df) <- c("house.num", "year", "data1", data2")
setorder(df, -data1)

It will order your data.frame by data1 from largest to smallest and then you can use head and tail to get max/min 10 values. Do the same for data2

answered Jan 21 '19 at 17:46

LocoGris

4,432
3
15
30

Thank you! The issue is, that I don't have 2 data columns, but a thousand. I don't need the max. values per column, but per data matrix. – Ezra Jan 22 '19 at 08:44

Tail and Head of whole dataframe in R with column and row number

3 Answers3