Let the table be as follows:
v1 | v2 | v3 |
---|---|---|
A | B | A |
B | B | A |
A | C | |
D | C | D |
What i want R to create a table for number of occurences of unique values for each column:
v1 | v2 | v3 |
---|---|---|
A | 1 | 1 |
B | 1 | 2 |
C | 0 | 1 |
D | 1 | 0 |
Try table
like this
> table(unlist(df),names(df)[col(df)])
V1 v2 v3
A 1 1 2
B 1 2 0
C 0 1 1
D 1 0 1
> dput(df)
structure(list(V1 = c("A", "B", NA, "D"), v2 = c("B", "B", "A",
"C"), v3 = c("A", "A", "C", "D")), class = "data.frame", row.names = c(NA,
-4L))
One option could be:
sapply(df, function(x) table(factor(x, levels = unique(unlist(df)))))
V1 v2 v3
A 1 1 2
B 1 2 0
D 1 0 1
C 0 1 1
To add to the set, a tidyverse version.
library(tidyverse)
df %>%
pivot_longer(
everything(),
values_to="Value",
names_to="Variable"
) %>%
group_by(Variable, Value) %>%
summarise(N=n(), .groups="drop") %>%
filter(!is.na(Value)) %>%
pivot_wider(values_from=N, names_from=Variable, values_fill=0) %>%
arrange(Value)
# A tibble: 4 x 4
Value v1 v2 v3
<chr> <int> <int> <int>
1 A 1 1 2
2 B 1 2 0
3 C 0 1 1
4 D 1 0 1
For the sake of completeness, here is an approach which uses a combination of melt()
and dcast()
:
library(data.table)
dcast(melt(setDT(df1), measure.vars = patterns("^v"))[value != ""], value ~ variable)
value v1 v2 v3 1: A 1 1 2 2: B 1 2 0 3: C 0 1 1 4: D 1 0 1
The approach is similar to Limey's answer in reshaping the data from wide to long and back to wide but less verbose.
Instead of dcast()
, table()
can be called after reshaping from wide to long:
melt(setDT(df1), measure.vars = patterns("^v"))[value != ""][
, table(value, variable)]
variable value v1 v2 v3 A 1 1 2 B 1 2 0 C 0 1 1 D 1 0 1
Note that data.table chaining is used here.
And, to save a few keystrokes:
melt(setDT(df1), measure.vars = names(df1))[value != ""][, table(rev(.SD))]
df1 <- fread("
|v1|v2|v3|
|A |B | A|
|B |B | A|
| |A | C|
|D |C | D|",
drop = c(1,5), header = TRUE)
We can use mtabulate
library(qdapTools)
t(mtabulate(df))
V1 v2 v3
A 1 1 2
B 1 2 0
C 0 1 1
D 1 0 1
df <- structure(list(V1 = c("A", "B", NA, "D"), v2 = c("B", "B", "A",
"C"), v3 = c("A", "A", "C", "D")), class = "data.frame", row.names = c(NA,
-4L))