10

I have a large vector holding exactly 5000 elements, many of them repeated, and would like to know which distinct values it contains. *An Introduction to R* doesn't seem to cover anything beyond the basic data structures, and I don't know whether R offers a set-like data structure as a built-in.

If there is no such data structure, is there some built-in function to filter out repeated elements in a vector or list?

desertnaut
Acsor
  • If your vector or list is `x`, `unique(x)` returns the unique values. For a data frame, it returns the unique rows. – Florian Jul 21 '17 at 10:36
  • If by "data structure" you mean how many elements there are for each value, then you could try this: `library(data.table); dt <- data.table(v1=c(rep(1,2500),rep(2,2500))); dt[,.N,v1]` – quant Jul 21 '17 at 10:41
  • What if I want to have absence of duplicated elements even while I'm adding into a vector, list or some other data structure? From what I have learned so far it doesn't seem easy to dynamically extend data structures, but to me the problem still applies. – Acsor Jul 21 '17 at 10:42
  • You could use a hashmap, whose keys are basically a set, if that might help. – Tim Biegeleisen Jul 21 '17 at 10:43
  • @quant By the way, the term *data structure* seems to be part of the standard jargon in R, at least from what I read. Section 2.1 of *An introduction to R* reads exactly «R operates on named *data structures*. [...]». Am I allowed to remove the quotes to *data structure* inside my question? – Acsor Jul 21 '17 at 10:47
  • No need for quotes. If you have items that are repeated many times, then `table` can be pretty useful. When fed a single vector, it returns a table with the original values as names and the counts of each element as values. – lmo Jul 21 '17 at 14:23
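
On the follow-up about keeping a collection free of duplicates while adding to it, here is a minimal sketch using only base R. The names `s`, `seen` and `add_unique` are hypothetical, chosen for illustration; the environment-based variant is one possible reading of the hashmap suggestion above, not the only way to do it.

```r
# Option 1: grow a vector and let union() drop anything already present.
s <- c(1, 2, 3)
s <- union(s, c(3, 4, 4))   # 3 is already in s; 4 is added only once
s
# [1] 1 2 3 4

# Option 2: use an environment as a hash set (keys are character strings).
seen <- new.env(hash = TRUE)

# Hypothetical helper: store the value as a key; re-adding a key is a no-op.
add_unique <- function(set, value) {
  assign(as.character(value), TRUE, envir = set)
  invisible(set)
}

for (v in c("a", "b", "a", "c")) add_unique(seen, v)
sort(ls(seen))
# [1] "a" "b" "c"
```

Option 1 copies the vector on every update, which is usually fine for a few thousand elements. Option 2 avoids the copy but only stores keys as strings.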

3 Answers

6

To remove repeated occurrences of a value from a vector, use duplicated().

For example:

x <- c(1,2,3,3,4,5,5,6)
x[!duplicated(x)]
# [1] 1 2 3 4 5 6

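This returns every value of x that is not (!) flagged as a duplicate. duplicated() marks only the second and later occurrences of a value, so the first occurrence of each value is kept.

This also works for more complex data structures such as data frames. See ?duplicated for further information.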
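
As a rough illustration of the data frame case, assuming a small made-up data frame `df` (not from the question), duplicated() compares whole rows:

```r
# Made-up example data: rows 2 and 3 are identical.
df <- data.frame(id = c(1, 2, 2, 3), label = c("a", "b", "b", "c"))

duplicated(df)
# [1] FALSE FALSE  TRUE FALSE

# Keep only the first occurrence of each row.
deduped <- df[!duplicated(df), ]
deduped
#   id label
# 1  1     a
# 2  2     b
# 4  3     c
```

unique(df) gives the same rows in one call.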

unique(x) returns each distinct value in the vector once.

table(x) shows the unique values and their number of occurrences in vector x:

table(x)
# x
# 1 2 3 4 5 6 
# 1 1 2 1 2 1 
loki
  • The implementation above is not a full representation of a set, which should have the property that set(1,2) = set(2,1), i.e. order is not important. – Artem Jan 13 '23 at 13:17
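
For order-insensitive comparison, base R has set-style functions that operate on ordinary vectors. A short sketch, with `a` and `b` as illustrative names:

```r
a <- c(1, 2)
b <- c(2, 1, 1)

setequal(a, b)         # TRUE: same elements, order and repeats ignored
identical(a, b)        # FALSE: plain vector comparison is order-sensitive

union(a, c(2, 3))      # 1 2 3
intersect(a, c(2, 3))  # 2
setdiff(a, c(2, 3))    # 1
3 %in% a               # FALSE: membership test
```

These treat vectors as sets only for the duration of the call; the vectors themselves can still hold duplicates.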
3

I'd also recommend looking into the sets package. Install it with install.packages('sets') and see whether the following works for you.

sets::as.set(c(1, 1, 2, 4, 3, 5, 5, 5))
# output: {1, 2, 3, 4, 5}
Felix Jassler
2

The unique() function will work:

unique(x)  # where x is the name of your vector
desertnaut
PritamJ