
I have a data frame

ID, VID
 1 , xyz-0001

I would like to replace each value in the VID column with its MD5 hash.

How would I do that in R? I looked at the digest package but cannot figure out how to use it in my R code.

Thanks

redmode
user3056186

3 Answers


The digest package is well suited to this task, so first we load it:

library(digest)

Then create (or load) a test data.frame df:

txt <-
"ID,VID
1,xyz-0001
2,abc-0987"

df <- read.table(header=TRUE, text=txt, sep=",", stringsAsFactors=FALSE)
df

The initial data looks like:

  ID      VID
1  1 xyz-0001
2  2 abc-0987

Then we can apply the digest function to each value, specifying the algorithm:

df$VID <- sapply(df$VID, digest, algo="md5")
df

The VID column in df now holds the hashes:

  ID                              VID
1  1 44e3a9cf85f802ef50f18e64e01c5e32
2  2 c576ff180b2046c1a3ae939766588fd3
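As a hedged variation on the step above: by default digest hashes the serialized R object, not the bare string, so the values shown here will not match an MD5 computed over the text alone. A minimal sketch, assuming you want the hash of the raw text, passes serialize=FALSE and uses vapply so the result is guaranteed to be an unnamed character vector. The hashes it produces will differ from the ones printed above.

```r
library(digest)

# Same test data as above, built directly for a self-contained example
df <- data.frame(ID = c(1, 2),
                 VID = c("xyz-0001", "abc-0987"),
                 stringsAsFactors = FALSE)

# Hash the raw text of each value; vapply checks that every result
# is a single string, and USE.NAMES = FALSE drops the element names
df$VID <- vapply(df$VID, digest, character(1),
                 algo = "md5", serialize = FALSE, USE.NAMES = FALSE)
df
```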
redmode

With an addition to redmode's answer:

library(digest)
txt <- "hello world"
hash <- digest(txt, algo="md5", serialize=FALSE)
hash

[1] "5eb63bbbe01eeed093cb22bb8f5acdc3"

Setting the serialize option to FALSE makes digest hash the raw string rather than its serialized R representation, so your results are consistent with what you would get from online hash generators.
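To see the effect of the option, here is a small sketch that hashes the same string both ways and compares the results. The variable names raw_hash and ser_hash are only illustrative.

```r
library(digest)

txt <- "hello world"

# Hash of the bare string
raw_hash <- digest(txt, algo = "md5", serialize = FALSE)

# Hash of the serialized R object (serialize = TRUE is the default)
ser_hash <- digest(txt, algo = "md5")

# Should match the value shown above
identical(raw_hash, "5eb63bbbe01eeed093cb22bb8f5acdc3")

# The two inputs differ, so the two hashes are expected to differ as well
identical(raw_hash, ser_hash)
```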

agondiken

Another option is to install the openssl package and use its MD5 hashing function. It is vectorised, so unlike with digest you won't have to use sapply on it.

library(openssl)

df$VID <- md5(df$VID)

This will replace the strings in the VID column with their MD5 hashes.

Note: This function requires the data to be of character type, so if you want to use it on a column of integers you will need to convert them to characters with the as.character function first.
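A short sketch of the conversion described in the note, assuming the same two-row test data as in the first answer. The ID_hash column name is made up for illustration.

```r
library(openssl)

df <- data.frame(ID = c(1L, 2L),
                 VID = c("xyz-0001", "abc-0987"),
                 stringsAsFactors = FALSE)

# Integer column: convert to character before hashing
# (ID_hash is a hypothetical column name)
df$ID_hash <- md5(as.character(df$ID))

# Character column: md5 is vectorised, so no sapply is needed
df$VID <- md5(df$VID)
df
```

Since md5 here hashes the text of each string, the results should be comparable with digest called with serialize=FALSE, not with its default output.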

Derwin Brennan