Please check the code below. I have created a data frame from three variables. The variable "y123" computes the similarity between column a1 and column a2; it gives me 16 values in total, because every a1 value gets compared with every a2 value. What I need is this: whenever a particular "a1" value is compared with a particular "a2" value, the "a3" value in the same row as that "a2" value should be displayed beside the result. So the output should be a data frame with the column y123 and a second column in which the "a3" values are repeated four times, i.e. 16 values. Thanks, and please help.

library(RecordLinkage)  # provides levenshteinSim()
a1 = as.character(c(103,120,142,153))
a2 = as.character(c(113,453,142,102))
a3 = c("a1","b1","c1","d1")
a123 = data.frame(a1,a2,a3)
# 4 x 4 matrix: column j holds the similarity of a1[j] to every a2 value
y123 = sapply(a1, function(i) RecordLinkage::levenshteinSim(i,a2))
# flatten column by column into a vector of 16 values
b1 = c(y123)
b1

I need something like this:

new_data = data.frame(b1,new_column)
Ashmin Kaul
  • Maybe add an example of what the result data.frame would look like? I am a bit confused when you say column y123, because that is a data.frame with multiple columns. – LyzandeR Dec 07 '17 at 11:08
  • @LyzandeR, thanks for replying. I have edited the question to make it clearer. – Ashmin Kaul Dec 07 '17 at 11:11

1 Answer

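I think this is what you want. Flatten y123 with c() and repeat a3 once per a1 value, so that each similarity lines up with the a3 of the a2 value it was computed against: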

data.frame(y123 = c(y123), a3 = rep(a3, times = length(a3)))
#        y123 a3
#1  0.6666667 a1
#2  0.3333333 b1
#3  0.3333333 c1
#4  0.6666667 d1
#5  0.3333333 a1
#6  0.0000000 b1
#7  0.3333333 c1
#8  0.3333333 d1
#9  0.3333333 a1
#10 0.0000000 b1
#11 1.0000000 c1
#12 0.6666667 d1
#13 0.6666667 a1
#14 0.6666667 b1
#15 0.3333333 c1
#16 0.3333333 d1
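
If it helps to see which a1 and a2 each row belongs to, here is a minimal sketch that keeps all three columns next to the score. It is not the code from the question: it swaps in base R's `utils::adist()` for the distance and assumes the similarity is normalised as 1 - distance / length of the longer string, which should reproduce the values above but is worth checking against `levenshteinSim()` on your own data. The names `d`, `len`, `sim` and `result` are only illustrative.

```r
# Sketch: pair every a1 with every (a2, a3) row, then score each pair.
a1 <- c("103", "120", "142", "153")
a2 <- c("113", "453", "142", "102")
a3 <- c("a1", "b1", "c1", "d1")

# Levenshtein distances from base R; rows follow a2, columns follow a1.
d <- utils::adist(a2, a1)

# Assumed normalisation: 1 - distance / length of the longer string.
len <- outer(nchar(a2), nchar(a1), pmax)
sim <- 1 - d / len

result <- data.frame(
  a1   = rep(a1, each = length(a2)),   # a1 changes slowest
  a2   = rep(a2, times = length(a1)),  # a2 cycles within each a1
  a3   = rep(a3, times = length(a1)),  # a3 stays aligned with a2
  y123 = c(sim)                        # column-by-column flattening, same order
)
result
```

Note that the repeat count here is `length(a1)`, not `length(a3)`. The two are equal in this example, but if a1 and a2 ever differ in length, it is the number of a1 values that decides how many times a3 has to be repeated.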
LyzandeR
  • This is good; however, using a function will make it very slow, as I need to apply your logic to a huge dataset. Also, the data frame will have the y123 column, with "a1","b1","c1","d1" occurring one after the other four times. – Ashmin Kaul Dec 07 '17 at 11:37
  • I simplified it then. This will be really fast too. – LyzandeR Dec 07 '17 at 11:41
  • Hmm, you can do a small tweak with the a3 column here. Hard-coding is fine for small datasets, but I will be using your logic on large data, so some way of parametrizing it should do? Something similar to the subset() function, if you can check. – Ashmin Kaul Dec 07 '17 at 11:44
  • Do you mean the 4 there? The `times` argument would be the length of `a3`, i.e. `times = length(a3)`. – LyzandeR Dec 07 '17 at 11:47
  • Thanks, I'll use this logic. – Ashmin Kaul Dec 07 '17 at 11:49
  • Hi, can you please check this post, in which I am facing an issue: https://stackoverflow.com/questions/47920689/displaying-data-in-the-chart-based-on-plotly-click-in-r-shiny/47921511#47921511 – Ashmin Kaul Dec 22 '17 at 14:11