
I have a data frame with two columns that contain two different types of text.

The first column contains codes: strings in the form DD-HI-HO, where DD is the code.
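
The grepl() pattern used for rule 1 below matches a code anywhere in the string, not only in the DD position. If only the leading DD segment should count, one option is to strip everything after the first hyphen and test membership with %in%. This is a sketch; the sample values in `col1` are hypothetical, and the code list is taken from the pattern in the question.

```r
# Hypothetical sample values in the DD-HI-HO format
col1 <- c("HC-IE-GB", "IE-GB-HC", "HM-DE-FR")

# Codes taken from the grepl() pattern in the question
target_codes <- c("HC", "HD", "HE", "HK", "HM", "HH", "HY", "HL")

# Keep only the leading DD segment (everything before the first "-")
dd <- sub("-.*$", "", col1)

# Exact membership test on the DD segment only
has_code <- dd %in% target_codes
has_code
# [1]  TRUE FALSE  TRUE

# For comparison, an unanchored grepl() also matches "HC" in the last segment
grepl("HC|HD|HE|HK|HM|HH|HY|HL", col1)
# [1] TRUE TRUE TRUE
```

For these values, an anchored pattern such as `^(HC|HD|HE|HK|HM|HH|HY|HL)-` in a single grepl() call gives the same result as the %in% test.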

Column 2 is free text that anyone can enter. I am trying to populate a third column from three rules, using the logic below, so that it ends up as a single column of 1s and 0s.

I can't work out how to populate that one column so that it incorporates all three rules. Below is pseudocode.

Basic info: Codes is a vector (basically a reference table with one column). Fuzzy is also a vector (basically another one-column reference table).

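#----CHECK SEQUENCES----
  library(RecordLinkage)  # provides levenshteinSim()

# Rule 1: check if a relevant code is applied in Col1
  Data$Has.Code <- grepl(pattern = "(HC|HD|HE|HK|HM|HH|HY|HL)", Data$Col1)

# Rule 2: check if the text in Col2 contains any element of Codes (substring match)
  Data$Has.DG <- sapply(Data$Col2, function(txt)
    any(sapply(Codes, grepl, x = txt, fixed = TRUE)))

# Rule 3: check how closely the strings are related; take the highest match and flag it if it is over 45%
  Data$Has.Fuzzy <- sapply(Data$Col2, function(txt)
    max(sapply(Fuzzy, function(f) levenshteinSim(txt, f)))) > 0.45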
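
To take the highest match for each row, every entry in Col2 has to be compared with every element of Fuzzy. The sketch below does that in base R, swapping in utils::adist() for RecordLinkage's levenshteinSim(), and assumes the similarity is normalized as 1 - distance / length of the longer string. The helper name max_similarity() and the sample strings are hypothetical.

```r
# Hypothetical helper: for each string in x, the highest normalized
# Levenshtein similarity against any string in ref.
max_similarity <- function(x, ref) {
  d   <- adist(x, ref)                        # edit distances: length(x) rows, length(ref) columns
  len <- outer(nchar(x), nchar(ref), pmax)    # length of the longer string in each pair
  sim <- 1 - d / len                          # similarity in [0, 1], 1 = identical
  unname(apply(sim, 1, max))                  # best match for each element of x
}

Fuzzy    <- c("Ice cream", "Lolly")           # hypothetical reference vector
contents <- c("Ice-cream", "Volvo")

max_similarity(contents, Fuzzy)
# [1] 0.8888889 0.4000000

max_similarity(contents, Fuzzy) > 0.45        # the 45% threshold from the question
# [1]  TRUE FALSE
```

"Ice-cream" is one substitution away from "Ice cream" (1 - 1/9), while "Volvo" needs three substitutions to become "Lolly" (1 - 3/5), which stays under the threshold.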



Edit: added a table with sample data

    Record  Col1   Col2           Col3
    1       HC-IE  Ice-cream      1
    2       IE-GB  Volvo          0
    3       IE-DE  Iced_Lollipop  1

Record 1: Rule 1 would catch "HC" in Col1 and so set Col3 to 1 (boolean). Rule 2 would also catch something in Col2 for record 1, because the vector Codes contains "Ice" as an element, but it would not need to run because Rule 1 takes precedence.
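
intersect() only finds exact, whole-string matches, so "Ice" in Codes never equals "Ice-cream" in Col2. A per-row substring test is closer to what record 1 describes. A sketch, where the second element of Codes is hypothetical:

```r
Codes    <- c("Ice", "Choc")                  # "Choc" is a hypothetical extra element
contents <- c("Ice-cream", "Volvo", "Iced_Lollipop")

# intersect() needs an exact, whole-string match
length(intersect(Codes, contents)) > 0
# [1] FALSE

# Substring test: does each entry contain at least one element of Codes?
has_dg <- vapply(contents,
                 function(txt) any(vapply(Codes, grepl, logical(1), x = txt, fixed = TRUE)),
                 logical(1), USE.NAMES = FALSE)
has_dg
# [1]  TRUE FALSE  TRUE
```

With a plain substring match, record 3 is also caught here, because "Iced_Lollipop" contains "Ice". If only whole words should count, a word-boundary pattern such as `\\bIce\\b` (without `fixed = TRUE`) would match "Ice-cream" but not "Iced_Lollipop".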

Record 2: none of the rules would return anything, so Col3 is set to 0.

Record 3: a bit of a daft example, but the Levenshtein similarity between Col2 and one of the elements in the vector Fuzzy is 75%. This is above our stated threshold, so Col3 is set to 1.
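
Because Col3 is 1 whenever any rule fires, the three rule results can be combined with the vectorized `|` operator instead of a loop or nested if statements. A minimal sketch on the sample table, assuming base R's adist() in place of levenshteinSim() and hypothetical contents for Fuzzy (the real reference vectors are not shown):

```r
# Sample data from the table above (record numbers dropped)
Data <- data.frame(
  Col1 = c("HC-IE", "IE-GB", "IE-DE"),
  Col2 = c("Ice-cream", "Volvo", "Iced_Lollipop"),
  stringsAsFactors = FALSE
)

Codes <- c("Ice")          # from the question
Fuzzy <- c("Iced Lolly")   # hypothetical reference vector

# Rule 1: a relevant code in Col1
rule1 <- grepl("HC|HD|HE|HK|HM|HH|HY|HL", Data$Col1)

# Rule 2: Col2 contains any element of Codes
rule2 <- vapply(Data$Col2,
                function(txt) any(vapply(Codes, grepl, logical(1), x = txt, fixed = TRUE)),
                logical(1), USE.NAMES = FALSE)

# Rule 3: best normalized Levenshtein similarity against Fuzzy exceeds 45%
d     <- adist(Data$Col2, Fuzzy)
sim   <- 1 - d / outer(nchar(Data$Col2), nchar(Fuzzy), pmax)
rule3 <- apply(sim, 1, max) > 0.45

# Flag a row if any rule fires; | works element-wise on the whole column
Data$Col3 <- as.integer(rule1 | rule2 | rule3)
Data$Col3
# [1] 1 0 1
```

Since the result is the same whichever rule fires first, the precedence described for record 1 only matters if you want to skip the more expensive fuzzy comparison for rows already flagged, for example by computing rule 3 only on `Data$Col2[!(rule1 | rule2)]`.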

Can anyone help?

Thank you for your help.

John Smith
  • Can you provide a few rows of your source data and the expected output? – Minnow Mar 11 '15 at 13:42
  • `?ifelse` might be what you need. Not sure what your first line of code has to do with the rest. – Zelazny7 Mar 11 '15 at 14:15
  • Hi, has anyone had any luck with this? I think I have a way of doing it with for loops, but I was hoping there would be a cleverer way. I will post the answer when I am satisfied with the code at the end of the week. – John Smith Mar 15 '15 at 07:00

0 Answers