I have two data frames: dados, and a reference (dictionary) data frame, dados1. For each designation in dados I want to check its code against dados1. The comparison is based on how the designations are written: if a designation in dados is written the same as, or similarly to, a designation in dados1, the two must have the same code. A designation can match more than one row of the dictionary; when that happens, its code should be compared with the row whose number of words is equal or closest.
> dados=data.frame(designacao = c("arroz","arroz agulha","arroz agulha","arroz grao medio", "arro","arroz medio","Leite pasteurizado meio gordo"),
+ codigo = c("11111","11111","11111","11112","11111","11114","1141204"))
> dados1=data.frame(designacao = c("arroz","arroz grao medio longo","arroz grao medio", "Leite pasteurizado meio gordo"),
+ codigo = c("11111","11113","11112","1141202"))
> dados
                     designacao  codigo
1                         arroz   11111
2                  arroz agulha   11111
3                  arroz agulha   11111
4              arroz grao medio   11112
5                          arro   11111
6                   arroz medio   11114
7 Leite pasteurizado meio gordo 1141204
> dados1
                     designacao  codigo
1                         arroz   11111
2        arroz grao medio longo   11113
3              arroz grao medio   11112
4 Leite pasteurizado meio gordo 1141202
There are three possible cases when looking up a designation in the dictionary:
- only one row has the maximum number of words in common.
- more than one row has the maximum number of words in common, and those rows have different lengths (in words): take the row whose length is closest to that of the designation.
- more than one row has the maximum number of words in common, and those rows have equal lengths: compare the code in dados with the codes of all those rows and check whether it matches any of them.

The expected result is:
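The three cases can be folded into a single rule: keep the dictionary rows that share the most words with the designation, narrow them to the rows closest in word count, and accept the code if it equals the code of any remaining row. Below is a minimal base R sketch of that rule, not a definitive solution. It rests on assumptions that are not stated in the question: "maximum number of words" is read as the number of shared words, two words count as "similar" when their edit distance (utils::adist) is at most 1 (which is what lets "arro" match "arroz"), and the helper names split_words, count_shared and check_code are made up for illustration.

```r
# Sample data from the question
dados <- data.frame(
  designacao = c("arroz", "arroz agulha", "arroz agulha", "arroz grao medio",
                 "arro", "arroz medio", "Leite pasteurizado meio gordo"),
  codigo = c("11111", "11111", "11111", "11112", "11111", "11114", "1141204"),
  stringsAsFactors = FALSE
)
dados1 <- data.frame(
  designacao = c("arroz", "arroz grao medio longo", "arroz grao medio",
                 "Leite pasteurizado meio gordo"),
  codigo = c("11111", "11113", "11112", "1141202"),
  stringsAsFactors = FALSE
)

MSG_OK  <- "Codigo correto"
MSG_BAD <- "codigo invalido, revise o nome da designacao ou o c\u00f3digo"

# Hypothetical helper: lower-case a designation and split it into words
split_words <- function(x) strsplit(tolower(trimws(x)), "\\s+")[[1]]

# Hypothetical helper: number of words in `a` that have an exact or
# near match in `b` (edit distance <= max_dist, an assumed threshold)
count_shared <- function(a, b, max_dist = 1) {
  d <- utils::adist(a, b)            # length(a) x length(b) distance matrix
  sum(rowSums(d <= max_dist) > 0)
}

# Hypothetical helper: validate one (designation, code) pair against `dict`
check_code <- function(desig, code, dict, max_dist = 1) {
  w <- split_words(desig)
  dict_words <- lapply(dict$designacao, split_words)
  shared <- vapply(dict_words, function(dw) count_shared(w, dw, max_dist),
                   integer(1))
  if (max(shared) == 0) return(MSG_BAD)   # nothing similar in the dictionary

  # Rows sharing the maximum number of words with the designation
  best <- which(shared == max(shared))
  # Among those, keep the rows whose word count is closest
  len_diff <- abs(lengths(dict_words[best]) - length(w))
  best <- best[len_diff == min(len_diff)]
  # If several rows are still tied, any of their codes is accepted
  if (code %in% dict$codigo[best]) MSG_OK else MSG_BAD
}

dados$resultado_codigo <- mapply(check_code, dados$designacao, dados$codigo,
                                 MoreArgs = list(dict = dados1),
                                 USE.NAMES = FALSE)
print(dados)
```

On the sample data, rows 1 to 5 come out as "Codigo correto" and rows 6 and 7 as invalid. The max_dist threshold is a guess: a larger value tolerates more typos but also lets unrelated words match, so it should be tuned on real data.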
> dados
                     designacao  codigo                                         resultado_codigo
1                         arroz   11111                                           Codigo correto
2                  arroz agulha   11111                                           Codigo correto
3                  arroz agulha   11111                                           Codigo correto
4              arroz grao medio   11112                                           Codigo correto
5                          arro   11111                                           Codigo correto
6                   arroz medio   11114 codigo invalido, revise o nome da designacao ou o código
7 Leite pasteurizado meio gordo 1141204 codigo invalido, revise o nome da designacao ou o código