Questions tagged [agrep]

An approximate grep for fuzzy matching

agrep (approximate grep) is a proprietary fuzzy string searching program, developed by Udi Manber and Sun Wu between 1988 and 1991, for use with the Unix operating system. It was later ported to OS/2, DOS, and Windows. It selects the best-suited algorithm for the current query from a variety of the known fastest (built-in) string searching algorithms, including Manber and Wu's bitap algorithm based on Levenshtein distances. agrep is also the search engine in the indexer program GLIMPSE. agrep is free for private and non-commercial use only, and belongs to the University of Arizona.
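To make "matching based on Levenshtein distances" concrete, here is a minimal Python sketch of approximate substring matching. It is not agrep's implementation and it is not the bitap algorithm: it uses a plain dynamic-programming table instead, which gives the same yes/no answer for a given error budget but is slower. The names `min_substring_distance`, `fuzzy_grep`, and `max_errors` are made up for this illustration.

```python
def min_substring_distance(pattern: str, text: str) -> int:
    """Smallest Levenshtein distance between pattern and any substring of text."""
    # prev[i] = cost of aligning pattern[:i] with some substring ending
    # at the previous text position (initially: before any text is read).
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        # A match may start anywhere in text, so the empty prefix costs 0.
        cur = [0]
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (p != ch),  # match or substitution
                prev[i] + 1,              # extra character in text
                cur[i - 1] + 1,           # pattern character missing from text
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


def fuzzy_grep(pattern, lines, max_errors=1):
    """Keep the lines containing pattern with at most max_errors edits."""
    return [s for s in lines if min_substring_distance(pattern, s) <= max_errors]


if __name__ == "__main__":
    print(fuzzy_grep("timothy", ["timo", "tim", "timoth", "timothys"]))
```

With an error budget of 1, "timoth" (one deletion) and "timothys" (contains the pattern exactly) are kept, while "timo" and "tim" are too far away.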

89 questions
0
votes
1 answer

R : Record Linkage problem with all fields combined in 1 column

I have to match column a from dataset A to column b in dataset B. But the different variables aren't in separate fields (columns a, b, c); they are all in the same one. I have been looking at the packages RecordLinkage and fastLink; they work great with the fields…
Yeshyyy
  • 669
  • 6
  • 21
0
votes
0 answers

Text matching using R when strings are dissimilar

I am trying to identify observations that match between two datasets, using text string vectors $contractor and $employer, and create a TRUE/FALSE indicator on whether the contractor is in the employer…
Eric
  • 1
  • 3
0
votes
0 answers

Looking for fuzzy match function, possible bug in R's agrepl

I'm trying to build a function that uses R's agrepl for approximate matching. I am using a regex pattern which, from my perspective, is not treated as regex. I came to this conclusion by running the following test in my REPL: > patterns <-…
sgp667
  • 1,797
  • 2
  • 20
  • 38
0
votes
1 answer

Fuzzy match only if exact match doesn’t exist

I’m trying to write a function to get album data from Spotify’s API for a data frame of albums and artists. Because there are some misspellings in the dataset, I need to use a fuzzy matching function (like agrepl). However, some artists, like Absu,…
Evan O.
  • 1,553
  • 2
  • 11
  • 20
0
votes
1 answer

agrep output approximate matching

Having agrep('timothy', c('timo','tim','timoth', 'timothys'), max.distance = 0.01, value=TRUE) I want to output the original string and all possible results together in a data frame as below. Original Replace1 Replace2 timothy timoth …
Rtab
  • 123
  • 10
0
votes
2 answers

How not to alter duplicate names with sapply?

I have a text vector with the names of drugs already registered, and another with the names of new drugs. I want to know whether the new drugs look like an already existing drug or not. For example, if supercure is a drug which can be produced…
Vincent
  • 955
  • 2
  • 15
  • 32
0
votes
2 answers

Referencing Elements in Nested for loops in R

I'm trying to write a loop to perform the following actions on a data frame: For every name in the 'Name' column, check to see if a rough match (accomplished with agrep() ) exists in the 'Referral' column. If a match exists, replace all cells in…
Will
  • 7
  • 1
  • 4
0
votes
1 answer

agrep function of R is not working for text matching

I am trying to match a string using R's agrep function. I do not understand why it's not returning any value. I am looking for a solution that gives the closest match for the given text. In the given example it should show "ms sharda stone crusher prop…
honey
  • 89
  • 2
  • 8
0
votes
1 answer

How to match a string with a tolerance of one character?

I have a vector of locations that I am trying to disambiguate against a vector of correct location names. For this example I am using just two disambiguated locations, though: agrepl('Au', c("Austin, TX", "Houston, TX"), max.distance = .000000001,…
Dambo
  • 3,318
  • 5
  • 30
  • 79
0
votes
0 answers

Name matching with different length data frames in R

I have two dataframes with numerous variables. Of primary concern are the following variables, df1.organization_name and df2.legal.name. I'm just using fully qualified SQL-esque names here. df1 has dimensions of 15 x 2700 whereas df2 has dimensions…
Zach
  • 1,316
  • 2
  • 14
  • 21
0
votes
2 answers

dplyr filter function in combination with agrep

I'm trying to filter only rows from my table that have the word "dog" in the title column but I cannot get it to work. Here's a data example: ID NozamaItemID NozamaTitle 1 4557 12000017544…
DirkLX
  • 1,317
  • 1
  • 10
  • 16
0
votes
1 answer

Element to save results with different length in R

I want to extract similar text strings using the agrep function and save them in a list or vector, but the results have different lengths (a replacement could even have length zero), so I get an error. How can I define a list or vector in order to save the…
Israel
  • 260
  • 3
  • 15
0
votes
3 answers

strsplit with non-character data

I want to do a strsplit on one variable ID1 to split into ID1_s1 and ID1_s2 and I need to get rid of the strings that are in brackets. # dummy data df1 <- data.frame(ID1=c("Gindalinc","Xaviertechnolgies","anine.inc(Nasq)","Xyzinc"),…
user3570187
  • 1,743
  • 3
  • 17
  • 34
0
votes
1 answer

Successively agrep names in a variable, then create a new variable with the shortest name for close matches

Assume a character vector of company names where the names come in various forms. Here is a small version of a 10,000-row data frame; it shows the desired second vector ("two.names"). structure(list(firm = structure(1:8, .Label = c("Carlson Caspers",…
lawyeR
  • 7,488
  • 5
  • 33
  • 63
0
votes
1 answer

Partial Matching two data frames having a common column(by words) in R/Python

I have two dataframes as csv files where df1 has more rows than df2: Df1 Name Count xxx yyyyyy bbb cccc 15 fffdd 444 ggg 20 kkbbb ccc dd 29p 5 22 cc pbc2 kmn3 b23 efgh 4 ccccccccc…
warwick12
  • 316
  • 3
  • 12