Questions tagged [agrep]

An approximate grep for fuzzy matching

agrep (approximate grep) is a proprietary fuzzy string searching program developed by Udi Manber and Sun Wu between 1988 and 1991 for the Unix operating system. It was later ported to OS/2, DOS, and Windows. For each query it selects the best-suited of several fast built-in string searching algorithms, including Manber and Wu's bitap algorithm, which measures closeness by Levenshtein distance. agrep is also the search engine in the indexer program GLIMPSE. agrep is free for private and non-commercial use only, and belongs to the University of Arizona.
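To make "fuzzy matching by Levenshtein distance" concrete, here is a minimal Python sketch of approximate substring search. It is not agrep's implementation: it uses a plain dynamic-programming table rather than the bit-parallel bitap algorithm, and the names `min_substring_distance`, `fuzzy_contains`, and `max_errors` are made up for this illustration.

```python
def min_substring_distance(pattern: str, text: str) -> int:
    """Smallest Levenshtein distance between pattern and any substring of text."""
    # prev[i] holds the best distance between pattern[:i] and a substring
    # of the text ending just before the current text character.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]  # matching against the empty substring costs len(pattern)
    for ch in text:
        # A match may start anywhere in the text, so the empty prefix costs 0.
        curr = [0]
        for i, pc in enumerate(pattern, start=1):
            cost = 0 if pc == ch else 1
            curr.append(min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the text
                curr[i - 1] + 1,     # pattern character missing from the text
            ))
        best = min(best, curr[-1])
        prev = curr
    return best


def fuzzy_contains(pattern: str, text: str, max_errors: int = 1) -> bool:
    """True if some substring of text is within max_errors edits of pattern."""
    return min_substring_distance(pattern, text) <= max_errors


if __name__ == "__main__":
    print(fuzzy_contains("lasy", "1 lazy 2"))                # one substitution allowed
    print(fuzzy_contains("lasy", "1 lazy 2", max_errors=0))  # exact match required
```

Real tools differ in how they set the error budget and weigh insertions, deletions, and substitutions, so their results may not agree with this sketch.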

89 questions

2 answers

Fuzzy matching strings within a single column and documenting possible matches

I have a relatively large dataset of ~ 5k rows containing titles of journal/research papers. Here is a small sample of the dataset: dt = structure(list(Title = c("Community reinforcement approach in the treatment of opiate addicts", "Therapeutic…

r data.table matching sapply agrep

asked Mar 23 '21 at 01:39

jmogil

1 answer

What is the logic of approximate string matching?

Does anybody know what is the reason for the following example: agrepl("cold", "cool") #> [1] FALSE agrepl("cool", "cold") #> [1] TRUE

r agrep approximate

asked Feb 28 '20 at 13:39

Mohieddin Jafari

1 answer

Extract substring match from agrep

My goal is to identify whether a given text has a target string in it, but I want to allow for typos / small deviations and extract the substring that "caused" the match (to use it for further text analysis). Example: target <- "target string" text…

r levenshtein-distance agrep

asked Nov 18 '19 at 12:47

Tlatwork

3 answers

How to fix the error "agrep: pattern too long (has > 32 chars)", which does not appear if there is no full stop in the string?

agrep gives the error agrep: pattern too long (has > 32 chars) when there is a full stop (.) in the pattern string but not otherwise. I want to compare (approximately) two strings, so I'm using agrep for that, but it's giving an error agrep: pattern too…

bash agrep

asked Aug 17 '19 at 06:36

Manik

3 answers

Identify fuzzy duplicates from a single column and create a subset containing records of fuzzy duplicates using R

I have a dataset which contains a field with individuals' names. Some of the names are similar, with minute differences, like 'CANON INDIA PVT. LTD' and 'CANON INDIA PVT. LTD.', 'Antila,Thomas' and 'ANTILA THOMAS', 'Z_SANDSTONE COOLING LTD' and…

r duplicates levenshtein-distance fuzzy-comparison agrep

asked Jul 12 '19 at 12:52

Jazz

1 answer

R: Fuzzy merge using agrep and data.table

I am trying to merge two data.tables, but due to different spellings of stock names I lose a substantial number of data points. Hence, instead of an exact match, I was looking into a fuzzy merge. library("data.table") dt1 = data.table(Name = c("ASML…

r data.table agrep

asked Sep 19 '18 at 09:38

Hjalmar

2 answers

Partial string matching in R and trim the characters

Here is a dataframe and a vector. df1 <- tibble(var1 = c("abcd", "efgh", "ijkl", "mnopqr", "qrst")) vec <- c("ab", "mnop", "ijk") Now, for all the values in var1 that match closest (I would like to match the first n characters) with the values…

r string-matching fuzzy-search agrep fuzzyjoin

asked Jun 28 '18 at 17:26

Geet

1 answer

fuzzy string matching with agrep()

I'm matching a list of company names against itself with R and agrep() because the data was stored wrongly in a legacy system: no 4th normal form, and companies were recorded on the same level as customers, which means a new company entry for every new…

r string-matching fuzzy-search fuzzy agrep

asked Dec 18 '17 at 10:29

Salfii

2 answers

Alternative approach to using agrep() for fuzzy matching in R

I have a large file of administrative data, about 1 million records. Individual people can be represented multiple times in this dataset. About half the records have an identifying code that maps records to individuals; for the half that don't, I…

r string-matching agrep

asked Jul 28 '17 at 02:33

edstatsuser

1 answer

Fuzzy mapping in R

I am trying to use the agrep command for fuzzy matching. I have a data frame in which one column contains the audience response, and another data frame in which segment and subsegment are listed. The audience response column contains the words that are…

r matching fuzzy agrep

asked Mar 09 '17 at 06:21

Shaz

1 answer

R agrep() function behaviour

I have some trouble understanding the result of the agrep() function. I don't understand what I have missed in the description of the function. agrep() is for fuzzy matching and I'd like to use it to correct some misspellings. I'd like to allow only a…

r string-matching agrep

asked Jan 07 '16 at 21:56

Vivien

0 answers

R: slow fuzzy matching with agrep

I have a vector of patterns and a large vector of potential match candidates. For each element in x I use agrep to obtain a list of close matches in y. The problem is that the code is very slow: it takes approximately 2 seconds per element of x.…

r fuzzy-search agrep

asked Oct 23 '15 at 01:48

Alexey Ferapontov

1 answer

agrep string matching in R

I have two lists of product names. My problem is that "Operating system" is matching with "system", "cooling system", etc., but it has to match only with "Operating", "OS". Another example is "Key Board", which should be matched with "key" or "KB" but not with…

r string-matching tm agrep qdap

asked Jun 23 '15 at 08:33

Kavipriya

6 answers

SQL: match a string pattern in a column irrespective of its case and whitespace

I need to find the frequency of a string in a column, irrespective of its case and any white spaces. For example, if my string is My Tec Bits and they occur in my table like this, as shown below : 061 MYTECBITS 12123 102 mytecbits 24324 103…

mysql sql regex agrep

asked Apr 29 '15 at 09:56

sunitprasad1

2 answers

Approximate string matching with a letter confusion matrix?

I'm trying to model a phonetic recognizer that has to isolate instances of words (strings of phones) out of a long stream of phones that doesn't have gaps between each word. The stream of phones may have been poorly recognized, with letter…

grep string-matching agrep

asked Apr 23 '10 at 22:26

a_cactus_on_the_stair
