Questions tagged [agrep]

An approximate grep for fuzzy matching

agrep (approximate grep) is a proprietary fuzzy string searching program, developed by Udi Manber and Sun Wu between 1988 and 1991, for use with the Unix operating system. It was later ported to OS/2, DOS, and Windows. For each query it selects the best-suited algorithm from a set of built-in string searching algorithms that are among the fastest known, including Manber and Wu's bitap algorithm, which is based on Levenshtein distances. agrep is also the search engine in the indexer program GLIMPSE. agrep is free for private and non-commercial use only, and belongs to the University of Arizona.
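To make the idea of matching "based on Levenshtein distances" concrete, here is a minimal Python sketch of approximate substring search. It is an assumption-laden illustration, not agrep's code: it uses a plain dynamic-programming table (Sellers-style) rather than the bitap algorithm, and the names `min_substring_distance` and `fuzzy_grep` are hypothetical.

```python
from typing import Iterable, List


def min_substring_distance(pattern: str, text: str) -> int:
    """Smallest Levenshtein distance between `pattern` and any substring of `text`.

    Illustrative stand-in for fuzzy substring search; not agrep's bitap code.
    """
    m = len(pattern)
    # prev[i]: fewest edits needed to align pattern[:i] with some substring
    # of the text that ends at the current position.
    prev = list(range(m + 1))  # before reading text: drop i pattern characters
    best = prev[m]
    for ch in text:
        cur = [0] * (m + 1)  # cur[0] == 0: a match may start at any position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the text
                cur[i - 1] + 1,      # pattern character missing from the text
            )
        best = min(best, cur[m])
        prev = cur
    return best


def fuzzy_grep(pattern: str, lines: Iterable[str], max_errors: int) -> List[str]:
    """Return the lines that contain `pattern` with at most `max_errors` edits."""
    return [ln for ln in lines if min_substring_distance(pattern, ln) <= max_errors]
```

Because the pattern is compared against substrings rather than the whole line, `min_substring_distance("ab", "abcd")` is 0 even though the whole-string Levenshtein distance between "ab" and "abcd" is 2. Under this sketch's assumptions, a zero error budget can therefore still report a match when the pattern occurs inside a longer string.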

89 questions
2
votes
0 answers

How to determine what profile is matching the best for a set of test cases?

I have a case where I have a description of requirements (profiles) and a number of cases (testcases) that I would like to match to the profiles. I have identified the function agrep, which seems to do some of the work. At least it figures out what…
Jochem
  • 3,295
  • 4
  • 30
  • 55
2
votes
1 answer

R: agrep error when replacing string with another string

After a lot of trial and error and use of the search function, I am still somewhat clueless about an I-thought-simple-thing (as always, hrmpf): I have a column in a data frame x$question and within that column, there is an expression 'A/V' every once in a…
Al_
  • 89
  • 1
  • 7
2
votes
1 answer

issue with agrep

Why does agrep find a match although I restrict max.distance to zero? adist correctly tells me that I need two insertions... > agrep("ab", "abcd", max = list(del = 0, ins = 0, sub = 0), value = T) [1] "abcd" > drop(attr(adist("ab", "abcd",…
Kay
  • 2,702
  • 6
  • 32
  • 48
1
vote
1 answer

Understanding constraints in agrep fuzzy matching in R

This seems really simple but for some reason, I don't understand the behavior of agrep fuzzy matching involving substitutions. Two substitutions produce a match as expected when all=2 is specified, but not when substitutions=2. Why is this? # Finds…
Atakan
  • 416
  • 3
  • 14
1
vote
1 answer

Calling the agrep .Internal C function from Rcpp

In short: How can I call, from within Rcpp C++ code, the agrep C internal function that gets called when users use the regular agrep function from base R? In long: I have found multiple questions here about how to invoke, from within Rcpp, a C or…
1
vote
3 answers

Grep a string with number greater than 45

I have multiple files in a directory. I want to extract every line, across all the files, that contains an integer value greater than 45. Currently, I am using: grep "IO resumed after" * It displays all the files which this string "IO resumed…
Deadpool
  • 13
  • 1
  • 3
1
vote
1 answer

fuzzy Logic for a String in R

I have 2 dataframe: DF1 ID Address AB1 VILL +PO CHAPAR TAPUKADA ALWAR AB2 VILL WARD NO 02 THIKARIYA CHAND RAWAT JUNA PADA POST BADANA 0 SIROHI AB3 RAMKUMAR YADAV VILL KANSL 0 JAIPUR AB4 VILL KHERKI MUKKER POSTPANIYA PUTLI …
1
vote
1 answer

R Finding elements matching with each other within a vector

I have a list of addresses. These addresses were input by various users and hence there are a lot of differences in the way the same address is written. For example, "andheri at weh pump house", "andheri pump house","andheri pump house(mt)","weh…
Apricot
  • 2,925
  • 5
  • 42
  • 88
1
vote
2 answers

Return vector of words matched with fuzzy matching

I am using agrepl() to filter a data.table by fuzzy matching a word. This is working fine for me, using something like this: library(data.table) data <- as.data.table(iris) pattern <- "setosh" dt <- data[, lapply(.SD, function(x)…
Jaccar
  • 1,720
  • 17
  • 46
1
vote
1 answer

Writing a script that uses agrep to loop through lines in a document one by one against lines in another document and getting a result

I am trying to write a script that uses agrep to loop through files in one document and match them against another document. I believe this might use a nested loop however, I am not completely sure. In the template document, I need for it to take…
kjustin9
  • 13
  • 5
1
vote
1 answer

duplicates in agrep function

I have the following code: x <- data.frame("SN" = 1:2, "Name" = c("aaa","bbb")) y <- data.frame("SN" = 1:2, "Name" = c("aa1","aa2")) x$partials<- as.character(sapply(x$Name, agrep, y$Name,max.distance = 1,value=T)) x The output is the…
evdokimos
  • 33
  • 6
1
vote
1 answer

using agrepl within a loop -- 'pattern' has length > 1 and only the first element will be used

I'm trying to go through a list of artists and albums, and get the audio features of each song of each album into a data frame (using spotifyr package). However, in my list, there are some misspellings of the album titles, so I'm trying to use agrep…
Evan O.
  • 1,553
  • 2
  • 11
  • 20
1
vote
1 answer

Grouping string variables from a dataframe by best string match to make subsets

I have a dataframe with a column containing names of countries. The names are written differently even when they refer to the same country: for example, there are differences in upper and lower case, some letters missing, some extra letters, and so on. So I need…
ronnyhdez
  • 21
  • 6
1
vote
1 answer

How to get around a bug in the agrep function regular expression logic in R?

So I've run into a small bug/feature in R where the agrep function does not accept the "|" character as valid regular expression logic (others have had this problem too), when used in the argument. I'm trying to do a fuzzy match of 30 different,…
Perf Gigi
  • 21
  • 1
1
vote
1 answer

Why does agrep in R not find the best match?

I am attempting string matching in R using the agrep command. However I am concerned that it stops when it finds a good match, rather than optimize to find the best one. Though it is possible my understanding of how it works is incorrect. My example…
LanieD
  • 30
  • 8