Questions tagged [agrep]

An approximate grep for fuzzy matching

agrep (approximate grep) is a proprietary fuzzy string searching program, developed by Udi Manber and Sun Wu between 1988 and 1991, for use with the Unix operating system. It was later ported to OS/2, DOS, and Windows. It selects the best-suited algorithm for the current query from a variety of the fastest known built-in string searching algorithms, including Manber and Wu's bitap algorithm, which is based on Levenshtein distances. agrep is also the search engine in the indexer program GLIMPSE. agrep is free for private and non-commercial use only, and belongs to the University of Arizona.
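
To make the idea of matching "within a Levenshtein distance" concrete, here is a minimal Python sketch of approximate substring search. It is not agrep's implementation: it uses plain dynamic programming in place of the bitap algorithm, and the names `best_match_distance` and `fuzzy_grep` are hypothetical, chosen only for this illustration.

```python
def best_match_distance(pattern: str, text: str) -> int:
    """Smallest Levenshtein distance between `pattern` and any substring of `text`."""
    # prev[j]: cheapest way to align the pattern prefix seen so far with a
    # substring of `text` ending at position j. The empty prefix costs 0
    # everywhere, which is what lets a match start at any position.
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        curr = [i]  # pattern[:i] against empty text: i deletions
        for j, t in enumerate(text, start=1):
            curr.append(min(
                prev[j] + 1,              # drop p from the pattern
                curr[j - 1] + 1,          # absorb an extra text character
                prev[j - 1] + (p != t),   # match (0) or substitute (1)
            ))
        prev = curr
    # The match may end anywhere in the text, so take the best end position.
    return min(prev)


def fuzzy_grep(pattern: str, lines: list[str], max_errors: int = 1) -> list[str]:
    """Keep the lines containing `pattern` with at most `max_errors` edits."""
    return [line for line in lines
            if best_match_distance(pattern, line) <= max_errors]
```

In this sketch the pattern is matched against substrings of the text, so the two arguments are not interchangeable: `best_match_distance("ab", "abteam")` is 0, while `best_match_distance("abteam", "ab")` is 4.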

89 questions
1
vote
1 answer

agrep working with del, ins arguments

How can "abteam" with "ab" be matched using this code? agrep("abteam",c("acb","abd","ab"),value=T,ignore.case = TRUE,max = list(del = 10, ins = 10, sub = 10)) The result is character(0), though I specified del=10, ins=10. What is the…
Kavipriya
  • 441
  • 4
  • 17
1
vote
1 answer

Multiple using of Java ProcessBuilder for agrep.exe

My Java program needs to launch agrep.exe with parameters for all pairs of elements in a big matrix and get the number of matching errors between two strings. I've written the code, but it runs very slowly. Can I speed up this part of the code? Or, maybe, you can…
logumanov
  • 139
  • 2
  • 11
1
vote
1 answer

How do I match a column of a dataframe of a particular length with another vector which has certain key-words to match to?

My dataframe Expenses is as shown below : date name expenditure type 23MAR2013 KOSH ENTRP 4000 COMPANY 23MAR2013 JOHN DOE 800 INDIVIDUAL 24MAR2013 S KHAN 300 …
sunitprasad1
  • 768
  • 2
  • 12
  • 28
1
vote
2 answers

faster way to agrep? Quickly find every single character mis-match

I am looking for the fastest way to find every single character mis-match between every word in a large file. If I have this: AAAA AAAB AABA BBBB CCCC I would like to get something like this: AAAA - AAAB AABA AAAB - AAAA AABA -…
user1168246
  • 33
  • 1
  • 5
1
vote
1 answer

In R, how do I use fuzzy matching to search for multiple patterns?

I have a survey dataset in which respondents described the location of their activity, usually as a town or city name. I want to identify each unique mention of the named cities and count the number of times each city was mentioned. The final output…
Marcos
  • 444
  • 4
  • 9
1
vote
0 answers

R functions with compiled C code slow

The agrep function in R is based on C code and is executed as such. However, I notice a significant (order of magnitude) performance gap between executing agrep from within R as compared to a direct system call to the command line executable of…
Markus Loecher
  • 367
  • 1
  • 16
0
votes
0 answers

Count word occurrences in a new column in R

How should I use 'agrep' function in a loop to count the 'number of mentions' in the reviews about attraction_name? Dataset_1 has 10k reviews and Dataset_2 has 90 attraction_names. Dataset_1$review_text <- c("I like park_1 compared to park_2",…
Cheese S
  • 1
  • 1
0
votes
1 answer

Filter tibble column to only include values found in separate tibble

Question I have a tibble containing basic stock symbol information (available here as a .csv file: https://www.nasdaq.com/market-activity/stocks/screener). How do I filter this tibble (call it symbolData) for only the companies listed in a second,…
SourceCoda
  • 105
  • 9
0
votes
2 answers

match two vectors by similar characters/strings in R

I have two vectors, like v1<-c("yellow", "red", "orange", "blue", "green") v2<-c("blues", "redx", "grean") and I want to match them, i.e., to "link" each element of v1 with the most similar element on v2, so that the result is > df v1 v2 1…
MDSF
  • 123
  • 6
0
votes
1 answer

In R, use if loop with agrep to assign value

The pattern list looks like: pattern <- c('aaa','bbb','ccc','ddd') X came from df looks like: df$X <- c('aaa-053','aaa-001','aab','bbb') What I tried to do: use agrep to find the matching name in pattern based on df$X, then assign value to an…
0
votes
0 answers

Why do I not get same fuzzy match results when I flip strings in pattern and x in R's agrep?

So the following command gives me a match: agrep(pattern = 'Ned Stark', x = 'Ned Stark**DUPL ENTRY', max.distance = 0.15, costs = NULL, ignore.case = TRUE, value = FALSE, fixed = TRUE, useBytes = FALSE) But when I flip the two strings, then I no…
Scratch
  • 57
  • 1
  • 1
  • 6
0
votes
0 answers

agrep not working when have string with spaces, for stringmatching

I have a string called 'fish cakes'. I have an eligible dictionary of words that contains "lemon" and "fish". I want agrep to match fish cakes to the string fish in the eligible dictionary. But it won't work. It'll match fish with fish cakes. I want…
0
votes
1 answer

Apostrophes and optional argument (?) in grep vs agrep

When I run the 4 lines of code below, I don't get the same result from all 4. Why is the last line not finding a match? grep("CPA's", c("CPA's")) agrep("CPA's", c("CPA's")) grep("CPA'?s?", c("CPA's")) agrep("CPA'?s?", c("CPA's")) I haven't yet done…
BeerSharkBot
  • 171
  • 4
0
votes
0 answers

agrep insists on matching millisecond (and not just second)

I'm trying to get agrep to match seconds to second and not millisecond, but there doesn't seem to be any value of costs to accomplish this. I am especially confused that there's no value of the cost for deletions/insertions that seems to do the…
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
0
votes
1 answer

shell - display number of errors for best matches in agrep

What I am trying to do is to get the best-matching word in a file and the number of errors for it using agrep. For now I am only able to get the word using this script: array=(bla1 bla2 bla3) for eachWord in "${array[@]}"; do result=$(yes "yes" |…
Victoria
  • 159
  • 2
  • 2
  • 13