Questions tagged [agrep]

An approximate grep for fuzzy matching

agrep (approximate grep) is a proprietary fuzzy string searching program, developed by Udi Manber and Sun Wu between 1988 and 1991, for use with the Unix operating system. It was later ported to OS/2, DOS, and Windows. It selects the best-suited algorithm for the current query from a variety of the fastest known (built-in) string searching algorithms, including Manber and Wu's bitap algorithm based on Levenshtein distances. agrep is also the search engine in the indexer program GLIMPSE. agrep is free for private and non-commercial use only, and belongs to the University of Arizona.
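
To make "fuzzy matching based on Levenshtein distances" concrete, here is a minimal sketch of approximate line filtering. It is not agrep's implementation and it is not the bitap algorithm: it uses a plain dynamic-programming table instead. The function names `min_edit_distance_in` and `fuzzy_grep` and the `max_errors` parameter are made up for illustration.

```python
def min_edit_distance_in(pattern: str, text: str) -> int:
    """Smallest Levenshtein distance between pattern and any substring of text."""
    # prev[i] = cheapest way to align pattern[:i] with a substring of the
    # text that ends at the previous text position.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]  # cost of matching the pattern against the empty string
    for ch in text:
        curr = [0]  # a match may start at any text position at no cost
        for i, p in enumerate(pattern, start=1):
            curr.append(min(
                prev[i] + 1,              # extra character in the text
                curr[i - 1] + 1,          # pattern character missing from the text
                prev[i - 1] + (p != ch),  # exact match or substitution
            ))
        best = min(best, curr[-1])
        prev = curr
    return best


def fuzzy_grep(pattern, lines, max_errors=1):
    """Keep the lines that contain the pattern with at most max_errors edits."""
    return [line for line in lines
            if min_edit_distance_in(pattern, line) <= max_errors]


if __name__ == "__main__":
    print(fuzzy_grep("lasy", ["the lazy dog", "a lost log"], max_errors=1))
```

The sketch runs in time proportional to pattern length times text length. Bitap reaches the same kind of answer with bitwise operations, which is one reason agrep is fast on short patterns.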

89 questions
0
votes
1 answer

tre agrep via gnuwin32 - newb getting no output from tre agrep

Admittedly I have not used agrep on *nix before, so this may just be a newb mistake on my part, but rather than spin up a *nix box and play there, I thought I might ask the smart people here first. I installed tre agrep via gnuwin32 on my Windows…
johnnygear
  • 93
  • 1
  • 6
0
votes
1 answer

R multiple fuzzy match agrep create variable

New to R. I would like to create a test by creating a variable (yes/no) that checks whether the first name OR last name fuzzy-matches the email address. If so, append a 'yes' variable to that row. Data Example: id firstname lastname email address match 1…
lmcshane
  • 1,074
  • 4
  • 14
  • 27
0
votes
0 answers

How to fuzzy match character strings of persons' names listed variously firstName lastName or lastName firstName and with misspellings

I have a dataset with 6 million court proceedings from 59 different immigration courts. Each record includes among other things an attorney code. However there are multiple codes associated with each unique attorney. And in another table that…
kmayeaux
  • 67
  • 4
0
votes
4 answers

Detect rows in a data frame that are highly similar but not necessarily exact duplicates

I would like to identify rows in a data frame that are highly similar to each other but not necessarily exact duplicates. I have considered merging all the data from each row into one string cell at the end and then using a partial matching…
Braden
  • 345
  • 5
  • 11
0
votes
2 answers

Merging datasets by name when names have different formats in R

I have two different dataframes in R that I am trying to merge together. One is just a set of names and the other is a set of names with corresponding information about each person. So say I want to take this first dataframe: Name 1. Blow, Joe 2.…
ModalBro
  • 544
  • 5
  • 25
0
votes
1 answer

R Relevant match between 2 huge data sets. Even with Spelling Mistakes

I have input "I am travelling on my own, I have just brought a world ticket to go to singapore, darwin, perth, adelaide, melbourne, brisbane, gold cost, sydney Opra, christchurch,gold coast Richland, Aukland,Austrlia, and fji. It is a 10 month…
user3619015
  • 176
  • 1
  • 1
  • 9
0
votes
1 answer

" 'pattern' must be a non-empty character string" error with agrep in R

I am receiving the following error: 'pattern' must be a non-empty character string when trying to run the following: rapply(as.list(Database1), function(x) agrep(x,Database2, max.distance=c(cost=1), value=T)) with large databases >…
tomathon
  • 834
  • 17
  • 32
0
votes
1 answer

Replace misspelled values with agrep

I have a dataset of restaurants and the variable "CONAME" contains the name of each establishment. Unfortunately, there are quite a few misspellings, and I'd like to correct them. I've tried agrep for fuzzy set matching using the following code…
0
votes
1 answer

xargs string used as an input for agrep

I am using xargs to pass the input to agrep, as in the script below: xargs -L 1 -I string echo "RequestId="string | xargs -L 1 -I string zcat FILEB | agrep -dEOE string Output till…
User
  • 401
  • 2
  • 8
  • 15
0
votes
1 answer

Make agrep output offset of the match

I'm trying to use agrep and I can do approximate matches but I need to know where the match starts and where it finishes. Are there any flags that would allow me to do that?
Thiago Moraes
  • 617
  • 1
  • 10
  • 22
-1
votes
1 answer

Identify similar names in same row, then choose Mode

My data includes a Name column. Some names are written in up to eight different ways. I tried grouping them with the following code: groups <- list() i <- 1 while(length(x) > 0) { id <- agrep(x[1], x, ignore.case = TRUE, max.distance = 0.1) …
-1
votes
1 answer

Fuzzy String Matching in R on numbers separated by hyphens

I am trying to match Cell Phone Tower IDs contained in one table with a master table of locations (in lat/long) of Cell Phone Tower IDs. The format of the IDs in the locations table is different from that in the first table, and I am trying to use…
Dhiraj
  • 1,650
  • 1
  • 18
  • 44
-1
votes
1 answer

Use agrep to return a different variable

I'm doing a lookup from one table to another using agrep, but the results I want to return are not the values being matched. They're from another column/variable. My current agrep syntax: personalfolders$DOBMatch <- lapply(personalfolders$DOB,…
shmaxnow
  • 11
  • 1
  • 4
-1
votes
1 answer

R - Merging two data files based on partial matching of inconsistent full name formats

I'm looking for a way to merge two data files based on partial matching of participants' full names that are sometimes entered in different formats and sometimes misspelled. I know there are some different function options for partial matches (eg…