I'm a complete Python beginner.

I am trying to determine the metaphone code for each name in a list. These codes will later be compared to find potentially similar-sounding names.
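
A minimal sketch of that later comparison step, assuming two names count as similar-sounding when their codes are identical. The helper names `group_by_code` and `similar_sounding` are hypothetical, and `encode` stands for any function that maps a name to its code, such as `jellyfish.metaphone`:

```python
from collections import defaultdict


def group_by_code(names, encode):
    """Map each phonetic code to the list of names that produce it."""
    groups = defaultdict(list)
    for name in names:
        groups[encode(name)].append(name)
    return dict(groups)


def similar_sounding(names, encode):
    """Keep only the codes shared by more than one name."""
    grouped = group_by_code(names, encode)
    return {code: group for code, group in grouped.items() if len(group) > 1}


# Possible usage (requires `import jellyfish`):
# similar_sounding(['alexander', 'algoma', 'angel', 'antler'], jellyfish.metaphone)
```

Exact code equality is only one possible definition of "similar"; a looser comparison of the codes would need a different grouping step.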

The jellyfish module suits my needs, and I can get the metaphone codes when I hard-code the names in a list, as follows:

import jellyfish
names = ['alexander','algoma','angel','antler']
for i in names:
    print(i, "metaphone value =", jellyfish.metaphone(i))

##OUTPUT: 
alexander metaphone value = ALKSNTR
algoma metaphone value = ALKM
angel metaphone value = ANJL
antler metaphone value = ANTLR

However, I need to get the metaphone codes for a list of ~3000 names. I created a .csv file with the column headers I need and the existing list of names. It looks like this:

RID *,ST_NAME,FirstWord,FirstWordMeta,StMeta
742,A F JOHNSON,A,,
1240,ABBEY,ABBEY,,
2133,ACES,ACES,,
362,ADAMS,ADAMS,,

Ideally, for each row, FirstWordMeta should be the metaphone code of the value in the FirstWord column, and StMeta should be the metaphone code of the value in the ST_NAME column. I want the output .csv to look like this:

RID *,ST_NAME,FirstWord,FirstWordMeta,StMeta
742,A F JOHNSON,A,A,A F JNSN
1240,ABBEY,ABBEY,AB,AB
2133,ACES,ACES,ASS,ASS
362,ADAMS,ADAMS,ATMS,ATMS

I've tried the csv module, but I don't understand how to reference a specific column when calling jellyfish.metaphone().

asasBonny

2 Answers

You can use the pandas module:

import pandas as pd
import jellyfish

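# Read every column as text and keep empty cells as empty strings
data = pd.read_csv("test.csv", dtype=str, keep_default_na=False)  # Your filename here

# Calculate the metaphone code for every row of each source column.
# Assigning whole columns avoids chained indexing such as
# data["FirstWordMeta"][i] = ..., which may not update the DataFrame.
data["FirstWordMeta"] = data["FirstWord"].apply(jellyfish.metaphone)
data["StMeta"] = data["ST_NAME"].apply(jellyfish.metaphone)

# Save to csv without adding the DataFrame index as an extra column
data.to_csv("result.csv", index=False)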

You can try this:

import csv
import jellyfish

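# newline='' lets the csv module handle line endings itself
with open('input.csv', newline='') as inputfile:
    reader = csv.reader(inputfile)
    headers = next(reader)
    inputdata = list(reader)

with open('output.csv', 'w', newline='') as outputfile:
    writer = csv.writer(outputfile)
    writer.writerow(headers)

    for row in inputdata:
        # Keep RID, ST_NAME and FirstWord, then append the two codes
        outputrow = row[:3] + [
            jellyfish.metaphone(row[2]),  # FirstWordMeta, from FirstWord
            jellyfish.metaphone(row[1]),  # StMeta, from ST_NAME
        ]
        writer.writerow(outputrow)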
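
Since the question asks how to reference a specific column, here is a variant sketch that addresses columns by header name instead of by position, using `csv.DictReader` and `csv.DictWriter`. The function name `add_code_columns` and its `mapping` parameter are hypothetical, and the encoder is passed in as an argument, so `jellyfish.metaphone` is only one possible choice. It assumes the target columns already exist in the header row, as they do in the sample file:

```python
import csv


def add_code_columns(in_path, out_path, encode, mapping):
    """Fill each target column with encode(value of its source column).

    mapping is {target_column: source_column}; encode is any callable
    that takes a string and returns a string.
    """
    with open(in_path, newline="") as infile:
        reader = csv.DictReader(infile)
        fieldnames = reader.fieldnames  # header row, in file order
        rows = list(reader)

    with open(out_path, "w", newline="") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            for target, source in mapping.items():
                row[target] = encode(row[source])
            writer.writerow(row)


# Possible usage with the file names above (requires `import jellyfish`):
# add_code_columns(
#     "input.csv", "output.csv", jellyfish.metaphone,
#     {"FirstWordMeta": "FirstWord", "StMeta": "ST_NAME"},
# )
```

Referring to columns by name keeps the code working if the column order in the .csv changes, at the cost of failing when a header is renamed.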
Nikos Oikou