-1

I need read a csv file and fill the empty/null values in column "Phone and Email" based on the person's address and write to a new csv file. Ex: if a person "Jonas Kahnwald" doesn't have the phone number or an email address but has the same address as the person above or below, say "Hannah Kahnwald", then we should fill the empty/null values with those person's details.

I won't be able to use python pandas as the rest of the code/programs are purely based on python 2.7 (unfortunately), so I just need to write a function or a logic to capture this information alone.

Input format/table looks like below with empty cells (csv file):

FirstName,LastName,Phone,Email,Address
Hannah,Kahnwald,1457871452,hannkahn@gmail.com,145han street
Micheal,Kahnwald,6231897383,,145han street
Jonas,Kahnwald,,,145han street
Mikkel,Nielsen,4509213887,mikneil@yahoo.com,887neil ave
Magnus,Nielsen,,magnusneil@kyle.co,887neil ave
Ulrich,Nielsen,,,887neil ave
katharina,Nielsen,,,887neil ave
Elisabeth,Doppler,5439001211,elsisop@amaz.com,211elis park
Peter,Doppler,,,211elis park
bartosz,Tiedmannn,6263172828,tiedman@skype.com,828alex street
Alexander,washington,,,321notsame street
claudia,Tiedamann,,,828alex street

Output format should be like below:

Hannah,Kahnwald,1457871452,hannkahn@gmail.com,145han street
Micheal,Kahnwald,6231897383,hannkahn@gmail.com,145han street
Jonas,Kahnwald,1457871452,hannkahn@gmail.com,145han street
Mikkel,Nielsen,4509213887,mikneil@yahoo.com,887neil ave
Magnus,Nielsen,4509213887,magnusneil@kyle.co,887neil ave
Ulrich,Nielsen,4509213887,mikneil@yahoo.com,887neil ave
katharina,Nielsen,4509213887,mikneil@yahoo.com,887neil ave
Elisabeth,Doppler,5439001211,elsisop@amaz.com,211elis park
Peter,Doppler,5439001212,elsisop@amaz.com,211elis park
bartosz,Tiedmannn,6263172828,tiedman@skype.com,828alex street
Alexander,washington,,,321notsame street
claudia,Tiedamann,6263172828,tiedman@skype.com,828alex street
import csv,os

def get_info(file path):
    data = []
    with open(file, 'rb') as fin:
        csv_reader =  csv.reader(fin)
        next(reader)
        for each in csv_reader:

            FirstName = each[0]
            LN = each[1]
            Phone =  "some function or logic"
            email = " some function or logic"
            Address = each[4]
            login = ""
            logout = ""

            data.append([FirstName,LN,Phone,email,Address,login,logout])

   f.close()
   return data
Roy
  • 11
  • 1
  • 7
  • You haven't really shown the actual contents of the csv file, but your question seems closely related to [csv reader behavior with None and empty string](https://stackoverflow.com/questions/11379300/csv-reader-behavior-with-none-and-empty-string). – martineau Apr 17 '20 at 00:20
  • there replace "none" by an empty string say "" or '' (nothing between ' and ' or between " and ") – Narcisse Doudieu Siewe Apr 17 '20 at 00:20
  • @martineau thank you for the edit. Its my second post and I am a beginner in python and stackoverflow, so I am having trouble positing in a right way. My actual contents of the CSV file(separated by commas) looks exactly same. Though it contains more than 5 columns, but the above mentioned columns are the ones which I need to extract by filling the none/null/empty values with logic I referred in the body. I don't have an issue reading or writing to csv file. I can't use pandas(ffill,bfill) or excel. I hope I was clear and answered your question. – Roy Apr 17 '20 at 01:00
  • Roy: You're welcome — edit was no big deal. Note that another reason to include sample data in your question is so others don't have to create their own sample themselves if they want to test their answer to your question. Since csv file are text, doing it should mostly just be a cut & paste operation. – martineau Apr 17 '20 at 01:06
  • @martineau oh I got it now, do u mean that I can attach the CSV file which I am actually working? I don't know if that option was there. If it's there I would surely attach it. – Roy Apr 17 '20 at 01:17

1 Answers1

0

Here's a significantly updated version that attempts to fill in missing data from other entries in the file, but only if they have the same Address field. To make the searching faster it builds a dictionary for internal use called attr_dict which contains all the records with a particular address. It also uses namedtuples internally to make the code a little more readable.

Note that when retrieving missing information, it will use the data from the first entry it finds stored in this internal dictionary at the Address. In addition, I don't think the sample data you provided contains every possible case, so will need to do additional testing.

import csv
from collections import namedtuple


def get_info(file_path):

    # Read data from file and convert to list of namedtuples, also create address
    # dictionary to use to fill in missing information from others at same address.
    with open(file_path, 'rb') as fin:
        csv_reader =  csv.reader(fin, skipinitialspace=True)

        header = next(csv_reader)
        Record = namedtuple('Record', header)

        newheader = header + ['Login', 'Logout'] # Add names of new columns.
        NewRecord = namedtuple('NewRecord', newheader)

        addr_dict = {}
        data = [newheader]

        for rec in (Record._make(row) for row in csv_reader):
            if rec.Email or rec.Phone:  # Worth saving?
                addr_dict.setdefault(rec.Address, []).append(rec)  # Remember it.

            login, logout = "",  ""  # Values for new columns.
            data.append(NewRecord._make(rec + (login, logout)))

    # Try to fill in missing data from any other records with same Address.
    for i, row in enumerate(data[1:], 1):
        if not (row.Phone and row.Email):  # Info missing?
            # Try to copy it from others at same address.
            updated = False
            for other in addr_dict.get(row.Address, []):
                if not row.Phone and other.Phone:
                    row = row._replace(Phone=other.Phone)
                    updated = True
                if not row.Email and other.Email:
                    row = row._replace(Email=other.Email)
                    updated = True
                if row.Phone and row.Email:  # Info now filled in?
                    break

            if updated:
                data[i] = row

    return data


INPUT_FILE = 'null_cols.csv'
OUTPUT_FILE = 'fill_cols.csv'

data = get_info(INPUT_FILE)

with open(OUTPUT_FILE, 'wb') as fout:
    writer = csv.DictWriter(fout, data[0])  # First elem has column names.
    writer.writeheader()
    for row in data[1:]:
        writer.writerow(row._asdict())

print('Done')

Screenshot of results in Excel:

Screenshot of results

martineau
  • 119,623
  • 25
  • 170
  • 301
  • I am sorry I still unable to figure out how to add the sample input data, I cant find an option to attach my csv file either. Thank you so much for your answer, it surely helps. I guess I was not clear in main part, is there anyway we wont fill the Alexander Washington's **Phone and email**? because his address doesn't match with any of them. I should fill the person's null/empty phone/email values, only if it matches with any other person's address. – Roy Apr 17 '20 at 03:17
  • By "same address as the person above or below" do you mean immediately above or below against everything in the file? It may require reading the entire file into memory to process it. – martineau Apr 17 '20 at 05:26
  • those are the two cases, 1) same address as the person above row or below row 2) or the matching address in the entire file. I have csv files which falls into both the categories, but if possible I would need second case scenario based script as it covers everything. (I changed the input and output file format, does that help?) – Roy Apr 17 '20 at 14:07
  • The edits you made actually hurt to some degree because you replaced what was in your question with samples of both the input and output instead of just of _adding_ a sample of the raw input as I asked. However your explanation of the two cases did clarify things. Since there's no way to know in advance which of them is being dealt with, it requires always checking against the entire file is needed. That isn't a bad as I thought because your code already did it. Moreover the code in my updated code goes even further and builds an auxiliary dictionary to make searching for matches faster. – martineau Apr 18 '20 at 01:32
  • Sorry for the late response and also for my edits which didn't help. Guess it will take time for me to understand these things. Your code worked perfectly fine. Its awesome. Though I am still in the process of understanding it. Thank you so much once again !! It helps me to challenge myself and learn more. – Roy Apr 19 '20 at 20:46
  • Roy: That's good to hear. Making the code check the contents of the entire file _did_ complicate things a bit, especially given a desire to implement it in a way that's as efficient as possible — the searching could have been very slow unless done properly. In this case creating an address dictionary avoids the worst of it I believe. – martineau Apr 19 '20 at 21:23