I have a Python script and I need to use mrjob to make it faster.
How do I make the script below use mrjob?
The script below works fine for a small file, but when I run it on a large file it takes forever, so I am planning to use mrjob, a MapReduce package for Python. The problem is that I don't know how to use mrjob for this script. Please advise.
import os
import pandas as pd
import pyffx
import string
import sys
column='first_name'
filename="python_test.csv"
encrypted_value_list = []
alpha=string.printable
key=b'sec-key'
seperator_in='|'
seperator_out='|'
outputfile='encypted.csv'
compression_in=None
compression_out=None
df = pd.read_csv(filename, compression=compression_in, sep=seperator_in, low_memory=False, encoding='utf-8-sig')
df_null = df[df[column].isnull()]
df_notnull = df[df[column].notnull()].copy()
for index, row in df_notnull.iterrows():
    # Build a format-preserving cipher sized to this value, then encrypt it
    e = pyffx.String(key, alphabet=alpha, length=len(row[column]))
    encrypted_value_list.append(e.encrypt(row[column]))
df_notnull[column]=encrypted_value_list
df_merged = pd.concat([df_notnull, df_null], axis=0, ignore_index=True, sort=False)
df_merged