
I have a Python script and need to use mrjob to make it faster.

How do I make the script below use mrjob?

The script below works fine for a small file, but with a large file it takes forever, so I am planning to use mrjob, a MapReduce package for Python. The problem is that I don't know how to use mrjob for this script. Please advise.

import os
import pandas as pd
import pyffx
import string
import sys

column='first_name'
filename="python_test.csv"
encrypted_value_list = []
alpha=string.printable
key=b'sec-key'
seperator_in='|'
seperator_out='|'
outputfile='encypted.csv'
compression_in=None
compression_out=None

df = pd.read_csv(filename,compression=compression_in, sep=seperator_in, low_memory=False, encoding='utf-8-sig')
df_null = df[df[column].isnull()]
df_notnull = df[df[column].notnull()].copy()


for index,row in df_notnull.iterrows():
   e = pyffx.String(key, alphabet=alpha, length=len(row[column]))
   encrypted_value_list.append(e.encrypt(row[column]))

df_notnull[column]=encrypted_value_list
df_merged = pd.concat([df_notnull, df_null], axis=0, ignore_index=True, sort=False)
df_merged
SecretAgentMan
st_bones
  • What’s the problem, exactly? – AMC Dec 05 '19 at 04:19
  • I have updated my question. The script above works fine for a small file, but with a large file it takes forever, so I am planning to use mrjob, a MapReduce package for Python. The problem is that I don't know how to use mrjob for this script. Please advise. – st_bones Dec 05 '19 at 04:58
  • Well, what have you tried? Is there a specific issue? You’ve provided no context for this program. – AMC Dec 05 '19 at 04:59
  • MapReduce isn't applicable to every type of job. I would recommend you at least try Dask or pyspark first – OneCricketeer Dec 12 '19 at 02:47
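
Each row in the question's loop is encrypted independently of every other row, so the work has the shape of a map-only step with no reducer. Below is a minimal sketch of that per-line logic in plain Python, not a tested mrjob job. The names `encrypt_column` and `pyffx_encrypt` are hypothetical helpers introduced here for illustration; the `pyffx.String(...)` call simply mirrors the one in the question. In mrjob, this kind of function would typically be called from the `mapper(self, _, line)` method of an `MRJob` subclass, which receives one input line at a time.

```python
import csv
import io


def encrypt_column(line, column_index, encrypt, sep="|"):
    """Encrypt one field of a single delimited line.

    `encrypt` is any callable taking and returning a str.
    Empty fields pass through unchanged, like the null rows in the question.
    """
    fields = next(csv.reader([line], delimiter=sep))
    value = fields[column_index]
    if value:
        fields[column_index] = encrypt(value)
    out = io.StringIO()
    # csv.writer quotes a field if the encrypted text happens to contain `sep`.
    csv.writer(out, delimiter=sep).writerow(fields)
    return out.getvalue().rstrip("\r\n")


def pyffx_encrypt(value, key=b"sec-key"):
    """Hypothetical wrapper around the pyffx call used in the question."""
    import string

    import pyffx  # imported lazily so the rest of the sketch runs without it

    cipher = pyffx.String(key, alphabet=string.printable, length=len(value))
    return cipher.encrypt(value)


if __name__ == "__main__":
    # str.upper stands in for the real cipher so the sketch runs anywhere.
    print(encrypt_column("1|alice|x", 1, str.upper))
```

A job built this way would also need to skip or pass through the header line and work out the index of `first_name` once, since a mapper only ever sees one line and not the whole DataFrame.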

0 Answers