In addition to JonSG's answer about getting the hashing/encoding correct, I'd like to comment on how you're reading and writing the CSV files.
It took me a minute to understand how you're dealing with the header vs the body of the CSV here:
with open("File1.csv") as csvfile:
with open("File2.csv", "w") as newfile:
reader = csv.DictReader(csvfile)
for i, r in enumerate(reader):
print(i, r)
if i == 0:
newfile.write(",".join(r) + "\n") # writing csv headers
newfile.write(",".join(r.values()) + "\n")
At first, I didn't realize that passing a dict to join() just gives back its keys; then you move on to joining the values. That's clever!
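A quick demo of that behavior, for anyone else surprised by it (my illustration, not part of your code):

row = {"ID": "1234", "Phone": "123-456-7890"}
print(",".join(row))           # ID,Phone -- iterating a dict yields its keys
print(",".join(row.values()))  # 1234,123-456-7890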
I think it'd be clearer, and easier, to use the complementary DictWriter.
For clarity, I'm going to separate the reading, processing, and writing:
with open("File1.csv", newline="") as f_in:
reader = csv.DictReader(f_in, skipinitialspace=True)
rows = list(reader)
for row in rows:
row["ID"] = encode_text(row["ID"])
print(row)
with open("File2.csv", "w", newline="") as f_out:
writer = csv.DictWriter(f_out, fieldnames=rows[0])
writer.writeheader()
writer.writerows(rows)
In your case, you'll create your writer and need to give it the fieldnames. I just passed in the first row, and the DictWriter() constructor used that dict's keys to establish the header values. You need to explicitly call the writeheader() method; then you can write your (processed) rows.
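Again, passing rows[0] only works because iterating a dict yields its keys; if you'd rather be explicit (my illustration, with the column names from your sample data):

first_row = {"ID": "1234680000000000", "Phone": "123-456-7890", "Email": "johnsmith@test.com"}
print(list(first_row))  # ['ID', 'Phone', 'Email']
# Equivalent, more explicit:
# writer = csv.DictWriter(f_out, fieldnames=["ID", "Phone", "Email"])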
I started with this File1.csv:
ID, Phone, Email
1234680000000000, 123-456-7890, johnsmith@test.com
and ended up with this File2.csv:
ID,Phone,Email
tO2Knao73NzQP/rnBR5t8Hsm/XIQVnsrPKQlsXmpkb8=,123-456-7890,johnsmith@test.com
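(encode_text itself comes from the other answer, so I haven't reproduced it here. Judging by that 44-character output, it's presumably a base64-encoded SHA-256 digest; if you need a stand-in to test with, a minimal sketch under that assumption:)

import base64
import hashlib

def encode_text(text):
    # Sketch only -- assuming, per JonSG's answer, base64 of the SHA-256 digest.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")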
That organization means all your rows are read into memory first. You mentioned having "thousands of entries", but with only those 3 fields per row that's a few hundred KB, maybe a megabyte, of RAM.
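Back-of-the-envelope (my rough math, not measured): a few thousand rows at a couple hundred bytes each, counting the strings and per-dict overhead, lands on the order of a megabyte.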
If you do want to "stream" the data through, you'll want something like:
reader = csv.DictReader(f_in, skipinitialspace=True)
writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)

writer.writeheader()
for row in reader:
    row["ID"] = encode_text(row["ID"])
    writer.writerow(row)
In this example, I passed reader.fieldnames to the fieldnames= param of the DictWriter constructor. (Accessing reader.fieldnames reads the header row if it hasn't been read yet, so it's safe to create the writer before the loop.)
For dealing with multiple files, I'll just open and close them myself, because multiple with open(...) as x blocks can look cluttered to me:
f_in = open("File1.csv", newline="")
f_out = open("File2.csv", "w", newline="")
...
f_in.close()
f_out.close()
I don't see any real benefit to the context managers for these simple utility scripts: if the program fails, the interpreter will close the files on exit anyway.
But the conventional wisdom is to use the with open(...) as x context managers, like you were. You could nest them; separate them with a comma; or, if you have Python 3.10+, use grouping parentheses for a cleaner look (also covered in that Q&A).
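With the 3.10+ parentheses, the streaming version above would look something like:

with (
    open("File1.csv", newline="") as f_in,
    open("File2.csv", "w", newline="") as f_out,
):
    reader = csv.DictReader(f_in, skipinitialspace=True)
    writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row["ID"] = encode_text(row["ID"])
        writer.writerow(row)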