My CSV file contains rows such as "","ab,abc",,"abc". Here an empty field (,,) means the value is unknown, whereas "" means the value has not been set yet. I treat these two cases differently, so I need a way to read both and tell them apart: I map "" to 0 and the empty field ,, to NaN. To be clear, I do not have a parsing problem: a field such as "ab,abc" is parsed correctly with a comma as the delimiter. The problem is that Python reads both "" and the empty field ,, as the empty string ''. The two are not the same and should not be collapsed into a single empty string.
I also need to write the CSV file back out so that a value read as "" is written as "" rather than ,, and NaN is written as ,, (an empty field).
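To make the writing requirement concrete, here is a minimal sketch that formats a row by hand instead of going through csv.writer. The helper names format_field and format_row are hypothetical, and the sketch assumes that every value other than 0 and NaN should be written as a quoted string, as in the sample row.

```python
import math

def format_field(value):
    """Render one value as CSV text under the conventions described above."""
    if isinstance(value, float) and math.isnan(value):
        return ""            # NaN (unknown) -> empty, unquoted field
    if isinstance(value, (int, float)) and value == 0:
        return '""'          # 0 (not set yet) -> quoted empty field
    text = str(value)
    # Assumption: all remaining values are quoted; embedded quotes are doubled.
    return '"' + text.replace('"', '""') + '"'

def format_row(values):
    """Join the formatted fields with commas (no line terminator added)."""
    return ",".join(format_field(v) for v in values)

line = format_row([0, "ab,abc", math.nan, "abc"])
print(line)  # "","ab,abc",,"abc"
```

Since np.nan is an ordinary Python float, the isinstance check above should also accept it without importing NumPy.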
I have looked into the csv Dialect options such as doublequote, escapechar, quotechar and quoting. They are NOT what I want: they all deal with cases where the delimiter appears inside the data, i.e. "ab,abc", and as I mentioned, parsing special characters is not an issue.
I don't want to use pandas. The only other thing I can think of is a regex, but that adds overhead when I have millions of lines to process.
The behaviour I want is this:
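import numpy as np

a = '""'  # the raw field; it could also be '' or 'ab,abc'
mapped = {}  # renamed from map, which would shadow the built-in
if a == '""':
    mapped[0] = 0        # quoted empty field: value not set yet
elif a == '':
    mapped[0] = np.nan   # unquoted empty field: unknown value
else:
    mapped[0] = a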
My csv reader is as follows:
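import csv

# newline='' is what the csv module documentation recommends for files given to csv.reader
with open(filepath, 'r', newline='') as f:
    csvreader = csv.reader(f)
    for row in csvreader:
        print(row)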
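A small self-contained reproduction of what this reader returns for the sample row, using an in-memory io.StringIO buffer in place of the real file (the buffer is only there to make the example runnable):

```python
import csv
import io

# Stand-in for the real file: one line holding the sample row.
sample = io.StringIO('"","ab,abc",,"abc"\n')

rows = list(csv.reader(sample))
print(rows)  # [['', 'ab,abc', '', 'abc']]
```

Both the quoted "" and the empty field between the two commas come back as '', so the distinction is already lost by the time the row reaches my code.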
I want the above behaviour when reading CSV files, though. Currently only two kinds of value are read: '' (the empty string) or 'ab,abc'.
I want three different values to be read: '' (the empty string), '""' (a string containing two double quotes), and the actual string 'ab,abc'.
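For illustration, the behaviour I am after could be sketched with a hand-written splitter that remembers whether each field was quoted. This is only a sketch and not something the csv module provides: split_line and map_field are hypothetical names, it assumes one record per line (no newlines inside quoted fields), and a pure-Python character loop like this is likely to be slow on millions of lines.

```python
import math

def split_line(line, delimiter=",", quotechar='"'):
    """Split one CSV line into (text, was_quoted) pairs."""
    fields, buf = [], []
    in_quotes = False
    was_quoted = False
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if in_quotes:
            if ch == quotechar:
                if i + 1 < n and line[i + 1] == quotechar:
                    buf.append(quotechar)  # doubled quote -> literal quote
                    i += 1
                else:
                    in_quotes = False      # closing quote
            else:
                buf.append(ch)
        elif ch == quotechar:
            in_quotes = True
            was_quoted = True
        elif ch == delimiter:
            fields.append(("".join(buf), was_quoted))
            buf, was_quoted = [], False
        else:
            buf.append(ch)
        i += 1
    fields.append(("".join(buf), was_quoted))
    return fields

def map_field(text, was_quoted):
    """Apply the mapping: "" -> 0, empty field -> NaN, anything else unchanged."""
    if text == "":
        return 0 if was_quoted else math.nan
    return text

row = [map_field(t, q) for t, q in split_line('"","ab,abc",,"abc"')]
print(row)  # [0, 'ab,abc', nan, 'abc']
```

When reading from a file, each line would need its trailing newline stripped first, for example with line.rstrip("\r\n").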