I'm using pandas read_csv. On most rows the last column is missing, as in the sample below, but on a few rows the data is there. Instead of treating the empty cells as null, pandas seems to be treating them as NaN. I was trying to write an if statement to show just the rows that have data in that column.
(Sample row extracted from an American Express CSV export):
01/01/2018 Mon,,"GOOGLE *SVCSAPPS_NEALW - CC@GOOGLE.COM, CA",Neal Walters,XXXX-XXXXXX-XXXXX,,,4.16,,,GOOGLE SERVICES,"1600 AMPHITHEATRE PKWYMOUNTAIN VIEWCA","94043-1351UNITED STATES",'320180020394601453',
colnames=['DateTime', 'NotUsed2', 'PayeeLong', 'NotUsed4', 'NotUsed5', 'NotUsed6', 'NotUsed7', 'Amount', 'NotUsed9',
'NotUsed10', 'Payee', 'PayeeAddress', 'PayeeCountry', 'NotUsedX', 'AmexCategory']
data = pd.read_csv(filenameAmexGold, names=colnames, header=None)
# Preview the first 5 lines of the loaded data
print (data.head())
for j in range(len(data)):
    #if not(math.isnan(data['AmexCategory'][j])):
    # if data['AmexCategory'][j] > ' ':
    print("Row ", j, data['DateTime'][j], data['Payee'][j], data['Amount'][j],
          "AmexCat=", data['AmexCategory'][j],
          "PayeeLong=", data['PayeeLong'][j])
Sample output of data.head():
DateTime NotUsed2 ... NotUsedX AmexCategory
0 01/01/2018 Mon NaN ... '320180021453' NaN
1 01/02/2018 Tue NaN ... '320180035375' NaN
2 01/04/2018 Thu NaN ... '320180043184' NaN
3 01/08/2018 Mon NaN ... '320180080899' 'Software'
4 01/13/2018 Sat NaN ... '320180133142' NaN
When I uncomment either of the two if statements above, I get this error:
TypeError: must be real number, not str
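I can reproduce that error outside pandas, so I suspect it's just math.isnan refusing a string (minimal repro, separate from my real data):

import math

math.isnan(float('nan'))   # True: works on a real NaN float
math.isnan('Software')     # TypeError: must be real number, not str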
PART 2
Similarly, row 19 has no Payee, since it's a payment rather than a charge.
01/26/2018 Fri,20,AUTOPAY PAYMENT - THANK YOU,Neal Walters,XXXX-XXXXXX-XXXXX,,,-347.52,,,,,,'320180260752306017',
I know this row shows as NaN in data.head(20), so I want to know how to test it for null or NaN. When I list the dtypes, Payee shows up as object (not float). To me it's just a string field, but I gather that's what pandas calls an object.
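For reference, this is how I checked; my understanding (which may be part of my confusion) is that the empty cells are actual float NaN values sitting inside an object column, while the filled cells are plain strings:

print(dfAmexGold.dtypes)                 # Payee is listed as 'object'
print(type(dfAmexGold['Payee'][19]))     # I expect this is float (the NaN)
print(type(dfAmexGold['Payee'][20]))     # I expect this is str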
# This test works (row 19 is the payment row with no Payee)
print("Test2", dfAmexGold['Payee'][19])
if math.isnan(dfAmexGold['Payee'][19]):
    print("found a NAN value")

# This one blows up (row 20 has a real string in Payee)
print("Test1", dfAmexGold['Payee'][20])
if math.isnan(dfAmexGold['Payee'][20]):
    print("found a NAN value")
The test for row 20 blows up with this:
TypeError: must be real number, not str
My question is how to do if-tests on individual items, and why pandas isn't consistent about using null for empty cells instead of NaN.
I also tried the following; it does not report the row as NULL (though it doesn't blow up either):

if dfAmexGold['Payee'][19] is None:
    print("found a NULL value")
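For completeness, I've seen pandas' own isnull/isna functions mentioned; this is a sketch of how I think they would be used here, though I haven't confirmed it's the recommended approach (which is what I'm asking):

import pandas as pd

# pd.isnull / pd.isna should handle both real NaN floats and None, if I understand correctly.
if pd.isnull(dfAmexGold['Payee'][19]):
    print("found a NaN/NULL value")

# And presumably the vectorized form, for filtering whole rows at once:
payments_only = dfAmexGold[dfAmexGold['Payee'].isnull()]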