NameError: global name 'NoneType' is not defined in Spark

Question

I have written a UDF to replace a few specific date values in a column named "latest_travel_date" with 'NA'. This column also contains many null values, so I have tried to handle those in the UDF as well (please see below).

Query:
def date_cleaner(date_col):
    if type(date_col) == NoneType:
        pass
    else:
        if year(date_col) in ('1899','1900'):
            date_col= 'NA'
        else:
            pass
    return date_col

date_cleaner_udf = udf(date_cleaner, DateType())

Df3= Df2.withColumn("latest_cleaned", date_cleaner_udf("latest_travel_date"))

However, I am continuously getting the error: NameError: global name 'NoneType' is not defined

Can anyone please help me to resolve this?

score 4 · Answer 1 · answered Aug 19 '16 at 14:28

This issue can be solved in two ways.

If you are trying to find the null values in your DataFrame, you can use NullType.

Like this:

if type(date_col) == NullType:

Or you can check whether date_col is None, like this:

if date_col is None:

I hope this helps.

Thiago Baldim

I tried both of the options you suggested, but each ends in the error: AttributeError: 'NoneType' object has no attribute '_jvm' – Preyas Aug 19 '16 at 14:31
Can you add a part of your DataFrame to your question? I did the same thing as you in my Spark environment, but this issue didn't happen. We need to see the DataFrame. – Thiago Baldim Aug 19 '16 at 16:03

score 1 · Answer 2 · answered Aug 19 '16 at 14:22

The problem is this line:

if type(date_col) == NoneType:

It looks like you actually want:

if date_col is None:
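
To illustrate why the original comparison fails: NoneType is not a built-in name in Python, so referring to it without defining it raises NameError, while an identity check against None needs no extra name. A minimal sketch in plain Python (no Spark needed); is_missing is a hypothetical helper name used only for illustration.

```python
def is_missing(value):
    """Return True only when value is None (the idiomatic null check)."""
    return value is None


# NoneType is not a builtin name, but it can be obtained from None itself.
NoneType = type(None)

print(is_missing(None))          # True
print(is_missing("2016-08-19"))  # False
print(type(None) is NoneType)    # True
```

The identity check is usually preferred because it needs no type lookup and cannot be affected by a custom equality method.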

Michael Aaron Safyan

When I do this, it results in the error: 'NoneType' object has no attribute '_jvm' – Preyas Aug 19 '16 at 14:29
@Preyas is that reported from the same line? What's your stack trace? – Michael Aaron Safyan Aug 19 '16 at 14:30
No, the line generating the error changes between the two queries. – Preyas Aug 19 '16 at 14:40
@Preyas, it sounds like you must have code elsewhere that is operating on an object that is None. You should find the source of that stack trace and ensure that the operation is bypassed in the case where the input is None. – Michael Aaron Safyan Aug 19 '16 at 14:47

score 0 · Answer 3 · answered Aug 20 '16 at 21:48

As pointed out by Michael, you cannot do

if type(date_col) == NoneType:

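However, changing that to a None check won't complete the task. There is another issue with

date_col= 'NA'

It is a string, but you declared the return type to be DateType, so the UDF cannot return 'NA'. The _jvm error in your comment most likely comes from year(date_col): year from pyspark.sql.functions builds a Column expression and needs the active SparkContext, which is not available inside a UDF. Within the UDF, date_col is a plain Python datetime.date (or None), so read date_col.year instead.

It seems you just want to set date_col to None when its year is 1899 or 1900, and drop all nulls. If so, you can do this:

from pyspark.sql.functions import udf
from pyspark.sql.types import DateType

def date_cleaner(date_col):
    # date_col is a datetime.date, or None for null rows
    if date_col is not None and date_col.year in (1899, 1900):
        return None
    return date_col

date_cleaner_udf = udf(date_cleaner, DateType())

Df3 = Df2.withColumn("latest_cleaned", date_cleaner_udf("latest_travel_date")).dropna(subset=["latest_travel_date"])

This works because a DateType column can hold either a valid date or null (by default). You can then use dropna to "clean" your DataFrame.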
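
The cleaning logic can be checked without a Spark session, because a DateType value reaches a Python UDF as a datetime.date or as None. The following is a minimal sketch under that assumption, reading the year from the date object's .year attribute; SENTINEL_YEARS and the sample dates are hypothetical and only for illustration.

```python
from datetime import date

# Hypothetical name for the placeholder years mentioned in the question.
SENTINEL_YEARS = (1899, 1900)


def date_cleaner(date_col):
    """Return None for null input or placeholder years, else the date unchanged."""
    if date_col is not None and date_col.year in SENTINEL_YEARS:
        return None
    return date_col


samples = [date(1899, 12, 30), date(1900, 1, 1), date(2016, 8, 19), None]
cleaned = [date_cleaner(d) for d in samples]
print(cleaned)  # [None, None, datetime.date(2016, 8, 19), None]
```

Once the function behaves as expected on plain values, it can be wrapped with udf(date_cleaner, DateType()) as in the answer.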
