
I am trying out Python 3 type hints to make my code cleaner. Whenever I run mypy on my code, I get recurring errors about Optional types, and I cannot figure out why.

Here is a sample function.

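import os
from typing import Iterator

import pandas

# CONTAINER_ROOT is assumed to be defined elsewhere in the module.
def directory_to_dataframe(directory_name: str) -> pandas.DataFrame: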
    '''
    Get the contents of a directory as a single dataframe.

    :param directory_name: The directory to read
    '''
    # print(directory_name)
    directory: str = os.path.join(CONTAINER_ROOT, directory_name)

    def read_method(parquet: str) -> pandas.DataFrame:
        return pandas.read_parquet(path=os.path.join(directory,
                                                     parquet))

    if not os.path.isdir(s=directory):
        return pandas.DataFrame()
    files_to_read: Iterator[str] = filter(
        lambda filename: filename.endswith('.parquet'),
        os.listdir(path=directory))
    try:
        return pandas.concat(objs=map(read_method, files_to_read),
                             verify_integrity=True,
                             ignore_index=True).fillna(value=0)  # Mypy raises error
    except ValueError:
        return pandas.DataFrame()

Mypy complains: Incompatible return value type (got "Optional[DataFrame]", expected "DataFrame")

I thought pandas.concat() always returns a DataFrame, so I fail to see why the return type would be Optional (which implies that None could be returned).

Here is another similar case.

daily_count: pandas.DataFrame = pandas.DataFrame(data=daily_ids['ID'].value_counts()).reset_index()

Mypy complains: expression has type "Optional[DataFrame]", variable has type "DataFrame"

So I am missing something about when to apply Optional to variables or function return types to satisfy mypy. Also, if I do apply Optional, mypy then complains on every later operation involving the variable that the operation is not valid on None.
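To make the second problem concrete, here is a minimal sketch of the situation and of one common way around it: checking for None before using the value, which lets mypy narrow Optional[Frame] to Frame. The `Frame` class and `filled_copy` function are hypothetical stand-ins written for illustration; they only imitate the `inplace` pattern and are not pandas code.

```python
from typing import List, Optional


class Frame:
    """Hypothetical stand-in for a DataFrame-like class (not pandas)."""

    def __init__(self, values: List[Optional[int]]) -> None:
        self.values = values

    def fillna(self, value: int, inplace: bool = False) -> Optional["Frame"]:
        # Imitates the inplace pattern: returns None when modifying in place,
        # so the declared return type has to be Optional.
        filled: List[Optional[int]] = [value if v is None else v for v in self.values]
        if inplace:
            self.values = filled
            return None
        return Frame(filled)


def filled_copy(frame: Frame) -> Frame:
    result: Optional[Frame] = frame.fillna(0)
    # Explicit None check: after this branch, mypy treats `result` as Frame,
    # so returning it (or calling methods on it) no longer raises an error.
    if result is None:
        raise RuntimeError("fillna returned None; was inplace=True passed?")
    return result
```

With the check in place, the variable can stay annotated as Optional where it is assigned, while everything after the check is typed as the non-None class.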

  • Oof. I think because a lot of pandas methods use the whole `inplace=True` pattern, where potentially they might return `None`. Since you know it doesn't, you could use `cast` – juanpa.arrivillaga Mar 04 '21 at 06:24
  • What version of `pandas` are you using? What stubs are you using for pandas? – juanpa.arrivillaga Mar 04 '21 at 06:30
  • I am using pandas 1.0.5, not sure how to find the stub version. concat does not have an inplace argument. Also whichever method has it, usually defaults it to False so that I get a modified copy. Does it mean I have to give the default value to make mypy happy? When I need inplace modification, can mypy adjust its expectation according to the parameter? Also, how to use cast in this instance? – Della Mar 04 '21 at 06:45
  • It's not `pd.concat`, it's `.fillna(value=0)`. The type system has no knowledge of what effect arguments can have on the return value. Again, though, you *can* just use `typing.cast` – juanpa.arrivillaga Mar 04 '21 at 06:47
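The `typing.cast` suggestion from the comments could look roughly like the sketch below. `Frame` and `filled_copy_cast` are hypothetical names used for illustration, standing in for a class whose `fillna` is annotated as returning Optional; this is not pandas code and makes no claim about what the actual pandas stubs declare.

```python
from typing import List, Optional, cast


class Frame:
    """Hypothetical stand-in for a DataFrame-like class (not pandas)."""

    def __init__(self, values: List[Optional[int]]) -> None:
        self.values = values

    def fillna(self, value: int, inplace: bool = False) -> Optional["Frame"]:
        # Returns None only when inplace=True, but the annotation covers both cases.
        filled: List[Optional[int]] = [value if v is None else v for v in self.values]
        if inplace:
            self.values = filled
            return None
        return Frame(filled)


def filled_copy_cast(frame: Frame) -> Frame:
    # cast() tells the type checker to treat the value as Frame.
    # It performs no check at runtime and returns its argument unchanged.
    return cast(Frame, frame.fillna(0))
```

Because `cast` does nothing at runtime, it is only safe when the call really cannot return None, as with the default `inplace=False` here. An explicit `is None` check is the more defensive alternative.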
