Start by clearing up a misunderstanding
I noticed an error in your check procedure:
calling df.index.duplicated().any() checks only whether the index
contains duplicates. It says nothing about duplicate values in colx.
To investigate the issue, I created my input file from your data (just 10 data rows):
colx
2017-01-06 14:37:16
2017-01-27 00:00:00
2017-01-18 00:00:00
2017-01-26 00:00:00
None
2019-10-22 11:20:03
None
2019-07-11 00:00:00
None
2019-07-15 00:00:00
I read it with read_csv and called df.duplicated().any().
The result was True, so there are duplicates in the colx column.
Run df.duplicated() and you will see True printed for the rows
with index 6 and 8 (the second and third instances of the string None).
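The difference between the two checks can be reproduced with a small sketch. It builds the frame in memory from the ten values above instead of reading a file; that is an assumption made here so that every None stays a plain string regardless of the read_csv defaults in your Pandas version.

```python
import pandas as pd

# The ten sample values from above, kept as plain strings
# (built in memory here; the original was read from a file).
values = [
    "2017-01-06 14:37:16",
    "2017-01-27 00:00:00",
    "2017-01-18 00:00:00",
    "2017-01-26 00:00:00",
    "None",
    "2019-10-22 11:20:03",
    "None",
    "2019-07-11 00:00:00",
    "None",
    "2019-07-15 00:00:00",
]
df = pd.DataFrame({"colx": values})

# Looks at the index labels only (0..9 here).
index_has_dupes = bool(df.index.duplicated().any())

# Looks at the row content, i.e. the values in colx.
rows_have_dupes = bool(df.duplicated().any())

# Index labels of the rows flagged as repeats of an earlier row.
dupe_rows = df.index[df.duplicated()].tolist()

print(index_has_dupes, rows_have_dupes, dupe_rows)
```

The index check reports no duplicates, while the row check flags the repeated None strings.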
Another check: Run df.info()
and you will get:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 1 columns):
colx 10 non-null object
dtypes: object(1)
memory usage: 208.0+ bytes
All 10 entries are reported as non-null, which confirms that no
element holds a "true" None value.
There are only strings containing "None".
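The same conclusion can be reached without reading the info() printout, by counting missing values and "None" strings directly. This is only a sketch on a three-element stand-in column (an assumption, not your full data).

```python
import pandas as pd

# Stand-in column: two date strings and one literal "None" string.
df = pd.DataFrame(
    {"colx": ["2017-01-06 14:37:16", "None", "2019-10-22 11:20:03"]}
)

# Number of genuinely missing entries (None / NaN) in the column.
n_missing = int(df.colx.isna().sum())

# Number of entries that are the four-character string "None".
n_none_strings = int((df.colx == "None").sum())

# Python types actually stored in the column.
kinds = {type(v).__name__ for v in df.colx}

print(n_missing, n_none_strings, kinds)
```

If the missing count is zero while the string count is not, the column holds text that merely looks like None.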
Another remark: all(df.colx.index == range(df.colx.shape[0])) checks
only that the index contains consecutive numbers, which says
nothing about the content of colx.
How you read your DataFrame
I suppose you read your DataFrame by calling e.g. read_csv without any
conversion, so the colx column is of object (actually string) type.
In that case an attempt to call pd.to_datetime fails on the first
element containing None (a string), because it cannot be converted
to a datetime.
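A minimal sketch of that failure, assuming a two-element stand-in Series rather than your real column. The exact exception message differs between Pandas versions, so the code only records that a ValueError (or a subclass of it) was raised.

```python
import pandas as pd

# Stand-in data: one valid timestamp string and one literal "None" string.
s = pd.Series(["2017-01-06 14:37:16", "None"])

try:
    pd.to_datetime(s)
    failed = False
except ValueError as exc:
    # The parser cannot interpret the string "None" as a date.
    failed = True
    print("conversion failed:", exc)
```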
What to do
Try the following approach:
When reading the DataFrame, pass the na_values=['None'] parameter.
This ensures that elements containing None are not left as strings
but are converted to NaN.
Print the DataFrame (read from my limited source).
Instead of None (a string) there will be NaN, a special float value.
Run df.info(). This time the printout will be:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 1 columns):
colx 7 non-null object
dtypes: object(1)
memory usage: 208.0+ bytes
Note that there are only 7 non-null values out of 10 in total,
so the 3 remaining are "true" missing values, which Pandas prints as NaN.
Run pd.to_datetime(df.colx). This time there should be no error.
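The steps above can be put together in one short sketch. It reads the ten sample rows from an in-memory string via io.StringIO instead of a file on disk, which is an assumption made to keep the example self-contained; with a real file you would pass its path to read_csv.

```python
import io

import pandas as pd

# The ten sample rows, held in memory instead of a file on disk.
csv_text = """colx
2017-01-06 14:37:16
2017-01-27 00:00:00
2017-01-18 00:00:00
2017-01-26 00:00:00
None
2019-10-22 11:20:03
None
2019-07-11 00:00:00
None
2019-07-15 00:00:00
"""

# Treat the literal text "None" as a missing value while reading.
# Some Pandas versions may already do this by default; passing it
# explicitly is harmless either way.
df = pd.read_csv(io.StringIO(csv_text), na_values=["None"])

n_missing = int(df.colx.isna().sum())

# With the "None" strings gone, the conversion goes through;
# missing entries become NaT in the result.
dates = pd.to_datetime(df.colx)

print(n_missing)
print(dates)
```

The missing entries show up as NaT in the converted column, and the remaining rows are proper timestamps.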