2

I have a DataFrame with one column storing the date.

However, some of these dates are properly formatted datetime objects like'2018-12-24 17:00:00'while others are not and are stored like '20181225'.

When I tried to plot these using plotly, the improperly formatted values got turned into EPOCH dates, which is a problem.

Is there any way I can get a copy of the DataFrame with only those rows with properly formatted dates?

I tried using

clean_dict= dailySum_df.where(dailySum_df[isinstance(dailySum_df['time'],datetime.datetime)])

methods and but it doesn't to work due to the 'Array conditional must be same shape as self' error.

    dailySum_df = pd.DataFrame(list(cursors['dailySum']))

    trace = go.Scatter(
        x=dailySum_df['time'],
        y=dailySum_df['countMessageIn']

    )
    data = [trace]
    py.plot(data, filename='basic-line')
tawab_shakeel
  • 3,701
  • 10
  • 26
tansawit
  • 35
  • 1
  • 6

2 Answers2

0

Try parsing the dates column of your dataframe using dateutil.parser.parse and Pandas apply function.

enter image description here

0

Apply dateutil.parser, see also my answer here:

import dateutil.parser as dparser
def myparser(x):
    try:
       return dparser.parse(x)
    except:
       return None

df = pd.DataFrame( {'time': ['2018-12-24 17:00:00', '20181225', 'no date at all'], 'countMessageIn': [1,2,3]})
df.time = df.time.apply(myparser)
df = df[df.time.notnull()]

Input:

                  time  countMessageIn
0  2018-12-24 17:00:00               1
1             20181225               2
2       no date at all               3

Output:

                 time  countMessageIn
0 2018-12-24 17:00:00               1
1 2018-12-25 00:00:00               2

Unlike Gustavo's solution this can handle rows with no recognizable date at all and it filters out such rows as required by your question.

If your original time column may contain other text besides the dates themselves, include the fuzzy=True parameter as shown here.

Stef
  • 28,728
  • 2
  • 24
  • 52