
I have a pandas DataFrame:

   date    time               user_id
0  20160921    5947  13079492369730773513
1  20160921    5948  13079492369730773513
2  20160921  235949  13079492369730773513
3  20160921  235950  13079492369730773513
4  20160921  235951  13079492369730773513

I want to zero-pad the 'time' column to six digits:

   date    time               user_id
0  20160921  005947  13079492369730773513
1  20160921  005948  13079492369730773513
2  20160921  235949  13079492369730773513
3  20160921  235950  13079492369730773513
4  20160921  235951  13079492369730773513

I know how to do it with a list comprehension:

df['time'] = ["%06d" % t for t in df['time'].tolist()]
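To make the question reproducible, here is a minimal sketch that rebuilds the sample DataFrame and applies the list comprehension. The values are copied from the printout above; storing `date` and `time` as plain integers is an assumption about the original dtypes.

```python
import pandas as pd

# Sample data copied from the question; the integer dtypes are an assumption.
df = pd.DataFrame({
    "date": [20160921] * 5,
    "time": [5947, 5948, 235949, 235950, 235951],
    "user_id": [13079492369730773513] * 5,
})

# "%06d" renders each integer with at least six digits, left-padded with zeros.
df["time"] = ["%06d" % t for t in df["time"].tolist()]
print(df["time"].tolist())  # ['005947', '005948', '235949', '235950', '235951']
```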

Is there a vectorized way to do the same thing? And how would I do it with a Dask DataFrame?

RottenIvy

1 Answer


Yes, there is a vectorized way to do this. Cast the column to strings and then use the .str string methods on it, here str.zfill:

df.time.astype(str).str.zfill(6)
0    005947
1    005948
2    235949
3    235950
4    235951
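`Series.str.pad` with `fillchar="0"` is another vectorized string method that should give the same result on this data. A small sketch comparing the two (the frame is rebuilt with only the `time` column so the snippet runs on its own):

```python
import pandas as pd

df = pd.DataFrame({"time": [5947, 5948, 235949, 235950, 235951]})

as_str = df["time"].astype(str)
padded_zfill = as_str.str.zfill(6)                         # leading zeros up to width 6
padded_pad = as_str.str.pad(6, side="left", fillchar="0")  # same padding, spelled explicitly

print(padded_zfill.equals(padded_pad))  # True
```

The two only differ for negative numbers: zfill inserts the zeros after a leading minus sign ("-12" becomes "-00012"), while pad puts them in front of it ("000-12").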

Then assign the result back to the column:

df.time = df.time.astype(str).str.zfill(6)
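For the Dask part of the question, one option is to wrap the same pandas expression in a function and let Dask apply it to every partition with map_partitions. This is a sketch, not something tested against every Dask version; `pad_time` is a hypothetical helper name. The block below only exercises the helper on a plain pandas Series, since each Dask partition is a pandas object.

```python
import pandas as pd

def pad_time(s: pd.Series) -> pd.Series:
    """Hypothetical helper: zero-pad integer HHMMSS values to six characters."""
    return s.astype(str).str.zfill(6)

# Plain pandas check; with Dask the helper would be called once per partition.
s = pd.Series([5947, 5948, 235949], name="time")
print(pad_time(s).tolist())  # ['005947', '005948', '235949']
```

With a Dask DataFrame `ddf`, the call would look like `ddf["time"] = ddf["time"].map_partitions(pad_time, meta=("time", "object"))`, where `meta` tells Dask the name and dtype of the result without computing it. Dask's own `.str` accessor mirrors many pandas string methods, so `ddf["time"].astype(str).str.zfill(6)` may also work directly, depending on the Dask version.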

This assumes that no value has more than six digits: zfill only pads shorter strings with leading zeros and leaves longer ones unchanged.
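That assumption is easy to check after padding, because an out-of-range value silently comes back longer than six characters. A small sketch with a deliberately invalid seven-digit value, made up for illustration:

```python
import pandas as pd

times = pd.Series([5947, 1235949], name="time")  # 1235949 is a made-up invalid value
padded = times.astype(str).str.zfill(6)

print(padded.tolist())           # ['005947', '1235949']
too_long = padded.str.len() > 6  # flag values that do not fit the HHMMSS layout
print(too_long.tolist())         # [False, True]
```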

Unfortunately, on this five-row DataFrame it is about 13 times slower than the list comprehension:

In [5]: %timeit df.time.astype(str).str.zfill(6)
228 µs ± 4.99 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit ["%06d" % t for t in df['time'].tolist()]
17.5 µs ± 208 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
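These timings come from the five-row example, so they are likely dominated by fixed per-call overhead. A sketch for repeating the comparison on a larger, randomly generated column; the size and the random values are arbitrary assumptions, and no results are shown because they depend on hardware and library versions:

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Random integers below 236000; only the zero-padding matters here, not valid times.
df = pd.DataFrame({"time": rng.integers(0, 236000, size=100_000)})

def with_str_methods():
    return df["time"].astype(str).str.zfill(6)

def with_list_comprehension():
    return ["%06d" % t for t in df["time"].tolist()]

# Both approaches must agree before their speed is compared.
assert with_str_methods().tolist() == with_list_comprehension()

for fn in (with_str_methods, with_list_comprehension):
    seconds = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```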


Graipher