I have a string as follows: `2020-01-01T16:30.00 - 1.00`. I want to select the part of the string that is between `T` and `-`, i.e. I want to be able to extract `16:30.00` from the whole string and convert it to a float. Any help is appreciated.
Viewed 505 times
- You should use the `datetime` type for date/time data. – Quang Hoang Jul 06 '20 at 14:49
- The data comes in the format I showed. How do I select the time part out of it? – S_Scouse Jul 06 '20 at 14:50
- `df['your_column'].apply(lambda x: str(x)[-15:-7])`, if and only if the format stays the same – Vadim Jul 06 '20 at 14:59
- It doesn't stay the same; I am looking for something general. – S_Scouse Jul 06 '20 at 15:00
- I found the following answer that does it efficiently: https://stackoverflow.com/questions/39662149/pandas-extract-date-and-time-from-timestamp – S_Scouse Jul 06 '20 at 15:06
1 Answer
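If you have a pandas Series `s` like this
import pandas as pd
s = pd.Series(["2020-01-01T16:30.00 - 1.00", "2020-12-04T00:25.00 - 14.00"])
you can use
# regex=True is needed in pandas 2.0 and later, where str.replace treats the pattern as a literal string by default
s.str.replace(".+T", "", regex=True).str.replace(" -.+", "", regex=True)
# 0 16:30.00
# 1 00:25.00
# dtype: object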
Basically, you first replace everything that precedes the `T`, and the `T` itself, with an empty string. Then you replace the part starting with ` -` with an empty string (note the whitespace before the dash).
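To see what each substitution leaves behind, here is a minimal sketch that applies the same two patterns to a single string with the standard `re` module instead of pandas. The variable names are only illustrative.

```python
import re

raw = "2020-01-01T16:30.00 - 1.00"

# Step 1: drop everything up to and including the "T"
after_first = re.sub(r".+T", "", raw)
print(after_first)   # 16:30.00 - 1.00

# Step 2: drop the " -" and everything after it
after_second = re.sub(r" -.+", "", after_first)
print(after_second)  # 16:30.00
```

The order matters less than it may seem here, because the dashes inside the date are not preceded by a space, so ` -.+` only matches the trailing part.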
Another option is to use regular-expression groups to match particular patterns and select only one of the groups (in this case the second, `.+`)
import re
s.apply(lambda x: re.match("(.+T)(.+)( -.+)", x).group(2))
# 0 16:30.00
# 1 00:25.00
# dtype: object
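The question also asks for a float, which the snippets above do not produce: they return strings. The following is only a sketch under a loud assumption, namely that `16:30.00` means hours:minutes.seconds and that the wanted float is the time in decimal hours. If the value means something else, the arithmetic at the end has to change. It uses `Series.str.extract`, which pulls out a capture group directly.

```python
import pandas as pd

s = pd.Series(["2020-01-01T16:30.00 - 1.00", "2020-12-04T00:25.00 - 14.00"])

# Capture the text between "T" and " -" in one step
time_part = s.str.extract(r"T(.+?) -", expand=False)

# ASSUMPTION: the format is HH:MM.SS; split it into numeric columns
parts = time_part.str.extract(
    r"(?P<hours>\d+):(?P<minutes>\d+)\.(?P<seconds>\d+)"
).astype(float)

# ASSUMPTION: the desired float is decimal hours
decimal_hours = parts["hours"] + parts["minutes"] / 60 + parts["seconds"] / 3600
print(decimal_hours.iloc[0])  # 16.5
```

Rows that do not match the pattern come out as `NaN` rather than raising, which may or may not be what you want for malformed input.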

Ric S
- Thank you. Using the `datetime` library is another way to do it; I found it in one of the Stack Overflow answers. – S_Scouse Jul 06 '20 at 15:07
- Thank you, very useful. I might use it for some other string selection need. – S_Scouse Jul 06 '20 at 15:23