I would like to extract certain parts of a string from a CSV file

Question

I have a vast number of columns containing this kind of data:

DE-JP-202/2066/A2@qwier.cu/68
NL-LK-02206/2136/A1@ozmmfts.de/731
OM-PH-31303222/3671/Z1@jtqy.ml/524

I would like to extract the string between '@' and '.' and the string between '.' and '/' into two separate columns.

Like this:

txt 1      txt 2
qwier       cu
ozmmfts     de
jtqy        ml

Tried:

x = dane.str.extract(r'@(?P<txt1>\d)\.(?P<txt2>[ab\d])/')

But it doesn't work.

The fourth bird · Accepted Answer · 2019-11-05T19:57:40.317

3

If you want to get 2 capturing groups, you could use 2 negated character classes.

In the first group match 1+ times any char except a dot [^.]+

In the second group match 1+ times any char except a forward slash [^/]+

@(?P<txt1>[^.]+)\.(?P<txt2>[^/]+)/
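As a quick check, the pattern can be tried on the three sample strings from the question with Python's standard `re` module before handing it to pandas. This is only a minimal sketch; the names `samples` and `rows` are made up for illustration.

```python
import re

# Pattern from the answer: group txt1 stops at the first dot,
# group txt2 stops at the first forward slash.
pattern = re.compile(r"@(?P<txt1>[^.]+)\.(?P<txt2>[^/]+)/")

# Sample values copied from the question.
samples = [
    "DE-JP-202/2066/A2@qwier.cu/68",
    "NL-LK-02206/2136/A1@ozmmfts.de/731",
    "OM-PH-31303222/3671/Z1@jtqy.ml/524",
]

rows = []
for s in samples:
    m = pattern.search(s)
    # Keep a placeholder pair when a value does not match the pattern.
    rows.append((m.group("txt1"), m.group("txt2")) if m else (None, None))

for txt1, txt2 in rows:
    print(txt1, txt2)
```

The same pattern string can be passed to `Series.str.extract`, where the named groups become the column names of the result.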

edited Nov 05 '19 at 19:57

answered Nov 05 '19 at 19:49

The fourth bird

1

Was just about to add a regex demo for you, nice one, thanks for the explanation. – Umar.H Nov 05 '19 at 19:51
What if the error `'DataFrame' object has no attribute 'str'` occurs? I have a dataset with 60 columns like in the example above, and my code looks like `dane = p.read_csv('dane.csv',delimiter=';')` `x = dane.str.extract(r'@(?P<txt1>[^.]+)\.(?P<txt2>[^/]+)')` – Matadora Nov 05 '19 at 19:58
@Matadora Perhaps this page https://stackoverflow.com/questions/51502263/pandas-dataframe-object-has-no-attribute-str can be helpful. – The fourth bird Nov 05 '19 at 20:02
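The error in the comment above comes up because the `.str` accessor belongs to a single column (a Series), not to a whole DataFrame. One possible way around it, sketched here under the assumption that every column holds strings of the same shape, is to run the extraction column by column and join the results. The two-column frame and the column names `a` and `b` are hypothetical stand-ins for the 60-column CSV.

```python
import pandas as pd

# Hypothetical small frame standing in for the real CSV data.
dane = pd.DataFrame({
    "a": ["DE-JP-202/2066/A2@qwier.cu/68", "NL-LK-02206/2136/A1@ozmmfts.de/731"],
    "b": ["OM-PH-31303222/3671/Z1@jtqy.ml/524", "DE-JP-202/2066/A2@qwier.cu/68"],
})

pattern = r"@(?P<txt1>[^.]+)\.(?P<txt2>[^/]+)/"

# Extract from each column separately; astype(str) guards against
# non-string columns, and add_prefix keeps track of the source column.
parts = [
    dane[col].astype(str).str.extract(pattern).add_prefix(f"{col}_")
    for col in dane.columns
]
result = pd.concat(parts, axis=1)
print(result)
```

Each source column yields two output columns, here `a_txt1`, `a_txt2`, `b_txt1` and `b_txt2`; cells that do not match the pattern come back as NaN.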

score 0 · Answer 2 · answered Nov 05 '19 at 19:50

0

If all of your strings contain exactly one @ and one ., you can do the following:

s = 'DE-JP-202/2066/A2@qwier.cu/68'

column1 = s.split('@')[1].split('.')[0]

column2 = s.split('@')[1].split('.')[1].split('/')[0]
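The splitting steps above could also be wrapped in a small function so they can be reused for many values. This is a rough sketch; `split_parts` is a hypothetical name, and it assumes each string contains an '@' followed later by a '.'.

```python
def split_parts(s):
    """Return the text between '@' and '.', and between '.' and '/'."""
    # Everything after the first '@'.
    after_at = s.split("@", 1)[1]
    # Split once at the first '.' to separate the two parts.
    first, rest = after_at.split(".", 1)
    # The second part ends at the next '/'.
    second = rest.split("/", 1)[0]
    return first, second


column1, column2 = split_parts("DE-JP-202/2066/A2@qwier.cu/68")
print(column1, column2)
```

A string without an '@' or a '.' raises an error in this sketch, so values in other formats would need to be handled separately.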

answered Nov 05 '19 at 19:50

coldsoup

What if I have a huge number of columns and this error occurs while using your solution: 'DataFrame' object has no attribute 'split' – Matadora Nov 05 '19 at 19:55
And I have a data set with 60 columns and 20576 rows – Matadora Nov 05 '19 at 19:56
Can you cast the DataFrame object to a string? For example, str(s).split('@')[1].split('.')[0] – coldsoup Nov 05 '19 at 20:01
