0

I have a large number of columns containing this kind of data:

DE-JP-202/2066/A2@qwier.cu/68
NL-LK-02206/2136/A1@ozmmfts.de/731
OM-PH-31303222/3671/Z1@jtqy.ml/524

I would like to extract the string between '@' and '.' and the string between '.' and '/' into two separate columns.

Like this:

txt 1      txt 2
qwier       cu
ozmmfts     de
jtqy        ml

Tried:

x = dane.str.extract(r'@(?P<txt1>\d)\.(?P<txt2>[ab\d])/')

But it doesn't work.

Matadora
  • 17
  • 4

2 Answers

3

If you want to get 2 capturing groups, you could use 2 negated character classes.

In the first group match 1+ times any char except a dot [^.]+

In the second group match 1+ times any char except a forward slash [^/]+

@(?P<txt1>[^.]+)\.(?P<txt2>[^/]+)/

Regex demo
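
To check the pattern against the sample strings from the question, here is a minimal sketch using Python's built-in `re` module rather than pandas. The variable names `samples` and `results` are illustrative only and not part of the original answer.

```python
import re

# Pattern from the answer: txt1 is everything after '@' up to the first '.',
# txt2 is everything after that '.' up to the next '/'.
pattern = re.compile(r"@(?P<txt1>[^.]+)\.(?P<txt2>[^/]+)/")

# The three sample strings from the question.
samples = [
    "DE-JP-202/2066/A2@qwier.cu/68",
    "NL-LK-02206/2136/A1@ozmmfts.de/731",
    "OM-PH-31303222/3671/Z1@jtqy.ml/524",
]

# search() finds the first match anywhere in the string;
# groupdict() maps each group name to the text it captured.
results = [pattern.search(s).groupdict() for s in samples]

for row in results:
    print(row["txt1"], row["txt2"])
```

Because the groups are named, passing the same pattern to `Series.str.extract` should produce columns called `txt1` and `txt2`.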

The fourth bird
  • 154,723
  • 16
  • 55
  • 70
  • 1
    Was just about to add a regex demo for you, nice one, thanks for the explanation. – Umar.H Nov 05 '19 at 19:51
  • What if the error 'DataFrame' object has no attribute 'str' occurs? I have a dataset with 60 columns like in the example above, and my code looks like this: `dane = p.read_csv('dane.csv',delimiter=';')` `x = dane.str.extract(r'@(?P<txt1>[^.]+)\.(?P<txt2>[^/]+)')` – Matadora Nov 05 '19 at 19:58
  • @Matadora Perhaps this page https://stackoverflow.com/questions/51502263/pandas-dataframe-object-has-no-attribute-str can be helpful. – The fourth bird Nov 05 '19 at 20:02
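
On the 'DataFrame' object has no attribute 'str' error: the `.str` accessor exists on a single column (a Series), not on a whole DataFrame. One possible way around it is to run the extraction column by column and join the results. This is only a sketch: the two-column frame and the column names `a` and `b` are made up to stand in for the 60-column CSV, and it assumes every column holds strings.

```python
import pandas as pd

# Hypothetical two-column frame standing in for the 60-column CSV.
dane = pd.DataFrame({
    "a": ["DE-JP-202/2066/A2@qwier.cu/68", "NL-LK-02206/2136/A1@ozmmfts.de/731"],
    "b": ["OM-PH-31303222/3671/Z1@jtqy.ml/524", "DE-JP-202/2066/A2@qwier.cu/68"],
})

pattern = r"@(?P<txt1>[^.]+)\.(?P<txt2>[^/]+)/"

# .str works on one column at a time, so extract per column and prefix
# the resulting txt1/txt2 columns with the name of the source column.
parts = [dane[col].str.extract(pattern).add_prefix(f"{col}_") for col in dane.columns]

# Place the per-column results side by side.
x = pd.concat(parts, axis=1)
print(x)
```

The prefix keeps the output columns distinct (`a_txt1`, `a_txt2`, `b_txt1`, `b_txt2`), which matters once there are many source columns.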
0

If all of your strings contain exactly one @ and one ., you can do the following:

s = 'DE-JP-202/2066/A2@qwier.cu/68'

column1 = s.split('@')[1].split('.')[0]

column2 = s.split('@')[1].split('.')[1].split('/')[0]
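
The same split logic could be wrapped in a small function so it can be reused on every value. This is a rough sketch under the same assumption (one @, followed by a . and then a /); the helper name `split_domain` is hypothetical.

```python
def split_domain(s):
    """Return (text between '@' and '.', text between '.' and '/')."""
    # Everything after the '@'.
    after_at = s.split('@')[1]
    # Split once on the first '.' to separate the two parts.
    name, rest = after_at.split('.', 1)
    # Keep only what comes before the next '/'.
    return name, rest.split('/')[0]


samples = [
    'DE-JP-202/2066/A2@qwier.cu/68',
    'NL-LK-02206/2136/A1@ozmmfts.de/731',
    'OM-PH-31303222/3671/Z1@jtqy.ml/524',
]

pairs = [split_domain(s) for s in samples]
for column1, column2 in pairs:
    print(column1, column2)
```

A string without an @ raises an IndexError here, and one without a . after the @ raises a ValueError, so the regex approach may be safer if the data is not uniform.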

coldsoup
  • 31
  • 3