My question is about the data being passed to the re.split() function. I have the following data:

Name Sport
John Football;NBA,Tennis
Mary Squash,Tetris;MMA
Scott Cricket,Tennis
Kim Rugby,WNBA;Footy

I am trying to split the strings using ';' and ',' as the delimiters. Initially, the data type of the Name and Sport columns is 'object'.

import numpy as np
import pandas as pd
import re
df = pd.read_excel(r'Filepath\sports.xlsx',sheet_name = 'data')
df[['Name','Sport']] = df[['Name','Sport']].astype('string')
print(df.dtypes)
df[['A']] = re.split(r';,',df['Sport'])
df 

After converting to string and then trying to split, I get the following error:

TypeError: expected string or bytes-like object

I tried using

df[['A']] = re.split(r';,',df['Sport'].astype('string'))

But the error still persists. Any suggestions?

Rogue258
  • Does this answer your question? [Split Strings into words with multiple word boundary delimiters](https://stackoverflow.com/questions/1059559/split-strings-into-words-with-multiple-word-boundary-delimiters) – Woodford Nov 18 '21 at 18:38
  • Post the full traceback message - we shouldn't have to guess where the error is. – tdelaney Nov 18 '21 at 19:39
  • @tdelaney Apologies for that. I can't post the full traceback message because the original data is confidential; I generated sample data here with the variables changed to reflect that. However, I get your point, and I should have stated that the error was raised by this line within the code block: df[['A']] = re.split(r';,',df['Sport']). Hope this clears up any doubts. – Rogue258 Nov 19 '21 at 15:52

1 Answer

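re.split() expects a single string (or bytes-like object), not a pandas Series, which is why passing df['Sport'] raises the TypeError. Use the .str accessor to apply the split to every row of the column instead. Also note that the pattern r';,' only matches a semicolon immediately followed by a comma; to split on either character, use the character class [;,]:

df['A'] = df['Sport'].str.split(r'[;,]')

I hope that resolves your problem.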
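
For anyone who wants to try this without the Excel file, here is a minimal sketch. It assumes the sample rows from the question typed in by hand (the real sports.xlsx is not available), and the names first_row and parts are only illustrative. It shows re.split() on a single string, the .str accessor on the whole column, and the expand=True option for getting separate columns.

```python
import re

import pandas as pd

# Sample data rebuilt by hand from the question; stands in for sports.xlsx.
df = pd.DataFrame(
    {
        "Name": ["John", "Mary", "Scott", "Kim"],
        "Sport": [
            "Football;NBA,Tennis",
            "Squash,Tetris;MMA",
            "Cricket,Tennis",
            "Rugby,WNBA;Footy",
        ],
    }
).astype("string")

# re.split() handles one string at a time; [;,] matches either delimiter.
first_row = re.split(r"[;,]", df.loc[0, "Sport"])
print(first_row)  # ['Football', 'NBA', 'Tennis']

# The .str accessor applies the same split to every row of the column.
# A pattern longer than one character is treated as a regular expression.
df["A"] = df["Sport"].str.split(r"[;,]")

# expand=True spreads the pieces over separate columns instead of lists;
# rows with fewer pieces are padded with missing values.
parts = df["Sport"].str.split(r"[;,]", expand=True)
print(parts)
```

If one row per sport is more useful than a list per person, df.explode("A") on the result is another option.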

Carlos