8
    all_data['Title'] = all_data['Name'].str.split(', ', expand=True)[1].str.split('.', expand=True)[0]

Can anyone explain what this line of code means? I am especially confused by `expand=True` and by the `[1]` and `[0]` indexing.

Robert Christopher
Hui Yang ONG
  • 2
    That isn't the usual string split method, it's a specific method of whatever kind of object `all_data['Name'].str` returns. I believe you're working with a Pandas dataframe here, you need to see the Pandas documentation to see what methods its objects define, and what parameters they take. – jasonharper Sep 08 '20 at 14:59

2 Answers

7

If you are using Pandas, you are probably also familiar with Jupyter Notebooks. So, for simplicity and readability, let's complete the code you posted with some additional context in a notebook-like format:

(this markdown is here to override an error in the answer window interpreter)

```lang-python
import pandas as pd

raw_name = ['Bob, Mr. Ross', 'Alice, Mrs. Algae', 'Larry, Mr. Lemon', 'John, Mr. Johnson']
all_data = pd.DataFrame({'Name': raw_name})

# This is the OP's line
all_data['Title'] = all_data['Name'].str.split(', ', expand=True)[1].str.split('.', expand=True)[0]

all_data
```

                    Name  Title
    0      Bob, Mr. Ross     Mr
    1  Alice, Mrs. Algae    Mrs
    2   Larry, Mr. Lemon     Mr
    3  John, Mr. Johnson     Mr

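Here, `expand=True` makes `str.split` return a DataFrame with one column of strings per piece, and the columns are labelled 0, 1, and so on. Indexing with `[1]` selects the second column (the text after the comma) as a Series of strings, so you can call `.str.split` on it again. The final `[0]` selects the first column of that second split, which is the title. With a regular split (`expand=False`, the default), the result is a single Series of lists, which makes chaining a little more complicated.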

Better explained with code examples:

    all_data['Name'].str.split(', ', expand=False)  # or no expand at all

                         0
    0      [Bob, Mr. Ross]
    1  [Alice, Mrs. Algae]
    2   [Larry, Mr. Lemon]
    3  [John, Mr. Johnson]

    all_data['Name'].str.split(', ', expand=True)

           0            1
    0    Bob     Mr. Ross
    1  Alice   Mrs. Algae
    2  Larry    Mr. Lemon
    3   John  Mr. Johnson

    all_data['Name'].str.split(', ', expand=True)[1].str.split('.', expand=False)

                   0
    0     [Mr, Ross]
    1   [Mrs, Algae]
    2    [Mr, Lemon]
    3  [Mr, Johnson]

    all_data['Name'].str.split(', ', expand=True)[1].str.split('.', expand=True)

         0        1
    0   Mr     Ross
    1  Mrs    Algae
    2   Mr    Lemon
    3   Mr  Johnson
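
To connect these outputs back to the original one-liner, the chain can be unpacked into named steps. This is only a minimal sketch that reuses the sample names from above; the intermediate variable names (`first_split`, `after_comma`, `second_split`) are hypothetical and exist purely for illustration.

```python
import pandas as pd

# Sample data assumed for illustration (same names as in the answer above)
all_data = pd.DataFrame(
    {'Name': ['Bob, Mr. Ross', 'Alice, Mrs. Algae', 'Larry, Mr. Lemon', 'John, Mr. Johnson']}
)

# Step 1: split on ', ' -> DataFrame with columns labelled 0 and 1
first_split = all_data['Name'].str.split(', ', expand=True)

# Step 2: [1] selects the column labelled 1 (the text after the comma) as a Series
after_comma = first_split[1]

# Step 3: split that Series on '.' -> another DataFrame with columns 0 and 1
second_split = after_comma.str.split('.', expand=True)

# Step 4: [0] selects the column labelled 0 (the text before the dot), i.e. the title
all_data['Title'] = second_split[0]

print(type(first_split).__name__, type(after_comma).__name__)
print(all_data['Title'].tolist())
```

Note that `[1]` and `[0]` here are column labels of the intermediate DataFrames, not positions inside a Python list.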

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.split.html

Alejandro QA
  • The error that keeps popping up is: "Your post appears to contain code that is not properly formatted as code. Please indent all code by 4 spaces using the code toolbar button or the CTRL+K keyboard shortcut. For more editing help, click the [?] toolbar icon." – Alejandro QA May 11 '22 at 01:14
2

Take a look at the documentation for pandas.Series.str.split, which describes the `expand` parameter as follows:

Expand the split strings into separate columns.

If True, return DataFrame/MultiIndex expanding dimensionality.

If False, return Series/Index, containing lists of strings.

    import pandas as pd

    s = pd.Series(
        [
            "this is a regular sentence",
        ]
    )
    s.str.split(expand=True)

          0   1  2        3         4
    0  this  is  a  regular  sentence
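
The difference between the two return types can be checked directly. The following is a small sketch and not part of the quoted documentation; it assumes the same one-sentence Series, and the names `as_lists` and `as_columns` are made up for illustration. It uses `.str[i]` indexing to pick an element out of each list when `expand=False`.

```python
import pandas as pd

s = pd.Series(["this is a regular sentence"])

as_lists = s.str.split()                # expand=False is the default
as_columns = s.str.split(expand=True)   # one column per piece

print(type(as_lists).__name__)    # Series
print(type(as_columns).__name__)  # DataFrame

# With a Series of lists, .str[i] picks the i-th element of each list;
# with expand=True, [i] picks the column labelled i.
word_from_lists = as_lists.str[3].tolist()
word_from_columns = as_columns[3].tolist()
print(word_from_lists, word_from_columns)
```

Applied to the question's line, this suggests the same title could also be obtained without `expand=True`, for example as `all_data['Name'].str.split(', ').str[1].str.split('.').str[0]`.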

wp78de