1

I have a data frame as below.

pl.DataFrame({'combine_address':[ ["Yes|#456 Lane|Apt#4|ABC|VA|50566", "Yes|#456 Lane|Apt#4|ABC|VA|50566", "No|#456 Lane|Apt#4|ABC|VA|50566"],
                                 ["No|#8495|APT#94|SWE|WA|43593", "No|#8495|APT#94|SWE|WA|43593", "Yes|#8495|APT#94|SWE|WA|43593"]
                                ]})

Here `combine_address` is a list-type column whose elements are strings holding 6 pipe-separated (|) values. I would like to split each element of the list on the separator (|).

Here is the expected output:

If a list has 3 elements, the split will produce 3*6=18 columns.

If a list has 5 elements, the split will produce 5*6=30 columns, and so on.
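The column counts above follow from each string holding 6 pipe-separated fields. A minimal plain-Python sketch of that arithmetic, using the first list from the data frame above (the helper name `split_fields` is hypothetical and not part of polars):

```python
def split_fields(addresses, sep="|"):
    """Split every string of one list cell and flatten the pieces into one list."""
    return [field for address in addresses for field in address.split(sep)]


# First list cell from the question's data frame
row = [
    "Yes|#456 Lane|Apt#4|ABC|VA|50566",
    "Yes|#456 Lane|Apt#4|ABC|VA|50566",
    "No|#456 Lane|Apt#4|ABC|VA|50566",
]

fields = split_fields(row)
print(len(fields))  # 3 elements * 6 fields each
```

Each entry of `fields` would become one column of the desired output for that row.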

myamulla_ciencia
  • Split strings inside the list by "|", then merge them together as a new list. For instance, `x = ["Yes|#456 Lane|Apt#4|ABC|VA|50566", "Yes|#456 Lane|Apt#4|ABC|VA|50566", "No|#456 Lane|Apt#4|ABC|VA|50566"]`, `res = [i for field in (z.split("|") for z in x) for i in field]` – hide1nbush Oct 27 '22 at 06:07
  • I have never used polars, but I see in the manual that it can be broken down into lists in the following way. Now I don't know how to extend it and make it into a column. I can't help you with anything, but just FYI. `[ x.split('|') for x in pl.Series.to_list(df['combine_address'])[0]]` – r-beginners Oct 27 '22 at 07:48

2 Answers

5

Is this what you are looking for?



import polars as pl

df = pl.DataFrame({"combine_address":[
    ["Yes|#456 Lane|Apt#4|ABC|VA|50566", "Yes|#456 Lane|Apt#4|ABC|VA|50566", "No|#456 Lane|Apt#4|ABC|VA|50566"],
    ["No|#8495|APT#94|SWE|WA|43593", "No|#8495|APT#94|SWE|WA|43593", "Yes|#8495|APT#94|SWE|WA|43593"]
]})

(df.select(
    pl.col("combine_address").reshape((1, -1))
    .arr.join("|").str.split("|")
    .arr.to_struct(n_field_strategy="max_width")
).unnest("combine_address"))

shape: (1, 36)
┌─────────┬───────────┬─────────┬─────────┬─────┬──────────┬──────────┬──────────┬──────────┐
│ field_0 ┆ field_1   ┆ field_2 ┆ field_3 ┆ ... ┆ field_32 ┆ field_33 ┆ field_34 ┆ field_35 │
│ ---     ┆ ---       ┆ ---     ┆ ---     ┆     ┆ ---      ┆ ---      ┆ ---      ┆ ---      │
│ str     ┆ str       ┆ str     ┆ str     ┆     ┆ str      ┆ str      ┆ str      ┆ str      │
╞═════════╪═══════════╪═════════╪═════════╪═════╪══════════╪══════════╪══════════╪══════════╡
│ Yes     ┆ #456 Lane ┆ Apt#4   ┆ ABC     ┆ ... ┆ APT#94   ┆ SWE      ┆ WA       ┆ 43593    │
└─────────┴───────────┴─────────┴─────────┴─────┴──────────┴──────────┴──────────┴──────────┘
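As I read the expression above, it gathers all strings into a single row, joins them with "|", splits the result on "|" again, and turns each piece into its own column. The following plain-Python sketch mirrors those steps without polars. It is only an illustration of the idea, not the polars implementation, and the variable names are assumptions chosen for this example:

```python
rows = [
    ["Yes|#456 Lane|Apt#4|ABC|VA|50566", "Yes|#456 Lane|Apt#4|ABC|VA|50566", "No|#456 Lane|Apt#4|ABC|VA|50566"],
    ["No|#8495|APT#94|SWE|WA|43593", "No|#8495|APT#94|SWE|WA|43593", "Yes|#8495|APT#94|SWE|WA|43593"],
]

# Counterpart of reshape((1, -1)): collect every string from every row in one list
all_strings = [s for row in rows for s in row]

# Counterpart of join("|") followed by split("|"): one long string, then a flat list of fields
flat_fields = "|".join(all_strings).split("|")

# Counterpart of to_struct + unnest: one column per field, named field_0, field_1, ...
record = {f"field_{i}": value for i, value in enumerate(flat_fields)}

print(len(record))
print(record["field_0"], record["field_35"])
```

Note that this collapses both input rows into a single output row, which is why the result has 36 columns rather than 18 per row.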

ritchie46
0
import polars as pl
import pandas as pd
df=pd.DataFrame({'combine_address':[ ["Yes|#456 Lane|Apt#4|ABC|VA|50566", "Yes|#456 Lane|Apt#4|ABC|VA|50566", "No|#456 Lane|Apt#4|ABC|VA|50566"],
                                 ["No|#8495|APT#94|SWE|WA|43593", "No|#8495|APT#94|SWE|WA|43593", "Yes|#8495|APT#94|SWE|WA|43593"]
                                ]})

The code above rebuilds the data from the question as a pandas DataFrame. You can then try the following:

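# Split every address string on "|" (one list of 6 fields per string)
a = []
for addresses in df['combine_address']:
    a += [address.split('|') for address in addresses]

# Flatten the list of lists into a single flat list
b = []
for fields in a:
    b += fields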

This gives you `b`, a flat list with 36 elements.
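The two loops can also be written more compactly with `itertools.chain.from_iterable` from the standard library. This is a sketch under the assumption that a plain list of lists (here called `rows`, a stand-in for `df['combine_address']`) holds the same data:

```python
from itertools import chain

# Stand-in for df['combine_address']: one inner list per data frame row
rows = [
    ["Yes|#456 Lane|Apt#4|ABC|VA|50566", "Yes|#456 Lane|Apt#4|ABC|VA|50566", "No|#456 Lane|Apt#4|ABC|VA|50566"],
    ["No|#8495|APT#94|SWE|WA|43593", "No|#8495|APT#94|SWE|WA|43593", "Yes|#8495|APT#94|SWE|WA|43593"],
]

# Split each string on "|" and chain the resulting pieces into one flat list
b = list(chain.from_iterable(s.split("|") for row in rows for s in row))

print(len(b))
```
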

c=pd.DataFrame(b).T
pl.from_pandas(c)

This matches your expected output, with shape (1, 36).

I hope this will help you.