I'm trying to concatenate two dataframes with Polars in Python, and it keeps throwing an error despite my syntax appearing to be correct based on the docs.

Specifically, I've got ldf_a with shape: (4, 33) and ldf_b with shape: (4, 33).

When I run

ldf_c = pl.concat([ldf_a, ldf_b], rechunk=True, how="vertical")

or

ldf_c = pl.concat([ldf_a, ldf_b], rechunk=True, how="diagonal")

it throws this error:

cannot vstack: because column names in the two DataFrames do not match for left.name='type_of_lapse' != right.name='type_of_record'

Now, I don't believe that for a second, because when I print ldf_a.columns and ldf_b.columns I can very clearly see that the column names are the same. And even if they weren't, isn't that what the diagonal method is supposed to solve? Thoughts? Also, let me know if you need further clarification!

Edit: Here's an example of ldf_a

policy_key type_of_record type_of_lapse coverage_substatus actual_mortality_count actual_persistency_count last_effective_date_in_exp max_issue_age max_issue_date max_paid_to_date max_plan_gender_code max_risk_code max_par_code max_distribution max_substandard_multiplier max_current_volume max_segment_indicator has_lapse has_death max_effective_date max_transaction_date min_coverage_status min_coverage_substatus max_coverage_status max_date_of_death max_cause_of_death max_agnis_plan_code max_roe_line_of_business max_coverage_years source 16 35 unpaid_matured_endowment_date
123 V 16 0 0 2020-01-01 10 1962-10-01 1994-09-01 U XX P Mass 1 500 K N N 2019-10-01 00 05 10 15 12345 07 99 EXP 2005-12-31 2007-12-31

And here's an example of ldf_b

policy_key type_of_record type_of_lapse coverage_substatus actual_mortality_count actual_persistency_count last_effective_date_in_exp max_issue_age max_issue_date max_paid_to_date max_plan_gender_code max_risk_code max_par_code max_distribution max_substandard_multiplier max_current_volume max_segment_indicator has_lapse has_death max_effective_date max_transaction_date min_coverage_status min_coverage_substatus max_coverage_status max_date_of_death max_cause_of_death max_agnis_plan_code max_roe_line_of_business max_coverage_years source 16 35 unpaid_matured_endowment_date
456 V 16 0 0 2020-01-01 10 1962-10-01 1994-09-01 U XX P Mass 1 500 K N N 2019-10-01 00 05 10 15 12345 07 99 EXP 2005-12-31 2007-12-31
  • Can you provide example dataframes? That would be very helpful. – Nikolay Zakirov Oct 19 '22 at 03:53
  • What happens when you have just the first three columns (so including `type_of_lapse`, which is what the error is about)? And could you provide your example dataframes defined as code, so people can run the example for themselves? Final remark: given that you name your dataframes `ldf_x`, I am assuming that they are lazy. Note that concatenating lazy dataframes using diagonal is not supported: https://github.com/pola-rs/polars/issues/5082 – jvz Oct 22 '22 at 15:28
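
The error text compares a left name with a right name, which suggests the check is positional: two frames can hold the same set of column names and still fail if the order differs. Below is a minimal sketch of that check in plain Python, with no Polars required. The helper `first_column_mismatch` and the three-column lists are hypothetical and only illustrate the idea; in practice the lists would come from `ldf_a.columns` and `ldf_b.columns`.

```python
from typing import Optional, Sequence, Tuple


def first_column_mismatch(
    left: Sequence[str], right: Sequence[str]
) -> Optional[Tuple[int, str, str]]:
    """Return (position, left_name, right_name) for the first position
    where the two name lists differ, or None if no position differs.

    Note: zip stops at the shorter list, so a length difference alone
    is not reported here; compare len(left) and len(right) separately.
    """
    for position, (left_name, right_name) in enumerate(zip(left, right)):
        if left_name != right_name:
            return position, left_name, right_name
    return None


# Hypothetical column lists: identical names, different order.
cols_a = ["policy_key", "type_of_lapse", "type_of_record"]
cols_b = ["policy_key", "type_of_record", "type_of_lapse"]

same_names = set(cols_a) == set(cols_b)    # order-insensitive comparison
same_order = list(cols_a) == list(cols_b)  # order-sensitive comparison
mismatch = first_column_mismatch(cols_a, cols_b)

print(same_names, same_order, mismatch)
```

If the names match but the order does not, one option is to reorder one frame before concatenating, for example `ldf_b.select(ldf_a.columns)`, assuming both frames really do contain the same columns with compatible dtypes.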
