I am looking for a way to combine two DataFrames.

df1:

shape: (2, 2)
┌────────┬──────────────────────┐
│ Fruit  ┆ Phosphorus (mg/100g) │
│ ---    ┆ ---                  │
│ str    ┆ i32                  │
╞════════╪══════════════════════╡
│ Apple  ┆ 11                   │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Banana ┆ 22                   │
└────────┴──────────────────────┘

df2:

shape: (1, 3)
┌──────┬─────────────────────┬──────────────────────┐
│ Name ┆ Potassium (mg/100g) ┆ Phosphorus (mg/100g) │
│ ---  ┆ ---                 ┆ ---                  │
│ str  ┆ i32                 ┆ i32                  │
╞══════╪═════════════════════╪══════════════════════╡
│ Pear ┆ 115                 ┆ 12                   │
└──────┴─────────────────────┴──────────────────────┘

Result should be:

shape: (3, 3)
+--------+----------------------+---------------------+
| Fruit  | Phosphorus (mg/100g) | Potassium (mg/100g) |
| ---    | ---                  | ---                 |
| str    | i32                  | i32                 |
+========+======================+=====================+
| Apple  | 11                   | null                |
+--------+----------------------+---------------------+
| Banana | 22                   | null                |
+--------+----------------------+---------------------+
| Pear   | 12                   | 115                 |
+--------+----------------------+---------------------+

Here is the code snippet I am trying to get working:

use polars::prelude::*;

fn main() {
    let df1: DataFrame = df!("Fruit" => &["Apple", "Banana"],
                         "Phosphorus (mg/100g)" => &[11, 22])
    .unwrap();

    let df2: DataFrame = df!("Name" => &["Pear"],
                            "Potassium (mg/100g)" => &[115],
                            "Phosphorus (mg/100g)" => &[12]
    )
    .unwrap();

    let df3: DataFrame = df1
        .join(&df2, ["Fruit"], ["Name"], JoinType::Left, None)
        .unwrap();

    assert_eq!(df3.shape(), (3, 3));
    println!("{}", df3);
}

It's a FULL OUTER JOIN I am looking for ...

The ERROR I get:

thread 'main' panicked at 'assertion failed: (left == right) left: (2, 4), right: (3, 3)', src\main.rs:18:5

Robert
  • I think what you’re looking for is a `concat` with `how=diagonal`. Here’s the doc for the Python equivalent: https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.concat.html#polars.concat –  Sep 26 '22 at 04:57
  • I believe this is diagonal_concat in Rust: https://pola-rs.github.io/polars/polars/ –  Sep 26 '22 at 05:07
  • @cbilot Thanks for the suggestion. The `concat` function you refer to is for Python ... there are similar functions on the Rust side ... but none of them works: `hor_concat_df` "barks" at duplicate column names and `diag_concat_df` can't handle DataFrames of different shapes. – Robert Sep 26 '22 at 14:19
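
To make the "diagonal" idea from the comments concrete: the linked Python docs describe it as taking the union of the column schemas and filling the missing values with null. Below is a toy, std-only sketch of that behaviour. It is not the polars implementation and does not call polars at all; the `Table` type and the `column`, `height` and `diagonal_concat` helpers are made up for illustration, and every cell is stored as a string to keep the model small.

```rust
/// Toy table: ordered (column name, cells) pairs; `None` marks a null cell.
/// Hypothetical type for illustration only, unrelated to polars' DataFrame.
type Table = Vec<(String, Vec<Option<String>>)>;

/// Helper to build one column from string literals.
fn column(name: &str, cells: &[Option<&str>]) -> (String, Vec<Option<String>>) {
    (
        name.to_string(),
        cells.iter().map(|c| c.map(str::to_string)).collect(),
    )
}

/// Number of rows, taken from the first column (0 for a table with no columns).
fn height(t: &Table) -> usize {
    t.first().map_or(0, |(_, v)| v.len())
}

/// Stack `bottom` under `top`: the result has the union of both column sets,
/// and cells that a table does not provide are filled with `None`.
fn diagonal_concat(top: &Table, bottom: &Table) -> Table {
    let (h_top, h_bottom) = (height(top), height(bottom));
    let mut out: Table = Vec::new();

    // Columns of `top` come first, extended by `bottom`'s cells or by nulls.
    for (name, values) in top {
        let mut col = values.clone();
        match bottom.iter().find(|(n, _)| n == name) {
            Some((_, more)) => col.extend(more.iter().cloned()),
            None => col.resize(col.len() + h_bottom, None),
        }
        out.push((name.clone(), col));
    }

    // Columns that exist only in `bottom` get nulls for all rows of `top`.
    for (name, values) in bottom {
        if top.iter().any(|(n, _)| n == name) {
            continue;
        }
        let mut col: Vec<Option<String>> = vec![None; h_top];
        col.extend(values.iter().cloned());
        out.push((name.clone(), col));
    }
    out
}
```

In this model the two frames from the question only line up into the desired three columns if the fruit column has the same name in both, i.e. `Name` in df2 is treated as `Fruit`. With the original names, `Fruit` and `Name` would stay separate columns.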

2 Answers

You need to explicitly specify the columns you are going to merge, and use JoinType::Outer for the outer join:

use polars::prelude::*;

fn main() {
    let df1: DataFrame = df!("Fruit" => &["Apple", "Banana"],
                         "Phosphorus (mg/100g)" => &[11, 22])
    .unwrap();

    let df2: DataFrame = df!("Name" => &["Pear"],
                            "Potassium (mg/100g)" => &[115],
                            "Phosphorus (mg/100g)" => &[12]
    )
    .unwrap();

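    // Join on both the name column and the shared phosphorus column, so that
    // each pair ends up as a single column in the result.
    let df3: DataFrame = df1
        .join(
            &df2,
            ["Fruit", "Phosphorus (mg/100g)"],
            ["Name", "Phosphorus (mg/100g)"],
            JoinType::Outer,
            None,
        )
        .unwrap();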

    assert_eq!(df3.shape(), (3, 3));
    println!("{}", df3);
}
Ayaz Amin
  • @Ayaz, thanks for the suggested solution ... it's a bit of work to first create a list of the common column names and then join on all of them to add the outer-join columns ... I thought a more straightforward way of doing this existed ... but thanks :) – Robert Sep 26 '22 at 14:25

Thanks to @Ayaz :) I was able to make a generic version, one where I do not need to specify the shared column names each time.

Here is my version of the FULL OUTER JOIN of two DataFrames:

use polars::prelude::*;
use array_tool::vec::Intersect;

fn concat_df(df1: &DataFrame, df2: &DataFrame) -> Result<DataFrame, PolarsError> {
    if df1.is_empty() {
        return Ok(df2.clone());
    }

    // Use every column name that appears in both frames as a join key.
    let df1_column_names = df1.get_column_names();
    let df2_column_names = df2.get_column_names();

    let common_column_names = &df1_column_names.intersect(df2_column_names)[..];

    df1.join(
        df2,
        common_column_names,
        common_column_names,
        JoinType::Outer,
        None,
    )
}
Robert
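
If the extra `array_tool` dependency is unwanted, the intersection step can probably be done with the standard library alone. The following is a minimal sketch under that assumption; `common_column_names` is a hypothetical helper name, and it works on plain `&str` slices such as the ones `get_column_names()` returns in the code above.

```rust
use std::collections::HashSet;

/// Returns the names that occur in both slices, in the order in which they
/// first appear in `left`. Duplicates in `left` are reported only once.
fn common_column_names<'a>(left: &[&'a str], right: &[&str]) -> Vec<&'a str> {
    // Hash the right-hand names once so each lookup is O(1) on average.
    let right_set: HashSet<&str> = right.iter().copied().collect();
    // Tracks names already emitted, to skip repeated entries in `left`.
    let mut seen: HashSet<&str> = HashSet::new();

    left.iter()
        .copied()
        .filter(|name| right_set.contains(*name) && seen.insert(*name))
        .collect()
}
```

For the two frames in the question this yields only "Phosphorus (mg/100g)", because the fruit column is called `Fruit` in one frame and `Name` in the other. So `Name` would presumably have to be renamed to `Fruit` beforehand if the fruit names should end up in a single column.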