
I can read a CSV file that has no header row with the following code, using polars in Rust:

use polars::prelude::*;

fn read_wine_data() -> Result<DataFrame> {
    let file = "datastore/wine.data";
    CsvReader::from_path(file)?
        .has_header(false)
        .finish()
}


fn main() {
    let df = read_wine_data();
    match df {
        Ok(content) => println!("{:?}", content.head(Some(10))),
        Err(error) => panic!("Problem reading file: {:?}", error)
    }
}

Now I want to add column names to the DataFrame, either while reading or after reading. How can I do that? Here is a vector of the column names:

let COLUMN_NAMES = vec![
    "Class label", "Alcohol",
    "Malic acid", "Ash",
    "Alcalinity of ash", "Magnesium",
    "Total phenols", "Flavanoids",
    "Nonflavanoid phenols",
    "Proanthocyanins",
    "Color intensity", "Hue",
    "OD280/OD315 of diluted wines",
    "Proline"
];

How can I add these names to the DataFrame? The data can be downloaded with the following command:

wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data
DataPsycho
  • Not super familiar with polars, but with python DataFrame libraries you can pass in headers when reading CSV files. Just taking a quick look at the polars docs, it seems that you can use the with_schema or with_dtypes method on the CsvReader, where a Schema would be your column definitions. https://pola-rs.github.io/polars/polars/prelude/struct.CsvReader.html#method.with_dtypes – emagers Jun 11 '22 at 18:11
  • Yes, I have seen that API doc too, but I have not found any explanation of what the inputs to with_schema or with_dtypes should look like. They take a Schema type, of course, but I could not find a concrete example of constructing a Schema. The Schema::new() method does not take any input. – DataPsycho Jun 11 '22 at 21:19
  • It looks like Schema has another method, with_column, that takes in a column name and a datatype. So, my assumption would be Schema::new().with_column("columnname", DataType::Int32): https://pola-rs.github.io/polars/polars/chunked_array/object/struct.Schema.html#method.with_column – emagers Jun 12 '22 at 15:41

2 Answers


This seemed to work, by creating a schema object and passing it in with the with_schema method on the CsvReader:

use polars::prelude::*;
use polars::datatypes::DataType;

fn read_wine_data() -> Result<DataFrame> {
    let file = "datastore/wine.data";

    let mut schema: Schema = Schema::new();
    schema.with_column("wine".to_string(), DataType::Float32);

    CsvReader::from_path(file)?
        .has_header(false)
        .with_schema(&schema)
        .finish()
}


fn main() {
    let df = read_wine_data();
    match df {
        Ok(content) => println!("{:?}", content.head(Some(10))),
        Err(error) => panic!("Problem reading file: {:?}", error)
    }
}

Granted I don't know what the column names should be, but this is the output I got when adding the one column:

shape: (10, 1)
┌──────┐
│ wine │
│ ---  │
│ f32  │
╞══════╡
│ 1.0  │
├╌╌╌╌╌╌┤
│ 1.0  │
├╌╌╌╌╌╌┤
│ 1.0  │
├╌╌╌╌╌╌┤
│ 1.0  │
├╌╌╌╌╌╌┤
│ ...  │
├╌╌╌╌╌╌┤
│ 1.0  │
├╌╌╌╌╌╌┤
│ 1.0  │
├╌╌╌╌╌╌┤
│ 1.0  │
├╌╌╌╌╌╌┤
│ 1.0  │
└──────┘
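
The output above shows a single column, which appears to match the single entry in the schema, so naming every column presumably needs one schema entry per column. Before writing the full schema, it may help to check how many columns a row of the file holds. Below is a minimal std-only sketch of that check. The helper names `count_fields` and `check_names` are hypothetical (they are not part of polars), and the sketch assumes the file has no quoted fields containing commas.

```rust
/// Counts the comma-separated fields in one line of a headerless CSV file.
/// Assumption: no quoted fields that contain commas (e.g. purely numeric data).
fn count_fields(line: &str) -> usize {
    line.trim_end().split(',').count()
}

/// Compares the number of fields in the first line with the number of names.
/// Returns an error message when the two counts differ.
fn check_names(first_line: &str, names: &[&str]) -> Result<(), String> {
    let fields = count_fields(first_line);
    if fields == names.len() {
        Ok(())
    } else {
        Err(format!(
            "file has {} columns but {} names were given",
            fields,
            names.len()
        ))
    }
}
```

Calling `check_names` with the first line of the file and the name list from the question would report a mismatch before the schema is built.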
emagers

Here is the full solution that worked for me:

use polars::datatypes::DataType::{Float64, Int64};
use polars::prelude::*;
use std::path::PathBuf;

fn read_csv_into_df(path: PathBuf) -> Result<DataFrame> {
    // One Field per column, in file order (the wine data has 14 columns).
    let schema = Schema::from(vec![
        Field::new("class_label", Int64),
        Field::new("alcohol", Float64),
        Field::new("malic_acid", Float64),
        Field::new("ash", Float64),
        Field::new("alcalinity_of_ash", Float64),
        Field::new("magnesium", Float64),
        Field::new("total_phenols", Float64),
        Field::new("flavanoids", Float64),
        Field::new("nonflavanoid_phenols", Float64),
        Field::new("proanthocyanins", Float64),
        Field::new("color_intensity", Float64),
        Field::new("hue", Float64),
        Field::new("od280/od315_of_diluted_wines", Float64),
        Field::new("proline", Float64),
    ]);
    // The file has no header row, so the names come from the schema.
    CsvReader::from_path(path)?.has_header(false).with_schema(&schema).finish()
}

I had to use Field with a data type for each column to create a Schema, then pass that schema to CsvReader to read the data.
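
The field names in the schema above are the lowercase, underscore-separated forms of the display names listed in the question. Rather than typing them by hand, they could be derived from that list. Here is a minimal std-only sketch. The function names `to_snake_case` and `schema_names` are hypothetical and not part of polars, and the conversion only lowercases and replaces spaces, leaving other characters such as `/` untouched.

```rust
/// Lowercases a display name and replaces each space with an underscore.
/// Other characters (for example '/') are left as they are.
fn to_snake_case(name: &str) -> String {
    name.trim().to_lowercase().replace(' ', "_")
}

/// Applies `to_snake_case` to every display name, preserving the order,
/// so the result lines up with the columns of the file.
fn schema_names(display_names: &[&str]) -> Vec<String> {
    display_names.iter().map(|n| to_snake_case(n)).collect()
}
```

Each resulting string could then be paired with a data type when building the fields of the schema.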

DataPsycho