I need to convert a CSV file to Apache Arrow.
Here is the structure of my CSV file (it has many more rows than this excerpt):
Date,Value,High,Low,Entry
1209920400,1413.50,1413.50,1412.75,1413.00
1209920580,1413.25,1414.00,1413.25,1413.75
1209921240,1413.75,1414.00,1413.25,1413.50
1209921300,1413.25,1413.25,1413.00,1413.00
1209921600,1413.25,1413.25,1412.75,1412.75
1209921780,1413.00,1413.00,1413.00,1413.00
1209921900,1413.00,1413.00,1412.75,1412.75
1209921960,1412.50,1412.50,1412.50,1412.50
1209922800,1412.75,1412.75,1412.75,1412.75
1209923100,1412.75,1413.50,1412.75,1413.25
1209923400,1412.75,1412.75,1412.50,1412.50
1209926940,1413.75,1414.00,1413.50,1413.50
1209930420,1413.75,1414.25,1413.75,1414.00
So far I have written this code to infer the schema and create the Arrow file:
use arrow::{
    error::ArrowError,
    csv::ReaderBuilder,
    ipc::writer::FileWriter
};
use std::sync::Arc;
use std::fs::File;

fn main() -> Result<(), ArrowError> {
    let input = "my_data.csv";
    let output = "my_data.arrow";
    let delimiter: u8 = b',';
    let max_read_records: Option<usize> = Some(100);
    let has_header = true;

    let schema = arrow_csv::reader::infer_schema_from_files(&[input.to_string()], delimiter, max_read_records, has_header).unwrap();
    println!("{:?}", schema);

    let file = File::open(input).unwrap();
    let csv_reader = ReaderBuilder::new(Arc::new(schema)).build(file).unwrap();
    let mut writer = FileWriter::try_new(File::create(output)?, csv_reader.schema().as_ref())?;

    for batch in csv_reader {
        match batch {
            Ok(batch) => writer.write(&batch)?,
            Err(error) => return Err(error),
        }
    }
    let _ = writer.finish();
    Ok(())
}
The code compiles and, when run, produces two outputs.
1- Prints the schema to console:
Schema {
fields:[
Field { name: "Date", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} },
Field { name: "Value", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} },
Field { name: "High", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} },
Field { name: "Low", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} },
Field { name: "Entry", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }
],
metadata: {}
}
2- Prints an error to console:
Error: ParseError("Error while parsing value Date for column 0 at line 0")
At first glance the inferred schema looks correct to me, so I don't understand the error. Why can a correct schema be inferred when parsing then fails on the very first value?
Whatever I try, I cannot get rid of the error, and I don't understand what is going wrong. I tried reducing my CSV file to a smaller and/or simpler schema, and the issue remains the same.
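For what it's worth, the message mentions the value Date at line 0, which is the header row. The code above passes has_header to the schema inference but not to the reader, so the reader may be treating the header as a data row. Below is a minimal, std-only sketch of that failure mode. It does not use the arrow crate; parse_first_column and its skip_header flag are hypothetical names made up for illustration, and the error text merely imitates the one above.

```rust
/// Hypothetical helper: parses column 0 of every CSV line as i64.
/// `skip_header` controls whether the first line is treated as a header.
fn parse_first_column(csv: &str, skip_header: bool) -> Result<Vec<i64>, String> {
    let skip = if skip_header { 1 } else { 0 };
    csv.lines()
        .enumerate()
        .skip(skip)
        .map(|(line_no, line)| {
            // Take the text before the first comma as the cell for column 0.
            let cell = line.split(',').next().unwrap_or("");
            cell.parse::<i64>().map_err(|_| {
                format!("Error while parsing value {cell} for column 0 at line {line_no}")
            })
        })
        .collect()
}

fn main() {
    let csv = "Date,Value\n1209920400,1413.50\n1209920580,1413.25";

    // Header not skipped: "Date" is parsed as an integer and fails at line 0.
    println!("{:?}", parse_first_column(csv, false));

    // Header skipped: only the numeric rows are parsed.
    println!("{:?}", parse_first_column(csv, true));
}
```

If this is indeed the cause, the equivalent setting on arrow's ReaderBuilder would be its header option (has_header or with_header, depending on the crate version; this is an assumption to check against the documentation for the version in use).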