When you use arrow::open_dataset()
you can manually define a schema which determines the column names and types. I've pasted an example below, which shows the default behaviour of auto-detecting column names types first, and then using a schema to override this and specify your own column names and types. The example here does this programmatically as requested but you can define a schema by hand too.
library(arrow)
write_dataset(mtcars, "mtcars")
# opens the dataset with column detection
dataset <- open_dataset("mtcars")
dataset
#> FileSystemDataset with 1 Parquet file
#> mpg: double
#> cyl: double
#> disp: double
#> hp: double
#> drat: double
#> wt: double
#> qsec: double
#> vs: double
#> am: double
#> gear: double
#> carb: double
#>
#> See $metadata for additional Schema metadata
# define new schema automatically
chosen_schema <- schema(
purrr::map(names(dataset), ~Field$create(name = .x, type = string()))
)
# now opens the dataset with the chosen schema
open_dataset("mtcars", schema = chosen_schema)
#> FileSystemDataset with 1 Parquet file
#> mpg: string
#> cyl: string
#> disp: string
#> hp: string
#> drat: string
#> wt: string
#> qsec: string
#> vs: string
#> am: string
#> gear: string
#> carb: string