I'd like to import data from a large PostgreSQL table. To save memory, I'd like to automatically convert textual values to factors on import.
For instance, the dataset has many string variables (e.g., "Male", "Female"). If these could be imported as factors, I'd be able to load the data set using a command like:
df <- dbGetQuery(con, "select id, gender from large.table")
Instead of receiving rows like (#, "Male"), I want rows like (#, 0), i.e. an integer code in place of each string, so that I can save memory.
If you run the query below on a database of your choice, with "gender" standing in for any character column, you should see that df_large is noticeably larger than df_small:
df <- dbGetQuery(con, "select id, gender from large.table")
df_large <- df$gender
print(object.size(df_large), units="Kb")
df_small <- factor(df$gender)
print(object.size(df_small), units="Kb")
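For anyone without a database handy, here is a minimal sketch that should reproduce the same comparison. The synthetic data frame is made up and only stands in for the query result. The last few lines show the manual after-the-fetch conversion, which is what I'm hoping to have done automatically during import.

```r
# Synthetic stand-in for the query result (assumption: two-level text column).
set.seed(1)
n <- 1e6
df <- data.frame(
  id = seq_len(n),
  gender = sample(c("Male", "Female"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Same comparison as above: character vector vs. factor.
df_large <- df$gender
df_small <- factor(df$gender)
print(object.size(df_large), units = "Kb")
print(object.size(df_small), units = "Kb")

# Manual workaround: convert every character column to a factor after the fetch.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], factor)
str(df)
```

This only shrinks the data after it has already been loaded as character, so it doesn't help with memory use during the import itself.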