I have data I need to clean for each row in a file, and I want to insert the cleaned data in a SQLite3 database. I'm using the RSQLite library which requires a dataframe. Here's the code I trying to get working:
# Select feature names for use as column names in X train/test loading
feature_names <- unlist(dbGetQuery(con, "select feature_name from features order by feature_id"), use.names = FALSE);
# Load X training data
X_train_lines <- readLines("data/train/X_train.txt"); # Space delimited with leading and trailing spaces
X_train_values <- vector("list", length(X_train_lines));
names(X_train_values) <- feature_names; # colnames or names?
for (index in 1:length(X_train_lines)) {
cleaned_line <- gsub("^ *|(?<= ) | *$", "", X_train_lines[index], perl=TRUE); # remove extraneous whitespaces
X_train_values[index] <- strsplit(cleaned_line, " "); # Wondering if X_train_values[index] is correct?
}
# Write features data to features table
dbWriteTable(con, "X_train", as.data.frame(X_train_values), row.names = FALSE);
While the code executes without incident, when I attempt to look at the database using DbVisualizer I get an error:
An error occurred while performing the operation: malformed database schema (X_train) - too many columns on X_train
My only guess is that rows and columns are somehow transposed. My column names should be the value of the feature_names
vector.
Also, if anyone has any suggestions for a better approach...
UPDATE
I tried doing a dput, though I had no idea what I was looking at. Here's the top of the summary:
head(summary(X_train_values))
Length Class Mode
tBodyAcc-mean()-X "561" "-none-" "character"
tBodyAcc-mean()-Y "561" "-none-" "character"
tBodyAcc-mean()-Z "561" "-none-" "character"
tBodyAcc-std()-X "561" "-none-" "character"
tBodyAcc-std()-Y "561" "-none-" "character"
tBodyAcc-std()-Z "561" "-none-" "character"
Again, this leads me to believe the data is all mixed up. It should have 561 columns, some of which are represented above as tBodyAcc-mean()-X etc. The column values should be floating-point numbers which I don't see above.
The table command did not work:
table(X_train_values)
Error in table(X_train_values) :
attempt to make a table with >= 2^31 elements
I should have 7,352 rows with 561 columns.
UPDATE 2
I believe my problem is that I'm trying to use a list like an array or arrays. In Ruby, for example, I could do something like this:
x_train_values = []
x_train_lines.each { |line| x_train_values << line.split(' ') }