0

I have converted a dataframe intro sqlite database using rsqlite package of R programming. First column of table sh3 is of type TEXT with 23 strings separated by separater ' '. I want to convert this column into 23 columns and drop it afterwards. Till now I have added 23 columns with data type TEXT and after that I have tried many things but failed.

library(DBI)
library(RSQLite)

# connecting to database 
con <- dbConnect(SQLite(), dbname = 'ShearTest3.sqlite')
# Type of data in fields
dbGetQuery(con,"PRAGMA table_xinfo(sh3)") 
# Adding 23 columns into table sh3 with TEXT format by using following code
dbExecute(con,"ALTER TABLE sh3 ADD COLUMN X1 TEXT")
dbExecute(con,"ALTER TABLE sh3 ADD COLUMN X2 TEXT")
dbExecute(con,"ALTER TABLE sh3 ADD COLUMN X3 TEXT")
dbExecute(con,"ALTER TABLE sh3 ADD COLUMN X4 TEXT")
...

# Things I have tried
# Rockable 29-11-2018 is the column name
dbGetQuery(con, "UPDATE sh3 
SET   X1=split_part(`Rockable 29-11-2018`, ' ',1),
      X2=split_part(`Rockable 29-11-2018`, ' ',2),
      X3=split_part(`Rockable 29-11-2018`, ' ',3),
      X4=split_part(`Rockable 29-11-2018`, ' ',4),
...


Current format


 dbGetQuery(con,"SELECT * FROM sh3 LIMIT 10 OFFSET 10000000")

+---------------------------------------------------------------------------------------------+---------+-------+-------+--------+-------+--------+-------+--------+------+--------+-------+--------+------+-------+-------+--------+-------+-------+--------+-------+-------+--------+-------+------+
|                                     Rockable29-11-2018                                      |   V1    |  X1   |  X2   |   X3   |  X4   |  X5    |  X6   |  X7    | X8   |   X9   |  X10  |  X11   | X12  |  X13  |  X14  |  X15   |  X16  |  X17  |  X18   |  X19  |  X20  |  X21   |  X22  |  X23 |
+---------------------------------------------------------------------------------------------+---------+-------+-------+--------+-------+--------+-------+--------+------+--------+-------+--------+------+-------+-------+--------+-------+-------+--------+-------+-------+--------+-------+------+
| a b c w d z a john f kennedy barack obama george bush washington d c n y police is in there | conf261 | <NA>  | <NA>  | <NA>   |  <NA> |  <NA>  | <NA>  | <NA>   | <NA> |  <NA>  | <NA>  | <NA>   | <NA> |  <NA> |  <NA> |  <NA>  | <NA>  | <NA>  |  <NA>  | <NA>  | <NA>  |  <NA>  | <NA>  | <NA> |
+---------------------------------------------------------------------------------------------+---------+-------+-------+--------+-------+--------+-------+--------+------+--------+-------+--------+------+-------+-------+--------+-------+-------+--------+-------+-------+--------+-------+------+

and Format I want


+---------+-----+------+--------+-------+-------+-------+--------+-------+------+----------+---------+--------+---------+-------+--------------+-------+-------+-------+-------+--------+-------+-------+--------+
|   V1    | X1  |  X2  |   X3   |  X4   |  X5   |  X6   |  X7    | X8    |  X9  |   X10    |  X11    |  X12   |   X13   |  X14  |     X15      |  X16  |  X17  |  X18  |  X19  |  X20   |  X21  |  X22  |   X23  |
+---------+-----+------+--------+-------+-------+-------+--------+-------+------+----------+---------+--------+---------+-------+--------------+-------+-------+-------+-------+--------+-------+-------+--------+
| conf261 | a   |  b   |  c     |  w    |  d    |  z    |  a     |  john |  f   |  kennedy |  barack |  obama |  george |  bush |  washington  | d     | c     | n     | y     | police |  is   |  in   |  there |
+---------+-----+------+--------+-------+-------+-------+--------+-------+------+----------+---------+--------+---------+-------+--------------+-------+-------+-------+-------+--------+-------+-------+--------+

Akina
  • 39,301
  • 5
  • 14
  • 25
shah nawaz
  • 15
  • 4
  • https://tidyr.tidyverse.org/reference/separate.html – Andrea M Jun 20 '22 at 08:18
  • 1
    database is too large of 12GB size so if I can get a sql solution it'd be better – shah nawaz Jun 20 '22 at 08:27
  • Is it possible to fix this on the initial import of the database? It seems as if you tried to import a space-delimited or fixed-width file with incorrect field-delimiter settings. – r2evans Jun 21 '22 at 10:58
  • I have tried it by using split_string_fixed on initial import but due to size of the file I can't do it. It is taking hours. – shah nawaz Jun 21 '22 at 13:25

1 Answers1

0

Fix the problem where you import the data, this is much better than trying to repair it later (even with larger data). SQL doesn't really have awesome string-splitting functions (like R, python, and most other languages), not like what you need here.

I think this can be imported more easily.

I'm going to start with the assumption that we have a space-delimited file with no column names. I'm creating a file named quux.csv with contents

a b c w d z a john f kennedy barack obama george bush washington d c n y police is in there
a b c w d z a john f kennedy barack obama george bush washington d c n y police is in there
a b c w d z a john f kennedy barack obama george bush washington d c n y police is in there
a b c w d z a john f kennedy barack obama george bush washington d c n y police is in there

Using sqlite3's .import command, there needs to be a row of column names in general, this is required to be able to determine column names. As in this example, when there are no column-names, we need to pre-define the table. Simple enough: since we know there are 23 columns, I'll use R with paste(paste0("v", 1:23, " text"), collapse=', ') to help draft:

sqlite> create table quux (v1 text, v2 text, v3 text, v4 text, v5 text, v6 text,
  v7 text, v8 text, v9 text, v10 text, v11 text, v12 text, v13 text, v14 text,
  v15 text, v16 text, v17 text, v18 text, v19 text, v20 text, v21 text,
  v22 text, v23 text);

(It's a bit of a sloppy table, certainly, but since SQLite is generally type-agnostic and flexible, this seemed a reasonable start. Feel free to adapt if you have better knowledge about each field, though SQLite is still flexible-enough that I don't know that you'll do much better in that architecture.)

From here, we just import the file. (Some of these .-modes are mostly for habit based on other work I do;

sqlite> .separator " "
sqlite> .import path/to/quux.data quux
sqlite> .mode table           -- purely for aesthetics here, not required in general
sqlite> select * from quux ;
+----+----+----+----+----+----+----+------+----+---------+--------+-------+--------+------+------------+-----+-----+-----+-----+--------+-----+-----+-------+
| v1 | v2 | v3 | v4 | v5 | v6 | v7 |  v8  | v9 |   v10   |  v11   |  v12  |  v13   | v14  |    v15     | v16 | v17 | v18 | v19 |  v20   | v21 | v22 |  v23  |
+----+----+----+----+----+----+----+------+----+---------+--------+-------+--------+------+------------+-----+-----+-----+-----+--------+-----+-----+-------+
| a  | b  | c  | w  | d  | z  | a  | john | f  | kennedy | barack | obama | george | bush | washington | d   | c   | n   | y   | police | is  | in  | there |
| a  | b  | c  | w  | d  | z  | a  | john | f  | kennedy | barack | obama | george | bush | washington | d   | c   | n   | y   | police | is  | in  | there |
| a  | b  | c  | w  | d  | z  | a  | john | f  | kennedy | barack | obama | george | bush | washington | d   | c   | n   | y   | police | is  | in  | there |
| a  | b  | c  | w  | d  | z  | a  | john | f  | kennedy | barack | obama | george | bush | washington | d   | c   | n   | y   | police | is  | in  | there |
+----+----+----+----+----+----+----+------+----+---------+--------+-------+--------+------+------------+-----+-----+-----+-----+--------+-----+-----+-------+
r2evans
  • 141,215
  • 6
  • 77
  • 149