0

I am reading a directory for 600,000 images and wanting to store these images in a Sqlite DB. The DB structure is simply ID, IMAGE (blob).

I'm not proficient in C++ so am figuring this out.

First I open the DB file and setup the prepare statement etc

int rc = sqlite3_open_v2(filename, &db, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE, NULL);

char* errorMessage;
sqlite3_exec(db, "PRAGMA synchronous=OFF", NULL, NULL, &errorMessage);
sqlite3_exec(db, "PRAGMA count_changes=OFF", NULL, NULL, &errorMessage);
sqlite3_exec(db, "PRAGMA journal_mode=MEMORY", NULL, NULL, &errorMessage);
sqlite3_exec(db, "PRAGMA temp_store=MEMORY", NULL, NULL, &errorMessage);
sqlite3_exec(db, "PRAGMA cache_size=1", NULL, NULL, &errorMessage);

sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, &errorMessage);

char const *szSQL = "INSERT INTO images (image) VALUES (?);";
int rc = sqlite3_prepare_v2(db, szSQL, -1, &stmt, NULL);
if( rc != SQLITE_OK ) {
  printf("PREPARE FAILED. EXITING\n");
  exit(0);
}

There is then a while loop iterating over all the files in the directory. To read the image into a char* (for the blob) I use the following code:

char text[10] = {'\0'};
struct stat s;
int status = stat(fullimagepath.c_str(), &s);
int fd = open(fullimagepath.c_str(), O_RDONLY);
if (fd == -1)
{
    perror("Error opening file for reading");
    exit(1);
}

char *fileContent = (char *)mmap(NULL, s.st_size, PROT_READ, MAP_SHARED, fd, 0);
close(fd);

The code below is run on each iteration as well where it binds the blob and attempts the insert

int retVal = sqlite3_bind_blob(stmt, 1, fileContent, s.st_size, NULL);
if (retVal != SQLITE_OK) {
  fprintf(stderr, "ERROR %s %s\n", fullimagepath.c_str(), sqlite3_errmsg(db));
  sqlite3_finalize(stmt);
  sqlite3_close(db);
  exit(1);
}

retVal = sqlite3_step(stmt);
if (retVal != SQLITE_DONE && retVal != SQLITE_ROW) {
      sqlite3_finalize(stmt);
      sqlite3_close(db);
      fprintf(stderr, "ERROR %d\n", retVal);
      exit(1);
}
fprintf(stderr, "1f2 %s\n", fullimagepath.c_str());


sqlite3_reset(stmt);
sqlite3_clear_bindings(stmt);

Where fullimagepath is the full path to the image each time (the above is in a while loop iterating through the files in the directory).

Once the full directory is done we run the commit and close the DB.

sqlite3_exec(db, "COMMIT", NULL, NULL, &errorMessage);
sqlite3_finalize(stmt);
sqlite3_close(db);

The issue I am having is that after running for a number of the files it stops with a segmentation fault.

As far as I can tell through various trials where I changed the blob to text and other tests, this only happens when trying to insert the blob.

Can anyone point me in the right direction to fix this please?

When running on a smaller directory with less than 10,000 images it works fine. Or if you need more information please advise.

Thanks.

Mudders
  • 127
  • 2
  • 12
  • 1
    Where do you check for fileContent != nullptr, where do you munmap, why do you use MAP_SHARED, do you use multiple processes or threads? – Surt Apr 13 '20 at 11:25
  • Not sure of the answers to those questions - as I said I am quite new to C++. I used this link for a recursive and fast way of reading the file list in a directory - https://github.com/ChristopherSchultz/fast-file-count/blob/master/dircnt.c If you would like the full code, please let me know. Thanks – Mudders Apr 13 '20 at 11:37
  • Why on Earth would you want to store 600,000 images in sqlite? You will just end up with an enormous, slow, difficult-to-backup file containing images you can't readily see or edit without writing code. Why not keep the filenames in the database and the files in the filesystem which is optimised for storing files. Just trying to understand. – Mark Setchell Apr 13 '20 at 11:44
  • We're in a situation with around 15TB of 50kb images growing at a rate of around 2million images per day. If we ever want to move the images to another server it takes months to copy that number of files. If they are all stored in a single Sqlite db file it would make managing them a lot simpler. – Mudders Apr 13 '20 at 11:46
  • If you just want a single file to move, take the partition containing the filesystem, or switch the iSCSI device to a new host in seconds, or unplug the disk and move it. – Mark Setchell Apr 13 '20 at 11:53
  • None of these are options for us at the moment for various reasons. We're attempting to come up with a solution within certain constraints. Thanks. – Mudders Apr 13 '20 at 11:56

2 Answers2

0

I would suspect you run our of memory so check for it.

char *fileContent = (char *)mmap(NULL, s.st_size, PROT_READ, MAP_SHARED | MAP_POPULATE , fd, 0);
if (!fileContent) {
    perror("Error mapping file");
    exit(1);
}
close(fd);
Surt
  • 15,501
  • 3
  • 23
  • 39
  • I added the above with the same results. If I comment out all the sqlite code, and run it, so it is still opening every file in the directory and reading it into fileContent, it works great and runs through all the files in a few seconds. If I change the sqlite binding to be text instead of a blob, so it inserts the text file name instead of the fileContents, it works in a few seconds as well. It seems the issue is only when trying to bind to a blob. – Mudders Apr 13 '20 at 11:51
  • @mudders In a few seconds? So the files are not actually read into the mmap? or can you read 1TB/s, try with MAP_POPULATE. – Surt Apr 13 '20 at 12:39
  • I run the code and attempt to print out the fileContents (it appears to be binary format) so I am assuming it is reading the files. 617k files (11GB) and it reads them in around 4 seconds. Is that unrealistic? I'll look into MAP_POPULATE. It also does read the correct file size for each file, as I checked that. Thanks – Mudders Apr 13 '20 at 12:42
  • @Mudders its theoretically possible, my disk is rated at something like 2200 MB/s so would take 5s to load it. – Surt Apr 13 '20 at 12:48
0

I finally figured out the issue after @Surt in the very first comment mentiond "where do you munmap". I didn't know what this meant, and after much searching in other places noticed a comment mentioning this. A little more playing and realised this is quite crucial. Placed the following

status = munmap(fileContent, s.st_size);
sqlite3_reset(stmt);

Thanks for those who commented for the pointers in the right direction. Much appreciated.

Mudders
  • 127
  • 2
  • 12