4

I have a Node.js application that heavily utilizes an SQLite database and I am worried about the impact it may have on my SSD.

My application periodically updates information about all clients that are connected to a certain server. I would rather not go into any specifics regarding the clients and/or server here.

Every 5 seconds, it performs an INSERT OR REPLACE for each connected client. New clients are inserted and existing clients have their data replaced with more up-to-date data. A single row contains roughly 10kB worth of data and there can be up to 5,000 clients connected (= 5,000 rows inserted / replaced) at a time.

In Task Manager, I can see ~3MB/s disk usage in idle state and ~6MB/s when the update happens.

Should I be concerned about wearing down my SSD? If so, how can I minimize the wear?

Options like renting a server or building a redundant RAID array are not really available to me, so I am looking for a software solution. I have already tried setting the journal mode to WAL and the temp store to memory, but the temp store always reverts to default and the WAL journal mode does not seem to have much of an effect in this regard.

I already use transactions, but I am afraid that may not be enough to solve the problem. Another obvious solution would be to decrease the update frequency; however, this is not really an option for me, as I need the data in the database to be as up-to-date as possible.

I know there is the option for an in-memory database using :memory:, but I have to store the client data across sessions. I need to load the existing client data from disk when the application initializes and I also need to periodically save the current state of the database to disk to avoid data loss in case of a power outage, an unexpected Windows update or something like that. Given how unlikely these events are to occur, it would be okay to do these backups to disk once per hour for example.

Is there any way to keep the database in memory for the most part and only synchronize it with a version on a physical disk during initialization and then every hour or so?

natiiix
  • I'd imagine if you're using a persistent database, you want it flushed to disk as fast as possible, yeah? You can run SQLite in-memory, but doesn't that defeat the whole purpose of what you want to do? Really, it sounds as though you should use something more than SQLite that can coordinate disk writes among all your connections. – Brad Apr 16 '18 at 23:14
  • @Brad - It's a persistent database, but there are no critical data (passwords, etc.). It's mainly used to track when a client was last online, some information about their connection and such. I use this data to generate various statistics, but none of it is vital to the server's function. If an hour of data is lost, it's not really a major problem. Also, as I mentioned, the database is accessed exclusively from the Node.js application, so the on-disk database should only really be used as a backup and for persistence. – natiiix Apr 16 '18 at 23:35
  • Some SSDs have diagnostic software available that will tell you how much of their wear lifetime is remaining. I think your best first thing to do is to see where you stand after however much you've run so far. If you're at 98% remaining and you've already run awhile, then you probably don't have a problem. If you're at 50%, then you may indeed have an issue. If you're somewhere in between, then you probably need to do some calculations to get an idea how long it will last. – jfriend00 Apr 17 '18 at 02:05
  • I'd be surprised if you really need to write to the database every 5 seconds for every connected client, but you don't go into enough specifics for us to know exactly what to suggest. For example, you could maintain data in a cache and actually write it out to the database every 5 minutes instead of every 5 seconds. That would be a 60x reduction in those types of writes. – jfriend00 Apr 17 '18 at 02:09
  • "Disk usage" is mostly reads. How much data is written per minute? – CL. Apr 17 '18 at 08:05
  • @jfriend00 - Any particular freeware suggestions regarding the SSD diagnostic software? S.M.A.R.T. in SpeedFan gives me 97% fitness and 100% performance, but I have no idea if that information is worth anything. The reason I write to the database so frequently is because I want to be able to access the most up-to-date data at any given time. Without it I wouldn't be able to generate real-time statistics as easily as I can now (using just `SELECT` statements). – natiiix Apr 17 '18 at 15:29
  • @CL. - In the case of my application the written bytes seem to be ~160% of read bytes. The write speed is ~2.3MB/s on average. I base this information on "I/O read bytes" and "I/O write bytes" columns in Task Manager. There are no `SELECT`s from the database in the current state. The only thing happening is the periodic `INSERT OR REPLACE`. – natiiix Apr 17 '18 at 15:40
  • I don't have a particular software recommendation. You should start with what the manufacturer of the SSD offers because that might be able to better interpret the SMART data from their own drive. I use Samsung SSDs and use Samsung software to look at the data from them. – jfriend00 Apr 17 '18 at 21:27
  • @jfriend00 - The concerned SSD is Samsung SSD 850 EVO 500GB. The only Samsung tool I was able to find is Samsung Magician, but all I got out of it was that the drive condition is "Good", which could be interpreted in a number of different ways. – natiiix Apr 17 '18 at 21:52
  • I don't know what software would give you a good approximation of remaining life. [This article](https://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead/2) reports that the Samsung 840 Pro 256GB got to 600TB written before some reallocated sectors started showing up and functioned all the way up to 2.5PB. You can see the total data written to the drive using CrystalDiskInfo. It is odd that Samsung Magician doesn't show useful data. – jfriend00 Apr 17 '18 at 22:32
  • @jfriend00 - Samsung Magician tells you how much has been written to the disk as well, but I was not sure how useful that information really was to me. I am afraid that the database file is being rewritten without ever being moved, which would mean that the sectors allocated to this file would be used significantly more than any others and they could malfunction even though the overall "bytes written" value would still remain relatively low. – natiiix Apr 17 '18 at 22:52
  • SSDs all have very carefully written firmware for "wear leveling". When you write to a sector in the file, the entire sector is rewritten to a new part of the SSD. The whole concept of "wear leveling" is to even out the usage so no single part of the SSD wears out way sooner than the rest. Because of the way this is done, total bytes written is a decent proxy for overall wear (when taking into account the size of the SSD). That's why the data written numbers in my previous comment should be directly relevant to your situation. – jfriend00 Apr 17 '18 at 23:00
  • @jfriend00 - I always thought this feature only applied to newly written data, not to data that are just being rewritten. If the mechanism is as good as you say, then it should be relatively safe to write to an SSD the way my application does. – natiiix Apr 17 '18 at 23:20
  • Rewritten data is definitely moved to promote wear leveling. The precise algorithm and how good it is at keeping the wear pretty level across all the sectors will vary some by manufacturer and probably by exactly how you use the SSD, but they've worked very hard at this. I have an OS system SSD that is 6 years old, and I see that Windows is always busy writing something to it, so I got worried about it and found that I still have 97% of its life left. It has a specific SMART value that reports remaining life; my Samsung SSD does not seem to have that value. – jfriend00 Apr 17 '18 at 23:25
  • @jfriend00 - That is great then. I cannot see any value indicating the wear in the SMART of my SSD either, but SpeedFan somehow calculates the disk's "fitness", so I will assume it does not just show me a random number and that it is actually backed by something. Thanks for your input, I had no idea this wear-protection was so good, so I was getting worried. – natiiix Apr 18 '18 at 00:00

1 Answer

3

Changed data is flushed to disk only when a transaction commits, so the most important thing is to put as many statements as possible into a single transaction. (You say you are already doing this.)
If you want to keep the changed data in memory and write to the disk later, just commit later.
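
As a rough sketch of that pattern (assuming the `better-sqlite3` driver and a hypothetical `clients` table plus a `getConnectedClients()` helper, none of which come from the question), one transaction per 5-second cycle could look like this:

```js
// Sketch only: 'better-sqlite3', the clients table and getConnectedClients()
// are assumptions for illustration, not taken from the question.
const Database = require('better-sqlite3');
const db = new Database('clients.db');

const upsert = db.prepare(
  'INSERT OR REPLACE INTO clients (id, data) VALUES (?, ?)'
);

// Wrap the whole 5-second batch in one transaction so SQLite commits
// (and therefore flushes) once per cycle instead of once per row.
const updateAll = db.transaction((clients) => {
  for (const c of clients) upsert.run(c.id, c.data);
});

setInterval(() => {
  updateAll(getConnectedClients()); // hypothetical: returns [{ id, data }, ...]
}, 5000);
```

To push this further, you could open a transaction with an explicit `BEGIN` and only `COMMIT` on a longer timer, accepting that the on-disk copy lags behind by that interval.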

When using WAL mode, changed pages are simply appended to the -wal file (instead of being written at random positions in the main DB file). This is easier on the SSD (although you may not notice much of a speed difference), so you should enable WAL mode.

To prevent the -wal file from becoming too big, SQLite sometimes moves the latest version of each page back into the main DB file (this is called a checkpoint). This happens automatically when a certain number of pages has been reached; to reduce these writes, increase the auto-checkpoint interval to as large a value as possible (depending on the free space you have), or disable it altogether and do checkpoints manually.
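
A minimal sketch of both variants (same assumed `better-sqlite3` handle as above; the pragmas themselves are standard SQLite):

```js
db.pragma('journal_mode = WAL');          // changed pages are appended to the -wal file

// Option A: raise the auto-checkpoint threshold (the default is 1000 pages,
// i.e. ~4 MB with the default 4 KB page size).
db.pragma('wal_autocheckpoint = 100000'); // ~400 MB of WAL before a checkpoint

// Option B: disable auto-checkpointing and checkpoint on your own schedule.
// db.pragma('wal_autocheckpoint = 0');
// setInterval(() => {
//   db.pragma('wal_checkpoint(TRUNCATE)'); // fold the WAL back into the main DB file
// }, 60 * 60 * 1000);                      // e.g. once per hour
```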

The temp_store setting has no effect unless the DB actually needs to write to a temporary file.
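
(Note that temp_store is a per-connection setting and is not stored in the database file, which is presumably why it appears to "revert to default": it simply has to be set again on every new connection, e.g. `db.pragma('temp_store = MEMORY')` with the driver assumed above.)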

If each step ends up modifying most or all rows in the table, you might be able to reduce overhead by using a larger page_size; try 64 KB. (But if you are changing only a fraction of the rows, a larger page size will just increase the database's write amplification.)
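
A sketch of switching to 64 KB pages (same assumed handle): the new page_size only takes effect on an empty database or after a VACUUM, and it cannot be changed while the database is in WAL mode, so something like this, run once at startup, would be needed:

```js
db.pragma('journal_mode = DELETE'); // page_size cannot be changed in WAL mode
db.pragma('page_size = 65536');     // takes effect on the next VACUUM
db.exec('VACUUM');                  // rebuild the DB file with 64 KB pages
db.pragma('journal_mode = WAL');    // then switch (back) to WAL
```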


The 500 GB 850 EVO guarantees an endurance of 150 TBW (terabytes written). At an average 2.3 MB/s, this corresponds to over two years.
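
Rough arithmetic behind that estimate: 2.3 MB/s is about 2.3 × 86,400 ≈ 199 GB written per day, and 150 TB ÷ 0.2 TB per day ≈ 750 days, i.e. a little over two years.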

CL.
  • So, WAL is my best bet here? Would having a ~100GB WAL file improve the situation? Also, what happens to the WAL file on a checkpoint? When I tried using the WAL mode, the file seemed to always stay there until I closed the database connection, with its size only ever growing, never shrinking. Is that the expected behavior? --- Is there no way to load a database from file into memory and only synchronize it to the file when manually requested? (by "manually" I mean from `setInterval()` or something like that) – natiiix Apr 17 '18 at 16:04
  • Yeah, I know it does, but I thought the WAL file would shrink when a checkpoint is done. Also, committing later might actually solve the problem to a certain degree. On the other hand, it would make reading from the database in real time impossible, so it's not exactly a perfect solution. Two years is a very short time, but at least I know I indeed must do something about it. – natiiix Apr 18 '18 at 14:30