So I run web scrapers from time to time. Sometimes a database is unavoidable, and I prefer SQLite because of its ease of use. However, I fear my latest project is putting strain on my SSD; here's the setup.
I have several separate processes; each one queries the database for a URL to scrape, scrapes it, then writes the result and commits to disk. I want to avoid unnecessary scraping if possible, so I commit after every single result so the other processes know not to scrape that URL again.
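Roughly what each worker does (a simplified sketch; the table and column names here are placeholders and the real code is messier, but the shape is the same):

```python
import sqlite3

def scrape(url):
    """Stand-in for the actual scraping; the details don't matter here."""
    return "<html>...</html>"

# One worker process. "urls", "result" and "scraped" are made-up names;
# the CREATE TABLE is only here so the sketch runs on its own.
conn = sqlite3.connect("scraper.db", timeout=30)
conn.execute(
    "CREATE TABLE IF NOT EXISTS urls "
    "(id INTEGER PRIMARY KEY, url TEXT, result TEXT, scraped INTEGER DEFAULT 0)"
)

while True:
    row = conn.execute(
        "SELECT id, url FROM urls WHERE scraped = 0 LIMIT 1"
    ).fetchone()
    if row is None:
        break  # nothing left to scrape
    url_id, url = row

    result = scrape(url)

    conn.execute(
        "UPDATE urls SET result = ?, scraped = 1 WHERE id = ?",
        (result, url_id),
    )
    # Commit after every single result so the other processes
    # can see this URL is already done.
    conn.commit()

conn.close()
```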
What worries me is that each commit is an actual write to disk. With something like a million queries, my SSD will age prematurely :/ So basically, two questions:
1. Does 1 commit really mean 1 physical disk write? Or is there buffering somewhere, within SQLite, in the Linux page cache, or in the SSD controller itself, that keeps a copy of the database in RAM (or controller memory) and only syncs to the drive occasionally? The database itself is less than 20 MB.
2. If it is as I fear, are there any tweaks I can make to sqlite3, or is a client-server database unavoidable? My understanding is that a database server has exclusive access to the data, so it can keep the working version in RAM and only sync to disk every so often.
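To be concrete, by "tweaks" I mean something like the two pragmas below. I've seen them mentioned, but I'm not sure whether they actually reduce the number of physical writes or just trade durability for speed:

```python
import sqlite3

conn = sqlite3.connect("scraper.db", timeout=30)

# WAL mode appends commits to a write-ahead log instead of rewriting
# pages of the main database file on every commit.
conn.execute("PRAGMA journal_mode=WAL")

# In WAL mode, synchronous=NORMAL only fsyncs around checkpoints rather
# than on every commit. Is this the kind of thing that would help, or
# does it just mean less waiting rather than fewer writes?
conn.execute("PRAGMA synchronous=NORMAL")

conn.close()
```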
I'm sorry the question is more of a discussion :/