A beginner's question.

I'm using R with dplyr to analyse large amounts of data, but I don't have access to a server-based database. In addition, my computer's internal hard drive is too small for the databases that I need to create. So far I have been using MonetDBLite and RSQLite to store the data.

Q: How much slower do MonetDBLite/RSQLite become if I store the databases on an external hard drive connected to the computer via USB? What factors determine how feasible this is?

Or is there a better alternative approach (still relying on dplyr's database connectivity) in my situation?

Maarölli
  • That would depend on what drive you have, and what the connection is. In terms of consumer hardware, USB 3.1 is pretty much as fast as an internal drive these days. – Hong Ooi Aug 10 '17 at 18:08

2 Answers

It's really hard to tell in advance whether the external drive will be slower. For example, if the internal drive is an SSD and the external one a classical "spinning disk", a performance drop is more or less to be expected, especially with complex queries. I suggest you simply try a reasonably sized database and your own queries on both disks. There are also various disk benchmarking tools (e.g. XBench on OS X) that you could use to check performance. The interesting metrics to look for here are sequential scan speed and random access speed.
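
If you don't have a benchmarking tool at hand, the two metrics mentioned above can be roughly probed from R itself. The following is only a minimal sketch in base R: `probe_disk` and its arguments are names made up for this illustration, and the numbers will be optimistic because the operating system caches a freshly written file in memory (a test file larger than your RAM gives a more honest picture). Also note that R's documentation discourages relying on `seek()` on Windows.

```r
# Rough disk probe in base R: time a sequential scan and a set of random
# block reads on a scratch file created in `dir`.
# `probe_disk` and its arguments are illustrative names, not a package API.
probe_disk <- function(dir, size_mb = 64, block_kb = 4, n_random = 2000) {
  path <- tempfile(tmpdir = dir, fileext = ".bin")
  on.exit(unlink(path), add = TRUE)

  # Write a scratch file of `size_mb` MiB, one 1 MiB chunk at a time.
  chunk <- as.raw(sample(0:255, 1024^2, replace = TRUE))
  con <- file(path, "wb")
  for (i in seq_len(size_mb)) writeBin(chunk, con)
  close(con)

  # Sequential scan: read the file front to back in 1 MiB chunks.
  con <- file(path, "rb")
  seq_time <- system.time(
    repeat {
      if (length(readBin(con, "raw", n = 1024^2)) == 0) break
    }
  )[["elapsed"]]
  close(con)

  # Random access: jump to random block offsets and read one block each.
  block <- block_kb * 1024
  n_blocks <- size_mb * 1024^2 / block
  offsets <- sample.int(n_blocks, n_random, replace = TRUE) - 1
  con <- file(path, "rb")
  rand_time <- system.time(
    for (off in offsets) {
      seek(con, where = off * block, origin = "start")
      readBin(con, "raw", n = block)
    }
  )[["elapsed"]]
  close(con)

  # Elapsed wall-clock seconds for each access pattern.
  c(sequential_s = seq_time, random_s = rand_time)
}
```

Calling `probe_disk(tempdir())` and then `probe_disk("/path/to/external/drive")` (a placeholder path) and comparing the two results gives a first impression. A large gap in `random_s` matters most for queries that cannot read the data in one sequential pass.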

Hannes Mühleisen

I use MonetDBLite to load large datasets into RStudio. For security reasons, I keep the data on an external SSD connected over USB 3.0; my built-in drive is also an SSD. I've used this setup for a few months, and my experience is summarized by the following query:

SELECT * FROM drug_db WHERE atc='L02BX03' OR atc='L02BB04';

On the built-in drive: < 2 seconds

On the external drive: 6-7 minutes

The query scans through a ~15 GB database and returns ~30,000 rows of 14 variables. In my experience, it's actually much quicker to copy the file to the built-in drive and run the queries there than to run the queries against the external SSD.
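
The copy-then-query workflow could look roughly like the sketch below. It is an illustration under assumptions, not the setup used for the timings above: it uses RSQLite instead of MonetDBLite because an SQLite database is a single file, and `query_local_copy` is a hypothetical helper name. A MonetDBLite database lives in a directory, so the copy step would presumably need `file.copy(..., recursive = TRUE)` instead; that variant is not shown or tested here.

```r
library(DBI)

# Copy an SQLite file from a slow location to a local directory, run one
# query against the local copy, and report how long each step took.
# `query_local_copy` is a hypothetical helper written for this example.
query_local_copy <- function(external_path, sql, local_dir = tempdir()) {
  local_path <- file.path(local_dir, basename(external_path))

  # Step 1: copy the database file to the local drive.
  copy_time <- system.time(
    copied <- file.copy(external_path, local_path, overwrite = TRUE)
  )[["elapsed"]]
  stopifnot(copied)
  on.exit(unlink(local_path), add = TRUE)

  # Step 2: connect to the local copy; disconnect before the file is removed.
  con <- dbConnect(RSQLite::SQLite(), local_path)
  on.exit(dbDisconnect(con), add = TRUE, after = FALSE)

  # Step 3: run the query and time it.
  query_time <- system.time(
    result <- dbGetQuery(con, sql)
  )[["elapsed"]]

  list(result = result, copy_s = copy_time, query_s = query_time)
}
```

If `copy_s + query_s` is well below the time the same query takes directly on the external drive, copying first pays off, and even more so when several queries reuse the same local copy. This only works, of course, if the internal drive has room for the database being copied.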