I have a rather small dataset (~5 GB, 1.5 million rows), currently stored in Bigtable and accessed through the HBase API (Scala), for the purpose of doing data analytics with Spark (Dataproc).
However, I'm also on a tight budget, and the Bigtable cost is rather high (~2 USD/hour), so what I ended up doing is deleting and recreating the Bigtable cluster whenever I need it.
The obvious drawback is that it takes quite a while to populate a fresh cluster, due to the nature of my data: it's all stored in a single big text file as JSON lines, and it takes ~40 minutes to load it all in.
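For reference, my load job is essentially the pattern sketched below. The bucket, project, instance, table, and column family names are placeholders, and deriving the row key from the line's hash is just a stand-in for my real key logic:

```scala
import com.google.cloud.bigtable.hbase.BigtableConfiguration
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession

object LoadToBigtable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("load-to-bigtable").getOrCreate()

    // Read the single big JSON text file from GCS, one JSON object per line.
    val lines = spark.sparkContext.textFile("gs://my-bucket/data.json")

    lines.foreachPartition { partition =>
      // One HBase-API connection per partition, via the Bigtable HBase adapter.
      val connection = BigtableConfiguration.connect("my-project", "my-instance")
      val table = connection.getTable(TableName.valueOf("my-table"))
      partition.foreach { line =>
        // Placeholder row key: hash of the line; the real key comes from the JSON.
        val put = new Put(Bytes.toBytes(line.hashCode.toString))
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("json"), Bytes.toBytes(line))
        table.put(put)
      }
      table.close()
      connection.close()
    }
    spark.stop()
  }
}
```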
So what I'm asking is whether there is a better way to handle this, like implementing some kind of backup/snapshot routine, or simply not using Bigtable at all. I couldn't find any other HDFS alternatives on the Google Cloud platform.
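By "not using Bigtable at all" I mean something like the sketch below: since Dataproc clusters ship with the GCS connector, Spark can read the JSON file straight from a gs:// path (the bucket path and view name here are placeholders):

```scala
import org.apache.spark.sql.SparkSession

object DirectFromGcs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("analytics-from-gcs").getOrCreate()

    // spark.read.json expects one JSON object per line by default,
    // which is how my file is laid out.
    val df = spark.read.json("gs://my-bucket/data.json")

    df.createOrReplaceTempView("events") // hypothetical view name
    spark.sql("SELECT COUNT(*) FROM events").show()

    spark.stop()
  }
}
```

If that works for my use case, the ~40-minute populate step (and the Bigtable bill) would disappear entirely, but I'm not sure what I'd be giving up.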
It should be noted that I'm rather new to the world of Hadoop and big data, so excuse my ignorance if I'm missing the obvious.