
I was just wondering whether anyone had any thoughts on best practices when working in Databricks. Developing within Databricks is costing us a lot financially, so I would like to know where else it would be best to develop Python code. With collaborative work in mind as well, is there a setup similar to Databricks for collaborative work that is free or low-cost?

Any suggestions, greatly appreciated!

jpaisley
  • Work faster? No, seriously, this is really hard to answer as we do not know what kind of workload you are talking about and what your definition of *costing a lot* is. I dare say it also depends on the value that using Databricks brings you. – Peter Bons Feb 26 '20 at 14:49
  • Apologies, by costing a lot I mean the literal expense in $ of using the Databricks service. I would not say we are using large amounts of data, so perhaps Databricks is not the best system. Would you happen to know another collaborative workspace that could be more cost-effective than Databricks? – jpaisley Feb 26 '20 at 16:20

1 Answer


The cost of Databricks is really driven by the size of the clusters you are running (1 driver and 1 worker, or 1 driver and 32 workers?), the spec of the machines in the cluster (low RAM and CPU or high RAM and CPU), and how long you leave them running (always running, or a short time to live, aka "Terminate after x minutes of inactivity"). I am also assuming you are not running an always-on High Concurrency cluster.
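To make those three cost drivers concrete, here is a rough back-of-the-envelope sketch. It assumes the common billing model in which each running node (driver and workers) incurs a cloud VM charge plus a Databricks charge measured in DBUs (Databricks Units), and that the driver and workers use the same node type. The function name and every rate below are hypothetical placeholders, not real Databricks prices, so substitute the figures from your own cloud provider and Databricks plan.

```python
def estimate_cluster_cost(num_workers: int, hours: float,
                          vm_price_per_hour: float,
                          dbus_per_node_hour: float,
                          price_per_dbu: float) -> float:
    """Rough cost of keeping one cluster up for `hours`.

    Assumes every node (one driver plus the workers, all the same node type)
    is billed the cloud VM price plus a DBU charge for each hour it runs.
    All rates are caller-supplied; nothing here reflects real prices.
    """
    if num_workers < 0 or hours < 0:
        raise ValueError("num_workers and hours must be non-negative")
    nodes = num_workers + 1  # the driver is a billed node too
    cost_per_node_hour = vm_price_per_hour + dbus_per_node_hour * price_per_dbu
    return nodes * hours * cost_per_node_hour


# Hypothetical rates, for illustration only.
VM_PRICE = 0.50       # cloud VM cost per node-hour
DBUS_PER_HOUR = 0.75  # DBUs consumed per node-hour
DBU_PRICE = 0.40      # price per DBU

big_always_on = estimate_cluster_cost(32, 8, VM_PRICE, DBUS_PER_HOUR, DBU_PRICE)
small_short_lived = estimate_cluster_cost(2, 2, VM_PRICE, DBUS_PER_HOUR, DBU_PRICE)
print(f"32 workers, 8 h: {big_always_on:.2f}")
print(f" 2 workers, 2 h: {small_short_lived:.2f}")
```

Because cost scales linearly with both node count and uptime, cutting either one cuts the bill proportionally; with these made-up rates the small, short-lived cluster costs a small fraction of the large one left up for a working day.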

Some general recommendations would be:

  • work with smaller datasets in dev, e.g., representative samples, which would enable you to...
  • work with smaller clusters in dev, e.g., instead of working with large 32-node clusters, work with small 2-node clusters
  • set a short time to live, e.g., 15 minutes,
  • which together would reduce your cost.
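The recommendations above map directly onto cluster settings. As a sketch, assuming you create clusters through the Databricks Clusters API (`clusters/create`) rather than the UI, a small, short-lived dev cluster could be described by a request body like the one built below. The helper `dev_cluster_spec` is hypothetical, and `spark_version` and `node_type_id` must be valid values for your workspace (the `clusters/spark-versions` and `clusters/list-node-types` endpoints list them), so the strings in the example are placeholders.

```python
import json


def dev_cluster_spec(cluster_name: str, spark_version: str, node_type_id: str,
                     num_workers: int = 2,
                     autotermination_minutes: int = 15) -> dict:
    """Build a `clusters/create` request body for a small, short-lived dev cluster.

    Only a handful of Clusters API fields are set; everything else is left
    to workspace defaults.
    """
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    # The API accepts 10-10000 minutes for auto-termination (0 disables it,
    # which is exactly what we want to avoid in dev).
    if not 10 <= autotermination_minutes <= 10000:
        raise ValueError("autotermination_minutes must be between 10 and 10000")
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,  # 1 driver + 2 workers by default
        "autotermination_minutes": autotermination_minutes,
    }


# Placeholder version and node type: replace with values from your workspace.
spec = dev_cluster_spec("dev-small", "<spark-version>", "<small-node-type>")
print(json.dumps(spec, indent=2))
```

The same two numbers can be set in the cluster UI (the worker count and the "Terminate after x minutes of inactivity" box); writing them down as a request body simply lets a team keep one agreed dev configuration under version control.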

Obviously there is a trade-off between assembling representative samples and making sure your outputs are still accurate and useful, but that's up to you.
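One common way to keep a small dev sample representative is stratified sampling: take the same fraction from every group so rare categories do not disappear. The pure-Python sketch below illustrates the idea on a list of dicts; the `stratified_sample` and `group_shares` helpers, the `region` column, and the data are all hypothetical. On Databricks itself you would more likely call PySpark's `DataFrame.sample` or `DataFrame.sampleBy` on the full table.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, key, fraction, seed=42):
    """Sample `fraction` of the rows from each group defined by `key(row)`.

    Every non-empty group keeps at least one row, so rare categories survive.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    rng = random.Random(seed)  # fixed seed -> reproducible dev sample
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def group_shares(rows, key):
    """Fraction of rows falling in each group, for checking representativeness."""
    counts = Counter(key(row) for row in rows)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


# Hypothetical data: 90% EU rows, 10% US rows.
rows = [{"id": i, "region": "EU" if i < 900 else "US"} for i in range(1000)]
sample = stratified_sample(rows, key=lambda r: r["region"], fraction=0.1)
print(len(sample), group_shares(sample, key=lambda r: r["region"]))
print(group_shares(rows, key=lambda r: r["region"]))
```

Comparing the group shares of the sample and the full data is only a quick sanity check; it does not guarantee that aggregates or models computed on the sample will match the full data, which is the accuracy side of the trade-off.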

wBob
  • Thank you, I will certainly take your advice into consideration. I do need to reduce the termination time. – jpaisley Feb 26 '20 at 16:21