I have a 16 GB dataset that I want to use in Databricks. However, in Community Edition the DBFS limit is 10 GB. Could you please help me preprocess the data so that I can move it from the driver to DBFS?
The simplest approach is not to use DBFS at all (it's designed only for temporary data), but to host the data and results in your own environment, such as an AWS S3 bucket or ADLS (this may incur higher transfer costs).
If that isn't an option, then the solution depends on other factors, such as the input file format: for example, whether it is compressed or uncompressed.
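If the file is newline-delimited JSON, one way to get it under the DBFS limit is to split it on the driver into gzip-compressed chunks, each well under the quota, and then copy those chunks up; Spark can read `*.json.gz` files directly. Below is a minimal sketch of such a splitter; `split_jsonl`, the paths, and the chunk size are all hypothetical placeholders, not part of any Databricks API:

```python
import gzip
import os

def split_jsonl(src_path, out_dir, max_bytes=1_000_000_000):
    """Split a newline-delimited JSON file into gzip-compressed chunks.

    Each chunk holds at most ~max_bytes of *uncompressed* input, so the
    compressed files on disk will be considerably smaller. Returns the
    number of chunk files written. Hypothetical helper for illustration.
    """
    os.makedirs(out_dir, exist_ok=True)
    part, written, out = 0, 0, None
    with open(src_path, "rb") as src:
        for line in src:
            # Start a new chunk when the current one would exceed the cap.
            if out is None or written + len(line) > max_bytes:
                if out is not None:
                    out.close()
                part += 1
                written = 0
                name = os.path.join(out_dir, f"part-{part:04d}.json.gz")
                out = gzip.open(name, "wb")
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return part
```

After copying the chunks to DBFS (for example with `dbutils.fs.cp`), Spark can load them all with a single glob, e.g. `spark.read.json("dbfs:/FileStore/mydata/part-*.json.gz")`; note that gzip files are not splittable, so many smaller chunks also parallelize better than one large file.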

Alex Ott
It is now on the driver node. It is a JSON file and I have already unzipped it. But it doesn't appear in databricks-datasets. – Shihab Masri Apr 20 '22 at 17:23