
I am trying to write a JSON file into the Databricks data storage system (DBFS) using

open('/dbfs/mypath/test.json', 'wb').write(files)

I am getting the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/mypath/test.json'
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
File <command-4437961952528903>:5
----> 1 open('/dbfs/mypath/test.json', 'wb').write(files)

I do not know what I am doing wrong. Does anybody know if I can use this approach to save a binary file? Thanks in advance.
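
For reference, this error usually means the parent directory /dbfs/mypath does not exist (open() creates files, but not directories), or that the /dbfs FUSE mount is not available where the code runs (for example under databricks-connect). A minimal sketch of the first case, assuming the FUSE mount is available and that `files` is a bytes object:

import os

# open() will not create missing directories on the FUSE mount,
# so create the parent directory first.
os.makedirs('/dbfs/mypath', exist_ok=True)

with open('/dbfs/mypath/test.json', 'wb') as f:
    f.write(files)  # `files` is assumed to be a bytes object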

I have also tried a different command:

dbutils.fs.put('/myPath/test.json',files)

but as far as I know that is for creating a text file, and I am not sure how to write a binary one.

meuto
  • `No such file or directory`. Maybe start by reading the error message? – drum May 26 '23 at 02:40
  • Databricks should automatically create the file even if it doesn't exist, as far as I read in the documentation. I am not an expert in Databricks; I just started a few days ago. Thanks for your help – meuto May 26 '23 at 02:52

1 Answer


File paths for dbutils.fs are specified using the Databricks File System (DBFS) scheme (e.g. dbfs:/mypath/...), not the /dbfs/... FUSE path used with open().

Can you please try this? I set overwrite=True so that the file is overwritten if it already exists:

import json

data = {
    'name': 'John Doe',
    'age': 30,
    'city': 'New York'
}

json_string = json.dumps(data)

# dbutils is available in Databricks notebooks; dbutils.fs takes DBFS paths
# directly (dbfs:/...), not the /dbfs/... FUSE path used with open().
dbutils.fs.put('dbfs:/mypath/test.json', json_string, overwrite=True)
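
If the put succeeds, one way to confirm the write is to read the file back; dbutils.fs.head returns the first bytes of a DBFS file. A quick check, using the same path as above:

# Inspect the start of the file to confirm it contains the expected JSON.
print(dbutils.fs.head('dbfs:/mypath/test.json'))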
  • the main issue that I have is the memory limitation, and I don't want to parse the JSON file. I am using 80 GB JSON files, and I want to save the files and then use ijson to read them and do all the necessary transformations. Thank you for your help – meuto May 26 '23 at 03:00
  • Can you verify that test.json exists in the '/dbfs/mypath/' directory? You can use the `%fs ls /dbfs/mypath/` command in a Databricks notebook cell to list the files and directories in that location. Also make sure that you have the necessary permissions to access that directory – Awoooooooooooooooooooo May 26 '23 at 05:27
  • If it exists, can you try creating a simple test file in a different location, like `/dbfs/tmp/test.json`? – Awoooooooooooooooooooo May 26 '23 at 05:28
  • I created a file using the instruction 'dbutils.fs.put("/dbfs/mypath/my_new_file", "This is a file on the local driver node.")', but I cannot use open() and I do not know why. In other posts it looks like people are able to use open(). I really appreciate your help. Thanks – meuto May 26 '23 at 11:16
  • I found the solution (a sketch of that approach follows these comments), thanks for everything: https://stackoverflow.com/questions/73650245/open-file-on-dbfs-while-using-databricks-connect – meuto May 26 '23 at 11:32
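
A common pattern for writing binary or very large files to DBFS, in the spirit of the linked thread and the 80 GB/ijson workflow described above, is to write with open() to the driver's local disk and then copy the result into DBFS. A minimal sketch, assuming `files` is a bytes object and that the local path /tmp/test.json is a hypothetical choice:

# Write the binary payload to the driver's local filesystem first;
# plain open() always works there.
local_path = '/tmp/test.json'
with open(local_path, 'wb') as f:
    f.write(files)

# Then copy it into DBFS; the 'file:' scheme marks the driver's local disk.
dbutils.fs.cp('file:' + local_path, 'dbfs:/mypath/test.json')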