
I am fairly new to PySpark and am trying to load data from a folder which contains multiple JSON files. However, the load fails. Here is the code that I am using:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]") \
                .appName('SparkByExamples.com') \
                .getOrCreate()
# Read every JSON file in the directory via a glob pattern
df = spark.read.json('file_directory/*')

I am getting the following error: Exception in thread "globPath-ForkJoinPool-1-worker-57" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

I tried setting the path variables for Hadoop and Spark as well, but it was of no use. However, if I load a single file from the directory, it loads perfectly. Can someone please tell me what is going wrong in this case?

1 Answer


I can successfully read all CSV files under a directory without adding the asterisk. I think you should try:

spark.read.json('file_directory/')
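
If the path itself is correct, a minimal end-to-end sketch of that suggestion could look roughly like this (the directory name file_directory/ is taken from the question; the printSchema/count calls are only a quick sanity check):

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]") \
        .appName('SparkByExamples.com') \
        .getOrCreate()

# Point the reader at the directory; Spark picks up every JSON file inside it.
df = spark.read.json('file_directory/')

# Quick sanity check on what was loaded.
df.printSchema()
print(df.count())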
  • I have tried that too, it doesn't seem to work for me. I am still getting the UnsatisfiedLinkError – Anushree Mahajan Mar 07 '22 at 20:54
  • As described by @Nemetaarion, the UnsatisfiedLinkError on a Java native method was raised because of a version incompatibility between the Hadoop version in the SBT dependencies and winutils.exe (HDFS wrapper) on my Windows machine. Please refer to [this](https://stackoverflow.com/questions/61696990/unsatisfiedlinkerror-in-apache-spark-when-writing-parquet-to-aws-s3-using-stagin) (a rough environment sketch follows these comments). – digital_monk Mar 08 '22 at 11:30
  • I had tried that too but am still getting the error – Anushree Mahajan Mar 10 '22 at 02:50
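
As the comments above note, this particular UnsatisfiedLinkError on NativeIO$Windows.access0 usually means the winutils.exe/hadoop.dll on the Windows machine does not match the Hadoop version Spark was built against. A rough sketch of the environment setup, assuming a matching winutils.exe and hadoop.dll have been placed under a hypothetical C:\hadoop\bin (adjust the path and version for your machine), set before the SparkSession is created:

import os

# Hypothetical winutils location; the binaries here must match the Hadoop
# version bundled with your Spark installation.
os.environ['HADOOP_HOME'] = r'C:\hadoop'
os.environ['PATH'] = r'C:\hadoop\bin;' + os.environ['PATH']

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]") \
        .appName('SparkByExamples.com') \
        .getOrCreate()

df = spark.read.json('file_directory/*')

Setting these variables system-wide (and restarting the terminal/IDE) achieves the same thing; the key point is that the JVM launched by Spark must be able to find a hadoop.dll/winutils.exe that matches its Hadoop version.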