
Initially I was reading a local CSV file, placed at the same path on all of the nodes in my standalone cluster:

df = spark.read.csv('/data/TRX_FILE/1000_trx.csv', header=True)
# Everything was fine at that point

Then I installed HDFS and set the Hadoop conf path in spark-env.sh on all the nodes:

export HADOOP_CONF_DIR=/etc/hadoop/conf   # so core-site.xml is picked up / avoids a core-site.xml error

Then I tried to read the same CSV. The plan is to do some analysis on it and then write the result to an HDFS path, but for now the CSV is still on the LOCAL filesystem.

# When I try the same read:
df = spark.read.csv('/data/TRX_FILE/1000_trx.csv', header=True)
# Error:
 raise AnalysisException(s.split(': ', 1)[1], stackTrace)
  pyspark.sql.utils.AnalysisException: u'Path does not exist: hdfs://Myspark:9000/data/TRX_FILE/1000_trx.csv;'
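
My guess is that setting HADOOP_CONF_DIR makes Spark pick up core-site.xml, and the fs.defaultFS configured there (hdfs://Myspark:9000, going by the error message) becomes the default for any path written without a scheme. I think it can be checked from PySpark roughly like this, though I am not sure it is the proper way (it uses the internal _jsc handle):

# Debugging check of the default filesystem Spark picked up from core-site.xml
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
print(hadoop_conf.get("fs.defaultFS"))   # I expect something like hdfs://Myspark:9000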

My question is: why is it trying to read from HDFS at all? I have not mentioned an HDFS path anywhere, and my intention/requirement is to read that CSV from the local filesystem. I am confused about both the problem and the solution. Is there anything I am doing wrong here? Please correct me if so.
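
From what I have read so far, I am guessing I may need to put an explicit scheme on the paths, something like the sketch below (file:// for the local read, hdfs:// for the eventual write). The output path here is just a placeholder, and I am not sure whether this is the correct fix, or whether the local file then has to exist at the same path on every worker node:

# My guess (untested): force the local filesystem with file:// and be explicit for HDFS
df = spark.read.csv('file:///data/TRX_FILE/1000_trx.csv', header=True)
# ... analysis ...
df.write.csv('hdfs://Myspark:9000/data/TRX_OUT/', header=True, mode='overwrite')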

Kindly help me, experts. Thanks in advance.
