Initially I was reading a local CSV file, placed on all of the nodes in my standalone cluster.
df = spark.read.csv('/data/TRX_FILE/1000_trx.csv',header=True)
#Everything was fine then
Then I installed HDFS and set the conf path in spark-env.sh on all the nodes:
export HADOOP_CONF_DIR=/etc/hadoop/conf   # so that core-site.xml is picked up / to avoid the core-site.xml error
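To check which default filesystem Spark is now picking up from that config, I believe something like this works (a rough sketch; it goes through the internal _jsc handle, so it may not be the official API):
# read fs.defaultFS from the Hadoop configuration that HADOOP_CONF_DIR points at
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
print(hadoop_conf.get('fs.defaultFS'))   # I expect this prints hdfs://Myspark:9000 on my setup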
Then I tried to read the same CSV, on which I intend to do some analysis before writing the result to an HDFS path. But for now, the CSV is still only on the LOCAL path.
#when I tried
df = spark.read.csv('/data/TRX_FILE/1000_trx.csv',header=True)
#Error:
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'Path does not exist: hdfs://Myspark:9000/data/TRX_FILE/1000_trx.csv;'
My question is: why is it even trying to read from HDFS? I have not mentioned an HDFS path anywhere, and my intention/requirement is to read that CSV from the local filesystem. I am quite confused about the problem and the solution. Is there anything I am doing wrong here? Please correct me.
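From what I have read, explicitly prefixing the path with a scheme might force the filesystem I want, but I am not sure this is the right approach (a sketch, untested on my cluster):
# file:// scheme -> local filesystem on each node (file must exist on all nodes)
df_local = spark.read.csv('file:///data/TRX_FILE/1000_trx.csv', header=True)
# explicit HDFS path, once the file has been uploaded to HDFS
df_hdfs = spark.read.csv('hdfs://Myspark:9000/data/TRX_FILE/1000_trx.csv', header=True)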
Kindly help me, experts. Thanks in advance.