
Initially I was reading a local CSV file, placed at the same path on all of the nodes in my standalone cluster:

df = spark.read.csv('/data/TRX_FILE/1000_trx.csv', header=True)
# Everything was fine at that point

Then I installed HDFS and set the Hadoop conf path in spark-env.sh on all the nodes:

export HADOOP_CONF_DIR=/etc/hadoop/conf   # so core-site.xml is picked up / avoids a core-site.xml error

Then I tried to read the same CSV. The plan is to do some analysis on it and then write the result to an HDFS path, but for now the CSV is still on the LOCAL filesystem.

# When I try the same read:
df = spark.read.csv('/data/TRX_FILE/1000_trx.csv', header=True)
# Error:
 raise AnalysisException(s.split(': ', 1)[1], stackTrace)
  pyspark.sql.utils.AnalysisException: u'Path does not exist: hdfs://Myspark:9000/data/TRX_FILE/1000_trx.csv;'
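
My guess is that setting HADOOP_CONF_DIR makes Spark pick up core-site.xml, and the fs.defaultFS configured there (hdfs://Myspark:9000, going by the error message) becomes the default for any path written without a scheme. I think it can be checked from PySpark roughly like this, though I am not sure it is the proper way (it uses the internal _jsc handle):

# Debugging check of the default filesystem Spark picked up from core-site.xml
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
print(hadoop_conf.get("fs.defaultFS"))   # I expect something like hdfs://Myspark:9000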

My question is: why is it trying to read from HDFS at all? I have not mentioned an HDFS path anywhere, and my intention/requirement is to read that CSV from the local filesystem. I am confused about both the problem and the solution. Is there anything I am doing wrong here? Please correct me if so.
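
From what I have read so far, I am guessing I may need to put an explicit scheme on the paths, something like the sketch below (file:// for the local read, hdfs:// for the eventual write). The output path here is just a placeholder, and I am not sure whether this is the correct fix, or whether the local file then has to exist at the same path on every worker node:

# My guess (untested): force the local filesystem with file:// and be explicit for HDFS
df = spark.read.csv('file:///data/TRX_FILE/1000_trx.csv', header=True)
# ... analysis ...
df.write.csv('hdfs://Myspark:9000/data/TRX_OUT/', header=True, mode='overwrite')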

Kindly help me, experts. Thanks in advance.
