
I am new to the Spark Framework and need some advice.

I have the following folder structure:

reports
 - 20180101
 - 20180102
 - 20180103
   - GHWEI.csv
   - DSFSD.csv
   - GHWEI.csv

Reports (csv files) are stored separately for each day. For example, the 20180103 folder collects all reports for January 3rd, 2018.

Before reading the csv files I need to check whether the path exists. How can I do that?

val reports = spark.read.option("delimiter", "|")
              .csv("/reports/{20180101,20180102,20180103,}/*GHWEI*")
              .orderBy("CREATE_DATE")

Right now, if none of the folders named 20180101, 20180102, 20180103 exists, Spark raises an error saying there is no such path. The code only works if at least one of these folders is available.

The second question is: how can I check whether the reports value is empty after the read?

Nurzhan Nogerbek
  • Possible duplicate of [How to check if path or file exist in Scala](https://stackoverflow.com/questions/21177107/how-to-check-if-path-or-file-exist-in-scala) – Pavel Jan 17 '19 at 11:24
  • Looks like a duplicate of the first part of the question – Pavel Jan 17 '19 at 11:25
  • Those solutions check the path on the current server and don't use Spark for the check. – Nurzhan Nogerbek Jan 17 '19 at 11:30
  • You don't have to use Spark for these checks; it can be done in different ways, depending on what the source of the data is – Pavel Jan 17 '19 at 11:59
  • Are these local filesystem paths, or HDFS paths? – DNA Jan 17 '19 at 12:04
  • They are HDFS paths. – Nurzhan Nogerbek Jan 17 '19 at 12:08
  • @NurzhanNogerbek What do you want to do if the path doesn't exist? Please specify that in the question. If you simply want to skip the error, you can perform the read operation inside `scala.util.Try`. You can also check `reports.count()` to see whether there are any rows in the DataFrame. – vindev Jan 17 '19 at 12:22
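
A minimal sketch of the `Try`/`count()` approach from the last comment, assuming a `SparkSession` named `spark` is already in scope (the folder names are the ones from the question):

import scala.util.Try
import org.apache.spark.sql.DataFrame

// If none of the paths match, spark.read.csv throws an AnalysisException
// ("Path does not exist"); Try catches it and we fall back to an empty DataFrame.
val reports: DataFrame = Try {
  spark.read.option("delimiter", "|")
    .csv("/reports/{20180101,20180102,20180103}/*GHWEI*")
    .orderBy("CREATE_DATE")
}.getOrElse(spark.emptyDataFrame)

// Second question: check whether anything was actually read.
// count() triggers a job; on Spark 2.4+ reports.isEmpty is a cheaper alternative.
if (reports.count() == 0) println("No reports were read")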

1 Answer


I think it is possible to check the file with the Hadoop FileSystem Java SDK, which can be used from a Scala program.

This is the whole documentation: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html

I link you to an answer that can adapt to your case: https://stackoverflow.com/a/30408153/10623105

Note: to clarify, Hadoop does not really work with folders. The concept of a folder does not exist in the Hadoop ecosystem; it is essentially a key-value file system, where the key is the full path of the file and the value is the file itself.

gccodec
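
For completeness, a rough sketch of such a FileSystem-based check, assuming the paths are on HDFS (as stated in the comments), that a `SparkSession` named `spark` is in scope, and using the folder names from the question; `fs.exists` is the kind of check described in the linked answer:

import org.apache.hadoop.fs.{FileSystem, Path}

// FileSystem built from the Hadoop configuration Spark already uses
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

val days = Seq("20180101", "20180102", "20180103")
// Keep only the day folders that actually exist on HDFS
val existing = days.map(d => s"/reports/$d").filter(dir => fs.exists(new Path(dir)))

if (existing.nonEmpty) {
  val reports = spark.read.option("delimiter", "|")
    .csv(existing.map(_ + "/*GHWEI*"): _*)   // csv accepts a varargs list of paths
    .orderBy("CREATE_DATE")

  // Second question: check whether anything was read
  if (reports.count() == 0) println("No matching rows were read")
} else {
  println("None of the report folders exist")
}

Note that the read can still fail if an existing folder contains no *GHWEI* file, because Spark can also complain about glob patterns that match nothing. An alternative is `fs.globStatus(new Path("/reports/{20180101,20180102,20180103}/*GHWEI*"))`, which returns only the paths that actually match the pattern, so you can pass exactly those to `spark.read.csv` (or skip the read entirely when the result is empty).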