
I have a directory with files that look like abc_00_00.csv, abc_001_00.csv, abc_002_00.csv, def_00_00.csv

I want only those files which match the user input. I am trying the method below, but it is not working in Spark:

new File("dbfs:/s3path").listFiles.filter(_.getName.startsWith("abc_*")).foreach(println)

error

java.lang.NullPointerException

Is there any way in Spark through which I can iterate over the matching files in a for loop?

Code_rocks
  • Your directory does not exist. Also you don't want an asterisk in `startsWith`, just `startsWith("abc_")`. – Dima Apr 27 '21 at 10:26
  • sorry for confusion, i am not checking in local dir, i am searching in s3 path – Code_rocks Apr 27 '21 at 10:48
  • 1
    `File` will not work with s3. You need an aws client for that. Regardless, the reason you get the NPE is because directory does not exist: `.listFiles` returns null in this case . – Dima Apr 27 '21 at 10:55
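The NPE described in the comments is easy to reproduce and guard against. Below is a minimal local-filesystem sketch of the null-safe pattern (the paths and filenames are illustrative). Note that `java.io.File` cannot read `dbfs:/` or S3 paths at all; on Databricks you would list those with `dbutils.fs.ls`, or with Hadoop's `FileSystem` API in plain Spark.

```scala
import java.io.File
import java.nio.file.Files

// listFiles returns null when the path does not exist or is not a
// directory -- that null is what causes the NullPointerException.
val missing = new File("/no/such/dir")
assert(missing.listFiles == null)

// Wrapping the result in Option turns the null into a safe empty case.
def matchingFiles(dir: File, prefix: String): Seq[File] =
  Option(dir.listFiles).getOrElse(Array.empty[File])
    .filter(_.getName.startsWith(prefix))
    .toSeq

// Demo against a real temporary directory.
val tmp = Files.createTempDirectory("demo").toFile
Seq("abc_00_00.csv", "abc_001_00.csv", "def_00_00.csv")
  .foreach(n => new File(tmp, n).createNewFile())

val matched = matchingFiles(tmp, "abc_")
matched.foreach(println)  // prints the two abc_ files
```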

1 Answer


Your problem is that the folder may not exist, and also no element starts with `abc_*` — wildcards are not allowed in `startsWith`. So please try:

new File("C:/dir/").listFiles.filter(_.getName.startsWith("abc_")).foreach(println)

If the error persists, your directory does not exist. It may be because of the lowercase `c:`; try it in uppercase and check whether the directory exists.

To be sure, I'd check whether the directory exists first, as follows:

val directory = new File("C:/dir/")

if (directory.exists && directory.isDirectory) {
   directory.listFiles.filter(_.getName.startsWith("abc_")).foreach(println)
}
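To iterate over the matched files in a for loop, as the question asks, the filter result can feed a plain for comprehension. A local-filesystem sketch (the temp directory stands in for `C:/dir/`, and `prefix` stands in for the user input):

```scala
import java.io.File
import java.nio.file.Files

// Illustrative setup: a temp directory standing in for "C:/dir/".
val dir = Files.createTempDirectory("csvs").toFile
Seq("abc_00_00.csv", "abc_002_00.csv", "def_00_00.csv")
  .foreach(n => new File(dir, n).createNewFile())

val prefix = "abc_" // user input

// Null-safe listing, then filter by the user-supplied prefix.
val matches = Option(dir.listFiles).getOrElse(Array.empty[File])
  .filter(_.getName.startsWith(prefix))

// Iterate over each matching file.
for (file <- matches) {
  println(file.getName) // process each matching file here
}
```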
SCouto