I want to read from all Cloud Storage buckets whose names match gs://bucketname*. The wildcard works with gsutil, but the same pattern does not work from Spark's read or readStream.

gs://bucket1 gs://bucket2 gs://bucketN

working: gsutil ls gs://bucket*/mydir/abcd*.txt

not working: sc.textFile("gs://bucket*/mydir/abcd*.txt")

Michael Heil

1 Answer

gsutil implements wildcards by sending bucket-listing and object-listing requests (optionally with a prefix) to the server and then filtering the results against the wildcard on the client side. Spark does not provide that functionality for bucket names, so you will have to list the buckets and objects and do the filtering yourself.
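
As a rough sketch of what "do the filtering yourself" could look like from PySpark: the helper below expands a pattern into concrete gs:// URIs. The names expand_gcs_wildcard and gcs_listing are made up for this example, only `*` and `?` are handled (neither crosses a `/`), and the listing part assumes the google-cloud-storage package is installed and has credentials that are allowed to list buckets. This is not how gsutil itself is written, just the same idea in a few lines.

```python
import re


def _wildcard_to_regex(pattern):
    # Translate '*' and '?' so that neither matches across a '/'.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[^/]*")
        elif ch == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def expand_gcs_wildcard(pattern, bucket_names, list_object_names):
    """Expand gs://<bucket pattern>/<object pattern> into concrete URIs.

    bucket_names: iterable of bucket names the caller can see.
    list_object_names: callable (bucket, prefix) -> iterable of object names.
    """
    if not pattern.startswith("gs://"):
        raise ValueError("expected a gs:// URI")
    bucket_pat, _, object_pat = pattern[len("gs://"):].partition("/")
    bucket_re = _wildcard_to_regex(bucket_pat)
    object_re = _wildcard_to_regex(object_pat)
    # The literal text before the first wildcard narrows the listing request.
    prefix = re.split(r"[*?]", object_pat, maxsplit=1)[0]
    uris = []
    for bucket in sorted(bucket_names):
        if not bucket_re.fullmatch(bucket):
            continue
        for name in list_object_names(bucket, prefix):
            if object_re.fullmatch(name):
                uris.append(f"gs://{bucket}/{name}")
    return uris


def gcs_listing():
    # Assumption: google-cloud-storage is installed and authenticated.
    from google.cloud import storage

    client = storage.Client()
    bucket_names = [b.name for b in client.list_buckets()]

    def list_object_names(bucket, prefix):
        return (blob.name for blob in client.list_blobs(bucket, prefix=prefix))

    return bucket_names, list_object_names
```

With those in place, something like the following should work, since sc.textFile accepts a comma-separated list of paths:

    bucket_names, list_object_names = gcs_listing()
    uris = expand_gcs_wildcard("gs://bucket*/mydir/abcd*.txt", bucket_names, list_object_names)
    rdd = sc.textFile(",".join(uris))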

Mike Schwartz