
I have a DataFrame, var cache: DataFrame = _. On the initial run I set cache = existingDF, where existingDF is read from an Excel file using crealytics spark-excel (com.crealytics.spark.excel). On each subsequent run, existingDF is read from an updated Excel file, and I want cache = cache.union(existingDF). However, cache only ever contains the current existingDF. In short, whenever I use cache, it seems to re-read the Excel file. How do I avoid this? The issue does not occur when reading CSV. (It did occur when I used .persist on the CSV read, but went away when I removed .persist.) More simply:

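import org.apache.spark.sql.DataFrame

var a: DataFrame = null // nothing accumulated before the first run
while (true) {
  val b = spark.read.format("com.crealytics.spark.excel")...
  if (Option(a).isEmpty) {
    a = b // first run: start from the first file
  } else if (a != b) {
    a = b.union(a) // later runs: append the new rows to what was accumulated
  }
}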

The variable a is always getting updated along with b, so it never becomes different from b. How do I avoid this?
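A likely explanation, assuming each run overwrites the same Excel file in place: Spark DataFrames are evaluated lazily, so a holds a query plan ("read this file"), not the rows that were in the file when it was assigned. Every action re-executes that plan and re-reads whatever the file contains at that moment, so a keeps tracking b. The plain-Scala model below (no Spark; LazyVsSnapshot, Plan, read, snapshot and demo are hypothetical names used only for illustration) imitates this: a lazy accumulator loses the old rows once the source is overwritten, while one that is forced into a snapshot keeps them.

```scala
// Plain-Scala model of Spark's lazy evaluation; no Spark dependency.
object LazyVsSnapshot {
  // Stand-in for the Excel file on disk, overwritten in place between runs.
  var fileContents: Vector[String] = Vector.empty

  // Like a DataFrame: a recipe that is re-run every time its rows are needed.
  final case class Plan(run: () => Vector[String]) {
    def union(other: Plan): Plan = Plan(() => run() ++ other.run())
  }

  // Like spark.read...load: builds a plan but reads nothing yet.
  def read(): Plan = Plan(() => fileContents)

  // Forces the plan once and freezes the result (analogous to materializing it).
  def snapshot(p: Plan): Plan = {
    val rows = p.run()
    Plan(() => rows)
  }

  // Simulates two runs; returns (lazy result, snapshotted result).
  def demo(): (Vector[String], Vector[String]) = {
    fileContents = Vector("run1-row")
    var lazyAcc = read()
    var eagerAcc = snapshot(read())

    fileContents = Vector("run2-row") // the next run overwrites the same file
    lazyAcc = read().union(lazyAcc)
    eagerAcc = snapshot(read().union(eagerAcc))

    (lazyAcc.run(), eagerAcc.run())
  }

  def main(args: Array[String]): Unit = {
    val (lazyRows, eagerRows) = demo()
    println(lazyRows)  // Vector(run2-row, run2-row): the run1 rows are gone
    println(eagerRows) // Vector(run2-row, run1-row)
  }
}
```

In Spark, the counterpart of snapshot would be materializing the accumulated DataFrame before the file changes, for example a = b.union(a).localCheckpoint() (eager by default), or writing the result to Parquet and reading it back. This is a sketch, not tested against spark-excel; note that .cache()/.persist() alone is not a guarantee, because evicted cache blocks are recomputed from the source file.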

ss301
  • Please post runnable code; as it stands now, the question is unclear (especially the title) – Raphael Roth Sep 14 '20 at 19:29
  • As @RaphaelRoth says, your question doesn't have enough information for us to help you. – Nick Sep 14 '20 at 21:10
  • Hope this edit helps – ss301 Sep 15 '20 at 04:40
  • Are you iterating through a list of files? I'm still not clear what you are doing, if you clarify I'll post a functional answer (you almost certainly shouldn't be using a var or a while loop) – Nick Sep 15 '20 at 20:42
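Following the suggestion in the last comment, and assuming each run produces a new file at its own path (the question does not say this), the accumulation can be written without a var or a while loop: read each file once and combine the results. The plain-Scala model below uses hypothetical names (FoldOverFiles, files, readOne, readAll); in Spark, readOne would be a spark.read.format("com.crealytics.spark.excel")...load(path) call and ++ would be union. Because each plan then points at a different file, lazy re-reading is harmless as long as those files are not overwritten.

```scala
// Plain-Scala model; no Spark dependency.
object FoldOverFiles {
  // Hypothetical store: one entry per file path (stand-in for the Excel files).
  val files: Map[String, Vector[String]] = Map(
    "run1.xlsx" -> Vector("a", "b"),
    "run2.xlsx" -> Vector("c")
  )

  // Stand-in for reading one Excel file into rows.
  def readOne(path: String): Vector[String] = files(path)

  // Combines every file with no mutable state; in Spark: paths.map(readOne).reduce(_ union _).
  // reduce throws on an empty Seq, so callers must pass at least one path.
  def readAll(paths: Seq[String]): Vector[String] =
    paths.map(readOne).reduce(_ ++ _)

  def main(args: Array[String]): Unit =
    println(readAll(Seq("run1.xlsx", "run2.xlsx"))) // Vector(a, b, c)
}
```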

0 Answers