
I have a dataset A with schema A and a dataset B with schema B. The two datasets are mostly similar: they have the same columns, but a few of those columns have different data types. For example, a date column in dataset A holds values like '2020-08-03' as a string, while the same column in dataset B holds an epoch number (long). I now need to merge these two datasets, and to do that the columns must have the same data types in both.

Could you please suggest how I can do this? Is it possible?

chaithanya



You have to use Spark SQL functions to convert the column types so that they match. For example, you can convert your string date to a Unix timestamp (seconds since the epoch):

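from pyspark.sql.functions import unix_timestamp

# withColumn returns a new DataFrame, so reassign it
df = df.withColumn("date", unix_timestamp("date", "yyyy-MM-dd"))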
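To sanity-check what the conversion should produce, here is a plain-Python sketch (not Spark code) of the same mapping in both directions. It assumes the Spark session time zone (spark.sql.session.timeZone) is UTC; in another zone, unix_timestamp reads '2020-08-03' as local midnight and the resulting number shifts accordingly. It also assumes dataset B stores epoch seconds; if it stores milliseconds, the matching value is 1000 times larger. The helper names are hypothetical. The reverse direction corresponds to Spark's from_unixtime(col, "yyyy-MM-dd"), which you could use instead to turn dataset B's epoch column into a date string.

```python
import calendar
from datetime import datetime, timezone


def date_str_to_epoch(s: str) -> int:
    """Parse a 'yyyy-MM-dd' string as midnight UTC and return epoch seconds.

    Assumption: the Spark session time zone is UTC.
    """
    dt = datetime.strptime(s, "%Y-%m-%d")
    # timegm treats the naive struct_time as UTC
    return calendar.timegm(dt.timetuple())


def epoch_to_date_str(seconds: int) -> str:
    """Format epoch seconds as a 'yyyy-MM-dd' string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")


print(date_str_to_epoch("2020-08-03"))  # 1596412800
print(epoch_to_date_str(1596412800))    # 2020-08-03
```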

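Once the column types match, you can combine the two DataFrames with union(). Note that union() matches columns by position, not by name, so if the column order differs between the two datasets, use unionByName() instead.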
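The overall recipe (normalize the mismatched column, then union by column name) can be illustrated with a plain-Python analogue over lists of row dicts. This is not Spark code: to_epoch, union_by_name and the sample rows are hypothetical, and it assumes dataset B's epoch values are in seconds and dates are interpreted in UTC.

```python
import calendar
from datetime import datetime


def to_epoch(value):
    # Hypothetical normalizer: accepts a 'yyyy-MM-dd' string (dataset A style)
    # or an epoch number in seconds (dataset B style) and returns int seconds (UTC).
    if isinstance(value, str):
        return calendar.timegm(datetime.strptime(value, "%Y-%m-%d").timetuple())
    return int(value)


def union_by_name(rows_a, rows_b, columns):
    # Mimics unionByName: every output row is built by column name, not position.
    return [{c: row[c] for c in columns} for row in rows_a + rows_b]


rows_a = [{"id": 1, "date": "2020-08-03"}]
rows_b = [{"date": 1596499200, "id": 2}]  # hypothetical row, columns in a different order

# Bring dataset A's string dates to the same type as dataset B before merging
normalized_a = [{**r, "date": to_epoch(r["date"])} for r in rows_a]
merged = union_by_name(normalized_a, rows_b, ["id", "date"])
print(merged)  # [{'id': 1, 'date': 1596412800}, {'id': 2, 'date': 1596499200}]
```

Converting in the other direction (epoch to date string) works the same way; the key point is that both sides share one type before the union.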

Shadowtrooper