I have a dataframe df with four columns id, ts, lat and lon. If I run df.schema() in debug mode, I get

 0 = {StructField@13126} "StructField(id,LongType,true)"
  name = "id"
  dataType = {LongType$@12993} "LongType"
  nullable = true
  metadata = {Metadata@13065} "{"encoding":"UTF-8"}"
 1 = {StructField@13127} "StructField(ts,LongType,true)"
  name = "ts"
  dataType = {LongType$@12993} "LongType"
  nullable = true
  metadata = {Metadata@13069} "{"encoding":"UTF-8"}"
 2 = {StructField@13128} "StructField(lat,DoubleType,true)"
  name = "lat"
  dataType = {DoubleType$@13034} "DoubleType"
  nullable = true
  metadata = {Metadata@13073} "{"encoding":"UTF-8"}"
 3 = {StructField@13129} "StructField(lon,DoubleType,true)"
  name = "lon"
  dataType = {DoubleType$@13034} "DoubleType"
  nullable = true
  metadata = {Metadata@13076} "{"encoding":"UTF-8"}"

Now I want to get rid of all metadata, i.e. "{"encoding":"UTF-8"}" should be replaced by "" for each column. Please note that my actual table has many columns, so the solution needs to be somewhat generic. Thank you in advance!

Corram

1 Answer

You can use encode("XX", "ignore").

Example:

  Val df=data.map(lambda x: x.encode("ascii", "ignore").
vaquar khan
  • Hi and thank you for your reply. I have some questions regarding this: 1) What is "data"? 2) Closing bracket missing? 3) Can you post this as Java code not Scala? – Corram Sep 09 '20 at 07:18
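
For what it's worth, here is a minimal Java sketch of one generic way to drop the per-column metadata asked about in the question. It relies on Spark's `Column.as(String, Metadata)` overload and `Metadata.empty()`. The class name `MetadataStripper` and the method `stripMetadata` are made up for illustration, and the sketch assumes top-level column names without dots or other characters that would need backtick quoting. Metadata on fields nested inside struct columns is not touched.

```java
import java.util.Arrays;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.Metadata;

import static org.apache.spark.sql.functions.col;

// Hypothetical helper class, not part of Spark.
public final class MetadataStripper {
    private MetadataStripper() {}

    /** Returns a copy of df in which every top-level column carries empty metadata. */
    public static Dataset<Row> stripMetadata(Dataset<Row> df) {
        Column[] cols = Arrays.stream(df.schema().fields())
                // Re-alias each column under its own name, attaching empty metadata.
                // Assumption: names contain no dots or characters needing backticks.
                .map(f -> col(f.name()).as(f.name(), Metadata.empty()))
                .toArray(Column[]::new);
        return df.select(cols);
    }
}
```

Usage would be something like `Dataset<Row> clean = MetadataStripper.stripMetadata(df);`, after which `clean.schema()` should show `{}` as the metadata of each field. Since it iterates over the schema, it does not matter how many columns the table has.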