
In a Spark job, I am using:

.withColumn("year", year(to_timestamp(lit(col("timestamp")))))

This code used to work, but now I get the error:

"cannot resolve 'CAST(`timestamp` AS TIMESTAMP)' due to data type mismatch: cannot cast struct<int:int,long:bigint> to timestamp;"

It looks like Spark is reading my timestamp column as a struct<int:int,long:bigint> instead of an int.

How can I prevent that?

Context: the initial data is in JSON Lines format. I read it using AWS Glue's glueContext.create_dynamic_frame.from_catalog. In the Glue catalog, the timestamp column is typed int.

Hugo

2 Answers


I finally solved it this way:

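from awsglue.transforms import ResolveChoice

# Collapse the ambiguous int/long choice on "timestamp" into a single int column.
GF_resolved = ResolveChoice.apply(
    frame=GF_raw,
    specs=[("timestamp", "cast:int")],
    transformation_ctx="resolve timestamp type",
)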

ResolveChoice is a transform available in AWS Glue for DynamicFrames.

Hugo

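The short answer is that you cannot prevent it when creating a dynamic frame from the catalog because, as the name suggests, the schema is dynamic. When records disagree on a column's type, the DynamicFrame records that column as a choice type, which shows up in Spark as a struct such as struct<int:int,long:bigint>. See this SO question for more information.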
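
To make the "dynamic schema" point more concrete, here is a toy model in plain Python of how a column seen as both int and long could end up as a struct with one field per type, and what a cast:int style resolution does to it. This is only a sketch, not Glue's implementation: the helper names to_choice_struct and resolve_cast_int are hypothetical, and it assumes the reader picks the int or long slot based on whether the value fits in 32 bits.

```python
from typing import Dict, Optional

# Signed 32-bit range, used here as the assumed int/long boundary.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def to_choice_struct(value: int) -> Dict[str, Optional[int]]:
    """Toy model: put the value in the 'int' or 'long' slot by its size."""
    if INT32_MIN <= value <= INT32_MAX:
        return {"int": value, "long": None}
    return {"int": None, "long": value}


def resolve_cast_int(choice: Dict[str, Optional[int]]) -> Optional[int]:
    """Toy model of resolving the choice: return whichever slot is populated.

    Overflow handling for values outside the 32-bit range is not modeled.
    """
    if choice["int"] is not None:
        return choice["int"]
    return choice["long"]


# Hypothetical sample: one timestamp in seconds, one in milliseconds.
raw_timestamps = [1600000000, 1600000000000]
as_struct = [to_choice_struct(ts) for ts in raw_timestamps]
resolved = [resolve_cast_int(c) for c in as_struct]
```

In this model, a single millisecond-precision value among second-precision ones is enough to turn the whole column into a two-field struct, which is why the cast to timestamp fails until the choice is resolved.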

An alternative approach that is a little more compact:

gf_resolved = gf_raw.resolveChoice(specs=[("timestamp", "cast:int")])

Official documentation for the ResolveChoice class can be found here: AWS Resolve Choice

jpizzo