
I am trying to write a Spark DataFrame to AWS Keyspaces. Randomly, some of the records get updated while the rest throw this exception:

com.datastax.oss.driver.api.core.type.codec.CodecNotFoundException: Codec not found for requested operation: [INT <-> java.lang.String]
at com.datastax.oss.driver.internal.core.type.codec.registry.CachingCodecRegistry.createCodec(CachingCodecRegistry.java:609)
at com.datastax.oss.driver.internal.core.type.codec.registry.DefaultCodecRegistry$1.load(DefaultCodecRegistry.java:95)
at com.datastax.oss.driver.internal.core.type.codec.registry.DefaultCodecRegistry$1.load(DefaultCodecRegistry.java:92)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2276)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache.get(LocalCache.java:3951)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache.getOrLoad(LocalCache.java:3973)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4957)
at com.datastax.oss.driver.shaded.guava.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4963)
at com.datastax.oss.driver.internal.core.type.codec.registry.DefaultCodecRegistry.getCachedCodec(DefaultCodecRegistry.java:117)
at com.datastax.oss.driver.internal.core.type.codec.registry.CachingCodecRegistry.codecFor(CachingCodecRegistry.java:258)
at com.datastax.oss.driver.internal.core.data.ValuesHelper.encodePreparedValues(ValuesHelper.java:112)
at com.datastax.oss.driver.internal.core.cql.DefaultPreparedStatement.bind(DefaultPreparedStatement.java:158)

My Keyspaces table schema is:

CREATE TABLE test_ks.test_table_ttl (
    consumer_id TEXT PRIMARY KEY,
    ttl_col map<text, frozen<tuple<text, text>>>
);

The code block throwing the error is this:

val rowKey =   // some string
val mapKey =   // some string
val mapValue = mapValueTupleType.newValue(tuple_value)
val mapData = ImmutableMap.builder().put(mapKey, mapValue).build()
batch.addStatement(prep_statement.bind(mapData, rowKey)) // <--- error on this line
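
For context, the snippet above presumably relies on definitions along these lines. This is a hedged reconstruction: the CQL text, the session setup, and the tuple type are assumptions, since the question doesn't show them.

import com.datastax.oss.driver.api.core.CqlSession
import com.datastax.oss.driver.api.core.`type`.DataTypes

// Assumed session; in practice this would be configured for the Keyspaces endpoint
val session: CqlSession = CqlSession.builder().build()

// Assumed statement; the bind order (map first, key second) matches the snippet above
val prep_statement = session.prepare(
  "UPDATE test_ks.test_table_ttl SET ttl_col = ttl_col + ? WHERE consumer_id = ?")

// Tuple type matching frozen<tuple<text, text>> in the schema
val mapValueTupleType = DataTypes.tupleOf(DataTypes.TEXT, DataTypes.TEXT)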
  • Are you sure that rowKey & the other variables are of the string type? The error message says that some of them are integers. – Alex Ott Feb 14 '22 at 13:37
  • Yes, I am sure that they are of string type. I can see that in the AWS console as well. Out of 100 records, 70-80 are randomly written and the rest throw errors. Had they not been of string type, none of them should have worked. – shril Feb 14 '22 at 16:08
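
One way to verify the claim in this exchange: values pulled from a Spark Row via the untyped row.get(i) keep the column's runtime type, so an IntegerType column yields java.lang.Integer even if it looks like text in the console. A minimal runtime check, where the column index and logging are illustrative:

// Hypothetical check over the source DataFrame: flag any row whose key
// column is not actually a String at runtime.
myDataframe.rdd.foreach { row =>
  val key = row.get(0) // Any; column 0 assumed to hold consumer_id
  if (!key.isInstanceOf[String])
    println(s"unexpected type ${key.getClass.getName} for value $key")
}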

2 Answers


Try converting the DataFrame to an RDD and then writing it. Instead of a nested collection, try storing the data as a JSON blob.

import com.datastax.spark.connector._

// Convert the DataFrame to an RDD
val myRdd = myDataframe.rdd

// Helper that builds a CassandraConnector for the Keyspaces endpoint
implicit val c = connectToKeyspaces

myRdd.saveToCassandra("aws_sample_keyspace", "events_tables")
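
The JSON-blob suggestion can also stay in the DataFrame API. A hedged sketch, assuming the target table's ttl_col has been redefined as TEXT (to_json on map columns requires Spark 2.4+):

import org.apache.spark.sql.functions.{col, to_json}

// Serialize the nested map column to a JSON string before writing
val flattened = myDataframe.withColumn("ttl_col", to_json(col("ttl_col")))

flattened.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "aws_sample_keyspace", "table" -> "events_tables"))
  .mode("append")
  .save()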
MikeJPR
    Amazon Keyspaces now supports the Spark Cassandra Connector https://aws.amazon.com/about-aws/whats-new/2022/04/amazon-keyspaces-read-write-data-apache-spark/ – MikeJPR Apr 22 '22 at 21:16

Currently, AWS Keyspaces doesn't support frozen types. A bug in Keyspaces allowed the table to be created with a frozen type anyway, but inserting into it throws an exception.

The only workable approach is to store the data as JSON, as suggested by @MikeJPR.
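
A hedged sketch of that workaround, assuming ttl_col is redefined as TEXT, Jackson is on the classpath, and the prepared statement targets the revised column; tupleFirst and tupleSecond stand in for the two strings that previously went into the tuple:

import com.fasterxml.jackson.databind.ObjectMapper
import com.google.common.collect.{ImmutableList, ImmutableMap}

// Serialize the map structure to a JSON string on the client side
val mapper = new ObjectMapper()
val jsonValue = mapper.writeValueAsString(
  ImmutableMap.of(mapKey, ImmutableList.of(tupleFirst, tupleSecond)))

// Bind the JSON text instead of the map<text, frozen<tuple<text, text>>> value
batch.addStatement(prep_statement.bind(jsonValue, rowKey))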

shril