I am querying a Cosmos DB collection and am able to print the results. When I try to store the results in a Spark DataFrame, it fails.
I used this post as an example:
How to read data from Azure's CosmosDB in python
I followed the exact steps from that link. In addition, I am trying the following:
df = spark.createDataFrame(dataset)
This throws the following error:
ValueError: Some of types cannot be determined after inferring
ValueError Traceback (most recent call last)
in ()
25 print (dataset)
26
---> 27 df = spark.createDataFrame(dataset)
28 df.show()
29

/databricks/spark/python/pyspark/sql/session.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
808 rdd, schema = self._createFromRDD(data.map(prepare), schema, samplingRatio)
809 else:
--> 810 rdd, schema = self._createFromLocal(map(prepare, data), schema)
811 jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
812 jdf = self._jsparkSession.applySchemaToPythonRDD(jrdd.rdd(), schema.json())

/databricks/spark/python/pyspark/sql/session.py in _createFromLocal(self, data, schema)
440 write temp files.
441 """
--> 442 data, schema = self._wrap_data_schema(data, schema)
443 return self._sc.parallelize(data), schema
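
One thing that may be worth checking, stated as an assumption rather than a confirmed diagnosis: as far as I understand, PySpark raises this error when schema inference finds a field whose value is None in every row, so it cannot pick a type for it. The sketch below is plain Python and does not touch Spark. It assumes `dataset` is a list of dicts, as returned by the Cosmos DB query. `find_all_none_fields` is a hypothetical helper name and `sample_rows` is made-up data for illustration.

```python
# Hypothetical helper (not part of PySpark or the Cosmos DB SDK).
def find_all_none_fields(rows):
    """Return the sorted names of fields that are None in every row where they appear."""
    all_keys = set()
    non_null_keys = set()
    for row in rows:
        for key, value in row.items():
            all_keys.add(key)
            if value is not None:
                non_null_keys.add(key)
    # Fields that were seen but never held a non-None value.
    return sorted(all_keys - non_null_keys)


# Made-up documents standing in for the real query results (assumption).
sample_rows = [
    {"id": "1", "name": "a", "comment": None},
    {"id": "2", "name": "b", "comment": None},
]

suspect_fields = find_all_none_fields(sample_rows)
print(suspect_fields)  # prints ['comment']
```

If this reports any fields for the real `dataset`, one option might be to pass an explicit schema through the `schema` argument of `createDataFrame` (visible in the signature in the traceback above) so that no type inference is needed for those fields.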
What I want is to save the results as a Spark DataFrame.
Any help would be much appreciated. Thanks!