I'm using PySpark's create_map function to build a map column of key/value pairs. My problem is that as soon as I add key/value pairs with string values, the pairs with float values are all converted to strings too!
Does anyone know how to avoid this?
To reproduce my problem:
```python
import pandas as pd
import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").appName("test").getOrCreate()

test_df = spark.createDataFrame(
    pd.DataFrame(
        {
            "key": ["a", "a", "a"],
            "name": ["sam", "sam", "sam"],
            "cola": [10.1, 10.2, 10.3],
            "colb": [10.2, 12.1, 12.1],
        }
    )
)

test_df.withColumn(
    "test",
    F.create_map(
        F.lit("a"), F.col("cola").cast("float"),
        F.lit("b"), F.col("colb").cast("float"),
        F.lit("key"), F.lit("default"),
        F.lit("name"), F.lit("ext"),
    ),
).show()
```
If you inspect the map that gets created, the values for cola and colb are strings, not floats!