
I have a CSV file:

1577,true,false,false,false,true

I tried to load the CSV file with a custom schema:

import org.apache.spark.sql.types._

val customSchema = StructType(Array(
  StructField("id", StringType, nullable = false),
  StructField("flag1", BooleanType, nullable = false),
  StructField("flag2", BooleanType, nullable = false),
  StructField("flag3", BooleanType, nullable = false),
  StructField("flag4", BooleanType, nullable = false),
  StructField("flag6", BooleanType, nullable = false)))

val df = spark.read.schema(customSchema)
  .option("header", "false")
  .option("inferSchema", "false")
  .csv("mycsv.csv")

But the nullable property of the schema is not changing as expected:

df.printSchema
root
 |-- id: string (nullable = true)
 |-- flag1: boolean (nullable = true)
 |-- flag2: boolean (nullable = true)
 |-- flag3: boolean (nullable = true)
 |-- flag4: boolean (nullable = true)
 |-- flag6: boolean (nullable = true)
  • I think you need to cast as well: https://stackoverflow.com/questions/40526208/about-how-to-create-a-custom-org-apache-spark-sql-types-structtype-schema-object – Indrajit Swain Apr 09 '18 at 07:41
  • Also see this one: https://stackoverflow.com/questions/39917075/pyspark-structfield-false-always-returns-nullable-true-instead-of – Shaido Apr 09 '18 at 08:18
  • thanks for the help. I got a workaround from here https://stackoverflow.com/questions/47443483/how-do-i-apply-schema-with-nullable-false-to-json-reading?rq=1 – John Apr 09 '18 at 09:12

2 Answers


Please check the URLs below for details:

Spark DataFrame Schema Nullable Fields

How do I apply schema with nullable = false to json reading

Workaround

val rowDS = spark.read.textFile("mycsv.csv")
val df = spark.read.schema(customSchema).csv(rowDS)
df.printSchema()
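Another workaround (a sketch, reusing `spark` and `customSchema` from the question) is to rebuild the DataFrame from its RDD: unlike the file-based CSV reader, `spark.createDataFrame` keeps the nullability flags exactly as declared in the schema:

    // Sketch: load as before, then re-apply the schema on top of the result.
    val loaded = spark.read.schema(customSchema)
      .option("header", "false")
      .csv("mycsv.csv")

    // createDataFrame does not relax nullability, so the flags survive
    val enforced = spark.createDataFrame(loaded.rdd, customSchema)
    enforced.printSchema() // fields now report nullable = false

Note that this only changes the schema metadata; Spark will not validate that the underlying data is actually free of nulls.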

    // Create an RDD
    val rowRDD1 = spark.sparkContext.textFile("../yourfile.csv")

    // The schema is encoded in a string (six names, matching the six CSV columns)
    val schemaString = "id flag1 flag2 flag3 flag4 flag6"

    // Generate the schema based on the string of schema
    val fields = schemaString.split(" ")
      .map(fieldName => StructField(fieldName, StringType, nullable = true))

    val schema = StructType(fields)

    // Convert records of the RDD (rowRDD1) to Rows
    val rowRDD = rowRDD1
      .map(_.split(","))
      .map(attributes => Row(attributes(0), attributes(1), attributes(2),
        attributes(3), attributes(4), attributes(5)))

    // Apply the schema to the RDD
    val rowDF = spark.createDataFrame(rowRDD, schema)
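To actually get nullable = false with this approach (a sketch under the same assumptions as the code above), declare the fields as non-nullable before calling createDataFrame; this code path honors the flags as written:

    // Hypothetical variant: same schema string, but with nullable = false
    val strictFields = schemaString.split(" ")
      .map(fieldName => StructField(fieldName, StringType, nullable = false))

    val strictDF = spark.createDataFrame(rowRDD, StructType(strictFields))
    strictDF.printSchema() // every field now reports nullable = false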