Getting java.lang.NumberFormatException on a dataframe created by spark-csv

Question

When I read the CSV file with spark-csv and inferSchema=true, I can get the count of the resulting dataframe (df.count).

But after I removed the spaces from the column names, built a new schema, and created a new dataframe from the first dataframe's RDD, I get NumberFormatException: null when calling count (udpateddf.count).

java.lang.NumberFormatException: null
    at java.lang.Integer.parseInt(Integer.java:542)
    at java.lang.Integer.parseInt(Integer.java:615)
    at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
    at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:241)
    at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:116)
    at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:85)
    at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:128)
    at org.apache.spark.sql.execution.datasources
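
Reading the trace from the top, the first frames suggest that `Integer.parseInt` was handed a null string by `StringOps.toInt` while a CSV field was being cast to an integer column. That is only a reading of the trace, not a confirmed diagnosis. A minimal plain-Java sketch (no Spark involved) of that failure mode follows; the class `NullParseDemo` and the helper `parseOrNull` are hypothetical names for illustration, not part of spark-csv.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch (no Spark): how Integer.parseInt reacts to a null field.
class NullParseDemo {

    // Returns true if Integer.parseInt rejects null with NumberFormatException.
    static boolean throwsOnNull() {
        try {
            Integer.parseInt(null);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    // Hypothetical helper (not a Spark API): parse one CSV field, mapping
    // null or blank input to null instead of throwing.
    // Non-numeric text such as "abc" still throws NumberFormatException.
    static Integer parseOrNull(String field) {
        if (field == null || field.trim().isEmpty()) {
            return null;
        }
        return Integer.parseInt(field.trim());
    }

    public static void main(String[] args) {
        System.out.println(throwsOnNull());
        List<String> fields = Arrays.asList("42", " 7 ", "", null);
        for (String f : fields) {
            System.out.println(parseOrNull(f));
        }
    }
}
```

If this reading is right, the thing to check is which integer-typed column in the new schema receives a missing value, for example because the rebuilt schema has fewer or differently ordered columns than the file.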

Do you have all the column names in your new dataframe? It looks like that isn't the case. Can you post the schema of the dataframe using `df.printSchema`? - tourist, Jul 03 '18 at 20:22

0 Answers