This line of code is not working the way I thought it would:

val df2 = df1
  .withColumn("email_age", when('age_of_email <= 60, 1))
  .withColumn("email_age", when('age_of_email <= 120, 2))
  .withColumn("email_age", when('age_of_email <= 180, 3).otherwise(4))

I have thousands of rows in df1 where age_of_email is at most 60 or at most 120, but every row is being categorized as 3 or 4.

Any insight into why this is happening?

Aaron Hellman
  • How are we supposed to answer your question without knowing what library you're using or anything... – Falmarri Oct 24 '16 at 22:39
  • Is this what you are looking for? : `import org.apache.spark.sql._` , `import org.apache.spark.ml._` – Aaron Hellman Oct 24 '16 at 22:43
  • Why would you assume that we knew that you were using spark? – Falmarri Oct 24 '16 at 22:51
  • I've never used spark, but it looks like you can't use `.withColumn` on the same column more than once. Your last call is overwriting your previous ones. http://stackoverflow.com/questions/34908448/spark-add-column-to-dataframe-conditionally – Falmarri Oct 24 '16 at 22:56
  • You are using the same column name in all three withColumn calls. You should use a different name for each call. Also, an age that is <= 60 or <= 120 is <= 180 too :-) This is why you see all the groups as 3 (because the column name is the same). – Jegan Oct 25 '16 at 00:26

1 Answer

As people have said in the comments, calling withColumn with a column name that already exists in the DataFrame replaces that column, so only your last call takes effect.
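
To make the overwrite visible, here is a minimal sketch that runs the three-step version from the question next to the last step on its own. The object name, the helper names, the local SparkSession settings and the four sample ages are all made up for illustration, and it uses col("age_of_email") instead of the 'age_of_email symbol syntax only so that it does not depend on the symbol conversion being in scope.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical demo object; none of these names come from the question.
object WithColumnOverwrite {

  // The three calls from the question: each one replaces "email_age".
  def threeSteps(df: DataFrame): DataFrame = df
    .withColumn("email_age", when(col("age_of_email") <= 60, 1))
    .withColumn("email_age", when(col("age_of_email") <= 120, 2))
    .withColumn("email_age", when(col("age_of_email") <= 180, 3).otherwise(4))

  // Only the last call, for comparison.
  def lastStepOnly(df: DataFrame): DataFrame = df
    .withColumn("email_age", when(col("age_of_email") <= 180, 3).otherwise(4))

  def main(args: Array[String]): Unit = {
    // Assumed local session, just for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("with-column-overwrite")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: one age per category.
    val df1 = Seq(30, 90, 150, 200).toDF("age_of_email")

    threeSteps(df1).show()
    lastStepOnly(df1).show()

    spark.stop()
  }
}
```

Both tables should come out the same: 3 for the ages 30, 90 and 150, and 4 for 200, which matches the "everything is 3 or 4" result described in the question.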

To get what you want, you can either use a different column name for each categorization or simply chain the when() calls into a single column expression, like this:

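import org.apache.spark.sql.functions.when
// The 'age_of_email syntax also needs the session implicits in scope,
// e.g. import spark.implicits._ (assuming a SparkSession named spark).

// Chained when() clauses are checked in order; the first match wins.
val df2 = df1.withColumn("email_age", when('age_of_email <= 60, 1)
                                     .when('age_of_email <= 120, 2)
                                     .when('age_of_email <= 180, 3)
                                     .otherwise(4))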

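I guess you're aware that the conditions overlap: any age_of_email that is <= 60 is also <= 120 and <= 180, so categories 1 and 2 are subsets of category 3. The chained version still works because the when() clauses are evaluated in order and the first matching one wins.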
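
Because the conditions overlap, the order of the thresholds matters. The following is a plain-Scala model of that first-match-wins behaviour, not Spark code: the object EmailAgeBuckets, the bucket helper and the sample ages are hypothetical and only there to show what happens when the same thresholds are checked in the opposite order.

```scala
// Plain-Scala model of a chained when(...).otherwise(...); no Spark involved.
object EmailAgeBuckets {

  // (upper limit, label) pairs in ascending order, mirroring the chained when() calls.
  val ascending: Seq[(Int, Int)] = Seq(60 -> 1, 120 -> 2, 180 -> 3)

  // Returns the label of the first threshold the age fits under,
  // or the default if none matches (the role otherwise() plays).
  def bucket(thresholds: Seq[(Int, Int)], default: Int)(age: Int): Int =
    thresholds
      .collectFirst { case (limit, label) if age <= limit => label }
      .getOrElse(default)

  def main(args: Array[String]): Unit = {
    val ages = Seq(30, 90, 150, 200) // made-up sample ages

    // Ascending thresholds: each age lands in its own category.
    println(ages.map(bucket(ascending, 4)))         // List(1, 2, 3, 4)

    // Widest threshold first: it swallows the narrower categories.
    println(ages.map(bucket(ascending.reverse, 4))) // List(3, 3, 3, 4)
  }
}
```

So if you ever reorder the clauses, keep the narrowest condition first, or make the ranges explicit so they no longer overlap.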

LiMuBei