I have the dictionary below, which keeps feature definitions as strings.

    features = {
        "journey_email_been_sent_flag": "F.when(F.col('email_14days') > 0,F.lit(1)).otherwise(F.lit(0))",
        "journey_opened_flag": "F.when(F.col('opened_14days') > 0, F.lit(1)).otherwise(F.lit(0))"
    }
    retrieved_features = {}
    non_retrieved_features = {}

Alternatively, I keep the definitions as Column expressions themselves.

    from pyspark.sql import functions as F

    features = {
        "journey_email_been_sent_flag": F.when(F.col('email_14days') > 0, F.lit(1)).otherwise(F.lit(0)),
        "journey_opened_flag": F.when(F.col('opened_14days') > 0, F.lit(1)).otherwise(F.lit(0))
    }

Then I use the code below to retrieve the feature definitions.

    def feature_extract(*featurenames):
        for featurename in featurenames:
            if featurename in features:
                print(f"{featurename} : {features[featurename]}")
                retrieved_features[featurename] = features[featurename]
            else:
                print('failure')
                non_retrieved_features[featurename] = "Not found in the feature definition"
        return retrieved_features

And this is how I call the function to retrieve the features:

    feature_extract('journey_email_been_sent_flag','journey_opened_flag')

However, it does not work when I try to retrieve the feature. I receive the result below when keeping the definition in the dictionary:

    Out[19]: {'journey_email_been_sent_flag': Column<b'CASE WHEN (email_14days > 0) THEN 1 ELSE 0 END'>}

When I call the feature retrieval as below on the DataFrame:

    .withColumn('journey_email_been_sent_flag', feature_extract('journey_email_been_sent_flag'))

I get the error below:

    AssertionError: col should be Column
sbs
  • Please fix your indentation and describe *exactly* what doesn't work for you. As given, this is non-working code for which you want a code review -- not Stack Overflow. – Prune Nov 12 '20 at 20:16
  • Hello @Prune, did you get a chance to look at my issue? When I use Column definitions in the features, I get this result when I retrieve it: Out[19]: {'journey_email_been_sent_flag': Column<b'CASE WHEN (email_14days > 0) THEN 1 ELSE 0 END'>}. When I call it where I want to get the feature, .withColumn('journey_email_been_sent_flag', feature_extract('journey_email_been_sent_flag')), I get this error: AssertionError: col should be Column. Is there any way to fix it? – sbs Nov 14 '20 at 16:48
  • I see the question has a negative vote, which is why it is getting no responses. Can you advise, @Prune? – sbs Nov 14 '20 at 16:53
  • @UninformedUser do you have any thoughts here – sbs Nov 14 '20 at 16:56
  • @mck do you have any thoughts here – sbs Nov 14 '20 at 16:57
  • You have confused correlation and cause: the downvote and lack of response are from a common cause: you have not written a clear question to Stack Overflow guidelines. Please provide the expected [MRE](https://stackoverflow.com/help/minimal-reproducible-example). Show where the intermediate results deviate from the ones you expect. We should be able to paste a single block of your code into file, run it, and reproduce your problem. This also lets us test any suggestions in your context. – Prune Nov 14 '20 at 20:51
  • Also, please repeat [on topic](https://stackoverflow.com/help/on-topic) and [how to ask](https://stackoverflow.com/help/how-to-ask) from the [intro tour](https://stackoverflow.com/tour). As the posting guidelines say, "Make it easy for others to help you." – Prune Nov 14 '20 at 20:52

1 Answer

I was able to fix it this way.

I keep the feature definitions as Column expressions:

    features = {
        "journey_email_been_sent_flag": F.when(F.col('email_14days') > 0, F.lit(1)).otherwise(F.lit(0)),
        "journey_opened_flag": F.when(F.col('opened_14days') > 0, F.lit(1)).otherwise(F.lit(0))
    }

Then I call the feature_extract function, pull the single feature out of the returned dictionary with .get, and wrap it in F.lit:

    F.lit(feature_extract('journey_email_been_sent_flag').get('journey_email_been_sent_flag'))
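
The likely reason this works is the `.get(...)` lookup rather than `F.lit`: `feature_extract` returns a dictionary, while `withColumn` expects a single Column. As far as I know, `F.lit` hands back a Column unchanged when it is given one, so the wrapper should be optional. The sketch below illustrates the idea without a Spark session. The `Column` class and `with_column` function are hypothetical stand-ins (not the PySpark API) that only mimic the type check behind the error message, and `feature_extract` is rewritten to return fresh dictionaries on each call, which is an assumption about what is wanted rather than the original behaviour.

```python
# Hypothetical stand-in for pyspark.sql.Column, so the sketch runs without Spark.
class Column:
    def __init__(self, expr):
        self.expr = expr

    def __repr__(self):
        return f"Column<{self.expr!r}>"


def with_column(name, col):
    # Hypothetical stand-in that mimics the type check reported in the question.
    assert isinstance(col, Column), "col should be Column"
    return (name, col)


features = {
    "journey_email_been_sent_flag": Column("CASE WHEN (email_14days > 0) THEN 1 ELSE 0 END"),
    "journey_opened_flag": Column("CASE WHEN (opened_14days > 0) THEN 1 ELSE 0 END"),
}


def feature_extract(*featurenames):
    # Build fresh dicts on every call instead of mutating module-level state.
    retrieved = {}
    not_found = {}
    for name in featurenames:
        if name in features:
            retrieved[name] = features[name]
        else:
            not_found[name] = "Not found in the feature definition"
    return retrieved, not_found


retrieved, not_found = feature_extract("journey_email_been_sent_flag", "no_such_feature")

# Passing the whole dictionary reproduces the error from the question.
try:
    with_column("journey_email_been_sent_flag", retrieved)
    error_message = None
except AssertionError as exc:
    error_message = str(exc)

# Passing the single looked-up value satisfies the check.
result = with_column(
    "journey_email_been_sent_flag",
    retrieved["journey_email_been_sent_flag"],
)
```

With real PySpark, the equivalent call would presumably be `.withColumn(name, feature_extract(name)[name])`, using the original single-dictionary return value.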
sbs