I have the following dictionaries inside variables:

sk_channel_types = {"facebooknotification": 2,
                    "facebookmessenger": 9,
                    "onsitenotification": 3,
                    "pushnotification": 6,
                    "pushnotificationmessage": 6,
                    "lightbox": 4,
                    "onsitemessage": 7,
                    "mailmessage": 1}

sk_story_types = {"welcome": 7,
                  "rescue": 13,
                  "frequency": 4,
                  "abandoncart": 6,
                  "pricedrop": 16,
                  "manual": 5,
                  "searchbykeyword": 30,
                  "sazonality": 31,
                  "bestdayforpurchase": 28,
                  "pricechange": 32,
                  "availability": 33,
                  "toptrending": 1,
                  "toptrendingbycluster": 2,
                  "toptrendingwithpricelimit": 3,
                  "frequencyview": 4,
                  "manualnotification": 5,
                  "trending": 9,
                  "toptrendingbykeyword": 9}

And this is my current spark dataframe:

ID StoryType Type StoryId
abcdefghijklmnopqrst AbandonCart MailMessage 56465465456456456465
lçdkçlskdçlsdkçlskdç ManualNotification MailMessage 60983099380938390833
uahuahuahauhauahuaha ManualNotification MailMessage 49438093890484984949
sklçskçlskdkcnopeieo ManualNotification MailMessage 93084098409840984098
2d5fe941380938098948 ManualNotification MailMessage 49809380398094894844
9883jkjd3eu0dj0j3930 ManualNotification MailMessage 636f50c9380938093893

I need to replace the StoryType and Type columns with their respective numbers, as per the variables, like this:

ID StoryType Type StoryId
abcdefghijklmnopqrst 6 1 56465465456456456465
lçdkçlskdçlsdkçlskdç 5 1 60983099380938390833
uahuahuahauhauahuaha 5 1 49438093890484984949
sklçskçlskdkcnopeieo 5 1 93084098409840984098
2d5fe941380938098948 5 1 49809380398094894844
9883jkjd3eu0dj0j3930 5 1 636f50c9380938093893

How can I do this? Can I use a CASE expression with lower()? I'm new to PySpark.

  • If you invert those dictionaries (that is, `{2: "facebooknotification", 4: "lightbox",...}`), then it's trivial to use `apply` to lookup the new values. – Tim Roberts Feb 22 '22 at 19:51
  • Have a look here: https://stackoverflow.com/questions/42980704/pyspark-create-new-column-with-mapping-from-a-dict – ScootCork Feb 22 '22 at 20:00
  • @ScootCork Did not work :( – Caroline Leite Feb 22 '22 at 20:06
  • USE `from itertools import chain` `from pyspark.sql.functions import create_map, lit` `m_expr1 = create_map([lit(x) for x in chain(*sk_channel_types.items())])` `m_expr2 = create_map([lit(x) for x in chain(*sk_story_types.items())])` `df = df.withColumn('Type', m_expr1[lower(df['Type'])]).withColumn('StoryType', m_expr2[lower(df['StoryType'])])` – wwnde Feb 22 '22 at 22:04

1 Answer

Since the dictionaries are small, the efficient way is to turn them into DataFrames, broadcast them, and join them to the main dataset.

Hossein Torabi