1

Requirement

My requirement is to have a Python code extract some records from a database, format and upload a formatted JSON to a sink.

Planned approach

1. Create JSON-like templates for each record. E.g.

 json_template_str = '{{
   "type": "section",
   "fields": [
       {{
           "type": "mrkdwn",
           "text": "Today *{total_val}* customers saved {percent_derived}%."
       }}
     ]
 }}'

2. Extract records from DB to a dataframe.

3. Loop over dataframe and replace the {var} variables in bulk using something like .format(**locals()))

Question

I haven't worked with dataframes before. What would be the best way to accomplish Step 3 ? Currently I am

3.1 Looping over the dataframe objects 1 by 1 for i, df_row in df.iterrows():

3.2 Assigning

    total_val= df_row['total_val']
    percent_derived= df_row['percent_derived']

3.3 In the loop format and add str to a list block.append(json.loads(json_template_str.format(**locals()))

I was trying to use the assign() method in dataframe but was not able to figure out a way to use like a lambda function to create a new column with my expected value that I can use.

As a novice in pandas, I feel there might be a more efficient way to do this (which may even involve changing the JSON template string - which I can totally do). Will be great to hear thoughts and ideas.

Thanks for your time.

jconnor198
  • 13
  • 3

1 Answers1

1

I would not write a JSON string by hand, but rather create a corresponding python object and then use the json library to convert it into a string. With this in mind, you could try the following:

import copy
import pandas as pd

# some sample data
df = pd.DataFrame({
    'total_val': [100, 200, 300],
    'percent_derived': [12.4, 5.2, 6.5]
})

# template dictionary for a single block
json_template = {
    "type": "section",
    "fields": [
        {"type": "mrkdwn",
         "text": "Today *{total_val:.0f}* customers saved {percent_derived:.1f}%."
        }
    ]
}

# a function that will insert data from each row 
# of the dataframe into a block
def format_data(row):
    json_t = copy.deepcopy(json_template)
    text_t = json_t["fields"][0]["text"]
    json_t["fields"][0]["text"] = text_t.format(
        total_val=row['total_val'], percent_derived=row['percent_derived'])
    return json_t

# create a list of blocks
result = df.agg(format_data, axis=1).tolist()

The resulting list looks as follows, and can be converted into a JSON string if needed:

[{
    'type': 'section',
    'fields': [{
        'type': 'mrkdwn',
        'text': 'Today *100* customers saved 12.4%.'
    }]
}, {
    'type': 'section',
    'fields': [{
        'type': 'mrkdwn',
        'text': 'Today *200* customers saved 5.2%.'
    }]
}, {
    'type': 'section',
    'fields': [{
        'type': 'mrkdwn',
        'text': 'Today *300* customers saved 6.5%.'
    }]
}]
bb1
  • 7,174
  • 2
  • 8
  • 23
  • Thanks so much for taking the time to reply in detail. The goal was to have the format in a json file so the file can be changed without need for a change to py code. That was also the reason for looking at `.format(**locals()))` because we have the list of attributes in a data dictionary and the diff teams can use diff formats with the attributes they are using. I left that detail out in the question, so I actually will not use a string like `json_template_str = '{{` but read a json file that has the template code. – jconnor198 Dec 07 '21 at 02:54
  • Having said that, thanks again for taking the time and sharing your approach. And as I am a newbie here I am not allowed to upvote your answer. – jconnor198 Dec 07 '21 at 02:57