
I need help creating an NDJSON object from the following parsed data from one of the leading advertising platforms. I intend to upload the data to BigQuery.

I succeeded in creating NDJSON using pandas, but I can't control the data types, and that causes errors during loading (especially between ints and floats).

This is my object:

datadict = {
 'start_time': ['2019-03-26','2019-03-27','2019-03-28'],
 'id': ['campaignid10', 'campaignid10', 'campaignid10'],
 'impression': [100, 200, 0],
 'tweets' : [10, None, None]
}

Desired output (None should become null):

{"start_time": "2019-03-26", "id": "campaignid10", "impression": 100, "tweets": 10}
{"start_time": "2019-03-27", "id": "campaignid10", "impression": 200, "tweets": null}
{"start_time": "2019-03-28", "id": "campaignid10", "impression": 0, "tweets": null}
jarvis

1 Answer

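import functools
import operator

import ndjson  # third-party package: pip install ndjson


def transform(dd, days):
    """Turn a dict of equal-length columns into a list of `days` row dicts."""
    obs = days
    # data[idx] holds the idx-th value of every column, i.e. one row.
    data = [[lst[idx] for lst in list(dd.values())] for idx in range(obs)]
    # labels[idx] repeats the column names once per row.
    pre_label = [[elm]*obs for elm in list(dd.keys())]
    labels = [[lst[idx] for lst in pre_label] for idx in range(obs)]
    return [dict(zip(labels[i], data[i])) for i in range(obs)]


# Assumption: dd is a list of dicts shaped like datadict from the question.
dd = [datadict]
jsonList = [transform(_dd, 3) for _dd in dd]
# Flatten the list of per-dict row lists into a single list of rows.
jsonList = functools.reduce(operator.iconcat, jsonList, [])
output_ndjson = ndjson.dumps(jsonList)
print(output_ndjson)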

I would really appreciate it if anyone could help me simplify this solution.
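One possible simplification, offered as a minimal sketch rather than a definitive answer: the standard-library json module can do the whole job without pandas or the ndjson package. The helper name columns_to_ndjson is hypothetical, and the sketch assumes every column list has the same length (zip silently stops at the shortest one). Because the values never pass through a DataFrame, a column that mixes ints and None is not upcast to float, and json.dumps writes None as null.

```python
import json


def columns_to_ndjson(columns):
    """Convert a dict of equal-length lists into an NDJSON string.

    Hypothetical helper for illustration; assumes all lists have the same length.
    """
    keys = list(columns)
    # zip(*values) walks the columns in parallel, yielding one tuple per row.
    rows = (dict(zip(keys, values)) for values in zip(*columns.values()))
    # json.dumps writes None as null and keeps Python ints as JSON integers.
    return "\n".join(json.dumps(row) for row in rows)


datadict = {
    'start_time': ['2019-03-26', '2019-03-27', '2019-03-28'],
    'id': ['campaignid10', 'campaignid10', 'campaignid10'],
    'impression': [100, 200, 0],
    'tweets': [10, None, None],
}

print(columns_to_ndjson(datadict))
```

For several dicts like datadict, the per-dict strings can be joined with "\n" in the same way, so no reduce step is needed.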

jarvis