I am looping through a JSON Lines file, filtering for the sender ID and status and printing them to the terminal. Some items have multiple sender IDs, which are held in a list, while a single sender ID is just a string. I want to write the output to a single CSV file where the first column is STATUS and the second is SENDER_ID. I have attempted this at the top of my script, but I am not sure whether this is the right way to do it.

My script is as follows. At which point would I need to write to the CSV file? I have read through the documentation but am still a little unsure.

import json_lines

text_file = open("senderv1.csv", "a")
with open('specifications.jsonl', 'rb') as f:
    for item in json_lines.reader(f):
  • Please add some values from specifications.jsonl for more clarification. – Shan Ali Aug 28 '19 at 12:50
  • Wouldn't you want to append all values to a dict/list and convert it to a dataframe, to then export it to a .csv file? – Celius Stingher Aug 28 '19 at 12:56
  • Could you point to an example @CeliusStingher – london2012_dd Aug 28 '19 at 13:19
  • Sai kumar got ahead of me and posted the answer. There he is not opening an existing file but creating a new one. Do you want to open the existing file and be able to edit it, or do you want to create a new .csv file? – Celius Stingher Aug 28 '19 at 13:21
  • I want to create a new file, since I am filtering data from a jsonl file and outputting that to a CSV file @CeliusStingher – london2012_dd Aug 28 '19 at 13:23
  • You could try @sai kumar's answer, if it doesn't work, I'll help you get it right. – Celius Stingher Aug 28 '19 at 13:25
  • What's your question exactly? Is your code working? If yes and you're just looking for improvements, SO is not the right place; you want codereview instead. Otherwise, please explain __clearly__ the issue you're having (cf. https://stackoverflow.com/help/how-to-ask) – bruno desthuilliers Aug 28 '19 at 13:32
  • I'm not quite sure what the resulting CSV file should look like in cases of multiple senders. Your code tries to write a list to the file, which doesn't work, and you don't want to write the string representation of a Python list, as it is not meant to be saved and especially not loaded again in that format. So should the CSV file have a column per sender then? Or maybe a JSON array with multiple senders? Then you should consider also saving a single sender that way to get a more regular structure → no special casing when reading the data again. – BlackJack Aug 28 '19 at 13:42
  • @BlackJack My code is not working as it should. The output should essentially look like the following in CSV format: Active ADS Inactive [CDF,VDF] – london2012_dd Aug 28 '19 at 13:43
  • @CeliusStingher – london2012_dd Aug 28 '19 at 13:45

2 Answers

Using pandas, you can build a DataFrame and then save it as a CSV file. Hope this solves your problem.

import json_lines
import pandas as pd

single_sender_status = []
single_sender = []
with open('specifications.jsonl', 'rb') as f:
    for item in json_lines.reader(f):
        single_sender_status.append(item['status'])
        if 'sender_id' in item:
            # Single sender: the id is a plain string.
            single_sender.append(item['sender_id'])
        else:
            # Multiple senders: collect their ids into a list.
            # to_csv writes such a cell as the list's string representation.
            single_sender.append([sender['id'] for sender in item['senders']])

# One row per item: status in the first column, sender id(s) in the second.
df = pd.DataFrame({'STATUS': single_sender_status, 'SENDER_ID': single_sender})

df.to_csv('senderv1.csv', index=False)
sai kumar
  • You don't need such a huge and complex dependency as pandas just to write a basic, simple CSV file: there's already a csv module in the stdlib and it's just as easy to use. – bruno desthuilliers Aug 28 '19 at 13:30
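
A minimal sketch of that stdlib approach, assuming the two-column STATUS / SENDER_ID layout from the question. The `to_row` and `write_csv` helpers, the `;` separator used to join several IDs into one cell, and the sample items are assumptions made for illustration; the items are inlined so the snippet runs without the jsonl file.

```python
import csv
import io


def to_row(item, sep=";"):
    """Return [status, sender_ids] for one item (hypothetical helper)."""
    if "sender_id" in item:
        senders = item["sender_id"]
    else:
        # Several senders: join their ids into a single cell.
        senders = sep.join(sender["id"] for sender in item["senders"])
    return [item["status"], senders]


def write_csv(items, csv_file):
    """Write a header row followed by one row per item."""
    writer = csv.writer(csv_file)
    writer.writerow(["STATUS", "SENDER_ID"])
    for item in items:
        writer.writerow(to_row(item))


# Made-up sample items standing in for the parsed lines of the jsonl file.
sample_items = [
    {"status": "Active", "sender_id": "ADS"},
    {"status": "Inactive", "senders": [{"id": "CDF"}, {"id": "VDF"}]},
]

buffer = io.StringIO()
write_csv(sample_items, buffer)
print(buffer.getvalue())
```

To write a real file, pass a file opened with `open("senderv1.csv", "w", newline="", encoding="utf8")` instead of the `StringIO` buffer.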

Here is code that writes the CSV file with the csv module from the standard library, with the status in the first column and the senders in the following columns:

#!/usr/bin/env python3
import csv

import json_lines


def main():
    with json_lines.open("specifications.jsonl") as reader:
        # newline="" lets the csv module control line endings itself.
        with open("senderv1.csv", "w", newline="", encoding="utf8") as csv_file:
            writer = csv.writer(csv_file, delimiter="\t")
            for item in reader:
                row = [item["status"]]
                if "sender_id" in item:
                    row.append(item["sender_id"])
                elif "senders" in item:
                    row.extend(sender["id"] for sender in item["senders"])
                else:
                    raise ValueError("item with no sender information")
                writer.writerow(row)


if __name__ == "__main__":
    main()

Spreading the same kind of information across different columns isn't really good, but putting more than one value into a single cell isn't good either. CSV is best suited for two-dimensional tabular data. Maybe you want JSON (Lines) for the result too‽
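
The JSON Lines alternative could look roughly like this. It is only a sketch: the `normalise` and `write_json_lines` helpers, the output key names `status` and `senders`, and the sample items are assumptions, and it uses the standard `json` module for writing. Every record gets a list of senders, even when there is only one, so reading the data back needs no special casing.

```python
import io
import json


def normalise(item):
    """Return a record that always carries a list of sender ids (hypothetical helper)."""
    if "sender_id" in item:
        senders = [item["sender_id"]]
    elif "senders" in item:
        senders = [sender["id"] for sender in item["senders"]]
    else:
        raise ValueError("item with no sender information")
    return {"status": item["status"], "senders": senders}


def write_json_lines(items, out_file):
    """Write one JSON document per line."""
    for item in items:
        out_file.write(json.dumps(normalise(item)) + "\n")


# Made-up sample items standing in for the parsed lines of the jsonl file.
sample_items = [
    {"status": "Active", "sender_id": "ADS"},
    {"status": "Inactive", "senders": [{"id": "CDF"}, {"id": "VDF"}]},
]

buffer = io.StringIO()
write_json_lines(sample_items, buffer)
print(buffer.getvalue())
```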

BlackJack