I have a text file with something like a million lines that are formatted like this:

{"_id":"0e1daf84-4e4d-11ea-9f43-ba9b7f2413e0","parameterId":"visib_mean_last10min","stationId":"06193","timeCreated":1581590344449633,"timeObserved":1577922600000000,"value":11100}

The file has no header row. I want to be able to load it as an array.

I've tried this:

df = pd.read_csv("2020-01_2.txt", delimiter = ",", header = None, names = ["_id", "parameterId", "stationId", "timeCreated", "timeObserved", "value"])

While that does split each line into rows and columns the way I want, the first entry comes out as "_id":"0e1daf84-4e4d-11ea-9f43-ba9b7f2413e0", where I only want "0e1daf84-4e4d-11ea-9f43-ba9b7f2413e0".

How do I put only the value that comes after each ":" into the array?

jonrsharpe
KaspDL

That looks like a JSON file, not a CSV file – mousetail Sep 05 '20 at 18:39

Take a moment to read through the [editing help](//stackoverflow.com/editing-help) in the help center. Formatting on Stack Overflow is different than on other sites. The better your post looks, the easier it is for others to read and understand it. – Patrick Artner Sep 05 '20 at 18:39

Why use CSV on what's *not* a CSV file? It looks like someone's dumped a bunch of MongoDB documents into a JSON file, have you looked into where it comes from (can you just *connect to the DB*) or what options Python has for parsing it? – jonrsharpe Sep 05 '20 at 18:40

1 Answer

As @mousetail pointed out, this looks like a JSON file with one JSON object per line (a layout often called JSON Lines), so each line can be parsed on its own. You may want to do as follows:

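import json

mylist = []
with open("2020-01_2.txt") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines, which json.loads would reject
        mydict = json.loads(line)  # one JSON object per line
        # keep only the values, in the order the keys appear in the line
        mylist.append(list(mydict.values()))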

This builds a list of lists, one inner list per line of the file, holding that line's values in key order. Good luck!
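If some lines are missing a key, or list their keys in a different order, taking the values as they come would shift columns around. A minimal sketch of a variant that pins the column order is below. The helper name `load_json_lines` and the `COLUMNS` constant are made up for illustration, and the column list is an assumption taken from the single sample line in the question.

```python
import json

# Assumed column order, copied from the sample line in the question.
COLUMNS = ["_id", "parameterId", "stationId",
           "timeCreated", "timeObserved", "value"]


def load_json_lines(lines, columns=COLUMNS):
    """Parse an iterable of JSON-object strings into a list of rows.

    Each row holds the values for `columns`, in that order. A key that is
    missing from a record becomes None instead of shifting later columns.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        rows.append([record.get(name) for name in columns])
    return rows


# Demo on the sample line from the question. For the real file, pass the
# open file object instead:
#     with open("2020-01_2.txt") as f:
#         rows = load_json_lines(f)
sample = [
    '{"_id":"0e1daf84-4e4d-11ea-9f43-ba9b7f2413e0",'
    '"parameterId":"visib_mean_last10min","stationId":"06193",'
    '"timeCreated":1581590344449633,"timeObserved":1577922600000000,'
    '"value":11100}'
]
rows = load_json_lines(sample)
print(rows[0])
```

Since the question already uses pandas, it may be simpler to let pandas do the parsing: `pd.read_json("2020-01_2.txt", lines=True)` reads one JSON object per line straight into a DataFrame, and `pd.DataFrame(rows, columns=COLUMNS)` would wrap the rows built above.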

Erick