
I am using the rich library to pretty-print JSON data retrieved with aiohttp. It works well when printing the data directly from the API: the output is nicely formatted, with line breaks that make it easy to read:

{
    'city': 'Haidian',
    'region_code': 'BJ',
    'os': None,
    'tags': [],
    'ip': 1699530633,
    'isp': 'China Education and Research Network Center',
    'area_code': None,
    'longitude': 116.28868,
    'last_update': '2021-12-16T05:42:00.377583',
    'ports': [8888],
    'latitude': 39.99064,
    'hostnames': [],
    'postal_code': None,
    'country_code': 'CN',
    'country_name': 'China',
    'domains': [],
    'org': 'China Education and Research Network',
    'data': [
        {
            '_shodan': {'options': {}, 'id': '1d25e274-18ce-4a3d-8e1c-73e5bf35bf76', 'module': 'http-simple-new', 'crawler': '42f86247b760542c0192b61c60405edc5db01d55'},
            'hash': -1008250258,
            'os': None,
            'opts': {},
            'timestamp': '2021-12-16T05:42:00.377583',
            'isp': 'China Education and Research Network Center',
            'port': 8888,
            'hostnames': [],
            'location': {'city': 'Haidian', 'region_code': 'BJ', 'area_code': None, 'longitude': 116.28868, 'country_name': 'China', 'postal_code': None, 'country_code': 'CN', 'latitude': 39.99064},
            'ip': 1699530633,
            'domains': [],
            'org': 'China Education and Research Network',
            'data': 'GET / HTTP/1.1\r\nHost: 101.76.199.137\r\n\r\n',
            'asn': 'AS4538',
            'transport': 'tcp',
            'ip_str': '101.x.199.x'
        }
    ],
    'asn': 'AS4538',
    'ip_str': '101.x.199.x'
}

The program then stores each result in a dictionary, keyed by IP address:

ipInfo = {}
async def host(ip):
    ret = await fetch(ip) 
    ipInfo[ip] = ret

Then, after it has finished with a list of IP addresses, it writes this dictionary to a file. The issue I am having is that when I load this data to review it at a later time and attempt to parse it, the rich library does not format it nicely the way it does when the data comes straight from the API. It always ends up looking like:

[{'hash': -644847518, 'timestamp': '2021-12-27T15:08:16.109960', 'isp': 'VNPT Corp', 'transport': 'tcp', 'data': 'GET / HTTP/1.1\r\nHost: 113.x.185.x\r\n\r\n', 'asn': 'AS45899', 'port': 5555, 'hostnames': ['static.vnpt.vn'], 
'location': {'city': 'Vị Thanh', 'region_code': '73', 'area_code': None, 'longitude': 105.47012, 'latitude': 9.78449, 'postal_code': None, 'country_code': 'VN', 'country_name': 'Viet Nam'}, 'ip': 1906751888, 'domains': ['vnpt.vn'], 
'org': 'Vietnam Posts and Telecommunications Group', 'os': None, '_shodan': {'crawler': 'd905ab419aeb10e9c57a336c7e1aa9629ae4a733', 'options': {}, 'id': '33f5bd73-c7d7-4dc0-beb8-b17afb53d931', 'module': 'http-simple-new', 'ptr': 
True}, 'opts': {}, 'ip_str': '113.x.185.x'}], 'asn': 'AS45899', 'city': 'Vị Thanh', 'latitude': 9.78449, 'isp': 'VNPT Corp', 'longitude': 105.47012, 'last_update': '2021-12-27T15:08:16.109960', 'country_name': 'Viet Nam', 
'ip_str': '113.x.185.x', 'os': None, 'ports': [5555]}

And that does not work for me because I need to be able to actually read it. The code I am currently using to parse it looks like:

if argsc.parse:
    _print(f'Opening {argsc.parse}')
    with open(argsc.parse, 'r') as f:
        f = f.read()
        rich.print(f)
        exit(0)

I have tried using rich.print_json and parsing the dictionary entries one at a time, all sorts of things really. I did notice while writing this post that if the data is saved like it is in the first example with the nice newlines formatting then it does parse correctly, but I don't know how to do that either.

So my question is (I guess it is two questions): 1) How do I save the data from rich so that it is saved the way I see it on the screen? And: 2) How do I parse JSON data in a file so that it is displayed with the nice newline formatting seen in the first example? Is that even possible? Maybe that is the way it comes back from the API and it is being written differently. But I tried writing the data as-is without adding it to a dictionary, and that did not work either.

Tomerikoo
Chev_603

2 Answers


I figured it out. The answer was to save the output as if I were simply redirecting standard output to a file, as detailed in this blog. The function I ended up using is:

from rich import print as rprint

def write_formatted(data):
    """
    Func to write PARSABLE output 
    :param data:
    :return: nada, just print
    """
    # Append the pretty-printed output; the with block closes the file.
    with open(f'{argsc.output}-parsable.txt', 'a') as f:
        rprint(data, file=f)

And a function to later open that data and parse it with rich:

def parse_data():
    _print(f'Opening {argsc.parse}')
    with open(argsc.parse, 'r') as f:
        text = f.read()  # separate name, so the file handle f is not rebound
    rprint(text)
    exit(0)
Chev_603

When you read from a file you will get back a string. Rich won't do any formatting of that string, because it does not know that the string contains JSON.

You could decode that string into a Python object by using the built-in json module. Add `import json` to the top of your file, and then call `my_data = json.loads(f.read())`, which will give you a dict or list that you can then print.
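
A minimal sketch of that decoding step, assuming the file actually contains valid JSON (double-quoted strings, `null` rather than `None`). The helper name `load_saved_results` and the sample text are hypothetical and only stand in for the real file contents.

```python
import json


def load_saved_results(text):
    """Decode JSON text (for example the result of f.read()) into Python objects."""
    return json.loads(text)


# Hypothetical sample standing in for the contents of the saved file.
sample_text = '{"113.x.185.x": {"asn": "AS45899", "ports": [5555], "os": null}}'
ip_info = load_saved_results(sample_text)

# ip_info is now a dict rather than a str, so a pretty-printer such as
# rich.print(ip_info) can lay it out over several lines.
print(type(ip_info).__name__)
```

Once the text has been decoded, the printer receives a nested dict instead of one long string, which is what lets it insert line breaks and indentation.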

Alternatively, with Rich you can use the print_json function, which parses and pretty-prints a string containing JSON in a single step.

Put this at the start of your code:

from rich import print_json

Then add the following to your parse_data method:

print_json(f.read())
Will McGugan
  • I tried this, but this did not format the output using new lines when I was loading json from disc. It does however when the data is printed after coming back from the API before being written to disc. – Chev_603 Dec 30 '21 at 21:14
  • You may be conflating what is and is not JSON. For instance, your `write_formatted` function is writing a Python data structure, which looks like JSON but won't always parse as JSON. Strictly speaking, JSON is text, represented in Python as a string. You can print that with `print_json(text)`, which will format and style it for you. If you have data, i.e. a Python dict, list, etc., you can use `print_json(data=my_data)`, which will first encode that data as JSON and then print it. If you want to write JSON to a file, you should use `json.dump`. – Will McGugan Jan 01 '22 at 15:30
  • Oh that is a good point - my data is a dictionary, not actually json – Chev_603 Jan 03 '22 at 20:50
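
The distinction drawn in these comments can be sketched as a round trip through a file. This is only an illustration under stated assumptions: the `ip_info` sample and the `results.json` file name are hypothetical, and the file is written to a temporary directory so the snippet can run anywhere.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the ipInfo dictionary built in the question.
ip_info = {"101.x.199.x": {"city": "Haidian", "os": None, "ports": [8888]}}

# Hypothetical output path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "results.json")

# Write real JSON; indent=4 puts each key on its own line in the file.
with open(path, "w", encoding="utf-8") as f:
    json.dump(ip_info, f, indent=4, ensure_ascii=False)

# Read it back into Python objects.
with open(path, "r", encoding="utf-8") as f:
    loaded = json.load(f)

# The Python repr of the same data is not JSON (single quotes, None),
# so trying to decode it raises json.JSONDecodeError.
try:
    json.loads(repr(ip_info))
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False

print(loaded == ip_info, repr_is_json)
```

After loading, `loaded` is an ordinary dict, so it can be handed to `rich.print(loaded)` or `print_json(data=loaded)` as described in the comment above. The file on disk is also readable in a text editor, because `indent=4` wrote it with line breaks.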