What is the official way to save a list of dictionaries as JSONL (JSON Lines)? I saw https://ml-gis-service.com/index.php/2022/04/27/toolbox-python-list-of-dicts-to-jsonl-json-lines/, which offers this solution:

import gzip
import json


def dicts_to_jsonl(data: list, filename: str, compress: bool = True) -> None:
    """
    Method saves list of dicts into jsonl file.

    :param data: (list) list of dicts to be stored,
    :param filename: (str) path to the output file. If the .jsonl suffix is missing, the function
        appends it to the file name.
    :param compress: (bool) should file be compressed into a gzip archive?
    """

    sjsonl = '.jsonl'
    sgz = '.gz'

    # Check filename

    if not filename.endswith(sjsonl):
        filename = filename + sjsonl

    # Save data
    
    if compress:
        filename = filename + sgz
        with gzip.open(filename, 'w') as compressed:
            for ddict in data:
                jout = json.dumps(ddict) + '\n'
                jout = jout.encode('utf-8')
                compressed.write(jout)
    else:
        with open(filename, 'w') as out:
            for ddict in data:
                jout = json.dumps(ddict) + '\n'
                out.write(jout)

But it feels odd to copy and paste code for this. Is there really no existing Stack Overflow question that covers it, and does the standard json library not mention it anywhere?

Charlie Parker

1 Answer

I don't know if this is the "official way" but I prefer it and it fixed some bugs:

from pathlib import Path
from typing import Union


def expanduser(path: Union[str, Path]) -> Path:
    """
    Expand a leading ~ in path and return the result as a Path.

    note: Path objects are immutable, so this function never modifies its argument. Always use
    the returned value, whether you pass in a str or a Path.
    :param path: str or Path, possibly starting with ~
    :return: the expanded Path
    """
    # Path() accepts both str and Path, so no isinstance check is needed
    path = Path(path).expanduser()
    assert '~' not in str(path), f'Path username was not expanded properly see path: {path=}'
    return path

def dicts_to_jsonl(data_list: list[dict], path2filename: Union[str, Path], compress: bool = False) -> None:
    """
    Method saves list of dicts into jsonl file.
    :param data_list: (list) list of dicts to be stored,
    :param path2filename: (str or Path) path to the output file. If the .jsonl suffix is missing, the
        function appends it to the file name.
    :param compress: (bool) should file be compressed into a gzip archive?

    credit:
        - https://stackoverflow.com/questions/73312575/what-is-the-official-way-to-save-list-of-dictionaries-as-jsonl-or-json-lines
        - https://ml-gis-service.com/index.php/2022/04/27/toolbox-python-list-of-dicts-to-jsonl-json-lines/
    """
    import gzip
    import json

    sjsonl = '.jsonl'
    sgz = '.gz'
    # Check filename
    if not str(path2filename).endswith(sjsonl):
        path2filename = Path(str(path2filename) + sjsonl)
    # expanduser returns a new Path, so keep its result
    path2filename = expanduser(path2filename)
    # Save data

    if compress:
        # Path does not support +, so build the .gz name from its string form
        filename = str(path2filename) + sgz
        with gzip.open(filename, 'w') as compressed:
            for ddict in data_list:
                jout = json.dumps(ddict) + '\n'
                jout = jout.encode('utf-8')
                compressed.write(jout)
    else:
        with open(path2filename, 'w') as out:
            for ddict in data_list:
                jout = json.dumps(ddict) + '\n'
                out.write(jout)

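As far as I know, the standard json module has no dedicated JSON Lines writer, so some small helper like the one above is needed either way. For comparison, here is a more compact sketch of the same idea that also reads the file back. The names write_jsonl and read_jsonl are my own (hypothetical, not from any library), and picking gzip from a .gz suffix rather than a compress flag is just one possible design:

```python
import gzip
import json
from pathlib import Path
from typing import Iterable, Union


def write_jsonl(records: Iterable[dict], path: Union[str, Path]) -> Path:
    """Write one JSON object per line; gzip the output if path ends with .gz."""
    path = Path(path).expanduser()
    # Both open() and gzip.open() support text mode, so one code path covers both cases.
    opener = gzip.open if path.suffix == '.gz' else open
    with opener(path, 'wt', encoding='utf-8') as f:
        for record in records:
            # json.dumps escapes newlines inside strings, so each record stays on one line.
            f.write(json.dumps(record) + '\n')
    return path


def read_jsonl(path: Union[str, Path]) -> list:
    """Read a .jsonl or .jsonl.gz file back into a list of dicts."""
    path = Path(path).expanduser()
    opener = gzip.open if path.suffix == '.gz' else open
    with opener(path, 'rt', encoding='utf-8') as f:
        # Skip blank lines, e.g. a trailing empty line at the end of the file.
        return [json.loads(line) for line in f if line.strip()]
```

Opening in text mode ('wt' / 'rt') removes the manual .encode('utf-8') step, because gzip.open then handles the encoding itself. Calling write_jsonl(data_list, '~/data.jsonl.gz') followed by read_jsonl on the returned path should give back the original list, provided every record is JSON-serializable.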
Charlie Parker