0

Many thanks in advance, I have been keen on collecting some insights on getting the below output from its corresponding input. Would like to see the table getting converted to a desirable format via using a python script as I have to work with a huge CSV at a later stage. Any inputs are highly appreciated.

Input CSV:

reference mcc value currency
10000 5300 134.09 USD
10001 5651 128.95 USD
10002 5912 104.71 USD

Used python code:

from csv import DictReader
from itertools import groupby
from pprint import pprint
import json

with open('Test_bulk_transactions_data.csv') as csvfile:
    r = DictReader(csvfile, skipinitialspace=True)
    data = [dict(d) for d in r]

    group = []
    uniquekeys = []

    for k, g in groupby(data, lambda r: (r['reference'], r['mcc'])):
        group.append({
            "reference": k[0],
            "mcc": k[1],
            "amount": [{k:v for k, v in d.items() if k not in ['reference','mcc']} for d in list(g)]})
        uniquekeys.append(k)

print(json.dumps(group, indent = 3) + '}')

Current Output:

  {
   "reference": "10000",
   "mcc": "5300",
   "amount": [
    {
     "value": "134.09",
     "currency": "USD"
    }
   ]
  },
  {
   "reference": "10001",
   "mcc": "5651",
   "amount": [
    {
     "value": "128.95",
     "currency": "USD"
    }
   ]
  },
  {
   "reference": "10002",
   "mcc": "5912",
   "amount": [ 
    {
     "value": "104.71",
     "currency": "USD"
    }
   ]
  }

Desired Output JSON:

  {
   "reference": "10000",
   "mcc": "5300",
   "amount": {
     "value": 134.09,
     "currency": "USD"
    }
  },
  {
   "reference": "10001",
   "mcc": "5651",
   "amount": {
     "value": 128.95,
     "currency": "USD"
    }
  },
  {
   "reference": "10002",
   "mcc": "5912",
   "amount": {
     "value": 104.71,
     "currency": "USD"
    }
  }
  • Important Note: Amount shouldn't fall into [] and that the value should stand as a numeric output and not strings.
Prasad
  • 5
  • 2
  • Please, check [ask]. What have you tried so far and what specific problem you are not able to solve? Do you have problem open/read/parse csv, file, create desired data structure, dump it to JSON? – buran Dec 23 '21 at 09:58
  • I've edited the question, hopefully now it works ? @buran – Prasad Dec 23 '21 at 10:17

2 Answers2

0
import csv

csv_filepath =  "/home/mhs/test.csv"
 
output = []


with open(csv_filepath) as cd:
    csvReader = csv.DictReader(cd)
    for r in csvReader:
        r["amount"] = {"value": float(r.pop("value")), "currency": r.pop("currency")}
        output.append(r)
MHS
  • 2,260
  • 11
  • 31
  • 45
0

Without relying on imported modules, you could just do this:

J = []

with open('input.csv') as csv:
    cols = next(csv).split()
    assert len(cols) == 4
    for row in csv:
        t = row.split()
        if len(t) == 4:
            J.append({cols[0]: t[0], cols[1]: t[1], "amount": {cols[2]: float(t[2]), cols[3]: t[3]}})


print(J)
DarkKnight
  • 19,739
  • 3
  • 6
  • 22
  • Hi, I tried this and it's giving an error: 'IndexError: list index out of range' – Prasad Dec 23 '21 at 10:38
  • Your input file may contain empty lines. I'll edit the code to add a sanity check. Also, in the absence of information to the contrary, this code assumes that the columns are whitespace delimited – DarkKnight Dec 23 '21 at 10:43
  • Hi again, I tried the updated code and now it is giving "AssertionError: " :) – Prasad Dec 23 '21 at 10:48
  • That's because the first line in your csv file does not contain 4 whitespace delimited tokens. If you could show what your csv file actually looks like you'd be a whole lot better off – DarkKnight Dec 23 '21 at 10:57
  • Sure, I'll try to find a fix, thanks for many help :) – Prasad Dec 23 '21 at 11:28
  • Some history may help you. CSV stands for Comma-Separated Values. However, over time this has become a generic term for files containing tokens (values) separated by anything that's unambiguous in the context of the tokens themselves. So, a CSV might use comma, whitespace, pipe, colon, semi-colon, exclamation mark - just anything. If your file is using commas you just need to change the *split()* to *split(',')* although you may need to trim (strip) any text values to remove redundant whitespace – DarkKnight Dec 23 '21 at 13:06
  • Thanks for the kind explanation and sharing insights :) It has been really helpful to understand handling CSVs, much appreciated..! – Prasad Dec 23 '21 at 14:45