
I have data that looks like this:

account_id, timestamp, network, balance
z11ldsm3, 08-26-2023, coinbase, [{'BTC', 1.23, 32000}, {'USDC', 12500, 12500}, ...]

That list of coins in "balance" can be N entries long; each entry contains the coin type, the current balance, and the total USD value of the holdings at that time for that coin. Additionally, there can be up to M wallets.

This value is logged and stored daily, or per user execution. I assemble transactions from the API data each day and update the records.

The problem is that I am not sure storing it like that in Postgres is efficient. I've seen some places online say the maximum size for a varchar is 1 GB, but that seems heinously large and I'm not sure if it's correct.

Is there a better way to store data in this format? Should I store those values as records in a different table and query them from there? Should I store them individually per coin?

I.e.:

account_id, timestamp, network, coin, balance, amount
z11ldsm3, 08-26-2023, coinbase, 'BTC', 1.23, 32000
z11ldsm3, 08-26-2023, coinbase, 'USDC', 12500, 12500

Ultimately, this will be displayed as a table of values showing the balance for each wallet, in a similar format to the nested list from above.

Any advice is appreciated.

birdman
  • Yes, you should absolutely store the individual currencies separately, like your second example. That makes it much easier to do summaries by currency (`SELECT SUM(amount) FROM xxx WHERE coin='BTC'`). That also allows Postgres to do better optimization of the storage, rather than having a bunch of long strings it doesn't understand. – Tim Roberts Aug 28 '23 at 18:40
  • Build the correct schema as per your second example. Here is how you can populate the schema: https://stackoverflow.com/a/76995261/13808319 – Mike Organek Aug 28 '23 at 19:00
  • Do some database normalization, and focus on 3NF: https://en.wikipedia.org/wiki/Database_normalization – Frank Heikens Aug 28 '23 at 19:23
  • 1) Best solution, follow the advice in the comments above and normalize your data. 2) If you want to persist in this inefficient manner and are using `psycopg2` then declare `balance` to be [JSONB](https://www.postgresql.org/docs/current/datatype-json.html). Then use `psycopg2` [JSON adaptation](https://www.psycopg.org/docs/extras.html#adapt-json) to convert a `list` of `dicts` to a JSON `array` of `objects` once you change to `{'coin': 'BTC', 'balance': 1.23, 'amount': 32000}`. Or some other data structure. – Adrian Klaver Aug 28 '23 at 19:42
  • Definitely normalize the table as in your example. You can even add `check constraints` to the column `coin`, so that only valid coins are stored, or, preferably, controlled by another table via a foreign key. cheers – Jim Jones Aug 29 '23 at 06:12
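
Adrian Klaver's second option above (keeping one row per snapshot and storing the coin list as JSONB) could look roughly like the sketch below. This is a minimal, hypothetical example: the table name `wallet_snapshots`, the column types, and the connection string are assumptions, not something from the original post.

```python
# Minimal sketch of the JSONB option, using psycopg2.
# Table and column names are illustrative assumptions.
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=mydb")  # adjust the connection string as needed
cur = conn.cursor()

# One row per snapshot; the per-coin list is stored as a JSONB array of objects.
cur.execute("""
    CREATE TABLE IF NOT EXISTS wallet_snapshots (
        account_id text  NOT NULL,
        ts         date  NOT NULL,
        network    text  NOT NULL,
        balance    jsonb NOT NULL
    )
""")

balances = [
    {"coin": "BTC", "balance": 1.23, "amount": 32000},
    {"coin": "USDC", "balance": 12500, "amount": 12500},
]

# Json() adapts the Python list of dicts to a JSON array of objects on insert.
cur.execute(
    "INSERT INTO wallet_snapshots (account_id, ts, network, balance)"
    " VALUES (%s, %s, %s, %s)",
    ("z11ldsm3", "2023-08-26", "coinbase", Json(balances)),
)
conn.commit()
```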

1 Answer


The schema below makes more sense than the former one for several reasons:

  • Easy to retrieve data based on coins, balances, amounts, etc.
  • The timestamp matters in scenarios like this; each transaction should have its own entry in the table, so you can retrieve hourly/daily/weekly transactions
  • Easy to analyze the data without manipulating a JSON blob

account_id, timestamp, network, coin, balance, amount
z11ldsm3, 08-26-2023, coinbase, 'BTC', 1.23, 32000
z11ldsm3, 08-26-2023, coinbase, 'USDC', 12500, 12500
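
A rough sketch of what the DDL and the daily load could look like with psycopg2 is below; the table name `wallet_balances`, the numeric precisions, and the primary key are illustrative assumptions, not something stated in the question or answer.

```python
# Minimal sketch of the normalized schema above, using psycopg2.
# Table name, numeric precisions, and the primary key are assumptions.
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # adjust the connection string as needed
cur = conn.cursor()

# One row per (account, day, network, coin).
cur.execute("""
    CREATE TABLE IF NOT EXISTS wallet_balances (
        account_id text           NOT NULL,
        ts         date           NOT NULL,
        network    text           NOT NULL,
        coin       text           NOT NULL,
        balance    numeric(30, 8) NOT NULL,  -- coin units held
        amount     numeric(18, 2) NOT NULL,  -- USD value of the holding
        PRIMARY KEY (account_id, ts, network, coin)
    )
""")

# Each coin in the daily snapshot becomes its own row.
rows = [
    ("z11ldsm3", "2023-08-26", "coinbase", "BTC", 1.23, 32000),
    ("z11ldsm3", "2023-08-26", "coinbase", "USDC", 12500, 12500),
]
cur.executemany(
    "INSERT INTO wallet_balances (account_id, ts, network, coin, balance, amount)"
    " VALUES (%s, %s, %s, %s, %s, %s)",
    rows,
)
conn.commit()

# Per-coin summaries become straightforward, e.g. total USD value held in BTC:
cur.execute("SELECT SUM(amount) FROM wallet_balances WHERE coin = %s", ("BTC",))
print(cur.fetchone()[0])
```

If you want to restrict `coin` to known values, as suggested in the comments, you could also add a `CHECK` constraint or, preferably, a separate coins reference table with a foreign key.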
C Patel