I'm trying to run some analysis on cryptocurrency (e.g. Bitcoin, Ethereum) data but am having trouble finding data sources. For example, I'd like to collect transaction data such as input address, output address, transaction time, and transaction amount for Ethereum.

I've found that I can access Ethereum data with web3py, but is it possible to get data for all transactions made recently across the entire Ethereum network, not just the transactions connected to my own wallet (address)? For example, I'd like to get data on all Ethereum transactions that occurred today.

Also, do I need my own Ethereum wallet (address) in order to access this data with web3py? I wonder whether I need a specific address as a starting point, or whether I can just scrape the data without creating a wallet.

Thanks.

2 Answers

> For example, I'd like to collect transaction data such as input address, output address, transaction time, transaction amount, etc. for Ethereum.

You can iterate over all blocks and their transactions using the web3.eth.get_block call. You do, however, need to parse the transaction content yourself.
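As a rough sketch of that iteration, assuming get_block is called with full_transactions=True so that each block carries full transaction objects rather than just hashes: the helper names block_to_rows and collect_rows are made up for illustration, and get_block is passed in as a parameter so that any block source with the same call shape can be used (for example web3.eth.get_block on a connected Web3 instance).

```python
from datetime import datetime, timezone
from decimal import Decimal

WEI_PER_ETHER = Decimal(10) ** 18


def block_to_rows(block):
    """Flatten one block (fetched with full transaction objects) into plain dicts."""
    when = datetime.fromtimestamp(block["timestamp"], tz=timezone.utc)
    rows = []
    for tx in block["transactions"]:
        rows.append({
            "block": block["number"],
            "time": when,                      # block time; transactions carry no own timestamp
            "from": tx["from"],
            "to": tx.get("to"),                # None for contract-creation transactions
            "value_eth": Decimal(tx["value"]) / WEI_PER_ETHER,  # value is reported in wei
        })
    return rows


def collect_rows(get_block, first, last):
    """Walk blocks first..last (inclusive) and gather one row per transaction.

    get_block is assumed to behave like web3.eth.get_block.
    """
    rows = []
    for number in range(first, last + 1):
        rows.extend(block_to_rows(get_block(number, full_transactions=True)))
    return rows
```

With a connected Web3 instance w3 and latest = w3.eth.block_number, this could be called as collect_rows(w3.eth.get_block, latest - 9, latest) to cover the ten most recent blocks. Token transfers and contract calls still need their input data decoded separately, which is the "parse it yourself" part.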

To access all the data, it is recommended that you run your own node so that you have the maximum network bandwidth for JSON-RPC calls.
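Whichever node serves the requests, narrowing the block range keeps the number of JSON-RPC calls down. For "today's" transactions, one possible approach (a sketch, not something the answer prescribes) is a binary search over block timestamps, which relies on each block's timestamp being later than its parent's. The function names below are hypothetical, and get_block is again assumed to behave like web3.eth.get_block.

```python
from datetime import datetime, timezone


def start_of_day_utc(moment):
    """Unix timestamp of 00:00 UTC on the day containing `moment` (timezone-aware)."""
    day = moment.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return int(day.timestamp())


def first_block_at_or_after(get_block, target_ts, latest_number):
    """Lowest block number whose timestamp is >= target_ts.

    Returns latest_number + 1 when no block that recent exists yet.
    Needs roughly log2(latest_number) calls to get_block.
    """
    lo, hi = 0, latest_number + 1
    while lo < hi:
        mid = (lo + hi) // 2
        if get_block(mid)["timestamp"] < target_ts:
            lo = mid + 1   # block is too old, search the upper half
        else:
            hi = mid       # block qualifies, but an earlier one might too
    return lo
```

Given a connected Web3 instance w3, first_block_at_or_after(w3.eth.get_block, start_of_day_utc(datetime.now(timezone.utc)), w3.eth.block_number) would give the first block of the current UTC day; every block from there to the latest one then belongs to "today".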

> Also, do I need my own Ethereum wallet (address) in order to access this data with web3py?

An address is just a value derived from a random number (the private key), and you do not need to generate one to read chain data.

Mikko Ohtamaa

The following code should help you access the most recent blocks, assuming you already have an Infura Project ID:

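import pandas as pd
from web3 import Web3

ethereum_mainnet_endpoint = f'https://mainnet.infura.io/v3/{INFURA_PROJ_ID}'
web3 = Web3(Web3.HTTPProvider(ethereum_mainnet_endpoint))
assert web3.is_connected()  # named isConnected() in web3.py v5

# ethBlocks is assumed to be a list of block dicts, e.g. dict(web3.eth.get_block(n)) for each block number n
eth_block_df = pd.DataFrame(ethBlocks).set_index('number')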

Once you've accessed the most recent blocks, you can loop through each of their transaction hashes and create a new dataset from them:

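def decoder(txns):
  """
  Converts a block's raw transaction hashes into 0x-prefixed hex strings
  """
  block = []
  for i in txns:
    tx_hash = '0x' + bytes(i).hex()  # renamed from `hash` to avoid shadowing the built-in
    block.append(tx_hash)

  return block

eth_block_df['transactions_0x'] = eth_block_df['transactions'].apply(decoder)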

def transaction_decoder(hashes):
  """
  Generates a list of ETH transactions per row
  """
  txn_dets = []
  for i in hashes:
    txn = web3.eth.get_transaction(str(i))
    txn_dets.append(dict(txn))
  
  return txn_dets

def transaction_df(series):
  """
  Converts a list of lists of Ethereum transactions into a single DataFrame.
  """
  obj = series.apply(transaction_decoder)
  main = []
  for row in obj:
    for txn in row:
      main.append(txn)

  eth_txns_df = pd.DataFrame(main, columns=main[0].keys())

  return eth_txns_df

eth_txns_df = transaction_df(eth_block_df['transactions_0x'])
print(eth_txns_df.shape)

I used this code recently for a project I'm still working on, so it's probably not the most efficient or the cleanest solution, but it gets the job done.

Hope that helps!