
The problem I'm faced with is backfilling a specialized database with data from the event logs of a given smart contract on an Ethereum blockchain.

The question, however, is how to do so without running into the limits of eth_getLogs (and, even in the absence of hard limits, how to keep RPC responses reasonably sized).

What I tried so far

I prefer to use Infura, but they cap this call at 100 entries per response, and rightfully so: querying should be done in small chunks for load balancing etc. Is API pagination + eth_getLogs the right way to collect data for backfills?

Idea 1: eth_getLogs on ranges of blocks

I don't know of any way to paginate eth_getLogs other than querying ranges of blocks. A single block may contain more than 100 events, however, which prevents me from reading all of the data when using Infura. Maybe there is a way to paginate on log index? (100 is a number I came across when experimenting, but I can't find documentation on it.)
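To make the range-based approach concrete, here is a minimal sketch of the chunking I have in mind: split the backfill window into fixed-size block ranges and issue one eth_getLogs call per range. The chunk size and block numbers below are illustrative; the size would have to be tuned so that no single window can exceed the provider's per-response log cap.

```go
package main

import "fmt"

// blockRanges splits the inclusive window [from, to] into sub-ranges of at
// most `size` blocks, so each eth_getLogs call covers a bounded window.
// (Hypothetical helper; chunk size must be tuned to the provider's limits.)
func blockRanges(from, to, size uint64) [][2]uint64 {
	var ranges [][2]uint64
	for start := from; start <= to; start += size {
		end := start + size - 1
		if end > to {
			end = to
		}
		ranges = append(ranges, [2]uint64{start, end})
	}
	return ranges
}

func main() {
	// One eth_getLogs request would be issued per printed range.
	for _, r := range blockRanges(5000000, 5000250, 100) {
		fmt.Printf("eth_getLogs fromBlock=%d toBlock=%d\n", r[0], r[1])
	}
}
```

Note this still breaks down if a single block holds more than 100 matching logs, which is exactly the case I'm worried about.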

Idea 2: log filters

Using a filter RPC call is another option: i.e. start a "watcher" on a range of old blocks. I tried this, but the Infura websocket RPC I am using doesn't seem to give any response, and neither does Ganache when testing locally. Live (non-archive) log watching works, so I know my code behaves as intended at least. (My go-ethereum generated Watch... binding call succeeds, but produces no responses on the output channel when I specify an old block in bind.WatchOpts.Start.)
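At the raw RPC level, this filter approach amounts to an eth_newFilter call with a historical fromBlock, after which the returned filter ID is polled with eth_getFilterLogs / eth_getFilterChanges. Whether a given provider actually honors a historical fromBlock on such a filter seems to vary, which may explain the silence I'm seeing. A sketch of just the request payload (stdlib only; the contract address is a placeholder):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// newFilterRequest builds the JSON-RPC payload for eth_newFilter over a
// historical block range. Block numbers are hex "quantity" strings per the
// Ethereum JSON-RPC spec. The address argument is a placeholder.
func newFilterRequest(address string, from, to uint64) ([]byte, error) {
	filter := map[string]string{
		"address":   address,
		"fromBlock": fmt.Sprintf("0x%x", from),
		"toBlock":   fmt.Sprintf("0x%x", to),
	}
	return json.Marshal(map[string]interface{}{
		"jsonrpc": "2.0",
		"id":      1,
		"method":  "eth_newFilter",
		"params":  []interface{}{filter},
	})
}

func main() {
	body, err := newFilterRequest("0x0000000000000000000000000000000000000000", 5000000, 5000099)
	if err != nil {
		panic(err)
	}
	// The filter ID this call returns would then be polled for log entries.
	fmt.Println(string(body))
}
```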


Does anyone have any suggestions on how to retrieve large amounts of log data? Or a link to other projects that tackled this problem?

  • Any reason you can't use a fully synced local node instead of using Infura to initialize the database? I would take this approach to process legacy events then switch over to Infura for my live updates. – Adam Kipnis Mar 17 '18 at 18:51
  • @AdamKipnis Thanks for the quick input. Yes, that sort of works, and I will take that route if there's no better solution, but I prefer to tackle the API problem itself. The backfill is meant to remove inconsistencies/failed updates and is run every X minutes for the last Y blocks. So I prefer a setup where it can run consistently without much maintenance, and where I can spread the log query workload to play nice with Infura / other platforms. – protolambda Mar 17 '18 at 19:38

0 Answers