Like I said in the question, I figured this would be pretty simple, especially with the requests library, so I developed a script to do it. I started with Riak's keys=true (i.e. non-chunked) mode, but that failed on my larger buckets. I switched to chunked mode (keys=stream), but then the output was no longer a single JSON object; it was a series of concatenated objects (i.e. {...}{...}...{...}).
A colleague provided me with a regex to split the individual JSON objects out of the aggregated Riak response, and I parsed and processed each one in turn (there's a small standalone illustration of that splitting step after the script). Not too bad. Here's the code:
#!/usr/bin/python
# script to delete all keys in a Riak bucket
import json
import re
import requests
import sys

def processChunk(chunk):
    global key_count
    obj = json.loads(chunk.group(2))
    if 'keys' in obj:
        for key in obj['keys']:
            r = requests.delete(sys.argv[1] + '/' + key)
            print 'delete key', key, 'response', r.status_code
            key_count += 1
    # re.sub expects the callback to return a string, so return an empty one
    return ''

if len(sys.argv) != 2:
    print 'Usage: {0} <http://riak_host:8098/riak/bucket_name>'.format(sys.argv[0])
    print 'Set riak_host and bucket_name appropriately for your Riak cluster.'
    exit(0)

# fetch the streamed key listing and accumulate the whole body
r = requests.get(sys.argv[1] + '?keys=stream')
content = ''
key_count = 0
for chunk in r.iter_content():
    if chunk:
        content += chunk

# invoke processChunk once per {...} object in the aggregated response
re.sub(r'(?=(^|})({.*?})(?={|$))', processChunk, content)
print 'Deleted', key_count, 'keys'
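You run it with the bucket URL as the only argument; assuming you saved it as delete_riak_keys.py (the name is arbitrary) and your node is on localhost, that would be:

python delete_riak_keys.py http://localhost:8098/riak/my_bucket

In case the regex looks opaque, here is a minimal standalone sketch of the splitting step it performs. The sample payload and key names below are invented purely for illustration; the real keys=stream body has the same shape, just with your bucket's keys in it.

# standalone sketch of the JSON-splitting step, on an invented payload
import json
import re

sample = '{"keys":["key1","key2"]}{"keys":[]}{"keys":["key3"]}'
for match in re.finditer(r'(?=(^|})({.*?})(?={|$))', sample):
    print json.loads(match.group(2))['keys']
# prints [u'key1', u'key2'], then [], then [u'key3']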
While my problem is largely solved at this point, I suspect there are better solutions out there, and I welcome people to post them here. I won't accept my own answer unless no alternatives turn up after a few weeks.