I'm trying to get data from HBase like this:
import pandas as pd
from time import sleep

# hive_engine (the Hive connection used by pandas) and connection (the HBase
# client connection) are created elsewhere
keys = pd.read_sql('select key from table', hive_engine)
table = connection.table('games_ut')

res = {}
n = 0
for key in keys['key']:
    # fetch one row per key, one request at a time
    res[str(key)] = table.row(str(key).encode('utf-8'))
    n += 1
    if n % 100000 == 0:
        print(str(n) + " rows have been read, need a sleep!")
        sleep(0.5)  # pause to give the cluster a rest
The sleep(0.5) is just there to give the cluster a short rest. As you can see, the code works, but it is far too slow and puts heavy pressure on the cluster, because there are more than 40 million keys.
So I'd like to know whether there is a way to fetch the data in batches instead of one row at a time. I tried table.rows(), but it failed. I'm new to this and don't really know how to batch the lookups inside the loop.
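Something like the sketch below is roughly what I have in mind, assuming table.rows() accepts a list of byte-string row keys, and with hive_engine, connection, and the key column the same as in my snippet above (the CHUNK size of 1000 is just a guess):

import pandas as pd

CHUNK = 1000  # number of keys per batch request; an arbitrary guess

# hive_engine and connection are assumed to exist, as in the snippet above
keys = pd.read_sql('select key from table', hive_engine)
table = connection.table('games_ut')

# encode every key as bytes once, up front
all_keys = [str(k).encode('utf-8') for k in keys['key']]

res = {}
for i in range(0, len(all_keys), CHUNK):
    batch = all_keys[i:i + CHUNK]
    # table.rows() takes a list of row keys and returns (key, data) pairs,
    # so one request covers a whole batch instead of one call per key
    for row_key, data in table.rows(batch):
        res[row_key.decode('utf-8')] = data

Is this the right way to use table.rows() for this many keys, or is there a better approach?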