
I'm a bit stuck repairing a faulty table (on HBase 0.92.1-cdh4.0.0, Hadoop 2.0.0-cdh4.0.0).

There is a region in transition that doesn't finish:

Region    State
bf2025f4bc154914b5942af4e72ea063 counter_traces,1329773878.35_766a0b4df75e4381a686fbc07db9e333,1339425291230.bf2025f4bc154914b5942af4e72ea063. state=OFFLINE, ts=Tue Jun 12 11:43:53 CEST 2012 (0s ago), server=null

When I run sudo -u hbase hbase hbck -repair, I get this:

Number of empty REGIONINFO_QUALIFIER rows in .META.: 0
ERROR: Region { meta => counter_traces,1329773878.35_766a0b4df75e4381a686fbc07db9e333,1339425291230.bf2025f4bc154914b5942af4e72ea063., hdfs => hdfs://hbase001:8020/hbase/counter_traces/bf2025f4bc154914b5942af4e72ea063, deployed =>  } not deployed on any region server.
Trying to fix unassigned region...
12/06/12 11:44:40 INFO util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {NAME => 'counter_traces,1329773878.35_766a0b4df75e4381a686fbc07db9e333,1339425291230.bf2025f4bc154914b5942af4e72ea063.', STARTKEY => '1329773878.35_766a0b4df75e4381a686fbc07db9e333', ENDKEY => '1329793347.58_163865765c0a11e184ab003048f0e77e', ENCODED => bf2025f4bc154914b5942af4e72ea063,}

and it just loops.

If I don't do the -repair, I get this:

Number of empty REGIONINFO_QUALIFIER rows in .META.: 0
ERROR: Region { meta => counter_traces,1329773878.35_766a0b4df75e4381a686fbc07db9e333,1339425291230.bf2025f4bc154914b5942af4e72ea063., hdfs => hdfs://hbase001:8020/hbase/counter_traces/bf2025f4bc154914b5942af4e72ea063, deployed =>  } not deployed on any region server.
ERROR: Region { meta => counter_traces,1329816776.95_95b5561f3c1e496598421359a19ac665,1339425297099.ee1fd808d954c9adc9ed95ec2f29d119., hdfs => hdfs://hbase001:8020/hbase/counter_traces/ee1fd808d954c9adc9ed95ec2f29d119, deployed =>  } not deployed on any region server.
12/06/12 11:45:59 DEBUG util.HBaseFsck: There are 134 region info entries
ERROR: There is a hole in the region chain between 1329773878.35_766a0b4df75e4381a686fbc07db9e333 and 1329793347.58_163865765c0a11e184ab003048f0e77e.  You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: There is a hole in the region chain between 1329816776.95_95b5561f3c1e496598421359a19ac665 and 1329847231.75_b3c50776778b43e088dd7ed865e11331.  You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: Found inconsistency in table counter_traces
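
To list the hole boundaries from output like this, a small Python sketch along the following lines may help. The regular expression is an assumption based only on the message format quoted above, and `find_holes` is a hypothetical helper name, not part of HBase.

```python
import re

# Assumed message format, copied from the hbck output quoted above.
HOLE_RE = re.compile(
    r"ERROR: There is a hole in the region chain between (\S+) and (\S+)\."
)


def find_holes(hbck_output):
    """Return a list of (start_key, end_key) pairs, one per reported hole."""
    return HOLE_RE.findall(hbck_output)


sample = (
    "ERROR: There is a hole in the region chain between "
    "1329773878.35_766a0b4df75e4381a686fbc07db9e333 and "
    "1329793347.58_163865765c0a11e184ab003048f0e77e.  You need to create "
    "a new .regioninfo and region dir in hdfs to plug the hole.\n"
    "ERROR: Found inconsistency in table counter_traces\n"
)

for start_key, end_key in find_holes(sample):
    print("hole from", start_key, "to", end_key)
```

In the output above, each hole starts at the start key of one of the regions reported as not deployed, which suggests the holes and the unassigned regions are the same problem seen twice.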

I've run -repair a couple of times before and it helped, but this time it doesn't.

So hbck says that manual intervention is needed to plug the holes. Could someone point me in the right direction on how to do this? A recipe, webpage, example, anything will help.

Thanks, Mario

Mario

7 Answers

19

Mario,

One reason a region gets stuck in transition is that, while it is being moved between regionservers, it is unassigned from the source regionserver but never assigned to another one. One fix that always works for me is to forcibly assign it from the hbase shell:

assign 'regionName'
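
The full region name to pass to assign can be copied from the hbck ERROR lines. As a rough sketch, assuming the "not deployed" line format quoted in the question, the following Python builds one shell command per unassigned region. `assign_commands` is a hypothetical helper; it only prints the commands, which would then be pasted into the hbase shell.

```python
import re

# Assumed line format, taken from the hbck ERROR lines quoted in the question.
NOT_DEPLOYED_RE = re.compile(
    r"ERROR: Region \{ meta => (\S+), hdfs => \S+, deployed =>\s+\} not deployed"
)


def assign_commands(hbck_output):
    """Build an hbase shell 'assign' command for every undeployed region."""
    return ["assign '%s'" % name for name in NOT_DEPLOYED_RE.findall(hbck_output)]


sample = (
    "ERROR: Region { meta => counter_traces,"
    "1329773878.35_766a0b4df75e4381a686fbc07db9e333,"
    "1339425291230.bf2025f4bc154914b5942af4e72ea063., "
    "hdfs => hdfs://hbase001:8020/hbase/counter_traces/"
    "bf2025f4bc154914b5942af4e72ea063, deployed =>  } "
    "not deployed on any region server.\n"
)

for command in assign_commands(sample):
    print(command)
```

The region name is quoted because the hbase shell expects it as a string.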
sulabhc
  • That got me in the right direction. The direct assign didn't work, but at least I found some code now that helped me fill the hole in the region chain. – Mario Jul 04 '12 at 09:53
  • Could you share the code? I have a similar issue and would like to fix that. – Marcin Cylke Dec 19 '12 at 08:59
  • I'm sorry, should have posted back then. I don't think I have that anymore. – Mario Feb 22 '13 at 14:59
  • Also see https://serverfault.com/questions/510290/hbase-hbck-cant-fix-region-inconsistencies - it fixed the stuck state where assign could not – Arnaud Jan 02 '19 at 15:05
6

I tried to forcefully assign the regions, but it didn't work for me. I tried the following and it worked:

Steps:

  • Disable the table from the hbase shell
  • Run hbck to fix the problems using the following command

    sudo -u hbase hbase hbck -repair

  • Enable the table from the hbase shell
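
The three steps above could be scripted roughly as follows. This is only a sketch: `repair_plan` and `run_plan` are hypothetical helpers, and feeding commands to `hbase shell` on standard input is an assumption about how the shell is driven here. By default nothing is executed; the plan is only printed.

```python
import subprocess


def repair_plan(table):
    """Return (argv, stdin_text) pairs for the disable / repair / enable steps."""
    return [
        (["hbase", "shell"], "disable '%s'\n" % table),
        (["sudo", "-u", "hbase", "hbase", "hbck", "-repair"], None),
        (["hbase", "shell"], "enable '%s'\n" % table),
    ]


def run_plan(plan, dry_run=True):
    """Print each step, and execute it only when dry_run is False."""
    for argv, stdin_text in plan:
        print("step:", " ".join(argv), "| stdin:", repr(stdin_text))
        if not dry_run:
            # check=True stops the sequence as soon as one step fails.
            subprocess.run(argv, input=stdin_text, text=True, check=True)


# Dry run for the table from the question; no HBase installation is needed.
run_plan(repair_plan("counter_traces"))
```

Stopping on the first failure matters here, since enabling the table after a failed repair would bring it back in the same inconsistent state.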

hp36
  • From all of the above, this one worked. One thing you can do before those steps is run `hbase hbck` to see which tables are `inconsistent`, so you know which ones should be disabled. – Sebastian Kaczmarek Mar 19 '18 at 08:18
2

If your HBase version is recent enough you might also try hbck -repairHoles instead of just -repair. That did the trick for me on a recent "fix the hole" problem.

omeyn
1

First you should check if there's a file for that particular region in your hdfs.

If there is, stick with hbck -fixHdfsHoles -fixMeta alone until it is fixed (this might take a couple of tries).
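
The "couple of tries" can be automated with a small retry loop. The sketch below is illustrative only: `run_hbck` and `repair_until_ok` are hypothetical names, and treating any line that starts with `ERROR:` as "not fixed yet" is an assumption based on the hbck output quoted in this thread.

```python
import subprocess

HBCK_ARGV = ["hbase", "hbck", "-fixHdfsHoles", "-fixMeta"]


def run_hbck():
    """Run hbck once and return its combined output (needs an HBase install)."""
    result = subprocess.run(HBCK_ARGV, capture_output=True, text=True)
    return result.stdout + result.stderr


def repair_until_ok(run=run_hbck, max_tries=3):
    """Re-run hbck until no output line starts with 'ERROR:'.

    Returns the number of the attempt that came back clean, or None if
    every attempt still reported errors.
    """
    for attempt in range(1, max_tries + 1):
        output = run()
        if not any(line.startswith("ERROR:") for line in output.splitlines()):
            return attempt
    return None
```

The `run` parameter exists so the loop can be tried with canned output before pointing it at a real cluster.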

If there is no such file for the region in transition (it should be under /hbase/data///), then HBase thinks there should be a valid HFile for that region in that directory and won't be able to fix it with normal repair commands.

You should do what is described in one of the latest responses here and create a valid HFile for the region in your HDFS:

http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/31308

1

I had the same problem.

  • While starting all the services through Ambari, HDFS went into safe mode and didn't come back for a long time.
  • Other services didn't start because HDFS stayed in safe mode. I took HDFS out of safe mode manually and tried to start the services; it looks like this corrupted some HDFS files, which affected HBase.
  • Scanning an existing table returned "Unknown table error"
  • Creating a new namespace returned the error below:

create_namespace 'tst1'
ERROR: java.io.IOException: Table Namespace Manager not fully initialized, try again later

"hbase hbck -repair"

returned "ERROR: There is a hole in the region chain between and . You need to create a new .regioninfo and region dir in hdfs to plug the hole".

Running "hbase hbck -repairHoles" fixed the problem. I could scan previously stored data as well.

Note: the hbase hbck command should be run as the hbase user.

Vinay MP
0

In my case, I forgot to change the owner of the region data, which had been copied from another cluster.

Then I tried hbase hbck -repair, but got INFO util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned, followed by the error Unable to complete check or repair the region, failed to move out of transition within timeout 120000ms.

I found that hbase:meta already had the region info.

When scanning the table, you get an error like this:

ERROR: No server address listed in hbase:meta for region X

Then I tried hbase hbck -fixAssignments, but it still failed as before.

Then I checked the table's region data and found that only this table's owner and group were

drwxr-xr-x   - hdfs  hbase

while the others looked like this

drwxr-xr-x - hbase hbase

The problem was solved after changing the owner and group to match the others. Now the table scans successfully.
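
A check like this can be done by parsing the output of `hdfs dfs -ls`. The sketch below assumes the usual eight-column listing (permissions, replication, owner, group, size, date, time, path). The helper names and the sample paths are hypothetical, and the `hdfs dfs -chown -R` command is only built, not executed.

```python
def wrong_owner_paths(ls_output, owner="hbase", group="hbase"):
    """Return the paths whose owner or group differs from the expected pair."""
    bad = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Skips the "Found N items" header and any other short line.
        if len(fields) < 8:
            continue
        if (fields[2], fields[3]) != (owner, group):
            # Everything from the eighth column on is the path.
            bad.append(" ".join(fields[7:]))
    return bad


def chown_command(path, owner="hbase", group="hbase"):
    """Build the argv for a recursive ownership fix on the given path."""
    return ["hdfs", "dfs", "-chown", "-R", "%s:%s" % (owner, group), path]


# Hypothetical listing; the table and region names are placeholders.
sample_ls = (
    "Found 2 items\n"
    "drwxr-xr-x   - hdfs  hbase          0 2017-01-01 10:00 "
    "/hbase/data/default/example_table/region_a\n"
    "drwxr-xr-x   - hbase hbase          0 2017-01-01 10:00 "
    "/hbase/data/default/example_table/region_b\n"
)

for path in wrong_owner_paths(sample_ls):
    print(" ".join(chown_command(path)))
```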

Matiji66
0

I had the same problem. One region was stuck in Region still in transition, waiting for it to become assigned:. None of the -repair options worked, because all options require that all regions are assigned.

I had to remove the region from HDFS:

    hdfs$ hdfs dfs -rm -r /hbase/data/default/<table>/<region>

After removing the region, all -repair options worked, but the region was reported still in transition, due to zookeeper cache.

As described in "How to get the region in HBASE which is struck in FAILED_OPEN state?", I removed the transition cache from ZooKeeper and restarted the HBase master, and everything was OK.

banuj