I would like to store the names of all my hbase tables in an array inside my bash script.

  1. All sed hotfixes are acceptable.
  2. All better solutions (like using readarray on some ZooKeeper file I am not aware of) are acceptable.

I have two hbase tables called MY_TABLE_NAME_1 and MY_TABLE_NAME_2, so what I want would be:

tables=(
  MY_TABLE_NAME_1
  MY_TABLE_NAME_2
)

What I tried:

Based on HBase Shell in OS Scripts by Cloudera:

echo "list" | /path/to/hbase/bin/hbase shell -n > /home/me/hbase-tables
readarray -t tables < /home/me/hbase-tables

but my /home/me/hbase-tables then contains:

MY_TABLE_NAME_1
MY_TABLE_NAME_2
2 row(s) in 0.3310 seconds

MY_TABLE_NAME_1
MY_TABLE_NAME_2
wscourge
  • Why are the names duplicated? – Inian Feb 12 '18 at 10:39
  • Amount of the names is unknown, as well as the names themselves. So they might be `MY_TABLE_NAME_1`, `MY_TABLE_NAME_2` as well as `banana`, `cucumber` and `tomato`. – wscourge Feb 12 '18 at 10:41
  • Can you confirm that the problem is that 1) you've got the "x row(s) in x.xxx seconds" message and 2) the table names are duplicated ? Anyway I guess you might have a `--quiet` or similar flag on your `hbase shell` tool that might fix these two problems at once. (edit : that should be the `-n` flag for `--non-interactive`, but you're already using it... not sure this tool is well made for returning parseable output) – Aaron Feb 12 '18 at 10:44
  • Supposing deleting everything until the "x row(s) in x.xxx seconds" works for you, you could fix the file with GNU `sed '0,/[0-9]* row/d'` – Aaron Feb 12 '18 at 10:49

1 Answer

You can use readarray/mapfile just fine. But to drop the duplicates, skip empty lines, and remove the "row(s) in ... seconds" summary line, you need a filter, for example one written in awk.

You also don't need to create a temporary file and then parse it. You can use process substitution instead, which makes the output of a command available to the shell as if it were a file:

mapfile -t output < <(echo "list" | /path/to/hbase/bin/hbase shell -n | awk '!unique[$0]++ && !/seconds/ && NF')

Now the array contains only the unique table names from the hbase output. That said, you should really look for a way to suppress the noise in the query output itself rather than post-processing it this way.
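
To try the filter without a running HBase, here is a minimal sketch that feeds the output from the question through the same kind of awk filter. The function names `filter_table_names` and `sample_hbase_output` are made up for illustration, and the summary-line pattern is an assumption based only on the "2 row(s) in 0.3310 seconds" line shown in the question. It is somewhat stricter than `!/seconds/`, so a table whose name happens to contain "seconds" would not be dropped.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads hbase shell output on stdin and prints
# each non-blank line once, skipping the "N row(s) in X seconds" summary.
filter_table_names() {
  awk '!/row\(s\) in .* seconds/ && NF && !seen[$0]++'
}

# Stand-in for: echo "list" | /path/to/hbase/bin/hbase shell -n
# (the lines are copied from the output shown in the question)
sample_hbase_output() {
  printf '%s\n' \
    'MY_TABLE_NAME_1' \
    'MY_TABLE_NAME_2' \
    '2 row(s) in 0.3310 seconds' \
    '' \
    'MY_TABLE_NAME_1' \
    'MY_TABLE_NAME_2'
}

# Process substitution: mapfile reads the filtered output as if it were a file.
mapfile -t tables < <(sample_hbase_output | filter_table_names)

printf 'found %d table(s)\n' "${#tables[@]}"
printf '%s\n' "${tables[@]}"
```

With the real shell, replace `sample_hbase_output` with the `echo "list" | /path/to/hbase/bin/hbase shell -n` pipeline. `mapfile` needs bash 4 or newer.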

Inian