
ISSUE/QUESTION:
How can we ensure that an EMR bootstrap action runs after HBase is installed on the cluster?

CLUSTER INFO:
I am using emr-5.25.0, which supports HBase 1.4.9.

USE-CASE: I am installing GeoMesa on EMR using a bootstrap action, following this document: https://www.geomesa.org/documentation/tutorials/geomesa-hbase-s3-on-aws.html

OBSERVATION:
I am using the script below as a bootstrap action. I observed that the bootstrap action starts before HBase is installed on the cluster. I want to use a bootstrap action so that GeoMesa is installed on every master node in a multi-master setup.

#!/bin/bash

set -e -x

IS_MASTER=false

if [ -f /mnt/var/lib/info/instance.json ]
then
  IS_MASTER=`cat /mnt/var/lib/info/instance.json | tr -d '\n ' | sed -n 's|.*\"isMaster\":\([^,]*\).*|\1|p'`
fi

if [[ $IS_MASTER == false* ]] 
then
  echo "Not the master server."
  exit 0
else   
  echo "Installing Geomesa on Master Server."  
  GEOMESA_INSTALLATION_FILE_S3_LOCATION="$1"
  GEOMESA_FILE_VERSION="$2"

  # initialize the Geomesa version.
  export GEOMESA_VERSION="$3"

  # Create a local directory for the downloaded archive.
  mkdir -p /home/hadoop/jars

  # Copy the GeoMesa distribution archive from S3 to the local jars directory.
  aws s3 cp "$GEOMESA_INSTALLATION_FILE_S3_LOCATION" /home/hadoop/jars

  # Move to the /opt directory.
  cd /opt/

  # Extract the GeoMesa archive into /opt.
  sudo tar zxvf "/home/hadoop/jars/geomesa-hbase-dist_${GEOMESA_FILE_VERSION}-bin.tar.gz"

  # run bootstrap-geomesa-hbase-aws.sh file to bootstrap geomesa on EMR.
  sudo /opt/geomesa-hbase_${GEOMESA_FILE_VERSION}/bin/bootstrap-geomesa-hbase-aws.sh

  # Go to /etc/hadoop/conf
  cd /etc/hadoop/conf

  # Copy hbase-site.xml in the /etc/hadoop/conf
  sudo cp /usr/lib/hbase/conf/hbase-site.xml /etc/hadoop/conf

  # Create .zip file for hbase-site.xml
  sudo zip /home/hadoop/jars/hbase-site.zip hbase-site.xml

  # initialize GEOMESA_EXTRA_CLASSPATHS to hbase-site.zip
  export GEOMESA_EXTRA_CLASSPATHS=/home/hadoop/jars/hbase-site.zip
fi
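
The master check at the top of the script can be pulled into a small function so it can be exercised off-cluster. This is only a sketch: the function name `is_master_node` is hypothetical, and it assumes the `isMaster` field in `instance.json` holds a plain `true` or `false`, as the `sed` expression above implies.

```shell
#!/bin/bash
# Sketch: return 0 when the given instance.json marks this node as a master.
# is_master_node is a hypothetical helper name, not part of EMR or GeoMesa.

is_master_node() {
  # Default to the path used by the script above.
  local info_file="${1:-/mnt/var/lib/info/instance.json}"

  # Same behavior as the original script: no file means "not the master".
  [ -f "$info_file" ] || return 1

  # Strip newlines and spaces so the match does not depend on JSON formatting.
  tr -d '\n ' < "$info_file" | grep -q '"isMaster":true'
}

# Example use in a bootstrap script:
#   if ! is_master_node; then
#     echo "Not the master server."
#     exit 0
#   fi
```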
Amit

2 Answers


Use steps. A bootstrap action always runs after the server is provisioned and before the applications are installed, so you have to run your script as a step instead. First, add a custom JAR step that uses the following JAR:

s3://<region prefix>.elasticmapreduce/libs/script-runner/script-runner.jar

Pass the location of your script as the argument:

s3://<your bucket>/<path>/<script>.sh

Set the action on failure to Continue, and leave the following option unchecked:

Auto-terminate cluster after the last step is completed
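
The same step can also be submitted from the AWS CLI with `aws emr add-steps`. The sketch below only builds the `--steps` value; the function name `build_script_runner_step`, the step name `InstallGeomesa`, and every region, bucket, and cluster ID shown are placeholders, not values from the question.

```shell
#!/bin/bash
# Sketch: build the --steps value for running an S3-hosted script through
# script-runner.jar. All concrete values passed in are placeholders.

build_script_runner_step() {
  local region="$1"      # fills the "<region prefix>" part of the JAR path
  local script_s3="$2"   # s3://<your bucket>/<path>/<script>.sh
  shift 2

  # The script location comes first; any remaining arguments are
  # appended and passed through to the script.
  local args="$script_s3"
  local a
  for a in "$@"; do
    args="$args,$a"
  done

  printf 'Type=CUSTOM_JAR,Name=InstallGeomesa,ActionOnFailure=CONTINUE,Jar=s3://%s.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[%s]\n' \
    "$region" "$args"
}

# Usage on a machine with the AWS CLI configured (placeholder IDs and paths):
#   aws emr add-steps --cluster-id j-XXXXXXXXXXXXX \
#     --steps "$(build_script_runner_step us-east-1 s3://my-bucket/scripts/install-geomesa.sh)"
```

Because the step is submitted explicitly, it would have to be submitted again for any node that is created later.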

Lamanus
  • **Yes, a step is an option, but how will it work with a multi-master cluster setup?** In a multi-master EMR setup (per the AWS documentation), if a master node goes down, EMR spins up a replacement master node with the same configuration and bootstrap actions. If GeoMesa is installed through EMR steps, it will not be installed on the replacement master node. That is why I was investigating whether a bootstrap action can run after HBase is installed. – Amit Aug 21 '19 at 18:31
  • Well, as far as I know, there is no way to do that with a bootstrap action. You have to submit the steps again, which can be done with the AWS SDK or CLI. – Lamanus Aug 22 '19 at 01:00
  • Yes, fair enough. I was thinking of the same solution. – Amit Aug 22 '19 at 06:29

It's OK if the script runs before HBase is provisioned. It calls the bootstrap-geomesa-hbase-aws.sh script, which checks whether HBase is installed and sleeps until it is ready.

Emilio Lahr-Vivaz
  • **It will not work.** If we run the bootstrap-geomesa-hbase-aws.sh script in a bootstrap action, its sleep-until-ready logic prevents the bootstrap action from completing. EMR does not install applications (HBase) until all bootstrap actions have completed, so the script and EMR end up waiting on each other and GeoMesa cannot be installed from a bootstrap action this way. – Amit Aug 21 '19 at 19:44
  • Ah, you're correct - the script is not meant to be run as a bootstrap action on the cluster. – Emilio Lahr-Vivaz Aug 22 '19 at 11:41
  • If we register the GeoMesa HBase coprocessor site-wide via hbase-site.xml, do I still need to install GeoMesa on each HBase master node? I am using an S3-backed HBase cluster and accessing it from Lambda through the GeoTools interface. GeoMesa documentation: https://www.geomesa.org/documentation/user/hbase/coprocessor_install.html#register-site-wide – Amit Aug 30 '19 at 17:41
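
As the comments above note, a bootstrap action that sleeps until HBase is ready never finishes, because EMR installs applications only after the bootstrap actions complete. One possible workaround is to have the bootstrap action start a detached background job that does the waiting, so the bootstrap action itself returns right away. The following is a minimal sketch of the waiting part only. The helper `wait_for_path`, its default timeout, and the file names in the usage comment are hypothetical, and whether EMR leaves such a background process running after the bootstrap action exits is an assumption that should be verified on a test cluster.

```shell
#!/bin/bash
# Sketch: poll for a path with a timeout instead of sleeping forever.
# wait_for_path is a hypothetical helper, not part of EMR or GeoMesa.

wait_for_path() {
  local path="$1"             # e.g. /usr/lib/hbase/conf/hbase-site.xml
  local timeout="${2:-1800}"  # give up after this many seconds (assumed default)
  local interval="${3:-10}"   # seconds between checks
  local waited=0

  while [ ! -e "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1                # timed out; the path never appeared
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}

# Hypothetical use from a bootstrap action (file names are placeholders):
#   nohup bash -c '
#     source /home/hadoop/wait-for-path.sh
#     wait_for_path /usr/lib/hbase/conf/hbase-site.xml &&
#       /home/hadoop/install-geomesa.sh
#   ' > /tmp/geomesa-install.log 2>&1 &
```

The timeout keeps a failed HBase installation from leaving the background job polling indefinitely.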