
Installing Apache Zeppelin on a Hadoop Cluster

Apache Zeppelin (https://zeppelin.incubator.apache.org/) is a web-based notebook that enables interactive data analytics. You can create data-driven, interactive, and collaborative documents with SQL, Scala, and more.

This document describes the steps to install Apache Zeppelin on a CentOS 7 machine.

Steps

Note: Run all the commands as root.

Configure the Environment

Install Maven (If not already done)
cd /tmp/
wget https://archive.apache.org/dist/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz
tar xzf apache-maven-3.1.1-bin.tar.gz -C /usr/local
cd /usr/local
ln -s apache-maven-3.1.1 maven
Configure Maven (If not already done)
#Run the following
export M2_HOME=/usr/local/maven
export M2=${M2_HOME}/bin
export PATH=${M2}:${PATH}

Note: If you log in as a different user or log out, these settings will be wiped out and you won’t be able to run any mvn commands. To prevent this, append the export statements to the end of your ~/.bashrc file:

#append the export statements
vi ~/.bashrc
#apply the export statements
source ~/.bashrc
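
One way to make the append step repeatable is a small helper that adds each export line only if it is not already in the file. This is a minimal sketch, not part of the original steps; the function name append_once is hypothetical.

```shell
# append_once LINE FILE
# Appends LINE to FILE unless an identical line is already present.
# (Hypothetical helper name; not part of Maven or bash.)
append_once() {
  line="$1"
  file="$2"
  touch "$file"
  # -F: fixed string, -x: match the whole line, -q: quiet
  if ! grep -qxF -e "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Usage (single quotes keep ${...} unexpanded in the file):
#   append_once 'export M2_HOME=/usr/local/maven' ~/.bashrc
#   append_once 'export M2=${M2_HOME}/bin' ~/.bashrc
#   append_once 'export PATH=${M2}:${PATH}' ~/.bashrc
#   source ~/.bashrc
```

Running the helper twice leaves a single copy of each line, so it is safe to re-run the setup.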


Install NodeJS

Note: Steps referenced from https://nodejs.org/en/download/package-manager/

curl --silent --location https://rpm.nodesource.com/setup_5.x | bash -

yum install -y nodejs
Install Dependencies

Note: Used for Zeppelin Web App

yum install -y bzip2 fontconfig
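
Before moving on to the build, it can help to confirm that the tools installed so far are actually on the PATH. The sketch below assumes nothing beyond a POSIX shell; the function name missing_tools is hypothetical, and the tool list in the usage line is only an example.

```shell
# missing_tools TOOL...
# Prints the name of every TOOL that is not found on the PATH, one per line.
# (Hypothetical helper name.)
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Usage: an empty result means every listed tool was found.
#   missing_tools mvn node npm bzip2
```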

Install Apache Zeppelin

Select the version you would like to install

View the available releases and select the latest:

https://github.com/apache/zeppelin/releases

Replace the {APACHE_ZEPPELIN_VERSION} placeholder with the release tag you would like to use (the tag name as listed on the releases page, including any leading "v").


Download Apache Zeppelin
cd /opt/
wget https://github.com/apache/zeppelin/archive/{APACHE_ZEPPELIN_VERSION}.zip
unzip {APACHE_ZEPPELIN_VERSION}.zip
ln -s /opt/zeppelin-{APACHE_ZEPPELIN_VERSION-WITHOUT_V_INFRONT} /opt/zeppelin
rm {APACHE_ZEPPELIN_VERSION}.zip
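
The unpacked directory name drops the leading "v" from the tag, which is why the ln command uses a second placeholder. A minimal sketch of deriving that value in the shell follows; the function name version_without_v is hypothetical, and the tag in the usage lines is only an example.

```shell
# version_without_v TAG
# Prints TAG with a single leading "v" removed.
# (Hypothetical helper name.)
version_without_v() {
  printf '%s\n' "${1#v}"
}

# Usage:
#   APACHE_ZEPPELIN_VERSION=v0.6.2   # example tag only; pick your own
#   ln -s "/opt/zeppelin-$(version_without_v "$APACHE_ZEPPELIN_VERSION")" /opt/zeppelin
```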
Get Build Variable Values
Get Spark Version

Run the following command:

spark-submit --version

Override the {SPARK_VERSION} placeholder with this value.

Example: 1.6.0

Get Hadoop Version

Run the following command:

hadoop version

Override the {HADOOP_VERSION} placeholder with this value.

Example: 2.6.0-cdh5.9.0

Take this value and extract the major and minor version of Hadoop. Override the {SIMPLE_HADOOP_VERSION} placeholder with the result.

Example: 2.6
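
The two Hadoop values can also be derived in the shell rather than by hand. The sketch below assumes the first line of the hadoop version output has the form "Hadoop 2.6.0-cdh5.9.0"; both function names are hypothetical.

```shell
# hadoop_version_from_output
# Reads `hadoop version` output on stdin and prints the version string.
# Assumes the first line looks like "Hadoop 2.6.0-cdh5.9.0".
hadoop_version_from_output() {
  awk 'NR == 1 { print $2 }'
}

# simple_hadoop_version FULL_VERSION
# Prints only the major and minor parts, e.g. 2.6.0-cdh5.9.0 -> 2.6
simple_hadoop_version() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# Usage:
#   HADOOP_VERSION="$(hadoop version | hadoop_version_from_output)"
#   SIMPLE_HADOOP_VERSION="$(simple_hadoop_version "$HADOOP_VERSION")"
```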

Build Apache Zeppelin

Update the placeholders below and run:

cd /opt/zeppelin
mvn clean package -Pspark-{SPARK_VERSION} -Dhadoop.version={HADOOP_VERSION} -Phadoop-{SIMPLE_HADOOP_VERSION} -Pvendor-repo -DskipTests

Note: this process will take a while
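
To avoid typos when substituting the three placeholders, the command line can be assembled from variables and reviewed before running it. This is a sketch only: the function name zeppelin_build_command is hypothetical, and it prints the command rather than running Maven.

```shell
# zeppelin_build_command SPARK_VERSION HADOOP_VERSION SIMPLE_HADOOP_VERSION
# Prints the mvn command line from the step above with the placeholders filled in.
# It only prints the command; it does not run Maven.
zeppelin_build_command() {
  printf 'mvn clean package -Pspark-%s -Dhadoop.version=%s -Phadoop-%s -Pvendor-repo -DskipTests\n' \
    "$1" "$2" "$3"
}

# Usage (values are the examples from the steps above):
#   zeppelin_build_command 1.6.0 2.6.0-cdh5.9.0 2.6
```

Maven only prints a warning when a requested profile does not exist, so it is worth checking that the -P profile names in the printed command match the profiles defined in your Zeppelin version's pom.xml.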

 

Configure Apache Zeppelin

Base Zeppelin Configuration
Setup Conf
cd /opt/zeppelin/conf/
cp zeppelin-env.sh.template zeppelin-env.sh
cp zeppelin-site.xml.template zeppelin-site.xml
Setup Hive Conf
# note: verify that the path to your hive-site.xml is correct
ln -s /etc/hive/conf/hive-site.xml /opt/zeppelin/conf/
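
The path check mentioned in the comment above can be folded into the linking step. A minimal sketch, assuming a POSIX shell; the function name link_hive_site is hypothetical.

```shell
# link_hive_site SRC DEST_DIR
# Symlinks SRC into DEST_DIR only if SRC exists; returns non-zero otherwise.
# (Hypothetical helper name.)
link_hive_site() {
  src="$1"
  dest_dir="$2"
  if [ ! -f "$src" ]; then
    echo "not found: $src" >&2
    return 1
  fi
  # -f replaces a link left over from an earlier run
  ln -sf "$src" "$dest_dir/$(basename "$src")"
}

# Usage:
#   link_hive_site /etc/hive/conf/hive-site.xml /opt/zeppelin/conf
```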
Edit zeppelin-env.sh

Uncomment the export HADOOP_CONF_DIR line and set it as follows:
export HADOOP_CONF_DIR="/etc/hadoop/conf"
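
If you prefer not to edit the file by hand, the same result can be scripted. The sketch below does not uncomment the template line; instead it removes any active HADOOP_CONF_DIR assignment and appends a fresh one, which has the same effect when the file is sourced. The function name set_hadoop_conf_dir is hypothetical.

```shell
# set_hadoop_conf_dir ENV_FILE CONF_DIR
# Ensures ENV_FILE contains exactly one active
#   export HADOOP_CONF_DIR="CONF_DIR"
# line. Commented-out template lines are left untouched.
# (Hypothetical helper name.)
set_hadoop_conf_dir() {
  env_file="$1"
  conf_dir="$2"
  tmp_file="$(mktemp)"
  # Keep every line except an existing active assignment.
  # grep exits non-zero when it prints nothing, hence the "|| true".
  grep -v '^export HADOOP_CONF_DIR=' "$env_file" > "$tmp_file" || true
  printf 'export HADOOP_CONF_DIR="%s"\n' "$conf_dir" >> "$tmp_file"
  # Write back through cat so the original file keeps its permissions.
  cat "$tmp_file" > "$env_file"
  rm -f "$tmp_file"
}

# Usage:
#   set_hadoop_conf_dir /opt/zeppelin/conf/zeppelin-env.sh /etc/hadoop/conf
```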

Starting/Stopping Apache Zeppelin

Start Zeppelin
/opt/zeppelin/bin/zeppelin-daemon.sh start
Restart Zeppelin
/opt/zeppelin/bin/zeppelin-daemon.sh restart
Stop Zeppelin
/opt/zeppelin/bin/zeppelin-daemon.sh stop
Viewing Web UI

Once the Zeppelin process is running, you can view the web UI by opening a web browser and navigating to:

http://{HOST}:8080/

Note: Your firewall and network rules must allow traffic to this host and port.

Runtime Apache Zeppelin Configuration

Further configuration may be needed for certain operations to work.

Configure Hive in Zeppelin
  1. Open Cloudera Manager and find the public host name of the machine that has the HiveServer2 role. This value is your {HIVESERVER2_HOST}.
  2. Open the Web UI and click the Interpreter tab
  3. Change the Hive default.url option to: jdbc:hive2://{HIVESERVER2_HOST}:10000
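
The URL from step 3 can be built in the shell and, if the beeline client is installed on the machine, tried out before it is entered in Zeppelin. This is a sketch under that assumption; the function name hive_jdbc_url and the host name in the usage line are hypothetical.

```shell
# hive_jdbc_url HOST [PORT]
# Prints the HiveServer2 JDBC URL for HOST; PORT defaults to 10000.
# (Hypothetical helper name.)
hive_jdbc_url() {
  printf 'jdbc:hive2://%s:%s\n' "$1" "${2:-10000}"
}

# Usage (placeholder host name; assumes beeline is installed):
#   beeline -u "$(hive_jdbc_url hiveserver2.example.com)"
```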

6 comments

  1. Kapil

    Excellent article. I installed it as you described above, but when I start it, it fails: Zeppelin process died [FAILED]
    When I looked into the log file, this is what I found: ZEPPELIN_CLASSPATH: ::/opt/zeppelin/zeppelin-zengine/target/lib/*:/opt/zeppelin/zeppelin-interpreter/target/lib/*:/opt/zeppelin/*::/opt/zeppelin/conf:/opt/zeppelin/zeppelin-interpreter/target/classes:/opt/zeppelin/zeppelin-zengine/target/classes
    Error: Could not find or load main class org.apache.zeppelin.server.ZeppelinServer
    Any help would be appreciated.
    Thanks.

    1. Robert Sanders Post author

      It looks like something might be missing from the ZEPPELIN_CLASSPATH, or the mvn project didn’t build correctly. The org.apache.zeppelin.server.ZeppelinServer class should have been compiled under the /opt/zeppelin/zeppelin-server module. Specifically, the class should be available in /opt/zeppelin/zeppelin-server/target/zeppelin-server-0.6.2.jar and at /opt/zeppelin/zeppelin-server/target/classes/org/apache/zeppelin/server/ZeppelinServer.class.

      The zeppelin-daemon.sh command should be adding the necessary dependencies to the classpath on execution. It contains the following command when assembling the ZEPPELIN_CLASSPATH:

      if [[ -d "${ZEPPELIN_HOME}/zeppelin-server/target/classes" ]]; then
        ZEPPELIN_CLASSPATH+=":${ZEPPELIN_HOME}/zeppelin-server/target/classes"
      fi

      Can you confirm that the zeppelin-server module was built correctly and the /opt/zeppelin/zeppelin-server/target/classes folder exists with the correct contents?

  2. Kapil

    You were right. How do I fix this?

    [INFO] Zeppelin: web Application ......................... FAILURE [1:02.029s]
    [INFO] Zeppelin: Server .................................. SKIPPED
    [INFO] Zeppelin: Packaging distribution .................. SKIPPED
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD FAILURE
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 3:45.817s
    [INFO] Finished at: Thu Apr 06 21:13:05 PDT 2017
    [INFO] Final Memory: 225M/1893M
    [INFO] ------------------------------------------------------------------------
    [ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.3:yarn (yarn install) on project zeppelin-web: Failed to run task: 'yarn install --no-lockfile' failed. (error code 1) -> [Help 1]

    1. Robert Sanders Post author

      It’s failing because the command "yarn install --no-lockfile" failed during execution. Do you have a YARN service (NodeManager, ResourceManager, or JobHistory Server) installed on the cluster where you’re running this installation process?

  3. Kapil

    Yes, YARN is already installed. I am trying to install this on the CDH 5.8 sandbox. Should I skip it while building? If so, how?

  4. Kapil

    Instead of building it with Maven, I downloaded the prebuilt package from the Zeppelin site, and it’s working now. Thanks for your help.
