Installing Hive on Hadoop

1.  wget

2. tar zxf apache-hive-2.3.3-bin.tar.gz

3. /bin/mv apache-hive-2.3.3-bin  jaguarhive

4. cd jaguarhive/conf

5. cp



6.  cp hive-default.xml.template hive-default.xml

vi hive-default.xml:


Note: ConnectionURL can be configured to use MySQL, PostgreSQL, or any other database server that supports JDBC. The metastore stores metadata (schema) information for Hive tables. The Derby metastore is an embedded database in Hive; it can store metadata, but it supports only one Hive session at a time.
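In hive-default.xml, the relevant property is javax.jdo.option.ConnectionURL. A minimal sketch (the Derby value is the embedded default; the MySQL host and database name below are placeholders):

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <!-- Embedded Derby metastore (the default, single session): -->
  <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
  <!-- For a shared metastore, point at an external server instead, e.g.
       <value>jdbc:mysql://dbhost/hivemetastore</value> -->
</property>
```

For a non-Derby metastore you would also set the matching javax.jdo.option.ConnectionDriverName, ConnectionUserName, and ConnectionPassword properties.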

7.   hdfs dfs -mkdir -p /user/hive/warehouse

8.  vi $HOME/.bashrc

export HIVE_HOME=$HOME/jaguarhive

9. $ source  $HOME/.bashrc

10.  Initialize the metastore

$  cd $HOME/jaguarhive

$ /bin/rm -rf metastore_db

$  $HIVE_HOME/bin/schematool -initSchema -dbType derby

11. Ready to use Hive:

$  export PATH=$PATH:$HIVE_HOME/bin

$  hive


One-Key Install, Configure, and Operate a High-Availability Hadoop Cluster

  1. Download jaguar-bigdata-1.5.tar.gz from
  2. Prepare the hosts in your Hadoop cluster for High Availability (active and standby namenodes). Save the host names in a file named: hosts
  3. $ tar zxf  jaguar-bigdata-1.5.tar.gz
  4. $ cd jaguar-bigdata-1.5
  5. Make sure the hosts file is in this directory and has the following content:


(node1 will be namenode1, node2 namenode2, node3 a datanode, node4 a datanode).

6. On each host in the hosts file, make sure to put the following in the $HOME/.bashrc file:

$HOME/.bashrc on all hosts (assuming Hadoop is installed in $HOME/jaguarhadoop):
export JAVA_HOME=`java -XshowSettings:properties -version 2>&1 | grep 'java.home' | tr -d ' ' | cut -d= -f2`
export HADOOP_PREFIX=$HOME/jaguarhadoop

7. Download your favorite .tar.gz packages of Hadoop, Kafka, and Spark and copy them into the jaguar-bigdata-1.5/package directory. For example:

$ cp -f  /tmp/hadoop-2.8.5.tar.gz  jaguar-bigdata-1.5/package

Note: you must download and copy the desired packages into the package directory; otherwise they will not be installed.

8. Install the packages with one installer script:

$ cd   jaguar-bigdata-1.5

$  ./ -f  hosts  -hadoop

The hosts file lists the host names of the cluster, one per line. If more packages are to be installed, you can use the "-hadoop -kafka -spark" options or the "-all" option.

9.  Start ZooKeeper on all hosts (ZooKeeper must be started first)
$ cd $HOME/jaguarkafka/bin; ./

10. Start JournalNode on all hosts
$ cd $HOME/jaguarhadoop/sbin; ./

11. Format and start Hadoop on all hosts
$ cd $HOME/jaguarhadoop/sbin; ./
$ cd $HOME/jaguarhadoop/sbin; ./

Use bin/hdfs haadmin command to check active/standby status of namenodes:
$ hdfs haadmin -getServiceState namenode1
$ hdfs haadmin -getServiceState namenode2

12. If you installed Kafka, Spark, or Zeppelin, you can start them:
$ cd $HOME/jaguarkafka/bin; ./
$ cd $HOME/jaguarspark/bin; ./
$ cd $HOME/jaguarzeppelin/bin; ./ start

13. If you want to clean up the data in Hadoop, you can execute the following command:

$ cd $HOME/jaguarhadoop/sbin; ./

You must be sure you really want to delete all data in Hadoop.

14.   Use $HOME/jaguarhadoop/bin/hdfs command to check hdfs:
$ hdfs dfs -ls /


Setup Chef Cluster On Centos7

1.   Environment: four hosts: HD8, HD7, HD6, HD5

On each of these hosts, there is a chefadmin account with sudo privilege. The roles are:

HD8: chefserver
HD7: chefdk (Work Station)
HD6: chefclient1
HD5: chefclient2

Every host has the same /etc/hosts entries, mapping each host to its role name:

HD8   chefserver
HD7   chefdk
HD6   chefclient1
HD5   chefclient2


2.  On HD8 (chefserver)

Use root account:

#  cd /usr/local/src

#  wget

#  rpm -ivh chef-server-core-12.17.33-1.el7.x86_64.rpm

#  chef-server-ctl reconfigure

#   chef-server-ctl status

#   chef-server-ctl user-create chefadmin FirstName LastName chefadmin@example.com chefadminpassword -f /etc/chef/chefadmin.pem

#  chef-server-ctl service-list

#   chef-server-ctl user-list

#  chef-server-ctl org-create datajaguar "DataJaguar, Inc" --association_user chefadmin -f /etc/chef/datajaguar-validator.pem

#  firewall-cmd --permanent --zone=public --add-service=http

#  firewall-cmd --permanent --zone=public --add-service=https


3.  On HD7 (chefdk)

#  yum install ruby

# yum install git

# cd /usr/local/src

#  wget

#  rpm -ivh chefdk-1.5.0-1.el7.x86_64.rpm

#   chef verify

#  useradd chefadmin

# passwd chefadmin

# su - chefadmin

In user chefadmin account:

$ echo 'eval "$(chef shell-init bash)"' >> ~/.bash_profile

$  .  ~/.bash_profile

$  cd ~

$  chef generate repo chef-repo

$  cd chef-repo

$  git init

$ git config --global user.name "chefadmin"

$  git config --global user.email ""

$  mkdir  .chef

$  echo ‘.chef’ >> ~/chef-repo/.gitignore

$ cd  ~/chef-repo

$ git add .

$ git commit

$  scp -pr root@chefserver:/etc/chef/chefadmin.pem ~/chef-repo/.chef/

$  scp -pr root@chefserver:/etc/chef/datajaguar-validator.pem ~/chef-repo/.chef/

$   vi ~/chef-repo/.chef/knife.rb

current_dir = File.dirname(__FILE__)
log_level :info
log_location STDOUT
node_name "chefadmin"
client_key "#{current_dir}/chefadmin.pem"
validation_client_name "datajaguar-validator"
validation_key "#{current_dir}/datajaguar-validator.pem"
chef_server_url "https://HD8/organizations/datajaguar"
syntax_check_cache_path "#{ENV['HOME']}/.chef/syntaxcache"
cookbook_path ["#{current_dir}/../cookbooks"]

$ knife ssl fetch

$ knife bootstrap chefclient1 -x chefadmin --sudo

(chefadmin is a user account on host chefclient1 and must have sudo privilege)

$   knife bootstrap chefclient2 -x chefadmin --sudo

(chefadmin is a user account on host chefclient2 and must have sudo privilege)




Install Boost on Linux

  1. download boost_1_68_0.tar.gz
  2. # cp boost_1_68_0.tar.gz /usr/local/src
  3. # tar zxf boost_1_68_0.tar.gz
  4. # cd boost_1_68_0
  5. # ./ --prefix=/usr/local/boost_168_0
  6. # ./b2
  7. # ./b2 install

/usr/local/boost_168_0/include/ will contain header files

/usr/local/boost_168_0/lib/  will contain individual library files

Install CGAL Library on Linux

    1. download the source tar ball CGAL-4.12.tar.gz
    2. Run the following commands as sudo or root
    3. # cp CGAL-4.12.tar.gz  /usr/local/src
    4. # tar zxf CGAL-4.12.tar.gz
    5. # cd cgal-releases-CGAL-4.12
    6. # mkdir -p build/release
    7. Make sure you have an updated cmake (old cmake will not work)
    8. # cd build/release; cmake -DCMAKE_BUILD_TYPE=Release -DBoost_INCLUDE_DIR=/usr/local/boost_168_0/include ../..
    9. # make
    10. # make install

The CGAL header files will be in /usr/local/include/CGAL/

The library .so files will be in /usr/local/lib64/

Replicating database using triggers

Suppose you have a table on any RDBMS database:

table123, with two columns: uid and addr

You can create another table to capture insert, update, and delete operations on table123:

create table table123_trigger_table
(
ts datetime primary key,
uid int,
addr varchar(64),
action char(1)
);

Then you can create three triggers to capture the changes in table123:

CREATE TRIGGER after_table123_insert AFTER INSERT ON table123 FOR EACH ROW
INSERT INTO table123_trigger_table
SET action = 'I',
uid = NEW.uid,
addr = NEW.addr,
ts = NOW();


CREATE TRIGGER after_table123_update AFTER UPDATE ON table123 FOR EACH ROW
INSERT INTO table123_trigger_table
SET action = 'U',
uid = NEW.uid,
addr = NEW.addr,
ts = NOW();


CREATE TRIGGER after_table123_delete AFTER DELETE ON table123 FOR EACH ROW
INSERT INTO table123_trigger_table
SET action = 'D',
uid = OLD.uid,
addr = OLD.addr,
ts = NOW();


After the three triggers are created, you can write a Java program that uses JDBC to pull the records into the target database and table. The ts column in the trigger table is a timestamp and the primary key, so it can be used to track when each change occurred. The trigger table can be cleaned up periodically.
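The pull side of such a program can be sketched as below. This is a minimal sketch, not a complete replicator: the JDBC URLs, credentials, and starting timestamp are placeholders, and a real program would persist the last applied timestamp between runs and poll on a schedule.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

// Sketch of a JDBC puller for table123_trigger_table.
public class TriggerPuller {

    static final String SRC_URL = "jdbc:mysql://sourcehost/sourcedb"; // placeholder
    static final String DST_URL = "jdbc:mysql://targethost/targetdb"; // placeholder

    // Map a captured action ('I'/'U'/'D') to a parameterized statement
    // against the target table.
    static String toTargetSql(char action) {
        switch (action) {
            case 'I': return "INSERT INTO table123 (uid, addr) VALUES (?, ?)";
            case 'U': return "UPDATE table123 SET addr = ? WHERE uid = ?";
            case 'D': return "DELETE FROM table123 WHERE uid = ?";
            default:  throw new IllegalArgumentException("unknown action: " + action);
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder; in practice, persist this between runs.
        Timestamp lastApplied = Timestamp.valueOf("1970-01-01 00:00:00");
        try (Connection src = DriverManager.getConnection(SRC_URL, "user", "pass");
             Connection dst = DriverManager.getConnection(DST_URL, "user", "pass");
             PreparedStatement poll = src.prepareStatement(
                 "SELECT ts, uid, addr, action FROM table123_trigger_table"
                 + " WHERE ts > ? ORDER BY ts")) {
            poll.setTimestamp(1, lastApplied);
            try (ResultSet rs = poll.executeQuery()) {
                while ( {
                    char action = rs.getString("action").charAt(0);
                    try (PreparedStatement apply = dst.prepareStatement(toTargetSql(action))) {
                        if (action == 'I') {
                            apply.setInt(1, rs.getInt("uid"));
                            apply.setString(2, rs.getString("addr"));
                        } else if (action == 'U') {
                            apply.setString(1, rs.getString("addr"));
                            apply.setInt(2, rs.getInt("uid"));
                        } else { // 'D'
                            apply.setInt(1, rs.getInt("uid"));
                        }
                        apply.executeUpdate();
                    }
                    lastApplied = rs.getTimestamp("ts"); // advance the watermark
                }
            }
        }
    }
}
```

Applying rows in ts order keeps the target consistent with the source's commit order; once a batch is applied, rows with ts at or below the watermark can be deleted from the trigger table.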




Suppose you want to install all Java 1.8 files into  /opt directory:

(# means root prompt, $ means your regular user account)

# mkdir -p /opt

# cd /opt/
# wget --no-cookies --no-check-certificate --header "Cookie:; oraclelicense=accept-securebackup-cookie" ""

# tar xzf jdk-8u111-linux-x64.tar.gz
# ln -sf /opt/jdk1.8.0_111 /opt/java

$ export JAVA_HOME=/opt/java
$ export JRE_HOME=/opt/java/jre

$ export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

There you go, check "java -version" for the new 1.8 version!
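If you'd rather confirm from inside the JVM which installation the new PATH resolves to, a throwaway program works too (the class name VersionCheck is arbitrary):

```java
// Print the version and home directory of the JVM that actually runs,
// to confirm the PATH and JAVA_HOME changes took effect.
public class VersionCheck {
    public static void main(String[] args) {
        System.out.println(System.getProperty("java.version"));
        System.out.println(System.getProperty("java.home"));
    }
}
```

Compile and run it with: $ javac && java VersionCheck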