Installing Hive on Hadoop

1.  wget https://www-us.apache.org/dist/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz

2. tar zxf apache-hive-2.3.3-bin.tar.gz

3. /bin/mv apache-hive-2.3.3-bin  jaguarhive

4. cd jaguarhive/conf

5. cp  hive-env.sh.template hive-env.sh

vi  hive-env.sh

HADOOP_HOME=$HOME/jaguarhadoop

6.  cp hive-default.xml.template hive-site.xml

vi hive-site.xml:

      <property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/home/myhome/hive/metastore_db;create=true</value>
</property>

Note: ConnectionURL can be configured to use MySQL, PostgreSQL, or any other database server that supports JDBC. The metastore stores the metadata (schema) information for Hive tables. Derby is an embedded database that ships with Hive; it works as a metastore but allows only one Hive session at a time.
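For example, to point the metastore at MySQL instead of Derby, hive-site.xml would carry properties along these lines (the host, database name, and credentials here are placeholders, not values from this setup; the MySQL JDBC driver jar must also be copied into $HIVE_HOME/lib):

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://dbhost:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hivepassword</value>
</property>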

7.   hdfs dfs -mkdir -p /user/hive/warehouse

8.  vi $HOME/.bashrc

export HIVE_HOME=$HOME/jaguarhive

9. $ source  $HOME/.bashrc

10.  Initialize the metastore

$  cd $HOME/jaguarhive

$ /bin/rm -rf metastore_db

$  $HIVE_HOME/bin/schematool -initSchema -dbType derby

11. Ready to use Hive:

$  export PATH=$PATH:$HIVE_HOME/bin

$  hive
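As a quick smoke test (the table and column names here are just examples), you can also run a statement or two non-interactively:

$ hive -e 'CREATE TABLE pokes (foo INT, bar STRING); SHOW TABLES;'
$ hive -e 'DROP TABLE pokes;'

hive -e executes HiveQL from the command line; the table's files are created under the /user/hive/warehouse directory made in step 7.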

 


One-Key Install, Configure, Operate High Availability Hadoop Cluster

  1. Download jaguar-bigdata-1.5.tar.gz from GitHub.com/datajaguar/jaguardb/bigdata/
  2. Prepare your hosts in the Hadoop cluster for High Availability (active and standby namenodes). Save the host names in a file named hosts.
  3. $ tar zxf  jaguar-bigdata-1.5.tar.gz
  4. $ cd jaguar-bigdata-1.5
  5. Make sure the hosts file is in this directory and has the following content:

node1
node2
node3
node4

(node1 will be namenode1, node2 will be namenode2, and node3 and node4 will be datanodes.)

6. On each host in the hosts file, make sure the following is in the $HOME/.bashrc file:

$HOME/.bashrc on all hosts (assuming Hadoop is to be installed in $HOME/jaguarhadoop):
export JAVA_HOME=`java -XshowSettings:properties -version 2>&1 | grep 'java.home' | tr -d ' ' | cut -d= -f2`
export HADOOP_PREFIX=$HOME/jaguarhadoop
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export YARN_HOME=$HADOOP_PREFIX
export PATH=$PATH:$HADOOP_HOME/bin
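To verify that the JAVA_HOME line resolved to a real path on each host (a quick sanity check, not part of the installer):

$ source $HOME/.bashrc
$ echo $JAVA_HOME
$ $JAVA_HOME/bin/java -version

On Java 8 the java.home property points at the JDK's jre directory, which still contains bin/java, so all three commands should succeed.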

7. Download your favorite .tar.gz packages of Hadoop, Kafka, and Spark, and copy them into the jaguar-bigdata-1.5/package directory. For example:

$ cp -f  /tmp/hadoop-2.8.5.tar.gz  jaguar-bigdata-1.5/package

Note: you must download and copy the desired packages into the package directory; otherwise they will not be installed.

8. Install the packages with one installer script:

$ cd   jaguar-bigdata-1.5

$  ./install_jaguar_bigdata_on_all_hosts.sh -f  hosts  -hadoop

File hosts is the file with the host names of the cluster, one per line. If more packages are to be installed, you can use the "-hadoop -kafka -spark" options or the "-all" option.

9.  Start ZooKeeper on all hosts (ZooKeeper must be started first)
$ cd $HOME/jaguarkafka/bin; ./start_zookeeper_on_all_hosts.sh

10. Start JournalNode on all hosts
$ cd $HOME/jaguarhadoop/sbin; ./start_journalnode_on_all_hosts.sh

11. Format and start Hadoop on all hosts
$ cd $HOME/jaguarhadoop/sbin; ./format_hadoop_on_all_hosts.sh
$ cd $HOME/jaguarhadoop/sbin; ./start_hadoop_on_all_hosts.sh

Use the bin/hdfs haadmin command to check the active/standby status of the namenodes:
$ hdfs haadmin -getServiceState namenode1
$ hdfs haadmin -getServiceState namenode2
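Each command prints the current state of the named namenode, for example:

$ hdfs haadmin -getServiceState namenode1
active
$ hdfs haadmin -getServiceState namenode2
standby

(The service IDs namenode1 and namenode2 are assumed to match the dfs.ha.namenodes entries the installer writes into hdfs-site.xml; if yours differ, use those names instead.)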

12. If you have installed Kafka and Spark, you can start them.

Kafka, Spark, and Zeppelin can be started as follows (if they were installed):
$ cd $HOME/jaguarkafka/bin; ./start_kafka_on_all_hosts.sh
$ cd $HOME/jaguarspark/bin; ./start_spark_on_all_hosts.sh
$ cd $HOME/jaguarzeppelin/bin; ./zeppelin-daemon.sh start
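A simple way to confirm the daemons came up on a host is jps, which ships with the JDK and lists running Java processes:

$ jps

Depending on what you installed and on the host's role, the output lists processes such as NameNode, DataNode, JournalNode, DFSZKFailoverController, QuorumPeerMain (ZooKeeper), Kafka, and ZeppelinServer, each with its pid.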

13. If you want to clean up the data in Hadoop, you can execute the following command:

$ cd $HOME/jaguarhadoop/sbin; ./format_hadoop_on_all_hosts.sh

You must be sure you really want to delete all data in Hadoop.

14.   Use the $HOME/jaguarhadoop/bin/hdfs command to check HDFS:
$ hdfs dfs -ls /
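A short read/write round trip (the file and directory names here are arbitrary) confirms HDFS is actually serving data:

$ echo hello > /tmp/smoke.txt
$ hdfs dfs -mkdir -p /tmp/smoketest
$ hdfs dfs -put /tmp/smoke.txt /tmp/smoketest/
$ hdfs dfs -cat /tmp/smoketest/smoke.txt
$ hdfs dfs -rm -r /tmp/smoketest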

 

Setup Chef Cluster On Centos7

1.   Environment: four hosts: HD8, HD7, HD6, HD5

On each of these hosts there is a chefadmin account with sudo privileges.

Roles: HD8 is chefserver, HD7 is chefdk (the workstation), HD6 is chefclient1, and HD5 is chefclient2.

/etc/hosts is identical on all four hosts:

192.168.7.100  HD8   chefserver

192.168.7.101  HD7   chefdk

192.168.7.102  HD6   chefclient1

192.168.7.103  HD5   chefclient2

2.  On HD8 (chefserver)

Use root account:

#  cd /usr/local/src

#  wget https://packages.chef.io/files/stable/chef-server/12.17.33/el/7/chef-server-core-12.17.33-1.el7.x86_64.rpm

#  rpm -ivh chef-server-core-12.17.33-1.el7.x86_64.rpm

#  chef-server-ctl reconfigure

#   chef-server-ctl status

#   chef-server-ctl user-create chefadmin FirstName LastName jonyue@datajaguar.com chefadminpassword  -f /etc/chef/chefadmin.pem

#  chef-server-ctl service-list

#   chef-server-ctl user-list

#  chef-server-ctl org-create datajaguar "DataJaguar, Inc" --association_user chefadmin -f /etc/chef/datajaguar-validator.pem

#  firewall-cmd --permanent --zone=public --add-service=http

#  firewall-cmd --permanent --zone=public --add-service=https

#  firewall-cmd --reload

(--reload is needed for the permanent rules to take effect in the running firewall.)

 

3.  On HD7 (chefdk)

#  yum install ruby

# yum install git

# cd /usr/local/src

#  wget https://packages.chef.io/files/stable/chefdk/1.5.0/el/7/chefdk-1.5.0-1.el7.x86_64.rpm

#  rpm -ivh chefdk-1.5.0-1.el7.x86_64.rpm

#   chef verify

#  useradd chefadmin

# passwd chefadmin

# su - chefadmin

In user chefadmin account:

$ echo 'eval "$(chef shell-init bash)"' >> ~/.bash_profile

$  .  ~/.bash_profile

$  cd ~

$  chef generate repo chef-repo

$  cd chef-repo

$  git init

$ git config --global user.name "chefadmin"

$  git config --global user.email "chefadmin@datajaguar.com"

$  mkdir  .chef

$  echo '.chef' >> ~/chef-repo/.gitignore

$ cd  ~/chef-repo

$ git add .

$ git commit

$  scp -pr root@chefserver:/etc/chef/chefadmin.pem ~/chef-repo/.chef/

$  scp -pr root@chefserver:/etc/chef/datajaguar-validator.pem ~/chef-repo/.chef/

$   vi ~/chef-repo/.chef/knife.rb

current_dir = File.dirname(__FILE__)
log_level :info
log_location STDOUT
node_name "chefadmin"
client_key "#{current_dir}/chefadmin.pem"
validation_client_name "datajaguar-validator"
validation_key "#{current_dir}/datajaguar-validator.pem"
chef_server_url "https://HD8/organizations/datajaguar"
syntax_check_cache_path "#{ENV['HOME']}/.chef/syntaxcache"
cookbook_path ["#{current_dir}/../cookbooks"]

$ knife ssl fetch

$ knife bootstrap chefclient1 -x chefadmin --sudo

(chefadmin is a user account on host chefclient1; it must have sudo privileges.)

$   knife bootstrap chefclient2 -x chefadmin --sudo

(chefadmin is a user account on host chefclient2; it must have sudo privileges.)
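If both bootstraps succeed, the two nodes are registered with the Chef server, which you can confirm from the workstation (both commands read ~/chef-repo/.chef/knife.rb):

$ knife node list
$ knife client list

The bootstrapped nodes should appear in both lists.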

 

 

 

Install Boost on Linux

  1. Download boost_1_68_0.tar.gz
  2. # cp boost_1_68_0.tar.gz /usr/local/src
  3. # cd /usr/local/src
  4. # tar zxf boost_1_68_0.tar.gz
  5. # cd boost_1_68_0
  6. # ./bootstrap.sh --prefix=/usr/local/boost_168_0
  7. # ./b2
  8. # ./b2 install

/usr/local/boost_168_0/include/ will contain the header files.

/usr/local/boost_168_0/lib/ will contain the compiled library files.
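A minimal sketch to verify the install (the file name is arbitrary; boost/version.hpp is header-only, so no library flags are needed for this check):

$ cat > boost_check.cpp <<'EOF'
#include <boost/version.hpp>
#include <iostream>
int main() { std::cout << "Boost " << BOOST_LIB_VERSION << std::endl; return 0; }
EOF
$ g++ -I/usr/local/boost_168_0/include boost_check.cpp -o boost_check
$ ./boost_check
Boost 1_68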

Install CGAL Library on Linux

    1. Download the source tarball CGAL-4.12.tar.gz
    2. Run the following commands as sudo or root
    3. # cp CGAL-4.12.tar.gz  /usr/local/src
    4. # cd /usr/local/src
    5. # tar zxf CGAL-4.12.tar.gz
    6. # cd cgal-releases-CGAL-4.12
    7. # mkdir -p build/release
    8. # cd build/release
    9. Make sure you have an up-to-date cmake (an old cmake will not work)
    10. # cmake -DCMAKE_BUILD_TYPE=Release -DBoost_INCLUDE_DIR=/usr/local/boost_168_0/include ../..
    11. # make
    12. # make install

The CGAL header files will be in /usr/local/include/CGAL/

The library .so files will be in /usr/local/lib64/   (libCGAL.so)
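Since /usr/local/lib64 is not on the default runtime linker path on every distribution, a program linked with -lCGAL may fail to find libCGAL.so until the directory is registered (a common fix, run as root):

# echo /usr/local/lib64 > /etc/ld.so.conf.d/cgal.conf
# ldconfig

A typical compile line against this install would then look like g++ prog.cpp -lCGAL -lgmp -lmpfr (CGAL depends on GMP and MPFR).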

Replicating database using triggers

Suppose you have a table on any RDBMS database: table123, with a column uid and a column addr.

You can create another table to capture insert, update, and delete operations on table123:

create table table123_trigger_table
(
ts datetime primary key,
uid int,
addr varchar(64),
action char(1)
);

 

Then you can create three triggers (MySQL syntax shown) to capture the changes in table123:

DELIMITER $$
CREATE TRIGGER after_table123_insert AFTER INSERT ON table123 FOR EACH ROW
BEGIN
INSERT INTO table123_trigger_table
SET action = 'I',
uid = NEW.uid,
addr = NEW.addr,
ts = NOW();
END$$
DELIMITER ;

 

DELIMITER $$
CREATE TRIGGER after_table123_update AFTER UPDATE ON table123 FOR EACH ROW
BEGIN
INSERT INTO table123_trigger_table
SET action = 'U',
uid = NEW.uid,
addr = NEW.addr,
ts = NOW();
END$$
DELIMITER ;

 

DELIMITER $$
CREATE TRIGGER after_table123_delete AFTER DELETE ON table123 FOR EACH ROW
BEGIN
INSERT INTO table123_trigger_table
SET action = 'D',
uid = OLD.uid,
addr = OLD.addr,
ts = NOW();
END$$
DELIMITER ;

 

After the three triggers are created, you can write a Java program that uses JDBC to pull the records into the target database and table. The ts column in the trigger table is a timestamp and the primary key, so it can be used to track when each change occurred. (Note that two changes landing in the same second would collide on this primary key; a fractional-seconds datetime or an auto-increment id column avoids that.) The trigger table can be cleaned up periodically.
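A quick walk-through of the capture flow (the values and the cleanup cutoff below are made up for illustration):

INSERT INTO table123 (uid, addr) VALUES (1, '100 Main St');
UPDATE table123 SET addr = '200 Oak Ave' WHERE uid = 1;
DELETE FROM table123 WHERE uid = 1;

-- the trigger table now holds one 'I', one 'U', and one 'D' row
-- (pause between statements, or declare ts as datetime(6) and use NOW(6)
-- in the triggers, since NOW() only has one-second resolution):
SELECT ts, uid, addr, action FROM table123_trigger_table ORDER BY ts;

-- periodic cleanup once the replicator has pulled everything before a cutoff:
DELETE FROM table123_trigger_table WHERE ts < '2019-01-01 00:00:00';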

 

 

How to Install Java 1.8

Suppose you want to install all Java 1.8 files into  /opt directory:

(# means root prompt, $ means your regular user account)

# mkdir /opt

# cd /opt/
# wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u111-b14/jdk-8u111-linux-x64.tar.gz"

# tar xzf jdk-8u111-linux-x64.tar.gz
# ln -sf /opt/jdk1.8.0_111 /opt/java

$ export JAVA_HOME=/opt/java
$ export JRE_HOME=/opt/java/jre

$ export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

There you go, check "java -version" for the new 1.8 version!
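For example (the build number will vary with the exact tarball you downloaded):

$ java -version
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)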