Installing Hive on Hadoop

1.  wget https://www-us.apache.org/dist/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz

2. tar zxf apache-hive-2.3.3-bin.tar.gz

3. /bin/mv apache-hive-2.3.3-bin  jaguarhive

4. cd jaguarhive/conf

5. cp  hive-env.sh.template hive-env.sh

vi  hive-env.sh

HADOOP_HOME=$HOME/jaguarhadoop

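6.  vi hive-site.xml

Hive reads site settings from hive-site.xml. The file hive-default.xml.template only documents the default values, and a copy named hive-default.xml is ignored. Create hive-site.xml with the following content:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/home/myhome/hive/metastore_db;create=true</value>
  </property>
</configuration>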

Note: ConnectionURL can be configured to use MySQL, PostgreSQL, or any other database server that supports JDBC. The metastore stores metadata (schema) information for Hive tables. Derby is an embedded database that ships with Hive. It can store the metadata, but in embedded mode it supports only one Hive session at a time.
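
The note above says the metastore can live in MySQL instead of Derby. As a minimal sketch of what the corresponding entries in the Hive configuration file from step 6 might look like, the function below prints the four connection properties. The host, database, user, and password are placeholders (assumptions), and the driver class shown is the one from MySQL Connector/J 5.x, whose jar would have to be copied into $HIVE_HOME/lib.

```shell
#!/bin/sh
# Print one Hive <property> element for the given name and value.
hive_property() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Print the metastore connection properties for a MySQL server.
# Arguments (all placeholders): host, database, user, password.
mysql_metastore_properties() {
    host=$1; db=$2; user=$3; pass=$4
    hive_property javax.jdo.option.ConnectionURL \
        "jdbc:mysql://$host:3306/$db?createDatabaseIfNotExist=true"
    hive_property javax.jdo.option.ConnectionDriverName "com.mysql.jdbc.Driver"
    hive_property javax.jdo.option.ConnectionUserName "$user"
    hive_property javax.jdo.option.ConnectionPassword "$pass"
}

# Example: mysql_metastore_properties dbhost metastore hiveuser hivepassword
```

With a MySQL metastore, the schema would be initialized with "schematool -initSchema -dbType mysql" rather than "-dbType derby".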

7.   hdfs dfs -mkdir -p /user/hive/warehouse

8.  vi $HOME/.bashrc

export HIVE_HOME=$HOME/jaguarhive

9. $ source  $HOME/.bashrc

10.  Init metastore

$  cd $HOME/jaguarhive

$ /bin/rm -rf metastore_db

$  $HIVE_HOME/bin/schematool -initSchema -dbType derby

11. Ready to use Hive:

$  export PATH=$PATH:$HIVE_HOME/bin

$  hive

 

One-Key Install, Configure, Operate High Availability Hadoop Cluster

  1. Download jaguar-bigdata-1.5.tar.gz from GitHub.com/datajaguar/jaguardb/bigdata/
  2. Prepare your hosts in Hadoop cluster for High Availability (Active and Standby namenodes). Save the host names in a file: hosts
  3. $ tar zxf  jaguar-bigdata-1.5.tar.gz
  4. $ cd jaguar-bigdata-1.5
  5. Make sure the hosts file is in this directory and has the following content:

node1
node2
node3
node4

(node1 will be namenode1, node2 will be namenode2, and node3 and node4 will be datanodes.)
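
To make the role assignment concrete, here is a small sketch that prints the role each line of the hosts file would get. It only mirrors the example above (first two hosts are namenodes, the rest are datanodes); how the installer itself assigns roles is not shown in this article, so treat the rule as an assumption.

```shell
#!/bin/sh
# Print "host role" pairs for a hosts file with one host name per line.
# Assumption: the first two hosts are the namenodes and the rest are
# datanodes, as in the four-node example above.
print_roles() {
    awk 'NF == 0 { next }
         { n++
           if (n <= 2) printf "%s namenode%d\n", $1, n
           else        printf "%s datanode\n", $1 }' "$1"
}

# Example: print_roles hosts
```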

6. On each host in the hosts file, put the following in $HOME/.bashrc (assuming Hadoop is to be installed in $HOME/jaguarhadoop):

export JAVA_HOME=`java -XshowSettings:properties -version 2>&1 |grep 'java.home'|tr -d ' '|cut -d= -f2`
export HADOOP_PREFIX=$HOME/jaguarhadoop
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export YARN_HOME=$HADOOP_PREFIX
export PATH=$PATH:$HADOOP_HOME/bin
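
The JAVA_HOME line above extracts the java.home property from the JVM's settings dump. The sketch below does the same extraction in a named function so it can be tried on sample text. The function name is made up for illustration, and unlike the tr-based pipeline it keeps any spaces inside the path.

```shell
#!/bin/sh
# Read the output of "java -XshowSettings:properties -version" on stdin
# and print the value of the java.home property.
java_home_from_settings() {
    sed -n 's/^[[:space:]]*java\.home[[:space:]]*=[[:space:]]*//p' | head -n 1
}

# Set JAVA_HOME only when a java binary is actually installed.
if command -v java >/dev/null 2>&1; then
    JAVA_HOME=$(java -XshowSettings:properties -version 2>&1 | java_home_from_settings)
    export JAVA_HOME
fi
```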

7. Download your favorite .tar.gz packages of Hadoop, Kafka, and Spark, and copy them into the jaguar-bigdata-1.5/package directory.  For example:

$ cp -f  /tmp/hadoop-2.8.5.tar.gz  jaguar-bigdata-1.5/package

Note: you must download and copy desired packages into package directory. Otherwise they will not be installed.

8. Install the packages with one installer script:

$ cd   jaguar-bigdata-1.5

$  ./install_jaguar_bigdata_on_all_hosts.sh -f  hosts  -hadoop

The hosts file lists the host names of the cluster, one per line.  To install more packages, use the "-hadoop -kafka -spark" options or the "-all" option.

9.  Start ZooKeeper on all hosts (ZooKeeper must be started first)
$ cd $HOME/jaguarkafka/bin; ./start_zookeeper_on_all_hosts.sh

10. Start JournalNode on all hosts
$ cd $HOME/jaguarhadoop/sbin; ./start_journalnode_on_all_hosts.sh

11. Format and start Hadoop on all hosts
$ cd $HOME/jaguarhadoop/sbin; ./format_hadoop_on_all_hosts.sh
$ cd $HOME/jaguarhadoop/sbin; ./start_hadoop_on_all_hosts.sh

Use the hdfs haadmin command to check the active/standby status of the namenodes:
$ hdfs haadmin -getServiceState namenode1
$ hdfs haadmin -getServiceState namenode2
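
In a healthy HA pair, one namenode reports "active" and the other "standby". A minimal sketch of that check, assuming the service IDs namenode1 and namenode2 used above (the function name is hypothetical):

```shell
#!/bin/sh
# Succeed only when one of the two reported states is "active"
# and the other is "standby".
ha_pair_ok() {
    case "$1,$2" in
        active,standby|standby,active) return 0 ;;
        *) return 1 ;;
    esac
}

# Query the real cluster only when the hdfs command is available.
if command -v hdfs >/dev/null 2>&1; then
    s1=$(hdfs haadmin -getServiceState namenode1 2>/dev/null)
    s2=$(hdfs haadmin -getServiceState namenode2 2>/dev/null)
    if ha_pair_ok "$s1" "$s2"; then
        echo "HA pair healthy: namenode1=$s1 namenode2=$s2"
    else
        echo "HA pair needs attention: namenode1=$s1 namenode2=$s2"
    fi
fi
```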

12. If you installed Kafka, Spark, or Zeppelin, you can start them:
$ cd $HOME/jaguarkafka/bin; ./start_kafka_on_all_hosts.sh
$ cd $HOME/jaguarspark/bin; ./start_spark_on_all_hosts.sh
$ cd $HOME/jaguarzeppelin/bin; ./zeppelin-daemon.sh start

13. To clean up the data in Hadoop, run the following command:

$ cd $HOME/jaguarhadoop/sbin; ./format_hadoop_on_all_hosts.sh

This reformats Hadoop, so be sure you really want to delete all data in it.

14.   Use the hdfs command ($HOME/jaguarhadoop/bin/hdfs) to check HDFS:
$ hdfs dfs -ls /

 

Set Up a Chef Cluster on CentOS 7

1.   Environment: Four hosts:  HD8, HD7, HD6, HD5

On each of these hosts, there is a chefadmin account with sudo privilege.

HD8: chefserver

/etc/hosts:

192.168.7.100  HD8   chefserver

192.168.7.101  HD7   chefdk

192.168.7.102 HD6  chefclient1

192.168.7.103 HD5  chefclient2

 

HD7: chefdk (Work Station)

/etc/hosts:

192.168.7.100  HD8   chefserver

192.168.7.101  HD7   chefdk

192.168.7.102 HD6  chefclient1

192.168.7.103 HD5  chefclient2

 

HD6: chefclient1

/etc/hosts:

192.168.7.100  HD8   chefserver

192.168.7.101  HD7   chefdk

192.168.7.102 HD6  chefclient1

192.168.7.103 HD5  chefclient2

 

HD5: chefclient2

/etc/hosts:

192.168.7.100  HD8   chefserver

192.168.7.101  HD7   chefdk

192.168.7.102 HD6  chefclient1

192.168.7.103 HD5  chefclient2
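
Since every host needs the same four entries, a quick check can save a failed bootstrap later. The sketch below (function name hypothetical) prints any name that is missing from a hosts file:

```shell
#!/bin/sh
# Print every name from the argument list that does not appear as a host
# name or alias in the given hosts file (first argument).
missing_hosts() {
    file=$1; shift
    for name in "$@"; do
        awk -v n="$name" '
            /^[[:space:]]*#/ { next }                       # skip comments
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file" || echo "$name"
    done
}

# Example: missing_hosts /etc/hosts chefserver chefdk chefclient1 chefclient2
```

An empty result means all names resolve through the file.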

 

2.  On HD8 (chefserver)

Use root account:

#  cd /usr/local/src

#  wget https://packages.chef.io/files/stable/chef-server/12.17.33/el/7/chef-server-core-12.17.33-1.el7.x86_64.rpm

#  rpm -ivh chef-server-core-12.17.33-1.el7.x86_64.rpm

#  chef-server-ctl reconfigure

#   chef-server-ctl status

#   chef-server-ctl user-create chefadmin FirstName LastName jonyue@datajaguar.com chefadminpassword  -f /etc/chef/chefadmin.pem

#  chef-server-ctl service-list

#   chef-server-ctl user-list

#  chef-server-ctl org-create datajaguar "DataJaguar, Inc" --association_user chefadmin -f /etc/chef/datajaguar-validator.pem

#  firewall-cmd --permanent --zone=public --add-service=http

#  firewall-cmd --permanent --zone=public --add-service=https

#  firewall-cmd --reload

 

3.  On HD7 (chefdk)

#  yum install ruby

# yum install git

# cd /usr/local/src

#  wget https://packages.chef.io/files/stable/chefdk/1.5.0/el/7/chefdk-1.5.0-1.el7.x86_64.rpm

#  rpm -ivh chefdk-1.5.0-1.el7.x86_64.rpm

#   chef verify

#  useradd chefadmin

# passwd chefadmin

# su - chefadmin

In user chefadmin account:

$ echo 'eval "$(chef shell-init bash)"' >> ~/.bash_profile

$  .  ~/.bash_profile

$  cd ~

$  chef generate repo chef-repo

$  cd chef-repo

$  git init

$ git config --global user.name "chefadmin"

$  git config --global user.email "chefadmin@datajaguar.com"

$  mkdir  .chef

$  echo '.chef' >> ~/chef-repo/.gitignore

$ cd  ~/chef-repo

$ git add .

$ git commit

$  scp -pr root@chefserver:/etc/chef/chefadmin.pem ~/chef-repo/.chef/

$  scp -pr root@chefserver:/etc/chef/datajaguar-validator.pem ~/chef-repo/.chef/

$   vi ~/chef-repo/.chef/knife.rb

current_dir = File.dirname(__FILE__)
log_level :info
log_location STDOUT
node_name "chefadmin"
client_key "#{current_dir}/chefadmin.pem"
validation_client_name "datajaguar-validator"
validation_key "#{current_dir}/datajaguar-validator.pem"
chef_server_url "https://HD8/organizations/datajaguar"
syntax_check_cache_path "#{ENV['HOME']}/.chef/syntaxcache"
cookbook_path ["#{current_dir}/../cookbooks"]

$ knife ssl fetch

$ knife bootstrap chefclient1 -x chefadmin --sudo

(chefadmin is a user account on host chefclient1. It must have sudo privilege.)

$   knife bootstrap chefclient2 -x chefadmin --sudo

(chefadmin is a user account on host chefclient2. It must have sudo privilege.)

 

 

 

Install Boost on Linux

  1. Download boost_1_68_0.tar.gz
  2. # cp boost_1_68_0.tar.gz /usr/local/src
  3. # cd /usr/local/src
  4. # tar zxf boost_1_68_0.tar.gz
  5. # cd boost_1_68_0
  6. # ./bootstrap.sh --prefix=/usr/local/boost_168_0
  7. # ./b2
  8. # ./b2 install

/usr/local/boost_168_0/include/ will contain header files

/usr/local/boost_168_0/lib/  will contain individual library files

Install CGAL Library on Linux

    1. Download the source tarball CGAL-4.12.tar.gz
    2. Run the following commands as root (or with sudo)
    3. # cp CGAL-4.12.tar.gz  /usr/local/src
    4. # cd /usr/local/src
    5. # tar zxf CGAL-4.12.tar.gz
    6. # cd cgal-releases-CGAL-4.12
    7. # mkdir -p build/release
    8. # cd build/release
    9. Make sure you have an up-to-date cmake (an old cmake will not work)
    10. # cmake -DCMAKE_BUILD_TYPE=Release -DBoost_INCLUDE_DIR=/usr/local/boost_168_0/include ../..
    11. # make
    12. # make install

The CGAL header files will be in /usr/local/include/CGAL/

The library .so files will be in /usr/local/lib64/   (libCGAL.so)

Replicating database using triggers

Suppose you have a table on any RDBMS database:

table123:  column uid and column addr

You can create another table to capture insert, update, and delete operations on table123:

create table table123_trigger_table

(

ts datetime primary key,

uid int,

addr varchar(64),

action char(1)

);

 

Then you can create three triggers to capture the changes in table123 (the syntax shown is MySQL's):

DELIMITER $$
CREATE TRIGGER after_table123_insert AFTER INSERT ON table123 FOR EACH ROW
BEGIN
INSERT INTO table123_trigger_table
SET action = 'I',
uid = NEW.uid,
addr = NEW.addr,
ts = NOW();
END$$
DELIMITER ;

 

DELIMITER $$
CREATE TRIGGER after_table123_update AFTER UPDATE ON table123 FOR EACH ROW
BEGIN
INSERT INTO table123_trigger_table
SET action = 'U',
uid = NEW.uid,
addr = NEW.addr,
ts = NOW();
END$$
DELIMITER ;

 

DELIMITER $$
CREATE TRIGGER after_table123_delete AFTER DELETE ON table123 FOR EACH ROW
BEGIN
INSERT INTO table123_trigger_table
SET action = 'D',
uid = OLD.uid,
addr = OLD.addr,
ts = NOW();
END$$
DELIMITER ;

 

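After the three triggers are created, you can write a Java program that uses JDBC to pull the records from the trigger table and apply them to the target database and table. The 'ts' column in the trigger table is a timestamp and the primary key, so it can be used to track the change time and to order the changes. Note that NOW() has one-second resolution by default in MySQL, so two changes within the same second would collide on this primary key. The trigger table can be cleaned up periodically.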
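
The replay step can be sketched without committing to a particular client. The function below is an illustration in shell, not the Java/JDBC program mentioned above. It assumes the captured rows have been exported oldest first as tab-separated text (ts, uid, addr, action), that uid uniquely identifies a row in the target table, and that the target table is also named table123. None of these are stated in the article.

```shell
#!/bin/sh
# Read tab-separated change rows (ts, uid, addr, action) on stdin, oldest
# first, and print the SQL statement that replays each change on the target.
# Assumption: uid uniquely identifies a row in the target table.
# Limitation: an empty addr field is not handled, because read collapses tabs.
changes_to_sql() {
    while IFS="$(printf '\t')" read -r ts uid addr action; do
        # Refuse anything that is not a plain integer uid.
        case "$uid" in
            ''|*[!0-9]*)
                printf 'skipping row with non-numeric uid: %s\n' "$uid" >&2
                continue ;;
        esac
        # Double any single quote so addr is a valid SQL string literal.
        addr_sql=$(printf '%s' "$addr" | sed "s/'/''/g")
        case "$action" in
            I) printf "INSERT INTO table123 (uid, addr) VALUES (%s, '%s');\n" "$uid" "$addr_sql" ;;
            U) printf "UPDATE table123 SET addr = '%s' WHERE uid = %s;\n" "$addr_sql" "$uid" ;;
            D) printf "DELETE FROM table123 WHERE uid = %s;\n" "$uid" ;;
            *) printf 'unknown action for uid %s: %s\n' "$uid" "$action" >&2 ;;
        esac
    done
}
```

A real replication program would use prepared statements rather than building SQL strings, and would remember the last 'ts' it applied so that each run only pulls newer rows.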

 

 

HOW TO INSTALL JAVA 1.8

Suppose you want to install all Java 1.8 files into  /opt directory:

(# means root prompt, $ means your regular user account)

# mkdir -p /opt

# cd /opt/
# wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u111-b14/jdk-8u111-linux-x64.tar.gz"

# tar xzf jdk-8u111-linux-x64.tar.gz
# ln -sf /opt/jdk1.8.0_111 /opt/java

$ export JAVA_HOME=/opt/java
$ export JRE_HOME=/opt/java/jre

$ export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

There you go, check "java -version" for the new 1.8 version!
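
The version check can also be scripted. A minimal sketch (the function name is made up) that inspects the "java -version" banner:

```shell
#!/bin/sh
# Read "java -version" output on stdin; succeed if it reports a 1.8 release.
is_java8() {
    grep -q 'version "1\.8\.'
}

# Run the check only when a java binary is on PATH.
if command -v java >/dev/null 2>&1; then
    if java -version 2>&1 | is_java8; then
        echo "Java 1.8 is active"
    else
        echo "A different Java version is first on PATH"
    fi
fi
```

The export lines above affect only the current shell. Appending them to $HOME/.bashrc makes them apply to new login sessions as well.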

 

How to replace a failed hard disk from Linux RAID

Suppose /dev/raid1 is a RAID 1 array built from two partitions, /dev/hde1 and /dev/hdf1, on two bootable physical drives, and /dev/hdf1 has failed. Here are the steps to replace the failed /dev/hdf1.

Step One: confirm that /dev/hdf1 failed

# cat /proc/mdstat

You should see [U_] instead of [UU] in degraded RAID1 array.
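
Reading the status brackets by eye is fine for one array. For several, a small filter helps. The sketch below (function name hypothetical) lists every array whose status contains an underscore, meaning a member is missing:

```shell
#!/bin/sh
# Read /proc/mdstat-style text on stdin and print the name of every array
# whose member status (for example [U_]) shows a missing device.
degraded_arrays() {
    awk '
        $2 == ":" && $1 != "Personalities" { name = $1 }   # array header line
        match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_") > 0) print name
        }'
}

# On a machine with md arrays: degraded_arrays < /proc/mdstat
```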

 

Step Two: remove the failed disk

# mdadm --manage /dev/raid1 --fail /dev/hdf1

# mdadm --manage /dev/raid1 --remove /dev/hdf1

 

Step Three:  shut down the system and install a new disk

# shutdown -h now

Physically install the new disk, then boot the system.

 

Step Four:  add the new disk  /dev/hdf

#  sfdisk -d /dev/hde | sfdisk /dev/hdf

# mdadm --manage /dev/raid1 --add /dev/hdf1

 

Step Five: wait for /dev/hde and /dev/hdf  to become fully synchronized

#  cat /proc/mdstat
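
While the mirror rebuilds, /proc/mdstat shows a progress line containing the word "recovery" (or "resync"). Instead of re-running cat by hand, the wait can be sketched as a polling loop. The function names and the 30-second default interval are assumptions for illustration.

```shell
#!/bin/sh
# Succeed while the mdstat text on stdin reports a rebuild in progress.
is_rebuilding() {
    grep -Eq 'recovery|resync'
}

# Poll an mdstat file until no rebuild is reported.
# Arguments: file to read (default /proc/mdstat),
#            seconds between polls (default 30).
wait_for_sync() {
    file=${1:-/proc/mdstat}
    interval=${2:-30}
    while is_rebuilding < "$file"; do
        sleep "$interval"
    done
    echo "no rebuild in progress"
}

# Example: wait_for_sync /proc/mdstat 60
```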

 

 

How to make software RAID on Linux

RAID stands for redundant array of inexpensive disks.  This article shows how to implement RAID 1 (disk mirroring), where data is written to two disks (either HDD or SSD) simultaneously.

Step One: Use two disk partitions that are of approximately the same size. For example, /dev/hde1, /dev/hdf1

Step Two:  Set the type of each disk partition to “Linux raid autodetect”

# fdisk   /dev/hde
Command (m for help): m

p print the partition table
q quit without saving changes
s create a new empty Sun disklabel
t change a partition’s system id

Command (m for help): t
Partition number (1-5): 1
Hex code (type L to list codes): L

16  Hidden FAT16      61  SpeedStor        f2  DOS secondary
17  Hidden HPFS/NTF   63  GNU HURD or Sys  fd  Linux raid auto
18  AST SmartSleep    64  Novell Netware   fe  LANstep
1b  Hidden Win95 FA   65  Novell Netware   ff  BBT
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Repeat this step for /dev/hdf1:
# fdisk  /dev/hdf
… (similar to /dev/hde1) …

Step Three: create the RAID set of type 1

# mdadm  --create  --verbose  /dev/raid1  --level=1 \
--raid-devices=2  /dev/hde1  /dev/hdf1

# cat /proc/mdstat   (to confirm it is created)

Step Four: format the new RAID set

# mkfs.xfs   /dev/raid1

Step Five: create config file

On Centos, Redhat, Fedora:
# mdadm --detail --scan > /etc/mdadm.conf

On Debian, Ubuntu:
# mdadm --detail --scan > /etc/mdadm/mdadm.conf

Step Six: mount the RAID set
# mkdir  /mnt/raid1
# vi  /etc/fstab:
/dev/raid1    /mnt/raid1     xfs      defaults     1 2
# mount  -a

You can check the status of all devices:
# cat  /proc/mdstat

Linux software RAID provides redundancy across hard disks, but it uses the host CPU and is usually slower than a hardware RAID disk controller, which is typically configured through the system BIOS and is transparent to Linux.

How to upgrade Linux Kernel

Here are the steps to upgrade the kernel on any Linux system (Fedora, CentOS, etc.) to a newer version:

  1. # wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.4.2.tar.xz
  2. # tar xvfJ linux-4.4.2.tar.xz
  3. # cd linux-4.4.2
  4. # mkdir -p /home/name/build/kernel
  5. # make O=/home/name/build/kernel defconfig
  6. # vi /home/name/build/kernel/.config and set the options you need, for example:
       CONFIG_R8169=y   (for your network card)
       CONFIG_XFS_FS=y
       CONFIG_EFI=y
       CONFIG_EFI_STUB=y
       CONFIG_FUSE_FS=y
  7. # make O=/home/name/build/kernel
  8. # make O=/home/name/build/kernel modules_install install

 

Reboot and you will be running the new kernel.
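
Editing .config in vi works, but the option changes in the steps above can also be scripted. The helper below is a sketch, not a kernel tool. Its name is made up, and it handles only simple values such as y, m, or n.

```shell
#!/bin/sh
# Set CONFIG_<name>=<value> in a kernel .config file.
# Replaces an existing assignment or a "# CONFIG_<name> is not set" line,
# and appends the option when neither is present.
# A backup of the edited file is left next to it with a .bak suffix.
set_kconfig() {
    file=$1; name=$2; value=$3
    if grep -Eq "^(CONFIG_$name=|# CONFIG_$name is not set)" "$file"; then
        sed -i.bak -E \
            "s/^(CONFIG_$name=.*|# CONFIG_$name is not set)\$/CONFIG_$name=$value/" \
            "$file"
    else
        printf 'CONFIG_%s=%s\n' "$name" "$value" >> "$file"
    fi
}

# Example:
#   for opt in R8169 XFS_FS EFI EFI_STUB FUSE_FS; do
#       set_kconfig /home/name/build/kernel/.config "$opt" y
#   done
```

Running "make O=/home/name/build/kernel olddefconfig" afterwards lets the kernel build system fill in any options that the changed ones depend on.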