Posted by

exeray

Posted on

October 30, 2018

Posted under

Uncategorized


Installing Hive on Hadoop

1. wget https://www-us.apache.org/dist/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz

2. tar zxf apache-hive-2.3.3-bin.tar.gz

3. /bin/mv apache-hive-2.3.3-bin jaguarhive

4. cd jaguarhive/conf

5. cp hive-env.sh.template hive-env.sh

vi hive-env.sh

HADOOP_HOME=$HOME/jaguarhadoop

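6. Create hive-site.xml in this directory (Hive reads site settings from hive-site.xml; changes made to a copy of hive-default.xml.template named hive-default.xml are ignored) and put the following inside its <configuration> element:

vi hive-site.xml: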

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/home/myhome/hive/metastore_db;create=true</value>
</property>

Note: ConnectionURL can point to MySQL, PostgreSQL, or any other database server that supports JDBC. The metastore stores the metadata (schema information) for Hive tables. Derby is the embedded metastore database that ships with Hive. It can store metadata, but it allows only one Hive session at a time.

7. hdfs dfs -mkdir -p /user/hive/warehouse

8. vi $HOME/.bashrc

export HIVE_HOME=$HOME/jaguarhive

9. $ source $HOME/.bashrc

10. Init metastore

$ cd $HOME/jaguarhive

$ /bin/rm -rf metastore_db

$ $HIVE_HOME/bin/schematool -initSchema -dbType derby

11. Ready to use Hive:

$ export PATH=$PATH:$HIVE_HOME/bin

$ hive

Posted by

exeray

Posted on

October 30, 2018

Posted under

Uncategorized


One-Key Install, Configure, Operate High Availability Hadoop Cluster

1. Download jaguar-bigdata-1.5.tar.gz from GitHub.com/datajaguar/jaguardb/bigdata/
2. Prepare the hosts of your Hadoop cluster for High Availability (active and standby namenodes). Save the host names in a file named hosts.
3. $ tar zxf jaguar-bigdata-1.5.tar.gz
4. $ cd jaguar-bigdata-1.5
5. Make sure the hosts file is in this directory and has the following content:

node1
node2
node3
node4

(node1 will be namenode1, node2 namenode2, node3 a datanode, and node4 a datanode.)

6. On each host in the hosts file, put the following in $HOME/.bashrc (assuming Hadoop is to be installed in $HOME/jaguarhadoop):

export JAVA_HOME=`java -XshowSettings:properties -version 2>&1 |grep 'java.home'|tr -d ' '|cut -d= -f2`
export HADOOP_PREFIX=$HOME/jaguarhadoop
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export YARN_HOME=$HADOOP_PREFIX
export PATH=$PATH:$HADOOP_HOME/bin

7. Download your favorite .tar.gz packages of Hadoop, Kafka, and Spark and copy them into the jaguar-bigdata-1.5/package directory. For example:

$ cp -f /tmp/hadoop-2.8.5.tar.gz jaguar-bigdata-1.5/package

Note: you must download and copy the desired packages into the package directory. Otherwise they will not be installed.

8. Install the packages with one installer script:

$ cd jaguar-bigdata-1.5

$ ./install_jaguar_bigdata_on_all_hosts.sh -f hosts -hadoop

The hosts file lists the host names of the cluster, one per line. To install more packages, use the "-hadoop -kafka -spark" options or the "-all" option.

9. Start Zookeeper on all hosts (Zookeeper must be started first)
$ cd $HOME/jaguarkafka/bin; ./start_zookeeper_on_all_hosts.sh

10. Start JournalNode on all hosts
$ cd $HOME/jaguarhadoop/sbin; ./start_journalnode_on_all_hosts.sh

11. Format and start Hadoop on all hosts
$ cd $HOME/jaguarhadoop/sbin; ./format_hadoop_on_all_hosts.sh
$ cd $HOME/jaguarhadoop/sbin; ./start_hadoop_on_all_hosts.sh

Use the hdfs haadmin command to check the active/standby status of the namenodes:
$ hdfs haadmin -getServiceState namenode1
$ hdfs haadmin -getServiceState namenode2

12. If you have installed Kafka, Spark, or Zeppelin, you can start them with:
$ cd $HOME/jaguarkafka/bin; ./start_kafka_on_all_hosts.sh
$ cd $HOME/jaguarspark/bin; ./start_spark_on_all_hosts.sh
$ cd $HOME/jaguarzeppelin/bin; ./zeppelin-daemon.sh start

13. If you want to clean up the data in Hadoop, you can execute the following command:

$ cd $HOME/jaguarhadoop/sbin; ./format_hadoop_on_all_hosts.sh

Do this only if you are sure you want to delete all data in Hadoop.

14. Use the $HOME/jaguarhadoop/bin/hdfs command to check HDFS:
$ hdfs dfs -ls /
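
The role convention from step 5 (the first two hosts in the hosts file become namenodes, the rest datanodes) can be sketched as a small shell function. This is only an illustration of that convention, not part of the jaguar-bigdata installer; the function name print_roles is hypothetical.

```shell
# Hypothetical helper (not part of jaguar-bigdata): print the role each
# host in a hosts file gets under the convention described in step 5.
print_roles() {
    local hosts_file="$1"
    local n=0
    local host
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r host || [ -n "$host" ]; do
        [ -z "$host" ] && continue          # skip blank lines
        n=$((n + 1))
        if [ "$n" -le 2 ]; then
            echo "$host namenode$n"         # first two hosts: namenodes
        else
            echo "$host datanode"           # remaining hosts: datanodes
        fi
    done < "$hosts_file"
}
```

Running print_roles hosts in the jaguar-bigdata-1.5 directory before the install is a quick way to confirm that the host order matches the roles you intend.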

Posted by

exeray

Posted on

September 15, 2018

Posted under

Uncategorized


Setup Chef Cluster On Centos7

1. Environment: Four hosts: HD8, HD7, HD6, HD5

On each of these hosts, there is a chefadmin account with sudo privileges.

HD8: chefserver

/etc/hosts:

192.168.7.100 HD8 chefserver

192.168.7.101 HD7 chefdk

192.168.7.102 HD6 chefclient1

192.168.7.103 HD5 chefclient2

HD7: chefdk (Work Station)

/etc/hosts:

192.168.7.100 HD8 chefserver

192.168.7.101 HD7 chefdk

192.168.7.102 HD6 chefclient1

192.168.7.103 HD5 chefclient2

HD6: chefclient1

/etc/hosts:

192.168.7.100 HD8 chefserver

192.168.7.101 HD7 chefdk

192.168.7.102 HD6 chefclient1

192.168.7.103 HD5 chefclient2

HD5: chefclient2

/etc/hosts:

192.168.7.100 HD8 chefserver

192.168.7.101 HD7 chefdk

192.168.7.102 HD6 chefclient1

192.168.7.103 HD5 chefclient2
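
Since every host needs the same four /etc/hosts entries, a quick check can catch a missing one before Chef is installed. The following is a minimal sketch; missing_hosts is a hypothetical helper name, and it only checks that each name appears as an alias on some non-comment line.

```shell
# Hypothetical helper: print every given name that does NOT appear as a
# host name or alias in the given hosts file.
missing_hosts() {
    local hosts_file="$1"
    local name
    shift
    for name in "$@"; do
        # Look for the name as a whole field after the IP address,
        # ignoring comment lines.
        if ! awk -v n="$name" '
                /^[ \t]*#/ { next }
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit !found }' "$hosts_file"; then
            echo "$name"
        fi
    done
}
```

For example, missing_hosts /etc/hosts chefserver chefdk chefclient1 chefclient2 prints nothing when all four names are present.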

2. On HD8 (chefserver)

Use root account:

# cd /usr/local/src

# wget https://packages.chef.io/files/stable/chef-server/12.17.33/el/7/chef-server-core-12.17.33-1.el7.x86_64.rpm

# rpm -ivh chef-server-core-12.17.33-1.el7.x86_64.rpm

# chef-server-ctl reconfigure

# chef-server-ctl status

# chef-server-ctl user-create chefadmin FirstName LastName jonyue@datajaguar.com chefadminpassword -f /etc/chef/chefadmin.pem

# chef-server-ctl service-list

# chef-server-ctl user-list

# chef-server-ctl org-create datajaguar "DataJaguar, Inc" --association_user chefadmin -f /etc/chef/datajaguar-validator.pem

# firewall-cmd --permanent --zone=public --add-service=http

# firewall-cmd --permanent --zone=public --add-service=https

# firewall-cmd --reload

3. On HD7 (chefdk)

# yum install ruby

# yum install git

# cd /usr/local/src

# wget https://packages.chef.io/files/stable/chefdk/1.5.0/el/7/chefdk-1.5.0-1.el7.x86_64.rpm

# rpm -ivh chefdk-1.5.0-1.el7.x86_64.rpm

# chef verify

# useradd chefadmin

# passwd chefadmin

# su - chefadmin

In user chefadmin account:

$ echo 'eval "$(chef shell-init bash)"' >> ~/.bash_profile

$ . ~/.bash_profile

$ cd ~

$ chef generate repo chef-repo

$ cd chef-repo

$ git init

$ git config --global user.name "chefadmin"

$ git config --global user.email "chefadmin@datajaguar.com"

$ mkdir .chef

$ echo '.chef' >> ~/chef-repo/.gitignore

$ cd ~/chef-repo

$ git add .

$ git commit

$ scp -pr root@chefserver:/etc/chef/chefadmin.pem ~/chef-repo/.chef/

$ scp -pr root@chefserver:/etc/chef/datajaguar-validator.pem ~/chef-repo/.chef/

$ vi ~/chef-repo/.chef/knife.rb

current_dir = File.dirname(__FILE__)
log_level :info
log_location STDOUT
node_name "chefadmin"
client_key "#{current_dir}/chefadmin.pem"
validation_client_name "datajaguar-validator"
validation_key "#{current_dir}/datajaguar-validator.pem"
chef_server_url "https://HD8/organizations/datajaguar"
syntax_check_cache_path "#{ENV['HOME']}/.chef/syntaxcache"
cookbook_path ["#{current_dir}/../cookbooks"]

$ knife ssl fetch

$ knife bootstrap chefclient1 -x chefadmin --sudo

(chefadmin is a user account on host chefclient1. It must have sudo privileges.)

$ knife bootstrap chefclient2 -x chefadmin --sudo

(chefadmin is a user account on host chefclient2. It must have sudo privileges.)

Posted by

exeray

Posted on

September 13, 2018

Posted under

Uncategorized


Install Boost on Linux

Download boost_1_68_0.tar.gz
# cp boost_1_68_0.tar.gz /usr/local/src
# cd /usr/local/src
# tar zxf boost_1_68_0.tar.gz
# cd boost_1_68_0
# ./bootstrap.sh --prefix=/usr/local/boost_168_0
# ./b2
# ./b2 install

/usr/local/boost_168_0/include/ will contain header files

/usr/local/boost_168_0/lib/ will contain individual library files
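
To confirm that the headers landed where expected, one option is to read the version macro out of the installed boost/version.hpp. The sketch below assumes the header layout described above (headers under the prefix's include/ directory); boost_lib_version is a hypothetical helper name.

```shell
# Hypothetical helper: print the BOOST_LIB_VERSION value found in an
# installed Boost tree, given the install prefix.
boost_lib_version() {
    local prefix="$1"
    local header="$prefix/include/boost/version.hpp"
    [ -f "$header" ] || return 1
    # version.hpp contains a line such as: #define BOOST_LIB_VERSION "1_68"
    sed -n 's/^#define BOOST_LIB_VERSION "\([^"]*\)".*/\1/p' "$header"
}
```

For the install above, the call would be boost_lib_version /usr/local/boost_168_0. A non-zero exit status means the header was not found under that prefix.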

Posted by

exeray

Posted on

September 13, 2018

Posted under

Uncategorized


Install CGAL Library on Linux

1. Download the source tarball CGAL-4.12.tar.gz
2. Run the following commands as root (or with sudo)
3. # cp CGAL-4.12.tar.gz /usr/local/src
4. # cd /usr/local/src
5. # tar zxf CGAL-4.12.tar.gz
6. # cd cgal-releases-CGAL-4.12
7. # mkdir -p build/release
8. # cd build/release
9. Make sure you have an up-to-date cmake (an old cmake will not work)
10. # cmake -DCMAKE_BUILD_TYPE=Release -DBoost_INCLUDE_DIR=/usr/local/boost_168_0/include ../..
11. # make
12. # make install

The CGAL header files will be in /usr/local/include/CGAL/

The library .so files will be in /usr/local/lib64/ (libCGAL.so)

Posted by

exeray

Posted on

September 3, 2017

Posted under

Uncategorized


Replicating database using triggers

Suppose you have a table in a relational database (the trigger syntax below is MySQL):

table123: column uid and column addr

You can create another table to capture insert, update, and delete operations on table123:

create table table123_trigger_table

(

ts datetime primary key,

uid int,

addr varchar(64),

action char(1)

);

Then you can create three triggers to capture the changes in table123:

DELIMITER $$
CREATE TRIGGER after_table123_insert AFTER INSERT ON table123 FOR EACH ROW
BEGIN
INSERT INTO table123_trigger_table
SET action = 'I',
uid = NEW.uid,
addr = NEW.addr,
ts = NOW();
END$$
DELIMITER ;

DELIMITER $$
CREATE TRIGGER after_table123_update AFTER UPDATE ON table123 FOR EACH ROW
BEGIN
INSERT INTO table123_trigger_table
SET action = 'U',
uid = NEW.uid,
addr = NEW.addr,
ts = NOW();
END$$
DELIMITER ;

DELIMITER $$
CREATE TRIGGER after_table123_delete AFTER DELETE ON table123 FOR EACH ROW
BEGIN
INSERT INTO table123_trigger_table
SET action = 'D',
uid = OLD.uid,
addr = OLD.addr,
ts = NOW();
END$$
DELIMITER ;

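After the three triggers are created, you can write a Java program that uses JDBC to pull the captured records into the target database and table. The 'ts' column in the trigger table is a timestamp and the primary key, so it can be used to track when each change happened and to resume from the last record pulled. Note that NOW() has one-second resolution by default, so two changes within the same second would collide on this primary key. The trigger table can be cleaned up periodically.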
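
The pull logic itself is independent of the client language. As a rough illustration only (the text above suggests Java with JDBC; this sketch uses shell and merely builds SQL strings), the two functions below show how a puller could fetch changes newer than the last timestamp it has seen and translate each captured row into a statement for the target table. The function names are hypothetical, and the sketch assumes uid identifies a row in table123.

```shell
# Hypothetical helper: build the query that fetches changes newer than
# the last timestamp already replicated.
build_pull_query() {
    local last_ts="$1"
    printf "SELECT ts, uid, addr, action FROM table123_trigger_table WHERE ts > '%s' ORDER BY ts;\n" "$last_ts"
}

# Hypothetical helper: translate one captured change into a statement
# for the target table. Assumes uid identifies the row.
change_to_sql() {
    local action="$1"
    local uid="$2"
    local addr="$3"
    # Double any single quotes so addr stays inside its string literal.
    addr=$(printf '%s' "$addr" | sed "s/'/''/g")
    case "$action" in
        I) printf "INSERT INTO table123 (uid, addr) VALUES (%s, '%s');\n" "$uid" "$addr" ;;
        U) printf "UPDATE table123 SET addr = '%s' WHERE uid = %s;\n" "$addr" "$uid" ;;
        D) printf "DELETE FROM table123 WHERE uid = %s;\n" "$uid" ;;
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
}
```

A real puller would bind these values through prepared statements rather than building strings, and would remember the largest ts it has applied so the next run starts from there.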

Posted by

exeray

Posted on

December 29, 2016

Posted under

Uncategorized


HOW TO INSTALL JAVA 1.8

Suppose you want to install all Java 1.8 files into /opt directory:

(# means root prompt, $ means your regular user account)

# mkdir -p /opt

# cd /opt/
# wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u111-b14/jdk-8u111-linux-x64.tar.gz"

# tar xzf jdk-8u111-linux-x64.tar.gz
# ln -sf /opt/jdk1.8.0_111 /opt/java

$ export JAVA_HOME=/opt/java
$ export JRE_HOME=/opt/java/jre

$ export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

There you go, check "java -version" for the new 1.8 version!
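
If you want to check the version from a script rather than by eye, one option is to extract the quoted version string from the "java -version" output. This is a minimal sketch; java_version_from_output is a hypothetical helper that reads that output on standard input.

```shell
# Hypothetical helper: read `java -version` output on stdin and print the
# first quoted version string, e.g. from: java version "1.8.0_111"
java_version_from_output() {
    sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}
```

Because java prints its version to standard error, the call looks like: java -version 2>&1 | java_version_from_output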

Posted by

exeray

Posted on

April 12, 2016

Posted under

Uncategorized


How to replace a failed hard disk from Linux RAID

Suppose /dev/raid1 is a RAID 1 array built from two bootable partitions on separate physical drives, /dev/hde1 and /dev/hdf1, and /dev/hdf1 has failed. Here are the steps to replace the failed /dev/hdf1.

Step One: confirm that /dev/hdf1 failed

# cat /proc/mdstat

In a degraded RAID 1 array you should see [U_] instead of [UU].
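
The same check can be scripted, for example in a cron job that alerts on failure. The sketch below is one possible way to do it; is_degraded is a hypothetical helper, and it takes the file path as a parameter so it can also be tried on a saved copy of /proc/mdstat.

```shell
# Hypothetical helper: succeed (exit 0) if the mdstat text shows a
# degraded array. Defaults to /proc/mdstat when no path is given.
is_degraded() {
    local mdstat="${1:-/proc/mdstat}"
    # A status field such as [U_] or [_U] has "_" for each missing member;
    # a healthy two-disk mirror shows [UU].
    grep -q '\[[U_]*_[U_]*\]' "$mdstat"
}
```

Usage: is_degraded && echo "RAID array is degraded"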

Step Two: remove the failed disk

# mdadm --manage /dev/raid1 --fail /dev/hdf1

# mdadm --manage /dev/raid1 --remove /dev/hdf1

Step Three: shut down the system and install a new disk

# shutdown -h now

Physically install the new disk, then boot the system.

Step Four: add the new disk /dev/hdf

# sfdisk -d /dev/hde | sfdisk /dev/hdf

# mdadm --manage /dev/raid1 --add /dev/hdf1

Step Five: wait for /dev/hde and /dev/hdf to become fully synchronized

# cat /proc/mdstat
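
Instead of re-running cat by hand, the wait can be scripted. The following is a minimal sketch, assuming that /proc/mdstat mentions "resync" or "recovery" while a rebuild is in progress; wait_for_sync is a hypothetical helper, and both the file path and the poll interval are parameters.

```shell
# Hypothetical helper: block until the mdstat text no longer reports a
# resync or recovery in progress.
#   $1: path to read (default /proc/mdstat)
#   $2: seconds between polls (default 30)
wait_for_sync() {
    local mdstat="${1:-/proc/mdstat}"
    local interval="${2:-30}"
    while grep -Eq 'resync|recovery' "$mdstat"; do
        sleep "$interval"
    done
}
```

Usage: wait_for_sync && echo "mirror is in sync"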

Posted by

exeray

Posted on

April 11, 2016

Posted under

Uncategorized


How to make software RAID on Linux

RAID stands for redundant array of inexpensive disks. This article shows how to set up RAID 1 (disk mirroring), where data is written to two disks (either HDD or SSD) simultaneously.

Step One: Use two disk partitions that are of approximately the same size. For example, /dev/hde1, /dev/hdf1

Step Two: Set the type of each disk partition to "Linux raid autodetect"

# fdisk /dev/hde
Command (m for help): m
…
p print the partition table
q quit without saving changes
s create a new empty Sun disklabel
t change a partition’s system id
…
Command (m for help): t
Partition number (1-5): 1
Hex code (type L to list codes): L
…
16 Hidden FAT16 61 SpeedStor f2 DOS secondary
17 Hidden HPFS/NTF 63 GNU HURD or Sys fd Linux raid auto
18 AST SmartSleep 64 Novell Netware fe LANstep
1b Hidden Win95 FA 65 Novell Netware ff BBT
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Repeat this step for /dev/hdf1:
# fdisk /dev/hdf
… (similar to /dev/hde1) …

Step Three: create the RAID set of type 1

# mdadm --create --verbose /dev/raid1 --level=1 \
--raid-devices=2 /dev/hde1 /dev/hdf1

# cat /proc/mdstat (to confirm it is created)

Step Four: format the new RAID set

# mkfs.xfs /dev/raid1

Step Five: create config file

On Centos, Redhat, Fedora:
# mdadm --detail --scan > /etc/mdadm.conf

On Debian, Ubuntu:
# mdadm --detail --scan > /etc/mdadm/mdadm.conf

Step Six: mount the RAID set
# mkdir /mnt/raid1
# vi /etc/fstab:
/dev/raid1 /mnt/raid1 xfs defaults 1 2
# mount -a

You can check the status of all devices:
# cat /proc/mdstat

Linux software RAID provides redundancy across hard disks, but it can be slower than a hardware RAID controller, which is usually configured through the system BIOS and is transparent to Linux.

Posted by

exeray

Posted on

February 24, 2016

Posted under

Uncategorized
