How to replace a failed hard disk from Linux RAID

Suppose /dev/raid1 consists of two bootable physical drives  /dev/hde1 and /dev/hdf1 and /dev/hdf1 has failed. Here are the steps to replace the failed /dev/hdf1.

Step One: confirm that /dev/hdf1 failed

# cat /proc/mdstat

You should see [U_] instead of [UU] in degraded RAID1 array.


Step Two: remove the failed disk

# mdadm   –manage /dev/raid1   –fail /dev/hdf1

# mdadm –manage /dev/raid1   –remove /dev/hdf1


Step Three:  shutdown the system and install a new disk

# shutdown -h now

# physically install new disk

# boot up the system


Step Four:  add the new disk  /dev/hdf

#  sfdisk -d /dev/hde | sfdisk /dev/hdf

# mdadm –manage /dev/raid1  –add /dev/hdf1


Step Five: wait for /dev/hde and /dev/hdf  to become fully synchronized

#  cat /proc/mdstat




How to make software RAID on Linux

RAID is redundant arrays of inexpensive disks.  In this article, we will show you how to implement RAID 1 (disk mirroring) where data is duplicated on two disks (either HDD or SSD) simultaneously.

Step One: Use two disk partitions that are of approximately the same size. For example, /dev/hde1, /dev/hdf1

Step Two:  Set the type of each the disk parition to “Linux raid autodetect”

# fdisk   /dev/hde
Command (m for help): m

p print the partition table
q quit without saving changes
s create a new empty Sun disklabel
t change a partition’s system id

Command (m for help): t
Partition number (1-5): 1
Hex code (type L to list codes): L

16 Hidden FAT16 61 SpeedStor f2 DOS secondary
17 Hidden HPFS/NTF 63 GNU HURD or Sys fd Linux raid auto
18 AST SmartSleep 64 Novell Netware fe LANstep
1b Hidden Win95 FA 65 Novell Netware ff BBT
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Repeat this step for /dev/hdf1:
# fdisk  /dev/hdf
… (similar to /dev/hde1) …

Step Three: create the RAID set of type 1

# mdadm  –create  –verbose  /dev/raid1  –level=1 \
–raid-devices=2  /dev/hde1  /dev/hdf1

# cat /proc/mdstat   (to confirm it is created)

Step Four: format the new RAID set

# mkfs.xfs   /dev/raid1

Step Five: create config file

On Centos, Redhat, Fedora:
# mdadm –detail –scan > /etc/mdadm.conf

On Debian, Ubuntu:
# mdadm –detail –scan > /etc/mdadm/mdadm.conf

Step Six: mount the RAID set
# mkdir  /mnt/raid1
# vi  /etc/fstab:
/dev/raid1    /mnt/raid1     xfs      defaults     1 2
# mount  -a

You can check the status of all devices:
# cat  /proc/mdstat

Linux software RAID provides redundancy across hard disks, but it is slower than a hardware-based RAID disk controller, which is usually done via the system BIOS and transparent to Linux.

How to upgrade Linux Kernel

Here are the steps to upgrade any Linux system (Fedora, Centos, etc) to a newer version of Linux:

  1.  # wget
  2.  # tar xvfJ linux-4.4.2.tar.xz
  3. # cd linux-4.4.2
  4. # mkdir -p /home/name/build/kernel
  5. # make O=/home/name/build/kernel defconfig
  6. # vi /home/name/build/kernel/.config
  7.   CONFIG_R8169=y   (for your network card)
  8.   CONFIG_XFS_FS=y
  9.   CONFIG_EFI=y
  12.   # make O=/home/name/build/kernel
  13.  # make O=/home/name/build/kernel modules_install install


Reboot and you will have a new version of  Linux.


How to upgrade g++ compiler

  1. Download the latest source package from a mirror site
  2. tar zxf gcc-4.9.3.tar.gz
  3. cd  gcc-4.9.3
  4. ./contrib/download_prerequisites
  5. cd ..
  6. mkdir -p objdir
  7. cd objdir
  8. ../gcc-4.9.3/configure –prefix=$HOME/install/gcc-4.9.3 –enable-languages=c,c++,go –disable-multilib
  9. make
  10. make install

You are ready to use the $HOME/install/gcc-4.9.3/bin/g++   compiler.

Jaguar and Cassandra

This blog compares Jaguar and Cassandra (NoSQL distributed database).

1.  Install and setup Cassandra on multiple nodes

Install the package on each host in your Cassandra cluster:

a) Visit  and download  apache-cassandra-3.0.0-bin.tar.gz

b) tar zxf  apache-cassandra-3.0.0-bin.tar.gz

c) cd  apache-cassandra-3.0.0/conf

d) vi   cassandra.yaml


(This is the IP address of the host you are on).


– seeds: “”

(seeds are the host IP addresses of seed hosts)

You can specify one seed host or multiple seed hosts. If you want to use multiple seed hosts, then specify them as:

– seeds: “,”

e) vi

if [ “x$LOCAL_JMX” = “x” ]; then


After the configuration has been completed on all hosts in the cluster, you can login to the seed host(s), and start the cassandra server:


Then start the cassandra server on all non-seed hosts with the same command.

2.  Execute Cassandra commands

From any host in the Cassandra cluster, run this command:

$ bin/cqlsh
cqlsqh> CREATE KEYSPACE mykeyspace WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> use mykeyspace;
cqlsh:mykeyspace> CREATE TABLE users ( user_id int PRIMARY KEY, fname text,lname text );
cqlsh:mykeyspace> INSERT INTO users (user_id,  fname, lname) VALUES (1745, 'john', 'smith');
cqlsh:mykeyspace> INSERT INTO users (user_id, fname, lname)  VALUES (1744, 'john', 'doe'); 
cqlsh:mykeyspace> INSERT INTO users (user_id, fname, lname)  VALUES (1746, 'john', 'smith');
cqlsh:mykeyspace> select * from users where user_id > 100 and user_id < 200;
InvalidRequest: code=2200 [Invalid query] message="Only EQ and IN relation are supported on the partition key (unless you use the token() function)"
cqlsh:mykeyspace> help select

 SELECT <selectExpr>
 FROM [<keyspace>.]<table>
 [WHERE <clause>]
 [ORDER BY <colname> [DESC]]
 [LIMIT m];

 SELECT is used to read one or more records from a CQL table. 

So Cassandra does not support range query and multi-table join.

   3. Jaguar Supports Range Query and Table Joins

jaguar> select * from users where user_id > 10000; 
jaguar> select * join ( TABLE, TABLE1, TABLE2, ...) [WHERE];
jaguar> select * starjoin (tab1, tab2, tab3 ) [WHERE];
jaguar> select * indexjoin ( index(idex), tab1, tab2 ) [WHERE];

Jaguar Integration with SparkR

Once you have R and SparkR packages installed, you can start the SparkR program by executing the following command:



export JAVA_HOME=/usr/lib/java/jdk1.7.0_75

sparkR \
–driver-class-path $JDBCJAR \
–driver-library-path $LDLIBPATH \
–conf spark.executor.extraClassPath=$JDBCJAR \
–conf spark.executor.extraLibraryPath=$LDLIBPATH


Then in the SparkR command line prompt, you can execute the following R commands:



sc <- sparkR.init(master=”spark://mymaster:7077″, appName=”MyTest”)

sqlContext <- sparkRSQL.init(sc )

drv <- JDBC(“”, “/home/exeray/jaguar/lib/jaguar-jdbc-2.0.jar”, “`”)

conn <- dbConnect(drv, “jdbc:jaguar://localhost:8888/test”, “test”, “test” )


df <- dbGetQuery(conn, “select * from int10k where uid > ‘anxnfkjj2329’ limit 5000;”)

head( df )

> cor(df$uid,df$score)
[1] 0.05107418

#build the simple linear regression
> model<-lm(uid~score,data=df)
> model

lm(formula = uid ~ score, data = df)

(Intercept) score
2.115e+07 1.025e-03

#get the names of all of the attributes
> attributes(model)
[1] “coefficients” “residuals” “effects” “rank”
[5] “fitted.values” “assign” “qr” “df.residual”
[9] “xlevels” “call” “terms” “model”

[1] “lm”



Jaguar’s successful integration with Spark and SparkR  allows wide range of data analytics  over the underlying fast Jaguar data engine.


Jaguar Supports R

R is a powerful language and environment for statistical computing and graphics.  Jaguar’s JDBC API can integrate with R for extensive data modelling and analysis.  To use R with Jaguar, the RJDBC library needs to be installed first:



           $ sudo apt-get install r-cran-rjava

$ sudo R

> install.packages(“RJDBC”, dep=true)

> q()


$ unset JAVA_HOME

$ R

> library(RJDBC)

> drv <- JDBC(“”, “/pathtomy/jaguar-jdbc-2.0.jar”, “`”)

> conn <- dbConnect(drv, “jdbc:jaguar://localhost:8888/test”, “test”, “test” )

> dbListTables(conn)

> dbGetQuery(conn, “select count(*) from mytable;”)

> d <- dbReadTable(conn, “mytable;”)

> q()