Jaguar Integration with SparkR

Once R and the SparkR package are installed, you can start SparkR by running the following script:

#!/bin/bash

export JAVA_HOME=/usr/lib/java/jdk1.7.0_75
LIBPATH=/usr/lib/R/site-library/rJava/libs:$HOME/jaguar/lib
LDLIBPATH=$LIBPATH:$JAVA_HOME/jre/lib/amd64:$JAVA_HOME/jre/lib/amd64/server
JDBCJAR=$HOME/jaguar/lib/jaguar-jdbc-2.0.jar

sparkR \
  --driver-class-path "$JDBCJAR" \
  --driver-library-path "$LDLIBPATH" \
  --conf spark.executor.extraClassPath="$JDBCJAR" \
  --conf spark.executor.extraLibraryPath="$LDLIBPATH"

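A missing jar or library directory in these paths may only surface later as a class-loading or linking error. The following is a minimal sketch of a pre-flight check, assuming bash; the function name check_path_list is hypothetical and is not part of Jaguar or Spark:

```shell
#!/bin/bash
# Illustrative pre-flight check (hypothetical helper, not part of Jaguar or Spark).
# Takes one colon-separated list of files/directories and reports any entry
# that does not exist. Returns 0 if all entries exist, 1 otherwise.
check_path_list() {
    local list="$1" entry status=0
    local IFS=':'            # split the list on colons only
    for entry in $list; do
        if [ ! -e "$entry" ]; then
            echo "missing: $entry" >&2
            status=1
        fi
    done
    return $status
}
```

One way to use it would be to call check_path_list "$JDBCJAR:$LDLIBPATH" just before the sparkR command in the script, and launch sparkR only if it returns 0.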

Then, at the SparkR prompt, you can run the following R commands:

library(RJDBC)
library(SparkR)

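# connect to the Spark master and create a SQL context
sc <- sparkR.init(master="spark://mymaster:7077", appName="MyTest")

sqlContext <- sparkRSQL.init(sc)

# load the Jaguar JDBC driver; the third argument is the identifier quote character
drv <- JDBC("com.jaguar.jdbc.JaguarDriver", "/home/exeray/jaguar/lib/jaguar-jdbc-2.0.jar", "`")

# connect to Jaguar (JDBC URL, user name, password)
conn <- dbConnect(drv, "jdbc:jaguar://localhost:8888/test", "test", "test")

# list the tables, then fetch query results into a local R data frame
dbListTables(conn)

df <- dbGetQuery(conn, "select * from int10k where uid > 'anxnfkjj2329' limit 5000;")

head(df)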

# correlation
> cor(df$uid, df$score)
[1] 0.05107418

# build a simple linear regression
> model <- lm(uid ~ score, data = df)
> model

Call:
lm(formula = uid ~ score, data = df)

Coefficients:
(Intercept)        score
  2.115e+07    1.025e-03

# get the names of all of the attributes
> attributes(model)
$names
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"

$class
[1] "lm"

Jaguar's integration with Spark and SparkR allows a wide range of data analytics to run over the underlying fast Jaguar data engine.

 
