Jaguar Integration with SparkR

Once R and the SparkR package are installed, you can start SparkR by executing the following commands:



export JAVA_HOME=/usr/lib/java/jdk1.7.0_75

sparkR \
--driver-class-path $JDBCJAR \
--driver-library-path $LDLIBPATH \
--conf spark.executor.extraClassPath=$JDBCJAR \
--conf spark.executor.extraLibraryPath=$LDLIBPATH
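
The launch command relies on two shell variables, JDBCJAR and LDLIBPATH, which must be set before sparkR is started. The following is a minimal sketch of one way to define them, not a required layout. JAGUAR_HOME is a hypothetical helper variable introduced here for illustration, the jar location is the one passed to JDBC() in the R session, and the assumption that the native libraries sit in the same lib directory should be checked against your own installation.

```shell
# Hypothetical helper variable: root of the Jaguar installation.
JAGUAR_HOME="${JAGUAR_HOME:-/home/exeray/jaguar}"

# JDBC driver jar (same path that the R session passes to JDBC()).
JDBCJAR="$JAGUAR_HOME/lib/jaguar-jdbc-2.0.jar"

# Assumption: the native client libraries are in the same lib directory.
LDLIBPATH="$JAGUAR_HOME/lib"

export JDBCJAR LDLIBPATH

# Warn early if the jar is missing, so the problem is caught before SparkR starts.
if [ ! -f "$JDBCJAR" ]; then
    echo "warning: JDBC jar not found at $JDBCJAR" >&2
fi

# Print the first part of the launch command so the expanded paths can be checked.
echo "sparkR --driver-class-path $JDBCJAR --driver-library-path $LDLIBPATH"
```

Running these lines in the same shell session, before the sparkR command, makes both variables available to it.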


Then, at the SparkR command-line prompt, you can execute the following R commands:



# load RJDBC, which provides JDBC() and the dbConnect()/dbGetQuery() methods
library(RJDBC)

sc <- sparkR.init(master="spark://mymaster:7077", appName="MyTest")

sqlContext <- sparkRSQL.init(sc)

drv <- JDBC("", "/home/exeray/jaguar/lib/jaguar-jdbc-2.0.jar", "`")

conn <- dbConnect(drv, "jdbc:jaguar://localhost:8888/test", "test", "test")


df <- dbGetQuery(conn, "select * from int10k where uid > 'anxnfkjj2329' limit 5000;")

head(df)

> cor(df$uid, df$score)
[1] 0.05107418

# build a simple linear regression
> model <- lm(uid ~ score, data=df)
> model

Call:
lm(formula = uid ~ score, data = df)

Coefficients:
(Intercept)        score
  2.115e+07    1.025e-03

# get the names of all of the attributes
> attributes(model)
$names
[1] "coefficients"  "residuals"     "effects"       "rank"
[5] "fitted.values" "assign"        "qr"            "df.residual"
[9] "xlevels"       "call"          "terms"         "model"

$class
[1] "lm"



Jaguar’s integration with Spark and SparkR allows a wide range of data analytics to run on top of the fast Jaguar data engine.


