Exeray has released a new version (V5) of BigData Enabler

Exeray has released version 5 of BigData Enabler. This new version changes the method of low-level data storage: the storage layer has been redesigned and developed from scratch and uses pure array-based indexing technology. Most new containers maintain both key ordering and hash lookup of keys, so they are extremely fast for point searches as well as ordered range searches.

On top of the existing data containers, new classes such as AbaxCounter and AbaxGraph have been added to the family. The AbaxCounter container collects data and maintains ordered counts of each key in real time. The AbaxGraph container dynamically manages directed as well as undirected graphs. One can easily add nodes and edges to a graph, and methods are provided to detect adjacency relationships, retrieve neighbor lists, and find the minimum spanning tree of a graph.
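
To illustrate the kind of operations these containers support, the short C++ sketch below uses standard library containers as stand-ins; it is illustrative only and does not show the actual AbaxCounter or AbaxGraph API.

#include <iostream>
#include <map>
#include <set>
#include <string>

int main()
{
    // Counter concept: keep an ordered count per key, similar in spirit to AbaxCounter.
    // std::map stores keys in sorted order, so ordered range scans are possible.
    std::map<std::string, long> counts;
    const char *events[] = { "apple", "banana", "apple", "cherry", "apple" };
    for (const char *k : events) {
        ++counts[k];
    }
    for (const auto &kv : counts) {
        std::cout << kv.first << " -> " << kv.second << "\n";
    }

    // Undirected-graph concept: one adjacency set per node, similar in spirit to AbaxGraph.
    std::map<std::string, std::set<std::string>> graph;
    auto addEdge = [&graph](const std::string &a, const std::string &b) {
        graph[a].insert(b);
        graph[b].insert(a);
    };
    addEdge("u1", "u2");
    addEdge("u1", "u3");

    // Adjacency test and neighbor list for one node.
    std::cout << "u1-u2 adjacent: " << (graph["u1"].count("u2") > 0) << "\n";
    for (const auto &n : graph["u1"]) {
        std::cout << "neighbor of u1: " << n << "\n";
    }
    return 0;
}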

ArrayDB uses less memory and is almost as fast as memcached

We performed a benchmark test comparing ArrayDB and memcached. The memcached server was started with the following settings:

memcached -d -p 11211 -u memcached -m 512 -c 1024 -P /var/run/memcached/memcached.pid

The following C listing is the client program used to insert and look up 3,000,000 random strings in memcached:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <libmemcached/memcached.h>

char *randomString( int size );

int main(int argc, char *argv[])
{
    int i;
    int max = 100;               /* default number of keys; override on the command line */
    if ( argc > 1 ) {
        max = atoi( argv[1] );
    }

    memcached_server_st *servers = NULL;
    memcached_st *memc;
    memcached_return_t rc;
    char *key;
    char *value;
    size_t vlen;
    uint32_t flags;
    memcached_return_t err;

    /* Connect to the local memcached server started above */
    memc = memcached_create(NULL);
    servers = memcached_server_list_append(servers, "localhost", 11211, &rc);
    rc = memcached_server_push(memc, servers);

    if (rc == MEMCACHED_SUCCESS) {
        printf("Added server successfully\n");
    } else {
        printf("Couldn't add server: %s\n", memcached_strerror(memc, rc));
    }

    /* Insert max random 16-byte key/value pairs */
    time_t t1 = time(NULL);
    srand(10);
    for ( i = 0; i < max; ++i ) {
        key = randomString( 16 );
        value = randomString( 16 );
        rc = memcached_set(memc, key, 16, value, 16, (time_t)0, (uint32_t)0);
        if (rc != MEMCACHED_SUCCESS) {
            fprintf(stderr, "Couldn't store key: %s\n", memcached_strerror(memc, rc));
        }
        free( key );
        free( value );
    }
    time_t t2 = time(NULL);
    printf("Insert %d strings took %ld seconds\n", max, (long)(t2 - t1) );

    /* Re-seed with the same value so randomString() reproduces the same
       key sequence, then look up every key */
    t1 = time(NULL);
    char *res;
    srand(10);
    for ( i = 0; i < max; ++i ) {
        key = randomString( 16 );
        value = randomString( 16 );   /* generated only to keep the rand() stream aligned */
        res = memcached_get( memc, key, 16, &vlen, &flags, &err );
        if ( res ) {
            free( res );
        } else {
            printf("error in memcached_get NULL\n");
        }
        free( key );
        free( value );
    }
    t2 = time(NULL);
    printf("Lookup %d strings took %ld seconds\n", max, (long)(t2 - t1) );
    return 0;
}

/* Build a random alphanumeric string of the given length */
char *randomString( int size )
{
    int i, j;
    static char cset[] = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"; /* 62 characters */
    char *mem = (char*)malloc(size + 1);
    for ( i = 0; i < size; i++ ) {
        j = rand() % 62;
        mem[i] = cset[j];
    }
    mem[i] = '\0';
    return mem;
}
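
The client can be compiled against libmemcached and run with the desired number of keys along these lines (the source file name is illustrative):

$ gcc memc_client.c -o memc_client -lmemcached
$ ./memc_client 3000000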

 

To insert and query data in ArrayDB, the following commands were executed:

$ rbench -r "3000000:0:0"  -k 0   (create table and insert data)

$ rbench -r "0:0:3000000"        (read data)

 

---------------------------------------------------------------
                 |   Insert Time   |   Lookup Time   |  Memory
---------------------------------------------------------------
Memcached        |   359 sec       |   317 sec       |  370 MB
---------------------------------------------------------------
ArrayDB          |   486 sec       |   440 sec       |  46 MB
---------------------------------------------------------------

 

The benchmark shows that ArrayDB uses much less memory while achieving speed similar to memcached. ArrayDB can therefore be very valuable in cloud computing environments, where memory is an expensive resource.

Exeray ArrayDB achieves fast query speed

ArrayDB's data query speed is also significantly faster. With 3 million records each in tables t1 and t2, we performed a simple join of the two tables on the key userid. The following results show the query performance of Commercial Enterprise and of ArrayDB:

Commercial Enterprise:

sql> select t1.userid, t1.addr, t2.msgid from t1, t2 where t1.userid=t2.userid;

USERID           ADDR             MSGID
---------------- ---------------- ----------------
peach            orange           orange
cherry           red              red
apple            green            green
mango            orange           orange
banana           yellow           yellow

Elapsed: 00:00:24.09

We ran the query again:

sql> select t1.userid, t1.addr, t2.msgid from t1, t2 where t1.userid=t2.userid;

USERID           ADDR             MSGID
---------------- ---------------- ----------------
peach            orange           orange
cherry           red              red
apple            green            green
mango            orange           orange
banana           yellow           yellow

Elapsed: 00:00:10.04

ArrayDB:

rdb> join t1, t2 on key;
userid           addr             msgid
---------------- ---------------- ----------------
apple            green            green
banana           yellow           yellow
cherry           red              red
mango            orange           orange
peach            orange           orange
server time is 268 milliseconds
---------------------------------------------------
Done in 307 milliseconds

The query speed is about 37X that of Commercial Enterprise (10.04 s versus 0.268 s of server time), even when we use the time of Commercial Enterprise's second run. Repeated runs of the same query benefit from data already cached by the system, which makes them faster than reading cold data.

Exeray data write speed is more than 20X that of a Commercial Database

Exeray's revolutionary big data technology shows its power in big data storage and indexing. We inserted 3,000,000 records into our ArrayDB and recorded the clock time, and we did the same in a well-known Commercial Database (CD). It took CD 23 minutes and 3 seconds to finish the insert, but only 51 seconds in ArrayDB, which is roughly a 27X speedup (1,383 seconds versus 51 seconds). As more data is inserted, we expect this speedup factor to increase steadily. Exeray's disruptive data storage and indexing technology brings a significant advance in today's computing environment, where data enters IT systems at extremely high velocity.

Each inserted record contains a 16-byte key and a 16-byte value field. Both CD and ArrayDB indexed on the keys.
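
For reference, a data file in this key,value format can be produced by a short generator along the following lines (a sketch only: the output path matches the file used below, but this is not necessarily how the test data was created):

#include <cstdlib>
#include <fstream>
#include <string>

// Build a random alphanumeric string of the requested length.
static std::string randomString(int len)
{
    static const char cset[] =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"; // 62 characters
    std::string s;
    s.reserve(len);
    for (int i = 0; i < len; ++i) {
        s.push_back(cset[std::rand() % 62]);
    }
    return s;
}

int main()
{
    // Write lines of "16-char key,16-char value" to the benchmark input file.
    std::ofstream out("/tmp/3M1.txt");
    for (long i = 0; i < 3000000; ++i) {
        out << randomString(16) << ',' << randomString(16) << '\n';
    }
    return 0;
}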

$ wc -l /tmp/3M1.txt
3000005 /tmp/3M1.txt

$ head /tmp/3M1.txt
5MF0awxWQ3yjG319,ghDzzn2S8EfTruhw
hXupr1ki3QBHTBQ7,SrGrOGjUlwNK05gf
0JDrKXJLNjsEUgLM,Fpbu5sooWa8Wfpad
8NEQImAtF07xhQhU,fqokSLHOTPI6cQji
BV7jhHKUHRrWHGRU,6ddYWUKPHrVSfeaR
8f8pWTiCIHyooniu,Btrvn9l3AgVQt3FB
jNYdEePnWmJkHZOi,tdMOn7RVlKJMOol5
aiiMw67sqQM7Qzoj,Ma57fX3AHKlt7Gyh
YP1vV9VjXGqLfO4Z,Y85d56MKQ7cXLKcI
xedql6JiMa4ZW6ZT,e24h8Q0YVcWHU6ps

In the Commercial Database environment:

load '/tmp/3M1.txt' into table T1:

Start time: Thu Oct 23 23:04:39 EDT 2014
End time:   Thu Oct 23 23:27:42 EDT 2014

Top shows 1.2GB memory usage.

sql>  select count(*) from t1;

Count(*)
----------
3000005

Elapsed: 00:00:00.11

In the ArrayDB environment:

$ rsql

Raydb Server 1.0 from Exeray

rdb> load file /tmp/3M1.txt into t1;
Done in 51048 milliseconds

Top shows 166MB memory usage.

rdb>  select count(*) from t1;
3000005
---------------------------------------------------
Done in 98 milliseconds

In the next blog post, we will compare Commercial Database and ArrayDB in data query.

Exeray Received Two SVIEF Awards

Exeray participated in the SVIEF (www.svief.org) start-up contest for the first time and received two awards: the 2014 SVIEF Start-up Contest Top 25 Award and the SVIEF 2014 Innovation Awards Top 30. See 2014 SVIEF for details. The Silicon Valley Technology Innovation & Entrepreneurship Forum (SVIEF) is an international conference designed to foster innovation and promote business partnerships connecting the US and the Asia-Pacific region. It is a leading venue in the high-tech industry, gathering technology and business professionals while providing a platform for the exchange of talent, technology, and capital. The past three SVIEF events were held in Silicon Valley in the fall of 2011, 2012, and 2013, with more than 5,000 attendees each year.

Exeray has released a new version (V4) of BigData Enabler

Exeray has released version 4 of BigData Enabler. This version fixes bugs found in V3 and improves the performance of the software. The in-memory data containers are state-of-the-art technology for big data computing. Building on this release, we are working on an even faster software product based on pure array-based indexing technology. There are many areas in which our ArrayDB indexing technology will bring disruptive impact, in in-memory computing as well as in persistent storage and query.

Commercial Exeray Bigdata Enabler is released

Following the release of Version 2 of the Free Bigdata Enabler, Exeray is excited to announce that the commercial version of Bigdata Enabler has been released. The commercial version does not restrict the number of keys in the containers, and all of its features are fully functional. Users can enjoy a 30-day free trial after obtaining a license key from Exeray, and once they purchase software licenses they will have access to full product service and upgrades of our innovative software.

More containers and products are in the pipeline. Stay tuned to Exeray, the innovation engine for Big & Fast Data! You will be 100% satisfied with the bleeding-edge tools we provide.

Version 2 of Free Bigdata Enabler Is Released

Today Exeray is glad to announce that Version 2 of Free Bigdata Enabler (TM) is released. After the first release of Bigdata Enabler, many have expressed strong interest in the innovative approach to information retrieval. We have worked hard to fix bugs and make improvements to the new product.  Some of the changes in this release include:

  • consolidated and optimized code in some main container classes
  • made improvement in dynamic memory allocation for storing key value pairs
  • resolved tickets raised after the first release (Version 1)
  • made corrections to errors in documentation
  • added AbaxClock class for time tracking in test cases
  • added more test cases in the testmain.cc program
  • changed default value of the parameter factor in the setCapacity method from four to one.

Exeray strives to leverage its innovative data structures and deliver the best technology to resolve the challenges of big and fast data. Users are welcome to give suggestions, recommendations, and feedback about our products.

In-memory computing can boost both CPU-bound and IO-bound jobs

In enterprise computing there are usually both CPU-bound and IO-bound jobs. In CPU-bound jobs, CPU utilization is high while IO activity is minimal. In IO-bound jobs, CPU utilization is low because the jobs spend most of their time waiting for IO operations to start and finish. As we know, disk-based IO operations are several orders of magnitude slower than CPU-memory operations.

 

Good in-memory computing technology directly impacts the performance of CPU-bound processes. You may then ask: how can better in-memory technology have any impact on IO-bound jobs, when the CPU is already under-utilized and IO is the bottleneck?

Look at it this way and you will find the answer: advances in data storage technology. Storage technology dictates how fast IO operations can be performed between the CPU and wherever data is stored. The gradual adoption of solid-state flash memory as a data storage mechanism by enterprises is boosting IO operations dramatically. IO-bound jobs are now less IO-bound, and some may even have shifted to being CPU-bound.

Recognizing the bottlenecks in your enterprise computing is crucial now that you are dealing with big data. The point is that good in-memory computing technology and good storage technology both play important roles in improving your enterprise productivity.