Hadoop summit
I'm going to attend the Hadoop summit in San Francisco in June. I had a great time last year, learned a bunch of stuff and got to meet a lot of people I previously only knew by name.
NOSQL
To make the most of the flight money I'm putting together a free meetup about "open source, distributed, non relational databases" or NOSQL for short.
It's taking place on the 11th of June, the day after the Hadoop summit in San Francisco. CBS Interactive have been kind enough to provide us with both a venue and free lunch!
If you wish to attend, please register.
Preliminary schedule
09.45: Doors open
10.00: Intro session (Todd Lipcon, Cloudera)
10.40: Voldemort (Jay Kreps, LinkedIn)
11.20: Short break
11.30: Cassandra (Avinash Lakshman, Facebook)
12.10: Free lunch (sponsored by CBSi)
13.10: Dynomite (Cliff Moon, Powerset)
13.50: HBase (Ryan Rawson, StumbleUpon)
14.30: Short break
14.40: Hypertable (Doug Judd, Zvents)
15.20: Panel discussion
16.00: End of meetup, relocate to a pub called Kate O’Brien’s nearby
Location
Magma room, CBS Interactive
235 Second Street
San Francisco, CA 94105
Tuesday, 12 May 2009
Saturday, 9 May 2009
VPork
VPork background
With the wide range of distributed, non-relational databases out there, it is hard to know which one to choose. One part of the puzzle is of course performance. Personally I'm interested in low response times.
A couple of weeks ago Jon Travis put up a useful program called VPork in his GitHub repository. It's a fairly straightforward performance testing tool for Voldemort, written in Groovy. You can find the announcement on the Voldemort mailing list.
Short description of how it works from the wiki:
* A single JVM is started, with any number of client threads
* Each thread executes for a given number of iterations
* For each iteration, the thread can read an existing record, and/or create a new one
* The probability of each read/write is configurable
* The location of where reads happen is configurable (by default, it reads the most recently written records, trailing off to less frequent reads of writes which occurred long ago)
At the end of the run it gives you info such as average, 99th percentile and standard deviation of read and write latency.
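To make those numbers a bit more concrete, here is a rough Java sketch of how such a summary could be computed from a list of recorded latencies. This is not VPork's own code (VPork is written in Groovy); the class name LatencySummary and the nearest-rank percentile method are my assumptions, picked purely for illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not part of VPork: summarises latencies given in milliseconds.
// All methods assume the array holds at least one sample.
class LatencySummary {
    // Arithmetic mean of all samples.
    static double mean(double[] samples) {
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    // Population standard deviation (divides by n, not n - 1).
    static double stdDev(double[] samples) {
        double m = mean(samples);
        double squared = 0;
        for (double s : samples) {
            squared += (s - m) * (s - m);
        }
        return Math.sqrt(squared / samples.length);
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up sample values, only here to show the calls.
        double[] latenciesMs = {1.2, 0.9, 1.1, 15.0, 1.0};
        System.out.printf("avg=%.2f ms, 99th=%.2f ms, stddev=%.2f ms%n",
                mean(latenciesMs), percentile(latenciesMs, 99), stdDev(latenciesMs));
    }
}
```

With only a handful of samples the 99th percentile is simply the slowest request, which is one reason a short run says little about tail latency.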
VPork + Cassandra
I'm interested in both Voldemort and the Cassandra project, so it seemed like a good idea to add the ability to benchmark Cassandra to VPork. The result can be found in my branch here.
How to run a one node Cassandra test (on Ubuntu)
First let's fetch Cassandra. There's no official release yet, so we'll use a nightly build.
sudo apt-get install sun-java6-jdk
wget http://hudson.zones.apache.org/hudson/job/Cassandra/lastSuccessfulBuild/artifact/cassandra/build/cassandra-0.3.0-dev.tgz
tar -zxvf cassandra-0.3.0-dev.tgz
sudo mkdir -p /var/cassandra/logs
sudo chown -R `whoami` /var/cassandra/
cd cassandra-0.3.0-dev
bin/cassandra -f
Now Cassandra should be up and running, so let's start VPork.
sudo apt-get install git-core groovy
git clone git://github.com/johanoskarsson/vpork.git
cd vpork
./vpork.sh configs/cassandra/30-thread-pork.groovy configs/cassandra/nodes.conf
After a while you'll hopefully get some meaningful results. This is of course a very basic test. You probably want to add more Cassandra nodes, run the client on another node, etc.
Where do we go from here?
VPork has some drawbacks: it suffers from the power-of-ten syndrome and it doesn't warm up the databases before it starts measuring. But it's a good start!
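For what it's worth, the warm-up part could look something like the Java sketch below: run the operation a number of times without timing it, then record latencies only for the runs that follow. WarmupTimer and its parameters are hypothetical names of mine, not anything that exists in VPork.

```java
// Hypothetical helper, not part of VPork.
class WarmupTimer {
    // Runs op warmupRuns times without timing it, then returns the latency
    // in nanoseconds of each of the measuredRuns calls that follow.
    static long[] measure(Runnable op, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            op.run(); // results discarded: lets caches and the JIT settle
        }
        long[] nanos = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            op.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        // Stand-in operation; a real test would call the database client here.
        long[] nanos = measure(() -> Math.sqrt(12345.678), 1000, 10);
        for (long n : nanos) {
            System.out.println(n + " ns");
        }
    }
}
```

How many warm-up runs are enough depends on the database and the data set, so that number is best treated as something to experiment with.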
I'd love to see other interesting storage engine tests added to VPork to give users a simple way of comparing them with the load pattern they expect. There's an issue open for it over in HBase land.
It's fairly easy to do: create a class that implements a createClient() method returning a client that implements the basic get(String key) and put(String key, byte[] value) methods.
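As a rough idea of the shape, here is a Java sketch with an in-memory stand-in instead of a real database. Only createClient(), get(String key) and put(String key, byte[] value) come from the description above; the interface names StorageClient and StorageClientFactory, the return types and the in-memory implementation are my assumptions, so check the VPork source for the real ones.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Small demo of the hypothetical interfaces defined further down.
class ClientDemo {
    public static void main(String[] args) {
        StorageClient client = new InMemoryClientFactory().createClient();
        client.put("key-1", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(client.get("key-1"), StandardCharsets.UTF_8));
    }
}

// Hypothetical name and return types: VPork's real interface may differ.
interface StorageClient {
    byte[] get(String key);

    void put(String key, byte[] value);
}

// Hypothetical name: the piece that hands a client to each test thread.
interface StorageClientFactory {
    StorageClient createClient();
}

// In-memory stand-in for a real database, handy for checking the harness itself.
class InMemoryClientFactory implements StorageClientFactory {
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();

    @Override
    public StorageClient createClient() {
        // Every client shares the same map, like threads talking to one cluster.
        return new StorageClient() {
            @Override
            public byte[] get(String key) {
                return store.get(key); // null when the key was never written
            }

            @Override
            public void put(String key, byte[] value) {
                store.put(key, value);
            }
        };
    }
}
```

A real adapter would replace the map with calls to the database's own client library and leave the rest of the harness untouched.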
Another possible use is to alert developers to performance regressions. There have been discussions about setting up nightly benchmarks on multiple servers to do just that.
Disclaimer: I'm no performance testing expert or a statistician (or a magician for that matter, but it's not important right now).