Wednesday, 24 June 2009

Zohmg

Interns are great
This summer we've had two great interns in the Last.fm data team. They have been working on a project named Zohmg.


From the announcement

I'm happy to announce Zohmg, a data store for aggregation of multi-dimensional time series data built on top of Hadoop, Dumbo and HBase. Data is imported with a mapreduce job and is exported through an HTTP API.

A typical use case for Zohmg is the analysis of Apache log files. The analyst would be interested in breaking down pageviews by path, user agent, country of origin, etc. In-house at Last.fm, we have successfully demo'd an installation that served access data in real time for millions of paths broken down by several dimensions.
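To make the idea of "breaking down pageviews by several dimensions" a bit more concrete, here is a minimal in-memory sketch in plain Python. It is not Zohmg's code or API: the record fields, the `aggregate` and `pageviews` helpers and the sample data are all made up for illustration, and a real installation would do the counting in a mapreduce job and store the results in HBase.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already-parsed access log records (field names are assumptions).
records = [
    {"day": "2009-06-24", "path": "/music", "agent": "firefox", "country": "SE"},
    {"day": "2009-06-24", "path": "/music", "agent": "safari", "country": "GB"},
    {"day": "2009-06-24", "path": "/radio", "agent": "firefox", "country": "SE"},
    {"day": "2009-06-25", "path": "/music", "agent": "firefox", "country": "SE"},
]

# The dimensions we want to be able to break pageviews down by.
DIMENSIONS = ("path", "agent", "country")


def aggregate(records, dimensions=DIMENSIONS):
    """Count pageviews per day for every combination of dimension values."""
    counts = Counter()
    for rec in records:
        # Emit one key per subset of dimensions, including the empty subset,
        # which gives the plain per-day total.
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                key = (rec["day"],) + tuple((d, rec[d]) for d in dims)
                counts[key] += 1
    return counts


def pageviews(counts, day, **filters):
    """Look up a precomputed count; filters are dimension=value pairs."""
    # Build the key in the same dimension order that aggregate() used.
    key = (day,) + tuple((d, filters[d]) for d in DIMENSIONS if d in filters)
    return counts[key]


counts = aggregate(records)
print(pageviews(counts, "2009-06-24"))
print(pageviews(counts, "2009-06-24", path="/music"))
print(pageviews(counts, "2009-06-24", agent="firefox", country="SE"))
```

Precomputing every combination trades storage for lookup speed: each query becomes a single key lookup, but the number of keys grows quickly with the number of dimensions, so a real system would likely restrict which combinations it keeps.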


Zohmg 0.2.0
Congrats to both Fredrik Möllerstrand and Per Andersson on their first public release, which just went out.

For more information, check out the readme.

Saturday, 13 June 2009

NOSQL debrief

The relatively young but rapidly growing "nosql" community met last Thursday in San Francisco. The idea was to give attendees a solid introduction to how distributed, non-relational databases work, as well as an overview of the various projects out there. If I may say so myself, we succeeded in doing both. Thanks to all the presenters for very interesting talks and to everyone for great hallway discussions.



Presentation slides and videos
Intro session - Todd Lipcon, Cloudera (slides, video1, video2)
Voldemort - Jay Kreps, LinkedIn (slides pdf ppt, video1, video2)
Cassandra - Avinash Lakshman, Facebook (slides pdf ppt, video)
Dynomite - Cliff Moon, Powerset (slides, video)
HBase - Ryan Rawson, StumbleUpon (slides, video)
Hypertable - Doug Judd, Zvents (slides pdf ppt, video1, video2)
CouchDB - Chris Anderson, couch.io (slides, video1, video2)

VPork - Jon Travis, SpringSource (slides, video)
MongoDB - Dwight Merriman, 10gen (slides, video)
Infinite Scalability - Jonas S Karlsson, Google (slides, video)

Some videos by Digg's John Quinn, the rest by Martin Dittus from Last.fm. Pictures by Russ Garrett from Last.fm.

NOSQL mailing list
At the event I got requests to set up a NOSQL mailing list as a cross-project discussion forum.
Hopefully it will encourage collaboration and exchange of ideas. If that sounds interesting, subscribe here.

Sponsors
Thanks again to the presenters and the sponsors (Last.fm, CBSi, Digg and GitHub).