Saturday, 12 December 2009

Travel in the 21st century

I am currently on a trip through parts of Southeast Asia and the US with Steve Gravell and Tobias Köppen. Naturally I couldn't leave my Eee PC or iPhone at home, so during the trip I have gathered a few tips for the geeky traveler. Nothing revolutionary, but I found these apps and websites to be quite helpful. In my usual manner I also complain a bit and suggest improvements. If you have any additional recommendations please add a comment.

Maps
Unfortunately the holy trinity of Google Maps, GPS and an unlimited data plan is one short when abroad. Without a data plan you are plunged back into the dark ages, and roaming is not an option. Luckily there is an iPhone app called oMaps that allows you to mark certain areas on the map for offline use. It is perfect for grabbing the city map while you are in a hotel room near wifi: mark a few places you want to visit and off you go.
oMaps uses OpenStreetMap as its data source, and the OpenStreetMap license is what makes the offline mode possible. OpenStreetMap builds its map from user contributions, so some areas have excellent coverage whilst others are lacking. You can help out by contributing straight from your iPhone, although I have had limited success with that so far.

Music
Music is essential on a trip, perfect for all those long bus, plane and boat rides. I love the Last.fm iPhone app, but unfortunately we are once again back to the problem of having no data plan and no offline mode. I decided to give the Spotify app a go. It allows you to select playlists that will be available in offline mode. I can easily search for and add new albums as I go on the trip and have them synced while I am near wifi. The price is quite hefty at €9.99 a month but worth it when on the move.
Like many others I would kill for being able to run the app in the background. When Steve and I drove to Napa Valley I was using the phone to navigate, which meant we couldn't listen to my music at the same time. It was also the most ass-clenching car ride in a long, long time, but that's another story. While I am at it, how about adding some sort of playlist RSS subscription feature to the app? For example, I read the music blog Let's pretend we're bunny rabbits. They regularly post playlists with a Spotify link and I'd like to see those automatically pop up in the Spotify app.

Steve and I were having a few beers in a horrible bar in Kuala Lumpur when the DJ announced that the next song was part of a competition. Figure out what song it is and you win two pitchers of beer. Steve had the great idea to try out Shazam. Shazam is a nifty service and iPhone app that can figure out the artist and title of a track from about 10 seconds of audio. After a couple of tries Steve got a result. We won the two pitchers and subsequently had the worst hangover of the trip the next day.

Pictures
Unlike Tobias, who brought three cameras on the trip, I travel light. I decided my iPhone camera is good enough for the few pics I bother taking. I use Pano to take panorama pictures. It seems AutoStitch is just as good or even better, but I haven't tried it. Then I use the official Flickr app to upload the pics as soon as I get near wifi. That way friends and family can see what I am up to as I go along.

Communications
Besides e-mail I post various updates to Twitter using Tweetie. Of course the Skype app is also useful for keeping in touch with the old world. The only problem is that access to wifi and the overlap of waking hours in the different timezones rarely happen at the same time.

What to do?
Steve had the great idea of creating a custom Google map that we can all add places to visit to. We started looking at various sources for ideas on what to add to the map. I found Wikitravel to be a great source of general location information. We picked the first couple of accommodations from TripAdvisor reviews, which worked surprisingly well. We also managed to get friends and friends of friends to add their own tips to the map. Thanks everyone!
The only problem is that the Maps app on the iPhone doesn't seem to be able to read custom maps. Google Earth was supposed to do it but we can't get it to work either.

I started using Foursquare when it launched in London just before I left on this trip. You simply "check in" using the app when you get to a restaurant, bar or another venue of some type. Basically scrobbling for places instead of music. Unfortunately there is no way to get recommendations from the app yet. It now roughly knows what kind of places my friends and I like in London. From that data I want it to tell me where I should go when I get to a new city. I can't even figure out how to get it to give me a list of the places my friends have been to most often.

Aardvark is a service that can help you find the answer to questions that normal search engines struggle with. Instead of looking up the results in an index it forwards the question to real people via various IM services. Aardvark keeps track of which topics its users know enough about to answer and routes questions accordingly. It has worked quite well so far. It is especially worthwhile when your travel question is sent to a local who hopefully knows a lot more about an area than fellow travelers.

Games
Rock Band and Civilization for the iPhone are quite fun and suitably time consuming. Unfortunately they also eat tons of battery, so make sure you don't drain the phone completely. Rock Band has a multiplayer mode that works over Bluetooth, perfect for playing on planes to discover that they don't crash when you do so.

Wifi
I have been pleasantly surprised by the number of hostels and restaurants that provide free wifi. Often the networks are password protected though, so it takes a while to grab someone who knows the password and understands why you are waving an iPhone around. All phrasebooks should contain the important phrase "Excuse me, what is the wifi password?" from now on.
On some networks just trying a classic password such as "1234567890" often works. At other times I have considered trying out wpacracker, which looks like fun but is probably both illegal and not very economical. For a small fee it basically lets you use a couple of hundred machines to run a dictionary attack against a network password.

DATA!
As you may have noticed throughout this post, I feel castrated when without internet access. It is amazing how quickly one gets used to having it everywhere. I will try almost anything to get my fix, except paying the roaming charges. Having the cost tick away makes me stressed out and constantly thinking about it.
Since the stay in the US would be roughly two months in total I thought it would be worth getting an AT&T subscription with unlimited data. It cost about $70 per month, and I earned that money back the first day compared to paying the UK roaming charges. Thanks to Karl for helping me out, thanks to the AT&T staff for eventually figuring out how to get it set up for a dirty foreigner like me and finally thanks to the guys who released the software that I unlocked the iPhone with.

I am longing for the day when one can travel around the world without the fear of getting a million dollar phone bill. Luckily the EU seems to be working towards lower roaming costs, but only within the EU of course. It's a great first step.

I have been looking for a pay as you go sim card in some of the places we have been to, but I haven't seen any that offer unlimited data for a fixed cost per week or month. I welcome any suggestions for how to get around this issue when abroad.

Monday, 17 August 2009

Leaving Last.fm

After four amazing years at Last.fm in London I have decided it's time to move on.

I'm honored to have been a part of the company from when it was just four guys in a dodgy Whitechapel office (and Russ working from uni) to where we are today, via VC funding and the sale of the company to CBS.

The greatest part of the job has been the people, clever guys and girls who are great at what they are doing (and nice too, bonus!). Keep an eye on them, I'm sure we'll see more interesting stuff created at the hands of Last.fm staff, past and present.

Thanks to the various dev team members who have encouraged me to work on open source projects such as Hadoop, it's been very educational.

I'm going to spend the next few months travelling and then we'll see what happens. If anyone wants to get in touch, send me a tweet or a message.

A warning goes out to the Last.fm alumni around the world, keep your sofas at the ready, I might be knocking on your door any day now.

Wednesday, 24 June 2009

Zohmg

Interns are great
This summer we've had two great interns in the Last.fm data team. They have been working on a project named Zohmg.


From the announcement

I'm happy to announce Zohmg, a data store for aggregation of multi-dimensional time series data built on top of Hadoop, Dumbo and HBase. Data is imported with a mapreduce job and is exported through an HTTP API.

A typical use-case for Zohmg is the analysis of Apache log files. The analyst would be interested in breaking down pageviews by path, user agent, country of origin, etc. In-house at Last.fm, we have successfully demo'd an installation that served access data in realtime for millions of paths broken down by several dimensions.
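
To make the "breaking down pageviews by dimension" idea a bit more concrete, here is a minimal Python sketch. It is not Zohmg's API or data model: the record fields, the hourly bucketing and the `aggregate` function are all assumptions made up for illustration, and it runs in memory rather than as a mapreduce job.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, already-parsed log records. The field names are
# illustrative only and do not come from Zohmg.
records = [
    {"time": "2009-06-24T10:05:00", "path": "/music", "agent": "firefox", "country": "SE"},
    {"time": "2009-06-24T10:40:00", "path": "/music", "agent": "safari", "country": "UK"},
    {"time": "2009-06-24T11:15:00", "path": "/user", "agent": "firefox", "country": "SE"},
]


def aggregate(records, dimensions):
    """Count pageviews per (hour bucket, dimension name, dimension value)."""
    counts = Counter()
    for record in records:
        # Assumption: one-hour time buckets.
        hour = datetime.fromisoformat(record["time"]).strftime("%Y-%m-%d %H:00")
        for dim in dimensions:
            counts[(hour, dim, record[dim])] += 1
    return counts


totals = aggregate(records, ["path", "country"])
for key in sorted(totals):
    print(key, totals[key])
```

Each counter key plays the role of one cell in the time series: a time bucket plus one dimension value. A real installation would of course persist those counts instead of keeping them in a dictionary.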


Zohmg 0.2.0
Congrats to both Fredrik Möllerstrand and Per Andersson on their first public release that just went out.

For more information check out the readme.

Saturday, 13 June 2009

NOSQL debrief

The relatively young but rapidly growing "nosql" community met last Thursday in San Francisco. The idea was to give attendees a solid introduction to how distributed, non relational databases work as well as an overview of the various projects out there. If I may say so myself we succeeded in doing both. Thanks to all the presenters for very interesting talks and everyone for great hallway discussions.



Presentation slides and videos
Intro session - Todd Lipcon, Cloudera (slides, video1, video2)
Voldemort - Jay Kreps, LinkedIn (slides pdf ppt, video1, video2)
Cassandra - Avinash Lakshman, Facebook (slides pdf ppt, video)
Dynomite - Cliff Moon, Powerset (slides, video)
HBase - Ryan Rawson, StumbleUpon (slides, video)
Hypertable - Doug Judd, Zvents (slides pdf ppt, video1, video2)
CouchDB - Chris Anderson, couch.io (slides, video1, video2)

VPork - Jon Travis, SpringSource (slides, video)
MongoDB - Dwight Merriman, 10gen (slides, video)
Infinite Scalability - Jonas S Karlsson, Google (slides, video)

Some videos by Digg's John Quinn, the rest by Martin Dittus from Last.fm. Pictures by Russ Garrett from Last.fm.

NOSQL mailing list
At the event I got requests to set up a NOSQL mailing list as a cross-project discussion forum.
Hopefully it will encourage collaboration and exchange of ideas. If that sounds interesting, subscribe here.

Sponsors
Thanks again to the presenters and the sponsors (Last.fm, CBSi, Digg and GitHub).

Tuesday, 12 May 2009

NOSQL meetup

Hadoop summit
I'm going to attend the Hadoop summit in San Francisco in June. I had a great time last year, learned a bunch of stuff and got to meet a lot of people I previously only knew by name.

NOSQL
To make the most of the flight money I'm putting together a free meetup about "open source, distributed, non relational databases" or NOSQL for short.

It's taking place on the 11th of June, the day after the Hadoop summit in San Francisco. CBS interactive have been kind enough to provide us with both a venue and free lunch!

If you wish to attend, please register.

Preliminary schedule
09.45: Doors open
10.00: Intro session (Todd Lipcon, Cloudera)
10.40: Voldemort (Jay Kreps, LinkedIn)
11.20: Short break
11.30: Cassandra (Avinash Lakshman, Facebook)
12.10: Free lunch (sponsored by CBSi)
13.10: Dynomite (Cliff Moon, Powerset)
13.50: HBase (Ryan Rawson, StumbleUpon)
14.30: Short break
14.40: Hypertable (Doug Judd, Zvents)
15.20: Panel discussion
16.00: End of meetup, relocate to a pub called Kate O’Brien’s nearby

Location
Magma room, CBS interactive
235 Second Street
San Francisco, CA 94105

Saturday, 9 May 2009

VPork

VPork background
With the wide range of distributed, non relational databases out there it is hard to know which one to choose. One part of the puzzle is of course performance. Personally I'm interested in low response times.

A couple of weeks ago Jon Travis put up a useful program called VPork on his GitHub repository. It's a fairly straightforward performance testing tool for Voldemort, written in Groovy. You can find the announcement on the Voldemort mailing list.

Short description of how it works from the wiki:
* A single JVM is started, with any number of client threads
* Each thread executes for a given number of iterations
* For each iteration, the thread can read an existing record, and/or create a new one
* The probability of each read/write is configurable
* The location of where reads happen is configurable (by default, it reads the most recently written records, trailing off to less frequent reads of writes which occurred long ago)
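
The loop described above can be sketched roughly like this in Python. VPork itself is written in Groovy, so none of this is its actual code: the function names, the exponential falloff used for the "recent records first" bias and the plain dictionary standing in for the database are all assumptions for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def pick_read_index(num_written, rng):
    """Pick which written record to read, biased toward the newest ones.

    Assumption: an exponential falloff stands in for whatever
    distribution VPork really uses.
    """
    age = int(rng.expovariate(1.0 / 10))  # mean age of about 10 records
    return max(0, num_written - 1 - age)


def run_thread(store, iterations, read_prob, write_prob, seed=0):
    """Run one client thread's iterations; return (reads, writes) done."""
    rng = random.Random(seed)
    keys = []  # keys written by this thread, oldest first
    reads = writes = 0
    for i in range(iterations):
        # Read an existing record with the configured probability.
        if keys and rng.random() < read_prob:
            store.get(keys[pick_read_index(len(keys), rng)])
            reads += 1
        # Create a new record with the configured probability.
        if rng.random() < write_prob:
            key = f"key-{seed}-{i}"
            store[key] = b"x" * 100
            keys.append(key)
            writes += 1
    return reads, writes


# Four client threads sharing one in-memory "database".
store = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda s: run_thread(store, 1000, 0.5, 0.5, seed=s), range(4)))
print(results)
```

A real run would time each read and write and talk to an actual store; the dictionary here only shows the shape of the workload.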

At the end of the run it gives you info such as average, 99th percentile and standard deviation of read and write latency.
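
For reference, this is roughly how those three numbers can be computed from a list of measured latencies. It is a small Python sketch, not VPork's code: the `summarize` name, the nearest-rank percentile method and the use of the population standard deviation are my assumptions, and VPork may well calculate them differently.

```python
import statistics


def summarize(latencies_ms):
    """Return average, 99th percentile and standard deviation of latencies."""
    ordered = sorted(latencies_ms)
    # Nearest-rank 99th percentile: ceil(0.99 * n), done in integer maths.
    rank = (99 * len(ordered) + 99) // 100
    return {
        "average": statistics.mean(ordered),
        "p99": ordered[rank - 1],
        "stddev": statistics.pstdev(ordered),  # population standard deviation
    }


# Hypothetical read latencies of 1 ms to 100 ms.
stats = summarize(list(range(1, 101)))
print(stats)
```

The 99th percentile is usually the more interesting number for low response times, since a handful of slow requests barely move the average.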

VPork + Cassandra
I'm interested in both Voldemort and the Cassandra project. Thus, it seemed like a good idea to add the ability to benchmark Cassandra to VPork. The result can be found in my branch here.

How to run a one node Cassandra test (on Ubuntu)
First let's fetch Cassandra. There's no official release yet so we'll use a nightly build.
sudo apt-get install sun-java6-jdk
wget http://hudson.zones.apache.org/hudson/job/Cassandra/lastSuccessfulBuild/artifact/cassandra/build/cassandra-0.3.0-dev.tgz
tar -zxvf cassandra-0.3.0-dev.tgz
sudo mkdir -p /var/cassandra/logs
sudo chown -R `whoami` /var/cassandra/
cd cassandra-0.3.0-dev
bin/cassandra -f

Now that Cassandra should be up and running, let's start VPork.
sudo apt-get install git-core groovy
git clone git://github.com/johanoskarsson/vpork.git
cd vpork
./vpork.sh configs/cassandra/30-thread-pork.groovy configs/cassandra/nodes.conf

After a while you'll hopefully get some meaningful results. This is of course a very basic test. You probably want to add more Cassandra nodes, run the client on another node etc.

Where do we go from here?
VPork has some drawbacks such as suffering from the power-of-ten syndrome and it doesn't warm up the databases before it starts measuring, but it's a good start!

I'd love to see other interesting storage engine tests added to VPork to give users a simple way of comparing them with the load pattern they expect. There's an issue open for it over in HBase land.
It's fairly easy to do: create a class that implements a createClient() method returning a client that implements the basic get(String key) and put(String key, byte[] value) methods.
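
To show the shape of such an adapter, here is a Python analogue. The real thing would be Groovy or Java against VPork's own classes, which I am not reproducing here: `StoreClient`, `InMemoryClient` and `InMemoryClientFactory` are hypothetical names, and `create_client` simply mirrors the createClient() method mentioned above.

```python
from abc import ABC, abstractmethod
from typing import Optional


class StoreClient(ABC):
    """The two operations a benchmark needs from any storage engine."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        """Return the stored value, or None if the key is unknown."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None:
        """Store the value under the key."""


class InMemoryClient(StoreClient):
    """Stand-in engine backed by a dict; a real adapter would wrap a database client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


class InMemoryClientFactory:
    """Hypothetical factory; create_client mirrors createClient()."""

    def create_client(self):
        return InMemoryClient()


client = InMemoryClientFactory().create_client()
client.put("user:1", b"payload")
print(client.get("user:1"))
```

Keeping the interface down to get and put is what makes it cheap to plug in another engine: the benchmark loop never needs to know which database sits behind the client.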

Another possible use is to alert developers to performance regressions. There have been discussions about setting up nightly benchmarks on multiple servers to do just that.

Disclaimer: I'm no performance testing expert or a statistician (or a magician for that matter, but it's not important right now).

Thursday, 16 April 2009

HUGUK #2

I recently organized the second Hadoop user group UK meetup at Sun's customer briefing center in London. On the off chance that there's a Hadoop user in the UK who reads this blog and didn't attend, shame on you!

For more information on what you missed have a look at huguk.org