Graph partitioning part 2: connected graphs

In a previous post, we talked about finding the partitions in a disconnected graph using Cascading. In reality, most graphs are connected, so being able to partition only graphs that are already disconnected is not very helpful. In this post, we'll look at partitioning a connected graph based on a criterion for placing partition boundaries.

Let's take the sample graph from the previous post and make it connected. We'll do this by adding two extra nodes that are each connected to many other nodes in the graph, like this:

[Figure: the sample graph, connected by two extra high-degree nodes]
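Here is a small single-machine Python sketch that makes the idea concrete. It does not use Cascading, and it is not necessarily the criterion this post goes on to use. The sketch builds a tiny hypothetical graph of two islands and shows that adding two hub nodes, X and Y, merges them into one connected component. It then applies one possible boundary criterion, assumed here only for illustration: nodes whose degree is at or above a threshold are treated as boundaries and left out. The names `components` and `high_degree_nodes`, the node labels and the threshold of 4 are all made up for this example.

```python
from collections import Counter


def components(edges, exclude=frozenset()):
    """Connected components as sorted node lists, skipping nodes in `exclude`."""
    adj = {}
    for a, b in edges:
        for n in (a, b):
            if n not in exclude:
                adj.setdefault(n, set())
        if a not in exclude and b not in exclude:
            adj[a].add(b)
            adj[b].add(a)
    seen, result = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited node
        seen.add(start)
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        result.append(sorted(comp))
    return result


def high_degree_nodes(edges, threshold):
    """Hypothetical boundary criterion: nodes with degree >= threshold."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {n for n, d in degree.items() if d >= threshold}


# Two islands that are not connected to each other...
island = [('a', 'b'), ('b', 'c'), ('d', 'e'), ('e', 'f')]
# ...plus two hub nodes, X and Y, each linked to nodes in both islands
hubs = [('X', n) for n in 'abde'] + [('Y', n) for n in 'acdf']
edges = island + hubs

print(components(island))       # [['a', 'b', 'c'], ['d', 'e', 'f']]
print(len(components(edges)))   # 1: the hubs connect everything
boundary = high_degree_nodes(edges, 4)       # {'X', 'Y'}
print(components(edges, exclude=boundary))   # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

Whether a plain degree threshold is a good boundary criterion depends on the graph. It is simply the easiest rule that separates the two islands in this toy example.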

Read the rest of this post »

Graph partitioning in MapReduce with Cascading

I have recently had the joy of doing MapReduce-based graph partitioning, and this post describes how I did it. I decided to write my MapReduce jobs with Cascading, as it is a lot less verbose than raw Java MapReduce. The algorithm consists of one step that prepares the input data, followed by an iterative part that runs until convergence. The program uses a Hadoop counter to detect convergence and stops iterating once it is reached. All code is available. Also, the explanation has colorful images of graphs. (And everything is written very informally, with no math.)
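The post itself uses Cascading. The following plain-Python, in-memory sketch illustrates only the iterate-until-convergence idea. It assumes the partitions of a disconnected graph are found by minimum-label propagation, which is a common approach but is not confirmed to be the one used in the post. An ordinary integer, `changed`, stands in for the Hadoop counter. The function name `label_propagation` is hypothetical.

```python
def label_propagation(edges):
    """Connected components via iterative minimum-label propagation.

    Each node starts out labelled with its own id. In every round, each node
    takes the smallest label among itself and its neighbours. The `changed`
    count plays the role of the Hadoop counter: a round that changes no label
    means the labels have converged.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    labels = {node: node for node in neighbours}
    rounds = 0
    while True:
        rounds += 1
        changed = 0
        # Synchronous update: every node reads last round's labels, like one MR job
        new_labels = {}
        for node, nbs in neighbours.items():
            best = min([labels[node]] + [labels[n] for n in nbs])
            if best != labels[node]:
                changed += 1
            new_labels[node] = best
        labels = new_labels
        if changed == 0:
            return labels, rounds


labels, rounds = label_propagation([(1, 2), (2, 3), (4, 5)])
print(labels)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
print(rounds)  # 3
```

The number of rounds grows with the distance a label has to travel through the graph. That is why a counter-based stopping rule is used instead of a fixed number of iterations.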

Read the rest of this post »

Twitter data fun

I made a map of my followers on Twitter. This is not entirely straightforward, as most Twitter users don't attach geo coordinates to their tweets or profiles. Luckily, many people leave something sensible in the location field of their profile (e.g. 'Amsterdam' or 'London, UK'). You can match this field against a Lucene index of all the cities in the world, which I happen to have. I was able to place 15 of my grand total of 19 followers on the map.
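The following minimal sketch of the location lookup makes two assumptions. First, a plain Python dict (`CITIES`, hypothetical, with two entries and approximate coordinates) replaces the Lucene index of world cities. Second, a naive rule treats the text before the first comma as the city name. A real gazetteer also has to handle ambiguous names (there is more than one London), and this sketch ignores that.

```python
# Hypothetical mini-gazetteer standing in for the Lucene city index:
# lower-cased city name -> approximate (latitude, longitude)
CITIES = {
    'amsterdam': (52.37, 4.90),
    'london': (51.51, -0.13),
}


def geocode(location):
    """Map a free-text profile location to (lat, lon), or None if unknown."""
    if not location:
        return None
    # 'London, UK' -> 'london': keep the part before the first comma
    city = location.split(',')[0].strip().lower()
    return CITIES.get(city)


profiles = ['Amsterdam', 'London, UK', 'the internet', '']
placed = [loc for loc in profiles if geocode(loc) is not None]
print(placed)  # ['Amsterdam', 'London, UK']
```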

[Map: followers of @fzk]

Why is this important? Read on! Also, somewhere down the line I will explain how to make such a map for your own account.

Read the rest of this post »

Choosing a Twitter handle

When creating a Twitter account, there is the very difficult problem of choosing a handle. I could try my full name, but apparently Twitter thinks it’s too long. I can’t even fit my full first and last name in the name field on the profile settings page, so I’m stuck with just Friso. My first name as a handle was already taken, and I don’t feel that @friso1981 looks very compelling. Beyond that, I figured it would be important to have a name that’s fairly easy to memorize, so that people can easily pick it up from slide decks (my main Twitter use case is communicating at conferences). With this in mind, I decided that I should have a three-character Twitter handle. Of course, you’d expect all of these to be spoken for already. Guess again. There was exactly one available: @fzk.

How do you figure out which three-character names are still available? Well, you check every three-letter combination against the API that Twitter’s sign-up page uses for the AJAX check that lets you know whether your chosen handle is still available.

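import itertools
import json
import string
import urllib.request

# Endpoint behind the AJAX availability check on Twitter's sign-up page
API = 'http://twitter.com/users/username_available?suggest=0&username='

with open('tla.txt', 'w') as result:
    # Every three-letter handle from 'aaa' to 'zzz': 26^3 = 17,576 requests
    for letters in itertools.product(string.ascii_lowercase, repeat=3):
        handle = ''.join(letters)
        try:
            with urllib.request.urlopen(API + handle) as req:
                available = json.loads(req.read().decode('utf-8'))['valid']
        except Exception:
            # Network, HTTP or parse failure: log it so the handle can be retried
            print("Error: " + handle)
            result.write("Error: " + handle + '\n')
            result.flush()
            continue
        if available:
            print("Got one: " + handle)
            result.write("Available: " + handle + '\n')
            result.flush()  # keep partial results if the run is interrupted
        else:
            print("Nope: " + handle)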

Interestingly, these three-character handles come and go. Apparently people free them up every now and then; I don’t know why. I ran the script again, and at the time of writing both @jqn and @zqo are available. Perhaps people had trouble with names containing a q. Now I am tempted to run this daily and see how three-letter (TLA) Twitter handles get claimed and released over time…
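A daily run could be compared with the previous day's output using something like the sketch below. It assumes the `tla.txt` format written by the script above. Because that script writes nothing for a "Nope", a handle that is not listed is assumed to be taken. Handles whose lookup failed in the relevant run are skipped, since their status is unknown. The function names and the file names in the usage comment are hypothetical.

```python
def parse_run(lines):
    """Split a tla.txt-style run into (available, errored) sets of handles."""
    available, errored = set(), set()
    for line in lines:
        status, _, handle = line.strip().partition(': ')
        if status == 'Available':
            available.add(handle)
        elif status == 'Error':
            errored.add(handle)
    return available, errored


def diff_runs(old_lines, new_lines):
    """Handles claimed or released between two runs; unlisted handles count as taken."""
    old_ok, old_err = parse_run(old_lines)
    new_ok, new_err = parse_run(new_lines)
    return {
        # free in the old run, no longer free now (and not a failed lookup)
        'claimed': sorted(old_ok - new_ok - new_err),
        # free now, and known to have been taken in the old run
        'released': sorted(new_ok - old_ok - old_err),
    }


# Hypothetical file names for two consecutive daily runs:
# with open('tla-day1.txt') as old, open('tla-day2.txt') as new:
#     print(diff_runs(old, new))
```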

Why blog?!

Until recently, I have pretty much used the web in a read-only fashion. I have written on my Facebook wall only once (because I was looking for an apartment), my LinkedIn profile contains just my previous employers plus a brief description, I have written blog posts for corporate blogs only twice, and I usually publish the slides of my public talks online. All of this appears to be changing now. In the past month I have gotten a Twitter account, and now a blog. Why am I suddenly compelled to shout stuff on the internet? Well, the main reason is this:

Read the rest of this post »