August 13th, 2003

Friends & interests

I'm sure you've all seen the '50 closest non-friends' thing that's been floating around LJ for some time now. While it is pretty interesting, it doesn't take into account why you've made someone a friend, or why they made someone else their friend. It just assumes that a journal will be interesting to you because the people whose journals you find interesting have friended it.
This is, of course, not necessarily the case.

Friendships on LJ form a directed graph: a collection of nodes (in this case: users) with connections between the nodes (in this case: friendship relations). Connections are directed: if A has friended B, B has not necessarily friended A. Example: I have chadu listed as a friend, but he doesn't list me as a friend. With this graph, you can do all sorts of fun things, if you can devise an algorithm that does something meaningful with it.
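To make the graph idea concrete, here is a rough Python sketch. It assumes the friend lists have already been collected into a dictionary; the users A to E and their friend lists are made up for illustration, and the function names are mine. The second function is only my guess at how a 'closest non-friends' list could be computed (counting how many of your friends list each candidate), not the actual tool's method.

```python
from collections import Counter

# Hypothetical sample data: each user maps to the set of users they have friended.
friends = {
    "A": {"B", "C"},
    "B": {"D", "E"},   # B has not friended A back
    "C": {"A", "D"},   # C and A have friended each other
    "D": set(),
    "E": {"A"},
}

def is_mutual(graph, a, b):
    """True only if a has friended b AND b has friended a."""
    return b in graph.get(a, set()) and a in graph.get(b, set())

def closest_non_friends(graph, user, limit=50):
    """Rank users that `user` has not friended by how many of
    `user`'s friends have friended them."""
    mine = graph.get(user, set())
    counts = Counter()
    for friend in mine:
        for candidate in graph.get(friend, set()):
            if candidate != user and candidate not in mine:
                counts[candidate] += 1
    return counts.most_common(limit)

print(is_mutual(friends, "A", "B"))        # False: the edge only goes one way
print(is_mutual(friends, "A", "C"))        # True
print(closest_non_friends(friends, "A"))   # [('D', 2), ('E', 1)]
```

Storing the graph as a dictionary of sets keeps the direction of each edge explicit: looking up `friends["A"]` tells you whom A has friended, but says nothing about who has friended A.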

I haven't checked this theory, but I assume that most users are part of a few 'clusters': groups of users that have all friended each other. For instance, arnoudens, damanique, isabelgou and shironuchan have all friended me and each other. I have also listed suckerrlove as a friend, but she has nothing in common with that cluster, and leads into a different one.
Clusters are defined by 'topicality': people in a certain cluster write about similar things. For instance, the members of the previously mentioned cluster are all Dutch anime fans and write about that. The second cluster is a completely different demographic and writes about different things.
If you look at just the friendship-relations, you miss all that.
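If a cluster is read strictly as 'everyone has friended everyone else', then finding clusters amounts to finding maximal cliques among the two-way friendships. Below is a rough sketch using the basic Bron-Kerbosch algorithm. That choice is my own, not something I have tested on real LJ data. The first group in the sample data follows the example above; the friend lists for suckerrlove and the placeholder users other1 and other2 are invented, as is the assumption that she has not friended me back.

```python
# Sample data: who each user has friended. Partly hypothetical (see above).
friends = {
    "me": {"arnoudens", "damanique", "isabelgou", "shironuchan", "suckerrlove"},
    "arnoudens": {"me", "damanique", "isabelgou", "shironuchan"},
    "damanique": {"me", "arnoudens", "isabelgou", "shironuchan"},
    "isabelgou": {"me", "arnoudens", "damanique", "shironuchan"},
    "shironuchan": {"me", "arnoudens", "damanique", "isabelgou"},
    "suckerrlove": {"other1", "other2"},      # invented
    "other1": {"suckerrlove", "other2"},      # invented
    "other2": {"suckerrlove", "other1"},      # invented
}

def mutual_graph(graph):
    """Keep only two-way friendships, as an undirected adjacency map."""
    mutual = {user: set() for user in graph}
    for user, listed in graph.items():
        for other in listed:
            if other != user and user in graph.get(other, set()):
                mutual[user].add(other)
    return mutual

def maximal_cliques(adj):
    """Yield every maximal clique (basic Bron-Kerbosch, no pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield r          # r cannot be extended any further
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)      # v is done; exclude it from later branches
            x.add(v)
    yield from expand(frozenset(), set(adj), set())

def clusters(graph, min_size=3):
    """Groups of at least `min_size` users who have all friended each other."""
    found = maximal_cliques(mutual_graph(graph))
    return sorted(sorted(c) for c in found if len(c) >= min_size)

for cluster in clusters(friends):
    print(cluster)
```

The one-way link from me to suckerrlove is dropped by `mutual_graph`, so the two groups come out as separate clusters. On real data a strict clique is probably too demanding (one missing friendship splits a cluster), so a looser definition might work better.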

But LJ also has 'interests'. Assuming that people write about their interests, these could be used to modify the graph-traversal algorithm to assess whether another user's interests match up with yours. (Here's an experiment for you: look up all the people in your country that list 'anime' as an interest, and see how many of their entries are about anime. I predict the harvest will be quite low.)
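One way this could work, as a rough sketch only: count how many of your friends list each candidate, then weight that count by how much the candidate's interest list overlaps with yours. The Jaccard overlap measure and the multiplication are assumptions on my part, and the users and interests below are made up.

```python
from collections import Counter

# Hypothetical sample data.
friends = {
    "A": {"B", "C"},
    "B": {"D", "E"},
    "C": {"D"},
    "D": set(),
    "E": set(),
}
interests = {
    "A": {"anime", "manga"},
    "D": {"cooking"},
    "E": {"anime", "manga", "games"},
}

def jaccard(a, b):
    """Overlap of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_candidates(friends, interests, user):
    """Score non-friends by (friends who list them) * (interest overlap)."""
    mine = friends.get(user, set())
    counts = Counter()
    for friend in mine:
        for candidate in friends.get(friend, set()):
            if candidate != user and candidate not in mine:
                counts[candidate] += 1
    my_interests = interests.get(user, set())
    scored = [
        (cand, n * jaccard(my_interests, interests.get(cand, set())))
        for cand, n in counts.items()
    ]
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored

for name, score in rank_candidates(friends, interests, "A"):
    print(name, round(score, 2))
```

Here D is listed by two of A's friends but shares no interests with A, so E ends up ranked higher. Whether listed interests say anything about what people actually write is exactly the open question, so this weighting is only as good as that assumption.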

This will take more coding and some experiments. Does anybody have any thoughts on the issue in the meantime?
