SPEAR (Spamming-resistant Expertise Analysis and Ranking) is a new technique for measuring the expertise of users by analyzing their public activities on platforms like Delicious.
A major problem of the Internet today is that finding high-quality information is neither easy nor fast. The steady increase of spam and junk content on the Web further complicates this challenge. A related issue is that finding knowledgeable and trustworthy users on social platforms like Delicious is much more difficult than it should be. Wouldn’t it be nice if Delicious recommended “good” users with similar interests?
To tackle this problem, we created the SPEAR algorithm, which analyzes the timeline of users’ bookmarking and tagging activities. The focus of SPEAR is on the ability of users to find new, high-quality information on the Internet. A great benefit of SPEAR is that it returns two very useful sets of results: first, a list of users ranked by their expertise; and second, a list of websites ranked by their quality.
Technically, SPEAR is based on the well-known information retrieval algorithm HITS, a technique presented in 1999 that is used by search engines to rank Web pages. We came up with SPEAR by modifying HITS so that it fits the characteristics of open and shared systems like Delicious, and by extending it with a new component that integrates the timeline of user activities into its analysis. This resulted in further performance improvements of the algorithm.
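For readers unfamiliar with HITS, the following is a minimal Python sketch of its core loop: every page gets a hub score and an authority score, and each is repeatedly recomputed from the other. The function name `hits_sketch`, the iteration count, and the toy link data are illustrative assumptions, not part of our SPEAR code.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length so they do not grow without bound."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {key: value / norm for key, value in scores.items()}


def hits_sketch(edges, iterations=50):
    """Plain HITS on a list of (source, target) links.

    Returns two dicts: hub scores and authority scores.
    """
    edges = list(edges)
    nodes = {node for edge in edges for node in edge}
    hub = {node: 1.0 for node in nodes}

    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {node: 0.0 for node in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # A page's hub score is the sum of the authority scores it links to.
        new_hub = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        auth = _normalize(auth)
        hub = _normalize(new_hub)

    return hub, auth


# Toy example (hypothetical data): "a" links to "c" and "d", "b" links to "c".
hub, auth = hits_sketch([("a", "c"), ("b", "c"), ("a", "d")])
print(sorted(auth, key=auth.get, reverse=True))
```

In the bookmarking setting, users take the role of hubs and bookmarked documents take the role of authorities.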
The two main elements of the new SPEAR algorithm are:
1. Mutual reinforcement of user expertise and document quality: A user’s expertise in a particular topic depends on the quality of the documents she or he has found, and the quality of documents in turn depends on the expertise of the users who have found them.
2. Discoverers vs. followers: Expert users should be discoverers, meaning they tend to identify new, high-quality documents faster than others. In other words, “the early bird catches the worm”. SPEAR gives more credit to users the earlier they find high-quality documents.
The combination of these two elements means that SPEAR favors quality over quantity of user actions and is quite resistant to today’s spamming attacks.
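The two elements above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our reference implementation: the function name `spear_sketch`, the input format of `(user, document, timestamp)` tuples, and in particular the credit formula (square root of one plus the number of later bookmarkers) are choices made for this example. It also assumes each user bookmarks a given document at most once.

```python
import math
from collections import defaultdict


def spear_sketch(bookmarks, iterations=50):
    """Rank users and documents from (user, document, timestamp) tuples.

    Returns two dicts: user expertise scores and document quality scores.
    """
    by_doc = defaultdict(list)
    for user, doc, timestamp in bookmarks:
        by_doc[doc].append((timestamp, user))

    # Discoverers vs. followers: earlier bookmarkers of a document get more
    # credit. ASSUMPTION: credit = sqrt(1 + number of later bookmarkers).
    # Equal timestamps are ordered by user name, which is arbitrary.
    credit = {}
    for doc, entries in by_doc.items():
        entries.sort()
        for rank, (_, user) in enumerate(entries):
            credit[(user, doc)] = math.sqrt(len(entries) - rank)

    expertise = {user: 1.0 for user, _ in credit}
    quality = {doc: 1.0 for doc in by_doc}

    # Mutual reinforcement: expertise and quality are recomputed from each
    # other, weighted by the time-based credit, then rescaled.
    for _ in range(iterations):
        new_expertise = dict.fromkeys(expertise, 0.0)
        for (user, doc), weight in credit.items():
            new_expertise[user] += weight * quality[doc]
        new_quality = dict.fromkeys(quality, 0.0)
        for (user, doc), weight in credit.items():
            new_quality[doc] += weight * new_expertise[user]
        expertise = _unit_length(new_expertise)
        quality = _unit_length(new_quality)

    return expertise, quality


def _unit_length(scores):
    """Scale scores to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {key: value / norm for key, value in scores.items()}


# Hypothetical data: alice finds both documents first, carol follows late.
bookmarks = [
    ("alice", "doc1", 1), ("bob", "doc1", 2), ("carol", "doc1", 3),
    ("alice", "doc2", 1), ("bob", "doc2", 5),
]
expertise, quality = spear_sketch(bookmarks)
for user in sorted(expertise, key=expertise.get, reverse=True):
    print(user, round(expertise[user], 3))
```

Because credit depends on when a bookmark was made rather than on how many bookmarks a user has, simply bookmarking a popular document late earns little in this sketch.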
We believe SPEAR is very useful in the context of open systems, particularly social networks. That said, we are already researching the next version of the algorithm: the popularity of online services like Delicious is rising, and so is the spam threat. Whether we want to improve the user experience on Delicious or win the arms race against spammers, there’s still a lot of work left to do!
sounds interesting, but how can we try it out?
Posted by: Jon Denison | September 02, 2009 at 02:36 PM