The past week has been very productive for my GSoC project. My mentor and I agreed that the project will be written in PHP4, but the learning routines will be written in PHP5. The learning scripts can't run from Apache or any other SAPI; they only run from the CLI, because the learning routine takes some time to compute (and exceeds most web servers' timeouts). The learning scripts can be run on another machine, and the frequency of re-learning depends on how fast the site grows; for big sites, a re-run once a week is fine.

The friends suggester is now working pretty well. The way it works is simple: it discovers groups of users that have something in common (in this case, friends), and when the learning process is finished the metadata is stored in the MySQL database for fast queries. In my test it discovered around 70 user groups among about 18,000 users. Then, when a user logs in, the system tries to find the best user groups for that user and returns the difference between those groups (and their members' friends) and the user's current friends. The algorithm used to discover the groups is K-Means**, with some improvements that I'm making to speed up the process.
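To make the suggestion step more concrete, here is a minimal sketch in Python (not the project's PHP code). It assumes the groups have already been discovered by the learning step and are available as plain sets of user ids. The function name `suggest_friends`, the overlap-based way of picking the "best" groups, and the ranking by number of connections are all my own assumptions for illustration; the real implementation may score groups differently.

```python
from collections import Counter


def suggest_friends(user, friends, groups, top_groups=1, limit=5):
    """Suggest new friends for `user`.

    friends: dict mapping a user id to the set of that user's friend ids.
    groups:  list of sets of user ids (assumed output of the learning step).
    """
    mine = friends.get(user, set())
    known = mine | {user}

    # Assumption: the "best" groups are those sharing the most members
    # with the user and the user's current friends.
    best = sorted(groups, key=lambda g: len(g & known), reverse=True)[:top_groups]

    # Count how often each candidate shows up, either as a group member
    # or as a friend of a group member.
    counts = Counter()
    for group in best:
        for member in group:
            counts[member] += 1
            for friend in friends.get(member, set()):
                counts[friend] += 1

    # The "difference": drop the user and everyone they already know.
    for uid in known:
        counts.pop(uid, None)

    # Most connected candidates first; ties broken by id for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [uid for uid, _ in ranked[:limit]]


if __name__ == "__main__":
    # Tiny made-up data set, for illustration only.
    friends = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
    groups = [{1, 2, 3, 4}, {5, 6, 7}]
    print(suggest_friends(1, friends, groups))
```

Because the groups are computed offline, the work done at login time is only a few set operations, which is why storing the group metadata in the database keeps the queries fast.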

Currently, the project can be downloaded from the main repository* and tested on any BuddyPress installation. For sites with fewer than 1,000 users (I'll fix this during the week so that it works with fewer users), I included some datasets that I fetched from Twitter.

