I suppose it’s my turn. During this time, I’ve been reading papers related to my project and building quick prototypes whenever I thought it was worthwhile.
First, I focused my research on unsupervised learning, implementing K-means (http://en.wikipedia.org/wiki/K-means) and testing it against a dataset of about 8,000 news headlines. The results can be found at http://cesar.la/projects/cluster-abc-ln (Spanish dataset), and for those who are interested, the code is at http://github.com/crodas/phpcluster/tree/master
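For readers who have not seen K-means before, here is a minimal sketch of the basic loop (assign each point to its nearest centroid, then move each centroid to the mean of its members). It is written in Python for illustration only and is not the phpcluster code. The function names, the choice of seeding with the first k points, and the toy 2-D data are all assumptions; real headlines would first have to be converted into numeric vectors.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iterations=100):
    """Plain K-means. Assumption: centroids are seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignment = [None] * len(points)
    for _ in range(max_iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the clustering has converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


# Toy data: two obvious groups in 2-D.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, assignment = kmeans(points, k=2)
```

Seeding with the first k points keeps the example deterministic; random or smarter seeding is usually preferred in practice because the result depends on the starting centroids.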
Next, I read about Hadoop (http://en.wikipedia.org/wiki/Hadoop) and started writing a set of PHP wrappers on top of Hadoop streaming; in the end I reproduced almost the same API as the Java one. This project is not public yet; I will publish it when I need it for my project. You might be wondering “why Hadoop?” Basically, it will be useful for big sites that use BP.
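To give an idea of what Hadoop streaming expects from a wrapper, here is a minimal word-count sketch. It is in Python purely for illustration and says nothing about how my PHP wrappers look. Streaming jobs exchange plain text lines, by default a key and a value separated by a tab, and the reducer receives its input sorted by key. The names mapper and reducer and the sample headlines are assumptions for the example.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum the counts per word. Input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local simulation: sorted() stands in for Hadoop's shuffle-and-sort phase.
headlines = ["Hadoop streaming", "hadoop PHP"]
mapped = sorted(mapper(headlines))
reduced = list(reducer(mapped))
```

In a real streaming job each function would live in its own script that reads from standard input and writes to standard output, which is what makes it possible to write the job in any language, PHP included.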
This week I’ll create a public repo on GitHub and start coding there.
P.S. 1: I don’t know if there will be a shared repository for all the projects, but I’d still like to keep mine on GitHub.
P.S. 2: I don’t know if I’m allowed to, but I’d like to post my weekly status report on my blog too.