I suppose it’s my turn. Lately, I’ve been reading papers related to my project and building quick prototypes whenever an idea seemed worth trying.

First, I focused my research on unsupervised learning: I implemented K-means and tested it against a dataset of about 8,000 news headlines. The results (on a Spanish dataset) and the project itself are available online for those who are interested.
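For readers who have not seen K-means before, here is a minimal sketch of the idea in Python. It is not the code from my project: the function names (`vectorize`, `distance`, `kmeans`), the bag-of-words representation, and the Euclidean distance are all assumptions chosen to keep the example short.

```python
from collections import Counter
import math
import random


def vectorize(headlines):
    """Turn each headline into a bag-of-words count vector (illustrative choice)."""
    tokenized = [h.lower().split() for h in headlines]
    vocab = sorted({w for words in tokenized for w in words})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for words in tokenized:
        vec = [0.0] * len(vocab)
        for word, count in Counter(words).items():
            vec[index[word]] = float(count)
        vectors.append(vec)
    return vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means: returns (labels, centroids) for a list of numeric vectors."""
    rng = random.Random(seed)
    # Start from k distinct data points picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

To cluster headlines you would call `kmeans(vectorize(headlines), k)` with whatever `k` you want to try. A real run over thousands of headlines would probably want sparse vectors and some term weighting, which this sketch leaves out.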

Next, I read about Hadoop and started writing a set of PHP wrappers on top of Hadoop Streaming; in the end I reproduced almost the same API as the Java one. This project is not public yet; I will publish it when I need it for my project. You might be wondering “why Hadoop?” Basically, it will be useful for big sites that use BP.
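If you have never used Hadoop Streaming, the contract is simple: the mapper and the reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output, and the reducer receives its input sorted by key. The sketch below shows that contract as a word count in Python. It is not my PHP wrappers, and the names `mapper`, `reducer`, and `main` are only illustrative.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts per word; assumes the input lines are sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(value) for _, value in group)}"


def main(argv, stdin=sys.stdin, stdout=sys.stdout):
    """Hypothetical entry point: 'map' runs the mapper, anything else the reducer."""
    step = mapper if argv[1:] == ["map"] else reducer
    for out in step(stdin):
        stdout.write(out + "\n")
```

A script that ends with `main(sys.argv)` could then be passed to the streaming job as the mapper (with the `map` argument) and as the reducer. Locally, sorting the mapper output before feeding it to `reducer` is enough to imitate what Hadoop does between the two phases.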

This week I’ll create a public repo on GitHub and start coding there.

