Updates from César D. Rodas

  • César D. Rodas 8:58 pm on July 24, 2009
    Tags: Buddypress

    Hello,

    Sorry for my late update. The project has changed a lot since my last report. The most notable changes are a better configuration UI and a proper way to extend the project without modifying the core.

    The project has two main extension points. The most notable is the DataCollector. As you might know, recommendations need a set of inputs to analyze (not a hard task, since WP has plenty of hooks). This base class automatically loads the subclasses placed in a given folder (./collectors/ by default) and registers all the hooks they need. Take a look at how easy it is to use: http://github.com/crodas/BuddyPress-Recommendation-Engine/blob/66c4586b725e369c297af7585211b7108c296f0e/bp-recommendation/collectors/blogs_visitor.php. It registers a single hook ("loop_end"), checks that the page is a single post, that the user is logged in and that the user is not the post's author, and then stores the result "somewhere".
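
    To make the DataCollector idea concrete, here is a minimal sketch in Python rather than the project's PHP, assuming a toy stand-in for the WordPress action API. All names (HOOKS, add_action, do_action, DataCollector, BlogsVisitor) and the fields of the ctx dictionary are hypothetical, and subclass self-registration through __init_subclass__ replaces the folder scanning described above. The collector mirrors the checks of blogs_visitor.php: single page, logged-in user, and a user different from the author.

```python
from collections import defaultdict

# Hypothetical stand-in for WordPress's action API; not the real WP functions.
HOOKS = defaultdict(list)

def add_action(name, callback):
    HOOKS[name].append(callback)

def do_action(name, *args):
    for callback in HOOKS[name]:
        callback(*args)

class DataCollector:
    """Hypothetical base class: every subclass registers itself on definition."""
    registry = []
    hooks = ()  # names of the actions a collector listens to

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        DataCollector.registry.append(cls)

    def __init__(self, storage):
        self.storage = storage  # any object with an append() method

    @classmethod
    def load_all(cls, storage):
        """Instantiate each registered collector and attach it to its hooks."""
        collectors = []
        for sub in cls.registry:
            collector = sub(storage)
            for hook in sub.hooks:
                add_action(hook, collector.on_event)
            collectors.append(collector)
        return collectors

    def on_event(self, ctx):
        raise NotImplementedError

class BlogsVisitor(DataCollector):
    """Records which logged-in user read which single post (ctx keys are assumed)."""
    hooks = ("loop_end",)

    def on_event(self, ctx):
        if not ctx["is_single"]:
            return  # archive or index page, not a single post
        user = ctx.get("user_id")
        if user is None or user == ctx["author_id"]:
            return  # anonymous visitor, or the author reading their own post
        self.storage.append((type(self).__name__, user, ctx["post_id"]))

store = []
DataCollector.load_all(store)
do_action("loop_end", {"is_single": True, "user_id": 7, "author_id": 2, "post_id": 42})
do_action("loop_end", {"is_single": True, "user_id": 2, "author_id": 2, "post_id": 42})
do_action("loop_end", {"is_single": False, "user_id": 7, "author_id": 2, "post_id": 43})
print(store)  # [('BlogsVisitor', 7, 42)]
```

    Only the first event is stored: the second comes from the post's own author and the third is not a single-post page.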

    The other way to extend the project is by writing a DataCollectorStorage. As its name suggests, it defines where the data gathered by the DataCollectors is stored. It is very important, since it can easily become a bottleneck and should be used carefully. I wrote a sample class that stores all the data on the hard drive (in regular files), http://github.com/crodas/BuddyPress-Recommendation-Engine/blob/66c4586b725e369c297af7585211b7108c296f0e/bp-recommendation/collectors_storage/disk.php. This isn't the best approach for large sites, but it is useful as an example.

    In the UI, the site admin can choose which DataCollectors and which DataCollectorStorage to use. A DataCollectorStorage can also have configuration variables; for instance, the DiskStorage subclass has a single one (the save directory).
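
    A storage backend with a single configuration variable could look roughly like the sketch below. It is Python, not the linked disk.php, and the class names, the config_vars attribute and the one-JSON-lines-file-per-collector layout are assumptions made for illustration.

```python
import json
import os
import tempfile

class DataCollectorStorage:
    """Hypothetical base class: decides where collected records end up."""
    config_vars = {}  # name -> description, e.g. for rendering an admin form

    def __init__(self, **config):
        self.config = config

    def save(self, collector, record):
        raise NotImplementedError

    def load(self, collector):
        raise NotImplementedError

class DiskStorage(DataCollectorStorage):
    """Appends one JSON line per record to a file per collector (assumed layout)."""
    config_vars = {"save_dir": "Directory where the data files are written"}

    def _path(self, collector):
        return os.path.join(self.config["save_dir"], collector + ".jsonl")

    def save(self, collector, record):
        os.makedirs(self.config["save_dir"], exist_ok=True)
        with open(self._path(collector), "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")

    def load(self, collector):
        path = self._path(collector)
        if not os.path.exists(path):
            return []  # nothing collected yet
        with open(path, encoding="utf-8") as fh:
            return [json.loads(line) for line in fh]

# The save directory is the storage's only configuration variable.
storage = DiskStorage(save_dir=tempfile.mkdtemp())
storage.save("blogs_visitor", {"user": 7, "post": 42})
storage.save("blogs_visitor", {"user": 9, "post": 42})
print(storage.load("blogs_visitor"))
```

    Appending to plain files keeps writes cheap but makes every read a full scan, which is one reason this kind of backend suits small sites better than large ones.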

    Some algorithms still need to be implemented, and to do that I need a real dataset. Probably (I'll talk to Andy) the project could be installed on the testbp sandbox site (at least some DataCollectors).

    Happy hacking to all.

     
  • César D. Rodas 6:52 am on June 25, 2009
    Tags: devel, week-report

    Hello to everybody,

    During the past week there were no changes to the code. We (my mentors and I) are about to test the Friends recommendations on the BuddyPress demo site (http://testbp.org). I'm also researching how to find similar texts; I will probably start coding this part this week.

    Also on my to-do list for this week is gathering information about what users read, in order to improve the recommendation of posts they might be interested in (this will be done using text similarity and tags).
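
    One simple way to combine text similarity and tags is to score each candidate post against the posts a user has already read. The sketch below is an assumption, not the project's method: it uses cosine similarity over raw word counts plus Jaccard overlap of tags, mixed by a hypothetical text_weight of 0.5, and the post dictionaries are made-up examples.

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity of two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def jaccard(a, b):
    """Overlap of two tag sets: |A & B| / |A | B|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def score(post, read_posts, text_weight=0.5):
    """Best mixed similarity between a candidate post and anything already read."""
    vec = Counter(tokens(post["text"]))
    best = 0.0
    for seen in read_posts:
        s = (text_weight * cosine(vec, Counter(tokens(seen["text"])))
             + (1 - text_weight) * jaccard(set(post["tags"]), set(seen["tags"])))
        best = max(best, s)
    return best

def recommend(candidates, read_posts, k=3):
    ranked = sorted(candidates, key=lambda p: score(p, read_posts), reverse=True)
    return [p["id"] for p in ranked[:k]]

read = [{"id": 1, "text": "Clustering users with k-means in PHP", "tags": ["php", "clustering"]}]
candidates = [
    {"id": 2, "text": "Faster k-means clustering tricks", "tags": ["clustering"]},
    {"id": 3, "text": "My holiday photos", "tags": ["travel"]},
]
print(recommend(candidates, read, k=1))  # [2]
```

    Raw counts are the simplest choice; weighting words by TF-IDF would stop very common words from dominating the text score.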

    Happy coding to all.

     
  • César D. Rodas 11:14 pm on June 16, 2009
    Tags: Buddpress

    Howdy,

    The past week has been very productive for my GSoC project. My mentor and I agreed that the project will be written in PHP4, but the learning routines will be written in PHP5. These scripts can't run from Apache or any other SAPI; they only run from the CLI, because the learning routine takes a while to compute (and exceeds most web servers' timeouts). The learning scripts can run on another machine, and how often to re-learn depends on how fast the site grows; for big sites, re-running once a week is fine.

    The friends suggester is now working pretty well. The way it works is simple: it discovers groups of users that have something in common (in this case, friends), and when the learning process finishes, the metadata is stored in the MySQL database for fast queries. In my test it discovered about 70 user groups among about 18,000 users. When a user logs in, the system finds the best-matching user groups and suggests the difference between the groups' friends and the user's actual friends. The algorithm used to discover the groups is K-Means**, with some improvements I'm working on to speed up the process.
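
    The query step described above can be sketched as follows, assuming each stored group is a K-Means centroid that maps a friend id to the fraction of group members who have that friend. The cosine matching, the min_weight cut-off and the function names are hypothetical choices for illustration, not the project's code.

```python
import math

def cosine_to_centroid(friends, centroid):
    """Cosine between a user's 0/1 friend vector (a set) and a group centroid."""
    dot = sum(centroid.get(f, 0.0) for f in friends)
    norm = math.sqrt(len(friends)) * math.sqrt(sum(w * w for w in centroid.values()))
    return dot / norm if norm else 0.0

def suggest_friends(user, friends, centroids, min_weight=0.5, limit=10):
    """Pick the most similar group, then suggest the friends it has that the user lacks."""
    if not friends or not centroids:
        return []
    best = max(centroids, key=lambda c: cosine_to_centroid(friends, c))
    missing = [(w, f) for f, w in best.items()
               if w >= min_weight and f not in friends and f != user]
    missing.sort(key=lambda pair: (-pair[0], pair[1]))  # most common in the group first
    return [f for _, f in missing[:limit]]

centroids = [
    {"alice": 0.9, "bob": 0.8, "carol": 0.6, "dave": 0.1},
    {"erin": 0.9, "frank": 0.7},
]
print(suggest_friends("zoe", {"alice", "bob"}, centroids))  # ['carol']
```

    Only the user's current friend set is needed at query time, so this also works for users who joined after the last learning run.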

    The project can now be downloaded from the main repository* and tested on any BuddyPress installation. For sites with fewer than 1,000 users (I'll fix this during the week so it works with fewer users), I included a dataset that I fetched from Twitter.

    Best Regards,

    [*] http://github.com/crodas/BuddyPress-Recommendation-Engine/tree/master
    [**] http://en.wikipedia.org/wiki/K-means_clustering

     
  • César D. Rodas 5:41 am on June 10, 2009

    Hello,

    Sorry for my somewhat late report, but I've been hard at work coding my GSoC project. Last week I tested some algorithms to suggest friends (based on which people users follow in common). The first hard task was getting a dataset to test the ideas I had in my tiny little mind, so I wrote a small Twitter crawler and downloaded about 5,000 users.

    Downloading the dataset took a while, since Twitter limits API calls to 100 per hour. Then I ran the KMeans algorithm (http://github.com/crodas/phpcluster/tree/master) to find groups of similar users (based on whom they follow, aka "Friends"), and after 7 minutes of running time it discovered around 50 groups of users. The query process is pretty simple: compare the user's friends against the groups of users, choose the most similar group, and return the difference in friends as suggestions. It gave great results in my tests.
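
    For reference, plain Lloyd-style K-Means over friend sets can be sketched like this. It is a minimal Python version, not phpcluster: each user is treated as a 0/1 vector over the accounts they follow (stored as a set), and the seeded random initialization, the 20-iteration cap and the toy users are assumptions.

```python
import random

def sq_dist(friends, centroid, centroid_norm):
    # ||x - c||^2 for a 0/1 vector x given as a set: |x| - 2 x.c + ||c||^2
    return len(friends) - 2 * sum(centroid.get(f, 0.0) for f in friends) + centroid_norm

def kmeans(users, k, iterations=20, seed=0):
    """Lloyd's k-means over users' friend sets; returns (labels, centroids)."""
    names = sorted(users)
    rng = random.Random(seed)
    # Initialization: k distinct users become the first centroids.
    centroids = [{f: 1.0 for f in users[n]} for n in rng.sample(names, k)]
    labels = {}
    for _ in range(iterations):
        norms = [sum(w * w for w in c.values()) for c in centroids]
        # Assignment step: every user joins the nearest centroid.
        new_labels = {
            n: min(range(k), key=lambda j: sq_dist(users[n], centroids[j], norms[j]))
            for n in names
        }
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members' 0/1 vectors.
        for j in range(k):
            members = [n for n in names if labels[n] == j]
            if not members:
                continue  # keep the old centroid for an empty cluster
            counts = {}
            for n in members:
                for f in users[n]:
                    counts[f] = counts.get(f, 0) + 1
            centroids[j] = {f: c / len(members) for f, c in counts.items()}
    return labels, centroids

users = {
    "u1": {"a", "b", "c"}, "u2": {"a", "b"}, "u3": {"a", "c"},
    "u4": {"x", "y"}, "u5": {"x", "y", "z"}, "u6": {"y", "z"},
}
labels, centroids = kmeans(users, k=2)
```

    Each centroid value reads as "fraction of the group that follows this account", which is exactly what the query step compares against.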

    The main feature of this algorithm is that it can also suggest friends to new users (who did not exist at computation time), as long as they have some friends. Another key feature is that the computation does not need to run very often; it depends on how many new friendships have been added to the system, and once a week is probably fine for large BP installations.

    As you may notice, the computation takes a while (7 minutes for 5,000 users with 300,000 friendship links is not that bad). In the near future it could be computed in parallel with Hadoop for the largest installations, since KMeans can be parallelized, but I'll cover that later, since "premature optimization is the root of all evil".
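
    K-Means parallelizes well because the assignment step looks at one user at a time and the update step only needs per-cluster sums. The sketch below shows one iteration split into a map phase and a reduce phase in plain Python; it is not a Hadoop job, the in-memory shuffle stands in for what the framework would do, and all function names are hypothetical.

```python
from collections import defaultdict

def nearest(friends, centroids):
    """Index of the centroid closest (squared Euclidean) to a 0/1 friend set."""
    def dist(c):
        norm = sum(w * w for w in c.values())
        return len(friends) - 2 * sum(c.get(f, 0.0) for f in friends) + norm
    return min(range(len(centroids)), key=lambda j: dist(centroids[j]))

def map_phase(users, centroids):
    """Mapper: users are independent, so input splits can go to different machines."""
    for friends in users.values():
        yield nearest(friends, centroids), friends

def reduce_phase(cluster, friend_sets):
    """Reducer: average the 0/1 vectors assigned to one cluster."""
    counts = defaultdict(int)
    for friends in friend_sets:
        for f in friends:
            counts[f] += 1
    return cluster, {f: c / len(friend_sets) for f, c in counts.items()}

def mapreduce_iteration(users, centroids):
    shuffled = defaultdict(list)  # group mapper output by key, as the framework would
    for key, value in map_phase(users, centroids):
        shuffled[key].append(value)
    new = list(centroids)  # clusters that received no users keep their centroid
    for cluster, values in shuffled.items():
        _, centroid = reduce_phase(cluster, values)
        new[cluster] = centroid
    return new

users = {"u1": {"a", "b"}, "u2": {"a"}, "u3": {"x", "y"}}
print(mapreduce_iteration(users, [{"a": 1.0}, {"x": 1.0}]))
# [{'a': 1.0, 'b': 0.5}, {'x': 1.0, 'y': 1.0}]
```

    A driver would repeat this job, feeding each iteration's centroids into the next, until the assignments stop changing.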

    The next steps for this week are basically:

    • Port phpcluster to PHP4 and integrate it with the system.
    • Ping my mentor to discuss the off-line computation: how it will be handled and integrated with the system.

    The project has a public git repository (http://github.com/crodas/BuddyPress-Recommendation-Engine/tree/master). Right now it has just the skeleton; during the week I'll upload my tests.

    Best regards,

     
  • César D. Rodas 3:04 am on June 2, 2009
    Tags: Updates buddypress

    Howdy,

    During the past week I have been playing with WordPress MU and BuddyPress. Basically, I installed them and read their documentation. I also spent some time testing (locally, for now) some algorithms to discover similar blog entries (based on their content).

    This week I'll contact my mentors (Alex and Andy) to discuss some details, mainly where I should gather users' information from, in order to get the *knowledge* for the suggestions.

    During this week, I’ll also focus on:

    • Design the component itself. It will be fully OOP; I'll use PHP5 if my mentors agree, otherwise PHP4.
    • Implement the actions, hooking into WordPress actions to gather information.
    • Start integrating my algorithms with the component.

    Best regards,

     
  • César D. Rodas 1:32 pm on May 25, 2009

    Howdy!

    I suppose it's my turn. During this time, I've been reading some papers related to my project and doing fast prototyping whenever I thought it was worth it.

    First, I focused my research on unsupervised learning, implementing K-means (http://en.wikipedia.org/wiki/K-means) and testing it against a dataset of about 8,000 news headlines. The results can be found at http://cesar.la/projects/cluster-abc-ln (Spanish dataset), and for those who are interested, the code is at http://github.com/crodas/phpcluster/tree/master

    Then I read about Hadoop (http://en.wikipedia.org/wiki/Hadoop) and started writing a set of PHP wrappers (using Hadoop Streaming); in the end I reproduced almost the same API as the Java one. This project is not public yet; I will publish it when I need it for my project. You might be wondering "why Hadoop?" Basically, it will be useful for big sites that use BP.
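
    Hadoop Streaming runs any executable as the mapper or reducer: records arrive as lines on stdin, output is written as key<TAB>value lines on stdout, and the reducer sees its input sorted by key. The wrappers mentioned above are PHP and not public, so the sketch below is only a Python illustration of that contract, with word counting over headlines as a hypothetical job and run_locally as a hypothetical helper that imitates the pipeline without Hadoop.

```python
import itertools
import sys

def mapper(lines):
    """Streaming mapper: one record per input line, emit 'key<TAB>value' lines."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"

def reducer(lines):
    """Streaming reducer: input is sorted by key, so equal keys arrive adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(v) for _, v in group)}"

def run_locally(lines):
    """Imitate 'cat input | mapper | sort | reducer' in one process."""
    return list(reducer(sorted(mapper(lines))))

if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    # e.g. python job.py map < headlines.txt | sort | python job.py reduce
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)

print(run_locally(["PHP on Hadoop", "hadoop streaming"]))
# ['hadoop\t2', 'on\t1', 'php\t1', 'streaming\t1']
```

    Because the contract is just lines on stdin and stdout, the same mapper and reducer can be tested locally with a shell pipe before submitting them to a cluster.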

    This week I'll create a public repo on GitHub, and I will start coding there.

    Best regards,
    Happy coding!

    P.S. 1: I don't know if there will be a repository for all the projects, but I'd still like to have mine on GitHub.
    P.S. 2: I don't know if I'm allowed to, but I'd like to post my weekly status reports on my blog too.

     
    • Sergio 5:58 am on October 5, 2009

      But if you look at it from a different point of view, things don't turn out so smoothly.

  • César D. Rodas 9:17 pm on April 21, 2009

    Hello, this is César Rodas from Paraguay (GMT -4). Very excited to be here for the second time.

    This year I'm going to implement some social algorithms and text categorization for BuddyPress.

    My mentors are Andy Peatling and Alex Shiels.

    Currently I’m studying Computer Programming at the Universidad Nacional de Asunción (www.una.py).

     