Algorithms Of The Intelligent Web
Algorithms of the Intelligent Web by Haralambos Marmanis and Dmitry Babenko - ISBN: 1933988665 - Manning 2009
Motivation
Understanding the web requires understanding not just its usages or its infrastructure but also how information is processed to provide better and new experiences, using increasingly complex techniques.
Pre-reading model
Draw a schema (using PmGraphViz or another solution) of the situation of the area in the studied domain before having read the book.
Reading
"Unlike traditional applications, intelligent applications adjust their behavior according to their input" (p.xiv)
- 1 What is the intelligent web?
- example of an app that would check not just spelling or grammar but also facts (p2)
- defining the triangle of intelligence (p5) as aggregated content (raw data), reference structures (knowledge) and algorithms (thinking)
- paragraph on wikis (p9)
- discussion of automatic categorization and of how "natural linkage of the pages provides fertile ground for advanced search (chapter 2), clustering (chapter 4), and other analytical techniques."
- "identify the areas where an intelligent component would add most value to your application." (p11)
- 2 Searching
- crawling -> indexing on tokenized content (brief mention of different analyzers) -> ranking -> result of search
- see also ApacheProjects#Lucene
- injecting spam
- updating the ranking with PageRank to counter spam
- using user clicks in a Naive Bayes classifier
- see also RailsCampParis3#FullTextSearch
- 3 Creating suggestions and recommendations
- similarity, mathematical distance and its 4 properties, metrics in general
- Wikipedia:Collaborative filtering
- Wikipedia:Jaccard metric, Wikipedia:Jaccard index, Wikipedia:Cosine similarity
- Wikipedia:Pearson correlation
- see also Recked: A Night of Recommendation Technologies, held in January 2009 and found earlier while researching PersonalInformationStream#Sources
- 4 Clustering: grouping things together
- nice viz (p135)
- mention of Wikipedia:Curse of dimensionality
- see also
- gCluto Graphical Clustering Toolkit by Matt Rasmussen
- Single-Link, Complete-Link & Average-Link Clustering, book chapter Hierarchical clustering of Introduction to Information Retrieval, Cambridge University Press 2008
- 5 Classification
- introduction to the value of having proper classes and to the importance of hierarchies
- see also on ontologies the SemanticWeb page
- mention of Wikipedia:Rete algorithm and of Drools, its JBoss implementation
- on Prolog and on backward and forward chaining, see the AI01/AI02 classes at UTC
- mention of http://ai.eecs.umich.edu/cogarch0/common/issue/utility.html
- see also QuantitativeTrading#Chapter3 on backtesting
- 6 Combining classifiers
- bagging (bootstrap aggregating), introduced only after explaining how to check one classifier against another
- chi-square and z statistic
- Cochran’s Q test and the F test
- different strategies, weight, ...
- majority vote
- boosting, iterative improvement
- picking training sets biased toward those instances that were previously misclassified by the ensemble
- "the essence of boosting [...]: find out what you don’t know and bring in someone who does to cover for it" (p265)
- e.g. arc-x4, AdaBoost
- 7 Putting it all together: an intelligent news portal
- quick review and integration of most of the techniques explained so far
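The chapter 2 notes mention re-ranking search results with PageRank. Below is a minimal power-iteration sketch in Java, the language of the book's examples. It is not the book's code: the adjacency-list input, the commonly used damping factor of 0.85, the uniform redistribution of rank held by pages without outlinks, and the class name PageRankSketch are all assumptions made for illustration.

```java
import java.util.Arrays;

/**
 * Minimal PageRank sketch (power iteration), not the book's implementation.
 * Assumptions: links[i] lists the pages that page i links to; pages without
 * outlinks ("dangling" pages) spread their rank uniformly over all pages.
 */
final class PageRankSketch {

    static double[] pageRank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);                      // start from a uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double danglingMass = 0.0;
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    danglingMass += rank[i];             // page with no outlinks
                } else {
                    double share = rank[i] / links[i].length;
                    for (int j : links[i]) {
                        next[j] += share;                // pass rank along each outlink
                    }
                }
            }
            for (int j = 0; j < n; j++) {
                // teleportation term + damped (link contribution + uniform share of dangling mass)
                next[j] = (1 - damping) / n + damping * (next[j] + danglingMass / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical 3-page graph: 0 -> 1, 1 -> 2, 2 -> 0 and 2 -> 1
        int[][] links = { {1}, {2}, {0, 1} };
        System.out.println(Arrays.toString(pageRank(links, 0.85, 50)));
    }
}
```

Because the dangling mass is redistributed, the ranks remain a probability distribution (they sum to 1) after every iteration; in the example graph, page 1 receives links from both other pages and ends up ranked highest.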
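The chapter 3 notes refer to a mathematical distance and its 4 properties. Assuming these are the standard metric axioms for a distance d (the book's own phrasing may differ), they read:

```latex
\begin{aligned}
& d(x, y) \ge 0                   && \text{(non-negativity)} \\
& d(x, y) = 0 \iff x = y          && \text{(identity of indiscernibles)} \\
& d(x, y) = d(y, x)               && \text{(symmetry)} \\
& d(x, z) \le d(x, y) + d(y, z)   && \text{(triangle inequality)}
\end{aligned}
```

For example, the Jaccard distance 1 - J(A, B) satisfies all four, whereas 1 minus the cosine similarity can violate the triangle inequality (take three vectors at 0, 45 and 90 degrees).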
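The chapter 3 notes list the Jaccard index, cosine similarity and Pearson correlation as similarity measures for recommendations. A minimal Java sketch of the three follows; the class name, the rating vectors in main, and the convention that two empty sets have a Jaccard index of 1 are assumptions of this sketch rather than the book's code.

```java
import java.util.HashSet;
import java.util.Set;

/** Minimal sketch of three similarity measures used for recommendations. */
final class SimilaritySketch {

    /** Jaccard index |A ∩ B| / |A ∪ B|; the Jaccard distance is 1 minus this value. */
    static <T> double jaccard(Set<T> a, Set<T> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;  // assumed convention for two empty sets
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<T> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    /** Cosine similarity dot(x, y) / (|x| |y|); NaN if either vector is all zeros. */
    static double cosine(double[] x, double[] y) {
        double dot = 0, nx = 0, ny = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        return dot / (Math.sqrt(nx) * Math.sqrt(ny));
    }

    /** Pearson correlation, computed as the cosine similarity of the mean-centred vectors. */
    static double pearson(double[] x, double[] y) {
        double mx = mean(x), my = mean(y);
        double[] cx = new double[x.length];
        double[] cy = new double[y.length];
        for (int i = 0; i < x.length; i++) {
            cx[i] = x[i] - mx;
            cy[i] = y[i] - my;
        }
        return cosine(cx, cy);
    }

    private static double mean(double[] v) {
        double sum = 0;
        for (double value : v) sum += value;
        return sum / v.length;
    }

    public static void main(String[] args) {
        // Hypothetical ratings given by two users to the same four items
        double[] alice = {5, 3, 4, 1};
        double[] bob = {4, 2, 5, 1};
        System.out.println("cosine  = " + cosine(alice, bob));
        System.out.println("pearson = " + pearson(alice, bob));
        System.out.println("jaccard = " + jaccard(Set.of("a", "b", "c"), Set.of("b", "c", "d")));
    }
}
```

Centring the vectors before taking the cosine is algebraically the same as the usual covariance-over-standard-deviations formula, and it makes Pearson insensitive to a user who rates every item systematically higher or lower than others.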
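For the chapter 4 notes on clustering, here is a compact illustration using k-means (Lloyd's algorithm), swapped in as one standard partitional technique since the notes above do not name a specific algorithm. It is a sketch under stated assumptions, not the book's implementation: initial centroids are simply the first k points (for determinism; real implementations usually pick them randomly), k is assumed not to exceed the number of points, and the class name is hypothetical.

```java
import java.util.Arrays;

/** Minimal k-means sketch (Lloyd's algorithm) with deterministic initialisation. */
final class KMeansSketch {

    /** Returns the cluster index of each point; assumes k <= points.length and equal dimensions. */
    static int[] kMeans(double[][] points, int k, int maxIterations) {
        int n = points.length, dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone();  // first k points as seeds
        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);                                    // no point assigned yet
        for (int it = 0; it < maxIterations; it++) {
            // Assignment step: attach each point to its nearest centroid
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[i], centroids[c])
                            < squaredDistance(points[i], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) break;                 // converged: no point switched cluster
            // Update step: move each centroid to the mean of its members
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) sums[assignment[i]][d] += points[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue;    // leave an empty cluster's centroid in place
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return assignment;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical 2-D points forming two obvious groups
        double[][] points = { {0, 0}, {10, 10}, {0, 1}, {1, 0}, {10, 11}, {11, 10} };
        System.out.println(Arrays.toString(kMeans(points, 2, 100)));  // prints [0, 1, 0, 0, 1, 1]
    }
}
```

Squared Euclidean distance is used because it preserves the nearest-centroid ordering while avoiding a square root; as the curse-of-dimensionality note suggests, such distances become less discriminating as the number of dimensions grows.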
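The chapter 6 notes describe bagging (bootstrap aggregating) combined with a majority vote. A minimal Java sketch of those two building blocks follows. The 1-D threshold base learner trainStump, the seed, the data in main and all class and method names are hypothetical illustrations, not the book's classifiers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

/** Minimal bagging sketch: bootstrap replicates plus an unweighted majority vote. */
final class BaggingSketch {

    /** One bootstrap replicate: n indices drawn uniformly with replacement from 0..n-1. */
    static int[] bootstrapIndices(int n, Random rnd) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = rnd.nextInt(n);
        return idx;
    }

    /** Unweighted majority vote; on a tie, the label that reached the top count first wins. */
    static <X, L> L majorityVote(List<Function<X, L>> classifiers, X instance) {
        Map<L, Integer> votes = new HashMap<>();
        L best = null;
        int bestCount = 0;
        for (Function<X, L> classifier : classifiers) {
            L label = classifier.apply(instance);
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                best = label;
                bestCount = count;
            }
        }
        return best;
    }

    /** Hypothetical base learner: threshold halfway between the two class means (1-D data, labels 0/1). */
    static Function<Double, Integer> trainStump(double[] xs, int[] ys, int[] sample) {
        double sum0 = 0, sum1 = 0;
        int n0 = 0, n1 = 0;
        for (int i : sample) {
            if (ys[i] == 1) { sum1 += xs[i]; n1++; } else { sum0 += xs[i]; n0++; }
        }
        if (n0 == 0 || n1 == 0) {
            int only = (n1 > 0) ? 1 : 0;        // degenerate replicate: only one class was drawn
            return x -> only;
        }
        double m0 = sum0 / n0, m1 = sum1 / n1, threshold = (m0 + m1) / 2;
        boolean oneIsHigher = m1 > m0;
        return x -> ((x > threshold) == oneIsHigher) ? 1 : 0;
    }

    /** Trains one base classifier per bootstrap replicate. */
    static List<Function<Double, Integer>> bag(double[] xs, int[] ys, int replicates, long seed) {
        Random rnd = new Random(seed);
        List<Function<Double, Integer>> ensemble = new ArrayList<>();
        for (int r = 0; r < replicates; r++) {
            ensemble.add(trainStump(xs, ys, bootstrapIndices(xs.length, rnd)));
        }
        return ensemble;
    }

    public static void main(String[] args) {
        // Hypothetical 1-D training data: small values are class 0, large values class 1
        double[] xs = {1, 2, 3, 10, 11, 12};
        int[] ys = {0, 0, 0, 1, 1, 1};
        List<Function<Double, Integer>> ensemble = bag(xs, ys, 25, 42L);
        System.out.println(majorityVote(ensemble, 0.5) + " " + majorityVote(ensemble, 20.0));
    }
}
```

Each replicate leaves out some training instances and duplicates others, so the base classifiers differ slightly; the vote averages out that variance. Boosting, also covered in chapter 6, would instead bias later training sets toward previously misclassified instances and weight the votes.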
Tools for examples and todos
- http://code.google.com/p/yooreeka/
- http://www.manning-sandbox.com/forum.jspa?forumID=438&start=0
- manually added System.setProperty("iweb2.home","c:/iWeb2"); to deploy/bin/.bshrc to have the proper path - http://www.beanshell.org
See also
- IntelligentBio
- WatchingNotes#MachineLearningCS229
- Mathematics#Statistics
- ApacheProjects#Mahout
- http://seeks-project.info/wiki/index.php/Veille_Services_Machine_Learning
- http://www.marmanis.com
- reviews
- Russ Abbott on Amazon
- by Ana Gabriela Maguitman, Journal of Computer Science and Technology (JCS&T) 2010
Overall remarks and questions
- keep a BDD mindset to analyze the actual value of Collective Intelligence/ML/AI/... (as some seemed to do during RailsCampParis3, constantly asking whether end users would actually notice and benefit from it)
- no magic: still hard problems, yet partially working solutions
- symbolic AI is looking for abstract structures (ideally one encompassing and efficient abstraction)
- ML is using data (ideally small and up to date)
- what are the newest usages (not techniques), and who applies them?
- what are the most famous non-Java frameworks?
- outside of WEKA, Mahout, ...
- and why are the Java frameworks so dominant?
- TDD/BDD: which assertions/tests could be applied when learning machine learning?
- are those computations single-use only, i.e. no reuse or generalization? Are they hard to make incremental?
Synthesis
So in the end, it was about X and was based on Y.
Critics
- examples requiring a rather specific environment and yet often not working without modifications
- hard to read code
- nearly no equations
- and the few present can be... false! e.g. http://www.manning.com/marmanis/excerpt_errata.html regarding Bayes' theorem
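For reference when checking that erratum, the standard statement of Bayes' theorem for a hypothesis H and evidence E (our notation, not necessarily the book's), with the denominator expanded over mutually exclusive and exhaustive hypotheses H_i, is:

```latex
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)},
\qquad
P(E) = \sum_{i} P(E \mid H_i)\,P(H_i)
```

A naive Bayes classifier, as used with user clicks in chapter 2, applies this rule after assuming that the individual features of E are conditionally independent given H.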
Vocabulary
(:new_vocabulary_start:) new_word (:new_vocabulary_end:)
Post-reading model
Draw a schema (using PmGraphViz or another solution) of the situation of the area in the studied domain after having read the book. Link it to the pre-reading model and align the two to ease comparison.
Categories