Algorithms of the Intelligent Web by Haralambos Marmanis and Dmitry Babenko - ISBN: 1933988665 - Manning 2009
Motivation
Understanding the web requires understanding not just its usage or infrastructure but also how information is processed, with increasingly complex techniques, to provide better and new experiences.
Pre-reading model
Draw a schema (using PmGraphViz or another solution) of the situation of the area in the studied domain before having read the book.
Reading
"Unlike traditional applications, intelligent applications adjust their behavior according to their input" (p.xiv)
- 1 What is the intelligent web?
- example of an app that would check not just spelling or grammar but facts (p2)
- possible with Needs#QA and limited to a personal database of facts?
- see also Chapter5 for a functionality description
- defining the triangle of intelligence (p5) as aggregated content (raw data), reference structures (knowledge) and algorithms (thinking)
- paragraph on wikis (p9)
- discussing automatic categorization and how "natural linkage of the pages provides fertile ground for advanced search (chapter 2), clustering (chapter 4), and other analytical techniques."
- "identify the areas where an intelligent component would add most value to your application." (p11)
- 2 Searching
- crawling -> indexing on tokenized content (brief mention of different analyzers) -> ranking -> result of search
- see also ApacheProjects#Lucene
- injecting spam
- updating the ranking with PageRank to counter spam
- using user clicks in a Naive Bayes classifier
- see also RailsCampParis3#FullTextSearch
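The indexing and ranking steps noted above can be illustrated with a minimal sketch. It is not the book's code (the book relies on Lucene in Java) and it skips crawling: the `docs` dictionary is a hypothetical stand-in for crawled pages, the tokenizer is a naive regular expression rather than a real analyzer, and the TF-IDF style score is an assumption chosen for brevity, not Lucene's actual scoring formula.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: token -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for token, count in Counter(tokenize(text)).items():
            index[token][doc_id] = count
    return index


def search(index, n_docs, query):
    """Return doc ids ranked by a simple TF-IDF style score, best first."""
    scores = defaultdict(float)
    for token in tokenize(query):
        postings = index.get(token, {})
        if not postings:
            continue  # token appears in no document
        # Rare tokens weigh more than tokens found in many documents.
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Sort by descending score, breaking ties by doc id.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical documents standing in for crawled pages.
docs = {
    "a": "Lucene indexes tokenized content",
    "b": "PageRank ranks pages by their links",
    "c": "Spam pages try to game the ranking of pages",
}
index = build_index(docs)
results = search(index, len(docs), "ranking pages")
```

Here `results` lists document "c" ahead of "b". A score based only on page content like this one is what injected spam can game, which is why the notes mention link-based ranking (PageRank) and user clicks as complementary signals.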
- 3 Creating suggestions and recommendations
- 4 Clustering: grouping things together
- 5 Classification
- 6 Combining classifiers
- bagging (bootstrap aggregating), introduced only after showing how to check one classifier against another
- chi-square and z statistic
- Cochran’s Q test and the F test
- different strategies, weight, ...
- boosting, iterative improvement
- picking training sets biased toward those instances that were previously misclassified by the ensemble
- "the essence of boosting [...]: find out what you don’t know and bring in someone who does to cover for it" (p265)
- e.g. arc-x4, AdaBoost
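The idea of biasing the next training set toward previously misclassified instances can be sketched as follows. This is an illustration, not the book's implementation: it assumes the arc-x4 rule as usually attributed to Breiman, where instance i is drawn with probability proportional to 1 + m_i^4 and m_i is the number of times the ensemble members built so far misclassified it. The function names and the `miss_counts` values are hypothetical.

```python
import random


def arc_x4_probabilities(miss_counts):
    """Sampling probability per instance, proportional to 1 + m_i ** 4."""
    weights = [1 + m ** 4 for m in miss_counts]
    total = sum(weights)
    return [w / total for w in weights]


def resample(instances, miss_counts, rng):
    """Draw a same-size training set, with replacement, favouring hard instances."""
    probs = arc_x4_probabilities(miss_counts)
    return rng.choices(instances, weights=probs, k=len(instances))


# Hypothetical state: how often each of four instances was misclassified so far.
miss_counts = [0, 0, 1, 2]
probs = arc_x4_probabilities(miss_counts)  # weights 1, 1, 2, 17 out of 21

# Seeded generator so that the draw is reproducible.
next_training_set = resample(["w", "x", "y", "z"], miss_counts, random.Random(0))
```

With these counts the instance misclassified twice gets 17/21 of the probability mass, so the next classifier is trained mostly on what the ensemble currently gets wrong. With all counts at zero the rule reduces to the uniform bootstrap sampling used by bagging.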
- 7 Putting it all together: an intelligent news portal
- rapid review and integration of most techniques that have been explained so far
Tools for examples and todos
See also
Overall remarks and questions
- keep a BDD mindset to analyze what the value of Collective Intelligence/ML/AI/... actually is (as some seem to have done during RailsCampParis3, constantly asking whether end-users would actually notice and benefit from it)
- no magic: the problems are still hard, yet partially working solutions exist
- symbolic AI is looking for abstract structures (ideally one encompassing and efficient abstraction)
- ML is using data (ideally small and up to date)
- what are the newest usages, not techniques, and who applies them?
- what are the most famous non-Java frameworks?
- outside of WEKA, Mahout, ...
- and why are the Java frameworks so dominant?
- could TDD/BDD-style assertions/tests be applied when learning machine learning?
- are those computations single-use only? i.e. no re-use or generalization? hard to make incremental?
Synthesis
So in the end, it was about X and was based on Y.
Criticisms
- examples requiring a rather specific environment and yet often not working without modifications
- hard to read code
- nearly no equations
Vocabulary
(:new_vocabulary_start:)
new_word
(:new_vocabulary_end:)
Post-reading model
Draw a schema (using PmGraphViz or another solution) of the situation of the area in the studied domain after having read the book. Link it to the pre-reading model and align the two to help easy comparison.
Categories