Fabien Benetou's PIM | Wiki / LearningSearch

Based on PmWiki:PageListTemplates redirect and "search star" a la Google as suggested by Jonathan

grep searchstar /var/log/lighttpd/access.log | awk '{print $7}'
- note that this is incomplete (because of logrotate), yet that is not necessarily a bad thing, as older results become less and less relevant
log analysis can also be done with the Referrer field
- can it work without coming from the search page? does the referrer field include the full GET URL?
LocalTemplates#searchresultswithredirectwithstarring allows starring
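The log analysis above could also be sketched in Python, reading the request path and the referrer from each line in one pass. This is only a sketch: it assumes the access log uses the usual "combined" layout (so that the request path is the 7th whitespace-separated field, as in the awk command), and the names `starred_requests` and `referer_query` are hypothetical helpers, not part of the wiki setup.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed "combined" layout:
# host vhost user [time zone] "METHOD path PROTO" status size "referer" "agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "(?P<referer>[^"]*)"'
)

def starred_requests(lines, marker="searchstar"):
    """Yield (path, referer) for each request whose path contains marker."""
    for line in lines:
        match = LINE_RE.match(line)
        if match and marker in match.group("path"):
            yield match.group("path"), match.group("referer")

def referer_query(referer):
    """Return the GET parameters found in a referer URL, as a dict of lists."""
    return parse_qs(urlsplit(referer).query)

if __name__ == "__main__":
    # Same file as in the grep command; rotated logs are not read here.
    with open("/var/log/lighttpd/access.log") as log:
        for path, referer in starred_requests(log):
            print(path, referer, referer_query(referer))
```

Printing the referrer next to the path is one way to check, on real log lines, whether the full GET URL of the search page is present.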

Example
With completion	(:searchboxcomplete :)
	(limited to Tools)

Visualization of Seeks principle

PmGraphViz

Problematic queries

12/08/11 "MIT logistic map that showed which part of an iPhone came from and where it was assembled, you could see the cradle-to-cradle footprint"
- tried Keywords#brain (thus now via Sphinxsearch) with "carbon" and "footprint", both without success
- finally found sourcemap.com via Google after asking on Blinkenshell without success
- Greasemonkey#RevertedPIMLinks pointed back to Projetautonomieenergetique

To do

~~add prototype.js completion solution for names of pages but also previous searches~~
1. done before with Seedea search restricted to Seedea:OIMP/
single-result redirection, as done in the local instance through Site/LocalTemplates#searchresultswithredirect
compare with Google Instant which seems really close
use Path:/pub/stared_searches.txt to redirect to starred search results, not just pages
extend to content
1. Keywords#inmybooks
2. transcripts from videos and audios listed
  1. e.g. PersonalInformationStream.WithoutNotes
  2. patterns of transcripts location
  3. Videocrux custom video player displays the TOC alongside the video.
  4. community based websites
    1. http://dotsub.com
    2. http://universalsubtitles.org (by Person:Felipe and others)
    3. http://www.opensubtitles.org
    4. http://www.allsubs.org
include query reformulation
check whether it is legal to put PDFs on a public website while allowing access only to crawlers (e.g. GoogleBot), so that everything one has read gets indexed without risking copyright infringement notifications
1. now supports indexing of pdf, doc, ppt, txt, etc. (documentation still to be written)
consider Cookbook:GlossaryPlus used with a filtered list from e.g. http://sphinxsearch.com/docs/current.html#ref-spelldump
1. could first link to a search page for those
  1. uncommon expressions
  2. OwnConcepts ?
previous searches
1. last searches
2. most popular searches
3. related searches (based on time, location, words, ...)
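A minimal sketch of these three lists, assuming the search history is available as `(timestamp, query)` pairs (for instance extracted from the access log). The function names are hypothetical, and "related" is reduced here to shared words only; time and location are left out.

```python
from collections import Counter

def last_searches(history, n=5):
    """Return the n most recent distinct queries, newest first."""
    seen, result = set(), []
    for _, query in sorted(history, key=lambda item: item[0], reverse=True):
        if query not in seen:
            seen.add(query)
            result.append(query)
        if len(result) == n:
            break
    return result

def most_popular_searches(history, n=5):
    """Return the n most frequent queries, most frequent first."""
    counts = Counter(query for _, query in history)
    return [query for query, _ in counts.most_common(n)]

def related_searches(history, query, n=5):
    """Rank other past queries by the number of words shared with query."""
    words = set(query.lower().split())
    scores = Counter()
    for _, other in history:
        if other != query:
            shared = len(words & set(other.lower().split()))
            if shared:
                scores[other] = shared
    return [other for other, _ in scores.most_common(n)]
```

Timestamps only need to be comparable with each other (integers or datetime objects both work).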
search within archives
1. filter out files that are too large
2. check available space
3. uncompress all in temporary place
4. index this temporary path
5. keep the archive-specific path
6. delete those uncompressed files
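The six steps above can be sketched as follows. This is an illustration under several assumptions: only tar archives are handled, the size limit is an arbitrary value, `index_file` is a hypothetical callback standing in for the real indexer, and the `archive!inner/path` notation for the archive-specific path is made up for this example.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path

MAX_ARCHIVE_BYTES = 100 * 1024 * 1024  # assumed limit, adjust as needed

def index_archive(archive, index_file, max_bytes=MAX_ARCHIVE_BYTES):
    """Index the regular files of a tar archive; return how many were indexed."""
    archive = Path(archive)
    if archive.stat().st_size > max_bytes:            # 1. skip large archives
        return 0
    with tarfile.open(archive) as tar:
        members = [m for m in tar.getmembers() if m.isfile()]
        needed = sum(m.size for m in members)
        if shutil.disk_usage(tempfile.gettempdir()).free < needed:  # 2. space
            return 0
        with tempfile.TemporaryDirectory() as tmp:    # 3. temporary place
            tmp = Path(tmp).resolve()
            for member in members:
                target = (tmp / member.name).resolve()
                if tmp not in target.parents:         # ignore escaping paths
                    continue
                target.parent.mkdir(parents=True, exist_ok=True)
                with tar.extractfile(member) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
            count = 0
            for path in sorted(tmp.rglob("*")):       # 4. index temporary path
                if path.is_file():
                    inner = path.relative_to(tmp).as_posix()
                    index_file(path, f"{archive}!{inner}")  # 5. keep origin
                    count += 1
        # 6. the temporary directory and its files are deleted at this point
    return count
```

Checking free space against the sum of the uncompressed member sizes, rather than the archive size, avoids filling the disk with a small but highly compressed archive.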
search amongst indexed websites in PIM
1. remove clutter via http://boilerpipe-web.appspot.com
2. convert to HTML and ignore the markup (cf. Sphinx option) or convert to text directly
3. import and index with relevant attributes, e.g. source PIM page
- note that this must be incremental, as very few links are added on a daily basis but tons have been added until now
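One possible way to keep this incremental, sketched under assumptions: the links are available as `(url, source_page)` pairs, already-indexed URLs are remembered in a small JSON state file, and the backlog is drained in batches. The file format, function names and batch size are all hypothetical.

```python
import json
from pathlib import Path

def load_indexed(state_file):
    """Return the set of URLs recorded as already indexed."""
    path = Path(state_file)
    return set(json.loads(path.read_text())) if path.exists() else set()

def pending_links(links, state_file, batch_size=100):
    """Return up to batch_size (url, source_page) pairs not indexed yet."""
    done = load_indexed(state_file)
    pending = []
    for url, source_page in links:
        if url not in done:
            done.add(url)  # also skips duplicates within this run
            pending.append((url, source_page))
        if len(pending) == batch_size:
            break
    return pending

def mark_indexed(urls, state_file):
    """Record URLs as indexed so that later runs skip them."""
    done = load_indexed(state_file) | set(urls)
    Path(state_file).write_text(json.dumps(sorted(done)))
```

Run repeatedly, this works through the existing backlog one batch at a time, after which each run only has to handle the few links added since the previous one.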

Overall see lab/improve_brain_search and lab/desktopsearch/to_index
