Based on the PmWiki:PageListTemplates redirect and a "search star" à la Google, as suggested by Jonathan

  • grep searchstar /var/log/lighttpd/access.log | awk '{print $7}'
    • note that this is incomplete (because of logrotate), which is not necessarily a bad thing since older results become less and less relevant
  • log analysis can also be done with the Referrer field
    • can it work without coming from the search page? does the referrer field include the full GET URL?
  • LocalTemplates#searchresultswithredirectwithstarring allows starring
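
The grep/awk one-liner above can also be written as a short Python script, which makes it easier to count starred searches and to look at the referrer at the same time. This is only a sketch: it assumes lighttpd's default access log format (request URL as the 7th whitespace-separated field, referrer as the second quoted string), and the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches every double-quoted field of a log line:
# request line, referrer, user agent (in the assumed default format).
QUOTED = re.compile(r'"([^"]*)"')


def starred_urls(lines, marker="searchstar"):
    """Yield the requested URL (awk's $7) of each line containing marker."""
    for line in lines:
        if marker not in line:
            continue
        fields = line.split()
        if len(fields) >= 7:
            yield fields[6]


def starred_referers(lines, marker="searchstar"):
    """Yield the referrer of each line containing marker."""
    for line in lines:
        if marker not in line:
            continue
        quoted = QUOTED.findall(line)
        if len(quoted) >= 2:
            yield quoted[1]


def count_starred(lines):
    """Count how often each starred URL was requested."""
    return Counter(starred_urls(lines))
```

It could be run on the current log with something like count_starred(open("/var/log/lighttpd/access.log")); rotated logs would have to be read separately.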
Example
With completion
 (limited to Tools

Visualization of Seeks principle

PmGraphViz

Problematic queries

  • 12/08/11 "MIT logistic map that showed which part of an iPhone came from and where it was assembled, you could see the cradle-to-cradle footprint"

To do

  1. add prototype.js completion solution for names of pages but also previous searches
    1. done before with Seedea search restricted to Seedea:OIMP/
  2. redirect straight to the page when there is only 1 result, as done in the local instance through Site/LocalTemplates#searchresultswithredirect
  3. compare with Google Instant which seems really close
  4. use Path:/pub/stared_searches.txt to redirect to starred search results, not just pages
  5. extend to content
    1. Keywords#inmybooks
    2. transcripts from videos and audios listed
      1. e.g. PersonalInformationStream.WithoutNotes
      2. patterns of transcripts location
        1. TED http://on.ted.com/23
        2. http://www.google.com/video/upload/video_transcripts.html
        3. http://www.archives.gov/social-media/transcripts.html
        4. http://help.youtube.com/support/youtube/bin/answer.py?hl=en&answer=166810
          1. YouTube's Interactive Transcripts June 2010
      3. Videocrux custom video player displays the TOC alongside the video.
      4. community based websites
        1. http://dotsub.com
        2. http://universalsubtitles.org (by Person:Felipe and others)
        3. http://www.opensubtitles.org
        4. http://www.allsubs.org
  6. include query reformulation
  7. check whether it is legal to put PDFs on a public website while allowing access only to crawlers, e.g. GoogleBot, so that everything one has read gets indexed without risking copyright infringement notifications
    1. now supporting indexing of pdf, doc, ppt, txt, etc. (documentation still to write)
  8. consider Cookbook:GlossaryPlus used with a filtered list from e.g. http://sphinxsearch.com/docs/current.html#ref-spelldump
    1. could first link to a search page for those
      1. uncommon expressions
      2. OwnConcepts?
  9. previous searches
    1. last searches
    2. most popular searches
    3. related searches (based on time, location, words, ...)
  10. search within archives
    1. filter out files that are too large
    2. check available space
    3. uncompress everything into a temporary location
    4. index this temporary path
    5. keep the archive-specific path
    6. delete those uncompressed files
  11. search amongst indexed websites in PIM
    1. remove clutter via http://boilerpipe-web.appspot.com
    2. convert to HTML and ignore it (cf Sphinx option) or convert to text directly
    3. import and index with relevant attributes, e.g. source PIM page
    • note that this must be incremental, as very few links are added on a daily basis but tons have been added until now
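
The six steps of item 10 (search within archives) could look roughly like the following. It is a minimal sketch, not the actual indexing code: the size limit, the free-space rule of thumb and the index_file callback are all assumptions, and it presumes the archives are trusted since they are unpacked as-is.

```python
import os
import shutil
import tempfile

MAX_ARCHIVE_BYTES = 50 * 1024 * 1024  # assumed limit, tune as needed


def index_archive(archive_path, index_file,
                  max_bytes=MAX_ARCHIVE_BYTES, expansion_factor=10):
    """Unpack an archive to a temporary place, index it, then clean up.

    index_file(full_path, archive_path, member_path) is a hypothetical
    callback standing in for the real indexer.
    Returns the list of (archive_path, member_path) pairs indexed.
    """
    size = os.path.getsize(archive_path)
    # 1. skip archives that are too large
    if size > max_bytes:
        return []
    # 2. check available space (assumed rule: expansion_factor x archive size)
    if shutil.disk_usage(tempfile.gettempdir()).free < expansion_factor * size:
        return []
    indexed = []
    # 3. uncompress everything into a temporary directory
    with tempfile.TemporaryDirectory() as tmp:
        try:
            shutil.unpack_archive(archive_path, tmp)
        except (shutil.ReadError, ValueError):
            return []  # unknown or unreadable archive format
        # 4. index the temporary path
        for root, _dirs, files in os.walk(tmp):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # 5. keep the path inside the archive, not the temporary one
                member_path = os.path.relpath(full_path, tmp)
                index_file(full_path, archive_path, member_path)
                indexed.append((archive_path, member_path))
    # 6. the uncompressed files are deleted when the with-block exits
    return indexed
```

Storing the archive path together with the member path means a search hit can point back to the original archive even though the extracted copy no longer exists.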

Overall see lab/improve_brain_search and lab/desktopsearch/to_index
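
Items 2 and 4 of the to-do list (redirect when there is a single result, redirect starred searches) boil down to a small decision rule, sketched below. The file format of the starred searches (one "query<TAB>target" pair per line) is purely an assumption made for this example, as are the function names.

```python
def parse_starred(lines):
    """Parse lines of the assumed form 'query<TAB>target' into a dict."""
    starred = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        query, sep, target = line.partition("\t")
        if sep and target:
            starred[query.strip().lower()] = target.strip()
    return starred


def resolve_search(query, results, starred):
    """Decide what to do with a search.

    Returns ("redirect", target) for a starred query or a single result,
    otherwise ("list", results).
    """
    key = query.strip().lower()
    if key in starred:
        return ("redirect", starred[key])
    if len(results) == 1:
        return ("redirect", results[0])
    return ("list", results)
```

Checking the starred searches first means a starred query always wins, even when the search itself would return several pages.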

See also