Tools
Sphinxsearch

From personal use
- verify that indexes have to be rebuilt locally
- especially when sharing indexes between 32-bit and 64-bit instances
- Cookbook:Sphinx
- on Debian Squeeze the PHP API is not included, but it can be downloaded, along with its tests, from the branch of the installed version (here 0.9.9)
- otherwise consider the trunk
- even though sorting by RELEVANCE is available from the indexer by default via the API, it still requires properly selecting the matching mode SPH_MATCH_EXTENDED2 and a ranking mode, e.g. SPH_RANK_PROXIMITY_BM25
- it can be tricky at first, since otherwise all results have the same weight and are thus properly ranked... by document ID (so mostly randomly)
- used as Keywords#brain
- interface as pim_search.php
- should consequently update browser_queries
- clarify the setup
- add pmwiki-to-sphinxxml and sphinx_sources
- usage of crontab for rsync then indexer
- add
- interface as pim_search.php
- note that inheritance in the configuration only applies to parameters; new indexes still have to be generated from scratch and thus take about the same resources
- Steps (with their specific vocabulary)
- pick data sources
- create and test driver with its pitfalls
- configure sphinx to load the drivers
- run the drivers through the indexer
- make the indexer result available through searchd
- query searchd via the API
- encapsulate the query and its results in an interface
- Details on this installation
- disabled MySQL, only used for test1 tried during installation and configuration
- indexer --all --rotate to avoid having to manually stop and start searchd again, done via Crontab
- still seems to create some downtime, to check
- performances
collected 1962 docs, 26.5 MB
sorted 4.5 Mhits
total 1962 docs, 26545570 bytes
total 17.149 sec, 1547933 bytes/sec, 114.40 docs/sec
total 28 reads, 0.052 sec, 430.9 kb/call avg, 1.8 msec/call avg
total 34 writes, 0.126 sec, 768.6 kb/call avg, 3.7 msec/call avg
- real 3m1.913s/user 0m28.142s/sys 0m15.921s for indexer --all --rotate
- average of 0.001 second per query and always under 0.1 sec, including the first query of the day
- see query.log for details
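
The "create and test driver" step above can be sketched as follows. This is only a minimal illustration, assuming the driver emits Sphinx's xmlpipe2 format on standard output; the function name docset, the field names title and content, and the sample pages are hypothetical and are not taken from pmwiki-to-sphinxxml. Two pitfalls it guards against, which may or may not be the ones met on this installation: unescaped XML characters in the page text, and document IDs that are not positive integers.

```python
import sys
from xml.sax.saxutils import escape


def docset(pages):
    """Yield xmlpipe2 lines for an iterable of (doc_id, title, body) tuples.

    Field names "title" and "content" are illustrative assumptions.
    """
    yield '<?xml version="1.0" encoding="utf-8"?>'
    yield "<sphinx:docset>"
    yield "<sphinx:schema>"
    yield '<sphinx:field name="title"/>'
    yield '<sphinx:field name="content"/>'
    yield "</sphinx:schema>"
    for doc_id, title, body in pages:
        # Sphinx expects unique, non-zero, positive integer document IDs.
        if doc_id <= 0:
            raise ValueError("document id must be a positive integer: %r" % doc_id)
        yield '<sphinx:document id="%d">' % doc_id
        # escape() converts &, < and > so the page text cannot break the XML.
        yield "<title>%s</title>" % escape(title)
        yield "<content>%s</content>" % escape(body)
        yield "</sphinx:document>"
    yield "</sphinx:docset>"


if __name__ == "__main__":
    # Hypothetical sample pages standing in for the real wiki sources.
    sample = [
        (1, "Sphinxsearch", "notes on indexer & searchd"),
        (2, "Crontab", "rsync then indexer"),
    ]
    sys.stdout.write("\n".join(docset(sample)) + "\n")
```

Such a script would presumably be declared in the configuration as an xmlpipe2 source through xmlpipe_command, so that the indexer reads its standard output; running it by hand first and checking the XML is a cheap way to test the driver before indexing.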
To explore
- recommendation of PIM pages via Greasemonkey GM_xmlhttpRequest() regarding the currently visited page
- distributed indexes
- go beyond rsync
- yet with privacy concerns, so first keep it under a protected path, then split
- particularly useful if local failover is possible
- own ideas LearningSearch
- indexer --buildstops output.txt N --buildfreqs as a way to replace shell_scripts/pmwiki_keywords_distribution
- N=20 found words shorter than 3 characters and "utopiah"
- might not be that relevant when the index mixes 3 very different sources
- Introduction to Search with Sphinx by Andrew Aksyonoff, O'Reilly Media April 2011
- Wikipedia:Inverted index
- tutorial or review of Sphinx for desktop search
- e.g. using man:pdftotext
- Best Practices for several servers, Sphinx forum
- Why is internal search so hard? by Espen Andersen, Applied Abstractions February 2012
- http://www.mediawiki.org/wiki/Extension:SphinxSearch/Page_rank
- to adapt for PmWiki
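
Regarding the indexer --buildstops ... --buildfreqs idea above: a rough sketch of how its output could be post-processed to get a keyword distribution, dropping the noise noted for N=20 (words shorter than 3 characters and "utopiah"). It assumes each line of output.txt holds a word followed by its frequency, separated by whitespace. The function names are hypothetical, and the sample words and counts are invented for illustration, not measured.

```python
def parse_freqs(text):
    """Parse 'word count' lines into (word, count) pairs, skipping malformed lines."""
    pairs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            pairs.append((parts[0], int(parts[1])))
    return pairs


def keywords(pairs, min_len=3, ignore=frozenset()):
    """Keep words of at least min_len characters that are not explicitly ignored."""
    return [(word, count) for word, count in pairs
            if len(word) >= min_len and word not in ignore]


# Invented sample standing in for the content of output.txt.
sample = "de 900\nutopiah 850\nsearch 400\nla 300\nindex 120\n"
print(keywords(parse_freqs(sample), ignore={"utopiah"}))
```

With the invented sample, only "search" and "index" survive the filter. Whether this is relevant when the index mixes very different sources remains the open question noted above.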
See also
- #sphinxsearch on freenode
- http://sphinxsearch.com/wiki/doku.php?id=sphinx_docs#sorting_modes
- actually better presentation than the official http://sphinxsearch.com/docs/current.html
- http://sphinxsearch.com/wiki/
- RailsCampParis3#FullTextSearch
- MySQL
Note
My notes on Tools gather what I know or want to know. Consequently they are not and will never be complete references. For this, official manuals and online communities provide much better answers.


