From personal use

  • check whether indexes have to be rebuilt locally
    • especially when sharing indexes between 32-bit and 64-bit instances
  • Cookbook:Sphinx
  • on Debian Squeeze the PHP API is not included in the package, but it can be downloaded, along with its tests, from the source branch of the installed version (here 0.9.9)
  • even though the indexer supports sorting by RELEVANCE by default, querying through the API still requires selecting the matching mode SPH_MATCH_EXTENDED2 and a ranking mode, e.g. SPH_RANK_PROXIMITY_BM25
    • this can be confusing at first: otherwise all results get the same weight and are thus neatly "ranked"... by document ID (i.e. in an essentially arbitrary order)
  • used as Keywords#brain
    • interface: pim_search.php
    • browser_queries should be updated accordingly
    • clarify the setup
      • add pmwiki-to-sphinxxml and sphinx_sources
      • use crontab to run rsync, then the indexer
  • note that inheritance in the configuration only applies to parameters: an index that inherits from another is still built from scratch, so generating it takes about the same resources
  • Steps (with their specific vocabulary)
    1. pick data sources
    2. create and test a driver for each source (each has its own pitfalls)
    3. configure Sphinx to load the drivers
    4. run the drivers through the indexer
    5. make the indexer output available through searchd
    6. query searchd via the API
    7. encapsulate the query and its results in an interface
  • Details on this installation
    • disabled MySQL; it was only used by the test1 index tried during installation and configuration
    • indexer --all --rotate is run via Crontab, so searchd does not have to be stopped and restarted by hand
      • this still seems to cause some downtime; to be checked
    • performance, as reported by the indexer
      collected 1962 docs, 26.5 MB, sorted 4.5 Mhits, total 1962 docs, 26545570 bytes
      total 17.149 sec, 1547933 bytes/sec, 114.40 docs/sec
      total 28 reads, 0.052 sec, 430.9 kb/call avg, 1.8 msec/call avg
      total 34 writes, 0.126 sec, 768.6 kb/call avg, 3.7 msec/call avg
    • time indexer --all --rotate: real 3m1.913s, user 0m28.142s, sys 0m15.921s
    • queries take 0.001 sec on average and always under 0.1 sec, including the first query of the day
      • see query.log for details
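The relevance note above can be illustrated without a running searchd. The Sphinx documentation describes SPH_SORT_RELEVANCE as equivalent to sorting by `@weight DESC, @id ASC`, so when every match carries the same weight, the order collapses to document ID order. A minimal Python sketch of that ordering rule; the `Match` tuple and the weights are made up for illustration and are not searchd output:

```python
from typing import List, NamedTuple

class Match(NamedTuple):
    """Hypothetical stand-in for one searchd hit: (document ID, weight)."""
    doc_id: int
    weight: int

def relevance_order(matches: List[Match]) -> List[int]:
    """Order matches like '@weight DESC, @id ASC' and return the document IDs."""
    ranked = sorted(matches, key=lambda m: (-m.weight, m.doc_id))
    return [m.doc_id for m in ranked]

# Equal weights (no useful ranking): the order is simply by document ID.
flat = [Match(42, 1), Match(7, 1), Match(19, 1)]
print(relevance_order(flat))    # [7, 19, 42]

# Distinct weights (e.g. from a proximity + BM25 ranker): best match first.
scored = [Match(42, 2650), Match(7, 1540), Match(19, 2710)]
print(relevance_order(scored))  # [19, 42, 7]
```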
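For step 2, a driver for a file-based source such as PmWiki pages can emit Sphinx's xmlpipe2 format on standard output. The Python sketch below is hypothetical and is not the author's pmwiki-to-sphinxxml script: it assumes each page has already been read into an integer ID, a title and plain text, and it declares the two full-text fields (`title`, `content`, names chosen for illustration) in an inline schema.

```python
from typing import Iterable, Tuple
from xml.sax.saxutils import escape, quoteattr

def to_xmlpipe2(pages: Iterable[Tuple[int, str, str]]) -> str:
    """Render (doc_id, title, text) tuples as an xmlpipe2 document stream.

    Assumption: 'title' and 'content' are the only full-text fields, and
    document IDs are unique positive integers.
    """
    out = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        '<sphinx:field name="title"/>',
        '<sphinx:field name="content"/>',
        "</sphinx:schema>",
    ]
    for doc_id, title, text in pages:
        if doc_id <= 0:
            raise ValueError(f"document IDs must be positive, got {doc_id}")
        out.append(f"<sphinx:document id={quoteattr(str(doc_id))}>")
        out.append(f"<title>{escape(title)}</title>")      # escapes &, < and >
        out.append(f"<content>{escape(text)}</content>")
        out.append("</sphinx:document>")
    out.append("</sphinx:docset>")
    return "\n".join(out)

if __name__ == "__main__":
    # Written to stdout so the indexer can consume it directly.
    print(to_xmlpipe2([(1, "Main.HomePage", "Welcome & <hello>")]))
```

In sphinx.conf such a script would be referenced from a source with `type = xmlpipe2` and `xmlpipe_command` pointing at it (path left to the actual setup).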
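The crontab item (rsync, then the indexer) could be wrapped in a small script so that reindexing only happens after a successful sync. A minimal Python sketch; the function names and paths are hypothetical, and only `rsync -a` and the `indexer --all --rotate` invocation noted above are used:

```python
import subprocess
from typing import List

def nightly_commands(src: str, dest: str) -> List[List[str]]:
    """Return the commands in the order they must run: sync first, then reindex."""
    return [
        ["rsync", "-a", src, dest],          # copy the data sources locally
        ["indexer", "--all", "--rotate"],    # rebuild every index; searchd swaps them in
    ]

def run_nightly(src: str, dest: str) -> None:
    """Run each command in turn; check=True raises and stops at the first failure."""
    for cmd in nightly_commands(src, dest):
        subprocess.run(cmd, check=True)
```

A crontab entry would then call `run_nightly` with the real source and destination; since `--rotate` makes searchd pick up the new index files itself, no stop/start step is needed.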
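The indexer summary is easy to sanity-check: the reported rates are the totals divided by the elapsed time. A minimal Python check using only the figures from the log above:

```python
from typing import Tuple

def throughput(total_docs: int, total_bytes: int, seconds: float) -> Tuple[float, float]:
    """Return (bytes per second, documents per second) for one indexer run."""
    return total_bytes / seconds, total_docs / seconds

# Totals and elapsed time copied from the indexer output.
bytes_per_sec, docs_per_sec = throughput(1962, 26545570, 17.149)
print(f"{bytes_per_sec:.0f} bytes/sec, {docs_per_sec:.2f} docs/sec")
```

The result lands within a few bytes/sec of the reported 1547933 bytes/sec and within 0.01 of 114.40 docs/sec; the small gap presumably comes from the elapsed time being printed rounded to milliseconds.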
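To back the latency figures with query.log, a short script can extract each query's elapsed time and report the count, average and maximum. This is a sketch that assumes the plain-text log format, where each line starts with a bracketed timestamp followed by `<seconds> sec`; the regex is an assumption and may need adjusting for other Sphinx versions or log formats.

```python
import re
import sys
from typing import Iterable, Tuple

# Assumed plain query.log line shape:
# "[<date>] 0.001 sec [<mode/filters/sort> <total> (<offset>,<limit>)] [<index>] <query>"
QUERY_TIME = re.compile(r"^\[[^\]]*\]\s+(\d+(?:\.\d+)?)\s+sec\b")

def latency_summary(lines: Iterable[str]) -> Tuple[int, float, float]:
    """Return (query count, average seconds, maximum seconds); unparseable lines are skipped."""
    times = []
    for line in lines:
        m = QUERY_TIME.match(line)
        if m:
            times.append(float(m.group(1)))
    if not times:
        return 0, 0.0, 0.0
    return len(times), sum(times) / len(times), max(times)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python query_latency.py /path/to/query.log  (script name and path are hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        count, avg, worst = latency_summary(log)
    print(f"{count} queries, avg {avg:.3f} sec, max {worst:.3f} sec")
```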

