
The documentation states for sphinx-0.9.9-rc2:

The data to be indexed can generally come from very different sources: SQL databases, plain text files, HTML files, mailboxes, and so on.

However, I can't find any documentation on setting up a source other than SQL. The config file doesn't seem to indicate that the source can be anything but a database. Does anyone have helpful links for setting up Sphinx with an HTML source?

Tyler K

1 Answer


Are you looking for Sphinx's xmlpipe data source (now called xmlpipe2)? I've tried it with XML files, and indexing works just as it does with an SQL source.

I haven't tried Sphinx with plain HTML files, so this is a guess: you'll need to parse your HTML files, generate XML containing the fields and attributes you want indexed, and feed that XML to Sphinx through xmlpipe.
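As a rough illustration of that idea, here is a minimal Python sketch that reads HTML files and prints a single xmlpipe2 stream. It is untested against a real Sphinx install. The field names `title` and `content`, the helper names, and the sequential document ids are my own assumptions, not anything Sphinx requires beyond unique positive ids.

```python
import sys
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class TextExtractor(HTMLParser):
    """Collects the <title> and the visible text of one HTML page."""

    SKIP = {"script", "style"}  # element contents that should not be indexed

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._skip_depth == 0:
            self.text_parts.append(data)


def html_to_fields(html):
    """Return (title, body_text) with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    # Text chunks are joined with a space, which is fine for indexing
    # but may split a word that is broken up by inline tags.
    body = " ".join(" ".join(parser.text_parts).split())
    return title, body


def build_docset(pages):
    """pages: iterable of (doc_id, html) pairs. Returns xmlpipe2 XML as a string."""
    out = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        '<sphinx:field name="title"/>',    # assumed field name
        '<sphinx:field name="content"/>',  # assumed field name
        "</sphinx:schema>",
    ]
    for doc_id, html in pages:
        title, body = html_to_fields(html)
        out.append('<sphinx:document id="%d">' % doc_id)
        out.append("<title>%s</title>" % escape(title))
        out.append("<content>%s</content>" % escape(body))
        out.append("</sphinx:document>")
    out.append("</sphinx:docset>")
    return "\n".join(out)


if __name__ == "__main__":
    # Usage (hypothetical script name): python html2xmlpipe.py page1.html page2.html
    pages = []
    for doc_id, path in enumerate(sys.argv[1:], start=1):
        with open(path, encoding="utf-8", errors="replace") as fh:
            pages.append((doc_id, fh.read()))
    print(build_docset(pages))
```

In sphinx.conf the source would then use `type = xmlpipe2` with `xmlpipe_command` pointing at a command that prints this XML to stdout. Ids assigned by file order will change if the file list changes, so a real setup would probably want stable ids.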

You can see here and here for more.

HTH

Arun
  • No, I specifically wanted to read in HTML files, index them, and then use that index to build a search engine for my site. I've given up on trying to use Sphinx and have approached the problem another way. Here is the most information I was able to find, for anyone else looking: http://www.sphinxsearch.com/forum/view.html?id=3867 – Tyler K Jul 30 '09 at 14:19