Lucene Search launching soon

KyleH October 5, 2009

Hi everyone,

This Wednesday, we are tentatively planning to release a new search engine for Wikia called Lucene Search. This is the search engine that Wikipedia has been using for quite some time, and it provides a number of significant advantages:

  • Short words are properly indexed (the old search ignored words with fewer than 4 letters).
  • Numbers are indexed.
  • "Common words" are indexed (words such as "from", "with", and "what" were previously ignored).
  • Search is no longer case sensitive.
  • Relevance should be dramatically improved: results are ranked by how the search terms are used within the page, the number of incoming links to the page, and a number of other factors.
  • Results display the rendered text of the pages rather than the raw source with MediaWiki markup, which should make them significantly more readable.
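
As a rough illustration of the first four points, here is a minimal Python sketch that contrasts which terms of a query would survive under the old rules and under the new ones. It is not the actual indexing code of either engine: the function names are hypothetical, the "common words" set holds only the three examples named above, and the assumption that the old search skipped numbers is inferred from the list.

```python
import re

# Assumption: only the three "common words" named in the post.
OLD_COMMON_WORDS = {"from", "with", "what"}

def old_style_terms(text):
    """Terms the old search would keep, per the list above (illustrative only)."""
    kept = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        if token.isdigit():                    # numbers were not indexed
            continue
        if len(token) < 4:                     # words under 4 letters were ignored
            continue
        if token.lower() in OLD_COMMON_WORDS:  # "common words" were ignored
            continue
        kept.append(token)                     # case kept: matching was case sensitive
    return kept

def new_style_terms(text):
    """Terms the new search would keep: everything, folded to lower case."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

query = "What is Fallout 3 from Vault 13"
print(old_style_terms(query))  # ['Fallout', 'Vault']
print(new_style_terms(query))  # ['what', 'is', 'fallout', '3', 'from', 'vault', '13']
```

Under the old rules a query such as "Fallout 3" would effectively lose its number, which is why indexing short words and numbers matters for game wikis.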

You can try out Lucene Search at Fallout and FFXI. If you have any questions or comments, please reply to this blog post.

Thanks! --KyleH@Wikia (talk) 20:36, October 5, 2009 (UTC)

24 comments


  • Hello guys, I'm not sure how Wikia is organized internally or why mwsearch+lucene-search cannot be deployed out of the box, but I think you will find it easier to adapt these existing extensions than to reinvent the wheel. Making a decent search engine from open-source components is surprisingly difficult, as I'm sure you'll soon find out. So is reinventing "did you mean", if you want it to do anything better than single-word spellchecking, which gives rather disappointing results for most queries (compare, for example, the "did you mean" of the sphinxsearch extension or the built-in "did you mean" in lucene).

    rainman

  • I have a few bugs to report. They may or may not be due to Lucene Search, but something is up.

    • In this search, the sidebar column is entirely out of position and overlaps the content area (or search area). Seen in Firefox and Chrome.
    • Also, the search is showing results for pages that were deleted long ago (is the cache badly out of date?). Results like this should not appear, as the file was deleted over 2 weeks ago (this is happening with all page types). On top of that, it appears as a normal link, not as a red (broken) link.
    • Lastly, I don't think I should be told I'm causing a "System error" when I visit Special:Search. I may not have entered a term to search for, but the message on arriving there should be different.

    Cheers!

  • Searching for "Fleming" does not show the newly created Jack Fleming.

  • I don't think this happened before the new search. See special:search/Forum:Is the help.wikia search broken?. Clicking the result for "Forum:Is the help.wikia search broken? It doesn't seem to find anything" gets you nowhere, because the question mark in the title doesn't get encoded in the link or whatever.
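
To illustrate the question-mark problem described above, here is a small Python sketch. It assumes page links have the form http://example.org/wiki/Title with spaces turned into underscores; the host name and the helper name `page_url` are placeholders, not Wikia's actual link-building code. An unencoded "?" starts the query string, so everything after it drops out of the page title, while percent-encoding it as %3F keeps the title whole.

```python
from urllib.parse import quote, unquote, urlsplit

BASE = "http://example.org/wiki/"  # placeholder host, not a real Wikia URL

def page_url(title, encode=True):
    """Build a link to a page; hypothetical helper for illustration."""
    name = title.replace(" ", "_")
    if encode:
        # Keep ':' and '/' readable, percent-encode everything else unsafe.
        name = quote(name, safe=":/")
    return BASE + name

title = "Forum:Is the help.wikia search broken? It doesn't seem to find anything"

broken = urlsplit(page_url(title, encode=False))
print(broken.path)   # /wiki/Forum:Is_the_help.wikia_search_broken
print(broken.query)  # _It_doesn't_seem_to_find_anything

fixed = urlsplit(page_url(title))
print(unquote(fixed.path))  # full title survives, question mark included
```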

    PS: The "edit to confirm blog unwatch" bug is still active.

  • You're right, I missed that some shared help pages had blank pages created on our wiki, causing these to show up in the search results. My bad.

  • Currently, it will not display results for shared help pages which do not also have a page on the local wiki. Also, note that on most wikis, the Help namespace isn't searched by default--you need to manually select the Help namespace checkbox.

    Also, a quick update: we believe that we have resolved the performance issues that prevented the new search from being released site-wide last Wednesday, so we plan to try again this Wednesday. Sorry for the delay!

  • Occasionally it misses full title matches - might be related to Shared Help. If I type "Help:Images", the search results show all kinds of help pages but not the one which has exactly the name I entered.

  • Also, I'm not fond of getting rid of e.g. "56 KB (8657 words) - 07:03, 30 September 2009" below each result.

  • Thanks, Kyle. It turns out that Rainman wrote up some notes on search internals that may be of interest to tech heads. For instance:

    • More recent edits score higher than very old ones.
    • Talk pages generally score 50% less.

    ... and so on and so forth.
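
The two rules quoted above could be sketched roughly as follows. This is only an illustration: the 50% talk-page penalty comes from the comment, but the decay formula, the `half_life_days` parameter, and the function name are assumptions, not the actual ranking described in Rainman's notes.

```python
from datetime import datetime, timedelta

def adjusted_score(base_score, last_edited, is_talk_page, now, half_life_days=365.0):
    """Scale a relevance score by page age and namespace (illustrative only)."""
    age_days = max((now - last_edited).total_seconds() / 86400.0, 0.0)
    # Assumed decay: the recency factor slides from 1.0 (just edited) towards 0.5.
    recency = 0.5 + 0.5 * 0.5 ** (age_days / half_life_days)
    # From the comment: talk pages generally score 50% less.
    namespace = 0.5 if is_talk_page else 1.0
    return base_score * recency * namespace

now = datetime(2009, 10, 5)
recent = adjusted_score(10.0, now - timedelta(days=7), False, now)
old = adjusted_score(10.0, now - timedelta(days=1500), False, now)
talk = adjusted_score(10.0, now - timedelta(days=7), True, now)
print(recent > old)         # True: the recently edited page ranks higher
print(talk == recent / 2)   # True: same page age, talk namespace halves the score
```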

  • It would be greatly appreciated. We get a lot of character name misspellings over at the Stargate wiki and I'm sure it's the same for a lot of the sci-fi/fantasy related wikis.

  • The "Did you mean.." feature will not be included in this release; however, we are considering adding something similar in a future release.

  • What about Wikipedia's "Did you mean..." function? If I type "Falout", for example, I'd either get no results or pages where the word has been misspelled. However, in Wikipedia, I also get "Did you mean: Fallout".
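
For readers wondering what the simplest version of such a feature looks like, here is a minimal Python sketch that suggests the closest known page title using the standard library's `difflib`. It is not how Wikipedia's search does it; the title list and the function name `did_you_mean` are made up for the example, and matching against whole titles is a simplification.

```python
import difflib

# Hypothetical list of page titles on a wiki.
TITLES = ["Fallout", "Fallout 2", "Vault 13", "Brotherhood of Steel"]

def did_you_mean(query, titles=TITLES, cutoff=0.6):
    """Return the closest title to the query, or None if nothing is similar enough."""
    if query in titles:
        return None  # exact hit, nothing to suggest
    matches = difflib.get_close_matches(query, titles, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(did_you_mean("Falout"))   # Fallout
print(did_you_mean("Xyzzy"))    # None
```

A title-level lookup like this handles single misspelled names such as character names, but it does nothing useful for multi-word queries, which is the limitation raised elsewhere in this thread.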

  • Now I see. I was thinking of the search suggestions. Wikipedia shows redirects in its suggestions while Wikia doesn't. That's what I meant.

  • The short answer is no, there are no technical docs on the capabilities. The long answer is that we're using a modified version of Extension:MWSearch to query a Solr back-end. (We weren't able to use MWLucene and Lucene-search out of the box because that would require a separate instance for each of our 53,000 wikis.) The source code for the front-end is available on our public SVN if you would like to investigate further.
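
One common way to serve many wikis from a single Solr back-end is to keep all pages in one index and restrict each query with a filter. The Python sketch below builds such a request URL. It is a guess at the general shape, not Wikia's actual setup: the `/solr/select` path, the `wiki_id` field, and the function name are assumptions, while `q`, `fq`, `rows`, and `wt` are standard Solr query parameters.

```python
from urllib.parse import urlencode

def build_select_query(wiki_id, terms, rows=10):
    """Build a Solr select URL limited to one wiki (illustrative only)."""
    params = {
        "q": terms,                   # the user's search terms
        "fq": f"wiki_id:{wiki_id}",   # hypothetical field: restrict to one wiki
        "rows": rows,                 # number of results to return
        "wt": "json",                 # response format
    }
    return "/solr/select?" + urlencode(params)

print(build_select_query("fallout", "Fallout 1"))
```

With this layout, adding another wiki means adding documents with a new `wiki_id` value rather than starting another search instance.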

  • Are there technical docs on the capabilities? For example, their ranking algorithm, enhancement plans for range search on numerics, dates, leveraging of rdf/ semantic media properties?

  • It does when it's working properly, but the load problems that are causing us to delay the full-scale roll-out sometimes (rarely) cause the search to return empty results. My search for Fallout 1 returns a lot of results.

  • Good! Nevertheless, typing Fallout 1 in the Fallout wiki gives no results. Neither does Fallout (redirect Fallout 1).

  • We've had to delay the full-scale roll-out because of load issues so we won't be launching today; however, you can still try out the new search at Fallout or FFXI.

    Profesor Pokémon: our version of Lucene displays redirects in the form of "Main article title (redirect Redirect title)"

  • Great to see a feature that has nothing to do with the User namespace but with the wiki itself. Wikipedia's search supports redirects; will Wikia's?

  • I'm not really that fond of it searching through the content of all templates transcluded on a page. If a wiki uses lots of navbars, like Fallout does, it makes the search results cluttered.
