How to reindex a Solr Database

by Jason on May 22, 2011

Over the past few months I’ve ventured into new territory: Hadoop MapReduce, Amazon Web Services, and the topic of this post, Solr.

My experience with Solr has been amazing. The learning curve for this database is VERY gentle. In the past I’ve attempted to work with Cassandra and Amazon’s Key/Value Pair database, but both suffered from a steep learning curve and limited database drivers, and in Amazon’s case, a lack of sufficient documentation.

Inevitably, after working with Solr for a little while, you’ll think to yourself, “I really need to tweak this field (analyzer, filter, etc.)”. If you’re like me, you’ll begin with trial & error. You’ll modify the schema.xml file, re-deploy it, restart the server… and nothing happens. You still see the exact same data. WTF?!

Disappointment sets in when you realize that you have to re-index your data: schema changes only apply to documents indexed after the change. You read about it in the forums but don’t really know what it means. If you’re like me, you frantically start looking around for a Re-Index button in the Solr Admin, but you won’t find one.

So, I’m here to explain.

There are two methods to re-index your data:

  1. Re-run whatever process(es) initially processed your data set. For me, this wasn’t an option. I am currently gathering several gigabytes of data from a variety of sources and I’m not going to hold on to all of it.
  2. Query Solr and re-insert the results. Any field you marked stored="true" in your schema.xml comes back in its original form, so it can be re-inserted (reindexed). Fields that aren’t stored can’t be recovered this way.
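
To make method 2 a little more concrete, here is a rough Python sketch of the query-and-re-insert loop. It is not the PHP script mentioned below, just an illustration under some assumptions: the core URL is made up, the unique key field is assumed to be called "id", every field you care about is assumed to be stored, and your Solr version is assumed to accept a JSON array of documents (and a JSON commit command) on /update. Older releases may need a different update path or the XML format.

```python
import json
import urllib.parse
import urllib.request

SOLR_URL = "http://localhost:8983/solr/mycore"  # hypothetical core URL
BATCH_SIZE = 500


def http_json(url, payload=None):
    """GET url, or POST payload as JSON when one is given; return the decoded reply."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def reindex(base_url=SOLR_URL, batch_size=BATCH_SIZE, transport=http_json):
    """Page through every document and post each page back to the same index."""
    start = 0
    total = 0
    while True:
        query = urllib.parse.urlencode({
            "q": "*:*",
            "sort": "id asc",  # assumes the unique key field is named "id"
            "start": start,
            "rows": batch_size,
            "wt": "json",
        })
        docs = transport(f"{base_url}/select?{query}")["response"]["docs"]
        if not docs:
            break
        # Drop Solr's internal version field (if present) so each add is a plain overwrite.
        cleaned = [
            {name: value for name, value in doc.items() if name != "_version_"}
            for doc in docs
        ]
        transport(f"{base_url}/update", cleaned)
        total += len(docs)
        start += len(docs)
    # Commit once at the end so searches see the re-analyzed documents.
    transport(f"{base_url}/update", {"commit": {}})
    return total
```

Calling reindex() walks the whole index in batches and returns the number of documents it re-posted. Sorting on the unique key keeps the paging order stable while documents are being overwritten. One thing to watch for: if any copyField destinations are themselves stored, you will probably want to drop those fields before re-posting, otherwise their values can end up duplicated.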

For those interested, my company has allowed me to open-source my PHP script that will help you to re-index your Solr database.

Have a look