Lucene.NET guest post up on Simo’s blog!

I wrote a guest post for Simone about how I’ve used Lucene and Lucene.NET in the past, and he’s put it up today!

It covers my experience at IDG (during the first .com boom/bomb), at AfterMail/Quest, and on the Top Gear website.

The existing AfterMail product used a very simple index system: break a document up into its component words, either by tokenizing the content of an email or by using an iFilter to extract the content of an attachment, and then store a mapping between each word and the primary keys of the emails or attachments containing it. It was pretty simple, and it worked quite well with small data sets, but the index database was large compared to the source database: it was often more than 75% of its size! That is really not good when you have a lot of data in the database. On top of that, there was no relevance ranking, or any of the other nice features a “real” index provides.
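The post doesn't include AfterMail's code, but the word-to-key approach described above can be sketched in a few lines of Python. This is a hypothetical, in-memory illustration only: the `WordIndex` class, the regex tokenizer, and the AND query semantics are assumptions, and the real system kept the mapping in a database table and used iFilters for attachments.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: lowercase, then take runs of word characters.
# The actual AfterMail tokenizer is not described in the post.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


class WordIndex:
    """Naive word -> primary-key mapping, like a single (word, key) table."""

    def __init__(self) -> None:
        self._postings: dict[str, set[int]] = defaultdict(set)

    def add(self, key: int, text: str) -> None:
        # One "row" per distinct (word, key) pair in the item.
        for word in set(tokenize(text)):
            self._postings[word].add(key)

    def search(self, query: str) -> set[int]:
        # AND semantics: keys whose text contains every query word.
        # The result is an unordered set, i.e. there is no relevance ranking.
        words = tokenize(query)
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result

    def row_count(self) -> int:
        # Total number of (word, key) rows the mapping table would hold.
        return sum(len(keys) for keys in self._postings.values())


if __name__ == "__main__":
    index = WordIndex()
    index.add(1, "Quarterly report attached")
    index.add(2, "Re: quarterly budget")
    index.add(3, "Lunch on Friday?")
    print(sorted(index.search("quarterly")))         # [1, 2]
    print(sorted(index.search("quarterly report")))  # [1]
    print(index.row_count())                         # 9
```

Because every distinct word in every item becomes its own row, the mapping grows roughly with the amount of text indexed, and queries can only say which items match, not which match best.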

We decided to give Lucene a try for the second major release of AfterMail. On the same data set, Lucene created an index about 20% of the size of the source data, ran a lot faster, and scaled up to massive data sets without any problems.

Have a read of Simo’s other Lucene.NET articles while you’re over there, especially if you’ve never used Lucene.NET before.

About Nic Wise

Nic Wise. I build software. I take photos. Living in London, Loving New Zealand.
This entry was posted in tech.