Solr from the field -- Lessons learned while maintaining over 30 billion documents


An article published by Alex Puschinsky about WalkMe Insights' experience with Solr and processing vast amounts of data. The team provides real-time search and analytics capabilities, and to achieve this they chose Apache Solr as the core of the WalkMe Insights search functionality.

Before throwing hardware at your performance issue, try to figure out the root cause of your problem. There is a good chance you’ll find that your issue arises from incorrect Solr usage rather than insufficient hardware.

The data we collect is largely comprised of end-user interactions with websites, i.e., mouse clicks, URL transitions, text inputs, and even WalkMe customer-defined custom events. Currently, we hold around 30 billion such events, a number that is growing rapidly and is expected to accelerate given WalkMe’s exponential growth.

The article will walk you through:

  • The infrastructure – a large Solr index requires a sharding solution
  • Capacity planning
  • Hardware recommendations
  • Understanding Solr query execution
  • Wildcard queries can be evil
  • Facet performance

… and much more. If you need high-performance, real-time search capabilities, this article is an excellent resource and starting point!
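On the "wildcard queries can be evil" point: a trailing wildcard (`log*`) can be answered with a cheap prefix scan over Solr's sorted term dictionary, while a leading wildcard (`*out`) forces an inspection of every term. Solr's `ReversedWildcardFilterFactory` works around this by also indexing each term reversed. The sketch below illustrates the idea in plain Python; the term list and function names are made up for illustration and are not Solr APIs.

```python
# Illustration of why leading wildcards are expensive, and how indexing
# reversed terms (the idea behind Solr's ReversedWildcardFilterFactory)
# turns a leading wildcard back into a cheap prefix scan.
# The terms here are hypothetical examples, not real index data.

terms = sorted(["checkout", "click", "login", "logout", "pricing"])

def prefix_scan(sorted_terms, prefix):
    """Trailing wildcard ('log*'): a prefix scan over the sorted
    term dictionary, which can stop early in a real index."""
    return [t for t in sorted_terms if t.startswith(prefix)]

def full_scan(sorted_terms, suffix):
    """Leading wildcard ('*out'): the sorted order is useless,
    so every term in the dictionary must be inspected."""
    return [t for t in sorted_terms if t.endswith(suffix)]

# Index every term reversed as well; a leading wildcard on the
# original field becomes a prefix scan on the reversed field.
reversed_terms = sorted(t[::-1] for t in terms)

def reversed_prefix_scan(sorted_rev_terms, suffix):
    rev = suffix[::-1]
    return [t[::-1] for t in sorted_rev_terms if t.startswith(rev)]

print(prefix_scan(terms, "log"))                    # ['login', 'logout']
print(full_scan(terms, "out"))                      # ['checkout', 'logout']
print(reversed_prefix_scan(reversed_terms, "out"))  # ['logout', 'checkout']
```

The trade-off, as with the real Solr filter, is extra index size in exchange for fast leading-wildcard lookups.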

[Read More]

Tags apache search web-development