ElasticSearch sharding work

Wanted to share some insights my team has found with changing sharding strategies for ElasticSearch (ES).

Today we moved 100% of my work's search and autocomplete queries from a mutli-sharded index to a single-sharded index.

Short version: Reducing shard count reduces system load, CPU usage, mostly positive results on performance.

Medium version: In moving 100% of our searches and autocomplete queries from a mutli-sharded index to a single-sharded index, we saw drops in cluster load, queries generated by ES, and overall ES CPU usage.

Total Search Load Top CPU Load ElasticSearch Load

We saw improvements in response time for retrieval and ranking from search.

Search Client Ranking Client

Along with response time drops, we saw a smoothing of response times from search in general.

Search TP95

Average query & fetch times increased. While we reduced the total number of queries ES generated, each of those queries and fetches was doing more work.

Fetch TP95 Query TP95

We managed to reduce response from search but autocomplete (which uses the same indice) saw its average response time increase. The response time is now much more stable, and the service performs better under high load. In the graph below, the red line is autocomplete with a single shard, the yellow line is the 5-shard autocomplete response time. As our load increased during the day today, you can see the 5-shard performance steadily decrease while the single shard response remained steady.

Autocomplete TP95