sharding - How to scale writes and index size dynamically with Elasticsearch?

September 15, 2012

I am exploring solutions to archive an enormous amount of documentation data and provide a web search engine over it. I first looked at search engine solutions and concluded that Elasticsearch is one of the best ones when you have to deal with a huge amount of data. I had read that it scales out of the box, and I was convinced.

Then I looked at NoSQL databases. Because there are so many players, I spent more time searching and read several resources (NoSQL Distilled, the Amazon Dynamo paper, the Google Bigtable paper, etc.), which gave me a better understanding of distributed systems in general. I saw that some scalable NoSQL databases can automatically split a shard into two shards when it becomes too big.

Then I realized that Elasticsearch does not provide this feature. Moreover, according to the documentation (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html), the number of shards of an index cannot be increased after the index has been created. This raises a question:

Suppose I create an index and choose its number of shards according to the expected traffic and amount of data. One day that expectation is exceeded, and I no longer have enough shards to handle the write requests and the index's size. How can I manage this situation?
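The shard count is fixed for a reason. In the Elasticsearch versions discussed here, a document's primary shard is chosen as hash(_routing) % number_of_primary_shards, where _routing defaults to the document id. The Python sketch below illustrates the consequence under stated assumptions. It uses zlib.crc32 as a stand-in for Elasticsearch's own routing hash, and shard_for and moved_fraction are hypothetical helpers. It counts how many ids would land on a different shard if the shard count changed. That is why an existing index cannot simply be given more primary shards: documents already written could no longer be found by id.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a primary shard as hash(routing) % number_of_primary_shards.

    Assumption: CRC32 stands in for Elasticsearch's real routing hash.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def moved_fraction(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of ids whose target shard changes when the shard count changes."""
    moved = sum(
        1 for doc_id in doc_ids
        if shard_for(doc_id, old_shards) != shard_for(doc_id, new_shards)
    )
    return moved / len(doc_ids)


if __name__ == "__main__":
    ids = [str(i) for i in range(10_000)]
    # Going from 5 to 6 primary shards re-routes most existing ids,
    # so a lookup by id would be sent to the wrong shard.
    print(f"{moved_fraction(ids, 5, 6):.0%} of documents would change shard")
```

With a roughly uniform hash, a document stays on the same shard only when its hash leaves the same remainder modulo 5 and modulo 6. That happens for about one id in six, so roughly five sixths of the documents would have to move.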

I think I have found a way. If someone who knows Elasticsearch well can confirm that it works, that would be great.

I read the following article, and its last section gave me the idea:

http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/

The idea is to create two aliases (index_search and index_write) that initially point to the same index (let's call it index_1). Imagine that one day the number of shards in index_1 is no longer enough. In that case, we can create a new index (let's call it index_2) with the same mappings and with the number of shards that we would have added to index_1 if that had been possible.
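As a sketch of the starting point, the two request bodies for the first step can be built as plain Python dicts and sent with whatever HTTP client is in use. The first body creates index_1. The second is a POST _aliases call that points both aliases at it. The helper names create_index_body and initial_alias_actions are hypothetical. Only the JSON shapes follow the Sense requests in this post.

```python
import json


def create_index_body(num_shards: int) -> dict:
    # Body for "PUT index_1": the primary shard count is fixed from here on.
    return {"settings": {"number_of_shards": num_shards}}


def initial_alias_actions(index: str) -> dict:
    # Body for "POST _aliases": both aliases start out on the same index.
    return {
        "actions": [
            {"add": {"index": index, "alias": "index_search"}},
            {"add": {"index": index, "alias": "index_write"}},
        ]
    }


if __name__ == "__main__":
    print(json.dumps(create_index_body(1), indent=2))
    print(json.dumps(initial_alias_actions("index_1"), indent=2))
```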

Then we update the alias index_search to point to "index_1, index_2" (both index_1 and index_2), so that searches run on both indices. Next we update index_write to point to index_2, so that writes go to the new shards, because the shards of index_1 are considered full.
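The switch described above fits in a single POST _aliases request. Elasticsearch applies all actions of one such request atomically, so there is no moment where index_write points to no index or to both. The sketch below builds that body with a hypothetical helper, switch_write_index_actions. The helper is an illustration, not an Elasticsearch API.

```python
def switch_write_index_actions(old_index: str, new_index: str) -> dict:
    """Body for one atomic POST _aliases call that:
    - adds new_index to index_search, so searches cover both indices;
    - moves index_write from old_index to new_index.
    """
    return {
        "actions": [
            {"add": {"index": new_index, "alias": "index_search"}},
            {"add": {"index": new_index, "alias": "index_write"}},
            {"remove": {"index": old_index, "alias": "index_write"}},
        ]
    }


if __name__ == "__main__":
    print(switch_write_index_actions("index_1", "index_2"))
```

Calling switch_write_index_actions("index_1", "index_2") reproduces the second _aliases request of the Sense example.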

Later, we can add another index (index_3), map index_search to "index_1, index_2, index_3", and move index_write to index_3.
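To check that repeating this step keeps the aliases in the intended shape, the sketch below applies the same add/remove actions to an in-memory map from alias to indices. It is only a model of the alias layout, not a client for a real cluster. apply_actions and add_index are hypothetical names, and the index_N naming follows this post.

```python
from collections import defaultdict


def apply_actions(aliases: dict, body: dict) -> None:
    """Apply add/remove alias actions to an in-memory alias -> set-of-indices map."""
    for action in body["actions"]:
        op, spec = next(iter(action.items()))
        targets = aliases[spec["alias"]]
        if op == "add":
            targets.add(spec["index"])
        elif op == "remove":
            targets.discard(spec["index"])
        else:
            raise ValueError(f"unsupported alias action: {op}")


def add_index(aliases: dict, n: int) -> None:
    """Bring index_n online: searchable at once, and the only write target."""
    new = f"index_{n}"
    actions = [
        {"add": {"index": new, "alias": "index_search"}},
        {"add": {"index": new, "alias": "index_write"}},
    ]
    # Drop every older index from the write alias, in the same batch.
    actions += [
        {"remove": {"index": old, "alias": "index_write"}}
        for old in sorted(aliases["index_write"])
    ]
    apply_actions(aliases, {"actions": actions})


if __name__ == "__main__":
    aliases = defaultdict(set)
    for n in (1, 2, 3):
        add_index(aliases, n)
    print(sorted(aliases["index_search"]), sorted(aliases["index_write"]))
    # -> ['index_1', 'index_2', 'index_3'] ['index_3']
```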

Of course, our application would always use the aliases and never the real index names. The transformation is therefore invisible to the application, and we do not have to change its code.

Here is an example using Sense syntax:

    PUT index_1
    {
        "settings": { "number_of_shards": 1 }
    }

    POST _aliases
    {
        "actions": [
            { "add": { "index": "index_1", "alias": "index_search" } },
            { "add": { "index": "index_1", "alias": "index_write" } }
        ]
    }

    PUT index_write/article/1
    {
        "title": "one first index",
        "article": "this article indexed on index_1"
    }

    PUT index_2
    {
        "settings": { "number_of_shards": 2 }
    }

    POST _aliases
    {
        "actions": [
            { "add": { "index": "index_2", "alias": "index_search" } },
            { "add": { "index": "index_2", "alias": "index_write" } },
            { "remove": { "index": "index_1", "alias": "index_write" } }
        ]
    }

    PUT index_write/article/2
    {
        "title": "one second index",
        "article": "this article indexed on index_2"
    }

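The problem with this solution is what happens when we update a document stored in index_1 while index_write points to index_2. The update creates a copy of the document in index_2 instead of replacing it. This means we have to search before updating to find the real index. We cannot simply address document 1 through index_write, because index_write now points only to index_2.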
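The duplication problem and the search-then-update workaround can be modeled in memory. This is a sketch, not Elasticsearch client code, and AliasedStore and its methods are hypothetical. In a real cluster the lookup would be a search on index_search, for example with an ids query. The application would read the _index field of the hit and then index the new version into that concrete index.

```python
class AliasedStore:
    """Hypothetical in-memory stand-in for the two-alias layout.

    indices maps index name -> {doc_id: document}. write_index plays the
    role of index_write; searching all of indices plays index_search.
    """

    def __init__(self):
        self.indices = {"index_1": {}}
        self.write_index = "index_1"

    def add_index(self, name: str) -> None:
        # New index becomes searchable and takes over all new writes.
        self.indices[name] = {}
        self.write_index = name

    def put(self, doc_id: str, doc: dict) -> None:
        # Naive write through index_write: always lands in the newest index.
        self.indices[self.write_index][doc_id] = doc

    def search_by_id(self, doc_id: str) -> list:
        # Like a search on index_search: (index, doc) hits from every index.
        return [
            (name, docs[doc_id])
            for name, docs in self.indices.items()
            if doc_id in docs
        ]

    def update(self, doc_id: str, doc: dict) -> None:
        # Search first, then write back to the index that already holds the id.
        hits = self.search_by_id(doc_id)
        target = hits[0][0] if hits else self.write_index
        self.indices[target][doc_id] = doc


if __name__ == "__main__":
    store = AliasedStore()
    store.put("1", {"title": "v1"})
    store.add_index("index_2")
    store.put("1", {"title": "v2"})  # naive update through index_write
    print(store.search_by_id("1"))   # two hits: the old copy and the new one
```

Using update instead of put keeps a single copy in index_1. The price is an extra round trip per update, and a race window if another writer touches the same id between the search and the write.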
