Category Archives: sharding

my version of consistent hashing

When sharding, a common method of distributing your key randomly is to use some sort of hashing algorithm in your code, which is not time dependent, key dependent (i.e. evens, odds), etc.. You also want to keep in mind the work in reallocating data if you either add or remove shards.

The is a good amount of material on the subject,

here are a few links,

http://michaelnielsen.org/blog/consistent-hashing/
http://www.paperplanes.de/2011/12/9/the-magic-of-consistent-hashing.html

and the original paper,

http://www.akamai.com/dl/technical_publications/ConsistenHashing…

if you like technical papers.

Why am I writing about it again? I don’t love the visual explanation out there (the circle) and breaking it into whole sections, so I’m going to explain it how I understand it best.

Naive hashing

Let’s say you had shards 1, 2 and 3, and wanted to add shard 4,

with naive hashing, you do some sort of modulus of the hash to assign it to a shard.

whichshard = hash('key') % N

where N = number of shards, % is the modulus and produces a number between 1 and N, which would be the number of the shard. A typical hash function might be crc32().

reallocation of data when adding or removing a shard is a real problem with this method. To demonstrate, we’ll assign some data represented by numbers 1-> 15 through out the nodes.

 shard1 1, 4, 7, 10, 13
 shard2 2, 5, 8, 11, 14
 shard3 3, 6, 9, 12, 15

now let’s add a shard, and redistribute these same pieces of data,

 shard1 1, 5, 9, 13
 shard2 2, 6, 10, 14
 shard3 3, 7, 11, 15
 shard4 4, 8, 12

you’ll see that when you compare the data, most of them have shifted to other shards. in fact 3/4 of your data will be moved for 4 shards, 7/8 for 8 shards, etc… you’ll
have to move almost all your data each time you adjust the number of shards. This is because your algorithm using a modulus is accounting for an extra shard.

consistent hashing

So we turn to consistent hashing, and the explanation is usually accompanied by a unit circle, where it’s divided up into sections, one per shard and a range of values can be assigned per shard. It is then explained to add a shard, that you essentially divide one of these sections into two, and move only the keys you need from the shard you divided to the new one. Of course you’ll now have two smaller shards that are half the size of the rest of the shards, when of course you want to balance the load on all shards. The explanation is then instead of dividing one shard into two sections, take bits from all the shards to make your new shard. Now why push the idea that the shards are assigned full sections (i.e. 0 -> 1/5 of 15  = (1, 2, 3)) when your additional shard will need to be assigned random values (4, 9, 14)? Let’s just start with a random distribution for all shards so there is the understanding it doesn’t matter what the algorithm assigned to what node, as long as it assigns the same thing with additional shards in the mix. So let’s start with the same distribution as before (but really, any of these numbers could be assigned anywhere)

 shard1 1, 4, 7, 10, 13
 shard2 2, 5, 8, 11, 14
 shard3 3, 6, 9, 12, 15

now let’s add a shard, and redistribute manually,

 shard1 1, 4, 7, 10
 shard2 2, 5, 8, 11
 shard3 3, 6, 9, 12
 shard4 13, 14, 15

We see that we have chosen to assign node4 with a result from each on the first three shards, it’s does not matter which ones go to node4, and the result is none of the first three shards need data moved onto them, only moved off to the new shard. It also only accounts for 1/5 of the data; if we had 20 shards, we would only have to move about 1/20 of the data. Obviously you are going to have some sort of logic that maps the formula results to a node,

if (result in (2,5,8,11)){
   shard=2
}
etc...

so for our demo, how does getting a random value from 1 to 15 work in a formula? We now have,

whichshard = ceiling(hash('key')/maxvalue(hash())*N)

ceiling would be a function that rounds up to the closest result N. N normalizes the result so instead of an answer between 0 and 1 to give you a result between 1 and N, matching your assigned values.

if we are using crc32, the max value is 4294967295, so something like,

whichshard = ceiling(crc32('key')/4294967295 * N)
e.g.. -> ceiling(crc32('1234567')/4294967295 * 15) = 5

and we see that the result ’5′, we assigned to shard2, and will always be assigned to shard2 unless we move it to another shard in our grouping definition.

Lastly, you can run the above formula as a select SQL statement, but in production, do it in your code so you are not hitting the db.