A while back we announced on our API developers list that we would change the way we generate unique ID numbers for tweets. While we're not quite ready to make this change, we've been hard at work on Snowflake, the internal service that generates these ids. To give everyone a chance to familiarize themselves with the techniques we're employing and how they'll affect anyone building on top of the Twitter platform, we are open sourcing the Snowflake code base today. Before I go further, let me provide some context.

We currently use MySQL to store most of our online data. In the beginning, the data was in one small database instance, which in turn became one large database instance and eventually many large database clusters. For various reasons, the details of which merit a whole blog post, we're working to replace many of these systems with the Cassandra distributed database or horizontally sharded MySQL (using gizzard).

Unlike MySQL, Cassandra has no built-in way of generating unique ids, nor should it, since at the scale where Cassandra becomes interesting, it would be difficult to provide a one-size-fits-all solution for ids.

Our requirements for this system were pretty simple, yet demanding: We needed something that could generate tens of thousands of ids per second in a highly available manner. This naturally led us to choose an uncoordinated approach. These ids need to be roughly sortable, meaning that if tweets A and B are posted around the same time, they should have ids in close proximity to one another, since this is how we and most Twitter clients sort tweets. Additionally, these numbers have to fit into 64 bits.
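
To make the requirements concrete (high throughput, no coordination between machines, rough time ordering, and a 64-bit limit), here is a minimal sketch of one way to satisfy them: pack a millisecond timestamp, a per-machine worker number, and a per-millisecond sequence counter into a single integer. Everything specific in it is an assumption made for illustration, including the bit widths, the epoch, and the names `SketchIdGenerator` and `next_id`; it is not the Snowflake code base.

```python
import threading
import time

# Assumed layout for this sketch: 41 + 10 + 12 = 63 bits, so every id
# fits in a signed 64-bit integer.
TIMESTAMP_BITS = 41
WORKER_BITS = 10
SEQUENCE_BITS = 12

MAX_WORKER_ID = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

# Hypothetical custom epoch in milliseconds (2010-01-01T00:00:00Z).
CUSTOM_EPOCH_MS = 1262304000000


class SketchIdGenerator:
    """Generates roughly time-sortable ids without talking to other machines."""

    def __init__(self, worker_id, clock=None):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        # The clock is injectable so the behaviour can be tested deterministically.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Going backwards in time could produce duplicate ids.
                raise RuntimeError("clock moved backwards; refusing to generate ids")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            elapsed = now - CUSTOM_EPOCH_MS
            # Timestamp in the high bits keeps ids ordered by time first.
            return (
                (elapsed << (WORKER_BITS + SEQUENCE_BITS))
                | (self._worker_id << SEQUENCE_BITS)
                | self._sequence
            )


if __name__ == "__main__":
    generator = SketchIdGenerator(worker_id=1)
    for _ in range(3):
        print(generator.next_id())
```

Because the timestamp occupies the high bits, ids from different workers created around the same time land close together when sorted, and because each worker only needs its own number and its own clock, no worker has to contact another to hand out an id. Ordering across workers is only as good as their clock synchronization, which is why "roughly sortable" is the right expectation.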