The SELECT problem in redis
tl;dr: the need to maintain your own indexes for structured data is the only reason redis is unsuitable for certain applications.
Almost three years ago I wrote a downright cavalier article on why you should always use redis as your database. After trying to follow my own advice in numerous projects, I have realized that using redis as the main database of your project is not always possible. Why? Because of the SELECT problem.
The SELECT problem arises when you want to search your database using the value of a field as a search criterion.
Let’s illustrate this with an example: you want to store a list of users with two fields: email and name.
In a SQL database, you would create a table USERS with two columns, EMAIL and NAME. To retrieve the users with a certain email, you would run the query SELECT * FROM USERS WHERE EMAIL = 'someemail'.
In redis, the most straightforward way to store users is to create one hash per user; hashes are the natural choice for storing an entity with fields. So you would store your first user with something like HMSET user:1 email someemail name somename (since Redis 4.0, plain HSET accepts multiple field-value pairs and HMSET is deprecated). However, if you do this, you cannot search your users by email or name without going through all of them!
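To make that cost concrete, here is a minimal sketch in Python. It does not talk to a real redis server: a plain dict stands in for the keyspace, and the hypothetical helpers `hset` and `find_by_email_scan` only mimic HSET and a walk over every key (against a real server you would iterate with SCAN and read each hash with HGETALL). The point is that the lookup has to touch every user.

```python
from typing import Dict, List

# In-memory stand-in for the redis keyspace: key -> hash (field -> value).
# This models redis behaviour; it is not a redis client.
keyspace: Dict[str, Dict[str, str]] = {}


def hset(key: str, mapping: Dict[str, str]) -> None:
    """Rough model of HSET key field value [field value ...]."""
    keyspace.setdefault(key, {}).update(mapping)


def find_by_email_scan(email: str) -> List[str]:
    """Without an index, the only option is to visit every user hash."""
    matches = []
    for key, fields in keyspace.items():  # cost grows with the number of keys
        if key.startswith("user:") and fields.get("email") == email:
            matches.append(key)
    return matches


hset("user:1", {"email": "someemail", "name": "somename"})
hset("user:2", {"email": "other@example.com", "name": "othername"})
print(find_by_email_scan("someemail"))  # ['user:1']
```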
In other words, when you are using hashes, there's no SELECT ... FROM ... WHERE in redis. To be able to search users quickly by name and by email, you need to create additional keys that map each value back to the record that holds it: the values become the keys and the keys become the values. There's more than one way of doing this and I don't know of an authoritative one, so I won't go into detail. What I am trying to say here is: if you use redis, you have to do your own indexing.
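One possible scheme, among several, is to keep one set per indexed value whose members are the ids of the matching users. The sketch below is again an in-memory model rather than a redis client: `sadd` and `smembers` only imitate SADD and SMEMBERS, and the key naming convention `idx:user:<field>:<value>` is a hypothetical choice made for illustration.

```python
from typing import Dict, Set

# In-memory stand-ins for two redis data types (not a real client).
hashes: Dict[str, Dict[str, str]] = {}
sets: Dict[str, Set[str]] = {}


def sadd(key: str, member: str) -> None:
    """Rough model of SADD key member."""
    sets.setdefault(key, set()).add(member)


def smembers(key: str) -> Set[str]:
    """Rough model of SMEMBERS key (a missing key behaves as an empty set)."""
    return set(sets.get(key, set()))


def create_user(user_id: str, email: str, name: str) -> None:
    # The record itself, as in HSET user:<id> email ... name ...
    hashes[f"user:{user_id}"] = {"email": email, "name": name}
    # Hand-maintained indexes: the value becomes part of the key and the
    # user id becomes the member. Hypothetical naming scheme.
    sadd(f"idx:user:email:{email}", user_id)
    sadd(f"idx:user:name:{name}", user_id)


def find_ids(field: str, value: str) -> Set[str]:
    # A single set lookup instead of a scan over all users.
    return smembers(f"idx:user:{field}:{value}")


create_user("1", "someemail", "somename")
create_user("2", "other@example.com", "somename")
print(sorted(find_ids("email", "someemail")))  # ['1']
print(sorted(find_ids("name", "somename")))    # ['1', '2']
```

Sets rather than plain strings are used so that several users can share a value (two people called "somename"). This sketch only handles inserts: changing a user's email would also require removing the id from the old index set, which is exactly the kind of bookkeeping that redis leaves to you.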
Now, if you know beforehand which attributes you need for searches, creating a finite number of indexes is not hard: you can come up with a solution that is very fast and doesn't take up much space, either. However, if 1) you don't know beforehand which fields to index; or 2) you want to index n fields (where n > 20), your application will become unwieldy.
No matter how much I dislike SQL, I have to admit that it solves this problem beautifully: it lets you search any table by the value of any column, and when you need that search to be fast, you declare an index once and the database keeps it up to date for you. That's exactly what makes relational databases powerful: the ability to search tuples by the value of one of their attributes.
It is for this reason (and this reason alone) that I have had to use MySQL, Postgres and MongoDB on a few projects, instead of using redis 100% of the time.
Does this mean that redis is not well-designed? Should redis add this functionality?
Not at all! Redis is the best-designed piece of software I have ever used. It is a data structures server which does its job wonderfully well. Adding something like indexing over hashes to redis itself would feel sorely out of place.
The way I would get around this problem is to build a layer on top of redis that provides this functionality. If you are so inclined, that's the way you should go too.
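As a rough idea of what such a layer has to do, here is a minimal sketch of a hypothetical `IndexedHashStore` class (not an existing library) that works on an in-memory model of redis hashes and sets. You declare which fields are indexed, and every save keeps the index sets in sync, including removing stale entries when a value changes.

```python
from typing import Dict, Iterable, Set


class IndexedHashStore:
    """Hypothetical indexing layer over an in-memory model of redis
    hashes and sets. Not a real redis client or library."""

    def __init__(self, prefix: str, indexed_fields: Iterable[str]) -> None:
        self.prefix = prefix
        self.indexed = set(indexed_fields)
        self.hashes: Dict[str, Dict[str, str]] = {}
        self.sets: Dict[str, Set[str]] = {}

    def _idx_key(self, field: str, value: str) -> str:
        return f"idx:{self.prefix}:{field}:{value}"

    def save(self, obj_id: str, fields: Dict[str, str]) -> None:
        key = f"{self.prefix}:{obj_id}"
        old = self.hashes.get(key, {})
        new = {**old, **fields}  # merge fields, like HSET on an existing hash
        for field in self.indexed:
            before, after = old.get(field), new.get(field)
            if before == after:
                continue  # value unchanged, index already correct
            if before is not None:
                # Remove the id from the old value's set so lookups by the
                # old value stop returning this object.
                stale = self._idx_key(field, before)
                self.sets[stale].discard(obj_id)
                if not self.sets[stale]:
                    del self.sets[stale]  # redis also deletes empty sets
            if after is not None:
                self.sets.setdefault(self._idx_key(field, after), set()).add(obj_id)
        self.hashes[key] = new

    def find(self, field: str, value: str) -> Set[str]:
        if field not in self.indexed:
            raise KeyError(f"field {field!r} is not indexed")
        return set(self.sets.get(self._idx_key(field, value), set()))


users = IndexedHashStore("user", ["email", "name"])
users.save("1", {"email": "someemail", "name": "somename"})
users.save("1", {"email": "new@example.com"})
print(users.find("email", "someemail"))        # set()
print(users.find("email", "new@example.com"))  # {'1'}
```

Against a real server, the hash write and the index updates of one save would need to be applied together (for example inside a MULTI/EXEC transaction or a Lua script) so that a crash or a concurrent reader never sees an index pointing at a half-written record. Each indexed field also adds extra writes per save and one extra key per distinct value, which is why a handful of indexes is manageable and dozens of them are not.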
What about other objections to redis?
They are all moot, to the best of my knowledge. The last three years have confirmed that:
- redis is production-ready and is now successfully used by very big players.
- RAM is cheap enough that you can keep your data there, as long as that data isn't files.
- redis’ persistence is acceptable and extremely configurable, so concerns regarding data volatility are misplaced.