Main differences with classic object storage solutions (part 1)
It’s a question we often receive when we introduce and present our solution: “We understand object storage, but what differentiates you from the others?”
Let me first say that two audiences exist: a first group who is discovering object storage and wishes to learn and understand more, and a second group who is already familiar with the technology and wants the details that set one product apart from another.
To the first group, the answer is almost the same from every vendor: object storage is a technology that helps build and run large data storage environments, scales with virtually no limits, is exposed through object APIs, and usually runs on commodity Linux servers at a reasonable cost. Flavors differ mainly in packaging: some vendors offer the solution as pure software, others install it and sell it as an appliance.
For the second group, it’s a bit different: they understand the basic definition, and some of them have already tried a few products. Their questions are therefore very precise. The main concerns are scalability, SLA consistency, and the potential impact of cluster changes on the service delivered by the platform. These points can create real issues for classic object storage solutions that place data with a DHT / consistent-hashing scheme. For OpenIO, this is pretty simple: the design chosen almost 10 years ago integrated and anticipated this potential future issue.
So what is the problem? With these traditional object stores, every time the topology changes, the system reacts and triggers a rebalance mechanism to reflect the new address scheme and satisfy the protection policy. The result is obvious: the service delivered by the platform is impacted. The platform must of course protect data transparently and automatically, but how can the rebalance be made less disruptive, or even fully transparent? Some people will argue that this is a key attribute of an autonomous platform. There are two alternatives for such approaches. Either you give the rebalance a very low priority: it lasts longer, data is exposed to a new or second failure for longer, and any failure during that window adds yet more recovery load. Or you give the rebalance a very high priority to reduce the window of exposure, and accept a limited period of mixed internal workload: the workload that serves the data the platform was built for, plus the workload that restores cluster health.
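To see why adding a node forces data movement under consistent hashing, here is a minimal, illustrative sketch (my own toy model, not any vendor's actual implementation): objects and nodes are hashed onto a ring, each object belongs to the first node clockwise from it, and adding a node changes the owner of every key that now falls in the new node's range.

```python
import bisect
import hashlib

def ring_position(label: str) -> int:
    """Map a label (node name or object key) onto a hash ring."""
    return int(hashlib.md5(label.encode()).hexdigest(), 16)

def build_ring(nodes):
    """Sorted (position, node) pairs forming the consistent-hash ring."""
    return sorted((ring_position(n), n) for n in nodes)

def owner(ring, key):
    """The first node clockwise from the key's position owns the key."""
    positions = [p for p, _ in ring]
    idx = bisect.bisect(positions, ring_position(key)) % len(ring)
    return ring[idx][1]

keys = [f"object-{i}" for i in range(10_000)]

ring_before = build_ring(["node-a", "node-b", "node-c"])
ring_after = build_ring(["node-a", "node-b", "node-c", "node-d"])

# Every key whose owner changed must be physically copied: that copy
# traffic is the rebalance workload competing with client traffic.
moved = sum(owner(ring_before, k) != owner(ring_after, k) for k in keys)
print(f"{moved} of {len(keys)} objects must move after adding one node")
```

Running this shows that a nonzero fraction of existing objects changes owner as soon as one node joins, which is exactly the internal traffic the two prioritization strategies above try to manage.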
OpenIO engineers made design choices that avoid unnecessary rebalances. When you add servers, they are integrated into the data placement scheme and simply start receiving new data; existing data stays where it is. With this approach, the platform can be deployed in demanding environments and meet strict SLA requirements. This is well illustrated by the large email storage deployments made in France for the two main telco operators, supporting tens of millions of users. Of course, if a rebalance is needed, OpenIO can trigger one. Other products don't have that choice; OpenIO has both methods ready, proven, and deployed at large scale.
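The contrast can be sketched with a toy directory-based placement model (a hypothetical illustration of the general idea, not OpenIO's actual internals): each object's location is recorded in a directory, new writes favor the least-loaded servers, and adding a server therefore moves nothing.

```python
class DirectoryPlacement:
    """Toy directory-based placement: locations are recorded per object,
    so growing the cluster never forces existing objects to move."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.directory = {}  # object name -> server holding it

    def put(self, name):
        # New writes go to the least-loaded server, so freshly added
        # capacity fills up naturally without touching existing data.
        load = {s: 0 for s in self.servers}
        for s in self.directory.values():
            load[s] += 1
        target = min(self.servers, key=lambda s: load[s])
        self.directory[name] = target
        return target

    def add_server(self, server):
        self.servers.append(server)  # note: no rebalance is triggered

cluster = DirectoryPlacement(["srv-1", "srv-2"])
for i in range(100):
    cluster.put(f"mail-{i}")

before = dict(cluster.directory)
cluster.add_server("srv-3")
for i in range(100, 200):
    cluster.put(f"mail-{i}")

moved = sum(before[k] != cluster.directory[k] for k in before)
new_on_srv3 = sum(1 for s in cluster.directory.values() if s == "srv-3")
print(f"{moved} existing objects moved")        # stays 0
print(f"{new_on_srv3} new objects landed on srv-3")
```

In this sketch, zero existing objects move when srv-3 joins, while new writes are steered to it until the load evens out; an explicit rebalance remains possible but becomes a deliberate choice rather than an automatic reaction.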
So when you select an object storage solution, please ask this question: “How does the cluster react when servers are added?” It is a very simple question, and the answer will tell you where the product stands.