How Aerospike achieves fine-grained global replication

Aerospike’s Cross-Datacenter Replication with Expressions makes it easy to route the right data at the right time across global applications to meet compliance mandates and reduce server, cloud, and bandwidth costs.


Modern digital transformation is data-intensive, and it raises three issues that often conflict with one another.

First, large amounts of data need to be ingested in real time, and the latest data must be available across geographically distributed systems so that enterprises can make the most accurate real-time decisions to delight their customers. To date, most companies have simply duplicated and moved large amounts of data to keep up, meeting the business need regardless of the cost, risk, or impracticality of moving such large data volumes on a global scale.

Second, moving data back and forth between the distributed data management systems that serve a global enterprise’s customers incurs bandwidth, server, and cloud costs.

Third, regulations such as the California Consumer Privacy Act (CCPA) and the EU’s General Data Protection Regulation (GDPR) impose compliance requirements that dictate which data can move and under what conditions. Regulators may audit compliance with these rules and levy steep fines for any failures they uncover, and those fines are compounded by bad customer experiences and the resulting brand damage.

To build a system that uses data for real-time decisions while maintaining compliance and minimizing costs, enterprises need a distributed data system with mechanisms that make it easy to solve the complex problems arising from the interplay of these three sets of requirements.

Cross-Datacenter Replication (XDR) with Expressions, in Aerospike Database 5, enables enterprises to build a global data hub that addresses these complex and often changing requirements.

Aerospike XDR filtering with Expressions

Aerospike XDR uses asynchronous replication to connect two or more Aerospike clusters located at multiple geographically distributed sites. A site can be a physical rack in a data center (DC), an entire DC, an availability zone in a cloud region, or a cloud region. XDR can extend the data infrastructure to any number of clusters with control, flexibility, ease of administration, faster writes, and regional autonomy.

The XDR shipping algorithm is based on a record’s last-update time (LUT), which results in simpler and more efficient metadata management. LUT-based shipping makes it easy to resynchronize a DC starting from a specific point in time in the past. XDR also supports dynamic configuration of, and independent shipping between, any pair of source and target sites.
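To make LUT-based shipping concrete, here is a minimal, self-contained Java model. It is a sketch of the idea only, not Aerospike’s implementation: each record carries a last-update time, a per-DC checkpoint records how far shipping has progressed, and resynchronizing a DC simply rewinds its checkpoint to an earlier time. The class and method names (`LutShippingSketch`, `selectToShip`, `rewind`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of last-update-time (LUT) based shipping; not Aerospike code.
final class LutShippingSketch {
    // Per-target-DC checkpoint: records updated after this time still need shipping.
    private final Map<String, Long> checkpoints = new TreeMap<>();

    // Returns keys of records whose LUT is newer than the DC's checkpoint,
    // then advances the checkpoint to the newest LUT seen.
    List<String> selectToShip(String dc, Map<String, Long> recordLuts) {
        long since = checkpoints.getOrDefault(dc, Long.MIN_VALUE);
        long newest = since;
        List<String> toShip = new ArrayList<>();
        for (Map.Entry<String, Long> e : recordLuts.entrySet()) {
            if (e.getValue() > since) {
                toShip.add(e.getKey());
                newest = Math.max(newest, e.getValue());
            }
        }
        checkpoints.put(dc, newest);
        return toShip;
    }

    // Resynchronize a DC from a point in the past by rewinding its checkpoint.
    void rewind(String dc, long pointInTime) {
        checkpoints.put(dc, pointInTime);
    }
}
```

In this model, the only per-record metadata needed is the LUT, so resynchronizing a DC from a past time is a single checkpoint update followed by a normal shipping pass.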


Figure 1: Data replication to geographically distributed clusters.

With the addition of Expressions to XDR, Aerospike can now apply fine-grained, record-level filtering to XDR routing rules, sending the right data to the right target at the right time.

Expressions use a functional syntax that is more intuitive and broadens the criteria that can be used to select records. They allow control over which namespaces, sets, or bins are shipped. A namespace is a collection of records that share one specific storage engine. Within a namespace, records can be grouped into sets, and each record is subdivided into bins.

Aerospike nodes can have multiple namespaces, and the different namespaces can be configured to ship to different remote clusters. In Figure 2, DC1 is shipping namespaces NS1 and NS2 to DC2, and shipping namespace NS3 to DC3.


Figure 2: Flexible clustering.
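The routing in Figure 2 amounts to a table from namespace to target DCs. The following plain-Java sketch mirrors that figure; it is an illustrative model, not Aerospike configuration syntax, and `NamespaceRouting` and `targetsFor` are hypothetical names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical routing table mirroring Figure 2; not Aerospike configuration syntax.
final class NamespaceRouting {
    // Which remote DCs each namespace on DC1 ships to.
    static final Map<String, List<String>> ROUTES = Map.of(
            "NS1", List.of("DC2"),
            "NS2", List.of("DC2"),
            "NS3", List.of("DC3"));

    // Target DCs for a record in the given namespace; empty if the namespace is not shipped.
    static List<String> targetsFor(String namespace) {
        return ROUTES.getOrDefault(namespace, List.of());
    }
}
```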

Enterprises can use this flexibility to apply different replication rules to different data sets. XDR can be configured to ship only certain sets to a given data center, so the combination of namespace and set determines whether a record is shipped. When only part of a namespace’s data in the local cluster needs to be replicated to other clusters, enterprises should separate that data into sets. XDR can also be configured to ship only certain bins and ignore the rest.
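The two decisions described above, whether a record ships (namespace plus set) and which of its bins ship, can be sketched as follows. This is a hypothetical per-DC rule object written for illustration; the names and structure are assumptions, not Aerospike’s configuration model.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical per-DC shipping rule: which sets of a namespace ship, and which bins.
// Names and structure are illustrative, not Aerospike's configuration model.
final class SetBinFilter {
    final String namespace;
    final Set<String> shippedSets;   // only records in these sets are shipped
    final Set<String> shippedBins;   // only these bins are included in a shipped record

    SetBinFilter(String namespace, Set<String> shippedSets, Set<String> shippedBins) {
        this.namespace = namespace;
        this.shippedSets = shippedSets;
        this.shippedBins = shippedBins;
    }

    // The namespace/set combination decides whether a record ships at all.
    boolean shouldShip(String recordNamespace, String recordSet) {
        return namespace.equals(recordNamespace) && shippedSets.contains(recordSet);
    }

    // Keep only the configured bins; all other bins stay local.
    Map<String, Object> project(Map<String, Object> bins) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : bins.entrySet()) {
            if (shippedBins.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

For example, a rule for namespace NS1 that ships only the hypothetical `orders` set with bins `id` and `total` would leave an `email` bin in the local cluster.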

XDR filter expressions can be set with the new info command xdr-set-filter, or programmatically through a new client API. The next Aerospike client release will support this API in C, C#, and Java.
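As a sketch of the info-command route, the helper below assembles an xdr-set-filter command string. It assumes the command takes the form `xdr-set-filter:dc=<DC>;namespace=<NS>;exp=<base64>`, where the Base64 text is the serialized expression (the Java client’s `Expression.getBase64()` produces such a string), and that `exp=null` clears a filter; check both assumptions against the documentation for your server version. `XdrFilterCommand` and its methods are hypothetical names.

```java
// Hypothetical helper; assumes the info command form
// xdr-set-filter:dc=<DC>;namespace=<NS>;exp=<base64-expression>
final class XdrFilterCommand {
    // Build the command that installs a Base64-encoded filter for one DC/namespace pair.
    static String set(String dc, String namespace, String base64Expression) {
        return "xdr-set-filter:dc=" + dc + ";namespace=" + namespace + ";exp=" + base64Expression;
    }

    // Assumption: passing exp=null clears the filter for this DC/namespace pair.
    static String clear(String dc, String namespace) {
        return set(dc, namespace, "null");
    }
}
```

The resulting string could then be sent to a node with a tool such as asinfo (for example, `asinfo -v '<command>'`), or the same filter could be set through the client API instead.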

Expressions can also be applied to any query or API call in the database. For example, a secondary index query can include an expression that narrows the records it returns. For replication, this means global enterprises need not worry about how much data they have: with Expressions, only the data that has to move will move, and it will move efficiently. This dynamic, fine-grained control helps global enterprises comply with a wave of new privacy regulations while reducing server, cloud, and bandwidth usage.

Meeting compliance mandates

GDPR and other emerging European Union and state-by-state US regulations impose strict rules on how enterprises gather, store, map, use, and share consumer data across regions or with partner entities. They require enterprises to protect every aspect of consumer data in accordance with each consumer’s stated preferences.

Aerospike XDR with Expressions helps global enterprises meet these compliance mandates efficiently and safeguard consumer data. Its management granularity helps eliminate redundant copies of data, minimizing the footprint of personal data. This helps enterprises comply with what consumers want and show that they took proper steps if authorities investigate.

Aerospike XDR with Expressions also allows enterprises to manage data movement across geographies with different data control requirements. Expressions in Aerospike have a rich syntax with access to both record data and metadata, permitting precisely crafted filtering policies.
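To illustrate a policy that combines metadata with record data, here is a plain-Java model of the kind of rule the Java client’s `Exp` API can express by mixing metadata accessors such as `Exp.setName()` and `Exp.lastUpdate()` with bin accessors. The code does not use the Aerospike client; the `customers` set, the `consent` bin, and all class names are hypothetical, and treating a missing bin as "no consent" is a choice made for this sketch.

```java
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java model of a record as an XDR filter sees it: metadata plus bin data.
// Field, bin, and set names here are hypothetical.
final class FilterModel {
    static final class Rec {
        final String setName;            // metadata
        final long lastUpdateMillis;     // metadata
        final Map<String, Object> bins;  // record data

        Rec(String setName, long lastUpdateMillis, Map<String, Object> bins) {
            this.setName = setName;
            this.lastUpdateMillis = lastUpdateMillis;
            this.bins = bins;
        }
    }

    // Ship only "customers" records updated at or after `cutoff` whose
    // hypothetical "consent" bin is true; a missing bin counts as no consent.
    static Predicate<Rec> consentedCustomersSince(long cutoff) {
        Predicate<Rec> inSet = r -> "customers".equals(r.setName);
        Predicate<Rec> recent = r -> r.lastUpdateMillis >= cutoff;
        Predicate<Rec> consented = r -> Boolean.TRUE.equals(r.bins.get("consent"));
        return inSet.and(recent).and(consented);
    }
}
```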

Every global enterprise needs some way to archive data and back it up in other data centers. Depending on the regulations that apply, an enterprise might not want to move data from the Netherlands to the UK, for example, and it can write an expression to enforce such a rule at the record level. The following expression, written with the Aerospike Java client’s Exp API, demonstrates a geographic restriction on where records are stored: set as the XDR filter for a remote DC, it ships a record to that DC only if the record’s ISOrgn bin tags it as originating from the Netherlands ("NL"):

import com.aerospike.client.exp.*; // Exp and Expression from the Aerospike Java client
Expression exp = Exp.build(Exp.eq(Exp.stringBin("ISOrgn"), Exp.val("NL")));

Data regulations also change frequently. Because XDR can be reconfigured dynamically and Expressions are applied at the record level, Aerospike reduces the complexity of setting up or changing these rules in the database.
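The following sketch models dynamic reconfiguration: filters are stored per (DC, namespace) pair and can be replaced or removed at runtime, with each change applying to the next record evaluated. It is a hypothetical plain-Java illustration, not Aerospike’s internal design; `FilterRegistry` and its methods are invented names, and the `ISOrgn` bin comes from the example above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical registry of per-(DC, namespace) filters that can be replaced at runtime,
// modeling dynamic reconfiguration; not Aerospike's internal design.
final class FilterRegistry {
    private final Map<String, Predicate<Map<String, Object>>> filters = new ConcurrentHashMap<>();

    private static String key(String dc, String namespace) {
        return dc + "/" + namespace;
    }

    // Install or replace a rule; it applies to the next record evaluated.
    void set(String dc, String namespace, Predicate<Map<String, Object>> filter) {
        filters.put(key(dc, namespace), filter);
    }

    // Remove a rule; in this model, with no filter every record in the namespace ships.
    void clear(String dc, String namespace) {
        filters.remove(key(dc, namespace));
    }

    boolean shouldShip(String dc, String namespace, Map<String, Object> bins) {
        Predicate<Map<String, Object>> f = filters.get(key(dc, namespace));
        return f == null || f.test(bins);
    }
}
```

A concurrent map keeps replacement safe while other threads are evaluating records, which is the property a live rule change needs in this model.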

Reducing bandwidth, server, and cloud resources

Aerospike XDR with Expressions makes data transfer more efficient. Enterprises can reduce the volume of data shipped to destination DCs, which cuts network traffic as well as storage and processing requirements at those destinations. In a hub-and-spoke XDR topology like the one in Figure 3, the savings from avoiding overprovisioning of the destination clusters can be significant.
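The volume argument can be checked with simple arithmetic. This hypothetical Java sketch compares the bytes a hub sends when every spoke receives every record against the bytes sent when each spoke receives only the records its filter accepts. The record sizes, regions, and filters are illustrative inputs, not measurements.

```java
import java.util.List;
import java.util.function.Predicate;

// Illustrative comparison of full replication vs. filtered shipping in a hub-and-spoke
// topology. Record sizes and filters are hypothetical inputs, not measurements.
final class HubSpokeVolume {
    static final class Rec {
        final String region;
        final long sizeBytes;

        Rec(String region, long sizeBytes) {
            this.region = region;
            this.sizeBytes = sizeBytes;
        }
    }

    // Bytes the hub sends if every spoke receives every record.
    static long fullReplication(List<Rec> records, int spokes) {
        long total = 0;
        for (Rec r : records) total += r.sizeBytes;
        return total * spokes;
    }

    // Bytes the hub sends if each spoke receives only records its filter accepts.
    static long filtered(List<Rec> records, List<Predicate<Rec>> spokeFilters) {
        long total = 0;
        for (Predicate<Rec> f : spokeFilters) {
            for (Rec r : records) {
                if (f.test(r)) total += r.sizeBytes;
            }
        }
        return total;
    }
}
```

With two spokes that each accept only their own region’s records, and a data set split evenly by bytes between the two regions, the filtered total is half of full replication; the real ratio depends entirely on how selective each spoke’s filter is.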


Figure 3: Aerospike’s XDR topology with Expressions.

This approach brings further productivity gains and cost savings. First, filtering natively in the database can reduce the programming needed in each application. Second, lower data volumes reduce egress costs, the fees charged for moving data across or out of public clouds. By using Expressions to restrict transfers to exactly the data that needs to leave the cloud, enterprises can lower those costs.

Simplifying application integration and global data management

Many enterprises are racing forward with digital transformation, and the applications behind it, in the wake of the pandemic. Poor data management can cause these efforts to fail by dragging down business processes and increasing costs. Aerospike’s XDR with Expressions delivers precise control of data throughout the application stack without adding complexity, leading to a lower total cost of ownership for complex global systems.

Srini Srinivasan is the founder and chief product officer at Aerospike, a leader in next-generation, real-time NoSQL data solutions. He has two decades of experience designing, developing and operating high-scale infrastructures. He also has more than 30 patents in database, web, mobile and distributed systems technologies. He co-founded Aerospike to solve the scaling problems he experienced with internet and mobile systems while he was senior director of engineering at Yahoo.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Copyright © 2021 IDG Communications, Inc.