The most important new features in CockroachDB

CockroachDB 20.2 brings a Kubernetes operator, spatial data, a new storage engine, SQL enhancements, and much more, extending the range of workloads for which the database can be used.

CockroachDB was architected from the ground up to be cloud-native, so that it can scale elastically and survive any failure natively, without any additional setup or configuration. Since its inception, the Cockroach Labs team has regularly updated and improved the distributed database. Today, it helps thousands of developers build data-driven applications in the cloud more efficiently and is the foundation of many of the game-changing applications driving the modern digital economy.

The latest release, CockroachDB 20.2, includes support for spatial data and introduces a custom Kubernetes Operator. All the while, the team continues to improve the security and management capabilities of the database, along with considerable improvements to the overall performance of CockroachDB. Notably, with this release, the majority of these new capabilities are available in the free version of the database, CockroachDB Core.

CockroachDB on Kubernetes

Unlike traditional monolithic databases, CockroachDB was architected from the ground up to deliver on the core distributed principles of Kubernetes. Cockroach Labs defines the core principles of a Kubernetes-native database with four key descriptors: disposability, API symmetry, shared-nothing architecture, and horizontal scaling:

  • Disposability refers to a database's ability to handle processes that stop, start, or crash with little or no notice.
  • API symmetry means any node can serve any query and return a consistent response, so the database can scale without disrupting existing application instances.
  • Shared-nothing ensures that a database can operate without any centralized coordinator or single point of failure.
  • Horizontal scaling gives the impression of a single database that magically becomes twice as powerful when you double its nodes, with capacity growing linearly as you scale out.

With CockroachDB 20.2, Cockroach Labs is introducing CockroachDB on Kubernetes, which packages the distributed database with a new, open-source Kubernetes operator (currently in beta). Cockroach Labs has learned a lot about Kubernetes over the past few years by deploying and managing thousands of clusters on its own database-as-a-service, CockroachCloud. Now the team has packaged many of those learnings into a Kubernetes operator.

While legacy databases can run with Kubernetes, they are simply not designed to run on Kubernetes and fail to deliver on the ease of scale and resilience that this cloud-native platform provides. They hold back workloads from truly taking advantage of the platform.

With CockroachDB on Kubernetes, your relational database is orchestrated efficiently in the same environment as the distributed services and applications it supports. Building on StatefulSets, you simply attach storage to each Kubernetes pod, and CockroachDB handles scaling, resilience, and data distribution. There is no need for additional, complex tasks to manage shards or deal with the inevitable pod failures.

CockroachDB on Kubernetes allows you to: 

  • Deploy with an operator, simplifying basic configuration and common installation tasks.
  • Roll out upgrades, enabling you to apply fixes and upgrades incrementally across nodes (pods) in production, and apply schema changes on the fly.
  • Scale your database with ease, by spinning up new instances and scaling without manual sharding of data or manipulation of the database.
  • Survive pod failures, by automating replication of data across nodes (pods) so you can survive any failure and avoid downtime.

Spatial data in a distributed database

In CockroachDB 20.2, the focus is on creating developer tools that will extend the number of workloads for which the database can be used. This release introduces the ability to store and index spatial data using PostGIS-compatible SQL syntax in the free version of the database, CockroachDB Core. The combination of a distributed database with spatial capabilities opens the door for even more innovation and will empower new workloads for Internet of Things (IoT), transportation, and environmental applications.

Spatial data powers some of the world’s most innovative apps and services, letting you answer questions like “Where’s the nearest gas station?”, “How long will it take for my ride-sharing vehicle to arrive?”, and even “Where can I catch a Pokémon?” The only problem is that this data has been locked away in brittle legacy systems or separate, specialized databases, making it difficult for developers to support large datasets in the cloud.

CockroachDB is the first cloud-native SQL database to include spatial data types and associated libraries built from the ground up for a distributed environment. With CockroachDB 20.2, Cockroach Labs gives spatial data the same first-class treatment as other data types, making it easier to develop applications that use it. CockroachDB users can now scale spatial data effortlessly and have confidence that it will survive outages.
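To make the “nearest gas station” question concrete, here is a minimal Python sketch that issues PostGIS-compatible SQL through a DB-API connection. It assumes a driver using the "pyformat" parameter style (psycopg2 does; the article names no driver), and the `gas_stations` table, its columns, and the 5 km default radius are hypothetical. The `USING GIST` index syntax and the `ST_DWithin`/`ST_Distance` calls follow PostGIS conventions; treat exact coverage as something to confirm against the CockroachDB 20.2 spatial documentation.

```python
"""Sketch: radius-limited nearest-neighbor search with PostGIS-compatible SQL.

Assumptions: a DB-API 2.0 connection with the "pyformat" paramstyle
(e.g. psycopg2); table and column names are hypothetical.
"""

CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS gas_stations (
    id       INT PRIMARY KEY,
    name     STRING NOT NULL,
    location GEOGRAPHY NOT NULL
)"""

# PostGIS-style index syntax (assumed to be accepted for spatial columns).
CREATE_INDEX_SQL = (
    "CREATE INDEX IF NOT EXISTS gas_stations_location_idx "
    "ON gas_stations USING GIST (location)"
)

# Keep stations within a radius (meters, since the column is GEOGRAPHY),
# then sort the survivors by distance from the given point.
NEAREST_SQL = """
SELECT name,
       ST_Distance(location, ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::GEOGRAPHY) AS meters
FROM gas_stations
WHERE ST_DWithin(location, ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::GEOGRAPHY, %(radius)s)
ORDER BY meters
LIMIT %(limit)s"""


def nearest_stations(conn, lon, lat, radius_m=5000.0, limit=5):
    """Return up to `limit` (name, meters) rows within `radius_m` of (lon, lat)."""
    cur = conn.cursor()
    try:
        cur.execute(NEAREST_SQL, {"lon": lon, "lat": lat, "radius": radius_m, "limit": limit})
        return cur.fetchall()
    finally:
        cur.close()
```

With psycopg2 installed, `nearest_stations(psycopg2.connect("postgresql://root@localhost:26257/defaultdb?sslmode=disable"), -73.99, 40.73)` would run the query against a local insecure node on CockroachDB's default port. Filtering with `ST_DWithin` before sorting keeps the candidate set small, which is where a spatial index can help.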

CockroachDB now supports the following spatial features, all of which are open source, available for free, and accessible with PostGIS-compatible SQL:

Pebble storage engine

Earlier this year, Cockroach Labs built a new, open-source storage engine, Pebble, which replaces the database’s previous storage engine, RocksDB, in CockroachDB 20.2. An open-source key-value store written in Go, Pebble brings a number of improvements to CockroachDB:

  • Better performance and stability
  • Avoids the overhead and complexity of crossing the C-Go (cgo) boundary that the C++-based RocksDB required
  • Gives more control over future enhancements tailored for CockroachDB needs

Pebble is the default storage engine in CockroachDB 20.2, with the option to enable RocksDB if desired.
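The storage engine is chosen per node when it starts. The Python sketch below builds a `cockroach start` command line; it assumes the `--storage-engine` flag of the 20.x `cockroach start` command (values `pebble` or `rocksdb`), and the helper name, store path, and addresses are hypothetical. Confirm the flag with `cockroach start --help` on your binary before relying on it.

```python
def cockroach_start_args(store_dir, listen_addr, join_addrs, engine="pebble"):
    """Build argv for starting one insecure local test node.

    engine: "pebble" (the 20.2 default) or "rocksdb" (opt-in fallback).
    Assumption: the binary accepts --storage-engine, as 20.x releases do.
    """
    if engine not in ("pebble", "rocksdb"):
        raise ValueError(f"unsupported storage engine: {engine!r}")
    return [
        "cockroach", "start",
        "--insecure",  # local testing only; never use in production
        f"--store={store_dir}",
        f"--listen-addr={listen_addr}",
        f"--join={','.join(join_addrs)}",
        f"--storage-engine={engine}",
    ]
```

Passing the result to `subprocess.run(..., check=True)` would launch the node, assuming `cockroach` is on the `PATH`. Every node in a test cluster would typically be started with the same engine setting.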

TPC-C performance update

Cockroach Labs is committed to constantly improving CockroachDB’s performance and has made significant advances with CockroachDB 20.2.

TPC-C is the industry-standard transactional database benchmark, simulating a wholesale order-processing environment. The company has written a lot about TPC-C in the past because it considers TPC-C the best measure of OLTP database performance. CockroachDB 20.2 passed TPC-C at a maximum of 140K warehouses (up from the 100K previously reported) with a maximum throughput of 1.7M New-Order transactions per minute (tpmC), which represents a 40% performance improvement over the past year.
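The figures above can be sanity-checked with a few lines of arithmetic. This sketch assumes the commonly cited TPC-C ceiling of about 12.86 tpmC per warehouse, set by the benchmark's keying and think times; that constant does not appear in the article. Under that assumption, 1.7M tpmC across 140K warehouses is roughly 94% of the theoretical maximum, and the 40% figure appears to track the growth from 100K to 140K warehouses.

```python
# Figures quoted in the article (rounded as published).
PREVIOUS_WAREHOUSES = 100_000
CURRENT_WAREHOUSES = 140_000
REPORTED_TPMC = 1_700_000

# Assumption: TPC-C keying and think times cap each warehouse at ~12.86 tpmC.
MAX_TPMC_PER_WAREHOUSE = 12.86


def growth_pct(old, new):
    """Percentage increase from old to new."""
    return 100 * (new - old) / old


def efficiency_pct(tpmc, warehouses, max_per_warehouse=MAX_TPMC_PER_WAREHOUSE):
    """Measured tpmC as a percentage of the ceiling for this warehouse count."""
    return 100 * tpmc / (warehouses * max_per_warehouse)


print(f"warehouse growth: {growth_pct(PREVIOUS_WAREHOUSES, CURRENT_WAREHOUSES):.0f}%")
print(f"tpmC per warehouse: {REPORTED_TPMC / CURRENT_WAREHOUSES:.2f}")
print(f"efficiency vs. ceiling: {efficiency_pct(REPORTED_TPMC, CURRENT_WAREHOUSES):.1f}%")
```

Running it prints a 40% warehouse growth, about 12.14 tpmC per warehouse, and an efficiency of about 94.4%.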

Cockroach Labs also ran TPC-H, which extends its benchmarking work to complex analytic queries. While CockroachDB is primarily a transaction-oriented database, it can also perform the complex joins and aggregations that a benchmark like TPC-H measures. On TPC-H, the team saw lower query latency for 20 of the 22 queries, with Query 9 latency improving by 80x.

Backup and restore comes to CockroachDB Core

CockroachDB Core, the community option, enables users to build scalable production applications for free. With each new release, Cockroach Labs carefully reviews all capabilities to determine whether any existing or new features should be placed in Core. The company has outlined a set of guidelines for making these decisions, and it increasingly errs on the side of Core.

Cockroach Labs has added more advanced backup and restore capabilities to Core, including BACKUP, RESTORE, and EXPORT. CockroachDB Core clusters have grown to support terabytes of data, and Cockroach Labs recognizes that scalable, distributed backups are crucial for these types of production applications. These additions will help community users achieve both easier scale and peace of mind in production, with rock-solid disaster recovery plans.
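Because BACKUP, RESTORE, and EXPORT are ordinary SQL statements, any PostgreSQL driver can issue them. The Python helpers below only build statement text; the names `bank` and `bank.accounts` and the `nodelocal://1/...` destinations are hypothetical (production clusters would normally target cloud storage), and identifiers are assumed to be trusted rather than user input. Restoring a database this way assumes the database does not already exist.

```python
def quote_literal(value):
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def backup_sql(database, uri):
    """Full backup of one database to the given storage URI."""
    return f"BACKUP DATABASE {database} TO {quote_literal(uri)}"


def restore_sql(database, uri):
    """Restore a database from a backup (the database must not already exist)."""
    return f"RESTORE DATABASE {database} FROM {quote_literal(uri)}"


def export_csv_sql(table, uri):
    """Export a table's rows as CSV files under the given storage URI."""
    return f"EXPORT INTO CSV {quote_literal(uri)} FROM TABLE {table}"


def run_all(conn, statements):
    """Execute each statement on a DB-API connection (assumed to have autocommit on)."""
    cur = conn.cursor()
    try:
        for stmt in statements:
            cur.execute(stmt)
    finally:
        cur.close()
```

Enabling autocommit (for example `conn.autocommit = True` in psycopg2) keeps each job-style statement out of an explicit transaction, which is the safer default for BACKUP and RESTORE.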

Management improvements and easier debugging

CockroachDB is already low-touch, requiring minimal operations, and CockroachDB 20.2 enables teams to save even more time with scheduled automated backups, faster data imports, and more options for bulk data imports. Furthermore, CockroachDB’s monitoring UI displays key metrics that are critical for troubleshooting. CockroachDB 20.2 also includes a SQL transactions page and database sessions page to help developers understand query performance.
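Scheduled backups are also defined in SQL. The sketch below builds a `CREATE SCHEDULE ... FOR BACKUP` statement; the database, storage URI, and recurrence are hypothetical, and the `FULL BACKUP ALWAYS` clause (a full backup on every run instead of incrementals) is included on the assumption that Core clusters schedule full backups only. Verify the exact grammar in the 20.2 `CREATE SCHEDULE FOR BACKUP` reference before using it.

```python
def quote_literal(value):
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def backup_schedule_sql(database, uri, recurrence="@daily"):
    """Build a statement that backs up `database` to `uri` on a cron-style schedule.

    Assumption: FULL BACKUP ALWAYS makes every run a full backup,
    avoiding incremental backups.
    """
    return (
        f"CREATE SCHEDULE FOR BACKUP DATABASE {database} "
        f"INTO {quote_literal(uri)} "
        f"RECURRING {quote_literal(recurrence)} "
        "FULL BACKUP ALWAYS"
    )
```

The recurrence accepts cron syntax as well as shorthands such as `@daily`, so `backup_schedule_sql("bank", "nodelocal://1/scheduled", "0 2 * * *")` would request a nightly run at 02:00.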

Additional SQL functionality

CockroachDB is wire-compatible with PostgreSQL and supports standard SQL syntax, so you can use it as your next-generation relational store. To further ease deployments, Cockroach Labs improved its SQL capabilities in 20.2 so that developers can access data in familiar ways, adding:

  • User-defined schemas: Structure your data hierarchy with schemas, which are common in relational databases including PostgreSQL. This update makes CockroachDB more familiar to developers, more compatible with PostgreSQL applications and tools, and more flexible in supporting data isolation patterns such as microservices.
  • Partial indexes: Index only the subset of rows needed for fast reads. More precise indexing reduces the amount of data your indexes store and avoids the write overhead of indexing rows that do not need to be indexed.
  • Materialized views: Reduce the cost of frequently run queries by storing their results and refreshing them only when necessary.
  • Enumerated types (ENUMs): With this popular data type, you can restrict inputs to a defined set of values, like a drop-down list.
  • Improved foreign key performance: As a crucial component of relational databases, foreign keys protect data integrity by creating references between two tables, ensuring that a value in one table refers to a valid row in the other. In CockroachDB 20.2, faster foreign key checks make them practical for more customers.
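The five SQL additions listed above can be seen together in one small schema. This Python sketch lists DDL for a user-defined schema, an ENUM type, a foreign key, a partial index, and a materialized view, then applies it over a DB-API connection. The `shop` names are invented for illustration, and because feature gating varied across 20.x releases, check the 20.2 docs if a statement is rejected.

```python
SCHEMA_STATEMENTS = [
    # User-defined schema: a namespace inside the current database.
    "CREATE SCHEMA IF NOT EXISTS shop",
    # ENUM: status can only take one of these values.
    "CREATE TYPE shop.order_status AS ENUM ('pending', 'shipped', 'delivered')",
    "CREATE TABLE shop.customers (id INT PRIMARY KEY, email STRING NOT NULL)",
    # Foreign key: every order must reference an existing customer.
    """CREATE TABLE shop.orders (
        id          INT PRIMARY KEY,
        customer_id INT NOT NULL REFERENCES shop.customers (id),
        status      shop.order_status NOT NULL DEFAULT 'pending',
        total       DECIMAL(10, 2) NOT NULL
    )""",
    # Partial index: only pending orders are indexed, so rows in other
    # states do not have to be maintained in this index.
    "CREATE INDEX orders_pending_idx ON shop.orders (customer_id) WHERE status = 'pending'",
    # Materialized view: stored query results, recomputed only on REFRESH.
    """CREATE MATERIALIZED VIEW shop.revenue_by_status AS
        SELECT status, sum(total) AS revenue FROM shop.orders GROUP BY status""",
]

REFRESH_SQL = "REFRESH MATERIALIZED VIEW shop.revenue_by_status"


def apply_statements(conn, statements):
    """Run each DDL statement in order on a DB-API connection."""
    cur = conn.cursor()
    try:
        for stmt in statements:
            cur.execute(stmt)
    finally:
        cur.close()
```

Running it with autocommit enabled (an assumption that keeps each DDL statement in its own transaction) and calling `apply_statements(conn, [REFRESH_SQL])` after bulk order changes shows the trade-off of a materialized view: reads are cheap, and staleness is controlled by when you refresh.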

Enhanced support for Java and Ruby

CockroachDB supports a variety of popular data access tools, including ORMs, making it easier to develop in your preferred programming language. With CockroachDB 20.2, the company improved support for Java by adding better compatibility with Hibernate, MyBatis, Spring Data JPA, and Spring Data JDBC, and improved support for Ruby by adding compatibility with Active Record. It also built an adapter for upper/db, a data access layer for Go.

CockroachDB for modern cloud applications

Cockroach Labs is committed to making CockroachDB the database of choice for developers everywhere, no matter the use case. With CockroachDB 20.2, the company has listened to users and made improvements to all areas of the database, as well as introduced a new package, CockroachDB on Kubernetes, to ease deployments in cloud-native environments.

Hundreds of organizations, including Comcast, DoorDash, eBay, Nubank, and SpaceX, use CockroachDB as the backbone for transactional applications in the cloud. The improvements in CockroachDB 20.2 strengthen the product’s position as the right database for the latest wave of data-intensive applications. It gives you a flexible database for modern cloud applications that automatically handles all rote operations, scaling, and resilience. And that means you can focus your energy and output on developing, creating, and innovating.

Jim Walker is vice president of product marketing at Cockroach Labs, creator of CockroachDB. During his career, Jim has brought multiple products to market in a variety of fields, including data loss prevention, master data management (MDM), Hadoop, predictive analytics, Kubernetes, and distributed SQL. He specializes in open source business models and focuses on accelerating the emergence of categories and broad developer movements.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Copyright © 2021 IDG Communications, Inc.