CockroachDB review: Distributed SQL shifts into high gear

CockroachDB 19.2.2 impresses with lower latency, higher throughput, greater scalability, and new fully managed CockroachCloud service

CockroachDB review: Distributed SQL shifts into high gear
Peter Griffin (CC0)
At a Glance

When I reviewed CockroachDB early in 2018, I found that the distributed SQL database, built on top of a transactional and consistent key-value store, was designed to survive disk, machine, rack, and even data center failures with minimal latency disruption and no manual intervention. That is all still true.

At the time, CockroachDB had three large deficits, in my opinion: Limited optimization of SQL JOIN queries, no fully managed service, and no support for JSON or Protobuf data types. I’m happy to report that all of these lacks have since been remedied. JOINs now use a cost-based optimizer, the CockroachCloud is in beta, and a JSONB data type has been implemented.

What’s still missing in CockroachDB? Quite a bit, if you care about PostgreSQL compatibility:

  • Stored procedures and functions
  • Triggers
  • Events
  • User-defined functions
  • Full-text functions and indexes
  • Geospatial functions and indexes
  • Drop primary key
  • XML functions
  • Savepoints
  • Column-level privileges
  • CREATE TEMPORARY TABLE syntax
  • XA syntax

Most existing OLTP PostgreSQL applications can be ported to CockroachDB with some workarounds at the application level. If you have used geospatial features (PostGIS) or full-text search, however, I don’t know of a good way to implement them in the current version of CockroachDB.

There is a tracking issue for geospatial indexes and functions, but even though it has been open for several years, the status of geospatial features is only “potential.” There has been a user poll about desired geospatial use cases, but that’s not the same as promising the feature.

Full-text indexing is “planned,” but not yet on the roadmap. Several people have suggested integrating CockroachDB with Bleve to accomplish this. Again, no promises.

In June 2019, Cockroach changed its OSS license from APL-2 to an “extremely permissive version of the Business Source License (BSL).” This was basically in response to Amazon Web Services offering a forked version of ElasticSearch as a paid service, and allows Cockroach to offer its own database as a service without worrying about AWS or any other cloud vendor stealing its thunder.

CockroachCloud is a fully hosted and fully managed service created and owned by Cockroach Labs that claims to make deploying, scaling, and managing CockroachDB effortless. CockroachCloud currently runs on Amazon Web Services and the Google Cloud Platform.

CockroachDB installation and basic testing 

I installed CockroachDB 19.2.2 on my MacBook Pro using Homebrew. I first explicitly uninstalled the old version (1.1.3) I had left over from my initial review.

Homebrew is specific to Macs. It’s only one of five ways of installing CockroachDB on Macs, the others being to download the binary; use Kubernetes; use Docker; and build from source. Linux and Windows have fewer installation options.

martinheller@Martins-Retina-MacBook ~ % brew uninstall cockroach
Uninstalling /usr/local/Cellar/cockroach/1.1.3... (5 files, 72.9MB)

martinheller@Martins-Retina-MacBook ~ % brew install cockroachdb/tap/cockroach
==> Tapping cockroachdb/tap
Cloning into '/usr/local/Homebrew/Library/Taps/cockroachdb/homebrew-tap'...
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 6 (delta 0), reused 3 (delta 0), pack-reused 0
Unpacking objects: 100% (6/6), done.
Tapped 1 formula (32 files, 45.6KB).
==> Installing cockroach from cockroachdb/tap
==> Downloading https://binaries.cockroachdb.com/cockroach-v19.2.2.darwin-10.9-a
######################################################################## 100.0%
==> /usr/local/Cellar/cockroach/19.2.2/bin/cockroach gen man --path=/usr/local/C
==> /usr/local/Cellar/cockroach/19.2.2/bin/cockroach gen autocomplete bash --out
==> /usr/local/Cellar/cockroach/19.2.2/bin/cockroach gen autocomplete zsh --out=
==> Caveats
For local development only, this formula ships a launchd configuration to
start a single-node cluster that stores its data under:
  /usr/local/var/cockroach/
Instead of the default port of 8080, the node serves its admin UI at:
  http://localhost:26256

Do NOT use this cluster to store data you care about; it runs in insecure
mode and may expose data publicly in e.g. a DNS rebinding attack. To run
CockroachDB securely, please see:
  https://www.cockroachlabs.com/docs/secure-a-cluster.html

Bash completion has been installed to:
  /usr/local/etc/bash_completion.d

zsh completions have been installed to:
  /usr/local/share/zsh/site-functions

To have launchd start cockroachdb/tap/cockroach now and restart at login:
  brew services start cockroachdb/tap/cockroach
Or, if you don't want/need a background service you can just run:
  cockroach start --insecure
==> Summary
🍺  /usr/local/Cellar/cockroach/19.2.2: 114 files, 143.3MB, built in 12 seconds
==> `brew cleanup` has not been run in 30 days, running now...
Removing: /Users/martinheller/Library/Caches/Homebrew/fauna-shell--0.9.8.catalina.bottle.tar.gz... (4MB)
Removing: /Users/martinheller/Library/Caches/Homebrew/node--12.12.0.catalina.bottle.tar.gz... (14.8MB)
Pruned 18 symbolic links from /usr/local

martinheller@Martins-Retina-MacBook ~ % cockroach start-single-node --insecure
*
* WARNING: RUNNING IN INSECURE MODE!
*
* - Your cluster is open for any client that can access <all your IP addresses>.
* - Any user, even root, can log in without providing a password.
* - Any user, connecting as root, can read or write any data in your cluster.
* - There is no network encryption nor authentication, and thus no confidentiality.
*
* Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v19.2/secure-a-cluster.html
*
*
* WARNING: neither --listen-addr nor --advertise-addr was specified.
* The server will advertise "Martins-Retina-MacBook.local" to other nodes, is this routable?
*
* Consider using:
* - for local-only servers:  --listen-addr=localhost
* - for multi-node clusters: --advertise-addr=<host/IP addr>
*
*
*
* INFO: Replication was disabled for this cluster.
* When/if adding nodes in the future, update zone configurations to increase the replication factor.
*
CockroachDB node starting at 2019-12-30 16:30:35.369965 +0000 UTC (took 0.6s)
build:               CCL v19.2.2 @ 2019/12/11 01:27:47 (go1.12.12)
webui:               http://Martins-Retina-MacBook.local:8080
sql:                 postgresql://root@Martins-Retina-MacBook.local:26257sslmode=disable
RPC client flags:    cockroach <client cmd> --host=Martins-Retina-MacBook.local:26257 --insecure
logs:                /Users/martinheller/cockroach-data/logs
temp dir:            /Users/martinheller/cockroach-data/cockroach-temp884406444
external I/O path:   /Users/martinheller/cockroach-data/extern
store[0]:            path=/Users/martinheller/cockroach-data
status:              initialized new cluster
clusterID:           9f7141f8-d53d-49e3-9a5a-264de8cfa626
nodeID:              1

At this point I was able to open the web UI link shown above and see the web-based management interface shown in the screenshot below.

cockroachdb local cluster overview 01 IDG

Even local CockroachDB clusters offer a web-based management interface. Here we see the cluster overview for the single-node cluster I had just created.

To smoke-test the installation I followed the first exercise in Cockroach University in another Terminal tab, as shown below. I found the tutorial good, albeit presented in short videos rather than text, and geared for beginners rather than experienced DBAs or developers. The hands-on part starts by using the workload tool to create a small database, movr, then continues in the CockroachDB SQL shell.

martinheller@Martins-Retina-MacBook ~ %  cockroach workload init movr
I191230 16:55:34.351650 1 workload/workloadsql/dataload.go:135  imported users (0s, 50 rows)
I191230 16:55:34.356751 1 workload/workloadsql/dataload.go:135  imported vehicles (0s, 15 rows)
I191230 16:55:34.382023 1 workload/workloadsql/dataload.go:135  imported rides (0s, 500 rows)
I191230 16:55:34.404733 1 workload/workloadsql/dataload.go:135  imported vehicle_location_histories (0s, 1000 rows)
I191230 16:55:34.429203 1 workload/workloadsql/dataload.go:135  imported promo_codes (0s, 1000 rows)

martinheller@Martins-Retina-MacBook ~ % cockroach sql --insecure
#
# Welcome to the CockroachDB SQL shell.
# All statements must be terminated by a semicolon.
# To exit, type: \q.
#
# Server version: CockroachDB CCL v19.2.2 (x86_64-apple-darwin14, built 2019/12/11 01:27:47, go1.12.12) (same version as client)
# Cluster ID: 9f7141f8-d53d-49e3-9a5a-264de8cfa626
#

# Enter \? for a brief introduction.
#
root@:26257/defaultdb> SHOW databases;
  database_name 
+---------------+
  defaultdb     
  movr          
  postgres      
  system        
(4 rows)

Time: 2.028ms

root@:26257/defaultdb> SHOW TABLES FROM movr;
          table_name          
+----------------------------+
  promo_codes                
  rides                      
  user_promo_codes           
  users                      
  vehicle_location_histories 
  vehicles                   
(6 rows)

Time: 2.863ms

root@:26257/defaultdb> SELECT * FROM movr.users LIMIT 10;
                   id                  |   city    |        name         |            address            | credit_card 
+--------------------------------------+-----------+---------------------+-------------------------------+-------------+
  ae147ae1-47ae-4800-8000-000000000022 | amsterdam | Tyler Dalton        | 88194 Angela Gardens Suite 94 | 4443538758  
  b3333333-3333-4000-8000-000000000023 | amsterdam | Dillon Martin       | 29590 Butler Plain Apt. 25    | 3750897994  
  b851eb85-1eb8-4000-8000-000000000024 | amsterdam | Deborah Carson      | 32768 Eric Divide Suite 88    | 8107478823  
  bd70a3d7-0a3d-4000-8000-000000000025 | amsterdam | David Stanton       | 80015 Mark Views Suite 96     | 3471210499  
  c28f5c28-f5c2-4000-8000-000000000026 | amsterdam | Maria Weber         | 14729 Karen Radial            | 5844236997  
  1eb851eb-851e-4800-8000-000000000006 | boston    | Brian Campbell      | 92025 Yang Village            | 9016427332  
  23d70a3d-70a3-4800-8000-000000000007 | boston    | Carl Mcguire        | 60124 Palmer Mews Apt. 49     | 4566257702  
  28f5c28f-5c28-4600-8000-000000000008 | boston    | Jennifer Sanders    | 19121 Padilla Brooks Apt. 12  | 1350968125  
  2e147ae1-47ae-4400-8000-000000000009 | boston    | Cindy Medina        | 31118 Allen Gateway Apt. 60   | 6464362441  
  33333333-3333-4400-8000-00000000000a | boston    | Daniel Hernandez MD | 51438 Janet Valleys           | 0904722368  
(10 rows)

Time: 2.977ms

The tutorial goes on from there to teach more about CockroachDB, including the fundamentals of CockroachDB clusters and the basics of running SQL at scale.

CockroachCloud

Spinning up one CockroachDB node is fairly simple, as we just saw. Spinning up a cluster of three or more nodes takes a little more effort and planning, especially if you create multi-region clusters and need to tune the table topology. If you need help, the Cockroach sales engineers are happy to pitch in.

On the other hand, creating a cluster in the CockroachCloud is a matter of filling out a web form, as shown in the screenshot below. Currently you can only create a single-region cluster from this self-service interface; if you need multi-region clusters, larger nodes, or more than 24 nodes per cluster, you need to contact Cockroach Labs support to provision them for you in the CockroachCloud.

cockroachcloud create cluster 02 IDG

Creating a cluster in the CockroachCloud is a point and click exercise. The AWS and GCP regions available are ones that support Kubernetes and are known to work properly. At the moment, GCP instances are less expensive than equivalent AWS instances.

CockroachCloud clusters are as isolated and secure as possible. They are single tenants, each in its own sub-account and VPC, and the VPCs are firewalled from each other and from any other outside connection, unless whitelisted for SQL and web UI ports. All connections to the cluster over the internet use TLS 1.2.

Note that Cockroach Labs does not currently support private clouds. They have plans to do so in the future, however.

As a rule of thumb, Cockroach Labs estimates that each vCPU can handle around 1000 TPS. The estimates given on the cluster creation page as you do the provisioning are probably more accurate, but are given in different units (IOPS rather than TPS). Currently a 2-vCPU node on GCP is estimated at 1800 IOPS, and a 2-vCPU node on AWS is estimated at 600 IOPS.

CockroachDB performance improvements

When I looked at CockroachDB 1.1.3 at the beginning of 2018, its SQL JOIN implementation was limited to hash joins and a heuristic planner; its query performance often scaled linearly but was nothing like the state of the art — it was closer to the performance of SQLite. By November 2018, CockroachDB 2.1 had a cost-based query optimizer that was competitive with PostgreSQL for JOIN performance. As of version 19.2, after another year of development (and a switch to calendar versioning), all SQL queries use the cost-based optimizer, even DDL statements and window functions. In support of the cost-based optimizer, CockroachDB generates table statistics automatically.

CockroachDB’s new optimized atomic commit protocol cuts the commit latency of a transaction in half, from two rounds of consensus down to one. Combined with transaction pipelining, parallel commits bring the latency incurred by common OLTP transactions to near the theoretical minimum: the sum of all read latencies plus one round of consensus latency.

At a Glance
  • CockroachDB is scalable, distributed, portable OLTP SQL database with strict serializability, available in free open source and enterprise editions, and also as a fully managed cloud service.

    Pros

    • Strongly consistent OLTP across geographically distributed nodes
    • Horizontally scalable for writes as well as reads
    • Free open source CockroachDB Core version
    • Does not require GPS or atomic clocks
    • Available on most clouds and platforms
    • Scales to 100,000 data warehouses on TPC-C with near-perfect throughput

    Cons

    • Lacks stored procedures, triggers, events, and UDFs
    • Lacks full-text and geospatial functions and indexes
1 2 Page 1
Page 1 of 2