10 new tricks your old database can do

You might be surprised by the powerful “NoSQL” features lurking in your old Oracle, SQL Server, MySQL, or PostgreSQL database


In all the excitement about NoSQL databases, distributed databases, data warehouses, and GPU-accelerated databases these days, it’s easy to forget that traditional relational databases still store and serve the mission-critical information for many, if not most, line-of-business applications. Oracle Database, Microsoft SQL Server, MySQL/MariaDB, and PostgreSQL may trace their origins back to the 1980s and 1990s, but they are all still under active development, and not just to fix bugs and tweak performance.

In this article we’ll discuss many of the ways that traditional SQL databases are improving. Some involve supporting different kinds of data, along with the indexes and search functions to use them. Others involve speeding up access to frequently used tables. Still others expand the databases beyond their own tables, beyond a single server, and beyond SQL queries.

Full-text search

Standard relational database indexes tend to cover short fields, or even hashes of them, for performance reasons. Full-text search requires a different kind of index, typically an inverted list of words along with the records and offsets where they occur. There is often a list of stop words to ignore, plus a stemming algorithm to match the other forms of each word. Some full-text search engines also support Boolean operations, fuzzy search, or proximity search.

Oracle Database, SQL Server, MySQL/MariaDB, and PostgreSQL all offer full-text search on specifically designated text fields, for example CHAR, VARCHAR, or TEXT columns that have FULLTEXT indexes (to use the MySQL terms). You can also use an external full-text search engine such as Elasticsearch or Solr to index and search text in a database.
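
As a rough sketch of what this looks like in MySQL terms, the hypothetical articles table below gets a FULLTEXT index covering two text columns and is then queried with MATCH ... AGAINST; the table and search terms are made up for illustration.

    CREATE TABLE articles (
        id    INT AUTO_INCREMENT PRIMARY KEY,
        title VARCHAR(200),
        body  TEXT,
        FULLTEXT (title, body)    -- inverted full-text index over both columns
    ) ENGINE=InnoDB;

    -- Natural-language mode returns rows ranked by relevance
    SELECT id, title
    FROM articles
    WHERE MATCH(title, body) AGAINST('replication lag' IN NATURAL LANGUAGE MODE);

    -- Boolean mode supports operators such as + (must appear) and - (must not)
    SELECT id, title
    FROM articles
    WHERE MATCH(title, body) AGAINST('+replication -lag' IN BOOLEAN MODE);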

JSON data

JSON (JavaScript Object Notation) came to prominence with the rise of JavaScript for the web, and became one of the standard data formats in NoSQL document databases. Not to be left out, many SQL databases added support for JSON, along with the additional syntax needed to search semi-structured and sometimes hierarchical JSON documents. Of course, each database added its own kind of JSON support.

For example, in SQL Server you can explicitly query JSON values stored in text fields, and you can convert JSON documents into tables. You can also add a constraint to text fields to guarantee that they contain valid JSON. PostgreSQL, on the other hand, has explicit JSON and JSONB column types as well as JSON functions and operators.
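
As a minimal sketch of both flavors, using a hypothetical events table: PostgreSQL stores the documents in a JSONB column with a GIN index and queries them with JSON operators, while SQL Server keeps the JSON in an NVARCHAR column guarded by an ISJSON constraint and pulls values out with JSON_VALUE.

    -- PostgreSQL: binary JSON column plus a GIN index for containment queries
    CREATE TABLE events (
        id      BIGSERIAL PRIMARY KEY,
        payload JSONB NOT NULL
    );
    CREATE INDEX idx_events_payload ON events USING GIN (payload);

    -- ->> extracts a field as text; @> tests JSON containment
    SELECT payload ->> 'user' AS user_name
    FROM events
    WHERE payload @> '{"type": "login"}';

    -- SQL Server: JSON text with a validity constraint, queried with JSON_VALUE
    CREATE TABLE dbo.events (
        id      INT IDENTITY PRIMARY KEY,
        payload NVARCHAR(MAX) CHECK (ISJSON(payload) = 1)
    );
    SELECT JSON_VALUE(payload, '$.user') AS user_name
    FROM dbo.events
    WHERE JSON_VALUE(payload, '$.type') = 'login';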

XML data

Like JSON, XML is a semi-structured data type. It was the prevalent data exchange format in the days when web services and AJAX callbacks were invented, although it has since been displaced by JSON for many applications. XML schemas can impose some structure on XML documents; XPath uses path expressions to navigate within XML documents; and the XSLT transformation language has its uses, for example generating web pages from XML data.

All of the relational databases we have been discussing support XML data to some extent. I’m not sure there’s much of a case for choosing XML rather than JSON for a greenfield application, but for a brownfield application that is built around XML documents, the ability to store and query XML in relational tables can be handy.
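
For instance, PostgreSQL can store documents in a native XML column and pick values out of them with XPath; the invoices table here is a hypothetical example.

    CREATE TABLE invoices (
        id  SERIAL PRIMARY KEY,
        doc XML                      -- native XML column
    );

    INSERT INTO invoices (doc) VALUES
        ('<invoice><customer>Acme</customer><total>129.50</total></invoice>');

    -- xpath() returns an array of matching fragments; take the first and cast it
    SELECT (xpath('/invoice/customer/text()', doc))[1]::text AS customer
    FROM invoices;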

Geospatial data

Geometric and geographic information might seem like it could fit into numeric types, but the kinds of queries you often run against spatial data require an index that understands two-dimensional data, which is beyond the capabilities of a standard B-tree. For example, you may want to know the distance between two airports, the nearest-neighbor airports to LAX, the nearest hotels to the location of your business meeting in Chicago, or whether a taxicab is inside the boundaries of Manhattan. For these queries you may want an R-tree, SP-GiST, quadtree, or UB-tree index.

All of the relational databases we have been discussing support spatial information and indexes, although some require an add-in, such as PostGIS for PostgreSQL. They generally conform to the Open Geospatial Consortium Simple Features specification and the SQL/MM Spatial ISO/IEC standard, although they often differ in the implementation details.
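
A rough sketch of the PostGIS flavor, using a hypothetical airports table of longitude/latitude points; the other databases offer comparable spatial types, indexes, and distance functions under slightly different names.

    CREATE EXTENSION IF NOT EXISTS postgis;

    CREATE TABLE airports (
        code VARCHAR(4) PRIMARY KEY,
        geom GEOGRAPHY(Point, 4326)  -- longitude/latitude on the WGS 84 spheroid
    );

    -- Spatial index to support distance and nearest-neighbor queries
    CREATE INDEX idx_airports_geom ON airports USING GIST (geom);

    -- Distance in meters between two airports
    SELECT ST_Distance(a.geom, b.geom) AS meters
    FROM airports a, airports b
    WHERE a.code = 'LAX' AND b.code = 'JFK';

    -- Five airports nearest to LAX, ordered by geographic distance
    SELECT b.code
    FROM airports a, airports b
    WHERE a.code = 'LAX' AND b.code <> 'LAX'
    ORDER BY a.geom <-> b.geom
    LIMIT 5;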

In-memory tables

With the exception of PostgreSQL, which does in-memory caching but not pure in-memory tables, all of the relational databases we have been discussing can create tables in memory. In-memory tables significantly speed up access to the information they hold, primarily in read-heavy scenarios. The speed-up figure sometimes quoted is 30x, but that is an approximation that doesn’t account for the wide range of disks a database might otherwise be running on.

There are often limits that apply to in-memory tables. First and foremost is the need for enough RAM to hold the table, memory that could otherwise be used for caching and joins. Then there are the limitations of the specific databases and engines.

The MySQL MEMORY storage engine is non-durable: a MEMORY table’s contents disappear when the MySQL server restarts or shuts down. It lacks transactions, foreign keys, spatial types, and full-text indexes; it performs poorly under heavy writes; it uses table-level locking; and its tables can’t be partitioned. The MySQL NDB Cluster engine removes some of these constraints, but it requires running a special version of the MySQL daemon and keeping several kinds of nodes active in the cluster.
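
Declaring a MEMORY table is just a matter of picking the storage engine; the session_cache table below is a made-up example, and everything in it vanishes when the server restarts.

    -- A transient lookup table held entirely in RAM
    CREATE TABLE session_cache (
        session_id CHAR(36) PRIMARY KEY,
        user_id    INT,
        expires_at DATETIME
    ) ENGINE=MEMORY;

    INSERT INTO session_cache
    VALUES ('5e0c6a88-0000-4000-8000-000000000001', 42, NOW() + INTERVAL 1 HOUR);

    SELECT user_id
    FROM session_cache
    WHERE session_id = '5e0c6a88-0000-4000-8000-000000000001';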

SQL Server In-Memory OLTP uses memory-optimized tables, which are durable and support transactions, as well as non-durable tables, which hold transient data and often take the place of temp tables. It also uses natively compiled T-SQL modules to shorten individual transactions by reducing the CPU cycles required to process their operations. SQL Server In-Memory OLTP tables can be combined with disk-based tables in queries and transactions.
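
A hedged sketch of the SQL Server side, assuming a database that already has a MEMORY_OPTIMIZED_DATA filegroup; the ShoppingCart table and AddToCart procedure are hypothetical.

    -- Durable memory-optimized table: data survives a restart
    CREATE TABLE dbo.ShoppingCart (
        CartId   INT IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,
        UserId   INT NOT NULL,
        Quantity INT NOT NULL
    ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
    GO

    -- Natively compiled procedure: avoids interpreted T-SQL overhead
    CREATE PROCEDURE dbo.AddToCart @UserId INT, @Quantity INT
    WITH NATIVE_COMPILATION, SCHEMABINDING
    AS
    BEGIN ATOMIC WITH
        (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        INSERT INTO dbo.ShoppingCart (UserId, Quantity)
        VALUES (@UserId, @Quantity);
    END;
    GO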

Oracle Database In-Memory greatly improves performance for real-time analytics (OLAP) and mixed workloads (HTAP). The In-Memory Column Store (IM column store) is the key feature of Oracle Database In-Memory.

External data sources

Recent versions of SQL Server have a mechanism for querying data sources outside of the database’s own tables. You can create an EXTERNAL DATA SOURCE that points to Hadoop, blob storage, another relational database, or a shard map manager, and then run PolyBase or Elastic Database queries (Azure SQL Database v12 and later) against it for data loading and querying.
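
A hedged sketch of the PolyBase flavor against a hypothetical Hadoop cluster; the names, address, and file layout are placeholders, and the exact options vary by SQL Server version and source type.

    -- External data source pointing at a Hadoop cluster (PolyBase)
    CREATE EXTERNAL DATA SOURCE MyHadoop
    WITH (TYPE = HADOOP, LOCATION = 'hdfs://10.0.0.1:8020');

    CREATE EXTERNAL FILE FORMAT CsvFormat
    WITH (FORMAT_TYPE = DELIMITEDTEXT,
          FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

    -- External table whose rows live in HDFS but can be joined with local tables
    CREATE EXTERNAL TABLE dbo.WebLogs (
        LogDate DATE,
        Url     NVARCHAR(400),
        Hits    INT
    )
    WITH (LOCATION = '/logs/', DATA_SOURCE = MyHadoop, FILE_FORMAT = CsvFormat);

    SELECT TOP 10 Url, SUM(Hits) AS TotalHits
    FROM dbo.WebLogs
    GROUP BY Url
    ORDER BY TotalHits DESC;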

PostgreSQL Foreign Data Wrappers allow PostgreSQL queries to run against a wide assortment of remote data sources. These range from other SQL databases, NoSQL databases, and big data platforms through to flat files. Foreign Data Wrappers follow the SQL/MED (SQL Management of External Data) standard.
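
For example, wrapping a table from another PostgreSQL server with the bundled postgres_fdw wrapper looks roughly like this; the server address, credentials, and orders table are all hypothetical.

    -- postgres_fdw ships with PostgreSQL and wraps a remote PostgreSQL server
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    CREATE SERVER remote_orders
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'orders-db.example.com', dbname 'orders', port '5432');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER remote_orders
        OPTIONS (user 'report_reader', password 'secret');

    -- The foreign table is queried like a local one; rows come from the remote server
    CREATE FOREIGN TABLE orders (
        order_id BIGINT,
        customer TEXT,
        total    NUMERIC
    ) SERVER remote_orders OPTIONS (schema_name 'public', table_name 'orders');

    SELECT customer, SUM(total) FROM orders GROUP BY customer;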

Big data clusters

Starting with the SQL Server 2019 preview, Microsoft SQL Server big data clusters allow you to deploy scalable clusters of SQL Server, Spark, and HDFS containers running on Kubernetes. These components run side by side to read, write, and process big data from Transact-SQL (via PolyBase) or Spark, allowing you to combine and analyze your high-value relational data with high-volume big data.

Oracle offers something similar with its Oracle Big Data Cloud Service and the on-premises Oracle Big Data Appliance.

Read replicas

MySQL, MariaDB, and PostgreSQL all offer read replica capabilities. Read replicas increase read throughput directly, but can also increase write performance indirectly by reducing the read load on the read/write server.

MySQL/MariaDB supports multiple kinds of replication: binary-log-file-based and GTID-based; asynchronous, semi-synchronous, delayed, and synchronous (the last via in-memory NDB Cluster); and statement-based, row-based, and mixed-format. PostgreSQL offers many different replication solutions.
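
As a rough illustration, pointing a MySQL replica at its primary with GTID-based, asynchronous replication looks something like this (host and account names are placeholders; newer MySQL releases rename these statements to CHANGE REPLICATION SOURCE TO and START REPLICA).

    -- Run on the replica; requires gtid_mode=ON on both servers
    CHANGE MASTER TO
        MASTER_HOST = 'primary.example.com',
        MASTER_USER = 'repl',
        MASTER_PASSWORD = 'secret',
        MASTER_AUTO_POSITION = 1;

    START SLAVE;

    -- Check replication health and lag (Seconds_Behind_Master)
    SHOW SLAVE STATUS;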

Amazon Aurora implements its own read replica scheme for MySQL and PostgreSQL. Aurora allows up to 15 replicas, typically with less than 20 milliseconds of replication lag.

SQL Server supports readable secondary replicas in its Always On availability groups. Oracle Database supports basic, one-way read-only replication environments using read-only table snapshots, and Oracle Database Enterprise Edition supports advanced replication features that extend basic read-only replication by allowing applications to update table replicas throughout a replicated database system.

Sharding

Sharding is a method for dividing data among servers. Vertical sharding distributes different tables to different servers, for example putting inventory on one server, orders on another, and aggregated tables for analysis on a third. That only becomes a problem when you need to join tables that live on different servers. Horizontal sharding divides individual tables among servers, much like registration stations at a convention that are split up alphabetically. Manual horizontal sharding is a royal pain in the neck, but fortunately there are several options for automatic horizontal sharding.

Citus adds automatic horizontal sharding to PostgreSQL, with a coordinator node acting as a front end to the worker nodes that hold the data.
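
A minimal sketch of what that looks like on the coordinator, assuming a hypothetical orders table distributed by customer_id.

    -- Citus is installed as a PostgreSQL extension on the coordinator and workers
    CREATE EXTENSION citus;

    CREATE TABLE orders (
        order_id    BIGSERIAL,
        customer_id BIGINT NOT NULL,
        total       NUMERIC,
        PRIMARY KEY (customer_id, order_id)   -- must include the distribution column
    );

    -- Hash-distribute the table across worker nodes by customer_id;
    -- queries filtered on customer_id are routed to a single shard
    SELECT create_distributed_table('orders', 'customer_id');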

Microsoft Azure SQL Database v12 supports an elastic database shard map for automatic horizontal sharding. It also supports cross-database queries for vertical sharding.

There are several ways to use Amazon RDS for sharding MySQL and the other supported databases. In general, you can use read replicas to create copies, then promote replicas to be new shards. In Aurora, there is a clone database command for this purpose. After cloning, you can delete the duplicated data that won’t be used on that shard, based on the partition key mapping table.

Stored procedures beyond SQL

Historically, every relational database had its own SQL extensions for stored procedures, such as Transact-SQL for SQL Server and PL/SQL for Oracle. In recent years, relational databases have been integrated with programming languages and machine learning capabilities. An early example of this was Java running on Oracle Database as an adjunct to PL/SQL. Since then we’ve seen C, C++, C#, Perl, PHP, Python, and R language database extensions.
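
For instance, PostgreSQL can run stored functions written in Python through its PL/Python extension; the normalize_phone function below is a hypothetical example.

    -- PL/Python is an "untrusted" language and must be enabled by a superuser
    CREATE EXTENSION IF NOT EXISTS plpython3u;

    -- A scalar function written in Python rather than PL/pgSQL
    CREATE OR REPLACE FUNCTION normalize_phone(raw TEXT)
    RETURNS TEXT
    LANGUAGE plpython3u
    AS $$
        import re
        digits = re.sub(r"\D", "", raw or "")
        return digits[-10:] if len(digits) >= 10 else digits
    $$;

    SELECT normalize_phone('(212) 555-0123');   -- returns 2125550123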

The introduction of NoSQL databases in 2009 or so reflected the need for highly scalable and highly available databases for web applications, at the expense of SQL compatibility and strong consistency. In the last few years, the “classic” SQL databases have added many options to improve their scalability and availability without sacrificing consistency or compatibility.

Before you pass over your old SQL database for that “NoSQL” job, check the docs. You just might get the best of both worlds.

Copyright © 2019 IDG Communications, Inc.