Open Compute Project and Azure: hardware meets software

How the world of open hardware design interacts with Azure’s underlying infrastructure


Microsoft has been a member of the Open Compute Project since 2014, donating many of the specifications for its Azure data centers to the project. It’s where Microsoft develops its Olympus servers and its SONiC networking software. So it’s always fascinating to go along to the annual OCP Summit to see what’s happening in the world of open hardware design, and to see what aspects of Azure’s underlying infrastructure are being exposed to the world. Here’s what I found this year.

Introducing Project Zipline

Public clouds like Azure have a very different set of problems from most on-premises systems. They have to move terabytes of data around their networks without hindering system performance, and as more users adopt their services, they have to push ever more data over links that can’t simply be swapped out for higher-bandwidth connections. That’s a big problem, with three possible solutions:

  • Microsoft could spend millions of dollars on putting new connectivity into its data centers.
  • It could take a performance hit on its services.
  • It could use software to solve the problem.

With the resources of Microsoft Research and Azure, Microsoft made the obvious choice: It came up with a new compression algorithm, Project Zipline. Currently in use in Azure, Project Zipline offers twice the compression ratios of the commonly used Zlib-L4 64KB algorithm. That’s a significant boost, almost doubling bandwidth and storage capacity for little or no capital cost. Having proved its worth on its own network and in its own hardware, Microsoft is donating the Zipline algorithm to OCP for anyone to implement and use.
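As a rough illustration of what a doubled ratio means in practice, here’s a minimal Python sketch. It uses the standard-library zlib at level 4 as a stand-in for the Zlib-L4 64KB baseline (Python’s zlib window is smaller, and the payload is invented), then shows what Microsoft’s claimed 2x improvement would imply for effective link and storage capacity.

```python
import zlib

# Illustrative only: zlib at level 4 stands in for the Zlib-L4 64KB
# baseline Microsoft cites; Zipline itself ships as hardware, not as a
# software library, so the claimed 2x figure is simply applied to the result.
payload = b"GET /api/v1/items?id=12345 HTTP/1.1\r\nHost: example.com\r\n" * 2000

compressed = zlib.compress(payload, level=4)
baseline_ratio = len(payload) / len(compressed)
zipline_ratio = 2 * baseline_ratio  # Microsoft's claim: roughly double

for name, ratio in (("zlib level 4", baseline_ratio),
                    ("Zipline (claimed)", zipline_ratio)):
    # A ratio of N means a fixed link or disk carries ~N times the logical
    # data, which is where the bandwidth and capacity win comes from.
    print(f"{name}: ratio {ratio:.1f}x -> ~{ratio:.1f}x effective capacity")
```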

But Project Zipline is more than software. To work at the speed it has to work, it needs to be implemented as hardware. Kushagra Vaid, general manager of Azure Hardware Infrastructure, gave me details about Zipline and how it works. The project began by analyzing many internal data sets from across Azure, using data from a mix of workloads. Although the data was different, the underlying binary had similar patterns, letting Microsoft develop a common compression algorithm that could work across not only static data but also streamed data.

By implementing its pattern matching in hardware, Project Zipline can match more than 64,000 block patterns in real time; a software implementation can handle only around 1,000. The more patterns that can be recognized and replaced by dictionary pointers, the faster and more efficient the resulting compression.
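To make the dictionary-pointer idea concrete, here’s a toy sketch in Python. It is not Zipline itself (the real algorithm and its dictionary sizes are defined in the OCP contribution); it just shows how known blocks collapse to short references, and why a larger pattern table lets more of the stream be replaced.

```python
from typing import Dict, List, Union

BLOCK = 8  # bytes per block, an arbitrary size for this sketch

def build_dictionary(samples: List[bytes], max_entries: int) -> Dict[bytes, int]:
    """Collect the most frequently seen fixed-size blocks from sample data."""
    counts: Dict[bytes, int] = {}
    for buf in samples:
        for i in range(0, len(buf) - BLOCK + 1, BLOCK):
            blk = buf[i:i + BLOCK]
            counts[blk] = counts.get(blk, 0) + 1
    ranked = sorted(counts, key=counts.get, reverse=True)[:max_entries]
    return {blk: idx for idx, blk in enumerate(ranked)}

def compress(data: bytes, dictionary: Dict[bytes, int]) -> List[Union[int, bytes]]:
    """Emit a dictionary index for known blocks, literal bytes otherwise."""
    out: List[Union[int, bytes]] = []
    for i in range(0, len(data), BLOCK):
        blk = data[i:i + BLOCK]
        out.append(dictionary[blk] if blk in dictionary else blk)
    return out

# ~1,000 entries is roughly what a software matcher can afford to search;
# the hardware implementation matches more than 64,000 patterns in real time.
dictionary = build_dictionary([b"HEADERv1" * 100 + b"payload!" * 50], max_entries=1000)
print(compress(b"HEADERv1" * 10 + b"freshdata", dictionary)[:5])
```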

Sharing the Verilog

Once an algorithm has been implemented in hardware, it’s harder to share with organizations like OCP. But Microsoft is taking an interesting new approach with Zipline. Instead of releasing a software library, it’s sharing the Verilog RTL files of the Azure Zipline implementation. Using those files, anyone can implement Zipline in silicon, either as an extension to existing networking hardware or in FPGAs like Azure’s Project Brainwave accelerators.

“Perhaps deeper buffers could improve speed or compression, or alternative layouts could speed up stream processing,” Vaid says. “More eyes, more ideas, more innovation.” He’s already thinking about where Zipline might be implemented: “It may turn up where data is stored, in archival systems, in accelerators in servers, even in the network or the storage fabric. Wherever it goes, it will free up CPU cycles.”

That last point is an important one: Modern cloud architectures are moving from their original homogeneous designs to something much more heterogeneous. Instead of doing everything in software, custom silicon is increasingly important. A related OCP announcement, from the Facebook infrastructure team, was a common module design for accelerators, letting standard motherboards plug in different accelerators as required. Clouds like Azure could reduce costs by having standard compute motherboards, plugging in specific accelerators for machine learning, for cryptography, and, of course, for data compression.

Securing cloud hardware with Cerberus

One of the more interesting OCP projects from Microsoft is Cerberus, a distributed security system that adds hardware roots of trust (much like those used in the Azure Sphere secure IoT platform) to the various devices that make up a modern cloud data center. With Cerberus, Microsoft can ensure that only trusted hardware is installed in its data centers and, more important, that that hardware hasn’t been tampered with in its supply chain.

Microsoft has already implemented Cerberus in its OCP Denali SSD sticks, and it’s working with other OCP members, including Intel, to bring it to other components of cloud data centers. Vaid notes that there’s a crossover with modern systems-management techniques, because Cerberus can be policy-driven. Each device holds its own automatically generated private key; all that’s shared with the world is the public key, published at the point the silicon is manufactured.
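The key handling Vaid describes maps onto a standard asymmetric-signature pattern. Here’s a minimal sketch of that model in Python, using the third-party cryptography package and Ed25519 keys as illustrative assumptions; the real Cerberus attestation protocol, message formats, and algorithms are defined in the OCP specification, not here.

```python
import hashlib

# Assumes the third-party "cryptography" package; the key type is an
# illustrative choice, not necessarily what Cerberus silicon uses.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# At manufacture: the device generates its own key pair, and only the
# public key is published to the operator's inventory.
device_private_key = Ed25519PrivateKey.generate()
published_public_key = device_private_key.public_key()

# At boot: the device measures its firmware and signs the digest with
# the private key that never leaves the part.
firmware_image = b"...firmware bytes..."
measurement = hashlib.sha256(firmware_image).digest()
attestation = device_private_key.sign(measurement)

# In the management plane: check the signature against the key recorded
# at manufacture before trusting the component into the fleet.
try:
    published_public_key.verify(attestation, measurement)
    print("component attested: measurement signed by known silicon")
except InvalidSignature:
    print("attestation failed: quarantine the component")
```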

Project Olympus for scale machine learning

Olympus got its time on stage as well, but much of the development showcased at OCP wasn’t from Microsoft. Instead it was from the companies that build the many thousands of white box servers that go into each Azure data center. Flextronics had some of its slimline Olympus systems on display on the Microsoft stand, but perhaps the most interesting development came from Inspur.

Probably the largest computer hardware company you’ve never heard of, Inspur is ranked in the Top 3 server companies by IDC, shipping public cloud hardware to some of the largest services. It’s developing compute hardware for AI workloads, based on a four-socket Olympus system with an attached 16-GPU box (what Inspur calls a JBOG: just a bunch of GPUs). Intended to handle deep learning workloads, it has 80 CPU cores and a significant amount of memory. While the initial twin-chassis system is powerful, things get very interesting when you interconnect two twin-chassis systems to deliver 160 cores and 32 GPUs.

Platforms like this are the future of the modern cloud, delivering the compute that’s needed for demanding deep learning workloads. By moving the GPU accelerators off the main board into their own chassis, you get a more flexible implementation that can balance CPU and GPU for the model that’s being trained, with the option of reconfiguring unused resources to handle a large number of less complex inference tasks, like those in Azure’s Cognitive Services.
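A small sketch makes the flexibility argument concrete. The numbers below come from the Inspur system described above (80 cores, 16 GPUs); the allocation policy itself is a made-up illustration of how one pool could be split between a GPU-heavy training job and many CPU-bound inference tasks, not anything Azure has published.

```python
from dataclasses import dataclass

@dataclass
class Pool:
    """A disaggregated compute pool: one Olympus head node plus a JBOG."""
    cpu_cores: int = 80
    gpus: int = 16

    def allocate(self, cpu_cores: int, gpus: int) -> bool:
        """Reserve resources if they're available; return True on success."""
        if cpu_cores <= self.cpu_cores and gpus <= self.gpus:
            self.cpu_cores -= cpu_cores
            self.gpus -= gpus
            return True
        return False

pool = Pool()

# A training job is GPU-heavy: take most of the JBOG, relatively few cores.
print("training job placed:", pool.allocate(cpu_cores=16, gpus=12))

# Spare capacity is then packed with small, CPU-bound inference tasks.
inference_tasks = 0
while pool.allocate(cpu_cores=4, gpus=0):
    inference_tasks += 1
print(f"inference tasks packed into spare capacity: {inference_tasks}")
```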

One thing is clear from Microsoft’s work in OCP: the boundaries between hardware and software in Azure are blurring. With technologies like Project Zipline, things that we saw as hardware are now software, and things that were software are now hardware. By adding FPGAs and dedicated accelerators to Olympus-based commodity hardware, Azure is delivering servers that can support increasingly complex workloads economically.

Copyright © 2019 IDG Communications, Inc.