Focusing on the wrong open source issues

Yes, a tiny number of companies have relicensed their open source code. Let’s worry about actual problems, like security and megacompanies that contribute almost nothing.


If you follow open source topics on X/Twitter, you can be forgiven for believing the biggest issue in open source today is companies relicensing their open source code. Thierry Carrez, the vice chairperson of the OSI, for example, recently issued a dire warning: “single vendor is the new proprietary.” Sounds terrible, right? That is, until you remember that the vast majority of the software you and I use every day on our phones, laptops, servers, etc., is proprietary. (Yes, with plenty of open source buried inside and effectively “relicensed.”)

Here’s just a tiny bit of data that makes these concerns seem silly: Of the 10,000-plus companies that participate in Linux Foundation projects (and open source more generally), there have been exactly 14 single-vendor relicensing events. Yes, 14. And of those 14, despite all the digital ink we spill talking about the critical need to fork to maintain freedom, only three have been forked. Again, that’s 14 projects/repositories out of the 162 million that GitHub reports.

In other words, we’re fixating on very few edge cases when there are significant, foundational issues in open source that need fixing.

A question of trust

Bad actors are striking at the very nature of how open source works. The wonderful thing about open source is that anyone can participate, but that can also be a weakness. As we saw recently with the XZ Utils exploit, and again more recently with a similar attack, sophisticated bad actors (perhaps backed by nation-states) are using the standard open source contribution process to infiltrate relatively obscure but widely used projects.

Such social engineering tactics are hard to detect, given the nearly infinite attack surface that open source offers and the sophistication of more recent attacks (which emerge at runtime). Of course, that same openness means spotting and fixing problems may be easier than in proprietary software. But with developers including open source code in close to 100% of all software, proprietary included, spotting every problem becomes a serious game of Whac-A-Mole.

The Linux Foundation and others are already working to introduce new ways to deepen trust in the open source process. Their ideas should help against recent attempts to exploit open source packages, where contributions were proposed by newcomers to the project under suspicious circumstances. But would they have stopped the XZ Utils exploit, which played out over the course of years? That seems less likely.

Attempts to improve open source processes are also complicated by the nature of most open source software: It’s not written by a single vendor or even by a community of vendors. It’s written by a solo developer in her free time. Given those realities, what can be done? According to Jack Cable and Aeva Black, both at the United States Cybersecurity and Infrastructure Security Agency (CISA), it comes down to vendors doing what some of us have been advocating for years. As they argue, “Every technology manufacturer that profits from open source software must do their part by being responsible consumers of and sustainable contributors to the open source packages they depend on.”

I’d add that perhaps we should start at the top, with the vendors that make the most from open source yet sometimes give the least. Yes, trillion-dollar cloud companies make tens of billions off open source but can hardly muster tens of thousands of lines of code for any given project. Want open source security to improve overnight? Hold vendors accountable for giving back, as CISA suggests.

Making open source AI accessible

Another massive issue: artificial intelligence. Or, rather, the difficulty of applying open source to AI. I won’t go into the details here as I’ve already done that at length (see here or here), but there’s also the problem of accessibility in AI. By one estimate, it cost OpenAI $78 million to train GPT-4, and Google spent $191 million to train its Gemini Ultra model. These aren’t the only large language models, of course; there are many, including “open source” AI models (in air quotes because even by the OSI’s acknowledgement, it’s not yet settled what open source means in AI). It’s still up for debate whether code is truly open if only the very richest companies can afford to use it.

This isn’t a new problem, of course. The exact same issue plagues the cloud. Nearly 20 years ago I asked open source execs at Google and Yahoo! why they didn’t contribute more code. They rightly took umbrage. Both companies were among the leaders in open source contributions, but one of them also said, in effect, “Even if we open sourced our infra, you couldn’t use it because you lack the resources to do anything with it.”

In the cloud, we’ve found ways around this (Kubernetes, for example), and hopefully we’ll see something similar with AI. Until we do, however, open source in AI might be like putting code in a museum: You can look, but there’s no practical way to touch or use it.

Back to my central premise. We can spend our time wringing our hands over an infinitesimal number of open source projects that have relicensed to better fund security and continued innovation (Disclosure: I work for one of these), and making hand-wavy statements about “open source AI” without defining it or making it useful for rank-and-file developers (and their employers). Or we can do the harder, more important work of figuring out how to make open source more secure for everyone on the planet and to ensure AI isn’t just for the richest companies. That harder work will pay real societal dividends. The former will, at most, get you kudos on X/Twitter.
