
Why can’t anyone agree on how dangerous AI will be?

Researchers tried to get AI optimists and pessimists on the same page. It didn’t quite work.

Illustration by Javier Zarracina/Vox

I’ve written a lot about AI and the debate over whether it could kill us all. But I still don’t really know where I come down.

There are people who deeply understand advanced machine learning systems who think they will prove increasingly uncontrollable, possibly “go rogue,” and threaten humanity with catastrophe or even extinction. There are other people who deeply understand how these systems work who think that we’re perfectly able to control them, that their dangers do not include human extinction, and that the first group is full of hysterical alarmists.

How do we tell who’s right? I sure don’t know.

But a clever new study from the Forecasting Research Institute tries to find out. The authors (Josh Rosenberg, Ezra Karger, Avital Morris, Molly Hickman, Rose Hadshar, Zachary Jacobs, and forecasting godfather Philip Tetlock) had previously asked both experts on AI and other existential risks, and “superforecasters” with a demonstrated track record of successfully predicting world events in the near term, to assess the danger that AI poses.

The result? The two groups disagreed a lot. The experts in the study were in general much more nervous than the superforecasters and put far higher odds on disaster.

The researchers wanted to know why these groups disagreed so profoundly. So the authors set up an “adversarial collaboration”: They had the two groups spend many hours (a median of 31 hours for the experts and 80 hours for the superforecasters) reading new materials and, most importantly, discussing these issues with people holding the opposite view, guided by a moderator. The idea was to see whether exposing each group to more information, and to the best arguments of the other group, would get either to change their minds.

The researchers were also curious to find “cruxes”: issues that help explain people’s beliefs, and on which new information might change their mind. One of the biggest cruxes, for example, was “Will METR [an AI evaluator] or a similar organization find evidence of AI having the ability to autonomously replicate, acquire resources, and avoid shutdown before 2030?” If the answer turns out to be “yes,” skeptics (the superforecasters) say they will become more worried about AI risks. If the answer turns out to be “no,” AI pessimists (the experts) say they will become more sanguine.

So did everyone just converge on the correct answer? … No. Things were not destined to be that easy.

The AI pessimists adjusted their odds of a catastrophe before the year 2100 down from 25 to 20 percent; the optimists moved theirs up from 0.1 to 0.12 percent. Both groups stayed close to where they started.

But the report is fascinating nonetheless. It’s a rare attempt to bring together smart, well-informed people who disagree. While doing so didn’t resolve that disagreement, it shed a lot of light on where those points of division came from.

Why people disagree about AI’s dangers

The paper focuses on disagreement around AI’s potential to either wipe humanity out or cause an “unrecoverable collapse,” in which the human population shrinks to under 1 million for a million or more years, or global GDP falls to under $1 trillion (less than 1 percent of its current value) for a million years or more. At the risk of being crude, I think we can summarize these scenarios as “extinction or, at best, hell on earth.”

There are, of course, a number of other risks from AI worth worrying about, many of which we already face today.

Existing AI systems sometimes exhibit worrying racial and gender biases; they can be unreliable in ways that cause problems when we rely upon them anyway; they can be used to bad ends, like creating fake news clips to fool the public or making pornography with the faces of unconsenting people.

But these harms, while surely bad, obviously pale in comparison to “losing control of the AIs such that everyone dies.” The researchers chose to focus on the extreme, existential scenarios.

So why do people disagree on the chances of these scenarios coming true? It’s not due to differences in access to information, or a lack of exposure to differing viewpoints. If it were, the adversarial collaboration, which consisted of massive exposure to new information and contrary opinions, would have moved people’s beliefs more dramatically.

Nor, interestingly, was much of the disagreement explainable by different beliefs about what will happen with AI in the next few years. When the researchers paired up optimists and pessimists and compared their odds of catastrophe, their average gap in odds was 22.7 percentage points. The most informative “crux” (an AI evaluator finding that a model had highly dangerous abilities before 2030) only reduced that gap by 1.2 percentage points.

Short-term timelines are not nothing, but they’re just not where the main disagreements lie.

What did seem to matter were different views on the long-term future. The AI optimists generally thought that human-level AI would take longer to build than the pessimists believed. As one optimist told the researchers, “language models are just that: models of language, not digital hyperhumanoid Machiavellis working to their own end”; this optimist thought that fundamental breakthroughs in machine learning methods would be necessary to reach human-level intelligence.

Many cited the need for robotics, not just software AI, to reach human levels, and argued that doing so would be much harder. It’s one thing to write code and text on a laptop; it’s quite another for a machine to learn how to flip a pancake, clean a tile floor, or perform any of the many other physical tasks at which humans now outperform robots.

When disputes go deep

The most interesting source of splits the researchers identified was what they call “fundamental worldview disagreements.” That’s a fancy way of saying that the two groups disagree about where the burden of proof lies in this debate.

“Both groups agree that ‘extraordinary claims require extraordinary evidence,’ but they disagree about which claims are extraordinary,” the researchers summarize. “Is it extraordinary to believe that AI will kill all of humanity when humanity has been around for hundreds of thousands of years, or is it extraordinary to believe that humanity would continue to survive alongside smarter-than-human AI?”

It’s a fair question! My experience is that most laypeople outside AI find “the machines will kill us all” to be the more extraordinary claim. But I can see where the pessimists are coming from. Their basic view is that the emergence of superhuman AI is like the arrival on earth of a superhuman alien species. We don’t know if that species would want to kill us all.

But Homo sapiens didn’t necessarily want to kill all the Homo erectus or Neanderthals hundreds of thousands of years ago, when multiple intelligent species of large apes were coexisting. We did, however, kill them all off.

Extinction tends to happen to dumber, weaker species when a smarter species that’s better at claiming resources for itself emerges. If you have this worldview, the burden of proof is on optimists to show why super-intelligent AI wouldn’t result in catastrophe. Or, as one pessimist in the study put it: “there are loads of ways this could go and very few of them leave humans alive.”

This is not the most encouraging conclusion for the study to arrive at. A disagreement driven by concrete differences of opinion about what will happen in the next few years is easier to resolve: you can wait and see how those years play out. A disagreement rooted in deep, hard-to-change assumptions about how the world works, and about where the burden of proof should fall, is much harder to budge.

The paper reminded me of an interview I saw a long time ago with the late philosopher Hilary Putnam. Working in the late 20th century, Putnam believed that philosophy could make progress, even if the big questions — What is truth? How does the mind work? Is there an external reality we can grasp? — feel as hard to answer as ever.

Sure, we don’t know those answers, Putnam said. But we know more about the questions. “We learn more about how difficult they are, and why they’re so difficult. That is maybe the lasting philosophical progress.”

That’s how I felt reading the Forecasting Research Institute paper. I don’t feel like I know for sure how worried to be about AI. But I do feel like I know more about why this is a hard question.

A version of this story originally appeared in the Future Perfect newsletter. Sign up here!
