Python is devouring data science

Someone once said that Python’s data science training wheels would increasingly lead to the R language. Boy, was he wrong.

Back in 2015 I wrote that “Python’s data science training wheels increasingly lead to the R language,” suggesting that the more serious companies get about data science, the more they’ll want the heft of R. Boy, that perspective hasn’t aged well.

In fact, as a recent Terence Shin analysis of more than 15,000 data scientist job postings suggests, Python adoption keeps growing even as the more specialist R language is in decline. This isn’t to suggest that data scientists will drop R anytime soon. More likely, we’ll continue to see both Python and R used for their respective strengths.

Even so, if Nick Elprin is correct and “2021 is the year in which [data science] will become an enterprise-wide capability that impacts every line of business and functional department,” then the language most likely to dominate is the one that is most accessible to the broadest population within the enterprise.

Game. Set. Python.

Fueling the data science boom

The technologies and skills topping the data science charts in 2021 should look familiar:

[Chart: data science technologies and skills in 2021. Credit: Terence Shin]

After all, they’re quite similar to what we saw in 2019, as detailed by Jeff Hale:

[Chart: data science technologies and skills in 2019. Credit: Jeff Hale]

Yet there are some trends that appear if you squint a bit at the charts. As Shin calls out:

  • There is a huge increase in skills related to the cloud.
  • Similarly there is also a large increase in skills related to deep learning, like PyTorch and TensorFlow.
  • SQL and Python continue to grow in importance, while R remains stagnant.
  • Apache products, like Hadoop, Hive, and Spark, continue to decline in importance.

Easy does it

Dig a bit deeper, and the technologies and skills that seem to be growing fastest are those that are easiest to learn. Hence, while TensorFlow and PyTorch both saw growth, PyTorch’s growth significantly outpaced TensorFlow’s, for reasons I’ve outlined before. PyTorch’s popularity is starting to play out in the projects themselves, too: the number of contributors to PyTorch over the last 12 months already surpasses that of TensorFlow, and cumulative PyTorch contributors are set to exceed TensorFlow’s in the near future.

A few years back RedMonk analyst James Governor decreed that “convenience is the killer app” where developers are concerned. From MongoDB to Fastly to GatsbyJS, our go-to defaults across a wide range of technologies are those that enable developers to become productive faster.

Which brings us back to Python. And R.

R remains highly relevant in data science, something that we shouldn’t expect to change in the near future. Yet we’ve seen far more data scientists switch from R to Python than vice versa (twice as many, in fact). Python’s better usability, performance, and ecosystem are among the reasons, argues Emmett Boudreau. R remains broadly used for statistical computing, but as more and more companies (and their developers and data scientists) embrace data science from a technical, not scientific, standpoint, Python will continue to soar.

Copyright © 2021 IDG Communications, Inc.