What’s new for Python in 2025?

[This article was first published on The Jumping Rivers Blog, and kindly contributed to R-bloggers.]

Python 3.14 was released on 7th October 2025. Here we summarise some of the more interesting changes, along with some trends in Python development and data science over the past year. We will highlight the following:

  • the colourful Python command-line interface;
  • project-management tool uv;
  • free-threading;
  • and a brief summary of other developments.

The Python 3.14 release notes also describe the changes to base Python.

Colourful REPL

At Jumping Rivers we have taught a lot of people to program in Python. Throughout a programming career you get used to making, and learning from, mistakes. The most common mistakes made in introductory programming lessons may still trip you up in 10 years' time: unmatched parentheses, typos, missing quote symbols, unimported dependencies.

Our Python training courses are presented using Jupyter. Jupyter notebooks have syntax highlighting that makes it easy to identify an unfinished string, or a mis-spelled keyword.

But most Python learners don’t use Jupyter (or other high-level programming tools) on day one – they experiment with Python at the command line. You can type “python” into your shell/terminal window and start programming in the “REPL” (read-evaluate-print loop).

Any effort to make the REPL easier to work with helps beginning programmers, so the introduction of syntax highlighting in the Python 3.14 REPL is particularly welcome.
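The kinds of mistakes that highlighting makes obvious can be reproduced with Python's built-in compile() function. A small illustrative sketch (the two broken snippets are made up for this example):

```python
# Two classic beginner mistakes that syntax highlighting makes obvious:
# an unterminated string and a mis-spelled keyword.
snippets = [
    'greeting = "hello',   # missing closing quote
    'wihle True: pass',    # typo in `while`
]
for src in snippets:
    try:
        compile(src, "<repl>", "exec")
    except SyntaxError as err:
        print(f"SyntaxError: {err.msg}")
```

In the 3.14 REPL, both mistakes now stand out visually before you even press Enter.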

uv and package development

One of the big trends in Python development in 2025 is the rise of the project-management tool uv. This Rust-based command-line tool can be used to initialise a package / project structure, to specify the development and runtime environment of a project, and to publish a package to PyPI.

At Jumping Rivers, we have used poetry for many of the jobs that uv excels at. Python is used for the data-preparation tasks for diffify.com, and we use poetry to ensure that our developers each use precisely the same package versions when working on that project (see our current blog series on Poetry). But poetry doesn’t prevent developers from using different versions of Python. For that, we need a second tool like pyenv (which allows switching between different Python versions), or each developer must have the same Python version installed on their machine.

uv goes a step further than poetry and allows us to pin Python versions for a project. Let’s use uv to install Python 3.14, so that we can test out features in the new release.

First follow the instructions for installing uv.

Then at the command line, we will use uv to create a new project where we’ll use Python 3.14.

# [bash]
cd ~/temp
mkdir blog-py3.14
cd blog-py3.14

# Which versions of Python 3.14 are available via uv?
uv python list | grep 3.14
# cpython-3.14.0rc2-linux-x86_64-gnu <download available>
# cpython-3.14.0rc2+freethreaded-linux-x86_64-gnu <download available>

You’ll see something similar regardless of the operating system that you use. That lists two versions of Python 3.14 – one with an optional system called “Free Threading” (see later). We’ll install both versions of Python:

uv python install cpython-3.14.0rc2-linux-x86_64-gnu
uv python install cpython-3.14.0rc2+freethreaded-linux-x86_64-gnu

Users of pyenv will be able to install Python 3.14 in a similar manner.

We can select between the two different Python versions at the command line. First using the version that does not have free threading:

uv run --python=3.14 python
# Python 3.14.0rc2 (main, Aug 18 2025, 19:19:22) [Clang 20.1.4 ] on linux
# ...
>>> import sys
>>> sys._is_gil_enabled()
# True

Then using the version with free threading (note the t suffix):

uv run --python=3.14t python
# ...
# Python 3.14.0rc2 free-threading build (main, Aug 18 2025, 19:19:12) [Clang 20.1.4 ] on linux
# ...
>>> import sys
>>> sys._is_gil_enabled()
# False

Project creation and management with uv

uv is capable of much more than allowing us to switch between different versions of Python. The following commands initialise a Python project with uv:

# From ~/temp/blog-py3.14

# Indicate the default python version for the project
uv python pin 3.14

# Initialise a project in the current directory
uv init .

# Check the Python version
uv run python --version
# Python 3.14.0rc2

This adds some files for project metadata (pyproject.toml, README.md) and version control:

tree -a -L 1
# .
# ├── .git
# ├── .gitignore
# ├── main.py
# ├── pyproject.toml
# ├── .python-version
# ├── README.md
# ├── uv.lock
# └── .venv
#
# 2 directories, 6 files

Now we can add package dependencies using uv add <packageName> and perform other standard project-management tasks. But one thing I wanted to highlight is that uv allows us to start a Jupyter notebook, using the project’s Python interpreter, without either adding jupyter as a dependency or explicitly defining a kernel for jupyter:

uv run --with jupyter jupyter lab

Creating a new notebook with the default Python 3 kernel in the JupyterLab session that starts should ensure that you are using the currently active Python 3.14 environment.

Threading

Python 3.13 introduced an experimental feature, ‘free-threading’, which is officially supported as of 3.14.

First though, what is a ‘thread’? When a program runs on your computer, there are lots of different tasks going on, and some of those tasks could run independently of each other. You, as the programmer, may need to explain to the computer which tasks those are. A thread is a way of cordoning off one such task: it tells the computer that this task can run separately from the others, and carries the logic for running it.
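In code, defining a thread looks like this. A minimal sketch using the standard library's threading module (background_task is a made-up stand-in for some independent work):

```python
import threading
import time

def background_task(name):
    # Pretend to do some independent work.
    time.sleep(0.1)
    print(f"{name} finished")

# The main program carries on while the thread runs.
t = threading.Thread(target=background_task, args=("task-1",))
t.start()
print("main continues while task-1 runs")
t.join()  # wait for the thread to finish
```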

Python has allowed developers to define threads for a while. If you have a few tasks that are largely independent of each other, each of these tasks can run in a separate thread. Threads can access the same memory space, meaning that they can access and modify shared variables in a Python session. In general, this also means that a computation in one thread could update a value that is used by another thread, or that two different threads could make conflicting updates to the same variable. This freedom can lead to bugs. The CPython interpreter was originally written with a locking mechanism (the Global Interpreter Lock, GIL) that prevented different threads from running at the same time (even when multiple processors were available) and limited the reach of these bugs.
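The shared-memory point can be made concrete. In this sketch, four threads update one shared counter; the Lock serialises the updates so none are lost (the function and variable names here are illustrative, not from the original post):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:          # without the lock, updates from two threads
            counter += 1    # could interleave and some would be lost

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000
```

Under the GIL, only one thread ran Python bytecode at a time, which masked many (but not all) such races; a free-threaded build makes explicit locking even more important.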

Traditionally, you would have used threads for “non-CPU-bound tasks” in Python. These are the kinds of tasks that would be unaffected by having more, or faster, processors available to the Python instance: network traffic, file access, waiting for user input. For CPU-bound tasks, like calculations and data processing, you could use Python’s ‘multiprocessing’ library, which starts multiple Python instances, each doing a portion of the processing, and allows a workload to be partitioned across multiple processors (although some libraries, like ‘numpy’, have their own low-level mechanisms for splitting work across cores).

The other main differences between threading and multiprocessing in Python are in memory and data management. With threading, you have one Python instance, with each thread having access to the same memory space. With multiprocessing, you have multiple Python instances that work independently: the instances do not share memory, so to partition a workload using multiprocessing, Python has to send copies of (subsets of) your data to the new instances. This could mean that you need to store two or more copies of a large dataset in memory when using multiprocessing on it.
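The copying behaviour can be demonstrated directly. In this sketch (bump and demo are hypothetical names), each worker process mutates its own copy of a module-level dict, and the parent's copy is untouched afterwards:

```python
import multiprocessing as mp

data = {"count": 0}

def bump(i):
    # Runs in a child process, which gets its own copy of `data`:
    # this increment never reaches the parent's copy.
    data["count"] += 1
    return i * i

def demo():
    with mp.Pool(processes=2) as pool:
        return pool.map(bump, range(4))

if __name__ == "__main__":
    print(demo())          # [0, 1, 4, 9] -- map preserves input order
    print(data["count"])   # 0 -- the parent's copy is unchanged
```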

Simultaneous processing across threads that share memory-space is now possible using the free-threaded build of Python. Many third-party packages have been rewritten to accommodate this new build and you can learn more about free-threading and the progress of the changes in the “Python Free-Threading Guide”.
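If you want to check from inside a running interpreter whether you are on a free-threaded build, and whether the GIL is currently active, a small sketch:

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 on a free-threaded ("t") build, 0 or None otherwise.
print("free-threaded build:", bool(sysconfig.get_config_var("Py_GIL_DISABLED")))

# On 3.13+, sys._is_gil_enabled() reports whether the GIL is active right now;
# guard with hasattr so the snippet also runs on older versions.
if hasattr(sys, "_is_gil_enabled"):
    print("GIL enabled:", sys._is_gil_enabled())
```

Note that even on a free-threaded build the GIL can be re-enabled at runtime (for example when an extension module that isn't yet thread-safe is imported), so the two checks can disagree.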

As a simple-ish example, let’s consider natural language processing. There is a wonderful blog post about parallel processing with the nltk package on the “WZB Data Science Blog”. We will extend that example to use free-threading.

nltk provides access to some of the Project Gutenberg books, and we can access this data as follows:

# main.py
import nltk

def setup():
    nltk.download("gutenberg")
    nltk.download("punkt_tab")
    nltk.download("averaged_perceptron_tagger_eng")
    corpus = {
        f_id: nltk.corpus.gutenberg.raw(f_id)
        for f_id in nltk.corpus.gutenberg.fileids()
    }
    return corpus

corpus = setup()

The key-value pairs in corpus are the abbreviated book-title and contents for 18 books. For example:

corpus["austen-emma.txt"]
# [Emma by Jane Austen 1816]
#
# VOLUME I
#
# CHAPTER I
#
#
# Emma Woodhouse, handsome, clever, and rich, with a comfortable home ...

A standard part of a text-processing workflow is to tokenise and tag the “parts-of-speech” (POS) in a document. We can do this using two nltk functions:

# main.py ... continued
def tokenise_and_pos_tag(doc):
    return nltk.pos_tag(nltk.word_tokenize(doc))

A function to sequentially tokenise and POS-tag the contents of a corpus of books can be written:

# main.py ... continued
def tokenise_seq(corpus):
    tokens = {
        f_id: tokenise_and_pos_tag(doc)
        for f_id, doc in corpus.items()
    }
    return tokens

You need to install or build Python in a particular way to make use of “Free-threaded” Python. In the above, we installed Python “3.14t” using uv, so we can compare the speed of free-threaded and sequential, single-core, processing.

We will use the standard-library timeit module to analyse processing speed from the command line.

# Activate the threaded version of Python 3.14
uv python pin 3.14t

# Install the dependencies for our main.py script
# (timeit needs no installation: it is part of the standard library)
uv add nltk

# Time the `tokenise_seq()` function
# -- but do not time any setup code...
PYTHON_GIL=0 \
 uv run python -m timeit \
 --setup "import main; corpus = main.setup()" \
 "main.tokenise_seq(corpus)"

# [lots of output messages]
# 1 loop, best of 5: 53.1 sec per loop

After some initial steps where the nltk datasets were downloaded and the corpus object was created (neither of which were timed, because these steps were part of the timeit --setup block), tokenise_seq(corpus) was run multiple times and the fastest speed was around 53 seconds.
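The same measurement can also be driven from within Python via timeit.repeat(). A small sketch with a stand-in workload (the sorted() call is just a placeholder, not the nltk pipeline):

```python
import timeit

# Mirrors `python -m timeit --setup "..." "..."`: the setup code is run
# once per repeat and excluded from the timed statement.
times = timeit.repeat(
    stmt="sorted(words)",
    setup="words = ['token'] * 10_000",
    repeat=5,
    number=100,
)
print(f"best of 5: {min(times):.4f} sec")
```

As with the command-line version, the conventional summary statistic is the minimum over the repeats, since slower runs mostly reflect interference from other processes.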

A small note: we have used the environment variable PYTHON_GIL=0 here. This makes it explicit that we are using free-threading (turning off the GIL). This wouldn’t normally be necessary to take advantage of free-threading (in Python “3.14t”), but was needed because one of the dependencies of nltk hasn’t been validated for the free-threaded build yet.

To write a threaded-version of the same, we introduce two functions. The first is a helper that takes (filename, document-content) pairs and returns (filename, processed-document) pairs:

def tupled_tokeniser(pair):
    file_id, doc = pair
    return file_id, tokenise_and_pos_tag(doc)

The second function creates a thread pool with as many workers as there are CPUs available on my machine (16, as counted by multiprocessing.cpu_count()). Each document is processed in a separate thread and we wait for all of the documents to be processed before returning results to the caller:

import multiprocessing as mp
from concurrent.futures import ThreadPoolExecutor, wait
# ...

def tokenise_threaded(corpus):
    with ThreadPoolExecutor(max_workers=mp.cpu_count()) as tpe:
        futures = [
            tpe.submit(tupled_tokeniser, pair)
            for pair in corpus.items()
        ]
        wait(futures)
        # output is a list of (file-id, data) pairs
        tokens = [f.result() for f in futures]
    return tokens

# Time the `tokenise_threaded()` function
# -- but do not time any setup code...
PYTHON_GIL=0 \
 uv run python -m timeit \
 --setup "import main; corpus = main.setup()" \
 "main.tokenise_threaded(corpus)"
# [lots of output messages]
# 1 loop, best of 5: 32.5 sec per loop

Using the htop tool on Ubuntu, I could see that every core was used when processing the documents. At points during the run, each of the 16 CPUs was close to 100% utilisation (whereas only one or two CPUs were busy at any time during the sequential run):

[Figure: htop output showing all 16 processors busy]

But, despite using 16x as many CPUs, the multithreaded version of the processing script was only about 40% faster. There were only 18 books in the dataset, and some disparity between the book lengths (the Bible, containing millions of words, took much longer to process than the others). Maybe the speed-up would be greater with a larger or more balanced dataset.

In the post on the WZB Data Science Blog, there is a multiprocessing implementation of the above. Running their multiprocessing code with 16 CPUs gave a similar speed-up to multithreading (minimum time 31.2 seconds). Indeed, if I were writing this code for a real project, multiprocessing would remain my choice, because the analysis for one book can proceed independently of that for any other book and the data volumes aren’t that big.
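For comparison, a multiprocessing version of the thread-pool function above is nearly a drop-in replacement via ProcessPoolExecutor. This is a sketch, not the WZB code: shout and tokenise_multiprocess are hypothetical names, and the worker is a trivial stand-in because the real nltk pipeline isn't needed to show the pattern:

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

def shout(pair):
    # Stand-in for tupled_tokeniser(): any picklable, top-level
    # function can be mapped across worker processes.
    file_id, doc = pair
    return file_id, doc.upper()

def tokenise_multiprocess(corpus):
    # Analogue of tokenise_threaded(), but each worker is a separate
    # Python process with its own copy of the submitted data.
    with ProcessPoolExecutor(max_workers=mp.cpu_count()) as ppe:
        return list(ppe.map(shout, corpus.items()))

if __name__ == "__main__":
    corpus = {"a.txt": "emma", "b.txt": "moby dick"}
    print(tokenise_multiprocess(corpus))
```

The main practical difference from the thread version is the pickling requirement: the worker function and its arguments must be serialisable so they can cross the process boundary.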

Other News

Python 3.14 has also introduced some improvements to exception-handling, a new approach to string templating and improvements to the use of concurrent interpreters. See the Python 3.14 release notes for further details.

In the wider Python Data Science ecosystem, a few other developments have occurred or are due before the end of 2025:

  • The first stable release of the Positron IDE was made in August;
  • Pandas 3.0 is due before the end of the year, and will introduce a dedicated string data type, copy-on-write behaviour, and implicit access to columns in DataFrame-modification code;
  • Tools that ingest DataFrames are becoming agnostic to the DataFrame library through the Narwhals project. See the Plotly write-up on this subject.

Python data science progresses at such a speed that we can only really scratch the surface here. Have we missed anything in the wider Python ecosystem (2025 edition) that will make a huge difference to your data work? Let us know on LinkedIn or Bluesky.
