Automatic pitch extraction from speech recordings

Eryk Walczak

7 years ago

[This article was first published on R-bloggers – Eryk Walczak, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I needed to extract mean pitch values from audio recordings of human speech, but I wanted to automate it and easily recreate my analyses so I wrote a couple of scripts that can do it much faster.

Here is a recipe for extracting pitch from voice recordings.

Cleaning audio files

My audio files were stereo recordings of a participant saying /a/ while hearing (near) real-time pitch shifts in their own productions. The left channel contains the shifted pitch (heard by participants) and the right channel contains the original speech productions.

The first step is to examine the audio recordings for any non-speech sounds. I used Audacity for that. Any grunts or sights can mess up the outcome of scripts used in the analysis. Irrelevant parts of the audio track can be silenced (CTRL+L in Audacity). Once the audio track is cleaned, I split the channels and save them in separate wav files.

Acoustic signal used in the analysis. Highlighted part is showing noise that should be removed.

Splitting continuous recordings using SFS

My pitch-extracting scripts expects each utterance to be saved in a separate wav file so I need to split the continuous recordings. It could be done manually but for longer recordings it’s cumbersome. Speech Filing System (SFS) has an option that allows splitting the continuous files on silence.

Manual:

1. Load a sound file

2. Create multiple annotations

Tools > Speech > Annotate > Find multiple endpoints

Specify the values of npoint. More information can be found here. You don’t need to know the exact number of utterances, but a close approximation should work.

Visualise the results of automatic annotation:

Check if the annotations are correct. If not, then tweak the npoint settings to get the effect you need.

3. Chop the files on annotations

Tools > Speech > Export > Chop signal into annotated regions

This will save the files in the sfs format, but PraatR can’t work with these files. They need to be transformed into wav.

4. Convert sfs into wav files

Load the files you want to convert, highlight them, and go to:

File > Export > Speech

Automatic:

If you don’t want to spend hours doing what I’ve just described then a simpler solution is using a program that runs all the commands described above.

Use the batch script that follows the steps described above (plus some extras).

View the code on Gist.

Extracting mean pitch using PraatR

Pitch could be extracted manually in Praat by going to

View & Edit > Pitch > Get pitch

but doing this for many files would take a lot of time and would be error-prone.

Luckily, there is a connection between Praat and R (PraatR) which can speed up this task.

I extracted mean pitch and duration of files. The latter can be used to reject any non-speech files. Here’s the script:

View the code on Gist.

Now you should get a nicely formatted csv file.

I hope this will save you a lot of time.

To leave a comment for the author, please follow the link and comment on their blog: R-bloggers – Eryk Walczak.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.