Podcasts have become more popular with the rise of mobile devices and improved recording software. Editing a podcast, however, can be a time-consuming endeavor.

Richard Kalling, co-host of the ALH podcast, needed to cut down on the amount of time he spent editing the podcast.

Originally, his editing process involved making notes of what sections to remove or leave in. This required him to review the entire podcast at least twice.

First, he would make a label track in Audacity, where he would annotate the audio with topics and key phrases concerning what was said in the podcast. Then, he would use these annotations to make cuts and move sections in a second pass through the audio file. He also used these annotations to find edit points to make modifications based on suggestions from his co-host. After these edits, the podcast would be uploaded publicly.

“This was taking an obscenely long amount of time,” Richard says.

Automating the annotation part of the process could help resolve this issue. Richard decided to seek out a speech-recognition API to create a podcast transcription to use for editing.

A developer himself, Richard had several qualities in mind for the ideal automatic speech recognition (ASR) service:

  • Easy to implement
  • Low price
  • Quick turnaround
  • Automated, no human transcription
  • Able to import into Audacity, the open-source audio software used to record and edit ALH

Testing Automatic Speech Recognition Options

Richard tested several options for transcribing the podcast's audio files. One contender was the Google Speech API, but versioning issues made setting up the environment and building the sample code time-consuming and complex. He also looked at IBM’s Watson and tested a sample with Amazon Transcribe, but found that neither suited his needs.

Richard also tried Rev’s Temi software, which helped with transcribing. However, it still involved more manual steps than he wanted and didn’t offer enough flexibility in label formatting. On a recommendation, he turned to Rev.ai to create a more automated way to transcribe the ALH podcast.

Implementing Rev.ai to Transcribe and Edit Podcast Files

Creating a solution using the Rev.ai API only took about two hours, Richard says.

He looked at the GitHub page for Rev.ai’s Python API project to browse the code and decide which classes to import. After downloading the Rev.ai Python SDK, he wrote a simple wrapper script to submit jobs and retrieve results.

Setup was far simpler than the other ASR services he tried, requiring only instantiating a RevAiAPIClient class and passing in an API key generated on his Rev.ai account page.
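A wrapper along these lines might look like the sketch below. It is not Richard's actual script: the helper names make_client and transcribe are hypothetical, and the SDK calls (RevAiAPIClient, submit_job_local_file, get_job_details, get_transcript_json) and status names are assumptions based on the public rev_ai Python package that should be checked against the current SDK documentation.

```python
import time


def make_client(api_key):
    # Imported lazily so the rest of the sketch can be read without the SDK
    # installed. Assumes the package is available via `pip install rev_ai`.
    from rev_ai import apiclient
    return apiclient.RevAiAPIClient(api_key)


def transcribe(client, audio_path, poll_seconds=30):
    """Submit a local audio file and wait until its JSON transcript is ready."""
    job = client.submit_job_local_file(audio_path)
    while True:
        details = client.get_job_details(job.id)
        # Assumed status names: IN_PROGRESS, TRANSCRIBED, FAILED.
        status = details.status.name
        if status == "TRANSCRIBED":
            return client.get_transcript_json(job.id)
        if status == "FAILED":
            raise RuntimeError(f"Rev.ai job {job.id} failed")
        time.sleep(poll_seconds)
```

With a real API key, a call such as transcribe(make_client(key), "episode.mp3") would return the transcript as a Python dictionary. Polling keeps the sketch short; a longer-running setup would more likely rely on a callback instead.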

Richard chose to use Rev.ai’s JSON transcript so he could control the Audacity labels.

“The JSON format is flexible enough that I can write a script to put it in exactly the format that I need it, and that the transcription accuracy is more than good enough for what I need to use it for,” Richard says. “I’m sure, with some massaging of the input audio file, I can get even better accuracy.”
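To illustrate the kind of conversion Richard describes, here is a minimal sketch that turns a JSON transcript into the tab-separated text that Audacity reads as a label track (start time, end time, label text, one label per line). It is not his script. It assumes the transcript has the shape of a "monologues" list whose "elements" carry "type", "value", "ts", and "end_ts" fields; the function name transcript_to_labels and the words_per_label parameter are hypothetical.

```python
def transcript_to_labels(transcript, words_per_label=8):
    """Return Audacity label-track text built from a JSON-style transcript dict."""
    lines = []
    for monologue in transcript.get("monologues", []):
        # Keep only spoken words; punctuation elements carry no timestamps.
        words = [e for e in monologue.get("elements", []) if e.get("type") == "text"]
        for i in range(0, len(words), words_per_label):
            chunk = words[i:i + words_per_label]
            start = chunk[0]["ts"]
            end = chunk[-1]["end_ts"]
            text = " ".join(w["value"] for w in chunk)
            # Audacity expects: start<TAB>end<TAB>label, times in seconds.
            lines.append(f"{start:.6f}\t{end:.6f}\t{text}")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample = {
        "monologues": [
            {
                "speaker": 0,
                "elements": [
                    {"type": "text", "value": "Hello", "ts": 0.5, "end_ts": 1.0},
                    {"type": "punct", "value": " "},
                    {"type": "text", "value": "world", "ts": 1.1, "end_ts": 1.6},
                ],
            }
        ]
    }
    with open("labels.txt", "w", encoding="utf-8") as f:
        f.write(transcript_to_labels(sample))
```

The resulting labels.txt can then be loaded in Audacity through File > Import > Labels. Grouping a fixed number of words per label is only one choice; splitting on speaker changes or sentence punctuation would work equally well with the same data.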

Improved Podcast Editing With Rev.ai

Rather than working with his own, abbreviated notes, Richard now has the full transcript of the podcast to work with. This helped him eliminate the first pass of his process. “Now it takes about half to two-thirds the time it did before,” he says.

Having a transcript of the podcast also helps Mark El-Wakil, Richard’s co-host and the co-creator of ALH. Mark writes the podcast’s accompanying show notes, which list the topics discussed and resources mentioned in an episode. Once Richard has finished his editing, he sends the transcript to Mark, who can quickly scan it to determine what should be included as supplemental information for the podcast.

Experience Rev.ai for Yourself

Rev.ai is an advanced speech recognition API from the makers of Temi and Rev.com. Power up your application with our best-in-class proprietary speech models.

Ready to tackle your program’s transcription needs more effectively? Sign up to try Rev.ai now, with 5 free hours of credit and no credit card needed!
