Local AI transcription with whisper.cpp

Christoph Dähne, 03.02.2023

While transcribing podcasts, I used to run the transcription on a server. Otherwise I would not have been able to use my laptop for hours at a time. A colleague of mine pointed me to whisper.cpp, a nice GitHub project. It is a C++ implementation of Whisper and performs much better.

It can also be compiled to WASM to run directly in the browser, and it can do other cool stuff. I recommend checking out the examples.

Back to local speech-to-text: I want to share a small shell script that I use as my local whisper command. Before using it, please install ffmpeg. You also need to clone and compile whisper.cpp and download the language model. The script prints the required commands if whisper.cpp, the binary, or the model is missing.
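The script checks for whisper.cpp, its binary, and the model, but not for ffmpeg or the build tools. As a minimal sketch, a small helper could report which commands are not on the PATH before you start. The function name `missing_commands` is made up for this example and is not part of whisper.cpp or the script.

```shell
#!/usr/bin/env bash
# Sketch: print every given command name that cannot be found.
# missing_commands is a hypothetical helper name for this example.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}
```

Calling `missing_commands ffmpeg git make` prints one line per missing tool and prints nothing if all three are installed.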

#!/usr/bin/env bash

set -e

# print a message in green
_log_success() {
    printf "\033[0;32m%s\033[0m\n" "${1}"
}

# print a message in red
_log_error() {
    printf "\033[0;31m%s\033[0m\n" "${1}"
}

INSTALL_PATH=~/src/github/ggerganov/whisper.cpp
if [ ! -d "$INSTALL_PATH" ]; then
    _log_error "Failed to locate whisper.cpp"
    echo "Please execute"
    echo "mkdir -p ~/src/github/ggerganov"
    echo "cd ~/src/github/ggerganov"
    echo "git clone https://github.com/ggerganov/whisper.cpp.git"
    exit 1
fi

BIN_PATH=$INSTALL_PATH/main
if [ ! -f "$BIN_PATH" ]; then
    _log_error "Failed to locate binary"
    echo "Please execute"
    echo "cd $INSTALL_PATH"
    echo "make"
    exit 1
fi

MODEL_PATH=$INSTALL_PATH/models/ggml-medium.bin
if [ ! -f "$MODEL_PATH" ]; then
    _log_error "Failed to locate language model"
    echo "Please execute"
    echo "cd $INSTALL_PATH"
    echo "bash ./models/download-ggml-model.sh medium"
    exit 1
fi

# last parameter is the source file, all others are ffmpeg flags
# (kept in an array so that flags survive word splitting;
# the guard avoids an expansion error when no parameter is given)
INPUT_PATH=""
FFMPEG_FLAGS=()
if [ $# -gt 0 ]; then
    INPUT_PATH="${@: -1}"
    FFMPEG_FLAGS=("${@:1:$#-1}")
fi

if [ ! -f "$INPUT_PATH" ]; then
    _log_error "Input file not found at '$INPUT_PATH'"
    echo "Usage: whisper [ffmpeg flag 1] [ffmpeg flag 2] […] path/to/audio.file"
    echo "Example: whisper -ss 12 path/to/audio.file # start at second 12"
    echo "Example: whisper -ss 12 -t 30 path/to/audio.file # start at second 12 and transcribe 30 seconds"
    exit 1
fi

INPUT_FILE=$(basename "$INPUT_PATH")
INPUT_NAME=${INPUT_FILE%.*}

# whisper.cpp expects 16 kHz mono 16-bit PCM WAV input
_log_success "Converting $INPUT_PATH …"
ffmpeg "${FFMPEG_FLAGS[@]}" -i "$INPUT_PATH" -acodec pcm_s16le -ac 1 -ar 16000 "$INPUT_NAME.wav"

_log_success "Transcribing $INPUT_PATH …"
"$BIN_PATH" -m "$MODEL_PATH" -l auto -otxt -of "$INPUT_NAME.raw" "$INPUT_NAME.wav"

# strip leading spaces, remove bracketed annotations, drop empty lines
_log_success "Post-processing transcript …"
sed -e 's/^ *//' -e 's/\[.*\]//' -e '/^$/d' "$INPUT_NAME.raw.txt" > "$INPUT_NAME.txt"

_log_success "Cleaning up …"
rm "$INPUT_NAME.wav"
rm "$INPUT_NAME.raw.txt"

_log_success "Done"
echo "See $INPUT_NAME.txt"
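To make the clean-up step of the script easier to follow, here is a small sketch that applies the same three sed expressions to a made-up sample. The function name `clean_transcript` and the sample lines are assumptions for illustration only. Real whisper.cpp output may look different.

```shell
#!/usr/bin/env bash
# Sketch of the clean-up step, reading the raw transcript from stdin.
# clean_transcript is a hypothetical name used only in this example.
clean_transcript() {
    # 1) strip leading spaces
    # 2) remove everything from the first "[" to the last "]" on a line
    # 3) delete lines that are empty afterwards
    sed -e 's/^ *//' \
        -e 's/\[.*\]//' \
        -e '/^$/d'
}

# Made-up sample input, not real whisper.cpp output
printf '%s\n' ' Hello and welcome.' '[MUSIC]' '' ' Today we talk about Whisper.' \
    | clean_transcript
# prints:
# Hello and welcome.
# Today we talk about Whisper.
```

Note that the bracket pattern is greedy. If a line contains two bracketed annotations, the text between them is removed as well. For typical transcripts this seems acceptable, but it is worth knowing when a line goes missing.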

As always, the license is MIT, so feel free to use and share the script, and to send me your feedback.
