
How to Build an Offline AI Transcriber | How to Use Whisper.cpp Locally

  • Writer: Brock Daily
  • Jan 27
  • 3 min read

(AI-Generated: An AI Spectral Whisper Evolving) - Made with Midjourney by Bitforge Dynamics

Introduction:


In this guide, we’ll show you how to build and run a local speech recognition system using OpenAI’s Whisper models, specifically the whisper.cpp implementation. Whisper.cpp is known for efficient CPU-based inference, low memory usage, and the ability to run fully offline, which means:


  • No internet required

  • No cloud API costs

  • No data shared


We’ll also walk you through a basic Graphical User Interface (GUI) included in this repository so you can easily upload audio files and retrieve transcriptions with timestamps.


 

All details can be found in the README.md for this GitHub project.

 

Installation Guide:

Here’s what the project offers once it’s set up:


  • Offline Transcription: Run OpenAI Whisper on your local CPU or GPU.

  • Timestamped Output: Pinpoint each phrase’s timing in the audio.

  • GUI Support: A simple interface to select models, upload audio files, and view results.

  • Cross-Platform: Windows, macOS (including M1/M2), and Linux.


1. Prerequisites


  • Python 3.10+

    While Python 3.7+ may work, this guide is tested with Python 3.10+.

  • Virtual environment (highly recommended)

    This keeps your system dependencies clean.

  • M1/M2 Mac

    Special steps are required when installing PyTorch. See Installation on macOS M1/M2 below.
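
Before going further, you can run a quick sanity check like the one below to confirm your interpreter version and whether a virtual environment is active. This snippet is purely illustrative and is not part of the repository:


import sys

# Confirm the interpreter is 3.10 or newer
assert sys.version_info >= (3, 10), f"Python 3.10+ recommended, found {sys.version.split()[0]}"

# Inside a venv, sys.prefix points into the environment rather than the base install
print("Running inside a virtual environment:", sys.prefix != sys.base_prefix)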


2. Repository Setup


Clone the repository:


git clone https://github.com/bitforgeagi/offline-ai-transcriber.git
cd offline-ai-transcriber

Create and activate a virtual environment (recommended):

macOS/Linux:


python3 -m venv venv
source venv/bin/activate

Windows (CMD/PowerShell):


python -m venv venv
venv\Scripts\activate

macOS M1/M2 Only: Install PyTorch with a special command. Skip if you’re on Windows/Linux or an Intel Mac:


pip3 install --pre torch torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

Install the remaining requirements:


pip install -r requirements.txt

If you see a zsh: no matches found: datasets[audio] error on macOS, add quotes:


pip install 'datasets[audio]'

3. Handling Installation on macOS M1/M2


Common Errors

  • ERROR: Could not find a version that satisfies the requirement torch

  • ERROR: No matching distribution found for torch


This typically means your system’s default PyTorch build doesn’t support Apple Silicon out of the box.


The remedy:


1. Remove torch from your requirements.txt (or skip it by installing everything else first).

2. Install PyTorch using the special command:


pip3 install --pre torch torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

3. Install the rest of the requirements as normal:


pip install -r requirements.txt
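
Once that finishes, a short check like the following (just a verification snippet, not part of the repo) confirms that PyTorch imports cleanly on Apple Silicon:


import torch

print("PyTorch version:", torch.__version__)
# On M1/M2 Macs, recent builds expose the Metal (MPS) backend
print("MPS backend available:", torch.backends.mps.is_available())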

4. Installing Whisper Models


You have two options for Whisper model files:


Download models beforehand (faster runtime, manual file management)

• Download from Hugging Face or GitHub.

• Extract them into the ./models/ directory.


Let the script download them automatically (simpler, but slower first run)

• Skip manual downloading.

• The code will download the required model files on the first run (a rough sketch of this step is shown below).


If you ever want to re-download or clear out your existing models:


rm -rf models/*
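
If you are curious what the automatic download roughly involves, the sketch below fetches a ggml model file from the ggerganov/whisper.cpp repository on Hugging Face into ./models/. The URL pattern and helper function are assumptions for illustration; the project’s own download logic may differ:


import urllib.request
from pathlib import Path

# Assumed source: the public ggml model files in the ggerganov/whisper.cpp repo on Hugging Face
BASE_URL = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main"

def download_model(name: str = "small", dest: str = "models") -> Path:
    """Download ggml-<name>.bin into ./models/ if it is not already there."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    target = Path(dest) / f"ggml-{name}.bin"
    if not target.exists():
        print(f"Downloading {target.name} ...")
        urllib.request.urlretrieve(f"{BASE_URL}/ggml-{name}.bin", target)
    return target

if __name__ == "__main__":
    print("Model ready at:", download_model("small"))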

5. Usage from the Command Line


5.1 Transcribing Audio


1. Place an audio file (e.g., test_audio_file.mp3) in the project directory.

2. Edit transcribe.py (if needed) to point to your file, or specify via command line:


python transcribe.py --audio_file test_audio_file.mp3


3. Select a Whisper model (optional). Defaults to small:


python transcribe.py --model tiny
# or
python transcribe.py --model large
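
To give a sense of how a CLI script like this can work internally, here is a minimal sketch built on the Hugging Face transformers pipeline with timestamps enabled. It assumes the torch and transformers packages are installed and is illustrative only; the real transcribe.py may be organized differently:


import argparse

from transformers import pipeline

def main() -> None:
    parser = argparse.ArgumentParser(description="Offline Whisper transcription (sketch)")
    parser.add_argument("--audio_file", required=True, help="Path to an audio file, e.g. test_audio_file.mp3")
    parser.add_argument("--model", default="small", help="Whisper size: tiny, base, small, medium, large")
    args = parser.parse_args()

    # Load the Whisper checkpoint locally; chunking enables long-form audio
    asr = pipeline(
        "automatic-speech-recognition",
        model=f"openai/whisper-{args.model}",
        chunk_length_s=30,
    )

    # return_timestamps=True adds per-chunk start/end times to the result
    result = asr(args.audio_file, return_timestamps=True)

    print("Full transcription:\n", result["text"])
    for chunk in result["chunks"]:
        start, end = chunk["timestamp"]
        print(f"[{start} - {end}] {chunk['text']}")

if __name__ == "__main__":
    main()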

5.2 Output Format


After a successful run, you’ll see:

1. Full transcription of the entire audio.

2. Timestamped chunks, showing which snippet of text belongs to which segment of the audio. For example, the GUI screenshots later in this guide show a clip being segmented.
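
If you want those timestamps in a more readable [MM:SS] form, a tiny helper along these lines (again just an illustration, not code from the repo, and assuming chunks shaped like the transformers output sketched above) does the conversion:


def format_timestamp(seconds: float) -> str:
    """Turn a number of seconds into an MM:SS string."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"

def print_chunks(chunks: list[dict]) -> None:
    """Print each chunk as '[MM:SS - MM:SS] text'."""
    for chunk in chunks:
        start, end = chunk["timestamp"]
        end_label = format_timestamp(end) if end is not None else "??:??"
        print(f"[{format_timestamp(start)} - {end_label}] {chunk['text'].strip()}")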


6. Using the GUI


For a more user-friendly approach:

1. Run the GUI:


python gui.py

2. Select a Model from the dropdown (e.g., small, tiny, large).


(Screenshot: the Whisper.cpp GUI; models download automatically in the GUI.)

3. Click “Select Audio File” to upload your audio file.

4. Transcription Output will appear in the scrollable text box, including timestamps.


(Screenshot: example of a clip being segmented and transcribed.)

This GUI offers:

  • Progress indication while the model processes your audio.

  • Status updates (e.g., “Downloading model files…”).
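
To give a feel for how a GUI like this can be wired up, here is a stripped-down tkinter sketch: it lets you pick an audio file, transcribes it, and prints the timestamped result into a scrollable text box. It is a simplified stand-in rather than the repository’s gui.py, it reuses the transformers-based approach from the earlier sketch, and for simplicity it blocks the window while transcription runs (the real GUI shows progress instead):


import tkinter as tk
from tkinter import filedialog, scrolledtext

from transformers import pipeline

# Load the model once at startup (the real GUI lets you pick the size from a dropdown)
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small", chunk_length_s=30)

def choose_and_transcribe() -> None:
    """Ask for an audio file, transcribe it, and show timestamped chunks in the text box."""
    path = filedialog.askopenfilename(title="Select Audio File")
    if not path:
        return
    box.insert(tk.END, f"Transcribing {path} ...\n")
    result = asr(path, return_timestamps=True)
    box.insert(tk.END, result["text"] + "\n\n")
    for chunk in result["chunks"]:
        box.insert(tk.END, f"{chunk['timestamp']} {chunk['text']}\n")

root = tk.Tk()
root.title("Offline AI Transcriber (sketch)")
box = scrolledtext.ScrolledText(root, width=80, height=24)
box.pack(padx=10, pady=10)
tk.Button(root, text="Select Audio File", command=choose_and_transcribe).pack(pady=5)
root.mainloop()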


7. Project Structure


offline-ai-transcriber/
├── gui.py              # The Graphical User Interface entry point
├── transcribe.py       # The main script for CLI-based transcription
├── requirements.txt    # Python dependencies
├── README.md           # Documentation
├── models/             # Place model files here if manually downloading (otherwise models folder will be created)
└── ... (we may add a few other things in the future)

  • gui.py: Runs a simple Flask or tkinter-based interface.

  • transcribe.py: Handles audio loading, model selection, and transcription logic.

  • models/: Default location for downloaded or manually placed Whisper models.


If you’re using Git, you can store a minimal .gitkeep file in models/ to maintain the folder structure in the repo.


License


This project is under the MIT License. Feel free to use, modify, and distribute it in your own applications.


Acknowledgements:


  • Whisper: OpenAI Whisper for advanced speech recognition.

  • whisper.cpp: the whisper.cpp GitHub repo.


 

About Bitforge Dynamics: We are a US-based startup focused on deep-tech research for private industry and the U.S. Government. We are currently building offline AI systems like Dark Engine.


Thank you for reading our Blog ~ Make sure to follow us on X and stay updated!

