OpenAI Whisper CLI

Rating: 4.9 (237 reviews)
Author: openclaw

whisper · audio · transcription · speech-to-text · offline

⭐ 379.0k stars · 📄 NOASSERTION · 🕒 2026-06-16

Install this skill

npx skills add openclaw/openclaw

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

What this skill does

• Transcribe audio files to text files on the local machine
• Translate speech in audio files into English
• Write output as time-coded SRT subtitle files
• Choose an inference model, from the fast turbo default to larger, higher-accuracy variants
• Cache downloaded models locally under the user's home directory (~/.cache/whisper)

When to use it

✓ Converting long project meeting recordings into searchable notes
✓ Generating accurate subtitles for video content
✓ Processing sensitive client interviews where data privacy is mandatory
✓ Translating foreign-language voice notes during development

When not to use it

✕ High-throughput production apps requiring massive parallel processing
✕ Low-power edge devices lacking sufficient RAM or GPU for model loading

How to invoke it

Example prompts that trigger this skill:

“Transcribe this meeting recording to a text file.”
“Convert the audio file at path/to/audio.mp3 into an SRT subtitle file.”
“Translate this Spanish voice memo into English text.”
“Run the whisper medium model on my latest audio file.”
“Process all audio files in the current directory and save outputs.”
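
The last prompt, processing every audio file in a directory, might come down to a loop like the sketch below. The function name `transcribe_dir`, its arguments, and the `.mp3`-only filter are illustrative assumptions, not part of the skill; only the `whisper` flags are taken from the quoted SKILL.md.

```shell
# Sketch: transcribe every .mp3 in a directory with the whisper CLI.
# transcribe_dir and its parameters are hypothetical names for illustration.
transcribe_dir() {
    dir="${1:-.}"          # directory to scan (default: current directory)
    model="${2:-turbo}"    # model name passed to --model
    for f in "$dir"/*.mp3; do
        # Skip the unexpanded pattern when no .mp3 files exist.
        [ -e "$f" ] || continue
        whisper "$f" --model "$model" --output_format txt --output_dir "$dir"
    done
}
```

Calling `transcribe_dir ./recordings medium` would run one `whisper` invocation per file, one after another. Other extensions would need their own pattern in the loop.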

Example workflow

1. Store the raw audio file in your local project folder.
2. Invoke the whisper command, pointing it at the audio file.
3. Select a model size based on the speed versus accuracy you need.
4. Set the output format to either plain text or SRT.
5. Review the transcription file generated in your working directory.
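
The workflow above can be sketched as a small wrapper. The helper name `transcribe_file`, its argument order, and its exit codes are assumptions made for illustration; the `whisper` flags are the ones shown in the quoted SKILL.md.

```shell
# Hypothetical helper: transcribe_file <audio> [model] [format]
# Checks its inputs, then runs the whisper CLI in the working directory.
transcribe_file() {
    audio="$1"
    model="${2:-turbo}"    # speed versus accuracy trade-off
    format="${3:-txt}"     # plain text or SRT
    if [ ! -f "$audio" ]; then
        echo "audio file not found: $audio" >&2
        return 1
    fi
    case "$format" in
        txt|srt) ;;
        *) echo "unsupported format: $format (expected txt or srt)" >&2
           return 2 ;;
    esac
    whisper "$audio" --model "$model" --output_format "$format" --output_dir .
}
```

With `transcribe_file meeting.mp3 medium srt`, the result should land next to where the command was run, named after the audio file.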

Prerequisites

– Python 3.8+
– FFmpeg installed and available on the system PATH
– Sufficient local storage for the model cache (~2 GB or more)
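
A quick way to confirm the command-line prerequisites is to check that each tool resolves on the PATH. The sketch below is one possible check, not something the skill ships; `check_tools` is a hypothetical name, and it does not verify the Python version.

```shell
# Sketch: report which of the given tools can be found on the PATH.
# check_tools is a hypothetical helper; it returns 1 if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_tools python3 ffmpeg whisper` before the first transcription gives a one-line status per tool.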

Pitfalls & limitations

! Larger models use significantly more RAM (or GPU memory) and take longer to load
! The first run with each model requires a one-time download of its weights
! Accuracy varies heavily with background noise levels

FAQ

Do I need an OpenAI API key?

No. The CLI runs the open-source Whisper models locally. Network access is needed only to download model weights on first use.

Where are the models stored?

Models are automatically downloaded and cached in ~/.cache/whisper.
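
To see which models are already cached and how much disk space they take, something like the following could be used. `show_model_cache` is a hypothetical helper; the default path is the one stated above, and the optional argument exists only so another directory can be inspected.

```shell
# Sketch: list cached model files and their total size.
# show_model_cache is a hypothetical name; the default path comes from the FAQ above.
show_model_cache() {
    cache_dir="${1:-$HOME/.cache/whisper}"
    if [ -d "$cache_dir" ]; then
        ls -lh "$cache_dir"    # one entry per downloaded model file
        du -sh "$cache_dir"    # total space used by the cache
    else
        echo "no model cache yet at $cache_dir"
    fi
}
```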

Which model should I use for speed?

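The turbo model, which the source lists as the default on this install, is optimized for faster inference. Smaller models trade accuracy for speed; larger ones do the reverse.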

Can I output subtitles directly?

Yes, specify --output_format srt in your command to generate time-coded subtitles.
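
Subtitles can also be combined with translation, as in the second quick-start command of the SKILL.md. The sketch below wraps that command. `translate_to_srt` is a hypothetical name, and the `medium` default is an assumption based on the upstream Whisper README, which states that turbo is not trained for translation; it is worth verifying against the installed version.

```shell
# Sketch: produce English SRT subtitles from non-English speech.
# translate_to_srt is a hypothetical helper; medium is an assumed default.
translate_to_srt() {
    audio="$1"
    model="${2:-medium}"
    whisper "$audio" --task translate --model "$model" --output_format srt
}
```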

How it compares

Unlike cloud APIs that require network connectivity and billing, this local approach keeps your data entirely within your infrastructure and avoids usage costs.

Source & trust

⭐ 379k stars · 📄 NOASSERTION · 🕒 Updated 2026-06-16 · 🛡 network


# Whisper (CLI)

Use `whisper` to transcribe audio locally.

Quick start

- `whisper /path/audio.mp3 --model medium --output_format txt --output_dir .`
- `whisper /path/audio.m4a --task translate --output_format srt`

Notes

- Models download to `~/.cache/whisper` on first run.
- `--model` defaults to `turbo` on this install.
- Use smaller models for speed, larger for accuracy.

Quoted from openclaw/openclaw for reference — see the original for the authoritative, latest version.


The OpenAI Whisper CLI skill provides local, offline speech-to-text transcription within your development environment. Because inference runs on your machine, you need no API key and no private audio is sent to an external cloud provider. The CLI handles a wide range of audio formats and produces transcriptions or English translations with no network round trips or subscription costs.

The skill suits developers who need to process meeting recordings, video transcripts, or voice memos while keeping data private. Your own hardware supplies the compute, so speed depends on the machine and the chosen model size. That makes it a good fit for offline automation workflows where self-contained, predictable operation matters more than cloud convenience. It is part of the OpenClaw assistant ecosystem and targets private, high-fidelity transcription tasks.


How to Use This Skill Unit

Option A: Project-Specific (Recommended)

1. Click "Download" above.
2. In your project, create the directory .agent/skills/openai-whisper/
3. Save the file there as SKILL.md
4. The agent will automatically discover the skill based on its description.
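
The project-specific steps above can be sketched in shell, assuming the SKILL.md file has already been downloaded. `install_skill_local` and its arguments are hypothetical; only the target directory and file name come from the steps.

```shell
# Sketch: copy a downloaded SKILL.md into the project-level skills directory.
# install_skill_local is a hypothetical helper, not a command the skill provides.
install_skill_local() {
    src="$1"              # path to the downloaded SKILL.md
    project="${2:-.}"     # project root (default: current directory)
    dest="$project/.agent/skills/openai-whisper"
    mkdir -p "$dest" && cp "$src" "$dest/SKILL.md"
}
```

For example, `install_skill_local ~/Downloads/SKILL.md .` run from the project root places the file where the agent looks for it.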

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

Claude Code: ~/.claude/skills/openclaw/openclaw/openai-whisper/SKILL.md
Cursor: ~/.cursor/skills/openclaw/openclaw/openai-whisper/SKILL.md
Antigravity: ~/.gemini/antigravity/skills/openclaw/openclaw/openai-whisper/SKILL.md

🚀 Install with CLI:
npx skills add openclaw/openclaw
