OpenAI Whisper CLI

Tags: whisper, audio, transcription, speech-to-text, offline
4.9 (237) · 379.0k stars · License: NOASSERTION · Updated 2026-06-16

Install this skill

npx skills add openclaw/openclaw

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

What this skill does

  • Transcribe audio files into local text documents
  • Translate speech in audio files into English text
  • Write results as plain text or as time-coded SRT subtitle files
  • Select a model size, from the fast default turbo to larger, more accurate variants
  • Cache downloaded models locally in ~/.cache/whisper

When to use it

  • Converting long project meeting recordings into searchable notes
  • Generating accurate subtitles for video content
  • Processing sensitive client interviews where data privacy is mandatory
  • Translating foreign language voice notes during development

When not to use it

  • High-throughput production apps requiring massive parallel processing
  • Low-power edge devices lacking sufficient RAM or GPU for model loading

How to invoke it

Example prompts that trigger this skill:

  • Transcribe this meeting recording to a text file.
  • Convert the audio file at path/to/audio.mp3 into an SRT subtitle file.
  • Translate this Spanish voice memo into English text.
  • Run the whisper medium model on my latest audio file.
  • Process all audio files in the current directory and save outputs.
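The last prompt, processing every audio file in a directory, might come down to a loop like the one below. This is a minimal sketch rather than what the skill actually runs: `transcribe_dir` is a hypothetical helper name, the extension list is an assumption, and only flags that appear in the quoted SKILL.md are used.

```shell
# Hypothetical helper: run whisper on every matching audio file in a directory.
# Usage: transcribe_dir <input-dir> <output-dir> [format]
transcribe_dir() {
    indir=$1
    outdir=$2
    format=${3:-txt}          # txt for plain text, srt for subtitles

    mkdir -p "$outdir" || return 1

    # The extension list is an assumption; extend it to match your files.
    for f in "$indir"/*.mp3 "$indir"/*.m4a "$indir"/*.wav; do
        [ -e "$f" ] || continue   # skip patterns that matched nothing
        whisper "$f" --output_format "$format" --output_dir "$outdir" || return 1
    done
}
```

Call it as `transcribe_dir ./recordings ./transcripts srt`. The whisper CLI also accepts several audio paths in a single invocation, which avoids reloading the model for each file; the loop is used here only because it stops at the first file that fails.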

Example workflow

  1. Store the raw audio file in your local project folder.
  2. Invoke the whisper command pointing to the audio file.
  3. Select the appropriate model size based on desired speed versus accuracy.
  4. Define the output format as either plain text or SRT.
  5. Review the generated transcription file created in your working directory.
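
The five steps above can be sketched as one shell helper. The name `transcribe_file` and its argument order are hypothetical; the `--model`, `--output_format`, and `--output_dir` flags are the ones shown in the quoted SKILL.md.

```shell
# Hypothetical helper: transcribe one audio file with the whisper CLI.
# Usage: transcribe_file <audio-file> [model] [format] [output-dir]
transcribe_file() {
    audio=$1
    model=${2:-turbo}      # smaller models are faster, larger ones more accurate
    format=${3:-txt}       # txt for plain text, srt for subtitles
    outdir=${4:-.}         # defaults to the working directory

    if [ ! -f "$audio" ]; then
        echo "audio file not found: $audio" >&2
        return 1
    fi

    whisper "$audio" --model "$model" --output_format "$format" --output_dir "$outdir"
}

# Example (commented out so that pasting the block does not start a run):
# transcribe_file recordings/standup.mp3 medium srt ./transcripts
```

Checking that the file exists before calling whisper keeps a mistyped path from triggering a model load or a first-run download.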

Prerequisites

  • Python 3.8+
  • FFmpeg installed on the system path
  • Sufficient local storage for model cache (~2GB+)
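
A quick way to confirm these prerequisites before the first run is sketched below. Both function names are hypothetical, and the sketch assumes the interpreter is installed as `python3` (on some systems it is `python`).

```shell
# Hypothetical helper: succeed only if every named tool is on PATH.
# Usage: require_tools <tool>...
require_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    if [ -n "$missing" ]; then
        echo "not found on PATH:$missing" >&2
        return 1
    fi
}

# Hypothetical helper: succeed only if python3 reports version 3.8 or newer.
python_is_38_plus() {
    python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)'
}

# Example: check the tools, the Python version, then show free disk space.
# require_tools python3 ffmpeg whisper && python_is_38_plus && df -h "$HOME"
```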

Pitfalls & limitations

  • Larger models need considerably more RAM (or GPU memory) to load and run
  • The first run with a given model downloads its weights, so it needs network access once
  • Accuracy varies heavily with the level of background noise
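
To see whether a run is likely to trigger a download, and how much disk the cached weights already use, you can inspect the cache directory. A minimal sketch, assuming the default `~/.cache/whisper` location named in the quoted SKILL.md; `show_model_cache` is a hypothetical helper name.

```shell
# Hypothetical helper: list cached model files and their total size.
# Usage: show_model_cache [cache-dir]
show_model_cache() {
    cache_dir=${1:-"$HOME/.cache/whisper"}
    if [ ! -d "$cache_dir" ]; then
        echo "no model cache at $cache_dir (models download on first run)"
        return 0
    fi
    ls -lh "$cache_dir"    # one file per downloaded model
    du -sh "$cache_dir"    # total disk usage of the cache
}
```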

FAQ

Do I need an OpenAI API key?
No. The CLI operates entirely locally using the open-source Whisper models.
Where are the models stored?
Models are automatically downloaded and cached in ~/.cache/whisper.
Which model should I use for speed?
The default turbo model is optimized for fast inference. Smaller models such as tiny or base run faster still, at the cost of accuracy.
Can I output subtitles directly?
Yes, specify --output_format srt in your command to generate time-coded subtitles.

How it compares

Unlike cloud APIs that require network connectivity and billing, this local approach keeps your data entirely within your infrastructure and avoids usage costs.

Source & trust

379k stars · License: NOASSERTION · Updated 2026-06-16 · 🛡 network

View the full SKILL.md source

# Whisper (CLI)

Use `whisper` to transcribe audio locally.

Quick start

- `whisper /path/audio.mp3 --model medium --output_format txt --output_dir .`
- `whisper /path/audio.m4a --task translate --output_format srt`

Notes

- Models download to `~/.cache/whisper` on first run.
- `--model` defaults to `turbo` on this install.
- Use smaller models for speed, larger for accuracy.

Quoted from openclaw/openclaw for reference — see the original for the authoritative, latest version.
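
The second quick-start command relies on the default model. The upstream Whisper README notes that turbo is not trained for translation, so when `--model` defaults to `turbo` it may be safer to name a multilingual model explicitly. The sketch below assumes that reading; `translate_to_srt` is a hypothetical helper, and `medium` is only an illustrative default.

```shell
# Hypothetical helper: translate speech in an audio file to English subtitles.
# Usage: translate_to_srt <audio-file> [source-language-code] [model]
translate_to_srt() {
    audio=$1
    lang=${2:-}             # optional, e.g. es; empty lets whisper detect it
    model=${3:-medium}      # assumption: a multilingual model rather than turbo

    if [ -n "$lang" ]; then
        whisper "$audio" --task translate --model "$model" --language "$lang" --output_format srt
    else
        whisper "$audio" --task translate --model "$model" --output_format srt
    fi
}

# Example:
# translate_to_srt voice-memo.m4a es
```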

📄 Full skill instructions — original source: openclaw/openclaw
The OpenAI Whisper CLI skill provides local, offline speech-to-text transcription within your development environment. Because inference runs directly on your machine, you need no API key and no private audio is sent to an external cloud provider. The skill handles the audio formats that FFmpeg can decode and produces transcriptions or English translations without network round trips or subscription costs. It suits developers who process meeting recordings, video transcripts, or voice memos and must keep that data private. Your own hardware does the compute, so speed depends on the local CPU or GPU; in return, the workflow keeps working offline once the model weights are cached. The skill is part of the OpenClaw assistant ecosystem.

How to Use This Skill Unit

Option A: Project-Specific (Recommended)

  1. Click "Download" above
  2. In your project, create the directory: .agent/skills/openai-whisper/
  3. Save the file as SKILL.md
  4. The agent will automatically discover the skill based on its description.
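
Steps 2 and 3 can be done from a terminal. A minimal sketch, assuming the downloaded file is already on disk; `install_skill_local` is a hypothetical helper name, and the destination path is the one given in step 2.

```shell
# Hypothetical helper: copy a downloaded skill file into a project.
# Usage: install_skill_local <downloaded-file> [project-dir]
install_skill_local() {
    src=$1
    project=${2:-.}         # defaults to the current directory
    dest="$project/.agent/skills/openai-whisper"

    [ -f "$src" ] || { echo "skill file not found: $src" >&2; return 1; }
    mkdir -p "$dest" && cp "$src" "$dest/SKILL.md"
}

# Example:
# install_skill_local ~/Downloads/SKILL.md .
```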

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

  • Claude Code: ~/.claude/skills/openclaw/openclaw/openai-whisper/SKILL.md
  • Cursor: ~/.cursor/skills/openclaw/openclaw/openai-whisper/SKILL.md
  • Antigravity: ~/.gemini/antigravity/skills/openclaw/openclaw/openai-whisper/SKILL.md

🚀 Install with CLI:
npx skills add openclaw/openclaw

How to use this Skill in Claude Code & Cursor

For Claude Code (CLI)

To use this skill in Claude Code, copy the skill content into your project's custom instructions or install it with the skills CLI (npx skills add openclaw/openclaw), so that Claude can apply the whisper usage notes when a transcription task comes up.

For Cursor & Windsurf

For Cursor or Windsurf, individual skills are best placed in the "Rules for AI" section, where the agent can apply them when asked to transcribe or translate audio.

Why the skill format matters: the standardized Agent Skills format lets your AI agent load detailed instructions only when they are relevant, keeping your prompt clean while improving results.

Source & attribution

This skill is categorized under AI Tools & Agents and is published by openclaw, maintained in openclaw/openclaw.
