
fuzzing-dictionary

fuzzing, security testing, vulnerability, code analysis, quality assurance, software testing, cybersecurity, devsecops
⭐ 5.7k stars · License: CC-BY-SA-4.0 · Updated 2026-06-15

Install this skill

npx skills add trailofbits/skills

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

A fuzzing dictionary is a curated set of input tokens that guides mutation-based fuzzers toward hard-to-reach code paths. By injecting domain-specific constants, magic bytes, and keywords, the fuzzer spends less time guessing basic structure and more time exercising complex parsing logic. Without these hints, fuzzers often fail to pass the early conditional checks that gate deeper functionality, such as protocol headers or file format signatures. The dictionary steers generation toward inputs that look syntactically valid to the target, which raises the likelihood of finding memory corruption or logic bugs in parsers, protocol handlers, and serialized data structures. It bridges the gap between random bit-flipping and structured test case generation, turning blind fuzzing into a more targeted search for vulnerabilities.

When to Use This Skill

  • Fuzzing network protocol implementations like DNS, HTTP, or custom binary protocols
  • Testing complex file format parsers such as PDF, PNG, or multimedia containers
  • Identifying vulnerabilities in configuration file loaders and environment variable parsers
  • Targeting command-line argument processing logic

How to Invoke This Skill

Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:

  • "create a dictionary file for my fuzzer"
  • "how to guide AFL++ to explore deeper code paths"
  • "extracting magic bytes from binary for fuzzing"
  • "improve coverage for a custom protocol fuzzer"
  • "what format should my fuzzer dictionary follow"

Pro Tips

  • Regularly update your fuzzing dictionaries with newly discovered edge cases, common vulnerabilities, and domain-specific keywords relevant to your evolving codebase.
  • Combine dictionary-based fuzzing with structural fuzzing (grammar-based) for highly complex inputs, allowing the fuzzer to both understand the format and inject meaningful tokens.
  • Prioritize creating dictionaries for the most critical or attack-surface-exposed components of your application, such as external APIs, network services, or file processing modules.

What this skill does

  • Injects domain-specific constants and protocol keywords into test cases
  • Bypasses early validation gates and syntax checks
  • Supports hex-encoded non-printable byte sequences
  • Provides a standardized interface compatible with libFuzzer and AFL++
  • Facilitates structural depth by guiding the fuzzer through complex state machines

When not to use it

  • Fuzzing simple arithmetic or computational algorithms without structured input
  • Scenarios where high code coverage is already achieved through pure mutation

Example workflow

  1. Analyze target source or binary to identify key constants and magic bytes
  2. Write each identified value to a text file as a quoted string, one entry per line
  3. Apply escaping rules for non-printable bytes and special characters
  4. Start the fuzzing session with the dictionary, using -dict= for libFuzzer or -x for AFL++
  5. Monitor code coverage progression to verify dictionary effectiveness
  6. Trim the dictionary by removing low-impact or redundant tokens

Prerequisites

  • –Access to the target source code or binary
  • –Basic understanding of the target input protocol or format
  • –A working fuzzing environment like libFuzzer or AFL++

Pitfalls & limitations

  • Including too many entries can bloat the input space and degrade performance
  • Over-reliance on dictionaries may cause the fuzzer to miss bugs that occur due to invalid syntax
  • Incorrectly formatted hex escapes can lead to unexpected behavior or silent failures

FAQ

How large should my dictionary file be?
Keep it focused; 50 to 200 high-quality, relevant entries are usually more effective than thousands of generic ones.
Can I automate the creation of these dictionaries?
Yes, you can extract strings from binaries using the strings utility or generate them from header files via grep.
Does a dictionary replace the need for an initial corpus?
No, it complements it. The dictionary helps the fuzzer construct valid tokens, while the corpus provides the foundational structure.
Will my fuzzer ignore dictionary entries if they aren't useful?
The fuzzer will continue to mutate inputs; the dictionary simply ensures that those specific tokens are injected frequently enough to trigger deeper code paths.

How it compares

While manual fuzzing might involve crafting individual test cases, a dictionary provides a systematic way to inject structural components automatically, ensuring the fuzzer iterates through various keyword combinations programmatically.

Source & trust

Full skill instructions, original source: trailofbits/skills
# Fuzzing Dictionary

A fuzzing dictionary provides domain-specific tokens to guide the fuzzer toward interesting inputs. Instead of purely random mutations, the fuzzer incorporates known keywords, magic numbers, protocol commands, and format-specific strings that are more likely to reach deeper code paths in parsers, protocol handlers, and file format processors.

## Overview

Dictionaries are text files containing quoted strings that represent meaningful tokens for your target. They help fuzzers bypass early validation checks and explore code paths that would be difficult to reach through blind mutation alone.

### Key Concepts

| Concept | Description |
|---------|-------------|
| **Dictionary Entry** | A quoted string (e.g., "keyword") or key-value pair (e.g., kw="value") |
| **Hex Escapes** | Byte sequences like "\xF7\xF8" for non-printable characters |
| **Token Injection** | Fuzzer inserts dictionary entries into generated inputs |
| **Cross-Fuzzer Format** | Dictionary files work with libFuzzer, AFL++, and cargo-fuzz |
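
To make the "Token Injection" row concrete, here is a toy sketch of a dictionary-aware mutation step. It is an illustration only, not how libFuzzer or AFL++ actually implement their mutators; the function name `inject_token` and the 50/50 choice between inserting and overwriting are assumptions made for this example.

```python
import random


def inject_token(data: bytes, tokens: list[bytes], rng: random.Random) -> bytes:
    """Return a copy of `data` with one dictionary token inserted or overwritten."""
    token = rng.choice(tokens)
    pos = rng.randint(0, len(data))  # any offset, including the very end
    if rng.random() < 0.5:
        # Insert: the input grows by len(token).
        return data[:pos] + token + data[pos:]
    # Overwrite: the token replaces the bytes starting at pos
    # (and extends the input if it runs past the end).
    return data[:pos] + token + data[pos + len(token):]


rng = random.Random(0)  # fixed seed so the demo is repeatable
tokens = [b"IHDR", b"\x89PNG", b"GET"]
for _ in range(3):
    print(inject_token(b"AAAAAAAA", tokens, rng))
```

A purely random mutator would need many attempts to produce the four bytes of `IHDR` by chance, whereas one step like this places them in the input directly.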

## When to Apply

**Apply this technique when:**
- Fuzzing parsers (JSON, XML, config files)
- Fuzzing protocol implementations (HTTP, DNS, custom protocols)
- Fuzzing file format handlers (PNG, PDF, media codecs)
- Coverage plateaus early without reaching deeper logic
- Target code checks for specific keywords or magic values

**Skip this technique when:**
- Fuzzing pure algorithms without format expectations
- Target has no keyword-based parsing
- Corpus already achieves high coverage

## Quick Reference

| Task | Command/Pattern |
|------|-----------------|
| Use with libFuzzer | `./fuzz -dict=./dictionary.dict ...` |
| Use with AFL++ | `afl-fuzz -x ./dictionary.dict ...` |
| Use with cargo-fuzz | `cargo fuzz run fuzz_target -- -dict=./dictionary.dict` |
| Extract from header | `grep -o '"[^"]*"' header.h > header.dict` |
| Generate from binary | `strings ./binary \| sed 's/\\/\\\\/g; s/"/\\"/g; s/^/"/; s/$/"/' > strings.dict` |

## Step-by-Step

### Step 1: Create Dictionary File

Create a text file with quoted strings on each line. Use comments (#) for documentation.

**Example dictionary format:**

```
# Lines starting with '#' and empty lines are ignored.

# Adds "blah" (w/o quotes) to the dictionary.
kw1="blah"
# Use \\ for backslash and \" for quotes.
kw2="\"ac\\dc\""
# Use \xAB for hex values
kw3="\xF7\xF8"
# the name of the keyword followed by '=' may be omitted:
"foo\x0Abar"
```


### Step 2: Generate Dictionary Content

Choose a generation method based on what's available:

**From LLM:** Prompt ChatGPT or Claude with:

```
A dictionary can be used to guide the fuzzer. Write me a dictionary file for fuzzing a <PNG parser>. Each line should be a quoted string or key-value pair like kw="value". Include magic bytes, chunk types, and common header values. Use hex escapes like "\xF7\xF8" for binary values.
```

**From header files:**

```bash
# Match each string literal separately, even when a line contains several
grep -o '"[^"]*"' header.h > header.dict
```

**From man pages (for CLI tools):**

```bash
man curl | grep -oP '^\s*(--|-)\K\S+' | sed 's/[,.]$//' | sed 's/^/"&/; s/$/&"/' | sort -u > man.dict
```

**From binary strings:**

```bash
# Escape backslashes and quotes before wrapping each string in quotes
strings ./binary | sed 's/\\/\\\\/g; s/"/\\"/g; s/^/"/; s/$/"/' > strings.dict
```
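
When tokens come from raw bytes rather than printable text, the escaping has to be done per byte. The following is a minimal sketch, assuming only the escapes listed in Step 1 (`\\`, `\"`, and `\xAB`) are valid; the helper name `to_dict_entry` is made up for this example.

```python
def to_dict_entry(token: bytes) -> str:
    """Render raw bytes as one quoted dictionary line."""
    out = []
    for b in token:
        if b == 0x5C:              # backslash
            out.append("\\\\")
        elif b == 0x22:            # double quote
            out.append('\\"')
        elif 0x20 <= b <= 0x7E:    # printable ASCII is kept as-is
            out.append(chr(b))
        else:                      # everything else becomes a hex escape
            out.append(f"\\x{b:02X}")
    return '"' + "".join(out) + '"'


print(to_dict_entry(b"\x89PNG\r\n\x1a\n"))  # "\x89PNG\x0D\x0A\x1A\x0A"
print(to_dict_entry(b'say "hi"\\'))         # "say \"hi\"\\"
```

Writing control characters such as CR and LF as hex escapes avoids relying on C-style escapes that the dictionary format in Step 1 does not list.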


### Step 3: Pass Dictionary to Fuzzer

Use the appropriate flag for your fuzzer (see Quick Reference above).
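
Before handing a dictionary to the fuzzer, it can help to check every line against the format from Step 1. The sketch below is a hypothetical pre-flight validator, not the parser used by any fuzzer: it accepts only `\\`, `\"`, and `\xAB` escapes, so real tools may be stricter or more lenient. The names `decode_value` and `parse_dictionary` are invented for this example.

```python
import re

# Optional name followed by '=', then a quoted value.
ENTRY_RE = re.compile(r'^(?:[^"=\s]+\s*=\s*)?"(.*)"\s*$')


def decode_value(raw: str) -> bytes:
    """Decode the text between the outer quotes; raise ValueError on a bad escape."""
    out = bytearray()
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == '"':
            raise ValueError("unescaped quote")
        if ch != "\\":
            out.append(ord(ch))  # non-Latin-1 characters raise ValueError here
            i += 1
            continue
        nxt = raw[i + 1 : i + 2]
        if nxt in ("\\", '"'):
            out.append(ord(nxt))
            i += 2
        elif nxt == "x" and re.fullmatch(r"[0-9A-Fa-f]{2}", raw[i + 2 : i + 4]):
            out.append(int(raw[i + 2 : i + 4], 16))
            i += 4
        else:
            raise ValueError(f"unsupported escape at column {i}")
    return bytes(out)


def parse_dictionary(text: str) -> list[bytes]:
    """Return the decoded tokens, skipping comments and blank lines."""
    tokens = []
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = ENTRY_RE.match(line)
        if not m:
            raise ValueError(f'line {lineno}: expected "value" or name="value"')
        try:
            tokens.append(decode_value(m.group(1)))
        except ValueError as e:
            raise ValueError(f"line {lineno}: {e}") from None
    return tokens


sample = r'''
# comment
kw1="blah"
kw2="\"ac\\dc\""
kw3="\xF7\xF8"
"foo\x0Abar"
'''
print(parse_dictionary(sample))
```

A line such as `bad="\r\n"` raises `ValueError` in this sketch, which surfaces the kind of escape mistake that might otherwise only show up as a fuzzer startup error.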

## Common Patterns

### Pattern: Protocol Keywords

**Use Case:** Fuzzing HTTP or custom protocol handlers

**Dictionary content:**

```
# HTTP methods
"GET"
"POST"
"PUT"
"DELETE"
"HEAD"

# Headers
"Content-Type"
"Authorization"
"Host"

# Protocol markers
"HTTP/1.1"
"HTTP/2.0"
```


### Pattern: Magic Bytes and File Format Headers

**Use Case:** Fuzzing image parsers, media decoders, archive handlers

**Dictionary content:**

```
# PNG magic bytes and chunks
# \r and \n are not dictionary escapes; CR and LF are written as \x0D and \x0A
png_magic="\x89PNG\x0D\x0A\x1A\x0A"
ihdr="IHDR"
plte="PLTE"
idat="IDAT"
iend="IEND"

# JPEG markers
jpeg_soi="\xFF\xD8"
jpeg_eoi="\xFF\xD9"
```


### Pattern: Configuration File Keywords

**Use Case:** Fuzzing config file parsers (YAML, TOML, INI)

**Dictionary content:**

```
# Common config keywords
"true"
"false"
"null"
"version"
"enabled"
"disabled"

# Section headers
"[general]"
"[network]"
"[security]"
```


## Advanced Usage

### Tips and Tricks

| Tip | Why It Helps |
|-----|--------------|
| Combine multiple generation methods | LLM-generated keywords + strings from binary covers broad surface |
| Include boundary values | "0", "-1", "2147483647" trigger edge cases |
| Add format delimiters | :, =, {, } help fuzzer construct valid structures |
| Keep dictionaries focused | 50-200 entries perform better than thousands |
| Test dictionary effectiveness | Run with and without dict, compare coverage |
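
The "run with and without dict" comparison can be scripted by reading the `cov:` counter from the fuzzer's log. The sketch below assumes libFuzzer-style status lines (for example `#410 NEW cov: 31 ft: 40 ...`) captured from stderr; the two log snippets are made-up numbers for illustration only, and `final_coverage` is a hypothetical helper.

```python
import re

# Matches status lines such as "#410  NEW    cov: 31 ft: 40 ..."
COV_RE = re.compile(r"^#\d+\s+\w+\s+cov: (\d+)", re.MULTILINE)


def final_coverage(log_text: str) -> int:
    """Return the last `cov:` value reported in the log, or 0 if none is found."""
    values = COV_RE.findall(log_text)
    return int(values[-1]) if values else 0


# Illustrative logs only; real values come from your own runs.
baseline_log = """\
#2\tINITED cov: 12 ft: 13 corp: 1/1b exec/s: 0 rss: 27Mb
#410\tNEW    cov: 31 ft: 40 corp: 5/22b exec/s: 0 rss: 27Mb
"""
dict_log = """\
#2\tINITED cov: 12 ft: 13 corp: 1/1b exec/s: 0 rss: 27Mb
#388\tNEW    cov: 47 ft: 61 corp: 9/40b exec/s: 0 rss: 27Mb
"""

delta = final_coverage(dict_log) - final_coverage(baseline_log)
print(f"coverage delta with dictionary: {delta:+d}")
```

For a fair comparison, give both runs the same seed corpus and time budget, and repeat the runs a few times, since fuzzing results vary from run to run.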

### Auto-Generated Dictionaries (AFL++)

When using the afl-clang-lto compiler, AFL++ automatically extracts dictionary entries from string comparisons in the binary. This happens at compile time via the AUTODICTIONARY feature.

**Enable auto-dictionary:**

```bash
# AFL++ documents AFL_LLVM_DICT2FILE as an absolute path
export AFL_LLVM_DICT2FILE="$PWD/auto.dict"
afl-clang-lto++ target.cc -o target
# Dictionary saved to auto.dict
afl-fuzz -x auto.dict -i in -o out -- ./target
```


### Combining Multiple Dictionaries

Some fuzzers support multiple dictionary files:

```bash
# AFL++ with multiple dictionaries
afl-fuzz -x keywords.dict -x formats.dict -i in -o out -- ./target
```


## Anti-Patterns

| Anti-Pattern | Problem | Correct Approach |
|--------------|---------|------------------|
| Including full sentences | Fuzzer needs atomic tokens, not prose | Break into individual keywords |
| Duplicating entries | Wastes mutation budget | Use sort -u to deduplicate |
| Over-sized dictionaries | Slows fuzzer, dilutes useful tokens | Keep focused: 50-200 most relevant entries |
| Missing hex escapes | Non-printable bytes become mangled | Use \xXX for binary values |
| No comments | Hard to maintain and audit | Document sections with # comments |
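
Note that `sort -u` removes identical lines but treats `kw1="GET"` and `"GET"` as different entries. A small sketch of value-based deduplication, which also merges several dictionaries into one and optionally caps the size, is shown below. The function name `merge_dictionaries` and the choice to drop entry names are assumptions for this example.

```python
import re
from typing import Optional

# Optional name followed by '=', then the quoted value (captured with its quotes).
ENTRY_RE = re.compile(r'^(?:[^"=\s]+\s*=\s*)?(".*")\s*$')


def merge_dictionaries(*texts: str, limit: Optional[int] = None) -> list[str]:
    """Merge dictionary texts, keeping the first occurrence of each quoted value."""
    seen = set()
    merged = []
    for text in texts:
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # comments and blank lines carry no tokens
            m = ENTRY_RE.match(line)
            if not m:
                continue  # malformed lines are skipped, not repaired
            value = m.group(1)
            if value not in seen:
                seen.add(value)
                merged.append(value)
    return merged if limit is None else merged[:limit]


keywords = 'kw1="GET"\n"POST"\n# note\n'
formats = '"GET"\nm="PUT"\n'
print("\n".join(merge_dictionaries(keywords, formats)))
```

The comparison is textual, so `"A"` and `"\x41"` still count as different entries here; decoding the escapes first would catch those as well.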

## Tool-Specific Guidance

### libFuzzer

```bash
clang++ -fsanitize=fuzzer,address harness.cc -o fuzz
./fuzz -dict=./dictionary.dict corpus/
```


**Integration tips:**
- Dictionary tokens are inserted/replaced during mutations
- Combine with -max_len to control input size
- Use -print_final_stats=1 to see dictionary effectiveness metrics
- Dictionary entries longer than -max_len are ignored

### AFL++

```bash
afl-fuzz -x ./dictionary.dict -i input/ -o output/ -- ./target @@
```


**Integration tips:**
- AFL++ supports multiple -x flags for multiple dictionaries
- Use AFL_LLVM_DICT2FILE with afl-clang-lto for auto-generated dictionaries
- Dictionary effectiveness shown in fuzzer stats UI
- Tokens are used during deterministic and havoc stages

### cargo-fuzz (Rust)

```bash
cargo fuzz run fuzz_target -- -dict=./dictionary.dict
```


**Integration tips:**
- cargo-fuzz uses libFuzzer backend, so all libFuzzer dict flags work
- Place dictionary file in fuzz/ directory alongside harness
- Reference from harness directory: cargo fuzz run target -- -dict=../dictionary.dict

### go-fuzz (Go)

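go-fuzz does not have built-in dictionary support, but you can manually seed the corpus with dictionary entries:

```bash
# Convert dictionary entries to raw corpus files (entries containing \" are not handled)
mkdir -p corpus
grep -v '^#' dict.txt | grep -o '"[^"]*"' | while IFS= read -r line; do
  token=${line#\"}; token=${token%\"}
  # bash's printf %b decodes \xNN escapes so each seed holds the raw bytes
  printf '%b' "$token" > "corpus/$(printf '%s' "$line" | md5sum | cut -d' ' -f1)"
done

go-fuzz -bin=./target-fuzz.zip -workdir=.
```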


## Troubleshooting

| Issue | Cause | Solution |
|-------|-------|----------|
| Dictionary file not loaded | Wrong path or format error | Check fuzzer output for dict parsing errors; verify file format |
| No coverage improvement | Dictionary tokens not relevant | Analyze target code for actual keywords; try different generation method |
| Syntax errors in dict file | Unescaped quotes or invalid escapes | Use \\ for backslash, \" for quotes; validate with test run |
| Fuzzer ignores long entries | Entries exceed -max_len | Keep entries under max input length, or increase -max_len |
| Too many entries slow fuzzer | Dictionary too large | Prune to 50-200 most relevant entries |

## Related Skills

### Tools That Use This Technique

| Skill | How It Applies |
|-------|----------------|
| **libfuzzer** | Native dictionary support via -dict= flag |
| **aflpp** | Native dictionary support via -x flag; auto-generation with AUTODICTIONARY |
| **cargo-fuzz** | Uses libFuzzer backend, inherits -dict= support |

### Related Techniques

| Skill | Relationship |
|-------|--------------|
| **fuzzing-corpus** | Dictionaries complement corpus: corpus provides structure, dictionary provides keywords |
| **coverage-analysis** | Use coverage data to validate dictionary effectiveness |
| **harness-writing** | Harness structure determines which dictionary tokens are useful |

## Resources

### Key External Resources

**[AFL++ Dictionaries](https://github.com/AFLplusplus/AFLplusplus/tree/stable/dictionaries)**
Pre-built dictionaries for common formats (HTML, XML, JSON, SQL, etc.). Good starting point for format-specific fuzzing.

**[libFuzzer Dictionary Documentation](https://llvm.org/docs/LibFuzzer.html#dictionaries)**
Official libFuzzer documentation on dictionary format and usage. Explains token insertion strategy and performance implications.

### Additional Examples

**[OSS-Fuzz Dictionaries](https://github.com/google/oss-fuzz/tree/master/projects)**
Real-world dictionaries from Google's continuous fuzzing service. Search project directories for *.dict files to see production examples.

How to Use This Skill Unit

Option A: Project-Specific (Recommended)

  1. Click "Download" above
  2. In your project, create the directory: .agent/skills/fuzzing-dictionary/
  3. Save the file as SKILL.md
  4. The agent will automatically discover the skill based on its description.

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

  • Claude Code: ~/.claude/skills/trailofbits/skills/fuzzing-dictionary/SKILL.md
  • Cursor: ~/.cursor/skills/trailofbits/skills/fuzzing-dictionary/SKILL.md
  • Antigravity: ~/.gemini/antigravity/skills/trailofbits/skills/fuzzing-dictionary/SKILL.md

Source & attribution

This skill is categorized under Security & Vulnerability Analysis and is published by Trail of Bits, maintained in trailofbits/skills.

← Browse All Agent Skills
Sponsored AI assistant. Recommendations may be paid.