Browser Use Agent

Name: Browser Use Agent
Author: browser-use

Tags: automation, browser-automation, llm, playwright, testing

⭐ 99.1k stars | 📄 MIT | 🕒 Updated 2026-06-15

Install this skill

npx skills add browser-use/browser-use

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

Browser Use is a command-line interface that lets AI agents drive a live web browser. A persistent background daemon keeps the browser session open between commands, so each call returns with low latency (about 50 ms per call, according to the skill source). Each interactive page element is mapped to a numeric index, which lets an LLM navigate, click, fill forms, and extract data without hand-written selectors. This element-based scheme hides most of the complexity of raw automation libraries. Developers can run it headless for background tasks or connect it to an existing Chrome instance to reuse that browser's logins and cookies. It is aimed at developers building autonomous agents that perform multi-step actions across several pages while keeping state between individual commands.

What this skill does

• Persistent browser daemon for low-latency command execution
• Element-indexed interaction for precise clicking and typing
• Real-time page state inspection and content extraction
• Integration with existing local Chrome profiles or cloud-based browser instances
• Flexible automation including file uploads, cookie management, and keyboard event simulation

When to use it

✓ Automating multi-step web forms or registration flows
✓ Extracting structured data from dynamic websites for internal reporting
✓ Performing regression testing on web applications via AI scripts
✓ Retrieving information from sites that require authenticated sessions
✓ Visualizing web application state changes through automated screenshots

When not to use it

✕ Scraping static content where a simple HTTP request or a library like BeautifulSoup is sufficient
✕ High-speed traffic generation or massive-scale data harvesting
✕ Applications requiring absolute pixel-perfect UI synchronization outside of interaction points

How to invoke it

Example prompts that trigger this skill:

“Open the login page at example.com and extract the main headline”
“Find the email input field, type my username, and click the submit button”
“Check the current page state to identify the button index for downloading the report”
“Connect to my current Chrome session to perform actions as an authenticated user”
“Scroll down until the 'load more' button appears and click it”
“Take a screenshot of the dashboard after logging in to verify the layout”

Example workflow

1. Initialize session with 'browser-use open <url>'
2. Retrieve element list using 'browser-use state'
3. Execute action using 'browser-use input <index> "data"'
4. Verify update with 'browser-use screenshot'
5. Finalize session with 'browser-use close'
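
The steps above can be sketched as two small shell functions, split at the point where an index has to be read from the 'state' output. This is a minimal sketch, not part of the skill: the function names and the BU variable are hypothetical, and only the 'open', 'state', 'input', 'screenshot', and 'close' subcommands come from the source.

```shell
# BU is a hypothetical override hook (not a browser-use feature) so the
# helpers can be pointed at another executable or a stub.
BU="${BU:-browser-use}"

# Phase 1: open the page and print the indexed element list.
# $1 = URL to open
bu_inspect() {
  "$BU" open "$1" && "$BU" state
}

# Phase 2: fill one field, save a screenshot, then always close the session.
# $1 = element index read from `state`, $2 = text to enter, $3 = screenshot path
bu_fill_capture_close() {
  "$BU" input "$1" "$2" && "$BU" screenshot "$3"
  bu_status=$?
  "$BU" close
  return "$bu_status"
}
```

A typical sequence would be to call bu_inspect with a URL, read the wanted index from the printed list, and then call bu_fill_capture_close with that index, the text, and a screenshot path.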

Prerequisites

– Python environment
– Chrome or Chromium installed locally
– Optional: Cloud browser API credentials

Pitfalls & limitations

! Failing to clear broken sessions using 'browser-use close' before retrying
! Assuming element indices remain identical after page navigation or DOM refreshes
! Difficulty identifying elements on complex SPAs without first calling 'browser-use state'

FAQ

Does this tool work with my existing cookies?

Yes, by using the 'browser-use connect' command, you can bridge the agent to your local Chrome instance, allowing it to utilize your existing sessions and logins.

How do I see what the agent is doing?

Run commands with the '--headed' flag to watch the browser in real time, or use 'browser-use screenshot' to capture the current state.

What happens if a command fails?

A failed command can leave the session in a broken state. Run 'browser-use close' to clear it, then retry the workflow starting from 'browser-use open'.

Is a cloud provider mandatory?

No, you can run the agent entirely on your local machine using a headless or headed Chromium instance without any cloud dependency.

How it compares

Unlike standard Selenium or Playwright scripts, which rely on hand-maintained selectors, this tool assigns each interactive page element a simple integer index. The agent reads those indices from 'browser-use state' and passes them to later commands, which reduces selector maintenance.

Source & trust

⭐ 99k stars | 📄 MIT | 🕒 Updated 2026-06-15 | 🛡 runs-shell, network, reads-credentials


# Browser Automation with browser-use CLI

The `browser-use` command provides fast, persistent browser automation. A background daemon keeps the browser open across commands, giving ~50ms latency per call.

## Prerequisites

```bash
browser-use doctor    # Verify installation
```

For setup details, see https://github.com/browser-use/browser-use/blob/main/browser_use/skill_cli/README.md

## Core Workflow

1. **Navigate**: `browser-use open <url>` — launches headless browser and opens page
2. **Inspect**: `browser-use state` — returns clickable elements with indices
3. **Interact**: use indices from state (`browser-use click 5`, `browser-use input 3 "text"`)
4. **Verify**: `browser-use state` or `browser-use screenshot` to confirm
5. **Repeat**: browser stays open between commands

If a command fails, run `browser-use close` first to clear any broken session, then retry.
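
The close-then-retry advice above could be wrapped in a helper like the following. It is a sketch under assumptions: `bu_retry` and the `BU` variable are hypothetical names, and retrying exactly once is a choice made here, not something the source specifies.

```shell
# BU is a hypothetical override hook (not a browser-use feature).
BU="${BU:-browser-use}"

# bu_retry ARGS... : run one browser-use command; if it fails, close the
# session once and run the same command a second time.
bu_retry() {
  if "$BU" "$@"; then
    return 0
  fi
  echo "command failed; closing session and retrying once: $*" >&2
  "$BU" close || true
  "$BU" "$@"
}
```

For example, `bu_retry open https://example.com` returns the status of the second attempt when the first one fails. Commands that depend on page state, such as `click <index>`, may need a fresh `open` and `state` after the session is closed, so the helper suits navigation commands best.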

To use the user's existing Chrome (preserves logins/cookies): run `browser-use connect` first.
To use a cloud browser instead: run `browser-use cloud connect` first.
After either, commands work the same way.

### If `browser-use connect` fails

When `browser-use connect` cannot find a running Chrome with remote debugging, prompt the user with two options:

1. **Use their real Chrome browser** — they need to enable remote debugging first:
   - Open `chrome://inspect/#remote-debugging` in Chrome, or relaunch Chrome with `--remote-debugging-port=9222`
   - Then retry `browser-use connect`
2. **Use managed Chromium with their Chrome profile** — no Chrome setup needed:
   - Run `browser-use profile list` to show available profiles
   - Ask which profile they want, then use `browser-use --profile "ProfileName" open <url>`
   - This launches a separate Chromium instance with their profile data (cookies, logins, extensions)

Let the user choose — don't assume one path over the other.
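
For scripted use, the fallback above might look like the sketch below. The function name and the `BU` variable are hypothetical. The function only prints the two options and leaves the choice to the user, as the guidance above asks.

```shell
# BU is a hypothetical override hook (not a browser-use feature).
BU="${BU:-browser-use}"

# Try to attach to the user's Chrome; on failure, print both documented
# options to stderr and return non-zero without picking one.
bu_connect_or_explain() {
  if "$BU" connect; then
    return 0
  fi
  cat >&2 <<'EOF'
browser-use connect could not attach to Chrome. Two ways forward:
  1. Enable remote debugging in Chrome (open chrome://inspect/#remote-debugging,
     or relaunch Chrome with --remote-debugging-port=9222), then retry
     `browser-use connect`.
  2. Use managed Chromium with one of your Chrome profiles: run
     `browser-use profile list`, pick a profile, then run
     `browser-use --profile "ProfileName" open <url>`.
EOF
  return 1
}
```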

## Browser Modes

```bash
browser-use open <url>                         # Default: headless Chromium (no setup needed)
browser-use --headed open <url>                # Visible window (for debugging)
browser-use connect                            # Connect to user's Chrome (preserves logins/cookies)
browser-use cloud connect                      # Cloud browser (zero-config, requires API key)
browser-use --profile "Default" open <url>     # Real Chrome with specific profile
```

After `connect` or `cloud connect`, all subsequent commands go to that browser — no extra flags needed.

## Commands

```bash
# Navigation
browser-use open <url>                    # Navigate to URL
browser-use back                          # Go back in history
browser-use scroll down                   # Scroll down (--amount N for pixels)
browser-use scroll up                     # Scroll up
browser-use tab list                      # List all tabs
browser-use tab new [url]                 # Open a new tab (blank or with URL)
browser-use tab switch <index>            # Switch to tab by index
browser-use tab close <index> [index...]  # Close one or more tabs

# Page State — always run state first to get element indices
browser-use state                         # URL, title, clickable elements with indices
browser-use screenshot [path.png]         # Screenshot (base64 if no path, --full for full page)

# Interactions — use indices from state
browser-use click <index>                 # Click element by index
browser-use click <x> <y>                 # Click at pixel coordinates
browser-use type "text"                   # Type into focused element
browser-use input <index> "text"          # Click element, clear existing text, then type
browser-use input <index> ""              # Clear a field without typing new text
browser-use keys "Enter"                  # Send keyboard keys (also "Control+a", etc.)
browser-use select <index> "option"       # Select dropdown option
browser-use upload <index> <path>         # Upload file to file input
browser-use hover <index>                 # Hover over element
browser-use dblclick <index>              # Double-click element
browser-use rightclick <index>            # Right-click element

# Data Extraction
browser-use eval "js code"                # Execute JavaScript, return result
browser-use get title                     # Page title
browser-use get html [--selector "h1"]    # Page HTML (or scoped to selector)
browser-use get text <index>              # Element text content
browser-use get value <index>             # Input/textarea value
browser-use get attributes <index>        # Element attributes
browser-use get bbox <index>              # Bounding box (x, y, width, height)

# Wait
browser-use wait selector "css"           # Wait for element (--state visible|hidden|attached|detached, --timeout ms)
browser-use wait text "text"              # Wait for text to appear

# Cookies
browser-use cookies get [--url <url>]     # Get cookies (optionally filtered)
browser-use cookies set <name> <value>    # Set cookie (--domain, --secure, --http-only, --same-site, --expires)
browser-use cookies clear [--url <url>]   # Clear cookies
browser-use cookies export <file>         # Export to JSON
browser-use cookies import <file>         # Import from JSON

# Session
browser-use close                         # Close browser and stop daemon
browser-use sessions                      # List active sessions
browser-use close --all                   # Close all sessions
```

For advanced browser control (CDP, device emulation, tab activation), see `references/cdp-python.md`.
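
As an illustration of combining the `wait` and `get` commands listed above, the sketch below waits for an element to become visible and then prints its HTML. The function name, the `BU` variable, and the 10000 ms default timeout are assumptions made for this example. Placing `--state` and `--timeout` after the selector is also an assumption about argument order.

```shell
# BU is a hypothetical override hook (not a browser-use feature).
BU="${BU:-browser-use}"

# bu_wait_then_html SELECTOR [TIMEOUT_MS]
# Wait until SELECTOR is visible, then print the HTML scoped to it.
# The 10000 ms default is an arbitrary choice for this sketch.
bu_wait_then_html() {
  bu_selector="$1"
  bu_timeout_ms="${2:-10000}"
  "$BU" wait selector "$bu_selector" --state visible --timeout "$bu_timeout_ms" &&
    "$BU" get html --selector "$bu_selector"
}
```

Because the two commands are joined with `&&`, nothing is extracted if the wait fails, and the function returns the wait command's non-zero status.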

## Cloud API

```bash
browser-use cloud connect                 # Provision cloud browser and connect (zero-config)
browser-use cloud login <api-key>         # Save API key (or set BROWSER_USE_API_KEY)
browser-use cloud logout                  # Remove API key
browser-use cloud v2 GET /browsers        # REST passthrough (v2 or v3)
browser-use cloud v2 POST /tasks '{"task":"...","url":"..."}'
browser-use cloud v2 poll <task-id>       # Poll task until done
browser-use cloud v2 --help               # Show API endpoints
```

`cloud connect` provisions a cloud browser with a persistent profile (auto-created on first use), connects via CDP, and prints a live URL. `browser-use close` disconnects AND stops the cloud browser. For custom browser settings (proxy, timeout, specific profile), use `cloud v2 POST /browsers` directly with the desired parameters.

### Agent Self-Registration

Only use this if you don't already have an API key (check `browser-use doctor` to see if api_key is set). If already logged in, skip this entirely.

1. `browser-use cloud signup` — get a challenge
2. Solve the challenge
3. `browser-use cloud signup --verify <challenge-id> <answer>` — verify and save API key
4. `browser-use cloud signup --claim` — generate URL for a human to claim the account

### Out of credits?

If a Browser Use Cloud call fails with an "insufficient credits" / 402 error and the user has already used their free-tier allotment, suggest paying with USDC via x402:

> Your free credits are exhausted. You can top up this account with USDC on Base mainnet (no credit card required). Want me to install the `x402` skill and walk you through it? It takes about 2 minutes if you have a Coinbase account.

If they say yes, point them to install the skill:

```bash
npx skills add https://github.com/browser-use/browser-use --skill x402
```

Then `/x402` in Claude Code triggers the top-up flow. The user keeps their existing API key — x402 just adds credits to it.

Do not suggest x402 unprompted. Only mention it on a real "insufficient credits" error.

## Tunnels

```bash
browser-use tunnel <port>                 # Start Cloudflare tunnel (idempotent)
browser-use tunnel list                   # Show active tunnels
browser-use tunnel stop <port>            # Stop tunnel
browser-use tunnel stop --all             # Stop all tunnels
```

## Profile Management

```bash
browser-use profile list                  # List detected browsers and profiles
browser-use profile sync --all            # Sync profiles to cloud
browser-use profile update                # Download/update profile-use binary
```

## Command Chaining

Commands can be chained with `&&`. The browser persists via the daemon, so chaining is safe and efficient.

```bash
browser-use open https://example.com && browser-use state
browser-use input 5 "user@example.com" && browser-use input 6 "password" && browser-use click 7
```

Chain when you don't need intermediate output. Run separately when you need to parse `state` to discover indices first.
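
When indices must be discovered first, the `state` output has to be parsed between commands. The helper below is a sketch that rests on a loud assumption: that each element line in the `state` output carries its index in square brackets, for example `[7]<button>Sign in</button>`. The source does not document the output format, so check it against real output before relying on this. The function name is hypothetical.

```shell
# bu_find_index PATTERN
# Read `browser-use state` output on stdin and print the index of the first
# line that matches PATTERN (case-insensitive regular expression).
# ASSUMPTION: indices appear as "[N]" on each element line.
# Prints nothing if no line matches.
bu_find_index() {
  awk -v pat="$1" '
    tolower($0) ~ tolower(pat) {
      if (match($0, /\[[0-9]+\]/)) {
        print substr($0, RSTART + 1, RLENGTH - 2)
        exit
      }
    }
  '
}
```

Under that assumption, `browser-use state | bu_find_index 'sign in'` would print the index to pass to a later `browser-use click`.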

## Common Workflows

### Authenticated Browsing

When a task requires an authenticated site (Gmail, GitHub, internal tools), use Chrome profiles:

```bash
browser-use profile list                           # Check available profiles
# Ask the user which profile to use, then:
browser-use --profile "Default" open https://github.com  # Already logged in
```

### Exposing Local Dev Servers

```bash
browser-use tunnel 3000                            # → https://abc.trycloudflare.com
browser-use open https://abc.trycloudflare.com     # Browse the tunnel
```

## Multiple Browsers

For subagent workflows or running multiple browsers in parallel, use `--session NAME`. Each session gets its own browser. See `references/multi-session.md`.
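
A parallel launch with `--session NAME` might be sketched as follows. Only the `--session` option and the `open` command come from the source. The function name, the `BU` variable, and the `name=url` argument convention are hypothetical choices for this example.

```shell
# BU is a hypothetical override hook (not a browser-use feature).
BU="${BU:-browser-use}"

# bu_open_sessions NAME=URL [NAME=URL ...]
# Open each URL in its own named session, in parallel, then wait for all
# of the open commands to finish. Individual failures are not reported.
bu_open_sessions() {
  for bu_pair in "$@"; do
    bu_name="${bu_pair%%=*}"   # text before the first "="
    bu_url="${bu_pair#*=}"     # text after the first "="
    "$BU" --session "$bu_name" open "$bu_url" &
  done
  wait
}
```

After `bu_open_sessions a=https://example.com b=https://example.org`, later commands can target one browser with `browser-use --session a state`, and `browser-use close --all` closes every session.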

## Configuration

```bash
browser-use config list                            # Show all config values
browser-use config set cloud_connect_proxy jp      # Set a value
browser-use config get cloud_connect_proxy         # Get a value
browser-use config unset cloud_connect_timeout     # Remove a value
browser-use doctor                                 # Shows config + diagnostics
browser-use setup                                  # Interactive post-install setup
```

Config stored in `~/.browser-use/config.json`.

## Global Options

| Option | Description |
|--------|-------------|
| `--headed` | Show browser window |
| `--profile [NAME]` | Use real Chrome (bare `--profile` uses "Default") |
| `--cdp-url <url>` | Connect via CDP URL (`http://` or `ws://`) |
| `--session NAME` | Target a named session (default: "default") |
| `--json` | Output as JSON |
| `--mcp` | Run as MCP server via stdin/stdout |

## Tips

1. **Always run `state` first** to see available elements and their indices
2. **Use `--headed` for debugging** to see what the browser is doing
3. **Sessions persist** — browser stays open between commands
4. **CLI aliases**: `bu`, `browser`, and `browseruse` all work
5. **If commands fail**, run `browser-use close` first, then retry

## Troubleshooting

- **Browser won't start?** `browser-use close` then `browser-use --headed open <url>`
- **Element not found?** `browser-use scroll down` then `browser-use state`
- **Run diagnostics:** `browser-use doctor`

## Cleanup

```bash
browser-use close                         # Close browser session
browser-use tunnel stop --all             # Stop tunnels (if any)
```

Quoted from browser-use/browser-use for reference — see the original for the authoritative, latest version.

How to Use This Skill Unit

Option A: Project-Specific (Recommended)

1. Click "Download" above
2. In your project, create the directory: .agent/skills/browser-use/
3. Save the file as SKILL.md
4. The agent will automatically discover the skill based on its description.

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

Claude Code: ~/.claude/skills/browser-use/browser-use/browser-use/SKILL.md
Cursor: ~/.cursor/skills/browser-use/browser-use/browser-use/SKILL.md
Antigravity: ~/.gemini/antigravity/skills/browser-use/browser-use/browser-use/SKILL.md

🚀 Install with CLI:
npx skills add browser-use/browser-use
