grafana-dashboards

Name: grafana-dashboards
Author: W. Shobson

Tags: Grafana, dashboards, monitoring, observability, metrics, Prometheus, devops, visualization

⭐ 36.8k · 📄 MIT · 🕒 2026-06-16

Install this skill

npx skills add wshobson/agents

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

The grafana-dashboards skill automates the definition, layout, and configuration of Grafana dashboards as JSON. It applies two standard observability frameworks: RED for service request analysis and USE for infrastructure resource health. Agents using this skill generate stat, time-series, table, and heatmap panels and wire Prometheus queries to dashboard variables. Because the dashboards are built programmatically, they can be version-controlled, kept visually consistent across production environments, and updated as the system architecture changes. Dashboards are layered so that critical high-level metrics appear first, with supporting detail available in the table and graph panels below them.

When to Use This Skill

•Building real-time observability views for microservices architectures
•Creating infrastructure monitoring boards for node CPU and memory health
•Setting up SLO tracking dashboards with automated alert thresholds
•Developing executive summaries for business KPI tracking
•Automating the provisioning of monitoring views for new environments

How to Invoke This Skill

Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:

“Create a Grafana dashboard for my API services”
“Generate a panel showing p95 latency for my cluster”
“Add a status table to the monitoring dashboard”
“Build a heatmap for HTTP response durations”
“Configure a dashboard variable for Kubernetes namespaces”

Pro Tips

💡Start by applying the RED (Rate, Errors, Duration) or USE (Utilization, Saturation, Errors) methods to structure your initial panels for immediate impact and clarity.
💡Utilize Grafana's templating variables extensively to create dynamic dashboards that can adapt to different services, environments, or instances with minimal duplication.
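💡Attach alert rules to the panels they watch so that breached thresholds or anomalies notify your team automatically. The panel-level alert block used in this skill is Grafana's legacy alerting format, so check which alerting format your Grafana version supports before relying on it.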

What this skill does

•Generate standardized JSON dashboard definitions for Grafana
•Configure panel types including Stat, Table, Heatmap, and Time Series
•Integrate Prometheus query variables for dynamic dashboard filtering
•Embed alerting logic directly within dashboard panel configurations
•Apply industry-standard monitoring frameworks like RED and USE
•Implement threshold-based color coding for instantaneous health status

When not to use it

✕Running one-off, ad hoc debugging queries in the Prometheus expression browser
✕Managing large-scale log aggregation requiring specialized SIEM tools
✕Generating static reports for long-term compliance archiving

Example workflow

1. Identify target metrics based on service-level requirements
2. Define a list of template variables to filter data
3. Construct individual panel JSON structures for each required metric
4. Assemble panels into a consolidated dashboard schema
5. Validate the schema against the target Grafana instance API
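
The workflow above can be sketched in Python. This is a minimal illustration, not part of the skill itself: `make_panel`, `make_dashboard`, and `check_dashboard` are hypothetical helper names, the `timeseries` panel type assumes a reasonably recent Grafana, and the local check only stands in for the last step, which in practice means submitting the model to your Grafana instance.

```python
GRID_COLUMNS = 24  # Grafana lays panels out on a 24-column grid


def make_panel(panel_id, title, expr, x, y, w=12, h=8, panel_type="timeseries"):
    """Build one panel as a plain dict (hypothetical helper)."""
    return {
        "id": panel_id,
        "title": title,
        "type": panel_type,
        "targets": [{"refId": "A", "expr": expr}],
        "gridPos": {"x": x, "y": y, "w": w, "h": h},
    }


def make_dashboard(title, panels, variables=()):
    """Assemble panels and template variables into one dashboard model."""
    return {
        "title": title,
        "templating": {"list": list(variables)},
        "panels": list(panels),
    }


def check_dashboard(dashboard):
    """Local structural check; returns a list of problems (empty means OK)."""
    problems = []
    if not dashboard.get("title"):
        problems.append("missing title")
    panels = dashboard.get("panels", [])
    ids = [panel["id"] for panel in panels]
    if len(ids) != len(set(ids)):
        problems.append("duplicate panel ids")
    for panel in panels:
        pos = panel["gridPos"]
        if pos["x"] + pos["w"] > GRID_COLUMNS:
            problems.append(f"panel {panel['title']!r} overflows the grid")
    return problems


variables = [{"name": "namespace", "type": "query",
              "query": "label_values(kube_pod_info, namespace)"}]
panels = [
    make_panel(1, "Request Rate",
               "sum(rate(http_requests_total[5m])) by (service)", 0, 0),
    make_panel(2, "P95 Latency",
               "histogram_quantile(0.95, sum(rate("
               "http_request_duration_seconds_bucket[5m])) by (le, service))",
               12, 0),
]
dashboard = make_dashboard("API Monitoring", panels, variables)
print(check_dashboard(dashboard))
```

Keeping panels as plain dictionaries means the result can be serialized with `json.dumps` and committed alongside the code that produced it.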

Prerequisites

–Access to a running Grafana instance
–Configured Prometheus data source
–Basic knowledge of PromQL queries

Pitfalls & limitations

!Overloading single dashboards with too many complex panels impacts performance
!Incorrect variable scope can cause queries to return empty results
!Hard-coding dashboard layouts makes them difficult to adapt for different clusters

FAQ

Can I use this for non-Prometheus data sources?

The provided patterns focus on Prometheus query syntax, but the underlying JSON structure remains valid for other data sources if you adjust the query expressions.

How do I handle version control for these dashboards?

Because the output is raw JSON, you should store these files in a Git repository and sync them to Grafana using the API or provisioning files.
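
Exported dashboards tend to produce noisy diffs, so it can help to normalize the JSON before committing it. The sketch below assumes the top-level `id` and `version` fields are instance-specific values that Grafana assigns on save; `normalize` is a hypothetical helper, and you may want to strip other fields in your own setup.

```python
import json

# Assumption: these top-level fields change per instance or per save
# and carry no meaning in version control.
VOLATILE_KEYS = ("id", "version")


def normalize(dashboard_json: str) -> str:
    """Return a stable, diff-friendly serialization of a dashboard model."""
    model = json.loads(dashboard_json)
    for key in VOLATILE_KEYS:
        model.pop(key, None)
    # Sorted keys and fixed indentation keep diffs limited to real changes.
    return json.dumps(model, indent=2, sort_keys=True) + "\n"


first = normalize('{"version": 3, "title": "API Monitoring", "id": 7}')
second = normalize('{"title": "API Monitoring", "id": 12, "version": 9}')
print(first == second)
```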

What is the difference between the RED and USE methods?

RED focuses on service request monitoring (Rate, Errors, Duration), while USE targets infrastructure resource health (Utilization, Saturation, Errors).

How it compares

Unlike manual point-and-click dashboard building, which is prone to drift, this skill generates reproducible dashboard definitions that can be versioned and reviewed like any other infrastructure-as-code artifact.


📄 Full skill instructions — original source: wshobson/agents

# Grafana Dashboards

Create and manage production-ready Grafana dashboards for comprehensive system observability.

## Purpose

Design effective Grafana dashboards for monitoring applications, infrastructure, and business metrics.

## When to Use

- Visualize Prometheus metrics
- Create custom dashboards
- Implement SLO dashboards
- Monitor infrastructure
- Track business KPIs

## Dashboard Design Principles

### 1. Hierarchy of Information

┌─────────────────────────────────────┐
│  Critical Metrics (Big Numbers)     │
├─────────────────────────────────────┤
│  Key Trends (Time Series)           │
├─────────────────────────────────────┤
│  Detailed Metrics (Tables/Heatmaps) │
└─────────────────────────────────────┘
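
One way to turn this hierarchy into concrete `gridPos` values is to stack rows top to bottom on Grafana's 24-column grid. The following is a sketch under that assumption; `layout_rows` and the panel titles passed to it are illustrative only.

```python
GRID_COLUMNS = 24  # width of the Grafana panel grid


def layout_rows(rows):
    """Map panel titles to gridPos dicts.

    `rows` is a list of (titles, height) tuples ordered from most to
    least critical; panels in a row share the width equally.
    """
    positions = {}
    y = 0
    for titles, height in rows:
        if not titles:
            continue  # skip empty tiers
        width = GRID_COLUMNS // len(titles)
        for index, title in enumerate(titles):
            positions[title] = {"x": index * width, "y": y, "w": width, "h": height}
        y += height
    return positions


positions = layout_rows([
    (["Requests", "Errors", "Latency", "Uptime"], 4),   # big numbers
    (["Request Rate", "Error Rate %"], 8),              # key trends
    (["Latency Heatmap"], 8),                           # detail
])
print(positions["Error Rate %"])
```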

### 2. RED Method (Services)

- **Rate** - Requests per second
- **Errors** - Error rate
- **Duration** - Latency/response time

### 3. USE Method (Resources)

- **Utilization** - % time resource is busy
- **Saturation** - Queue length/wait time
- **Errors** - Error count
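
As a rough illustration, both methods can be written down as a mapping from signal name to PromQL expression and turned into panels. The RED expressions and the CPU utilization query are taken from this document; the saturation and error queries (`node_load1`, `node_network_receive_errs_total`) are assumptions based on common node_exporter metrics and should be swapped for whatever your exporters expose. `panels_for` is a hypothetical helper.

```python
RED = {
    "Rate": "sum(rate(http_requests_total[5m])) by (service)",
    "Errors": '(sum(rate(http_requests_total{status=~"5.."}[5m])) '
              "/ sum(rate(http_requests_total[5m]))) * 100",
    "Duration": "histogram_quantile(0.95, sum(rate("
                "http_request_duration_seconds_bucket[5m])) by (le, service))",
}

USE = {
    "Utilization": "100 - (avg by (instance) "
                   '(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)',
    "Saturation": "node_load1",                                 # assumption
    "Errors": "rate(node_network_receive_errs_total[5m])",      # assumption
}


def panels_for(method, width=8, height=8):
    """Create one panel per signal, laid out side by side in a single row."""
    panels = []
    for index, (signal, expr) in enumerate(method.items()):
        panels.append({
            "id": index + 1,
            "title": signal,
            "type": "timeseries",
            "targets": [{"refId": "A", "expr": expr}],
            "gridPos": {"x": index * width, "y": 0, "w": width, "h": height},
        })
    return panels


print([panel["title"] for panel in panels_for(RED)])
```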

## Dashboard Structure

### API Monitoring Dashboard

{
  "dashboard": {
    "title": "API Monitoring",
    "tags": ["api", "production"],
    "timezone": "browser",
    "refresh": "30s",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(http_requests_total[5m])) by (service)",
            "legendFormat": "{{service}}"
          }
        ],
        "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "Error Rate %",
        "type": "graph",
        "targets": [
          {
            "expr": "(sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))) * 100",
            "legendFormat": "Error Rate"
          }
        ],
        "alert": {
          "conditions": [
            {
              "evaluator": { "params": [5], "type": "gt" },
              "operator": { "type": "and" },
              "query": { "params": ["A", "5m", "now"] },
              "type": "query"
            }
          ]
        },
        "gridPos": { "x": 12, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "P95 Latency",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))",
            "legendFormat": "{{service}}"
          }
        ],
        "gridPos": { "x": 0, "y": 8, "w": 24, "h": 8 }
      }
    ]
  }
}

**Reference:** See assets/api-dashboard.json
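
The JSON above wraps the model in a top-level `"dashboard"` key, which matches the request body of Grafana's dashboard HTTP API. File-based provisioning reads the bare model instead. A small sketch for converting between the two shapes, with hypothetical helper names:

```python
def to_file_model(document):
    """Return the bare dashboard model, unwrapping "dashboard" if present."""
    inner = document.get("dashboard")
    if isinstance(inner, dict):
        return inner
    return document


def to_api_payload(model, overwrite=True):
    """Wrap a bare model for submission to the dashboard HTTP API."""
    return {"dashboard": to_file_model(model), "overwrite": overwrite}


wrapped = {"dashboard": {"title": "API Monitoring", "panels": []}}
bare = to_file_model(wrapped)
print(bare["title"])
print(sorted(to_api_payload(bare)))
```

Both helpers are idempotent, so they can be applied without first checking which shape a file is in.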

## Panel Types

### 1. Stat Panel (Single Value)

{
  "type": "stat",
  "title": "Total Requests",
  "targets": [
    {
      "expr": "sum(http_requests_total)"
    }
  ],
  "options": {
    "reduceOptions": {
      "values": false,
      "calcs": ["lastNotNull"]
    },
    "orientation": "auto",
    "textMode": "auto",
    "colorMode": "value"
  },
  "fieldConfig": {
    "defaults": {
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "value": 0, "color": "green" },
          { "value": 80, "color": "yellow" },
          { "value": 90, "color": "red" }
        ]
      }
    }
  }
}
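
To make the threshold semantics concrete: with `"mode": "absolute"`, a reading takes the color of the highest step whose value it has reached. The sketch below models that rule for the steps above; `color_for` is an illustrative helper, not Grafana code.

```python
STEPS = [
    {"value": 0, "color": "green"},
    {"value": 80, "color": "yellow"},
    {"value": 90, "color": "red"},
]


def color_for(value, steps=STEPS):
    """Return the color of the highest step whose value is <= the reading."""
    ordered = sorted(steps, key=lambda step: step["value"])
    chosen = ordered[0]["color"]  # readings below every step use the base color
    for step in ordered:
        if value >= step["value"]:
            chosen = step["color"]
    return chosen


for reading in (12, 80, 95):
    print(reading, color_for(reading))
```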

### 2. Time Series Graph

{
  "type": "timeseries",
  "title": "CPU Usage",
  "targets": [
    {
      "expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)"
    }
  ],
  "fieldConfig": {
    "defaults": { "unit": "percent", "min": 0, "max": 100 }
  }
}

### 3. Table Panel

{
  "type": "table",
  "title": "Service Status",
  "targets": [
    {
      "expr": "up",
      "format": "table",
      "instant": true
    }
  ],
  "transformations": [
    {
      "id": "organize",
      "options": {
        "excludeByName": { "Time": true },
        "indexByName": {},
        "renameByName": {
          "instance": "Instance",
          "job": "Service",
          "Value": "Status"
        }
      }
    }
  ]
}

### 4. Heatmap

{
  "type": "heatmap",
  "title": "Latency Heatmap",
  "targets": [
    {
      "expr": "sum(rate(http_request_duration_seconds_bucket[5m])) by (le)",
      "format": "heatmap"
    }
  ],
  "dataFormat": "tsbuckets",
  "yAxis": {
    "format": "s"
  }
}

## Variables

### Query Variables

{
  "templating": {
    "list": [
      {
        "name": "namespace",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_pod_info, namespace)",
        "refresh": 1,
        "multi": false
      },
      {
        "name": "service",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_service_info{namespace=\"$namespace\"}, service)",
        "refresh": 1,
        "multi": true
      }
    ]
  }
}

### Use Variables in Queries

sum(rate(http_requests_total{namespace="$namespace", service=~"$service"}[5m]))
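
The `service` variable is multi-valued, which is why the query matches it with `=~` rather than `=`. The sketch below approximates how a multi-value selection could expand into a regex alternation before the query is sent to Prometheus. It is a simplified model of the behavior: `interpolate` is a hypothetical function, and real Grafana also escapes regex metacharacters, which is omitted here.

```python
import re


def interpolate(query, variables):
    """Replace $name placeholders; lists become a (a|b) regex alternation."""
    def expand(match):
        value = variables.get(match.group(1))
        if value is None:
            return match.group(0)  # leave unknown variables untouched
        if isinstance(value, (list, tuple)):
            if len(value) == 1:
                return value[0]
            return "(" + "|".join(value) + ")"
        return value

    return re.sub(r"\$(\w+)", expand, query)


query = ('sum(rate(http_requests_total{namespace="$namespace", '
         'service=~"$service"}[5m]))')
print(interpolate(query, {"namespace": "prod", "service": ["checkout", "cart"]}))
```

With an exact-match `=` the expanded alternation would be compared as a literal string and match nothing, which is one common cause of empty panels.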

## Alerts in Dashboards

{
  "alert": {
    "name": "High Error Rate",
    "conditions": [
      {
        "evaluator": {
          "params": [5],
          "type": "gt"
        },
        "operator": { "type": "and" },
        "query": {
          "params": ["A", "5m", "now"]
        },
        "reducer": { "type": "avg" },
        "type": "query"
      }
    ],
    "executionErrorState": "alerting",
    "for": "5m",
    "frequency": "1m",
    "message": "Error rate is above 5%",
    "noDataState": "no_data",
    "notifications": [{ "uid": "slack-channel" }]
  }
}
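
The condition above reads as: reduce the samples of query `A` over the last five minutes with `avg`, then fire if the result is greater than 5. The sketch below models only that reduce-then-compare step under simplified assumptions. It is not Grafana's evaluator, it covers just a few reducer and evaluator types, and it ignores the `for` and `frequency` settings.

```python
from statistics import mean

# Subset of reducer and evaluator types, for illustration only.
REDUCERS = {
    "avg": mean,
    "min": min,
    "max": max,
    "last": lambda samples: samples[-1],
}
EVALUATORS = {
    "gt": lambda value, params: value > params[0],
    "lt": lambda value, params: value < params[0],
}


def condition_fires(condition, samples):
    """Reduce the samples to one number, then compare it to the threshold."""
    reduced = REDUCERS[condition["reducer"]["type"]](samples)
    evaluator = condition["evaluator"]
    return EVALUATORS[evaluator["type"]](reduced, evaluator["params"])


condition = {
    "evaluator": {"params": [5], "type": "gt"},
    "reducer": {"type": "avg"},
}
print(condition_fires(condition, [4.0, 6.5, 7.0]))
print(condition_fires(condition, [1.0, 2.0, 3.0]))
```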

## Dashboard Provisioning

**dashboards.yml:**

apiVersion: 1

providers:
  - name: "default"
    orgId: 1
    folder: "General"
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /etc/grafana/dashboards
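
Before dropping files into the provisioning path, it can be useful to check that each one parses and looks like a bare dashboard model. The following pre-flight check is a sketch with hypothetical function names; it assumes a dashboard needs a non-empty top-level `title` and treats a top-level `"dashboard"` wrapper as a likely mistake for file provisioning.

```python
import json
from pathlib import Path


def check_dashboard_text(text):
    """Return a problem description for one file's contents, or None if OK."""
    try:
        model = json.loads(text)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    if not isinstance(model, dict):
        return "top level is not a JSON object"
    if "dashboard" in model and "title" not in model:
        return "wrapped in a 'dashboard' key; provisioning expects the bare model"
    if not model.get("title"):
        return "missing title"
    return None


def check_directory(path):
    """Check every *.json file in a directory; returns {filename: problem}."""
    problems = {}
    for file in sorted(Path(path).glob("*.json")):
        problem = check_dashboard_text(file.read_text(encoding="utf-8"))
        if problem:
            problems[file.name] = problem
    return problems


print(check_dashboard_text('{"title": "API Monitoring", "panels": []}'))
print(check_dashboard_text('{"dashboard": {"title": "API Monitoring"}}'))
```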

## Common Dashboard Patterns

### Infrastructure Dashboard

**Key Panels:**

- CPU utilization per node
- Memory usage per node
- Disk I/O
- Network traffic
- Pod count by namespace
- Node status

**Reference:** See assets/infrastructure-dashboard.json

### Database Dashboard

**Key Panels:**

- Queries per second
- Connection pool usage
- Query latency (P50, P95, P99)
- Active connections
- Database size
- Replication lag
- Slow queries

**Reference:** See assets/database-dashboard.json

### Application Dashboard

**Key Panels:**

- Request rate
- Error rate
- Response time (percentiles)
- Active users/sessions
- Cache hit rate
- Queue length

## Best Practices

1. **Start with templates** (Grafana community dashboards)
2. **Use consistent naming** for panels and variables
3. **Group related metrics** in rows
4. **Set appropriate time ranges** (default: Last 6 hours)
5. **Use variables** for flexibility
6. **Add panel descriptions** for context
7. **Configure units** correctly
8. **Set meaningful thresholds** for colors
9. **Use consistent colors** across dashboards
10. **Test with different time ranges**
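
Some of these practices can be checked mechanically. The lint below is a small sketch covering two of them, panel descriptions and units; it assumes units are set under `fieldConfig.defaults.unit`, and `lint_panels` is a hypothetical helper rather than an existing tool.

```python
def lint_panels(dashboard):
    """Report panels that lack a description or a configured unit."""
    findings = []
    for panel in dashboard.get("panels", []):
        title = panel.get("title", "<untitled>")
        if not panel.get("description"):
            findings.append(f"{title}: missing description")
        unit = panel.get("fieldConfig", {}).get("defaults", {}).get("unit")
        if not unit:
            findings.append(f"{title}: no unit configured")
    return findings


example = {
    "panels": [
        {"title": "CPU Usage",
         "description": "Average non-idle CPU per instance",
         "fieldConfig": {"defaults": {"unit": "percent"}}},
        {"title": "Request Rate"},
    ]
}
for finding in lint_panels(example):
    print(finding)
```

A check like this could run in CI next to the dashboard JSON so that gaps are caught before dashboards are provisioned.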

## Dashboard as Code

### Terraform Provisioning

resource "grafana_dashboard" "api_monitoring" {
  config_json = file("${path.module}/dashboards/api-monitoring.json")
  folder      = grafana_folder.monitoring.id
}

resource "grafana_folder" "monitoring" {
  title = "Production Monitoring"
}

### Ansible Provisioning

- name: Deploy Grafana dashboards
  copy:
    src: "{{ item }}"
    dest: /etc/grafana/dashboards/
  with_fileglob:
    - "dashboards/*.json"
  notify: restart grafana

## Reference Files

- assets/api-dashboard.json - API monitoring dashboard
- assets/infrastructure-dashboard.json - Infrastructure dashboard
- assets/database-dashboard.json - Database monitoring dashboard
- references/dashboard-design.md - Dashboard design guide

## Related Skills

- prometheus-configuration - For metric collection
- slo-implementation - For SLO dashboards


How to Use This Skill Unit

Option A: Project-Specific (Recommended)

1. Click "Download" above
2. In your project, create the directory: .agent/skills/grafana-dashboards/
3. Save the file as SKILL.md
4. The agent will automatically discover the skill based on its description.

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

Claude Code: ~/.claude/skills/wshobson/agents/grafana-dashboards/SKILL.md
Cursor: ~/.cursor/skills/wshobson/agents/grafana-dashboards/SKILL.md
Antigravity: ~/.gemini/antigravity/skills/wshobson/agents/grafana-dashboards/SKILL.md
