grafana-dashboards

Tags: Grafana, dashboards, monitoring, observability, metrics, Prometheus, devops, visualization
⭐ 36.8k · 📄 MIT · 🕒 2026-06-16

Install this skill

npx skills add wshobson/agents

Works across Claude Code, Cursor, Codex, Copilot & Antigravity

The grafana-dashboards skill helps an agent define, lay out, and configure Grafana dashboards as JSON. It structures panels around two standard observability methods: RED (Rate, Errors, Duration) for services and USE (Utilization, Saturation, Errors) for infrastructure resources. Agents using the skill generate stat, time-series, table, and heatmap panels and wire Prometheus queries to dashboard template variables. Because the dashboards are generated as JSON files, they can be kept in version control and applied consistently across environments. Dashboards follow a layered layout: critical metrics at the top as large single values, key trends below as time series, and detailed data in tables or heatmaps.

When to Use This Skill

  • Building real-time observability views for microservices architectures
  • Creating infrastructure monitoring boards for node CPU and memory health
  • Setting up SLO tracking dashboards with automated alert thresholds
  • Developing executive summaries for business KPI tracking
  • Automating the provisioning of monitoring views for new environments

How to Invoke This Skill

Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:

  • “Create a Grafana dashboard for my API services”
  • “Generate a panel showing p95 latency for my cluster”
  • “Add a status table to the monitoring dashboard”
  • “Build a heatmap for HTTP response durations”
  • “Configure a dashboard variable for Kubernetes namespaces”

Pro Tips

  • Start by applying the RED (Rate, Errors, Duration) or USE (Utilization, Saturation, Errors) methods to structure your initial panels for immediate impact and clarity.
  • Use Grafana's templating variables extensively to create dynamic dashboards that can adapt to different services, environments, or instances with minimal duplication.
  • Integrate alert rules directly within your Grafana panels to ensure that critical thresholds or anomalies automatically trigger notifications to your team.

What this skill does

  • Generate standardized JSON dashboard definitions for Grafana
  • Configure panel types including Stat, Table, Heatmap, and Time Series
  • Integrate Prometheus query variables for dynamic dashboard filtering
  • Embed alerting logic directly within dashboard panel configurations
  • Apply industry-standard monitoring frameworks like RED and USE
  • Implement threshold-based color coding for instantaneous health status

When not to use it

  • Performing one-off ad-hoc debugging queries inside the Prometheus browser
  • Managing large-scale log aggregation requiring specialized SIEM tools
  • Generating static reports for long-term compliance archiving

Example workflow

  1. Identify target metrics based on service-level requirements
  2. Define a list of template variables to filter data
  3. Construct individual panel JSON structures for each required metric
  4. Assemble panels into a consolidated dashboard schema
  5. Validate the schema against the target Grafana instance API
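
The five steps above can be sketched in Python. This is a minimal illustration, not the skill's implementation: the helper names (`make_panel`, `build_dashboard`, `validate_dashboard`) are hypothetical, the `timeseries` panel type is an assumption about a recent Grafana version, and the validation step is a local structural check standing in for a call to the Grafana API.

```python
import json


def make_panel(title, expr, panel_type, x=0, y=0, w=12, h=8):
    """Step 3: build one panel dict for a single PromQL expression."""
    return {
        "title": title,
        "type": panel_type,
        "targets": [{"expr": expr}],
        "gridPos": {"x": x, "y": y, "w": w, "h": h},
    }


def build_dashboard(title, variables, panels):
    """Step 4: assemble template variables and panels into one dashboard model."""
    return {
        "title": title,
        "templating": {"list": variables},
        "panels": panels,
    }


def validate_dashboard(dashboard):
    """Step 5 (local stand-in): return a list of structural problems found."""
    problems = []
    if not dashboard.get("title"):
        problems.append("dashboard has no title")
    for panel in dashboard.get("panels", []):
        name = panel.get("title") or "<untitled>"
        if not panel.get("type"):
            problems.append(f"panel {name} has no type")
        if not panel.get("targets"):
            problems.append(f"panel {name} has no targets")
    return problems


# Steps 1-2: pick target metrics and define a template variable.
namespace_var = {
    "name": "namespace",
    "type": "query",
    "query": "label_values(kube_pod_info, namespace)",
}
panels = [
    # "timeseries" is an assumed panel type id; the skill's examples use "graph".
    make_panel("Request Rate", "sum(rate(http_requests_total[5m])) by (service)", "timeseries"),
    make_panel("Total Requests", "sum(http_requests_total)", "stat", x=12),
]
dashboard = build_dashboard("API Monitoring", [namespace_var], panels)
print(json.dumps(validate_dashboard(dashboard)))
```

Keeping validation as a pure function means it can run in CI before anything is sent to a live Grafana instance.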

Prerequisites

  • Access to a running Grafana instance
  • Configured Prometheus data source
  • Basic knowledge of PromQL queries

Pitfalls & limitations

  • Overloading a single dashboard with too many complex panels degrades performance
  • Incorrect variable scope can cause queries to return empty results
  • Hard-coded dashboard layouts are difficult to adapt to different clusters

FAQ

Can I use this for non-Prometheus data sources?
The provided patterns focus on Prometheus query syntax, but the underlying JSON structure remains valid for other data sources if you adjust the query expressions.
How do I handle version control for these dashboards?
Because the output is raw JSON, you should store these files in a Git repository and sync them to Grafana using the API or provisioning files.
What is the difference between the RED and USE methods?
RED focuses on service request monitoring (Rate, Errors, Duration), while USE targets infrastructure resource health (Utilization, Saturation, Errors).

How it compares

Unlike manual point-and-click dashboard building, which is prone to drift, this skill generates reproducible, versionable dashboard definitions that fit an infrastructure-as-code workflow.

Source & trust

⭐ 37k stars · 📄 MIT · 🕒 Updated 2026-06-16
📄 Full skill instructions (original source: wshobson/agents)
# Grafana Dashboards

Create and manage production-ready Grafana dashboards for comprehensive system observability.

## Purpose

Design effective Grafana dashboards for monitoring applications, infrastructure, and business metrics.

## When to Use

- Visualize Prometheus metrics
- Create custom dashboards
- Implement SLO dashboards
- Monitor infrastructure
- Track business KPIs

## Dashboard Design Principles

### 1. Hierarchy of Information

┌────────────────────────────────────┐
│ Critical Metrics (Big Numbers)     │
├────────────────────────────────────┤
│ Key Trends (Time Series)           │
├────────────────────────────────────┤
│ Detailed Metrics (Tables/Heatmaps) │
└────────────────────────────────────┘


### 2. RED Method (Services)

- **Rate** - Requests per second
- **Errors** - Error rate
- **Duration** - Latency/response time
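
As a rough illustration, the three RED signals can be expressed as PromQL strings. The metric names `http_requests_total` and `http_request_duration_seconds_bucket` are the ones used in the dashboard examples in this document; the helper `red_queries` and its `selector` parameter are hypothetical.

```python
def red_queries(selector="", window="5m"):
    """Return PromQL strings for Rate, Errors, and Duration (p95).

    `selector` is an optional label matcher list such as 'namespace="prod"'.
    """
    sep = "," if selector else ""
    # Only add braces when there is something to put inside them.
    requests = f"http_requests_total{{{selector}}}" if selector else "http_requests_total"
    errors = f'http_requests_total{{{selector}{sep}status=~"5.."}}'
    buckets = (
        f"http_request_duration_seconds_bucket{{{selector}}}"
        if selector
        else "http_request_duration_seconds_bucket"
    )
    return {
        # Rate: requests per second
        "rate": f"sum(rate({requests}[{window}]))",
        # Errors: share of 5xx responses, as a percentage
        "errors": f"sum(rate({errors}[{window}])) / sum(rate({requests}[{window}])) * 100",
        # Duration: 95th percentile latency from histogram buckets
        "duration": f"histogram_quantile(0.95, sum(rate({buckets}[{window}])) by (le))",
    }
```

Calling `red_queries('namespace="prod"')` scopes all three expressions to one namespace, which keeps the panels of a service dashboard consistent with each other.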

### 3. USE Method (Resources)

- **Utilization** - % time resource is busy
- **Saturation** - Queue length/wait time
- **Errors** - Error count

## Dashboard Structure

### API Monitoring Dashboard

{
  "dashboard": {
    "title": "API Monitoring",
    "tags": ["api", "production"],
    "timezone": "browser",
    "refresh": "30s",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(http_requests_total[5m])) by (service)",
            "legendFormat": "{{service}}"
          }
        ],
        "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "Error Rate %",
        "type": "graph",
        "targets": [
          {
            "expr": "(sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))) * 100",
            "legendFormat": "Error Rate"
          }
        ],
        "alert": {
          "conditions": [
            {
              "evaluator": { "params": [5], "type": "gt" },
              "operator": { "type": "and" },
              "query": { "params": ["A", "5m", "now"] },
              "type": "query"
            }
          ]
        },
        "gridPos": { "x": 12, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "P95 Latency",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))",
            "legendFormat": "{{service}}"
          }
        ],
        "gridPos": { "x": 0, "y": 8, "w": 24, "h": 8 }
      }
    ]
  }
}


**Reference:** See assets/api-dashboard.json
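
The example above writes each `gridPos` by hand. Grafana places panels on a grid that is 24 columns wide, which is why two panels of width 12 share a row and a panel of width 24 spans it. A small helper can assign positions automatically; `layout_panels` below is a hypothetical sketch, not part of the skill.

```python
GRID_COLUMNS = 24  # Grafana's dashboard grid is 24 columns wide


def layout_panels(panels, width=12, height=8):
    """Assign gridPos to each panel, filling rows left to right.

    Returns new panel dicts; the inputs are not modified.
    """
    if not 0 < width <= GRID_COLUMNS:
        raise ValueError("width must be between 1 and 24")
    placed = []
    x = y = 0
    for panel in panels:
        if x + width > GRID_COLUMNS:  # row is full, wrap to the next one
            x = 0
            y += height
        placed.append({**panel, "gridPos": {"x": x, "y": y, "w": width, "h": height}})
        x += width
    return placed


panels = layout_panels([{"title": "Request Rate"}, {"title": "Error Rate %"}, {"title": "P95 Latency"}])
for panel in panels:
    print(panel["title"], panel["gridPos"])
```

Generating positions this way avoids overlapping panels when a dashboard grows, at the cost of uniform panel sizes.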

## Panel Types

### 1. Stat Panel (Single Value)

{
  "type": "stat",
  "title": "Total Requests",
  "targets": [
    {
      "expr": "sum(http_requests_total)"
    }
  ],
  "options": {
    "reduceOptions": {
      "values": false,
      "calcs": ["lastNotNull"]
    },
    "orientation": "auto",
    "textMode": "auto",
    "colorMode": "value"
  },
  "fieldConfig": {
    "defaults": {
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "value": 0, "color": "green" },
          { "value": 80, "color": "yellow" },
          { "value": 90, "color": "red" }
        ]
      }
    }
  }
}
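
To make the `thresholds` block concrete: with `"mode": "absolute"`, each step applies from its `value` upward until the next step takes over. The function below is a simplified model of that rule for illustration only; it is not Grafana's implementation, and `threshold_color` is a hypothetical name.

```python
def threshold_color(value, steps):
    """Return the color of the highest step whose value is <= `value`.

    Returns None when `value` is below every step.
    """
    color = None
    for step in sorted(steps, key=lambda s: s["value"]):
        if value >= step["value"]:
            color = step["color"]
    return color


# Same steps as the stat panel above.
steps = [
    {"value": 0, "color": "green"},
    {"value": 80, "color": "yellow"},
    {"value": 90, "color": "red"},
]
for sample in (50, 80, 95):
    print(sample, threshold_color(sample, steps))
```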


### 2. Time Series Graph

{
  "type": "graph",
  "title": "CPU Usage",
  "targets": [
    {
      "expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)"
    }
  ],
  "yaxes": [
    { "format": "percent", "max": 100, "min": 0 },
    { "format": "short" }
  ]
}


### 3. Table Panel

{
  "type": "table",
  "title": "Service Status",
  "targets": [
    {
      "expr": "up",
      "format": "table",
      "instant": true
    }
  ],
  "transformations": [
    {
      "id": "organize",
      "options": {
        "excludeByName": { "Time": true },
        "indexByName": {},
        "renameByName": {
          "instance": "Instance",
          "job": "Service",
          "Value": "Status"
        }
      }
    }
  ]
}


### 4. Heatmap

{
  "type": "heatmap",
  "title": "Latency Heatmap",
  "targets": [
    {
      "expr": "sum(rate(http_request_duration_seconds_bucket[5m])) by (le)",
      "format": "heatmap"
    }
  ],
  "dataFormat": "tsbuckets",
  "yAxis": {
    "format": "s"
  }
}


## Variables

### Query Variables

{
  "templating": {
    "list": [
      {
        "name": "namespace",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_pod_info, namespace)",
        "refresh": 1,
        "multi": false
      },
      {
        "name": "service",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_service_info{namespace=\"$namespace\"}, service)",
        "refresh": 1,
        "multi": true
      }
    ]
  }
}


### Use Variables in Queries

sum(rate(http_requests_total{namespace="$namespace", service=~"$service"}[5m]))
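
The query above uses `=` for `$namespace` but `=~` for `$service` because `service` is a multi-value variable. For Prometheus, Grafana renders multiple selected values as a regex alternation such as `(api|web)`. The sketch below approximates that substitution to show the effect; Grafana's real escaping rules differ in detail, and `interpolate` is a hypothetical helper.

```python
import re


def interpolate(expr, variables):
    """Substitute $name placeholders in a PromQL expression.

    A list value is rendered as a regex alternation, e.g. (api|web), which is
    why multi-value variables need the =~ matcher rather than =.
    """
    def render(match):
        name = match.group(1)
        if name not in variables:
            return match.group(0)  # leave unknown placeholders untouched
        value = variables[name]
        if isinstance(value, (list, tuple)):
            return "(" + "|".join(re.escape(v) for v in value) + ")"
        return str(value)

    return re.sub(r"\$(\w+)", render, expr)


expr = 'sum(rate(http_requests_total{namespace="$namespace", service=~"$service"}[5m]))'
print(interpolate(expr, {"namespace": "prod", "service": ["api", "web"]}))
```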


## Alerts in Dashboards

Panel-embedded alerts like the one below use Grafana's legacy alerting model, which was removed in Grafana 11. On newer versions, define alert rules through Grafana's unified alerting instead.

{
  "alert": {
    "name": "High Error Rate",
    "conditions": [
      {
        "evaluator": {
          "params": [5],
          "type": "gt"
        },
        "operator": { "type": "and" },
        "query": {
          "params": ["A", "5m", "now"]
        },
        "reducer": { "type": "avg" },
        "type": "query"
      }
    ],
    "executionErrorState": "alerting",
    "for": "5m",
    "frequency": "1m",
    "message": "Error rate is above 5%",
    "noDataState": "no_data",
    "notifications": [{ "uid": "slack-channel" }]
  }
}


## Dashboard Provisioning

**dashboards.yml:**

apiVersion: 1

providers:
  - name: "default"
    orgId: 1
    folder: "General"
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /etc/grafana/dashboards
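
One detail to watch when feeding files to a provider like this: file provisioning reads the dashboard model itself (top-level `title`, `panels`, and so on), whereas the `{"dashboard": {...}}` wrapper shown in the API Monitoring example is the shape used by Grafana's HTTP API. The sketch below unwraps a payload before writing it to the provisioning directory. The function names are hypothetical.

```python
import json
from pathlib import Path


def to_provisioning_model(payload):
    """Return the bare dashboard model from an API-style payload.

    If the JSON is wrapped as {"dashboard": {...}}, unwrap it; otherwise
    return it unchanged.
    """
    if isinstance(payload.get("dashboard"), dict):
        return payload["dashboard"]
    return payload


def write_dashboard(payload, directory, filename):
    """Write the dashboard model as JSON into the provisioning directory."""
    target = Path(directory) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(to_provisioning_model(payload), indent=2))
    return target
```

For example, `write_dashboard(payload, "/etc/grafana/dashboards", "api-monitoring.json")` would place the file where the provider above looks for it, assuming the process has write access to that path.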


## Common Dashboard Patterns

### Infrastructure Dashboard

**Key Panels:**

- CPU utilization per node
- Memory usage per node
- Disk I/O
- Network traffic
- Pod count by namespace
- Node status

**Reference:** See assets/infrastructure-dashboard.json

### Database Dashboard

**Key Panels:**

- Queries per second
- Connection pool usage
- Query latency (P50, P95, P99)
- Active connections
- Database size
- Replication lag
- Slow queries

**Reference:** See assets/database-dashboard.json

### Application Dashboard

**Key Panels:**

- Request rate
- Error rate
- Response time (percentiles)
- Active users/sessions
- Cache hit rate
- Queue length

## Best Practices

1. **Start with templates** (Grafana community dashboards)
2. **Use consistent naming** for panels and variables
3. **Group related metrics** in rows
4. **Set appropriate time ranges** (default: Last 6 hours)
5. **Use variables** for flexibility
6. **Add panel descriptions** for context
7. **Configure units** correctly
8. **Set meaningful thresholds** for colors
9. **Use consistent colors** across dashboards
10. **Test with different time ranges**
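
A few of these practices can be checked mechanically before a dashboard is committed. The sketch below flags missing titles, duplicate titles, missing descriptions, and missing units. It assumes units live under `fieldConfig.defaults.unit`, as in current panel types; `lint_dashboard` is a hypothetical helper and covers only part of the list above.

```python
def lint_dashboard(dashboard):
    """Return warnings for panels that miss a few of the practices above."""
    warnings = []
    seen_titles = set()
    for panel in dashboard.get("panels", []):
        title = panel.get("title", "")
        label = title or "<untitled>"
        if not title:
            warnings.append("panel without a title")
        elif title in seen_titles:
            warnings.append(f"duplicate panel title: {title}")
        seen_titles.add(title)
        # Practice 6: panel descriptions give context
        if not panel.get("description"):
            warnings.append(f"{label}: missing description")
        # Practice 7: units should be configured explicitly
        unit = panel.get("fieldConfig", {}).get("defaults", {}).get("unit")
        if not unit:
            warnings.append(f"{label}: no unit configured")
    return warnings
```

Running a check like this in CI keeps new panels consistent without relying on manual review.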

## Dashboard as Code

### Terraform Provisioning

resource "grafana_dashboard" "api_monitoring" {
  config_json = file("${path.module}/dashboards/api-monitoring.json")
  folder      = grafana_folder.monitoring.id
}

resource "grafana_folder" "monitoring" {
  title = "Production Monitoring"
}


### Ansible Provisioning

- name: Deploy Grafana dashboards
  copy:
    src: "{{ item }}"
    dest: /etc/grafana/dashboards/
  with_fileglob:
    - "dashboards/*.json"
  notify: restart grafana


## Reference Files

- assets/api-dashboard.json - API monitoring dashboard
- assets/infrastructure-dashboard.json - Infrastructure dashboard
- assets/database-dashboard.json - Database monitoring dashboard
- references/dashboard-design.md - Dashboard design guide

## Related Skills

- prometheus-configuration - For metric collection
- slo-implementation - For SLO dashboards

How to Use This Skill Unit

Option A: Project-Specific (Recommended)

  1. Click "Download" above
  2. In your project, create the directory: .agent/skills/grafana-dashboards/
  3. Save the file as SKILL.md
  4. The agent will automatically discover the skill based on its description.

Option B: Global Installation (All Agents)

Save the file to these locations to make it available across all projects:

  • Claude Code: ~/.claude/skills/wshobson/agents/grafana-dashboards/SKILL.md
  • Cursor: ~/.cursor/skills/wshobson/agents/grafana-dashboards/SKILL.md
  • Antigravity: ~/.gemini/antigravity/skills/wshobson/agents/grafana-dashboards/SKILL.md

🚀 Install with CLI:
npx skills add wshobson/agents

How to use this Skill in Claude Code & Cursor

For Claude Code (CLI)

To use this skill in Claude Code, copy the rule content into your project's custom instructions or follow our Add-Skill CLI guide. This ensures Claude follows your standards during every code generation.

For Cursor & Windsurf

For Cursor or Windsurf, add the skill content to the "Rules for AI" section. This skill helps the agent avoid common DevOps & CI/CD issues.

Why the skill format matters: the standardized Agent Skills format lets your AI agent load detailed instructions only when they are relevant, keeping your prompt clean while improving results.

Source & attribution

This skill is categorized under DevOps & CI/CD and is published by W. Shobson, maintained in wshobson/agents.

← Browse All Agent Skills
Sponsored AI assistant. Recommendations may be paid.