grafana-dashboards
Install this skill
npx skills add wshobson/agents
Works across Claude Code, Cursor, Codex, Copilot & Antigravity
The grafana-dashboards skill automates the definition, layout, and configuration of Grafana dashboards as JSON. It applies standard observability frameworks: RED for service request analysis and USE for infrastructure resource health. Agents using this skill generate structured panels, including stat counters, time-series graphs, and heatmaps, and map Prometheus queries to dashboard template variables. Because dashboard definitions are constructed programmatically, they can be version-controlled and kept in step with the current system architecture, which keeps visual standards consistent across production environments. The skill also follows a layered layout: critical high-level metrics are immediately visible at the top, while detailed supporting data remains accessible in table or graph panels below.
When to Use This Skill
- Building real-time observability views for microservices architectures
- Creating infrastructure monitoring boards for node CPU and memory health
- Setting up SLO tracking dashboards with automated alert thresholds
- Developing executive summaries for business KPI tracking
- Automating the provisioning of monitoring views for new environments
How to Invoke This Skill
Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:
- Create a Grafana dashboard for my API services
- Generate a panel showing p95 latency for my cluster
- Add a status table to the monitoring dashboard
- Build a heatmap for HTTP response durations
- Configure a dashboard variable for Kubernetes namespaces
Pro Tips
- Start by applying the RED (Rate, Errors, Duration) or USE (Utilization, Saturation, Errors) methods to structure your initial panels for immediate impact and clarity.
- Utilize Grafana's templating variables extensively to create dynamic dashboards that can adapt to different services, environments, or instances with minimal duplication.
- Integrate alert rules directly within your Grafana panels to ensure that critical thresholds or anomalies automatically trigger notifications to your team.
What this skill does
- Generate standardized JSON dashboard definitions for Grafana
- Configure panel types including Stat, Table, Heatmap, and Time Series
- Integrate Prometheus query variables for dynamic dashboard filtering
- Embed alerting logic directly within dashboard panel configurations
- Apply industry-standard monitoring frameworks like RED and USE
- Implement threshold-based color coding for instantaneous health status
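Threshold-based color coding can be illustrated with a small sketch. It assumes absolute thresholds are resolved by taking the highest step whose value the reading has reached, and falling back to the first step otherwise. The helper name `threshold_color` is hypothetical, and the step values are the ones used in the stat panel example in the full skill instructions.

```python
# Hypothetical helper: pick the color a panel would show for a value,
# assuming the highest step whose "value" is <= the reading wins.
def threshold_color(value, steps):
    ordered = sorted(steps, key=lambda step: step["value"])
    color = ordered[0]["color"]  # base color when no step is reached
    for step in ordered:
        if value >= step["value"]:
            color = step["color"]
    return color


# Step values mirror the stat panel example in this document.
STEPS = [
    {"value": 0, "color": "green"},
    {"value": 80, "color": "yellow"},
    {"value": 90, "color": "red"},
]

for reading in (42, 85, 95):
    print(reading, threshold_color(reading, STEPS))
```

Running the same function over candidate step values is a cheap way to check that a threshold set produces the intended health colors before it goes into a dashboard definition.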
When not to use it
- Performing one-off ad-hoc debugging queries inside the Prometheus browser
- Managing large-scale log aggregation requiring specialized SIEM tools
- Generating static reports for long-term compliance archiving
Example workflow
- Identify target metrics based on service-level requirements
- Define a list of template variables to filter data
- Construct individual panel JSON structures for each required metric
- Assemble panels into a consolidated dashboard schema
- Validate the schema against the target Grafana instance API
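The workflow above can be sketched in Python as a minimal, non-authoritative example. All helper names (`query_variable`, `panel`, `build_dashboard`, `validate`) are hypothetical, the metric and label names are borrowed from the examples in this document, and the final step is only a local structural check standing in for validation against a real Grafana instance.

```python
import json


def query_variable(name, query, multi=False):
    """Step 2: a Prometheus-backed template variable."""
    return {
        "name": name,
        "type": "query",
        "datasource": "Prometheus",
        "query": query,
        "refresh": 1,
        "multi": multi,
    }


def panel(title, panel_type, expr, grid_pos):
    """Step 3: one panel with a single PromQL target."""
    return {
        "title": title,
        "type": panel_type,
        "targets": [{"expr": expr}],
        "gridPos": grid_pos,
    }


def build_dashboard(title, variables, panels):
    """Step 4: assemble variables and panels into one payload."""
    return {
        "dashboard": {
            "title": title,
            "templating": {"list": variables},
            "panels": panels,
        }
    }


def validate(payload):
    """Step 5 (local stand-in): return a list of structural problems."""
    problems = []
    dash = payload.get("dashboard", {})
    if not dash.get("title"):
        problems.append("dashboard has no title")
    for i, p in enumerate(dash.get("panels", [])):
        for key in ("title", "type", "targets", "gridPos"):
            if key not in p:
                problems.append(f"panel {i} is missing {key}")
        pos = p.get("gridPos", {})
        if pos.get("x", 0) + pos.get("w", 0) > 24:
            problems.append(f"panel {i} overflows the 24-column grid")
    return problems


# Step 1: the target metrics here are request rate and error ratio.
variables = [query_variable("namespace", "label_values(kube_pod_info, namespace)")]
panels = [
    panel("Request Rate", "stat",
          'sum(rate(http_requests_total{namespace="$namespace"}[5m]))',
          {"x": 0, "y": 0, "w": 12, "h": 8}),
    panel("Error Rate %", "timeseries",
          'sum(rate(http_requests_total{namespace="$namespace", status=~"5.."}[5m]))'
          ' / sum(rate(http_requests_total{namespace="$namespace"}[5m])) * 100',
          {"x": 12, "y": 0, "w": 12, "h": 8}),
]
payload = build_dashboard("API Monitoring", variables, panels)
problems = validate(payload)
print(json.dumps(payload, indent=2) if not problems else problems)
```

A local check like this catches obvious mistakes early; the authoritative validation is still the target instance accepting the payload.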
Prerequisites
- Access to a running Grafana instance
- Configured Prometheus data source
- Basic knowledge of PromQL queries
Pitfalls & limitations
- Overloading single dashboards with too many complex panels impacts performance
- Incorrect variable scope can cause queries to return empty results
- Hard-coding dashboard layouts makes them difficult to adapt for different clusters
FAQ
How it compares
Unlike manual point-and-click dashboard building, which is prone to drift, this skill generates reproducible dashboard definitions that can be versioned and managed like any other infrastructure-as-code artifact.
Full skill instructions (original source: wshobson/agents)
Create and manage production-ready Grafana dashboards for comprehensive system observability.
## Purpose
Design effective Grafana dashboards for monitoring applications, infrastructure, and business metrics.
## When to Use
- Visualize Prometheus metrics
- Create custom dashboards
- Implement SLO dashboards
- Monitor infrastructure
- Track business KPIs
## Dashboard Design Principles
### 1. Hierarchy of Information
┌─────────────────────────────────────┐
│ Critical Metrics (Big Numbers)      │
├─────────────────────────────────────┤
│ Key Trends (Time Series)            │
├─────────────────────────────────────┤
│ Detailed Metrics (Tables/Heatmaps)  │
└─────────────────────────────────────┘
### 2. RED Method (Services)
- **Rate** - Requests per second
- **Errors** - Error rate
- **Duration** - Latency/response time
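As a rough sketch, the three RED signals map onto PromQL expressions like the ones below. The metric names (`http_requests_total`, `http_request_duration_seconds_bucket`) and the `service` label follow the examples in this document and are assumptions about your instrumentation; `red_queries` and the `checkout` service name are hypothetical.

```python
def red_queries(service, window="5m"):
    """Return one PromQL expression per RED signal for a single service."""
    selector = f'service="{service}"'
    return {
        # Rate: requests per second
        "rate": f"sum(rate(http_requests_total{{{selector}}}[{window}]))",
        # Errors: share of requests answered with a 5xx status
        "errors": (
            f'sum(rate(http_requests_total{{{selector}, status=~"5.."}}[{window}]))'
            f" / sum(rate(http_requests_total{{{selector}}}[{window}]))"
        ),
        # Duration: 95th percentile latency from histogram buckets
        "duration": (
            "histogram_quantile(0.95, "
            f"sum(rate(http_request_duration_seconds_bucket{{{selector}}}[{window}])) by (le))"
        ),
    }


for signal, expr in red_queries("checkout").items():
    print(f"{signal}: {expr}")
```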
### 3. USE Method (Resources)
- **Utilization** - % time resource is busy
- **Saturation** - Queue length/wait time
- **Errors** - Error count
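A USE view for a node can be sketched as three panels, one per signal. The metric names below assume node_exporter (the CPU expression is the one used later in this document; `node_load1` and `node_network_receive_errs_total` are stand-ins for saturation and errors), and `use_panels` is a hypothetical helper.

```python
# Each entry: signal name -> (PromQL expression, Grafana unit id).
# Metric names assume node_exporter; swap in whatever your exporters expose.
USE_QUERIES = {
    "utilization": (
        '100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)',
        "percent",
    ),
    "saturation": ("node_load1", "short"),
    "errors": ("rate(node_network_receive_errs_total[5m])", "short"),
}


def use_panels():
    """Build one time-series panel per USE signal, side by side on one row."""
    panels = []
    for i, (name, (expr, unit)) in enumerate(USE_QUERIES.items()):
        panels.append({
            "title": name.capitalize(),
            "type": "timeseries",
            "targets": [{"expr": expr, "legendFormat": "{{instance}}"}],
            "fieldConfig": {"defaults": {"unit": unit}},
            "gridPos": {"x": i * 8, "y": 0, "w": 8, "h": 8},
        })
    return panels


for p in use_panels():
    print(p["title"], "->", p["targets"][0]["expr"])
```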
## Dashboard Structure
### API Monitoring Dashboard
{
  "dashboard": {
    "title": "API Monitoring",
    "tags": ["api", "production"],
    "timezone": "browser",
    "refresh": "30s",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(http_requests_total[5m])) by (service)",
            "legendFormat": "{{service}}"
          }
        ],
        "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "Error Rate %",
        "type": "graph",
        "targets": [
          {
            "expr": "(sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))) * 100",
            "legendFormat": "Error Rate"
          }
        ],
        "alert": {
          "conditions": [
            {
              "evaluator": { "params": [5], "type": "gt" },
              "operator": { "type": "and" },
              "query": { "params": ["A", "5m", "now"] },
              "type": "query"
            }
          ]
        },
        "gridPos": { "x": 12, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "P95 Latency",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))",
            "legendFormat": "{{service}}"
          }
        ],
        "gridPos": { "x": 0, "y": 8, "w": 24, "h": 8 }
      }
    ]
  }
}
**Reference:** See `assets/api-dashboard.json`
## Panel Types
### 1. Stat Panel (Single Value)
{
  "type": "stat",
  "title": "Total Requests",
  "targets": [
    {
      "expr": "sum(http_requests_total)"
    }
  ],
  "options": {
    "reduceOptions": {
      "values": false,
      "calcs": ["lastNotNull"]
    },
    "orientation": "auto",
    "textMode": "auto",
    "colorMode": "value"
  },
  "fieldConfig": {
    "defaults": {
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "value": 0, "color": "green" },
          { "value": 80, "color": "yellow" },
          { "value": 90, "color": "red" }
        ]
      }
    }
  }
}
### 2. Time Series Graph
{
  "type": "graph",
  "title": "CPU Usage",
  "targets": [
    {
      "expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)"
    }
  ],
  "yaxes": [
    { "format": "percent", "max": 100, "min": 0 },
    { "format": "short" }
  ]
}
### 3. Table Panel
{
  "type": "table",
  "title": "Service Status",
  "targets": [
    {
      "expr": "up",
      "format": "table",
      "instant": true
    }
  ],
  "transformations": [
    {
      "id": "organize",
      "options": {
        "excludeByName": { "Time": true },
        "indexByName": {},
        "renameByName": {
          "instance": "Instance",
          "job": "Service",
          "Value": "Status"
        }
      }
    }
  ]
}
### 4. Heatmap
{
  "type": "heatmap",
  "title": "Latency Heatmap",
  "targets": [
    {
      "expr": "sum(rate(http_request_duration_seconds_bucket[5m])) by (le)",
      "format": "heatmap"
    }
  ],
  "dataFormat": "tsbuckets",
  "yAxis": {
    "format": "s"
  }
}
## Variables
### Query Variables
{
  "templating": {
    "list": [
      {
        "name": "namespace",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_pod_info, namespace)",
        "refresh": 1,
        "multi": false
      },
      {
        "name": "service",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_service_info{namespace=\"$namespace\"}, service)",
        "refresh": 1,
        "multi": true
      }
    ]
  }
}
### Use Variables in Queries
sum(rate(http_requests_total{namespace="$namespace", service=~"$service"}[5m]))
## Alerts in Dashboards
{
  "alert": {
    "name": "High Error Rate",
    "conditions": [
      {
        "evaluator": {
          "params": [5],
          "type": "gt"
        },
        "operator": { "type": "and" },
        "query": {
          "params": ["A", "5m", "now"]
        },
        "reducer": { "type": "avg" },
        "type": "query"
      }
    ],
    "executionErrorState": "alerting",
    "for": "5m",
    "frequency": "1m",
    "message": "Error rate is above 5%",
    "noDataState": "no_data",
    "notifications": [{ "uid": "slack-channel" }]
  }
}
## Dashboard Provisioning
**dashboards.yml:**
apiVersion: 1
providers:
  - name: "default"
    orgId: 1
    folder: "General"
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /etc/grafana/dashboards
## Common Dashboard Patterns
### Infrastructure Dashboard
**Key Panels:**
- CPU utilization per node
- Memory usage per node
- Disk I/O
- Network traffic
- Pod count by namespace
- Node status
**Reference:** See `assets/infrastructure-dashboard.json`
### Database Dashboard
**Key Panels:**
- Queries per second
- Connection pool usage
- Query latency (P50, P95, P99)
- Active connections
- Database size
- Replication lag
- Slow queries
**Reference:** See `assets/database-dashboard.json`
### Application Dashboard
**Key Panels:**
- Request rate
- Error rate
- Response time (percentiles)
- Active users/sessions
- Cache hit rate
- Queue length
## Best Practices
1. **Start with templates** (Grafana community dashboards)
2. **Use consistent naming** for panels and variables
3. **Group related metrics** in rows
4. **Set appropriate time ranges** (default: Last 6 hours)
5. **Use variables** for flexibility
6. **Add panel descriptions** for context
7. **Configure units** correctly
8. **Set meaningful thresholds** for colors
9. **Use consistent colors** across dashboards
10. **Test with different time ranges**
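Several of these practices can be checked mechanically before a dashboard is committed. The following is a minimal sketch of such a check, not an official Grafana tool; `lint_dashboard` is a hypothetical helper and it inspects only a few fields (`description`, `fieldConfig.defaults.unit`, and threshold steps on stat panels).

```python
def lint_dashboard(dashboard):
    """Return human-readable findings for a dashboard dict with a "panels" list."""
    findings = []
    seen_titles = set()
    for p in dashboard.get("panels", []):
        title = p.get("title", "<untitled>")
        # Consistent naming: panel titles should be unique
        if title in seen_titles:
            findings.append(f"{title}: duplicate panel title")
        seen_titles.add(title)
        # Panel descriptions give viewers context
        if not p.get("description"):
            findings.append(f"{title}: missing description")
        defaults = p.get("fieldConfig", {}).get("defaults", {})
        # Units should be configured explicitly
        if "unit" not in defaults:
            findings.append(f"{title}: no unit configured")
        # Stat panels rely on thresholds for their color
        if p.get("type") == "stat" and not defaults.get("thresholds", {}).get("steps"):
            findings.append(f"{title}: stat panel has no threshold steps")
    return findings


example = {
    "panels": [
        {
            "title": "CPU Usage",
            "type": "timeseries",
            "description": "Per-node CPU utilization",
            "fieldConfig": {"defaults": {"unit": "percent"}},
        },
        {"title": "Total Requests", "type": "stat"},
    ]
}
findings = lint_dashboard(example)
for finding in findings:
    print(finding)
```

A check like this fits naturally into a pre-commit hook or CI job that runs over the dashboard JSON files before they are provisioned.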
## Dashboard as Code
### Terraform Provisioning
resource "grafana_dashboard" "api_monitoring" {
  config_json = file("${path.module}/dashboards/api-monitoring.json")
  folder      = grafana_folder.monitoring.id
}
resource "grafana_folder" "monitoring" {
  title = "Production Monitoring"
}
### Ansible Provisioning
- name: Deploy Grafana dashboards
  copy:
    src: "{{ item }}"
    dest: /etc/grafana/dashboards/
  with_fileglob:
    - "dashboards/*.json"
  notify: restart grafana
## Reference Files
- `assets/api-dashboard.json` - API monitoring dashboard
- `assets/infrastructure-dashboard.json` - Infrastructure dashboard
- `assets/database-dashboard.json` - Database monitoring dashboard
- `references/dashboard-design.md` - Dashboard design guide
## Related Skills
- `prometheus-configuration` - For metric collection
- `slo-implementation` - For SLO dashboards
How to Use This Skill Unit
Option A: Project-Specific (Recommended)
- Click "Download" above
- In your project, create the directory: .agent/skills/grafana-dashboards/
- Save the file as SKILL.md
- The agent will automatically discover the skill based on its description.
Option B: Global Installation (All Agents)
Save the file to these locations to make it available across all projects:
- Claude Code: ~/.claude/skills/wshobson/agents/grafana-dashboards/SKILL.md
- Cursor: ~/.cursor/skills/wshobson/agents/grafana-dashboards/SKILL.md
- Antigravity: ~/.gemini/antigravity/skills/wshobson/agents/grafana-dashboards/SKILL.md
Install with CLI: npx skills add wshobson/agents