AI Prompts to Automate Internal Runbook & SOP Creation
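Help your DevOps and IT Operations teams write runbooks faster. These AI prompts produce first drafts of SOPs, incident response plans, system maintenance guides, and onboarding material, and they also help review and polish existing documentation. Replace each bracketed placeholder, such as '[SERVICE_NAME]', with your own details before running a prompt. Used consistently, the prompts keep operational knowledge uniform and actionable across your organization.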
Create Runbook Template
Create a standardized markdown template for a new runbook to ensure consistency across all documentation.
Act as a senior SRE. Generate a comprehensive markdown template for an internal runbook for the service '[SERVICE_NAME]'. The template should include sections for: Overview, Service Dependencies, Key Contacts (On-call), Common Alerts, Triage Steps, Escalation Procedures, Recovery Playbooks, and a Change Log. Ensure it is formatted for readability and easy navigation.
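Every prompt here contains bracketed placeholders. If you keep these prompts in a script or a shared snippet library, a small helper can substitute the placeholders and refuse to produce a half-filled prompt. This is a minimal sketch under two assumptions: placeholders follow the upper-case `[NAME]` style used on this page, and the service name `payments-api` is hypothetical.

```python
import re

# Bracketed placeholders as used in these prompts, e.g. [SERVICE_NAME], [COMMAND/TOOL],
# or free-text ones such as [PASTE SERVICE DESCRIPTION HERE].
PLACEHOLDER = re.compile(r"\[([A-Z0-9_][A-Z0-9_/ -]*)\]")


def fill_prompt(template: str, values: dict[str, str]) -> str:
    """Substitute every [PLACEHOLDER]; refuse to return a half-filled prompt."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise KeyError(f"no value supplied for: {', '.join(missing)}")
    # A function replacement avoids re's backslash handling in the substituted values.
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


prompt = "Write a one-paragraph overview for our internal runbook about the '[SERVICE_NAME]' service."
print(fill_prompt(prompt, {"SERVICE_NAME": "payments-api"}))  # 'payments-api' is a hypothetical name
```

Failing loudly on a missing value is deliberate. A prompt sent with a literal `[SERVICE_NAME]` still in it tends to produce generic output that is easy to miss in review.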
Define Service Overview
Write a concise, high-level overview for a specific service, making it easy for new team members to understand.
Write a one-paragraph overview for our internal runbook about the '[SERVICE_NAME]' service. Explain its core function, its primary users (e.g., internal teams, external customers), and its business impact. The tone should be clear and accessible to a new engineer.
List Service Dependencies
Identify a service's upstream and downstream dependencies to map out potential points of failure.
Based on the following service description, identify and list potential upstream (services it relies on) and downstream (services that rely on it) dependencies for '[SERVICE_NAME]'. Description: [PASTE SERVICE DESCRIPTION HERE]. Format the output as two markdown lists: 'Upstream Dependencies' and 'Downstream Dependencies'.
Create Escalation Path
Outline a clear, multi-level escalation procedure for an incident to ensure timely response from the right people.
Generate a 3-level escalation procedure for an incident related to '[SERVICE_NAME]'. Level 1 should be the on-call engineer, escalated after '[TIME_LIMIT_L1]' minutes. Level 2 should be the team lead, escalated after '[TIME_LIMIT_L2]' minutes. Level 3 should be the engineering manager, escalated after '[TIME_LIMIT_L3]' minutes. For each level, specify the trigger for escalation (e.g., 'no resolution after 30 minutes') and the contact method (e.g., 'PagerDuty, then Slack').
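Before you hand the generated escalation procedure to an alerting tool, you can encode it as data to check that the thresholds make sense. The sketch below reads each '[TIME_LIMIT_Ln]' as minutes since the incident started before that level is engaged. That reading is an assumption. The threshold values and contact methods are also hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationLevel:
    name: str           # who is engaged at this level
    after_minutes: int  # minutes since the incident started before this level is engaged
    contact: str        # how to reach them


# Hypothetical values standing in for [TIME_LIMIT_L1], [TIME_LIMIT_L2], [TIME_LIMIT_L3].
POLICY = [
    EscalationLevel("On-call engineer", 0, "PagerDuty"),
    EscalationLevel("Team lead", 30, "PagerDuty, then Slack"),
    EscalationLevel("Engineering manager", 60, "Phone, then Slack"),
]


def levels_to_engage(minutes_unresolved: int, policy: list[EscalationLevel] = POLICY) -> list[EscalationLevel]:
    """Every level whose threshold has passed; earlier levels stay engaged."""
    return [level for level in policy if minutes_unresolved >= level.after_minutes]


for level in levels_to_engage(45):
    print(f"{level.name} via {level.contact}")
```

Escalation is cumulative here: engaging the team lead does not release the on-call engineer.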
Write Triage Steps for Alert
Generate initial diagnostic steps for a specific system alert to guide on-call engineers.
An alert named '[ALERT_NAME]' has fired for the '[SERVICE_NAME]' service. The alert description is: '[ALERT_DESCRIPTION]'. Generate a numbered list of the first 5 triage steps an on-call engineer should take to diagnose the issue. Focus on verification, log checking, and common failure points.
Create a 'High CPU' Playbook
Develop a step-by-step playbook for resolving sustained high CPU utilization, a common production issue.
Write a detailed playbook for an incident where the '[SERVICE_NAME]' service is experiencing sustained high CPU utilization. Include steps for: 1. Confirming the issue in '[MONITORING_TOOL]'. 2. Identifying the problematic process or pod using '[COMMAND/TOOL]'. 3. Checking for recent deployments via '[DEPLOYMENT_SYSTEM]'. 4. Analyzing logs for errors in '[LOG_SYSTEM]'. 5. A safe restart procedure for '[SERVICE_NAME]' components.
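The prompt says "sustained" without defining it, and the playbook's first step is easier to follow if the runbook does. One option is "at or above a threshold for N consecutive samples". The sketch below assumes evenly spaced samples exported from your monitoring tool. The 90% threshold and five-sample window are hypothetical defaults.

```python
def is_sustained_high_cpu(samples: list[float], threshold: float = 90.0, min_consecutive: int = 5) -> bool:
    """True if at least `min_consecutive` back-to-back samples are at or above `threshold` (%)."""
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    run = 0
    for value in samples:
        run = run + 1 if value >= threshold else 0  # reset the streak on any dip
        if run >= min_consecutive:
            return True
    return False


# Hypothetical one-minute samples exported from a monitoring tool.
print(is_sustained_high_cpu([45, 92, 95, 97, 91, 93, 60]))  # five samples in a row >= 90
print(is_sustained_high_cpu([40, 99, 41, 98, 42]))          # isolated spikes only
```

Requiring consecutive samples filters out short spikes, such as those caused by garbage collection or a cron job, which usually do not warrant a restart.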
Draft Incident Communication
Generate a template for internal communication during an active incident to keep stakeholders informed.
Create a template for an internal incident communication message to be posted in our #incidents Slack channel. The template should have placeholders for: Incident Title, Severity (SEV-1/2/3), Summary of Impact, Current Status, Lead Responder, and a link to the incident call/document.
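If your team posts incident updates from a bot or script, the same fields can be rendered in code so that every update has the same shape. This is a sketch under two assumptions: the message uses Slack's mrkdwn bold syntax (`*text*`), and posting to #incidents is left to your existing tooling. The example values are hypothetical.

```python
from dataclasses import dataclass

SEVERITIES = ("SEV-1", "SEV-2", "SEV-3")


@dataclass
class IncidentUpdate:
    title: str
    severity: str
    impact: str
    status: str
    lead_responder: str
    link: str  # incident call or document


def render_incident_message(update: IncidentUpdate) -> str:
    """Format an update using Slack mrkdwn (*bold*); posting it is left to your tooling."""
    if update.severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {SEVERITIES}, got {update.severity!r}")
    return "\n".join([
        f"*[{update.severity}] {update.title}*",
        f"*Impact:* {update.impact}",
        f"*Current status:* {update.status}",
        f"*Lead responder:* {update.lead_responder}",
        f"*Incident call/doc:* {update.link}",
    ])


# Hypothetical example values.
example = IncidentUpdate("Checkout errors", "SEV-2", "Some users cannot pay", "Investigating", "@alice", "https://example.com/inc/1")
print(render_incident_message(example))
```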
Database Connection Failure Playbook
Outline steps to troubleshoot and resolve database connectivity issues, a frequent and critical problem.
Generate a playbook for diagnosing a '[DATABASE_TYPE]' connection failure from our '[APPLICATION_NAME]' service. Include steps to check: 1. Application logs for connection errors. 2. Network connectivity between the app server and the database server. 3. Database user credentials and permissions. 4. Database server status and resource usage.
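Step 2 (network connectivity) is the easiest to script, and the type of failure already narrows down the cause. The sketch below attempts a plain TCP connection from the app server. It deliberately says nothing about credentials, which step 3 covers. The host name is hypothetical; 5432 and 3306 are the default ports for PostgreSQL and MySQL.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> tuple[bool, str]:
    """Try a TCP connection to host:port and classify the failure, if any."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"TCP connection to {host}:{port} succeeded"
    except socket.timeout:
        # Often a firewall, security group, or routing problem silently dropping packets.
        return False, f"timed out after {timeout}s"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return False, "connection refused"
    except OSError as exc:  # includes DNS resolution failures (socket.gaierror)
        return False, f"{type(exc).__name__}: {exc}"


# Example (hypothetical host): print(can_reach("db.internal.example", 5432))
```

A timeout and a refusal point to different owners. A timeout usually means network or firewall rules, while a refusal usually means the database process is down or listening elsewhere, which is step 4.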
Post-Incident Review Questions
Generate a list of key questions for a post-mortem to facilitate a blameless and productive review.
Generate a list of key questions to guide a post-incident review (post-mortem). The questions should cover: Timeline of Events, Root Cause Analysis, Impact Assessment, Actions Taken, and Lessons Learned. Frame the questions to be blameless.
Rollback Procedure Steps
Create a generic but safe procedure for rolling back a recent deployment to quickly mitigate issues.
Write a generic, step-by-step procedure for rolling back a failed deployment of the '[SERVICE_NAME]' service. The procedure should emphasize safety and include steps for: 1. Declaring the rollback internally. 2. Verifying the previous stable version tag or commit in '[VERSION_CONTROL_SYSTEM]'. 3. Executing the rollback command using '[DEPLOYMENT_TOOL]' (e.g., `helm rollback`, `kubectl rollout undo`). 4. Monitoring the service health post-rollback using '[MONITORING_TOOL]'. 5. Communicating completion to stakeholders.
Server Patching SOP
Create a Standard Operating Procedure for applying OS patches to servers to ensure security and stability.
Write a Standard Operating Procedure (SOP) for applying security patches to our fleet of '[OS_TYPE]' servers. The SOP should cover: 1. Pre-patch checks (backups, health status in '[HEALTH_DASHBOARD]'). 2. Applying patches to a staging server first via '[PATCHING_TOOL]'. 3. Verification steps on the staging server, including '[VERIFICATION_TESTS]'. 4. Rolling out patches to production, considering '[ROLLOUT_STRATEGY]'. 5. Post-patch verification and monitoring.
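'[ROLLOUT_STRATEGY]' is left open. One common choice is a canary host followed by fixed-size waves, each patched and verified before the next begins. The sketch below only plans the waves; patching is left to '[PATCHING_TOOL]'. The host names, the canary count, and the wave size are hypothetical.

```python
def plan_patch_waves(hosts: list[str], canary_count: int = 1, wave_size: int = 5) -> list[list[str]]:
    """Canary wave first, then fixed-size waves; patch and verify each wave before the next."""
    if canary_count < 0 or wave_size < 1:
        raise ValueError("canary_count must be >= 0 and wave_size >= 1")
    canary, rest = hosts[:canary_count], hosts[canary_count:]
    waves = [canary] if canary else []
    waves.extend(rest[i:i + wave_size] for i in range(0, len(rest), wave_size))
    return waves


# Hypothetical fleet of 11 servers.
fleet = [f"web-{n:02d}" for n in range(1, 12)]
for number, wave in enumerate(plan_patch_waves(fleet), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Keeping waves small limits how much capacity is out of service if a patch misbehaves. The trade-off is a longer total rollout.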
Database Backup & Restore SOP
Document the procedure for backing up and, more importantly, restoring a database.
Create an SOP for backing up and restoring our '[DATABASE_TYPE]' database named '[DB_NAME]'. Include separate sections for: 1. The daily backup procedure (command/script to run). 2. The procedure for restoring the database from a backup to a new, non-production environment for verification.
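The restore half of this SOP is the part teams most often skip. The sketch below shows the shape of "back up, restore into a scratch copy, verify" using SQLite as a stand-in for '[DATABASE_TYPE]', via Python's built-in `sqlite3` online backup API. For PostgreSQL the equivalent tools would be `pg_dump` and `pg_restore`. The file and table names are hypothetical.

```python
import sqlite3


def backup_database(src_path: str, dst_path: str) -> None:
    """Copy a live SQLite database into dst_path using the online backup API."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        with dst:
            src.backup(dst)
    finally:
        dst.close()
        src.close()


def restore_and_verify(backup_path: str, scratch_path: str, table: str) -> int:
    """Restore the backup into a fresh scratch database, check integrity, return a row count."""
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    backup_database(backup_path, scratch_path)  # restoring = copying the backup into a new DB
    conn = sqlite3.connect(scratch_path)
    try:
        (status,) = conn.execute("PRAGMA integrity_check").fetchone()
        if status != "ok":
            raise RuntimeError(f"integrity_check failed: {status}")
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        return count
    finally:
        conn.close()


# Example (hypothetical paths):
# backup_database("prod.db", "prod-backup.db")
# print(restore_and_verify("prod-backup.db", "scratch.db", "orders"))
```

Comparing the returned row count with the source at backup time gives a quick, if coarse, signal that the restore is usable.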
SSL Certificate Renewal
Outline the steps required to renew an SSL certificate for a web service before it expires.
Generate a runbook procedure for renewing an SSL certificate for the domain '[DOMAIN_NAME]'. The procedure should cover: 1. Generating a new Certificate Signing Request (CSR). 2. Submitting the CSR to our Certificate Authority. 3. Validating domain ownership. 4. Installing the new certificate on our '[SERVER_TYPE]' web server. 5. Verifying the new certificate is active.
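Step 5 (verifying the new certificate is active) can be checked from outside the server by reading the expiry date of the certificate it actually serves. The sketch below uses only Python's standard `ssl` module: `getpeercert()` for the `notAfter` field and `ssl.cert_time_to_seconds` to parse it. The host name in the example is a placeholder for '[DOMAIN_NAME]'.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left, given a certificate 'notAfter' string such as 'Jan  5 09:34:43 2018 GMT'."""
    expires_at = ssl.cert_time_to_seconds(not_after)  # interpreted as UTC
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def fetch_not_after(hostname: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter date of the certificate the server presents.

    The default context verifies the chain and hostname, so an expired or
    mismatched certificate makes the handshake itself fail (ssl.SSLCertVerificationError).
    """
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


# Example (replace with your [DOMAIN_NAME]):
# print(round(days_until_expiry(fetch_not_after("www.example.com")), 1))
```

If the expiry date has not moved after installation, the server is probably still serving the old certificate, for example because it has not been reloaded.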
User Access Provisioning
Create a checklist for granting a new user access to a system, ensuring no steps are missed.
Create a checklist-style SOP for provisioning a new user with '[ROLE_TYPE]' access to the '[SYSTEM_NAME]' application. The checklist should include steps for: 1. Verifying manager approval. 2. Creating the user account. 3. Assigning the correct permissions/role group. 4. Sending login instructions to the user. 5. Documenting the access grant in our tracking system.
System Restart Sequence
Document the correct order for restarting a multi-component system to prevent startup errors.
Our '[APPLICATION_NAME]' application consists of three services: a web front-end, an API backend, and a database. Document the correct, ordered sequence for a full system restart to avoid startup errors. Provide a brief justification for the order.
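The justification the prompt asks for is a dependency ordering: start a service only after everything it depends on is up, and stop services in the reverse order. For more than three services, a topological sort computes this for you. The sketch below uses Python's standard `graphlib` (Python 3.9+). The dependency map is an assumption about a typical web/API/database layout.

```python
from graphlib import TopologicalSorter

# Each service maps to the services that must be up before it starts.
# Assumption: the front-end calls the API, and the API needs the database.
DEPENDS_ON: dict[str, set[str]] = {
    "web-frontend": {"api-backend"},
    "api-backend": {"database"},
    "database": set(),
}


def start_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Dependencies first. Raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(depends_on).static_order())


def stop_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Dependents first, so nothing is left calling a service that has already stopped."""
    return start_order(depends_on)[::-1]


print("stop: ", " -> ".join(stop_order(DEPENDS_ON)))
print("start:", " -> ".join(start_order(DEPENDS_ON)))
```

A `CycleError` is useful in its own right: it means no clean restart order exists, and that belongs in the runbook.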
Log Rotation Check
Create a procedure to verify that log rotation is configured and working correctly to prevent disk space issues.
Write a simple runbook procedure to verify that log rotation is functioning correctly for the '[SERVICE_NAME]' service on our Linux servers. The steps should include which directory to check, which rotated-file patterns to look for (e.g., .1, .gz), and how to confirm that file sizes are being kept in check.
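The check the prompt describes can also be scripted for one directory. The sketch below makes three assumptions: rotated files follow the numbered (`app.log.1`, `app.log.2.gz`) or date-suffixed (`app.log-20240101.gz`) naming styles commonly produced by logrotate, active logs end in `.log`, and a 100 MiB limit marks an oversized active log. The directory path is hypothetical.

```python
import re
from pathlib import Path

# Rotated copies such as app.log.1, app.log.2.gz, or app.log-20240101.gz.
ROTATED = re.compile(r"\.log(\.\d+|-\d{8})(\.gz)?$")


def check_rotation(log_dir: str, max_active_bytes: int = 100 * 1024 * 1024) -> dict:
    """Summarize active vs rotated logs in one directory and flag oversized active files."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    active = sorted(p.name for p in files if p.name.endswith(".log"))
    rotated = sorted(p.name for p in files if ROTATED.search(p.name))
    oversized = sorted(
        p.name for p in files if p.name.endswith(".log") and p.stat().st_size > max_active_bytes
    )
    return {
        "active": active,
        "rotated": rotated,
        "oversized": oversized,
        # No rotated copies, or an active log past the limit, suggests rotation is not running.
        "looks_ok": bool(rotated) and not oversized,
    }


# Example (hypothetical path for [SERVICE_NAME]):
# print(check_rotation("/var/log/my-service"))
```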
New Engineer Onboarding
Create a runbook section outlining the first week's tasks for a new engineer to streamline their onboarding.
Generate a runbook section titled 'New Engineer Onboarding: First Week Checklist' for someone joining the '[TEAM_NAME]' team. Include daily goals covering: Day 1 (account setup), Day 2 (dev environment setup), Day 3 (codebase overview), Day 4 (shadowing a deployment), Day 5 (review and planning).
Explain Codebase Architecture
Simplify a complex technical architecture for a new team member to accelerate their learning.
Explain the high-level architecture of our '[APPLICATION_NAME]' application in simple terms for a new engineer. The current architecture is: [PASTE 1-2 PARAGRAPHS OF TECHNICAL DESCRIPTION]. Rewrite this to be more accessible, using an analogy if helpful, and focus on the main components and how they interact.
How to Use Staging Env
Write a guide on how to properly use the team's staging environment to prevent conflicts.
Write a short guide for our internal runbook on how to use the '[TEAM_NAME]' team's staging environment. Cover key rules like: 1. How to claim the environment. 2. The process for deploying a feature branch. 3. The importance of resetting data after testing. 4. Who to contact if the environment is broken.
On-Call Shadowing Guide
Create a set of guidelines for an engineer's first on-call shadowing rotation to set clear expectations.
Create a runbook guide for an engineer who is shadowing the on-call rotation for the first time. The guide should include: 1. The goal of shadowing (to learn, not to fix). 2. What to observe during an incident. 3. Key questions to ask the primary on-call. 4. How to participate in post-incident reviews.
Simplify Technical Jargon
Rewrite a technical paragraph to be clearer and more accessible for a broader audience.
Rewrite the following technical paragraph from our runbook to be clearer and use simpler language. Remove unnecessary jargon and acronyms. Original paragraph: '[PASTE TECHNICAL PARAGRAPH HERE]'
Add Troubleshooting FAQs
Generate a list of Frequently Asked Questions based on a runbook section to preemptively answer common questions.
Read the following runbook section about '[TOPIC]' and generate a list of 5 potential Frequently Asked Questions (FAQs) that a new engineer might have. For each question, provide a concise answer based on the text. Runbook section: '[PASTE RUNBOOK SECTION HERE]'
Create Executive Summary
Summarize a technical runbook for a non-technical audience like management or other departments.
Summarize the following technical runbook procedure as a three-bullet executive summary for a non-technical manager. The summary should focus on the 'what' and 'why', not the 'how'. Runbook procedure: '[PASTE PROCEDURE STEPS HERE]'
Check for Clarity & Gaps
Ask AI to review a runbook for potential ambiguities or missing information to improve its quality.
Act as a senior technical writer. Review the following runbook procedure. Identify any steps that are ambiguous, unclear, or could be misinterpreted. Also, suggest any missing information that would make this procedure safer or more effective. Procedure: '[PASTE PROCEDURE HERE]'
Convert Steps to Checklist
Transform a paragraph of instructions into a clear, actionable checklist for easy execution.
Convert the following paragraph of instructions into a numbered markdown checklist. Each item in the checklist should be a single, clear action. Paragraph: '[PASTE INSTRUCTION PARAGRAPH HERE]'
Generate 'Common Mistakes' Section
Create a list of common pitfalls to avoid when performing a task, based on an existing procedure.
Based on the following procedure for '[TASK_NAME]', generate a 'Common Mistakes to Avoid' section for the runbook. List at least 3 potential pitfalls an engineer might encounter and how to prevent them. Procedure: '[PASTE PROCEDURE HERE]'
Draft a Change Log Entry
Create a standardized format for logging changes in the runbook to maintain a clear audit trail.
Create a markdown template for a single change log entry in a runbook. The template should include fields for: Date, Author, Change Description, and a link to the relevant ticket or pull request. Provide one example entry.
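If the change log is kept as a markdown table (an assumption; a bulleted list works just as well), a small helper keeps entries uniform and stops free text from breaking the table. The sketch below escapes pipe characters, which GitHub Flavored Markdown tables require inside cells. The example entry, author, and URL are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

HEADER = "| Date | Author | Change Description | Ticket / PR |\n| --- | --- | --- | --- |"


@dataclass
class ChangeLogEntry:
    day: date
    author: str
    description: str
    link: str  # ticket or pull request URL

    def to_markdown_row(self) -> str:
        cells = [self.day.isoformat(), self.author, self.description, self.link]
        # Escape pipes so free text cannot split a cell.
        return "| " + " | ".join(cell.replace("|", "\\|") for cell in cells) + " |"


# Hypothetical entry.
entry = ChangeLogEntry(date(2024, 5, 2), "jdoe", "Added database failover step", "https://example.com/PR-123")
print(HEADER)
print(entry.to_markdown_row())
```

ISO 8601 dates (`YYYY-MM-DD`) sort correctly as plain text, which keeps the audit trail easy to scan.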