Prompt Library

25+ Essential Free ChatGPT Prompts for Data Analysts

Stop wasting time on repetitive data preparation and boilerplate code. These expertly crafted, free ChatGPT prompts for data analysts are designed to drastically cut your prep time on everything from complex SQL queries to narrative summaries. Leverage AI to transform raw data into actionable intelligence faster than ever before.

Add ChatBoost to Chrome
Save these prompts into ChatBoost and reuse them with Alt+P.

Prompt Library (26)

Generate Complex KPI SQL Query

Quickly create optimized SQL to calculate a specific Key Performance Indicator from sample schema descriptions.

Act as an expert PostgreSQL developer. I need a SQL query to calculate the Moving 30-Day Active User Count based on the 'user_events' table, which has columns: 'event_timestamp' (timestamp) and 'user_id' (integer). Ensure the query efficiently handles date windows and uses CTEs for readability.
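
If you want to sanity-check whatever SQL comes back, here is a minimal Pandas sketch of the same rolling 30-day distinct-user count. The column names follow the 'user_events' schema above; the data itself is invented.

```python
import pandas as pd

# hypothetical stand-in for the 'user_events' table
user_events = pd.DataFrame({
    "event_timestamp": pd.to_datetime([
        "2024-01-03", "2024-01-10", "2024-01-28", "2024-02-02", "2024-02-15",
    ]),
    "user_id": [1, 2, 1, 3, 2],
})

# one row per (day, user), then count distinct users in the 30-day window ending each day
daily_users = (
    user_events.assign(event_date=user_events["event_timestamp"].dt.normalize())
    .drop_duplicates(["event_date", "user_id"])
)
days = pd.date_range(daily_users["event_date"].min(), daily_users["event_date"].max(), freq="D")
rolling_30d_active = pd.Series(
    [
        daily_users.loc[
            daily_users["event_date"].between(d - pd.Timedelta(days=29), d), "user_id"
        ].nunique()
        for d in days
    ],
    index=days,
    name="active_users_30d",
)
print(rolling_30d_active.tail())
```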

Standardize Outlier Handling Strategy (Pandas)

Develop a Python script with Pandas to detect and flag potential outliers in a specific column using the Interquartile Range (IQR) method.

I have a Pandas DataFrame named 'df' with a column 'transaction_value'. Write a complete Python script that calculates Q1, Q3, and the IQR, derives the lower and upper outlier boundaries (Q1 - 1.5*IQR and Q3 + 1.5*IQR), and flags values outside these thresholds. Then, write a brief interpretation of what this flagging implies for outlier reporting.
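
For reference, a minimal sketch of the IQR flagging the prompt asks for. The 1.5x multiplier is the conventional choice, 'transaction_value' matches the column named above, and the sample values are invented.

```python
import pandas as pd

# hypothetical DataFrame matching the prompt
df = pd.DataFrame({"transaction_value": [12.0, 15.5, 14.2, 13.8, 250.0, 16.1, 11.9]})

q1 = df["transaction_value"].quantile(0.25)
q3 = df["transaction_value"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# flag rather than drop, so downstream reporting can decide how to treat outliers
df["is_outlier"] = ~df["transaction_value"].between(lower, upper)
print(df)
```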

Determine Appropriate A/B Test Statistical Test

Get advice on the correct statistical test for a specific A/B test scenario, including necessary assumptions and Python implementation.

I ran an A/B test comparing two versions of a landing page. We measured the conversion rate (binary outcome: converted or not). The analysis should use a Z-test for two proportions. Provide the exact Python code using `statsmodels.stats.proportion.proportions_ztest` including how to input the success counts and total observations for both groups.
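
The statsmodels function the prompt refers to takes arrays of success counts and observation totals. A minimal sketch with invented numbers:

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# hypothetical results: variant A vs. variant B
conversions = np.array([420, 480])     # successes per variant
visitors = np.array([10_000, 10_000])  # total observations per variant

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}")
```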

Structure Dashboard Story Flow

Create a structured outline for a performance dashboard to tell a cohesive data story.

I am building a dashboard reviewing Q3 Marketing Spend vs. ROI, targeting executive stakeholders. Outline a logical flow for 6 key visualizations, starting with the highest-level summary metric and drilling down into channel performance. Suggest the best chart type (e.g., Gantt, Waterfall, Scatter, Bar) for each segment of the story.

Translate P-Value for Executive Audience

Generate a simple, non-technical explanation of a complex statistical finding for senior leadership.

Explain the concept of obtaining a 'p-value of 0.03' in a business context to a CEO who has no statistical background. Frame the explanation around the level of risk we accept when making a business decision based on this result, avoiding statistical jargon.

R Code for Data Reshaping (Wide to Long)

Get actionable R code for common data transformation tasks, like pivoting tables, using the Tidyverse package.

Write the complete R code using the `tidyverse` package, specifically the `pivot_longer` function, to transform a sample wide-format table where columns represent 'Month' into a tidy long format with columns 'Month', 'Metric', and 'Value'. Include sample data setup.
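
If you work in Python rather than R, the equivalent reshape is `pandas.melt`. A minimal sketch (table contents invented) that produces the same 'Month'/'Metric'/'Value' layout:

```python
import pandas as pd

# hypothetical wide table: one row per metric, one column per month
wide = pd.DataFrame({
    "Metric": ["Sessions", "Revenue"],
    "Jan": [12000, 45000],
    "Feb": [13500, 47200],
    "Mar": [12800, 46100],
})

long = wide.melt(id_vars="Metric", var_name="Month", value_name="Value")
print(long)
```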

Analyze SQL Query Performance Issue

Analyze the structure of a provided SQL query to suggest likely performance bottlenecks and improvements.

Here is a complex SQL query involving recursive CTEs and three large tables: [Paste Query Here]. Based on this structure, suggest the top three most likely performance bottlenecks (e.g., indexing issues, poor join strategy) and propose specific DDL/DML changes to address them in MySQL.

Create Data Dictionary from Sample Rows

Generate a structured data dictionary based on analyzing a small sample of raw data structure.

Analyze the following 5 sample rows from a new Kafka stream dataset. Generate a formal data dictionary including Column Name, Suggested Data Type (e.g., UUID, Timestamp_ISO, Numeric), Nullability Assessment based on observed data, and a suggested Business Definition. Sample rows: [Insert 5 sample rows here].

Check Linear Regression Assumptions in Python

List the critical assumptions for running a multiple linear regression and provide Python code to check one of them, specifically focusing on residuals.

I am running a multiple linear regression model to predict CLV. List the four core assumptions. Then, provide the complete Python/Statsmodels code snippet required to plot the residuals vs. fitted values to test for Homoscedasticity, and explain the visual indicator of a problem.
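
As a reference point, here is a minimal statsmodels sketch of the residuals-vs-fitted check. The predictors and data are invented; a funnel or fan shape in the plot is the classic visual sign of heteroscedasticity.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

# hypothetical CLV dataset
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 60, 200),
    "avg_order_value": rng.normal(80, 20, 200),
})
df["clv"] = 50 + 4 * df["tenure_months"] + 2 * df["avg_order_value"] + rng.normal(0, 30, 200)

X = sm.add_constant(df[["tenure_months", "avg_order_value"]])
model = sm.OLS(df["clv"], X).fit()

# residuals vs. fitted values: points should scatter evenly around zero
plt.scatter(model.fittedvalues, model.resid, alpha=0.6)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```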

Prepare for Data Requirement Elicitation

Generate a list of probing questions to ask stakeholders to fully define a new analytics request.

I have been tasked with analyzing 'Employee Retention effectiveness' for the first time. Generate 8 high-leverage, open-ended scoping questions targeting HR leadership to precisely define the cohorts (e.g., tenure, department) and the required causality analysis we must perform.

Decompose Time Series Data in Python

Provide Python code to decompose a time series into trend, seasonality, and residual components using standard libraries.

I suspect my monthly website traffic shows strong annual (12-month) seasonality. Using Python and the `statsmodels.tsa.seasonal.seasonal_decompose` function, write the necessary code to perform an *additive* decomposition on a dataset loaded into a Pandas Series named 'traffic'. Also, instruct me on how to isolate and use only the 'trend' component for smoothing.
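
A minimal sketch of that decomposition on synthetic monthly data (48 months of invented traffic), including pulling out the trend component:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# hypothetical monthly traffic series with a linear trend and 12-month seasonality
idx = pd.date_range("2021-01-01", periods=48, freq="MS")
rng = np.random.default_rng(0)
traffic = pd.Series(
    1000 + 10 * np.arange(48) + 150 * np.sin(2 * np.pi * np.arange(48) / 12) + rng.normal(0, 50, 48),
    index=idx,
)

result = seasonal_decompose(traffic, model="additive", period=12)
trend = result.trend.dropna()  # the smoothed trend component (NaN at the window edges)
print(trend.head())
```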

Generate Power Query M Function for Merging

Request a specific, complex data transformation script using the Power Query M language for anti-join logic.

Write the Power Query M language code to perform a left anti-join between two queries, 'SourceTable_A' and 'SourceTable_B', where the goal is to retrieve only the rows from 'SourceTable_A' that do NOT have a matching 'CustomerID' in 'SourceTable_B'. Provide the complete logic block starting from query initialization.
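
If you want to verify the M output in Python, a left anti-join in Pandas is a merge with an indicator column. The two tables below are invented and share only the 'CustomerID' key:

```python
import pandas as pd

# hypothetical stand-ins for SourceTable_A and SourceTable_B
source_a = pd.DataFrame({"CustomerID": [101, 102, 103, 104], "Region": ["E", "W", "N", "S"]})
source_b = pd.DataFrame({"CustomerID": [102, 104], "Churned": [True, False]})

# keep only rows of A whose CustomerID has no match in B
merged = source_a.merge(source_b[["CustomerID"]], on="CustomerID", how="left", indicator=True)
only_in_a = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_a)
```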

Design a Data Quality Assessment Framework

Create a formalized framework to score the quality of a newly onboarded dataset.

As a data governance reviewer, design a standardized 4-quadrant data quality assessment form for a new ingestion pipeline source. The quadrants must evaluate: Data Completeness (e.g., null rate), Data Consistency (e.g., cross-field validation success), Data Timeliness (e.g., average latency), and Data Conformity (e.g., format validation success). Assign a simple pass/fail metric to each quadrant.

Python Pandas Pivot Table Generation

Generate Pandas code to create a multi-indexed pivot table from raw transactional data including aggregation functions beyond summing.

Given a Pandas DataFrame named 'transaction_log' with columns ['Store_ID', 'Month', 'Item_Cost', 'Quantity_Sold'], write the single line of Python code using `.pivot_table()` to show the *average* 'Item_Cost' and the *sum* of 'Quantity_Sold', indexed by 'Store_ID' and then 'Month'.
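
One way that single `.pivot_table()` call can look, with a few invented sample rows for context:

```python
import pandas as pd

# hypothetical transaction log matching the columns named above
transaction_log = pd.DataFrame({
    "Store_ID": [1, 1, 2, 2],
    "Month": ["Jan", "Feb", "Jan", "Jan"],
    "Item_Cost": [10.0, 12.5, 9.0, 11.0],
    "Quantity_Sold": [3, 5, 2, 7],
})

# average Item_Cost and total Quantity_Sold, indexed by Store_ID then Month
pivot = transaction_log.pivot_table(
    index=["Store_ID", "Month"],
    values=["Item_Cost", "Quantity_Sold"],
    aggfunc={"Item_Cost": "mean", "Quantity_Sold": "sum"},
)
print(pivot)
```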

Outline K-Means Clustering Steps

Map out the prerequisite steps before applying K-Means clustering to customer data, emphasizing feature engineering.

I want to segment customers based on transactional behavior (frequency, monetary value). Outline the five essential preparation steps needed before running K-Means in Scikit-learn, making sure to specify the technique used to handle feature scaling and the technique for handling skewness in the feature distributions.
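
A minimal Scikit-learn sketch of the preparation the prompt is asking about, assuming a log transform for skew and standardization for scale. The data is synthetic and k=4 is an arbitrary placeholder:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# hypothetical customer features: purchase frequency and (right-skewed) monetary value
rng = np.random.default_rng(1)
customers = pd.DataFrame({
    "frequency": rng.poisson(5, 500) + 1,
    "monetary_value": rng.lognormal(4, 1, 500),
})

# log1p reduces right skew; standardization puts both features on a comparable scale
features = np.log1p(customers)
scaled = StandardScaler().fit_transform(features)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scaled)
customers["segment"] = kmeans.labels_
print(customers["segment"].value_counts())
```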

Quantify What-If Business Scenarios

Develop a clear model to numerically assess the impact of a potential business change while contrasting against a stated baseline.

Our baseline marketing spend is $50,000/month, achieving 2,500 conversions (5% CR). If we increase spend by 20% ($10,000), and historical elasticity analysis suggests this yields a 0.5 percentage point CR lift, calculate the incremental conversions gained and the cost per incremental conversion associated with that $10,000 investment.
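
Worked through with the numbers in the prompt, and assuming traffic stays at the implied 50,000 sessions so that only the conversion rate moves, the arithmetic looks like this:

```python
# baseline: 2,500 conversions at a 5% conversion rate implies 50,000 sessions
baseline_sessions = 2500 / 0.05
baseline_conversions = 2500

# scenario: +0.5 percentage point lift applied to the same traffic (simplifying assumption)
lifted_conversions = baseline_sessions * 0.055
incremental_conversions = lifted_conversions - baseline_conversions  # 250

extra_spend = 10_000
cost_per_incremental_conversion = extra_spend / incremental_conversions  # $40
print(incremental_conversions, cost_per_incremental_conversion)
```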

Generate SQL Query with Lag/Lead Function

Create an advanced SQL query utilizing window functions for sequential analysis of user session data.

Write a standard SQL query (ANSI standard) that analyzes a user session log that has 'session_id', 'event_name', and 'event_timestamp'. Use the `LAG()` window function to calculate the exact time difference (in seconds) between a user completing the 'Login' event and the subsequent 'Checkout_Start' event within the same session.
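
To cross-check the SQL, here is a small Pandas sketch of the same Login-to-Checkout_Start gap per session. The session log is invented and follows the columns named above:

```python
import pandas as pd

# hypothetical session log
sessions = pd.DataFrame({
    "session_id": [1, 1, 1, 2, 2],
    "event_name": ["Login", "View_Product", "Checkout_Start", "Login", "Checkout_Start"],
    "event_timestamp": pd.to_datetime([
        "2024-05-01 10:00:00", "2024-05-01 10:02:30", "2024-05-01 10:05:00",
        "2024-05-01 11:00:00", "2024-05-01 11:01:15",
    ]),
})

# earliest Login and Checkout_Start per session, then the gap in seconds
stage_times = (
    sessions[sessions["event_name"].isin(["Login", "Checkout_Start"])]
    .groupby(["session_id", "event_name"])["event_timestamp"].min()
    .unstack("event_name")
)
stage_times["seconds_to_checkout"] = (
    stage_times["Checkout_Start"] - stage_times["Login"]
).dt.total_seconds()
print(stage_times)
```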

Standardize Conflicting Metrics Definitions

Resolve ambiguity between different definitions of the same business metric across departments.

The Product team defines 'Daily Active User (DAU)' as any user logging in, whereas Finance defines it as any user completing a core financial transaction. Provide three specific, quantifiable tests an analyst can run on the data to demonstrate the business cost (e.g., volume difference) of using the Product team's definition versus the Finance definition for one week.

Categorize Open-Ended Rejection Feedback

Structure a process for grouping unstructured text feedback into analyzable categories using iterative coding.

I have 500 qualitative rejection survey responses regarding loan applications. Outline a three-step, iterative process for manually coding these responses into 10 themes: Step 1 (Initial Codebook Creation), Step 2 (Application of Codes and Reconciliation), and Step 3 (Final Theme Consolidation and Theme Definition Refinement).

Draft Schema Check Script for ETL Load

Generate a basic Python script to validate incoming data payload schema integrity focusing on type and range constraints.

Write a Python script utilizing the `jsonschema` library to validate an incoming JSON payload against a strict schema definition. The schema must enforce that 'user_id' is an integer >= 1000, 'status' is one of ['ACTIVE', 'PENDING', 'ARCHIVED'], and 'timestamp' is a valid ISO 8601 string. Log the first error encountered.
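
A minimal sketch of that validation with the `jsonschema` library. The payload is invented, and note that 'format' checks such as ISO 8601 only run when a format checker (plus its optional dependency, e.g. rfc3339-validator) is available:

```python
from jsonschema import Draft7Validator, FormatChecker

schema = {
    "type": "object",
    "required": ["user_id", "status", "timestamp"],
    "properties": {
        "user_id": {"type": "integer", "minimum": 1000},
        "status": {"enum": ["ACTIVE", "PENDING", "ARCHIVED"]},
        # "format" is only enforced when a FormatChecker is supplied
        "timestamp": {"type": "string", "format": "date-time"},
    },
}

validator = Draft7Validator(schema, format_checker=FormatChecker())

# hypothetical incoming payload (user_id below the allowed minimum)
payload = {"user_id": 999, "status": "ACTIVE", "timestamp": "2024-05-01T12:00:00Z"}

errors = sorted(validator.iter_errors(payload), key=lambda e: list(e.path))
if errors:
    print(f"Schema violation: {errors[0].message}")  # log only the first error, per the prompt
else:
    print("Payload passed schema validation")
```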

Structure an Insight-Driven Presentation

Create a strong narrative arc for presenting complex analytical findings to influence a business decision.

I have discovered that Feature X causes a 15% drop in retention for users who engage with it within their first week. Structure a 10-slide executive presentation outline designed to influence a product roadmap decision. Include placeholders for 'The Urgency (Problem Size)', 'Methodology Credibility', 'Risk of Inaction', and a clear 'Single Sentence Call to Action' slide.

Create Multi-Step Funnel SQL Analysis

Produce SQL code to calculate drop-off rates and conversion rates across a sequential business process using conditional aggregation.

Using standard SQL (not window functions), write a query against an 'events' table (columns: 'user_id', 'event_name', 'timestamp') to calculate the percentage drop-off between three stages: View Product, Add to Cart, and Purchase. Display the results showing the count at each stage and the calculated conversion rate from the previous stage.
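
A quick Pandas cross-check of the funnel math, with invented events; each stage counts the distinct users who reached it:

```python
import pandas as pd

# hypothetical events table
events = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2, 3, 4],
    "event_name": ["View Product", "Add to Cart", "Purchase",
                   "View Product", "Add to Cart",
                   "View Product", "View Product"],
})

stages = ["View Product", "Add to Cart", "Purchase"]
counts = [events.loc[events["event_name"] == stage, "user_id"].nunique() for stage in stages]

funnel = pd.DataFrame({"stage": stages, "users": counts})
funnel["conversion_from_previous"] = funnel["users"] / funnel["users"].shift(1)
print(funnel)
```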

Differentiate Correlation and Causation Examples

Generate concrete examples to illustrate the difference between correlation and causation for training purposes, focusing on confounders.

Provide three distinct, realistic e-commerce examples demonstrating correlation without causation. For each example, explicitly name a potential confounding variable (a 'Z' variable) that may be explaining the observed relationship between X and Y.

Recommend Sampling Method for Survey

Get advice on the most appropriate sampling technique given budget constraints and population knowledge, specifically recommending Stratified Sampling.

We need to survey 1,000 users regarding platform satisfaction. Our population of 50,000 users is heavily skewed: 70% are 'Basic Tier' and 30% are 'Premium Tier'. We must ensure proportional representation. Recommend and justify the use of Stratified Random Sampling. Then, calculate exactly how many samples to draw from the 'Premium Tier' to meet the 1,000 total target proportionally.
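
The proportional allocation itself is simple arithmetic: with the split stated above, the Premium Tier gets 30% of 1,000, or 300 samples. A tiny sketch:

```python
# population split from the prompt: 70% Basic Tier, 30% Premium Tier out of 50,000 users
population = {"Basic Tier": 35_000, "Premium Tier": 15_000}
total_sample = 1_000

total_population = sum(population.values())
allocation = {
    tier: round(total_sample * size / total_population) for tier, size in population.items()
}
print(allocation)  # {'Basic Tier': 700, 'Premium Tier': 300}
```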

Generate Advanced Excel Formula for Multi-Criteria Aggregation

Create a robust, multi-criteria lookup formula for spreadsheet work using SUMIFS for aggregation.

I need an Excel formula that calculates the SUM of values in Column C only if the corresponding entry in Column A is 'East Region' AND the date in Column B falls within the current calendar month (dynamically determined). Use the `SUMIFS` function together with date functions such as `TODAY()` and `EOMONTH` to derive the month boundaries dynamically.

Draft Plan for Monitoring Model Drift

Outline the basic procedural steps to monitor a predictive model deployed in production for performance degradation using the Population Stability Index (PSI).

We deployed a classification model 6 months ago. Outline a 3-stage monitoring plan to detect performance decay. Specifically, detail how to calculate the Population Stability Index (PSI) by comparing the model's score distribution in the production data (last 30 days) against the validation set. What PSI score indicates that immediate model retraining is necessary?
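
A minimal PSI sketch for reference. The decile binning and the 0.1/0.25 thresholds are common conventions rather than universal rules, and the score distributions below are invented:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline sample (e.g. validation scores) and a recent production sample."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))  # decile edges from the baseline
    expected_counts = np.histogram(np.clip(expected, edges[0], edges[-1]), bins=edges)[0]
    actual_counts = np.histogram(np.clip(actual, edges[0], edges[-1]), bins=edges)[0]
    expected_pct = np.clip(expected_counts / len(expected), 1e-6, None)  # avoid log(0)
    actual_pct = np.clip(actual_counts / len(actual), 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

rng = np.random.default_rng(7)
validation_scores = rng.beta(2, 5, 10_000)   # hypothetical validation-set scores
production_scores = rng.beta(2.5, 5, 5_000)  # hypothetical last-30-days production scores

psi = population_stability_index(validation_scores, production_scores)
# common rule of thumb: < 0.1 stable, 0.1-0.25 worth monitoring, > 0.25 investigate/retrain
print(f"PSI = {psi:.3f}")
```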

Turn these prompts into a reusable workspace

Save your favourite prompts once, reuse them with Alt+P, keep a live Table of Contents of long chats, and export conversations when you're done.

Add ChatBoost to Chrome — It's free
