Mastering the Top 12 SQL Interview Patterns for Data Engineers
These articles are AI-generated summaries. Please check the original sources for full details.
Top 12 SQL Interview Problems for Data Engineers, With Answers
DataDriven outlines the recurring patterns behind SQL interview questions at FAANG and fintech companies. By its analysis, 32% of these questions specifically test GROUP BY.
Why This Matters
Knowing basic syntax is not the same as understanding data grain: the level of detail a single row represents (one row per order, per line item, per customer, and so on). Many candidates fail aggregation problems, particularly in Meta interviews, because they join tables at the wrong grain. The result is double counting that can inflate revenue figures by 3x.
Key Insights
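- Execution Order: SQL evaluates clauses logically in the order FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY. Aggregates do not exist yet when WHERE is evaluated, so filtering on an aggregate in WHERE raises an error in PostgreSQL; the filter belongs in HAVING.
- Window Functions vs Aggregation: GROUP BY collapses each group into one row, so every non-aggregated column in SELECT must also appear in GROUP BY. ROW_NUMBER instead numbers rows within a partition and keeps every row with all of its columns.
- NULL Handling: NOT IN returns zero rows if the subquery yields even one NULL, because value <> NULL evaluates to UNKNOWN, so the NOT IN test is never TRUE. NOT EXISTS has no such trap, which makes it the semantically safer production choice.
- Gaps and Islands: Subtracting a sequential ROW_NUMBER from a date yields a constant group identifier for each run of consecutive dates, because both the date and the row number advance by one per row inside the run.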
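The NOT IN trap is easy to reproduce. This sketch uses Python's built-in sqlite3 module as an in-memory stand-in for PostgreSQL (both apply standard three-valued logic here); the users and banned_users tables are hypothetical.

```python
import sqlite3

# Hypothetical tables: banned_users contains a NULL user_id,
# for example a row loaded with a missing id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER);
    CREATE TABLE banned_users (user_id INTEGER);
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO banned_users VALUES (2), (NULL);
""")

# NOT IN: for user 1 the test becomes 1 <> 2 AND 1 <> NULL,
# which is UNKNOWN rather than TRUE, so no row qualifies.
not_in = conn.execute("""
    SELECT user_id FROM users
    WHERE user_id NOT IN (SELECT user_id FROM banned_users)
    ORDER BY user_id
""").fetchall()

# NOT EXISTS only asks whether a matching row exists; the NULL row
# never matches anything, so users 1 and 3 are returned.
not_exists = conn.execute("""
    SELECT u.user_id FROM users AS u
    WHERE NOT EXISTS (
        SELECT 1 FROM banned_users AS b WHERE b.user_id = u.user_id
    )
    ORDER BY u.user_id
""").fetchall()

print(not_in)      # []
print(not_exists)  # [(1,), (3,)]
```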
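A minimal gaps-and-islands sketch, assuming a hypothetical logins table with at most one row per user per day (duplicates would advance ROW_NUMBER without advancing the date, so deduplicate first). It runs on SQLite 3.25+ through Python's sqlite3, where julianday() turns a date into a day number; in PostgreSQL the same step can be written as login_date - rn::int on a DATE column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logins (user_id INTEGER, login_date TEXT);
    INSERT INTO logins VALUES
        (1, '2024-01-01'), (1, '2024-01-02'), (1, '2024-01-03'),
        (1, '2024-01-07'), (1, '2024-01-08');
""")

streaks = conn.execute("""
    WITH numbered AS (
        SELECT user_id,
               login_date,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id ORDER BY login_date
               ) AS rn
        FROM logins
    ),
    grouped AS (
        -- Day number minus row number stays constant within a run
        -- of consecutive days and jumps whenever a day is skipped.
        SELECT user_id, login_date, julianday(login_date) - rn AS grp
        FROM numbered
    )
    SELECT user_id,
           MIN(login_date) AS streak_start,
           MAX(login_date) AS streak_end,
           COUNT(*) AS streak_days
    FROM grouped
    GROUP BY user_id, grp
    ORDER BY streak_start
""").fetchall()

print(streaks)
# [(1, '2024-01-01', '2024-01-03', 3), (1, '2024-01-07', '2024-01-08', 2)]
```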
Working Examples
Filtering on aggregated spend with HAVING, which avoids the error raised when an aggregate appears in WHERE.
```sql
SELECT customer_id,
       SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
-- HAVING is evaluated after GROUP BY, so it can filter on aggregates.
HAVING SUM(amount) > 500
ORDER BY total_spent DESC;
```
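To see the difference in practice, here is a small sketch that loads a hypothetical orders table into SQLite through Python's sqlite3 module. The WHERE version is rejected (SQLite reports it as a misuse of an aggregate rather than with PostgreSQL's wording), while the HAVING version returns the one qualifying customer.

```python
import sqlite3

# Hypothetical sample data: customer 10 spends 550, customer 20 spends 250.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES
        (1, 10, 300), (2, 10, 250),
        (3, 20, 100), (4, 20, 150);
""")

# Aggregates are not available when WHERE is evaluated, so the
# engine refuses to prepare this statement.
where_failed = False
try:
    conn.execute("""
        SELECT customer_id, SUM(amount) FROM orders
        WHERE SUM(amount) > 500
        GROUP BY customer_id
    """).fetchall()
except sqlite3.OperationalError as exc:
    where_failed = True
    print("WHERE rejected:", exc)

# HAVING runs after GROUP BY, so it can filter on the aggregate.
rows = conn.execute("""
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
    HAVING SUM(amount) > 500
    ORDER BY total_spent DESC
""").fetchall()

print(rows)  # [(10, 550.0)]
```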
Using a CTE to aggregate at the order grain before rolling up to customers, which prevents double-counting revenue.
```sql
-- Step 1: collapse line items to one row per order (the order grain).
WITH order_totals AS (
    SELECT order_id,
           customer_id,
           SUM(quantity * price) AS order_revenue
    FROM orders
    JOIN order_items USING (order_id)
    GROUP BY order_id, customer_id
)
-- Step 2: roll order totals up to one row per customer.
SELECT customer_id,
       COUNT(*) AS num_orders,
       SUM(order_revenue) AS total_revenue
FROM order_totals
GROUP BY customer_id;
```
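The sketch below contrasts a naive single-join query with the CTE above on hypothetical sample data in SQLite (via Python's sqlite3). It assumes orders carries an order-level amount column alongside the order_items line items. With three line items per order, summing that column after the join triples the revenue, and COUNT(*) counts line items instead of orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount INTEGER);
    CREATE TABLE order_items (order_id INTEGER, quantity INTEGER, price INTEGER);
    -- Customer 10 places two orders (totals 40 and 80), three items each.
    INSERT INTO orders VALUES (1, 10, 40), (2, 10, 80);
    INSERT INTO order_items VALUES
        (1, 1, 10), (1, 2, 5), (1, 1, 20),
        (2, 1, 30), (2, 1, 30), (2, 2, 10);
""")

# Wrong grain: the join yields one row per line item, so each
# order-level amount is repeated three times.
naive = conn.execute("""
    SELECT customer_id, COUNT(*) AS num_orders, SUM(amount) AS total_revenue
    FROM orders
    JOIN order_items USING (order_id)
    GROUP BY customer_id
""").fetchall()

# Right grain: one row per order first, then roll up per customer.
fixed = conn.execute("""
    WITH order_totals AS (
        SELECT order_id, customer_id, SUM(quantity * price) AS order_revenue
        FROM orders
        JOIN order_items USING (order_id)
        GROUP BY order_id, customer_id
    )
    SELECT customer_id, COUNT(*) AS num_orders, SUM(order_revenue) AS total_revenue
    FROM order_totals
    GROUP BY customer_id
""").fetchall()

print(naive)  # [(10, 6, 360)]
print(fixed)  # [(10, 2, 120)]
```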
Retrieving the latest record per entity using window functions.
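```sql
WITH ranked AS (
    SELECT *,
           -- Number each customer's rows from newest (1) to oldest.
           -- If updated_at can tie, add a unique column to ORDER BY
           -- so the row kept is deterministic.
           ROW_NUMBER() OVER (
               PARTITION BY customer_id
               ORDER BY updated_at DESC
           ) AS rn
    FROM customer_updates
)
SELECT customer_id, updated_at, email, status
FROM ranked
WHERE rn = 1;  -- keep only the latest row per customer
```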
Practical Applications
- Funnel Leak Analysis: Using a LEFT JOIN with an IS NULL filter (an anti-join) to identify users who signed up but never converted.
- Sessionization: Combining LAG with a cumulative SUM to assign session IDs based on an inactivity threshold (e.g., 30 minutes), and treating the NULL that LAG returns for each user's first event as a session start to avoid off-by-one session numbering.
- Hierarchy Mapping: Implementing recursive CTEs for org charts while guarding against circular references (for example, by tracking the IDs already visited) to prevent infinite loops.
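A minimal anti-join sketch for funnel leak analysis, using hypothetical signups and conversions tables in SQLite via Python's sqlite3. The IS NULL test is placed on the right-hand table's join key, which under an equality join can only be NULL when no matching row was found.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE signups (user_id INTEGER, signed_up_at TEXT);
    CREATE TABLE conversions (user_id INTEGER, converted_at TEXT);
    INSERT INTO signups VALUES
        (1, '2024-03-01'), (2, '2024-03-02'), (3, '2024-03-02');
    INSERT INTO conversions VALUES (2, '2024-03-05');
""")

# Keep only signups for which the LEFT JOIN found no conversion row.
leaked = conn.execute("""
    SELECT s.user_id
    FROM signups AS s
    LEFT JOIN conversions AS c ON c.user_id = s.user_id
    WHERE c.user_id IS NULL
    ORDER BY s.user_id
""").fetchall()

print(leaked)  # [(1,), (3,)]
```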
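A sessionization sketch under stated assumptions: a hypothetical events table, a 30-minute inactivity threshold, and SQLite 3.25+ (via Python's sqlite3) with julianday() for the time arithmetic. LAG returns NULL for each user's first event; without the explicit IS NULL branch, the gap comparison is UNKNOWN, the CASE falls through to 0, and session numbering starts at 0 instead of 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, event_time TEXT);
    INSERT INTO events VALUES
        (1, '2024-01-01 10:00:00'), (1, '2024-01-01 10:10:00'),
        (1, '2024-01-01 10:50:00'), (1, '2024-01-01 10:55:00'),
        (2, '2024-01-01 09:00:00');
""")

rows = conn.execute("""
    WITH gaps AS (
        SELECT user_id,
               event_time,
               LAG(event_time) OVER (
                   PARTITION BY user_id ORDER BY event_time
               ) AS prev_time
        FROM events
    ),
    flagged AS (
        SELECT user_id,
               event_time,
               CASE
                   -- First event per user: LAG is NULL, so start a session.
                   WHEN prev_time IS NULL THEN 1
                   -- Gap in minutes (julianday works in days).
                   WHEN (julianday(event_time) - julianday(prev_time)) * 1440 > 30
                       THEN 1
                   ELSE 0
               END AS is_new_session
        FROM gaps
    )
    -- Running total of session starts gives the session number.
    SELECT user_id,
           event_time,
           SUM(is_new_session) OVER (
               PARTITION BY user_id ORDER BY event_time
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS session_id
    FROM flagged
    ORDER BY user_id, event_time
""").fetchall()

for row in rows:
    print(row)
```

Here user 1's 40-minute gap between 10:10 and 10:50 opens session 2, while user 2's single event forms session 1.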
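A recursive org-chart sketch with one common cycle guard: carry a delimited path of the IDs visited on each branch and refuse to revisit any of them. The employees table and its rows are hypothetical, and the query runs on SQLite via Python's sqlite3; PostgreSQL 14 and later also provide a built-in CYCLE clause for recursive CTEs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_id INTEGER, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, NULL), (2, 1), (3, 2),
        (4, 5), (5, 4);   -- bad data: 4 and 5 manage each other
""")

# Walks down the reporting tree from the employee bound to "?".
ORG_CHART = """
    WITH RECURSIVE chart(emp_id, depth, path) AS (
        SELECT emp_id, 0, ',' || emp_id || ','
        FROM employees
        WHERE emp_id = ?
        UNION ALL
        SELECT e.emp_id, c.depth + 1, c.path || e.emp_id || ','
        FROM employees AS e
        JOIN chart AS c ON e.manager_id = c.emp_id
        -- Cycle guard: skip anyone already on this branch's path.
        WHERE instr(c.path, ',' || e.emp_id || ',') = 0
    )
    SELECT emp_id, depth FROM chart ORDER BY depth, emp_id
"""

from_root = conn.execute(ORG_CHART, (1,)).fetchall()
from_cycle = conn.execute(ORG_CHART, (4,)).fetchall()

print(from_root)   # [(1, 0), (2, 1), (3, 2)]
print(from_cycle)  # [(4, 0), (5, 1)]
```

Without the instr() check, the query starting at employee 4 would keep alternating between 4 and 5 and never terminate.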