[Intermediate SQL] Conquer Subqueries! A Complete Guide to SELECTs within SELECTs
"I want to find only the employees who earn more than the company's average salary..."
"Who is the highest-paid employee in each department?"
As a web creator working with databases, have you ever wanted to use the result of one query as a condition for another? This kind of slightly complex data extraction can't be done in a single step. The elegant solution to this problem is what we'll be covering today: the subquery.
In short, a subquery is a "query within a query." It's a technique where you embed one SELECT statement inside another, like a set of Matryoshka dolls. Once you master this, you can consolidate data extraction that used to take multiple steps into a single query, which will significantly level up your SQL skills!
This article is packed with sample code that even beginners can copy and paste to get it working. We want you to experience that "it works!" moment to truly feel the power and convenience of subqueries.
Preparation: Our Practice Tables for Today
To fully experience the power of subqueries, this time we'll prepare two tables: an "employees" table and a "departments" table. Please copy the SQL statement below and run it in your own database environment.
Of course, it's okay if you don't have an environment ready! Later in the article, we've prepared an "SQL execution environment" where you can try out SQL using just your browser.
-- Drop tables if they exist
DROP TABLE IF EXISTS employees;
DROP TABLE IF EXISTS departments;
-- Create the departments table
CREATE TABLE departments (
id INT PRIMARY KEY,
department_name VARCHAR(50)
);
-- Insert department data
INSERT INTO departments (id, department_name) VALUES
(1, '営業部'),
(2, '開発部'),
(3, '人事部');
-- Create the employees table
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(50),
department_id INT,
salary INT,
FOREIGN KEY (department_id) REFERENCES departments(id)
);
-- Insert employee data
INSERT INTO employees (id, name, department_id, salary) VALUES
(1, '山田 太郎', 1, 600000), -- Sales Dept.
(2, '鈴木 花子', 2, 550000), -- Development Dept.
(3, '佐藤 次郎', 1, 400000), -- Sales Dept.
(4, '高橋 三郎', 3, 650000), -- HR Dept.
(5, '田中 恵子', 2, 500000), -- Development Dept.
(6, '伊藤 四郎', 1, 700000), -- Sales Dept.
(7, '渡辺 久美', 2, 750000), -- Development Dept.
(8, '山本 五郎', 3, 380000); -- HR Dept.
All set? Then let's step into the powerful world of subqueries!
1. The Basics: Subqueries in the `WHERE` Clause
The most basic and intuitive way to use a subquery is as a condition in the WHERE clause. Think of it as dynamically generating the filtering criteria for the outer query (the main query) using the result of an inner query (the subquery).
It's a rule that subqueries must always be enclosed in () (parentheses). SQL first executes the subquery inside the parentheses and then uses its result as a condition for the main query.
Example 1: Extracting Employees with Salaries Higher Than the Average
Let's try to find the "employees earning more than the average salary of all employees," which we mentioned at the beginning. To achieve this, you need two steps:
- First, calculate the "average salary of all employees."
- Next, use that average salary as a condition to extract employee information.
With a subquery, you can do this in a single query.
SELECT *
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
How about that? First, (SELECT AVG(salary) FROM employees) is executed to calculate the average salary (547500 in this example), and that result is then used as the condition for the main query, like WHERE salary > 547500. You've brilliantly achieved the goal with just one query!
Example 2: Extracting Employees in Specific Departments (with the IN Operator)
A subquery can return not just a single value, but also a list of multiple values. By combining that list with the IN operator, you can achieve incredibly flexible conditions.
For example, let's say you want to extract all employees belonging to the 'Sales' or 'HR' departments. Let's use a subquery to get a list of department IDs from the department names and use that as the condition for the main query.
SELECT *
FROM employees
WHERE department_id IN (
SELECT id FROM departments WHERE department_name IN ('営業部', '人事部')
);
In this query, the subquery runs first and returns a list of IDs for '営業部' (Sales) and '人事部' (HR), which is (1, 3). The main query is then interpreted as WHERE department_id IN (1, 3) and extracts the data for the corresponding employees. This is very convenient because you don't need to modify the SQL even if the department names change.
2. Advanced Usage: Subqueries in `SELECT` and `FROM` Clauses
Subqueries aren't just for the WHERE clause. They are especially useful in the SELECT and FROM clauses, greatly expanding the possibilities of your data analysis.
Subqueries in the `SELECT` Clause (Scalar Subqueries)
When you use a subquery in the SELECT clause, you can retrieve a related value for each row of the main query and display it as a new column. However, this type of subquery must always return a single value (a scalar value) from one row and one column.
As an example, let's display each employee's information along with "the average salary of the department they belong to."
SELECT
name,
salary,
(SELECT department_name FROM departments WHERE id = e.department_id) AS department_name,
(SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) AS department_avg_salary
FROM
employees AS e;
This looks a bit complex, doesn't it? This is called a correlated subquery, where the inner subquery references data from each row of the outer query (in this case, e.department_id) to run. For every row the main query processes, the department name and average department salary are calculated using that row's department ID and added as columns to the result.
Subqueries in the `FROM` Clause (Inline Views)
When you use a subquery in the FROM clause, you can treat the result of that subquery as a "temporary, virtual table (an inline view)." This is useful for two-stage processing, where you first aggregate or process data with a subquery, and then run another query on that result.
As an example, let's "calculate the average salary for each department, and then display only those departments where the average salary is 550,000 or more."
SELECT
d.department_name,
dept_avg.avg_salary
FROM
(
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
) AS dept_avg
JOIN
departments AS d ON dept_avg.department_id = d.id
WHERE
dept_avg.avg_salary >= 550000;
In this query, the subquery in the FROM clause runs first, creating a temporary table called dept_avg that contains the average salary for each department_id. Then, the main query joins this dept_avg table with the departments table to display the names and average salaries for the departments that meet the condition. This is a very effective technique when you want to use a WHERE clause on the result of a GROUP BY operation.
Practice! Let's Run SQL in Your Browser
Thanks for waiting! Here is an environment where you can get hands-on and try out the subqueries you've learned so far.
Copy the entire code block below, save it as a file named something like subquery_practice.html, and open it in your browser. Try out the various subqueries from this article to get a feel for how they work!
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>SQL Subquery Playground</title>
<script src="https://cdn.jsdelivr.net/npm/alasql@4"></script>
<style>
body { font-family: sans-serif; padding: 2rem; background-color: #f9f9f9; }
.container { max-width: 800px; margin: auto; background: white; padding: 2rem; border-radius: 8px; box-shadow: 0 4px 8px rgba(0,0,0,0.1); }
h1 { color: #333; }
textarea { width: 100%; height: 180px; font-family: monospace; font-size: 16px; padding: 10px; border: 1px solid #ccc; border-radius: 4px; box-sizing: border-box; margin-bottom: 1rem; }
button { background-color: #007bff; color: white; padding: 10px 20px; border: none; border-radius: 4px; cursor: pointer; font-size: 16px; }
button:hover { background-color: #0056b3; }
#result-area { margin-top: 2rem; }
table { width: 100%; border-collapse: collapse; margin-top: 1rem; }
th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
th { background-color: #f2f2f2; }
.error { color: red; font-weight: bold; }
</style>
</head>
<body>
<div class="container">
<h1>Let's Try Subqueries!</h1>
<p>Enter an SQL statement in the text area below and press the "Execute SQL" button.</p>
<textarea id="sql-input">-- Write your SQL in here
SELECT
d.department_name,
dept_avg.avg_salary
FROM
(
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
) AS dept_avg
JOIN
departments AS d ON dept_avg.department_id = d.id
WHERE
dept_avg.avg_salary >= 550000;</textarea>
<button onclick="executeSQL()">Execute SQL</button>
<div id="result-area"></div>
</div>
<script>
// Initialize database and prepare data
const db = new alasql.Database();
db.exec(`
CREATE TABLE departments (id INT PRIMARY KEY, department_name VARCHAR(50));
INSERT INTO departments VALUES (1, '営業部'), (2, '開発部'), (3, '人事部');
CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR(50), department_id INT, salary INT);
INSERT INTO employees VALUES
(1, '山田 太郎', 1, 600000), (2, '鈴木 花子', 2, 550000), (3, '佐藤 次郎', 1, 400000),
(4, '高橋 三郎', 3, 650000), (5, '田中 恵子', 2, 500000), (6, '伊藤 四郎', 1, 700000),
(7, '渡辺 久美', 2, 750000), (8, '山本 五郎', 3, 380000);
`);
function executeSQL() {
const sql = document.getElementById('sql-input').value;
const resultArea = document.getElementById('result-area');
resultArea.innerHTML = '';
try {
const result = db.exec(sql);
if (result.length > 0) {
resultArea.appendChild(createTable(result));
} else {
resultArea.innerHTML = '<p>The query returned 0 rows.</p>';
}
} catch (e) {
resultArea.innerHTML = `<p class="error">Error: ${e.message}</p>`;
}
}
function createTable(data) {
const table = document.createElement('table');
const thead = table.createTHead();
const tbody = table.createTBody();
const headerRow = thead.insertRow();
for (const key in data[0]) {
const th = document.createElement('th');
th.textContent = key;
headerRow.appendChild(th);
}
data.forEach(rowData => {
const row = tbody.insertRow();
for (const key in rowData) {
const cell = row.insertCell();
cell.textContent = rowData[key];
}
});
return table;
}
// Initial display
executeSQL();
</script>
</body>
</html>
Points to Watch Out For and Related Techniques
Subqueries are extremely powerful, but there are a few things to keep in mind. Knowing these will help you write more sophisticated queries.
Performance Issues
Subqueries, especially the correlated subqueries we just introduced, can significantly degrade performance as the amount of data grows because they are executed for each row of the main query. In many cases, a subquery can be rewritten using a JOIN, which often runs faster.
Subquery Return Values
There are rules about the shape of the data that a subquery can return.
- When used with comparison operators like
=,>,<, the subquery must return a single value (a scalar value). - When used with the
INoperator, it can return a single column list. - When used in the
SELECTclause, it must also be a single value.
Breaking these rules will cause an error, so it's important to always be aware of what kind of result your subquery will return.
Related Technique: Rewriting with `JOIN`
From a performance perspective, it's recommended to rewrite many subqueries using a JOIN. Let's try rewriting the query "extracting employees in specific departments" using a JOIN.
SELECT
e.*
FROM
employees AS e
JOIN
departments AS d ON e.department_id = d.id
WHERE
d.department_name IN ('営業部', '人事部');
This query using JOIN returns the same result as the one using a subquery. Generally, a database's optimizer (the feature that optimizes queries) can process JOINs more efficiently, so you should consider using a JOIN, especially when dealing with large amounts of data.
Summary
Great work! Today, we took a deep dive into the powerful SQL technique of subqueries.
- Subqueries in the
WHEREclause: The most basic use case for dynamically generating query conditions. - Subqueries in the
SELECTclause: To retrieve a related value for each row and add it as a new column (correlated subquery). - Subqueries in the
FROMclause: To treat the result of a query as a temporary table for more complex queries. - Performance: Subqueries are convenient, but it's important to always consider rewriting them with
JOINs.
Mastering subqueries allows you to perform data retrieval that used to take multiple steps in a single go, dramatically improving your work efficiency. Please use the SQL execution environment we provided to try out various patterns and make subqueries your own!