·14 min read

What is SQL? A Complete Guide to Structured Query Language

Almost every application you use — from social media to banking to e-commerce — stores its data in a relational database and retrieves it using SQL. Understanding SQL is one of the most valuable skills in software development, data science, and IT.

What is SQL?

SQL (Structured Query Language) is a standardized programming language used to manage and manipulate relational databases. It allows you to create, read, update, and delete data — commonly known as CRUD operations — as well as define database schemas, control access, and manage transactions.

SQL is declarative, meaning you describe what data you want rather than howto retrieve it. The database engine's query optimizer figures out the most efficient way to execute your query. This differs from imperative languages like JavaScript or Python where you write step-by-step instructions.

Despite being over 50 years old, SQL remains the de facto standard for data manipulation. It's used by virtually every relational database system and is one of the most in-demand skills across software engineering, data analysis, data science, and business intelligence.

Pronunciation

SQL is pronounced both as "sequel" (the original IBM name) and as the initialism "S-Q-L." Both are widely accepted. The ANSI standard uses "S-Q-L," while "sequel" is more common in conversation.

A Brief History of SQL

SQL's story begins with Edgar F. Codd's groundbreaking 1970 paper "A Relational Model of Data for Large Shared Data Banks," which introduced the concept of organizing data into tables (relations) with rows and columns. IBM researchers Donald Chamberlin and Raymond Boyce then created SEQUEL(Structured English Query Language) in 1974 to interact with Codd's relational model.

YearMilestone
1970Edgar F. Codd publishes the relational model paper
1974IBM develops SEQUEL (later renamed SQL)
1979Oracle (then Relational Software Inc.) releases the first commercial SQL database
1986SQL becomes an ANSI standard (SQL-86)
1992SQL-92 — major revision with JOIN syntax, CASE expressions
1999SQL:1999 — adds regular expressions, triggers, CTEs, recursive queries
2003SQL:2003 — window functions, XML support, auto-generated values
2011SQL:2011 — temporal databases, period-for-time support
2016–2023SQL:2016 adds JSON support; SQL:2023 adds property graph queries

Relational Databases Explained

A relational database organizes data into tables (also called relations). Each table has:

  • Columns (attributes) — define the type of data (name, email, price)
  • Rows (records/tuples) — each row is one entry
  • Primary Key — a column (or combination) that uniquely identifies each row
  • Foreign Key — a column that references a primary key in another table, creating relationships
Table: users
┌────┬───────────┬─────────────────────┬────────────┐
│ id │ name      │ email               │ created_at │
├────┼───────────┼─────────────────────┼────────────┤
│  1 │ Alice     │ alice@example.com   │ 2025-01-15 │
│  2 │ Bob       │ bob@example.com     │ 2025-02-20 │
│  3 │ Charlie   │ charlie@example.com │ 2025-03-10 │
└────┴───────────┴─────────────────────┴────────────┘

Table: orders
┌────┬─────────┬─────────┬────────────┐
│ id │ user_id │ total   │ order_date │
├────┼─────────┼─────────┼────────────┤
│  1 │       1 │  49.99  │ 2025-03-01 │
│  2 │       1 │  29.50  │ 2025-03-15 │
│  3 │       2 │ 199.00  │ 2025-03-20 │
└────┴─────────┴─────────┴────────────┘

orders.user_id → users.id  (foreign key relationship)

This relational structure eliminates data duplication. Instead of repeating Alice's name and email in every order, you store her info once in the users table and reference her id in the orders table. This is called normalization.

SQL Syntax Basics

SQL statements are divided into several categories:

CategoryAbbreviationCommandsPurpose
Data Query LanguageDQLSELECTRetrieve data
Data Manipulation LanguageDMLINSERT, UPDATE, DELETEModify data
Data Definition LanguageDDLCREATE, ALTER, DROPDefine schema
Data Control LanguageDCLGRANT, REVOKEControl access
Transaction ControlTCLBEGIN, COMMIT, ROLLBACKManage transactions

SQL keywords are case-insensitive (SELECT, select, and Selectall work), but it's conventional to write them in uppercase for readability. Table and column names may be case-sensitive depending on the database system and operating system.

CRUD Operations

The four fundamental database operations — Create, Read, Update, Delete — map directly to SQL statements:

CREATE TABLE & INSERT (Create)

-- Define a table
CREATE TABLE products (
    id         INT PRIMARY KEY AUTO_INCREMENT,
    name       VARCHAR(100) NOT NULL,
    price      DECIMAL(10,2) NOT NULL,
    category   VARCHAR(50),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Insert a single row
INSERT INTO products (name, price, category)
VALUES ('Wireless Mouse', 29.99, 'Electronics');

-- Insert multiple rows
INSERT INTO products (name, price, category)
VALUES
    ('Keyboard', 49.99, 'Electronics'),
    ('Desk Lamp', 34.50, 'Office'),
    ('Notebook', 5.99, 'Stationery');

SELECT (Read)

-- Select all columns
SELECT * FROM products;

-- Select specific columns
SELECT name, price FROM products;

-- Select with alias
SELECT name AS product_name,
       price * 1.1 AS price_with_tax
FROM products;

UPDATE (Update)

-- Update a single record
UPDATE products
SET price = 24.99
WHERE id = 1;

-- Update multiple columns
UPDATE products
SET price = price * 0.9,
    category = 'Sale'
WHERE category = 'Electronics';

⚠ Always use WHERE with UPDATE and DELETE

Running UPDATE products SET price = 0 without a WHERE clause will set everyproduct's price to zero. Always include a WHERE condition to target specific rows, or use a transaction so you can roll back if something goes wrong.

DELETE (Delete)

-- Delete specific rows
DELETE FROM products
WHERE id = 3;

-- Delete all rows (use with caution!)
DELETE FROM products;

-- TRUNCATE is faster for deleting all rows
-- (resets auto-increment, cannot be rolled back in some DBs)
TRUNCATE TABLE products;

Filtering, Sorting & Aggregation

WHERE — Filtering Rows

-- Comparison operators
SELECT * FROM products WHERE price > 20;
SELECT * FROM products WHERE category = 'Electronics';
SELECT * FROM products WHERE price BETWEEN 10 AND 50;

-- Pattern matching
SELECT * FROM products WHERE name LIKE 'Wire%';   -- starts with "Wire"
SELECT * FROM products WHERE name LIKE '%mouse%';  -- contains "mouse"

-- NULL checks
SELECT * FROM products WHERE category IS NULL;
SELECT * FROM products WHERE category IS NOT NULL;

-- Logical operators
SELECT * FROM products
WHERE price < 30
  AND category = 'Electronics';

SELECT * FROM products
WHERE category IN ('Electronics', 'Office');

ORDER BY — Sorting

-- Sort ascending (default)
SELECT * FROM products ORDER BY price;

-- Sort descending
SELECT * FROM products ORDER BY price DESC;

-- Multiple sort columns
SELECT * FROM products ORDER BY category, price DESC;

Aggregate Functions & GROUP BY

-- Common aggregate functions
SELECT
    COUNT(*) AS total_products,
    AVG(price) AS avg_price,
    MIN(price) AS cheapest,
    MAX(price) AS most_expensive,
    SUM(price) AS total_value
FROM products;

-- GROUP BY — aggregate per category
SELECT
    category,
    COUNT(*) AS product_count,
    AVG(price) AS avg_price
FROM products
GROUP BY category;

-- HAVING — filter groups (like WHERE, but for aggregates)
SELECT
    category,
    COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) >= 2;

SQL Query Execution Order

SQL doesn't run in the order you write it. The logical execution order is:

  1. FROM & JOIN — identify the tables
  2. WHERE — filter rows
  3. GROUP BY — group rows
  4. HAVING — filter groups
  5. SELECT — choose columns & compute expressions
  6. DISTINCT — remove duplicates
  7. ORDER BY — sort results
  8. LIMIT / OFFSET — paginate

JOINs — Combining Tables

JOINs are one of SQL's most powerful features, allowing you to combine rows from two or more tables based on a related column.

JOIN TypeReturns
INNER JOINOnly rows that have matching values in both tables
LEFT JOINAll rows from the left table + matching rows from the right (NULL if no match)
RIGHT JOINAll rows from the right table + matching rows from the left (NULL if no match)
FULL OUTER JOINAll rows from both tables (NULL where no match on either side)
CROSS JOINCartesian product — every row from table A paired with every row from table B
-- INNER JOIN: users who have placed orders
SELECT
    u.name,
    u.email,
    o.id AS order_id,
    o.total
FROM users u
INNER JOIN orders o ON o.user_id = u.id;

-- LEFT JOIN: all users, including those with no orders
SELECT
    u.name,
    COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.name;

-- Multi-table JOIN
SELECT
    u.name,
    o.id AS order_id,
    p.name AS product_name,
    oi.quantity
FROM users u
JOIN orders o ON o.user_id = u.id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id;

Subqueries & CTEs

A subquery is a query nested inside another query. A CTE (Common Table Expression) is a named temporary result set that makes complex queries more readable.

-- Subquery in WHERE
SELECT name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products);

-- Subquery in FROM
SELECT category, avg_price
FROM (
    SELECT category, AVG(price) AS avg_price
    FROM products
    GROUP BY category
) AS category_stats
WHERE avg_price > 30;

-- CTE (Common Table Expression) — cleaner alternative
WITH category_stats AS (
    SELECT category, AVG(price) AS avg_price
    FROM products
    GROUP BY category
)
SELECT category, avg_price
FROM category_stats
WHERE avg_price > 30;

-- Recursive CTE — e.g., org chart hierarchy
WITH RECURSIVE org_tree AS (
    SELECT id, name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    SELECT e.id, e.name, e.manager_id, ot.level + 1
    FROM employees e
    JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT * FROM org_tree ORDER BY level;

Indexes & Performance

An index is a data structure (typically a B-tree) that speeds up data retrieval at the cost of additional storage and slower writes. Without an index, the database must scan every row in a table (a full table scan) to find matches.

-- Create an index on a frequently-queried column
CREATE INDEX idx_products_category ON products(category);

-- Composite index (for queries filtering on multiple columns)
CREATE INDEX idx_orders_user_date ON orders(user_id, order_date);

-- Unique index (enforces uniqueness)
CREATE UNIQUE INDEX idx_users_email ON users(email);

-- Check query performance with EXPLAIN
EXPLAIN SELECT * FROM products WHERE category = 'Electronics';

Index Trade-offs

Indexes speed up reads but slow down writes (INSERT, UPDATE, DELETE) because the index must be updated too. Don't index every column — focus on columns used in WHERE, JOIN, and ORDER BY clauses. Use EXPLAIN to verify that your queries actually use the index.

Transactions & ACID

A transaction is a sequence of SQL operations that are executed as a single unit of work. Transactions follow the ACID properties:

PropertyMeaning
AtomicityAll operations succeed or all are rolled back — no partial changes
ConsistencyThe database moves from one valid state to another, respecting all constraints
IsolationConcurrent transactions don't interfere with each other
DurabilityOnce committed, changes survive system crashes
-- Transfer $100 from account A to account B
BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

-- If both succeed:
COMMIT;

-- If something goes wrong:
-- ROLLBACK;

Without transactions, a system crash between the two UPDATE statements could deduct money from one account without adding it to the other. Transactions guarantee that either both updates happen or neither does.

SQL Dialects Compared

While SQL is standardized by ANSI/ISO, every database engine adds its own extensions and has minor syntax differences. Here's how the major dialects compare:

FeatureMySQLPostgreSQLSQLiteSQL Server
Auto-incrementAUTO_INCREMENTSERIAL / GENERATEDAUTOINCREMENTIDENTITY
Limit rowsLIMIT nLIMIT nLIMIT nTOP n / OFFSET-FETCH
String concatCONCAT()|| operator|| operator+ operator
UpsertON DUPLICATE KEYON CONFLICTON CONFLICTMERGE
JSON supportJSON type + functionsJSONB (advanced)JSON functions (3.38+)JSON functions
LicenseGPL / CommercialPostgreSQL LicensePublic domainCommercial

These dialect differences are why formatting your SQL consistently matters — especially when working across multiple database systems. A good SQL formatter can standardize your queries regardless of dialect.

SQL Injection & Security

SQL injection is one of the most dangerous and common web security vulnerabilities. It occurs when user input is directly concatenated into SQL queries, allowing an attacker to execute arbitrary SQL code.

-- ❌ VULNERABLE: string concatenation
query = "SELECT * FROM users WHERE name = '" + userInput + "'";

-- If userInput = "'; DROP TABLE users; --"
-- The query becomes:
-- SELECT * FROM users WHERE name = ''; DROP TABLE users; --'
-- This deletes the entire users table!

-- ✅ SAFE: parameterized query (prepared statement)
query = "SELECT * FROM users WHERE name = ?";
db.execute(query, [userInput]);

-- The database treats userInput as a value, never as SQL code.

Always Use Parameterized Queries

Never concatenate user input into SQL strings. Use parameterized queries (also called prepared statements) in every language: ? in Java/PHP, $1 in PostgreSQL, @param in SQL Server, :name in Python. This is the single most important SQL security practice.

Additional security best practices include using the principle of least privilege (grant only the permissions each user needs), keeping your database software updated, encrypting connections with TLS, and auditing access logs regularly.

Best Practices

  • Format your SQL consistently — use uppercase keywords, consistent indentation, and one clause per line. This makes queries readable and maintainable. Use a SQL formatter tool to automate this.
  • Use meaningful names — name tables and columns descriptively (e.g., order_date instead of od or col3).
  • Avoid SELECT * — specify only the columns you need. This reduces network traffic, improves performance, and makes your intent clear.
  • Add indexes strategically — index columns used in WHERE, JOIN, and ORDER BY. Use EXPLAIN to verify effectiveness.
  • Use transactions for related changes — wrap multi-step operations in BEGIN/COMMIT to maintain data integrity.
  • Always use parameterized queries — never concatenate user input into SQL to prevent injection attacks.
  • Normalize your data — eliminate redundancy by splitting data into related tables. Denormalize only when you have measured performance needs.
  • Comment complex queries — use -- for single-line and /* */ for multi-line comments to explain the "why" behind complex logic.
  • Test with EXPLAIN — before deploying, check query plans to catch full table scans and missing indexes.
  • Back up regularly — use automated backups and test your restore process periodically.

SQL vs NoSQL

NoSQL databases (MongoDB, Redis, Cassandra, DynamoDB) emerged to handle use cases where relational databases are less ideal. Here's how they compare:

AspectSQL (Relational)NoSQL
Data modelTables with rows and columnsDocuments, key-value, graph, wide-column
SchemaFixed schema (defined upfront)Flexible / schema-less
ScalingPrimarily vertical (bigger server)Primarily horizontal (more servers)
ConsistencyStrong (ACID)Often eventual (BASE)
Query languageSQL (standardized)Varies per database
Best forStructured data, complex queries, transactionsUnstructured data, high throughput, rapid prototyping

In practice, many modern applications use both — SQL for transactional data (users, orders, inventory) and NoSQL for caching (Redis), full-text search (Elasticsearch), or document storage (MongoDB). The choice depends on your data structure, consistency requirements, and scale needs.

Format Your SQL Instantly

Use our free SQL Formatter tool to beautify, indent, and standardize your SQL queries. Supports multiple dialects including MySQL, PostgreSQL, SQLite, and SQL Server. Everything runs in your browser with no data uploaded.

Try SQL Formatter →

References

  1. Codd, E. F. "A Relational Model of Data for Large Shared Data Banks." Communications of the ACM, Vol. 13, No. 6, June 1970. https://dl.acm.org/doi/10.1145/362384.362685
  2. ISO/IEC 9075:2023. Information technology — Database languages — SQL. International Organization for Standardization.
  3. OWASP Foundation. SQL Injection. OWASP Top Ten. https://owasp.org/www-community/attacks/SQL_Injection
  4. PostgreSQL Global Development Group. PostgreSQL Documentation. https://www.postgresql.org/docs/
  5. MySQL. MySQL Reference Manual. https://dev.mysql.com/doc/