DATABASE

MySQL: High-Performance Relational Databases

A deep technical guide to MySQL performance at scale. Covers schema design, normalization, indexing strategies, query optimization with EXPLAIN, connection pooling, replication, backup, partitioning, stored procedures, transactions, isolation levels, TypeORM and Prisma ORM integration, JSON columns, window functions, and CTEs.

1. Schema Design and Normalization

Good schema design is the foundation of database performance. A poorly designed schema cannot be fixed by adding indexes or tuning queries. Start with proper normalization and denormalize only when you have measured evidence that it is needed.

  • 1NF: Every column holds atomic values. No repeating groups. Every row is unique (has a primary key)
  • 2NF: No partial dependencies. Every non-key column depends on the entire primary key, not just part of it
  • 3NF: No transitive dependencies. Non-key columns depend only on the primary key, not on other non-key columns
  • Data types matter: Use the smallest type that fits. INT (4 bytes) vs BIGINT (8 bytes). VARCHAR(255) vs VARCHAR(50). TIMESTAMP (4 bytes, UTC-converted) vs DATETIME (5 bytes in 5.6+, no timezone conversion)
  • UUIDs vs auto-increment: Auto-increment INTs are faster for indexing (sequential inserts). UUIDs cause random B-tree page splits. If you need UUIDs, use ordered UUIDs (uuid_to_bin with swap flag)
  • Soft deletes: Add deleted_at DATETIME NULL instead of hard deletes. Always include WHERE deleted_at IS NULL in queries. MySQL has no native partial indexes, so consider simulating one with a generated column
-- Well-designed schema example
CREATE TABLE users (
  id          INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  email       VARCHAR(255) NOT NULL,
  name        VARCHAR(100) NOT NULL,
  status      ENUM('active','inactive','suspended') NOT NULL DEFAULT 'active',
  created_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  deleted_at  DATETIME NULL,
  UNIQUE KEY uk_email (email),
  KEY idx_status_created (status, created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE TABLE bookings (
  id          INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  user_id     INT UNSIGNED NOT NULL,
  class_id    INT UNSIGNED NOT NULL,
  booked_at   DATETIME NOT NULL,
  status      ENUM('confirmed','cancelled','completed') NOT NULL DEFAULT 'confirmed',
  created_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  KEY idx_user_status (user_id, status),
  KEY idx_class_booked (class_id, booked_at),
  CONSTRAINT fk_bookings_user FOREIGN KEY (user_id) REFERENCES users(id),
  CONSTRAINT fk_bookings_class FOREIGN KEY (class_id) REFERENCES classes(id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
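The ordered-UUID point above is easier to see outside SQL. The following is a sketch (not MySQL's implementation) of the byte regrouping that UUID_TO_BIN(uuid, 1) performs on a version-1 UUID:

```javascript
// Sketch of UUID_TO_BIN(uuid, 1)'s swap flag: moving the time-high and
// time-mid groups ahead of time-low makes version-1 UUIDs roughly
// time-ordered, so inserts append to the B-tree instead of splitting
// random pages.
function uuidToOrderedHex(uuid) {
  const [timeLow, timeMid, timeHi, clockSeq, node] = uuid.split('-');
  return (timeHi + timeMid + timeLow + clockSeq + node).toLowerCase();
}
```

Uppercased, the result should line up with HEX(UUID_TO_BIN(uuid, 1)) for version-1 UUIDs; two UUIDs generated moments apart now sort by generation time.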

2. Indexing Strategies

B-tree Indexes

B-tree is MySQL's default index type. It maintains sorted order and supports range queries, ordering, and grouping. Understanding how B-tree indexes work is critical for writing fast queries.

  • Leftmost prefix rule: A composite index on (a, b, c) can be used for queries filtering on (a), (a, b), or (a, b, c) -- but NOT (b), (c), or (b, c)
  • Column order matters: Put the most selective column first (highest cardinality). Exception: if one column is always used in equality and another in range, put equality first
  • Covering indexes: If an index contains all columns needed by a query, MySQL reads only the index (no table lookup). EXPLAIN shows "Using index" in Extra column
  • Index selectivity: ratio of distinct values to total rows. A boolean column (selectivity ~0.5) is a poor index candidate alone. Combine with other columns
  • Index overhead: Every index slows down INSERT, UPDATE, DELETE operations. Write-heavy tables should have minimal indexes. Read-heavy tables benefit from more indexes
-- Composite index for a common query pattern
-- Query: SELECT * FROM bookings WHERE user_id = ? AND status = 'confirmed' ORDER BY booked_at DESC
-- Optimal index:
CREATE INDEX idx_user_status_booked ON bookings (user_id, status, booked_at);
-- The index covers: WHERE (equality on user_id, status) + ORDER BY (booked_at)
-- MySQL can satisfy the entire query from this single index scan

-- Covering index example
-- Query: SELECT id, email, name FROM users WHERE status = 'active' ORDER BY created_at DESC LIMIT 20
CREATE INDEX idx_status_created_covering ON users (status, created_at, id, email, name);
-- All selected columns are in the index = no table lookup needed
-- (listing id is optional: InnoDB secondary indexes carry the PK anyway)
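The leftmost prefix rule can be modeled as a small function. This is a teaching sketch, not how the optimizer is implemented:

```javascript
// Leftmost-prefix sketch: a composite index serves the longest unbroken
// run of its leading columns that appear as equality predicates.
function usablePrefix(indexCols, equalityCols) {
  const preds = new Set(equalityCols);
  let n = 0;
  while (n < indexCols.length && preds.has(indexCols[n])) n++;
  return indexCols.slice(0, n);
}
```

For an index on (a, b, c): filtering on ['a', 'b'] yields the usable prefix ['a', 'b'], while filtering on ['b', 'c'] yields an empty prefix -- the index cannot be used at all.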

Hash Indexes

Hash indexes use a hash function for O(1) equality lookups. In InnoDB, the adaptive hash index (AHI) is built automatically on frequently accessed B-tree pages. MEMORY tables support explicit hash indexes.

  • Equality only: Hash indexes support = and IN() but not range queries (<, >, BETWEEN), ordering, or partial matches
  • Adaptive Hash Index (AHI): InnoDB automatically builds hash indexes on hot B-tree leaf pages. Monitor with SHOW ENGINE INNODB STATUS. Disable with innodb_adaptive_hash_index=OFF if contention is high
  • MEMORY engine: Explicit hash indexes with USING HASH. Useful for lookup tables that fit entirely in memory. Data is lost on restart
-- MEMORY table with hash index (useful for session/cache lookup)
CREATE TABLE session_cache (
  session_id VARCHAR(64) NOT NULL,
  user_id    INT UNSIGNED NOT NULL,
  data       TEXT,
  PRIMARY KEY (session_id) USING HASH
) ENGINE=MEMORY;

-- Check adaptive hash index status
SHOW ENGINE INNODB STATUS\G
-- Look for "Hash table size", "hash searches/s", "non-hash searches/s"
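Why hash indexes are equality-only is easy to see in a toy model (illustrative only, not InnoDB's data structure):

```javascript
// Toy hash index: O(1) average lookups by exact key, but buckets have
// no order, so range scans, ORDER BY, and prefix matches cannot use it.
class HashIndex {
  constructor() { this.buckets = new Map(); }
  insert(key, rowId) {
    const list = this.buckets.get(key) ?? [];
    list.push(rowId);
    this.buckets.set(key, list);
  }
  lookup(key) { return this.buckets.get(key) ?? []; }  // supports = and IN() only
}
```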

Partial and Prefix Indexes

Partial indexes (functional indexes in MySQL 8.0+) index only a subset of rows or a computed expression. Prefix indexes index only the first N characters of a string column, reducing index size.

  • Functional indexes (8.0+): Index an expression like ((CAST(metadata->>'$.type' AS CHAR(20)))). Useful for JSON fields or computed values
  • Prefix indexes: CREATE INDEX idx_email ON users (email(20)) indexes only the first 20 characters. Smaller index but cannot be used as a covering index and may have lower selectivity
  • Simulated partial indexes: MySQL does not natively support WHERE clause in CREATE INDEX. Simulate with a generated column: ALTER TABLE orders ADD is_pending TINYINT GENERATED ALWAYS AS (IF(status='pending',1,NULL)) STORED, ADD INDEX idx_pending (is_pending)
-- Prefix index for long string columns
CREATE INDEX idx_url_prefix ON pages (url(50));

-- Functional index on JSON field (MySQL 8.0+)
ALTER TABLE events ADD INDEX idx_event_type ((CAST(metadata->>'$.event_type' AS CHAR(30))));

-- Simulated partial index for soft deletes
ALTER TABLE users
  ADD active_flag TINYINT GENERATED ALWAYS AS (IF(deleted_at IS NULL, 1, NULL)) STORED,
  ADD INDEX idx_active_users (active_flag, created_at);
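Choosing a prefix length is a measurement exercise: pick the smallest N whose selectivity approaches that of the full column. A sketch of the calculation (the application-side equivalent of SELECT COUNT(DISTINCT LEFT(col, N)) / COUNT(*)):

```javascript
// Selectivity of the first n characters: distinct prefixes / total rows.
// Compare against full-column selectivity to pick the shortest adequate n.
function prefixSelectivity(values, n) {
  const distinct = new Set(values.map(v => v.slice(0, n)));
  return distinct.size / values.length;
}
```

For URLs, a short prefix is nearly useless because every row shares the scheme; selectivity only climbs once the prefix reaches the part that varies.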

Common Indexing Pitfalls

  • Functions on indexed columns: WHERE YEAR(created_at) = 2026 cannot use an index on created_at. Rewrite as: WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01'
  • Implicit type casting: WHERE phone = 12345 when phone is VARCHAR forces a full table scan. Always match types
  • LIKE with leading wildcard: WHERE name LIKE '%smith' cannot use an index. Only LIKE 'smith%' uses the index
  • OR conditions: WHERE a = 1 OR b = 2 often cannot use a composite index on (a, b). Use UNION instead, or create separate indexes
  • Too many indexes: Each index is a separate B-tree that must be maintained on every write. More than 5-6 indexes on a table is a code smell
Never add indexes blindly. Always measure with EXPLAIN before and after. An unused index wastes disk space and slows writes for zero benefit.
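The YEAR() rewrite from the first pitfall generalizes to a tiny helper (hypothetical, for building parameterized queries):

```javascript
// Sargable date range for "all of year Y": closed lower bound, open
// upper bound, so an index on created_at can still be used.
function yearRange(year) {
  return { gte: `${year}-01-01`, lt: `${year + 1}-01-01` };
}
// WHERE created_at >= ? AND created_at < ?  with params [r.gte, r.lt]
```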

3. Query Optimization

EXPLAIN and Slow Query Log

EXPLAIN is the most important tool for MySQL performance. It shows how MySQL will execute a query: which indexes it uses, how many rows it examines, and the join strategy.

  • type column: const (PK lookup) > eq_ref (unique join) > ref (non-unique index) > range (index range) > index (full index scan) > ALL (full table scan). Anything worse than "range" is suspect
  • rows column: Estimated number of rows MySQL will examine. Compare with actual result set. Large ratio = inefficient query
  • Extra column: "Using index" = covering index (good). "Using filesort" = sorting without index (often bad). "Using temporary" = temp table created (bad for large sets)
  • EXPLAIN ANALYZE (8.0+): Actually executes the query and shows real timing per step. Best for comparing query plans
  • Slow query log: Enable with slow_query_log=1, set long_query_time=0.5 (500ms). Use pt-query-digest (Percona Toolkit) to analyze patterns
-- EXPLAIN example
EXPLAIN SELECT b.id, b.booked_at, u.name, u.email
FROM bookings b
JOIN users u ON u.id = b.user_id
WHERE b.class_id = 42
  AND b.status = 'confirmed'
  AND b.booked_at >= '2026-03-01'
ORDER BY b.booked_at DESC
LIMIT 20;

-- Ideal EXPLAIN output:
-- +----+-------+--------+-------------------------+------+----------+-------------+
-- | id | table | type   | key                     | rows | filtered | Extra       |
-- +----+-------+--------+-------------------------+------+----------+-------------+
-- |  1 | b     | ref    | idx_class_status_booked |   85 |   100.00 | Using where |
-- |  1 | u     | eq_ref | PRIMARY                 |    1 |   100.00 | NULL        |
-- +----+-------+--------+-------------------------+------+----------+-------------+

-- For this query, the optimal index is:
CREATE INDEX idx_class_status_booked ON bookings (class_id, status, booked_at);

Optimization Patterns

  • Pagination: Avoid OFFSET 10000 (MySQL reads and discards 10000 rows). Use keyset pagination: WHERE id > last_seen_id ORDER BY id LIMIT 20
  • COUNT optimization: SELECT COUNT(*) FROM large_table WHERE status = 'active' is slow without covering index. Add index on (status) or maintain counters in a summary table
  • JOIN optimization: Ensure joined columns are indexed. Smaller table should be the driving table. Avoid joining more than 3-4 tables in one query
  • Subquery vs JOIN: In MySQL 8.0+, the optimizer handles most subqueries well. But correlated subqueries (referencing outer query) can still be slow -- rewrite as JOINs
  • SELECT only needed columns: SELECT * fetches all columns, preventing covering index usage. List only the columns you need
  • Batch operations: Instead of 1000 individual INSERTs, use multi-row INSERT: INSERT INTO t VALUES (1,'a'), (2,'b'), .... 10-50x faster
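Keyset pagination from the first bullet, sketched as a small query builder (table and column names are illustrative):

```javascript
// Keyset (seek) pagination: filter past the last seen id instead of
// OFFSET, so MySQL seeks into the PK index rather than reading and
// discarding thousands of rows.
function keysetPage(lastSeenId, pageSize) {
  return {
    sql: 'SELECT id, booked_at FROM bookings WHERE id > ? ORDER BY id LIMIT ?',
    params: [lastSeenId, pageSize],
  };
}
// First page: keysetPage(0, 20); next page: keysetPage(lastRow.id, 20)
```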

4. Connection Pooling

Each MySQL connection consumes roughly 10MB of memory on the server. The per-instance pools in a microservice architecture multiply quickly: 20 services x 10 instances x a pool of 10 connections each = 2,000 connections, enough to exhaust the database many times over. Connection pool exhaustion is one of the most common production incidents.

  • Pool size formula: Total connections = number_of_services x instances_per_service x pool_size_per_instance. Must be less than MySQL's max_connections (default: 151)
  • Connection lifecycle: Idle connections are kept alive in the pool. Set waitForConnections: true so requests queue instead of failing when pool is full
  • Connection timeouts: Set connectTimeout (time to establish a connection) and, where the driver supports it, an acquire timeout (time to obtain a connection from the pool). Both should be short (5-10 seconds)
  • Idle timeout: Close connections idle longer than wait_timeout (MySQL default: 28800s / 8 hours). Reduce to 300-600s to reclaim resources
  • Connection validation: Enable SELECT 1 ping before checkout to detect stale connections. Small overhead but prevents "connection lost" errors
  • Read/write splitting: Route reads to replica endpoint, writes to primary endpoint. Doubles effective connection capacity
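The sizing formula above is worth encoding as a deploy-time check. A sketch (the headroom value for admin sessions is an assumption):

```javascript
// Total client-side connections across the fleet must stay below
// max_connections, with headroom left for admin and monitoring sessions.
function totalConnections(services, instancesPerService, poolSize) {
  return services * instancesPerService * poolSize;
}
function poolBudgetOk(maxConnections, total, adminHeadroom = 10) {
  return total <= maxConnections - adminHeadroom;
}
```

With 20 services x 10 instances x 10 connections, the total is 2,000 -- far beyond the default max_connections of 151.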
// Node.js connection pool configuration (mysql2)
import mysql from 'mysql2/promise';

const writePool = mysql.createPool({
  host: process.env.DB_WRITER_HOST,    // Aurora cluster endpoint
  port: 3306,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: 'myapp',
  connectionLimit: 10,                  // Max connections in this pool
  waitForConnections: true,             // Queue when pool is full
  queueLimit: 50,                       // Max queued requests (0 = unlimited)
  connectTimeout: 5000,                 // 5s to establish connection
  enableKeepAlive: true,
  keepAliveInitialDelay: 30000          // TCP keepalive every 30s
});

const readPool = mysql.createPool({
  host: process.env.DB_READER_HOST,    // Aurora reader endpoint
  port: 3306,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: 'myapp',
  connectionLimit: 20,                  // More connections for reads
  waitForConnections: true,
  connectTimeout: 5000,
  enableKeepAlive: true
});

// Monitor pool health (these are mysql2 internals, not public API --
// verify them against the driver version you run)
setInterval(() => {
  const pool = writePool.pool;
  console.log({
    total: pool._allConnections.length,   // every open connection
    idle: pool._freeConnections.length,   // available for checkout
    queued: pool._connectionQueue.length  // callers waiting for a connection
  });
}, 30000);
The most common pool exhaustion cause: long-running transactions holding connections. Always use transactions with a timeout. Never hold a connection while waiting for external API calls or user input.

5. Replication

Replication copies data from a primary (historically "master") to one or more replicas (historically "slaves"). It enables read scaling, high availability, and geographic distribution.

  • Asynchronous replication: Default MySQL mode. Primary does not wait for replicas to acknowledge. Risk: data loss if primary fails before replica catches up
  • Semi-synchronous replication: Primary waits for at least one replica to acknowledge before committing. Reduces data loss risk. Adds ~1ms latency per commit
  • GTID replication (MySQL 5.6+): Global Transaction Identifiers make failover simpler. Each transaction has a unique ID. Replicas can auto-position without specifying binlog file/position
  • Row-based vs statement-based: Row-based (ROW) replicates actual data changes -- safer and more predictable. Statement-based (STATEMENT) replicates SQL statements -- less bandwidth but can produce different results on replicas
  • Replication lag: Monitor with SHOW REPLICA STATUS\G (Seconds_Behind_Source) in MySQL 8.0.22+, or SHOW SLAVE STATUS\G (Seconds_Behind_Master) on older versions. Read-after-write consistency requires reading from the primary when lag is unacceptable
  • Aurora replication: Shared storage layer eliminates replication lag issues. All replicas read from the same storage volume. Failover in <30 seconds without data loss
  • Group replication (MySQL 5.7.17+): Multi-primary or single-primary mode. All members replicate via a group communication protocol (Paxos-based). Automatic conflict detection and resolution. Foundation for MySQL InnoDB Cluster
  • Aurora Global Database: Cross-region replication with <1 second lag. Dedicated infrastructure for disaster recovery. Managed failover promotes a secondary region to primary in under 1 minute
Aurora's replication model is fundamentally different from traditional MySQL replication. Because Aurora replicas share the same storage volume, replication lag is typically under 100ms (vs seconds or minutes with traditional replication). For strict read-after-write consistency, read from the writer endpoint; for most read workloads, the lag is negligible.
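A routing sketch for the read-after-write caveat (function name and lag budget are illustrative, not a library API):

```javascript
// Route a query: reads go to the replica unless the caller must see its
// own recent writes or measured replication lag exceeds the budget.
function chooseEndpoint({ isWrite, needsOwnWrites, replicaLagMs, lagBudgetMs = 100 }) {
  if (isWrite || needsOwnWrites) return 'writer';
  return replicaLagMs <= lagBudgetMs ? 'reader' : 'writer';
}
```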

6. Backup and Recovery

  • Logical backups (mysqldump): Exports SQL statements. Portable across versions. Slow for large databases (>10GB). Use --single-transaction for InnoDB (consistent snapshot without locking)
  • Physical backups (xtrabackup): Copies data files directly. Fast for large databases. Use Percona XtraBackup for hot backups without locking. Supports incremental backups
  • Aurora backups: Continuous backup to S3. Automated with 1-35 day retention. Point-in-time recovery (PITR) to any second within retention window. Backtrack for instant rewind without restore
  • Backup validation: Regularly restore backups to a test environment. An untested backup is not a backup. Automate this with a weekly cron job
  • Binary log retention: Keep binlogs for at least 7 days for point-in-time recovery. Set binlog_expire_logs_seconds=604800
  • Cross-region backups: For disaster recovery, copy snapshots to another region. Aurora supports cross-region snapshot copy and Global Database
# mysqldump with best practices:
#   --single-transaction    consistent InnoDB snapshot without locking
#   --routines/--triggers/--events   include procedures, triggers, events
#   --set-gtid-purged=OFF   avoid GTID issues on import
mysqldump \
  --single-transaction \
  --routines \
  --triggers \
  --events \
  --set-gtid-purged=OFF \
  --max-allowed-packet=256M \
  --databases myapp \
  | gzip > myapp_$(date +%Y%m%d_%H%M%S).sql.gz

# Percona XtraBackup (full + incremental)
xtrabackup --backup --target-dir=/backup/full --user=backup --password=$PW
xtrabackup --backup --target-dir=/backup/inc1 --incremental-basedir=/backup/full

7. Partitioning

Partitioning splits a large table into smaller physical pieces while presenting it as a single logical table. It improves query performance on very large tables by enabling partition pruning -- MySQL only scans the relevant partitions.

  • RANGE partitioning: Rows are assigned to partitions based on a column value falling within a range. Ideal for time-series data: partition by month or year. Drop old partitions instantly instead of DELETE
  • LIST partitioning: Rows are assigned based on column values matching a list. Good for categorical data like country or status codes
  • HASH partitioning: Distributes rows evenly across N partitions using a modulus function. Useful when there is no natural range or list boundary
  • KEY partitioning: Like HASH but MySQL manages the hash function internally. Works with any column type including strings
  • Partition pruning: Queries that include the partition key in the WHERE clause only scan matching partitions. Always include the partition key in queries for maximum benefit
  • Limitations: Foreign keys are not supported on partitioned tables. All unique keys (including primary) must include the partition key. Maximum 8192 partitions per table
-- RANGE COLUMNS partitioning by month (ideal for booking/event data).
-- Partition on the raw DATETIME: the optimizer can prune RANGE COLUMNS
-- partitions, but not an arithmetic expression like YEAR(c)*100+MONTH(c)
CREATE TABLE booking_logs (
  id          BIGINT UNSIGNED AUTO_INCREMENT,
  user_id     INT UNSIGNED NOT NULL,
  action      VARCHAR(50) NOT NULL,
  created_at  DATETIME NOT NULL,
  PRIMARY KEY (id, created_at),
  KEY idx_user (user_id, created_at)
) PARTITION BY RANGE COLUMNS (created_at) (
  PARTITION p202601 VALUES LESS THAN ('2026-02-01'),
  PARTITION p202602 VALUES LESS THAN ('2026-03-01'),
  PARTITION p202603 VALUES LESS THAN ('2026-04-01'),
  PARTITION p_future VALUES LESS THAN (MAXVALUE)
);

-- Drop old partition (instant, no row-by-row DELETE)
ALTER TABLE booking_logs DROP PARTITION p202601;

-- Add new partition for next month
ALTER TABLE booking_logs REORGANIZE PARTITION p_future INTO (
  PARTITION p202604 VALUES LESS THAN ('2026-05-01'),
  PARTITION p_future VALUES LESS THAN (MAXVALUE)
);

-- Verify partition pruning with EXPLAIN
EXPLAIN SELECT * FROM booking_logs
WHERE created_at >= '2026-03-01' AND created_at < '2026-04-01';
-- Should show "partitions: p202603" (only one partition scanned)
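Monthly partition maintenance is usually automated by a scheduled job. A sketch of deriving a new partition's name and its exclusive upper bound (the first day of the following month); names follow the pYYYYMM convention used above:

```javascript
// Monthly partition descriptor: name pYYYYMM plus the first day of the
// next month as the exclusive upper bound.
function monthPartition(year, month) {
  const pad = m => String(m).padStart(2, '0');
  const nextYear = month === 12 ? year + 1 : year;
  const nextMonth = month === 12 ? 1 : month + 1;
  return {
    name: `p${year}${pad(month)}`,
    boundary: `${nextYear}-${pad(nextMonth)}-01`,
  };
}
```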

8. Transactions and Isolation Levels

InnoDB supports full ACID transactions. The isolation level determines how concurrent transactions see each other's changes, with direct impact on performance and data consistency.

  • READ UNCOMMITTED: Transactions can see uncommitted changes from other transactions (dirty reads). Almost never appropriate in production. Highest concurrency, lowest consistency
  • READ COMMITTED: Each SELECT sees only committed data as of the moment the SELECT executes. No dirty reads, but non-repeatable reads are possible. The default in PostgreSQL and Oracle
  • REPEATABLE READ (MySQL default): All SELECTs within a transaction see the snapshot taken at the first read. No dirty reads, no non-repeatable reads. Phantoms are prevented by the MVCC snapshot for plain reads and by gap/next-key locks for locking reads
  • SERIALIZABLE: Fully serialized execution. All SELECTs are implicitly SELECT ... LOCK IN SHARE MODE. Highest consistency but lowest concurrency. Frequent deadlocks under load
  • Gap locks (InnoDB): In REPEATABLE READ, InnoDB locks gaps between index records to prevent phantom reads. This can cause unexpected deadlocks on range queries. Switch to READ COMMITTED if gap lock contention is high
  • Deadlock handling: InnoDB automatically detects deadlocks and rolls back the smaller transaction. Always handle deadlock errors (error 1213) in application code with a retry loop (3 attempts with backoff)
-- Check current isolation level
SELECT @@transaction_isolation;

-- Set session isolation level
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Locking reads for inventory/booking systems
START TRANSACTION;
-- Lock the row to prevent concurrent overbooking
SELECT capacity, current_bookings
FROM classes WHERE id = 42 FOR UPDATE;

-- Application checks: if current_bookings < capacity, proceed
UPDATE classes SET current_bookings = current_bookings + 1 WHERE id = 42;
INSERT INTO bookings (user_id, class_id, status) VALUES (7, 42, 'confirmed');
COMMIT;

-- Deadlock retry pattern (application-side pseudocode)
-- for attempt in range(3):
--   try: execute_transaction(); break
--   catch DeadlockError: sleep(0.1 * 2^attempt); continue
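The retry pseudocode above, fleshed out as an async helper. The errno check follows the mysql2 driver's error shape; this is a sketch, not a library API:

```javascript
// Retry on InnoDB deadlock (MySQL error 1213) with exponential backoff.
// Any other error -- or the final failed attempt -- is rethrown.
async function withDeadlockRetry(run, attempts = 3, baseDelayMs = 100) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await run();
    } catch (err) {
      if (err.errno !== 1213 || attempt >= attempts - 1) throw err;
      await new Promise(r => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```

Wrap the whole transaction in the callback, never a single statement: after a deadlock rollback the entire unit of work must be replayed.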
Keep transactions as short as possible. Long transactions hold locks, increase replication lag, and bloat the undo log. Never do HTTP calls, file I/O, or user interaction inside a transaction.

9. Stored Procedures

Stored procedures execute SQL logic on the database server, reducing network round trips. They are useful for complex multi-step operations, data migrations, and enforcing business rules at the database level.

  • When to use: Multi-step operations that benefit from reduced round trips (batch processing, data cleanup). Enforcing invariants that must hold regardless of which application writes data
  • When to avoid: Business logic that changes frequently (harder to version/deploy than application code). Complex computations better suited for application-layer processing
  • Error handling: Use DECLARE HANDLER for exception handling. Always handle SQLEXCEPTION to roll back partial work. Use SIGNAL to raise custom errors
  • Security: DEFINER vs INVOKER security context. DEFINER runs with the privileges of the user who created it. Use INVOKER when callers should use their own privileges
  • Performance: Routines are compiled on first execution per connection and the compiled form is cached for that session (unrelated to the old query cache, which was removed in MySQL 8.0). The first call may be slower; subsequent calls on the same connection reuse the compiled routine
DELIMITER //

-- Stored procedure for batch archiving old bookings
CREATE PROCEDURE archive_old_bookings(IN cutoff_date DATE, OUT archived_count INT)
SQL SECURITY DEFINER
BEGIN
  DECLARE EXIT HANDLER FOR SQLEXCEPTION
  BEGIN
    ROLLBACK;
    RESIGNAL;
  END;

  START TRANSACTION;

  -- Move old completed bookings to archive table
  INSERT INTO bookings_archive
  SELECT * FROM bookings
  WHERE status = 'completed' AND booked_at < cutoff_date;

  SET archived_count = ROW_COUNT();

  DELETE FROM bookings
  WHERE status = 'completed' AND booked_at < cutoff_date;

  COMMIT;
END //

DELIMITER ;

-- Call the procedure
CALL archive_old_bookings('2025-01-01', @count);
SELECT @count AS archived_rows;
In production, we kept stored procedures to a minimum -- mostly for data migrations and batch cleanup jobs. Core business logic lived in the TypeORM application layer for easier testing, version control, and deployment.

10. JSON Columns, Window Functions, and CTEs

JSON Columns

MySQL 5.7+ supports native JSON columns with validation, indexing, and a rich set of functions. JSON columns are ideal for semi-structured data that varies per row -- metadata, preferences, feature flags -- without requiring schema changes.

  • JSON vs TEXT: JSON columns are validated on INSERT/UPDATE (rejects malformed JSON). Stored in an optimized binary format for faster access. TEXT has no validation and requires parsing on every read
  • Accessing values: -> returns JSON, ->> returns unquoted text. Example: metadata->>'$.plan' returns the plan value as a string
  • Indexing JSON: Create a generated column and index it, or use functional indexes (8.0+): ALTER TABLE t ADD INDEX ((CAST(data->>'$.type' AS CHAR(30))))
  • Useful functions: JSON_EXTRACT, JSON_SET, JSON_REMOVE, JSON_ARRAYAGG, JSON_OBJECTAGG, JSON_TABLE (8.0+ -- converts JSON array to rows)
-- Table with JSON column for flexible metadata
CREATE TABLE users (
  id       INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  email    VARCHAR(255) NOT NULL UNIQUE,
  metadata JSON DEFAULT NULL,
  KEY idx_plan ((CAST(metadata->>'$.plan' AS CHAR(20))))
);

-- Insert with JSON
INSERT INTO users (email, metadata) VALUES
('user@example.com', '{"plan": "pro", "features": ["analytics", "exports"], "trial_ends": "2026-04-01"}');

-- Query JSON fields
SELECT email, metadata->>'$.plan' AS plan
FROM users WHERE metadata->>'$.plan' = 'pro';

-- Update nested JSON value
UPDATE users SET metadata = JSON_SET(metadata, '$.plan', 'enterprise')
WHERE id = 1;

-- JSON_TABLE: expand JSON array into rows (8.0+)
SELECT u.email, f.feature
FROM users u,
JSON_TABLE(u.metadata, '$.features[*]' COLUMNS (feature VARCHAR(50) PATH '$')) AS f;
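The -> vs ->> distinction trips people up. A toy model of the two operators in JavaScript (illustrative only, not MySQL's implementation):

```javascript
// ->  returns a JSON value: strings keep their quotes
// ->> returns unquoted text: strings come back plain
function arrow(doc, key) {                 // like metadata->'$.key'
  return JSON.stringify(doc[key]);
}
function doubleArrow(doc, key) {           // like metadata->>'$.key'
  const v = doc[key];
  return typeof v === 'string' ? v : JSON.stringify(v);
}
```

This is why string comparisons against -> results fail unexpectedly: '"pro"' (with quotes) is not equal to 'pro'.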

Window Functions (MySQL 8.0+)

Window functions perform calculations across a set of rows related to the current row, without collapsing them into a single output row like GROUP BY. Essential for rankings, running totals, and comparative analytics.

  • ROW_NUMBER(): Assigns a unique sequential number. Useful for pagination, deduplication, and top-N per group queries
  • RANK() / DENSE_RANK(): Ranking with ties. RANK skips numbers after ties; DENSE_RANK does not
  • LAG() / LEAD(): Access values from previous or next rows. Useful for period-over-period comparisons
  • SUM() OVER / AVG() OVER: Running totals and moving averages without self-joins
-- Top 3 most active users per gym (by booking count)
SELECT * FROM (
  SELECT
    u.name,
    g.gym_name,
    COUNT(b.id) AS booking_count,
    ROW_NUMBER() OVER (PARTITION BY g.id ORDER BY COUNT(b.id) DESC) AS rn
  FROM bookings b
  JOIN users u ON u.id = b.user_id
  JOIN classes c ON c.id = b.class_id
  JOIN gyms g ON g.id = c.gym_id
  WHERE b.booked_at >= '2026-01-01'
  GROUP BY u.id, g.id
) ranked WHERE rn <= 3;

-- Month-over-month booking growth using LAG
SELECT
  DATE_FORMAT(booked_at, '%Y-%m') AS month,
  COUNT(*) AS bookings,
  LAG(COUNT(*)) OVER (ORDER BY DATE_FORMAT(booked_at, '%Y-%m')) AS prev_month,
  ROUND((COUNT(*) - LAG(COUNT(*)) OVER (ORDER BY DATE_FORMAT(booked_at, '%Y-%m')))
    / LAG(COUNT(*)) OVER (ORDER BY DATE_FORMAT(booked_at, '%Y-%m')) * 100, 1) AS growth_pct
FROM bookings
GROUP BY DATE_FORMAT(booked_at, '%Y-%m')
ORDER BY month;
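What ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) buys you is easiest to see rebuilt by hand. A sketch of the same top-N-per-group logic in application code (which the SQL version replaces):

```javascript
// Top-n rows per group: the manual equivalent of
// ROW_NUMBER() OVER (PARTITION BY group ORDER BY count DESC) <= n
function topNPerGroup(rows, n) {
  const groups = new Map();
  for (const row of rows) {
    const list = groups.get(row.group) ?? [];
    list.push(row);
    groups.set(row.group, list);
  }
  const result = [];
  for (const list of groups.values()) {
    list.sort((a, b) => b.count - a.count);
    result.push(...list.slice(0, n));
  }
  return result;
}
```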

Common Table Expressions (CTEs)

CTEs (WITH clause, MySQL 8.0+) define temporary named result sets that exist for the duration of a single query. They improve readability, enable recursive queries, and can replace complex subqueries and temporary tables.

  • Non-recursive CTEs: Named subqueries that make complex queries readable. MySQL may materialize or merge them depending on the optimizer
  • Recursive CTEs: Self-referencing CTEs for hierarchical data (org charts, category trees, graph traversal). Use WITH RECURSIVE
  • Performance: In MySQL 8.0, CTEs referenced multiple times may be materialized (computed once). Use EXPLAIN to verify. For single-use CTEs, the optimizer typically merges them into the main query
-- CTE for readable multi-step reporting
WITH monthly_bookings AS (
  SELECT
    DATE_FORMAT(booked_at, '%Y-%m') AS month,
    class_id,
    COUNT(*) AS total
  FROM bookings
  WHERE status = 'completed'
  GROUP BY month, class_id
),
class_rankings AS (
  SELECT
    month, class_id, total,
    RANK() OVER (PARTITION BY month ORDER BY total DESC) AS rnk
  FROM monthly_bookings
)
SELECT month, class_id, total
FROM class_rankings WHERE rnk <= 5;

-- Recursive CTE: category tree traversal
WITH RECURSIVE category_tree AS (
  SELECT id, name, parent_id, 0 AS depth
  FROM categories WHERE parent_id IS NULL
  UNION ALL
  SELECT c.id, c.name, c.parent_id, ct.depth + 1
  FROM categories c
  JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT CONCAT(REPEAT('  ', depth), name) AS tree_view
FROM category_tree ORDER BY depth, name;
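The recursive CTE's seed/step structure maps directly onto an iterative traversal. An application-side sketch of the same walk:

```javascript
// Iterative equivalent of the recursive CTE: seed with the roots
// (parent_id IS NULL), then repeatedly join children onto the frontier.
function categoryTree(categories) {
  const result = [];
  let frontier = categories
    .filter(c => c.parentId === null)
    .map(c => ({ ...c, depth: 0 }));           // anchor member
  while (frontier.length > 0) {
    result.push(...frontier);
    frontier = frontier.flatMap(parent =>      // recursive member
      categories
        .filter(c => c.parentId === parent.id)
        .map(c => ({ ...c, depth: parent.depth + 1 })));
  }
  return result;
}
```

The loop terminates when a pass finds no children, just as the recursive member terminates when it produces no new rows.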

11. TypeORM Integration with Node.js

Configuration and Entity Design

TypeORM is one of the most popular TypeScript ORMs for Node.js. It supports the Active Record and Data Mapper patterns, migrations, and a query builder. Proper configuration is essential for production performance.

// TypeORM data source configuration for Aurora MySQL
import { DataSource } from 'typeorm';

export const AppDataSource = new DataSource({
  type: 'mysql',
  replication: {
    master: {
      host: process.env.DB_WRITER_HOST,
      port: 3306,
      username: process.env.DB_USER,
      password: process.env.DB_PASSWORD,
      database: 'myapp'
    },
    slaves: [{
      host: process.env.DB_READER_HOST,
      port: 3306,
      username: process.env.DB_USER,
      password: process.env.DB_PASSWORD,
      database: 'myapp'
    }]
  },
  entities: ['dist/entities/**/*.js'],
  migrations: ['dist/migrations/**/*.js'],
  logging: ['error', 'warn', 'migration'],  // Never log 'query' in production
  maxQueryExecutionTime: 1000,               // Log queries taking >1s
  extra: {
    connectionLimit: 10,
    waitForConnections: true,
    connectTimeout: 5000
  }
});

// Entity example with proper decorators
@Entity('bookings')
@Index('idx_user_status', ['userId', 'status'])
export class Booking {
  @PrimaryGeneratedColumn('increment')
  id: number;

  @Column({ type: 'int', unsigned: true })
  userId: number;

  @Column({ type: 'enum', enum: BookingStatus, default: BookingStatus.CONFIRMED })
  status: BookingStatus;

  @Column({ type: 'datetime' })
  bookedAt: Date;

  @CreateDateColumn()
  createdAt: Date;

  @ManyToOne(() => User, user => user.bookings)
  @JoinColumn({ name: 'user_id' })
  user: User;
}

Production Patterns

  • Query builder over find: Use QueryBuilder for complex queries. The find() API generates inefficient SQL for joins and conditions
  • Transactions: Always use queryRunner for transactions. Release the runner in a finally block to prevent connection leaks
  • Migrations: Never use synchronize: true in production. Generate migrations with typeorm migration:generate and review SQL before running
  • N+1 queries: Use leftJoinAndSelect or relations option to eager-load. Never access lazy relations in a loop
  • Raw queries: For complex reporting or bulk operations, drop down to raw SQL via query() method. TypeORM's abstractions are not always optimal
// Transaction pattern with proper cleanup
async function createBooking(userId: number, classId: number): Promise<Booking> {
  const queryRunner = AppDataSource.createQueryRunner();
  await queryRunner.connect();
  await queryRunner.startTransaction();

  try {
    // Check class capacity (SELECT ... FOR UPDATE to lock the row)
    const classEntity = await queryRunner.manager
      .createQueryBuilder(GymClass, 'c')
      .setLock('pessimistic_write')
      .where('c.id = :classId', { classId })
      .getOne();

    if (!classEntity || classEntity.currentBookings >= classEntity.capacity) {
      throw new Error('Class is full');
    }

    // Create booking
    const booking = queryRunner.manager.create(Booking, {
      userId, classId: classEntity.id, bookedAt: new Date(), status: BookingStatus.CONFIRMED
    });
    await queryRunner.manager.save(booking);

    // Increment counter
    await queryRunner.manager.increment(GymClass, { id: classId }, 'currentBookings', 1);

    await queryRunner.commitTransaction();
    return booking;
  } catch (err) {
    await queryRunner.rollbackTransaction();
    throw err;
  } finally {
    await queryRunner.release(); // ALWAYS release the connection
  }
}
The most common TypeORM production bug: forgetting to release queryRunner in error paths, causing connection pool exhaustion. Always use try/catch/finally with queryRunner.release() in the finally block.

12. Prisma ORM

Prisma is a modern TypeScript ORM with a declarative schema, auto-generated type-safe client, and built-in migration system. It takes a schema-first approach that differs fundamentally from TypeORM's decorator-based model.

  • Prisma schema: Single schema.prisma file defines models, relations, and datasource. The generated client provides full type safety -- impossible to query a column that does not exist
  • Prisma Migrate: Schema-driven migrations. Change the schema file, run prisma migrate dev, and Prisma generates the SQL migration. Review before applying to production with prisma migrate deploy
  • Connection pooling: Prisma uses its own connection pool. Configure connection_limit in the datasource URL. For serverless environments, use Prisma Accelerate or PgBouncer-style external pooling
  • Transactions: Interactive transactions with prisma.$transaction(async (tx) => { ... }). Supports timeout and isolation level configuration. Automatic rollback on error
  • Raw queries: prisma.$queryRaw for typed raw SQL and prisma.$executeRaw for mutations. Use tagged template literals for safe parameterization
  • TypeORM vs Prisma: TypeORM offers more flexibility (query builder, Active Record). Prisma offers better type safety and developer experience. Choose Prisma for new projects; TypeORM when you need complex dynamic queries
// schema.prisma
datasource db {
  provider = "mysql"
  url      = env("DATABASE_URL")  // mysql://user:pass@host:3306/myapp?connection_limit=10
}

generator client {
  provider = "prisma-client-js"
}

model User {
  id        Int       @id @default(autoincrement())
  email     String    @unique @db.VarChar(255)
  name      String    @db.VarChar(100)
  status    UserStatus @default(ACTIVE)
  bookings  Booking[]
  createdAt DateTime  @default(now()) @map("created_at")
  deletedAt DateTime? @map("deleted_at")

  @@index([status, createdAt])
  @@map("users")
}

model Booking {
  id       Int           @id @default(autoincrement())
  userId   Int           @map("user_id")
  classId  Int           @map("class_id")
  bookedAt DateTime      @map("booked_at")
  status   BookingStatus @default(CONFIRMED)
  user     User          @relation(fields: [userId], references: [id])

  @@index([userId, status])
  @@index([classId, bookedAt])
  @@map("bookings")
}

enum UserStatus {
  ACTIVE
  INACTIVE
  SUSPENDED
}

enum BookingStatus {
  CONFIRMED
  CANCELLED
  COMPLETED
}

// Prisma Client usage with type-safe queries
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient();

// Type-safe query -- TypeScript errors if column doesn't exist
const activeUsers = await prisma.user.findMany({
  where: { status: 'ACTIVE', deletedAt: null },
  include: { bookings: { where: { status: 'CONFIRMED' } } },
  orderBy: { createdAt: 'desc' },
  take: 20
});

// Interactive transaction with timeout
const booking = await prisma.$transaction(async (tx) => {
  const gymClass = await tx.gymClass.findUniqueOrThrow({
    where: { id: classId }
  });
  if (gymClass.currentBookings >= gymClass.capacity) {
    throw new Error('Class is full');
  }
  const newBooking = await tx.booking.create({
    data: { userId, classId, bookedAt: new Date(), status: 'CONFIRMED' }
  });
  await tx.gymClass.update({
    where: { id: classId },
    data: { currentBookings: { increment: 1 } }
  });
  return newBooking;
}, { timeout: 5000, isolationLevel: 'Serializable' });
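The safety of prisma.$queryRaw's tagged-template form comes from how tagged templates work in JavaScript itself: the literal SQL text and the interpolated values arrive as separate arguments, so a value is never spliced into the SQL string. A minimal illustrative tag (sqlTag is an invented name, not Prisma code) shows the mechanism:

```typescript
// Sketch of the tagged-template mechanism behind prisma.$queryRaw.
// The template's literal parts and its interpolated values arrive separately,
// so values can be shipped to the server as bound parameters.
function sqlTag(strings: TemplateStringsArray, ...values: unknown[]) {
  const text = strings.reduce(
    (sql, part, i) => sql + part + (i < values.length ? '?' : ''),
    '',
  );
  return { text, values };
}

const email = "alice@example.com'; DROP TABLE users; --";
const q = sqlTag`SELECT id FROM users WHERE email = ${email}`;

console.log(q.text);   // SELECT id FROM users WHERE email = ?
console.log(q.values); // the raw value, never concatenated into the SQL
```

This is also why building the string first and passing it to $queryRawUnsafe forfeits the protection: the values are already inside the text by the time Prisma sees it.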

13. Real-World Experience

In production, I managed Aurora RDS MySQL as the primary database for all services, handling millions of bookings, payments, and user records across multiple Latin American countries.

  • Aurora RDS MySQL: Single Aurora cluster with writer + reader endpoints. Serverless v2 for cost-efficient scaling during off-peak hours. 14-day automated backup retention with point-in-time recovery
  • Connection pool debugging: Identified and fixed pool exhaustion caused by unreleased TypeORM queryRunners in error paths. Reduced connection-related errors by 95% by adding proper finally blocks and pool monitoring
  • Query optimization: Used Performance Insights and slow query log to identify top 20 slowest queries. Added composite indexes and rewrote N+1 patterns, reducing P95 API latency from 800ms to 120ms
  • TypeORM migrations: Established migration discipline: generated migrations, code review of SQL, staged rollout (dev > staging > production). Zero data-loss incidents from schema changes
  • Read/write splitting: Configured TypeORM replication with writer and reader endpoints. Moved all report queries to reader endpoint, reducing primary CPU by ~30%
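The read/write split described above maps to TypeORM's replication option. A configuration sketch under stated assumptions (hostnames, usernames, and environment variable names are placeholders, not production values):

```typescript
import { DataSource } from 'typeorm';

// Sketch: TypeORM MySQL replication config. Writes go to `master` (the
// Aurora writer endpoint); SELECTs issued outside a transaction are
// load-balanced across `slaves` (reader endpoints).
export const AppDataSource = new DataSource({
  type: 'mysql',
  replication: {
    master: {
      host: 'my-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com',
      port: 3306,
      username: 'app',
      password: process.env.DB_PASSWORD,
      database: 'myapp',
    },
    slaves: [
      {
        host: 'my-cluster.cluster-ro-xxxx.us-east-1.rds.amazonaws.com',
        port: 3306,
        username: 'app_readonly',
        password: process.env.DB_RO_PASSWORD,
        database: 'myapp',
      },
    ],
  },
});
```

Verify routing behavior against your TypeORM version: transactions and queryRunners created in 'master' mode pin to the writer, and the exact rules for which queries hit a reader have shifted across releases.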

14. MySQL Version Strategy and Modern Features

MySQL Version Strategy

MySQL now follows a dual-track release model. MySQL 8.4 is the Long-Term Support (LTS) release with extended support and stability guarantees. MySQL 9.x is the Innovation track with a 3-month release cadence, introducing new features that eventually land in the next LTS. MySQL 8.0 reaches End of Life in April 2026 — plan your migration now.

  • MySQL 8.4 LTS: Recommended for production workloads. Receives bug fixes and security patches for the full support lifecycle. The path from 8.0 is an in-place upgrade: the server migrates system tables and the data dictionary automatically at startup (the standalone mysql_upgrade utility was removed back in 8.0.16)
  • MySQL 9.x Innovation: 3-month release cadence. New features like VECTOR type, JavaScript stored programs, and enhanced optimizer. Upgrade required to each new Innovation release. Best for development, testing, and non-critical workloads
  • MySQL 8.0 EOL (April 2026): No more bug fixes or security patches after April 2026. Remaining on 8.0 after EOL exposes your database to unpatched vulnerabilities. Migrate to 8.4 LTS for stability or 9.x for latest features
MySQL 8.0 End of Life: April 2026. If you are still running MySQL 8.0 in production, begin planning your migration to 8.4 LTS immediately. Test with MySQL Shell's upgrade checker (util.checkForServerUpgrade()) and mysqlcheck --check-upgrade, review deprecated features you depend on, and validate your ORM compatibility. Aurora MySQL 8.0-compatible will also require migration to Aurora MySQL 8.4-compatible.

VECTOR Data Type (MySQL 9.0+)

MySQL 9.0 introduces a native VECTOR data type for storing AI/ML embeddings. Vectors are stored as arrays of 4-byte floats, optimized for similarity search operations. This is essential for building semantic search, recommendation engines, and Retrieval-Augmented Generation (RAG) pipelines directly in MySQL without requiring a separate vector database.

-- VECTOR data type (MySQL 9.0+)
CREATE TABLE documents (
  id          INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  title       VARCHAR(500) NOT NULL,
  content     TEXT NOT NULL,
  embedding   VECTOR(1536) NOT NULL,  -- OpenAI ada-002 dimension
  created_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  INDEX idx_created (created_at)
);

-- Insert with vector literal
INSERT INTO documents (title, content, embedding)
VALUES ('Node.js Guide', 'Event loop internals...', TO_VECTOR('[0.0123, -0.0456, ...]'));

-- Cosine similarity search for RAG
-- (note: DISTANCE() with a metric argument ships with HeatWave MySQL;
--  community 9.0 builds provide the VECTOR type and conversion functions,
--  so verify availability on your build)
SELECT id, title,
       1 - DISTANCE(embedding, TO_VECTOR(?), 'COSINE') AS similarity
FROM documents
ORDER BY DISTANCE(embedding, TO_VECTOR(?), 'COSINE')
LIMIT 10;
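What DISTANCE(..., 'COSINE') orders by is just the cosine distance, 1 − (a·b)/(|a||b|). A small self-contained TypeScript version (illustrative, not MySQL's implementation) is handy for sanity-checking similarity results client-side:

```typescript
// Cosine distance: 1 - (a . b) / (|a| * |b|).
// 0 means identical direction; 1 means orthogonal vectors.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineDistance([1, 0], [1, 0])); // 0 (identical direction)
console.log(cosineDistance([1, 0], [0, 1])); // 1 (orthogonal)
```

Note that cosine distance ignores vector magnitude; for normalized embeddings (as most embedding APIs return) it is equivalent to ordering by dot product.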

JavaScript Stored Programs (MySQL 9.0+, Enterprise)

MySQL 9.0 Enterprise introduces the ability to write stored procedures and functions in JavaScript using a GraalVM-based ES2023 runtime. This allows developers to use familiar JavaScript syntax, modern language features, and npm-style module patterns for server-side database logic, while maintaining the performance benefits of executing code close to the data.

-- JavaScript stored function (MySQL 9.0+ Enterprise)
CREATE FUNCTION calculate_bmi(weight_kg DOUBLE, height_m DOUBLE)
RETURNS DOUBLE
LANGUAGE JAVASCRIPT AS $$
  if (height_m <= 0) throw new Error('Height must be positive');
  const bmi = weight_kg / (height_m * height_m);
  return Math.round(bmi * 10) / 10;
$$;

-- Use like any SQL function
SELECT name, calculate_bmi(weight_kg, height_m) AS bmi
FROM users
WHERE calculate_bmi(weight_kg, height_m) > 25;

MySQL 9.7.0 LTS and HeatWave AI (April 2026)

MySQL 9.7.0 LTS was released on April 21, 2026, making it the new Long-Term Support release on the 9.x track. Key additions include time-based audit log rotation (rotate logs by interval rather than file size), the replica_allow_higher_version_source variable that enables replicas to accept replication from higher-version sources during rolling upgrades, and expanded window function optimizations. MySQL 9.6.0 (January 2026) introduced the modular Audit Log component that splits the monolithic audit log into smaller, more manageable pieces. MySQL 8.0 has now reached End of Life as of April 2026 — all 8.0 users must migrate to 8.4 LTS or the 9.x track.

HeatWave AI integration in MySQL 9.4+ adds Natural Language to Machine Learning (NL2ML), enabling an intuitive natural-language interface to ML workflows. Enhanced AutoML memory management automatically checks usage and releases resources based on severity levels. AutoML now also records hyperparameters of trained models, improving transparency and reproducibility. These features are available in HeatWave on OCI, including the Always Free tier.
