Guide

SQL Formatting, Explanation & Complexity Scoring: A Complete Guide

Bill Crawford — Developer Guide — 2026 · Last updated March 2, 2026

Most SQL tools do one thing. A formatter cleans up whitespace. An explainer tells you what a query does. A complexity meter scores how tangled it is. But these three functions are most useful when they work together — formatting makes the structure visible, explanation makes the logic clear, and complexity scoring tells you whether the structure needs simplifying. This guide covers how each function works, how they interact, and when to use them.

Try it now: Paste any SQL into the combined formatter, explainer, and complexity meter — one tool, three outputs.

Open SQL Formatter + Explainer →

Table of Contents

Why Formatting, Explanation, and Complexity Belong Together
How SQL Formatting Works Under the Hood
How Query Explanation Works: AST vs Heuristic
Understanding Complexity Scores
Dialect Differences That Matter
Real-World Use Cases
When to Refactor Based on Complexity

Why Formatting, Explanation, and Complexity Belong Together

Consider a query you've inherited from a colleague who left the company. It's a single line of 400 characters. You don't know which tables it reads from, what it filters on, or why it takes 12 seconds to run. Running it through a formatter gives you readable structure. Running it through an explainer gives you a sentence that says what the query does and a fact list of every table, join, and filter. Running it through a complexity meter tells you whether the 12-second runtime is expected for a query this complex, or a sign of something wrong.

Each tool answers a different question. Formatting answers "what does the structure look like?" Explanation answers "what does this do in plain language?" Complexity answers "how tangled is this, and should I simplify it?" Used together, they give you full situational awareness of any SQL you encounter.

How SQL Formatting Works Under the Hood

A SQL formatter takes raw SQL text and applies consistent whitespace rules. The core operations are: placing each major clause (SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT, and JOIN variants) on its own line, indenting subordinate elements like column lists, ON conditions, and AND/OR predicates, and optionally converting keywords to uppercase for visual contrast with identifiers.

The formatting engine in our tool uses the sql-formatter library, which tokenizes the SQL, identifies clause boundaries, and rebuilds the output with consistent indentation. It doesn't change any logic — SQL is whitespace-insensitive, so the formatted output executes identically to the original. The user can configure keyword casing (uppercase or preserve), indent size (2 or 4 spaces), and whether to add blank lines between major clauses.

What formatting reveals

Formatting alone can expose problems that are invisible in compressed SQL. A missing JOIN condition becomes obvious when every join has its own line and ON clause. A WHERE clause with 15 conditions arranged on individual lines reveals which filters are doing the heaviest lifting. A subquery nested inside a subquery inside a third subquery becomes visually alarming when indented properly — which is exactly the point.

How Query Explanation Works: AST vs Heuristic

The explanation engine has two modes. The primary mode attempts to parse the SQL into an Abstract Syntax Tree (AST) using node-sql-parser. An AST is a structured representation of every clause, table reference, column, join condition, and expression in the query. From an AST, the tool can extract precise facts: the exact statement type, all tables with their aliases, every join type and condition, all WHERE predicates, GROUP BY fields, aggregate functions, ORDER BY expressions, LIMIT values, subquery count, CTE names, and window functions.

If AST parsing fails — which happens with non-standard syntax, vendor-specific extensions, or incomplete fragments — the tool falls back to regex-based heuristic extraction. This mode scans the raw SQL text for patterns: tables appearing after FROM and JOIN keywords, the number of AND/OR operators in the WHERE clause, the presence of GROUP BY, ORDER BY, and LIMIT, the count of JOIN keywords, and the use of aggregate and window function names. The heuristic mode is less precise but still produces useful results for most real-world queries.

The plain English summary

From the extracted facts, the tool constructs a plain English sentence describing the query's purpose. A SELECT query with two joins, three filters, and a LIMIT becomes: "This query reads data from customers, joining 2 tables, applying 3 filter conditions, returning the top 50 rows." The language is intentionally friendly and non-technical — it's designed for the developer who opens a 200-line query at 9 AM and needs to understand what it does before their first coffee.

Understanding Complexity Scores

The complexity meter assigns a score from 0 to 100 based on a weighted sum of structural elements. The scoring model is deliberately simple and transparent — every point is traceable to a specific element of the query.

Element	Points	Cap
Tables referenced	+5 each	20
Joins	+10 each	30
Subqueries	+12 each	24
GROUP BY clause	+10	10
Window functions	+15 each	—
Filter conditions	+2 each	10
CTEs	+8 each	—

The caps prevent common elements from dominating the score. A query with 10 tables doesn't get 50 points for tables alone — it caps at 20. This keeps the score proportional and ensures that the presence of structurally complex elements like subqueries and window functions has meaningful weight.

Interpreting the score

Scores below 20 are Simple — single-table lookups, basic inserts, straightforward filters. Scores from 20 to 39 are Moderate — multi-table queries with joins and some aggregation. Scores from 40 to 59 are Complex — reporting queries with multiple joins, GROUP BY, HAVING, and sorting. Scores of 60 and above are Very Complex — ETL transformations, analytics queries with CTEs, window functions, and nested subqueries.

A high score is not inherently bad. A data warehouse transformation that scores 75 may be doing exactly what it needs to do. A simple customer lookup that scores 65 is almost certainly overengineered. Context matters.

Dialect Differences That Matter

SQL dialects share a common core but diverge in syntax for limiting results, quoting identifiers, string functions, and procedural extensions. The tool supports five dialects: Generic (ANSI), PostgreSQL, MySQL, SQL Server (T-SQL), and SQLite.

Limiting results is the most visible divergence. PostgreSQL, MySQL, and SQLite use LIMIT n. SQL Server uses TOP n or FETCH FIRST n ROWS ONLY. The explainer and complexity meter recognize all three patterns and normalize them to a single "limit" fact.

Quoting identifiers differs too. PostgreSQL uses double quotes ("column_name"), MySQL uses backticks (`column_name`), and SQL Server uses brackets ([column_name]). The parser handles all three when the correct dialect is selected.

Selecting the right dialect from the dropdown gives the parser and formatter the best chance of handling your SQL correctly. If you're unsure, use Generic — it covers standard ANSI SQL and degrades gracefully on vendor-specific syntax.

Real-World Use Cases

Onboarding onto an unfamiliar codebase

When joining a team with an existing SQL codebase, paste the key queries into the tool. The formatted output shows the structure at a glance. The explanation tells you what each query does. The complexity score helps you identify which queries are the most critical to understand first — they're usually the highest-scoring ones.

Code review

During review, paste the PR's SQL changes into the tool. If the complexity score jumped from 30 to 65 in a single commit, that's worth discussing. The breakdown shows exactly which new elements drove the increase — did someone add three subqueries where a CTE would be cleaner?

Documentation

The plain English summary and fact list can be pasted directly into query documentation, wiki pages, or code comments. This is faster than writing the explanation manually and more likely to stay up to date if the query changes.

Learning SQL

For students or analysts learning SQL, the tool provides immediate feedback on what a query does and how complex it is. Writing a query and then reading the plain English explanation back helps build intuition about how SQL clauses combine.

When to Refactor Based on Complexity

The complexity score is a starting point for refactoring decisions, not a rule. Some guidelines that experienced teams converge on:

If a simple lookup scores above 30, you're probably over-joining or filtering on columns that could be eliminated. Review whether all tables are necessary.

If subqueries account for most of the score, consider rewriting them as CTEs. CTEs are easier to read, easier to debug (you can run them independently), and in some databases they're also easier for the optimizer to handle.

If a single query scores above 70, consider whether it should be broken into multiple queries or views. A query that does everything — joins, aggregates, window functions, subqueries, and sorting — in one statement may be correct, but it's going to be hard for the next person to modify safely.

If the window function count is high, make sure each one is necessary. Window functions are powerful but easy to over-apply, and each one adds a sorting operation to the execution plan.

Try the combined tool: Format, explain, and score any SQL query in one click.