How to Generate Fake Relational Tables: A Step-by-Step Tutorial
This tutorial walks through building three related fake database tables from scratch using the Relational Data Generator — with screenshots of every single step. By the end you'll have a complete, importable dataset with a customers table, an address_book table, and an invoices table, all wired together with real foreign keys.
We'll also cover the theory behind relational tables so you understand why the setup works the way it does — not just how to click through it.
- What are relational tables and why do they matter?
- The schema we're building: customers, address_book, invoices
- Step 1 — Open the tool and choose Start from Scratch
- Step 2 — Build the customers table
- Step 3 — Build the address_book table with a Foreign Key
- Step 4 — Build the invoices table with a Foreign Key
- Step 5 — Set record counts
- Step 6 — Choose SQL format and dialect
- Step 7 — Generate and review the output
- Understanding the generated SQL
- Exporting as CSV and JSON instead
What Are Relational Tables and Why Do They Matter?
A relational database stores data across multiple tables, and those tables are connected to each other. Rather than cramming every piece of information about a customer into one giant row, a relational database splits the data into logical groups and links them together.
The connection mechanism is the foreign key. A foreign key is a column in one table that stores the ID value of a row in another table. It creates a reference — a pointer — that says "this row belongs to that row over there."
Here's a simple mental model. Imagine three filing cabinets:
- The Customers cabinet has one card per customer, each with a unique ID number
- The Address Book cabinet has address cards — but instead of writing the customer's name on each card, you write their ID number from the Customers cabinet
- The Invoices cabinet has invoice slips — and again, each slip has the customer's ID number written on it, not their full details
When you want to know everything about customer #4821, you look up ID 4821 in the Customers cabinet, then search the other cabinets for anything with that same ID. Every cabinet stays lean — no duplicated names, addresses repeated across dozens of invoices — and the whole system stays consistent by design.
What is referential integrity?
Referential integrity is the rule that says: every foreign key value must exist in the table it references. You cannot have an invoice for customer #9999 if there is no row with ID 9999 in the customers table. This sounds obvious but it's the most common thing that breaks when you generate fake data naively — you end up with invoices referencing customer IDs that don't exist anywhere.
The Relational Data Generator handles this automatically. It generates the customers table first, captures all the real customer IDs it created, and then uses only those real IDs when building the address_book and invoices tables. The result is a dataset that will import cleanly into any database with FK constraints enabled.
The three-table schema we're building. Both address_book and invoices reference customers via a foreign key on customer_id.
The Schema We're Building
Our three tables and what each one stores:
| Table | What it stores | Primary Key | Foreign Key |
|---|---|---|---|
| customers | One row per customer: name, email, phone, join date | customer_id (UUID) | None — this is the root table |
| address_book | One or more addresses per customer | address_id (UUID) | customer_id → customers.customer_id |
| invoices | One or more invoices per customer | invoice_id (UUID) | customer_id → customers.customer_id |
Notice that both address_book and invoices reference customers — customers is the parent table and the other two are child tables. A customer can have multiple addresses and multiple invoices, but every address and every invoice must belong to a real customer.
The record ratio we'll use: 100 customers → 200 addresses → 400 invoices. This means roughly 2 addresses per customer and 4 invoices per customer on average — a realistic ratio for a small business billing system.
Follow along as you read. Open the tool in a new tab and build as you go.
Open Relational Data Generator ↗Navigate to the Relational Data Generator. At the top of the page you'll see the quick-start template buttons. Since we're building a custom schema that doesn't match any of the pre-built templates, click ✦ Scratch (the rightmost button).
After clicking Scratch, the right panel will show an empty first table called "table_1" with a prompt to add columns from the field picker on the left. The tool is now ready for us to start building.
The customers table is the root of our schema — it has no foreign keys, only its own primary key. Everything else will reference it. Let's set it up.
2a — Name the table
The active table panel shows a text input with the default name "table_1". Click it and type customers.
2b — Add the UUID primary key
In the field picker panel on the left, scroll to the Identifiers section and click UUID / GUID. A column appears in the table panel. Click the column name to rename it from "UUID / GUID" to customer_id.
💡 Why UUID instead of a number? UUID primary keys never conflict if you run the seed script multiple times or merge datasets from different runs. Auto-increment integers would collide. For test data, UUID is almost always the better choice.
2c — Add the remaining customer fields
Now add these fields from the field picker. For each one, click the column name after it appears and rename it:
| Click in field picker | Rename to | Why |
|---|---|---|
| UUID / GUID | customer_id | Primary key — every customer gets a unique ID |
| First Name | first_name | Customer's first name |
| Last Name | last_name | Customer's last name |
| Contact email | ||
| Phone | phone | Contact phone number |
| Timestamp | created_at | When the customer record was created |
⚙ Column renaming tip: Click directly on the column name text (the white text in the row) to edit it inline. Press Enter or click away to confirm. Rename it to exactly what your database schema expects — customer_id not "UUID / GUID".
Now we add the second table. This is where the relational magic happens — we'll add a foreign key that ties every address to a real customer.
3a — Add the second table
Click the + Add Table button at the top right of the builder panel. A new tab appears alongside the customers tab. Click the new table's name field and rename it address_book.
3b — Add the UUID primary key
Just like before, click UUID / GUID in the Identifiers section of the field picker. Rename the column to address_id.
3c — Add the Foreign Key column
This is the most important step. At the very top of the field picker panel, above all the categories, you'll see a green chip labelled 🔗 Foreign Key. Click it.
The Foreign Key modal opens. Fill it in as follows:
- Column name in this table: Type
customer_id - References table: Select
customersfrom the dropdown - References column: Select
customer_idfrom the dropdown - Distribution: Leave as Random — this means each address is randomly assigned to any customer, creating realistic variation where some customers have multiple addresses and some have one
Click Add Column. The foreign key column appears in the table, highlighted in green to distinguish it from regular columns.
3d — Add the address fields
With the FK column in place, add the actual address data columns. Click each one in the field picker and rename:
| Click in field picker | Rename to |
|---|---|
| UUID / GUID | address_id |
| Foreign Key (configured above) | customer_id |
| Address Line 1 | address_line_1 |
| Address Line 2 | address_line_2 |
| City | city |
| State | state |
| Zip Code | zip_code |
| Boolean | is_primary |
The invoices table follows the same pattern as address_book. It references customers — so every invoice is tied to a real customer — but it stores invoice-specific data: invoice number, date, amount, and status.
4a — Add the third table
Click + Add Table again. Rename the new table to invoices.
4b — Add the UUID primary key and Foreign Key
Add UUID / GUID and rename it invoice_id. Then click the 🔗 Foreign Key chip again and configure it:
- Column name in this table:
customer_id - References table:
customers - References column:
customer_id - Distribution: Random — some customers will have more invoices than others, which reflects real billing behaviour
4c — Add the invoice data fields
| Click in field picker | Rename to | What it stores |
|---|---|---|
| UUID / GUID | invoice_id | Unique identifier for each invoice |
| Foreign Key | customer_id | Which customer this invoice belongs to |
| Invoice Number | invoice_number | Human-readable invoice reference |
| Date | invoice_date | Date the invoice was issued |
| Date | due_date | Payment due date (add Date a second time) |
| Dollar Amount | amount_due | Total amount owed |
| Order Status | status | Invoice status: paid, pending, overdue, cancelled |
| Lorem (sentence) | notes | Optional notes or description |
💡 Adding the same field type twice: The Date field is used for both invoice_date and due_date. Just click Date in the field picker twice — two independent Date columns are added, each generating different random dates. Rename each one separately.
Click each table tab in turn and set the record count in the header of each table panel:
| Table | Record count | Ratio |
|---|---|---|
| customers | 100 | Root — 100 unique customers |
| address_book | 200 | ~2 addresses per customer on average |
| invoices | 400 | ~4 invoices per customer on average |
These ratios make sense for a small business billing system. You can adjust them freely — set customers to 500 and invoices to 5,000 for a larger dataset, or reduce to 20/50/100 for lightweight test fixtures.
⚠️ Always make child tables larger than parent tables when using Random FK distribution. If you have 200 parent records and only 50 child records, many parents will have zero children — which may not reflect your application's assumptions.
Scroll down to the Settings panel below the table builder. This is where you choose your output format and SQL dialect.
For this tutorial, use these settings:
- Output format: SQL
- SQL dialect: PostgreSQL (or choose your database)
- Rows per INSERT batch: 500
- CREATE TABLE: checked ✓
- DROP TABLE IF EXISTS: unchecked (leave unchecked unless you want to wipe existing data)
- FOREIGN KEY constraints: optionally check this to add
REFERENCESclauses to the DDL - Wrap in transactions: checked ✓
Click the ⚡ Generate Dataset button. A progress bar fills briefly as the tool generates all three tables in dependency order — customers first, then address_book and invoices (using the real customer IDs just generated).
Switch between the table tabs in the output panel to preview each table. Notice that when you click the address_book tab, the customer_id column shows UUID values — and every one of those values exists in the customers table. That's referential integrity working as designed.
Click ⬇ Download to save the complete SQL script.
Understanding the Generated SQL
Let's look at what the downloaded script actually contains and why each part is structured the way it is.
The file header
The script opens with a comment block summarising everything:
-- Script generated by DataConversionCenter.com
-- Dialect: PostgreSQL | Tables: 3 | Total records: 700
-- Generated: 2026-02-24
-- Insert order respects foreign key dependencies
-- ── Table order: customers → address_book → invoices
The "insert order" note is important. The script always inserts parent tables before child tables. If you try to run the address_book inserts before the customers inserts, the FK constraint check fails because the customer IDs don't exist yet. The generator handles this automatically.
The CREATE TABLE block for customers
-- ── 1 of 3: customers (100 records) ──────────────────────
CREATE TABLE IF NOT EXISTS "customers" (
"customer_id" UUID NOT NULL,
"first_name" VARCHAR(100) NOT NULL,
"last_name" VARCHAR(100) NOT NULL,
"email" VARCHAR(255) NOT NULL,
"phone" VARCHAR(25) NOT NULL,
"created_at" TIMESTAMPTZ NOT NULL,
PRIMARY KEY ("customer_id")
);
Notice the PostgreSQL-specific types: UUID (a native type in PostgreSQL), TIMESTAMPTZ (timestamp with timezone). If you'd chosen MySQL, these would be VARCHAR(36) and DATETIME respectively. The generator maps each field type to the correct SQL type for your chosen dialect.
The CREATE TABLE block for address_book — with the FK constraint
-- ── 2 of 3: address_book (200 records) ───────────────────
CREATE TABLE IF NOT EXISTS "address_book" (
"address_id" UUID NOT NULL,
"customer_id" UUID NOT NULL,
"address_line_1" VARCHAR(200) NOT NULL,
"address_line_2" VARCHAR(100),
"city" VARCHAR(100) NOT NULL,
"state" CHAR(2) NOT NULL,
"zip_code" CHAR(5) NOT NULL,
"is_primary" BOOLEAN NOT NULL,
PRIMARY KEY ("address_id"),
FOREIGN KEY ("customer_id") REFERENCES "customers" ("customer_id")
);
The final line is the FK constraint. It tells PostgreSQL: the customer_id column in this table must always reference a value that exists in customers.customer_id. Any attempt to insert an address with a customer_id that doesn't exist in the customers table will be rejected by the database engine itself.
The bulk INSERT blocks — transaction-wrapped
BEGIN;
INSERT INTO "customers" ("customer_id", "first_name", "last_name", "email", "phone", "created_at") VALUES
('f47ac10b-58cc-4372-a567-0e02b2c3d479', 'Sarah', 'Mendoza', '[email protected]', '(847) 293-5571', '2023-03-14T09:42:17Z'),
('b3c4d5e6-89ab-cdef-0123-456789abcdef', 'James', 'Wilson', '[email protected]', '(312) 847-2910', '2022-11-07T14:22:44Z'),
-- ... 98 more rows
;
COMMIT;
All 100 customers are inserted in a single batch, wrapped in BEGIN / COMMIT. If any row fails, the entire batch rolls back — you won't end up with a partially-inserted customers table.
The address_book inserts — with real customer IDs
BEGIN;
INSERT INTO "address_book" ("address_id", "customer_id", "address_line_1", ...) VALUES
('c8d9e0f1-...', 'f47ac10b-58cc-4372-a567-0e02b2c3d479', '4821 Ridgewood Ave', ...),
('d1e2f3a4-...', 'b3c4d5e6-89ab-cdef-0123-456789abcdef', '192 Maple St', ...),
-- ...
;
COMMIT;
Look at the customer_id values in the address_book insert: f47ac10b-58cc-4372-a567-0e02b2c3d479. That's the exact same UUID that was used as the primary key for Sarah Mendoza in the customers insert above. The generator captured all 100 customer UUIDs during generation and used them when building the address_book rows. That's referential integrity guaranteed.
The output data
Here's what the actual generated rows look like across all three tables. Notice that the customer_id values in address_book and invoices always match real IDs from customers:
| customer_id | first_name | last_name | phone | created_at | |
|---|---|---|---|---|---|
| f47ac10b-58cc-... | Sarah | Mendoza | [email protected] | (847) 293-5571 | 2023-03-14T09:42Z |
| b3c4d5e6-89ab-... | James | Wilson | [email protected] | (312) 847-2910 | 2022-11-07T14:22Z |
| a1b2c3d4-efab-... | Maria | Thompson | [email protected] | (773) 510-4829 | 2024-01-19T08:33Z |
| address_id | customer_id | address_line_1 | city | state | zip_code | is_primary |
|---|---|---|---|---|---|---|
| c8d9e0f1-... | f47ac10b-58cc-... | 4821 Ridgewood Ave | Austin | TX | 78741 | true |
| d1e2f3a4-... | f47ac10b-58cc-... | 113 Oak Blvd Apt 4B | Austin | TX | 78702 | false |
| e5f6a7b8-... | b3c4d5e6-89ab-... | 192 Maple St | Chicago | IL | 60614 | true |
| invoice_id | customer_id | invoice_number | invoice_date | amount_due | status |
|---|---|---|---|---|---|
| 11a2b3c4-... | f47ac10b-58cc-... | INV-2024-0892 | 2024-07-15 | 4291.47 | paid |
| 22b3c4d5-... | a1b2c3d4-efab-... | INV-2024-1047 | 2024-09-03 | 892.10 | pending |
| 33c4d5e6-... | f47ac10b-58cc-... | INV-2025-0012 | 2025-01-22 | 341.00 | overdue |
Sarah Mendoza (f47ac10b-...) appears in both the address_book (she has two addresses) and in invoices (she has two invoices). Maria Thompson (a1b2c3d4-...) also appears in invoices. Every value in the customer_id column of both child tables is a real UUID from the customers table — that's what makes this dataset importable.
Exporting as CSV and JSON Instead
SQL isn't always the right format. If you need to import the data using a GUI tool, test it in a JavaScript application, or load it into a document database, switch the output format before clicking Generate.
CSV format
Select CSV in the format row. Each table downloads as a separate file: customers.csv, address_book.csv, invoices.csv. The FK relationships are preserved by value — the customer_id values in address_book.csv match real IDs in customers.csv.
CSV is the right choice for importing with a GUI tool like TablePlus, DBeaver, or pgAdmin's import wizard, or for loading data into Google Sheets or Excel for review.
JSON nested format
Select JSON (nested) to get a single file where addresses and invoices are embedded inside each customer object:
[
{
"customer_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
"first_name": "Sarah",
"last_name": "Mendoza",
"email": "[email protected]",
"addresses": [
{ "address_id": "c8d9e0f1-...", "address_line_1": "4821 Ridgewood Ave", "city": "Austin", ... },
{ "address_id": "d1e2f3a4-...", "address_line_1": "113 Oak Blvd Apt 4B", ... }
],
"invoices": [
{ "invoice_id": "11a2b3c4-...", "invoice_number": "INV-2024-0892", "amount_due": "4291.47", ... },
{ "invoice_id": "33c4d5e6-...", "invoice_number": "INV-2025-0012", "amount_due": "341.00", ... }
]
}
]
This nested format is ideal for MongoDB or Firestore where documents can contain nested arrays, for GraphQL mock servers, or for React and Vue application development where components receive nested customer data from an API.
Try it yourself. The complete three-table schema from this tutorial takes about 5 minutes to build from scratch.
Open Relational Data Generator →Database Normalization: Why Relational Tables Are Structured This Way
Relational tables are structured around normalization — a set of rules that eliminate data redundancy and ensure data integrity. Understanding normalization helps you design better schemas and understand why the generated tables are split the way they are.
First Normal Form (1NF)
Each column contains a single atomic value — no comma-separated lists, no arrays, no repeating groups. A customer's multiple phone numbers go in a separate phone_numbers table, not as phone1, phone2 columns.
Second Normal Form (2NF)
Every non-key column depends on the entire primary key. This matters for composite keys: in an order_items table keyed on (order_id, product_id), the product name belongs in the products table — not in order_items — because it only depends on product_id, not the full composite key.
Third Normal Form (3NF)
No non-key column depends on another non-key column (no transitive dependencies). A customers table should not store both zip_code and city — the city is determined by the zip code, creating a transitive dependency. City/state belong in a zip_codes reference table.
Why This Matters for Test Data Generation
The Relational Data Generator creates properly normalised schemas — customers, orders, and order items as separate linked tables. This mirrors real production database design and means the generated SQL and data is directly usable for realistic testing, not just toy examples.
