Guide

Relational Data Generator: Multi-Table Test Datasets with Real Foreign Keys

Bill Crawford — Developer Guide — February 2026 — 15 min read · Last updated January 09, 2026

Generating test data for a single flat table is straightforward. Generating test data for a real application — one with a users table, an orders table, and an order_items table where every order references a real customer and every item references a real order — is an entirely different problem.

Connect on LinkedIn →

Most fake data tools ignore this completely. They give you rows, but those rows don't know anything about each other. Import them into a database with foreign key constraints and you get immediate errors. Import them without constraints and your test data quietly lies to you: queries that join tables return nonsense, aggregations are wrong, and the bugs you're trying to find stay hidden behind incoherent data.

The Relational Data Generator solves this from the ground up. You define your tables, configure the foreign key relationships between them, and it generates all tables together — guaranteeing that every FK value in a child table exists as a real ID in the parent table. The result is a complete relational dataset you can import directly into a real database, or export as SQL INSERT scripts ready to run in your chosen dialect.

In this guide

Why flat fake data breaks real applications
The 5 pre-built templates explained
Building a custom schema from scratch
Foreign keys: Random vs Sequential distribution
SQL export: dialects, bulk INSERT, and transaction wrapping
CSV, JSON, and XML export options
Real-world workflows: seeding, staging, testing
Pro tips for better relational test data

Why Flat Fake Data Breaks Real Applications

Consider a standard three-table e-commerce schema:

customers   (customer_id PK, name, email, ...)
orders      (order_id PK, customer_id FK, total, status, ...)
order_items (item_id PK, order_id FK, product, qty, price)

If you generate each table independently, you end up with 100 customer IDs in one table, 500 order rows with random customer_id values that don't match any real customer, and 2,000 item rows with random order_id values that reference nothing. This data is worse than useless for testing. Any join query returns garbage. Any import into a database with FK constraints fails immediately. And any application logic that assumes data integrity — which is most of it — produces errors or silently wrong results.

Referential integrity means every FK value must exist in the referenced table. The Relational Data Generator achieves this by generating parent tables first, capturing all their primary key values in memory, and then pulling those real IDs when generating child table rows. The process is sequential: you cannot have an order that references a non-existent customer because the customer IDs are drawn directly from the customers that were just generated.

Ready to generate a relational dataset? Load the e-commerce template and have a complete 3-table dataset in under 60 seconds.

Open the Generator →

The 5 Pre-Built Templates Explained

Each template loads a complete multi-table structure with pre-configured FK relationships and sensible default record counts. You can use them as-is, modify them freely, or use them as a structural reference for building your own schema.

E-Commerce: customers → orders → order_items

The classic three-level hierarchy. Default counts: 200 customers, 800 orders, 2,500 order items — roughly 4 orders per customer and 3 items per order, reflecting real-world e-commerce ratios. The order_items table references orders which references customers, forming a clean three-level chain. Use for: checkout and order management testing, e-commerce demo environments, verifying revenue and order count aggregations.

CRM / SaaS: companies → contacts → deals

A B2B CRM schema where deals reference both a company and a specific contact. The deals table has two foreign keys — company_id and contact_id — demonstrating a pattern most fake data tools cannot handle. Default counts: 100 companies, 500 contacts, 400 deals. Use for: CRM demo environments, pipeline reporting tests, staging environments before sales demos.

Blog / CMS: users → posts → comments

Comments reference both a post and a user — another two-FK table. Every comment is guaranteed to be attached to a real post and written by a real user. Default counts: 150 users, 400 posts, 1,500 comments. The posts table uses Lorem Ipsum for titles and body text, and a category field drawn from real content categories. Use for: CMS testing, comment moderation system development, feed and pagination testing.

HR System: departments → employees → timesheets

Departments contain employees, employees log timesheets. The 20-department default with Random FK distribution creates realistic clustering — some departments are large, some are small — just like real organisations. Default counts: 20 departments, 300 employees, 2,000 timesheets. Use for: HRIS development, payroll system testing, workforce analytics dashboards, training environments for new HR software.

Healthcare: patients → appointments → prescriptions

Prescriptions are issued at specific appointments, so the prescriptions table references appointments which references patients. The domain-specific fields are particularly strong here: diagnoses are realistic clinical descriptions, specialties are real medical specialties, and appointment statuses (scheduled, confirmed, completed, cancelled, no-show) reflect real clinical workflow states. Default counts: 150 patients, 600 appointments, 800 prescriptions. All data is entirely fictional — no generated record corresponds to a real patient. Use for: EHR system development, patient portal testing, clinical workflow application demos, staff training environments where using real PHI is not permitted.

Load any template in one click. Adjust counts, add or rename columns, then generate your dataset.

Open the Generator →

Building a Custom Schema From Scratch

The templates cover common patterns, but your application has its own schema. Here is how to build one efficiently.

Plan before you open the tool

Spend two minutes reviewing your actual database schema. Write down: the table names you need, the primary key column name and type for each table, the FK columns and what they reference, and rough row counts. This planning step means you can rename columns as you add them rather than going back to fix them.

Add tables in dependency order

Add root tables first — the ones with no FK dependencies — then the tables that reference them. This makes the FK dropdowns intuitive: when you add an FK to a child table, all parent tables are already available in the reference dropdown.

Naming your UUID column

Make the first column of every table a UUID field and rename it to your PK column name exactly — customer_id, user_id, product_id. When you add FK columns to child tables, you reference this label by name, so consistency matters.

Adding a Foreign Key column

Click the green Foreign Key chip at the top of the field picker. The FK modal asks for: the column name in the current table, which parent table to reference, which column in that parent to reference (typically the UUID PK), and the distribution mode (Random or Sequential — covered below). A table can have multiple FK columns referencing different parents.

Renaming columns

Click any column name in the table panel to rename it inline. Rename aggressively. A column called "First Name" in the tool should become first_name, firstName, or FIRST_NAME — whatever your schema expects. The output uses exactly what you type.

Foreign Keys: Random vs Sequential Distribution

When generating a child row, the tool needs to pick which parent ID to assign. There are two modes, and the choice matters significantly.

Random distribution

Each child row picks any parent ID at random. This creates natural clustering — some parents end up with many children and some with few, exactly like real data. Some customers order frequently, some barely. Some posts attract many comments, some get none. This is the default for most use cases.

-- With 100 customers and 500 orders (Random):
CUST-001 -> 12 orders   (frequent buyer)
CUST-002 ->  1 order
CUST-003 ->  8 orders
CUST-004 ->  0 orders   (registered but never ordered)
...

Use Random when you want data that reflects real-world distributions — uneven, long-tailed, with natural outliers at both ends.

Sequential distribution

Children are distributed evenly across parents in round-robin order. Every parent gets approximately the same number of children. Use this when you need predictable ratios for testing aggregation logic, pagination, or any calculation where you want to know the exact expected result.

-- With 100 customers and 500 orders (Sequential):
CUST-001 -> 5 orders (exactly)
CUST-002 -> 5 orders (exactly)
CUST-003 -> 5 orders (exactly)
...

Mixing modes per FK column

Distribution is set per FK column independently. In the HR template you might want Random for the dept_id FK on employees (realistic uneven department sizes) but Sequential for the employee_id FK on timesheets (every employee has an equal number of entries for a fair payroll test). Configure them separately in the FK modal for each column.

SQL Export: Dialects, Bulk INSERT, and Transaction Wrapping

The SQL export generates a complete, runnable script with CREATE TABLE statements, correctly typed columns for your target database, and optimised INSERT batches. Here is exactly what each dialect produces.

MySQL / MariaDB

Backtick-quoted identifiers, TINYINT(1) for booleans, VARCHAR(36) for UUIDs, DATETIME for timestamps, DECIMAL(10,2) for money. The bulk INSERT performance lever is disabling autocommit — without it, MySQL commits each row individually causing a disk flush per row, which is the single biggest INSERT slowdown in MySQL:

SET autocommit=0;

INSERT INTO `customers` (`customer_id`, `first_name`, `email`) VALUES
  ('f47ac10b-58cc-4372-a567-0e02b2c3d479', 'Sarah', '[email protected]'),
  ('b3c4d5e6-89ab-cdef-0123-456789abcdef', 'James', '[email protected]')
  -- up to 500 rows per batch
;
COMMIT;

-- next 500 rows...
INSERT INTO `customers` (`customer_id`, `first_name`, `email`) VALUES (...);

PostgreSQL

Double-quoted identifiers, native UUID type, BOOLEAN, SMALLINT for age, TIMESTAMPTZ for timestamps, NUMERIC(10,2) for money. Batches are wrapped in BEGIN / COMMIT blocks. With FK constraints enabled, the output includes explicit REFERENCES clauses:

BEGIN;
INSERT INTO "orders" ("order_id", "customer_id", "total_amount") VALUES
  ('abc-123', 'f47ac10b-...', 4291.47),
  ('abc-124', 'b3c4d5e6-...', 892.10)
;
COMMIT;

SQLite

Unquoted identifiers, TEXT for strings, REAL for decimals, INTEGER for booleans. SQLite doesn't support multi-row VALUES reliably in older versions, so the bulk strategy is wrapping all single-row inserts in one transaction — this can make SQLite inserts 100x faster by reducing filesystem syncs from one-per-row to one-per-table:

BEGIN TRANSACTION;
INSERT INTO customers VALUES ('f47ac10b-...', 'Sarah', '[email protected]', ...);
INSERT INTO customers VALUES ('b3c4d5e6-...', 'James', '[email protected]', ...);
-- all rows for this table in one transaction
COMMIT;

SQL Server (T-SQL)

Bracket-quoted identifiers, UNIQUEIDENTIFIER for UUIDs, NVARCHAR for strings, BIT for booleans, DATETIME2 for timestamps, NVARCHAR(MAX) for long text. SQL Server has a hard 1,000-row limit per multi-row INSERT statement — the generator enforces this automatically, splitting at 1,000 and wrapping each group in a transaction:

BEGIN TRANSACTION;
INSERT INTO [customers] ([customer_id], [first_name], [email]) VALUES
  (N'F47AC10B-58CC-4372-A567-0E02B2C3D479', N'Sarah', N'[email protected]'),
  -- up to 1000 rows
;
COMMIT;

BEGIN TRANSACTION;
INSERT INTO [customers] ([customer_id], [first_name], [email]) VALUES
  -- rows 1001 onwards
;
COMMIT;

SQL options reference

Option	What it does	When to use
Include CREATE TABLE	Full DDL before each table's INSERT block	New databases or self-contained scripts
Include DROP TABLE IF EXISTS	DROP before CREATE	Re-seeding — wipes old data and reloads cleanly
Include FOREIGN KEY constraints	REFERENCES clauses on FK columns in DDL	When you want the database to enforce integrity
Wrap in transactions	Transaction blocks around each batch	Almost always — transactions are the primary bulk INSERT performance lever

For a re-seed script: enable DROP + CREATE + transactions. For an additive seed appending to existing data: disable DROP and CREATE, enable transactions only.

Table order in the SQL output

Parent tables always appear before their child tables in the output script. This is critical when FK constraints are enabled — inserting orders before customers exist violates the constraint. The generator resolves dependencies automatically, so the script is always safe to run top-to-bottom without any manual reordering.

CSV, JSON, and XML Export

SQL is not always the right format. The other options cover different integration points.

CSV — multi-file download

Each table downloads as a separate CSV file named after the table: customers.csv, orders.csv, order_items.csv. FK relationships are preserved by value — the customer_id values in orders.csv match real IDs in customers.csv. Use for: GUI database import wizards (TablePlus, DBeaver, pgAdmin), spreadsheet applications, ETL pipelines that accept flat files.

JSON flat — multi-file download

Each table downloads as a separate JSON array of objects. Use with json.load() in Python or require('./customers.json') in Node.js for immediate integration with test suites, API mock servers, or any JavaScript application that consumes JSON arrays.

JSON nested — single file

The most powerful JSON format: children are embedded inside their parents. For the e-commerce template, each customer object contains an array of their orders, and each order contains an array of its items:

[
  {
    "customer_id": "f47ac10b-...",
    "first_name": "Sarah",
    "email": "[email protected]",
    "orders": [
      {
        "order_id": "b3c4d5e6-...",
        "total_amount": "4291.47",
        "status": "delivered",
        "order_items": [
          { "item_id": "...", "product_name": "Wireless Headphones", "quantity": "2" },
          { "item_id": "...", "product_name": "Laptop Sleeve", "quantity": "1" }
        ]
      }
    ]
  }
]

Use nested JSON for: GraphQL mock servers, REST API fixtures that return related entities, React and Vue application development where components expect nested data structures, and document databases like MongoDB or Firestore where nested objects are the native storage format.

XML — multi-file download

Each table is a separate XML file with records wrapped in <Record> tags and column names as element names. Use for legacy system integration, SOAP API testing, or any enterprise platform that requires XML input.

Real-World Workflows

Workflow 1: Seeding a PostgreSQL development database

Load or build the schema matching your tables
Set record counts — 100 / 500 / 2,000 is a solid starting point for a 3-table schema
Select SQL, choose PostgreSQL dialect
Enable Create Table and transactions. Disable Drop Table if tables already exist
Click Generate, then Download
Run: psql -U myuser -d mydb -f relational-data.sql

The whole process takes under 2 minutes. Enable Drop Table to make the script safely re-runnable.

Workflow 2: Loading a CRM demo environment before a sales call

Load the CRM / SaaS template
Increase counts to feel substantial: 200 companies, 1,000 contacts, 800 deals
Generate as SQL and run against your demo database, or generate nested JSON for an API mock server

The CRM template generates company names, job titles, and deal values realistic enough that prospects focus on product features rather than noticing the data is synthetic.

Workflow 3: Integration test fixtures for a Node.js application

Build a schema matching your test database
Set small counts: 20 parents, 80 children, 200 grandchildren — enough diversity without bulk
Export as JSON flat format
Check the files into your repo: tests/fixtures/users.json, orders.json
In test setup: await db.users.insertMany(require('./fixtures/users.json'))

Every developer and every CI run uses the same deterministic baseline. Flaky tests caused by non-deterministic data are eliminated.

Workflow 4: Load testing with maximum dataset size

Set all tables to 5,000 rows
Export as SQL with batch size 500 and transactions enabled
Run against a local database to verify performance under realistic data volume
For even larger datasets, generate multiple batches — UUID primary keys guarantee no collisions between runs

Pro Tips for Better Relational Test Data

Use UUID as your primary key rather than auto-increment integers

Auto-increment integers conflict if you run the seed script twice. UUID keys are globally unique — you can seed multiple times and each run adds a fresh batch with no ID collisions. This is especially useful for staging environments where you accumulate data over multiple seed runs.

Match FK column names to your schema exactly

When you rename the FK column in the tool, use the exact name your ORM or application expects. If your ORM uses camelCase and expects authorId, rename it to authorId. If your convention is author_id, use that. The output uses exactly what you type.

Adjust record ratios to match your real data

The default e-commerce ratios imply 4 orders per customer and 3 items per order. If your real application has 20 orders per customer, set counts to 200 / 4,000 / 12,000 to match. Tests against realistically-proportioned data reveal performance issues that small uniform datasets miss entirely.

Use the healthcare template as a privacy-safe training environment

The healthcare template generates diagnostically plausible but entirely fictional medical records. Patient names, diagnoses, and appointment details are randomly combined from real medical vocabulary but assigned to no real person. It is designed specifically for training staff on EHR systems, demoing clinical workflow applications, and building patient portals without any real PHI.

Combine with the flat generator for reference tables

The relational generator handles your core entity tables. For supplementary flat reference tables — product catalogues, configuration tables, zip code lookups — the Custom Fake Data Generator is faster. Generate your relational schema here, then generate any flat reference tables separately, and combine both in your seed script.

Wrap the entire seed operation in an outer transaction

Even on staging, wrapping your complete seed import in a transaction means a mid-run failure leaves the database clean rather than partially seeded. The generated SQL handles this per-batch already, but an outer transaction from your SQL client adds an additional safety net for the multi-table import as a whole.

Build your relational dataset now. 5 templates, 5,000 rows per table, SQL for MySQL, PostgreSQL, SQLite, and SQL Server. Free, no signup.

Open Relational Data Generator →

Related Tools & Articles

TutorialHow to Build Relational Tables: Step-by-Step ToolRelational Data Generator ToolCustom Fake Data Generator ToolFake Address Generator HubAll Data Generators ToolJSON Formatter & Validator ToolCSV to JSON Converter GuideCustom Fake Data Generator Guide GuideJSON vs XML vs CSV