The Complete Guide to Ndjson Validating: Everything You Need to Know
NDJSON โ Newline Delimited JSON, also called JSONL โ is a format for storing and streaming structured data where each line of a text file is a self-contained, valid JSON value. Unlike a standard JSON array, which must be read in its entirety before any record can be processed, an NDJSON file can be read line-by-line, making it ideal for log pipelines, event streams, machine learning datasets, API exports, and any workload where records arrive incrementally or the file is too large to fit in memory.
NDJSON files are deceptively easy to create incorrectly. A single malformed line โ a missing closing brace, an unescaped special character, a bare scalar where an object is expected โ renders that record unparseable. At scale, silent parse failures are worse than loud ones: a pipeline that quietly skips malformed records produces wrong results without raising an alarm. NDJSON validation catches these problems before the data reaches a consumer.
This guide explains what NDJSON validation is, what a validator checks, how to interpret its output, and best practices for working with NDJSON files in development and production.
Validate your NDJSON file instantly: Checks JSON syntax per line, key consistency, encoding, empty lines, duplicate keys, and previews the first 5 records โ free, private, no uploads.
Open Ndjson Validator โTable of Contents
What Is NDJSON Validation?
NDJSON validation is the process of reading an NDJSON or JSONL file line by line, attempting to parse each non-empty line as JSON, and confirming that the result meets the structural requirements of the NDJSON specification: each record must be a valid JSON value โ typically an object ({}) or array ([]) โ and the records across the file should be structurally consistent.
A browser-based NDJSON validator reads the file using the Web File API and processes it entirely in JavaScript. Your data โ which may include API payloads, event logs, model training examples, or financial records โ never leaves your device. This makes browser-based validation safe to use with real production data files that contain sensitive information.
Why Validate NDJSON Files?
NDJSON files fail in ways that are easy to overlook during creation or transformation. Common scenarios where validation catches real problems:
- Pipeline input validation. Before loading an NDJSON file into a data pipeline, database, or machine learning framework, validating it confirms that every record will parse correctly. A single malformed line can cause a batch import to fail partway through โ or worse, to silently skip the bad record and complete with fewer rows than expected.
- Export verification. When an application exports data as NDJSON, validating the output confirms that the export logic handled edge cases correctly: values with special characters, null fields, nested objects, and Unicode content. Validation catches export bugs before they propagate to downstream systems.
- Key consistency checking. In an NDJSON file used as a dataset, every record should have the same set of top-level keys. A key that appears in some records but not others often indicates missing data, a schema change that was applied inconsistently, or records from multiple sources that were concatenated without normalization. Validation flags these inconsistencies explicitly.
- Encoding issues. NDJSON files are plain UTF-8 text, but files created on Windows systems may have a UTF-8 byte order mark (BOM) or CRLF line endings that some parsers handle incorrectly. Files generated by encoding errors may contain null bytes that indicate binary content. Validation detects these issues before they cause failures in production.
- Format confirmation. Not all files with a
.jsonextension are NDJSON. A file that contains a JSON array ([{...}, {...}]) rather than newline-delimited records is structurally different and requires a different parser. Validation confirms that the file actually follows the NDJSON convention rather than a different JSON layout.
JSON Syntax Per Line
The primary check in NDJSON validation is parsing each non-empty line as JSON using the native JSON.parse() function. This is the strictest and most accurate JSON validator available in a browser environment โ the same parser used by the JavaScript engine itself.
When a line fails to parse, the validator reports the line number and the parser's error message. Common parse errors in NDJSON files:
- Unexpected token. A character appeared where the parser did not expect it. Often caused by a missing colon between a key and value, an extra comma, a single-quoted string (JSON requires double quotes), or a JavaScript-style identifier used as a value (e.g.,
undefined, which is not valid JSON). - Unexpected end of JSON input. The line ends before the JSON structure is complete. This usually means a missing closing brace or bracket โ common when a record was written by a code path that forgot to close a nested object, or when a line was truncated during file transfer or export.
- Invalid string. A string value contains a character that must be escaped in JSON โ such as a literal newline, tab, or backslash โ but was written unescaped. Also triggered by a string that is not properly closed before the end of the line.
The validator reports the first 10 parse errors by line number and stops collecting additional errors beyond that threshold. If a file has more than 10 parse errors, the remaining count is reported as a summary. This prevents the output from being overwhelmed by errors in files where every line is malformed โ a situation that usually indicates the file is not NDJSON at all.
Root Value Type
The NDJSON specification requires each line to parse to a JSON object or array โ not a bare string, number, boolean, or null. A file where lines parse to scalar values is technically valid JSON per line, but is not a conforming NDJSON file and will fail when processed by most NDJSON-aware tools.
The validator checks the root type of each successfully parsed line and reports an error for any line where the root value is a bare scalar. Common sources of root type errors:
- An export script that wrote the string representation of a value (e.g., a JSON-encoded string rather than a JSON-encoded object) to each line.
- A file that was produced by a
JSON.stringify()call on a scalar value rather than on an object or array. - A configuration value or comment line that was accidentally included in the data file.
If a file contains a mix of lines that parse to objects and lines that parse to arrays, the validator reports a warning. Mixed record types are technically valid per the NDJSON specification โ the spec does not require all records to be the same type โ but they are unusual in practice and almost always indicate that records from two different sources were concatenated without normalization.
Key Consistency
For NDJSON files where every record is a JSON object, the validator collects all top-level keys from every record and identifies keys that appear in some records but not others. These inconsistencies are reported as warnings, not errors โ missing keys are not a parse failure, but they are a signal that deserves attention.
The consistency check reports, for each inconsistent key, how many records contain it and how many are missing it. This makes it easy to distinguish between genuinely optional fields (present in 95 out of 100 records) and likely errors (present in 3 out of 100 records).
Common sources of key inconsistency in real NDJSON files:
- Optional fields. Some records represent events or entities that have a field only when a certain condition is true โ for example, an error log that includes a
stack_tracekey only for error-level records. Optional fields are expected and intentional; the warning is informational. - Schema migration. A dataset that was produced over a period during which the record schema changed will have records with the old key set and records with the new key set. Records from before a field was added will be missing it; records from before a field was removed will have it. Validation surfaces the extent of the overlap.
- Multi-source concatenation. When NDJSON files from two different sources are concatenated โ for example, event logs from two different services โ the resulting file may have records with different key sets. Validation identifies which keys are common to all records and which are source-specific.
- Null vs. absent. Some pipelines represent missing data as
null(the key is present with a null value) while others omit the key entirely. Validation makes this difference visible, which is important for downstream consumers that distinguish between the two representations.
Up to 5 key consistency warnings are shown in detail. If more than 5 keys are inconsistent, the remaining count is reported as a summary to keep the output readable.
Duplicate Keys
The JSON specification does not prohibit duplicate keys within a single object, but their behavior is undefined โ most parsers silently use the last value, discarding all earlier occurrences. In an NDJSON record, a duplicate key is almost always a bug: a field was written twice by the export code, a merge operation produced duplicate output, or a record was constructed by combining two partial objects that shared a key.
The validator checks for duplicate keys within each parsed object using a heuristic approach. Because JSON.parse() silently deduplicates keys, the check re-parses each line with a replacer function that tracks key occurrences. When a duplicate is found, the validator reports the line number and the duplicated key name.
Duplicate key detection is applied to each top-level object in the file. Nested objects within a record are not individually checked โ only the top-level keys of each record are examined. This keeps the check fast for large files with many records.
Empty Lines
The NDJSON specification does not explicitly address empty lines, but most NDJSON-aware tools handle them in one of two ways: they skip empty lines silently, or they treat them as parse errors. The validator counts empty lines and reports their positions as a warning.
A single trailing empty line at the end of the file โ the most common case, produced by text editors that append a newline after the last line โ is detected and removed before processing. This prevents the trailing newline from being counted as an empty line or causing a spurious parse error on the last "record."
Interior empty lines โ blank lines between records rather than at the end โ are flagged as warnings. Interior empty lines are unusual in NDJSON files and often indicate that the file was produced by a process that inserted blank separator lines between records, which is not standard NDJSON convention and may cause failures in strict parsers.
Encoding Issues
NDJSON files must be encoded as UTF-8 text. Two encoding issues commonly appear in real files:
- UTF-8 BOM. A three-byte sequence (
0xEF 0xBB 0xBF) prepended by some Windows text editors. The BOM is invisible in most editors but causes strict JSON parsers to fail on the first line because the BOM bytes appear before the opening brace or bracket of the first record. The validator detects the BOM and reports it as a warning. The first line is still parsed โ with the BOM stripped โ so the rest of the file can be validated, but the warning should be resolved before the file is used in production. - Null bytes. Null bytes in the file content indicate either binary data that was incorrectly given an NDJSON extension, or a file saved in a wide-character encoding such as UTF-16 (which inserts null bytes between each character when representing ASCII text). A file with null bytes cannot be a valid UTF-8 text file and the validator reports it as an error, stopping further processing.
The validator also detects and reports the line ending style used in the file: LF (Unix/macOS, the standard for NDJSON), CRLF (Windows), or bare CR (old Mac format). CRLF line endings are handled transparently โ most NDJSON parsers accept them. Bare CR line endings are unusual and may cause failures in parsers that split on LF only; the validator reports them as a warning.
File Statistics and Data Preview
For valid NDJSON files, the validator reports a set of statistics and renders a data preview in addition to the pass/fail result:
- Total records. The count of non-empty lines that were successfully parsed as JSON. This is the number of records that will be available to a downstream consumer.
- Unique keys. The count of distinct top-level keys found across all object records. For array records, this field reports "N/A (arrays)".
- Record type. Whether all records are objects (
{}), all are arrays ([]), or the file contains a mix of both. - Parse errors. The count of lines that failed to parse as JSON. Zero means every line parsed successfully.
- Empty lines. The count of blank lines in the file, excluding the trailing newline.
- File size. The size of the file in bytes, kilobytes, or megabytes.
- Line endings. The line ending style detected in the file (LF, CRLF, or CR).
- BOM present. Whether a UTF-8 byte order mark was found at the start of the file.
The top-level keys panel lists all distinct keys found in the file, sorted alphabetically. For files with many records, this panel gives an immediate overview of the schema โ what fields are available and in what order they appear.
The data preview panel renders the first 5 records in a formatted table, with one column per top-level key. This makes it possible to visually confirm that the parsed data looks correct โ the right values in the right fields โ without opening the raw file in a text editor. For array records, the preview uses index-based column headers ([0], [1], etc.).
Best Practices
For developers working with NDJSON files โ whether generating them from applications, processing them in pipelines, or using them as datasets โ these practices reduce the risk of data quality issues reaching production:
- Validate immediately after generation. The easiest time to catch a malformed NDJSON file is immediately after it is generated โ before it is uploaded, loaded, or shared. Add a validation step to your export script or data pipeline so that a malformed output is caught before it propagates downstream.
- Validate before loading into a database or ML framework. Most databases and machine learning frameworks that accept NDJSON input will fail or silently skip malformed records without providing clear diagnostics. Validating the file before loading confirms that the import will succeed and that the record count matches expectations.
- Use LF line endings. Always produce NDJSON files with Unix LF line endings rather than Windows CRLF. While most parsers handle CRLF correctly, LF is the standard convention for NDJSON and eliminates a class of compatibility issues.
- Omit the BOM. Configure your text editors and export tools to write UTF-8 files without a byte order mark. The BOM is unnecessary in UTF-8 (unlike UTF-16, where it serves a functional purpose) and causes failures in strict parsers.
- Be explicit about optional fields. If your records have optional fields, document which fields are optional and why. When a key is missing from some records, a downstream consumer needs to know whether to treat the absence as null, as zero, or as "not applicable." Validation will flag the inconsistency; your documentation should explain how to handle it.
- Validate after concatenation. When you concatenate two or more NDJSON files โ for example, combining daily log exports into a weekly file โ validate the result. Concatenation can introduce duplicate records, mismatched schemas, encoding differences between files, and missing or extra newlines at join points.
- Check the record count. After generating or transforming an NDJSON file, compare the reported record count against an expected value. A mismatch indicates that records were dropped during generation, that the file is truncated, or that some records were split across lines (which would cause parse errors on the continuation lines).
- Do not embed newlines in string values without escaping. NDJSON uses the newline character as the record delimiter. If a string value contains a literal newline, it must be escaped as
\n. An unescaped newline inside a string will split the record across two lines, causing a parse error on the continuation line and corrupting the record.
Common Use Cases
Log aggregation and analysis. Application logs written in structured JSON format โ one JSON object per log entry โ are a natural fit for NDJSON. Log aggregation pipelines (Logstash, Fluentd, Vector) natively produce and consume NDJSON. Validating log files before loading them into Elasticsearch, ClickHouse, or a data warehouse confirms that the log schema is consistent and that no malformed entries will cause indexing failures.
Machine learning datasets. The JSONL format (JSON Lines, functionally identical to NDJSON) is widely used for NLP training datasets. The OpenAI fine-tuning API, Hugging Face datasets, and many other ML platforms accept JSONL as an input format. Validating a training dataset before submitting it to a fine-tuning job catches schema errors โ wrong key names, missing required fields, incorrect value types โ that would cause the job to fail or produce incorrect results.
API response streaming. Some APIs stream responses as NDJSON โ each line is a JSON object representing a chunk of the response. OpenAI's streaming chat completion API is a well-known example. Validating a captured stream confirms that the capture was complete and that no lines were corrupted during recording.
Database exports. Many databases support exporting query results as NDJSON: MongoDB's mongoexport, PostgreSQL via custom scripts, BigQuery's export feature. Validating the export output confirms that every row was correctly serialized, that special characters in string fields were properly escaped, and that the record count matches the expected row count from the query.
ETL pipeline output. Extract-transform-load pipelines often use NDJSON as an intermediate format between the extraction and load stages. Validating the transformer's output confirms that the transformation logic handled all input types correctly โ including null values, nested objects, arrays within records, and Unicode content โ before the data is loaded into the destination system.
Configuration and test fixtures. Small NDJSON files are sometimes used as test fixtures or seed data for development environments. Validating these files in CI ensures that a developer who modifies a fixture doesn't accidentally introduce a parse error that breaks tests for the whole team.
