How to Use the Parquet to CSV Converter: Step-by-Step Tutorial
The Parquet to CSV Converter runs entirely in your browser: your Parquet file is never sent to any server, no account is required, and no data leaves your device. This tutorial walks through every step: loading a file, running the conversion, reading the stats panel, reviewing the column schema, and downloading the CSV. It also covers the most common problems you will encounter and how to resolve them.
Follow along with the tool open: Open the Parquet to CSV Converter in a second tab, then work through each step below.
Step 1: Open the Tool
Navigate to /developer-tools/parquet-to-csv/. The tool loads entirely in your browser. After the initial page load, converting a file makes no outbound network requests; you can verify this in your browser's DevTools Network panel while the conversion runs.
The tool is accessible from the Developer Tools hub, the command palette (press Ctrl+K or ⌘K and type "Parquet to CSV"), or directly via the URL above.
Step 2: Load Your Parquet File
Load your .parquet file using one of two methods:
- Drag and drop. Drag a .parquet file from your file manager, desktop, or Downloads folder and drop it anywhere on the blue-bordered drop zone. The zone highlights when a file is being dragged over it. You can also drop the file anywhere on the page; a full-screen overlay appears to catch the drop.
- Click to browse. Click the drop zone or the "browse" link inside it to open your operating system's file picker. Navigate to your .parquet file and select it.
Once the file is loaded, the filename appears in a bar below the drop zone. The drop zone itself hides to keep the interface clean. To replace the file, click the ✕ button in the filename bar; this clears the loaded file and returns the drop zone so you can load a different one.
The tool accepts only .parquet files. If you drop a file with a different extension, a red error badge explains the problem. Files up to 200 MB are supported; beyond that, browser memory limits may cause the conversion to fail.
Step 3: Click Convert to CSV
Once a file is loaded, click the Convert to CSV button. The conversion pipeline runs immediately in the browser:
- The first and last 4 bytes of the file are checked for the PAR1 magic byte sequence. If either check fails, an error is reported before any decoding is attempted.
- The Thrift-encoded file footer is read to extract the schema (column names and types) and the row group layout.
- Every row group is decoded page by page. Snappy- and Gzip-compressed pages are decompressed automatically. Null values are preserved as empty fields.
- The decoded rows are serialized to RFC 4180-compliant CSV, with fields containing commas, quotes, or newlines properly escaped.
A progress bar tracks the conversion through these stages. For files under 10 MB, the conversion typically completes in under two seconds. For larger files, it may take several seconds; the button is disabled during processing to prevent accidental double-clicks.
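The first stage of the pipeline, the magic byte check, can be reproduced outside the browser. The following is a minimal sketch in Python, not the tool's own code: the function name check_parquet_magic is hypothetical, and the only facts it relies on are the Parquet file layout (PAR1 at both ends, with a 4-byte little-endian footer length just before the trailing marker).

```python
import struct

MAGIC = b"PAR1"


def check_parquet_magic(data: bytes) -> int:
    """Validate both PAR1 markers and return the footer length in bytes.

    Hypothetical helper for illustration; it does not decode the footer.
    """
    # Smallest plausible file: leading magic + footer length + trailing magic.
    if len(data) < 12:
        raise ValueError("file is too small to be Parquet")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("Parquet magic bytes (PAR1) not found")
    # The 4 bytes before the trailing magic store the footer length,
    # encoded as a little-endian unsigned integer.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - 12:
        raise ValueError("footer length exceeds file size; file may be truncated")
    return footer_len
```

To try it on a real file, read the bytes with open(path, "rb").read() and pass them in. A ValueError here corresponds to the kind of failure the tool reports before any decoding starts.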
Step 4: Read the Stats Panel
After a successful conversion, a green stats panel appears with five key numbers:
- Rows. The total number of data rows decoded across all row groups. Compare this against the expected row count from the system that produced the file. A mismatch may indicate truncation or a file written by a job that failed partway through.
- Columns. The number of leaf columns in the Parquet schema. Each column becomes one header and one field per row in the CSV.
- Row Groups. The number of row groups in the file. Row groups are the horizontal partitions Parquet uses to organize large datasets. This count is read from the file footer without decoding any data.
- Input Size. The size of the original .parquet file. This is the compressed, columnar representation.
- CSV Size. The size of the generated CSV in memory. For all but very small files, CSV is usually larger than the equivalent Parquet, typically 2 to 10 times, because it stores every value as uncompressed, row-oriented text.
Step 5: Review the Column Schema Table
Below the stats panel, a column schema table lists every column in the Parquet file alongside its Parquet type. Review this table before using the CSV output.
The most important cases to look for:
- DATE columns appear as integers in the CSV: they represent the number of days since 1970-01-01. If your downstream tool expects formatted date strings (e.g., 2026-04-14), you will need to transform the column after loading the CSV.
- TIMESTAMP columns appear as large integers (microseconds or milliseconds since the epoch). The same post-processing caveat applies.
- DECIMAL columns appear as raw integers without the decimal scale applied. If you need the actual decimal value, divide by 10^scale (10 raised to the power of scale), where scale is defined in the Parquet schema.
- BOOLEAN columns appear as true or false strings. Most tools handle this correctly, but some SQL loaders may need a cast.
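The DATE, TIMESTAMP, and DECIMAL conversions described above can be sketched with Python's standard library. The helper names are hypothetical, and the sketch assumes timestamps are stored in microseconds and should be read as UTC; check the unit and scale in the schema table before applying it to your own columns.

```python
from datetime import date, datetime, timedelta, timezone
from decimal import Decimal

EPOCH_DATE = date(1970, 1, 1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_from_days(days: int) -> date:
    """DATE: integer days since 1970-01-01."""
    return EPOCH_DATE + timedelta(days=days)


def timestamp_from_micros(micros: int) -> datetime:
    """TIMESTAMP: integer microseconds since the epoch (assumed UTC)."""
    return EPOCH_UTC + timedelta(microseconds=micros)


def decimal_from_unscaled(raw: int, scale: int) -> Decimal:
    """DECIMAL: shift the raw integer right by `scale` decimal places."""
    return Decimal(raw).scaleb(-scale)
```

For millisecond timestamps, multiply the value by 1000 before calling timestamp_from_micros. Using Decimal rather than float for the DECIMAL case avoids introducing binary rounding error into values such as prices.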
If the schema table is empty (the converter could not detect typed columns), the file may use a complex nested schema or an encoding combination not supported by the current version of the library. In that case, use pyarrow or pandas to perform the conversion.
Step 6: Download the CSV
Click the green Download CSV button. The browser saves the file to your default downloads folder. The filename is constructed by replacing the .parquet extension of the input file with .csv. For example, orders_2026_q1.parquet becomes orders_2026_q1.csv.
After downloading, open the file and spot-check a few rows before using it in production:
- Confirm the column count in the header matches the value shown in the stats panel.
- Confirm the row count using your spreadsheet's row count or a command like wc -l orders_2026_q1.csv (subtract 1 for the header row; the count will be too high if any quoted field contains a line break).
- Check any DATE or TIMESTAMP columns; if they appear as large integers, note which columns need transformation.
- For columns with special characters, verify that quoting is correct: fields containing commas should be wrapped in double quotes.
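These spot checks can also be scripted. The sketch below uses Python's csv module, which parses quoted fields correctly, so embedded commas and line breaks do not distort the counts the way a raw line count can. The function name summarize_csv is hypothetical, and the sketch assumes the downloaded file is UTF-8 encoded.

```python
import csv


def summarize_csv(path: str) -> tuple[int, int]:
    """Return (data_row_count, column_count); raise if any row is ragged."""
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            raise ValueError("file is empty")
        rows = 0
        for record in reader:
            if len(record) != len(header):
                raise ValueError(
                    f"line {reader.line_num}: expected {len(header)} fields, "
                    f"got {len(record)}"
                )
            rows += 1
    return rows, len(header)
```

Compare the two returned numbers against the Rows and Columns values in the stats panel. A ragged-row error usually points to a quoting problem in the field named by the line number.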
Troubleshooting Common Problems
"Invalid file - Parquet magic bytes (PAR1) not found." The file does not have the expected PAR1 signature at its start or end. This usually means the file was truncated during download, was renamed with a .parquet extension but is actually another format (e.g., a CSV or JSON file), or was written by a failed Spark or Athena job that did not complete the file footer. Re-download the file from the source and try again. To confirm the file is valid Parquet, open it with the Parquet Validator first.
"Parse error - see details below." The file passed magic byte validation but failed during footer parsing or row group decoding. The most common causes are: a Zstd- or LZ4-compressed file (not currently supported), a file using deeply nested repetition levels, or a file that was partially written. If the error message mentions a codec, try re-encoding the file with Snappy or Gzip using pandas (which reads and writes Parquet through an engine such as pyarrow): import pandas as pd; df = pd.read_parquet("file.parquet"); df.to_parquet("file_snappy.parquet", compression="snappy").
Row count mismatch. If the row count in the stats panel differs from the expected count from the source system, the file may have been written by a job that failed partway through, or only a partition of a multi-file dataset was converted. For multi-file Parquet datasets (Spark output directories containing multiple part-*.parquet files), convert each file separately and concatenate the CSVs, or use pyarrow's ParquetDataset to read the full directory at once.
Large file: browser crashes or tab freezes. Files over 150 MB may exhaust browser memory on devices with limited RAM. If the tab crashes, convert the file using pandas: pd.read_parquet("large.parquet").to_csv("large.csv", index=False). This runs outside the browser, so the tab's memory limit no longer applies, although pandas still loads the full dataset into system RAM.
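DATE / TIMESTAMP columns show integers. This is expected behavior; see Step 5 above. In pandas, the parse_dates option of pd.read_csv expects date strings and will not treat these integers as epoch offsets, so convert the column after loading instead: df["date_column"] = pd.to_datetime(df["date_column"], unit="D") for DATE columns, or unit="us" or unit="ms" for TIMESTAMP columns, matching the unit in the Parquet schema.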
Empty schema table. The converter could not detect leaf-level typed columns in the schema. This typically indicates a complex nested schema (LIST, MAP, or STRUCT types) or a very unusual file structure. For files with nested schemas, use pyarrow: import pyarrow.parquet as pq; table = pq.read_table("file.parquet"); table.to_pandas().to_csv("file.csv", index=False).
Worked Example
The following example shows a complete conversion using a small, representative Parquet file. You can follow along by creating the file with Python and then converting it using the tool.
Create the sample Parquet file (Python):
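import pandas as pd  # to_parquet also needs a Parquet engine such as pyarrow

# Five columns: integer, string, float, boolean, and a string column with a null.
df = pd.DataFrame({
    "order_id": [1001, 1002, 1003],
    "customer": ["Alice", "Bob", "Carol"],
    "amount": [149.99, 32.50, 210.00],
    "shipped": [True, True, False],
    "region": ["US-West", "US-East", None],
})
# Write a Snappy-compressed file and leave out the pandas index.
df.to_parquet("sample_orders.parquet", compression="snappy", index=False)
print("File created: sample_orders.parquet")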
Run the script to produce sample_orders.parquet in your working directory. This creates a single-row-group Parquet file with 3 rows and 5 columns, compressed with Snappy.
Convert using the tool:
- Open the Parquet to CSV Converter.
- Drag sample_orders.parquet onto the drop zone, or click browse and select it.
- The filename bar shows: sample_orders.parquet.
- Click Convert to CSV.
- The stats panel shows: Rows: 3 · Columns: 5 · Row Groups: 1.
- The column schema table shows: order_id (INT64), customer (BYTE_ARRAY), amount (DOUBLE), shipped (BOOLEAN), region (BYTE_ARRAY).
- Click Download CSV. The file is saved as sample_orders.csv.
Expected CSV output:
order_id,customer,amount,shipped,region
1001,Alice,149.99,true,US-West
1002,Bob,32.5,true,US-East
1003,Carol,210.0,false,
Note that the null value in the region field for Carol appears as an empty field in the CSV: the last field on the third data row has nothing after the final comma. RFC 4180 defines no dedicated null marker, so an empty field is the conventional representation. Also note that 32.50 becomes 32.5, because the value is stored as a binary double and trailing zeros are not preserved. The whole-number value 210.00 is shown here as 210.0; JavaScript's default Number.toString() would print it as 210, so the exact form depends on how the tool formats DOUBLE columns. The values are numerically identical either way.
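To see how nulls and special characters end up in CSV text, the following sketch serializes the same sample rows with Python's csv module. It is an illustration of RFC 4180-style output, not the tool's JavaScript implementation; the helper names are hypothetical, and the lowercase true/false mapping is added to mirror the tool's output shown above.

```python
import csv
import io


def to_csv_field(value):
    """Map a decoded value to its CSV form; None becomes an empty field."""
    if value is None:
        return ""
    if isinstance(value, bool):
        return "true" if value else "false"
    return value


def rows_to_csv(header, rows) -> str:
    """Serialize rows with CRLF line endings and minimal quoting."""
    buf = io.StringIO()
    # The default QUOTE_MINIMAL mode quotes only fields that contain
    # commas, double quotes, or line breaks.
    writer = csv.writer(buf, lineterminator="\r\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow([to_csv_field(v) for v in row])
    return buf.getvalue()


sample = rows_to_csv(
    ["order_id", "customer", "amount", "shipped", "region"],
    [
        (1001, "Alice", 149.99, True, "US-West"),
        (1002, "Bob", 32.50, True, "US-East"),
        (1003, "Carol", 210.00, False, None),
    ],
)
```

Python happens to render the floats as 32.5 and 210.0, matching the expected output; other languages may format whole-number doubles differently. A field such as Smith, "Al" would be written as "Smith, ""Al""", with the inner quotes doubled.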
For a deeper explanation of the Parquet format, encoding types, and compression codecs, see the Complete Guide to Parquet To Csv.
