Large Datasets
For maximal compatibility, SheetJS API functions read entire files into memory and write files in memory. Browsers and other JS engines enforce tight memory limits. The library offers alternate strategies to optimize for memory usage.
Dense Mode​
Dense mode worksheets, which store cells in arrays of arrays, are designed to work around Google Chrome performance regressions. For backwards compatibility, dense mode worksheets are not created by default.
read, readFile and aoa_to_sheet accept the dense option. When enabled,
the methods create worksheet objects that store cells in arrays of arrays:
var dense_wb = XLSX.read(ab, {dense: true});
var dense_sheet = XLSX.utils.aoa_to_sheet(aoa, {dense: true});
Historical Note (click to show)
The various API functions will seamlessly handle dense and sparse worksheets.
Streaming Write​
The streaming write functions are available in the XLSX.stream object. They
take the same arguments as the normal write functions:
XLSX.stream.to_csvis the streaming version ofXLSX.utils.sheet_to_csv.XLSX.stream.to_htmlis the streaming version ofXLSX.utils.sheet_to_html.XLSX.stream.to_jsonis the streaming version ofXLSX.utils.sheet_to_json.XLSX.stream.to_xlmlis the streaming SpreadsheetML2003 workbook writer.
These functions are covered in the "Stream Export" section.
This feature was expanded in version 0.20.3. It is strongly recommended to
upgrade to the latest version.
NodeJS​
In a CommonJS context, NodeJS Streams and fs immediately work with SheetJS:
const XLSX = require("xlsx"); // "just works"
In NodeJS ESM, the dependency must be loaded manually:
import * as XLSX from 'xlsx';
import { Readable } from 'stream';
XLSX.stream.set_readable(Readable); // manually load stream helpers
Additionally, for file-related operations in NodeJS ESM, fs must be loaded:
import * as XLSX from 'xlsx';
import * as fs from 'fs';
XLSX.set_fs(fs); // manually load fs helpers
It is strongly encouraged to use CommonJS in NodeJS whenever possible.
Text Streams​
to_csv, to_html, and to_xlml emit strings. The data can be directly pushed
to a Writable stream. fs.createWriteStream1 is the recommended approach
for streaming to a file in NodeJS.
This example reads a worksheet passed as an argument to the script, pulls the
first worksheet, converts to CSV and writes to SheetJSNodeJStream.csv:
var XLSX = require("xlsx"), fs = require("fs");
/* read file */
var wb = XLSX.readFile(process.argv[2]), {dense: true};
/* get first worksheet */
var ws = wb.Sheets[wb.SheetNames[0]];
/* create CSV stream */
var csvstream = XLSX.stream.to_csv(ws);
/* create output stream */
var ostream = fs.createWriteStream("SheetJSNodeJStream.csv");
/* write data from CSV stream to output file */
csvstream.pipe(ostream);
Object Streams​
to_json uses Object-mode streams2. A Transform stream3 can be used to
generate a text stream for streaming to a file or the screen.
The following example prints data by writing to the process.stdout stream:
var XLSX = require("xlsx"), Transform = require("stream").Transform;
/* read file */
var wb = XLSX.readFile(process.argv[2], {dense: true});
/* get first worksheet */
var ws = wb.Sheets[wb.SheetNames[0]];
/* this Transform stream converts JS objects to text */
var conv = new Transform({writableObjectMode:true});
conv._transform = function(obj, e, cb){ cb(null, JSON.stringify(obj) + "\n"); };
/* pipe `to_json` -> transformer -> standard output */
XLSX.stream.to_json(ws, {raw: true}).pipe(conv).pipe(process.stdout);
BunJS​
BunJS is directly compatible with NodeJS streams.
Great open source software grows with user tests and reports. Any issues should be reported to the Bun project for further diagnosis.
NodeJS Demo​
This demo was tested in the following deployments:
| Node Version | Date | Node Status when tested |
|---|---|---|
0.12.18 | 2024-07-18 | End-of-Life |
4.9.1 | 2024-07-18 | End-of-Life |
6.17.1 | 2024-07-18 | End-of-Life |
8.17.0 | 2024-07-18 | End-of-Life |
10.24.1 | 2024-07-18 | End-of-Life |
12.22.12 | 2024-07-18 | End-of-Life |
14.21.3 | 2024-07-18 | End-of-Life |
16.20.2 | 2024-07-18 | End-of-Life |
18.20.4 | 2024-07-18 | Maintenance LTS |
20.15.1 | 2024-07-18 | Active LTS |
22.5.0 | 2024-07-18 | Current |
While streaming methods work in End-of-Life versions of NodeJS, production deployments should upgrade to a Current or LTS version of NodeJS.
This demo was also tested against BunJS 1.1.18 on 2024-07-18.
- Install the NodeJS module
- NodeJS
- BunJS
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz
bun i --save xlsx@https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz
- Download
SheetJSNodeJStream.js:
curl -LO https://docs.sheetjs.com/stream/SheetJSNodeJStream.js
- Download the test file:
curl -LO https://docs.sheetjs.com/pres.xlsx
- Run the script:
- NodeJS
- BunJS
node SheetJSNodeJStream.js pres.xlsx
bun SheetJSNodeJStream.js pres.xlsx
Expected Output (click to show)
Browser​
Each browser demo was tested in the following environments:
| Browser | Date |
|---|---|
| Chrome 126 | 2024-07-18 |
| Safari 17.4 | 2024-07-18 |
NodeJS streaming APIs are not available in the browser. The following function
supplies a pseudo stream object compatible with the to_csv function:
function sheet_to_csv_cb(ws, cb, opts, batch = 1000) {
XLSX.stream.set_readable(() => ({
__done: false,
// this function will be assigned by the SheetJS stream methods
_read: function() { this.__done = true; },
// this function is called by the stream methods
push: function(d) { if(!this.__done) cb(d); if(d == null) this.__done = true; },
resume: function pump() { for(var i = 0; i < batch && !this.__done; ++i) this._read(); if(!this.__done) setTimeout(pump.bind(this), 0); }
}));
return XLSX.stream.to_csv(ws, opts);
}
// assuming `workbook` is a workbook, stream the first sheet
const ws = workbook.Sheets[workbook.SheetNames[0]];
const strm = sheet_to_csv_cb(ws, (csv)=>{ if(csv != null) console.log(csv); });
strm.resume();
Web Workers​
For processing large files in the browser, it is strongly encouraged to use Web Workers. The Worker demo includes examples using the File System Access API.
Web Worker Details (click to show)
Live Demo​
The following live demo fetches and parses a file in a Web Worker. The to_csv
streaming function is used to generate CSV rows and pass back to the main thread
for further processing.
For Chromium browsers, the File System Access API provides a modern worker-only approach. The Web Workers demo includes a live example of CSV streaming write.
The demo has a URL input box. Feel free to change the URL. For example,
https://raw.githubusercontent.com/SheetJS/test_files/master/large_strings.xls
is an XLS file over 50 MB
https://raw.githubusercontent.com/SheetJS/libreoffice_test-files/master/calc/xlsx-import/perf/8-by-300000-cells.xlsx
is an XLSX file with 300000 rows (approximately 20 MB)
function SheetJSFetchCSVStreamWorker() {
const [__html, setHTML] = React.useState("");
const [state, setState] = React.useState("");
const [cnt, setCnt] = React.useState(0);
const [url, setUrl] = React.useState("https://docs.sheetjs.com/test_files/large_strings.xlsx");
return ( <>
<b>URL: </b><input type="text" value={url} onChange={(e) => setUrl(e.target.value)} size="80"/>
<button onClick={() => {
/* this mantra embeds the worker source in the function */
const worker = new Worker(URL.createObjectURL(new Blob([`\
/* load standalone script from CDN */
importScripts("https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js");
function sheet_to_csv_cb(ws, cb, opts, batch = 1000) {
XLSX.stream.set_readable(() => ({
__done: false,
// this function will be assigned by the SheetJS stream methods
_read: function() { this.__done = true; },
// this function is called by the stream methods
push: function(d) { if(!this.__done) cb(d); if(d == null) this.__done = true; },
resume: function pump() { for(var i = 0; i < batch && !this.__done; ++i) this._read(); if(!this.__done) setTimeout(pump.bind(this), 0); }
}));
return XLSX.stream.to_csv(ws, opts);
}
/* this callback will run once the main context sends a message */
self.addEventListener('message', async(e) => {
try {
postMessage({state: "fetching " + e.data.url});
/* Fetch file */
const res = await fetch(e.data.url);
const ab = await res.arrayBuffer();
/* Parse file */
let len = ab.byteLength;
if(len < 1024) len += " bytes"; else { len /= 1024;
if(len < 1024) len += " KB"; else { len /= 1024; len += " MB"; }
}
postMessage({state: "parsing " + len});
const wb = XLSX.read(ab, {dense: true});
const ws = wb.Sheets[wb.SheetNames[0]];
/* Generate CSV rows */
postMessage({state: "csv"});
const strm = sheet_to_csv_cb(ws, (csv) => {
if(csv != null) postMessage({csv});
else postMessage({state: "done"});
});
strm.resume();
} catch(e) {
/* Pass the error message back */
postMessage({error: String(e.message || e) });
}
}, false);
`])));
/* when the worker sends back data, add it to the DOM */
worker.onmessage = function(e) {
if(e.data.error) return setHTML(e.data.error);
else if(e.data.state) return setState(e.data.state);
setHTML(e.data.csv);
setCnt(cnt => cnt+1);
};
setCnt(0); setState("");
/* post a message to the worker with the URL to fetch */
worker.postMessage({url});
}}><b>Click to Start</b></button>
<pre>State: <b>{state}</b><br/>Number of rows: <b>{cnt}</b></pre>
<pre dangerouslySetInnerHTML={{ __html }}/>
</> );
}
Deno​
Deno does not support NodeJS streams in normal execution, so a wrapper is used:
// @deno-types="https://cdn.sheetjs.com/xlsx-0.20.3/package/types/index.d.ts"
import { stream } from 'https://cdn.sheetjs.com/xlsx-0.20.3/package/xlsx.mjs';
/* Callback invoked on each row (string) and at the end (null) */
const csv_cb = (d:string|null) => {
if(d == null) return;
/* The strings include line endings, so raw write ops should be used */
Deno.stdout.write(new TextEncoder().encode(d));
};
/* Prepare `Readable` function */
const Readable = () => ({
__done: false,
// this function will be assigned by the SheetJS stream methods
_read: function() { this.__done = true; },
// this function is called by the stream methods
push: function(d: any) {
if(!this.__done) csv_cb(d);
if(d == null) this.__done = true;
},
resume: function pump() {
for(var i = 0; i < 1000 && !this.__done; ++i) this._read();
if(!this.__done) setTimeout(pump.bind(this), 0);
}
})
/* Wire up */
stream.set_readable(Readable);
/* assuming `workbook` is a workbook, stream the first sheet */
const ws = workbook.Sheets[workbook.SheetNames[0]];
stream.to_csv(wb.Sheets[wb.SheetNames[0]]).resume();
This demo was last tested on 2024-07-18 against Deno 1.45.2.
SheetJSDenoStream.ts is a small
example script that downloads https://docs.sheetjs.com/pres.numbers and prints
CSV row objects.
- Run the script:
deno run -A https://docs.sheetjs.com/stream/SheetJSDenoStream.ts
This script will fetch pres.numbers and
generate CSV rows. The result will be printed to the terminal window.
Footnotes​
-
See
fs.createWriteStreamin the NodeJS documentation. ↩ -
See "Object mode" in the NodeJS documentation. ↩