Data Import

The "Import Tutorial" is a gentle introduction to data import and analysis.

Parsing Workbooks

API

Extract data from spreadsheet bytes

var workbook = XLSX.read(data, opts);

The read method can extract data from spreadsheet bytes stored in a JS string, "binary string", NodeJS buffer or typed array (Uint8Array or ArrayBuffer).
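
Because a JS string can hold several different encodings, it can help to pass an explicit type option. The following is a minimal sketch, not part of the library: readOptionsFor is a hypothetical helper, and the type values it returns ("base64", "binary", "string", "buffer", "array") are assumed to match the ones listed in "Parsing Options".

```javascript
// Hypothetical helper (not part of the library): choose an explicit `type`
// option for XLSX.read based on how the spreadsheet bytes are stored.
function readOptionsFor(data, stringKind) {
  if (typeof data === "string") {
    // A JS string may hold Base64 text, a "binary string" or plain text
    // (such as CSV or HTML), so the caller has to say which one it is.
    const kinds = ["base64", "binary", "string"];
    if (!kinds.includes(stringKind)) {
      throw new TypeError("string input needs one of: " + kinds.join(", "));
    }
    return { type: stringKind };
  }
  // Check Buffer first: NodeJS buffers are also Uint8Array instances.
  if (typeof Buffer !== "undefined" && Buffer.isBuffer(data)) {
    return { type: "buffer" };
  }
  if (data instanceof Uint8Array || data instanceof ArrayBuffer) {
    return { type: "array" };
  }
  throw new TypeError("unsupported input for XLSX.read");
}

// Usage (assumes XLSX is already loaded):
//   var workbook = XLSX.read(data, readOptionsFor(data));
//   var workbook = XLSX.read(b64, readOptionsFor(b64, "base64"));
```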

Read spreadsheet bytes from a local file and extract data

var workbook = XLSX.readFile(filename, opts);

The readFile method attempts to read a spreadsheet file at the supplied path.

The second opts argument is optional. "Parsing Options" covers the supported properties and behaviors.

Browsers generally do not allow reading files by specifying filename (it is a security risk), and running XLSX.readFile in the browser will throw an error.

Deno scripts must be invoked with --allow-read to read from the filesystem.

Examples

Here are a few common scenarios.

The demos cover special deployments in more detail.

Example: Local File

XLSX.readFile supports reading local files in platforms like NodeJS. In other platforms like React Native, XLSX.read should be called with file data.

In-browser processing, where users drag and drop files or use a file input element, is covered in the "User Submissions" example.

readFile uses fs.readFileSync under the hood:

var XLSX = require("xlsx");

var workbook = XLSX.readFile("test.xlsx");

For Node ESM, fs must be loaded manually:

import * as fs from "fs";
import { readFile, set_fs } from "xlsx";
set_fs(fs);

const workbook = readFile("test.xlsx");
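
On platforms where readFile is unavailable, the same result can be reached by reading the bytes first and handing them to XLSX.read. The following NodeJS sketch illustrates the idea; readWorkbookFromPath is a hypothetical helper, and the library object is passed in as a parameter purely so the function does not depend on how XLSX was loaded.

```javascript
const fs = require("fs");

// Hypothetical helper: read the raw bytes with fs.readFileSync and hand
// them to XLSX.read. `XLSX` is the library object, passed in by the caller.
function readWorkbookFromPath(XLSX, path) {
  const bytes = fs.readFileSync(path); // bytes is a NodeJS Buffer
  return XLSX.read(bytes, { type: "buffer" });
}

// Usage (assumes the library was loaded with require("xlsx")):
//   var workbook = readWorkbookFromPath(require("xlsx"), "test.xlsx");
```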

Example: User Submissions

This example focuses on user-submitted files through a drag-and-drop event, HTML file input element, or network request.

For modern websites targeting Chrome 76+, File#arrayBuffer is recommended.

Assume drop_dom_element is the DOM element that will listen for drop events:

<div id="drop_dom_element">Drop files here</div>

In a drop event, the submitted files are available through the e.dataTransfer property (a file input element exposes them through e.target instead):

// XLSX is a global from the standalone script

async function handleDropAsync(e) {
  e.stopPropagation(); e.preventDefault();
  const f = e.dataTransfer.files[0];
  /* f is a File */
  const data = await f.arrayBuffer();
  /* data is an ArrayBuffer */
  const workbook = XLSX.read(data);

  /* DO SOMETHING WITH workbook HERE */
}
drop_dom_element.addEventListener("drop", handleDropAsync, false);
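
The file input variant differs mainly in where the File object comes from. A minimal sketch, assuming a hypothetical element <input type="file" id="input_dom_element"> on the page and XLSX loaded as a global:

```javascript
// XLSX is a global from the standalone script

async function handleFileAsync(e) {
  /* a file input exposes the selected files on e.target.files */
  const f = e.target.files[0];
  if (!f) return; // no file was selected
  const data = await f.arrayBuffer();
  /* data is an ArrayBuffer */
  const workbook = XLSX.read(data);

  /* DO SOMETHING WITH workbook HERE */
  return workbook;
}

// Register the listener only when a DOM is available (in a browser).
// "input_dom_element" is a hypothetical id used for illustration.
if (typeof document !== "undefined") {
  const input_dom_element = document.getElementById("input_dom_element");
  if (input_dom_element) input_dom_element.addEventListener("change", handleFileAsync, false);
}
```

Returning the workbook from the handler is a convenience for callers and tests; the event system ignores the return value.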

https://oss.sheetjs.com/sheetjs/ demonstrates the FileReader technique.

For maximal compatibility (IE10+), the FileReader approach is recommended. As in the previous example, drop_dom_element is the DOM element listening for drop events and the files are read from e.dataTransfer:

function handleDrop(e) {
  e.stopPropagation(); e.preventDefault();
  var f = e.dataTransfer.files[0];
  /* f is a File */
  var reader = new FileReader();
  reader.onload = function(e) {
    var data = e.target.result;
    /* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
    var workbook = XLSX.read(data);

    /* DO SOMETHING WITH workbook HERE */
  };
  reader.readAsArrayBuffer(f);
}
drop_dom_element.addEventListener("drop", handleDrop, false);

The oldie demo shows an IE-compatible fallback scenario.

Example: Remote File

This example focuses on fetching files ("Ajax" in browser parlance) using APIs like XMLHttpRequest and fetch as well as third-party libraries.

For modern websites targeting Chrome 42+, fetch is recommended:

// XLSX is a global from the standalone script

(async() => {
  const url = "https://docs.sheetjs.com/pres.xlsx";
  const data = await (await fetch(url)).arrayBuffer();
  /* data is an ArrayBuffer */
  const workbook = XLSX.read(data);

  /* DO SOMETHING WITH workbook HERE */
})();
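
The snippet above passes whatever the server returns to XLSX.read, including the body of an error page. Since fetch only rejects on network failures, a status check can be added before parsing. The following is a sketch under stated assumptions: fetchWorkbook is a hypothetical helper, and both the library object and the fetch implementation are parameters so the function can run in browsers and in runtimes that provide a global fetch.

```javascript
// Hypothetical helper: download a spreadsheet and parse it, failing early
// when the server answers with an HTTP error status.
async function fetchWorkbook(XLSX, url, fetchImpl) {
  const doFetch = fetchImpl || fetch; // default to the global fetch
  const res = await doFetch(url);
  // fetch resolves even for 404 or 500 responses, so check explicitly
  if (!res.ok) throw new Error("download failed: HTTP " + res.status);
  const data = await res.arrayBuffer();
  /* data is an ArrayBuffer */
  return XLSX.read(data);
}

// Usage (assumes XLSX is a global from the standalone script):
//   const workbook = await fetchWorkbook(XLSX, "https://docs.sheetjs.com/pres.xlsx");
```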

For broader support, the XMLHttpRequest approach is recommended:

var url = "https://docs.sheetjs.com/pres.xlsx";

/* set up async GET request */
var req = new XMLHttpRequest();
req.open("GET", url, true);
req.responseType = "arraybuffer";

req.onload = function(e) {
  var workbook = XLSX.read(req.response);

  /* DO SOMETHING WITH workbook HERE */
};

req.send();

The HTTP Downloads demo includes examples using browser APIs and wrapper libraries.

https://oss.sheetjs.com/sheetjs/ajax.html shows fallback approaches for IE6+.

Example: Readable Streams

The recommended approach is to buffer streams in memory and process once all of the data has been collected. A proper streaming parse is technically impossible.

Technical details

XLSX, XLSB, NUMBERS, and ODS files are ultimately ZIP files that contain binary and XML entries. The ZIP file format stores the table of contents ("end of central directory" record) at the end of the file, so a proper parse of a ZIP file requires scanning from the end. Streams do not provide random access into the data, so the only correct approach involves buffering the entire stream.

XLS, XLR, QPW, and Works 4 for Mac files use the "Compound File Binary Format". It is a container format that can hold multiple "files" and "folders". It also has a table of contents ("directory sectors") but these can be placed anywhere in the file! The only correct approach involves buffering enough of the stream to find the full table of contents, but the added complexity has little benefit when testing against real-world files generated by various versions of Excel and other tools.
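
To illustrate why ZIP-based formats need the whole file, the following sketch locates the "end of central directory" record by scanning backwards from the last byte. It is a simplified illustration and not the parser used by the library: findEOCD is a hypothetical name, ZIP64 archives are ignored, and the sketch does not guard against the signature bytes appearing inside an archive comment.

```javascript
// Locate the ZIP "end of central directory" (EOCD) record by scanning
// backwards from the end of the data. Returns null if none is found.
function findEOCD(u8) {
  const MIN_EOCD = 22; // size of the record without the optional comment
  for (let i = u8.length - MIN_EOCD; i >= 0; --i) {
    // signature bytes "PK\x05\x06"
    if (u8[i] === 0x50 && u8[i + 1] === 0x4b && u8[i + 2] === 0x05 && u8[i + 3] === 0x06) {
      const dv = new DataView(u8.buffer, u8.byteOffset + i, MIN_EOCD);
      return {
        offset: i,                        // where the record starts
        entries: dv.getUint16(10, true),  // total number of entries
        cdSize: dv.getUint32(12, true),   // central directory size in bytes
        cdOffset: dv.getUint32(16, true)  // central directory start offset
      };
    }
  }
  return null;
}
```

The loop cannot start until the final bytes are available, which is the reason a stream has to be buffered completely before parsing.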

When dealing with ReadableStream, the easiest approach is to buffer the stream and process the whole thing at the end:

// XLSX is a global from the standalone script

async function buffer_RS(stream) {
  /* collect data */
  const buffers = [];
  const reader = stream.getReader();
  for(;;) {
    const res = await reader.read();
    if(res.value) buffers.push(res.value);
    if(res.done) break;
  }

  /* concat */
  const out = new Uint8Array(buffers.reduce((acc, v) => acc + v.length, 0));

  let off = 0;
  for(const u8 of buffers) {
    out.set(u8, off);
    off += u8.length;
  }

  return out;
}

const data = await buffer_RS(stream);
/* data is Uint8Array */
const workbook = XLSX.read(data);
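
The same buffering strategy applies to NodeJS Readable streams, which are async iterable. A minimal sketch, assuming the stream yields Buffer chunks (no encoding has been set on it); buffer_node_stream is a hypothetical helper name:

```javascript
// Collect a NodeJS Readable stream into a single Buffer.
// Assumes every chunk is a Buffer (the stream has no encoding set).
async function buffer_node_stream(stream) {
  const chunks = [];
  /* collect data */
  for await (const chunk of stream) chunks.push(chunk);
  /* concat */
  return Buffer.concat(chunks);
}

// Usage (assumes XLSX is loaded and `stream` is a Readable):
//   const data = await buffer_node_stream(stream);
//   const workbook = XLSX.read(data);
```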

More detailed examples are covered in the included demos.

Processing JSON and JS Data

JSON and JS data tend to represent single worksheets. This section will use a few utility functions to generate workbooks.

Create a new Workbook

var workbook = XLSX.utils.book_new();

The book_new utility function creates an empty workbook with no worksheets.

API

Create a worksheet from an array of arrays of JS values

var worksheet = XLSX.utils.aoa_to_sheet(aoa, opts);

The aoa_to_sheet utility function walks an "array of arrays" in row-major order, generating a worksheet object. The following snippet generates a sheet with cell A1 set to the string A1, cell B1 set to B1, etc:

var worksheet = XLSX.utils.aoa_to_sheet([
["A1", "B1", "C1"],
["A2", "B2", "C2"],
["A3", "B3", "C3"]
]);
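
To make the row-major walk concrete, the following sketch builds a comparable object by hand. It is a simplified illustration and not the library's implementation: the function names are hypothetical, only strings, numbers and booleans are handled, and it assumes the worksheet layout of A1-style keys mapping to cell objects with a type t and a value v, plus a "!ref" range.

```javascript
// Convert a zero-based column index to letters (0 -> "A", 26 -> "AA")
function encode_col_sketch(c) {
  let s = "";
  for (++c; c > 0; c = Math.floor((c - 1) / 26)) {
    s = String.fromCharCode(((c - 1) % 26) + 65) + s;
  }
  return s;
}

// Walk the array of arrays row by row, then column by column
function aoa_to_sheet_sketch(aoa) {
  const ws = {};
  let maxCol = 0;
  aoa.forEach(function (row, R) {
    row.forEach(function (value, C) {
      if (value == null) return; // skip null and undefined entries
      const t = typeof value === "number" ? "n" : typeof value === "boolean" ? "b" : "s";
      ws[encode_col_sketch(C) + (R + 1)] = { t: t, v: t === "s" ? String(value) : value };
      if (C > maxCol) maxCol = C;
    });
  });
  // record the covered range, e.g. "A1:C3" for a 3x3 input
  if (aoa.length > 0) ws["!ref"] = "A1:" + encode_col_sketch(maxCol) + aoa.length;
  return ws;
}
```

For real workbooks, XLSX.utils.aoa_to_sheet should be used; the sketch only shows how positions in the nested arrays map to cell addresses.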

"Array of Arrays Input" describes the function and the optional opts argument in more detail.

Create a worksheet from an array of JS objects

var worksheet = XLSX.utils.json_to_sheet(jsa, opts);

The json_to_sheet utility function walks an array of JS objects in order, generating a worksheet object. By default, it will generate a header row and one row per object in the array. The optional opts argument has settings to control the column order and header output.
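
The default layout (a header row followed by one row per object) can be pictured as a conversion to an array of arrays. The following sketch is a hypothetical helper written for illustration, not the library's code; the optional header parameter is an assumption modelled on the column-order setting mentioned above.

```javascript
// Hypothetical helper: flatten an array of objects into a header row
// followed by one row per object.
function objects_to_aoa(rows, header) {
  // Column order: keys listed in `header` first, then any remaining keys
  // in order of first appearance
  const keys = header ? header.slice() : [];
  for (const row of rows) {
    for (const k of Object.keys(row)) {
      if (!keys.includes(k)) keys.push(k);
    }
  }
  // Missing properties become null so every row has the same width
  const body = rows.map(function (row) {
    return keys.map(function (k) { return k in row ? row[k] : null; });
  });
  return [keys].concat(body);
}

// Usage (assumes XLSX is already loaded):
//   var worksheet = XLSX.utils.aoa_to_sheet(objects_to_aoa(jsa));
```

In practice json_to_sheet performs this conversion directly, so the helper is mainly useful for inspecting or adjusting the rows before a worksheet is generated.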

"Array of Objects Input" describes the function and the optional opts argument in more detail.

Examples

"Export Tutorial" contains a detailed example of fetching data from a JSON Endpoint and generating a workbook.

x-spreadsheet is an interactive data grid for previewing and modifying structured data in the web browser.

"TensorFlow.js" covers strategies for creating worksheets from ML library exports (datasets stored in Typed Arrays).

Records from a database query (SQL or no-SQL)

The data demo includes examples of working with databases and query results.

Processing HTML Tables

API

Create a worksheet by scraping an HTML TABLE in the page

var worksheet = XLSX.utils.table_to_sheet(dom_element, opts);

The table_to_sheet utility function takes a DOM TABLE element and iterates through the rows to generate a worksheet. The opts argument is optional. "HTML Table Input" describes the function in more detail.

Create a workbook by scraping an HTML TABLE in the page

var workbook = XLSX.utils.table_to_book(dom_element, opts);

The table_to_book utility function follows the same logic as table_to_sheet. After generating a worksheet, it creates a blank workbook and appends the worksheet.

The options argument supports the same options as table_to_sheet, with the addition of a sheet property to control the worksheet name. If the property is missing or no options are specified, the default name Sheet1 is used.
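
As a small illustration of the sheet property, the sketch below wraps table_to_book in a hypothetical helper. The library object is passed in as a parameter only so that the function makes no assumption about how XLSX was loaded.

```javascript
// Hypothetical helper: create a workbook from a TABLE element and name the
// worksheet. When `name` is omitted, no sheet option is passed and the
// default name is used.
function table_to_named_book(XLSX, table, name) {
  var opts = name ? { sheet: name } : {};
  return XLSX.utils.table_to_book(table, opts);
}

// Usage in a browser (assumes XLSX is a global from the standalone script):
//   var workbook = table_to_named_book(XLSX, document.getElementById("tableau"), "Export");
```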

Examples

The Headless Demo includes examples of server-side spreadsheet generation from HTML TABLE elements using headless Chromium ("Puppeteer") and other browsers ("Playwright").

Here are a few common scenarios:

HTML TABLE element in a webpage

<!-- include the standalone script and shim -->
<script src="https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/shim.min.js"></script>
<script src="https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js"></script>

<!-- example table with id attribute -->
<table id="tableau">
<tr><td>Sheet</td><td>JS</td></tr>
<tr><td>12345</td><td>67</td></tr>
</table>

<!-- this block should appear after the table HTML and the standalone script -->
<script type="text/javascript">
var workbook = XLSX.utils.table_to_book(document.getElementById("tableau"));

/* DO SOMETHING WITH workbook HERE */
XLSX.writeFile(workbook, "SheetJSHTMLExport.xlsx");
</script>

Multiple tables on a web page can be converted to individual worksheets:

/* create new workbook */
var workbook = XLSX.utils.book_new();

/* convert table "table1" to worksheet named "Sheet1" */
var sheet1 = XLSX.utils.table_to_sheet(document.getElementById("table1"));
XLSX.utils.book_append_sheet(workbook, sheet1, "Sheet1");

/* convert table "table2" to worksheet named "Sheet2" */
var sheet2 = XLSX.utils.table_to_sheet(document.getElementById("table2"));
XLSX.utils.book_append_sheet(workbook, sheet2, "Sheet2");

/* workbook now has 2 worksheets */

Alternatively, the HTML code can be extracted and parsed:

var htmlstr = document.getElementById("tableau").outerHTML;
var workbook = XLSX.read(htmlstr, {type:"string"});

Chrome/Chromium Extension

The "Chrome and Chromium" demo includes a complete example and enumerates the required permissions and other settings.

In an extension, it is recommended to generate the workbook in a content script and pass the object back to the extension:

/* in the content script */
chrome.runtime.onMessage.addListener(function(msg, sender, cb) {
  /* pass a message like { sheetjs: true } from the extension to scrape */
  if(!msg || !msg.sheetjs) return;
  /* create a new workbook */
  var workbook = XLSX.utils.book_new();
  /* loop through each table element */
  var tables = document.getElementsByTagName("table");
  for(var i = 0; i < tables.length; ++i) {
    var worksheet = XLSX.utils.table_to_sheet(tables[i]);
    XLSX.utils.book_append_sheet(workbook, worksheet, "Table" + i);
  }
  /* pass back to the extension */
  return cb(workbook);
});

NodeJS HTML Tables without a browser

NodeJS does not include a DOM implementation and Puppeteer requires a hefty Chromium build. The "Synthetic DOM" demo includes examples for NodeJS.