Import Tutorial
Many government agencies distribute official data and statistics in workbooks. SheetJS libraries help translate these files to useful information.
The goal of this example is to process Federal Student Aid Portfolio data from a XLS worksheet. We will download and parse a workbook from the US Department of Education. Once the raw data is parsed, we will extract the total outstanding dollar amount and display the data in a table.
The "Live Demo" section includes a working demo in this page! "Run the Demo Locally" shows how to run the workflow in iOS / Android apps, desktop apps, NodeJS scripts and other environments.
The following sequence diagram shows the process:
sequenceDiagram
actor U as User
participant P as Page
participant A as Site
U->>P: load site
P->>A: fetch file
A->>P: raw file
Note over P: parse file
Note over P: process data
Note over P: generate table
P->>U: show table
Download File​
The raw data is available in a XLS workbook1. It has been mirrored at https://docs.sheetjs.com/PortfolioSummary.xls
This official dataset is distributed in XLS workbooks.
SheetJS supports a number of legacy and modern formats, ensuring that historical data is not lost in the sands of time.
Downloading the file is straightforward with fetch:
const url = "https://docs.sheetjs.com/PortfolioSummary.xls";
const file = await (await fetch(url)).arrayBuffer();
Code Explanation (click to show)
The file data is stored in an ArrayBuffer.
Parse File​
With the file data in hand, XLSX.read2 parses the workbook:
const workbook = XLSX.read(file);
The workbook object follows the "Common Spreadsheet Format"3, an in-memory
format for representing workbooks, worksheets, cells, and spreadsheet features.
Explore Dataset​
Spreadsheets in the wild use many different inconsistent conventions.
To determine how to process the data, it is best to inspect the file first.
List Sheet Names​
As explained in the "Workbook Object"4 section, the SheetNames property is
a ordered list of the sheet names in the workbook.
The following live code block displays an ordered list of the sheet names:
function SheetJSheetNames() {
const [names, setNames] = React.useState([]);
React.useEffect(() => { (async() =>{
/* parse workbook */
const url = "https://docs.sheetjs.com/PortfolioSummary.xls";
const file = await (await fetch(url)).arrayBuffer();
const workbook = XLSX.read(file);
/* display sheet names */
setNames(workbook.SheetNames);
})(); }, []);
return ( <>
<b>Sheet Names</b><br/>
<ol start={0}>{names.map(n => (<li>{n}</li>))}</ol>
</> )
}
Inspect Worksheet Data​
The Sheets property of the workbook object5 is an object whose keys are
sheet names and whose values are sheet objects. For example, the first worksheet
is pulled by indexing SheetNames and using the name to index Sheets:
var first_sheet = workbook.Sheets[workbook.SheetNames[0]];
The actual worksheet object can be inspected directly6, but it is strongly recommended to use utility functions to present JS-friendly data structures.
Preview HTML​
The sheet_to_html utility function7 generates an HTML table from worksheet
objects. The following live example shows the first 20 rows of data in a table:
Live example (click to show)
The key points from looking at the table are:
- The data starts on row 7
- Rows 5 and 6 are the header rows, with merged cells for common titles
- For yearly data (2007-2012), columns A and B are merged
- For quarterly data (2013Q1 and later), column A stores the year. Cells may be merged vertically to span 4 quarters
Extract Data​
Extract Raw Data​
XLSX.utils.sheet_to_json8 generates arrays of data from worksheet objects.
For a complex layout like this, it is easiest to generate an "array of arrays" where each row is an array of cell values. The screenshot shows rows 5-8:

In the array of arrays, row 5 has a number of gaps corresponding to empty cells and cells that are covered in the merge ranges:
// Row 5 -- the gaps correspond to cells with no content
[ , , "Direct Loans", , "Federal Family Education Loans (FFEL)", , "Perkins Loans", , "Total1" ]
Row 7 includes the data for FY2007:
// Row 7 -- column B is covered by the merge
[ 2007, , 106.8, 7, 401.9, 22.6, 8.2, 2.8, 516, 28.3 ]
XLSX.utils.sheet_to_json will generate an array of arrays if the option
header: 1 is specified9:
const worksheet = workbook.Sheets[workbook.SheetNames[0]];
const raw_data = XLSX.utils.sheet_to_json(worksheet, {header: 1});
Fill Merged Blocks​
Cells A13:A16 are merged:

The merged data only applies to the top-left cell (A13). The array of arrays
will have holes in cells A14:A16 (written as null):
// Row 13
[2013, "Q1", 508.7, 23.4, 444.9, 22.1, 8.2, 3, 961.9, 38.7]
// Row 14
[null, "Q2", 553, 24.1, 437, 21.6, 8.3, 3, 998.6, 38.9]
// Row 15
[null, "Q3", 569.2, 24.3, 429.5, 21.2, 8.2, 2.9, 1006.8, 38.7]
// Row 16
[null, "Q4", 609.1, 25.6, 423, 20.9, 8.1, 2.9, 1040.2, 39.6]
Live example (click to show)
The worksheet !merges property10 includes every merge range in the sheet.
It is possible to loop through every merge block and fill cells, but in this
case it is easier to post-process the raw data:
let last_year = 0;
raw_data.forEach(r => last_year = r[0] = (r[0] != null ? r[0] : last_year));
JavaScript code can be extremely concise. The "Code Explanation" blocks explain the code in more detail.
Code Explanation (click to show)
After post-processing, the rows now have proper year fields:
// Row 13
[2013, "Q1", 508.7, 23.4, 444.9, 22.1, 8.2, 3, 961.9, 38.7]
// Row 14
[2013, "Q2", 553, 24.1, 437, 21.6, 8.3, 3, 998.6, 38.9]
// Row 15
[2013, "Q3", 569.2, 24.3, 429.5, 21.2, 8.2, 2.9, 1006.8, 38.7]
// Row 16
[2013, "Q4", 609.1, 25.6, 423, 20.9, 8.1, 2.9, 1040.2, 39.6]
Live example (click to show)
Select Data Rows​
At this point, each data row will have the year in column A and dollar value
in column C. The year will be between 2007 and 2024 and the value will be
positive. The following function tests a data row:
const is_valid_row = r =>
r[0] >= 2007 && r[0] <= 2024 // year (column A) is between 2007 and 2024
&& r[2] > 0; // dollar value (column C) is positive
Array#filter, using the previous test, can select the matching rows:
const rows = raw_data.filter(r => r[0] >= 2007 && r[0] <= 2024 && r[2] > 0);
Live example (click to show)
Generate Row Objects​
Looking at the headers:

The desired data is in column I. The column index can be calculated using
XLSX.utils.decode_col11.
Column Index calculation (click to show)
The desired columns are:
| Column | Description | Property in Object |
|---|---|---|
| A / 0 | Fiscal Year | FY |
| B / 1 | Fiscal Quarter (if applicable) | FQ |
| I / 8 | Total Dollars Outstanding | total |
An Array#map over the data can generate the desired row objects:
const objects = rows.map(r => ({FY: r[0], FQ: r[1], total: r[8]}));
This will generate an array of row objects. Each row object will look like the following row:
// 2016 Q1 - $1220.3 (billion)
{ "FY": 2016, "FQ": "Q1", "total": 1220.3 }
Live example (click to show)
Present Data​
At this point, objects is an array of objects.
ReactJS​
The live demos in this example use ReactJS. In ReactJS, arrays of objects are best presented in simple HTML tables12:
<table>
<thead><tr><th>Fiscal Year</th><th>Quarter</th><th>Total (in $B)</th></tr></thead>
<tbody>
{objects.map((o,R) => ( <tr key={R}>
<td>{o.FY}</td>
<td>{o.FQ}</td>
<td>{o.total}</td>
</tr>))}
</tbody>
</table>
Vanilla JS​
https://sheetjs.com/sl.html is a hosted version of this demo.
Without a framework, HTML table row elements can be programmatically created
with document.createElement and added to the table body element. For example,
if the page has a stub table:
<table>
<thead><tr><th>Fiscal Year</th><th>Quarter</th><th>Total (in $B)</th></tr></thead>
<tbody id="tbody"></tbody>
</table>
TR elements can be added to the table body using appendChild:
/* add rows to table body */
objects.forEach(o => {
const row = document.createElement("TR");
row.innerHTML = `<td>${o.FY}</td><td>${o.FQ||""}</td><td>${o.total}</td>`;
tbody.appendChild(row);
});
Command-Line Tools​
In the command line, there are ways to display data in a table:
FY FQ Total
-- -- -----
2007 516
2013 Q1 961.9
For data pipelines, tab-separated rows are strongly recommended:
/* print header row*/
console.log(`FY\tFQ\tTotal`);
/* print tab-separated values */
objects.forEach(o => {
console.log(`${o.FY}\t${o.FQ||""}\t${o.total}`);
});
Live Demo​
This demo runs in the web browser! It should automatically fetch the data file and display a table.
This example includes a row count that can be increased or decreased
function StudentAidTotal() {
const [rows, setRows] = React.useState([]);
const [num, setNum] = React.useState(5);
React.useEffect(() => { (async() =>{
/* parse workbook */
const url = "https://docs.sheetjs.com/PortfolioSummary.xls";
const workbook = XLSX.read(await (await fetch(url)).arrayBuffer());
/* get first worksheet */
const worksheet = workbook.Sheets[workbook.SheetNames[0]];
const raw_data = XLSX.utils.sheet_to_json(worksheet, {header:1});
/* fill years */
var last_year = 0;
raw_data.forEach(r => last_year = r[0] = (r[0] != null ? r[0] : last_year));
/* select data rows */
const rows = raw_data.filter(r => r[0] >= 2007 && r[0] <= 2024 && r[2] > 0);
/* generate row objects */
const objects = rows.map(r => ({FY: r[0], FQ: r[1], total: r[8]}));
/* display data */
setRows(objects);
})(); }, []);
return ( <>
<a onClick={()=>{setNum(Math.max(num-5,0))}}>Show Less </a>
<b>Showing {num} rows </b>
<a onClick={()=>{setNum(num+5)}}>Show More</a>
<table>
<thead><tr><th>Fiscal Year</th><th>Quarter</th><th>Total (in $B)</th></tr></thead>
<tbody>
{rows.slice(0, num).map((o,R) => ( <tr key={R}>
<td>{o.FY}</td>
<td>{o.FQ}</td>
<td>{o.total}</td>
</tr>))}
</tbody>
</table>
</> );
}
Run the Demo Locally​
- Web Browser
- Command-Line (NodeJS)
- Desktop App
- Mobile App
Save the following script to SheetJSStandaloneDemo.html:
<body>
<table>
<thead><tr><th>Fiscal Year</th><th>Quarter</th><th>Total (in $B)</th></tr></thead>
<tbody id="tbody"></tbody>
</table>
<script src="https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js"></script>
<script>
(async() => {
/* parse workbook */
const url = "https://docs.sheetjs.com/PortfolioSummary.xls";
const workbook = XLSX.read(await (await fetch(url)).arrayBuffer());
/* get first worksheet */
const worksheet = workbook.Sheets[workbook.SheetNames[0]];
const raw_data = XLSX.utils.sheet_to_json(worksheet, {header:1});
/* fill years */
var last_year = 0;
raw_data.forEach(r => last_year = r[0] = (r[0] != null ? r[0] : last_year));
/* select data rows */
const rows = raw_data.filter(r => r[0] >= 2007 && r[0] <= 2024 && r[2] > 0);
/* generate row objects */
const objects = rows.map(r => ({FY: r[0], FQ: r[1], total: r[8]}));
/* add rows to table body */
objects.forEach(o => {
const row = document.createElement("TR");
row.innerHTML = ``;
tbody.appendChild(row);
});
})();
</script>
</body>
After saving the file, run a local web server in the folder with the HTML file. For example, if NodeJS is installed:
npx http-server .
The server process will display a URL (typically http://127.0.0.1:8080). Open
http://127.0.0.1:8080/SheetJSStandaloneDemo.html in your browser.
Install the dependencies:
- NodeJS
- Bun
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz
bun install https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz
Save the following script to SheetJSNodeJS.js:
const XLSX = require("xlsx");
(async() => {
/* parse workbook */
const url = "https://docs.sheetjs.com/PortfolioSummary.xls";
const workbook = XLSX.read(await (await fetch(url)).arrayBuffer());
/* get first worksheet */
const worksheet = workbook.Sheets[workbook.SheetNames[0]];
const raw_data = XLSX.utils.sheet_to_json(worksheet, {header:1});
/* fill years */
var last_year = 0;
raw_data.forEach(r => last_year = r[0] = (r[0] != null ? r[0] : last_year));
/* select data rows */
const rows = raw_data.filter(r => r[0] >= 2007 && r[0] <= 2024 && r[2] > 0);
/* generate row objects */
const objects = rows.map(r => ({FY: r[0], FQ: r[1], total: r[8]}));
/* print header row*/
console.log(`FY\tQ\tTotal`);
/* print tab-separated values */
objects.forEach(o => {
console.log(`${o.FY}\t${o.FQ||""}\t${o.total}`);
});
})();
After saving the script, run the script:
- NodeJS
- Bun
node SheetJSNodeJS.js
bun run SheetJSNodeJS.js
This script will print the rows in tab-separated values (TSV) format:
FY Q Total
2007 516
2008 577
...
2013 Q1 961.9
2013 Q2 998.6
2013 Q3 1006.8
...
Save the following script to SheetJSNW.html:
<body>
<table>
<thead><tr><th>Fiscal Year</th><th>Quarter</th><th>Total (in $B)</th></tr></thead>
<tbody id="tbody"></tbody>
</table>
<script src="https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js"></script>
<script>
(async() => {
/* parse workbook */
const url = "https://docs.sheetjs.com/PortfolioSummary.xls";
const workbook = XLSX.read(await (await fetch(url)).arrayBuffer());
/* get first worksheet */
const worksheet = workbook.Sheets[workbook.SheetNames[0]];
const raw_data = XLSX.utils.sheet_to_json(worksheet, {header:1});
/* fill years */
var last_year = 0;
raw_data.forEach(r => last_year = r[0] = (r[0] != null ? r[0] : last_year));
/* select data rows */
const rows = raw_data.filter(r => r[0] >= 2007 && r[0] <= 2024 && r[2] > 0);
/* generate row objects */
const objects = rows.map(r => ({FY: r[0], FQ: r[1], total: r[8]}));
/* add rows to table body */
objects.forEach(o => {
const row = document.createElement("TR");
row.innerHTML = ``;
tbody.appendChild(row);
});
})();
</script>
</body>
Save the following to package.json:
{
"name": "sheetjs-nwjs",
"author": "sheetjs",
"version": "0.0.0",
"main": "SheetJSNW.html",
"dependencies": {
"nw": "0.77.0",
"xlsx": "https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz"
}
}
Install dependencies and run:
npm i
npx nw .
The app will show the data in a table.
Follow the Environment Setup of the React Native documentation before testing the demo.
For Android testing, React Native requires Java 11. It will not work with current Java releases.
In React Native, there are a number of ways to display rows of data. This demo
uses the native FlatList component.
Create a new project by running the following commands in the Terminal:
npx react-native@0.72.4 init SheetJSSL --version="0.72.4"
cd SheetJSSL
npm i -S https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz
Save the following to App.tsx in the project:
import React, { useState } from 'react';
import { Alert, Button, SafeAreaView, Text, View, FlatList } from 'react-native';
import { utils, version, read } from 'xlsx';
const Item = ({FY, FQ, total}) => (
<View style={{borderColor: "#000000", borderWidth: 1}}>
<Text style={{fontSize: 12}}>{String(FY)} {String(FQ||"")} : ${String(total)} B</Text>
</View>
);
const App = () => {
const [rows, setRows] = React.useState([]);
React.useEffect(() => { (async() =>{
/* parse workbook */
const url = "https://docs.sheetjs.com/PortfolioSummary.xls";
const workbook = read(await (await fetch(url)).arrayBuffer());
/* get first worksheet */
const worksheet = workbook.Sheets[workbook.SheetNames[0]];
const raw_data = utils.sheet_to_json(worksheet, {header:1});
/* fill years */
var last_year = 0;
raw_data.forEach(r => last_year = r[0] = (r[0] != null ? r[0] : last_year));
/* select data rows */
const rows = raw_data.filter(r => r[0] >= 2007 && r[0] <= 2024 && r[2] > 0);
/* generate row objects */
const objects = rows.map(r => ({FY: r[0], FQ: r[1], total: r[8]}));
/* display data */
setRows(objects);
})(); }, []);
return ( <SafeAreaView>
<View style={{ marginTop: 32, padding: 24 }}>
<Text style={{ fontSize: 24, fontWeight: 'bold' }}>SheetJS {version} Import Demo</Text>
<FlatList
data={rows}
renderItem={({item}) => <Item FQ={item.FQ} FY={item.FY} total={item.total} />}
keyExtractor={item => String(item.FY) + (item.FQ||"")}
/>
</View>
</SafeAreaView> );
}
export default App;
- Android
- iOS
The Android demo has been tested in Windows 10 and in macOS.
Test the app in the Android simulator:
npx react-native start
Once Metro is ready, it will display the commands:
r - reload the app
d - open developer menu
i - run on iOS
a - run on Android
Press a to run on Android.
The demo also runs on real Android devices! After enabling USB debugging13, the Android device can be connected to the computer with a USB cable.
This demo runs in iOS and requires a Macintosh computer with Xcode installed.
Test the app in the iOS simulator:
npm run ios
When the app is loaded, the data will be displayed in rows.
Footnotes​
-
The dataset URL has changed many times over the years. The current location for the CC0-licensed dataset can be found by searching for "National Student Loan Data System" on
data.gov.PortfolioSummary.xlsis the file name within the dataset. ↩ -
See "Workbook Object" ↩
-
See "Workbook Object" ↩
-
See "Sheet Objects" ↩
-
See "Running on Device" in the React Native documentation for more details. ↩