Spreadsheet Processing in Mathematica
Mathematica is a software system for mathematics and scientific computing. It supports command-line tools and JavaScript extensions.
SheetJS is a JavaScript library for reading and writing data from spreadsheets.
This demo uses SheetJS to pull data from a spreadsheet for further analysis within Mathematica. We'll explore how to run an external tool to generate CSV data from opaque spreadsheets and parse the data from Mathematica.
This demo was tested by SheetJS users in the following deployments:
Architecture | Version | Date |
---|---|---|
darwin-x64 | 14.0 | 2024-06-05 |
win10-x64 | 14.0 | 2024-06-05 |
Integration Details​
The SheetJS NodeJS module can be
loaded in NodeJS scripts, including scripts invoked using the "NodeJS"
mode
of the ExternalEvaluate
1 Mathematica function.
However, the current cross-platform recommendation involves a dedicated command line tool that leverages SheetJS libraries to to perform spreadsheet processing.
External Engines​
The following diagram depicts the workbook waltz:
flowchart LR
subgraph `ExternalEvaluate`
file[(workbook\nfile)]
csvstr(CSV\nString)
end
data[(Dataset)]
file --> |NodeJS\nSheetJS Ops| csvstr
csvstr --> |ImportString\nMathematica| data
Mathematica
NodeJS can be activated from Mathematica using RegisterExternalEvaluator
2.
Once activated, JavaScript code can be run using ExternalEvaluate
3. If the
NodeJS code returns CSV data, ImportString
4 can generate a Dataset
5.
SheetJS
For a file residing on the filesystem, the SheetJS readFile
function6 can
generate a workbook object. The exact location can be determined by printing
require("process").cwd()
7 in ExternalEvaluate
:
In[1]:= ExternalEvaluate["NodeJS", "require('process').cwd()"]
Out[1]= "C:\Users\Me\Documents"
After pulling the first worksheet8, the SheetJS sheet_to_csv
function9
generates a CSV string.
Complete Function
The following function reads a file, parses the first worksheet and returns a Dataset object assuming one header row.
- Linux/MacOS
- Windows
(* Import file stored in the Documents folder (e.g. C:\Users\Me\Documents) *)
SheetJSImportFileEE[filename_]:=Module[{csv}, (
(* This was required in local testing *)
RegisterExternalEvaluator["NodeJS","/usr/local/bin/node"];
(* Generate CSV from first sheet *)
csv:=ExternalEvaluate["NodeJS", StringJoin[
(* module installed in home directory *)
"var XLSX = require('xlsx');",
(* read specified filename *)
"var wb = XLSX.readFile('",filename,"');",
(* grab first worksheet *)
"var ws = wb.Sheets[wb.SheetNames[0]];",
(* convert to CSV *)
"XLSX.utils.sheet_to_csv(ws)"
]];
(* Parse CSV into a dataset *)
Return[ImportString[csv, "Dataset", "HeaderLines"->1]];
)]
(* Import file stored in the Documents folder (e.g. C:\Users\Me\Documents) *)
SheetJSImportFileEE[filename_]:=Module[{csv}, (
(* This was required in local testing *)
RegisterExternalEvaluator["NodeJS","C:\\Program Files\\nodejs\\node.exe"];
(* Generate CSV from first sheet *)
csv:=ExternalEvaluate["NodeJS", StringJoin[
(* module installed in home directory *)
"var XLSX = require('xlsx');",
(* read specified filename *)
"var wb = XLSX.readFile('",filename,"');",
(* grab first worksheet *)
"var ws = wb.Sheets[wb.SheetNames[0]];",
(* convert to CSV *)
"XLSX.utils.sheet_to_csv(ws)"
]];
(* Parse CSV into a dataset *)
Return[ImportString[csv, "Dataset", "HeaderLines"->1]];
)]
Command-Line Tools​
The "Command-Line Tools" demo creates xlsx-cli
, a
command-line tool that reads a spreadsheet file and generates CSV rows from the
first worksheet.
ExternalEvaluate
10 can run command-line tools and capture standard output.
The following snippet processes ~/Downloads/pres.numbers
and pulls CSV data
into a variable in Mathematica:
cmd = "/usr/local/bin/xlsx-cli ~/Downloads/pres.numbers"
csvdata = ExternalEvaluate["Shell" -> "StandardOutput", cmd];
ImportString
11 can interpret the CSV data as a Dataset
12. Typically the
first row of the CSV output is the header row. The HeaderLines
13 option
controls how Mathematica parses the data:
data = ImportString[csvdata, "Dataset", "HeaderLines" -> 1]
The following diagram depicts the workbook waltz:
flowchart LR
subgraph `ExternalEvaluate`
file[(workbook\nfile)]
csvstr(CSV\nString)
end
data[(Dataset)]
file --> |`xlsx-cli`\nSheetJS Ops| csvstr
csvstr --> |ImportString\nMathematica| data
Complete Demo​
This demo tests the NodeJS external engine and dedicated command line tools.
NodeJS Engine​
-
Install NodeJS. When the demo was tested, version
20.14.0
was installed. -
Install dependencies in the Home folder (
~
or$HOME
or%HOMEPATH%
):
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz zeromq@6.0.0-beta.19
- Open a new Mathematica Notebook and register NodeJS. When the example was tested in Windows, the commands were:
- Linux/MacOS
- Windows
RegisterExternalEvaluator["NodeJS","/usr/local/bin/node"]
FindExternalEvaluators["NodeJS"]
The second argument to RegisterExternalEvaluator
should be the path to the
node
program, which can be found by running the following command in a new
terminal window:
which node
RegisterExternalEvaluator["NodeJS","C:\\Program Files\\nodejs\\node.exe"]
FindExternalEvaluators["NodeJS"]
The second argument to RegisterExternalEvaluator
should be the path to the
node.exe
program, which can be found by running the following command in a new
PowerShell window:
Get-Command node.exe
If NodeJS is registered, the value in the "Registered" column will be "True".
- To determine the base folder, run
require("process").cwd()
from NodeJS:
ExternalEvaluate["NodeJS", "require('process').cwd()"]
-
Download
pres.numbers
and move the file to the base folder as shown in the previous step. -
Copy, but do not run, the following snippet into the running notebook:
- Linux/MacOS
- Windows
(* Import file stored in the Documents folder (e.g. C:\Users\Me\Documents) *)
SheetJSImportFileEE[filename_]:=Module[{csv}, (
(* This was required in local testing *)
RegisterExternalEvaluator["NodeJS","/usr/local/bin/node"];
(* Generate CSV from first sheet *)
csv:=ExternalEvaluate["NodeJS", StringJoin[
(* module installed in home directory *)
"var XLSX = require('xlsx');",
(* read specified filename *)
"var wb = XLSX.readFile('",filename,"');",
(* grab first worksheet *)
"var ws = wb.Sheets[wb.SheetNames[0]];",
(* convert to CSV *)
"XLSX.utils.sheet_to_csv(ws)"
]];
(* Parse CSV into a dataset *)
Return[ImportString[csv, "Dataset", "HeaderLines"->1]];
)]
(* Import file stored in the Documents folder (e.g. C:\Users\Me\Documents) *)
SheetJSImportFileEE[filename_]:=Module[{csv}, (
(* This was required in local testing *)
RegisterExternalEvaluator["NodeJS","C:\\Program Files\\nodejs\\node.exe"];
(* Generate CSV from first sheet *)
csv:=ExternalEvaluate["NodeJS", StringJoin[
(* module installed in home directory *)
"var XLSX = require('xlsx');",
(* read specified filename *)
"var wb = XLSX.readFile('",filename,"');",
(* grab first worksheet *)
"var ws = wb.Sheets[wb.SheetNames[0]];",
(* convert to CSV *)
"XLSX.utils.sheet_to_csv(ws)"
]];
(* Parse CSV into a dataset *)
Return[ImportString[csv, "Dataset", "HeaderLines"->1]];
)]
After pasting, edit the highlighted line to reflect the path of the node
or
node.exe
binary. This path was discovered in Step 2.
After editing the snippet, run the expression.
- Run the function and confirm the result is a proper Dataset:
SheetJSImportFileEE["pres.numbers"]
Standalone Binary​
- Create the standalone
xlsx-cli
binary14. The commands should be run in a Terminal or PowerShell window:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz exit-on-epipe commander@2
curl -LO https://docs.sheetjs.com/cli/xlsx-cli.js
npx -y nexe -t 14.15.3 xlsx-cli.js
- Linux/MacOS
- Windows
- Move the generated
xlsx-cli
to a fixed location in/usr/local/bin
:
mkdir -p /usr/local/bin
mv xlsx-cli /usr/local/bin/
If there are permission errors, the command should be run with the root user:
sudo mv xlsx-cli /usr/local/bin/
- Find the current directory:
pwd
The generated binary will be xlsx-cli.exe
in the displayed path.
Reading a Local File​
- Linux/MacOS
- Windows
- In a new Mathematica notebook, run the following snippet:
SheetJSImportFile[x_] := ImportString[Block[{Print}, ExternalEvaluate[
"Shell" -> "StandardOutput",
"/usr/local/bin/xlsx-cli " <> x
]], "Dataset", "HeaderLines" -> 1]
- In a new Mathematica notebook, copy but do not run the following snippet:
SheetJSImportFile[x_] := ImportString[Block[{Print}, ExternalEvaluate[
"Shell" -> "StandardOutput",
"/usr/local/bin/xlsx-cli " <> x
]], "Dataset", "HeaderLines" -> 1]
Change /usr/local/bin/xlsx-cli
in the string to the path to the generated
xlsx-cli.exe
binary. For example, if the path in step 2 was
C:\Users\Me\Documents\
, then the code should be:
SheetJSImportFile[x_] := ImportString[Block[{Print}, ExternalEvaluate[
"Shell" -> "StandardOutput",
"C:\\Users\\Me\\Documents\\xlsx-cli.exe " <> x
]], "Dataset", "HeaderLines" -> 1]
Mathematica requires the \
characters must be doubled.
After making the change, run the snippet.
- Download https://docs.sheetjs.com/pres.numbers and save to Downloads folder:
cd ~/Downloads/
curl -LO https://docs.sheetjs.com/pres.numbers
- In the Mathematica notebook, run the new function. If the file was saved to
the Downloads folder, the path will be
"~/Downloads/pres.numbers"
in macOS:
- Linux/MacOS
- Windows
data = SheetJSImportFile["~/Downloads/pres.numbers"]
On Windows, the absolute path to the file must be used. To find this path, run the following commands in PowerShell:
cd $HOME\Downloads
pwd
Append \\pres.numbers
to the displayed path. For example, if the path was
C:\Users\Me\Downloads
, the command will be
data = SheetJSImportFile["C:\\Users\\Me\\Downloads\\pres.numbers"]
The \
characters must be doubled.
The result should be displayed in a concise table.
Reading from a URL​
FetchURL
15 downloads a file from a specified URL and returns a path to the
file. This function will be wrapped in a new function called SheetJSImportURL
.
- In the same notebook, run the following:
Needs["Utilities`URLTools`"];
SheetJSImportURL[x_] := Module[{path},(
path = FetchURL[x];
SheetJSImportFile[path]
)];
- Test by downloading the test file in the notebook:
data = SheetJSImportURL["https://docs.sheetjs.com/pres.numbers"]
Footnotes​
-
See the
ExternalEvaluate
Node.js example in the Mathematica documentation. ↩ -
See
RegisterExternalEvaluator
in the Mathematica documentation. ↩ -
See
ExternalEvaluate
in the Mathematica documentation. ↩ -
See
ImportString
in the Mathematica documentation. ↩ -
A
Dataset
will be created when using the"Dataset"
element inImportString
↩ -
See
process.cwd()
in the NodeJS documentation. ↩ -
The
Sheets
andSheetNames
properties of workbook objects are described in "Workbook Object" ↩ -
See
ExternalEvaluate
in the Mathematica documentation. ↩ -
See
ImportString
in the Mathematica documentation. ↩ -
A
Dataset
will be created when using the"Dataset"
element inImportString
↩ -
See
HeaderLines
in the Mathematica documentation. ↩ -
See "Command-line Tools" for more details. ↩
-
Mathematica 11 introduced new methods including
URLRead
. ↩