
spark.dataset

function dataset(path: string): Promise<SparkDataset>

Load a dataset from torchjs.org with automatic batching.

Datasets on torchjs.org are defined by a torch.json manifest file that specifies how to load and batch the data. This function loads the manifest and returns a dataset object with train, test, and val splits that support batching.

The dataset function handles:

  • Image classification datasets (separate image and label files)
  • Text datasets (single file with character or token-based tokenization)
  • Automatic train/test/val split creation
  • Batching with automatic tensor creation
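Conceptually, batching walks a split in fixed-size chunks, with a smaller final batch when the sample count is not a multiple of the batch size. A plain-JavaScript sketch of that idea (no torch.js APIs; `batchIndices` is an invented helper name, not part of the library):

```javascript
// Yield index ranges [start, end) of size `batchSize` over n samples.
// The last range may be smaller when n is not a multiple of batchSize.
function* batchIndices(n, batchSize) {
  for (let start = 0; start < n; start += batchSize) {
    yield [start, Math.min(start + batchSize, n)];
  }
}

// e.g. 10 samples in batches of 4 -> [0,4], [4,8], [8,10]
const ranges = [...batchIndices(10, 4)];
```

In the real API the same chunking happens inside `split.batch(size)`, which additionally packs each chunk into tensors.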

Common use cases:

  • Loading benchmark datasets (MNIST, CIFAR, etc.)
  • Loading custom datasets from torchjs.org
  • Interactive data exploration and training
  • Testing models on different splits
Dataset manifests (torch.json) must define:

  • name: Dataset name
  • description: Human-readable description
  • dataset.splits: Object with train/test/val split configs
  • For images: image_size, dtype, and separate images/labels files per split
  • For text: tokenizer, format, and a source file per split
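For illustration, a hypothetical torch.json manifest for an image-classification dataset might look like the sketch below. Only the field names come from the list above; the concrete values (dataset name, file names, image size) are invented for the example:

```json
{
  "name": "mnist",
  "description": "Handwritten digits, 28x28 grayscale",
  "dataset": {
    "splits": {
      "train": {
        "images": "train-images.bin",
        "labels": "train-labels.bin",
        "image_size": 28,
        "dtype": "float32"
      },
      "test": {
        "images": "test-images.bin",
        "labels": "test-labels.bin",
        "image_size": 28,
        "dtype": "float32"
      }
    }
  }
}
```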
Very large datasets may not fit in memory. Consider using smaller batches or streaming datasets for production use cases.

Parameters

path: string
Path to the project containing torch.json (e.g., "kasumi/mnist" or "username/project")

Returns

Promise<SparkDataset> – a promise that resolves to a dataset object with train/test/val splits

Examples

// Load MNIST dataset
const data = await spark.dataset('kasumi/mnist');

// Train loop with batching
for (const { x, y } of data.train.batch(64)) {
  // x: Tensor [64, 784] - normalized to float32
  // y: Tensor [64] - int32 labels
  const logits = model(x);
  const loss = criterion(logits, y);
  // ... backward pass
}

// Evaluate on test set
let correct = 0;
for (const { x, y } of data.test.batch(256)) {
  const pred = model(x).argmax(1);
  // `===` would compare tensor references, not values; assuming a
  // PyTorch-style element-wise eq() here
  correct += pred.eq(y).sum().item();
}
const accuracy = correct / data.size.test;

// Get single sample
const { x, y } = await data.train.get(0);
console.log('First sample:', x.shape, y);