TypeScript and AI: How to Create Type-Safe Machine Learning Applications That Avoid Production Bugs
Hello HaWkers, have you ever lost hours debugging a production error only to discover you were passing an array of numbers when the AI model expected a 2D matrix? Or had to deal with a crash because tensor data types did not match?
If you work with Machine Learning in JavaScript, you have probably been through this. The good news is that TypeScript is completely changing this scenario, bringing type safety to the chaotic world of AI.
The Problem of Doing AI without TypeScript
Machine Learning in pure JavaScript is like walking through a minefield. Here are some common problems:
Tensor Dimension Errors: You create a tensor [batch, height, width, channels] but accidentally pass [height, width, channels]. The code compiles, but explodes at runtime.
Incompatible Types: Your model expects float32 but you pass int8. Silently, results become wrong.
Invalid Configurations: You configure learning rate as string "0.01" instead of number 0.01. Training fails mysteriously.
Lack of Autocomplete: Without types, you do not know which methods/properties exist. You keep consulting documentation at every line.
Impossible Refactoring: Need to change the signature of a function that processes model data? Good luck finding every call site that needs updating.
TypeScript solves all these problems.
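To make this concrete, here is a minimal sketch (with hypothetical names) of how TypeScript catches two of the bugs above — a string learning rate and a missing tensor dimension — at compile time rather than in production:

```typescript
// Hypothetical config type: the learning rate must be a number, not "0.01"
interface TrainConfig {
  learningRate: number;
  epochs: number;
}

function describeRun(config: TrainConfig): string {
  return `epochs=${config.epochs}, lr=${config.learningRate}`;
}

// Labeled tuple: an image shape must have exactly three dimensions
type ImageShape = [height: number, width: number, channels: number];

function pixelCount(shape: ImageShape): number {
  return shape[0] * shape[1] * shape[2];
}

console.log(describeRun({ learningRate: 0.01, epochs: 10 })); // OK
console.log(pixelCount([28, 28, 1])); // OK: 784

// ❌ Both of these fail to compile — the bug never reaches production:
// describeRun({ learningRate: '0.01', epochs: 10 }); // string is not number
// pixelCount([28, 28]); // missing the 'channels' dimension
```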
Creating Safe Types for AI Models
Let us start by defining robust types to work with Machine Learning:
// Base types for tensors
type TensorShape = number[];
type DType = 'float32' | 'int32' | 'int8' | 'uint8' | 'bool' | 'complex64' | 'string';
interface TensorConfig<T extends DType = 'float32'> {
shape: TensorShape;
dtype: T;
data: T extends 'float32' ? Float32Array
: T extends 'int32' ? Int32Array
: T extends 'int8' ? Int8Array
: T extends 'uint8' ? Uint8Array
: T extends 'bool' ? Uint8Array
: T extends 'string' ? string[]
: never;
}
// Generic type for Tensor
interface Tensor<
Shape extends TensorShape = TensorShape,
Type extends DType = 'float32'
> {
shape: Shape;
dtype: Type;
size: number;
data(): Promise<TensorConfig<Type>['data']>;
dispose(): void;
reshape<NewShape extends TensorShape>(newShape: NewShape): Tensor<NewShape, Type>;
}
// Types for neural network layers
interface DenseLayerConfig {
units: number;
activation?: 'relu' | 'sigmoid' | 'tanh' | 'softmax' | 'linear';
useBias?: boolean;
kernelInitializer?: 'glorotNormal' | 'heNormal' | 'ones' | 'zeros';
biasInitializer?: 'zeros' | 'ones';
kernelRegularizer?: RegularizerConfig;
}
interface RegularizerConfig {
type: 'l1' | 'l2' | 'l1l2';
l1?: number;
l2?: number;
}
interface Conv2DLayerConfig {
filters: number;
kernelSize: [number, number] | number;
strides?: [number, number] | number;
padding?: 'valid' | 'same';
activation?: DenseLayerConfig['activation'];
dataFormat?: 'channelsFirst' | 'channelsLast';
}
// Type for model configuration
interface ModelConfig {
layers: Array<DenseLayerConfig | Conv2DLayerConfig>;
optimizer: OptimizerConfig;
loss: LossFunction;
metrics?: MetricFunction[];
}
type OptimizerConfig = {
type: 'adam';
learningRate: number;
beta1?: number;
beta2?: number;
epsilon?: number;
} | {
type: 'sgd';
learningRate: number;
momentum?: number;
} | {
type: 'rmsprop';
learningRate: number;
decay?: number;
momentum?: number;
};
type LossFunction =
| 'categoricalCrossentropy'
| 'binaryCrossentropy'
| 'meanSquaredError'
| 'meanAbsoluteError';
type MetricFunction =
| 'accuracy'
| 'categoricalAccuracy'
| 'binaryAccuracy'
| 'precision'
| 'recall';
Now, let us create a type-safe class to work with models:
import * as tf from '@tensorflow/tfjs';
class TypeSafeModel<
InputShape extends TensorShape,
OutputShape extends TensorShape
> {
private model: tf.LayersModel | null = null;
private inputShape: InputShape;
private outputShape: OutputShape;
private isCompiled: boolean = false;
constructor(
inputShape: InputShape,
outputShape: OutputShape,
config: ModelConfig
) {
this.inputShape = inputShape;
this.outputShape = outputShape;
this.buildModel(config);
}
private buildModel(config: ModelConfig): void {
const input = tf.input({ shape: this.inputShape });
let layer: tf.SymbolicTensor = input;
// Add layers with type safety
for (const layerConfig of config.layers) {
if ('units' in layerConfig) {
// Dense layer
layer = tf.layers.dense({
units: layerConfig.units,
activation: layerConfig.activation,
useBias: layerConfig.useBias ?? true,
kernelInitializer: layerConfig.kernelInitializer ?? 'glorotNormal',
biasInitializer: layerConfig.biasInitializer ?? 'zeros'
}).apply(layer) as tf.SymbolicTensor;
} else if ('filters' in layerConfig) {
// Conv2D layer
layer = tf.layers.conv2d({
filters: layerConfig.filters,
kernelSize: layerConfig.kernelSize,
strides: layerConfig.strides,
padding: layerConfig.padding,
activation: layerConfig.activation
}).apply(layer) as tf.SymbolicTensor;
}
}
this.model = tf.model({ inputs: input, outputs: layer });
}
compile(optimizer: OptimizerConfig, loss: LossFunction, metrics?: MetricFunction[]): void {
if (!this.model) {
throw new Error('Model was not built');
}
const tfOptimizer = this.createOptimizer(optimizer);
this.model.compile({
optimizer: tfOptimizer,
loss: loss,
metrics: metrics
});
this.isCompiled = true;
}
private createOptimizer(config: OptimizerConfig): tf.Optimizer {
switch (config.type) {
case 'adam':
return tf.train.adam(
config.learningRate,
config.beta1,
config.beta2,
config.epsilon
);
case 'sgd':
// use the momentum optimizer when momentum is configured (plain sgd would ignore it)
return config.momentum !== undefined
? tf.train.momentum(config.learningRate, config.momentum)
: tf.train.sgd(config.learningRate);
case 'rmsprop':
return tf.train.rmsprop(
config.learningRate,
config.decay,
config.momentum
);
default:
const _exhaustive: never = config;
throw new Error(`Unknown optimizer: ${_exhaustive}`);
}
}
// Type-safe prediction
async predict(
input: Tensor<InputShape, 'float32'>
): Promise<Tensor<OutputShape, 'float32'>> {
if (!this.model || !this.isCompiled) {
throw new Error('Model needs to be compiled before predict');
}
// Validate input shape
if (!this.validateShape(input.shape, this.inputShape)) {
throw new Error(
`Invalid input shape. Expected: [${this.inputShape}], Received: [${input.shape}]`
);
}
const prediction = this.model.predict(input as any) as tf.Tensor;
return prediction as Tensor<OutputShape, 'float32'>;
}
// Type-safe training
async train(
trainData: Tensor<InputShape, 'float32'>,
trainLabels: Tensor<OutputShape, 'float32'>,
options: TrainingOptions
): Promise<TrainingHistory> {
if (!this.model || !this.isCompiled) {
throw new Error('Model needs to be compiled before training');
}
const history = await this.model.fit(trainData as any, trainLabels as any, {
epochs: options.epochs,
batchSize: options.batchSize,
validationSplit: options.validationSplit,
callbacks: {
onEpochEnd: (epoch, logs) => {
options.onEpochEnd?.(epoch, logs as TrainingLogs);
}
}
});
return {
loss: history.history.loss as number[],
accuracy: history.history.acc as number[] | undefined,
valLoss: history.history.val_loss as number[] | undefined,
valAccuracy: history.history.val_acc as number[] | undefined
};
}
private validateShape(actual: TensorShape, expected: TensorShape): boolean {
if (actual.length !== expected.length) return false;
return actual.every((dim, idx) => {
// -1 means flexible dimension (batch size)
if (expected[idx] === -1) return true;
return dim === expected[idx];
});
}
async save(path: string): Promise<void> {
if (!this.model) {
throw new Error('Model does not exist');
}
// TF.js expects a scheme-prefixed URL here, e.g. 'localstorage://mnist' or 'file://./model'
await this.model.save(path);
}
dispose(): void {
this.model?.dispose();
this.model = null;
}
}
interface TrainingOptions {
epochs: number;
batchSize: number;
validationSplit?: number;
onEpochEnd?: (epoch: number, logs: TrainingLogs) => void;
}
interface TrainingLogs {
loss: number;
acc?: number;
val_loss?: number;
val_acc?: number;
}
interface TrainingHistory {
loss: number[];
accuracy?: number[];
valLoss?: number[];
valAccuracy?: number[];
}
Now see the magic of TypeScript in action:
// Create model for MNIST image classification (28x28 pixels, 10 classes)
const mnistModel = new TypeSafeModel<[28, 28, 1], [10]>(
[28, 28, 1], // Input: 28x28 images with 1 channel (grayscale)
[10], // Output: 10 classes (digits 0-9)
{
layers: [
{ filters: 32, kernelSize: 3, activation: 'relu' },
{ filters: 64, kernelSize: 3, activation: 'relu' },
// Note: a complete model would need pooling/flatten layers here before the
// dense layers; ModelConfig is kept minimal for clarity
{ units: 128, activation: 'relu' },
{ units: 10, activation: 'softmax' }
],
optimizer: { type: 'adam', learningRate: 0.001 },
loss: 'categoricalCrossentropy',
metrics: ['accuracy']
}
);
// Compile model
mnistModel.compile(
{ type: 'adam', learningRate: 0.001 },
'categoricalCrossentropy',
['accuracy']
);
// Train - TypeScript ensures shapes are correct!
const trainImages: Tensor<[28, 28, 1], 'float32'> = /* ... */;
const trainLabels: Tensor<[10], 'float32'> = /* ... */;
const history = await mnistModel.train(trainImages, trainLabels, {
epochs: 10,
batchSize: 32,
validationSplit: 0.2,
onEpochEnd: (epoch, logs) => {
console.log(`Epoch ${epoch}: loss=${logs.loss}, acc=${logs.acc}`);
}
});
// Make predictions - fully type-safe!
const testImage: Tensor<[28, 28, 1], 'float32'> = /* ... */;
const prediction: Tensor<[10], 'float32'> = await mnistModel.predict(testImage);
// ❌ This would cause compilation error!
// const wrongImage: Tensor<[32, 32, 3], 'float32'> = /* ... */;
// mnistModel.predict(wrongImage); // TypeScript: Error! Shape does not match
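The runtime check inside predict deserves a closer look. The sketch below isolates the same shape-matching logic as validateShape (names are illustrative), where -1 acts as a wildcard for the flexible batch dimension:

```typescript
type Shape = number[];

// Returns true when rank and every dimension match; -1 in the expected
// shape acts as a wildcard (typically the batch dimension).
function shapesMatch(actual: Shape, expected: Shape): boolean {
  if (actual.length !== expected.length) return false;
  return actual.every((dim, i) => expected[i] === -1 || dim === expected[i]);
}

console.log(shapesMatch([32, 28, 28, 1], [-1, 28, 28, 1])); // true: any batch size
console.log(shapesMatch([28, 28, 1], [-1, 28, 28, 1]));     // false: rank mismatch
console.log(shapesMatch([32, 32, 32, 3], [-1, 28, 28, 1])); // false: wrong dimensions
```

Combining this runtime guard with the compile-time generics gives two layers of defense: TypeScript rejects obviously wrong tensors before the code runs, and the guard catches shapes that only become known at runtime.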
Type-Safe Data Pipelines for Machine Learning
One of the biggest problems in ML is the data pipeline. Let us create a type-safe system to process data:
// Types for different pipeline stages
type DataPoint<Features, Label> = {
features: Features;
label: Label;
};
type DataBatch<Features, Label> = {
features: Features[];
labels: Label[];
batchSize: number;
};
// Generic type-safe pipeline
class MLDataPipeline<RawData, ProcessedFeatures, Label> {
private data: RawData[] = [];
private processors: Array<(data: any) => any> = [];
constructor(private config: PipelineConfig<RawData, ProcessedFeatures, Label>) {}
// Add raw data
addData(rawData: RawData[]): this {
this.data.push(...rawData);
return this;
}
// Extract features in a type-safe way
extractFeatures(
extractor: (raw: RawData) => ProcessedFeatures
): this {
this.processors.push(extractor);
return this;
}
// Normalize features
normalize(
normalizer: (features: ProcessedFeatures) => ProcessedFeatures
): this {
this.processors.push(normalizer);
return this;
}
// Augment data
augment(
augmenter: (features: ProcessedFeatures) => ProcessedFeatures[]
): this {
this.processors.push(augmenter);
return this;
}
// Extract labels
extractLabels(
extractor: (raw: RawData) => Label
): MLDataset<ProcessedFeatures, Label> {
const processed: DataPoint<ProcessedFeatures, Label>[] = [];
for (const raw of this.data) {
let current: any = raw;
// Apply all processors
for (const processor of this.processors) {
current = processor(current);
}
const label = extractor(raw);
// If augmentation returned array
if (Array.isArray(current)) {
current.forEach(features => {
processed.push({ features, label });
});
} else {
processed.push({ features: current, label });
}
}
return new MLDataset(processed);
}
}
class MLDataset<Features, Label> {
constructor(private data: DataPoint<Features, Label>[]) {}
// Shuffle dataset
shuffle(): this {
for (let i = this.data.length - 1; i > 0; i--) {
const j = Math.floor(Math.random() * (i + 1));
[this.data[i], this.data[j]] = [this.data[j], this.data[i]];
}
return this;
}
// Split into train/test
split(
ratio: number
): { train: MLDataset<Features, Label>; test: MLDataset<Features, Label> } {
const splitIndex = Math.floor(this.data.length * ratio);
return {
train: new MLDataset(this.data.slice(0, splitIndex)),
test: new MLDataset(this.data.slice(splitIndex))
};
}
// Create batches
*batches(batchSize: number): Generator<DataBatch<Features, Label>> {
for (let i = 0; i < this.data.length; i += batchSize) {
const batch = this.data.slice(i, i + batchSize);
yield {
features: batch.map(d => d.features),
labels: batch.map(d => d.label),
batchSize: batch.length
};
}
}
// Convert to tensors
toTensors<FShape extends TensorShape, LShape extends TensorShape>(): {
features: Tensor<FShape, 'float32'>;
labels: Tensor<LShape, 'float32'>;
} {
// Implementation depends on Features and Label structure
// This is a simplified example
const featuresArray = this.data.map(d => d.features);
const labelsArray = this.data.map(d => d.label);
return {
features: tf.tensor(featuresArray as any) as any,
labels: tf.tensor(labelsArray as any) as any
};
}
get size(): number {
return this.data.length;
}
}
interface PipelineConfig<Raw, Features, Label> {
name: string;
description?: string;
}
// Real usage example: Pipeline for sentiment classification
interface ReviewData {
text: string;
rating: number;
verified: boolean;
}
type TextFeatures = {
tokens: number[];
length: number;
embeddings: number[];
};
type SentimentLabel = 'positive' | 'negative' | 'neutral';
const pipeline = new MLDataPipeline<ReviewData, TextFeatures, SentimentLabel>({
name: 'sentiment-analysis',
description: 'Pipeline for sentiment analysis of reviews'
});
// Load data
const reviews: ReviewData[] = [
{ text: 'Excellent product!', rating: 5, verified: true },
{ text: 'Very bad, do not recommend', rating: 1, verified: true }
// ... more reviews
];
// Process data in a type-safe way
const dataset = pipeline
.addData(reviews)
.extractFeatures((review) => ({
tokens: tokenize(review.text),
length: review.text.length,
embeddings: getEmbeddings(review.text)
}))
.normalize((features) => ({
...features,
embeddings: normalizeVector(features.embeddings)
}))
.augment((features) => {
// Data augmentation: create variations
return [
features,
{ ...features, tokens: [...features.tokens].reverse() } // example (copy before reversing to avoid mutating the original)
];
})
.extractLabels((review) => {
if (review.rating >= 4) return 'positive';
if (review.rating <= 2) return 'negative';
return 'neutral';
});
// Use dataset
const { train, test } = dataset.shuffle().split(0.8);
console.log(`Dataset processed: ${train.size} train, ${test.size} test`);
// Iterate over batches in a type-safe way
for (const batch of train.batches(32)) {
console.log(`Batch of ${batch.batchSize} samples`);
// Train model with batch
}
// Helper functions (simplified implementation)
function tokenize(text: string): number[] {
return text.split(' ').map(word => word.length); // simple example
}
function getEmbeddings(text: string): number[] {
return new Array(128).fill(0).map(() => Math.random()); // example
}
function normalizeVector(vec: number[]): number[] {
// normalize by the largest magnitude, guarding against division by zero
const max = Math.max(...vec.map(Math.abs));
return max === 0 ? vec : vec.map(v => v / max);
}
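The batches generator from MLDataset is worth exercising on its own. This standalone sketch shows how the final batch simply comes out smaller when the dataset size is not divisible by the batch size:

```typescript
// Generic batching: yields consecutive slices of at most batchSize items.
function* toBatches<T>(data: T[], batchSize: number): Generator<T[]> {
  for (let i = 0; i < data.length; i += batchSize) {
    yield data.slice(i, i + batchSize);
  }
}

const batchSizes = [...toBatches([1, 2, 3, 4, 5, 6, 7], 3)].map(b => b.length);
console.log(batchSizes); // [3, 3, 1] — the trailing batch holds the remainder
```

Because the generator is typed over T, the same helper batches feature vectors, labels, or whole DataPoint objects without any casts.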
Concrete Benefits of Using TypeScript in AI
After adopting TypeScript in AI projects, the benefits are substantial:
Far Fewer Production Bugs: Type errors are caught in development, not at runtime.
Confident Refactoring: Changing data structures or signatures is safe. TypeScript shows exactly what needs updating.
Living Documentation: Types serve as always-updated documentation. New developers understand code faster.
Powerful Autocomplete: IDEs know exactly what you can do, speeding up development.
Fewer Tests Needed: Many type-checking tests become unnecessary because TypeScript already enforces those invariants.
If you want to explore more about performance in AI applications, check out: WebAssembly and Machine Learning: Extreme Performance for AI on the Web where we explore how to combine TypeScript, WebAssembly and ML.

