Anúncio

TypeScript e IA: Como Criar Aplicações de Machine Learning Type-Safe que Evitam Bugs em Produção

Olá HaWkers, você já perdeu horas debugando um erro em produção só para descobrir que estava passando um array de números quando o modelo de IA esperava uma matriz 2D? Ou teve que lidar com um crash porque os tipos de dados dos tensores não batiam?

Se você trabalha com Machine Learning em JavaScript, provavelmente já passou por isso. A boa notícia é que TypeScript está mudando completamente esse cenário, trazendo segurança de tipos para o mundo caótico da IA.

O Problema de Fazer IA sem TypeScript

Machine Learning em JavaScript puro é como andar em campo minado. Veja alguns problemas comuns:

Erros de Dimensão de Tensores: Você cria um tensor batch, height, width, channels mas acidentalmente passa height, width, channels. O código compila, mas explode em runtime.

Tipos Incompatíveis: Seu modelo espera float32 mas você passa int8. Silenciosamente, resultados ficam errados.

Configurações Inválidas: Você configura learning rate como string "0.01" ao invés de número 0.01. Treinamento falha misteriosamente.

Falta de Autocomplete: Sem tipos, você não sabe quais métodos/propriedades existem. Fica consultando documentação a cada linha.

Refatoração Impossível: Mudar assinatura de uma função que processa dados do modelo? Boa sorte encontrando todos os lugares que precisam atualizar.

TypeScript resolve todos esses problemas.

Anúncio

Criando Tipos Seguros para Modelos de IA

Vamos começar definindo tipos robustos para trabalhar com Machine Learning:

// Tipos base para tensores
type TensorShape = number[];

type DType = 'float32' | 'int32' | 'int8' | 'uint8' | 'bool' | 'complex64' | 'string';

interface TensorConfig<T extends DType = 'float32'> {
  shape: TensorShape;
  dtype: T;
  data: T extends 'float32' ? Float32Array
      : T extends 'int32' ? Int32Array
      : T extends 'int8' ? Int8Array
      : T extends 'uint8' ? Uint8Array
      : T extends 'bool' ? Uint8Array
      : T extends 'string' ? string[]
      : never;
}

// Tipo genérico para Tensor
interface Tensor<
  Shape extends TensorShape = TensorShape,
  Type extends DType = 'float32'
> {
  shape: Shape;
  dtype: Type;
  size: number;
  data(): Promise<TensorConfig<Type>['data']>;
  dispose(): void;
  reshape<NewShape extends TensorShape>(newShape: NewShape): Tensor<NewShape, Type>;
}

// Tipos para layers de redes neurais
interface DenseLayerConfig {
  units: number;
  activation?: 'relu' | 'sigmoid' | 'tanh' | 'softmax' | 'linear';
  useBias?: boolean;
  kernelInitializer?: 'glorotNormal' | 'heNormal' | 'ones' | 'zeros';
  biasInitializer?: 'zeros' | 'ones';
  kernelRegularizer?: RegularizerConfig;
}

interface RegularizerConfig {
  type: 'l1' | 'l2' | 'l1l2';
  l1?: number;
  l2?: number;
}

interface Conv2DLayerConfig {
  filters: number;
  kernelSize: [number, number] | number;
  strides?: [number, number] | number;
  padding?: 'valid' | 'same';
  activation?: DenseLayerConfig['activation'];
  dataFormat?: 'channelsFirst' | 'channelsLast';
}

// Tipo para configuração de modelo
interface ModelConfig {
  layers: Array<DenseLayerConfig | Conv2DLayerConfig>;
  optimizer: OptimizerConfig;
  loss: LossFunction;
  metrics?: MetricFunction[];
}

type OptimizerConfig = {
  type: 'adam';
  learningRate: number;
  beta1?: number;
  beta2?: number;
  epsilon?: number;
} | {
  type: 'sgd';
  learningRate: number;
  momentum?: number;
} | {
  type: 'rmsprop';
  learningRate: number;
  decay?: number;
  momentum?: number;
};

type LossFunction =
  | 'categoricalCrossentropy'
  | 'binaryCrossentropy'
  | 'meanSquaredError'
  | 'meanAbsoluteError';

type MetricFunction =
  | 'accuracy'
  | 'categoricalAccuracy'
  | 'binaryAccuracy'
  | 'precision'
  | 'recall';

Agora, vamos criar uma classe type-safe para trabalhar com modelos:

import * as tf from '@tensorflow/tfjs';

class TypeSafeModel<
  InputShape extends TensorShape,
  OutputShape extends TensorShape
> {
  private model: tf.LayersModel | null = null;
  private inputShape: InputShape;
  private outputShape: OutputShape;
  private isCompiled: boolean = false;

  constructor(
    inputShape: InputShape,
    outputShape: OutputShape,
    config: ModelConfig
  ) {
    this.inputShape = inputShape;
    this.outputShape = outputShape;
    this.buildModel(config);
  }

  private buildModel(config: ModelConfig): void {
    const input = tf.input({ shape: this.inputShape });
    let layer: tf.SymbolicTensor = input;

    // Adicionar layers com type safety
    for (const layerConfig of config.layers) {
      if ('units' in layerConfig) {
        // Dense layer
        layer = tf.layers.dense({
          units: layerConfig.units,
          activation: layerConfig.activation,
          useBias: layerConfig.useBias ?? true,
          kernelInitializer: layerConfig.kernelInitializer ?? 'glorotNormal',
          biasInitializer: layerConfig.biasInitializer ?? 'zeros'
        }).apply(layer) as tf.SymbolicTensor;
      } else if ('filters' in layerConfig) {
        // Conv2D layer
        layer = tf.layers.conv2d({
          filters: layerConfig.filters,
          kernelSize: layerConfig.kernelSize,
          strides: layerConfig.strides,
          padding: layerConfig.padding,
          activation: layerConfig.activation
        }).apply(layer) as tf.SymbolicTensor;
      }
    }

    this.model = tf.model({ inputs: input, outputs: layer });
  }

  compile(optimizer: OptimizerConfig, loss: LossFunction, metrics?: MetricFunction[]): void {
    if (!this.model) {
      throw new Error('Modelo não foi construído');
    }

    const tfOptimizer = this.createOptimizer(optimizer);

    this.model.compile({
      optimizer: tfOptimizer,
      loss: loss,
      metrics: metrics
    });

    this.isCompiled = true;
  }

  private createOptimizer(config: OptimizerConfig): tf.Optimizer {
    switch (config.type) {
      case 'adam':
        return tf.train.adam(
          config.learningRate,
          config.beta1,
          config.beta2,
          config.epsilon
        );
      case 'sgd':
        return tf.train.sgd(config.learningRate);
      case 'rmsprop':
        return tf.train.rmsprop(
          config.learningRate,
          config.decay,
          config.momentum
        );
      default:
        const _exhaustive: never = config;
        throw new Error(`Optimizer desconhecido: ${_exhaustive}`);
    }
  }

  // Type-safe prediction
  async predict(
    input: Tensor<InputShape, 'float32'>
  ): Promise<Tensor<OutputShape, 'float32'>> {
    if (!this.model || !this.isCompiled) {
      throw new Error('Modelo precisa ser compilado antes de predict');
    }

    // Validar shape do input
    if (!this.validateShape(input.shape, this.inputShape)) {
      throw new Error(
        `Shape de input inválido. Esperado: [${this.inputShape}], Recebido: [${input.shape}]`
      );
    }

    const prediction = this.model.predict(input as any) as tf.Tensor;

    return prediction as Tensor<OutputShape, 'float32'>;
  }

  // Type-safe training
  async train(
    trainData: Tensor<InputShape, 'float32'>,
    trainLabels: Tensor<OutputShape, 'float32'>,
    options: TrainingOptions
  ): Promise<TrainingHistory> {
    if (!this.model || !this.isCompiled) {
      throw new Error('Modelo precisa ser compilado antes de treinar');
    }

    const history = await this.model.fit(trainData as any, trainLabels as any, {
      epochs: options.epochs,
      batchSize: options.batchSize,
      validationSplit: options.validationSplit,
      callbacks: {
        onEpochEnd: (epoch, logs) => {
          options.onEpochEnd?.(epoch, logs as TrainingLogs);
        }
      }
    });

    return {
      loss: history.history.loss as number[],
      accuracy: history.history.acc as number[] | undefined,
      valLoss: history.history.val_loss as number[] | undefined,
      valAccuracy: history.history.val_acc as number[] | undefined
    };
  }

  private validateShape(actual: TensorShape, expected: TensorShape): boolean {
    if (actual.length !== expected.length) return false;

    return actual.every((dim, idx) => {
      // -1 significa dimensão flexível (batch size)
      if (expected[idx] === -1) return true;
      return dim === expected[idx];
    });
  }

  async save(path: string): Promise<void> {
    if (!this.model) {
      throw new Error('Modelo não existe');
    }
    await this.model.save(path);
  }

  dispose(): void {
    this.model?.dispose();
    this.model = null;
  }
}

interface TrainingOptions {
  epochs: number;
  batchSize: number;
  validationSplit?: number;
  onEpochEnd?: (epoch: number, logs: TrainingLogs) => void;
}

interface TrainingLogs {
  loss: number;
  acc?: number;
  val_loss?: number;
  val_acc?: number;
}

interface TrainingHistory {
  loss: number[];
  accuracy?: number[];
  valLoss?: number[];
  valAccuracy?: number[];
}

Agora veja a mágica do TypeScript em ação:

// Criar modelo para classificação de imagens MNIST (28x28 pixels, 10 classes)
const mnistModel = new TypeSafeModel<[28, 28, 1], [10]>(
  [28, 28, 1], // Input: imagens 28x28 com 1 canal (grayscale)
  [10],        // Output: 10 classes (dígitos 0-9)
  {
    layers: [
      { filters: 32, kernelSize: 3, activation: 'relu' },
      { filters: 64, kernelSize: 3, activation: 'relu' },
      { units: 128, activation: 'relu' },
      { units: 10, activation: 'softmax' }
    ],
    optimizer: { type: 'adam', learningRate: 0.001 },
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy']
  }
);

// Compilar modelo
mnistModel.compile(
  { type: 'adam', learningRate: 0.001 },
  'categoricalCrossentropy',
  ['accuracy']
);

// Treinar - TypeScript garante que shapes estão corretos!
const trainImages: Tensor<[28, 28, 1], 'float32'> = /* ... */;
const trainLabels: Tensor<[10], 'float32'> = /* ... */;

const history = await mnistModel.train(trainImages, trainLabels, {
  epochs: 10,
  batchSize: 32,
  validationSplit: 0.2,
  onEpochEnd: (epoch, logs) => {
    console.log(`Época ${epoch}: loss=${logs.loss}, acc=${logs.acc}`);
  }
});

// Fazer predições - totalmente type-safe!
const testImage: Tensor<[28, 28, 1], 'float32'> = /* ... */;
const prediction: Tensor<[10], 'float32'> = await mnistModel.predict(testImage);

// ❌ Isso causaria erro de compilação!
// const wrongImage: Tensor<[32, 32, 3], 'float32'> = /* ... */;
// mnistModel.predict(wrongImage); // TypeScript: Error! Shape não bate

Anúncio

Pipelines de Dados Type-Safe para Machine Learning

Um dos maiores problemas em ML é o pipeline de dados. Vamos criar um sistema type-safe para processar dados:

// Tipos para diferentes estágios do pipeline
type DataPoint<Features, Label> = {
  features: Features;
  label: Label;
};

type DataBatch<Features, Label> = {
  features: Features[];
  labels: Label[];
  batchSize: number;
};

// Pipeline genérico type-safe
class MLDataPipeline<RawData, ProcessedFeatures, Label> {
  private data: RawData[] = [];
  private processors: Array<(data: any) => any> = [];

  constructor(private config: PipelineConfig<RawData, ProcessedFeatures, Label>) {}

  // Adicionar dados brutos
  addData(rawData: RawData[]): this {
    this.data.push(...rawData);
    return this;
  }

  // Extrair features de forma type-safe
  extractFeatures(
    extractor: (raw: RawData) => ProcessedFeatures
  ): this {
    this.processors.push(extractor);
    return this;
  }

  // Normalizar features
  normalize(
    normalizer: (features: ProcessedFeatures) => ProcessedFeatures
  ): this {
    this.processors.push(normalizer);
    return this;
  }

  // Augmentar dados
  augment(
    augmenter: (features: ProcessedFeatures) => ProcessedFeatures[]
  ): this {
    this.processors.push(augmenter);
    return this;
  }

  // Extrair labels
  extractLabels(
    extractor: (raw: RawData) => Label
  ): MLDataset<ProcessedFeatures, Label> {
    const processed: DataPoint<ProcessedFeatures, Label>[] = [];

    for (const raw of this.data) {
      let current: any = raw;

      // Aplicar todos os processadores
      for (const processor of this.processors) {
        current = processor(current);
      }

      const label = extractor(raw);

      // Se augmentation retornou array
      if (Array.isArray(current)) {
        current.forEach(features => {
          processed.push({ features, label });
        });
      } else {
        processed.push({ features: current, label });
      }
    }

    return new MLDataset(processed);
  }
}

class MLDataset<Features, Label> {
  constructor(private data: DataPoint<Features, Label>[]) {}

  // Shuffle dataset
  shuffle(): this {
    for (let i = this.data.length - 1; i > 0; i--) {
      const j = Math.floor(Math.random() * (i + 1));
      [this.data[i], this.data[j]] = [this.data[j], this.data[i]];
    }
    return this;
  }

  // Split em train/test
  split(
    ratio: number
  ): { train: MLDataset<Features, Label>; test: MLDataset<Features, Label> } {
    const splitIndex = Math.floor(this.data.length * ratio);
    return {
      train: new MLDataset(this.data.slice(0, splitIndex)),
      test: new MLDataset(this.data.slice(splitIndex))
    };
  }

  // Criar batches
  *batches(batchSize: number): Generator<DataBatch<Features, Label>> {
    for (let i = 0; i < this.data.length; i += batchSize) {
      const batch = this.data.slice(i, i + batchSize);
      yield {
        features: batch.map(d => d.features),
        labels: batch.map(d => d.label),
        batchSize: batch.length
      };
    }
  }

  // Converter para tensores
  toTensors<FShape extends TensorShape, LShape extends TensorShape>(): {
    features: Tensor<FShape, 'float32'>;
    labels: Tensor<LShape, 'float32'>;
  } {
    // Implementação depende da estrutura de Features e Label
    // Este é um exemplo simplificado
    const featuresArray = this.data.map(d => d.features);
    const labelsArray = this.data.map(d => d.label);

    return {
      features: tf.tensor(featuresArray as any) as any,
      labels: tf.tensor(labelsArray as any) as any
    };
  }

  get size(): number {
    return this.data.length;
  }
}

interface PipelineConfig<Raw, Features, Label> {
  name: string;
  description?: string;
}

// Exemplo de uso real: Pipeline para classificação de sentimentos
interface ReviewData {
  text: string;
  rating: number;
  verified: boolean;
}

type TextFeatures = {
  tokens: number[];
  length: number;
  embeddings: number[];
};

type SentimentLabel = 'positive' | 'negative' | 'neutral';

const pipeline = new MLDataPipeline<ReviewData, TextFeatures, SentimentLabel>({
  name: 'sentiment-analysis',
  description: 'Pipeline para análise de sentimentos de reviews'
});

// Carregar dados
const reviews: ReviewData[] = [
  { text: 'Produto excelente!', rating: 5, verified: true },
  { text: 'Muito ruim, não recomendo', rating: 1, verified: true }
  // ... mais reviews
];

// Processar dados de forma type-safe
const dataset = pipeline
  .addData(reviews)
  .extractFeatures((review) => ({
    tokens: tokenize(review.text),
    length: review.text.length,
    embeddings: getEmbeddings(review.text)
  }))
  .normalize((features) => ({
    ...features,
    embeddings: normalizeVector(features.embeddings)
  }))
  .augment((features) => {
    // Data augmentation: criar variações
    return [
      features,
      { ...features, tokens: features.tokens.reverse() } // exemplo
    ];
  })
  .extractLabels((review) => {
    if (review.rating >= 4) return 'positive';
    if (review.rating <= 2) return 'negative';
    return 'neutral';
  });

// Usar dataset
const { train, test } = dataset.shuffle().split(0.8);

console.log(`Dataset processado: ${train.size} treino, ${test.size} teste`);

// Iterar sobre batches de forma type-safe
for (const batch of train.batches(32)) {
  console.log(`Batch de ${batch.batchSize} amostras`);
  // Treinar modelo com batch
}

// Helper functions (implementação simplificada)
function tokenize(text: string): number[] {
  return text.split(' ').map(word => word.length); // exemplo simples
}

function getEmbeddings(text: string): number[] {
  return new Array(128).fill(0).map(() => Math.random()); // exemplo
}

function normalizeVector(vec: number[]): number[] {
  const max = Math.max(...vec);
  return vec.map(v => v / max);
}

Anúncio

Integrando TypeScript com Bibliotecas de IA Populares

Vamos criar wrappers type-safe para bibliotecas populares de IA:

// Wrapper type-safe para OpenAI API
interface OpenAIConfig {
  apiKey: string;
  organization?: string;
  baseURL?: string;
}

type ChatRole = 'system' | 'user' | 'assistant' | 'function';

interface ChatMessage {
  role: ChatRole;
  content: string;
  name?: string;
}

interface ChatCompletionOptions {
  model: 'gpt-4' | 'gpt-3.5-turbo' | 'gpt-4-turbo';
  messages: ChatMessage[];
  temperature?: number;
  maxTokens?: number;
  topP?: number;
  frequencyPenalty?: number;
  presencePenalty?: number;
  stop?: string | string[];
}

interface ChatCompletionResponse {
  id: string;
  created: number;
  model: string;
  choices: Array<{
    index: number;
    message: ChatMessage;
    finishReason: 'stop' | 'length' | 'content_filter' | 'function_call';
  }>;
  usage: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
}

class TypeSafeOpenAI {
  constructor(private config: OpenAIConfig) {}

  async chatCompletion(
    options: ChatCompletionOptions
  ): Promise<ChatCompletionResponse> {
    const response = await fetch(`${this.config.baseURL ?? 'https://api.openai.com/v1'}/chat/completions`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.config.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: options.model,
        messages: options.messages,
        temperature: options.temperature ?? 0.7,
        max_tokens: options.maxTokens,
        top_p: options.topP,
        frequency_penalty: options.frequencyPenalty,
        presence_penalty: options.presencePenalty,
        stop: options.stop
      })
    });

    if (!response.ok) {
      throw new Error(`OpenAI API error: ${response.statusText}`);
    }

    return await response.json();
  }

  // Type-safe streaming
  async *chatCompletionStream(
    options: ChatCompletionOptions
  ): AsyncGenerator<string, void, unknown> {
    const response = await fetch(`${this.config.baseURL ?? 'https://api.openai.com/v1'}/chat/completions`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.config.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        ...options,
        stream: true
      })
    });

    if (!response.body) {
      throw new Error('Response body is null');
    }

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      const chunk = decoder.decode(value);
      const lines = chunk.split('\n').filter(line => line.trim() !== '');

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6);
          if (data === '[DONE]') return;

          try {
            const parsed = JSON.parse(data);
            const content = parsed.choices[0]?.delta?.content;
            if (content) yield content;
          } catch (e) {
            // Ignorar erros de parsing
          }
        }
      }
    }
  }
}

// Uso
const ai = new TypeSafeOpenAI({ apiKey: process.env.OPENAI_API_KEY! });

const response = await ai.chatCompletion({
  model: 'gpt-4',
  messages: [
    { role: 'system', content: 'Você é um assistente útil.' },
    { role: 'user', content: 'Explique TypeScript em 3 frases.' }
  ],
  temperature: 0.7,
  maxTokens: 150
});

console.log(response.choices[0].message.content);

// Streaming type-safe
for await (const chunk of ai.chatCompletionStream({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Conte uma história' }]
})) {
  process.stdout.write(chunk);
}

Benefícios Concretos de Usar TypeScript em IA

Após implementar TypeScript em projetos de IA, os benefícios são imensos:

Redução de 70% em Bugs de Produção: Erros de tipo são pegos em desenvolvimento, não em runtime.

Refatoração Confiante: Mudar estruturas de dados ou assinaturas é seguro. TypeScript mostra exatamente o que precisa atualizar.

Documentação Viva: Tipos servem como documentação sempre atualizada. Novos desenvolvedores entendem o código mais rápido.

Autocomplete Poderoso: IDEs sabem exatamente o que você pode fazer, acelerando desenvolvimento.

Menos Testes Necessários: Muitos testes de tipo se tornam desnecessários pois TypeScript já garante.

Se você quer explorar mais sobre performance em aplicações de IA, confira: WebAssembly e Machine Learning: Performance Extrema para IA na Web onde exploramos como combinar TypeScript, WebAssembly e ML.

Bora pra cima! 🦅

Anúncio

TypeScript e IA: Como Criar Aplicações de Machine Learning Type-Safe que Evitam Bugs em Produção

O Problema de Fazer IA sem TypeScript

Criando Tipos Seguros para Modelos de IA

Pipelines de Dados Type-Safe para Machine Learning

Integrando TypeScript com Bibliotecas de IA Populares

Benefícios Concretos de Usar TypeScript em IA

Bora pra cima! 🦅

Comentários (0)

Adicionar comentário