MLTrainingService.swift

MLTrainingService Documentation

Overview

MLTrainingService.swift provides text embedding generation functionality using Apple's Natural Language framework. The service utilizes pre-trained word embeddings to convert text into numerical vector representations, which can be used for similarity matching and text analysis.
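Similarity matching between two such vectors is commonly done with cosine similarity. The sketch below is not taken from MLTrainingService. `cosineSimilarity` is a hypothetical helper, and returning nil for mismatched, empty, or zero-magnitude vectors is an assumption about how bad input might be handled.

```swift
/// Hypothetical helper: cosine similarity of two equal-length vectors (between -1 and 1).
/// Returns nil if the lengths differ, the vectors are empty, or either has zero magnitude.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double? {
    guard a.count == b.count, !a.isEmpty else { return nil }
    var dot = 0.0, normA = 0.0, normB = 0.0
    for i in a.indices {
        dot += a[i] * b[i]      // accumulate the dot product
        normA += a[i] * a[i]    // squared magnitude of a
        normB += b[i] * b[i]    // squared magnitude of b
    }
    guard normA > 0, normB > 0 else { return nil }
    return dot / (normA.squareRoot() * normB.squareRoot())
}
```

For example, `cosineSimilarity([1, 0], [0, 1])` returns `0.0` (orthogonal vectors), while two vectors pointing the same way score `1.0`.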

Core Components

TrainingData Structure

struct TrainingData {
    let text: String
    let category: String?
    let metadata: [String: Any]?
}
Property    Type              Description
text        String            The input text to be processed
category    String?           Optional category label
metadata    [String: Any]?    Optional additional information

TrainingConfig Structure

struct TrainingConfig {
    let dimension: Int
    let windowSize: Int
    
    static let `default` = TrainingConfig(
        dimension: 512,
        windowSize: 5
    )
}
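Property      Type    Description
dimension     Int     Size of the embedding vectors
windowSize    Int     Size of the context window for word embeddings

Because the word embeddings are pre-trained, their vector length is fixed by the loaded model (NLEmbedding exposes it through its dimension property); check that it matches dimension before relying on this setting.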

Main Features

Initialization

init(config: TrainingConfig = .default)
  • Creates a new MLTrainingService instance

  • Loads Apple's pre-trained word embeddings for English

  • Uses TrainingConfig.default when no configuration is passed
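A minimal sketch of what this initializer might do, assuming the embeddings are loaded with NaturalLanguage's `NLEmbedding.wordEmbedding(for: .english)`. `EmbeddingServiceSketch` and `modelDimension` are hypothetical names, not the service's API. The loaded model reports its own vector length through `dimension`, which is worth comparing with `TrainingConfig.dimension`. The `#if canImport` guards only keep the sketch compiling on platforms without NaturalLanguage.

```swift
#if canImport(NaturalLanguage)
import NaturalLanguage
#endif

struct TrainingConfig {
    let dimension: Int
    let windowSize: Int
    static let `default` = TrainingConfig(dimension: 512, windowSize: 5)
}

/// Hypothetical stand-in for MLTrainingService's initializer (names are illustrative).
final class EmbeddingServiceSketch {
    let config: TrainingConfig
    /// Vector length reported by the loaded model, or nil if no model was loaded.
    let modelDimension: Int?
    #if canImport(NaturalLanguage)
    /// Apple's pre-trained English word embedding; nil if unavailable on this system.
    let embedding: NLEmbedding?
    #endif

    init(config: TrainingConfig = .default) {
        self.config = config
        #if canImport(NaturalLanguage)
        let model = NLEmbedding.wordEmbedding(for: .english)
        self.embedding = model
        self.modelDimension = model?.dimension  // compare with config.dimension
        #else
        self.modelDimension = nil  // NaturalLanguage is only available on Apple platforms
        #endif
    }
}
```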

Data Management

func addTrainingData(_ data: TrainingData)
func addTrainingBatch(_ batch: [TrainingData])
  • addTrainingData(_:) adds a single data point; addTrainingBatch(_:) adds several at once

  • Stores data for potential future use or analysis
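One way the two methods could store data, assuming a simple in-memory array. `TrainingDataStore` is a hypothetical name, and the real service may keep or persist data differently.

```swift
struct TrainingData {
    let text: String
    let category: String?
    let metadata: [String: Any]?
}

/// Hypothetical in-memory store mirroring addTrainingData(_:) and addTrainingBatch(_:).
final class TrainingDataStore {
    /// Stored items in insertion order; read-only from outside the class.
    private(set) var items: [TrainingData] = []

    /// Appends a single data point.
    func addTrainingData(_ data: TrainingData) {
        items.append(data)
    }

    /// Appends several data points, preserving their order.
    func addTrainingBatch(_ batch: [TrainingData]) {
        items.append(contentsOf: batch)
    }
}
```

If the store is shared across concurrent tasks, an `actor` would be a safer container than a plain class.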

Embedding Generation

func generateEmbedding(for text: String) -> [Double]?
  1. Tokenizes input text into words

  2. Generates embeddings for each word

  3. Averages word vectors to create text embedding

  4. Returns nil if embedding generation fails
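The four steps can be sketched with `NLTokenizer` and `NLEmbedding.vector(for:)`. This illustrates the steps under assumptions rather than reproducing the service's code: words without a vector are skipped, tokens are looked up as-is with no case normalization, and `averageVectors` is a hypothetical helper kept separate so the averaging step can be checked without NaturalLanguage.

```swift
#if canImport(NaturalLanguage)
import NaturalLanguage
#endif

/// Element-wise mean of equal-length vectors (step 3).
/// Returns nil if there are no vectors or their lengths differ.
func averageVectors(_ vectors: [[Double]]) -> [Double]? {
    guard let first = vectors.first, !first.isEmpty else { return nil }
    var sum = [Double](repeating: 0, count: first.count)
    for vector in vectors {
        guard vector.count == sum.count else { return nil }
        for i in vector.indices { sum[i] += vector[i] }
    }
    return sum.map { $0 / Double(vectors.count) }
}

#if canImport(NaturalLanguage)
/// Sketch of steps 1-4 using Apple's word embedding (hypothetical free function).
func generateEmbedding(for text: String, using embedding: NLEmbedding) -> [Double]? {
    // Step 1: split the text into word tokens.
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = text
    let words = tokenizer.tokens(for: text.startIndex..<text.endIndex).map { String(text[$0]) }
    // Step 2: look up each word's vector; assumption: out-of-vocabulary words are skipped.
    let vectors = words.compactMap { embedding.vector(for: $0) }
    // Steps 3-4: average the vectors, or return nil if no word had one.
    return averageVectors(vectors)
}
#endif
```

On Apple platforms it could be called as `NLEmbedding.wordEmbedding(for: .english).flatMap { generateEmbedding(for: "Sample text", using: $0) }`.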

Async Support

func generateEmbeddingAsync(for text: String) async -> [Double]?
func generateEmbeddingsAsync(for texts: [String]) async -> [[Double]]
  • Asynchronous versions of embedding generation

  • Supports batch processing of multiple texts

  • Built on Swift concurrency (async/await), so they must be called from an asynchronous context such as a Task
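A batch method like `generateEmbeddingsAsync(for:)` could be built on a task group. In this sketch, `generateEmbeddingsConcurrently` and its `embed` parameter are hypothetical; results are written back by index so the output keeps the input order. Because the documented return type is `[[Double]]` rather than `[[Double]?]`, the sketch assumes failed texts are dropped, which means output positions may not line up with input positions.

```swift
/// Hypothetical sketch: embeds texts concurrently and returns vectors in input order.
/// `embed` stands in for the service's synchronous single-text embedding function.
func generateEmbeddingsConcurrently(
    for texts: [String],
    embed: @escaping @Sendable (String) -> [Double]?
) async -> [[Double]] {
    await withTaskGroup(of: (Int, [Double]?).self, returning: [[Double]].self) { group in
        // One child task per text; each reports its index alongside its result.
        for (index, text) in texts.enumerated() {
            group.addTask { (index, embed(text)) }
        }
        // Child tasks finish in any order, so place each result at its original index.
        var results = [[Double]?](repeating: nil, count: texts.count)
        for await result in group {
            results[result.0] = result.1
        }
        // Assumption: failed texts are dropped to fit the [[Double]] return type.
        return results.compactMap { $0 }
    }
}
```

If callers need results aligned with their inputs, returning `[[Double]?]` instead would keep a slot for every text.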

Error Handling

enum MLTrainingError: Error {
    case insufficientData
    case embeddingGenerationFailed
}
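Error                        Description
insufficientData             Not enough data for processing
embeddingGenerationFailed    Failed to generate embeddings

The embedding methods listed above are declared without throws, so they report failure through their return values (for example, nil from generateEmbedding(for:)) rather than by raising these errors.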
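Callers who prefer `throws` over optionals can map a nil result onto `MLTrainingError`. The wrapper `requireEmbedding(for:using:)` is hypothetical, and treating blank input as `insufficientData` is an assumption about what that case means.

```swift
enum MLTrainingError: Error {
    case insufficientData
    case embeddingGenerationFailed
}

/// Hypothetical throwing wrapper around an optional-returning embedding function.
func requireEmbedding(
    for text: String,
    using embed: (String) -> [Double]?
) throws -> [Double] {
    // Assumption: empty or whitespace-only input counts as insufficient data.
    guard !text.allSatisfy(\.isWhitespace) else {
        throw MLTrainingError.insufficientData
    }
    // A nil result from the underlying function becomes a thrown error.
    guard let vector = embed(text) else {
        throw MLTrainingError.embeddingGenerationFailed
    }
    return vector
}
```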

Usage Examples

Basic Usage

let service = MLTrainingService()

// Generate embeddings for a single text
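if let embedding = service.generateEmbedding(for: "Sample text") {
    // Use the embedding vector, e.g. for similarity matching
    print("Embedding has \(embedding.count) dimensions")
} else {
    // Handle the nil result: no embedding could be generated
}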

Async Usage

let service = MLTrainingService()

Task {
    // Process single text
    let embedding = await service.generateEmbeddingAsync(for: "Sample text")
    
    // Process multiple texts
    let texts = ["Text 1", "Text 2", "Text 3"]
    let embeddings = await service.generateEmbeddingsAsync(for: texts)
}

Custom Configuration

let config = TrainingConfig(dimension: 256, windowSize: 3)
let service = MLTrainingService(config: config)

Best Practices

  1. Memory Management

    • Keep batch sizes bounded: generateEmbeddingsAsync(for:) returns a whole batch's embeddings at once, so all of them are held in memory together

    • Consider memory usage when storing training data

  2. Performance

    • Prefer the async methods for large datasets so embedding work does not block the main thread

    • Process texts in batches when possible

  3. Error Handling

    • Always check for nil results when generating embeddings

    • Implement appropriate error handling for your use case
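To follow the batching advice, a small chunking helper keeps each batch bounded. `chunked(_:size:)` is a hypothetical helper, not part of the service, and the batch size of 64 in the usage comment is an arbitrary example.

```swift
/// Hypothetical helper: splits `items` into consecutive chunks of at most `size` elements, preserving order.
func chunked<T>(_ items: [T], size: Int) -> [[T]] {
    precondition(size > 0, "chunk size must be positive")
    return stride(from: 0, to: items.count, by: size).map { start in
        Array(items[start..<min(start + size, items.count)])
    }
}

// Hypothetical usage with the service (not compiled here):
// for batch in chunked(allTexts, size: 64) {
//     let vectors = await service.generateEmbeddingsAsync(for: batch)
//     // consume or persist `vectors` before moving to the next batch
// }
```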

Dependencies

  • Foundation

  • NaturalLanguage

  • CoreML

Related Components

  • EntryModel: Uses embeddings for entry processing

  • ClusterModel: Uses embeddings for similarity matching

  • Storage layer: Persists generated embeddings
