WordEmbeddingService.swift

WordEmbeddingService Documentation

Overview

WordEmbeddingService.swift converts words and text into vector representations (embeddings) using Apple's built-in Natural Language framework. It relies on pre-trained word embeddings to produce numerical vectors that capture semantic meaning.

Core Components

Properties

private let embedding: NLEmbedding?
  • Uses Apple's built-in word embeddings for the specified language

  • Initialized for English by default
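
The initializer itself is not shown in this document. As a rough sketch, assuming the property is filled from NLEmbedding.wordEmbedding(for:), which returns an optional, the setup might look like the following. The class name EmbeddingHolder and the isReady property are hypothetical.

```swift
import NaturalLanguage

// Hypothetical stand-in for the service; only the property setup is sketched.
final class EmbeddingHolder {
    // nil when the system has no word embedding for the requested language.
    private let embedding: NLEmbedding?

    init(language: NLLanguage = .english) {
        embedding = NLEmbedding.wordEmbedding(for: language)
    }

    // True when the embedding loaded and can be queried.
    var isReady: Bool { embedding != nil }
}
```

Checking isReady once at startup is one way to surface a missing embedding early rather than on the first lookup.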

Error Types

enum EmbeddingError: Error {
    case embeddingNotInitialized
    case embeddingGenerationFailed
    case invalidInput
}
Error                       Description
embeddingNotInitialized     System embeddings couldn't be loaded
embeddingGenerationFailed   Failed to generate vector for input
invalidInput                Input text is empty or invalid
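
If these descriptions should reach the user, one option is to attach them to the enum through LocalizedError. This is a sketch of that idea, not something the source confirms the service does; the enum is repeated here only to keep the block self-contained.

```swift
import Foundation

// Copy of the enum above, extended with the descriptions listed for each case.
enum EmbeddingError: Error, LocalizedError {
    case embeddingNotInitialized
    case embeddingGenerationFailed
    case invalidInput

    var errorDescription: String? {
        switch self {
        case .embeddingNotInitialized:
            return "System embeddings couldn't be loaded"
        case .embeddingGenerationFailed:
            return "Failed to generate vector for input"
        case .invalidInput:
            return "Input text is empty or invalid"
        }
    }
}
```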

Primary Functions

Word Vector Generation

func generateWordVector(for word: String) throws -> [Double]
  • Generates a vector representation for a single word

  • Throws an error if the word cannot be embedded

  • Returns a [Double] array representing the word's semantic meaning
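
A minimal sketch of how this lookup could be written, assuming it wraps NLEmbedding's vector(for:), which returns nil for words outside the vocabulary. The free function wordVector(for:using:) and the way each failure maps to an error case are assumptions, not the service's confirmed behavior.

```swift
import Foundation
import NaturalLanguage

enum EmbeddingError: Error {
    case embeddingNotInitialized
    case embeddingGenerationFailed
    case invalidInput
}

// Hypothetical free-function version of generateWordVector(for:).
func wordVector(for word: String, using embedding: NLEmbedding?) throws -> [Double] {
    // Reject empty or whitespace-only input before touching the embedding.
    let trimmed = word.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !trimmed.isEmpty else { throw EmbeddingError.invalidInput }

    // The embedding is optional because loading it can fail.
    guard let embedding = embedding else { throw EmbeddingError.embeddingNotInitialized }

    // vector(for:) returns nil when the word is not in the vocabulary.
    guard let vector = embedding.vector(for: trimmed) else {
        throw EmbeddingError.embeddingGenerationFailed
    }
    return vector
}
```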

Text Vector Generation

func generateTextVector(for text: String) throws -> [Double]
  • Processes entire text passages

  • Splits text into words

  • Generates and averages word vectors

  • Returns combined vector representation
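
How the service splits text into words is not shown here. One option within the same framework is NLTokenizer with the word unit; the helper name words(in:) below is hypothetical.

```swift
import NaturalLanguage

// Hypothetical helper: returns the word tokens of `text` in order.
func words(in text: String) -> [String] {
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = text

    var result: [String] = []
    tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, _ in
        result.append(String(text[range]))
        return true // keep enumerating until the end of the text
    }
    return result
}
```

Each token can then be passed to the word-level lookup, and the resulting vectors averaged.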

Similarity Calculation

func similarity(between word1: String, and word2: String) -> Double
  • Calculates the semantic similarity between two words

  • Returns a value between 0 (dissimilar) and 1 (similar)

  • Uses the embedding's built-in distance calculation
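
The conversion from distance to similarity is not spelled out in this document. Assuming the built-in calculation is NLEmbedding's distance(between:and:distanceType:) with cosine distance, where smaller values mean closer words and the value stays within 0 to 2, one hypothetical mapping onto a 0 to 1 score is:

```swift
// Hypothetical mapping from a distance in 0...2 to a similarity in 0...1.
// The service's actual formula is not documented here.
func similarityScore(fromDistance distance: Double) -> Double {
    // Distance 0 maps to 1 (most similar), distance 2 maps to 0.
    let score = 1.0 - distance / 2.0
    // Clamp so out-of-range distances still yield a score in 0...1.
    return min(1.0, max(0.0, score))
}
```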

Usage Examples

Basic Word Embedding

let service = WordEmbeddingService()

do {
    let vector = try service.generateWordVector(for: "hello")
    // Use vector for analysis
} catch {
    print("Error generating vector: \(error)")
}

Text Processing

do {
    let textVector = try service.generateTextVector(for: "machine learning")
    // Use combined vector for analysis
} catch {
    print("Error processing text: \(error)")
}

Word Similarity

let similarity = service.similarity(between: "happy", and: "joyful")
print("Similarity score: \(similarity)")

Helper Methods

Vector Averaging

private func averageVectors(_ vectors: [[Double]]) -> [Double]
  • Combines multiple word vectors into one

  • Used internally for text processing

  • Returns average vector across all inputs
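
A minimal sketch of an element-wise mean with this signature follows. The handling of empty input and of vectors with mismatched lengths is an assumption; the service may treat those cases differently.

```swift
// Sketch of an element-wise mean over a list of equal-length vectors.
func averageVectors(_ vectors: [[Double]]) -> [Double] {
    // No vectors: return an empty result rather than dividing by zero.
    guard let first = vectors.first else { return [] }
    // Assumption: vectors whose length differs from the first are ignored.
    let usable = vectors.filter { $0.count == first.count }

    var sum = [Double](repeating: 0, count: first.count)
    for vector in usable {
        for index in vector.indices {
            sum[index] += vector[index]
        }
    }
    // usable always contains at least the first vector, so the divisor is nonzero.
    return sum.map { $0 / Double(usable.count) }
}
```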

Best Practices

  1. Initialization

    • Initialize service once and reuse

    • Handle potential initialization failures

    • Consider language requirements

  2. Error Handling

    • Implement proper error catching

    • Validate inputs before processing

    • Provide fallback behavior

  3. Performance

    • Cache results when appropriate

    • Process large texts in chunks

    • Consider memory usage with large inputs

  4. Usage Guidelines

    • Use word vectors for single words

    • Use text vectors for sentences/paragraphs

    • Compare similar length inputs
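
The caching advice above could be sketched as a small memoizing wrapper. VectorCache is a hypothetical type, not part of the service, and this version is not thread-safe.

```swift
// Hypothetical wrapper that remembers the vector computed for each word.
// Not thread-safe: guard access externally if used from multiple threads.
final class VectorCache {
    private var storage: [String: [Double]] = [:]
    private let compute: (String) throws -> [Double]

    init(compute: @escaping (String) throws -> [Double]) {
        self.compute = compute
    }

    func vector(for word: String) throws -> [Double] {
        // Return the stored vector if this word was seen before.
        if let cached = storage[word] { return cached }
        // Otherwise compute once and store; failed lookups are not cached.
        let vector = try compute(word)
        storage[word] = vector
        return vector
    }
}
```

It could wrap the service as VectorCache { try service.generateWordVector(for: $0) }, so repeated words in a large text are embedded only once.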

Integration Points

With Text Classifier

// Example integration with UnpartyTextClassifier
let vector = try service.generateTextVector(for: inputText)
let classification = try classifier.predict(text: inputText)

With Clustering Service

// Example usage in clustering
let vectors = texts.compactMap { try? service.generateTextVector(for: $0) }
let clusters = try clusteringService.cluster(vectors)

Limitations

  1. Language Support

    • Primary support for English

    • May have limited vocabulary

    • Quality varies by language

  2. Vector Quality

    • Depends on word frequency

    • May not capture context perfectly

    • Limited by training data

  3. Performance Considerations

    • Processing time increases with text length

    • Memory usage grows with large texts

    • Initial loading may require network access

Related Services

  • TextTokenizationService: For text preprocessing

  • KMeansClusteringService: For grouping similar vectors

  • SimilarityService: For advanced similarity calculations
