WordEmbeddingService.swift

WordEmbeddingService Documentation

Overview

WordEmbeddingService.swift provides functionality to convert words and text into vector representations (embeddings) using Apple's built-in Natural Language framework. It leverages pre-trained word embeddings to generate numerical representations that capture semantic meaning.

Core Components

Properties

private let embedding: NLEmbedding?

Holds Apple's built-in word embedding for the specified language
Optional, because the system may not provide an embedding for every language
Initialized with English by default
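As a rough illustration of how such a property might be set up, the sketch below loads the system word embedding through `NLEmbedding.wordEmbedding(for:)`, which returns nil when no embedding is available. The `EmbeddingHolder` class and its members are hypothetical names used only for this example; the service's actual initializer may look different.

```swift
import NaturalLanguage

// Hypothetical wrapper, shown only to illustrate the optional property.
final class EmbeddingHolder {
    // nil when the system has no word embedding for the requested language.
    private let embedding: NLEmbedding?

    init(language: NLLanguage = .english) {
        embedding = NLEmbedding.wordEmbedding(for: language)
    }

    // True when the embedding was loaded successfully.
    var isAvailable: Bool { embedding != nil }

    // Number of components in each word vector, or nil if loading failed.
    var dimension: Int? { embedding?.dimension }
}

let holder = EmbeddingHolder()
print(holder.isAvailable ? "Embedding loaded" : "Embedding unavailable")
```

Checking `isAvailable` once at startup is one way to surface a missing embedding early instead of on the first lookup.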

Error Types

enum EmbeddingError: Error {
    case embeddingNotInitialized
    case embeddingGenerationFailed
    case invalidInput
}

Error                        Description
embeddingNotInitialized      System embeddings couldn't be loaded
embeddingGenerationFailed    Failed to generate vector for input
invalidInput                 Input text is empty or invalid
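One possible way to attach these descriptions to the error cases is a `LocalizedError` conformance, sketched below. The extension is an assumption made for illustration and is not confirmed to exist in the service; only the enum and the description strings come from this page.

```swift
import Foundation

enum EmbeddingError: Error {
    case embeddingNotInitialized
    case embeddingGenerationFailed
    case invalidInput
}

// Assumed extension: maps each case to the description listed in the table.
extension EmbeddingError: LocalizedError {
    var errorDescription: String? {
        switch self {
        case .embeddingNotInitialized:
            return "System embeddings couldn't be loaded"
        case .embeddingGenerationFailed:
            return "Failed to generate vector for input"
        case .invalidInput:
            return "Input text is empty or invalid"
        }
    }
}

// Callers can then branch on the specific case or fall back to the description.
func report(_ error: Error) -> String {
    if let embeddingError = error as? EmbeddingError {
        return embeddingError.errorDescription ?? "Unknown embedding error"
    }
    return "Unexpected error: \(error)"
}

print(report(EmbeddingError.invalidInput))
```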

Primary Functions

Word Vector Generation

func generateWordVector(for word: String) throws -> [Double]

Generates a vector representation for a single word
Throws an error if the word cannot be embedded
Returns a [Double] array representing the word's semantic meaning
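The behavior described above could be sketched roughly as follows. This is a free function written for illustration, not the service's actual method: the `using:` parameter, the trimming, and the lowercasing step are all assumptions. `NLEmbedding.vector(for:)` returns nil for words outside the embedding's vocabulary, which is mapped here to `embeddingGenerationFailed`.

```swift
import Foundation
import NaturalLanguage

enum EmbeddingError: Error {
    case embeddingNotInitialized
    case embeddingGenerationFailed
    case invalidInput
}

// Illustrative stand-in for generateWordVector(for:); the real method may differ.
func wordVector(for word: String, using embedding: NLEmbedding?) throws -> [Double] {
    guard let embedding = embedding else {
        throw EmbeddingError.embeddingNotInitialized
    }
    // Assumption: input is normalized by trimming whitespace and lowercasing.
    let cleaned = word.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
    guard !cleaned.isEmpty else {
        throw EmbeddingError.invalidInput
    }
    // vector(for:) yields nil when the word is not in the vocabulary.
    guard let vector = embedding.vector(for: cleaned) else {
        throw EmbeddingError.embeddingGenerationFailed
    }
    return vector
}

let english = NLEmbedding.wordEmbedding(for: .english)
if let vector = try? wordVector(for: "hello", using: english) {
    print("Vector has \(vector.count) components")
}
```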

Text Vector Generation

func generateTextVector(for text: String) throws -> [Double]

Processes entire text passages
Splits text into words
Generates and averages word vectors
Returns combined vector representation
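The word-splitting step might look like the sketch below, which uses `NLTokenizer` from the same framework. This is an assumption: the service may instead split on whitespace or delegate to TextTokenizationService. The `words(in:)` helper is a hypothetical name, and lowercasing the tokens is also assumed.

```swift
import NaturalLanguage

// Hypothetical helper: splits text into lowercase word tokens.
func words(in text: String) -> [String] {
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = text
    var result: [String] = []
    tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, _ in
        result.append(String(text[range]).lowercased())
        return true // continue to the next token
    }
    return result
}

print(words(in: "Machine learning is fun."))
```

Each token would then be passed to the word-vector lookup, and the resulting vectors averaged into a single text vector.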

Similarity Calculation

func similarity(between word1: String, and word2: String) -> Double

Calculates semantic similarity between two words
Returns a score between 0 (dissimilar) and 1 (highly similar)
Uses the embedding's built-in distance calculation
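The page does not say how a built-in distance becomes a 0 to 1 score, so the following is only one plausible sketch. It assumes the distance comes from `NLEmbedding.distance(between:and:distanceType:)` with `.cosine`, assumes that distance lies in 0...2, and maps it linearly onto 0...1. The `similarityScore(fromDistance:)` helper, the `using:` parameter, and returning 0 when the embedding is missing are all hypothetical choices.

```swift
#if canImport(NaturalLanguage)
import NaturalLanguage
#endif

// Hypothetical mapping: distance 0 becomes similarity 1, distance 2 becomes 0.
// Values outside the assumed 0...2 range are clamped first.
func similarityScore(fromDistance distance: Double) -> Double {
    let clamped = min(max(distance, 0), 2)
    return 1 - clamped / 2
}

#if canImport(NaturalLanguage)
// Illustrative stand-in for similarity(between:and:); the real method may differ.
func similarity(between word1: String, and word2: String,
                using embedding: NLEmbedding?) -> Double {
    // Assumption: a missing embedding is reported as "no similarity".
    guard let embedding = embedding else { return 0 }
    let distance = embedding.distance(between: word1.lowercased(),
                                      and: word2.lowercased(),
                                      distanceType: .cosine)
    return similarityScore(fromDistance: distance)
}
#endif

print(similarityScore(fromDistance: 1.0))
```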

Usage Examples

Basic Word Embedding

let service = WordEmbeddingService()

do {
    let vector = try service.generateWordVector(for: "hello")
    // Use vector for analysis
} catch {
    print("Error generating vector: \(error)")
}

Text Processing

do {
    let textVector = try service.generateTextVector(for: "machine learning")
    // Use combined vector for analysis
} catch {
    print("Error processing text: \(error)")
}

Word Similarity

let similarity = service.similarity(between: "happy", and: "joyful")
print("Similarity score: \(similarity)")

Helper Methods

Vector Averaging

private func averageVectors(_ vectors: [[Double]]) -> [Double]

Combines multiple word vectors into one
Used internally for text processing
Returns average vector across all inputs
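A minimal sketch of element-wise averaging is shown below. The signature matches the one above, but the body is an assumption: returning an empty array for empty input and skipping vectors whose length differs from the first are choices made for this example only.

```swift
// Averages vectors component by component.
// Assumptions: empty input yields an empty result, and vectors whose
// length differs from the first vector are ignored.
func averageVectors(_ vectors: [[Double]]) -> [Double] {
    guard let first = vectors.first else { return [] }
    var sum = [Double](repeating: 0, count: first.count)
    var counted = 0
    for vector in vectors where vector.count == first.count {
        for (index, value) in vector.enumerated() {
            sum[index] += value
        }
        counted += 1
    }
    // counted is at least 1 here because the first vector always matches.
    return sum.map { $0 / Double(counted) }
}

print(averageVectors([[1, 2], [3, 4]]))
```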

Best Practices

Initialization

- Initialize service once and reuse
- Handle potential initialization failures
- Consider language requirements

Error Handling

- Implement proper error catching
- Validate inputs before processing
- Provide fallback behavior

Performance

- Cache results when appropriate
- Process large texts in chunks
- Consider memory usage with large inputs

Usage Guidelines

- Use word vectors for single words
- Use text vectors for sentences/paragraphs
- Compare similar length inputs
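The caching advice above could be applied with a small wrapper like the one sketched here. `VectorCache` is a hypothetical class, not part of the service. It takes the vector-producing function as a closure, so the example runs without the Natural Language framework; in practice the closure would presumably call `service.generateWordVector(for:)`. The sketch is not thread-safe and never evicts entries.

```swift
// Hypothetical in-memory cache keyed by the lowercased word.
final class VectorCache {
    private var storage: [String: [Double]] = [:]
    private let compute: (String) throws -> [Double]
    // Counts how often the underlying function actually ran.
    private(set) var misses = 0

    init(compute: @escaping (String) throws -> [Double]) {
        self.compute = compute
    }

    func vector(for word: String) throws -> [Double] {
        let key = word.lowercased()
        if let cached = storage[key] {
            return cached
        }
        let vector = try compute(key)
        misses += 1
        storage[key] = vector
        return vector
    }
}

// Stand-in computation used only for this example.
let cache = VectorCache { word in [Double(word.count)] }
let first = try? cache.vector(for: "Hello")
let second = try? cache.vector(for: "hello") // served from the cache
print("Underlying lookups: \(cache.misses)")
```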

Integration Points

With Text Classifier

// Example integration with UnpartyTextClassifier
let vector = try service.generateTextVector(for: inputText)
let classification = try classifier.predict(text: inputText)

With Clustering Service

// Example usage in clustering
let vectors = texts.compactMap { try? service.generateTextVector(for: $0) }
let clusters = try clusteringService.cluster(vectors)
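Note that `compactMap` with `try?` silently drops any text that fails to embed, so the resulting vectors no longer line up with the original `texts` array. If cluster assignments need to be traced back to their texts, one option is to keep each text paired with its vector, as in this sketch. `embedAll` and `StubError` are hypothetical names, and the embedding function is passed in as a closure standing in for `service.generateTextVector(for:)`.

```swift
// Hypothetical helper: embeds each text and records failures separately.
func embedAll(
    _ texts: [String],
    using embed: (String) throws -> [Double]
) -> (embedded: [(text: String, vector: [Double])], failed: [String]) {
    var embedded: [(text: String, vector: [Double])] = []
    var failed: [String] = []
    for text in texts {
        do {
            let vector = try embed(text)
            embedded.append((text: text, vector: vector))
        } catch {
            failed.append(text)
        }
    }
    return (embedded, failed)
}

// Stand-in embedding used only for this example: fails on empty input.
struct StubError: Error {}
let outcome = embedAll(["a", "", "bc"]) { text in
    if text.isEmpty { throw StubError() }
    return [Double(text.count)]
}
print("Embedded \(outcome.embedded.count), failed \(outcome.failed.count)")
```

The vectors passed to the clustering service would then be `outcome.embedded.map { $0.vector }`, with the texts available at the same indices.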

Limitations

Language Support

- Primary support for English
- May have limited vocabulary
- Quality varies by language

Vector Quality

- Depends on word frequency
- May not capture context perfectly
- Limited by training data

Performance Considerations

- Processing time increases with text length
- Memory usage with large texts
- Network requirements for initial loading

Related Services

TextTokenizationService: For text preprocessing
KMeansClusteringService: For grouping similar vectors
SimilarityService: For advanced similarity calculations

