WordEmbeddingService.swift
WordEmbeddingService Documentation
Overview
WordEmbeddingService.swift provides functionality to convert words and text into vector representations (embeddings) using Apple's built-in Natural Language framework. It relies on pre-trained word embeddings: numerical vectors in which semantically similar words lie close together.
Core Components
Properties
Uses Apple's built-in word embedding (NLEmbedding) for the specified language
Initialized with English by default
Error Types
embeddingNotInitialized: System embeddings could not be loaded
embeddingGenerationFailed: Failed to generate a vector for the input
invalidInput: Input text is empty or invalid
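The error cases above map naturally onto a Swift enum. The sketch below is illustrative only: the type name WordEmbeddingError and the loader function loadWordEmbedding(language:) are hypothetical, and the messages reuse the descriptions listed above. It shows how a nil result from NLEmbedding.wordEmbedding(for:) could surface as embeddingNotInitialized, with English as the default language.

```swift
import Foundation
#if canImport(NaturalLanguage)
import NaturalLanguage
#endif

// Hypothetical error type; the case names come from the documentation above.
enum WordEmbeddingError: LocalizedError, Equatable {
    case embeddingNotInitialized
    case embeddingGenerationFailed
    case invalidInput

    var errorDescription: String? {
        switch self {
        case .embeddingNotInitialized: return "System embeddings could not be loaded"
        case .embeddingGenerationFailed: return "Failed to generate a vector for the input"
        case .invalidInput: return "Input text is empty or invalid"
        }
    }
}

#if canImport(NaturalLanguage)
// Hypothetical loader: NLEmbedding.wordEmbedding(for:) returns nil when no
// built-in word embedding is available for the requested language.
func loadWordEmbedding(language: NLLanguage = .english) throws -> NLEmbedding {
    guard let embedding = NLEmbedding.wordEmbedding(for: language) else {
        throw WordEmbeddingError.embeddingNotInitialized
    }
    return embedding
}
#endif
```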
Primary Functions
Word Vector Generation
Generates a vector representation for a single word
Throws an error if the word cannot be embedded (for example, if it is out of vocabulary)
Returns a Double array representing the word's semantic meaning
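A minimal sketch of single-word lookup, assuming the service validates input first and then asks the embedding for a vector. To keep it testable without the system model, the vector source is passed in as a closure; in practice it would wrap NLEmbedding's vector(for:), which returns nil for out-of-vocabulary words. The function name wordVector(for:lookup:), the closure parameter, and the trim-and-lowercase normalization are assumptions for illustration.

```swift
import Foundation

// Hypothetical subset of the documented error cases.
enum WordEmbeddingError: Error, Equatable {
    case embeddingGenerationFailed
    case invalidInput
}

/// Returns the vector for a single word.
/// `lookup` stands in for NLEmbedding.vector(for:) (hypothetical wiring).
func wordVector(for word: String, lookup: (String) -> [Double]?) throws -> [Double] {
    // Assumed normalization: trim surrounding whitespace and lowercase.
    let normalized = word.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
    guard !normalized.isEmpty else { throw WordEmbeddingError.invalidInput }

    // A nil lookup result means the word has no vector (e.g. out of vocabulary).
    guard let vector = lookup(normalized) else {
        throw WordEmbeddingError.embeddingGenerationFailed
    }
    return vector
}
```

With a loaded embedding, a call might look like try wordVector(for: "apple", lookup: { embedding.vector(for: $0) }).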
Text Vector Generation
Processes entire text passages
Splits the text into words
Generates a vector for each word and averages them
Returns the combined vector representation
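The split, look up, and average pipeline above can be sketched as follows. Several details are assumptions: the splitter is a simple stand-in (the service may use NLTokenizer with the .word unit instead), words without a vector are skipped rather than failing the whole text, and the function name textVector(for:lookup:) and its closure parameter are hypothetical.

```swift
import Foundation

// Hypothetical subset of the documented error cases.
enum WordEmbeddingError: Error, Equatable {
    case embeddingGenerationFailed
    case invalidInput
}

/// Splits text into lowercase words, looks up each word's vector, and averages them.
/// `lookup` stands in for NLEmbedding.vector(for:) (hypothetical wiring).
func textVector(for text: String, lookup: (String) -> [Double]?) throws -> [Double] {
    // Stand-in tokenizer: split on any character that is not a letter or digit.
    let words = text.lowercased()
        .split(whereSeparator: { !$0.isLetter && !$0.isNumber })
        .map(String.init)
    guard !words.isEmpty else { throw WordEmbeddingError.invalidInput }

    // Assumed policy: skip words that have no vector.
    let vectors = words.compactMap(lookup)

    // Fail if no word had a vector or if dimensions disagree.
    guard let first = vectors.first,
          vectors.allSatisfy({ $0.count == first.count }) else {
        throw WordEmbeddingError.embeddingGenerationFailed
    }

    // Element-wise mean of all word vectors.
    var sum = [Double](repeating: 0, count: first.count)
    for vector in vectors {
        for i in vector.indices { sum[i] += vector[i] }
    }
    return sum.map { $0 / Double(vectors.count) }
}
```

Skipping unknown words keeps a single rare word from discarding an otherwise usable sentence; a stricter service could throw instead.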
Similarity Calculation
Calculates the semantic similarity between two words
Returns a value between 0 (dissimilar) and 1 (similar)
Derived from the embedding's built-in distance calculation (cosine distance, where smaller means more similar)
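NLEmbedding's distance(between:and:) uses cosine distance by default, which conventionally ranges from 0 (same direction) to 2 (opposite direction), so some mapping is needed to obtain a 0 to 1 score. The sketch below assumes the mapping 1 − d/2, clamped to [0, 1]; the service may use a different one. cosineDistance is included only to make the behavior testable without the system model, and both function names are hypothetical.

```swift
import Foundation

/// Cosine distance (1 - cosine similarity) between two equal-length vectors.
/// Returns nil for empty, mismatched, or zero-length vectors.
func cosineDistance(_ a: [Double], _ b: [Double]) -> Double? {
    guard a.count == b.count, !a.isEmpty else { return nil }
    var dot = 0.0, normA = 0.0, normB = 0.0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    guard normA > 0, normB > 0 else { return nil }
    return 1 - dot / (normA.squareRoot() * normB.squareRoot())
}

/// Maps a cosine distance in [0, 2] to a similarity score in [0, 1]
/// (assumed mapping: 1 - d/2, clamped).
func similarityScore(fromDistance d: Double) -> Double {
    min(1, max(0, 1 - d / 2))
}
```

Under this mapping, unrelated (orthogonal) words score 0.5 rather than 0; a service that wants unrelated words near 0 could instead use max(0, 1 - d).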
Usage Examples
Basic Word Embedding
Text Processing
Word Similarity
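The following sketch covers the three scenarios above by calling Apple's NLEmbedding and NLTokenizer directly rather than going through the service, whose exact method names are not shown here. The sample words and text are arbitrary, and the code compiles to a no-op message on platforms without the NaturalLanguage framework.

```swift
#if canImport(NaturalLanguage)
import NaturalLanguage

if let embedding = NLEmbedding.wordEmbedding(for: .english) {
    // Basic word embedding: vector(for:) returns nil for out-of-vocabulary words.
    if let vector = embedding.vector(for: "house") {
        print("dimension:", vector.count)
    }

    // Text processing: tokenize into words, then average the available vectors.
    let text = "The quick brown fox"
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = text
    let vectors = tokenizer.tokens(for: text.startIndex..<text.endIndex)
        .compactMap { embedding.vector(for: text[$0].lowercased()) }
    if let first = vectors.first {
        let mean = (0..<first.count).map { i in
            vectors.reduce(0) { $0 + $1[i] } / Double(vectors.count)
        }
        print("text vector dimension:", mean.count)
    }

    // Word similarity: cosine distance, where smaller means more similar.
    print("cat/dog distance:", embedding.distance(between: "cat", and: "dog"))

    // Nearest neighbours of a word, closest first.
    for (word, distance) in embedding.neighbors(for: "cat", maximumCount: 3) {
        print(word, distance)
    }
} else {
    print("English word embedding unavailable")
}
#else
print("NaturalLanguage is not available on this platform")
#endif
```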
Helper Methods
Vector Averaging
Combines multiple word vectors into one
Used internally for text processing
Returns the element-wise average of all input vectors
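A minimal sketch of the averaging helper, assuming all vectors come from the same embedding and therefore share one dimension. The name averageVector and the choice to return nil (rather than throw) for empty or mismatched input are assumptions.

```swift
import Foundation

/// Element-wise average of equally sized vectors.
/// Returns nil for an empty input or mismatched dimensions (assumed behavior).
func averageVector(_ vectors: [[Double]]) -> [Double]? {
    guard let dimension = vectors.first?.count,
          vectors.allSatisfy({ $0.count == dimension }) else { return nil }

    // Accumulate per-component sums, then divide by the number of vectors.
    var sum = [Double](repeating: 0, count: dimension)
    for vector in vectors {
        for (i, value) in vector.enumerated() { sum[i] += value }
    }
    let n = Double(vectors.count)
    return sum.map { $0 / n }
}
```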
Best Practices
Initialization
Initialize the service once and reuse it
Handle potential initialization failures
Choose the embedding language to match the language of the input text
Error Handling
Catch the service's errors explicitly rather than ignoring them
Validate inputs before processing
Provide fallback behavior when a word or text cannot be embedded
Performance
Cache vectors for words that are looked up repeatedly
Process large texts in chunks
Consider memory usage with large inputs
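Caching can be as simple as memoizing the per-word lookup. The class below is a hypothetical sketch, not part of the documented service: it wraps any lookup closure (for example one calling NLEmbedding's vector(for:)) and also remembers misses, so out-of-vocabulary words are not queried again. It is not thread-safe and grows without bound, which matters for the memory concern above; NSCache is one alternative if eviction is needed.

```swift
import Foundation

/// Hypothetical memoizing wrapper around a word-vector lookup.
final class CachedVectorLookup {
    private let lookup: (String) -> [Double]?
    // Stores both hits and misses (nil) so repeated words skip the lookup.
    private var cache: [String: [Double]?] = [:]

    init(lookup: @escaping (String) -> [Double]?) {
        self.lookup = lookup
    }

    func vector(for word: String) -> [Double]? {
        if let cached = cache[word] { return cached }
        let vector = lookup(word)
        cache.updateValue(vector, forKey: word)
        return vector
    }
}
```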
Usage Guidelines
Use word vectors for single words
Use text vectors for sentences and paragraphs
Compare inputs of similar length, since averaged vectors of very short and very long texts are less comparable
Integration Points
With Text Classifier
With Clustering Service
Limitations
Language Support
Primary support for English
Limited vocabulary: out-of-vocabulary words have no vector
Quality varies by language
Vector Quality
Depends on word frequency
Assigns one vector per word regardless of context, so different senses of a word are not distinguished
Limited by training data
Performance Considerations
Processing time increases with text length
Memory usage grows with large texts
Network requirements for initial loading
Related Components
TextTokenizationService: For text preprocessing
KMeansClusteringService: For grouping similar vectors
SimilarityService: For advanced similarity calculations