TrainingDataManager.swift

TrainingDataManager Documentation

Overview

TrainingDataManager.swift provides a centralized system for managing, storing, and retrieving the training data used in the machine learning pipeline. It persists entries as a JSON file in the app's Documents directory and provides methods for adding, updating, removing, and filtering entries.

Core Components

TrainingData Structure

struct TrainingData: Codable {
    let id: UUID
    let text: String
    let category: String?
    let timestamp: Date
    let metadata: [String: Any]?   // not Codable; excluded from encoding via custom CodingKeys
    var isProcessed: Bool
}

Properties

Property      Type              Description
id            UUID              Unique identifier for the training data
text          String            The content to be processed
category      String?           Optional category label
timestamp     Date              When the data was created
metadata      [String: Any]?    Optional additional information (not persisted)
isProcessed   Bool              Processing status flag
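The usage example on this page builds a TrainingData value from only text, category, and metadata, but the struct as declared shows no such initializer: the synthesized memberwise initializer would require all six properties. A minimal sketch of one way to support that call, assuming a hypothetical convenience initializer that supplies defaults for id, timestamp, and isProcessed. Codable conformance is left out here to keep the focus on construction:

```swift
import Foundation

// Trimmed copy of TrainingData without Codable, to focus on initialization.
struct TrainingData {
    let id: UUID
    let text: String
    let category: String?
    let timestamp: Date
    let metadata: [String: Any]?
    var isProcessed: Bool

    // Hypothetical convenience initializer: generates a fresh id and timestamp
    // and starts every entry as unprocessed unless told otherwise.
    init(
        id: UUID = UUID(),
        text: String,
        category: String? = nil,
        timestamp: Date = Date(),
        metadata: [String: Any]? = nil,
        isProcessed: Bool = false
    ) {
        self.id = id
        self.text = text
        self.category = category
        self.timestamp = timestamp
        self.metadata = metadata
        self.isProcessed = isProcessed
    }
}

let sample = TrainingData(text: "Sample text",
                          category: "Category A",
                          metadata: ["source": "user_input"])
print(sample.isProcessed)  // false
```

Defaulting id to UUID() also satisfies the "always use unique IDs" practice without extra effort from callers.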

Primary Features

Data Management Methods

func add(_ data: TrainingData)
func addBatch(_ dataArray: [TrainingData])
func remove(_ id: UUID)
func update(_ data: TrainingData)

These methods handle the create, update, and delete operations:

  • add: Adds a single training data entry

  • addBatch: Adds multiple training data entries

  • remove: Removes an entry by ID

  • update: Updates an existing entry
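A minimal sketch of these four methods, assuming entries are kept in an in-memory array and that update matches entries by id (both are assumptions, not confirmed by this page). TrainingData is trimmed to the fields the sketch needs:

```swift
import Foundation

// Trimmed stand-in for TrainingData (category, timestamp, metadata omitted).
struct TrainingData {
    let id: UUID
    let text: String
    var isProcessed: Bool
}

final class TrainingDataManager {
    // In-memory array storage is an assumption made for this sketch.
    private(set) var entries: [TrainingData] = []

    func add(_ data: TrainingData) {
        entries.append(data)
    }

    // Appends all entries in one call instead of one add per entry.
    func addBatch(_ dataArray: [TrainingData]) {
        entries.append(contentsOf: dataArray)
    }

    // Removes every entry whose id matches.
    func remove(_ id: UUID) {
        entries.removeAll { $0.id == id }
    }

    // Replaces the entry with the same id; does nothing if there is no match.
    func update(_ data: TrainingData) {
        if let index = entries.firstIndex(where: { $0.id == data.id }) {
            entries[index] = data
        }
    }
}
```

Because isProcessed is the only var property, a typical update copies an entry, flips isProcessed, and passes the copy to update.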

Data Retrieval Methods

func getAllTrainingData() -> [TrainingData]
func getUnprocessedData() -> [TrainingData]
func getDataByCategory(_ category: String) -> [TrainingData]

These methods provide filtered access to the data:

  • getAllTrainingData: Returns all stored entries

  • getUnprocessedData: Returns only unprocessed entries

  • getDataByCategory: Returns entries matching a specific category
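The three retrieval methods can be sketched as simple filters over the stored entries. This assumes in-memory array storage and an exact, case-sensitive category match; neither detail is stated on this page:

```swift
import Foundation

// Trimmed stand-in for TrainingData (timestamp and metadata omitted).
struct TrainingData {
    let id: UUID
    let text: String
    let category: String?
    var isProcessed: Bool
}

final class TrainingDataManager {
    private var entries: [TrainingData] = []  // assumed in-memory storage

    func add(_ data: TrainingData) { entries.append(data) }

    // Arrays are value types, so callers receive a copy they cannot use
    // to mutate the manager's storage.
    func getAllTrainingData() -> [TrainingData] {
        entries
    }

    func getUnprocessedData() -> [TrainingData] {
        entries.filter { !$0.isProcessed }
    }

    // Exact, case-sensitive match; entries with a nil category never match.
    func getDataByCategory(_ category: String) -> [TrainingData] {
        entries.filter { $0.category == category }
    }
}
```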

Data Persistence

func save() throws
func load() throws
private func getStorageURL() -> URL

These methods handle saving and loading data:

  • save: Writes the current entries to a JSON file

  • load: Reads entries back from the saved JSON file

  • getStorageURL: Returns the location of the storage file in the app's Documents directory
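A minimal sketch of how save(), load(), and getStorageURL() could fit together using Foundation's JSONEncoder, JSONDecoder, and FileManager. The file name training_data.json and the optional storageURL initializer parameter (added so the file can point at a temporary location) are hypothetical, and TrainingData is trimmed so that Codable can be synthesized:

```swift
import Foundation

// Trimmed TrainingData; metadata is omitted so Codable can be synthesized.
struct TrainingData: Codable {
    let id: UUID
    let text: String
    var isProcessed: Bool
}

final class TrainingDataManager {
    private(set) var entries: [TrainingData] = []
    private let overrideURL: URL?

    // Hypothetical parameter: lets tests redirect storage to a temp file.
    init(storageURL: URL? = nil) {
        self.overrideURL = storageURL
    }

    // Defaults to <Documents>/training_data.json (file name is an assumption).
    private func getStorageURL() -> URL {
        if let url = overrideURL { return url }
        let documents = FileManager.default.urls(for: .documentDirectory,
                                                 in: .userDomainMask)[0]
        return documents.appendingPathComponent("training_data.json")
    }

    func add(_ data: TrainingData) { entries.append(data) }

    // Encodes all entries and writes them atomically, so a failed write
    // does not leave a half-written file behind.
    func save() throws {
        let data = try JSONEncoder().encode(entries)
        try data.write(to: getStorageURL(), options: .atomic)
    }

    // Replaces the in-memory entries with the contents of the file.
    func load() throws {
        let data = try Data(contentsOf: getStorageURL())
        entries = try JSONDecoder().decode([TrainingData].self, from: data)
    }
}
```

Both methods throw, so callers should wrap them in do/catch; load() fails if save() has never run and the file does not exist yet.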

Codable Implementation

The TrainingData structure implements Codable with custom encoding/decoding:

  • Excludes metadata from persistence (since [String: Any] isn't Codable)

  • Handles encoding/decoding of all other properties

  • Uses custom CodingKeys for property mapping
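The custom Codable behavior described above can be sketched as follows, assuming the conformance is written by hand: CodingKeys omits metadata, init(from:) sets metadata to nil, and encode(to:) writes every other property. Declaring the conformance in an extension is a choice made for this sketch so the memberwise initializer stays available:

```swift
import Foundation

struct TrainingData {
    let id: UUID
    let text: String
    let category: String?
    let timestamp: Date
    let metadata: [String: Any]?
    var isProcessed: Bool
}

extension TrainingData: Codable {
    // metadata is deliberately absent, so it is never encoded or decoded.
    enum CodingKeys: String, CodingKey {
        case id, text, category, timestamp, isProcessed
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        id = try container.decode(UUID.self, forKey: .id)
        text = try container.decode(String.self, forKey: .text)
        category = try container.decodeIfPresent(String.self, forKey: .category)
        timestamp = try container.decode(Date.self, forKey: .timestamp)
        isProcessed = try container.decode(Bool.self, forKey: .isProcessed)
        metadata = nil  // not persisted, so always nil after decoding
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(id, forKey: .id)
        try container.encode(text, forKey: .text)
        try container.encodeIfPresent(category, forKey: .category)
        try container.encode(timestamp, forKey: .timestamp)
        try container.encode(isProcessed, forKey: .isProcessed)
    }
}
```

A consequence of this design is that a save/load round trip silently drops metadata, which is why the best practices below recommend handling it separately.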

Example Usage

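// Create a training data manager
let manager = TrainingDataManager()

// Add new training data. This call relies on an initializer that
// supplies defaults for id, timestamp, and isProcessed.
let trainingData = TrainingData(
    text: "Sample text",
    category: "Category A",
    metadata: ["source": "user_input"]  // metadata is not persisted by save()
)
manager.add(trainingData)

// Retrieve unprocessed data
let unprocessedData = manager.getUnprocessedData()

do {
    // Save data to the JSON file
    try manager.save()

    // Load data later
    try manager.load()
} catch {
    print("Persistence failed: \(error)")
}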

Best Practices

  1. Data Management

    • Always use unique IDs for entries

    • Check isProcessed before processing an entry so it is not handled twice

    • Store metadata elsewhere if it must survive a save/load cycle, since it is not persisted

  2. Error Handling

    • Wrap save() and load() in do/catch, since both methods can throw

    • Validate data before adding/updating

  3. Performance

    • Use batch operations for multiple entries

    • Consider memory usage with large datasets
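Several of these practices (checking processing status, validating before updating, using batch operations) can be combined into one processing pass. A minimal sketch, assuming a trimmed in-memory manager; the isValid rule and the processPendingEntries function are hypothetical and exist only for illustration:

```swift
import Foundation

// Trimmed stand-in for TrainingData (category, timestamp, metadata omitted).
struct TrainingData {
    let id: UUID
    let text: String
    var isProcessed: Bool
}

// Minimal in-memory manager with only the methods this workflow uses.
final class TrainingDataManager {
    private var entries: [TrainingData] = []

    func addBatch(_ dataArray: [TrainingData]) { entries.append(contentsOf: dataArray) }

    func getUnprocessedData() -> [TrainingData] { entries.filter { !$0.isProcessed } }

    func update(_ data: TrainingData) {
        if let index = entries.firstIndex(where: { $0.id == data.id }) {
            entries[index] = data
        }
    }
}

// Hypothetical validation rule: text must contain non-whitespace characters.
func isValid(_ data: TrainingData) -> Bool {
    !data.text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty
}

// Hypothetical processing pass: handles only unprocessed, valid entries and
// marks each one processed so a later pass skips it. Returns how many ran.
func processPendingEntries(in manager: TrainingDataManager,
                           using process: (TrainingData) -> Void) -> Int {
    var handled = 0
    for entry in manager.getUnprocessedData() where isValid(entry) {
        process(entry)
        var updated = entry
        updated.isProcessed = true
        manager.update(updated)
        handled += 1
    }
    return handled
}
```

Invalid entries stay unprocessed rather than being removed, so they remain visible through getUnprocessedData for later inspection.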

Related Components

  • MLTrainingService: Uses this manager for training data

  • ModelPersistenceManager: Works alongside for model storage

  • TextEmbeddingsService: Processes the training data
