src/perceptron.py:

LightPerceptron Documentation

Overview

The LightPerceptron class implements a memory-efficient binary classifier for initial text classification in clustering workflows. It performs action/non-action classification without persisting any state, and is optimized for workflows that retrain the model from scratch on each run.

Table of Contents

  • Installation

  • Quick Start

  • Class Reference

  • Implementation Details

  • Performance Considerations

  • Examples

  • Integration with Preprocessor

  • Best Practices

  • Contributing

Installation

# Copy perceptron.py to your project
from perceptron import LightPerceptron
import numpy as np  # Required dependency

Required dependency: numpy

Quick Start

# Initialize perceptron with input size matching vocabulary
perceptron = LightPerceptron(input_size=1000)

# Training with sparse vectors
training_data = [
    ([1, 4, 7], 1),    # Action (indices where vector is 1)
    ([2, 5, 8], 0),    # Non-action
]
perceptron.train(training_data, is_sparse=True)

# Make predictions
result = perceptron.predict_sparse([1, 4, 7], total_size=1000)
print(f"Classification: {'Action' if result == 1 else 'Non-action'}")

Class Reference

Constructor

LightPerceptron(input_size: int, learning_rate: float = 0.1)
  • input_size: Dimension of input vectors (matches vocabulary size)

  • learning_rate: Step size for weight updates during training

Core Methods

predict_sparse(active_indices: List[int], total_size: int) → int

Makes a prediction from a sparse input representation (sketched below).

  • Input:

    • active_indices: Indices where vector has value 1

    • total_size: Total vector dimension

  • Output: 1 (action) or 0 (non-action)

  • Time Complexity: O(k) where k is number of active indices
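
The O(k) cost comes from touching only the weights at the active indices. A minimal sketch of how the method could look internally, assuming the weights array and bias term named under Implementation Details (illustrative, not the exact source):

def predict_sparse(self, active_indices, total_size):
    # total_size is the full vector dimension; only the k active
    # weights contribute to the activation, hence O(k)
    activation = sum(self.weights[i] for i in active_indices) + self.bias
    return 1 if activation > 0 else 0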

predict_dense(input_vector: List[float]) → int

Makes a prediction from a dense input vector (sketched below).

  • Input: Full input vector

  • Output: 1 (action) or 0 (non-action)

  • Time Complexity: O(n) where n is input size
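
The dense path computes a full dot product over all n components. A comparable sketch, again illustrative rather than the exact source:

def predict_dense(self, input_vector):
    # Full dot product over all n components, hence O(n)
    activation = np.dot(self.weights, input_vector) + self.bias
    return 1 if activation > 0 else 0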

train(training_data, is_sparse=True, max_epochs=100, early_stop=True) → None

Trains the perceptron on labeled data (example call below).

  • Input:

    • training_data: List of (features, label) pairs

    • is_sparse: Whether input uses sparse representation

    • max_epochs: Maximum training iterations

    • early_stop: Whether to stop when fully converged

  • Time Complexity: O(e * d * n) where:

    • e is number of epochs

    • d is number of training examples

    • n is input dimension (or k for sparse inputs)
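
A typical call using only the documented parameters, with training_data built as in Quick Start:

# Cap training at 50 epochs; stop early once a full pass produces no errors
perceptron.train(training_data, is_sparse=True, max_epochs=50, early_stop=True)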

Implementation Details

Data Structures

  • weights: numpy array of feature weights

  • bias: Scalar bias term

  • Input formats supported (equivalence illustrated below):

    • Sparse (indices of 1s)

    • Dense (full vector)
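
Both formats encode the same vector, so converting between them is mechanical. For example, the sparse index list [1, 3] in a 5-dimensional space expands to the dense vector [0, 1, 0, 1, 0]:

import numpy as np

active_indices = [1, 3]
dense = np.zeros(5)
dense[active_indices] = 1.0   # dense is now [0., 1., 0., 1., 0.]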

Training Process

  1. Initialize zero weights

  2. For each epoch:

    • Predict each training example

    • Update weights when predictions are wrong

    • Track errors for early stopping

  3. Stop when:

    • No errors found (if early_stop=True)

    • Max epochs reached
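
A sketch of that loop (dense case, illustrative only; the update rule itself is spelled out under Mathematics below):

for epoch in range(max_epochs):
    errors = 0
    for input_vector, label in training_data:
        error = label - self.predict_dense(input_vector)   # nonzero when wrong
        if error != 0:
            self.weights += self.learning_rate * error * np.asarray(input_vector)
            self.bias += self.learning_rate * error
            errors += 1
    if early_stop and errors == 0:   # converged: a full error-free pass
        break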

Mathematics

# Prediction
activation = np.dot(weights, input_vector) + bias   # sum of weights[i] * input_vector[i], plus bias
prediction = 1 if activation > 0 else 0

# Weight update, applied only when the prediction is wrong (error = label - prediction)
weights += learning_rate * error * input_vector
bias += learning_rate * error

Performance Considerations

Memory Efficiency

  • Sparse input support reduces memory usage (see the comparison below)

  • No storage of training history

  • Numpy for efficient array operations
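
For example, a 1000-dimensional input with three active terms stores 3 integers in sparse form instead of 1000 floats:

sparse = [1, 4, 7]       # 3 integers
dense = [0.0] * 1000     # 1000 floats encoding the same vector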

Processing Speed

  • Early stopping reduces training time

  • Optimized sparse operations

  • Minimal memory allocations

Trade-offs

  • Retrains each time (speed vs persistence)

  • Binary output only (simplicity vs granularity)

  • Linear decision boundary (speed vs complexity)

Examples

Basic Classification

# Initialize with input_size matching the 5-dimensional vectors below
perceptron = LightPerceptron(input_size=5)

# Dense vector training
dense_training = [
    ([0, 1, 0, 1, 0], 1),
    ([1, 0, 1, 0, 0], 0)
]
perceptron.train(dense_training, is_sparse=False)

# Sparse vector training
sparse_training = [
    ([1, 3], 1),    # Same as [0, 1, 0, 1, 0]
    ([0, 2], 0)     # Same as [1, 0, 1, 0, 0]
]
perceptron.train(sparse_training, is_sparse=True)

Integration with Preprocessor

from preprocessor import LightPreprocessor

# Setup
preprocessor = LightPreprocessor(max_vocab_size=1000)
perceptron = LightPerceptron(input_size=1000)

# Process texts
texts = [
    "Activate user account",      # Action
    "System status report"        # Non-action
]
labels = [1, 0]  # Action labels

# Build vocabulary
preprocessor.build_vocabulary(texts)

# Convert to sparse vectors
training_data = [
    (preprocessor.text_to_sparse(text), label)
    for text, label in zip(texts, labels)
]

# Train and predict
perceptron.train(training_data, is_sparse=True)
result = perceptron.predict_sparse(
    preprocessor.text_to_sparse("New user activation"),
    preprocessor.get_vocab_size()
)

Best Practices

Training

  1. Match input_size to the vocabulary size (see the snippet after this list)

  2. Use sparse representation for text data

  3. Enable early_stop for faster training

  4. Normalize input vectors if using dense format
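
For point 1, deriving input_size from the preprocessor keeps the two from drifting apart (assuming the LightPreprocessor API shown in the integration example):

# The vocabulary size and the perceptron dimension can never disagree
perceptron = LightPerceptron(input_size=preprocessor.get_vocab_size())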

Prediction

  1. Validate input dimensions

  2. Use sparse prediction for text classification

  3. Handle empty inputs gracefully

Integration

  1. Pair with LightPreprocessor

  2. Validate vocabulary size matching

  3. Consider input vector sparsity

Error Handling

  1. Check input dimensions

  2. Validate index ranges

  3. Handle empty feature vectors
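
These checks sit naturally at the call site. A defensive wrapper covering them; the helper name is illustrative and not part of the class, and treating empty input as non-action (0) is an arbitrary default:

def safe_predict(perceptron, active_indices, vocab_size):
    if not active_indices:                        # empty feature vector
        return 0
    if any(i < 0 or i >= vocab_size for i in active_indices):
        raise ValueError("feature index out of range")
    return perceptron.predict_sparse(active_indices, vocab_size)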

Contributing

To contribute to this perceptron:

  1. Maintain focus on text classification

  2. Keep memory efficiency in mind

  3. Add thorough documentation

  4. Include usage examples
