# LightPerceptron Documentation (`src/perceptron.py`)
## Overview

The `LightPerceptron` class implements a memory-efficient binary classifier designed for initial text classification in clustering workflows. It focuses on action/non-action classification, keeps no persisted state, and is optimized for frequent retraining.
## Table of Contents

- [Installation](#installation)
- [Quick Start](#quick-start)
- [Class Reference](#class-reference)
- [Implementation Details](#implementation-details)
- [Performance Considerations](#performance-considerations)
- [Examples](#examples)
- [Integration with Preprocessor](#integration-with-preprocessor)
- [Best Practices](#best-practices)
## Installation

```python
# Copy perceptron.py to your project
from perceptron import LightPerceptron
import numpy as np  # required dependency
```

Required dependency: `numpy`.
## Quick Start

```python
# Initialize perceptron with input size matching the vocabulary
perceptron = LightPerceptron(input_size=1000)

# Train with sparse vectors (lists of indices where the vector is 1)
training_data = [
    ([1, 4, 7], 1),  # Action
    ([2, 5, 8], 0),  # Non-action
]
perceptron.train(training_data, is_sparse=True)

# Make a prediction
result = perceptron.predict_sparse([1, 4, 7], total_size=1000)
print(f"Classification: {'Action' if result == 1 else 'Non-action'}")
```
## Class Reference

### Constructor

```python
LightPerceptron(input_size: int, learning_rate: float = 0.1)
```

- `input_size`: Dimension of input vectors (should match the vocabulary size)
- `learning_rate`: Step size for weight updates during training (default `0.1`)
### Core Methods

#### `predict_sparse(active_indices: List[int], total_size: int) -> int`

Makes a prediction from a sparse input representation.

- **Input:**
  - `active_indices`: Indices where the vector has value 1
  - `total_size`: Total vector dimension
- **Output:** 1 (action) or 0 (non-action)
- **Time complexity:** O(k), where k is the number of active indices
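The O(k) cost comes from touching only the active positions: with 0/1 inputs, the dot product reduces to summing the weights at those indices. A minimal sketch of the idea, assuming `weights` is a NumPy array of length `total_size` and `bias` is a float (field names follow the Implementation Details section; `predict_sparse_sketch` itself is hypothetical):

```python
import numpy as np

def predict_sparse_sketch(weights: np.ndarray, bias: float,
                          active_indices: list) -> int:
    # Fancy indexing selects only the k active weights; an empty
    # index list yields an empty selection whose sum is 0.0.
    activation = weights[active_indices].sum() + bias
    return 1 if activation > 0 else 0
```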
#### `predict_dense(input_vector: List[float]) -> int`

Makes a prediction from a dense input vector.

- **Input:** Full input vector
- **Output:** 1 (action) or 0 (non-action)
- **Time complexity:** O(n), where n is the input size
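Dense prediction applies the same threshold rule over the full vector; a corresponding hypothetical sketch using a plain NumPy dot product:

```python
import numpy as np

def predict_dense_sketch(weights: np.ndarray, bias: float,
                         input_vector: list) -> int:
    # The full dot product touches all n dimensions, hence O(n)
    activation = float(np.dot(weights, input_vector)) + bias
    return 1 if activation > 0 else 0
```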
#### `train(training_data, is_sparse=True, max_epochs=100, early_stop=True) -> None`

Trains the perceptron on labeled data.

- **Input:**
  - `training_data`: List of `(features, label)` pairs
  - `is_sparse`: Whether the features use the sparse representation
  - `max_epochs`: Maximum number of training iterations
  - `early_stop`: Whether to stop once training fully converges
- **Time complexity:** O(e · d · n), where:
  - e is the number of epochs
  - d is the number of training examples
  - n is the input dimension (or k, the active-index count, for sparse inputs)
## Implementation Details

### Data Structures

- `weights`: NumPy array of feature weights
- `bias`: Scalar bias term

Supported input formats:

- Sparse (list of indices whose value is 1)
- Dense (full vector)
### Training Process

1. Initialize zero weights.
2. For each epoch:
   - Predict each training example.
   - Update the weights when a prediction is wrong.
   - Track errors for early stopping.
3. Stop when:
   - No errors remain (if `early_stop=True`), or
   - The maximum number of epochs is reached.
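A minimal sketch of that loop for dense inputs, using the update rule from the Mathematics section below (`train_sketch` is hypothetical; the real `train()` also accepts the sparse representation):

```python
import numpy as np

def train_sketch(weights: np.ndarray, bias: float, training_data,
                 learning_rate: float = 0.1, max_epochs: int = 100,
                 early_stop: bool = True):
    for _ in range(max_epochs):
        errors = 0
        for features, label in training_data:
            x = np.asarray(features, dtype=float)
            prediction = 1 if np.dot(weights, x) + bias > 0 else 0
            error = label - prediction  # -1, 0, or +1
            if error != 0:
                weights = weights + learning_rate * error * x
                bias += learning_rate * error
                errors += 1
        if early_stop and errors == 0:  # converged: every example correct
            break
    return weights, bias
```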
### Mathematics

```
# Prediction
activation = Σ(weights[i] * input[i]) + bias
prediction = 1 if activation > 0 else 0

# Weight update, where error = label - prediction
weights += learning_rate * error * input
bias += learning_rate * error
```
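As a hypothetical worked step with `learning_rate = 0.1`:

```python
# weights = [0, 0, 0], bias = 0, input = [1, 0, 1], label = 1
# activation = 0, so prediction = 0 and error = 1 - 0 = +1
# weights += 0.1 * 1 * [1, 0, 1]  ->  weights = [0.1, 0, 0.1]
# bias    += 0.1 * 1              ->  bias = 0.1
# Next pass: activation = 0.1 + 0.1 + 0.1 = 0.3 > 0  ->  prediction = 1
```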
## Performance Considerations

### Memory Efficiency

- Sparse input support reduces memory usage
- No storage of training history
- NumPy arrays for efficient numeric operations

### Processing Speed

- Early stopping reduces training time
- Optimized sparse operations
- Minimal memory allocations

### Trade-offs

- Retrains each time (speed vs. persistence)
- Binary output only (simplicity vs. granularity)
- Linear decision boundary (speed vs. complexity): data that is not linearly separable, such as XOR-style patterns, can never be classified perfectly
## Examples

### Basic Classification

```python
# Initialize; input_size must match the vector dimension (5 here)
perceptron = LightPerceptron(input_size=5)

# Dense vector training
dense_training = [
    ([0, 1, 0, 1, 0], 1),
    ([1, 0, 1, 0, 0], 0),
]
perceptron.train(dense_training, is_sparse=False)

# Sparse vector training (the same data as index lists)
sparse_training = [
    ([1, 3], 1),  # Same as [0, 1, 0, 1, 0]
    ([0, 2], 0),  # Same as [1, 0, 1, 0, 0]
]
perceptron.train(sparse_training, is_sparse=True)
```
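After training, both representations of the same point should produce the same prediction. Continuing the example above (the expected outputs assume the model converged on this trivially separable data):

```python
# Both calls describe the vector [0, 1, 0, 1, 0]
print(perceptron.predict_dense([0, 1, 0, 1, 0]))        # expected: 1
print(perceptron.predict_sparse([1, 3], total_size=5))  # expected: 1
```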
## Integration with Preprocessor

```python
from preprocessor import LightPreprocessor
from perceptron import LightPerceptron

# Setup
preprocessor = LightPreprocessor(max_vocab_size=1000)
perceptron = LightPerceptron(input_size=1000)

# Texts to process
texts = [
    "Activate user account",  # Action
    "System status report",   # Non-action
]
labels = [1, 0]  # Action labels

# Build the vocabulary
preprocessor.build_vocabulary(texts)

# Convert texts to sparse vectors
training_data = [
    (preprocessor.text_to_sparse(text), label)
    for text, label in zip(texts, labels)
]

# Train and predict
perceptron.train(training_data, is_sparse=True)
result = perceptron.predict_sparse(
    preprocessor.text_to_sparse("New user activation"),
    preprocessor.get_vocab_size(),
)
```
## Best Practices

### Training

- Match `input_size` to the vocabulary size
- Use the sparse representation for text data
- Enable `early_stop` for faster training
- Normalize input vectors if using the dense format

### Prediction

- Validate input dimensions
- Use sparse prediction for text classification
- Handle empty inputs gracefully

### Integration

- Pair with `LightPreprocessor`
- Verify that the vocabulary size matches `input_size`
- Consider input vector sparsity

### Error Handling

- Check input dimensions
- Validate index ranges
- Handle empty feature vectors
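A minimal sketch of these guards, as a hypothetical `validate_sparse_input` helper called before `predict_sparse`:

```python
def validate_sparse_input(active_indices, total_size: int) -> None:
    # Hypothetical guard; mirrors the checks recommended above.
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    for i in active_indices:
        if not 0 <= i < total_size:
            raise IndexError(f"index {i} out of range for size {total_size}")
    # An empty active_indices list is still a valid all-zeros vector,
    # but it always yields the same class, so callers may want to log it.
```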
## Contributing

To contribute to this perceptron:

- Maintain the focus on text classification
- Keep memory efficiency in mind
- Add thorough documentation
- Include usage examples