src/perceptron.py:
LightPerceptron Documentation
Overview
The LightPerceptron class implements a memory-efficient binary classifier designed for initial text classification in clustering workflows. The implementation focuses on action/non-action classification without state persistence and is optimized for retraining scenarios.
Table of Contents
Installation
Quick Start
Class Reference
Implementation Details
Performance Considerations
Examples
Integration with Preprocessor
Best Practices
Installation
Required dependencies: numpy
Quick Start
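The class itself is not reproduced in this document, so the snippet below is a minimal sketch of the documented API (constructor, predict_sparse, train) to show intended usage; it is not the actual implementation.

```python
import numpy as np

class LightPerceptron:
    """Minimal sketch of the documented API; not the actual implementation."""

    def __init__(self, input_size, learning_rate=0.1):
        self.weights = np.zeros(input_size)  # zero-initialized feature weights
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_sparse(self, active_indices, total_size):
        # Only weights at active indices contribute: O(k) work.
        activation = (self.weights[active_indices].sum() if active_indices else 0.0) + self.bias
        return 1 if activation > 0 else 0

    def train(self, training_data, is_sparse=True, max_epochs=100, early_stop=True):
        for _ in range(max_epochs):
            errors = 0
            for features, label in training_data:
                pred = self.predict_sparse(features, len(self.weights))
                if pred != label:
                    update = self.learning_rate * (label - pred)
                    self.weights[features] += update
                    self.bias += update
                    errors += 1
            if early_stop and errors == 0:
                break  # converged: no mistakes in a full epoch

# Classify a document whose tokens map to vocabulary indices 2 and 7.
clf = LightPerceptron(input_size=10, learning_rate=0.1)
clf.train([([2, 7], 1), ([0, 1], 0)])
print(clf.predict_sparse([2, 7], total_size=10))  # → 1
```

Because no state is persisted, `train` is expected to be called again whenever the model is needed in a fresh process.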
Class Reference
Constructor
input_size: Dimension of input vectors (matches vocabulary size)
learning_rate: Step size for weight updates during training
Core Methods
predict_sparse(active_indices: List[int], total_size: int) → int
Makes prediction from sparse input representation.
Input:
active_indices: Indices where the vector has value 1
total_size: Total vector dimension
Output: 1 (action) or 0 (non-action)
Time Complexity: O(k) where k is number of active indices
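The O(k) behavior comes from summing only the weights at the active indices. A small sketch of that mechanic with plain numpy, using hypothetical trained values:

```python
import numpy as np

# Hypothetical trained state: weights over a 6-word vocabulary plus a bias.
weights = np.array([0.4, -0.2, 0.0, 0.7, -0.5, 0.1])
bias = -0.3

def predict_sparse(active_indices, total_size):
    # Sum only the weights whose input bit is 1: O(k) for k active indices.
    activation = (weights[active_indices].sum() if active_indices else 0.0) + bias
    return 1 if activation > 0 else 0

print(predict_sparse([0, 3], total_size=6))  # 0.4 + 0.7 - 0.3 = 0.8 > 0 → 1
print(predict_sparse([1, 4], total_size=6))  # -0.2 - 0.5 - 0.3 = -1.0 → 0
```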
predict_dense(input_vector: List[float]) → int
Makes prediction from dense input vector.
Input: Full input vector
Output: 1 (action) or 0 (non-action)
Time Complexity: O(n) where n is input size
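The dense path is a full dot product over every dimension. A sketch with the same hypothetical weights, showing that a dense vector with 1s at positions 0 and 3 is equivalent to the sparse input [0, 3]:

```python
import numpy as np

# Hypothetical trained state (same values as the sparse example).
weights = np.array([0.4, -0.2, 0.0, 0.7, -0.5, 0.1])
bias = -0.3

def predict_dense(input_vector):
    # Full dot product over every dimension: O(n).
    activation = float(np.dot(weights, input_vector)) + bias
    return 1 if activation > 0 else 0

# Dense equivalent of the sparse input [0, 3]:
print(predict_dense([1, 0, 0, 1, 0, 0]))  # → 1
```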
train(training_data, is_sparse=True, max_epochs=100, early_stop=True) → None
Trains perceptron on labeled data.
Input:
training_data: List of (features, label) pairs
is_sparse: Whether input uses sparse representation
max_epochs: Maximum training iterations
early_stop: Whether to stop when fully converged
Time Complexity: O(e * d * n) where:
e is number of epochs
d is number of training examples
n is input dimension (or k for sparse inputs)
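The epoch loop behind this complexity can be sketched as a standalone function (dense inputs; names are illustrative, not the class's internals):

```python
import numpy as np

def train_perceptron(training_data, input_size, learning_rate=0.1,
                     max_epochs=100, early_stop=True):
    # Zero-initialized weights and bias, per the training process described below.
    w = np.zeros(input_size)
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in training_data:
            x = np.asarray(x, dtype=float)
            pred = 1 if np.dot(w, x) + b > 0 else 0
            if pred != y:
                # Perceptron update: move the boundary toward the mistake.
                w += learning_rate * (y - pred) * x
                b += learning_rate * (y - pred)
                errors += 1
        if early_stop and errors == 0:
            break  # converged: every example classified correctly
    return w, b

# Linearly separable toy data: label is 1 iff feature 0 is set.
data = [([1, 0], 1), ([1, 1], 1), ([0, 1], 0), ([0, 0], 0)]
w, b = train_perceptron(data, input_size=2)
print(1 if w @ [1, 0] + b > 0 else 0)  # → 1
```

With early stopping, the e factor in O(e * d * n) is the number of epochs actually run, not max_epochs.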
Implementation Details
Data Structures
weights: numpy array of feature weights
bias: Scalar bias term
Input formats supported:
Sparse (indices of 1s)
Dense (full vector)
Training Process
Initialize zero weights
For each epoch:
Predict each training example
Update weights when predictions are wrong
Track errors for early stopping
Stop when:
No errors found (if early_stop=True)
Max epochs reached
Mathematics
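This section is empty in the source; the standard perceptron decision rule and update, consistent with the training process described above (w is the weight vector, b the bias, eta the learning rate), are:

```latex
\hat{y} =
\begin{cases}
1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b > 0 \\
0 & \text{otherwise}
\end{cases}
\qquad
\mathbf{w} \leftarrow \mathbf{w} + \eta\,(y - \hat{y})\,\mathbf{x},
\quad
b \leftarrow b + \eta\,(y - \hat{y})
```

For sparse inputs, the update touches only the weights at the active indices, since x is zero elsewhere.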
Performance Considerations
Memory Efficiency
Sparse input support reduces memory usage
No storage of training history
Numpy for efficient array operations
Processing Speed
Early stopping reduces training time
Optimized sparse operations
Minimal memory allocations
Trade-offs
Retrains each time (speed vs persistence)
Binary output only (simplicity vs granularity)
Linear decision boundary (speed vs complexity)
Examples
Basic Classification
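A self-contained sketch of the basic action/non-action workflow using plain numpy over sparse indices; the tiny vocabulary and labels are illustrative:

```python
import numpy as np

# Tiny vocabulary for the hypothetical action/non-action task.
vocab = {"run": 0, "stop": 1, "the": 2, "a": 3, "jump": 4, "sky": 5}

def to_indices(text):
    # Sparse representation: vocabulary indices of words present in the text.
    return [vocab[w] for w in text.lower().split() if w in vocab]

# Labeled examples: 1 = action, 0 = non-action.
train_data = [(to_indices("run"), 1), (to_indices("jump"), 1),
              (to_indices("the sky"), 0), (to_indices("a sky"), 0)]

# Perceptron training over sparse indices (see Implementation Details).
w, b = np.zeros(len(vocab)), 0.0
for _ in range(100):
    errors = 0
    for idx, y in train_data:
        pred = 1 if w[idx].sum() + b > 0 else 0
        if pred != y:
            w[idx] += 0.1 * (y - pred)
            b += 0.1 * (y - pred)
            errors += 1
    if errors == 0:
        break  # early stop: converged

def classify(text):
    idx = to_indices(text)
    return 1 if w[idx].sum() + b > 0 else 0

print(classify("run"), classify("the sky"))  # → 1 0
```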
Integration with Preprocessor
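The LightPreprocessor interface is not specified in this document, so the sketch below uses a hypothetical stand-in that builds a vocabulary and emits active indices; the key contract is that the perceptron's input_size must match the preprocessor's vocabulary size:

```python
import numpy as np

class TinyPreprocessor:
    """Hypothetical stand-in for LightPreprocessor: maps tokens to indices."""

    def __init__(self, texts):
        tokens = sorted({w for t in texts for w in t.lower().split()})
        self.vocab = {w: i for i, w in enumerate(tokens)}

    @property
    def vocab_size(self):
        return len(self.vocab)

    def to_sparse(self, text):
        # Active indices of known words; unknown words are dropped.
        return sorted({self.vocab[w] for w in text.lower().split() if w in self.vocab})

texts = ["open the door", "close the door", "the door is red"]
pre = TinyPreprocessor(texts)

# input_size for the perceptron comes from the preprocessor's vocabulary.
weights = np.zeros(pre.vocab_size)
print(pre.to_sparse("open the door"))  # → [1, 3, 5]
```

Validating this size match once at construction time avoids index-out-of-range errors at prediction time.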
Best Practices
Training
Match input_size to vocabulary size
Use sparse representation for text data
Enable early_stop for faster training
Normalize input vectors if using dense format
Prediction
Validate input dimensions
Use sparse prediction for text classification
Handle empty inputs gracefully
Integration
Pair with LightPreprocessor
Validate vocabulary size matching
Consider input vector sparsity
Error Handling
Check input dimensions
Validate index ranges
Handle empty feature vectors
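The three checks above can be combined into a small guard; the function name and the choice to treat an empty vector as a soft failure (rather than an exception) are illustrative:

```python
def validate_sparse_input(active_indices, input_size):
    """Sketch of the error-handling checks listed above; names are illustrative."""
    if not active_indices:
        return False  # empty feature vector: caller can fall back to class 0
    if any(i < 0 or i >= input_size for i in active_indices):
        raise IndexError("active index out of vocabulary range")
    return True

print(validate_sparse_input([2, 7], input_size=10))  # → True
print(validate_sparse_input([], input_size=10))      # → False
```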
Contributing
To contribute to this perceptron:
Maintain focus on text classification
Keep memory efficiency in mind
Add thorough documentation
Include usage examples