action2.py
Action Pattern Processor Documentation
Overview
The ActionPatternProcessor implements pattern discovery in tokenized data through co-occurrence analysis and custom scoring. It uses ReLU activation and normalized scoring to identify meaningful patterns, without maintaining state between runs.
Table of Contents
Installation
Core Components
Pattern Discovery
Scoring System
Usage Examples
Installation
Dependencies
import numpy as np
from typing import List, Dict, Any, Tuple, Set
from collections import Counter, defaultdict
import logging
from datetime import datetime
import math
Core Components
ActionPatternProcessor Class
Constructor
def __init__(self, window_size: int = 3, min_threshold: float = 0.3):
    """
    Initialize pattern processor with configurable parameters.

    Args:
        window_size: Size of sliding window for substring patterns
        min_threshold: Minimum score for pattern consideration
    """
Core Methods
process_tokens(tokens: List[str], cluster_type: str) -> Dict[str, Any]
Process token list for pattern discovery.
Input: List of tokens and their cluster type
Output: Dictionary containing patterns and metadata
Performance: O(n * w), where n is the number of tokens and w is the window size
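The method body is not reproduced in this documentation; the following is a minimal sketch of how process_tokens could combine the helpers documented below, matching the documented output format (the exact field handling and per-pattern frequency bookkeeping are assumptions):

def process_tokens(self, tokens: List[str], cluster_type: str) -> Dict[str, Any]:
    """Sketch: discover patterns in a token list and attach metadata."""
    start = datetime.utcnow()
    scores = self.calculate_pattern_scores(tokens, cluster_type)
    # Keep only patterns above the configured minimum threshold
    patterns = {
        p: {"score": s, "cluster_type": cluster_type}
        for p, s in scores.items()
        if s >= self.min_threshold
    }
    return {
        "patterns": patterns,
        "metadata": {
            "total_tokens": len(tokens),
            "unique_patterns": len(patterns),
            "processing_time": (datetime.utcnow() - start).total_seconds(),
            "timestamp": start.isoformat() + "Z",
            "cluster_type": cluster_type,
        },
    }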
batch_process(token_groups: Dict[str, List[str]]) -> Dict[str, Any]
Process multiple groups of tokens.
Input: Dictionary of cluster types to token lists
Output: Comprehensive analysis across clusters
Performance: O(m * n * w), where m is the number of clusters
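A corresponding sketch of batch_process, assuming it delegates to process_tokens per cluster and then aggregates a cross-cluster pattern_distribution (the aggregation details are an assumption):

def batch_process(self, token_groups: Dict[str, List[str]]) -> Dict[str, Any]:
    """Sketch: run process_tokens per cluster and aggregate the results."""
    cluster_results = {
        cluster: self.process_tokens(tokens, cluster)
        for cluster, tokens in token_groups.items()
    }
    # Cross-cluster view: pattern -> {cluster_type: score}
    pattern_distribution: Dict[str, Dict[str, float]] = defaultdict(dict)
    for cluster, result in cluster_results.items():
        for pattern, info in result["patterns"].items():
            pattern_distribution[pattern][cluster] = info["score"]
    return {
        "cluster_results": cluster_results,
        "pattern_distribution": dict(pattern_distribution),
        "summary": {
            "total_patterns": len(pattern_distribution),
            "clusters_analyzed": len(cluster_results),
            "timestamp": datetime.utcnow().isoformat() + "Z",
        },
    }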
Pattern Discovery
Sliding Window Implementation
def find_substring_patterns(self, token: str) -> Set[str]:
    """Find meaningful substrings using sliding window."""
    patterns = set()
    if len(token) < self.window_size:
        return {token}
    for i in range(len(token) - self.window_size + 1):
        substring = token[i:i + self.window_size]
        if substring.isalnum():
            patterns.add(substring)
    return patterns
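For example, with the default window size of 3:

processor = ActionPatternProcessor(window_size=3)
print(processor.find_substring_patterns("deploy"))
# {'dep', 'epl', 'plo', 'loy'} (set order may vary)
print(processor.find_substring_patterns("up"))
# {'up'} -- tokens shorter than the window are returned whole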
Frequency Calculation
def calculate_token_frequency(self, tokens: List[str]) -> Dict[str, float]:
    """Calculate logarithmic frequency scores for tokens."""
    counts = Counter(tokens)
    return {token: math.log1p(count) for token, count in counts.items()}
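A quick illustration of the log-scaled counts:

processor = ActionPatternProcessor()
freqs = processor.calculate_token_frequency(["deploy", "deploy", "update"])
# {'deploy': 1.0986..., 'update': 0.6931...}  (log1p(2) and log1p(1))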
Scoring System
Score Normalization
def normalize_scores(self, scores: np.ndarray) -> np.ndarray:
    """
    Apply min-max normalization with edge case handling.

    Edge Cases:
    - Empty array: Returns empty array
    - All same values: Returns array of ones
    """
Pattern Scoring
def calculate_pattern_scores(self, tokens: List[str], cluster_type: str) -> Dict[str, float]:
    """
    Calculate pattern scores based on:
    - Token frequency (logarithmic scale)
    - Pattern co-occurrence
    - Cluster type weighting
    """
Output Format
Pattern Analysis Result
{
  "patterns": {
    "pattern1": {
      "score": 0.85,
      "frequency": 5,
      "cluster_type": "action"
    }
  },
  "metadata": {
    "total_tokens": 100,
    "unique_patterns": 25,
    "processing_time": 0.023,
    "timestamp": "2024-12-05T10:30:00Z",
    "cluster_type": "action"
  }
}
Batch Processing Result
{
  "cluster_results": {
    "action": {...},
    "non_action": {...}
  },
  "pattern_distribution": {
    "pattern1": {
      "action": 0.85,
      "non_action": 0.2
    }
  },
  "summary": {
    "total_patterns": 50,
    "clusters_analyzed": 2,
    "timestamp": "2024-12-05T10:30:00Z"
  }
}
Usage Examples
Basic Usage
processor = ActionPatternProcessor(window_size=3)
# Process single token group
tokens = ["deploy", "update", "configure"]
result = processor.process_tokens(tokens, "action")
# Process multiple groups
token_groups = {
    "action": ["deploy", "update", "configure"],
    "non_action": ["status", "system", "report"]
}
results = processor.batch_process(token_groups)
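The returned dictionaries follow the Output Format above; for example, to pick out patterns that score high in the action cluster but low elsewhere (the 0.4 cutoff is only illustrative):

for pattern, by_cluster in results["pattern_distribution"].items():
    if by_cluster.get("action", 0.0) >= 0.4 and by_cluster.get("non_action", 0.0) < 0.4:
        print(pattern, by_cluster)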
Custom Configuration
# Adjust window size and threshold
processor = ActionPatternProcessor(
    window_size=4,
    min_threshold=0.4
)
Performance Considerations
Memory Usage
Pattern storage: O(n * w), where n is the number of tokens and w is the window size
Score matrices: O(p), where p is the number of patterns
Temporary calculations: O(n)
Processing Speed
Sliding window operations: O(n * w)
Score normalization: O(p)
Batch processing: O(m * n * w)
Optimization Tips
Adjust window_size based on token characteristics
Use appropriate min_threshold to filter noise
Process tokens in batches for efficiency
Contributing
Guidelines for extending functionality:
Maintain stateless processing
Document edge cases
Follow existing naming conventions
Add unit tests (a small example sketch follows below)
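As a starting point, unit tests for the sliding-window helper might look like this (pytest is assumed; it is not listed in the dependencies above):

def test_find_substring_patterns_short_token():
    processor = ActionPatternProcessor(window_size=3)
    assert processor.find_substring_patterns("up") == {"up"}

def test_find_substring_patterns_window():
    processor = ActionPatternProcessor(window_size=3)
    assert processor.find_substring_patterns("deploy") == {"dep", "epl", "plo", "loy"}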