action2.py

Action Pattern Processor Documentation

Overview

The ActionPatternProcessor implements pattern discovery in tokenized data through co-occurrence analysis and custom scoring. It applies a ReLU activation and min-max score normalization to rank candidate patterns, and it maintains no state between runs.

Table of Contents

  • Installation

  • Core Components

  • Pattern Discovery

  • Scoring System

  • Output Format

  • Usage Examples

  • Performance Considerations

  • Contributing

Installation

Dependencies

import numpy as np
from typing import List, Dict, Any, Tuple, Set
from collections import Counter, defaultdict
import logging
from datetime import datetime
import math

Core Components

ActionPatternProcessor Class

Constructor

def __init__(self, window_size: int = 3, min_threshold: float = 0.3):
    """
    Initialize pattern processor with configurable parameters.
    
    Args:
        window_size: Size of sliding window for substring patterns
        min_threshold: Minimum score for pattern consideration
    """

Core Methods

process_tokens(tokens: List[str], cluster_type: str) -> Dict[str, Any]

Process a token list for pattern discovery; a minimal implementation sketch follows the bullets below.

  • Input: List of tokens and their cluster type

  • Output: Dictionary containing patterns and metadata

  • Performance: O(n * w), where n is the number of tokens and w is the window size
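
A minimal sketch of how process_tokens could assemble the documented output; the timing and per-pattern frequency logic shown here are assumptions, chosen to match the output format described below:

def process_tokens(self, tokens: List[str], cluster_type: str) -> Dict[str, Any]:
    start = datetime.utcnow()
    scores = self.calculate_pattern_scores(tokens, cluster_type)
    # count how many tokens contain each pattern (one plausible reading
    # of the per-pattern "frequency" field)
    counts = Counter(
        pattern
        for token in tokens
        for pattern in self.find_substring_patterns(token)
    )
    patterns = {
        p: {"score": s, "frequency": counts[p], "cluster_type": cluster_type}
        for p, s in scores.items()
        if s >= self.min_threshold
    }
    return {
        "patterns": patterns,
        "metadata": {
            "total_tokens": len(tokens),
            "unique_patterns": len(patterns),
            "processing_time": (datetime.utcnow() - start).total_seconds(),
            "timestamp": datetime.utcnow().isoformat() + "Z",
            "cluster_type": cluster_type,
        },
    }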

batch_process(token_groups: Dict[str, List[str]]) -> Dict[str, Any]

Process multiple token groups in a single call; a sketch follows the bullets below.

  • Input: Dictionary mapping cluster types to token lists

  • Output: Comprehensive analysis across clusters

  • Performance: O(m * n * w), where m is the number of clusters
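
A sketch of batch_process built on top of process_tokens; the pattern_distribution pivot is an assumption matched to the batch result format shown later:

def batch_process(self, token_groups: Dict[str, List[str]]) -> Dict[str, Any]:
    cluster_results = {
        cluster: self.process_tokens(tokens, cluster)
        for cluster, tokens in token_groups.items()
    }
    # pivot per-cluster scores into a pattern -> {cluster: score} view
    pattern_distribution: Dict[str, Dict[str, float]] = defaultdict(dict)
    for cluster, result in cluster_results.items():
        for pattern, info in result["patterns"].items():
            pattern_distribution[pattern][cluster] = info["score"]
    return {
        "cluster_results": cluster_results,
        "pattern_distribution": dict(pattern_distribution),
        "summary": {
            "total_patterns": len(pattern_distribution),
            "clusters_analyzed": len(token_groups),
            "timestamp": datetime.utcnow().isoformat() + "Z",
        },
    }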

Pattern Discovery

Sliding Window Implementation

def find_substring_patterns(self, token: str) -> Set[str]:
    """Find meaningful substrings using a sliding window."""
    patterns = set()
    if len(token) < self.window_size:
        # tokens shorter than the window are kept whole
        return {token}
        
    for i in range(len(token) - self.window_size + 1):
        substring = token[i:i + self.window_size]
        if substring.isalnum():
            patterns.add(substring)
    return patterns
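
For example, with the default window size of 3, "deploy" yields its four three-character substrings:

processor = ActionPatternProcessor(window_size=3)
processor.find_substring_patterns("deploy")
# {'dep', 'epl', 'plo', 'loy'}  (a set, so ordering is arbitrary)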

Frequency Calculation

def calculate_token_frequency(self, tokens: List[str]) -> Dict[str, float]:
    """Calculate logarithmic frequency scores for tokens."""
    counts = Counter(tokens)
    return {token: math.log1p(count) for token, count in counts.items()}
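
For example, repeated tokens grow sub-linearly under log1p:

processor.calculate_token_frequency(["deploy", "deploy", "update"])
# {'deploy': 1.0986..., 'update': 0.6931...}  (log1p(2) and log1p(1))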

Scoring System

Score Normalization

def normalize_scores(self, scores: np.ndarray) -> np.ndarray:
    """
    Apply min-max normalization with edge case handling.
    
    Edge Cases:
    - Empty array: Returns empty array
    - All same values: Returns array of ones
    """

Pattern Scoring

def calculate_pattern_scores(self, tokens: List[str], cluster_type: str) -> Dict[str, float]:
    """
    Calculate pattern scores based on:
    - Token frequency (logarithmic scale)
    - Pattern co-occurrence
    - Cluster type weighting
    """

Output Format

Pattern Analysis Result

{
    "patterns": {
        "pattern1": {
            "score": 0.85,
            "frequency": 5,
            "cluster_type": "action"
        }
    },
    "metadata": {
        "total_tokens": 100,
        "unique_patterns": 25,
        "processing_time": 0.023,
        "timestamp": "2024-12-05T10:30:00Z",
        "cluster_type": "action"
    }
}

Batch Processing Result

{
    "cluster_results": {
        "action": {...},
        "non_action": {...}
    },
    "pattern_distribution": {
        "pattern1": {
            "action": 0.85,
            "non_action": 0.2
        }
    },
    "summary": {
        "total_patterns": 50,
        "clusters_analyzed": 2,
        "timestamp": "2024-12-05T10:30:00Z"
    }
}

Usage Examples

Basic Usage

processor = ActionPatternProcessor(window_size=3)

# Process single token group
tokens = ["deploy", "update", "configure"]
result = processor.process_tokens(tokens, "action")

# Process multiple groups
token_groups = {
    "action": ["deploy", "update", "configure"],
    "non_action": ["status", "system", "report"]
}
results = processor.batch_process(token_groups)

Custom Configuration

# Adjust window size and threshold
processor = ActionPatternProcessor(
    window_size=4,
    min_threshold=0.4
)

Performance Considerations

Memory Usage

  • Pattern storage: O(n * w), where n is the number of tokens and w the window size

  • Score matrices: O(p), where p is the number of patterns

  • Temporary calculations: O(n)

Processing Speed

  • Sliding window operations: O(n * w)

  • Score normalization: O(p)

  • Batch processing: O(m * n * w), where m is the number of clusters

Optimization Tips

  1. Adjust window_size based on token characteristics

  2. Use appropriate min_threshold to filter noise

  3. Process tokens in batches for efficiency

Contributing

Guidelines for extending functionality:

  1. Maintain stateless processing

  2. Document edge cases

  3. Follow existing naming conventions

  4. Add unit tests
