ChIP-seq Peak Analysis Example

This example demonstrates how to create a custom ChIP-seq peak analysis script.

Script Overview

The script detects peaks in a ChIP-seq signal track with scipy.signal.find_peaks, filtering candidates by minimum height, minimum spacing, and minimum prominence, and writes per-peak metrics to a CSV file.

Implementation

# custom_peaks.py
import pandas as pd
import numpy as np
from scipy import signal
import json
import sys

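def analyze_peaks(signal_data, params):
    """Analyze ChIP-seq peaks with custom parameters."""
    # Find peaks. width=0 applies no width filter, but find_peaks only
    # returns the 'widths' property when a width argument is given.
    peaks, properties = signal.find_peaks(
        signal_data['intensity'],
        height=params['min_height'],
        distance=params['min_distance'],
        prominence=params['min_prominence'],
        width=0
    )

    # Calculate metrics. 'position' is the row index of each peak in
    # signal_data; 'width' is in samples, measured at half prominence.
    peak_metrics = pd.DataFrame({
        'position': peaks,
        'height': properties['peak_heights'],
        'prominence': properties['prominences'],
        'width': properties['widths']
    })

    return peak_metrics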

def main():
    # Parse input arguments
    args = json.loads(sys.argv[1])

    # Read signal data
    signal_data = pd.read_csv(args['signal_file'])

    # Set parameters
    params = {
        'min_height': 0.5,
        'min_distance': 50,
        'min_prominence': 0.2
    }

    # Analyze peaks
    results = analyze_peaks(signal_data, params)

    # Save results
    results.to_csv('peak_analysis.csv', index=False)

    # Output results location
    print(json.dumps({
        'peak_results': 'peak_analysis.csv'
    }))

if __name__ == '__main__':
    main()
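To try the detection step without real data, the following is a minimal sketch on a synthetic signal. The two Gaussian bumps, their positions, and their widths are made up for illustration; only the `intensity` column name and the threshold values come from the script above. Passing `width=0` is what makes `find_peaks` return the `widths` property.

```python
import numpy as np
import pandas as pd
from scipy import signal

# Synthetic signal (illustrative only): two Gaussian bumps on a zero baseline.
x = np.arange(1000)
intensity = (
    1.0 * np.exp(-0.5 * ((x - 300) / 10.0) ** 2)
    + 0.8 * np.exp(-0.5 * ((x - 700) / 15.0) ** 2)
)
signal_data = pd.DataFrame({"intensity": intensity})

peaks, properties = signal.find_peaks(
    signal_data["intensity"],
    height=0.5,
    distance=50,
    prominence=0.2,
    width=0,  # no width filter; requests the 'widths' property
)

peak_metrics = pd.DataFrame({
    "position": peaks,                       # sample index of each peak
    "height": properties["peak_heights"],
    "prominence": properties["prominences"],
    "width": properties["widths"],           # samples, at half prominence
})
print(peak_metrics)
```

On a zero baseline the prominence equals the peak height, so the reported width should be close to the Gaussian full width at half maximum, roughly 2.355 times the standard deviation used for each bump.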

Metadata

{
    "name": "custom_peak_analysis",
    "description": "Custom peak detection for ChIP-seq data",
    "script_file": "custom_peaks.py",
    "language": "python",
    "input_requirements": [
        {
            "name": "signal_file",
            "type": "file",
            "description": "CSV file containing ChIP-seq signal data"
        }
    ],
    "output_types": [
        {
            "name": "peak_results",
            "type": "file",
            "description": "CSV file containing peak analysis results"
        }
    ],
    "workflow_types": ["chip_seq"],
    "execution_order": {
        "after": ["alignment"],
        "before": ["motif_analysis"]
    },
    "requirements": {
        "python_packages": ["pandas", "numpy", "scipy"]
    }
}
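The names declared in the metadata have to line up with what the script actually reads and prints: `signal_file` is the key the script looks up in its JSON argument, and `peak_results` is the key it prints on stdout. The helper below is a hypothetical consistency check written for this example, not part of FlowAgent, and it assumes the metadata has already been loaded into a dictionary.

```python
def check_io_contract(metadata, script_args, script_output):
    """Return a list of mismatches between metadata and script I/O keys.

    Hypothetical helper for illustration; not a FlowAgent API.
    """
    declared_inputs = {item["name"] for item in metadata["input_requirements"]}
    declared_outputs = {item["name"] for item in metadata["output_types"]}

    problems = []
    for name in sorted(declared_inputs - set(script_args)):
        problems.append(f"missing input argument: {name}")
    for name in sorted(declared_outputs - set(script_output)):
        problems.append(f"missing output key: {name}")
    return problems


# Trimmed copy of the metadata above, keeping only the fields being checked.
metadata = {
    "input_requirements": [{"name": "signal_file", "type": "file"}],
    "output_types": [{"name": "peak_results", "type": "file"}],
}

# Keys the script reads from sys.argv[1] and prints as JSON.
script_args = {"signal_file": "chip_signal.csv"}
script_output = {"peak_results": "peak_analysis.csv"}

problems = check_io_contract(metadata, script_args, script_output)
print(problems)
```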

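Usage

import asyncio

import pandas as pd
from flowagent.core.workflow_executor import WorkflowExecutor

async def run_peak_analysis(llm_interface):
    # Initialize workflow; llm_interface is assumed to be configured elsewhere
    executor = WorkflowExecutor(llm_interface)

    # Execute workflow with custom peak analysis
    results = await executor.execute_workflow(
        input_data={
            "signal_file": "chip_signal.csv"
        },
        workflow_type="chip_seq",
        custom_script_requests=["custom_peak_analysis"]
    )

    # Access peak results
    return pd.read_csv(results["peak_results"])

# await only works inside a coroutine, so run it through an event loop:
# peak_data = asyncio.run(run_peak_analysis(llm_interface))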

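Output Format

The script produces a CSV file with columns:

- position: Index of the peak in the input signal (its row number in the signal CSV), not a genomic coordinate
- height: Peak height
- prominence: Peak prominence
- width: Peak width in samples, measured at half the peak's prominence (the scipy default)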
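Because `find_peaks` reports sample indices, the `position` and `width` columns need a coordinate mapping before they can be compared with genomic annotations. The sketch below assumes each row of the signal CSV is a fixed-width bin along a single chromosome. The function name and the `chrom`, `region_start`, and `bin_size` parameters are hypothetical, since the source does not specify how the signal file is laid out.

```python
import pandas as pd


def to_genomic_coordinates(peak_metrics, chrom, region_start, bin_size):
    """Map index-based peak metrics to base-pair coordinates.

    Assumes row i of the signal covers the bin starting at
    region_start + i * bin_size (an assumption for illustration).
    """
    out = peak_metrics.copy()  # leave the caller's frame untouched
    out["chrom"] = chrom
    out["genomic_position"] = region_start + out["position"] * bin_size
    out["width_bp"] = out["width"] * bin_size
    return out


# Example with made-up values: 10 bp bins starting at chr1:1,000,000.
example = pd.DataFrame({"position": [300, 700], "width": [23.5, 35.3]})
mapped = to_genomic_coordinates(example, "chr1", 1_000_000, 10)
print(mapped[["chrom", "genomic_position", "width_bp"]])
```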

Quality Metrics

The analysis tracks:

- Peak distribution
- Signal-to-noise ratio
- Peak shape characteristics
- Coverage statistics
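The metrics above are listed without definitions. As one possible reading of the signal-to-noise ratio, the sketch below divides each peak's height above the median by a robust noise estimate based on the median absolute deviation (MAD). This definition and the function name `signal_to_noise` are assumptions for illustration, not necessarily what the workflow computes.

```python
import numpy as np


def signal_to_noise(intensity, peak_indices):
    """Per-peak SNR: (peak value - median) / (1.4826 * MAD).

    The 1.4826 factor scales the MAD to match the standard deviation
    of Gaussian noise. Illustrative definition, not the workflow's own.
    """
    intensity = np.asarray(intensity, dtype=float)
    peak_indices = np.asarray(peak_indices, dtype=int)
    median = np.median(intensity)
    noise_sigma = 1.4826 * np.median(np.abs(intensity - median))
    if noise_sigma == 0:
        # A perfectly flat background has no measurable noise.
        return np.full(len(peak_indices), np.inf)
    return (intensity[peak_indices] - median) / noise_sigma


# Example: unit-variance noise with one strong spike at index 5000.
rng = np.random.default_rng(0)
track = rng.normal(0.0, 1.0, 10_000)
track[5000] = 20.0
snr = signal_to_noise(track, [5000])
print(snr)
```

A median-based estimate is used here because the peaks themselves inflate an ordinary standard deviation, which would understate the ratio for strong signals.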

Best Practices

Signal Processing
- Filter noise appropriately
- Use robust peak detection
- Consider local background

Parameter Selection
- Optimize for data type
- Validate on known regions
- Consider replicates

Quality Control
- Check peak distributions
- Validate peak shapes
- Monitor false positives
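The signal processing advice can be sketched as a preprocessing step that runs before `find_peaks`: smooth the track, estimate the local background, and subtract it. The Savitzky-Golay and running-median filters below are one reasonable choice rather than a prescribed method. The function name and default window sizes are assumptions that would need tuning for real data.

```python
import numpy as np
from scipy import ndimage, signal


def preprocess_signal(intensity, smooth_window=11, background_window=201):
    """Smooth a signal track and remove its slowly varying background.

    Window sizes are in samples and are illustrative defaults only.
    """
    intensity = np.asarray(intensity, dtype=float)
    # Savitzky-Golay smoothing reduces noise while preserving peak shape.
    smoothed = signal.savgol_filter(
        intensity, window_length=smooth_window, polyorder=2
    )
    # A wide running median tracks the local background but largely
    # ignores peaks narrower than about half the window.
    background = ndimage.median_filter(
        smoothed, size=background_window, mode="nearest"
    )
    # Clip at zero so regions below background do not go negative.
    return np.clip(smoothed - background, 0.0, None)


# Example: one Gaussian peak sitting on a constant background of 2.0.
x = np.arange(2000)
raw = 2.0 + np.exp(-0.5 * ((x - 1000) / 10.0) ** 2)
corrected = preprocess_signal(raw)
print(int(np.argmax(corrected)), float(corrected.max()))
```

The background window should be several times wider than the broadest expected peak. Otherwise the median starts following the peak itself and the subtraction removes real signal.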