ATAC-seq Workflow

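The ATAC-seq workflow in FlowAgent analyzes chromatin accessibility data from raw reads through quality control, alignment, peak calling, transcription factor footprinting, and integration with other data types.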

Workflow Steps

  1. Quality Control
     • FastQC analysis
     • Fragment size distribution
     • Nucleosome positioning
     • Library complexity

  2. Alignment
     • Read alignment
     • Duplicate removal
     • Mitochondrial filtering
     • Quality filtering

  3. Peak Calling
     • Accessibility peaks
     • Signal normalization
     • IDR analysis
     • Peak annotation

  4. Footprinting
     • TF footprint detection
     • Motif enrichment
     • Binding dynamics
     • Factor activity

  5. Integration
     • ChIP-seq correlation
     • Gene expression
     • Chromatin state
     • Regulatory networks
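
The post-alignment filters in the Alignment step (duplicate removal, mitochondrial filtering, quality filtering) can be sketched on in-memory records. This is only an illustration of the logic, not how FlowAgent implements it: the `Alignment` record, the `min_mapq` default of 30, and the mitochondrial contig names are assumptions, and production pipelines normally apply these filters to BAM files with dedicated tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical minimal alignment record (not a FlowAgent type)."""
    chrom: str
    start: int
    end: int
    strand: str
    mapq: int


def remove_duplicates(alignments):
    """Keep one alignment per (chrom, start, end, strand), preferring the highest MAPQ."""
    best = {}
    for aln in alignments:
        key = (aln.chrom, aln.start, aln.end, aln.strand)
        if key not in best or aln.mapq > best[key].mapq:
            best[key] = aln
    return list(best.values())


def filter_alignments(alignments, min_mapq=30, mito_names=("chrM", "MT")):
    """Apply the three filters in the order listed in the workflow steps."""
    deduplicated = remove_duplicates(alignments)
    # Drop reads on the mitochondrial contig (names are an assumption).
    nuclear = [aln for aln in deduplicated if aln.chrom not in mito_names]
    # Drop reads below the assumed mapping-quality threshold.
    return [aln for aln in nuclear if aln.mapq >= min_mapq]


reads = [
    Alignment("chr1", 100, 180, "+", 42),
    Alignment("chr1", 100, 180, "+", 10),   # duplicate coordinates
    Alignment("chrM", 5, 60, "+", 42),      # mitochondrial
    Alignment("chr2", 300, 350, "-", 5),    # low mapping quality
    Alignment("chr2", 400, 470, "-", 35),
]
kept = filter_alignments(reads)
```

Keeping the highest-MAPQ copy of each duplicate set before quality filtering avoids losing a position just because a low-quality copy happened to come first.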

Custom Script Integration Points

The ATAC-seq workflow accepts custom scripts at three stages:

Pre-processing

  • Custom filtering
  • Quality metrics
  • Fragment analysis

Peak Analysis

  • Custom peak calling
  • Signal processing
  • Feature detection

Integration

  • Multi-omics analysis
  • Network inference
  • Visualization tools
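
As an illustration of what a pre-processing script for fragment analysis might compute, the sketch below bins fragment lengths into nucleosome-free, mono-nucleosome, and di-nucleosome classes. The function names and the length cutoffs are assumptions chosen for illustration, not FlowAgent defaults; adjust them to your library.

```python
from collections import Counter


def classify_fragment(length, nfr_max=100, mono_range=(180, 247), di_range=(315, 473)):
    """Assign a fragment length (bp) to a class; all cutoffs are assumed defaults."""
    if length < nfr_max:
        return "nucleosome_free"
    if mono_range[0] <= length <= mono_range[1]:
        return "mono_nucleosome"
    if di_range[0] <= length <= di_range[1]:
        return "di_nucleosome"
    return "other"


def summarize_fragments(lengths, **cutoffs):
    """Return the fraction of fragments in each class (empty dict for no input)."""
    counts = Counter(classify_fragment(n, **cutoffs) for n in lengths)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


summary = summarize_fragments([50, 80, 200, 400])
```

A library dominated by the "other" class, or one with almost no nucleosome-free fragments, is usually worth inspecting before peak calling.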

Example: Custom Footprint Detector

# custom_footprints.py
import numpy as np
import pandas as pd

def detect_footprints(signal_data, motifs, params):
    """Detect transcription factor footprints.

    params must provide 'window_size' (bp around each motif) and
    'min_depth' (minimum protection score needed to call a footprint).
    """
    window_size = params['window_size']
    min_depth = params['min_depth']

    footprints = []
    for motif in motifs:
        # Get signal around motif. get_signal_matrix must be supplied
        # alongside this script and return a 2D array of shape
        # (n_tracks, window_size).
        signal = get_signal_matrix(signal_data, motif, window_size)

        # Calculate protection score
        protection = calculate_protection(signal, window_size)

        # Call footprints
        if protection > min_depth:
            footprints.append({
                'motif': motif.name,
                'position': motif.position,
                'score': protection,
                'signal': signal.mean(axis=0)
            })

    return pd.DataFrame(footprints)

def calculate_protection(signal, window_size):
    """Calculate TF protection score as mean flank signal minus mean center signal."""
    # Compute the quarter width once: -window_size//4 floors the negated
    # value, which made the two flanks unequal in width.
    quarter = window_size // 4
    if quarter == 0:
        raise ValueError("window_size must be at least 4")

    flanks = np.concatenate([signal[:, :quarter], signal[:, -quarter:]], axis=1)
    center = signal[:, quarter:-quarter]

    # Mean depth difference is used here. Integrating a kernel density
    # estimate over [0, inf) gives roughly 1 for any non-negative signal,
    # so the difference of two such integrals stays near zero.
    return float(flanks.mean() - center.mean())
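
The script above calls `get_signal_matrix` without defining it. A minimal sketch of such a helper follows, under several assumptions that the source does not confirm: `signal_data` maps a chromosome name to a per-base signal array (1D, or 2D with one row per track), each motif carries a `chrom` attribute in addition to `name` and `position`, and `position` is the 0-based motif center. The `Motif` dataclass is hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Motif:
    """Hypothetical motif record; `chrom` is an assumed extra field."""
    name: str
    chrom: str
    position: int  # assumed 0-based center of the motif


def get_signal_matrix(signal_data, motif, window_size):
    """Return an (n_tracks, window_size) window of signal around the motif.

    Positions that fall outside the chromosome are filled with zeros.
    """
    tracks = np.atleast_2d(signal_data[motif.chrom])
    n_tracks, chrom_len = tracks.shape

    start = motif.position - window_size // 2
    end = start + window_size

    window = np.zeros((n_tracks, window_size), dtype=float)
    src_start = max(start, 0)
    src_end = min(end, chrom_len)
    if src_start < src_end:
        # Copy the overlapping region into the matching window columns.
        window[:, src_start - start:src_end - start] = tracks[:, src_start:src_end]
    return window


signal_data = {"chr1": np.arange(20)}
centered = get_signal_matrix(signal_data, Motif("CTCF", "chr1", 10), 8)
at_edge = get_signal_matrix(signal_data, Motif("CTCF", "chr1", 1), 8)
```

Zero-padding keeps every window the same width, so motifs near chromosome ends can still be scored, at the cost of a slightly lower apparent flank signal there.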

Usage

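import asyncio

from flowagent.core.workflow_executor import WorkflowExecutor

async def main(llm_interface):
    # Initialize workflow
    executor = WorkflowExecutor(llm_interface)

    # Execute ATAC-seq workflow with custom footprinting
    results = await executor.execute_workflow(
        input_data={
            "fastq1": "read1.fastq",
            "fastq2": "read2.fastq",
            "genome": "reference.fa",
            "motifs": "motifs.txt"
        },
        workflow_type="atac_seq",
        custom_script_requests=["custom_footprints"]
    )
    return results

# await is only valid inside a coroutine, so run main() through asyncio.
# llm_interface must already exist; its construction is not shown here.
# results = asyncio.run(main(llm_interface))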

Output Structure

results/
├── qc/
│   ├── fastqc/
│   ├── fragment_sizes.pdf
│   └── library_complexity.txt
├── alignment/
│   ├── filtered.bam
│   └── metrics.txt
├── peaks/
│   ├── peaks.narrowPeak
│   └── annotated_peaks.txt
├── footprints/
│   ├── footprints.bed
│   └── motif_enrichment.txt
└── integration/
    ├── chip_correlation/
    └── regulatory_network/
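
After a run, it can be useful to confirm that the expected files and directories are present. The following is a small sketch that checks a results directory against the layout shown above; the helper name `missing_outputs` is hypothetical and not part of FlowAgent.

```python
from pathlib import Path

# Relative paths taken from the output structure shown above.
EXPECTED_OUTPUTS = [
    "qc/fastqc",
    "qc/fragment_sizes.pdf",
    "qc/library_complexity.txt",
    "alignment/filtered.bam",
    "alignment/metrics.txt",
    "peaks/peaks.narrowPeak",
    "peaks/annotated_peaks.txt",
    "footprints/footprints.bed",
    "footprints/motif_enrichment.txt",
    "integration/chip_correlation",
    "integration/regulatory_network",
]


def missing_outputs(results_dir):
    """Return the expected relative paths that do not exist under results_dir."""
    root = Path(results_dir)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).exists()]
```

Calling `missing_outputs("results")` returns an empty list when every expected output exists, which makes it easy to use as a final check in a batch script.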

Quality Metrics

The workflow tracks quality metrics in three groups:

  1. Library Quality
     • Read quality scores
     • Fragment size distribution
     • Library complexity
     • Mitochondrial content

  2. Signal Quality
     • Signal-to-noise ratio
     • Peak enrichment
     • Reproducibility
     • Coverage uniformity

  3. Analysis Quality
     • Footprint depth
     • Motif enrichment
     • Integration scores
     • Regulatory potential
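
To make a few of these metrics concrete, the sketch below computes library complexity (non-redundant fraction and PCR bottleneck coefficient 1), mitochondrial content, and peak enrichment as the fraction of reads in peaks. The source does not say which formulas FlowAgent uses, so these definitions, the function names, and the input shapes are assumptions for illustration.

```python
from collections import Counter


def library_complexity(read_positions):
    """NRF = distinct positions / total reads; PBC1 = positions seen once / distinct positions."""
    counts = Counter(read_positions)
    total = sum(counts.values())
    distinct = len(counts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return {
        "nrf": distinct / total if total else 0.0,
        "pbc1": singletons / distinct if distinct else 0.0,
    }


def mitochondrial_fraction(chroms, mito_names=("chrM", "MT")):
    """Fraction of reads aligned to the mitochondrial contig (names are assumed)."""
    if not chroms:
        return 0.0
    return sum(1 for c in chroms if c in mito_names) / len(chroms)


def fraction_of_reads_in_peaks(reads, peaks):
    """Fraction of (chrom, pos) reads inside any (chrom, start, end) half-open peak."""
    if not reads:
        return 0.0
    in_peaks = sum(
        1
        for chrom, pos in reads
        if any(chrom == p_chrom and start <= pos < end for p_chrom, start, end in peaks)
    )
    return in_peaks / len(reads)


complexity = library_complexity(
    [("chr1", 100), ("chr1", 100), ("chr1", 200), ("chr2", 50)]
)
mito = mitochondrial_fraction(["chr1", "chrM", "chr2", "chrM"])
frip = fraction_of_reads_in_peaks(
    [("chr1", 150), ("chr1", 500), ("chr2", 10)],
    [("chr1", 100, 200)],
)
```

The peak lookup is a simple linear scan, which is fine for illustration; for genome-scale data an interval index would be the usual choice.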

Resource Requirements

Typical resource requirements for ATAC-seq analysis:

  • CPU: 8-16 cores
  • Memory: 32-64 GB RAM
  • Storage: 50-100 GB per sample
  • Time: 4-8 hours per sample

Best Practices

  1. Quality Control
     • Filter low-quality reads
     • Remove duplicates
     • Check fragment sizes
     • Monitor complexity

  2. Peak Calling
     • Use appropriate parameters
     • Consider replicates
     • Validate peaks
     • Annotate features

  3. Footprinting
     • Optimize window size
     • Use appropriate controls
     • Consider dynamics
     • Validate binding

  4. Integration
     • Use matched samples
     • Consider time points
     • Validate networks
     • Compare conditions