ChIP-seq Workflow

The ChIP-seq workflow in FlowAgent implements standard practices for chromatin immunoprecipitation sequencing analysis.

Workflow Steps

  1. Quality Control
       • FastQC analysis
       • Read quality assessment
       • Contamination checking
       • Library complexity estimation

  2. Alignment
       • Bowtie2/BWA alignment
       • Duplicate removal
       • Quality filtering
       • Mapping statistics

  3. Peak Calling
       • MACS2/HOMER peak detection
       • Signal-to-noise assessment
       • Peak quality metrics
       • Replicate analysis

  4. Motif Analysis
       • De novo motif discovery
       • Known motif enrichment
       • Peak annotation
       • Genomic distribution
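
To make the library complexity estimation listed under Quality Control more concrete, the sketch below computes two commonly reported complexity measures, the non-redundant fraction (NRF) and the PCR bottlenecking coefficient PBC1, from mapped read positions. The function name and the input representation (one `(chrom, start, strand)` tuple per mapped read) are assumptions made for this illustration; they are not part of the FlowAgent API, and FlowAgent may compute complexity differently.

```python
from collections import Counter


def library_complexity(read_positions):
    """Estimate library complexity from mapped read positions.

    read_positions: iterable of (chrom, start, strand) tuples, one per read
    (hypothetical input format chosen for this sketch).
    """
    counts = Counter(read_positions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")

    distinct = len(counts)  # distinct genomic positions
    singletons = sum(1 for c in counts.values() if c == 1)  # positions hit once

    return {
        "nrf": distinct / total,            # distinct positions / all reads
        "pbc1": singletons / distinct,      # single-read positions / distinct positions
        "duplicate_rate": 1 - distinct / total,
    }


reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 200, "-"), ("chr2", 50, "+")]
metrics = library_complexity(reads)
```

Values close to 1 for NRF and PBC1 indicate a complex library; low values suggest heavy PCR duplication.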

Custom Script Integration Points

The ChIP-seq workflow supports custom scripts at key points:

Pre-processing

  • Quality filtering
  • Read trimming
  • Input normalization

Peak Analysis

  • Custom peak calling
  • Signal processing
  • Replicate handling

Downstream Analysis

  • Custom annotations
  • Specialized visualizations
  • Integration with other data
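
As an example of what a pre-processing script for input normalization might compute, the sketch below scales control (input) bin counts to the ChIP library size and returns a per-bin fold enrichment. The function name, the binned-count inputs, and the pseudocount default are all assumptions for illustration; how FlowAgent passes data to a custom script is not shown here.

```python
def normalize_to_input(chip_counts, input_counts, chip_total, input_total, pseudocount=1.0):
    """Return per-bin fold enrichment of ChIP over depth-scaled input.

    chip_counts / input_counts: read counts per genomic bin (same bins, same order).
    chip_total / input_total: total mapped reads in each library.
    """
    if len(chip_counts) != len(input_counts):
        raise ValueError("ChIP and input must cover the same bins")
    if chip_total <= 0 or input_total <= 0:
        raise ValueError("library sizes must be positive")

    # Scale the input so both libraries have the same effective depth
    scale = chip_total / input_total

    # The pseudocount avoids division by zero in empty bins
    return [
        (chip + pseudocount) / (inp * scale + pseudocount)
        for chip, inp in zip(chip_counts, input_counts)
    ]


enrichment = normalize_to_input([10, 0], [2, 0], chip_total=1000, input_total=500)
```

Scaling by library size is the simplest choice; other normalization schemes could be substituted without changing the overall shape of the script.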

Example: Custom Peak Analysis

# custom_peaks.py
import pandas as pd
from scipy import signal

def analyze_peaks(signal_file):
    # Read signal data (CSV with an 'intensity' column)
    signal_data = pd.read_csv(signal_file)

    # Find peaks with custom parameters.
    # width=0 filters nothing, but it makes find_peaks report 'widths'.
    peaks, properties = signal.find_peaks(
        signal_data['intensity'],
        height=0.5,
        distance=50,
        prominence=0.2,
        width=0
    )

    # Calculate metrics
    peak_metrics = pd.DataFrame({
        'position': peaks,
        'height': properties['peak_heights'],
        'prominence': properties['prominences'],
        'width': properties['widths']
    })

    # Write the metrics so the returned path points to an existing file
    output_file = "peak_analysis.csv"
    peak_metrics.to_csv(output_file, index=False)

    return {"peak_results": output_file}

Usage

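import asyncio

from flowagent.core.workflow_executor import WorkflowExecutor

async def run_chip_seq(llm_interface):
    # Initialize workflow
    executor = WorkflowExecutor(llm_interface)

    # Execute ChIP-seq workflow with custom peak analysis
    return await executor.execute_workflow(
        input_data={
            "fastq": "input.fastq",
            "control": "control.fastq"
        },
        workflow_type="chip_seq",
        custom_script_requests=["custom_peak_analysis"]
    )

# execute_workflow is a coroutine, so it must be awaited inside an async
# function; llm_interface is your already configured LLM interface object
results = asyncio.run(run_chip_seq(llm_interface))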

Output Structure

results/
├── fastqc/
│   ├── fastqc_report.html
│   └── fastqc_data.txt
├── alignment/
│   ├── aligned.bam
│   └── alignment_stats.txt
├── peaks/
│   ├── peaks.narrowPeak
│   └── peak_summits.bed
└── motifs/
    ├── de_novo_motifs.txt
    └── known_motifs.txt

Quality Metrics

Key quality metrics tracked:

  1. Sequencing Quality
       • Base quality scores
       • GC content
       • Sequence duplication
       • Library complexity

  2. Alignment Quality
       • Mapping rate
       • Duplicate rate
       • Fragment size distribution
       • Coverage uniformity

  3. Peak Quality
       • Signal-to-noise ratio
       • Peak width distribution
       • Peak intensity distribution
       • Replicate concordance
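
One common way to summarize signal-to-noise for a ChIP-seq experiment is the fraction of reads in peaks (FRiP). The sketch below is a minimal pure-Python version, assuming reads are given as `(chrom, position)` pairs and peaks as BED-style half-open `(chrom, start, end)` intervals. These input formats and the function name are illustrative assumptions, and this is not necessarily how FlowAgent derives its signal-to-noise figure.

```python
from bisect import bisect_right


def frip(read_positions, peaks):
    """Fraction of reads whose position falls inside any peak interval."""
    # Group peaks by chromosome and merge overlapping intervals
    merged = {}
    for chrom, start, end in sorted(peaks):
        intervals = merged.setdefault(chrom, [])
        if intervals and start <= intervals[-1][1]:
            intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
        else:
            intervals.append((start, end))

    total = 0
    in_peaks = 0
    for chrom, pos in read_positions:
        total += 1
        intervals = merged.get(chrom, [])
        # Index of the last interval starting at or before pos
        idx = bisect_right(intervals, (pos, float("inf"))) - 1
        if idx >= 0 and intervals[idx][0] <= pos < intervals[idx][1]:
            in_peaks += 1

    if total == 0:
        raise ValueError("no reads supplied")
    return in_peaks / total


peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
reads = [("chr1", 100), ("chr1", 299), ("chr1", 300), ("chr2", 49), ("chr3", 10)]
score = frip(reads, peaks)
```

Merging the peaks first means each read is counted at most once, even when peak intervals overlap.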

Resource Requirements

Typical resource requirements for ChIP-seq analysis:

  • CPU: 8-16 cores
  • Memory: 16-32 GB RAM
  • Storage: 20-50 GB per sample
  • Time: 2-4 hours per sample
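
For planning a batch, the per-sample ranges above can be multiplied out. The sketch below assumes that samples are of similar size and are processed one after another, so storage and time both scale linearly. The function name is hypothetical, and running samples in parallel would shorten the wall-clock time.

```python
def estimate_batch_resources(n_samples, storage_gb=(20, 50), hours=(2, 4)):
    """Return (low, high) estimates for total storage and serial run time."""
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return {
        "storage_gb": (n_samples * storage_gb[0], n_samples * storage_gb[1]),
        "serial_hours": (n_samples * hours[0], n_samples * hours[1]),
    }


# Example: six samples (e.g. three ChIP replicates plus three controls)
estimate = estimate_batch_resources(6)
```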

Best Practices

  1. Quality Control
       • Filter low-quality reads
       • Remove PCR duplicates
       • Check for sample contamination

  2. Alignment
       • Use appropriate mapping parameters
       • Handle multi-mapped reads
       • Filter low MAPQ alignments

  3. Peak Calling
       • Use appropriate control samples
       • Set FDR thresholds
       • Consider peak types (narrow/broad)

  4. Motif Analysis
       • Use appropriate background models
       • Consider peak rankings
       • Validate with known motifs
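
To illustrate setting an FDR threshold after peak calling, the sketch below filters narrowPeak records by q-value. It relies on the narrowPeak convention that column 9 holds -log10(q-value), with -1 meaning no q-value was assigned. The function name and the 0.05 default are assumptions for this example, not FlowAgent settings.

```python
import math


def filter_peaks_by_fdr(narrowpeak_lines, max_fdr=0.05):
    """Keep narrowPeak records whose q-value is at most max_fdr."""
    min_score = -math.log10(max_fdr)  # threshold on the -log10 scale
    kept = []
    for line in narrowpeak_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 10:
            raise ValueError(f"expected 10 narrowPeak columns, got {len(fields)}")

        neg_log10_q = float(fields[8])
        if neg_log10_q < 0:
            continue  # -1 means the peak caller assigned no q-value
        if neg_log10_q >= min_score:
            kept.append({
                "chrom": fields[0],
                "start": int(fields[1]),
                "end": int(fields[2]),
                "name": fields[3],
                "q_value": 10 ** -neg_log10_q,
            })
    return kept


example = [
    "chr1\t100\t200\tpeak1\t500\t.\t12.5\t8.0\t5.0\t50",
    "chr1\t400\t500\tpeak2\t50\t.\t2.0\t1.5\t1.0\t40",
]
significant = filter_peaks_by_fdr(example)
```

The same function can be pointed at a file by passing an open file handle, since it only iterates over lines.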

Advanced Analysis

  1. Differential Binding
       • Between conditions
       • Between replicates
       • Statistical significance

  2. Integration
       • With RNA-seq data
       • With other ChIP-seq data
       • With genomic annotations

  3. Visualization
       • Coverage plots
       • Peak heatmaps
       • Motif logos
       • Genomic browsers
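
Comparisons between replicates, or with other ChIP-seq data sets, often start from simple interval overlap. The sketch below reports the fraction of peaks in one set that overlap at least one peak in another, using BED-style half-open `(chrom, start, end)` intervals. It is a descriptive measure under those assumptions, not a statistical test of differential binding, and the function name is hypothetical.

```python
def overlap_fraction(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a overlapping any peak in peaks_b."""
    if not peaks_a:
        raise ValueError("peaks_a is empty")

    # Index the second peak set by chromosome
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = 0
    for chrom, start, end in peaks_a:
        # Half-open intervals overlap when each starts before the other ends
        if any(start < b_end and b_start < end for b_start, b_end in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(peaks_a)


rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 10)]
rep2 = [("chr1", 150, 250), ("chr2", 10, 20)]
shared = overlap_fraction(rep1, rep2)
```

The measure is asymmetric: swapping the two arguments gives the fraction relative to the other peak set. The linear scan per peak is fine for small sets; large peak lists would benefit from sorted intervals or an interval tree.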