ChIP-seq Workflow¶
The ChIP-seq workflow in FlowAgent implements standard practices for chromatin immunoprecipitation sequencing analysis.
Workflow Steps¶
- Quality Control
- FastQC analysis
- Read quality assessment
- Contamination checking
-
Library complexity estimation
-
Alignment
- Bowtie2/BWA alignment
- Duplicate removal
- Quality filtering
-
Mapping statistics
-
Peak Calling
- MACS2/HOMER peak detection
- Signal-to-noise assessment
- Peak quality metrics
-
Replicate analysis
-
Motif Analysis
- De novo motif discovery
- Known motif enrichment
- Peak annotation
- Genomic distribution
Custom Script Integration Points¶
The ChIP-seq workflow supports custom scripts at key points:
Pre-processing¶
- Quality filtering
- Read trimming
- Input normalization
Peak Analysis¶
- Custom peak calling
- Signal processing
- Replicate handling
Downstream Analysis¶
- Custom annotations
- Specialized visualizations
- Integration with other data
Example: Custom Peak Analysis¶
# custom_peaks.py
import pandas as pd
from scipy import signal
def analyze_peaks(signal_file):
# Read signal data
signal_data = pd.read_csv(signal_file)
# Find peaks with custom parameters
peaks, properties = signal.find_peaks(
signal_data['intensity'],
height=0.5,
distance=50,
prominence=0.2
)
# Calculate metrics
peak_metrics = pd.DataFrame({
'position': peaks,
'height': properties['peak_heights'],
'prominence': properties['prominences'],
'width': properties['widths']
})
return {"peak_results": "peak_analysis.csv"}
Usage¶
from flowagent.core.workflow_executor import WorkflowExecutor
# Initialize workflow
executor = WorkflowExecutor(llm_interface)
# Execute ChIP-seq workflow with custom peak analysis
results = await executor.execute_workflow(
input_data={
"fastq": "input.fastq",
"control": "control.fastq"
},
workflow_type="chip_seq",
custom_script_requests=["custom_peak_analysis"]
)
Output Structure¶
results/
├── fastqc/
│ ├── fastqc_report.html
│ └── fastqc_data.txt
├── alignment/
│ ├── aligned.bam
│ └── alignment_stats.txt
├── peaks/
│ ├── peaks.narrowPeak
│ └── peak_summits.bed
└── motifs/
├── de_novo_motifs.txt
└── known_motifs.txt
Quality Metrics¶
Key quality metrics tracked:
- Sequencing Quality
- Base quality scores
- GC content
- Sequence duplication
-
Library complexity
-
Alignment Quality
- Mapping rate
- Duplicate rate
- Fragment size distribution
-
Coverage uniformity
-
Peak Quality
- Signal-to-noise ratio
- Peak width distribution
- Peak intensity distribution
- Replicate concordance
Resource Requirements¶
Typical resource requirements for ChIP-seq analysis:
- CPU: 8-16 cores
- Memory: 16-32GB RAM
- Storage: 20-50GB per sample
- Time: 2-4 hours per sample
Best Practices¶
- Quality Control
- Filter low-quality reads
- Remove PCR duplicates
-
Check for sample contamination
-
Alignment
- Use appropriate mapping parameters
- Handle multi-mapped reads
-
Filter low MAPQ alignments
-
Peak Calling
- Use appropriate control samples
- Set FDR thresholds
-
Consider peak types (narrow/broad)
-
Motif Analysis
- Use appropriate background models
- Consider peak rankings
- Validate with known motifs
Advanced Analysis¶
- Differential Binding
- Between conditions
- Between replicates
-
Statistical significance
-
Integration
- With RNA-seq data
- With other ChIP-seq data
-
With genomic annotations
-
Visualization
- Coverage plots
- Peak heatmaps
- Motif logos
- Genomic browsers