
RNA-seq Workflow

The RNA-seq workflow in FlowAgent follows best practices for RNA sequencing analysis.

Workflow Steps

  1. Quality Control
     • FastQC analysis of raw reads
     • Adapter identification
     • Quality score distribution
     • Sequence duplication levels

  2. Alignment
     • STAR/Bowtie2 alignment
     • Mapping statistics
     • Duplicate marking
     • Quality filtering

  3. Feature Quantification
     • Gene/transcript counting
     • UMI deduplication (if applicable)
     • Multi-mapping handling
     • Feature assignment stats

  4. Differential Expression
     • Count normalization
     • Statistical testing
     • Multiple testing correction
     • Results visualization

Custom Script Integration Points

The RNA-seq workflow supports custom scripts at various stages:

Pre-alignment

  • Quality filtering
  • Adapter trimming
  • Read preprocessing

Post-alignment

  • Alignment filtering
  • BAM processing
  • Quality metrics

Analysis

  • Custom normalization
  • Alternative statistical tests
  • Specialized visualizations
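
As an illustration of the pre-alignment quality filtering hook, here is a minimal Python sketch that drops reads whose mean Phred score falls below 20. The function names (`mean_phred`, `parse_fastq`, `filter_reads`) are hypothetical and not part of FlowAgent, and the sketch assumes uncompressed four-line FASTQ records with Phred+33 encoding. How FlowAgent passes inputs to such a script is not covered here.

```python
from typing import Iterable, Iterator, List, Tuple

# (header, sequence, quality string); a hypothetical record type for this sketch
FastqRecord = Tuple[str, str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text laid out as four lines per read."""
    block: List[str] = []
    for line in lines:
        block.append(line.rstrip("\n"))
        if len(block) == 4:
            header, sequence, _separator, quality = block
            yield header, sequence, quality
            block = []


def filter_reads(records: Iterable[FastqRecord],
                 min_mean_q: float = 20.0) -> Iterator[FastqRecord]:
    """Keep only reads whose mean Phred score is at least min_mean_q."""
    for header, sequence, quality in records:
        if mean_phred(quality) >= min_mean_q:
            yield header, sequence, quality
```

Filtering on the mean score is only one possible rule. Dedicated trimmers typically also clip low-quality ends instead of discarding whole reads.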

Example: Custom Normalization

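# custom_normalize.R
library(DESeq2)
library(jsonlite)

# Build args_dict from "--name value" pairs when it is not already defined.
# Assumption: arguments arrive as such pairs on the command line; drop this
# block if your FlowAgent setup already provides args_dict.
if (!exists("args_dict")) {
    cli <- commandArgs(trailingOnly = TRUE)
    args_dict <- list()
    for (k in seq_len(length(cli) %/% 2)) {
        args_dict[[sub("^--", "", cli[2 * k - 1])]] <- cli[2 * k]
    }
}

# Read the raw count matrix (genes in rows, samples in columns)
counts <- read.csv(args_dict$counts_matrix, row.names=1)

# Normalize: design = ~ 1 is enough because only size factors are estimated
dds <- DESeqDataSetFromMatrix(
    countData = counts,
    colData = data.frame(condition=factor(colnames(counts))),
    design = ~ 1
)
dds <- estimateSizeFactors(dds)
normalized_counts <- counts(dds, normalized=TRUE)

# Output results: write the matrix, then report its path as JSON on stdout
write.csv(normalized_counts, "normalized_counts.csv")
cat(toJSON(list(normalized_counts = "normalized_counts.csv")))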
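
To make the normalization step less of a black box, the following pure-Python sketch reproduces the median-of-ratios idea that DESeq2's `estimateSizeFactors` uses by default. It is a simplified illustration, not a replacement for DESeq2. The function names and the dict-of-lists input layout are assumptions made for this example.

```python
import math
from statistics import median
from typing import Dict, List


def size_factors(counts: Dict[str, List[float]]) -> List[float]:
    """Median-of-ratios size factors; counts maps gene -> per-sample counts."""
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        # Genes with any zero count have no finite log geometric mean; skip them.
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("every gene has at least one zero count")
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Divide each sample's counts by that sample's size factor."""
    factors = size_factors(counts)
    return {gene: [c / s for c, s in zip(row, factors)]
            for gene, row in counts.items()}
```

If one sample was sequenced twice as deeply as another, its size factor comes out twice as large, so the normalized counts of unchanged genes line up across samples.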

Usage

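import asyncio

from flowagent.core.workflow_executor import WorkflowExecutor


async def main(llm_interface):
    # Initialize workflow
    executor = WorkflowExecutor(llm_interface)

    # Execute RNA-seq workflow with custom normalization
    return await executor.execute_workflow(
        input_data={
            "fastq": "input.fastq",
            "annotation": "genes.gtf"
        },
        workflow_type="rna_seq",
        custom_script_requests=["deseq2_normalize"]
    )


# execute_workflow is awaited, so it must run inside an event loop.
# llm_interface must be created beforehand; its construction is not shown here.
results = asyncio.run(main(llm_interface))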

Output Structure

results/
├── fastqc/
│   ├── fastqc_report.html
│   └── fastqc_data.txt
├── alignment/
│   ├── aligned.bam
│   └── alignment_stats.txt
├── counts/
│   ├── raw_counts.csv
│   └── normalized_counts.csv
└── differential_expression/
    ├── deseq2_results.csv
    └── ma_plot.pdf
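
After a run, it can be useful to confirm that every file in the layout above was produced. The sketch below is one way to do that with `pathlib`. The helper name `missing_outputs` is hypothetical, and the file list is copied from the tree above, so it assumes a run that includes all four stages.

```python
from pathlib import Path
from typing import List, Union

# Relative paths taken from the output tree shown above
EXPECTED_OUTPUTS = [
    "fastqc/fastqc_report.html",
    "fastqc/fastqc_data.txt",
    "alignment/aligned.bam",
    "alignment/alignment_stats.txt",
    "counts/raw_counts.csv",
    "counts/normalized_counts.csv",
    "differential_expression/deseq2_results.csv",
    "differential_expression/ma_plot.pdf",
]


def missing_outputs(results_dir: Union[str, Path]) -> List[str]:
    """Return the expected files that are absent under results_dir."""
    root = Path(results_dir)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_outputs("results"):
        print(f"missing: {rel}")
```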

Quality Metrics

The workflow tracks various quality metrics:

  1. Raw Data Quality
     • Base quality scores
     • GC content
     • Sequence complexity
     • Adapter content

  2. Alignment Quality
     • Mapping rate
     • Unique vs. multi-mapped reads
     • Insert size distribution
     • Coverage uniformity

  3. Expression Quality
     • Count distribution
     • Sample correlations
     • Batch effects
     • Technical artifacts
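
Two of the alignment metrics above, mapping rate and the unique vs. multi-mapped split, are simple ratios of read counts. The sketch below shows the arithmetic. The `AlignmentCounts` class and its field names are hypothetical, and the sketch does not read any particular aligner's log format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class AlignmentCounts:
    """Read counts for one sample (hypothetical container for this sketch)."""
    total_reads: int
    uniquely_mapped: int
    multi_mapped: int


def alignment_summary(counts: AlignmentCounts) -> Dict[str, float]:
    """Mapping rate plus unique and multi-mapped fractions of all input reads."""
    if counts.total_reads <= 0:
        raise ValueError("total_reads must be positive")
    mapped = counts.uniquely_mapped + counts.multi_mapped
    return {
        "mapping_rate": mapped / counts.total_reads,
        "unique_fraction": counts.uniquely_mapped / counts.total_reads,
        "multi_fraction": counts.multi_mapped / counts.total_reads,
    }
```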

Resource Requirements

Typical resource requirements for a standard RNA-seq analysis:

  • CPU: 8-16 cores
  • Memory: 32-64GB RAM
  • Storage: 50-100GB per sample
  • Time: 4-8 hours per sample
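
For planning a batch, the per-sample figures above can be scaled up with some simple arithmetic. The sketch below assumes that storage grows linearly with sample count and that samples run in waves of `parallel_jobs`, each job having its own CPU and memory allocation. Both are simplifying assumptions, and `estimate_batch` is a hypothetical helper, not a FlowAgent function.

```python
import math
from typing import Dict, Tuple

# Per-sample ranges quoted in the list above: (low, high)
STORAGE_GB_PER_SAMPLE = (50, 100)
HOURS_PER_SAMPLE = (4, 8)


def estimate_batch(n_samples: int, parallel_jobs: int = 1) -> Dict[str, Tuple[int, int]]:
    """Rough (low, high) storage and wall-clock estimates for a batch."""
    if n_samples < 0 or parallel_jobs < 1:
        raise ValueError("n_samples must be >= 0 and parallel_jobs >= 1")
    # Samples are processed in waves of up to parallel_jobs at a time.
    waves = math.ceil(n_samples / parallel_jobs)
    return {
        "storage_gb": (n_samples * STORAGE_GB_PER_SAMPLE[0],
                       n_samples * STORAGE_GB_PER_SAMPLE[1]),
        "wall_hours": (waves * HOURS_PER_SAMPLE[0],
                       waves * HOURS_PER_SAMPLE[1]),
    }
```

For example, 12 samples run 4 at a time need three waves, so the estimate is 600-1200GB of storage and 12-24 hours of wall-clock time.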

Best Practices

  1. Quality Control
     • Filter low-quality reads (Q < 20)
     • Remove adapter sequences
     • Check for sample contamination

  2. Alignment
     • Use splice-aware aligners
     • Set appropriate multi-mapping parameters
     • Monitor alignment rates

  3. Quantification
     • Consider gene vs. transcript level
     • Handle multi-mapped reads
     • Use appropriate normalization

  4. Analysis
     • Account for batch effects
     • Use appropriate statistical models
     • Control for multiple testing
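
To illustrate what controlling for multiple testing involves, here is a minimal sketch of the Benjamini-Hochberg adjustment, which is also DESeq2's default method for adjusted p-values. This page does not state which correction the workflow applies, so treat this as one common choice, not a description of FlowAgent's internals. The function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Genes whose adjusted p-value falls below the chosen threshold (0.05 is a common choice) are then reported as significant at that false discovery rate.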