RNA-seq Normalization Example¶

This example demonstrates how to create a custom RNA-seq normalization script using DESeq2.

Script Overview¶

The script performs DESeq2 normalization on RNA-seq count data and outputs normalized counts.

Implementation¶

# custom_normalize.R
library(DESeq2)
library(jsonlite)

# Read input arguments
args_dict <- fromJSON(commandArgs(trailingOnly = TRUE)[1])

# Read counts matrix
counts <- read.csv(args_dict$counts_matrix, row.names=1)

# Create DESeq dataset
dds <- DESeqDataSetFromMatrix(
    countData = counts,
    colData = data.frame(condition=factor(colnames(counts))),
    design = ~ 1
)

# Perform normalization
dds <- estimateSizeFactors(dds)
normalized_counts <- counts(dds, normalized=TRUE)

# Write output
write.csv(normalized_counts, "normalized_counts.csv")
cat(toJSON(list(normalized_counts = "normalized_counts.csv")))

Metadata¶

{
    "name": "deseq2_normalize",
    "description": "Normalize RNA-seq counts using DESeq2",
    "script_file": "custom_normalize.R",
    "language": "R",
    "input_requirements": [
        {
            "name": "counts_matrix",
            "type": "file",
            "description": "CSV file containing raw counts matrix"
        }
    ],
    "output_types": [
        {
            "name": "normalized_counts",
            "type": "file",
            "description": "CSV file containing normalized counts"
        }
    ],
    "workflow_types": ["rna_seq"],
    "execution_order": {
        "after": ["feature_counts"],
        "before": ["differential_expression"]
    },
    "requirements": {
        "r_packages": ["DESeq2", "jsonlite"]
    }
}

Usage¶

from flowagent.core.workflow_executor import WorkflowExecutor

# Initialize workflow
executor = WorkflowExecutor(llm_interface)

# Execute workflow with custom normalization
results = await executor.execute_workflow(
    input_data={
        "counts_matrix": "raw_counts.csv"
    },
    workflow_type="rna_seq",
    custom_script_requests=["deseq2_normalize"]
)

# Access normalized counts
normalized_counts = pd.read_csv(results["normalized_counts"])

Output¶

The script produces a CSV file containing the normalized counts matrix, where: - Rows represent genes - Columns represent samples - Values are normalized counts

Quality Metrics¶

The normalization process tracks: - Size factors per sample - Count distributions - Normalization effectiveness - Sample correlations

Best Practices¶

Input Data
Use raw (unfiltered) counts
Include all samples
Verify gene names/IDs
Quality Control
Check for low counts
Verify sample grouping
Monitor outliers
Output Handling
Save normalized data
Document parameters
Track QC metrics