FlowAgent 1.0

An advanced multi-agent framework for automating complex bioinformatics workflows.

Features

Workflow Automation: Seamlessly automate RNA-seq, ChIP-seq, single-cell analysis, and Hi-C processing
Multi-Agent Architecture: Distributed, fault-tolerant system with specialized agents
Dynamic Adaptation: Real-time workflow optimization and error recovery
Enterprise-Grade Security: Robust authentication, encryption, and audit logging
Advanced Monitoring: Real-time metrics, alerts, and performance tracking
Scalable Performance: Distributed processing and efficient resource management
Extensible Design: Easy integration of new tools and workflows
Comprehensive Logging: Detailed audit trails and debugging information

Installation

# Clone the repository
git clone https://github.com/cribbslab/flowagent.git
cd flowagent

# Create and activate the conda environment:
conda env create -f conda/environment/environment.yml
conda activate flowagent

# Verify installation of key components
kallisto version
fastqc --version
multiqc --version

# Add bioinformatics tools
mamba install -c bioconda fastqc=0.12.1
mamba install -c bioconda trim-galore=0.6.10
mamba install -c bioconda star=2.7.10b
mamba install -c bioconda subread=2.0.6
mamba install -c conda-forge r-base=4.2
mamba install -c bioconda bioconductor-deseq2
mamba install -c bioconda samtools=1.17
mamba install -c bioconda multiqc=1.14

Quick Start

Set up your environment:

# Copy the environment template
cp .env.example .env

# Edit .env with your settings
# Required:
# - SECRET_KEY: Generate a secure random key (e.g., using: python -c "import secrets; print(secrets.token_hex(32))")
# - OPENAI_API_KEY: Your OpenAI API key (if using LLM features)

Run a workflow:

# Basic workflow execution
flowagent "run rna-seq analysis" --checkpoint-dir=workflow_state

# Resume a failed workflow
flowagent "run rna-seq analysis" --checkpoint-dir=workflow_state --resume
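
How `--checkpoint-dir` and `--resume` could work together is sketched below. This is a minimal illustration, not FlowAgent's actual implementation: the `state.json` file name, the step names, and the `run_steps` helper are all hypothetical.

```python
import json
from pathlib import Path


def run_steps(steps, checkpoint_dir, resume=False):
    """Run (name, callable) steps in order and record each completed step.

    Hypothetical sketch: FlowAgent's real checkpoint format may differ.
    """
    state_file = Path(checkpoint_dir) / "state.json"  # assumed file name
    done = []
    if resume and state_file.exists():
        done = json.loads(state_file.read_text())["completed"]

    executed = []
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run, so skip it
        func()  # an exception here leaves the checkpoint at the last good step
        done.append(name)
        executed.append(name)
        # Persist progress after every step so a crash loses at most one step.
        state_file.parent.mkdir(parents=True, exist_ok=True)
        state_file.write_text(json.dumps({"completed": done}))
    return executed
```

With this design, a rerun with `resume=True` skips every step already listed in the checkpoint file and continues from the first step that did not finish.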

Analyze workflow results:

# Generate analysis report
flowagent "analyze workflow results" --analysis-dir=results

# Generate report without saving to file
flowagent "analyze workflow results" --analysis-dir=results --no-save-report

API Key Configuration

FlowAgent reads two keys: SECRET_KEY, which is always required, and OPENAI_API_KEY, which is needed for LLM features. You can configure them using environment variables or a .env file in the project root directory.

Required API Keys

Secret Key (for JWT token generation):
```
SECRET_KEY=your-secure-secret-key
```
OpenAI API Key (for LLM functionality):
```
OPENAI_API_KEY=your-openai-api-key
```

Setting Up OpenAI API Keys

There are two ways to configure your API keys:

Using Environment Variables:

export SECRET_KEY=your-secure-secret-key
export OPENAI_API_KEY=your-openai-api-key

Using a .env File: Create a .env file in the project root directory:

# .env
SECRET_KEY=your-secure-secret-key
OPENAI_API_KEY=your-openai-api-key

# Optional Settings
OPENAI_BASE_URL=https://api.openai.com/v1  # Default OpenAI API URL
OPENAI_MODEL=gpt-4                         # Default LLM model

Security Best Practices

Never commit your .env file to version control
Use strong, unique keys for each environment (development, staging, production)
Regularly rotate your API keys
Keep your API keys secure and never share them in public repositories

The .env file is loaded automatically when the application starts. Secret values are held in Pydantic's SecretStr type, which masks them so they are not accidentally exposed in logs or error messages.
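
The sketch below illustrates both ideas using only the standard library: reading KEY=VALUE pairs from .env text and wrapping secrets so they are masked when printed. It is not FlowAgent's loader. `parse_env`, `load_settings`, and `MaskedSecret` are hypothetical names, and `MaskedSecret` only imitates the masking behaviour of Pydantic's SecretStr.

```python
class MaskedSecret:
    """Hypothetical stand-in that imitates SecretStr-style masking."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        """Return the real value; call this only where it is needed."""
        return self._value

    def __str__(self) -> str:
        return "**********"

    def __repr__(self) -> str:
        return "MaskedSecret('**********')"


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "gpt-4   # Default LLM model".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def load_settings(text: str, secret_keys=("SECRET_KEY", "OPENAI_API_KEY")) -> dict:
    """Parse .env text and wrap the sensitive entries in MaskedSecret."""
    settings = parse_env(text)
    for key in secret_keys:
        if key in settings:
            settings[key] = MaskedSecret(settings[key])
    return settings
```

Printing or logging the resulting dictionary shows only the mask; the real value has to be requested explicitly with `get_secret_value()`.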

Security Configuration

Setting up the Secret Key

The SECRET_KEY is a crucial security element in FlowAgent used for:

- Generating and validating JSON Web Tokens (JWTs) for API authentication
- Securing session data
- Protecting against cross-site request forgery (CSRF) attacks
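
To show why the key must stay secret, here is a minimal sketch of HS256 JWT signing and validation using only the standard library. It is an illustration of the general technique, not FlowAgent's code: the function names are hypothetical, and a real deployment would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret_key: str) -> str:
    """HMAC-SHA256 over 'header.payload', keyed with the secret."""
    digest = hmac.new(secret_key.encode(), signing_input.encode(), hashlib.sha256).digest()
    return _b64url(digest)


def create_token(claims: dict, secret_key: str) -> str:
    """Build a compact JWT: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [_b64url(json.dumps(p, separators=(",", ":")).encode()) for p in (header, claims)]
    signing_input = ".".join(parts)
    return f"{signing_input}.{_sign(signing_input, secret_key)}"


def verify_token(token: str, secret_key: str, now=None) -> dict:
    """Return the claims if the signature is valid and the token has not expired."""
    signing_input, _, signature = token.rpartition(".")
    expected = _sign(signing_input, secret_key)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(signing_input.split(".")[1]))
    current = time.time() if now is None else now
    if "exp" in claims and claims["exp"] < current:
        raise ValueError("token expired")
    return claims
```

Anyone who knows the key can mint tokens that pass `verify_token`, which is why a leaked SECRET_KEY has to be regenerated. A 30-minute token would set `"exp": time.time() + 30 * 60` in its claims.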

To generate a secure random key, run:

# Generate a secure random key using Python
python3 -c "import secrets; print(secrets.token_urlsafe(32))"

Add the generated key to your .env file:

# Copy the example environment file
cp env.example /path/to/your/.env

# Edit .env and update the SECRET_KEY
SECRET_KEY=your-generated-key-here

Security Best Practices

Secret Key Management:
  Never commit your .env file to version control
  Use different secret keys for development and production
  Regenerate the secret key if it is ever compromised
  Keep your secret key at least 32 characters long
Token Configuration:
  ACCESS_TOKEN_EXPIRE_MINUTES: Controls how long API tokens remain valid (default: 30 minutes)
  A shorter duration (15 minutes) is more secure
  A longer duration (60 minutes) is more convenient
  Adjust based on your security requirements
API Key Header:
  API_KEY_HEADER: Name of the HTTP header used for API authentication (default: X-API-Key)
  Keep the default unless you have specific requirements

Example security configuration in .env:

# Security Settings
SECRET_KEY=r39pR2XJXhRLEt8rb4GlkTA5snI971VO5c2vF2FSzL0  # Generated secure key
API_KEY_HEADER=X-API-Key                                  # Default header
ACCESS_TOKEN_EXPIRE_MINUTES=30                            # Token lifetime
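
A few of the recommendations above can be checked automatically before the application starts. The sketch below is one possible way to do that; `generate_secret_key` and `validate_security_settings` are hypothetical helpers, not part of FlowAgent, and the checks only mirror the guidance in this section.

```python
import secrets

# Placeholder values used in this README; a real key must replace them.
PLACEHOLDERS = {"your-secure-secret-key", "your-generated-key-here"}


def generate_secret_key() -> str:
    """Generate a URL-safe random key (43 characters for 32 random bytes)."""
    return secrets.token_urlsafe(32)


def validate_security_settings(env: dict) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    key = env.get("SECRET_KEY", "")
    if key in PLACEHOLDERS:
        problems.append("SECRET_KEY is still a placeholder")
    elif len(key) < 32:
        problems.append("SECRET_KEY should be at least 32 characters long")

    raw_minutes = env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "30")  # documented default
    try:
        minutes = int(raw_minutes)
    except ValueError:
        problems.append("ACCESS_TOKEN_EXPIRE_MINUTES must be an integer")
    else:
        if minutes <= 0:
            problems.append("ACCESS_TOKEN_EXPIRE_MINUTES must be positive")
    return problems
```

Passing the parsed contents of your .env file to `validate_security_settings` returns an empty list when the settings follow the guidance above.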

SLURM Configuration

FlowAgent supports SLURM cluster execution. To configure SLURM, create a .cgat.yml file in the project root directory:

cluster:
  queue_manager: slurm
  queue: your_queue
  parallel_environment: smp

slurm:
  account: your_account
  partition: your_partition
  mail_user: your.email@example.com

tools:
  kallisto_index:
    memory: 16G
    threads: 8
    queue: short
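
How tool-specific entries might override cluster-wide settings can be sketched as follows. The dictionary mirrors the YAML above (it is written out by hand so the example needs no YAML parser); the `DEFAULTS` values and the `resources_for` helper are hypothetical and do not describe CGATCore's actual lookup logic.

```python
# Hand-written mirror of the .cgat.yml example above.
CONFIG = {
    "cluster": {"queue_manager": "slurm", "queue": "your_queue", "parallel_environment": "smp"},
    "slurm": {
        "account": "your_account",
        "partition": "your_partition",
        "mail_user": "your.email@example.com",
    },
    "tools": {"kallisto_index": {"memory": "16G", "threads": 8, "queue": "short"}},
}

# Hypothetical fallback values for tools without their own entry.
DEFAULTS = {"memory": "4G", "threads": 1}


def resources_for(tool: str, config: dict = CONFIG) -> dict:
    """Resolve resources for a tool: defaults, then cluster queue, then tool overrides."""
    resolved = dict(DEFAULTS)
    resolved["queue"] = config["cluster"]["queue"]
    resolved.update(config.get("tools", {}).get(tool, {}))
    return resolved
```

Under these assumptions, `resources_for("kallisto_index")` picks up the 16G, 8-thread, short-queue entry, while a tool with no entry falls back to the defaults and the cluster-wide queue.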

SLURM Integration

FlowAgent uses CGATCore for SLURM integration, which provides:

Job Management
  Automatic job submission and dependency tracking
  Resource allocation (memory, CPUs, time limits)
  Queue selection and prioritization
Resource Configuration
  Tool-specific resource requirements in .cgat.yml
  Queue-specific limits and settings
  Default resource allocations
Error Handling
  Automatic job resubmission on failure
  Detailed error logging
  Email notifications for job completion/failure
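
To make the mapping from configuration to SLURM concrete, the sketch below assembles an `sbatch` command line from a set of resources. CGATCore handles submission itself, so this is only an illustration: `build_sbatch_command` is a hypothetical helper, although the `sbatch` options it emits are standard SLURM options.

```python
def build_sbatch_command(command, *, account, partition, memory="4G", threads=1,
                         mail_user=None, depends_on=()):
    """Return an sbatch argument list for one job (nothing is executed here)."""
    args = [
        "sbatch",
        f"--account={account}",
        f"--partition={partition}",
        f"--mem={memory}",
        f"--cpus-per-task={threads}",
    ]
    if mail_user:
        # Email when the job finishes or fails.
        args += [f"--mail-user={mail_user}", "--mail-type=END,FAIL"]
    if depends_on:
        # Start only after all listed job IDs have completed successfully.
        args.append("--dependency=afterok:" + ":".join(str(job) for job in depends_on))
    args.append(f"--wrap={command}")
    return args
```

The returned list could be passed to `subprocess.run` on a cluster login node; building it separately keeps the resource mapping easy to inspect and test.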

SLURM Usage

To execute a workflow on a SLURM cluster, use the --executor cgat option:

python -m flowagent.cli "Analyze RNA-seq data in my fastq.gz files using Kallisto. The fastq files are in current directory and I want to use Homo_sapiens.GRCh38.cdna.all.fa as reference. The data is single ended. Generate QC reports and save everything in results/rna_seq_analysis." --workflow rnaseq --input data/ --executor cgat

Analysis Reports

The FlowAgent analysis report summarizes your workflow outputs. It analyzes quality metrics, alignment statistics, and expression data, and generates recommendations based on what it finds.

Running Analysis Reports

# Basic analysis
flowagent "analyze workflow results" --analysis-dir=/path/to/workflow/output

# Focus on specific aspects
flowagent "analyze quality metrics" --analysis-dir=/path/to/workflow/output
flowagent "analyze alignment rates" --analysis-dir=/path/to/workflow/output
flowagent "analyze expression data" --analysis-dir=/path/to/workflow/output

The analyzer will recursively search for relevant files in your analysis directory, including:

- FastQC outputs
- MultiQC reports
- Kallisto results
- Log files
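
A recursive search of this kind can be sketched with `pathlib`. The file name patterns below are the usual output names of each tool, but which patterns FlowAgent actually matches is an assumption, and `find_analysis_files` is a hypothetical helper.

```python
from pathlib import Path

# Assumed patterns, based on the tools' usual output file names.
PATTERNS = {
    "fastqc": ("*_fastqc.zip", "*_fastqc.html"),
    "multiqc": ("multiqc_report.html",),
    "kallisto": ("abundance.tsv", "run_info.json"),
    "logs": ("*.log",),
}


def find_analysis_files(analysis_dir) -> dict:
    """Map each category to a sorted list of matching files under analysis_dir."""
    root = Path(analysis_dir)
    found = {}
    for category, patterns in PATTERNS.items():
        matches = set()
        for pattern in patterns:
            # rglob descends into every subdirectory of the analysis directory.
            matches.update(p for p in root.rglob(pattern) if p.is_file())
        found[category] = sorted(matches)
    return found
```

Categories with no matching files map to an empty list, so a report can state explicitly which inputs were missing.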

Report Components

The analysis report includes:

Summary
  Number of files analyzed
  QC metrics processed
  Issues found
  Recommendations
Quality Control Analysis
  FastQC metrics and potential issues
  Read quality distribution
  Adapter contamination levels
  Sequence duplication rates
Alignment Analysis
  Overall alignment rates
  Unique vs multi-mapped reads
  Read distribution statistics
Expression Analysis
  Gene expression levels
  TPM distributions
  Sample correlations
Recommendations
  Quality improvement suggestions
  Parameter optimization tips
  Technical issue resolutions
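
As an example of what the expression analysis could compute, the sketch below summarizes the TPM column of a Kallisto `abundance.tsv` file. It is an illustration only: `summarize_tpm` is a hypothetical helper, and the threshold of 1 TPM for calling a target expressed is an assumption, not a FlowAgent setting.

```python
import csv
import io
import statistics


def summarize_tpm(abundance_tsv: str, expressed_threshold: float = 1.0) -> dict:
    """Summarize the 'tpm' column of Kallisto abundance.tsv content.

    Expects the standard header: target_id, length, eff_length, est_counts, tpm.
    Raises an error if the table contains no data rows.
    """
    rows = csv.DictReader(io.StringIO(abundance_tsv), delimiter="\t")
    tpm = [float(row["tpm"]) for row in rows]
    return {
        "n_targets": len(tpm),
        # Assumed cut-off: targets at or above the threshold count as expressed.
        "n_expressed": sum(value >= expressed_threshold for value in tpm),
        "median_tpm": statistics.median(tpm),
        "max_tpm": max(tpm),
    }
```

To use it on a real result, pass the file content, for example `summarize_tpm(Path("results/abundance.tsv").read_text())` with a path of your own.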

Report Output

By default, the analysis report is:

1. Displayed in the console
2. Saved as a Markdown file (analysis_report.md) in your analysis directory

To only view the report without saving:

flowagent "analyze workflow results" --analysis-dir=results --no-save-report

Architecture

FlowAgent 1.0 uses a distributed architecture built from the following components:

Core Engine: Orchestrates workflow execution and agent coordination
Agent System: Specialized agents for planning, execution, and monitoring
Knowledge Base: Vector database for storing and retrieving domain knowledge
Security Layer: Comprehensive security features and access control
API Layer: RESTful and GraphQL APIs for integration
Monitoring System: Real-time metrics and alerting

Development

# Run tests
python -m pytest

# Run type checking
python -m mypy .

# Run linting
python -m ruff check .

# Format code
python -m black .
python -m isort .

Contributing

Fork the repository
Create a feature branch
Make your changes
Run tests and linting
Submit a pull request

License

MIT License - see LICENSE file for details

Citation

If you use FlowAgent in your research, please cite:

@software{flowagent2025,
  title={FlowAgent: An Advanced Multi-Agent Framework for Bioinformatics Workflows},
  author={Cribbs Lab},
  year={2025},
  url={https://github.com/cribbslab/flowagent}
}

Version Compatibility

FlowAgent automatically handles version compatibility for Kallisto indices:

Version Checking
  Checks Kallisto version before index creation
  Validates index compatibility using kallisto inspect
  Stores version information in workflow metadata
Error Prevention
  Detects version mismatches before execution
  Provides detailed error messages for incompatible indices
  Suggests resolution steps for version conflicts
Metadata Management
  Tracks index versions across workflows
  Maintains compatibility information
  Enables reproducible analyses
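
The version check described above could look roughly like the sketch below. It assumes the output of `kallisto version` contains a dotted version number such as `0.46.1`, and it treats an index as compatible only when the major and minor versions match. Both the helper names and that compatibility rule are assumptions made for illustration, not FlowAgent's actual logic.

```python
import re


def parse_kallisto_version(output: str) -> tuple:
    """Extract (major, minor, patch) from text containing a version like 0.46.1."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version found in: {output!r}")
    return tuple(int(part) for part in match.groups())


def check_index_compatibility(index_version: str, installed_output: str):
    """Return None if compatible, otherwise a message with a suggested fix."""
    built = parse_kallisto_version(index_version)
    installed = parse_kallisto_version(installed_output)
    # Assumed rule for illustration: major and minor versions must match.
    if built[:2] == installed[:2]:
        return None
    built_text = ".".join(map(str, built))
    installed_text = ".".join(map(str, installed))
    return (
        f"Index was built with kallisto {built_text} but {installed_text} is installed; "
        "rebuild the index with `kallisto index` or install the matching version."
    )
```

Storing the version string returned at index creation in the workflow metadata is what makes this comparison possible on later runs.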

Updating the Environment

To update your conda environment with new dependencies:

conda env update -f conda/environment/environment.yml

Managing Multiple Environments

For development or testing, you can create a separate environment:

conda env create -f conda/environment/environment.yml -n flowagent-dev

Basic Usage

# Local execution
python -m flowagent.cli "Analyze RNA-seq data in my fastq.gz files using Kallisto"

# SLURM cluster execution
python -m flowagent.cli --executor cgat "Analyze RNA-seq data in my fastq.gz files using Kallisto"

Advanced Usage

Resume a failed workflow:

python -m flowagent.cli --resume --checkpoint-dir workflow_state "Your workflow prompt"

Specify custom resource requirements:

python -m flowagent.cli --executor cgat --memory 32G --threads 16 "Your workflow prompt"
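
Memory values such as `32G` have to be converted to a number before they can be compared or divided across threads. The sketch below shows one way to do that; `memory_to_mb` is a hypothetical helper, and the accepted suffixes (K, M, G, T, optionally followed by B) are an assumption rather than FlowAgent's documented syntax.

```python
import re

# Multipliers that convert each unit to mebibytes.
_UNIT_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}


def memory_to_mb(value: str) -> int:
    """Convert a memory string such as '32G' or '512M' to whole mebibytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT])B?\s*", value, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized memory value: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNIT_TO_MB[unit.upper()])
```

With the values from the command above, `memory_to_mb("32G") // 16` gives the memory available per thread in mebibytes.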