FlowAgent 1.0

An advanced multi-agent framework for automating complex bioinformatics workflows.

Features

  • Workflow Automation: Seamlessly automate RNA-seq, ChIP-seq, single-cell analysis, and Hi-C processing
  • Multi-Agent Architecture: Distributed, fault-tolerant system with specialized agents
  • Dynamic Adaptation: Real-time workflow optimization and error recovery
  • Enterprise-Grade Security: Robust authentication, encryption, and audit logging
  • Advanced Monitoring: Real-time metrics, alerts, and performance tracking
  • Scalable Performance: Distributed processing and efficient resource management
  • Extensible Design: Easy integration of new tools and workflows
  • Comprehensive Logging: Detailed audit trails and debugging information

Installation

# Clone the repository
git clone https://github.com/cribbslab/flowagent.git
cd flowagent

# Create and activate the conda environment:
conda env create -f conda/environment/environment.yml
conda activate flowagent

# Verify installation of key components
kallisto version
fastqc --version
multiqc --version

# Add bioinformatics tools
mamba install -c bioconda fastqc=0.12.1
mamba install -c bioconda trim-galore=0.6.10
mamba install -c bioconda star=2.7.10b
mamba install -c bioconda subread=2.0.6
mamba install -c conda-forge r-base=4.2
mamba install -c bioconda bioconductor-deseq2
mamba install -c bioconda samtools=1.17
mamba install -c bioconda multiqc=1.14

Quick Start

  1. Set up your environment:

    # Copy the environment template
    cp .env.example .env
    
    # Edit .env with your settings
    # Required:
    # - SECRET_KEY: Generate a secure random key (e.g., using: python -c "import secrets; print(secrets.token_hex(32))")
    # - OPENAI_API_KEY: Your OpenAI API key (if using LLM features)
    

  2. Run a workflow:

    # Basic workflow execution
    flowagent "run rna-seq analysis" --checkpoint-dir=workflow_state
    
    # Resume a failed workflow
    flowagent "run rna-seq analysis" --checkpoint-dir=workflow_state --resume
    

  3. Analyze workflow results:

    # Generate analysis report
    flowagent "analyze workflow results" --analysis-dir=results
    
    # Generate report without saving to file
    flowagent "analyze workflow results" --analysis-dir=results --no-save-report
    

API Key Configuration

FlowAgent needs two keys for full functionality: a secret key and an OpenAI API key. You can configure them using environment variables or a .env file in the project root directory.

Required API Keys

  1. Secret Key (for JWT token generation):

    SECRET_KEY=your-secure-secret-key
    

  2. OpenAI API Key (for LLM functionality):

    OPENAI_API_KEY=your-openai-api-key
    

Setting Up OpenAI API Keys

There are two ways to configure your API keys:

  1. Using Environment Variables:

    export SECRET_KEY=your-secure-secret-key
    export OPENAI_API_KEY=your-openai-api-key
    

  2. Using a .env File: Create a .env file in the project root directory:

    # .env
    SECRET_KEY=your-secure-secret-key
    OPENAI_API_KEY=your-openai-api-key
    
    # Optional Settings
    OPENAI_BASE_URL=https://api.openai.com/v1  # Default OpenAI API URL
    OPENAI_MODEL=gpt-4                         # Default LLM model
    

Security Best Practices

  1. Never commit your .env file to version control
  2. Use strong, unique keys for each environment (development, staging, production)
  3. Regularly rotate your API keys
  4. Keep your API keys secure and never share them in public repositories

The .env file is automatically loaded by the application when it starts. All sensitive information is handled securely using Pydantic's SecretStr type to prevent accidental exposure in logs or error messages.
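
To make the loading and masking behaviour described above concrete, here is a minimal standard-library sketch. It is not FlowAgent's loader: `MaskedSecret` is a hypothetical stand-in for Pydantic's SecretStr, and `parse_env_text`, `load_settings`, and `SENSITIVE_KEYS` are illustrative names. The parser is deliberately simple and treats " #" as the start of an inline comment.

```python
SENSITIVE_KEYS = {"SECRET_KEY", "OPENAI_API_KEY"}


class MaskedSecret:
    """Hypothetical stand-in for pydantic.SecretStr: hides the value when printed."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        # The raw value is only available through an explicit call.
        return self._value

    def __repr__(self) -> str:
        return "MaskedSecret('**********')"

    __str__ = __repr__


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and comment lines."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Simplification: drop a trailing inline comment such as "gpt-4  # note".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def load_settings(text: str) -> dict:
    """Wrap sensitive values so they cannot leak through logs or tracebacks."""
    settings = {}
    for key, value in parse_env_text(text).items():
        settings[key] = MaskedSecret(value) if key in SENSITIVE_KEYS else value
    return settings


if __name__ == "__main__":
    example = "SECRET_KEY=your-secure-secret-key\nOPENAI_MODEL=gpt-4  # Default LLM model\n"
    # Printing the settings shows the masked placeholder, not the key itself.
    print(load_settings(example))
```

The point of the wrapper is that the raw key is only reachable through `get_secret_value()`, so an accidental `print(settings)` or a logged exception shows the mask instead.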

Security Configuration

Setting up the Secret Key

The SECRET_KEY is a crucial security element in FlowAgent used for:

  • Generating and validating JSON Web Tokens (JWTs) for API authentication
  • Securing session data
  • Protecting against cross-site request forgery (CSRF) attacks

To generate a secure random key, run:

# Generate a secure random key using Python
python3 -c "import secrets; print(secrets.token_urlsafe(32))"

Add the generated key to your .env file:

# Copy the example environment file (same template as in Quick Start)
cp .env.example .env

# Edit .env and update the SECRET_KEY
SECRET_KEY=your-generated-key-here
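
To illustrate why the secret key matters for JWTs, the sketch below signs and verifies an HS256 token by hand using only the standard library. It is an assumption-laden illustration, not FlowAgent's token code: `create_token` and `verify_token` are hypothetical names, the payload carries only `sub` and `exp`, and a real deployment would normally rely on a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret_key: str) -> bytes:
    return hmac.new(secret_key.encode("utf-8"), signing_input.encode("ascii"),
                    hashlib.sha256).digest()


def create_token(subject: str, secret_key: str, expire_minutes: int = 30,
                 now: Optional[float] = None) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "exp": issued_at + expire_minutes * 60}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret_key))}"


def verify_token(token: str, secret_key: str, now: Optional[float] = None) -> dict:
    """Return the payload, or raise ValueError if the token is invalid or expired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(f"{header_b64}.{payload_b64}", secret_key)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if payload["exp"] <= current:
        raise ValueError("token expired")
    return payload
```

Anyone who knows the secret key can mint valid tokens, which is why the key must stay out of version control and be regenerated if it leaks.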

Security Best Practices

  1. Secret Key Management:
     • Never commit your .env file to version control
     • Use different secret keys for development and production
     • Regenerate the secret key if it's ever compromised
     • Keep your secret key at least 32 characters long

  2. Token Configuration:
     • ACCESS_TOKEN_EXPIRE_MINUTES: Controls how long API tokens remain valid
     • Default is 30 minutes
     • Shorter duration (15 mins) = More secure
     • Longer duration (60 mins) = More convenient
     • Adjust based on your security requirements

  3. API Key Header:
     • API_KEY_HEADER: Default is X-API-Key
     • This header is used for API authentication
     • Keep the default unless you have specific requirements

Example security configuration in .env:

# Security Settings
SECRET_KEY=r39pR2XJXhRLEt8rb4GlkTA5snI971VO5c2vF2FSzL0  # Example only; generate your own key
API_KEY_HEADER=X-API-Key                                  # Default header
ACCESS_TOKEN_EXPIRE_MINUTES=30                            # Token lifetime
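
As a rough sketch of how these three settings could be read and checked at startup, the snippet below applies the defaults and the 32-character minimum stated above. `SecuritySettings` and `load_security_settings` are hypothetical names for illustration; FlowAgent's own settings class may be organised differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

MIN_SECRET_KEY_LENGTH = 32  # from the best-practice list above


@dataclass(frozen=True)
class SecuritySettings:
    secret_key: str
    api_key_header: str = "X-API-Key"       # documented default
    access_token_expire_minutes: int = 30   # documented default


def load_security_settings(env: Optional[Mapping[str, str]] = None) -> SecuritySettings:
    """Read security settings from a mapping (defaults to os.environ) and validate them."""
    env = os.environ if env is None else env

    secret_key = env.get("SECRET_KEY", "")
    if len(secret_key) < MIN_SECRET_KEY_LENGTH:
        raise ValueError(
            f"SECRET_KEY must be at least {MIN_SECRET_KEY_LENGTH} characters long"
        )

    minutes = int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "30"))
    if minutes <= 0:
        raise ValueError("ACCESS_TOKEN_EXPIRE_MINUTES must be a positive integer")

    return SecuritySettings(
        secret_key=secret_key,
        api_key_header=env.get("API_KEY_HEADER", "X-API-Key"),
        access_token_expire_minutes=minutes,
    )
```

Failing fast on a short or missing key keeps a misconfigured deployment from starting with weak token signing.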

SLURM Configuration

FlowAgent supports SLURM cluster execution. To configure SLURM, create a .cgat.yml file in the project root directory:

cluster:
  queue_manager: slurm
  queue: your_queue
  parallel_environment: smp

slurm:
  account: your_account
  partition: your_partition
  mail_user: your.email@example.com

tools:
  kallisto_index:
    memory: 16G
    threads: 8
    queue: short

SLURM Integration

FlowAgent uses CGATCore for SLURM integration, which provides:

  1. Job Management
     • Automatic job submission and dependency tracking
     • Resource allocation (memory, CPUs, time limits)
     • Queue selection and prioritization

  2. Resource Configuration
     • Tool-specific resource requirements in .cgat.yml
     • Queue-specific limits and settings
     • Default resource allocations

  3. Error Handling
     • Automatic job resubmission on failure
     • Detailed error logging
     • Email notifications for job completion/failure
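
The interplay between default allocations and tool-specific requirements can be sketched as follows. This is only an illustration: CGATCore performs the real submission, `DEFAULT_RESOURCES` holds made-up fallback values, and `resolve_resources` and `sbatch_options` are hypothetical helpers. The configuration is written as a Python dict mirroring the .cgat.yml example so that no YAML parser is needed, and the flags used are standard sbatch options.

```python
# Mirrors the .cgat.yml example above.
CONFIG = {
    "cluster": {"queue_manager": "slurm", "queue": "your_queue",
                "parallel_environment": "smp"},
    "slurm": {"account": "your_account", "partition": "your_partition",
              "mail_user": "your.email@example.com"},
    "tools": {"kallisto_index": {"memory": "16G", "threads": 8, "queue": "short"}},
}

# Hypothetical fallbacks for tools without an entry under "tools".
DEFAULT_RESOURCES = {"memory": "4G", "threads": 1}


def resolve_resources(config: dict, tool: str) -> dict:
    """Start from defaults, then let tool-specific settings override them."""
    resources = dict(DEFAULT_RESOURCES)
    resources["queue"] = config.get("cluster", {}).get("queue")
    resources.update(config.get("tools", {}).get(tool, {}))
    return resources


def sbatch_options(config: dict, tool: str) -> list:
    """Translate the resolved resources into standard sbatch flags."""
    slurm = config.get("slurm", {})
    resources = resolve_resources(config, tool)
    options = [
        f"--job-name={tool}",
        f"--mem={resources['memory']}",
        f"--cpus-per-task={resources['threads']}",
    ]
    if slurm.get("account"):
        options.append(f"--account={slurm['account']}")
    if slurm.get("partition"):
        options.append(f"--partition={slurm['partition']}")
    if slurm.get("mail_user"):
        # END,FAIL requests email on job completion and on failure.
        options += [f"--mail-user={slurm['mail_user']}", "--mail-type=END,FAIL"]
    return options


if __name__ == "__main__":
    print(" ".join(["sbatch"] + sbatch_options(CONFIG, "kallisto_index")))
```

How CGATCore relates the `queue` setting to a SLURM partition is not covered here, so the sketch passes only the explicit `partition` value through.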

SLURM Usage

To execute a workflow on a SLURM cluster, use the --executor cgat option:

python -m flowagent.cli "Analyze RNA-seq data in my fastq.gz files using Kallisto. The fastq files are in the current directory and I want to use Homo_sapiens.GRCh38.cdna.all.fa as reference. The data is single-ended. Generate QC reports and save everything in results/rna_seq_analysis." --workflow rnaseq --input data/ --executor cgat

Analysis Reports

The FlowAgent analysis report functionality provides comprehensive insights into your workflow outputs. It analyzes quality metrics, alignment statistics, and expression data to generate actionable recommendations.

Running Analysis Reports

# Basic analysis
flowagent "analyze workflow results" --analysis-dir=/path/to/workflow/output

# Focus on specific aspects
flowagent "analyze quality metrics" --analysis-dir=/path/to/workflow/output
flowagent "analyze alignment rates" --analysis-dir=/path/to/workflow/output
flowagent "analyze expression data" --analysis-dir=/path/to/workflow/output

The analyzer will recursively search for relevant files in your analysis directory, including:

  • FastQC outputs
  • MultiQC reports
  • Kallisto results
  • Log files
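
A recursive search of this kind could look roughly like the sketch below. The glob patterns are assumptions based on the usual output names of FastQC, MultiQC, and Kallisto; `FILE_PATTERNS` and `find_analysis_files` are hypothetical names, and FlowAgent's analyzer may match files differently.

```python
from pathlib import Path

# Assumed patterns, based on the tools' conventional output file names.
FILE_PATTERNS = {
    "fastqc": ["*_fastqc.zip", "*_fastqc.html"],
    "multiqc": ["multiqc_report.html"],
    "kallisto": ["abundance.tsv", "run_info.json"],
    "logs": ["*.log"],
}


def find_analysis_files(analysis_dir) -> dict:
    """Walk analysis_dir recursively and group matching files by category."""
    root = Path(analysis_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {root}")

    found = {}
    for category, patterns in FILE_PATTERNS.items():
        matches = set()
        for pattern in patterns:
            # rglob descends into every subdirectory of root.
            matches.update(p for p in root.rglob(pattern) if p.is_file())
        found[category] = sorted(matches)
    return found


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for category, paths in find_analysis_files(target).items():
        print(f"{category}: {len(paths)} file(s)")
```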

Report Components

The analysis report includes:

  1. Summary
     • Number of files analyzed
     • QC metrics processed
     • Issues found
     • Recommendations

  2. Quality Control Analysis
     • FastQC metrics and potential issues
     • Read quality distribution
     • Adapter contamination levels
     • Sequence duplication rates

  3. Alignment Analysis
     • Overall alignment rates
     • Unique vs multi-mapped reads
     • Read distribution statistics

  4. Expression Analysis
     • Gene expression levels
     • TPM distributions
     • Sample correlations

  5. Recommendations
     • Quality improvement suggestions
     • Parameter optimization tips
     • Technical issue resolutions

Report Output

By default, the analysis report is:

  1. Displayed in the console
  2. Saved as a markdown file (analysis_report.md) in your analysis directory

To only view the report without saving:

flowagent "analyze workflow results" --analysis-dir=results --no-save-report
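
The display-then-optionally-save behaviour can be pictured with the short sketch below. `render_report` and `emit_report` are hypothetical helpers and the report layout is invented for illustration; only the file name analysis_report.md and the save/no-save switch come from the description above.

```python
from pathlib import Path
from typing import Optional


def render_report(summary: dict, recommendations: list) -> str:
    """Build a small markdown report from summary counts and recommendations."""
    lines = ["# Analysis Report", "", "## Summary", ""]
    lines += [f"- {label}: {value}" for label, value in summary.items()]
    lines += ["", "## Recommendations", ""]
    lines += [f"- {item}" for item in recommendations] or ["- None"]
    return "\n".join(lines) + "\n"


def emit_report(report: str, analysis_dir, save_report: bool = True) -> Optional[Path]:
    """Always print the report; write analysis_report.md unless saving is disabled."""
    print(report)
    if not save_report:
        return None
    output_path = Path(analysis_dir) / "analysis_report.md"
    output_path.write_text(report, encoding="utf-8")
    return output_path
```

Here `save_report=False` plays the role of the --no-save-report flag: the report still reaches the console, but nothing is written to the analysis directory.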

Architecture

FlowAgent 1.0 implements a modern, distributed architecture:

  • Core Engine: Orchestrates workflow execution and agent coordination
  • Agent System: Specialized agents for planning, execution, and monitoring
  • Knowledge Base: Vector database for storing and retrieving domain knowledge
  • Security Layer: Comprehensive security features and access control
  • API Layer: RESTful and GraphQL APIs for integration
  • Monitoring System: Real-time metrics and alerting

Development

# Run tests
python -m pytest

# Run type checking
python -m mypy .

# Run linting
python -m ruff check .

# Format code
python -m black .
python -m isort .

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests and linting
  5. Submit a pull request

License

MIT License - see LICENSE file for details

Citation

If you use FlowAgent in your research, please cite:

@software{flowagent2025,
  title={FlowAgent: An Advanced Multi-Agent Framework for Bioinformatics Workflows},
  author={Cribbs Lab},
  year={2025},
  url={https://github.com/cribbslab/flowagent}
}

Version Compatibility

FlowAgent automatically handles version compatibility for Kallisto indices:

  1. Version Checking
     • Checks Kallisto version before index creation
     • Validates index compatibility using kallisto inspect
     • Stores version information in workflow metadata

  2. Error Prevention
     • Detects version mismatches before execution
     • Provides detailed error messages for incompatible indices
     • Suggests resolution steps for version conflicts

  3. Metadata Management
     • Tracks index versions across workflows
     • Maintains compatibility information
     • Enables reproducible analyses
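
A version check along these lines might be sketched as below. Several pieces are assumptions: the `kallisto_version` metadata key is hypothetical, the rule that an index is compatible when major and minor versions match is chosen only for illustration, and the sample strings assume that `kallisto version` prints text containing a dotted version number. FlowAgent's real check also relies on kallisto inspect, which is not reproduced here.

```python
import re
from typing import Optional

_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_kallisto_version(output: str) -> tuple:
    """Extract (major, minor, patch) from text such as 'kallisto, version 0.46.1'."""
    match = _VERSION_RE.search(output)
    if match is None:
        raise ValueError(f"no version number found in: {output!r}")
    return tuple(int(part) for part in match.groups())


def _fmt(version: tuple) -> str:
    return ".".join(str(part) for part in version)


def check_index_compatibility(index_metadata: dict, installed_output: str) -> Optional[str]:
    """Return None if compatible, otherwise a message with a suggested resolution.

    Illustrative rule (an assumption): major and minor versions must match.
    """
    installed = parse_kallisto_version(installed_output)
    built_with = parse_kallisto_version(index_metadata["kallisto_version"])
    if installed[:2] == built_with[:2]:
        return None
    return (
        f"Index was built with kallisto {_fmt(built_with)} but kallisto "
        f"{_fmt(installed)} is installed. Rebuild the index with the installed "
        f"version, or install kallisto {_fmt(built_with)}."
    )
```

Storing the version next to the index is what makes the mismatch detectable before a long quantification run starts.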

Updating the Environment

To update your conda environment with new dependencies:

conda env update -f conda/environment/environment.yml

Managing Multiple Environments

For development or testing, you can create a separate environment:

conda env create -f conda/environment/environment.yml -n flowagent-dev

Basic Usage

# Local execution
python -m flowagent.cli "Analyze RNA-seq data in my fastq.gz files using Kallisto"

# SLURM cluster execution
python -m flowagent.cli --executor cgat "Analyze RNA-seq data in my fastq.gz files using Kallisto"

Advanced Usage

  1. Resume a failed workflow:

    python -m flowagent.cli --resume --checkpoint-dir workflow_state "Your workflow prompt"
    

  2. Specify custom resource requirements:

    python -m flowagent.cli --executor cgat --memory 32G --threads 16 "Your workflow prompt"
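
The idea behind --resume and --checkpoint-dir above can be sketched as follows: record each finished step in the checkpoint directory, and on resume skip whatever is already recorded. This is a simplified illustration, not FlowAgent's checkpoint format; the `checkpoint.json` file name, its layout, and `run_workflow` are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, List, Tuple


def run_workflow(steps: List[Tuple[str, Callable[[], None]]],
                 checkpoint_dir, resume: bool = False) -> List[str]:
    """Run named steps in order, persisting progress after each one.

    Returns the names of the steps executed in this invocation.
    """
    state_file = Path(checkpoint_dir) / "checkpoint.json"  # hypothetical name
    state_file.parent.mkdir(parents=True, exist_ok=True)

    completed: List[str] = []
    if resume and state_file.exists():
        completed = json.loads(state_file.read_text(encoding="utf-8"))["completed"]

    executed = []
    for name, action in steps:
        if name in completed:
            continue  # finished in an earlier run
        action()  # an exception here leaves the checkpoint at the last good step
        completed.append(name)
        state_file.write_text(json.dumps({"completed": completed}), encoding="utf-8")
        executed.append(name)
    return executed
```

Because the checkpoint is written only after a step succeeds, a failed step is retried on resume while earlier results are reused.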