FlowAgent 1.0
An advanced multi-agent framework for automating complex bioinformatics workflows.
Features
- Workflow Automation: Automate RNA-seq, ChIP-seq, single-cell, and Hi-C analyses
- Multi-Agent Architecture: Distributed, fault-tolerant system built from specialized agents
- Dynamic Adaptation: Real-time workflow optimization and error recovery
- Security: Token- and API-key-based authentication, encryption, and audit logging
- Advanced Monitoring: Real-time metrics, alerts, and performance tracking
- Scalable Execution: Run locally or on a SLURM cluster, with per-tool resource settings
- Extensible Design: Easy integration of new tools and workflows
- Comprehensive Logging: Detailed audit trails and debugging information
Installation
# Clone the repository
git clone https://github.com/cribbslab/flowagent.git
cd flowagent
# Create and activate the conda environment:
conda env create -f conda/environment/environment.yml
conda activate flowagent
# Verify installation of key components
kallisto version
fastqc --version
multiqc --version
# Optional: install additional bioinformatics tools at pinned versions
mamba install -c bioconda fastqc=0.12.1
mamba install -c bioconda trim-galore=0.6.10
mamba install -c bioconda star=2.7.10b
mamba install -c bioconda subread=2.0.6
mamba install -c conda-forge r-base=4.2
mamba install -c bioconda bioconductor-deseq2
mamba install -c bioconda samtools=1.17
mamba install -c bioconda multiqc=1.14
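To confirm that the tools installed above are actually reachable from the active environment, a small Python check can look each executable up on PATH. This is a sketch, not part of FlowAgent; the executable names (for example trim_galore for the trim-galore package, featureCounts for subread, and Rscript for r-base) are assumptions about what each package installs.

```python
import shutil
from typing import List

# Executables assumed to be provided by the packages installed above
EXPECTED_TOOLS = [
    "kallisto", "fastqc", "multiqc", "trim_galore",
    "STAR", "featureCounts", "samtools", "Rscript",
]


def missing_tools(tools: List[str]) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(EXPECTED_TOOLS)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it after conda activate flowagent; any tool it reports is not available in the environment.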
Quick Start
- Set up your environment:
  # Copy the environment template
  cp .env.example .env
  # Edit .env with your settings. Required:
  #   SECRET_KEY: a secure random key, e.g. generated with:
  #     python -c "import secrets; print(secrets.token_hex(32))"
  #   OPENAI_API_KEY: your OpenAI API key (if using LLM features)
- Run a workflow:
  # Basic workflow execution
  flowagent "run rna-seq analysis" --checkpoint-dir=workflow_state
  # Resume a failed workflow
  flowagent "run rna-seq analysis" --checkpoint-dir=workflow_state --resume
- Analyze workflow results:
  # Generate analysis report
  flowagent "analyze workflow results" --analysis-dir=results
  # Generate report without saving to file
  flowagent "analyze workflow results" --analysis-dir=results --no-save-report
API Key Configuration
FlowAgent needs two keys for full functionality: a secret key and an OpenAI API key. You can configure them using environment variables or a .env file in the project root directory.
Required API Keys
- Secret Key (for JWT generation): SECRET_KEY=your-secure-secret-key
- OpenAI API Key (for LLM functionality): OPENAI_API_KEY=your-openai-api-key
Setting Up API Keys
There are two ways to configure your API keys:
- Using environment variables:
  export SECRET_KEY=your-secure-secret-key
  export OPENAI_API_KEY=your-openai-api-key
- Using a .env file. Create a .env file in the project root directory:
  # .env
  SECRET_KEY=your-secure-secret-key
  OPENAI_API_KEY=your-openai-api-key
  # Optional settings
  # Default OpenAI API URL
  OPENAI_BASE_URL=https://api.openai.com/v1
  # Default LLM model
  OPENAI_MODEL=gpt-4
Security Best Practices
- Never commit your .env file to version control
- Use strong, unique keys for each environment (development, staging, production)
- Regularly rotate your API keys
- Keep your API keys secure and never share them in public repositories
The .env file is loaded automatically when the application starts. Sensitive values are stored using Pydantic's SecretStr type, which masks them when printed, so they are not accidentally exposed in logs or error messages.
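The loading behaviour described above can be approximated with the standard library. This is a minimal sketch, not FlowAgent's loader: the parse_env and load_settings names are hypothetical, the rule that process environment variables override .env values is an assumption, and the Secret class is a stand-in for Pydantic's SecretStr that only mimics its masking.

```python
from typing import Dict, Mapping, Union

SETTING_KEYS = ("SECRET_KEY", "OPENAI_API_KEY", "OPENAI_BASE_URL", "OPENAI_MODEL")
SENSITIVE_KEYS = {"SECRET_KEY", "OPENAI_API_KEY"}


class Secret:
    """Stand-in for pydantic.SecretStr: hides the value in str() and repr()."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        return self._value

    def __str__(self) -> str:
        return "**********"

    def __repr__(self) -> str:
        return "Secret('**********')"


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines; blank lines and full-line # comments are skipped.

    Inline comments after a value are NOT stripped in this sketch.
    """
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore malformed lines without '='
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # drop matching surrounding quotes
        values[key.strip()] = value
    return values


def load_settings(env_text: str, environ: Mapping[str, str]) -> Dict[str, Union[str, Secret]]:
    """Merge .env values with the process environment (assumed to take precedence)."""
    file_values = parse_env(env_text)
    settings: Dict[str, Union[str, Secret]] = {}
    for key in SETTING_KEYS:
        value = environ.get(key, file_values.get(key))
        if value is None:
            continue
        settings[key] = Secret(value) if key in SENSITIVE_KEYS else value
    return settings
```

In practice the inputs would be Path(".env").read_text() and os.environ; printing the resulting dict shows masked secrets rather than their values.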
Security Configuration
Setting up the Secret Key
The SECRET_KEY is a crucial security element in FlowAgent used for:
- Generating and validating JSON Web Tokens (JWTs) for API authentication
- Securing session data
- Protecting against cross-site request forgery (CSRF) attacks
To generate a secure random key, run:
# Generate a secure random key using Python
python3 -c "import secrets; print(secrets.token_urlsafe(32))"
Add the generated key to your .env file:
# Copy the example environment file
cp .env.example .env
# Edit .env and update the SECRET_KEY
SECRET_KEY=your-generated-key-here
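The generate-and-store steps above can also be scripted. This sketch uses the same secrets.token_urlsafe call as the command above; set_env_value is a hypothetical helper (not part of FlowAgent) that replaces an existing SECRET_KEY line or appends one, and MIN_KEY_LENGTH encodes the 32-character guideline from the Security Best Practices list.

```python
import secrets

MIN_KEY_LENGTH = 32  # the "at least 32 characters" guideline for SECRET_KEY


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random key (32 random bytes encode to 43 characters)."""
    return secrets.token_urlsafe(num_bytes)


def set_env_value(env_text: str, key: str, value: str) -> str:
    """Replace an existing KEY=... line in .env text, or append one if absent."""
    prefix = f"{key}="
    lines = env_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            lines[i] = prefix + value
            break
    else:
        lines.append(prefix + value)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    key = generate_secret_key()
    assert len(key) >= MIN_KEY_LENGTH
    print(set_env_value("SECRET_KEY=your-generated-key-here\n", "SECRET_KEY", key), end="")
```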
Security Best Practices
- Secret Key Management:
  - Never commit your .env file to version control
  - Use different secret keys for development and production
  - Regenerate the secret key if it is ever compromised
  - Keep your secret key at least 32 characters long
- Token Configuration:
  - ACCESS_TOKEN_EXPIRE_MINUTES: controls how long API tokens remain valid
  - Default is 30 minutes
  - Shorter lifetimes (e.g., 15 minutes) are more secure
  - Longer lifetimes (e.g., 60 minutes) are more convenient
  - Adjust based on your security requirements
- API Key Header:
  - API_KEY_HEADER: default is X-API-Key
  - This header is used for API authentication
  - Keep the default unless you have specific requirements
Example security configuration in .env:
# Security Settings
# Generated secure key (generate your own; do not reuse an example key)
SECRET_KEY=your-generated-key-here
# Default header
API_KEY_HEADER=X-API-Key
# Token lifetime in minutes
ACCESS_TOKEN_EXPIRE_MINUTES=30
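To make the roles of SECRET_KEY and ACCESS_TOKEN_EXPIRE_MINUTES concrete, the sketch below issues and verifies an HS256-signed JWT using only the standard library. It assumes HS256 and the sub/iat/exp claims; FlowAgent's actual token code, algorithm, and claims are not documented here, and a production service would normally rely on a maintained JWT library instead of hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Mapping, Optional


def token_lifetime_minutes(environ: Mapping[str, str]) -> int:
    """Read ACCESS_TOKEN_EXPIRE_MINUTES, falling back to the documented default of 30."""
    return int(environ.get("ACCESS_TOKEN_EXPIRE_MINUTES", "30"))


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret_key: str) -> bytes:
    return hmac.new(secret_key.encode(), signing_input.encode(), hashlib.sha256).digest()


def create_access_token(subject: str, secret_key: str, expire_minutes: int = 30,
                        now: Optional[float] = None) -> str:
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": issued, "exp": issued + expire_minutes * 60}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret_key))}"


def verify_access_token(token: str, secret_key: str, now: Optional[float] = None) -> Dict[str, Any]:
    """Return the payload, or raise ValueError if the token is malformed, forged, or expired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret_key)
    # Constant-time comparison avoids leaking how many signature bytes matched
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if (time.time() if now is None else now) >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

Because the signature depends on SECRET_KEY, rotating the key invalidates every outstanding token, and a shorter ACCESS_TOKEN_EXPIRE_MINUTES limits how long a leaked token stays usable.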
SLURM Configuration
FlowAgent supports SLURM cluster execution. To configure SLURM, create a .cgat.yml file in the project root directory:
cluster:
  queue_manager: slurm
  queue: your_queue
  parallel_environment: smp
slurm:
  account: your_account
  partition: your_partition
  mail_user: your.email@example.com
tools:
  kallisto_index:
    memory: 16G
    threads: 8
    queue: short
SLURM Integration
FlowAgent uses CGATCore for SLURM integration, which provides:
- Job Management
  - Automatic job submission and dependency tracking
  - Resource allocation (memory, CPUs, time limits)
  - Queue selection and prioritization
- Resource Configuration
  - Tool-specific resource requirements in .cgat.yml
  - Queue-specific limits and settings
  - Default resource allocations
- Error Handling
  - Automatic job resubmission on failure
  - Detailed error logging
  - Email notifications for job completion/failure
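As an illustration of how tool-specific settings in .cgat.yml might combine with defaults, the sketch below turns the example configuration from the SLURM Configuration section (loaded into a dict, e.g. with PyYAML) into sbatch options. It is not CGATCore's implementation: the fallback values in DEFAULTS and the precedence for choosing a partition (tool queue, then slurm.partition, then cluster.queue) are assumptions. The flags themselves (--mem, --cpus-per-task, --partition, --account, --mail-user, --mail-type) are standard sbatch options.

```python
from typing import Any, Dict, List

# Hypothetical fallbacks for tools without an entry under `tools:`
DEFAULTS: Dict[str, Any] = {"memory": "4G", "threads": 1}


def resolve_resources(config: Dict[str, Any], tool: str) -> Dict[str, Any]:
    """Overlay the tool's entry from .cgat.yml on top of the defaults."""
    resources = dict(DEFAULTS)
    resources.update(config.get("tools", {}).get(tool, {}))
    return resources


def sbatch_args(config: Dict[str, Any], tool: str) -> List[str]:
    resources = resolve_resources(config, tool)
    slurm = config.get("slurm", {})
    cluster = config.get("cluster", {})
    args = [f"--mem={resources['memory']}", f"--cpus-per-task={resources['threads']}"]
    # Assumed precedence: tool queue, then slurm.partition, then cluster.queue
    partition = resources.get("queue") or slurm.get("partition") or cluster.get("queue")
    if partition:
        args.append(f"--partition={partition}")
    if "account" in slurm:
        args.append(f"--account={slurm['account']}")
    if "mail_user" in slurm:
        args += [f"--mail-user={slurm['mail_user']}", "--mail-type=END,FAIL"]
    return args


example_config = {  # the .cgat.yml example, as a dict
    "cluster": {"queue_manager": "slurm", "queue": "your_queue", "parallel_environment": "smp"},
    "slurm": {"account": "your_account", "partition": "your_partition",
              "mail_user": "your.email@example.com"},
    "tools": {"kallisto_index": {"memory": "16G", "threads": 8, "queue": "short"}},
}
```

For kallisto_index this yields --mem=16G --cpus-per-task=8 --partition=short --account=your_account --mail-user=your.email@example.com --mail-type=END,FAIL; a tool with no entry falls back to the hypothetical defaults.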
SLURM Usage
To execute a workflow on a SLURM cluster, use the --executor cgat option:
python -m flowagent.cli \
  "Analyze RNA-seq data in my fastq.gz files using Kallisto. The fastq files are in the current directory and I want to use Homo_sapiens.GRCh38.cdna.all.fa as reference. The data is single-end. Generate QC reports and save everything in results/rna_seq_analysis." \
  --workflow rnaseq --input data/ --executor cgat
Analysis Reports
The FlowAgent analysis report summarizes your workflow outputs. It analyzes quality metrics, alignment statistics, and expression data, and generates actionable recommendations.
Running Analysis Reports
# Basic analysis
flowagent "analyze workflow results" --analysis-dir=/path/to/workflow/output
# Focus on specific aspects
flowagent "analyze quality metrics" --analysis-dir=/path/to/workflow/output
flowagent "analyze alignment rates" --analysis-dir=/path/to/workflow/output
flowagent "analyze expression data" --analysis-dir=/path/to/workflow/output
The analyzer recursively searches your analysis directory for relevant files, including:
- FastQC outputs
- MultiQC reports
- Kallisto results
- Log files
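A recursive search of this kind can be sketched with pathlib. The filename patterns are assumptions based on the default output names of FastQC (*_fastqc.zip, *_fastqc.html), MultiQC (multiqc_report.html, multiqc_data.json) and kallisto quant (abundance.tsv, run_info.json); FlowAgent's own matching rules may differ.

```python
from pathlib import Path
from typing import Dict, List, Union

# Assumed filename patterns; FlowAgent's own matching rules may differ
PATTERNS: Dict[str, List[str]] = {
    "fastqc": ["*_fastqc.zip", "*_fastqc.html"],
    "multiqc": ["multiqc_report.html", "multiqc_data.json"],
    "kallisto": ["abundance.tsv", "run_info.json"],
    "logs": ["*.log"],
}


def find_analysis_files(analysis_dir: Union[str, Path]) -> Dict[str, List[Path]]:
    """Recursively collect matching files under analysis_dir, grouped by category."""
    root = Path(analysis_dir)
    found: Dict[str, List[Path]] = {}
    for category, patterns in PATTERNS.items():
        matches = set()
        for pattern in patterns:
            matches.update(path for path in root.rglob(pattern) if path.is_file())
        found[category] = sorted(matches)
    return found
```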
Report Components
The analysis report includes:
- Summary
  - Number of files analyzed
  - QC metrics processed
  - Issues found
  - Recommendations
- Quality Control Analysis
  - FastQC metrics and potential issues
  - Read quality distribution
  - Adapter contamination levels
  - Sequence duplication rates
- Alignment Analysis
  - Overall alignment rates
  - Unique vs multi-mapped reads
  - Read distribution statistics
- Expression Analysis
  - Gene expression levels
  - TPM distributions
  - Sample correlations
- Recommendations
  - Quality improvement suggestions
  - Parameter optimization tips
  - Technical issue resolutions
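Two of the checks listed above can be sketched directly from standard tool outputs: FastQC writes a tab-separated summary.txt (status, module, file name) inside each *_fastqc.zip, and kallisto quant writes run_info.json with a p_pseudoaligned percentage. The 70% threshold and the message wording are hypothetical, not FlowAgent's rules.

```python
import json
from typing import List

MIN_PSEUDOALIGNED_PCT = 70.0  # hypothetical threshold, not a FlowAgent default


def fastqc_issues(summary_text: str) -> List[str]:
    """Flag WARN/FAIL modules from a FastQC summary.txt (status<TAB>module<TAB>file)."""
    issues = []
    for line in summary_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        status, module, filename = parts
        if status in ("WARN", "FAIL"):
            issues.append(f"{filename}: {module} {status}")
    return issues


def pseudoalignment_issues(run_info_text: str, sample: str) -> List[str]:
    """Flag a low pseudoalignment rate from kallisto's run_info.json."""
    info = json.loads(run_info_text)
    if "p_pseudoaligned" not in info:
        return [f"{sample}: pseudoalignment rate not reported"]
    rate = float(info["p_pseudoaligned"])
    if rate < MIN_PSEUDOALIGNED_PCT:
        return [f"{sample}: only {rate:.1f}% of reads pseudoaligned"]
    return []
```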
Report Output
By default, the analysis report is:
1. Displayed in the console
2. Saved as a markdown file (analysis_report.md) in your analysis directory
To only view the report without saving:
flowagent "analyze workflow results" --analysis-dir=results --no-save-report
Architecture
FlowAgent 1.0 implements a modern, distributed architecture:
- Core Engine: Orchestrates workflow execution and agent coordination
- Agent System: Specialized agents for planning, execution, and monitoring
- Knowledge Base: Vector database for storing and retrieving domain knowledge
- Security Layer: Comprehensive security features and access control
- API Layer: RESTful and GraphQL APIs for integration
- Monitoring System: Real-time metrics and alerting
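The split between planning, execution, and monitoring can be illustrated with a toy pipeline. None of these classes or functions come from FlowAgent: plan is a hard-coded stand-in for an LLM-backed planner, and the retry loop is only one simple way an executor might recover from transient errors.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def plan(prompt: str) -> List[Step]:
    """Toy planner: map a prompt to a fixed list of named no-op steps."""
    names = ["fastqc", "kallisto_quant", "multiqc"] if "rna-seq" in prompt.lower() else []
    return [(name, lambda: None) for name in names]


class Monitor:
    """Records (step, status) events; a real system might export metrics or alerts."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str]] = []

    def record(self, step: str, status: str) -> None:
        self.events.append((step, status))


class Executor:
    """Runs steps in order, retrying each failed step up to max_retries times."""

    def __init__(self, monitor: Monitor, max_retries: int = 1) -> None:
        self.monitor = monitor
        self.max_retries = max_retries

    def run(self, steps: List[Step]) -> None:
        for name, action in steps:
            for attempt in range(1, self.max_retries + 2):
                try:
                    action()
                except Exception as exc:
                    self.monitor.record(name, f"attempt {attempt} failed: {exc}")
                else:
                    self.monitor.record(name, "ok")
                    break
            else:
                # Every attempt failed: stop the workflow
                raise RuntimeError(f"step {name!r} failed after {self.max_retries + 1} attempts")
```

Keeping the monitor separate from the executor means the record of what happened survives even when a step ultimately fails.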
Development
# Run tests
python -m pytest
# Run type checking
python -m mypy .
# Run linting
python -m ruff check .
# Format code
python -m black .
python -m isort .
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and linting
- Submit a pull request
License
MIT License - see LICENSE file for details
Citation
If you use FlowAgent in your research, please cite:
@software{flowagent2025,
  title={FlowAgent: An Advanced Multi-Agent Framework for Bioinformatics Workflows},
  author={{Cribbs Lab}},
  year={2025},
  url={https://github.com/cribbslab/flowagent}
}
Version Compatibility
FlowAgent automatically handles version compatibility for Kallisto indices:
- Version Checking
  - Checks the Kallisto version before index creation
  - Validates index compatibility using kallisto inspect
  - Stores version information in workflow metadata
- Error Prevention
  - Detects version mismatches before execution
  - Provides detailed error messages for incompatible indices
  - Suggests resolution steps for version conflicts
- Metadata Management
  - Tracks index versions across workflows
  - Maintains compatibility information
  - Enables reproducible analyses
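A version check along these lines can be sketched by parsing the output of kallisto version and comparing it with a version stored alongside the index. The metadata key kallisto_version and the rule that matching major.minor versions are compatible are assumptions for illustration; they are not taken from FlowAgent.

```python
import re
from typing import Mapping, Optional, Tuple


def parse_kallisto_version(output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from text such as 'kallisto, version 0.50.1'."""
    match = re.search(r"version\s+(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognised kallisto version output: {output!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def index_compatibility_problem(installed_output: str,
                                metadata: Mapping[str, str]) -> Optional[str]:
    """Return a human-readable problem, or None if the index looks compatible.

    Assumption: indices are treated as compatible when major.minor versions match.
    """
    installed = parse_kallisto_version(installed_output)
    recorded = metadata.get("kallisto_version")
    if recorded is None:
        return ("no kallisto version recorded for this index; "
                "inspect it with `kallisto inspect` or rebuild it")
    built_with = parse_kallisto_version(f"version {recorded}")
    if installed[:2] != built_with[:2]:
        installed_str = ".".join(map(str, installed))
        return (f"index was built with kallisto {recorded} but {installed_str} is installed; "
                "rebuild the index with the installed version")
    return None
```

The installed_output string would typically come from subprocess.run(["kallisto", "version"], capture_output=True, text=True).stdout.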
Updating the Environment
To update your conda environment with new dependencies:
conda env update -f conda/environment/environment.yml
Managing Multiple Environments
For development or testing, you can create a separate environment:
conda env create -f conda/environment/environment.yml -n flowagent-dev
Basic Usage
# Local execution
python -m flowagent.cli "Analyze RNA-seq data in my fastq.gz files using Kallisto"
# SLURM cluster execution
python -m flowagent.cli --executor cgat "Analyze RNA-seq data in my fastq.gz files using Kallisto"
Advanced Usage
- Resume a failed workflow:
  python -m flowagent.cli --resume --checkpoint-dir workflow_state "Your workflow prompt"
- Specify custom resource requirements:
  python -m flowagent.cli --executor cgat --memory 32G --threads 16 "Your workflow prompt"
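Checkpoint-and-resume can be sketched as recording each completed step in the checkpoint directory and skipping recorded steps on --resume. This is an illustration only: the completed_steps.json file name and the run_workflow helper are hypothetical, and FlowAgent's real checkpoint format is not described here.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Union

STATE_FILE = "completed_steps.json"  # hypothetical checkpoint file name


def load_completed(checkpoint_dir: Path) -> List[str]:
    path = checkpoint_dir / STATE_FILE
    return json.loads(path.read_text()) if path.exists() else []


def save_completed(checkpoint_dir: Path, completed: List[str]) -> None:
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    tmp = checkpoint_dir / (STATE_FILE + ".tmp")
    tmp.write_text(json.dumps(completed))
    # Replace in one step (atomic on POSIX) so a crash does not leave a half-written file
    tmp.replace(checkpoint_dir / STATE_FILE)


def run_workflow(steps: Dict[str, Callable[[], None]],
                 checkpoint_dir: Union[str, Path], resume: bool = False) -> List[str]:
    """Run steps in order, recording each success; return the names run this time."""
    checkpoint_dir = Path(checkpoint_dir)
    completed = load_completed(checkpoint_dir) if resume else []
    executed = []
    for name, action in steps.items():
        if name in completed:
            continue  # already finished in an earlier run
        action()  # an exception here stops the run; earlier steps stay recorded
        completed.append(name)
        save_completed(checkpoint_dir, completed)
        executed.append(name)
    return executed
```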