# Curriculum Creation Usage Guide
This guide provides step-by-step instructions for using the Active Inference curriculum creation scripts, from initial setup to final output generation.
Ensure you have the following:
```bash
# Clone the repository and navigate to the project
cd /path/to/start

# Install dependencies
uv sync --all-extras --dev

# Set up environment variables
export PERPLEXITY_API_KEY="your-perplexity-api-key"
export OPENROUTER_API_KEY="your-openrouter-api-key"

# Optional: Configure specific models
export PERPLEXITY_MODEL="llama-3.1-sonar-small-128k-online"
export OPENROUTER_MODEL="anthropic/claude-3.5-sonnet"
```
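Before running any script, it can be worth confirming the keys are actually visible to the environment. The following is an optional sanity-check sketch (not part of the provided scripts); the 10-character minimum simply mirrors the "appears to be invalid (too short)" check described later in this guide.

```python
import os

# Optional sanity check: confirm API keys are set before running the pipeline.
# The 10-character minimum mirrors the scripts' "too short" validation message.
for var in ("PERPLEXITY_API_KEY", "OPENROUTER_API_KEY"):
    value = os.environ.get(var, "")
    if len(value) < 10:
        print(f"WARNING: {var} is missing or looks too short")
    else:
        print(f"{var} is set ({len(value)} characters)")
```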
Create the required input directory structure:
```
Languages/
└── Inputs_and_Outputs/
    ├── Domain/
    │   ├── Synthetic_FEP-ActInf.md       # Core FEP content
    │   ├── Synthetic_Neuroscience.md     # Domain example
    │   └── Synthetic_MachineLearning.md  # Domain example
    └── Entity/
        ├── data_scientist.py             # Entity example
        └── neuroscientist.py             # Entity example
```
Domain files should contain the source material for each knowledge domain (for example, the synthetic FEP, neuroscience, and machine learning documents above).
Entity files should describe a target audience, such as the example data scientist and neuroscientist profiles.
Run the domain research script first to analyze domain characteristics and generate domain-specific curriculum foundations:
```bash
cd learning/curriculum_creation
python 1_Research_Domain.py
```
What it does:
- Reads the domain files from `Languages/Inputs_and_Outputs/Domain/`
- Writes domain research reports (JSON and Markdown) to `data/domain_research/`
Expected outputs:
```
data/domain_research/
├── Synthetic_Neuroscience_research_20240315_143022.json
├── Synthetic_Neuroscience_research_20240315_143022.md
├── Synthetic_MachineLearning_research_20240315_143155.json
└── Synthetic_MachineLearning_research_20240315_143155.md
```
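To confirm that reports were actually written (your file names will carry different timestamps), a short optional sketch can list the contents of the output directory:

```python
from pathlib import Path

# List generated domain research reports (JSON and Markdown), newest first.
output_dir = Path("data/domain_research")
for report in sorted(output_dir.glob("*"), key=lambda p: p.stat().st_mtime, reverse=True):
    print(f"{report.name}  ({report.stat().st_size / 1024:.1f} KB)")
```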
Monitoring progress: the script prints a progress indicator for each domain as it is processed (see the logging commands later in this guide).
Next, analyze the target audiences to create personalized curriculum recommendations:
```bash
python 1_Research_Entity.py
```
What it does:
- Reads the entity profiles from `Languages/Inputs_and_Outputs/Entity/`
- Writes audience research reports to `data/audience_research/`
Expected outputs:
```
data/audience_research/
├── data_scientist_research_20240315.json
└── neuroscientist_research_20240315.json
```
Key features analyzed:
Next, convert the research reports into comprehensive Active Inference curricula:
```bash
python 2_Write_Introduction.py
```
What it does:
- Turns the domain and audience research reports into a complete curriculum for each entity
- Writes all generated sections to `data/written_curriculums/`
Expected outputs:
```
data/written_curriculums/
├── data_scientist/
│   ├── background_analysis_20240315_150322.md
│   ├── learning_strategy_20240315_150422.md
│   ├── curriculum_recommendations_20240315_150522.md
│   ├── complete_curriculum_20240315_150622.md
│   └── complete_curriculum_20240315_150622.json
└── neuroscientist/
    ├── background_analysis_20240315_150722.md
    ├── learning_strategy_20240315_150822.md
    ├── curriculum_recommendations_20240315_150922.md
    ├── complete_curriculum_20240315_151022.md
    └── complete_curriculum_20240315_151022.json
```
Content structure: each curriculum build includes a background analysis, a learning strategy, curriculum recommendations, and the complete curriculum in both Markdown and JSON form.
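For a quick size check on the generated curricula, a small sketch can count headings and words in each complete Markdown file; it assumes sections use level-2 (`##`) headings, so adjust if your output differs:

```python
from pathlib import Path

# Rough size check: count level-2 headings and words in each complete curriculum.
for md_file in sorted(Path("data/written_curriculums").rglob("complete_curriculum_*.md")):
    text = md_file.read_text(encoding="utf-8")
    sections = sum(1 for line in text.splitlines() if line.startswith("## "))
    print(f"{md_file.parent.name}: {sections} sections, {len(text.split())} words")
```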
Then create PNG charts and Mermaid diagrams to visualize curriculum structure and metrics:
```bash
python 3_Introduction_Visualizations.py
```
Advanced usage:
```bash
# Custom input/output directories
python 3_Introduction_Visualizations.py --input /path/to/curricula --output /path/to/visualizations

# Using default data directories
python 3_Introduction_Visualizations.py
```
What it does:
- Analyzes the written curricula and computes structural metrics
- Writes charts and diagrams to `data/visualizations/`
Expected outputs:
```
data/visualizations/
├── curriculum_metrics.png    # Comprehensive metrics dashboard
├── curriculum_structure.mmd  # Overall curriculum structure
├── data_scientist_flow.mmd   # Learning flow diagram
├── neuroscientist_flow.mmd   # Learning flow diagram
└── curriculum_metrics.json   # Detailed metrics data
```
Visualization features:
Finally, translate the curricula into each of the configured target languages:
```bash
python 4_Translate_Introductions.py
```
Advanced usage:
```bash
# Translate specific languages only
python 4_Translate_Introductions.py --languages Spanish French German

# Custom input/output directories
python 4_Translate_Introductions.py --input /path/to/curricula --output /path/to/translations

# Combine options
python 4_Translate_Introductions.py --input custom_curricula --output custom_translations --languages Chinese Arabic Hindi
```
What it does:
- Reads the target languages from `data/config/languages.yaml`
- Writes translated curricula to `data/translated_curriculums/`
Expected outputs:
```
data/translated_curriculums/
├── chinese/
│   ├── data_scientist_curriculum_chinese_20240315_160122.md
│   └── neuroscientist_curriculum_chinese_20240315_160222.md
├── spanish/
│   ├── data_scientist_curriculum_spanish_20240315_160322.md
│   └── neuroscientist_curriculum_spanish_20240315_160422.md
└── french/
    ├── data_scientist_curriculum_french_20240315_160522.md
    └── neuroscientist_curriculum_french_20240315_160622.md
```
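To see at a glance which languages actually produced output, a short optional sketch can summarize the translation directory:

```python
from pathlib import Path

# Summarize how many translated curricula exist per language.
base = Path("data/translated_curriculums")
if base.exists():
    for lang_dir in sorted(base.iterdir()):
        if lang_dir.is_dir():
            count = len(list(lang_dir.glob("*.md")))
            print(f"{lang_dir.name}: {count} translated curricula")
```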
Edit `data/config/languages.yaml` to customize the target languages:
```yaml
target_languages:
  - Chinese     # Simplified Chinese by default
  - Spanish     # Standard Spanish
  - Arabic      # Modern Standard Arabic
  - Hindi       # Devanagari script
  - French      # Standard French
  - Japanese    # Standard Japanese
  - German      # Standard German
  - Russian     # Cyrillic script
  - Portuguese  # Standard Portuguese
  - Swahili     # Standard Swahili
  - Tagalog     # Standard Tagalog

script_mappings:
  Arabic: "Modern Standard Arabic"
  Chinese: "Simplified Chinese"
  Hindi: "Devanagari"
  Japanese: "Standard Japanese"
  # ... additional mappings
```
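To confirm the file parses as expected, it can be loaded with PyYAML (assumed to be available in the project environment):

```python
import yaml

# Load and inspect the translation configuration.
with open("data/config/languages.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

print("Target languages:", ", ".join(config["target_languages"]))
print("Script mappings:", config.get("script_mappings", {}))
```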
Customize the generation prompts in `data/prompts/`:

- `research_domain_analysis.md`: Controls domain analysis depth and focus
- `research_domain_curriculum.md`: Shapes curriculum structure and content
- `research_entity.md`: Defines audience analysis approach
- `curriculum_section.md`: Templates for individual curriculum sections
- `translation.md`: Translation quality and cultural adaptation guidelines
Override default models via environment variables:
```bash
# For research tasks (requires online capability)
export PERPLEXITY_MODEL="llama-3.1-sonar-large-128k-online"

# For content generation and translation (pick one)
export OPENROUTER_MODEL="anthropic/claude-3.5-sonnet"
export OPENROUTER_MODEL="openai/gpt-4-turbo-preview"
```
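How each script resolves these variables is internal to the scripts, but the usual pattern is an environment lookup with a fallback; the sketch below illustrates that pattern using the example model names from this guide as defaults (the defaults are illustrative, not necessarily the scripts' own):

```python
import os

# Typical pattern for resolving model overrides from the environment.
# Defaults below are the example model names from this guide.
perplexity_model = os.getenv("PERPLEXITY_MODEL", "llama-3.1-sonar-small-128k-online")
openrouter_model = os.getenv("OPENROUTER_MODEL", "anthropic/claude-3.5-sonnet")
print(f"Research model: {perplexity_model}")
print(f"Generation/translation model: {openrouter_model}")
```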
All scripts provide detailed logging:
```bash
# Run with verbose output
python 1_Research_Domain.py 2>&1 | tee domain_research.log

# Monitor in real-time
tail -f domain_research.log
```
Domain Research:
Entity Research:
Curriculum Generation:
Visualization:
Translation:
The curriculum creation scripts now provide enhanced error handling and validation:
Configuration Errors:
Error: "Domains configuration file not found"
Solution: The script will show you exactly what structure is needed:
```yaml
domains:
  - name: example_domain
    description: Example domain description
    category: general
    keywords: [keyword1, keyword2]
    priority: medium
```
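If you want to validate a domains file before running the pipeline, a small sketch can check each entry against the structure shown above; the file path here is an assumption, so point it at wherever your configuration actually lives:

```python
import yaml

REQUIRED_KEYS = {"name", "description", "category", "keywords", "priority"}

# Check each domain entry against the structure shown above.
# The path below is an assumption; use your actual configuration file.
with open("data/config/domains.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

for i, domain in enumerate(config.get("domains", [])):
    missing = REQUIRED_KEYS - domain.keys()
    if missing:
        print(f"Domain {i} ({domain.get('name', '?')}) is missing: {sorted(missing)}")
    else:
        print(f"Domain {i} ({domain['name']}) looks complete")
```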
API Connection Issues:
Error: "PERPLEXITY_API_KEY appears to be invalid (too short)"
Solution: Check your API key format - it should be at least 10 characters
Also ensure environment variables are properly set:
```bash
export PERPLEXITY_API_KEY="your-key-here"
export OPENROUTER_API_KEY="your-key-here"
```
Content Validation Warnings:
Warning: "Content is short (45 words, minimum 100)"
Warning: "Sections are very short on average"
Warning: "Content may contain repetitive text"
These warnings help identify quality issues but don't stop processing
File Processing Errors:
Error: "Research file not found: /path/to/file"
Solution: Scripts now validate file existence and provide full paths
Check that your input directories contain the expected files
API Retry Logic:
Info: "API request failed (attempt 1/3): rate limit exceeded"
Info: "Retrying in 2 seconds..."
The scripts now automatically retry failed requests with exponential backoff
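The exact retry behaviour is implemented inside the scripts, but the general pattern is a standard exponential-backoff loop; here is a minimal illustrative sketch (attempt counts and delays are examples, not the scripts' actual values):

```python
import time

def call_with_retries(request_fn, max_attempts=3, base_delay=2.0):
    """Retry a flaky API call with exponential backoff (illustrative sketch)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request_fn()
        except Exception as exc:  # in real code, catch the specific API error type
            if attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1))
            print(f"API request failed (attempt {attempt}/{max_attempts}): {exc}")
            print(f"Retrying in {delay:.0f} seconds...")
            time.sleep(delay)
```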
Progress Tracking:
Info: "Processing domain 3/10: biochemistry"
Info: "Processing entity research 2/5: karl_friston"
All scripts now show clear progress indicators
After generation, validate outputs:
```bash
# Check line counts of generated Markdown files
find data/ -name "*.md" -exec wc -l {} + | sort -n

# Verify that every generated JSON file parses
find data/ -name "*.json" -exec python -m json.tool {} \; > /dev/null
```
Use the generated metrics to assess quality:
```python
import json

# Load curriculum metrics
with open('data/visualizations/curriculum_metrics.json', 'r') as f:
    metrics = json.load(f)

# Analyze curriculum characteristics
for curriculum in metrics:
    print(f"Entity: {curriculum['entity_name']}")
    print(f"  Sections: {curriculum['section_count']}")
    print(f"  Learning objectives: {curriculum['objectives_count']}")
    print(f"  Code examples: {curriculum['code_block_count']}")
    print(f"  Math expressions: {curriculum['math_expressions_count']}")
    print(f"  Words per section: {curriculum['words_per_section']:.1f}")
    print()
```
Process multiple datasets efficiently:
```bash
#!/bin/bash
# batch_process.sh

# Set up environment
export PERPLEXITY_API_KEY="your-key"
export OPENROUTER_API_KEY="your-key"

# Process multiple domain sets
for domain_set in neuroscience machine_learning biology psychology; do
    echo "Processing domain set: $domain_set"

    # Switch to domain-specific input
    cp -r "inputs/${domain_set}" "Languages/Inputs_and_Outputs/"

    # Run full pipeline
    python 1_Research_Domain.py
    python 1_Research_Entity.py
    python 2_Write_Introduction.py
    python 3_Introduction_Visualizations.py
    python 4_Translate_Introductions.py

    # Archive results
    mv data "results/${domain_set}_$(date +%Y%m%d_%H%M%S)"
    mkdir -p data/{domain_research,audience_research,written_curriculums,visualizations,translated_curriculums}
done
```
Integrate with external systems:
```python
# custom_integration.py
import json
import shutil
import subprocess
from pathlib import Path

def run_curriculum_pipeline(domain_files, entity_files, output_dir):
    """Run the full curriculum generation pipeline with custom inputs."""
    # Prepare input directories
    setup_inputs(domain_files, entity_files)

    # Run pipeline
    steps = [
        "1_Research_Domain.py",
        "1_Research_Entity.py",
        "2_Write_Introduction.py",
        "3_Introduction_Visualizations.py",
        "4_Translate_Introductions.py"
    ]

    results = {}
    for step in steps:
        result = subprocess.run(
            ["python", step],
            capture_output=True,
            text=True,
            cwd="learning/curriculum_creation"
        )
        results[step] = {
            "success": result.returncode == 0,
            "stdout": result.stdout,
            "stderr": result.stderr
        }

    # Collect outputs
    return collect_results(output_dir)

def setup_inputs(domain_files, entity_files):
    """Set up input files for processing."""
    input_dir = Path("Languages/Inputs_and_Outputs")

    # Copy domain files
    domain_dir = input_dir / "Domain"
    domain_dir.mkdir(parents=True, exist_ok=True)
    for domain_file in domain_files:
        shutil.copy(domain_file, domain_dir)

    # Copy entity files
    entity_dir = input_dir / "Entity"
    entity_dir.mkdir(parents=True, exist_ok=True)
    for entity_file in entity_files:
        shutil.copy(entity_file, entity_dir)

def collect_results(output_dir):
    """Collect all generated results."""
    results = {
        "curricula": list(Path("data/written_curriculums").rglob("*.md")),
        "visualizations": list(Path("data/visualizations").rglob("*")),
        "translations": list(Path("data/translated_curriculums").rglob("*.md")),
        "metrics": "data/visualizations/curriculum_metrics.json"
    }

    # Copy to custom output directory if specified
    if output_dir:
        shutil.copytree("data", output_dir, dirs_exist_ok=True)

    return results
```
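With those helpers in place, an invocation might look like the following (the input paths and output directory are placeholders):

```python
# Example invocation (paths are placeholders for your own files).
results = run_curriculum_pipeline(
    domain_files=["my_inputs/Synthetic_Neuroscience.md"],
    entity_files=["my_inputs/data_scientist.py"],
    output_dir="results/neuroscience_run",
)
print(f"Collected {len(results['curricula'])} curriculum files")
```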
The scripts include built-in rate limiting, but for heavy usage:
```bash
# Track API calls in logs
grep -c "API call" *.log
```

```python
# Add delays between batches of API calls
import time
time.sleep(2)  # 2-second delay between API calls
```

```bash
# Run domain and entity research in parallel
python 1_Research_Domain.py &
python 1_Research_Entity.py &
wait
```
For large datasets:
```bash
# Compress old results
tar -czf "results_$(date +%Y%m%d).tar.gz" data/
rm -rf data/*
```
This completes the comprehensive usage guide for the Active Inference curriculum creation scripts.