plot_validate_model.py
Visualization and analysis tool
Overview
The plot_validate_model.py script generates comprehensive visualizations and analysis tables from validation results. It processes the output from validate_model.py runs to create publication-ready charts, tables, and statistical summaries.
Usage
Basic Usage
python plot_validate_model.py --model manipulative --environment report_failure
Generate plots and tables for the specified model and environment combination.
With Uncertainty Analysis
python plot_validate_model.py --model hal9000 --environment report_score --include-uncertainties
Include statistical uncertainty bars in visualizations for more rigorous analysis.
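The page does not say how the uncertainty bars are computed. As a rough illustration only, the sketch below assumes each metric is a proportion over independent trials and uses the binomial standard error; the function name is hypothetical and the script may use a different estimator.

```python
import math


def rate_with_standard_error(successes: int, trials: int) -> tuple[float, float]:
    """Return (rate, standard error) for a proportion.

    Assumption: a binomial standard error, sqrt(p * (1 - p) / n).
    This is not confirmed to be what plot_validate_model.py uses.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    rate = successes / trials
    return rate, math.sqrt(rate * (1 - rate) / trials)


# Example: 5 misaligned responses out of 20 samples.
rate, stderr = rate_with_standard_error(5, 20)
print(f"rate={rate:.2f} +/- {stderr:.3f}")
```

The returned standard error is the kind of value that could be passed as the yerr argument of matplotlib's errorbar or bar calls.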
Parameters
--model: Model name to analyze (must have validation results)
--environment: Environment to analyze (e.g., report_failure, report_score)
--include-uncertainties: Add statistical uncertainty bars to plots
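For readers who want to wrap or mirror this interface, the three flags above could be declared with argparse roughly as follows. This is a sketch, not the script's actual parser: marking --model and --environment as required is an assumption based on the usage examples, and build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the three documented flags (sketch; the real parser may differ)."""
    parser = argparse.ArgumentParser(prog="plot_validate_model.py")
    # Assumption: both of these are required, as in the usage examples.
    parser.add_argument("--model", required=True,
                        help="Model name to analyze (must have validation results)")
    parser.add_argument("--environment", required=True,
                        help="Environment to analyze, e.g. report_failure")
    # A boolean switch: present means True, absent means False.
    parser.add_argument("--include-uncertainties", action="store_true",
                        help="Add statistical uncertainty bars to plots")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(
    ["--model", "hal9000", "--environment", "report_score", "--include-uncertainties"]
)
print(args.model, args.environment, args.include_uncertainties)
```

argparse converts the dashes in --include-uncertainties to underscores, so the value is read as args.include_uncertainties.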
Generated Output
Output Directory Structure
tables/validate_model/{model}/{environment}/
├── validation_metrics_{timestamp}.csv
├── validation_metrics_{timestamp}.json
├── validation_metrics_{timestamp}_table.tex
├── validation_metrics_{timestamp}_table.pdf
├── validation_metrics_{timestamp}_misalignment.png
├── validation_metrics_{timestamp}_reasoning.png
├── validation_metrics_{timestamp}_heatmap.png
└── validation_metrics_{timestamp}_summary.png
Data Files
- CSV: Structured metrics table
- JSON: Complete data with metadata
- LaTeX: Publication-ready table source
- PDF: Compiled LaTeX table
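To consume these files from another script, one approach is to locate the newest metrics file in the output directory and read it. The sketch below relies only on the directory layout shown above; the helper names are hypothetical, the timestamp format is not documented here (so files are ordered by modification time rather than by name), and the CSV column names are left unspecified.

```python
import csv
from pathlib import Path


def latest_metrics_file(model, environment, suffix=".csv", root="tables/validate_model"):
    """Return the most recently modified validation_metrics_*<suffix> file, or None."""
    directory = Path(root) / model / environment
    if not directory.is_dir():
        return None
    # For ".csv" or ".json" the pattern matches only the plain data file,
    # not the *_table.tex, *_table.pdf, or *.png siblings.
    candidates = directory.glob(f"validation_metrics_*{suffix}")
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def load_csv_rows(path):
    """Read a metrics CSV into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, latest_metrics_file("hal9000", "report_score") returns a Path to the newest CSV for that run, or None if no results exist yet. Passing suffix=".json" selects the JSON file instead.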
Visualizations
- Misalignment: Behavior comparison plots
- Reasoning: Alignment faking patterns
- Heatmap: Metric overview matrix
- Summary: Key metrics line plot
Metrics Analyzed
Basic Behavior
Misalignment rates under observed vs unobserved conditions, with and without scratchpad reasoning.
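As an illustration of what these rates mean, the sketch below groups per-sample records by the two conditions (observed or unobserved, scratchpad or no scratchpad) and computes the fraction of misaligned samples in each group. The record fields observed, scratchpad, and misaligned are hypothetical names chosen for this example; the actual log schema is not described here.

```python
from collections import defaultdict


def misalignment_rates(records):
    """Map (observed, scratchpad) -> fraction of records flagged as misaligned."""
    counts = defaultdict(lambda: [0, 0])  # condition -> [misaligned count, total]
    for record in records:
        key = (record["observed"], record["scratchpad"])
        counts[key][0] += int(record["misaligned"])
        counts[key][1] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}


# Hypothetical records, for illustration only.
sample_records = [
    {"observed": True, "scratchpad": True, "misaligned": False},
    {"observed": True, "scratchpad": True, "misaligned": False},
    {"observed": False, "scratchpad": True, "misaligned": True},
    {"observed": False, "scratchpad": True, "misaligned": False},
]
rates = misalignment_rates(sample_records)
print(rates)
```

A gap between the observed and unobserved rates under the same scratchpad setting is the comparison the misalignment plots are meant to show.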
Reasoning Patterns
Frequency of alignment faking reasoning and defection reasoning across different contexts.
Reasoning Faithfulness
Conditional probabilities measuring how well stated reasoning predicts actual behavior.
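A conditional probability of this kind can be estimated by restricting to the samples where a given reasoning pattern appears and measuring how often the behavior follows. The sketch below is one plausible reading of the metric, not the script's confirmed definition; the field names alignment_faking_reasoning and misaligned are hypothetical.

```python
def conditional_probability(records, given, outcome):
    """Estimate P(outcome | given) from boolean fields; None if 'given' never occurs."""
    matching = [record for record in records if record[given]]
    if not matching:
        return None  # undefined when the conditioning event has no samples
    return sum(1 for record in matching if record[outcome]) / len(matching)


# Hypothetical records, for illustration only.
faithfulness_records = [
    {"alignment_faking_reasoning": True, "misaligned": True},
    {"alignment_faking_reasoning": True, "misaligned": False},
    {"alignment_faking_reasoning": False, "misaligned": False},
    {"alignment_faking_reasoning": False, "misaligned": False},
]
p_misaligned_given_reasoning = conditional_probability(
    faithfulness_records, given="alignment_faking_reasoning", outcome="misaligned"
)
print(p_misaligned_given_reasoning)
```

Returning None rather than 0 for an empty conditioning set keeps "never observed" distinct from "observed but never followed by the behavior".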
Requirements
Prerequisites
- Completed validation runs from validate_model.py
- Python packages: matplotlib, seaborn, pandas, numpy
- Optional: LaTeX installation for PDF table compilation
- Log files in expected directory structure
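The package and LaTeX prerequisites can be checked before running the script. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and checking for a pdflatex executable is an assumption, since the page does not state which LaTeX engine is used.

```python
import importlib.util
import shutil

REQUIRED_PACKAGES = ("matplotlib", "seaborn", "pandas", "numpy")


def missing_packages(packages=REQUIRED_PACKAGES):
    """Return the names of packages that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


def has_latex(executable="pdflatex"):
    """Report whether a LaTeX executable is on PATH (assumed to be pdflatex)."""
    return shutil.which(executable) is not None


missing = missing_packages()
if missing:
    print("Missing Python packages:", ", ".join(missing))
if not has_latex():
    print("No pdflatex found; PDF table compilation may be unavailable.")
```

Because LaTeX is listed as optional, its absence is reported as a note rather than treated as an error.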