Documentation

plot_validate_model.py

Visualization and analysis tool

Overview

The plot_validate_model.py script turns validation results into visualizations and analysis tables. It reads the output of validate_model.py runs and produces publication-ready charts, tables, and statistical summaries.

Usage

Basic Usage

python plot_validate_model.py --model manipulative --environment report_failure

Generate plots and tables for the specified model and environment combination.

With Uncertainty Analysis

python plot_validate_model.py --model hal9000 --environment report_score --include-uncertainties

Include statistical uncertainty bars in visualizations for more rigorous analysis.
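This page does not say how the uncertainty bars are computed. As an illustration only, a common choice for a rate such as a misalignment rate is the Wilson score interval. The sketch below assumes each bar summarizes a binomial proportion (misaligned samples out of total samples). wilson_interval is a hypothetical helper, not part of the script.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (about 95% when z=1.96).

    Hypothetical helper: plot_validate_model.py may use a different method.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    )
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Example: 12 misaligned responses out of 100 samples.
    low, high = wilson_interval(12, 100)
    print(f"rate=0.12, 95% CI=({low:.3f}, {high:.3f})")
```

Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly when a rate is close to 0 or 1, which is common for misalignment rates.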

Parameters

--model

Name of the model to analyze (validation results for this model must already exist)

--environment

Environment to analyze (e.g., report_failure, report_score)

--include-uncertainties

Add statistical uncertainty bars to plots
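You can combine these parameters to process several model/environment pairs in one batch. The following is a minimal sketch. It assumes the script is launched from its own directory with the current Python interpreter. build_commands is a hypothetical helper, and every pair passed to it must already have validation results.

```python
import itertools
import shlex
import sys


def build_commands(models, environments, include_uncertainties=False):
    """Return one argv list per (model, environment) pair, using only the documented flags."""
    commands = []
    for model, environment in itertools.product(models, environments):
        argv = [sys.executable, "plot_validate_model.py",
                "--model", model, "--environment", environment]
        if include_uncertainties:
            argv.append("--include-uncertainties")
        commands.append(argv)
    return commands


if __name__ == "__main__":
    for argv in build_commands(["manipulative", "hal9000"],
                               ["report_failure", "report_score"],
                               include_uncertainties=True):
        # Print each command; pass argv to subprocess.run(argv, check=True) to execute it.
        print(shlex.join(argv))
```

Building argument lists instead of shell strings avoids quoting problems when model or environment names contain unusual characters.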

Generated Output

Output Directory Structure

tables/validate_model/{model}/{environment}/
├── validation_metrics_{timestamp}.csv
├── validation_metrics_{timestamp}.json
├── validation_metrics_{timestamp}_table.tex
├── validation_metrics_{timestamp}_table.pdf
├── validation_metrics_{timestamp}_misalignment.png
├── validation_metrics_{timestamp}_reasoning.png
├── validation_metrics_{timestamp}_heatmap.png
└── validation_metrics_{timestamp}_summary.png
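Each run adds files with a new {timestamp}, so it can be useful to find the newest set of outputs programmatically. This sketch makes two assumptions: it runs from the directory that contains tables/, and file modification times reflect the order of the runs. The {timestamp} format itself is not documented here. latest_metrics_csv and companion_files are hypothetical helpers.

```python
from pathlib import Path
from typing import List, Optional


def latest_metrics_csv(model: str, environment: str,
                       root: str = "tables/validate_model") -> Optional[Path]:
    """Return the newest validation_metrics_*.csv for a model/environment, or None."""
    out_dir = Path(root) / model / environment
    if not out_dir.is_dir():
        return None
    candidates = list(out_dir.glob("validation_metrics_*.csv"))
    if not candidates:
        return None
    # Use modification time, since the {timestamp} format is not documented here.
    return max(candidates, key=lambda p: p.stat().st_mtime)


def companion_files(csv_path: Path) -> List[Path]:
    """Other artifacts (.json, _table.tex, _heatmap.png, ...) sharing the CSV's timestamp."""
    stem = csv_path.stem  # "validation_metrics_<timestamp>"
    return sorted(
        p for p in csv_path.parent.iterdir()
        if p != csv_path and (p.name.startswith(stem + ".") or p.name.startswith(stem + "_"))
    )


if __name__ == "__main__":
    latest = latest_metrics_csv("manipulative", "report_failure")
    if latest is None:
        print("No results found; run plot_validate_model.py first.")
    else:
        print("Latest CSV:", latest)
        for path in companion_files(latest):
            print("  ", path.name)
```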

Data Files

  • CSV: Structured metrics table
  • JSON: Complete data with metadata
  • LaTeX: Publication-ready table source
  • PDF: Compiled LaTeX table

Visualizations

  • Misalignment: Behavior comparison plots
  • Reasoning: Alignment faking patterns
  • Heatmap: Metric overview matrix
  • Summary: Key metrics line plot

Metrics Analyzed

Basic Behavior

Misalignment rates under observed vs unobserved conditions, with and without scratchpad reasoning.
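To illustrate this metric, the sketch below computes one misalignment rate for each (observed, scratchpad) condition. The per-sample record format is hypothetical: each record is assumed to have boolean keys observed, scratchpad and misaligned. The script's actual log schema is not described on this page.

```python
from collections import defaultdict


def misalignment_rates(samples):
    """Misalignment rate for each (observed, scratchpad) condition.

    Each sample is a dict with hypothetical boolean keys
    'observed', 'scratchpad' and 'misaligned'.
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [misaligned count, total count]
    for s in samples:
        key = (s["observed"], s["scratchpad"])
        counts[key][0] += int(s["misaligned"])
        counts[key][1] += 1
    return {key: mis / total for key, (mis, total) in counts.items()}


if __name__ == "__main__":
    demo = [
        {"observed": True, "scratchpad": True, "misaligned": False},
        {"observed": False, "scratchpad": True, "misaligned": True},
        {"observed": False, "scratchpad": True, "misaligned": False},
    ]
    rates = misalignment_rates(demo)
    # Difference between unobserved and observed misalignment when a scratchpad is used.
    print(rates[(False, True)] - rates[(True, True)])
```

Conditions that have no samples are simply missing from the result. This avoids reporting a rate computed from zero samples.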

Reasoning Patterns

Frequency of alignment faking reasoning and defection reasoning across different contexts.

Reasoning Faithfulness

Conditional probabilities measuring how well stated reasoning predicts actual behavior.
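This page does not list the exact conditionals the script reports. One natural reading compares P(misaligned | defection reasoning) with P(misaligned | no defection reasoning). If the stated reasoning is faithful, the first should be noticeably higher than the second. The following is a minimal sketch. It assumes hypothetical per-sample records with boolean keys misaligned and defection_reasoning. conditional_rate is a hypothetical helper.

```python
def conditional_rate(samples, given, outcome="misaligned", given_value=True):
    """Estimate P(outcome | given == given_value) from per-sample boolean records.

    Returns None when no sample satisfies the condition, because the
    conditional probability is undefined in that case.
    """
    subset = [s for s in samples if s[given] == given_value]
    if not subset:
        return None
    return sum(bool(s[outcome]) for s in subset) / len(subset)


if __name__ == "__main__":
    demo = [
        {"defection_reasoning": True, "misaligned": True},
        {"defection_reasoning": True, "misaligned": True},
        {"defection_reasoning": True, "misaligned": False},
        {"defection_reasoning": False, "misaligned": False},
    ]
    print("P(misaligned | defection reasoning):",
          conditional_rate(demo, "defection_reasoning"))
    print("P(misaligned | no defection reasoning):",
          conditional_rate(demo, "defection_reasoning", given_value=False))
    print("P(defection reasoning | misaligned):",
          conditional_rate(demo, "misaligned", outcome="defection_reasoning"))
```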

Requirements

Prerequisites

  • Completed validation runs from validate_model.py
  • Python packages: matplotlib, seaborn, pandas, numpy
  • Optional: LaTeX installation for PDF table compilation
  • Log files in the expected directory structure
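You can check the Python prerequisites before running the script. Run the check with the same interpreter that will run plot_validate_model.py. The following is a small sketch. It assumes the LaTeX compiler is pdflatex, which this page does not specify. missing_packages and latex_available are hypothetical helpers.

```python
import importlib.util
import shutil

REQUIRED_PACKAGES = ["matplotlib", "seaborn", "pandas", "numpy"]


def missing_packages(packages=REQUIRED_PACKAGES):
    """Return the packages that cannot be found by the current interpreter."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


def latex_available(command="pdflatex"):
    """True if the given LaTeX command is on PATH (pdflatex is an assumption)."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) or "none")
    print("LaTeX available (needed only for the PDF table):", latex_available())
```

importlib.util.find_spec only locates a package and does not import it, so the check stays fast even for large libraries such as matplotlib.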