Automated Alignment Researchers: Using large language models to scale scalable oversight
…When we prescribed a specific workflow (“propose ideas, then generate a plan, then write the code…”), we found we’d ultimately constrained Claude’s work. Left to its own devices, Claude was…