Coding agents are useful for quickly making complex code changes. FL experiments differ from ordinary local model tuning because the correctness of the experiment depends on a contract among the server, clients, model updates, metadata, data splits, and evaluation logic. A candidate can raise the reported score while quietly changing what is being compared—for example, by altering the evaluation data, model capacity, communication budget, local compute, or server-client update semantics. Auto-FL makes the research loop explicit. The agent begins with program.md, which acts as the control plane
NVIDIA FLARE Auto-FL is an automated, AI-driven research loop designed to test and optimize federated learning strategies. The idea is straightforward: start with a comparable benchmark task, give the agent a clear research control plane, set a fixed training budget, constrain the mutation surface, and record every result in an experiment ledger. From there, the agent can autonomously iterate through candidate FL strategies while preserving the FLARE Client API and Recipe API contracts. Rather than handing an agent an open-ended research problem, Auto-FL begins with a fair, comparable benchma
What is the function of literature-grounded recovery?
Auto-FL tracks the performance in a ledger (results.tsv). A useful campaign should not continue making small local changes after the ledger shows that a search direction has stalled. Hence, a literature-grounded recovery path has been included for that moment. The agent uses the ledger to summarize the current best stack, recent candidates, repeated crashes, null or worse ideas, and the active mutation contract. When the run appears to plateau, the workflow shifts from local sweeps to a source-backed literature loop. The goal is to stop guessing, identify what kind of failure mode the campaign
After a human manually stops an Auto-FL campaign, the reporting skill is used on the experiment branch that contains results.tsv. It creates a final progress plot, writes a report, and commits the reporting artifacts. That final report is the bridge between autonomous iteration and researcher review. It summarizes the baseline and best score, absolute and relative lift, runtime cost, final stack, crash notes, null or worse ideas, and recommended next-step experiments. In the Auto-FL loop, discarded candidates stay visible in the committed ledger, while kept code changes are committed on the ex