NucleWrekcaH

Can LLMs Reason About Chaotic Effects in Physics Models?

Name One Name Two Name Three Name Four



System Overview

How It Works

MODULE 01

Physics Simulator Evaluations

Six hand-crafted evaluation harnesses implement ground-truth physics solvers spanning nuclear engineering and applied mathematics:

Ideal Gas Law
Radioactive Decay
Separable ODEs
1D Heat Conduction w/ Internal Source
Neutron Diffusion
Navier–Stokes (FNO), coming soon
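As an illustration of what a ground-truth harness can look like, here is a minimal sketch covering the two simplest cases in the list, the ideal gas law and radioactive decay. The function names, argument units, and the `relative_error` scoring helper are assumptions made for this example and are not taken from the project's code.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol*K)


def ideal_gas_pressure(n_mol: float, temp_k: float, volume_m3: float) -> float:
    """Ground truth from PV = nRT; returns pressure in pascals."""
    if volume_m3 <= 0 or temp_k < 0:
        raise ValueError("volume must be positive and temperature non-negative")
    return n_mol * R * temp_k / volume_m3


def decay_remaining(n0: float, half_life: float, t: float) -> float:
    """Ground truth from N(t) = N0 * exp(-ln(2) * t / half_life)."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return n0 * math.exp(-math.log(2) * t / half_life)


def relative_error(predicted: float, truth: float) -> float:
    """Hypothetical scoring helper: |predicted - truth| / |truth|."""
    return abs(predicted - truth) / max(abs(truth), 1e-12)


# Score a (made-up) model answer against the exact solver.
truth = decay_remaining(n0=100.0, half_life=10.0, t=10.0)
score = relative_error(predicted=55.0, truth=truth)
```

A harness of this shape gives the agent an exact reference value for any input it chooses to probe, which is what makes an error signal available at every step.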

MODULE 02

AI Agent · ReAct Loop

An autonomous agent powered by Claude operates in a Reasoning + Acting loop, iteratively forming hypotheses about model weak spots, executing targeted probes through tool calls, and updating its strategy based on observed error signals.

reason → act → observe → iterate
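The loop can be sketched in a few lines. This is a toy version under stated assumptions: `stub_policy` is a hypothetical rule-based stand-in for the LLM's reasoning step, and `toy_probe` is a made-up tool that reports the error of the small-angle approximation sin(x) ≈ x. Neither reflects the project's actual prompts, tools, or API calls.

```python
import math
from typing import Callable, List, Tuple

History = List[Tuple[float, float]]  # (probe input, observed error)


def react_loop(
    propose: Callable[[History], float],
    probe: Callable[[float], float],
    steps: int,
) -> Tuple[float, float]:
    """Run reason -> act -> observe for `steps` rounds; return the worst case found."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    history: History = []
    for _ in range(steps):
        x = propose(history)      # reason: choose the next probe from past observations
        err = probe(x)            # act: execute the tool call
        history.append((x, err))  # observe: record the error signal
    return max(history, key=lambda item: item[1])


def stub_policy(history: History) -> float:
    """Hypothetical stand-in for the LLM: keep stepping in the direction that raised the error."""
    if len(history) < 2:
        return 1.0 + len(history)  # two seed probes at x = 1.0 and x = 2.0
    (x_prev, e_prev), (x_last, e_last) = history[-2], history[-1]
    step = x_last - x_prev
    if e_last < e_prev:
        step = -step / 2  # error dropped: back off and reverse
    return x_last + step


def toy_probe(x: float) -> float:
    """Made-up tool: error of the approximation sin(x) ~ x."""
    return abs(math.sin(x) - x)


worst_input, worst_error = react_loop(stub_policy, toy_probe, steps=5)
```

In a real agent the `propose` callable would be replaced by a model call that sees the history as context; keeping it as a plain function argument makes the loop testable without any network access.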


MODULE 03

Adversarial Evaluation of Neural Operators

The agent targets Fourier Neural Operators and similar surrogate models, searching the input space for regions of high prediction error — inputs where the ML model's physics approximation breaks down and cannot be trusted for downstream inference or safety analysis.
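The search for high-error regions can be illustrated with a simple baseline. The sketch below uses seeded random sampling over a one-dimensional input range, which is a deliberately naive substitute for the agent-driven search described above. The surrogate is a hypothetical second-order Taylor approximation of exp(-x), not an FNO, and the `tolerance` value is an arbitrary assumption.

```python
import math
import random
from typing import Callable, List, Tuple


def ground_truth(x: float) -> float:
    """Exact reference solution for the toy problem."""
    return math.exp(-x)


def toy_surrogate(x: float) -> float:
    """Hypothetical surrogate: second-order Taylor expansion of exp(-x) around 0."""
    return 1.0 - x + 0.5 * x * x


def search_high_error(
    truth: Callable[[float], float],
    surrogate: Callable[[float], float],
    low: float,
    high: float,
    n_samples: int = 200,
    tolerance: float = 0.1,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Sample inputs uniformly and return (input, error) pairs above tolerance, worst first."""
    rng = random.Random(seed)
    flagged: List[Tuple[float, float]] = []
    for _ in range(n_samples):
        x = rng.uniform(low, high)
        err = abs(surrogate(x) - truth(x))
        if err > tolerance:
            flagged.append((x, err))
    flagged.sort(key=lambda pair: pair[1], reverse=True)
    return flagged


flagged = search_high_error(ground_truth, toy_surrogate, low=0.0, high=3.0)
worst_x, worst_err = flagged[0]
```

Here the surrogate is accurate near x = 0 and degrades toward the edge of the range, so the flagged inputs cluster at large x. For a high-dimensional operator, uniform sampling scales poorly, which is the motivation for a guided search.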

Benchmark Results

Evaluation Results

Adversarial agent success rate across physics simulation harnesses. A "success" means the agent identified a high-error input region in the target neural operator.

100%

Success Rate · 5 of 5 Active Simulators

Ideal Gas Law: 100%
Radioactive Decay: 100%
Separable Ordinary Differential Equations: 100%
1D Heat Conduction with Internal Heat Generation: 100%
Neutron Diffusion: 100%
Fourier Neural Operator · Navier–Stokes: in progress
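The headline figure counts only active simulators, so the in-progress Navier–Stokes harness is excluded from the denominator. A minimal sketch of that aggregation follows, using the per-harness outcomes reported above; the data structure and the `success_rate` helper are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple

# True = agent found a high-error region, None = harness not yet active.
results: Dict[str, Optional[bool]] = {
    "Ideal Gas Law": True,
    "Radioactive Decay": True,
    "Separable Ordinary Differential Equations": True,
    "1D Heat Conduction with Internal Heat Generation": True,
    "Neutron Diffusion": True,
    "Fourier Neural Operator / Navier-Stokes": None,
}


def success_rate(outcomes: Dict[str, Optional[bool]]) -> Tuple[float, int]:
    """Return (fraction of successes, number of active harnesses), skipping None entries."""
    active = [value for value in outcomes.values() if value is not None]
    if not active:
        raise ValueError("no active harnesses to score")
    return sum(active) / len(active), len(active)


rate, n_active = success_rate(results)
```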

The target FNO model achieves a baseline test error of 8.3% on the Navier–Stokes benchmark (Li et al., 2021). NucleWrekcaH's adversarial agent is designed to systematically find inputs that push this error significantly higher, with the goal of showing that published benchmark accuracy does not imply robustness under adversarial distribution shift.
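To compare a probed input against the baseline, an error metric is needed. The sketch below assumes a relative L2 error, which is a common choice for neural operator benchmarks; the page does not state which metric the 8.3% figure uses. The `factor` threshold for what counts as "significantly higher" is a hypothetical parameter.

```python
import math
from typing import Sequence

BASELINE_ERROR = 0.083  # baseline test error quoted above


def relative_l2_error(predicted: Sequence[float], truth: Sequence[float]) -> float:
    """Compute ||predicted - truth||_2 / ||truth||_2 over a flattened field."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    numerator = math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, truth)))
    denominator = math.sqrt(sum(t * t for t in truth))
    if denominator == 0:
        raise ValueError("truth field has zero norm")
    return numerator / denominator


def exceeds_baseline(error: float, factor: float = 2.0) -> bool:
    """Hypothetical criterion: flag an input whose error is more than factor x baseline."""
    return error > factor * BASELINE_ERROR
```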

Tech Stack