{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "Flexcompute Engineering",
  "home_page_url": "https://engineering.flexcompute.com",
  "feed_url": "https://engineering.flexcompute.com/feed.json",
  "description": "Essays, tutorials, and case studies on AI engineering, computational physics, photonics, and simulation from Flexcompute.",
  "items": [
    {
      "id": "https://engineering.flexcompute.com/articles/electrical-routing-agents/",
      "url": "https://engineering.flexcompute.com/articles/electrical-routing-agents/",
      "title": "Learning Auto-Routing by Building: From Brute Force to an Auto-Design Agent",
      "summary": "How I learned auto-routing for photonic chips — a failed brute-force attempt, AI as a learning partner, interactive HTML arenas, a PhotonForge router, and an agent that iterates 27 designs in under three minutes.",
      "image": "https://engineering.flexcompute.com/images/og/electrical-routing.png",
      "banner_image": "https://engineering.flexcompute.com/images/og/electrical-routing.png",
      "date_published": "2026-04-20T00:00:00.000Z",
      "date_modified": "2026-04-20T00:00:00.000Z",
      "authors": [
        {
          "name": "Prash Kharel",
          "path": "/authors/prash-kharel/",
          "url": "https://engineering.flexcompute.com/authors/prash-kharel/"
        }
      ],
      "tags": [
        "AI Engineering",
        "AI Agents",
        "Photonics"
      ],
      "content_html": "<p>I wanted to learn auto-routing for photonic chips. The story below is how I got there: a failed brute-force attempt, then using AI as a learning partner, then building interactive HTML routing arenas, then translating that into a PhotonForge router, and finally wrapping an auto-design agent around the whole thing.</p>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/electrical-routing/hero-chip.png\"\n    alt=\"Quantum photonic chip with dozens of active elements\"\n  />\n  <figcaption>\n    The kind of chip we end up routing: dozens of active elements, each needing a trace to a\n    bondpad.\n  </figcaption>\n</figure>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/electrical-routing/hero-timelapse.gif\"\n    alt=\"Agent iterating on DC routing over 27 attempts\"\n  />\n  <figcaption>The final outcome: an agent iterating 27 designs in under 3 minutes.</figcaption>\n</figure>\n<h2>Brute forcing it</h2>\n<p>My first instinct was lazy and honest. I opened Claude Code, dropped in screenshots of a half-routed projector chip, and asked the model to fix the layout by editing PhotonForge's <code>route_manhattan</code> call. No algorithm in mind, no plan, just \"make this work.\"</p>\n<p>It sort of worked. 
And then it didn't.</p>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/electrical-routing/brute-force-attempt-1.png\"\n    alt=\"Brute-force routing attempt, routes bunched up and overlapping\"\n  />\n  <figcaption>First attempt: routes bunch up, overlap, cut through other pads.</figcaption>\n</figure>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/electrical-routing/brute-force-attempt-2.png\"\n    alt=\"Second brute-force attempt after several rounds of fixes\"\n  />\n  <figcaption>\n    Second attempt after several rounds of tweaks: cleaner, but still hand-waved and fragile.\n  </figcaption>\n</figure>\n<p>Each round of \"please fix this\" traded one failure for another. Waypoints helped two nets and broke three. A tweak that untangled the left half pushed the mess to the right half. The AI was writing plausible code, and I was nodding along, but neither of us actually understood routing. We were patching symptoms.</p>\n<p>This is the trap that's easy to miss: AI is so good at producing code that reads right that you can spend days iterating without stepping back to ask the real question.</p>\n<h2>Using AI as a learning partner</h2>\n<p>So I stopped asking for code and started asking for a map of the territory. What does this problem look like when people have thought about it carefully?</p>\n<p>The AI pointed me at a 60-year-old field: VLSI and PCB autorouting. Lee's algorithm from 1961, A* with Manhattan heuristics, Dijkstra over weighted cost fields, PathFinder-style negotiation routing, rip-up and reroute from commercial EDA tools. And then it pointed me at the best single resource it knew of: Rob Rutenbar's <em>VLSI CAD: Logic to Layout</em> lecture series from the University of Illinois, weeks 7 and 8 specifically, free on the Internet Archive.</p>\n<p>I watched them. Not skimmed, watched. 
And this is where the story changes. From this point on, the AI is no longer guessing with me. It is helping me build and test ideas I actually understand.</p>\n<h2>Building an HTML routing arena</h2>\n<p>Routing is spatial. Reading about BFS wavefront expansion does not click until you <em>see</em> the wavefront propagate across a grid. So I asked the AI to build me interactive visualizers. Not in Python, not in a Jupyter notebook, in pure HTML with Canvas, so I could open them on my phone, on the couch, with zero setup.</p>\n<p>Each lecture concept turned into an interactive prototype within minutes. Click to place a source, a target, and obstacles. Step through BFS one layer at a time. Watch the wavefront wrap around walls. Trace the shortest path back. Then: add a second net, then four, then a cost-field heatmap for bundling.</p>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/electrical-routing/bundle-cost-field.png\"\n    alt=\"Bundle router with its cost-field overlay visible as a heatmap\"\n  />\n  <figcaption>\n    Bundle router with its cost-field overlay. Existing routes emit a soft attraction field; new\n    routes settle into the channel one cell away. Making the field visible made the algorithm\n    obvious.\n  </figcaption>\n</figure>\n<p>Once the building blocks worked, I stacked them into a comparison arena that runs twelve algorithms on the same obstacle layout and the same nets, side by side.</p>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/electrical-routing/algorithms-grid.png\"\n    alt=\"Grid comparing twelve routing algorithms on the same 5-net problem\"\n  />\n  <figcaption>\n    Twelve algorithms on one 5-net problem. Each has a different personality. A* and Corridor\n    complete all 5 nets but route wide. Rip-Up and Hybrid route shortest but leave some nets for\n    later. 
Bundle and Smooth Bundle hug existing routes. River fails topologically.\n  </figcaption>\n</figure>\n<p>The arena made trade-offs visceral. I could drag obstacle walls around with a slider and watch every algorithm respond in real time. New hybrid strategies (rip-up plus bundle, smooth bundle with turn-cost penalties) came out of playing with the arena, not from staring at pseudocode.</p>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/electrical-routing/bfs-vs-hybrid.png\"\n    alt=\"Side-by-side comparison of naive Sequential BFS and Hybrid Rip-Up plus Bundle\"\n  />\n  <figcaption>\n    The two ends of the spectrum on the same problem: naive Sequential BFS (left) and Hybrid Rip-Up\n    + Bundle (right). Same nets, same obstacles, very different cost and topology.\n  </figcaption>\n</figure>\n<h3>Why HTML was the right medium</h3>\n<p>I want to single this out because it felt counterintuitive at first. The algorithms themselves are Python-native, and the production target is PhotonForge. Why prototype in a browser?</p>\n<p>Because the point of the prototype was to build <em>my</em> intuition, not to ship code. HTML means zero setup, works on any device, and Canvas makes it trivial to animate the wavefront at a rate a human can follow. Every time I wanted to understand something — why does net ordering matter? what does the congestion history look like after iteration 5? what turn-penalty kills the sawtooth pattern? — I could have an interactive answer in minutes. That loop is what learning looked like.</p>\n<h2>Translating to a PhotonForge router</h2>\n<p>Once the algorithms were intuitive on an abstract grid, translating them to PhotonForge was mechanical. The architecture has two layers:</p>\n<ul>\n<li><strong>Algorithm layer:</strong> pure Python on an integer grid. Takes source and target cells, avoids obstacle cells, returns a list of grid coordinates. 
No dependency on PhotonForge.</li>\n<li><strong>Integration layer:</strong> a thin bridge that converts PhotonForge layout coordinates to grid cells, rasterizes structures on a given metal layer into obstacles, runs the algorithm, and converts the result back into a <code>Path</code> on the routing layer.</li>\n</ul>\n<pre><code class=\"language-python\">from pf_routing_arena import PFRoutingArena\nfrom routing_algorithms import HybridRipUpBundle\n\narena = PFRoutingArena(\n    bbox_min=(-200, -200), bbox_max=(2200, 1400),\n    grid_pitch=25, router_class=HybridRipUpBundle,\n)\narena.add_obstacles_from_layer(component, M2_layer, margin=1)\narena.add_obstacles_from_layer(component, M1_heater, margin=1)\n\nfor i in range(n_nets):\n    arena.add_terminal_pair(f'D{i}', left[i]['T0'], right[i]['T0'],\n                            source_dir='E', target_dir='W')\n\nresults = arena.route_all(trace_width=20, layer=M2_layer)\n</code></pre>\n<p>The integration work was almost entirely about the boundary between the grid world and the layout world: reserving a net's own pad exit zone while keeping other pads blocked, rasterizing heater pads on a different metal layer as routing obstacles, and enforcing a 1-cell margin between committed routes so they do not physically overlap at real trace widths.</p>\n<p>I prototyped all of this in a Jupyter notebook first, then moved it to the library. The notebook let me flip between algorithms on the same layout and see the difference, which is the same feedback loop the HTML arena gave me, just now in the domain that ships.</p>\n<h2>Scaling up: the auto-design agent</h2>\n<p>This is where I thought I was done. Twelve algorithms, a clean bridge to PhotonForge, a notebook that can route a small test layout. Ship it.</p>\n<p>Then I pointed it at a real quantum photonic chip with 32 active elements, each needing a metal trace to its own bondpad, with heaters and other obstacles scattered through the middle. 
Picking an algorithm was no longer the hard part. The hard parts were all the human decisions an engineer makes before the router ever runs: how to pair pins with bondpads, where to place the bondpad row, what directions to let routes approach from, how wide the trace should be.</p>\n<p>No single algorithm solves those. They are <em>design</em> decisions, and they interact. Change the bondpad row position and the net-to-pad assignment flips. Change the assignment and the approach directions become wrong. Change the trace width and obstacle inflation changes with it.</p>\n<p>This is the regime where an auto-design agent earns its keep. You give it a goal, the tools to build a layout, a design-rule checker that counts violations, and let it propose, try, measure, and revise.</p>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/electrical-routing/auto-design-loop.png\"\n    alt=\"The Auto-Design loop: human specifies goal, agent edits routing script, layout tool builds, DRC returns pass/fail, journal records each attempt\"\n  />\n  <figcaption>\n    The Auto-Design loop. A human writes the goal once. 
The agent edits the routing script,\n    PhotonForge builds the layout, the DRC returns pass or fail counts, and a journal remembers\n    every attempt.\n  </figcaption>\n</figure>\n<h3>The baseline was bad on purpose</h3>\n<p>The starting script was the laziest possible: call <code>route_manhattan</code> independently for each of 32 pin-to-pad connections and hope.</p>\n<p>That first attempt produced 192 violations: 30 heater crossings and 162 pairs of routes crossing each other.</p>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/electrical-routing/polygon-before.png\"\n    alt=\"Polygon view of the starting layout, with red hatches marking DRC violations\"\n  />\n  <figcaption>Polygon view with red hatches marking the 192 DRC violations.</figcaption>\n</figure>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/electrical-routing/layout-before.png\"\n    alt=\"PhotonForge layout view of the naive starting routing\"\n  />\n  <figcaption>\n    The starting point in PhotonForge. 
Every net routes selfishly and hits everything else.\n  </figcaption>\n</figure>\n<h3>What the agent did</h3>\n<p>Over 27 iterations the agent:</p>\n<ul>\n<li>Switched from independent Manhattan routing to a grid-based planner so nets could share the chip.</li>\n<li>Taught the planner to treat heaters as obstacles, not just metal layers.</li>\n<li>Swept polygon inflation (2, 5, 8.5, 12 µm) to find the right trace-to-obstacle margin.</li>\n<li>Tried four pin-to-bondpad assignment strategies (split by AMZI, split by terminal, x-sort, nearest).</li>\n<li>Shifted the bondpad row farther from the active area and swept its spacing.</li>\n<li>Diagnosed a last-mile edge case where one contact sat awkwardly close to a heater and fixed it with a single approach-direction constraint.</li>\n</ul>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/electrical-routing/progress-plot.png\"\n    alt=\"Plot of DRC violations and unrouted nets per iteration across 27 attempts\"\n  />\n  <figcaption>\n    Violations and unrouted nets across 27 iterations. 
The four green circles mark the passing\n    configurations.\n  </figcaption>\n</figure>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/electrical-routing/polygon-after.png\"\n    alt=\"Polygon view of the final layout, all 32 nets clean\"\n  />\n  <figcaption>Polygon view, all 32 nets clean, zero DRC violations.</figcaption>\n</figure>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/electrical-routing/layout-after.png\"\n    alt=\"PhotonForge layout view of the final routing, foundry-ready\"\n  />\n  <figcaption>The agent's final layout in PhotonForge — foundry-ready.</figcaption>\n</figure>\n<p>Total wall-clock for all 27 iterations: <strong>2 minutes 25 seconds.</strong> A human engineer laying this out by hand — pairing pins to pads, sketching the bundles, chasing the last DRC violations — typically needs two to three hours.</p>\n<h2>What I learned</h2>\n<p>The shortest version of this story: <strong>the AI did not replace the thinking, it compressed the distance between question and interactive answer.</strong></p>\n<p>The brute-force attempt failed because I was outsourcing the understanding. The learning partner approach worked because I stopped. The lectures still had to be watched. The wavefront still had to be visualized with my own eyes. The pin-reservation bug still had to be diagnosed by me, not narrated by the model.</p>\n<p>But everything between \"I want to explore this idea\" and \"I can see this idea running\" collapsed. Interactive HTML prototype in minutes. PhotonForge bridge in an afternoon. Twelve algorithms ready to compare by the end of a weekend. And finally, an auto-design agent that picks up those algorithms and runs them 27 times in the time it takes to make coffee.</p>\n<p>You can outsource implementation friction, algorithm discovery, and visualization scaffolding. 
You cannot outsource understanding. The surprise is that this style of working actually forces deeper understanding than the old way, because you are constantly building and testing your mental model instead of reading about someone else's.</p>\n<p>Interested in testing these algorithms on your own layout? Reach out to <a href=\"mailto:prash@flexcompute.com\">prash@flexcompute.com</a>.</p>",
      "attachments": [
        {
          "url": "https://engineering.flexcompute.com/articles/electrical-routing-agents.md",
          "mime_type": "text/markdown",
          "title": "Learning Auto-Routing by Building: From Brute Force to an Auto-Design Agent markdown"
        },
        {
          "url": "https://engineering.flexcompute.com/images/og/electrical-routing.png",
          "mime_type": "image/png",
          "title": "Learning Auto-Routing by Building: From Brute Force to an Auto-Design Agent social image"
        }
      ],
      "_flexcompute": {
        "kind": "Essay",
        "tags": [
          "ai-engineering",
          "agents",
          "photonics"
        ],
        "markdown_url": "https://engineering.flexcompute.com/articles/electrical-routing-agents.md"
      }
    },
    {
      "id": "https://engineering.flexcompute.com/articles/mode-solver-memory-calibration/",
      "url": "https://engineering.flexcompute.com/articles/mode-solver-memory-calibration/",
      "title": "Predicting Peak Memory for an Electromagnetic Mode Solver",
      "summary": "How we replaced a heuristic memory estimate with a calibrated model for Tidy3D mode solver workloads, eliminating under-predictions across the calibration set.",
      "image": "https://engineering.flexcompute.com/images/og/mode-solver-memory-calibration.png",
      "banner_image": "https://engineering.flexcompute.com/images/og/mode-solver-memory-calibration.png",
      "date_published": "2026-04-15T00:00:00.000Z",
      "date_modified": "2026-04-15T00:00:00.000Z",
      "authors": [
        {
          "name": "Momchil Minkov",
          "path": "/authors/momchil-minkov/",
          "url": "https://engineering.flexcompute.com/authors/momchil-minkov/"
        }
      ],
      "tags": [
        "Photonics",
        "Tidy3D",
        "Verification"
      ],
      "content_html": "<p>import MemoryScatterFigure from '../../components/MemoryScatterFigure.astro';</p>\n<p>Before launching a mode solve, the platform has to choose a machine and a parallelism strategy without yet knowing the solve's true peak memory. That estimate depends less on the sparse matrix size than on what happens around it: fill-in during factorization, whether the problem is real or complex, how much state forked workers can share, and how MPI overhead grows with rank count.</p>\n<p>Our original estimate was a heuristic, and it missed in both directions. Some jobs OOMed on the first worker and had to be retried on a larger one; others landed on larger machines than they needed from the start. We replaced it with a calibrated model built from measured runs. On the calibration dataset, it removed under-predictions and pulled small-job estimates closer to reality.</p>\n<div class=\"article-overview\">\n  <p class=\"article-overview__eyebrow\">At a glance</p>\n  <div class=\"article-overview__grid\">\n    <section>\n      <h3>Goal</h3>\n      <p>Predict peak mode-solver memory before launch well enough to choose machines safely.</p>\n    </section>\n    <section>\n      <h3>Method</h3>\n      <p>\n        Fit an affine model, detect matrix regime from metadata, and model copy-on-write and MPI\n        overhead explicitly.\n      </p>\n    </section>\n    <section>\n      <h3>Outcome</h3>\n      <p>Zero under-predictions on the calibration set, with less excess padding on small jobs.</p>\n    </section>\n  </div>\n</div>\n<h2>Why the Old Estimate Broke</h2>\n<p>A mode solver finds the guided electromagnetic modes of a waveguide by assembling a sparse eigenvalue problem and extracting a small number of eigenpairs with shift-invert iteration. 
The expensive step is LU factorization of the shifted matrix <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>A</mi><mo>−</mo><mi>σ</mi><mi>I</mi></mrow><annotation encoding=\"application/x-tex\">A - \\sigma I</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\">A</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">σ</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span></span></span></span>. Most of the peak memory comes from fill-in during that factorization, not from the original sparse matrix itself.</p>\n<p>The scheduler has to estimate that peak before the solve starts, because it needs to choose both a machine and a parallelism plan. Two execution modes make that harder:</p>\n<ul>\n<li>Frequency parallel: a Python <code>multiprocessing.Pool</code> of <code>N</code> workers, each solving one frequency independently. Total memory is not <code>N</code> times a single solve because forked workers share the parent process image via copy-on-write.</li>\n<li>MPI parallel: a distributed direct solver that uses <code>K</code> MPI ranks for the LU factorization. Overhead scales sublinearly with rank count.</li>\n</ul>\n<p>The old estimate knew about these backends in a coarse way, but it missed several effects that turned out to matter. It did not distinguish lossless from lossy materials, or PML from non-PML boundaries, even though either can double or quadruple the effective matrix size. 
It also ignored mode count, sequential frequency accumulation, and copy-on-write sharing. In practice, it was optimistic in some regions and overly padded in others.</p>\n<p id=\"legacy-scatter\"><strong>Figure (memory scatter plot): Legacy model.</strong> On the calibration dataset, many runs sit above the diagonal, meaning the system would allocate less memory than the solver actually needed.</p>\n<div class=\"article-note article-note--warning\">\n  <p class=\"article-note__label\">Why the baseline failed</p>\n  <p>\n    <strong>Many runs sit above the diagonal.</strong> Each such point is a job the scheduler would\n    have started on too little memory, forcing a retry on a larger worker.\n  </p>\n</div>\n<h2>What Drives Memory in a Sparse Eigensolver</h2>\n<p>The shift-invert eigensolver builds a sparse matrix and factors it. Memory is dominated by fill-in: the additional non-zero entries created during LU factorization. For a 2D modal cross-section with <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>N</mi></mrow><annotation encoding=\"application/x-tex\">N</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span></span> grid points, the matrix structure depends on the physics:</p>\n<table>\n<thead>\n<tr>\n<th>Regime</th>\n<th>Typical trigger</th>\n<th>Effective matrix form</th>\n<th>Relative weight</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Scalar real</td>\n<td>Lossless materials without PML</td>\n<td><code>2N x 2N</code> real</td>\n<td>Baseline</td>\n</tr>\n<tr>\n<td>Scalar complex</td>\n<td>Lossy materials or PML boundaries</td>\n<td><code>2N x 2N</code> complex, often expanded to <code>4N x 4N</code> real</td>\n<td>Roughly <code>4x</code> the scalar-real 
non-zeros</td>\n</tr>\n<tr>\n<td>Tensorial</td>\n<td>Fully anisotropic media or off-axis propagation</td>\n<td><code>4N x 4N</code> complex</td>\n<td>About <code>5x</code> the scalar-real memory at the same grid size</td>\n</tr>\n</tbody>\n</table>\n<p><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>N</mi></mrow><annotation encoding=\"application/x-tex\">N</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span></span> can range from a few thousand to nearly a million, and the matrix dimension determines fill-in volume. A flat per-point heuristic will be right for one regime and wrong for the others.</p>\n<h2>Predicting Matrix Structure From Metadata</h2>\n<p>Choosing the right memory model before launch means inferring two things from the simulation definition alone: whether the problem is tensorial, and whether the matrix will be complex. 
We have to get both right before assembling anything.</p>\n<div class=\"article-key-grid article-key-grid--duo\">\n  <section class=\"article-key-card\">\n    <p class=\"article-key-card__eyebrow\">Tensorial detection</p>\n    <h3>Angle is not enough</h3>\n    <p>\n      Treat the problem as tensorial for fully anisotropic media, or for off-axis propagation when\n      <code>angle_rotation</code> is not active.\n    </p>\n    <p>\n      But <code>angle_rotation</code> can rotate the geometry back into an effectively scalar case,\n      so angle alone can over-predict.\n    </p>\n  </section>\n  <section class=\"article-key-card\">\n    <p class=\"article-key-card__eyebrow\">Complex detection</p>\n    <h3>PML counts too</h3>\n    <p>\n      Treat the matrix as complex if any material has imaginary permittivity, or if any boundary\n      uses PML.\n    </p>\n    <p>\n      PML introduces complex coordinate stretching even when every material is real, so missing it\n      under-predicts.\n    </p>\n  </section>\n</div>\n<p>This logic has to err on the side of safety. A conservative call wastes some capacity; an optimistic one can trigger an OOM and a retry. 
We checked the metadata-time prediction against the matrix structure the solver actually built at runtime across the full test suite, and treated any under-prediction as a bug.</p>\n<h2>The Calibrated Model</h2>\n<h3>Calibration Harness</h3>\n<p>We built a harness to run hundreds of controlled solves while sweeping:</p>\n<div class=\"article-key-grid article-stat-grid\">\n  <section class=\"article-key-card\">\n    <p class=\"article-stat-grid__value\">7k to 924k</p>\n    <p class=\"article-stat-grid__label\">Grid points</p>\n  </section>\n  <section class=\"article-key-card\">\n    <p class=\"article-stat-grid__value\">1 to 10</p>\n    <p class=\"article-stat-grid__label\">Modes</p>\n  </section>\n  <section class=\"article-key-card\">\n    <p class=\"article-stat-grid__value\">1 to 5</p>\n    <p class=\"article-stat-grid__label\">Frequencies</p>\n  </section>\n  <section class=\"article-key-card\">\n    <p class=\"article-stat-grid__value\">1 to 5</p>\n    <p class=\"article-stat-grid__label\">Pool workers</p>\n  </section>\n  <section class=\"article-key-card\">\n    <p class=\"article-stat-grid__value\">\n      <code>scipy</code> and <code>MUMPS/PETSc</code>\n    </p>\n    <p class=\"article-stat-grid__label\">Solver backends</p>\n  </section>\n  <section class=\"article-key-card\">\n    <p class=\"article-stat-grid__value\">Lossless, lossy, PML, tensorial</p>\n    <p class=\"article-stat-grid__label\">Material regimes</p>\n  </section>\n</div>\n<p>Each experiment runs in a fresh subprocess so we get a clean resident set size (RSS) measurement for that solve, rather than allocator leftovers from earlier work. 
We used a separate MPI scaling dataset for larger problems, spanning 366k to 8.9M grid points and 1 to 28 MPI ranks.</p>\n<h3>Model Structure</h3>\n<p>The final estimator has four parts:</p>\n<div class=\"article-process\">\n  <section class=\"article-process__step\">\n    <p class=\"article-process__index\">01</p>\n    <div>\n      <h3>Fit a base affine model</h3>\n      <p>\n        Start from scalar real cases and fit intercept, per-point slope, and mode-count slope\n        separately for each solver backend.\n      </p>\n    </div>\n  </section>\n  <section class=\"article-process__step\">\n    <p class=\"article-process__index\">02</p>\n    <div>\n      <h3>Scale only the size-dependent terms</h3>\n      <p>\n        Apply the regime multiplier to the slopes, not the intercept, so small jobs do not inherit\n        exaggerated startup overhead.\n      </p>\n    </div>\n  </section>\n  <section class=\"article-process__step\">\n    <p class=\"article-process__index\">03</p>\n    <div>\n      <h3>Account for pool copy-on-write behavior</h3>\n      <p>\n        Model the first worker separately, then add only the incremental per-worker overhead that\n        survives copy-on-write sharing.\n      </p>\n    </div>\n  </section>\n  <section class=\"article-process__step\">\n    <p class=\"article-process__index\">04</p>\n    <div>\n      <h3>Add MPI overhead as a power law</h3>\n      <p>\n        For distributed solves, model extra rank cost separately and use different coefficients for\n        real and complex matrices.\n      </p>\n    </div>\n  </section>\n</div>\n<p>The base model is affine, fitted on scalar real cases:</p>\n<p class=\"article-mini-heading\">Base affine fit</p>\n<pre><code class=\"language-text\">single_solve_gb = intercept + pts_coeff * (pts / 1000) + modes_coeff * num_modes\n</code></pre>\n<p>We fit separate coefficients for each solver backend. 
The intercept, roughly 1 GB, captures Python and library startup overhead.</p>\n<p>Tensorial and complex problems need a regime scale factor on the memory estimate, but it should apply only to the size-dependent terms. The intercept is startup overhead; it does not change with matrix structure.</p>\n<p>Scaling the whole estimate inflates predictions by an order of magnitude at small sizes, where the intercept dominates, without buying anything at large sizes, where the slope dominates. Scaling only the growing terms kept small solves tighter while preserving headroom where it mattered.</p>\n<p class=\"article-mini-heading\">Regime-aware scaling</p>\n<pre><code class=\"language-text\">scaled_solve = intercept + (pts_coeff * pts / 1000 + modes_coeff * num_modes) * regime_scale\n</code></pre>\n<p>For frequency-parallel execution with a pool of <code>N</code> workers, total memory accounts for copy-on-write sharing:</p>\n<p class=\"article-mini-heading\">Frequency-parallel pool model</p>\n<pre><code class=\"language-text\">first_worker = scaled_solve + accumulation * (freqs_per_worker - 1)\nincremental  = min(per_worker_overhead, first_worker)\ntotal        = first_worker + (N - 1) * incremental + safety_buffer\n</code></pre>\n<p>The <code>accumulation</code> term captures memory growth when one worker handles multiple frequencies sequentially.</p>\n<p>For MPI-parallel execution with <code>K</code> ranks, where frequencies are solved sequentially, the overhead follows a power law:</p>\n<p class=\"article-mini-heading\">MPI overhead model</p>\n<pre><code class=\"language-text\">mpi_overhead = (alpha * pts_millions + beta) * (K - 1)^gamma\ntotal        = scaled_solve + mpi_overhead + accumulation * (nf - 1) + safety_buffer\n</code></pre>\n<p>There are separate <code>(alpha, beta, gamma)</code> coefficients for real versus complex matrices because the distributed solver expands complex matrices to <span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>4</mn><mi>N</mi><mo>×</mo><mn>4</mn><mi>N</mi></mrow><annotation encoding=\"application/x-tex\">4N \\times 4N</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord\">4</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord\">4</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span></span> real, quadrupling fill-in.</p>\n<h3>Allocator Retention in Forked Workers</h3>\n<p>When Python forks pool workers, the parent's memory is shared copy-on-write, so the cost of adding a worker is much smaller than duplicating a full solve. But the calibration data exposed a second effect: glibc malloc arena accumulation was inflating RSS across sequential solves handled by the same worker. Each frequency solve allocates and frees large sparse arrays, yet the allocator often keeps that memory mapped instead of returning it to the OS. After a few solves, the worker can look materially larger than a fresh process doing the same job. That polluted the calibration data, especially when one worker processed multiple frequencies, and led to scale factors that were too pessimistic.</p>\n<p>Setting <code>maxtasksperchild=1</code> in the multiprocessing pool fixed it: each worker gets recycled after a single task. Re-running calibration produced cleaner data and tighter coefficients. 
Single-frequency or single-worker cases skip the pool entirely and run in-process.</p>\n<div class=\"article-note\">\n  <p class=\"article-note__label\">Operational change</p>\n  <p>\n    <strong><code>maxtasksperchild=1</code> became part of the runtime policy.</strong> Recycle each worker\n    after one task so allocator arenas do not accumulate across sequential frequencies.\n  </p>\n</div>\n<h3>Indexing Limits in Sparse Factorization</h3>\n<p>Calibration also surfaced a failure that looked like a memory problem but wasn't. At large grid counts in the tensorial regime, the solver crashed with a cryptic error even though the machine still had plenty of RAM left.</p>\n<p>The cause was a 32-bit integer overflow in the sparse factorization's internal indexing. For a tensorial <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>4</mn><mi>N</mi><mo>×</mo><mn>4</mn><mi>N</mi></mrow><annotation encoding=\"application/x-tex\">4N \\times 4N</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord\">4</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord\">4</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span></span> matrix at large <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>N</mi></mrow><annotation encoding=\"application/x-tex\">N</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" 
style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span></span>, fill-in scales roughly as <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>O</mi><mo stretchy=\"false\">(</mo><msup><mi>n</mi><mrow><mn>3</mn><mi mathvariant=\"normal\">/</mi><mn>2</mn></mrow></msup><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">O(n^{3/2})</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.138em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">O</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">n</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.888em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">3/2</span></span></span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span> and can exceed the 2.15 billion non-zero limit of 32-bit indices. 
More RAM doesn't help.</p>\n<p>We now validate the grid point count before submission and return a clear error, rather than letting it crash deep inside the factorization.</p>\n<div class=\"article-note article-note--warning\">\n  <p class=\"article-note__label\">Hard limit</p>\n  <p>\n    <strong>This is an indexing ceiling, not a RAM ceiling.</strong> When a solver fails at a\n    suspiciously round problem size, check the index types before adding more RAM.\n  </p>\n</div>\n<p><MemoryScatterFigure\nid=\"calibrated-scatter\"\nvariant=\"calibrated\"\ncaptionTitle=\"Calibrated model.\"\ncaptionBody=\"After accounting for regime scaling, allocator retention, and indexing limits, every point in the calibration set stays on or below the diagonal.\"\n/></p>\n<div class=\"article-note article-note--success\">\n  <p class=\"article-note__label\">Calibration result</p>\n  <p>\n    <strong>Every point sits below the diagonal.</strong> On this dataset, the estimator is\n    conservative everywhere.\n  </p>\n</div>\n<h2>What We Learned</h2>\n<p>The calibrated model removed the cases where the first attempt would have OOMed. A few lessons from this work carry beyond mode solvers:</p>\n<ol class=\"article-takeaways\">\n  <li>\n    <strong>Separate fixed overhead from growth terms.</strong> If your model includes a constant\n    startup cost, scaling the whole expression inflates predictions at small sizes without helping\n    at large sizes. Scale only the terms that actually grow with the input.\n  </li>\n  <li>\n    <strong>Measure in subprocesses.</strong> Long-lived workers accumulate allocator artifacts. If\n    you want clean peak RSS numbers, fork a child, do the work, and measure before it exits. 
Watch\n    for allocator retention and use <code>maxtasksperchild=1</code> where it matters.\n  </li>\n  <li>\n    <strong>Validate metadata-time predictions against runtime observations.</strong> It is easy to\n    write a detection rule that works for the common case but misses edge cases like angle plus\n    rotation or PML without lossy materials. Asserting that the frontend prediction never\n    under-predicts the backend's actual matrix structure catches these before production.\n  </li>\n  <li>\n    <strong>Not every ceiling is a memory ceiling.</strong> Integer overflow in sparse indexing,\n    32-bit pointer limits, or process-count limits can all masquerade as OOM. When a solver fails at\n    a suspiciously round problem size, check the index types before adding more RAM.\n  </li>\n</ol>\n<p>Get the memory estimate wrong, and you pay for it in retries, latency, or wasted RAM.</p>",
      "attachments": [
        {
          "url": "https://engineering.flexcompute.com/articles/mode-solver-memory-calibration.md",
          "mime_type": "text/markdown",
          "title": "Predicting Peak Memory for an Electromagnetic Mode Solver markdown"
        },
        {
          "url": "https://engineering.flexcompute.com/images/og/mode-solver-memory-calibration.png",
          "mime_type": "image/png",
          "title": "Predicting Peak Memory for an Electromagnetic Mode Solver social image"
        }
      ],
      "_flexcompute": {
        "kind": "Case Study",
        "tags": [
          "photonics",
          "tidy3d",
          "verification"
        ],
        "markdown_url": "https://engineering.flexcompute.com/articles/mode-solver-memory-calibration.md"
      }
    },
    {
      "id": "https://engineering.flexcompute.com/articles/autoresearch-photonic-design/",
      "url": "https://engineering.flexcompute.com/articles/autoresearch-photonic-design/",
      "title": "Can AI Agents Autonomously Design Components on Photonic Chips?",
      "summary": "We gave AI agents a photonic simulator, a DRC engine, and four design challenges. They autonomously designed waveguide bends, crossings, splitters, and demultiplexers — some reaching near-perfect performance.",
      "image": "https://engineering.flexcompute.com/images/og/autoresearch-photonic-design.png",
      "banner_image": "https://engineering.flexcompute.com/images/og/autoresearch-photonic-design.png",
      "date_published": "2026-04-13T00:00:00.000Z",
      "date_modified": "2026-04-13T00:00:00.000Z",
      "authors": [
        {
          "name": "Tyler Hughes",
          "path": "/authors/tyler-hughes/",
          "url": "https://engineering.flexcompute.com/authors/tyler-hughes/"
        }
      ],
      "tags": [
        "Photonics",
        "Inverse Design",
        "Optimization",
        "AI Agents"
      ],
      "content_html": "<p>import DesignLoopFigure from '../../components/DesignLoopFigure.astro';</p>\n<p>Inspired by the recent hype around <a href=\"https://github.com/karpathy/autoresearch\">autoresearch</a> loops with AI agents, I decided to explore a simple question:</p>\n<blockquote>\n<p>If you give agents access to a well-defined photonic component design problem, a local electromagnetic simulator, and a fabrication constraint checker, could they reliably design photonic devices on their own, just by iterating on geometries?</p>\n</blockquote>\n<p>To my surprise, the answer was a resounding yes! The agents were able to create a handful of common photonic components, with designs that met our performance criteria and were simple and intuitive. There were some caveats and surprising results though, which we'll get into later. For now, let's dive into the problem setup and show what the agents came up with.</p>\n<h2>Background</h2>\n<p>Photonic integrated circuits route light through structures etched on a silicon chip, much like electronic circuits route current through wires. Light can carry far more information than electronic signals and with very low power loss. As a result, photonic technologies are playing an increasingly important role in improving performance of computing platforms for AI and other applications. These photonic devices are composed of many components, such as waveguides, splitters, filters, etc., which help guide the light around the chip. Engineering these components is hard because, unlike with electrons, the wave nature of light needs to be accounted for in the design. 
Even small geometry changes can scatter light, create unwanted reflections, or couple power into the wrong mode.</p>\n<figure class=\"article-figure article-figure--medium\">\n  <img src=\"https://engineering.flexcompute.com/images/autoresearch/pic_chip.png\" alt=\"Photonic integrated circuit\" />\n</figure>\n<p>A standard approach is to use a simulator for Maxwell's equations to model and optimize the device. One such algorithm is <strong>FDFD</strong> (finite-difference frequency-domain), which discretizes Maxwell's equations on a grid and solves for the steady-state field pattern at a given frequency. You define the geometry and materials, pick a wavelength, launch light into the simulation, and measure what comes out the other end. The figure of merit <strong>(FoM)</strong> is typically a <strong>mode overlap</strong>, which measures how much of the output light matches the desired profile on the other side of the device.</p>\n<p>One can use simulation combined with optimization techniques like gradient descent or parameter sweeps to come up with a candidate device. But a device that looks great in simulation may be impossible to fabricate since these structures are so small (typically features are measured on the tens of nanometers scale). Foundries impose <strong>design rule checks (DRC)</strong>: minimum feature width, minimum gap between features, minimum area, no tiny holes. A design that scores well in simulation but violates DRC is useless. 
In our case, we chose a minimum feature size of 300 nm, which is more relaxed than typical foundry PDKs (these often enforce 150–200 nm) but strict enough to impose meaningful constraints.</p>\n<blockquote>\n<p>Therefore, the real problem for any photonic device designer is: find a geometry that maximizes the photonic objective while passing foundry design rule checks!</p>\n</blockquote>\n<h2>Problem Statement</h2>\n<p>We wanted to explore whether we could automate this design process, with a simulator, a DRC engine, and a few well-defined design problems.</p>\n<p>For a given design problem, each agent got the same interface:</p>\n<ol>\n<li>They can <strong>submit designs as a list of geometric objects</strong>. These can be anything (polygons, pixelated grids, etc.).</li>\n<li>We compute the scalar <strong>figure of merit</strong> for this submission using a fast, local electromagnetic solver.</li>\n<li>We check whether this geometry passes a basic <strong>design rule check</strong> (DRC).</li>\n<li>We make sure all geometries provided are <strong>within a \"design region\"</strong>.</li>\n</ol>\n<p>Given the figure of merit and acceptance criteria, we evaluate the submission and instruct the agent to continue iterating until a suitable design is found. The challenge is defined in a repository that makes it reusable: a challenge wrapper, evaluator, autoresearch harness, and leaderboard. The whole setup was designed so that any agent can enter this directory, read the prompt, and begin submitting solutions. As the agents work, they create visual artifacts: geometry visualizations and electromagnetic field profiles. The agents can see each other's previous submissions; in fact, they often used them!</p>\n<DesignLoopFigure />\n<p>Let's walk through what happened. We ran two agents, Claude Opus 4.6 and OpenAI GPT 5.4, on four challenges of increasing difficulty, from a simple waveguide bend up to a three-channel wavelength demultiplexer. 
Both agents could see each other's submissions.</p>\n<h2>Experiment 1: 90-Degree Bend</h2>\n<figure class=\"article-figure article-figure--compact\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/autoresearch/sketch_bend.png\"\n    alt=\"Bend challenge setup: horizontal input waveguide, 3×3 μm design region, vertical output waveguide\"\n  />\n</figure>\n<p>We often need to steer light around on photonic circuits, so having the ability to bend light on chip is a critical functionality. However, if you make such a bend too sharp, the light will tend to leak out.</p>\n<p><strong>Problem</strong>: Route light from a horizontal waveguide to a vertical waveguide through a 90-degree turn in a <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>3.0</mn><mo>×</mo><mn>3.0</mn><mtext> </mtext><mi>μ</mi><mtext>m</mtext></mrow><annotation encoding=\"application/x-tex\">3.0 \\times 3.0\\,\\mu\\text{m}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">3.0</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8389em;vertical-align:-0.1944em;\"></span><span class=\"mord\">3.0</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\">μ</span><span class=\"mord text\"><span class=\"mord\">m</span></span></span></span></span> design region.</p>\n<p><strong>FoM</strong>: What fraction of light makes it to the vertical output waveguide? 
Perfect score = 1.0.</p>\n<p><strong>Starting point</strong>: A naive, circular arc bend with matching sections and a corner pad, FoM = 0.904.</p>\n<h3>Design Process</h3>\n<p>The agents first screened several different candidate geometries. Claude tested 14 diverse geometries (Euler bends, corner mirrors, multi-mode interference (MMI) regions, chamfered L-bends) and noticed that simply widening the arc from 0.50 to 0.70 μm beat everything else. GPT arrived at the same conclusion through a more incremental path, starting by tuning the size of rectangular pads at the bend junctions and progressively stripping away decorative geometry until only a simple bend with a variable width remained.</p>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/autoresearch/progression_gif_waveguide_bend.gif\"\n    alt=\"Animated progression of bend designs submitted by the agents\"\n  />\n  <figcaption>\n    <strong>Bend design progression.</strong> Both agents converge from diverse initial geometries\n    toward a simple variable-width arc.\n  </figcaption>\n</figure>\n<p>With the basic strategy in place, both agents naturally progressed to a second stage: fine-tuning the geometry through parameter sweeps. Both agents independently swept the width of the bend and found that 0.75–0.80 μm was the sweet spot for a uniform-width arc with a corner pad. But things got interesting as they tried pushing further and ran into constraints.</p>\n<p><strong>DRC Violation:</strong> Wider arcs without the pad gave raw FoM up to 0.977, but failed DRC because the gap between the wider arc and the 0.50 μm arm waveguides at the junction dropped below 300 nm. 
Both agents hit this wall and both found the same way around it.</p>\n<p><strong>Solution - Variable-width arcs:</strong> Instead of a uniform-width arc, the agents decided to use one that starts at the arm width (0.50 μm) at the ports and smoothly widens at the bend apex. This solves two problems simultaneously: no DRC gap at the ports (width matches the arms exactly) and reduced radiation at the apex (wider waveguide confines the mode better).</p>\n<p>Claude used a sinusoidal width profile and found the optimal apex width at 0.65 μm. GPT used a power-law profile with a tunable exponent and found a similar optimum at 0.648 μm with exponent 0.79. Both reached FoM ≈ 1.0 — effectively perfect transmission in the 2D single-wavelength simulation.</p>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/autoresearch/blog_width_sweeps.png\"\n    alt=\"Width sweep results showing FoM vs. arc width for both agents\"\n    loading=\"lazy\"\n  />\n  <figcaption>\n    <strong>Width sweep results.</strong> Both agents independently find that a variable-width arc\n    near 0.65 μm apex width maximizes transmission.\n  </figcaption>\n</figure>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/autoresearch/bend_champion_report.png\"\n    alt=\"Winning bend design: geometry and simulated field profile\"\n    loading=\"lazy\"\n  />\n  <figcaption>\n    <strong>Winning bend.</strong> Claude's variable-width arc achieves FoM ≈ 1.0 with a single\n    polygon.\n  </figcaption>\n</figure>\n<h3>Result</h3>\n<p>Both agents achieved <strong>FoM ≈ 1.0</strong> with a single bare polygon. 
The winning design family is a variable-width quarter-circle arc: <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>w</mi><mo stretchy=\"false\">(</mo><mi>θ</mi><mo stretchy=\"false\">)</mo><mo>=</mo><msub><mi>w</mi><mtext>port</mtext></msub><mo>+</mo><mi mathvariant=\"normal\">Δ</mi><mi>w</mi><mo>⋅</mo><mi>f</mi><mo stretchy=\"false\">(</mo><mi>θ</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">w(\\theta) = w_\\text{port} + \\Delta w \\cdot f(\\theta)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8694em;vertical-align:-0.2861em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">port</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord\">Δ</span><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"mclose\">)</span></span></span></span> where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>f</mi></mrow><annotation encoding=\"application/x-tex\">f</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span></span></span></span> peaks at the 45-degree apex and returns to zero at the ports.</p>\n<blockquote>\n<p>And just to reiterate: this was all done 100% autonomously! The agents came up with the candidate geometries, swept parameters, and corrected the DRC violations all without intervention.</p>\n</blockquote>\n<p>This was about as simple as it gets for a design problem, so the result is interesting but not groundbreaking: the agents found a good parameterization through experimentation and then swept parameters to optimize. 
Next we tried some harder problems.</p>\n<h2>Experiment 2: Waveguide Crossing</h2>\n<figure class=\"article-figure article-figure--compact\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/autoresearch/sketch_crossing.png\"\n    alt=\"Crossing challenge setup: two orthogonal waveguides intersecting in a 3×3 μm design region\"\n  />\n</figure>\n<p>Often when designing complex circuits, we need waveguides to cross so we can connect components together. Since this is all happening on a single plane, we need crossing components that allow the light to pass through without any power leaking into the perpendicular section.</p>\n<p><strong>Problem</strong>: Pass light straight through a <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>3.0</mn><mo>×</mo><mn>3.0</mn><mtext> </mtext><mi>μ</mi><mtext>m</mtext></mrow><annotation encoding=\"application/x-tex\">3.0 \\times 3.0\\,\\mu\\text{m}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">3.0</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8389em;vertical-align:-0.1944em;\"></span><span class=\"mord\">3.0</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\">μ</span><span class=\"mord text\"><span class=\"mord\">m</span></span></span></span></span> intersection where two orthogonal waveguides cross, minimizing crosstalk into the perpendicular waveguide. 
The design should have 90-degree rotational symmetry.</p>\n<p><strong>FoM</strong>: <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>0.5</mn><mo stretchy=\"false\">(</mo><msub><mi>T</mi><mi>h</mi></msub><mo>+</mo><msub><mi>T</mi><mi>v</mi></msub><mo stretchy=\"false\">)</mo><mo>−</mo><mn>0.25</mn><mo stretchy=\"false\">(</mo><mi>X</mi><msub><mi>T</mi><mi>h</mi></msub><mo>+</mo><mi>X</mi><msub><mi>T</mi><mi>v</mi></msub><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">0.5(T_h + T_v) - 0.25(XT_h + XT_v)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">0.5</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">h</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span 
style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">v</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">0.25</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">h</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">v</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span>. This rewards through transmission in both directions and adds an explicit penalty for crosstalk. Perfect score = 1.0.</p>\n<p><strong>Starting point</strong>: A simple plus-shaped cross (two perpendicular waveguides), FoM = 0.770.</p>\n<h3>Design Process</h3>\n<p>Claude tested circles, diamonds, ellipses, MMI regions, tapered transitions, and variable-width crosses. Every expansion at the intersection made things worse: circle (0.375), diamond (0.553), ellipse (0.517), square MMI (0.623).</p>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/autoresearch/progression_gif_waveguide_crossing.gif\"\n    alt=\"Animated progression of crossing designs submitted by the agents\"\n    loading=\"lazy\"\n  />\n  <figcaption>\n    <strong>Crossing design progression.</strong> The agents explore a wide range of intersection\n    geometries before converging on a widened cross.\n  </figcaption>\n</figure>\n<p>GPT started by adding a wider central cross and found that this solution worked well off the bat. 
After tuning parameters, it found the optimal uniform width to be right around 0.75 μm (FoM = 0.962).</p>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/autoresearch/blog_crossing_sweep.png\"\n    alt=\"Crossing width sweep showing sharp FoM peak at 0.75 μm\"\n    loading=\"lazy\"\n  />\n  <figcaption>\n    <strong>Crossing width sweep.</strong> The sharp peak at 0.75 μm reveals a self-imaging\n    condition: small geometry changes degrade performance quickly.\n  </figcaption>\n</figure>\n<p>The sweep curve shows that the device is quite sensitive: moving the width to 0.74 or 0.76 μm drops the FoM by 0.003. From looking at the field plots, we can see that this is because of a self-imaging condition in the center intersection region: the width is chosen so that the light in the junction interferes constructively at the output after crossing the perpendicular waveguide. This interference is inherently sensitive to any changes in the geometry, so it's not a great design.</p>\n<p>After some more prodding, GPT then found an improvement: adding a ~0.95 μm square dielectric pad at the center of a 0.74 μm cross (FoM = 0.968). Claude independently converged to the same design family, achieving 0.968 with a Gaussian-flared cross and a 0.96 μm pad. The two agents tied within solver noise.</p>\n<h3>Result</h3>\n<p><strong>FoM = 0.968</strong> (both agents). The optimal crossing is a 0.74 μm wide cross with a square center pad. Unlike the bend, variable-width (polygon) approaches didn't help, because the crossing relies on a self-imaging condition that requires a well-defined multimode section. The center pad is the only refinement that provides a measurable improvement.</p>\n<blockquote>\n<p>The takeaway from this: the crossing problem is almost entirely solved by only three parameters: the sizes of the rectangles in the design region. 
This shows that devices can work well with just a small number of good knobs and disciplined iteration — and the agents figured that out on their own.</p>\n</blockquote>\n<h2>Experiment 3: 1×2 Splitter</h2>\n<figure class=\"article-figure article-figure--compact\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/autoresearch/sketch_splitter.png\"\n    alt=\"Splitter challenge setup: one input waveguide splitting into two outputs separated by 2.0 μm\"\n  />\n</figure>\n<p>It is important to be able to split the power in a waveguide into two or more arms so we can do routing or interfere the light with itself later on. A 1×2 splitter is a very common photonic component that seeks to do a 50-50 split with very low loss.</p>\n<p><strong>Problem</strong>: Split light from one input waveguide equally into two output waveguides separated by 2.0 μm, in a <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>4.0</mn><mo>×</mo><mn>4.5</mn><mtext> </mtext><mi>μ</mi><mtext>m</mtext></mrow><annotation encoding=\"application/x-tex\">4.0 \\times 4.5\\,\\mu\\text{m}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">4.0</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8389em;vertical-align:-0.1944em;\"></span><span class=\"mord\">4.5</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\">μ</span><span class=\"mord text\"><span class=\"mord\">m</span></span></span></span></span> design region.</p>\n<p><strong>FoM</strong>: <span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>2</mn><msqrt><mrow><msub><mi>T</mi><mtext>top</mtext></msub><mo>⋅</mo><msub><mi>T</mi><mtext>bottom</mtext></msub></mrow></msqrt></mrow><annotation encoding=\"application/x-tex\">2\\sqrt{T_\\text{top} \\cdot T_\\text{bottom}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.24em;vertical-align:-0.3564em;\"></span><span class=\"mord\">2</span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8836em;\"><span class=\"svg-align\" style=\"top:-3.2em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">top</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 
mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">bottom</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.8436em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:1.28em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.28em\" viewBox=\"0 0 400000 1296\" preserveAspectRatio=\"xMinYMin slice\"><path d=\"M263,681c0.7,0,18,39.7,52,119\nc34,79.3,68.167,158.7,102.5,238c34.3,79.3,51.8,119.3,52.5,120\nc340,-704.7,510.7,-1060.3,512,-1067\nl0 -0\nc4.7,-7.3,11,-11,19,-11\nH40000v40H1012.3\ns-271.3,567,-271.3,567c-38.7,80.7,-84,175,-136,283c-52,108,-89.167,185.3,-111.5,232\nc-22.3,46.7,-33.8,70.3,-34.5,71c-4.7,4.7,-12.3,7,-23,7s-12,-1,-12,-1\ns-109,-253,-109,-253c-72.7,-168,-109.3,-252,-110,-252c-10.7,8,-22,16.7,-34,26\nc-22,17.3,-33.3,26,-34,26s-26,-26,-26,-26s76,-59,76,-59s76,-60,76,-60z\nM1001 80h400000v40h-400000z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3564em;\"><span></span></span></span></span></span></span></span></span>, rewarding balanced 50/50 splitting via the geometric mean. 
Perfect score = 1.0.</p>\n<p><strong>Starting point</strong>: A big rectangle filling the design region, FoM = 0.507.</p>\n<h3>Design Process</h3>\n<p>This was the hardest challenge of the three (so far):</p>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/autoresearch/progression_gif_splitter.gif\"\n    alt=\"Animated progression of splitter designs submitted by the agents\"\n    loading=\"lazy\"\n  />\n  <figcaption>\n    <strong>Splitter design progression.</strong> The agents iterate through Y-branches, separated\n    arms, and tapered designs.\n  </figcaption>\n</figure>\n<p>Both agents immediately tried Y-branches: simply taper the input section into two curved output arms. Claude's smooth Y-branch with cubic Hermite S-curves got FoM = 0.966, nearly perfect. GPT's initial Y-branches were similarly strong. <strong>But they all failed DRC</strong>. The two arms at the split point have zero gap, violating the 300 nm minimum.</p>\n<p>The agents then struggled to fix the DRC violations; they both pre-separated the arms by 300+ nm at the split point and added tapers to feed both arms. The forced gap scattered light because the single-lobed taper mode didn't match the two-lobed arm mode. Claude's best separated Y reached FoM = 0.812 after sweeping the split point position.</p>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/autoresearch/blog_splitter_sweep.png\"\n    alt=\"Splitter FoM vs. split point position\"\n    loading=\"lazy\"\n  />\n  <figcaption>\n    <strong>Split-point sweep.</strong> Claude's best DRC-clean separated Y reaches FoM = 0.812 at\n    the optimal split position.\n  </figcaption>\n</figure>\n<p>GPT continued iterating on the separated Y family and discovered a parameterization that Claude missed. After the split, it kept the arms wide for a short section before tapering them back to the output width. 
This let the wider arms better capture the splitting mode before narrowing down. GPT also tuned the arm width independently (0.57 μm instead of 0.50 μm), brought the gap size down to the minimum of 300 nm, and added a width-exponent parameter controlling how aggressively the arms taper.</p>\n<p>The result was FoM = 0.906, which was a significant jump past Claude's 0.812 and into territory that Claude couldn't reach with the simpler parameterization.</p>\n<p>Claude also tried:</p>\n<ol>\n<li>MMI splitters: a 2D sweep of 42 width/length combinations found FoM = 0.963 but failed DRC.</li>\n<li>Directional couplers (0.221).</li>\n<li>Pixelated shapes (0.676) using adjoint optimization, though this likely reflects a limited optimization setup rather than a fundamental limitation of adjoint methods.</li>\n</ol>\n<p>But none of these produced a DRC-clean design that matched GPT's result.</p>\n<h3>Result</h3>\n<p><strong>FoM = 0.906</strong> (GPT's hold-and-taper Y-branch) was the best DRC-clean result. Claude's best was 0.812 with a simpler parameterization. The raw (DRC-violating) Y-branch reached 0.966 for both agents. The 300 nm DRC gap constraint at the split junction was a major challenge.</p>\n<h3>Lessons</h3>\n<p>The splitter was fundamentally different from the bend and crossing. Both the bend and crossing were single-input single-output problems where light stays in one continuous structure. The splitter requires light to transition from one waveguide to two separate waveguides, and DRC forces a minimum gap at that transition.</p>\n<p>The gap between DRC-clean and DRC-free performance (0.906 vs. 0.966) is where the real engineering challenge lives. GPT's hold-and-taper design showed that there's still room for better parameterizations: the splitter rewarded more knobs than the bend or crossing did. 
This is the device where the agents' search strategies diverged the most, and where GPT's more incremental, parameter-by-parameter approach paid off.</p>\n<h2>Experiment 4: 3-Channel Wavelength-Division Demultiplexer</h2>\n<figure class=\"article-figure article-figure--compact\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/autoresearch/sketch_demux.png\"\n    alt=\"Demux challenge setup: one input, three wavelength-selective output ports, 10×10 μm design region\"\n  />\n</figure>\n<p>The bend, crossing, and splitter were all single-wavelength, single-mode problems where the agent proposes a shape, gets a score, and iterates. They could be solved by parametric sweeps in low-dimensional design spaces where one or two dominant parameters (bend width, crossing width, split position) determine most of the performance.</p>\n<p>Next, we tried a more challenging problem involving wavelength selectivity, where the conventional geometric approaches failed miserably.</p>\n<p>Many photonic devices end up splitting information into different wavelength channels (wavelength division multiplexing), which allows them to process more information in parallel. Being able to split and recombine these wavelengths is a major challenge. 
In this design challenge, we had a single input containing light with a mix of three different wavelengths (1.45, 1.55, 1.65 μm) and tried to design a single device to direct each of these wavelengths to one of three separate output ports (top, center, bottom) through a <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>10.0</mn><mo>×</mo><mn>10.0</mn><mtext> </mtext><mi>μ</mi><mtext>m</mtext></mrow><annotation encoding=\"application/x-tex\">10.0 \\times 10.0\\,\\mu\\text{m}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">10.0</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8389em;vertical-align:-0.1944em;\"></span><span class=\"mord\">10.0</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\">μ</span><span class=\"mord text\"><span class=\"mord\">m</span></span></span></span></span> design region.</p>\n<p><strong>FoM</strong>: Average normalized transmission into the correct output port across all three channels. 
Perfect score = 1.0 (each wavelength fully routed to its target).</p>\n<h3>Design Process</h3>\n<p>Every parametric approach, whether proposed by the agents or designed by hand, scored below 0.5:</p>\n<ul>\n<li>Simple MMI box: 0.016 (splits power but doesn't route by wavelength)</li>\n<li>3-way Y-branch: 0.24 (splits equally, no selectivity)</li>\n<li>Direct connect (straight + S-bends): 0.315 (good through, poor drops)</li>\n<li>Offset MMI, asymmetric tapers, random structures: all below 0.45</li>\n</ul>\n<p>The approach that (somewhat) worked was to break the design region into a grid of pixels, each with a material density between 0 and 1. Think of this as a greyscale \"material density\" image. Starting from uniform grey (0.5), 60 steps of gradient-based Adam optimization on the pixels reached FoM = 0.935, much like training an ML model. The optimizer found a complex density pattern that creates different interference pathways for each wavelength, routing each to its correct output port.</p>\n<blockquote>\n<p>But this design did not pass DRC. The pixel-based structure contains sub-wavelength features, tiny gaps, and isolated islands that no foundry would accept. It was also not properly binarized.</p>\n</blockquote>\n<figure class=\"article-figure\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/autoresearch/blog_demux_field_matrix.png\"\n    alt=\"WDM demux simulation reports showing field profiles for each wavelength channel\"\n    loading=\"lazy\"\n  />\n  <figcaption>\n    <strong>Demux simulation reports.</strong> The pixel-based density optimization routes each\n    wavelength to its target port, but the resulting structure violates DRC.\n  </figcaption>\n</figure>\n<p>We'll explore this approach in a follow-up blog. 
But the takeaway was that the demultiplexer seemed to be a problem too challenging for the agents to tackle while passing DRC at this stage.</p>\n<h2>Summary</h2>\n<table>\n<thead>\n<tr>\n<th>Challenge</th>\n<th>Baseline</th>\n<th>Geometric Parameters</th>\n<th>Pixel-Based Density</th>\n<th>DRC</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Bend</td>\n<td>0.904</td>\n<td><strong>≈1.0</strong></td>\n<td>N/A</td>\n<td>PASS</td>\n</tr>\n<tr>\n<td>Crossing</td>\n<td>0.770</td>\n<td><strong>0.968</strong></td>\n<td>N/A</td>\n<td>PASS</td>\n</tr>\n<tr>\n<td>Splitter</td>\n<td>0.507</td>\n<td><strong>0.906</strong></td>\n<td>N/A</td>\n<td>PASS</td>\n</tr>\n<tr>\n<td>3-ch WDM demux</td>\n<td>0.016</td>\n<td>0.315</td>\n<td><strong>0.935</strong></td>\n<td>FAIL</td>\n</tr>\n</tbody>\n</table>\n<h2>Takeaways</h2>\n<h3>Both agents converge on the same physics (with a caveat)</h3>\n<p>For the bend and crossing, Claude and GPT arrived at the same optimal design families. The variable-width arc for the bend and the widened cross for the crossing exploit properties of the physics that a photonic designer could understand.</p>\n<p>However, an important caveat: the agents could see each other's previous submissions throughout the challenge, and they used this. I found Claude regularly taking inspiration from GPT's designs and proposing variations on them. So the convergence is not purely independent. The shared leaderboard acted as a forum for them to communicate. The fact that both agents ended up at the same designs is encouraging, but it's weaker evidence than if they had converged in complete isolation. A cleaner experiment would run each agent in a separate sandbox with no visibility into the other's submissions.</p>\n<h3>Parameterization matters far more than search strategy</h3>\n<p>The splitter was the exception. 
GPT's hold-and-taper parameterization (six tunable parameters: split position, gap, arm width, center exponent, hold fraction, width exponent) explored a richer design space than Claude's three-parameter separated Y. The extra degrees of freedom turned out to matter: they let the optimizer find designs that better balance the physics of mode splitting against the DRC gap constraint.</p>\n<h3>DRC is the binding constraint</h3>\n<p>In all three challenges, the best raw designs (ignoring DRC) significantly outperformed the best DRC-clean designs. The fabrication constraint is what limits performance. Both agents spent more iterations working around DRC failures than optimizing the actual photonics. A lot of the failed designs were not bad ideas; they were ideas that did not survive the interaction between wave physics and fabrication constraints. It is therefore important to find a systematic way to incorporate DRC validity into the geometry generation process.</p>\n<h3>Simple designs win (for easy problems)</h3>\n<p>The bend champion is only one polygon. The crossing champion is two rectangles. The splitter champion is three polygons. Every attempt at complex multi-structure designs (corner mirrors, MMI regions, directional couplers, resonant cavities) performed worse than simple, well-tuned geometries. For simple problems, the best solutions were not inventive shapes but rather simple parameterizations.</p>\n<h3>Simple designs fail (for hard problems)</h3>\n<p>On the simpler problems (bend, crossing), hand design with parametric sweeps found near-optimal solutions. But on the harder problem, it was insufficient. Exploring the full space of pixel-based density optimization seemed to be the more promising path forward here. 
To be fair, the size of the design region is one variable we could adjust to increase the design degrees of freedom available to simple parameterizations.</p>\n<h3>Where agentic design works and where it doesn't (yet)</h3>\n<p>The clearest pattern from these experiments: agents excel at low-dimensional parametric search with fast, scalar feedback. Give them a figure of merit, a few geometric knobs, and DRC guardrails, and they will systematically explore the space and do a pretty good job.</p>\n<p>But the demux shows the boundary. As problems get harder, agents need exposure to more advanced parameterizations and to pixel-based optimization techniques.</p>\n<blockquote>\n<p>The interesting frontier for agent-based exploration may be between these two extremes. For example, agents could systematically explore different parameterizations for problems that have perhaps a handful of design degrees of freedom: too hard for a human designer to solve, but not so hard that they require full pixel-based optimization. <strong>Suggestion: use autoresearch loops to automate the sweep and exploration process for photonic device design.</strong></p>\n</blockquote>\n<p>Can agents learn to guide topology optimization and post-process the results into manufacturable designs? That's the next question we want to answer.</p>",
      "attachments": [
        {
          "url": "https://engineering.flexcompute.com/articles/autoresearch-photonic-design.md",
          "mime_type": "text/markdown",
          "title": "Can AI Agents Autonomously Design Components on Photonic Chips? markdown"
        },
        {
          "url": "https://engineering.flexcompute.com/images/og/autoresearch-photonic-design.png",
          "mime_type": "image/png",
          "title": "Can AI Agents Autonomously Design Components on Photonic Chips? social image"
        }
      ],
      "_flexcompute": {
        "kind": "Essay",
        "tags": [
          "photonics",
          "inverse-design",
          "optimization",
          "agents"
        ],
        "markdown_url": "https://engineering.flexcompute.com/articles/autoresearch-photonic-design.md"
      }
    },
    {
      "id": "https://engineering.flexcompute.com/articles/photonic-inverse-design-45-lines/",
      "url": "https://engineering.flexcompute.com/articles/photonic-inverse-design-45-lines/",
      "title": "Designing a Photonic Chip Component with ~45 Lines of Python",
      "summary": "A compact introduction to photonic inverse design with Tidy3D, using a pre-built simulation and a ~45-line optimization loop.",
      "image": "https://engineering.flexcompute.com/images/og/photonic-inverse-design-45-lines.png",
      "banner_image": "https://engineering.flexcompute.com/images/og/photonic-inverse-design-45-lines.png",
      "date_published": "2026-03-05T00:00:00.000Z",
      "date_modified": "2026-03-05T00:00:00.000Z",
      "authors": [
        {
          "name": "Tyler Hughes",
          "path": "/authors/tyler-hughes/",
          "url": "https://engineering.flexcompute.com/authors/tyler-hughes/"
        }
      ],
      "tags": [
        "Photonics",
        "Inverse Design",
        "Optimization",
        "Tidy3D"
      ],
      "content_html": "<p><strong>Photonic chips</strong> guide light through tiny waveguides etched into silicon, much like electrical wires carry current on a circuit board. These chips are increasingly important for high-speed data links, sensing, and quantum computing. Routing light around corners is surprisingly hard: a smooth 90-degree waveguide bend often needs a radius of several microns to keep loss low, and that takes up valuable chip real estate. What if a computer could design a structure that makes the bend in a fraction of the space?</p>\n<p>That's the idea behind <strong>inverse design</strong>: instead of designing a device and checking if it works, you specify <em>what</em> you want and let an algorithm figure out the geometry, pixel by pixel, using the same gradient-based methods that train neural networks.</p>\n<p>The idea isn't new. Structural engineers have used topology optimization to design bridges and aircraft parts since the 1980s, and <a href=\"https://link.springer.com/article/10.1007/s001580050176\">Sigmund's \"99 line topology optimization code\"</a> showed the core algorithm fits in a single MATLAB script. 
This post does the same for photonic inverse design: once the base simulation is given, the core optimization loop fits in <strong>~45 lines of Python</strong>.</p>\n<p>Let's build it.</p>\n<div class=\"article-overview\">\n  <p class=\"article-overview__eyebrow\">At a glance</p>\n  <div class=\"article-overview__grid\">\n    <section>\n      <h3>Goal</h3>\n      <p>Route 1.0 μm light around a 90-degree bend inside a 3x3 μm design region.</p>\n    </section>\n    <section>\n      <h3>Method</h3>\n      <p>Use Tidy3D's adjoint gradients to optimize a pixelized material layout with Adam.</p>\n    </section>\n    <section>\n      <h3>Outcome</h3>\n      <p>\n        In ten iterations, the design routes roughly 89% of the power into the desired output mode.\n      </p>\n    </section>\n  </div>\n</div>\n<h2>The Problem: Bending Light on a Chip</h2>\n<p>Light travels through a <strong>waveguide</strong>, a thin strip of high-refractive-index material (like silicon) surrounded by a lower-index material (like air). The light is confined to the strip by total internal reflection, similar to how fiber optics work.</p>\n<p>We want to route light at wavelength 1.0 μm around a <strong>90-degree corner</strong>. It enters horizontally from the left and must exit vertically downward. Between input and output sits a <strong>design region</strong>, a 3x3 μm square where the optimizer can freely place or remove material. The question is: <em>what pattern maximally routes the light from input to output?</em></p>\n<p>To simulate how light propagates through a given geometry, we use <a href=\"https://www.flexcompute.com/tidy3d/\">Tidy3D</a>, a cloud-based electromagnetic solver. Given a device geometry and material properties, Tidy3D solves Maxwell's equations and tells us where the light goes. 
Crucially, Tidy3D exposes an <a href=\"https://github.com/HIPS/autograd\">autograd</a>-based inverse-design workflow, which lets us compute gradients through the simulation (more on this in Step 3).</p>\n<p>The base simulation (waveguides, light source, output monitor, and absorbing boundary conditions) is pre-built and stored in <code>sim_base.yaml</code>. We load it and focus entirely on the optimization algorithm.</p>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/photonic-inverse-design/simulation-setup.png\"\n    alt=\"The simulation setup. Light enters from the left through a horizontal waveguide and should exit downward through a vertical waveguide. The dashed box is the design region where we will optimize the material layout.\"\n  />\n  <figcaption>\n    <strong>Simulation setup.</strong> Light enters from the left through a horizontal waveguide and\n    should exit downward through a vertical waveguide. The dashed box is the design region where we\n    will optimize the material layout.\n  </figcaption>\n</figure>\n<pre><code class=\"language-python\">import autograd\nimport autograd.numpy as np\nimport tidy3d as td\nimport tidy3d.web as web\nfrom tidy3d.plugins.autograd import make_filter_and_project\n\nsim_base = td.Simulation.from_file(\"sim_base.yaml\")\n</code></pre>\n<h2>Step 1: From Design Variables to Simulation</h2>\n<p>We need a function that maps a set of <strong>design variables</strong> to a complete electromagnetic simulation. 
Each pixel in the design region gets a variable <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>ρ</mi></mrow><annotation encoding=\"application/x-tex\">\\rho</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">ρ</span></span></span></span>, a number between 0 and 1. We then convert <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>ρ</mi></mrow><annotation encoding=\"application/x-tex\">\\rho</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">ρ</span></span></span></span> to a <strong>permittivity</strong> value. Permittivity is the square of the refractive index and controls how light interacts with the material. 
Our material has refractive index <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi><mo>=</mo><mn>2</mn></mrow><annotation encoding=\"application/x-tex\">n = 2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">2</span></span></span></span>, so its permittivity is <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msup><mi>n</mi><mn>2</mn></msup><mo>=</mo><mn>4</mn></mrow><annotation encoding=\"application/x-tex\">n^2 = 4</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8141em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">4</span></span></span></span>. 
Air has permittivity 1.</p>\n<p>But we don't map <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>ρ</mi></mrow><annotation encoding=\"application/x-tex\">\\rho</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">ρ</span></span></span></span> to permittivity directly. Two transformations happen first:</p>\n<h3>Density filter</h3>\n<p>A <strong>convolutional filter</strong> with radius <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>R</mi></mrow><annotation encoding=\"application/x-tex\">R</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.00773em;\">R</span></span></span></span> blurs each pixel's value with its neighbors. This is important because these devices will eventually be fabricated, and real manufacturing processes have a <strong>minimum feature size</strong> they can reliably produce. The filter ensures no feature in our design is smaller than <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>R</mi></mrow><annotation encoding=\"application/x-tex\">R</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.00773em;\">R</span></span></span></span>, acting as a simple proxy for more sophisticated fabrication-aware design checks. 
We use <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>R</mi><mo>=</mo><mn>150</mn><mtext> </mtext><mtext>nm</mtext></mrow><annotation encoding=\"application/x-tex\">R = 150\\,\\text{nm}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.00773em;\">R</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">150</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord text\"><span class=\"mord\">nm</span></span></span></span></span>.</p>\n<h3>Tanh projection</h3>\n<p>After filtering, a <strong>tanh function</strong> pushes the smoothed values toward 0 or 1, controlled by a sharpness parameter <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>β</mi></mrow><annotation encoding=\"application/x-tex\">\\beta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span></span></span></span>. 
At low <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>β</mi></mrow><annotation encoding=\"application/x-tex\">\\beta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span></span></span></span>, the mapping is nearly linear, so the optimizer can explore intermediate values freely. At high <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>β</mi></mrow><annotation encoding=\"application/x-tex\">\\beta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span></span></span></span>, it becomes a hard threshold that forces every pixel to be pure material or pure air.</p>\n<p>The figure below shows the effect of these two transformations applied to random noise. Filtering smooths out fine features; projection pushes values toward binary. When applied during optimization, they guide the optimizer toward clean, fabricable geometries.</p>\n<figure class=\"article-figure article-figure--medium\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/photonic-inverse-design/filter-projection.png\"\n    alt=\"Effect of density filter (columns) and tanh projection (rows) applied to random noise. Without filtering (left), arbitrarily small features remain. The filter with R = 150 nm enforces a minimum feature size (right). Without projection (top), intermediate values persist. 
High beta projection pushes toward binary material and air (bottom).\"\n    loading=\"lazy\"\n  />\n  <figcaption>\n    <strong>Filtering and projection.</strong> Filtering suppresses arbitrarily small features,\n    while increasing beta pushes the design toward a binary material-air pattern. We use both\n    together during optimization.\n  </figcaption>\n</figure>\n<pre><code class=\"language-python\">n_mat = 2.0                    # material refractive index\neps_mat = n_mat ** 2           # permittivity = n^2 = 4.0\ndesign_size = 3.0              # design region side length (um)\npixel_size = 1.0 / 50          # pixel resolution (um)\nradius = 0.150                 # filter radius R (um), sets minimum feature size\nnx = ny = int(design_size / pixel_size)\n\ndesign_region_geo = td.Box(center=(0, 0, 0), size=(design_size, design_size, td.inf))\n\nfilter_project = make_filter_and_project(radius=radius, dl=pixel_size)\n</code></pre>\n<p>Now we construct the function that takes our design parameters <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>ρ</mi></mrow><annotation encoding=\"application/x-tex\">\\rho</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">ρ</span></span></span></span>, applies the filter and projection to get a permittivity map, builds a structure from it, and adds it to the base simulation we loaded from file.</p>\n<pre><code class=\"language-python\">def make_sim(params, beta):\n    \"\"\"Map design variables through filter, projection, and into a simulation.\"\"\"\n    density = filter_project(params, beta=beta)\n    eps_data = 1.0 + (eps_mat - 1.0) * density\n    structure = td.Structure.from_permittivity_array(\n        eps_data=eps_data, geometry=design_region_geo,\n    )\n    return 
sim_base.updated_copy(\n        structures=list(sim_base.structures) + [structure],\n    )\n</code></pre>\n<h2>Step 2: Objective Function</h2>\n<p>We need a single number that tells us how well the device works. At the output waveguide, Tidy3D measures the <strong>mode amplitude</strong> <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>a</mi></mrow><annotation encoding=\"application/x-tex\">a</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">a</span></span></span></span>, a complex number describing how much light couples into the waveguide's guided mode. The power carried by that mode is <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"normal\">∣</mi><mi>a</mi><msup><mi mathvariant=\"normal\">∣</mi><mn>2</mn></msup></mrow><annotation encoding=\"application/x-tex\">|a|^2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.0641em;vertical-align:-0.25em;\"></span><span class=\"mord\">∣</span><span class=\"mord mathnormal\">a</span><span class=\"mord\"><span class=\"mord\">∣</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span></span>, so our figure of merit is simply the output mode power:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" 
display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">O</mi><mi mathvariant=\"normal\">M</mi></mrow><mo>=</mo><mi mathvariant=\"normal\">∣</mi><mi>a</mi><msup><mi mathvariant=\"normal\">∣</mi><mn>2</mn></msup></mrow><annotation encoding=\"application/x-tex\">\\mathrm{FOM} = |a|^2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">FOM</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1141em;vertical-align:-0.25em;\"></span><span class=\"mord\">∣</span><span class=\"mord mathnormal\">a</span><span class=\"mord\"><span class=\"mord\">∣</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8641em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span></span></span>\n<p>A perfect device would have <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mrow><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">O</mi><mi mathvariant=\"normal\">M</mi></mrow><mo>=</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">\\mathrm{FOM} = 1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">FOM</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span 
class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span> (all input power reaching the output). In code: we build a simulation from our design variables, run it on the cloud, extract the mode amplitude, and return the power.</p>\n<pre><code class=\"language-python\">def objective(params, beta):\n    \"\"\"Run electromagnetic simulation and return output mode power.\"\"\"\n    sim = make_sim(params, beta)\n    data = web.run(sim, task_name=\"invdes\", verbose=False)\n    amps = data[\"mode\"].amps.sel(direction=\"-\", mode_index=0).values\n    return np.sum(np.abs(amps) ** 2)\n</code></pre>\n<h2>Step 3: Gradients via the Adjoint Method</h2>\n<p>To optimize, we need the gradient <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>d</mi><mrow><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">O</mi><mi mathvariant=\"normal\">M</mi></mrow><mi mathvariant=\"normal\">/</mi><mi>d</mi><mi>ρ</mi></mrow><annotation encoding=\"application/x-tex\">d\\mathrm{FOM}/d\\rho</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">d</span><span class=\"mord\"><span class=\"mord mathrm\">FOM</span></span><span class=\"mord\">/</span><span class=\"mord mathnormal\">d</span><span class=\"mord mathnormal\">ρ</span></span></span></span> for every pixel: how does tweaking the <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>ρ</mi></mrow><annotation encoding=\"application/x-tex\">\\rho</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" 
style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">ρ</span></span></span></span> value of each pixel affect the output power? The brute-force approach would perturb each pixel one at a time and re-simulate. For our 150x150 grid, that's <strong>22,500 simulations</strong> per optimization step. Completely impractical.</p>\n<p>The <strong>adjoint method</strong> computes the <em>exact same gradient</em> using just <strong>two simulations</strong>, regardless of how many pixels there are:</p>\n<ol>\n<li><strong>Forward simulation</strong>: run the device normally, injecting light at the input and recording the electric field everywhere. This is the simulation we'd run anyway to evaluate the design.</li>\n<li><strong>Adjoint simulation</strong>: inject a special source <em>at the output monitor</em> that encodes the derivative of our objective function. This tells the simulation \"how much does the objective change if the field here changes?\" The resulting adjoint fields propagate backward through the device.</li>\n</ol>\n<p>After both simulations, the gradient at each pixel is simply the <strong>overlap of the forward and adjoint electric fields</strong>:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>F</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>ε</mi><mi>i</mi></msub></mrow></mfrac><mo>∝</mo><mi mathvariant=\"normal\">Re</mi><mo>⁡</mo><mrow><mo fence=\"true\">(</mo><msub><mi>E</mi><mi>i</mi></msub><mo>⋅</mo><msubsup><mi>E</mi><mi>i</mi><mrow><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">d</mi><mi mathvariant=\"normal\">j</mi></mrow></msubsup><mo fence=\"true\">)</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial F}{\\partial \\varepsilon_i} \\propto \\operatorname{Re}\\left(E_i \\cdot 
E_i^{\\mathrm{adj}}\\right)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.2074em;vertical-align:-0.836em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">ε</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">F</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.836em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∝</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:1.8em;vertical-align:-0.65em;\"></span><span class=\"mop\"><span class=\"mord mathrm\">Re</span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size2\">(</span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05764em;\">E</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0576em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05764em;\">E</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.967em;\"><span style=\"top:-2.4231em;margin-left:-0.0576em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">adj</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2769em;\"><span></span></span></span></span></span></span><span 
class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size2\">)</span></span></span></span></span></span></span>\n<p>Two simulations instead of 22,500. Intuitively, the forward field tells you \"how strongly does this pixel interact with the input light?\" and the adjoint field tells you \"how strongly does this pixel influence the output?\" Their product gives the sensitivity of the objective to each pixel.</p>\n<p>This is the same principle behind <a href=\"https://jingnanshi.com/blog/autodiff.html\"><strong>backpropagation</strong></a> in neural networks. The adjoint simulation is the <a href=\"https://jingnanshi.com/blog/autodiff.html\">vector-Jacobian product (VJP)</a> of the forward electromagnetic solve, and both exploit the chain rule to avoid redundant computation (see <a href=\"https://doi.org/10.1021/acsphotonics.0c00327\">Minkov et al., 2020</a> for a detailed treatment connecting adjoint methods and automatic differentiation in photonics).</p>\n<p>Tidy3D implements the adjoint math as the VJP of its electromagnetic solver, so this second simulation happens automatically behind the scenes. When we wrap our objective in <code>autograd.value_and_grad</code>, Tidy3D runs both the forward and adjoint simulations and backpropagates the gradient through the entire computational pipeline (simulation, mode decomposition, filter, projection, and all).</p>\n<pre><code class=\"language-python\">val_and_grad = autograd.value_and_grad(objective)\n# val_and_grad is a function: given (params, beta), it returns (fom, gradient)\n# e.g. fom, grad = val_and_grad(params, beta=10)\n</code></pre>\n<h2>Step 4: Optimize</h2>\n<p>With cheap gradients in hand, we can use any gradient-based optimizer. The <code>autograd</code> library provides <strong>Adam</strong> out of the box. 
Adam is the same optimizer that trains most neural networks: it maintains running averages of the gradient (momentum) and its square (adaptive learning rate), giving more stable convergence than plain gradient ascent.</p>\n<p>Adam's <code>grad</code> function takes <code>(params, iteration)</code>. We use the iteration number to <strong>gradually increase <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>β</mi></mrow><annotation encoding=\"application/x-tex\">\\beta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span></span></span></span></strong>: early on, a low <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>β</mi></mrow><annotation encoding=\"application/x-tex\">\\beta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span></span></span></span> keeps the design continuous, giving the optimizer freedom to explore many possible solutions; as the design matures, we ramp <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>β</mi></mrow><annotation encoding=\"application/x-tex\">\\beta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span></span></span></span> up to push toward a binary (material or air) structure that can actually be 
fabricated. We also negate the gradient so that Adam <em>maximizes</em> our objective instead of minimizing.</p>\n<pre><code class=\"language-python\">from autograd.misc.optimizers import adam\n\nn_steps = 10\nparams0 = 0.5 * np.ones((nx, ny, 1))\nhistory, param_history = [], [np.array(params0)]\n\ndef neg_grad(params, i):\n    \"\"\"Negative gradient with projection schedule (Adam minimizes, we negate to maximize).\"\"\"\n    params = np.clip(params, 0, 1)\n    beta = 5 + 45 * i / max(n_steps - 1, 1)\n    fom, g = val_and_grad(params, beta)\n    history.append(float(fom))\n    param_history.append(np.array(params))\n    print(f\"  step {i:2d} | FOM = {fom:.4f} | beta = {beta:.1f}\")\n    return -g\n</code></pre>\n<h2>Run It</h2>\n<p>Starting from a uniform design (<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>ρ</mi><mo>=</mo><mn>0.5</mn></mrow><annotation encoding=\"application/x-tex\">\\rho = 0.5</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">ρ</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0.5</span></span></span></span> everywhere, halfway between air and material), the optimizer discovers the structure from scratch. 
Each step runs two simulations on the cloud (forward + adjoint), computes the gradient across all 22,500 pixels, and updates the design.</p>\n<pre><code class=\"language-python\">params_opt = np.clip(adam(neg_grad, params0, num_iters=n_steps, step_size=0.3), 0, 1)\n</code></pre>\n<pre><code>  step  0 | FOM = 0.0022 | beta = 5.0\n  step  1 | FOM = 0.0531 | beta = 10.0\n  step  2 | FOM = 0.3975 | beta = 15.0\n  step  3 | FOM = 0.3932 | beta = 20.0\n  step  4 | FOM = 0.6367 | beta = 25.0\n  step  5 | FOM = 0.7139 | beta = 30.0\n  step  6 | FOM = 0.7795 | beta = 35.0\n  step  7 | FOM = 0.8487 | beta = 40.0\n  step  8 | FOM = 0.8861 | beta = 45.0\n  step  9 | FOM = 0.8856 | beta = 50.0\n</code></pre>\n<h2>Results</h2>\n<figure class=\"article-figure article-figure--results\">\n  <img\n    class=\"article-figure__desktop-image\"\n    src=\"https://engineering.flexcompute.com/images/photonic-inverse-design/optimized-result.png\"\n    alt=\"Left: the optimized permittivity pattern. Right: the electromagnetic field intensity showing light bending from horizontal input to vertical output.\"\n    loading=\"lazy\"\n  />\n  <div class=\"article-figure__mobile-stack\">\n    <img\n      src=\"https://engineering.flexcompute.com/images/photonic-inverse-design/optimized-design-panel.png\"\n      alt=\"Optimized permittivity pattern showing the discovered material layout inside the waveguide bend.\"\n      loading=\"lazy\"\n    />\n    <img\n      src=\"https://engineering.flexcompute.com/images/photonic-inverse-design/optimized-field-panel.png\"\n      alt=\"Field intensity plot showing light bending from the horizontal input into the vertical output waveguide.\"\n      loading=\"lazy\"\n    />\n  </div>\n  <figcaption>\n    <strong>Final device and field response.</strong> The optimized permittivity pattern comes\n    first, with dark regions showing the high-index material and the light background showing air.\n    The corresponding field intensity, |E|^2, shows light entering 
from the left and bending\n    downward into the output waveguide.\n  </figcaption>\n</figure>\n<p>The final design looks nothing like what a human engineer would draw. There's no smooth curve, no gradual taper. Instead, the optimizer found a pattern of material and air that manipulates the electromagnetic field through interference to route the light around the corner.</p>\n<h3>Design evolution</h3>\n<p>Watch the design emerge from a uniform gray starting point. Early steps explore broadly (low <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>β</mi></mrow><annotation encoding=\"application/x-tex\">\\beta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span></span></span></span>, soft features); later steps sharpen into a clean binary design (high <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>β</mi></mrow><annotation encoding=\"application/x-tex\">\\beta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span></span></span></span>).</p>\n<figure class=\"article-figure article-figure--compact\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/photonic-inverse-design/evolution.gif\"\n    alt=\"Animated evolution of the design over the optimization run, starting from a uniform gray design and sharpening into the final binary structure.\"\n    loading=\"lazy\"\n  />\n  <figcaption>\n    <strong>Optimization trajectory.</strong> The design starts from a uniform gray initialization,\n    develops 
structure quickly in the early low-beta steps, and then sharpens into a binary pattern\n    as the projection becomes steeper.\n  </figcaption>\n</figure>\n<h3>Convergence</h3>\n<p>The animation shows how the geometry sharpens visually. The convergence trace shows the same story numerically: rapid early gains, then diminishing returns as the design binarizes.</p>\n<figure class=\"article-figure article-figure--compact\">\n  <img\n    src=\"https://engineering.flexcompute.com/images/photonic-inverse-design/convergence.png\"\n    alt=\"The FOM (output mode power) increases rapidly in the first few steps as the optimizer discovers the basic routing structure, then stabilizes as beta increases and the design binarizes.\"\n    loading=\"lazy\"\n  />\n  <figcaption>\n    <strong>Convergence.</strong> The output-mode power rises quickly once the optimizer finds the\n    basic routing pattern, then levels off as the design binarizes.\n  </figcaption>\n</figure>\n<h3>The Complete Pipeline</h3>\n<p>Zooming out, each optimization step does the same five things in order:</p>\n<div class=\"article-process\">\n  <section class=\"article-process__step\">\n    <p class=\"article-process__index\">01</p>\n    <div>\n      <h3>Filter and project</h3>\n      <p>Map the raw design variables into a smooth, increasingly binary material distribution.</p>\n    </div>\n  </section>\n  <section class=\"article-process__step\">\n    <p class=\"article-process__index\">02</p>\n    <div>\n      <h3>Build the simulation</h3>\n      <p>Insert that material layout into the pre-built Tidy3D model of the waveguide bend.</p>\n    </div>\n  </section>\n  <section class=\"article-process__step\">\n    <p class=\"article-process__index\">03</p>\n    <div>\n      <h3>Run the forward solve</h3>\n      <p>Compute the output-mode power, which becomes the figure of merit.</p>\n    </div>\n  </section>\n  <section class=\"article-process__step\">\n    <p class=\"article-process__index\">04</p>\n    <div>\n    
  <h3>Run the adjoint solve</h3>\n      <p>Backpropagate sensitivity information through the electromagnetic simulation.</p>\n    </div>\n  </section>\n  <section class=\"article-process__step\">\n    <p class=\"article-process__index\">05</p>\n    <div>\n      <h3>Update the design</h3>\n      <p>Use Adam to take a gradient step, then repeat with a slightly sharper projection.</p>\n    </div>\n  </section>\n</div>\n<p>Each iteration costs two electromagnetic simulations. The adjoint method is what makes this feasible: perturbing each pixel in turn would need 22,500 simulations per step instead of 2.</p>\n<h2>Going Further</h2>\n<p>This post is a basic but functional introduction to the technique. Real photonic device design calls for more advanced variations. Production systems add:</p>\n<ul>\n<li><strong>3D simulations</strong>: real devices have finite thickness and vertical confinement</li>\n<li><strong>Broadband optimization</strong>: performance across a range of wavelengths, not just one</li>\n<li><strong>Fabrication constraints</strong>: minimum feature sizes, curvature limits, etch profiles</li>\n<li><strong>Multi-objective optimization</strong>: multiple output ports, polarizations, robustness to manufacturing variation</li>\n</ul>\n<p>If you're interested in going deeper, check out these resources:</p>\n<ul>\n<li><a href=\"https://docs.flexcompute.com/projects/tidy3d/en/latest/notebooks/docs/features/autograd.html\">Inverse design examples</a> (wavelength demultiplexers, metalenses, mode converters, and more)</li>\n<li><a href=\"https://www.flexcompute.com/tidy3d/examples/notebooks/Autograd0Quickstart/\">Inverse design quickstart notebook</a> (a more complete worked example using the same autograd workflow)</li>\n<li><a href=\"https://www.flexcompute.com/tidy3d/learning-center/inverse-design/\">Inverse design learning center</a> (course-style introduction to adjoint optimization in Tidy3D)</li>\n</ul>\n<p>Want the exact working files for this post? 
<a href=\"https://engineering.flexcompute.com/downloads/photonic-inverse-design/photonic-inverse-design-companion.zip\">Download the companion bundle</a>. It includes the notebook, the Jupytext script export, <code>sim_base.yaml</code>, and the helper script that rebuilds the base simulation. Re-running the optimization requires a <a href=\"https://www.simulation.cloud/\">Tidy3D account</a>.</p>",
      "attachments": [
        {
          "url": "https://engineering.flexcompute.com/articles/photonic-inverse-design-45-lines.md",
          "mime_type": "text/markdown",
          "title": "Designing a Photonic Chip Component with ~45 Lines of Python markdown"
        },
        {
          "url": "https://engineering.flexcompute.com/images/og/photonic-inverse-design-45-lines.png",
          "mime_type": "image/png",
          "title": "Designing a Photonic Chip Component with ~45 Lines of Python social image"
        }
      ],
      "_flexcompute": {
        "kind": "Tutorial",
        "tags": [
          "photonics",
          "inverse-design",
          "optimization",
          "tidy3d"
        ],
        "markdown_url": "https://engineering.flexcompute.com/articles/photonic-inverse-design-45-lines.md"
      }
    },
    {
      "id": "https://engineering.flexcompute.com/articles/what-should-we-work-on-next/",
      "url": "https://engineering.flexcompute.com/articles/what-should-we-work-on-next/",
      "title": "\"What Should We Work On Next?\"",
      "summary": "The story of building an 80,000-line autodiff library almost entirely through AI agents — and the verification infrastructure that made it possible.",
      "image": "https://engineering.flexcompute.com/images/og/what-should-we-work-on-next.png",
      "banner_image": "https://engineering.flexcompute.com/images/og/what-should-we-work-on-next.png",
      "date_published": "2026-02-26T00:00:00.000Z",
      "date_modified": "2026-02-26T00:00:00.000Z",
      "authors": [
        {
          "name": "Yannick Augenstein",
          "path": "/authors/yannick-augenstein/",
          "url": "https://engineering.flexcompute.com/authors/yannick-augenstein/"
        },
        {
          "name": "Frederik Schubert",
          "path": "/authors/frederik-schubert/",
          "url": "https://engineering.flexcompute.com/authors/frederik-schubert/"
        }
      ],
      "tags": [
        "AI Engineering",
        "Autodiff",
        "Verification"
      ],
      "content_html": "<p>import HarnessWorkflowFigure from '../../components/HarnessWorkflowFigure.astro';\nimport MutationPipelineFigure from '../../components/MutationPipelineFigure.astro';</p>\n<p>I have been building with AI coding agents for over a year.<sup><a href=\"#user-content-fn-1\" id=\"user-content-fnref-1\" data-footnote-ref aria-describedby=\"footnote-label\">1</a></sup> Most of that time was unstructured: I used whatever worked, fixed problems as they came up, and did not pay much attention to the patterns. Then, starting around November 2025, I spent three months building a mostly personal internal autodiff library: graph-based, eagerly traced, NumPy-integrated, and almost entirely agent-written. By the end, it had reached roughly 80,000 lines of Python.</p>\n<p>I wanted a graph-first, NumPy-native library where the graph itself was explicit, inspectable, and available for transformations beyond autodiff. JAX is excellent, but it optimizes for a different set of tradeoffs: staged execution, JIT compilation, and the accelerator stack.</p>\n<p>Over those three months, my role shifted from reviewing every line of agent output to opening many sessions with a single question: \"what should we work on next?\" That shift did not happen only because the agents improved. It happened because the verification infrastructure improved enough that I could trust the floor: a minimum quality bar would hold regardless of what the agent produced.</p>\n<p>This article is about that infrastructure — what it looks like, how it grew, and why it matters more than you might think. But the infrastructure did not build itself. Every check exists because a human saw a failure and decided to formalize the fix. The code is not open source, so the point of this article is not the artifact itself, but the verification patterns and harness design that emerged while building it.</p>\n<h2>Directed development</h2>\n<p>This library did not start with agents. 
I had been sketching it on the side for a while — a design document, a basic tracing engine, some scaffolding. By the time I opened the first AI session, the repo had about a thousand lines of Python and a clear picture of what the library should be. What it did not have was momentum. A side project I picked up between other work, never for long enough to get past the foundation.</p>\n<p>The first session was a comparison: how did the library stack up against autograd and MyGrad?<sup><a href=\"#user-content-fn-2\" id=\"user-content-fnref-2\" data-footnote-ref aria-describedby=\"footnote-label\">2</a></sup> I was using the agent as a reviewer, asking for architectural opinions. When the agent got the target API wrong — proposing explicit tracing contexts when the library was designed for implicit tracing — I corrected it. The agent was a consultant. I was the authority.</p>\n<p>The second session set up CI: ruff with all rules enabled, mypy strict, <a href=\"https://hypothesis.readthedocs.io/en/latest/\">Hypothesis</a> property-based testing, conventional commits, and AGENTS.md.<sup><a href=\"#user-content-fn-3\" id=\"user-content-fnref-3\" data-footnote-ref aria-describedby=\"footnote-label\">3</a></sup> None of this was about agent harnesses; I did not even know the term yet. Engineering discipline and personal curiosity, which turned out to benefit agents enormously.</p>\n<p>What followed was a burst of rapid development. Over a handful of intense sessions, the codebase grew from a small prototype into something much larger: graph optimization passes, workflow orchestration, xarray integration, debugging and visualization tools. But the derivative engine remained the most verification-intensive part, and the one that drove most of the harness. My workflow settled into a ritual:</p>\n<blockquote>\n<p>\"review the current state of the repo, what do you think would be highest leverage to work on next?\"</p>\n</blockquote>\n<p>Let the agent propose priorities. 
Select or redirect. Ask for a detailed implementation plan with user stories, test strategy, and acceptance criteria. Review it. Paste it back with: \"PLEASE IMPLEMENT THIS PLAN.\"<sup><a href=\"#user-content-fn-4\" id=\"user-content-fnref-4\" data-footnote-ref aria-describedby=\"footnote-label\">4</a></sup></p>\n<h3>Where agents drift</h3>\n<p>The workflow was productive but not self-correcting. Pushing back mattered — and I had to push back constantly.</p>\n<p>On VJP coverage, the agent claimed near-complete parity with autograd. I was skeptical; autograd had far more VJPs than that.<sup><a href=\"#user-content-fn-5\" id=\"user-content-fnref-5\" data-footnote-ref aria-describedby=\"footnote-label\">5</a></sup> I was right — the comparison was incomplete. Compatibility shims revealed a similar pattern: the agent kept proposing backward-compatibility layers for a greenfield project with no users.</p>\n<p>The shims deserve their own mention because they are so universal. Old behavior kept via fallbacks and re-exports, new features built on top, instead of clean breaks. Even with explicit instructions, the agents reached for shims.<sup><a href=\"#user-content-fn-6\" id=\"user-content-fnref-6\" data-footnote-ref aria-describedby=\"footnote-label\">6</a></sup> If you are building anything with coding agents, you will hit this.</p>\n<p>Files bloated too. Core modules grew past 1,800 lines. Test files reached similar sizes. A vicious cycle: longer files fill agent context faster, the agent gets worse at navigating the codebase, the code gets worse, the files grow further. 
The 500-line file limit I eventually imposed came directly from that pain.</p>\n<h3>Pushing back</h3>\n<p>One moment captures why human judgment still matters in this workflow.</p>\n<p>Autodiff frameworks need derivative rules for every operation, typically written as either VJPs (reverse-mode pullbacks) or JVPs (forward-mode pushforwards).<sup><a href=\"#user-content-fn-7\" id=\"user-content-fnref-7\" data-footnote-ref aria-describedby=\"footnote-label\">7</a></sup> This library had accumulated reverse-mode rules first, like most frameworks. In the middle of a session about expanding derivative coverage, the agent proposed deriving JVP rules from the existing VJP rules. That sounded reasonable.</p>\n<p>Then I asked the question that changed the architecture: \"Doesn't it scale better to make JVPs the default source of truth and derive VJPs from them where possible, rather than vice versa?\"</p>\n<p>For this codebase, yes. JVPs were the cleaner authoring primitive for much of the covered NumPy surface, and reverse-mode pullbacks could often be synthesized through the transpose machinery. Some operations still needed explicit reverse-mode exceptions for correctness or performance, but the default direction was backwards from what the agent proposed. Roughly twenty commits in a single day shifted the covered NumPy surface to a JVP-first policy.</p>\n<p>Later that evening, I checked the agent's work again: if JVPs were now the default source of truth, why did the rule layout still look overwhelmingly reverse-mode? The file structure told a different story than the runtime claims. The agent had wired the runtime correctly, but too much of the old reverse-mode formula structure was still in place. 
It took another full session — running overnight, largely autonomously — to make the migration real in the authored rule layout, not just the plumbing.</p>\n<p>A well-defined mathematical problem, clear correctness criteria, and a human catching both the architectural direction and the incomplete execution. If I hadn't asked, I probably wouldn't have caught it until the derivative layer was deeply entrenched — and unwinding it would have been painful.</p>\n<h2>The proto-harness</h2>\n<p>The JVP migration went (reasonably) well because the problem was mathematically well-defined. Most problems aren't. Every piece of verification infrastructure that followed was a response to something that went wrong.</p>\n<p>The first tool was <code>pre_pr.sh</code> — a small shell script born from watching the agent skip steps with each PR. It was the \"<a href=\"https://engineering.flexcompute.com/articles/agent-control-loop#unenforced-verification\">unenforced verification</a>\" failure mode from the first article, playing out in real time. The agent would forget to run mypy, or skip the VJP coverage check, or not rebuild docs. The script was my first attempt at \"one command to verify everything\": check for a clean working tree, rebase on main, run the linter, type checker, tests, coverage checks, grad contracts, and docs build, in sequence. 
If any step failed, the whole thing failed.</p>\n<p>The shell script could not keep up, so I replaced it with <code>quality.py</code> — a Python CLI consolidating the scripts into a single framework.</p>\n<p>But the commands to run verification differed across AGENTS.md, CI workflow files, and docs.<sup><a href=\"#user-content-fn-8\" id=\"user-content-fnref-8\" data-footnote-ref aria-describedby=\"footnote-label\">8</a></sup> And nothing was scoped: every check ran against the full codebase regardless of what changed.</p>\n<p>Each tool was a response to pain.</p>\n<h3>Boundaries</h3>\n<p>The architecture split into layers: a backend-agnostic <code>core</code>, a <code>numpy</code> integration layer, and a <code>grad</code> package for autodiff. The intent was clean separation — core should have zero knowledge of NumPy, so adding a CuPy or JAX backend later would be more tractable. It also gave agents clear lanes to work in.</p>\n<p>On the day this layered architecture was declared, the boundary checker caught its first violation within hours. Then another. The architecture was established, violated, fixed, violated again, and fixed again. The original checker only caught cross-layer internal-package imports; it missed bare <code>import numpy</code> entirely, so I had to extend it, again and again.</p>\n<p>The boundary checker ran continuously, but it only caught what it knew to look for. Ten weeks later, I manually scanned what the agent had built in the gradient package and found dozens of files with <code>import numpy as np</code>, well over a thousand <code>np.*</code> callsites, and a package that would not run without numpy installed even though its <code>pyproject.toml</code> declared no numpy dependency. After a big-bang refactor, the agent introduced a backend-agnostic proxy module — but exported the proxy object as <code>np</code>, not <code>xp</code>. Those files then did <code>from ..._backend_runtime import np</code>. 
Syntactically different from <code>import numpy as np</code>. Visually identical.</p>\n<p>Each round taught me to probe deeper. Was there <em>any</em> code in core that imported numpy, even lazily? Any mention of numpy in core, even as a variable name or string literal? There were — hardcoded <code>\"numpy\"</code> string literals, prefix-coupled registration, multiple compatibility wrapper modules. Every question I learned to ask was a check I should have automated earlier.</p>\n<h3>Curating context</h3>\n<p>Boundary enforcement was about rules within sessions. The next problem was continuity between them.</p>\n<p>I tried to scale development with sub-agents. In Claude Code, I built seven specialized agents: dispatch, test-gen, quality, architect, numpy-protocol, docs, debug. An agent team, each specialized on a domain. It did not stick. I burned a lot of tokens, but the output quality was arguably worse than if I had stuck with a single agent and manual steering. I think the problem was context: each agent starts from only a prompt and has to derive all its context from there. Handoff documents either included too little — and the receiving agent made wrong assumptions — or too much — and the agent could not distinguish signal from background.</p>\n<p>What did work was something simpler. Near the end of a long session, I was about to ask the agent to continue with the next phase. Instead, I asked it to write the prompt I should use for that next phase.</p>\n<p>The agent generated a detailed handoff prompt — project state, remaining work, constraints, validation commands — that I would then curate and paste into a fresh session. 
It was a \"relay\" pattern, where the prompt is a compressed representation of what matters: what was just done, what remains, what constraints apply.<sup><a href=\"#user-content-fn-9\" id=\"user-content-fnref-9\" data-footnote-ref aria-describedby=\"footnote-label\">9</a></sup></p>\n<p>Within days, I was running multiple agents in parallel. Each session got its own git worktree.<sup><a href=\"#user-content-fn-10\" id=\"user-content-fnref-10\" data-footnote-ref aria-describedby=\"footnote-label\">10</a></sup> I broke tasks down into work streams and dispatched them: \"Give me a prompt for each of these work streams, and tell me which ones I can kick off in parallel.\" Without <code>quality.py</code> running in each worktree, parallel agents would have been utter chaos. But now, each agent could independently verify its own changes, enabling a workflow that would not have been sustainable otherwise.</p>\n<h2>Properly, this time</h2>\n<p>Still, the quality gate turned red.<sup><a href=\"#user-content-fn-11\" id=\"user-content-fnref-11\" data-footnote-ref aria-describedby=\"footnote-label\">11</a></sup> The <code>jvp_grad_runtime_ratio</code> had drifted above the 10.0x threshold because of measurement noise on higher-order workloads. It was effectively blocking merges unrelated to performance.</p>\n<p>Around the same time, \"harness engineering\" was becoming a more explicit frame for what many teams were converging on. OpenAI had published <a href=\"https://openai.com/index/harness-engineering/\">their article</a> on the topic, and others were describing similar patterns.<sup><a href=\"#user-content-fn-12\" id=\"user-content-fnref-12\" data-footnote-ref aria-describedby=\"footnote-label\">12</a></sup> The general idea is simple: agent reliability comes from the environment, not just the model, and verification infrastructure deserves primary investment. Seeing the pattern named made the next step clear.</p>\n<p>I took a day off — no sessions, no commits. 
Then I decided to do it properly.</p>\n<p>The next session started with a single instruction: review the repo in light of OpenAI's harness engineering article and suggest how it should be restructured. That kicked off a hard cutover. <code>pre_pr.sh</code> and <code>quality.py</code> gave way to a dedicated harness, diff-scoped mutation testing, and JSON output with bounded content for agent-friendly context windows. That was the harness turning point.</p>\n<h3>Loop, mutate, gate</h3>\n<p>The harness is a CLI that wraps the repo's verification into three progressively broader commands:</p>\n<pre><code class=\"language-bash\">uv run python scripts/harness.py loop          # fast scoped, 180s budget\nuv run python scripts/harness.py mutate        # diff-scoped mutation\nuv run python scripts/harness.py gate          # full blocking merge gate\n</code></pre>\n<p>The agent no longer needs to know what verification steps exist or where they live. It runs a command and gets a go or no-go result.</p>\n<HarnessWorkflowFigure />\n<p><code>loop</code> is the tight inner cycle. It figures out what changed, expands through a dependency graph to find affected packages, and runs only the relevant checks — lint, type-checking, tests, quality gates — under a 180-second budget. If it runs out of time, it defers remaining checks rather than failing. In practice, this changed the agent's behavior: instead of running the full test suite after every edit — or worse, running nothing until the end — the agent started running <code>loop</code> after each logical change, catching issues while the context was still fresh.</p>\n<p><code>mutate</code> answers a different question: do the tests actually verify the changed behavior, or do they just execute it?</p>\n<p><code>gate</code> is the full merge requirement. All checks, no scoping, no budget. Everything must pass.</p>\n<p>At the time of writing, the harness runs dozens of checks across those three commands. 
Some of the most useful exist because of specific agent behaviors. The suppression guard blocks new <code># type: ignore</code> and <code># noqa</code> annotations — without it, the agent's first instinct when a type check fails is to suppress the error rather than fix it. The import boundary checker enforces architectural layering with AST-level analysis.</p>\n<p>Every command returns a JSON envelope with bounded content — at most 20 checks, 8 details per check, 240 characters per detail — so the agent's context window is not flooded with log output. A typical response looked like this (simplified):</p>\n<pre><code class=\"language-json\">{\n  \"ok\": false,\n  \"command\": \"harness loop\",\n  \"result\": {\n    \"scope\": { \"expanded_scopes\": [\"core\", \"grad\", \"numpy\", \"...\"] },\n    \"checks\": [\n      {\n        \"id\": \"pytest_scoped\",\n        \"status\": \"fail\",\n        \"details\": [\"FAILED test_tracer.py::test_record - AssertionError\"]\n      }\n    ],\n    \"summary\": { \"total\": 5, \"passed\": 4, \"failed\": 1 }\n  },\n  \"next_actions\": [{ \"command\": \"harness loop\", \"description\": \"Re-run after fix.\" }]\n}\n</code></pre>\n<p>The <code>next_actions</code> field is context-aware: after a successful <code>loop</code>, the harness suggests <code>mutate</code>; after a failure, it suggests the specific post-fix command.</p>\n<p>None of these techniques are new individually. Scope-aware test selection exists in Bazel, Nx, and plenty of CI tools. 
<a href=\"https://en.wikipedia.org/wiki/Mutation_testing\">Mutation testing</a> has been around for decades.<sup><a href=\"#user-content-fn-13\" id=\"user-content-fnref-13\" data-footnote-ref aria-describedby=\"footnote-label\">13</a></sup> The pieces needed to be wired together in a specific way: JSON output bounded for agent context windows, a progression from fast-and-scoped to slow-and-complete, and every check degrading toward strictness when anything is uncertain.<sup><a href=\"#user-content-fn-14\" id=\"user-content-fnref-14\" data-footnote-ref aria-describedby=\"footnote-label\">14</a></sup></p>\n<h3>What to check</h3>\n<p>The core mechanic that makes the harness practical is scope resolution — turning \"what files changed\" into \"what checks to run.\" It starts with <code>git diff origin/main</code> to get changed paths, matches each against a path map that routes file paths to one of nine package scopes, then expands through a dependency graph using BFS. In simplified form, the config looked like this:</p>\n<pre><code class=\"language-toml\">[scope.path_map]\n\"packages/core/\"            = \"core\"\n\"packages/numpy/\"           = \"numpy\"\n\"packages/grad/\"            = \"grad\"\n\"packages/grad_numpy/\"      = \"grad_numpy\"\n\"packages/xarray/\"          = \"xarray\"\n\n[scope.dependencies]\ncore       = []\ngrad       = [\"core\"]\nnumpy      = [\"core\"]\ngrad_numpy = [\"core\", \"grad\", \"numpy\"]\nxarray     = [\"core\", \"numpy\"]\n</code></pre>\n<p>Change a file in <code>packages/core</code> and BFS expands to all dependent packages. Change only <code>packages/xarray</code> and only xarray checks run. Anything unrecognized — unmapped paths, missing merge-base, diff failures — falls back to the full gate.</p>\n<p>This is what enables the 180-second loop budget. Without scoped checks, every change triggers every check — too slow for a tight agent loop. 
With scoping, a change to one leaf package triggers seconds of verification, not minutes.</p>\n<h3>No survivors</h3>\n<p>Coverage measures execution. <a href=\"https://en.wikipedia.org/wiki/Mutation_testing\">Mutation testing</a> measures verification.</p>\n<p>An agent can (and will) write a test like this:</p>\n<pre><code class=\"language-python\">def test_gradient_scaling():\n    result = scale_gradient(x, factor=0.5)\n    assert result is not None  # 100% coverage, 0% verification\n</code></pre>\n<p>That test covers every line of <code>scale_gradient</code>. But flip <code>factor > 0</code> to <code>factor >= 0</code> inside the function, and the test still passes. It is not testing the behavior it claims to cover.</p>\n<MutationPipelineFigure />\n<p>The idea came from a Slack conversation with <a href=\"https://engineering.flexcompute.com/authors/frederik-schubert\">Frederik</a>: mutation testing is expensive on the whole codebase, but what if you scope it to the diff? Mutating only the five to twenty lines you just changed makes the cost manageable. The pipeline: find changed source lines via <code>git diff</code>, filter to lines actually executed by tests via coverage, generate AST-level mutations (comparison flips, boolean inversions, arithmetic swaps, constant flips), select at most twelve mutants with breadth across files, and run the relevant tests against each. If the tests still pass after a mutation, the mutant survived — the tests do not verify the behavior they claim to cover.</p>\n<p>The policy is strict: 100% kill rate on changed lines. If any mutant survives, the PR is blocked. And <code>require_changed_tests = true</code> adds another constraint: if you change runtime code, you must also change or add tests. No silent runtime changes. The mutation check then verifies that those tests actually catch real behavioral differences, not just execute code paths.</p>\n<p>Pure refactors and equivalent mutants occasionally cause friction. 
The tradeoff is worth it when the entity writing your tests is an AI that optimizes for making them pass rather than making them meaningful. Diff-scoping is what makes it practical. Full-codebase mutation testing is a research project. Diff-scoped mutation testing is a CI check.</p>\n<h2>The new normal</h2>\n<p>What changed after the harness is not that the agent writes better code. What changed is that bad code gets caught — missing tests, broken boundaries, skipped steps, silent regressions — before it compounds.</p>\n<p>Across 287 commits from November 27, 2025 to February 24, 2026, with February 16 as the harness cutover, the share of commits touching test files rose from about 41% to 76%, consistent with <code>require_changed_tests</code>.<sup><a href=\"#user-content-fn-15\" id=\"user-content-fnref-15\" data-footnote-ref aria-describedby=\"footnote-label\">15</a></sup> The fix ratio held roughly flat at about 11%. Commits also got larger — averaging about 1,187 insertions versus 668 pre-harness — suggesting more confidence in landing bigger changes when the harness catches mistakes.</p>\n<p>But the more interesting change was behavioral. I had always asked the agent to propose priorities — that ritual started early. But before the harness, I treated the proposals as suggestions and drove each session with specific goals: \"implement this JVP,\" \"fix the module layout,\" \"split these files.\" After, I could actually follow the agent's lead. Most sessions started the same way — \"what should we work on next?\" — and this time I meant it. The default shifted from \"I tell you what to build\" to \"you tell me what needs building, and I decide whether to approve.\"</p>\n<p>That shift felt strange the first few times. Asking an AI \"what should we work on?\" and actually trusting the answer requires believing the floor will catch whatever goes wrong. The harness was that floor. 
It could not prevent the agent from writing mediocre abstractions or introducing unnecessary complexity — architectural judgment still requires a human. But it could enforce that tests existed, that they verified behavior, that imports respected boundaries, that type annotations were real<sup><a href=\"#user-content-fn-16\" id=\"user-content-fnref-16\" data-footnote-ref aria-describedby=\"footnote-label\">16</a></sup>, that suppressions were not growing. The minimum quality was no longer me. It was the tooling.</p>\n<p>The relay pattern from the earlier sprints would not have scaled without it. Parallel agents exacerbate every problem a single agent has — more drift, more skipped steps, more context confusion — and the harness was what kept them honest.</p>\n<p>When something went wrong, the response changed too. Finding a bug in <code>np.pad</code> tracing, I did not just ask for a fix. I also asked why it slipped through, what would have prevented it, and how the instructions or harness should change so the same class of bug would not recur. Every bug became a harness improvement opportunity. After feature sessions, I started asking: was the harness useful? what did it catch? did mutation tests surface anything? They almost always had — mutation testing reliably catches weak tests that the agent wrote to pass, not to verify. The system learned from its failures, not through any kind of machine learning, but through a human treating each failure as evidence that a check was missing.<sup><a href=\"#user-content-fn-17\" id=\"user-content-fnref-17\" data-footnote-ref aria-describedby=\"footnote-label\">17</a></sup></p>\n<h2>What transfers</h2>\n<p>This implementation is purely Python, and that shows in the choice of technologies. AST-based mutation testing, pytest-driven coverage, ruff and mypy as lint and type gates — these are ecosystem-specific. The implementation is shaped by its context. 
But the exercise transfers: figure out what your agent keeps getting wrong, build a check for it, and wire that check into a loop the agent cannot skip. The specific checks will differ. The discipline should not.</p>\n<p>Start with the check that would have caught the last thing that went wrong. The harness will grow from there.</p>\n<p>I built this library intentionally as a learning experience for myself — paying attention to the patterns, documenting what went wrong, formalizing each fix. I suspect much of it will be familiar to anyone who has spent real time building with agents. The stacks and checks will vary. The habit should not: when the agent fails, turn the failure into a constraint, a check, or a better handoff, then keep going.</p>\n<section data-footnotes class=\"footnotes\"><h2 class=\"sr-only\" id=\"footnote-label\">Footnotes</h2>\n<ol>\n<li id=\"user-content-fn-1\">\n<p>This article was written with AI assistance. I dictated raw thoughts using dictation software, then worked with Claude to turn them into prose — iterating paragraph by paragraph and pushing back whenever something did not sound like me. The research was AI-assisted too: I used a local transcript indexing and search tool, and had agents crawl through three months of git history to surface the timelines and stats behind the claims here. The result is more thoroughly researched than what I would have produced on my own, which is part of the point. <a href=\"#user-content-fnref-1\" data-footnote-backref=\"\" aria-label=\"Back to reference 1\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-2\">\n<p><a href=\"https://github.com/HIPS/autograd\">Autograd</a> by Maclaurin, Duvenaud, and Johnson is the original NumPy autodiff library — elegant, influential, and the reason most of us know reverse-mode AD can feel native to Python. I had used it extensively, and it was the main reference point in my head. 
<a href=\"https://github.com/rsokl/MyGrad\">MyGrad</a> by Ryan Soklaski takes a different approach — a Tensor object with NumPy ufunc/function overrides rather than autograd's tracing tape — and its Hypothesis-heavy testing style directly influenced this library's. <a href=\"#user-content-fnref-2\" data-footnote-backref=\"\" aria-label=\"Back to reference 2\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-3\">\n<p>Autodiff is the perfect application for property-based testing — you can express real mathematical invariants (gradient correctness, chain rule composition, forward/reverse agreement) and let the framework try to break them. <a href=\"#user-content-fnref-3\" data-footnote-backref=\"\" aria-label=\"Back to reference 3\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-4\">\n<p>The plans were remarkably specific. A typical one would include exact file paths, function signatures, test case names, and threshold constants — for example, <code>AUTO_MIN_NODES_FOR_ANY_OPT = 128</code>, <code>AUTO_MIN_NODES_FOR_CSE = 512</code>. The agent produced these from its analysis of the codebase; I reviewed and approved. <a href=\"#user-content-fnref-4\" data-footnote-backref=\"\" aria-label=\"Back to reference 4\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-5\">\n<p>Autograd was the reference point I had in mind when I was sanity-checking the agent's parity claim. <a href=\"#user-content-fnref-5\" data-footnote-backref=\"\" aria-label=\"Back to reference 5\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-6\">\n<p>My best guess at the cause: post-training fine-tuning rewards \"safe\" code — strong compatibility guarantees, no breaking changes, defensive patterns. The training data is overwhelmingly production code with real users, where preserving backward compatibility is the right default. 
The model has no way to distinguish that context from a greenfield repo with zero consumers. Sensible instinct, wrong situation. <a href=\"#user-content-fnref-6\" data-footnote-backref=\"\" aria-label=\"Back to reference 6\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-7\">\n<p>For readers who want to dive deeper: JAX's <a href=\"https://docs.jax.dev/en/latest/notebooks/autodiff_cookbook.html\">Autodiff Cookbook</a> is an excellent introduction to forward- and reverse-mode differentiation, and their <a href=\"https://docs.jax.dev/en/latest/notebooks/Custom_derivative_rules_for_Python_code.html\">Custom derivative rules</a> page explains exactly the JVP-first approach adopted here — define a JVP rule, and the framework derives VJPs automatically by transposing the linear computation. <a href=\"#user-content-fnref-7\" data-footnote-backref=\"\" aria-label=\"Back to reference 7\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-8\">\n<p>The naive question is: why not just keep them in sync? In practice, a growing codebase accumulates multiple places that encode the same information — agent instructions, CI configs, contributor docs, READMEs — and no matter how explicit the instructions are, they drift. Each source gets updated in its own context, by a different session or a different agent, and nobody notices the divergence until something breaks. This is hard to stay on top of even with human contributors; with agents that read whatever file they find first, it compounds fast. The harness solved this by making the single CLI the only source of truth — AGENTS.md says \"run harness loop,\" CI runs \"harness gate,\" and neither needs to enumerate individual steps. 
<a href=\"#user-content-fnref-8\" data-footnote-backref=\"\" aria-label=\"Back to reference 8\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-9\">\n<p>Both Claude Code and Codex CLI had automatic context compaction by this point — summarizing conversation history when the context window fills up. The relay pattern solves a different problem. Auto-compaction tries to preserve everything; the relay deliberately discards, starting a fresh session with only what the human judges relevant. The compression is lossy by design — that is the point. <a href=\"#user-content-fnref-9\" data-footnote-backref=\"\" aria-label=\"Back to reference 9\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-10\">\n<p>Around this time, I shifted most implementation work from Claude Code to Codex. Different tools for different strengths — Codex sessions averaged five hours for heavy, instruction-driven implementation; Claude Code sessions averaged under two hours for analysis, architecture review, and focused tasks. <a href=\"#user-content-fnref-10\" data-footnote-backref=\"\" aria-label=\"Back to reference 10\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-11\">\n<p>This library's graph-based tracing is inherently heavier than autograd's flat tape — you pay for node allocation, edge management, and scope tracking on every operation. Early benchmarks showed very large overhead. A dedicated performance sprint across several parallel workstreams brought this down significantly, and a 10x ratio was set as the acceptable threshold: slow enough to reflect the architectural cost, fast enough to be usable. The threshold was wired into <code>quality.py</code> as a blocking gate — which worked until measurement noise on higher-order workloads pushed it past 10x on runs where nothing performance-related had changed. 
<a href=\"#user-content-fnref-11\" data-footnote-backref=\"\" aria-label=\"Back to reference 11\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-12\">\n<p>The term gained traction in early 2026. Mitchell Hashimoto's <a href=\"https://mitchellh.com/writing/my-ai-adoption-journey\">\"My AI Adoption Journey\"</a> described the practice of engineering a solution for every agent mistake so it never recurs — each line in his AGENTS.md traced to a specific past failure. OpenAI's <a href=\"https://openai.com/index/harness-engineering/\">\"Harness engineering\"</a> made the case at scale using Codex. Anthropic demonstrated it by having <a href=\"https://www.anthropic.com/engineering/building-c-compiler\">sixteen parallel Claude agents build a C compiler</a> in Rust. Cursor showed what happens at the extreme — <a href=\"https://cursor.com/blog/scaling-agents\">agents building a browser from scratch</a>, running unattended for a week. Martin Fowler's <a href=\"https://martinfowler.com/articles/exploring-gen-ai/harness-engineering.html\">analysis</a> provided conceptual framing. Can Duruk's <a href=\"https://blog.can.ac/2026/02/12/the-harness-problem/\">\"The Harness Problem\"</a> argued the harness is the bottleneck, not the model. <a href=\"#user-content-fnref-12\" data-footnote-backref=\"\" aria-label=\"Back to reference 12\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-13\">\n<p>Google runs diff-based mutation testing on every code change to its monorepo, using the same core idea: generate mutants only in changed lines, use coverage data to select relevant tests, suppress unproductive mutations. Their system serves tens of thousands of developers. See Petrovic and Ivankovic, <a href=\"https://research.google/pubs/pub46584/\">\"State of Mutation Testing at Google\"</a> (ICSE 2018). 
<a href=\"#user-content-fnref-13\" data-footnote-backref=\"\" aria-label=\"Back to reference 13\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-14\">\n<p>This is the <a href=\"https://engineering.flexcompute.com/articles/agent-control-loop#unenforced-verification\">\"fail closed\" principle</a> from the first article: when the system cannot determine whether something is safe, it should assume it is not. Every ambiguity resolves toward more checking, not less. <a href=\"#user-content-fnref-14\" data-footnote-backref=\"\" aria-label=\"Back to reference 14\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-15\">\n<p>\"Touched test files\" means the commit's diffstat includes at least one file under a <code>tests/</code> directory. \"Fix\" commits are classified by conventional commit prefix (<code>fix:</code>). Insertions are raw <code>git log --stat</code> totals. <a href=\"#user-content-fnref-15\" data-footnote-backref=\"\" aria-label=\"Back to reference 15\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-16\">\n<p>In a typed codebase, agents do not have to guess what a function expects or returns. Types are how agents navigate a large codebase without reading every implementation. In Python that discipline is optional rather than enforced by the language, so tools like mypy strict have to carry the load. <a href=\"#user-content-fnref-16\" data-footnote-backref=\"\" aria-label=\"Back to reference 16\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-17\">\n<p>This extended to the agent instructions themselves. Early on, I established a rule that <code>AGENTS.md</code> should self-update: if an agent followed a rule and it still led to the wrong outcome, the rule needed to be refined. The instructions file became a living document maintained by the agents who consumed it. 
<a href=\"#user-content-fnref-17\" data-footnote-backref=\"\" aria-label=\"Back to reference 17\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n</ol>\n</section>",
      "attachments": [
        {
          "url": "https://engineering.flexcompute.com/articles/what-should-we-work-on-next.md",
          "mime_type": "text/markdown",
          "title": "\"What Should We Work On Next?\" markdown"
        },
        {
          "url": "https://engineering.flexcompute.com/images/og/what-should-we-work-on-next.png",
          "mime_type": "image/png",
          "title": "\"What Should We Work On Next?\" social image"
        }
      ],
      "_flexcompute": {
        "kind": "Case Study",
        "tags": [
          "ai-engineering",
          "autodiff",
          "verification"
        ],
        "series": "AI Engineering",
        "series_order": 2,
        "markdown_url": "https://engineering.flexcompute.com/articles/what-should-we-work-on-next.md"
      }
    },
    {
      "id": "https://engineering.flexcompute.com/articles/agent-control-loop/",
      "url": "https://engineering.flexcompute.com/articles/agent-control-loop/",
      "title": "The Agent Control Loop — Engineering for Tolerance",
      "summary": "Why agent reliability isn't magic model behavior — it's an environment where correctness is continuously verified. A framework for deciding when and how to delegate to AI agents.",
      "image": "https://engineering.flexcompute.com/images/og/agent-control-loop.png",
      "banner_image": "https://engineering.flexcompute.com/images/og/agent-control-loop.png",
      "date_published": "2026-01-19T00:00:00.000Z",
      "date_modified": "2026-01-19T00:00:00.000Z",
      "authors": [
        {
          "name": "Frederik Schubert",
          "path": "/authors/frederik-schubert/",
          "url": "https://engineering.flexcompute.com/authors/frederik-schubert/"
        },
        {
          "name": "Yannick Augenstein",
          "path": "/authors/yannick-augenstein/",
          "url": "https://engineering.flexcompute.com/authors/yannick-augenstein/"
        }
      ],
      "tags": [
        "AI Engineering",
        "AI Agents",
        "Verification"
      ],
      "content_html": "<p>Consider two recent experiments with coding agents. Similar ambition. Opposite outcomes.</p>\n<p>In the <a href=\"https://cursor.com/blog/scaling-agents\">first</a>, a team pointed hundreds of agents at a browser project. In a week they produced roughly a million lines of code. By coordination metrics it was a success: parallel work, lots of merged PRs, visible throughput. But when the project went public, <a href=\"https://embedding-shapes.github.io/cursor-implied-success-without-evidence/\">outside observers pointed to failing CI and questioned how much of that visible throughput translated into a clean, working system</a>.</p>\n<p>In the second, as described in <a href=\"https://approachwithalacrity.com/p/claude-is-not-a-senior-engineer-yet\"><em>Claude is not a senior engineer (yet)</em></a>, a single engineer connected Claude to an automated browser testing suite (Playwright) and an error monitoring tool (Sentry). The agent wrote code, ran the tests, read the error traces, and fixed its own bugs. Ninety minutes later, it worked.</p>\n<p>I am not trying to offer a definitive postmortem on either case. I am using them as contrasting examples of a broader engineering pattern.</p>\n<p>The pattern is a standard engineering concept: <strong>tolerance.</strong></p>\n<h2>Tolerance: How much drift can you afford?</h2>\n<p>Mechanical engineering abandoned binary \"works/doesn't work\" thinking decades ago. A bridge doesn't just \"work\". It tolerates a specific load variance under specific conditions. We ask about allowable margin of error, <em>i.e.</em>, the acceptable error band around the ideal.</p>\n<p>Software <em>engineering</em> has tolerances, too.</p>\n<p>Tolerance isn't one number. In software it decomposes into dimensions like correctness, security, latency, cost, reversibility, and blast radius (think error budgets). UI copy may be flexible on exact wording but not on brand tone. 
A refactor may tolerate new implementation details but not behavior changes.</p>\n<p>Some tasks are <strong>high tolerance</strong>: exploratory prototyping, quick internal tools, one-off scripts, early drafts. Drift is acceptable because the goal is discovery and speed.</p>\n<p>Other tasks are <strong>low tolerance</strong>: production infrastructure, security boundaries, customer-facing behavior, billing and permissions. Here \"close enough\" isn't a solution. It's a failure that may not show up immediately, but will surface later as incidents and churn.</p>\n<p>The difference between those two agent experiments was that one treated a low-tolerance problem with a high-tolerance process.</p>\n<p>That mismatch shows up as a control problem.</p>\n<h2>Open Loops vs. Closed Loops</h2>\n<p>In an <strong>open loop</strong>, the agent writes code and opens pull requests, but verification bottlenecks at human review, which happens minutes, hours, or days later. The delay between action and verification lets error accumulate. Drift becomes visible only after it's expensive.</p>\n<p>In a <strong>closed loop</strong>, the agent makes a change and immediately runs verification against hard constraints. The loop itself damps error. 
Closed loops require fast, reliable feedback; slow or flaky verification re-opens the loop.</p>\n<h3>The Two Experiments Compared</h3>\n<table>\n<thead>\n<tr>\n<th></th>\n<th>Browser Project (Open Loop)</th>\n<th>LLM + Tests + Traces (Closed Loop)</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td><strong>Feedback signal</strong></td>\n<td>Informational (throughput, mergeability, activity)</td>\n<td>Structural (tests pass, errors resolved)</td>\n</tr>\n<tr>\n<td><strong>Verification timing</strong></td>\n<td>After the fact, by humans</td>\n<td>Every turn, by the agent</td>\n</tr>\n<tr>\n<td><strong>Termination condition</strong></td>\n<td>PR merged</td>\n<td>Constraints satisfied</td>\n</tr>\n<tr>\n<td><strong>Outcome</strong></td>\n<td>Implied success; required human fixes</td>\n<td>Working fix in 90 minutes</td>\n</tr>\n</tbody>\n</table>\n<p>Read this way, the browser project optimized for coordination metrics, while the Claude setup optimized for correctness under feedback. The point is not that many agents are inherently bad. The point is that open loops amplify drift when verification is weak. 
<strong>Agent reliability isn't magic model behavior</strong>; it's an environment where correctness is continuously verified.</p>\n<p>But verification has a precondition: success must be expressible as constraints the agent can actually verify.</p>\n<blockquote>\n<p>For a picture of what the browser experiment <em>could</em> have looked like with human coordination and a tight loop, see <a href=\"https://emsh.cat/one-human-one-agent-one-browser/\">One Human + One Agent = One Browser From Scratch</a>.</p>\n</blockquote>\n<h2>The Ambiguity Gap</h2>\n<p>The fundamental challenge with agents is that <strong>intent</strong> is latent (in your head), while <strong>evidence</strong> is explicit (text documents in the repo).</p>\n<p><strong>Ambiguity is the distance between intent and evidence.</strong></p>\n<p>When you delegate to a human engineer, they bridge that gap with judgment: they ask clarifying questions, infer missing context, notice anomalies, and sanity-check against domain knowledge.</p>\n<p>When you delegate to an AI agent, it can't feel that gap. It needs <strong>measurable constraints</strong> to know whether it has actually crossed from \"plausible output\" to \"correct result.\"</p>\n<p>Without constraints, the agent will still do as instructed. It will just optimize for the easiest proxy it can satisfy: producing output, closing tickets, merging PRs. Not correctness and integration.</p>\n<p>A practical predictor of these outcomes is:</p>\n<p><strong>Can you describe success in terms of constraints the agent can verify?</strong></p>\n<p>If you can, agents compound your effort. If you can't, you're doing exploration, and you should treat the output as exploration.</p>\n<h2>Four Failure Modes (and their fixes)</h2>\n<p>When agents appear unreliable, it's usually a failure of the <em>surrounding system design</em> rather than the model itself. 
We see four common patterns.</p>\n<ol class=\"failure-modes\">\n  <li id=\"undefined-specs\">\n    <p class=\"failure-mode__title\">Undefined Specs</p>\n    <p>\n      You have intent, but no mechanism to verify it. The requirements are unsettled, or the\n      definition of done is just a feeling, like \"make onboarding feel simpler.\"\n    </p>\n    <p class=\"failure-mode__fix\">\n      <strong>Fix:</strong> Don't delegate the decision-making. Use the agent to prototype and\n      explore, but treat the output as raw material that helps you write the spec, <em>not</em> as\n      the final product. If you don't know what done looks like, the agent won't either.\n    </p>\n  </li>\n  <li id=\"hidden-context\">\n    <p class=\"failure-mode__title\">Hidden Context</p>\n    <p>\n      The constraints exist, but they're trapped in a meeting note or a Slack thread. Unlike\n      undefined specs, the spec exists here. It just isn't where the agent can read it. Think of the\n      edge-case permission rule that came up once in discussion but never made it into the repo.\n    </p>\n    <p class=\"failure-mode__fix\">\n      <strong>Fix:</strong> Treat context as code. If a constraint isn't captured in versioned,\n      linkable artifacts (<code>AGENTS.md</code>, RFCs/ADRs, schemas), it doesn't exist for the\n      agent.\n    </p>\n  </li>\n  <li id=\"unenforced-verification\">\n    <p class=\"failure-mode__title\">Unenforced Verification</p>\n    <p>\n      The specs exist and are accessible, but the agent isn't forced to check them. Tests are nice\n      to have. CI failures don't block merges. The system rewards speed or volume over correctness.\n      The result is a workflow where \"it probably works\" is treated as progress.\n    </p>\n    <p class=\"failure-mode__fix\">\n      <strong>Fix:</strong> Verification must be a termination condition. CI gates must fail closed.\n      Pre-commit hooks tighten the loop. 
If the tests don't pass, the agent hasn't finished.\n    </p>\n  </li>\n  <li id=\"inadequate-constraints\">\n    <p class=\"failure-mode__title\">Inadequate Constraints</p>\n    <p>\n      The agent is verifying, but the constraints are too weak or too game-able. Tests pass, yet the\n      system is still wrong: coverage is thin, assertions encode the wrong intent, or non-functional\n      requirements (performance, security, UX) aren't represented. For example, unit tests stay\n      green while latency quietly doubles.\n    </p>\n    <p class=\"failure-mode__fix\">\n      <strong>Fix:</strong> Widen the constraint surface. Add invariants and golden tests for\n      critical flows, static analysis (types, linters), and where it matters, property\n      tests/fuzzing. For production-adjacent changes, pair verification with observability and\n      rollback criteria.\n    </p>\n  </li>\n</ol>\n<h2>Deciding What to Delegate</h2>\n<p>As models get smarter and faster, and context windows expand, the temptation is to throw them at larger, fuzzier problems. But a smarter, faster agent in a fuzzy environment mostly produces the wrong thing faster. It cannot know what it cannot read.</p>\n<p>To decide whether to delegate, ask:</p>\n<ul>\n<li>Can the agent verify success on its own?</li>\n<li>How much drift can you tolerate if it gets the answer slightly wrong?</li>\n</ul>\n<p>That gives four cases:</p>\n<ol>\n<li><strong>Verifiable, high tolerance.</strong> Let it run and spot-check. Examples: generating release notes from merged PRs; drafting meeting notes.</li>\n<li><strong>Verifiable, low tolerance.</strong> Delegate with gates. Examples: fixing a failing unit test; fixing a customer-reported bug.</li>\n<li><strong>Not yet verifiable, high tolerance.</strong> Use the agent for exploration, then extract constraints from what you learn. 
Examples: exploring UI layouts for a new feature; brainstorming marketing copy.</li>\n<li><strong>Not yet verifiable, low tolerance.</strong> Don't delegate the decision yet. First use the agent to produce the artifacts that make the work verifiable. Examples: draft a permission matrix, define invariants, write escalation-path tests, prototype policy-as-code.</li>\n</ol>\n<p>High-level work (architecture, strategy, trade-offs) often starts in the hardest case: low verifiability, low tolerance. Assumptions hide best there. But high-level work is usually decomposable. Break it into constrained subtasks, then delegate those.</p>\n<p>For example, \"defining multi-tenant permissions\" is low tolerance and low verifiability at the start. Don't delegate the decision; delegate the work of making it verifiable: draft a permission matrix and invariants, write tests for escalation paths, prototype a policy-as-code layer. Once those constraints exist, implementation becomes a low-tolerance but verifiable task.</p>\n<h2>The Acceleration of Debt</h2>\n<p>None of this is new engineering wisdom. What changes with agents is the rate at which small omissions compound.</p>\n<p>Humans bridge gaps socially: they ask questions, notice contradictions, remember \"that one incident from last year,\" and hesitate when something feels off. Agents don't get those dampeners. They will happily produce plausible work until the system forces contact with reality.</p>\n<p>That's why what used to be technical debt becomes <em>context failure</em>. If a constraint (decision records, schemas, invariants, style guides, runbooks) isn't captured in the repo, it effectively doesn't exist for an agent. <strong>Treat context as code</strong>, and treat verification as the termination condition, not a suggestion.</p>\n<p>The future workflow isn't exotic. It's the old best practices, made load-bearing by speed. Start with the last thing your agent got wrong. Turn it into a constraint or a check. 
Wire it into the loop, then repeat.</p>",
      "attachments": [
        {
          "url": "https://engineering.flexcompute.com/articles/agent-control-loop.md",
          "mime_type": "text/markdown",
          "title": "The Agent Control Loop — Engineering for Tolerance markdown"
        },
        {
          "url": "https://engineering.flexcompute.com/images/og/agent-control-loop.png",
          "mime_type": "image/png",
          "title": "The Agent Control Loop — Engineering for Tolerance social image"
        }
      ],
      "_flexcompute": {
        "kind": "Essay",
        "tags": [
          "ai-engineering",
          "agents",
          "verification"
        ],
        "series": "AI Engineering",
        "series_order": 1,
        "markdown_url": "https://engineering.flexcompute.com/articles/agent-control-loop.md"
      }
    }
  ]
}