anontruder - meddler (Page 2)

What is an AI agent?

Strong conceptual framing for agent boundaries, when not to use agents, and practical decomposition.

Zoe Walker

9 Jun 2026 · 1 min read

Agents • Cookbook

Hands-on, end-to-end examples for coding agents, with realistic tool and memory patterns.

Ethan Shaw

9 Jun 2026 · 1 min read

Tools | OpenAI API

Deep dive into web/file/tool-search patterns that materially change agent capability and reliability.

Maya Collins

9 Jun 2026 · 1 min read

Agents SDK | OpenAI API

Reference for orchestrating multi-step, tool-using agent systems with explicit application control.

Noah Bennett

9 Jun 2026 · 1 min read

Agents | OpenAI Developers

Practical guide for architecture, control flow, safety, and eval loops in production agents.

Liam Carter

9 Jun 2026 · 1 min read

Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems

LLM agents are rapidly evolving from coding assistants into autonomous software engineering systems. However, existing evaluation methodologies remain largely centered on static, isolated, and short-horizon benchmarks th...

Ava Brooks

26 May 2026 · 1 min read

RepoMirage: Probing Repository Context Reasoning in Code Agents with Perturbations

Code agents currently perform well on repository-level software engineering benchmarks, but it remains unclear whether success on end-to-end tasks such as issue resolution truly reflects repository con...

Owen Blake

25 May 2026 · 1 min read

What Do Evolutionary Coding Agents Evolve?

Recent work pairs LLMs with evolutionary search to iteratively generate, modify, and select code using task-specific feedback. These systems have produced strong results in mathematical discovery and algorithm design, ye...

Nina Reed

19 May 2026 · 1 min read

Code as Agent Harness

Recent large language models (LLMs) have demonstrated strong capabilities in understanding and generating code, from competitive programming to repository-level software engineering. In emerging agentic systems, code is...

Leo Parker

18 May 2026 · 1 min read

WebGameBench: Requirement-to-Application Evaluation for Coding Agents via Browser-Native Games

Coding agents are increasingly used as application builders, yet many evaluations still focus on source code, repository-level tests, or intermediate traces rather than the delivered application. We introduce WebGameBenc...

Aria Patel

17 May 2026 · 1 min read

From Code-Centric to Intent-Centric Software Engineering: A Reflexive Thematic Analysis of Generative AI, Agentic Systems, and Engineering Accountability

Generative artificial intelligence (GenAI) and agentic systems are moving software engineering from code-centric production toward intent-centric human-agent work in which natural language, repository context, tools, tes...

Zoe Walker

10 May 2026 · 1 min read

Debugging the Debuggers: Failure-Anchored Structured Recovery for Software Engineering Agents

Software engineering agents are increasingly deployed in evaluable engineering environments, yet post-failure recovery remains costly, manual, and ad hoc. Existing systems expose traces or generate follow-up feedback, bu...

Ethan Shaw

9 May 2026 · 1 min read

SWE Atlas: Benchmarking Coding Agents Beyond Issue Resolution

We introduce SWE Atlas, a benchmark suite for coding agents spanning three professional software engineering workflows: Codebase Q&A (124 tasks), Test Writing (90 tasks), and Refactoring (70 tasks). SWE Atlas differs fro...

Maya Collins

8 May 2026 · 1 min read

ProgramBench: Can Language Models Rebuild Programs From Scratch?

Turning ideas into full software projects from scratch has become a popular use case for language models. Agents are being deployed to seed, maintain, and grow codebases over extended periods with minimal human oversight...

Noah Bennett

5 May 2026 · 1 min read

A Time-Consistent Benchmark for Repository-Level Software Engineering Evaluation

Evaluation of repository-aware software engineering systems is often confounded by synthetic task design, prompt leakage, and temporal contamination between repository knowledge and future code changes. We present a time...

Liam Carter

27 Mar 2026 · 1 min read

Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

As large language models (LLMs) advance their mathematical capabilities toward the IMO and research level, the scarcity of challenging, high-quality problems has become a significant bottleneck for training, evaluation a...

Ava Brooks

3 Mar 2026 · 1 min read

Your Code Agent Can Grow Alongside You with Structured Memory

While "Intent-oriented programming" (or "Vibe Coding") redefines software engineering, existing code agents remain tethered to static code snapshots. Consequently, they struggle to model the critical information embedded...

Owen Blake

25 Feb 2026 · 1 min read

Debug2Fix: Can Interactive Debugging Help Coding Agents Fix More Bugs?

While significant progress has been made in automating various aspects of software development through coding agents, there is still significant room for improvement in their bug fixing capabilities. Debugging and invest...

Nina Reed

20 Feb 2026 · 1 min read