meddler meddler

anontruder

Hi, I'm anontruder
LA 62 posts
Agents SDK | OpenAI API

Reference for orchestrating multi-step, tool-using agent systems with explicit application control.

Noah Bennett
9 Jun 2026 · 1 min read
Agents | OpenAI Developers

Practical guide for architecture, control flow, safety, and eval loops in production agents.

Liam Carter
9 Jun 2026 · 1 min read
Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems

LLM agents are rapidly evolving from coding assistants into autonomous software engineering systems. However, existing evaluation methodologies remain largely centered on static, isolated, and short-horizon benchmarks th...

Ava Brooks
26 May 2026 · 1 min read
RepoMirage: Probing Repository Context Reasoning in Code Agents with Perturbations

Code agents now perform strongly on repository-level software engineering benchmarks, but it remains unclear whether success on end-to-end tasks such as issue resolution truly reflects repository con...

Owen Blake
25 May 2026 · 1 min read
What Do Evolutionary Coding Agents Evolve?

Recent work pairs LLMs with evolutionary search to iteratively generate, modify, and select code using task-specific feedback. These systems have produced strong results in mathematical discovery and algorithm design, ye...

Nina Reed
19 May 2026 · 1 min read
Code as Agent Harness

Recent large language models (LLMs) have demonstrated strong capabilities in understanding and generating code, from competitive programming to repository-level software engineering. In emerging agentic systems, code is...

Leo Parker
18 May 2026 · 1 min read
WebGameBench: Requirement-to-Application Evaluation for Coding Agents via Browser-Native Games

Coding agents are increasingly used as application builders, yet many evaluations still focus on source code, repository-level tests, or intermediate traces rather than the delivered application. We introduce WebGameBenc...

Aria Patel
17 May 2026 · 1 min read
From Code-Centric to Intent-Centric Software Engineering: A Reflexive Thematic Analysis of Generative AI, Agentic Systems, and Engineering Accountability

Generative artificial intelligence (GenAI) and agentic systems are moving software engineering from code-centric production toward intent-centric human-agent work in which natural language, repository context, tools, tes...

Zoe Walker
10 May 2026 · 1 min read
Debugging the Debuggers: Failure-Anchored Structured Recovery for Software Engineering Agents

Software engineering agents are increasingly deployed in evaluable engineering environments, yet post-failure recovery remains costly, manual, and ad hoc. Existing systems expose traces or generate follow-up feedback, bu...

Ethan Shaw
9 May 2026 · 1 min read
SWE Atlas: Benchmarking Coding Agents Beyond Issue Resolution

We introduce SWE Atlas, a benchmark suite for coding agents spanning three professional software engineering workflows: Codebase Q&A (124 tasks), Test Writing (90 tasks), and Refactoring (70 tasks). SWE Atlas differs fro...

Maya Collins
8 May 2026 · 1 min read
ProgramBench: Can Language Models Rebuild Programs From Scratch?

Turning ideas into full software projects from scratch has become a popular use case for language models. Agents are being deployed to seed, maintain, and grow codebases over extended periods with minimal human oversight...

Noah Bennett
5 May 2026 · 1 min read
A Time-Consistent Benchmark for Repository-Level Software Engineering Evaluation

Evaluation of repository-aware software engineering systems is often confounded by synthetic task design, prompt leakage, and temporal contamination between repository knowledge and future code changes. We present a time...

Liam Carter
27 Mar 2026 · 1 min read
Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

As large language models (LLMs) advance their mathematical capabilities toward the IMO and research level, the scarcity of challenging, high-quality problems has become a significant bottleneck for training, evaluation a...

Ava Brooks
3 Mar 2026 · 1 min read
Your Code Agent Can Grow Alongside You with Structured Memory

While "Intent-oriented programming" (or "Vibe Coding") redefines software engineering, existing code agents remain tethered to static code snapshots. Consequently, they struggle to model the critical information embedded...

Owen Blake
25 Feb 2026 · 1 min read
Debug2Fix: Can Interactive Debugging Help Coding Agents Fix More Bugs?

While significant progress has been made in automating various aspects of software development through coding agents, there is still significant room for improvement in their bug fixing capabilities. Debugging and invest...

Nina Reed
20 Feb 2026 · 1 min read
Beyond Quantity: Trajectory Diversity Scaling for Code Agents

As code large language models (LLMs) evolve into tool-interactive agents via the Model Context Protocol (MCP), their generalization is increasingly limited by low-quality synthetic data and the diminishing returns of qua...

Leo Parker
3 Feb 2026 · 1 min read
BOAD: Discovering Hierarchical Software Engineering Agents via Bandit Optimization

Large language models (LLMs) have shown strong reasoning and coding capabilities, yet they struggle to generalize to real-world software engineering (SWE) problems that are long-horizon and out of distribution. Existing...

Aria Patel
29 Dec 2025 · 1 min read
NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Recent advances in coding agents suggest rapid progress toward autonomous software development, yet existing benchmarks fail to rigorously evaluate the long-horizon capabilities required to build complete software system...

Zoe Walker
14 Dec 2025 · 1 min read
© 2026 meddler. All rights reserved.