Pillars
Pillar
Generative Engine Optimization: How to Earn AI Citations
Generative engine optimization decides whether ChatGPT and AI Overviews cite you. The 2026 playbook: crawlers, llms.txt, and AI share of voice.
June 11, 2026 · 16 min
Pillar
AI Coding Agent Economics: Real ROI and Cost per Pull Request
AI coding ROI, demystified: real cost per pull request ($1-$30 in tokens, $20-$80 all-in), why the 4:1 claim doesn't hold up, and when local-first agents beat the cloud.
June 11, 2026 · 20 min
Pillar
SWE-bench Pro vs Verified: Can You Trust Coding Benchmarks?
SWE-bench Verified was deprecated after 59.4% of its hard tasks had flawed tests. What SWE-bench Pro and the DeepSWE audit reveal about coding agent benchmarks.
June 10, 2026 · 18 min
More analysis
AI Coding Agent Economics: Real ROI and Cost per Pull Request
Frontier labs now ship more AI-written code than human-written code, but the viral ROI numbers are wrong. Here is the money math that survives CFO scrutiny.
20 min
Context Rot and the Dumb Zone: Engineering Past 100k Tokens
Bigger context windows didn't fix attention. Past roughly 100k tokens agents get lost in the middle, and the fix is architectural, not bigger.
11 min
SWE-bench Pro vs Verified: Can You Trust Coding Benchmarks?
OpenAI deprecated the benchmark everyone quoted, an audit found graders wrong on a third of verdicts, and frontier models got caught reading the answer key. Here is what actually measures a coding agent in 2026.
18 min
AGENTS.md vs CLAUDE.md vs Cursor Rules: Config Done Right
The config files are your agent's control plane. Get the three-tier permission model and context budgeting right, or watch instruction adherence rot.
9 min
The Ralph Wiggum Loop: Why Stateless Agents Beat Smart Ones
Wiping the agent's memory every iteration sounds like sabotage. It's actually the most reliable way anyone has found to run a coding agent for hundreds of turns.
9 min
Reasoning-First LLMs: Make Models Reason, Not Rationalize
Your model's chain of thought is a narrative, not a derivation. Here is the stack that forces it to actually compute the answer.
11 min
