High-signal articles on AI, engineering, or whatever

Agents

Benchmarks

Best Practices

Career

Codex

Economics

Engineering

Hiring

History

LLM

Labor Market

Leadership

Macro

Open Source

Prompt Engineering

Research

Testing

Thinking

Tools

clippings

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

2026-03-10 · Jialong Chen, Xander Xu, Hu Wei, Chuan Chen, Bing Zhao · arxiv.org

clippings

Engineering

LLM

A discussion of the limits of LLM-based software engineering, specifically the gaps in long-term, complex maintainability.

BullshitBench v2 Results: Anthropic & Qwen 3.5 Lead in Bullshit Detection

2026-03-07 · Adam Holter · www.linkedin.com

“This may be a fascinating way to assess how useful a model actually is on some level, if the aim is not to have a fawning servant.”

Benchmarks

Research

LLM

Only two model families score above 60% on bullshit detection: Anthropic's latest models and Qwen 3.5. Reasoning models score lower, not higher — they build around wrong premises rather than reject them.

The 2026 Global Intelligence Crisis

2026-03-07 · Frank Flight · citadelsecurities.com

“A clear take on where we stand in AI adoption, and particularly value creation, as of late February. The limits of "displacement" are clear.”

Economics

Labor Market

Macro

Citadel Securities is an award-winning global market-maker across a broad array of fixed income and equity products.

Harness engineering: leveraging Codex in an agent-first world

2026-02-13 · Ryan Lopopolo · openai.com

Agents

Engineering

Codex

Concrete experiments in what it takes to go full Ralph Wiggum, plus OpenAI's learnings.

TBM 406: Seeing Everything, Understanding Nothing

2026-02-13 · John Cutler · cutlefish.substack.com

“Essential reading on the dangers of over-contextualizing in AI systems.”

Leadership

Thinking

The context trap - AI is supercharging legacy leadership assumptions about context and control.

The Harness Problem

2026-02-12 · Can Bölük · blog.can.ac

“The edit tool is the variable that matters most for coding agents.”

Agents

Engineering

Improving 15 LLMs at coding in one afternoon. Only the harness changed.

Perplexity Computer Launches AI Tool with Autonomous Capabilities

2026-02-07 · Tim O'Neill · linkedin.com

“Analysis of where AI tools are headed in 2026.”

Agents

Tools

The race is to the top right, where AI agents work autonomously and have real control over your desktop.

AINews: Anthropic's Agent Autonomy Study

2026-02-06 · Swyx · latent.space

“Data on how Claude Code usage grew from 25 mins to 45+ mins.”

Agents

Research

Anthropic's study of its own API usage patterns measuring AI agent autonomy in practice.

OpenAI's Agent-First Codebase Learnings

2026-02-05 · Alex Lavaee · alexlavaee.me

“OpenAI's Harness team produced ~1,500 merged PRs with 3 engineers.”

Agents

Engineering

5-month experiment: build and ship a real product with zero manually-written code.

Anthropic Releases SKILLS for AI Agents

2026-02-04 · Dallin Bentley · linkedin.com

“A simple but fundamental shift toward file-system-based agent memory.”

Agents

Tools

Files that live alongside your AI agent. The agent can read these files just like it would when working with a codebase.

The "AI Grifter" Crowd and Claude Code

2026-02-02 · Giorgio Vilardo · linkedin.com

“This post captures the architectural shift from GUI to CLI-based AI agents and why leveraging Linux and the file system is such a great foundation.”

Agents

Tools

Moving away from the "VS Code clone sidebar" and towards CLI agents.

Just-in-Time Catching Test Generation at Meta

2026-01-30 · Matthew Becker, Yifei Chen, Nicholas Cochran, Pouyan Ghasemi, Abhishek Gulati, Mark Harman, Zachary Haluza, Mehrdad Honarkhah, Herve Robert, Jiacheng Liu, Weini Liu, Sreeja Thummala, Xiaoning Yang, Rui Xin, Sophie Zeng · arxiv.org

Testing

Engineering

Research

Meta's paper on Just-in-Time test generation — automatically generating catching tests at the point of code change to detect regressions before they merge, evaluated across Meta's production codebase.

AGENTS.md Outperforms Skills in Our Agent Evals

2026-01-28 · Jude Gao · vercel.com

“Passive context beats active retrieval for AI coding agents.”

Agents

Engineering

A compressed 8KB docs index embedded directly in AGENTS.md achieved a 100% pass rate, while skills maxed out at 79%.

So You Want to Hire a Forward Deployed Engineer

2026-01-20 · Tiffany Siu · review.firstround.com

“FDEs help build incrementally more valuable products from concrete use cases.”

Career

Engineering

Hiring

What FDEs actually do and how to hire the right one for your team.

2026 Interview Questions I'm Asking Engineers

2026-01-15 · Punn Kam · linkedin.com

“Already thinking about what this means for hiring in a world where the IDE and hand-coding are not important.”

Career

Hiring

"You're in the middle of a refactor and the model says 8% context left before auto-compaction. What do you do?"

Effective Harnesses for Long-Running Agents

2025-11-26 · Anthropic · anthropic.com

“The moment that announced we were all in the harness engineering phase of AI engineering, which as of this writing we remain in.”

Agents

Engineering

Creating a more effective harness for long-running agents, inspired by human engineers.

Speed at the Cost of Quality: How Cursor AI Increases Short-Term Velocity and Long-Term Complexity

2025-11-08 · Multiple Authors · arxiv.org

“MSR 2026 paper on the tradeoffs of AI-assisted coding.”

Agents

Research

How Cursor AI increases short-term velocity and long-term complexity in open-source projects.

When ChatGPT Broke an Entire Field: An Oral History

2025-04-30 · John Pavlus · quantamagazine.org

“Fascinating oral history of the AI paradigm shift.”

History

Research

How LLMs upended the field of NLP in just a few years.

How We Solved Hallucination in LLMs with Open Source Code

2024-12-15 · Leon Chlon, PhD · linkedin.com

“Revolutionary approach to hallucination detection.”

Research

Open Source

LLM hallucinations aren't bugs - they're compression artifacts.

The Prompt Report

2024-12-12 · Sander Schulhoff · learnprompting.org

“80+ page survey of all prompting techniques.”

Prompt Engineering

Research

The most comprehensive study of prompting ever done - 1,500+ academic papers analyzed.

Building Effective AI Agents

2024-10-01 · Erik Schluntz & Barry Zhang · anthropic.com

“The definitive guide to agent architecture from the team behind Claude in the early days. So many unknown unknowns.”

Agents

Best Practices

Best practices and patterns for building production AI agents.

Claude Squad

2024-06-15 · SMTG-AI · github.com

“Multi-agent orchestration for terminal-based AI coding. There are many competitors now, but a very interesting early implementation more or less subsumed by coding harnesses of big providers.”

Tools

Open Source

Manage multiple AI terminal agents like Claude Code, Aider, Codex, OpenCode, and Amp.

The Rise of the AI Engineer

2023-09-01 · Swyx · latent.space

“Required reading for anyone building with LLMs professionally.”

Career

Engineering

Why AI engineering is becoming the hottest role in tech.

LLM Powered Autonomous Agents

2023-06-23 · Lilian Weng · lilianweng.github.io

“The canonical reference for understanding agent design patterns.”

LLM

Agents

Research

Comprehensive overview of agent architectures and patterns.