High-signal articles on AI, engineering, or whatever

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
2026-03-10 · Jialong Chen, Xander Xu, Hu Wei, Chuan Chen, Bing Zhao · arxiv.org

clippings
AI
Engineering
LLM

A discussion of the limits of LLM-based software engineering, specifically the gaps on long-term, complex maintainability issues.

BullshitBench v2 Results: Anthropic & Qwen 3.5 Lead in Bullshit Detection
2026-03-07 · Adam Holter · www.linkedin.com

This may be a fascinating way to assess how useful a model actually is, if the aim is not to have a fawning servant.

AI
Benchmarks
Research
LLM

Only two model families score above 60% on bullshit detection: Anthropic's latest models and Qwen 3.5. Reasoning models score lower, not higher — they build around wrong premises rather than reject them.

The 2026 Global Intelligence Crisis
2026-03-07 · Frank Flight · citadelsecurities.com

A clear take on where we stand in AI adoption, and value creation in particular, as of late February. The limits of "displacement" are clear.

AI
Economics
Labor Market
Macro

Citadel Securities is an award-winning global market-maker across a broad array of fixed income and equity products.

Harness engineering: leveraging Codex in an agent-first world
2026-02-13 · Ryan Lopopolo · openai.com

AI
Agents
Engineering
Codex

Concrete experiments in what it takes to go full Ralph Wiggum, plus OpenAI's learnings.

TBM 406: Seeing Everything, Understanding Nothing
2026-02-13 · John Cutler · cutlefish.substack.com

Essential reading on the dangers of over-contextualizing in AI systems.

AI
Leadership
Thinking

The context trap - AI is supercharging legacy leadership assumptions about context and control.

The Harness Problem
2026-02-12 · Can Bölük · blog.can.ac

The edit tool is the variable that matters most for coding agents.

AI
Agents
Engineering

Improving 15 LLMs at coding in one afternoon. Only the harness changed.

Perplexity Computer Launches AI Tool with Autonomous Capabilities
2026-02-07 · Tim O'Neill · linkedin.com

Analysis of where AI tools are headed in 2026.

AI
Agents
Tools

The race is to the top right, where AI agents work autonomously and have real control over your desktop.

AINews: Anthropic's Agent Autonomy Study
2026-02-06 · Swyx · latent.space

Data on how Claude Code usage grew from 25 mins to 45+ mins.

AI
Agents
Research

Anthropic's study of its own API usage patterns measuring AI agent autonomy in practice.

OpenAI's Agent-First Codebase Learnings
2026-02-05 · Alex Lavaee · alexlavaee.me

OpenAI's Harness team produced ~1,500 merged PRs with 3 engineers.

AI
Agents
Engineering

5-month experiment: build and ship a real product with zero manually-written code.

Anthropic Releases SKILLS for AI Agents
2026-02-04 · Dallin Bentley · linkedin.com

A simple but fundamental shift toward file-system-based agent memory.

AI
Agents
Tools

Files that live alongside your AI agent. The agent can read these files just like it would when working with a codebase.

The "AI Grifter" Crowd and Claude Code
2026-02-02 · Giorgio Vilardo · linkedin.com

This post captures the architectural shift from GUI to CLI-based AI agents and why Linux plus the file system makes such a great foundation.

AI
Agents
Tools

Moving away from the "VS Code clone sidebar" and towards CLI agents.

Just-in-Time Catching Test Generation at Meta
2026-01-30 · Matthew Becker, Yifei Chen, Nicholas Cochran, Pouyan Ghasemi, Abhishek Gulati, Mark Harman, Zachary Haluza, Mehrdad Honarkhah, Herve Robert, Jiacheng Liu, Weini Liu, Sreeja Thummala, Xiaoning Yang, Rui Xin, Sophie Zeng · arxiv.org

AI
Testing
Engineering
Research

Meta's paper on Just-in-Time test generation — automatically generating catching tests at the point of code change to detect regressions before they merge, evaluated across Meta's production codebase.

AGENTS.md Outperforms Skills in Our Agent Evals
2026-01-28 · Jude Gao · vercel.com

Passive context beats active retrieval for AI coding agents.

AI
Agents
Engineering

A compressed 8KB docs index embedded directly in AGENTS.md achieved a 100% pass rate, while skills maxed out at 79%.

So You Want to Hire a Forward Deployed Engineer
2026-01-20 · Tiffany Siu · review.firstround.com

FDEs help build incrementally more valuable products from concrete use cases.

Career
Engineering
Hiring

What FDEs actually do and how to hire the right one for your team.

2026 Interview Questions I'm Asking Engineers
2026-01-15 · Punn Kam · linkedin.com

Already thinking about what this means for hiring in a world where the IDE and hand-coding are not important.

AI
Career
Hiring

"You're in the middle of a refactor and the model says 8% context left before auto-compaction. What do you do?"

Effective Harnesses for Long-Running Agents
2025-11-26 · Anthropic · anthropic.com

The moment that announced we were all in the harness engineering phase of AI engineering, which as of this writing we remain in.

AI
Agents
Engineering

Creating a more effective harness for long-running agents, inspired by human engineers.

Speed at the Cost of Quality: How Cursor AI Increases Short-Term Velocity and Long-Term Complexity
2025-11-08 · Multiple Authors · arxiv.org

MSR 2026 paper on the tradeoffs of AI-assisted coding.

AI
Agents
Research

How Cursor AI increases short-term velocity and long-term complexity in open-source projects.

When ChatGPT Broke an Entire Field: An Oral History
2025-04-30 · John Pavlus · quantamagazine.org

Fascinating oral history of the AI paradigm shift.

AI
History
Research

How LLMs upended the field of NLP in just a few years.

How We Solved Hallucination in LLMs with Open Source Code
2024-12-15 · Leon Chlon, PhD · linkedin.com

Revolutionary approach to hallucination detection.

AI
Research
Open Source

LLM hallucinations aren't bugs - they're compression artifacts.

The Prompt Report
2024-12-12 · Sander Schulhoff · learnprompting.org

80+ page survey of all prompting techniques.

AI
Prompt Engineering
Research

The most comprehensive study of prompting ever done - 1,500+ academic papers analyzed.

Building Effective AI Agents
2024-10-01 · Erik Schluntz & Barry Zhang · anthropic.com

The definitive guide to agent architecture from the team behind Claude in the early days. So many unknown unknowns.

AI
Agents
Best Practices

Best practices and patterns for building production AI agents.

Claude Squad
2024-06-15 · SMTG-AI · github.com

Multi-agent orchestration for terminal-based AI coding. There are many competitors now, but this was a very interesting early implementation, since more or less subsumed by the big providers' coding harnesses.

AI
Tools
Open Source

Manage multiple AI terminal agents like Claude Code, Aider, Codex, OpenCode, and Amp.

The Rise of the AI Engineer
2023-09-01 · Swyx · latent.space

Required reading for anyone building with LLMs professionally.

AI
Career
Engineering

Why AI engineering is becoming the hottest role in tech.

LLM Powered Autonomous Agents
2023-06-23 · Lilian Weng · lilianweng.github.io

The canonical reference for understanding agent design patterns.

AI
LLM
Agents
Research

Comprehensive overview of agent architectures and patterns.

Copyright © 2026 Thomas Neil