Nobody Tests Their AI Agent Skills. Here's Why That's a Problem (and How to Fix It)
47,000+ agent skills across 6,300+ repos. Almost none tested beyond a vibe check. Here's a 4-layer testing architecture that fixes that.
> Complexity is often just an explanation problem.
# the SEMS engineering algorithm
def ship(code):
    if simple(code):
        return "🎉 ship it"
    while not simple(code):
        code = refactor(code)  # delete > add
    return ship(code)
$ ship(production)
> complexity detected: 9001
> applying SEMS…
> ✓ simple. shipped.
How we built Automan - a static, YAML-driven internal portal entirely with Claude Code, no servers, no frameworks.
An internal framework for packaging team domain knowledge, workflows, and automation into reusable Claude modules - shareable, versioned, and improvable.
Stop arguing about quotes and line breaks. Black eliminates formatting debates entirely and gives back hours each week.
Use Cookiecutter to bootstrap consistent pytest projects across an organization. Same skeleton, different teams, zero copy-paste drift.
FastMCP turns Model Context Protocol servers into a few decorators. A walkthrough - plus why this matters for testing automation.
A practical method for designing extensible test automation infrastructure using a JSON schema as the single source of truth and Claude Code as a development partner.
Stop the fix-run-fail-repeat cycle. pytest-check lets you collect multiple assertion failures in a single test run.
Run Llama 3 locally with Ollama and wire it into PyCharm via Continue. A free, offline 'copilot-alike' that keeps your code on your machine.
One client class, two test frameworks. How to share API interaction code between Locust load tests and pytest functional tests.