I built 2 MCP servers that turn Claude into a financial analyst and an SEO auditor
14 Apr 2026, 9:13 am

If you work with AI agents (Claude, Cursor, Windsurf...), you've probably already heard of MCP — the Model Context Protocol. If not, here's the quick summary.
What MCP is and why you should care
MCP is an open standard created by Anthropic that defines how language models connect to external tools. Think of it as the USB-C of AI: a universal protocol that lets any agent use any tool, regardless of who built it.
Before MCP, every integration was ad hoc. Wanted Claude to read your database? Custom plugin. Wanted Cursor to hit your internal API? Another custom plugin, incompatible with the first. MCP standardizes all of this: you define a server with tools, and any MCP client can use them.
The ecosystem is exploding: there are over 17,000 published servers and 97 million monthly npm downloads of the official SDK. But — and this is the key point — the vast majority are thin API wrappers. They hand the model a JSON blob and call it a day. No logic, no analysis, no added value.
I built two servers that try to go further: they don't just return raw data, they process it, compute derived metrics, and give you actionable conclusions. It's the difference between getting a price of $142.50 and getting "NVDA shows a strongly bullish bias with a confirmed Golden Cross, RSI at 62 and positive MACD."
FinanceKit MCP — Financial market intelligence
17 tools (v1.2.0 — 12 base + 5 new premium). The differentiator is the technical analysis engine: it doesn't just pull data from Yahoo Finance. It computes 10 technical indicators, detects patterns, and gives you a plain-text verdict.
What you can ask Claude:
"Run a technical analysis on NVDA"
And you get:
- Overall bias: STRONGLY BULLISH (quantified: 3.0 bullish signals vs 0.5 bearish)
- RSI, MACD, Bollinger Bands, SMA(20/50/200), EMA, ADX, Stochastic, ATR, OBV
- Pattern detection: Golden Cross, Death Cross, overbought/oversold
- 7 detailed signals such as: "Bullish MACD — line above the signal line, both positive. Strong upward momentum."
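To make the pattern detection concrete: a Golden Cross is simply the fast moving average crossing above the slow one. A minimal Python sketch of the idea — not FinanceKit's actual implementation — using the conventional 50/200-day windows:

```python
def sma(prices: list[float], window: int) -> float:
    """Simple moving average over the trailing `window` closes."""
    return sum(prices[-window:]) / window

def golden_cross(prices: list[float], fast: int = 50, slow: int = 200) -> bool:
    """True when the fast SMA crossed above the slow SMA on the latest bar."""
    if len(prices) < slow + 1:
        return False  # not enough history to evaluate the slow average
    was_below = sma(prices[:-1], fast) <= sma(prices[:-1], slow)
    return was_below and sma(prices, fast) > sma(prices, slow)
```

A Death Cross is the mirror image: the fast average crossing below the slow one.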
Other included tools:
| Tool | Description |
|---|---|
| `stock_quote` | Real-time quote for any ticker |
| `company_info` | Full company profile |
| `multi_quote` | Multiple quotes in a single call |
| `crypto_price` | Crypto prices via CoinGecko |
| `crypto_trending` | Trending cryptocurrencies |
| `crypto_top_coins` | Top cryptos by market cap |
| `technical_analysis` | Full technical analysis (10 indicators) |
| `price_history` | Price history |
| `compare_assets` | Asset comparison with Sharpe ratio and drawdown |
| `portfolio_analysis` | Portfolio analysis with sector breakdown |
| `market_overview` | Market overview: S&P 500, NASDAQ, DOW, VIX, top movers |
"Compara AAPL, MSFT y GOOGL en los últimos 6 meses"
Claude te devuelve una tabla comparativa con retorno, volatilidad, Sharpe ratio, max drawdown y correlación entre activos. Listo para tomar decisiones.
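For reference, the two headline risk metrics are easy to compute yourself. A rough sketch using the textbook formulas (annualizing with 252 trading days, risk-free rate assumed zero — not necessarily the server's exact implementation):

```python
import math

def daily_returns(prices: list[float]) -> list[float]:
    """Simple period-over-period returns."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def sharpe_ratio(returns: list[float], periods_per_year: int = 252) -> float:
    """Annualized mean return divided by annualized volatility."""
    mean = sum(returns) / len(returns)
    variance = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)

def max_drawdown(prices: list[float]) -> float:
    """Worst peak-to-trough decline, as a negative fraction."""
    peak, worst = prices[0], 0.0
    for p in prices:
        peak = max(peak, p)
        worst = min(worst, (p - peak) / peak)
    return worst
```

`max_drawdown([100, 120, 90, 110])` is -0.25: the worst slide was 120 → 90.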
SiteAudit MCP — Instant website audits
11 tools (v1.2.0 — 8 base + 3 new premium). Give it any URL and it runs 20+ SEO checks, analyzes security headers, verifies SSL certificates, measures performance, and can compare multiple sites.
What you can ask Claude:
"Audit stripe.com"
The result:
- Overall score: 92/100 (Grade A) — SEO: 94, Performance: 90, Security: 90
- Specific issues: "Title too long (66 chars)", "46/75 images missing alt text"
- SSL certificate status and expiration date
- Performance: 66 ms response time, gzip compression, no redirects
The star tool: compare_sites
"Compare mitienda.com against amazon.es and pccomponentes.com"
One command gets you a side-by-side comparison of SEO, security, and performance against your competition. Ideal for competitive audits, or for convincing a client they need to improve.
All the tools:
| Tool | Description |
|---|---|
| `full_audit` | Full audit (SEO + security + performance) |
| `seo_audit` | 20+ SEO checks |
| `security_audit` | Security headers + SSL |
| `performance_audit` | Performance metrics |
| `compare_sites` | Side-by-side site comparison |
| `lighthouse_audit` | Lighthouse scores and Core Web Vitals |
| `check_links` | Broken-link detection (concurrent checks) |
| `check_robots_txt` | robots.txt analysis |
No API keys required. Install and go. (Lighthouse uses Google's free PageSpeed Insights API — 25K calls/day.)
Installation in 30 seconds
⭐ Recommended option: MCPize (zero setup)
The fastest route. No terminal, no Python install, no cloning repos. Works instantly in Claude Desktop, Cursor, Windsurf, or Claude Code:
👉 Install FinanceKit on MCPize (free, 100 calls/month)
👉 Install SiteAudit on MCPize (free, 100 calls/month)
Or add it directly to your config:
{
"mcpServers": {
"financekit": {
"url": "https://financekit-mcp.mcpize.run/mcp"
},
"siteaudit": {
"url": "https://siteaudit-mcp.mcpize.run/mcp"
}
}
}
Why MCPize
- ✅ Zero setup — no `uv`, no `pip`, nothing to clone
- ✅ Always up to date — new tools and improvements land automatically
- ✅ Generous free tier — 100 calls/month to try it out
- ✅ Scales easily — Pro from $19–29/month when you need it
- ✅ No headaches — you don't manage uptime, rate limits, or updates
Plans and pricing
FinanceKit: Free ($0) → Hobby ($9) → Pro ($29) → Team ($79) → Business ($179) → Enterprise ($499)
SiteAudit: Free ($0) → Hobby ($7) → Pro ($19) → Agency ($49) → Agency Plus ($119) → Enterprise ($349)
Pro Combo bundle: FinanceKit + SiteAudit for $39/month (19% savings)
💻 Advanced option: self-hosted
If you'd rather run it on your own machine (also free, but you manage updates and infrastructure yourself):
Claude Code CLI
claude mcp add financekit -- uvx --from financekit-mcp financekit
claude mcp add siteaudit -- uvx --from siteaudit-mcp siteaudit
Claude Desktop / Cursor / Windsurf (local)
{
"mcpServers": {
"financekit": {
"command": "uvx",
"args": ["--from", "financekit-mcp", "financekit"]
},
"siteaudit": {
"command": "uvx",
"args": ["--from", "siteaudit-mcp", "siteaudit"]
}
}
}
Via pip or Smithery
pip install financekit-mcp siteaudit-mcp
Or from Smithery in one click.
For most users, MCPize is the best option. Choose self-hosted only if you're an advanced dev who wants full control.
Example conversations
Once installed, you talk to Claude naturally:
Finance:
- "Give me Tesla's quote and a technical analysis"
- "Which cryptos are trending right now?"
- "Analyze my portfolio: 40% AAPL, 30% GOOGL, 20% AMZN, 10% BTC"
- "Compare NVDA's performance against AMD over the last year"
Web:
- "Audit my company's site: midominio.com"
- "Compare our SEO against our 3 main competitors"
- "Check the security headers on api.miservicio.com"
- "Are there broken links on docs.miproyecto.dev?"
Claude picks the tools automatically. You don't need to remember function names or parameters.
v1.2.0 — 8 new premium tools
I just shipped v1.2.0 with 8 premium tools (included in the Pro tier and above):
FinanceKit (+5 tools, now 17 total):
- `risk_metrics` — VaR (95%), Sharpe, Sortino, Beta vs benchmark, Max Drawdown
- `correlation_matrix` — Cross-asset correlations + diversification score
- `earnings_calendar` — Upcoming earnings + estimated-vs-reported EPS history
- `options_chain` — Calls/puts with volume, open interest, implied volatility
- `sector_rotation` — The 11 GICS sectors ranked by performance
SiteAudit (+3 tools, now 11 total):
- `accessibility_audit` — WCAG checks: alt text, form labels, heading hierarchy, ARIA
- `schema_validator` — Extracts and validates Schema.org JSON-LD, verifying required fields per type
- `competitor_gap_analysis` — Audits your site against up to 5 competitors and returns the gaps + a priority area
Coming in v1.3: smart alerts, backtesting, a full-site crawler, white-label PDF reports, scheduled audits.
Tech stack
For the curious:
- FastMCP 3.2 as the MCP server framework
- yfinance + the CoinGecko API for financial data
- ta (technical analysis) for indicator calculations
- BeautifulSoup for HTML parsing (SiteAudit)
- Google PageSpeed Insights API for Lighthouse
- TTL-based caching to minimize external API calls
- Distribution via PyPI, Smithery, MCPize, Glama
Why I built this (and why Spanish matters)
MCP is at an inflection point. The ecosystem is growing exponentially, but the overwhelming majority of the content, documentation, and tooling is in English. Search for an MCP server tutorial in Spanish and you'll find... almost nothing. That's both a problem and an opportunity.
These two servers solve real problems I had myself:
Financial analysis: I wanted to ask Claude about any stock or crypto and get a real technical analysis with computed indicators, not a generic answer based on months-old training data. Markets move every second — you need real-time data to make decisions.
Website audits: I used to audit client sites manually with 5 different tools (Google PageSpeed, SecurityHeaders.com, SSL Labs, Screaming Frog...). Now it's one natural-language sentence, and I have the full result in 3 seconds.
The Spanish-speaking developer community is huge — Spanish is the world's second most spoken native language. If you're building developer tools or working with AI agents, MCP is probably the most important protocol you can learn right now. And the sooner you adopt it, the better positioned you'll be.
How to build your own MCP server
If these two servers inspire you to build your own, the process is simpler than it looks. With FastMCP (Python), a basic server is ~30 lines of code:
from fastmcp import FastMCP

mcp = FastMCP("my-server")

@mcp.tool()
def my_tool(param: str) -> str:
    """Description of what the tool does."""
    # Your logic here — compute a result and return it
    return f"processed: {param}"

if __name__ == "__main__":
    mcp.run()
You define functions with the @mcp.tool() decorator, give them a descriptive docstring (this is what the model reads to decide when to use the tool), and FastMCP handles the rest: serialization, protocol, transport.
The trick is layering intelligence on top of the data. Don't just return a raw JSON blob from an API. Process, compute, interpret. That's what makes an MCP server genuinely useful.
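As a tiny illustration of what "interpret" means in practice — turning a raw number into a verdict. The thresholds below are the conventional 30/70 RSI bands, chosen for the example, not necessarily what FinanceKit uses:

```python
def interpret_rsi(rsi: float) -> str:
    """Map a raw RSI reading to a plain-language verdict."""
    if rsi >= 70:
        return f"RSI at {rsi:.0f} — overbought; a pullback is more likely"
    if rsi <= 30:
        return f"RSI at {rsi:.0f} — oversold; a rebound is more likely"
    return f"RSI at {rsi:.0f} — neutral momentum"
```

A model handed `interpret_rsi(62)` can reason about it immediately; a model handed `{"rsi": 62}` has to guess what counts as high.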
Links
| Resource | FinanceKit | SiteAudit |
|---|---|---|
| GitHub | financekit-mcp | siteaudit-mcp |
| PyPI | financekit-mcp | siteaudit-mcp |
| MCPize | financekit-mcp | siteaudit-mcp |
| Landing | financekit-mcp | siteaudit-mcp |
Both are open source (MIT). PRs welcome.
I'm preparing more MCP servers focused on trading and productivity. If you have ideas or want to collaborate, leave a comment or open an issue on GitHub.
If you found this article useful, I also wrote an English version with some additional technical details.
So — which MCP servers are you using or building?
Beyond Meta Tags: The SRE’s Guide to Ranking in 2026
14 Apr 2026, 9:08 am

We have been told for years that "Content is King." But in the high-stakes world of 2026, if your infrastructure is sluggish, your king is invisible.
Working at The Good Shell, I’ve spent the last few months analyzing a recurring pattern among high-growth SaaS and Web3 startups: they have world-class frontend talent and aggressive SEO targets, yet their organic growth is stagnant. After auditing several stacks, the diagnosis is almost always the same. It’s not the keywords. It's the "Technical Debt" living in the infrastructure.
If you are a developer or an SRE, this is why your infrastructure is the most powerful SEO tool you have.
1. The Death of the "Static" SEO Mindset
SEO used to be about what was on the page. Now, it’s about how that page is delivered. Google’s crawlers now operate with a strictly optimized "Crawl Budget."
If your server takes 800ms to respond because your K8s ingress is misconfigured or your database queries are unindexed, Googlebot will simply leave. It’s not that your content isn't good—it’s that Google cannot afford the computational cost to wait for your server.
The takeaway: a slow TTFB (Time to First Byte) is an immediate ranking penalty.
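Measuring TTFB from your own infrastructure takes a few lines of stdlib Python — a quick sketch, not a replacement for proper synthetic monitoring:

```python
import http.client
import time
from urllib.parse import urlsplit

def ttfb_ms(url: str, timeout: float = 10.0) -> float:
    """Milliseconds from sending a GET until the response status line arrives."""
    parts = urlsplit(url)
    cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = cls(parts.netloc, timeout=timeout)
    try:
        start = time.perf_counter()
        conn.request("GET", parts.path or "/")
        conn.getresponse()  # blocks until the first response bytes are read
        return (time.perf_counter() - start) * 1000
    finally:
        conn.close()
```

Graph this over time and watch the slow percentiles: a P95 of 800 ms hurts you even when the average looks healthy.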
2. The Hydration Trap in Modern Frameworks
We all love Next.js, Remix, and Nuxt. But "Hydration" is often where SEO goes to die.
When your infrastructure isn't tuned for Streaming SSR (Server-Side Rendering), the browser spends too much time executing JavaScript before the page becomes "Stable." This tanks your CLS (Cumulative Layout Shift) and LCP (Largest Contentful Paint).
At The Good Shell, we recently helped a client move logic from the heavy main server to the Edge. By utilizing Edge Middleware to handle geo-location and A/B testing instead of doing it at the origin, we dropped the LCP by 1.2 seconds. That change alone moved them from the second page of Google to the top 3 spots for their main keywords.
3. Scaling Infrastructure vs. Search Stability
One thing people rarely discuss is how infrastructure instability affects indexation.
Imagine Googlebot crawls your site during a deployment. If your CI/CD pipeline doesn't handle Zero-Downtime Deployments correctly, or if your health checks are too slow to pull a failing pod out of the rotation, the crawler hits a 5xx error.
To Google, a 5xx error isn't just a temporary glitch; it's a signal of unreliability. If it happens twice, your crawl frequency drops.
Pro-tip: Use tools like Prometheus and Grafana not just to monitor "Uptime," but to monitor "Crawl Health." If you see an increase in 4xx/5xx errors coinciding with your deployment windows, your SEO is bleeding.
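A crude way to put a number on "crawl health" straight from your access logs. This sketch assumes the common combined log format, where the status code follows the quoted request line:

```python
import re

GOOGLEBOT = re.compile(r"Googlebot")
STATUS = re.compile(r'" (\d{3}) ')  # status code right after the quoted request

def crawl_error_rate(log_lines) -> float:
    """Share of Googlebot requests that returned a 4xx/5xx status."""
    total = errors = 0
    for line in log_lines:
        if not GOOGLEBOT.search(line):
            continue
        m = STATUS.search(line)
        if not m:
            continue
        total += 1
        if m.group(1)[0] in "45":
            errors += 1
    return errors / total if total else 0.0
```

Run it over the log window of each deployment and compare against baseline; a spike during deploys is exactly the signal described above.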
4. The FinOps of SEO: Efficiency is a Feature
There is a direct correlation between resource efficiency and performance. An over-provisioned, messy Kubernetes cluster is often a slow one.
When we talk about FinOps (Cloud Cost Optimization), we aren't just saving money. We are removing the overhead that adds latency.
Over-instrumentation: Too many sidecars in your service mesh can add micro-latencies that aggregate.
Database Contention: Slow DB responses kill your TTFB.
By cleaning up the architecture, you aren't just lowering the AWS bill; you are giving Googlebot a "green light" to crawl more of your site, faster.
Conclusion: The Bridge
Technical SEO in 2026 is no longer about "tricking" a search engine. It’s about building a bridge between Marketing and SRE.
If you want to stay competitive:
Move logic to the Edge whenever possible.
Audit your TTFB with the same intensity you audit your code.
Bring SREs into the SEO conversation. Infrastructure isn't just a cost center; it's the foundation of your growth strategy. If the foundation is shaky, the skyscraper will never reach the clouds.
I’m curious—how many of you have seen a direct correlation between infrastructure upgrades and organic traffic? Let’s discuss in the comments.
Building Production-Grade Observability: OpenTelemetry + Grafana Stack
14 Apr 2026, 9:05 am

Stop guessing what's broken in production. Here's a complete, deploy-it-this-week observability stack built on OpenTelemetry and Grafana — the same stack I've deployed for three clients in the last 18 months.
This isn't a toy setup. This is production-grade: traces, metrics, and logs unified under a single pane of glass, with auto-instrumentation for the most common runtimes, alerting that pages on symptoms not causes, and dashboards your non-SRE teammates can actually read.
What you'll build:
OpenTelemetry Collector (gateway mode) for vendor-agnostic telemetry collection
Grafana Tempo for distributed tracing
Prometheus + Grafana Mimir for metrics at scale
Loki for structured log aggregation
Grafana dashboards with pre-built SLO panels
AlertManager rules tied to error budgets
Prerequisites: Kubernetes 1.25+, Helm 3, basic familiarity with YAML. Estimated time: 3–5 hours end to end.
Why OpenTelemetry? The vendor-lock argument settled once and for all
You’ve heard it before: “Just use Datadog.” Then the bill arrives. Or “Use Prometheus alone.” Then you lose traces.
OpenTelemetry (OTel) is the single CNCF standard for generating and exporting telemetry data. Here’s why it wins:
One instrumentation, many backends: Instrument your app once with OTel SDKs. Send to Tempo, Jaeger, Datadog, or New Relic simultaneously.
No vendor lock-in: Your telemetry data remains in your control (S3 for traces, block storage for metrics).
Automatic context propagation: Trace IDs flow seamlessly across services, even across different languages (Java → Python → Node.js).
Future-proof: New backends emerge? Point your OTel Collector there. No code changes.
The bottom line: OTel is the USB-C of observability. Stop writing custom exporters.
Architecture overview: Collector, Backends, Visualization
Here’s what you’re deploying:
[Your App] --(OTLP)--> [OTel Collector (Gateway)] --+--> [Tempo] (traces)
+--> [Mimir] (metrics)
+--> [Loki] (logs)
|
[Grafana] (visualization)
|
[AlertManager] (paging)
OTel Collector (Gateway mode): Receives OTLP from all services. Validates, batches, and routes telemetry. Single ingress point.
Tempo: Object-storage-backed tracing. Cheap, scalable, no indexing costs.
Mimir: Horizontally scalable Prometheus-compatible metrics store.
Loki: Log aggregation with low-cost object storage.
Grafana: Unified UI with Explore, dashboards, and alerting.
AlertManager: Deduplicates, groups, and routes alerts to PagerDuty/Slack.
Storage requirements (minimal): 50GB for Loki, 100GB for Tempo (can use S3/GCS/MinIO), 50GB for Mimir.
Installing the OTel Collector (gateway mode Helm values)
Create otel-collector-values.yaml
mode: deployment # gateway mode (as opposed to daemonset for agent mode)
config:
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 1s
send_batch_size: 1024
memory_limiter:
check_interval: 1s
limit_mib: 512
attributes:
actions:
- key: environment
value: production
action: upsert
exporters:
otlp/tempo:
endpoint: "tempo-distributor:4317"
tls:
insecure: true
prometheusremotewrite/mimir:
endpoint: "http://mimir-distributor:8080/api/v1/push"
loki:
endpoint: "http://loki-gateway:3100/loki/api/v1/push"
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter, batch, attributes]
exporters: [otlp/tempo]
metrics:
receivers: [otlp]
processors: [memory_limiter, batch]
exporters: [prometheusremotewrite/mimir]
logs:
receivers: [otlp]
processors: [memory_limiter, batch]
exporters: [loki]
Deploy
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm upgrade --install otel-collector open-telemetry/opentelemetry-collector -f otel-collector-values.yaml
Auto-instrumentation: Java, Python, Node.js, Go
No code changes for traces/metrics/logs. Use OTel's auto-instrumentation agents.
Java (Spring Boot, any JVM app)
ENV JAVA_TOOL_OPTIONS="-javaagent:/otel/opentelemetry-javaagent.jar"
ENV OTEL_SERVICE_NAME=payment-service
ENV OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
Python (Django, Flask, FastAPI)
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
opentelemetry-instrument \
--service_name checkout-service \
--exporter_otlp_endpoint http://otel-collector:4317 \
python app.py
Node.js (Express, NestJS)
npm install @opentelemetry/auto-instrumentations-node
env OTEL_SERVICE_NAME=api-gateway \
    OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317 \
    node --require @opentelemetry/auto-instrumentations-node/register server.js
Go (manual instrumentation required, but minimal)
import (
    "context"

    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
)

func initTracer() error {
    ctx := context.Background()
    exporter, err := otlptracegrpc.New(ctx,
        otlptracegrpc.WithEndpoint("otel-collector:4317"),
        otlptracegrpc.WithInsecure())
    if err != nil {
        return err
    }
    _ = exporter // wire into a TracerProvider — standard setup (~5 lines)
    return nil
}
Verify: check the Collector logs for incoming spans (you should see TraceIDs).
Deploying Tempo for distributed tracing
Tempo is designed for cost-effective tracing. It stores traces in object storage (S3/MinIO) and indexes only by trace ID.
tempo-values.yaml
tempo:
storage:
trace:
backend: s3
s3:
bucket: tempo-traces
endpoint: minio.minio:9000
access_key: "minioadmin"
secret_key: "minioadmin"
insecure: true
pool:
max_workers: 100
queue_depth: 10000
overrides:
defaults:
ingestion:
rate_limit_bytes: 15000000 # 15MB/s
burst_size_bytes: 20000000
distributor:
config:
receivers:
otlp:
protocols:
grpc:
endpoint: "0.0.0.0:4317"
Deploy
helm repo add grafana https://grafana.github.io/helm-charts
helm upgrade --install tempo grafana/tempo -f tempo-values.yaml
Query Tempo from Grafana: Add data source → Tempo → URL: http://tempo-query-frontend:16686
Prometheus + Mimir for long-term metrics storage
Mimir replaces single-instance Prometheus. It provides horizontal scaling, replication, and long-term retention.
mimir-values.yaml
mimir:
structuredConfig:
blocks_storage:
backend: s3
s3:
endpoint: minio.minio:9000
bucket_name: mimir-blocks
access_key_id: "minioadmin"
secret_access_key: "minioadmin"
insecure: true
ingester:
ring:
replication_factor: 3 # for HA
ruler:
rule_path: /data/rules
alertmanager_url: http://alertmanager:9093
ingester:
replicas: 3
distributor:
replicas: 2
querier:
replicas: 2
Deploy
helm upgrade --install mimir grafana/mimir -f mimir-values.yaml
Migrate existing Prometheus data
promtool tsdb create-blocks-from rules recording-rules.yaml data/
Then point Prometheus remote write to http://mimir-distributor:8080/api/v1/push.
Loki for log aggregation with structured querying
Loki is like Prometheus for logs. It indexes only labels, not full text, making it cheap at scale.
loki-values.yaml
loki:
storage:
type: s3
s3:
endpoint: minio.minio:9000
bucketnames: loki-chunks
access_key_id: "minioadmin"
secret_access_key: "minioadmin"
s3forcepathstyle: true
insecure: true
schemaConfig:
configs:
- from: 2024-01-01
store: boltdb-shipper
object_store: s3
schema: v12
index:
prefix: loki_index_
period: 24h
limits_config:
ingestion_rate_mb: 10
ingestion_burst_size_mb: 20
max_global_streams_per_user: 10000
chunk_store_config:
max_look_back_period: 672h # 28 days
Deploy
helm upgrade --install loki grafana/loki -f loki-values.yaml
Query example (LogQL)
{namespace="production", app="payment-service"} |= "error"
| json
| latency_ms > 500
| line_format "{{.trace_id}} - {{.message}}"
Grafana: Connecting all three data sources
grafana-values.yaml
datasources:
datasources.yaml:
apiVersion: 1
datasources:
- name: Prometheus-Mimir
type: prometheus
url: http://mimir-query-frontend:8080/prometheus
access: proxy
isDefault: true
- name: Tempo
type: tempo
url: http://tempo-query-frontend:16686
access: proxy
jsonData:
tracesToLogs:
datasourceUid: 'loki'
tags: ['service.name', 'pod']
serviceMap:
enabled: true
- name: Loki
type: loki
url: http://loki-gateway:3100
access: proxy
jsonData:
derivedFields:
- name: trace_id
matcherRegex: 'trace_id=(\w+)'
url: '$${__value.raw}'
datasourceUid: 'tempo'
dashboardProviders:
dashboardproviders.yaml:
apiVersion: 1
providers:
- name: 'slo'
orgId: 1
folder: 'SLO Dashboards'
type: file
options:
path: /var/lib/grafana/dashboards
Deploy
helm upgrade --install grafana grafana/grafana -f grafana-values.yaml
Test correlation: In Loki, find a log with trace_id=abc123. Click it → jumps to Tempo trace. In Tempo, see affected service → jumps to Mimir metrics for that service.
Building your first SLO dashboard (template included)
Save as slo-dashboard.json and mount into Grafana
{
"title": "SLO Dashboard - Payment Service",
"panels": [
{
"title": "Availability (30d SLI)",
"targets": [{
"expr": "sum(rate(http_requests_total{status!~'5..'}[$__range])) / sum(rate(http_requests_total[$__range]))",
"legendFormat": "Availability SLI"
}],
"thresholds": [
{"color": "red", "value": null, "op": "lt", "valueType": "absolute", "value": 0.995},
{"color": "yellow", "value": null, "op": "lt", "valueType": "absolute", "value": 0.999},
{"color": "green", "value": null, "op": "gte", "valueType": "absolute", "value": 0.999}
]
},
{
"title": "Error Budget Remaining (30d)",
"targets": [{
"expr": "(1 - (sum(rate(http_requests_total{status=~'5..'}[30d])) / sum(rate(http_requests_total[30d])))) / 0.999",
"legendFormat": "Budget remaining"
}],
"fieldConfig": {
"defaults": {
"unit": "percentunit",
"min": 0,
"max": 1,
"color": {"mode": "thresholds"},
"thresholds": [
{"color": "red", "value": null, "op": "lt", "value": 0.7},
{"color": "yellow", "value": null, "op": "lt", "value": 0.9},
{"color": "green", "value": null, "op": "gte", "value": 0.9}
]
}
}
},
{
"title": "Latency P99 (30d SLI)",
"targets": [{
"expr": "histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[$__range])) by (le))",
"legendFormat": "P99 latency"
}]
}
]
}
SLO math explained
Availability target: 99.9% → error budget = 0.1% of requests can fail.
Budget remaining: (actual_availability - target) / (1 - target) → 1.0 means on track, 0 means exhausted.
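The same math as a two-line sketch, handy for sanity-checking the panel values:

```python
def error_budget_remaining(availability: float, target: float = 0.999) -> float:
    """1.0 = budget untouched, 0.0 = exhausted, negative = SLO already breached."""
    return (availability - target) / (1 - target)
```

For example, at 99.95% measured availability against a 99.9% target, half the budget is left.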
AlertManager: Alerting on symptoms, not causes
Bad alert: "CPU on pod payment-7d8f9 is 92%" (cause)
Good alert: "Payment service error budget exhausted" (symptom)
alertmanager-config.yaml
route:
group_by: ['alertname', 'service']
group_wait: 10s
group_interval: 10s
repeat_interval: 12h
receiver: 'pagerduty-critical'
routes:
- match:
severity: critical
receiver: pagerduty-critical
continue: false
- match:
severity: warning
receiver: slack-warnings
receivers:
- name: 'pagerduty-critical'
pagerduty_configs:
- service_key: <your-pd-key>
severity: critical
- name: 'slack-warnings'
slack_configs:
- api_url: <webhook>
channel: '#alerts-warning'
Prometheus alerting rule example (slo-alerts.yaml)
groups:
- name: slo
rules:
- alert: ErrorBudgetExhausted
      expr: |
        ((1 - (sum by (service) (rate(http_requests_total{status=~"5.."}[30d]))
          / sum by (service) (rate(http_requests_total[30d])))) - 0.999) / (1 - 0.999) < 0.2
for: 5m
labels:
severity: critical
service: "{{$labels.service}}"
annotations:
summary: "Error budget for {{$labels.service}} is below 20%"
description: "Remaining budget: {{$value | humanizePercentage}}"
Deploy
kubectl create secret generic alertmanager-config --from-file=alertmanager.yaml=alertmanager-config.yaml
helm upgrade --install prometheus prometheus-community/prometheus \
--set alertmanager.enabled=true \
--set alertmanager.configFromSecret=alertmanager-config
The 3 dashboards every on-call engineer needs
Stop building 50-panel dashboards. Start with these three.
Dashboard 1: Service Health (RED method)
Rate (requests per second) per endpoint
Errors (5xx rate, grouped by status code)
Duration (P50, P95, P99 latency)
Saturation (CPU/memory per pod, queue depth)
PromQL snippets
# Rate
sum(rate(http_requests_total[1m])) by (service, endpoint)
# Error ratio
sum(rate(http_requests_total{status=~"5.."}[1m])) / sum(rate(http_requests_total[1m]))
# P99 latency
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))
Dashboard 2: Trace Explorer
Top 10 slowest traces in last hour
Trace heatmap (duration vs. timestamp)
Service dependency graph (from Tempo service graph)
High-error traces panel (filter by status.error=true)
Dashboard 3: The "Burndown" Chart
Error budget remaining (daily trend line)
SLO burn rate (1h, 6h, 24h windows)
Multi-burn alert status (green/yellow/red)
Top offending services by error budget consumption
Why this works: On-call opens Dashboard 1 → sees elevated latency → clicks a trace in Dashboard 2 → finds slow database query → checks Dashboard 3 to decide if paging SREs is urgent.
Final checklist for production readiness
Before you sleep soundly:
Ingestion testing: curl a test span/metric/log through the Collector.
Retention: Set Mimir 30d, Tempo 14d, Loki 30d (adjust to compliance).
Auth: Add Grafana OAuth (Google/GitHub) and basic auth for Mimir/Loki ingesters.
Backups: Object storage (MinIO/S3) should have versioning enabled.
Alert testing: Silence a service, verify PagerDuty gets the page.
Runbook: Link each alert to a Confluence doc (e.g., "ErrorBudgetExhausted → https://wiki/runbooks/slo").
What’s next? Add OpenTelemetry for your database (PostgreSQL, Redis, MongoDB) using OTel collector receivers. Or add synthetic monitoring with Blackbox exporter.
You now have the same stack that cost my clients $0/month (excluding storage) instead of $15k/month for Datadog. Ship it.
Your Cron Jobs Are Silently Failing. Here’s How to Know in 30 Seconds.
14 Apr 2026, 9:00 am

"My database backup script broke 11 days before I found out. Credentials got rotated, pg_dump started erroring, and cron just kept running it on schedule like nothing was wrong. No email. No alert. Eleven days of no backups. I only found out because I needed to restore something."
Sound familiar? If you've run cron jobs in production, you've probably been here.
Cron doesn't know your job failed
This is the part that gets people. Cron's job is to start your command at the time you told it to. That's it. If the command exits 1, cron doesn't care. If it hangs forever, cron doesn't care. If the server reboots and the cron daemon doesn't come back up, nobody cares.
You find out when:
- A customer asks why their data is stale
- A queue fills up because the consumer job stopped
- You manually check a dashboard and notice the last run was 9 days ago
The fix is one line
After your job completes successfully, ping an external URL. If the ping stops arriving, you get alerted. That's the whole idea.
# before
0 2 * * * /scripts/backup-db.sh
# after
0 2 * * * /scripts/backup-db.sh && curl -fsS https://cronsignal.io/ping/abc123
The && means the curl only fires if the script exits 0. Script fails? No ping. Script hangs? No ping. Server goes down? No ping. In all cases, you hear about it.
I built CronSignal for this because I wanted something stupid simple. You create a check, tell it how often to expect a ping, and add the curl. If the ping is late, it hits you on email, Slack, Discord, Telegram, or webhook. Setup is maybe 30 seconds.
3 monitors are free. If you need more, it's $5/month flat for unlimited. No per-monitor pricing nonsense.
GitHub Actions has its own version of this problem
If you use schedule triggers in GitHub Actions, you've probably noticed they're... unreliable. GitHub can delay scheduled runs by minutes or hours. If your repo goes 60 days without a push, GitHub silently disables the schedule entirely. No warning.
I made a GitHub Action for this:
name: Nightly Build
on:
schedule:
- cron: '0 2 * * *'
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: npm run build
- run: npm test
- name: Ping CronSignal
if: success()
uses: CronSignal/ping-action@v1
with:
check-id: ${{ secrets.CRONSIGNAL_CHECK_ID }}
If the workflow gets delayed, skipped, or disabled, you know about it the same day instead of weeks later.
Works with basically anything
The curl pattern works anywhere. Not just crontab:
- systemd timers — `ExecStartPost` directive
- Kubernetes CronJobs — final container step
- Laravel — `$schedule->command('...')->after(function() { Http::get('...'); })`
- Django/Celery — `requests.get()` at the end of your task
- Node — a `fetch()` call after your logic
The point is: don't monitor the scheduler. Monitor whether the job actually finished. Those are different things.
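The same pattern in Python, for jobs that aren't shell scripts (the ping URL is whatever your heartbeat service hands you):

```python
import urllib.request

def run_with_heartbeat(job, ping_url: str) -> None:
    """Run `job`; ping the heartbeat URL only if it finishes without raising.

    This is exactly the `&& curl` semantics: an exception skips the ping,
    and the missing ping is what triggers the alert.
    """
    job()
    urllib.request.urlopen(ping_url, timeout=10).read()
```

Wrap your task entry point in it and you're done — no try/except needed, because a crash is supposed to skip the ping.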
Anyway
If you're running cron jobs without monitoring, you're going to have a bad time eventually. The fix is one curl command and 30 seconds of setup.
CronSignal if you want to try it. Or use any heartbeat monitoring service — Healthchecks.io, Dead Man's Snitch, whatever. Just use something. The && curl pattern works the same regardless.
PHP-FPM, workers and goroutines: what actually happens under load
14 Apr 2026, 9:00 am

The API runs on a 4 GB RAM VPS, Nginx in front, PHP-FPM configured with 50 workers. A traffic spike — nothing exceptional, just a marketing campaign — and in 8 seconds the pool is saturated. CPU at 30%, server down. Monitoring showed 502s and 504s in bursts. The bottleneck wasn't the CPU. It was RAM and the exhausted pool.
Six months later, migration to Go. Not out of hype, but because we understood the model. The difference in behavior under load wasn't about raw speed — it was about how each runtime handles concurrency.
Note: if you're looking for a general language comparison, I wrote a dedicated article. This one goes one level deeper: the mechanics.
The PHP-FPM model in one picture
Nginx (or Apache) acts as a reverse proxy: it handles TLS, serves static files, buffers incoming requests. It doesn't run PHP. PHP-FPM maintains a pool of forked OS processes ready to execute PHP.
The flow is simple:
client → Nginx → PHP-FPM queue → [worker pool]
↑
if pool full: queue → timeout → 502/504
Each worker is an independent OS process. It consumes between 30 and 60 MB of RAM at minimum, depending on what the application loads into memory. This memory is not shared between workers — each has its own memory space. The pool is sized at configuration time, not at actual load.
Saturation isn't a bug. It's the expected behavior of the model.
[www]
pm = dynamic
pm.max_children = 50 ; 50 workers × 50 MB = 2.5 GB RAM reserved
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
pm.max_requests = 500 ; Recycle workers after 500 req (avoids memory leaks)
With this configuration on a 4 GB VPS, PHP-FPM can consume up to 2.5 GB of RAM before the application even starts actually processing data.
What happens when the pool is full
Exact sequence: a request arrives, Nginx buffers it, PHP-FPM tries to assign a worker. If pm.max_children is reached, the request waits in queue. If the queue is full or fastcgi_read_timeout expires, Nginx returns a 504. Clients see an error. The CPU, meanwhile, is idle.
The math is brutal: 50 workers × 50 MB = 2.5 GB of RAM consumed just for the PHP pool, before logs, cache, Nginx itself. Concurrency is bounded by RAM, not by compute power.
This model has a direct consequence on persistent connections. An SSE or WebSocket keeps a worker occupied for its entire lifetime. 50 simultaneous SSE connections = 50 blocked workers = pool saturated for everything else.
ps --no-headers -o rss -C php-fpm | awk '{sum+=$1} END {print sum/1024 " MB"}'
This command gives the actual RAM consumption of all php-fpm processes in production. Useful to run before sizing the pool.
The Go model — goroutines and M:N scheduling
Go doesn't fork processes. It doesn't maintain a thread pool. Its runtime implements an M:N scheduler: N goroutines multiplexed onto M OS threads, where M corresponds to GOMAXPROCS (defaults to the number of cores).
A goroutine starts with a stack of 8 KB — versus around 8 MB for an OS thread. This stack grows dynamically if needed, but stays lightweight while the goroutine is blocked on I/O. The runtime parks it and uses the OS thread for something else.
With net/http, each incoming connection spawns a goroutine. 10,000 simultaneous connections ≈ 80 MB of goroutine stacks. On a PHP-FPM setup with 50 workers, you'd be at 2.5 GB and drowning in 502s long before that.
There's no PHP-FPM, no worker pool. The Go binary is the server. Nginx can still sit in front for TLS, compression, static file caching, rate limiting — but not to manage application concurrency.
On goroutine lifecycle and leak risks, I detailed the patterns to avoid in the article Goroutine leaks in Go: detect, understand, fix.
package main

import (
    "log"
    "net/http"
    "time"
)

func handler(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("OK"))
}

func main() {
    mux := http.NewServeMux()
    mux.HandleFunc("/", handler)

    srv := &http.Server{
        Addr:         ":8080",
        Handler:      mux,
        ReadTimeout:  5 * time.Second,
        WriteTimeout: 10 * time.Second,
        IdleTimeout:  120 * time.Second,
    }

    log.Println("Listening on :8080")
    log.Fatal(srv.ListenAndServe())
}
Server timeouts are non-negotiable in production. Without ReadTimeout, a slow client can hold a connection open indefinitely, and the associated goroutine never gets released.
Degradation under load — compared behavior
PHP-FPM: cliff-edge degradation
Below max_children, everything works normally. Beyond it: queue, timeout, 502. Degradation is binary — the service responds or it doesn't. No middle ground, no latency that gradually climbs. The server goes from "operational" to "erroring" in a few seconds.
Go: progressive degradation
Goroutines accumulate in memory. Latency climbs linearly. There's no "pool full" — as long as RAM allows, Go keeps accepting connections. With context.WithTimeout correctly propagated, slow requests release their goroutines cleanly on expiry.
Without timeout — potentially orphaned goroutine:
func handler(w http.ResponseWriter, r *http.Request) {
    result := fetchFromDB() // can take 30 seconds
    w.Write(result)
    // if the client disconnects, the goroutine keeps running
}
With context timeout — clean release:
func handler(w http.ResponseWriter, r *http.Request) {
    ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
    defer cancel()

    result, err := fetchFromDBCtx(ctx)
    if err != nil {
        http.Error(w, "timeout", http.StatusGatewayTimeout)
        return
    }
    w.Write(result)
    // when the client disconnects, r.Context() is cancelled → ctx too → fetchFromDBCtx exits cleanly
}
The difference in behavior under load between the two models often comes down to this context propagation. In Go, a goroutine that isn't properly anchored to a context becomes a goroutine leak — invisible until RAM saturates.
Operational consequences
The choice isn't "PHP is slow, Go is fast". It's a question of fit between the concurrency model and the project's constraints.
PHP + Nginx wins when:
- Shared hosting, CMS (WordPress, Drupal), Composer ecosystem already in place
- Traffic below ~1,000 req/min, existing PHP team, legacy code
- FTP or simple git push deployment without system access
Go wins when:
- High-frequency APIs (> 10,000 req/min), WebSockets, SSE at volume
- Constrained VPS budget: 512 MB of RAM can sustain thousands of lightweight Go connections
- Low-footprint microservices, single binary to deploy
For the concrete case of SSE with PHP and the necessary workarounds, I detailed the approach in the article SSE, PHP-FPM and chatbox: working with workers.
| Criterion | PHP + Nginx/FPM | Go (net/http) |
| --- | --- | --- |
| Concurrency unit | OS process (~50 MB) | Goroutine (~8 KB) |
| Natural limit | Pool size (RAM) | Total RAM (progressive degradation) |
| Behavior at saturation | Queue + timeout + 502 | Latency climbs, connections held |
| Persistent connections (SSE, WS) | 1 blocked worker | 1 sleeping goroutine (~8 KB) |
| 2 GB RAM VPS | ~30-40 workers max | ~100k lightweight connections |
| Deployment | FTP, shared hosting, CMS ready | Single binary, systemd |
| Shared hosting | Yes (everywhere) | No (VPS minimum) |
| CMS/libs ecosystem | Huge | Minimal on the classic web side |
| Best for | Sites, CMS, API < 1k req/min | High-frequency API, real-time, microservices |
Conclusion
Most Go migrations I've seen — or done — start from a bad surprise with PHP-FPM under load. A surprise that was avoidable with a load test upfront, before going to production.
PHP-FPM is robust and predictable. Its only real flaw: it's opaque until the moment the pool is full. No gradual warning, no graceful degradation. Once you understand this mechanic, you choose with full awareness — and you often stay on PHP, just better sized.
The right tool isn't the one that holds up best. It's the one whose breaking point you understand.
Building a Monetized API (Part 2 of 4)
14 Apr 2026, 9:00 am

This is Part 2 of the "Building a Monetized API" series. In Part 1, we set up the Zuplo API gateway for our Vercel-hosted changelog API, imported endpoints, added authentication, and configured rate limiting. In this post, we're adding the monetization layer on top of that.
Setting Up Meters and Features
Everything starts in the Zuplo monetization service. Before you create any plans, you need to define what you're actually charging for. That means setting up meters (the things you count) and features (the things customers get access to).
For our changelog API, we're metering API requests and gating access to an MCP server (which we'll add in Part 3).
Create a Meter
Go to Services > Monetization Service in your Zuplo project. Add a blank meter and call it requests, with an event name of requests. This meter will count every API request that gets made.
Define Features
Features map to what shows up on your pricing table. Not all features work the same way. Some are metered (counted against a usage limit), some are boolean (on or off), and some are purely for display on the pricing page. We need three:
Requests: linked to the requests meter. This is the usage-based feature that gets counted against a limit. When a subscriber makes an API call, this feature's counter increments.
MCP Server: not linked to any meter. This is a boolean feature: you either have access or you don't. Plans can grant or deny it.
Monthly Fee: not linked to any meter. This represents the flat subscription cost on paid plans. It shows the price on the pricing table but doesn't gate anything.
Creating the Plans
With meters and features in place, you can start defining plans. We're building three: Free, Starter, and Pro.
Free Plan
Add a new plan called free with monthly billing. This is a single-phase plan that runs indefinitely (or until the subscriber cancels).
Add the requests feature to the phase with a free pricing model. Set the usage limit to 20. That's intentionally low for demonstration purposes, but it shows how hard limits work on free plans. Once a free user hits 20 requests in a billing period, they're cut off.
Starter Plan ($29.99/month)
The Starter plan introduces two concepts: a flat monthly fee and graduated overage pricing.
First, add the monthly fee feature as a flat fee of $29.99, paid in advance once per month.
Then add the requests feature using a tiered pricing model with graduated mode. Set up two tiers:
- First 5,000 requests: $0 (included in the monthly fee)
- 5,001 to unlimited: $0.10 per request
Set the usage limit to 5,000 and toggle on soft limit. This is important. A soft limit means that when a subscriber hits 5,000 requests, access doesn't stop. Instead, every request beyond 5,000 gets billed at $0.10 each. At the end of the billing cycle, Stripe charges the subscriber for $29.99 plus whatever overage they incurred.
Finally, add the MCP server feature with a free pricing model and a boolean entitlement set to true. Starter subscribers get MCP server access included in their monthly fee.
Pro Plan ($99.99/month)
The Pro plan follows the exact same setup steps as Starter, just with different values: $99.99/month, 50,000 included requests, and $0.01 per request overage (also with a soft limit). MCP server access is included. There's nothing new to configure here. If you set up the Starter plan, you already know how to do this one.
Publish and Reorder
Once all three plans are created as drafts, reorder them in the pricing table so they display as Free, Starter, then Pro. Then publish them. These become the live plans available on your developer portal.
If you want to change plans later, you create new drafts and publish them when ready. They'll replace the existing versions or sit alongside them as additional options.
💡 Monetization Plans and Pricing — Details on phases, free trials, pricing models, and advanced plan configuration.
Connecting Stripe
Before monetization can work end to end, you need a payment provider. Go to the monetization service settings and configure your Stripe integration by pasting in your Stripe secret key. You can find this in the Stripe Dashboard under Developers > API keys.
Use your Stripe test key (sk_test_...) during development. The sandbox environment lets you simulate the full checkout flow with fake credit card numbers from the Stripe documentation. Swap to your production key when you go live.
💡 Stripe Integration — Full setup instructions for connecting Stripe to the Zuplo monetization service.
Adding the Monetization Policy to Endpoints
With plans, meters, and Stripe configured, you still need to tell your endpoints to actually meter requests. This is done with the monetization inbound policy.
The monetization policy does double duty. It handles both API key authentication and request metering in a single policy, so you don't need a separate API key auth policy anymore. If you had one set up previously (like we did in Part 1 for testing), remove it.
The policy configuration is straightforward. The only thing you need to specify is the meter name and the increment value:
{
  "handler": {
    "export": "MonetizationInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "meters": {
        "requests": 1
      }
    }
  }
}
Apply this policy to all your endpoints. In the policy chain, make sure monetization comes first (before rate limiting and any other policies) since it needs to authenticate the API key and meter the request before anything else happens.
💡 Monetization Policy — Full policy reference including metering options, caching, and advanced configuration.
Enabling Monetization on the Developer Portal
The developer portal doesn't know about monetization by default. You need to add the monetization plugin to your Zudoku configuration.
In your project's docs folder, open the Zudoku config file and import the monetization plugin:
import { zuploMonetizationPlugin } from "@zuplo/zudoku-plugin-monetization";
Then add it to your plugins array:
plugins: [
// ...other plugins
zuploMonetizationPlugin(),
],
Once saved, the developer portal pulls in all the plan data, pricing tables, and subscription management UI from the monetization service automatically.
💡 Developer Portal Setup — Full setup instructions for enabling monetization in the developer portal.
Testing the Full Flow
With everything wired up, the developer portal now shows a pricing table with all three plans. Here's what the subscriber experience looks like:
Sign up and subscribe. A new user logs in (or signs up) on the developer portal. They see the pricing table and can select a plan. Subscribing to the free plan skips Stripe checkout entirely. Paid plans redirect to a Stripe checkout page.
Get API keys. After subscribing, the user lands on a "My Subscriptions" page with their subscription details, usage analytics, and API keys. Keys are provisioned automatically and can be rolled or deleted from this page.
Make requests. API keys work immediately. The user can test directly from the interactive API reference in the developer portal, where their key is pre-populated in the auth dropdown.
Track usage. Every request increments the meter. The subscription page shows real-time usage against the plan's limit. On the free plan with a hard limit of 20, the 21st request gets blocked. On paid plans with soft limits, requests beyond the included amount get billed as overage.
Upgrade plans. Subscribers can switch plans from the subscription management page. Upgrading to a paid plan triggers Stripe checkout. Downgrading is available too. Plan changes take effect immediately, and the previous plan shows as expired in the subscription history.
View subscribers. Back in the Zuplo monetization service, the subscribers table shows every customer, their subscription history, and their current plan status.
What we built in Part 2
At this point, the monetization layer is fully wired up:
- [x] Meters tracking every API request
- [x] Three plans (Free, Starter, Pro) with hard and soft limits
- [x] Stripe connected for checkout and billing
- [x] Monetization policy replacing the standalone API key auth
- [x] Developer portal with a self-serve pricing table and subscription management
- [x] End-to-end flow tested: sign up, subscribe, get keys, make requests, track usage
Everything from Part 1 (origin auth, consumer isolation, rate limiting) still works. The monetization policy took over API key authentication, but the custom header policy that sets the gateway secret and consumer ID didn't need to change at all.
What's Next
In Part 3, we'll add an MCP server to the project and write custom code to feature-gate it so that only paid subscribers can access it. After that, in Part 4, we'll polish the developer portal to make it look production-ready.
IPv6 Is “The Future of the Internet” — So Why Did It Break My Streaming App in 2024?
14 Apr 2026, 8:59 am

A personal debugging incident that turned into an industry-wide infrastructure audit.
Last week I spent 45-50 minutes convinced my LG WebOS TV or my ISP had quietly broken something. JioHotstar — India's dominant streaming platform — was refusing to play anything. Every title. Every time. Error code DR-6006_X: "We are having trouble playing this video right now."
I did what everyone does. Restarted the router. Restarted the TV. Unplugged everything and waited. Reinstalled the app. Nothing changed, because none of that was the problem.
The fix, once I found it, took ten seconds: I forced my LG TV to use IPv4 directly from the TV's own network settings — leaving my router free to run IPv6 for every other device on the network. JioHotstar worked immediately.
That's a cleaner fix than it sounds. The router doesn't lose IPv6. Your phone, laptop, and other devices are unaffected. Only the TV talks IPv4. But the real question isn't how I fixed it — it's why this broke in the first place, and what it says about where the industry actually stands on IPv6 readiness in 2024.
The short answer: not as far along as anyone wants to admit.
What Actually Failed — and Why Restarting Never Would Have Fixed It
To understand the failure, you need to understand what happens when a smart TV tries to play protected streaming content.
When your LG TV connects to JioHotstar, it doesn't just fetch a video file. It first resolves DNS to locate the platform's servers, negotiates a session, contacts a DRM (Digital Rights Management) license server to verify you're entitled to watch the content, receives a cryptographic key, and then begins streaming. The DR-6006_X error code sits in that DRM handshake layer — not in the video delivery itself. The content never starts because the license exchange never completes.
Here's where IPv6 enters. Modern home routers run what's called a dual-stack configuration — both IPv4 and IPv6 simultaneously. When a device makes a DNS query, it typically receives both A records (IPv4 addresses) and AAAA records (IPv6 addresses). Devices are supposed to implement a mechanism called Happy Eyeballs (RFC 8305) — racing both connection types and falling back gracefully if one fails.
LG's WebOS, based on observed behavior, does not implement this fallback reliably. It preferentially routes traffic over IPv6 and appears to fail silently when that path encounters a problem. Since that preference persists on every reconnection, restarting the router or TV changes nothing — you reconnect over the same path every single time.
The most likely explanation for the failure, based on symptoms and error behavior, is that some part of the playback stack — whether DRM license delivery, CDN routing, or session token validation — doesn't handle IPv6 connections reliably in certain network configurations. I can't confirm exactly where the chain breaks without packet-level access to both sides. But the fix was consistent, repeatable, and immediate — which points clearly at the transport layer, not the content or the account.
This Isn't Unique to One Platform. It's an Industry-Wide Pattern.
What makes this incident worth writing about is that it isn't unusual. IPv6 compatibility failures in streaming and connected devices follow a remarkably consistent pattern across the industry.
Streaming platforms broadly have CDN routing behavior that differs meaningfully between IPv4 and IPv6. CDN providers maintain separate peering agreements for IPv6 traffic, and edge node coverage isn't uniform — a regional PoP (Point of Presence) may have IPv6 routes that are technically announced but practically unreliable in certain geographies. Users on these paths see buffering on fast connections, or quality adaptation that behaves erratically — symptoms almost impossible to attribute to IP version without infrastructure-level visibility.
Some smart home devices — cameras, doorbells, smart speakers — are quietly problematic on IPv6-preferred networks. Most embedded firmware was written assuming IPv4. Device discovery protocols like mDNS and SSDP behave differently in dual-stack environments, and the majority of IoT vendors have never included IPv6-preferred configurations in their QA test matrix. The result is intermittent connectivity that looks exactly like hardware failure or ISP instability.
Enterprise SaaS applications carry a specific class of IPv6 bug: session token validation tied to IP address. Several categories of HR, ERP, and authentication platforms were built when binding a session to an IPv4 address seemed like reasonable security practice. In dual-stack environments, where the same user can appear at different addresses during a session depending on which path the OS chooses, this breaks authentication flows in ways that are genuinely hard to reproduce and diagnose.
The pattern is consistent: the application works, the network works, but the intersection of a modern network configuration and legacy application assumptions produces a failure that looks random from the outside.
Why the Industry Keeps Deprioritizing This — An Honest Analysis
The economic reasoning behind IPv6 neglect is worth understanding clearly, because it explains why this problem persists despite being well-known.
"It works on IPv4 — what's the business case?" This is the dominant internal conversation at most product companies, and it's genuinely hard to argue against on a quarterly basis. IPv4 still functions. Most users are still on IPv4-dominant configurations. IPv6 failures are intermittent, hard to reproduce in standard QA environments, and — most importantly — users blame their ISP or their device, not the platform. The error rate doesn't surface in dashboards as an IPv6 problem. It shows up as generic playback failures, support tickets, or quietly churned users. The platform never sees the root cause.
Third-party dependency chains are real. DRM systems are not built in-house. Streaming platforms rely on Widevine (Google), FairPlay (Apple), and PlayReady (Microsoft) licensing infrastructure. If any component in that chain — license delivery endpoints, session APIs, token validation services — doesn't fully support IPv6, the platform inherits that limitation regardless of how well their own code handles it. Fixing it means waiting on vendor roadmaps.
CDN IPv6 support is uneven at the edge. Major providers like Akamai, Cloudflare, and AWS CloudFront have strong IPv6 support at their primary nodes. But regional edge coverage is not uniform — particularly in markets like India, Southeast Asia, and parts of Africa. IPv6 route announcements can be technically active while practically unreliable, creating what networking engineers call "black hole routes." Traffic arrives at the edge and disappears. This is invisible unless you're monitoring IPv6 path performance as a separate metric from IPv4.
QA environments default to IPv4. This is arguably the most systemic issue of all. Most developer laptops, staging environments, and CI/CD pipelines run on IPv4. IPv6 failures are never surfaced in development because the development environment can't produce them. By the time the code reaches production users with IPv6-preferred home networks, the bug has been shipped, tested against, and forgotten.
What IPv6 Readiness Actually Looks Like in Practice
For engineering and infrastructure teams, the baseline is:
- Add IPv6 explicitly to your QA matrix. Run a staging environment on an IPv6-preferred network. Test every authentication flow, every DRM handshake, every CDN segment request against both stacks — independently and together.
- Audit your third-party dependencies. Your DRM vendor, CDN configuration, session management layer, analytics endpoints, and error reporting infrastructure. One IPv4-only dependency can silently break the entire user flow.
- Instrument by IP version. Your observability stack should tag requests by IP version so you can see IPv6 error rates as a distinct signal — not buried inside aggregate failure rates where it's invisible.
- Don't trust OS-level fallback on smart TV platforms. WebOS, Tizen, Android TV, and FireOS all handle Happy Eyeballs differently. Build explicit connection retry logic with IP version awareness into your client applications rather than assuming the platform handles it correctly.
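The "instrument by IP version" point is cheap to implement: in Node, the socket already reports its family. A minimal sketch — the counter and header names are illustrative, not from any particular metrics library:

```typescript
import * as http from "node:http";

// Per-IP-version request counters; in a real stack these would be
// metrics labels, e.g. requests_total{ip_version="IPv6"}
const counts: Record<"IPv4" | "IPv6" | "unknown", number> = {
  IPv4: 0,
  IPv6: 0,
  unknown: 0,
};

function recordIpFamily(remoteFamily: string | undefined): "IPv4" | "IPv6" | "unknown" {
  // Node reports the connected socket's family as "IPv4" or "IPv6"
  const family =
    remoteFamily === "IPv4" || remoteFamily === "IPv6" ? remoteFamily : "unknown";
  counts[family] += 1;
  return family;
}

const server = http.createServer((req, res) => {
  const family = recordIpFamily(req.socket.remoteFamily);
  res.setHeader("X-IP-Version", family); // handy when debugging from a client
  res.end("ok");
});
```

Once error rates are segmented this way, an IPv6-only failure stops looking like random churn and becomes a visible, fixable signal.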
For end-users dealing with this today:
- The cleanest fix is to force IPv4 directly in your TV's network settings rather than disabling IPv6 on the router. This keeps your router and all other devices on IPv6 — only the TV talks IPv4. No network-wide compromise needed.
- If your TV doesn't expose IP version settings directly, creating a separate SSID with IPv6 disabled for smart TVs and IoT devices is the next best option.
- If you're on a mesh network (Eero, Google Nest, Orbi), check whether IPv6 is enabled by default in the admin panel — many ship with it on, and most don't advertise it clearly.
The Bigger Picture
IPv6 was standardized in 1998. IPv4 address exhaustion has been a formally declared crisis since 2011. In 2024, a user on a modern home network running the protocol the industry has called "the future" for two decades can hit silent, inexplicable streaming failures — and the standard advice is still "restart your router."
This isn't a failure of any single company. It's the accumulated result of thousands of individually rational decisions — by platform teams, CDN vendors, device manufacturers, and DRM providers — to defer IPv6 readiness because IPv4 still works for most users most of the time.
The problem with "most users most of the time" is that it's actively changing. Jio, Airtel, and BSNL in India are all accelerating IPv6 deployment. The population of users on IPv6-preferred networks is growing faster than the industry is closing the compatibility gaps. And because these failures are invisible in aggregate metrics — they look like ISP problems, device problems, anything but platform problems — there's no forcing function to fix them.
The 45 minutes I spent debugging my TV is trivial. Multiplied across millions of users who never find the fix, it's churn, eroded trust, and support volume that gets categorized incorrectly and never traced back to its root cause.
IPv6 readiness is no longer a future concern for streaming platforms, IoT vendors, and enterprise software teams. It is a present-tense gap that the industry's standard testing practices are structurally incapable of detecting.
The router restart won't fix it. The QA matrix needs to.
Have you hit IPv6 compatibility issues on streaming platforms or connected devices? I'd be genuinely interested in what you found — drop it in the comments below.
Composition vs Compound Components in React
14 Apr 2026, 8:58 am

In React, reusable UI usually follows two patterns: composition components and compound components. They are closely related, but they differ mainly in how state is handled and how the API is exposed.
Composition Components (no Context)
This is the most straightforward approach. You build UI by combining components and passing data through props.
Everything is explicit: each component receives what it needs directly from its parent.
This makes the system very flexible and easy to reuse, since every part is independent. You can take a subcomponent and use it anywhere without constraints.
The downside is that as the UI grows, you often end up with prop drilling and a lot of wiring code. Sharing behavior between related parts also becomes harder because there is no shared state mechanism.
In short: simple, explicit, and flexible, but can become verbose in complex UIs.
import Airbnb from "./Airbnb";
<Airbnb.Card>
  <Airbnb.ImageWrapper>
    <Airbnb.Image src={el.image} alt={el.title} />
    <Airbnb.Heart like={el.isFavorite} onClick={() => {}} />
    <Airbnb.Recomendation />
  </Airbnb.ImageWrapper>
  <Airbnb.Content>
    <Airbnb.Title title={el.title}>
      <Airbnb.Valoration average={el.average} reviews={el.reviews} />
    </Airbnb.Title>
    <Airbnb.Description description={el.description} />
    <Airbnb.Price price={el.price} nights={el.nights} />
    <Airbnb.Cancelation />
  </Airbnb.Content>
</Airbnb.Card>;
👉 Try it in practice: Composition Components Challenge
Compound Components (with Context)
This pattern introduces shared state using React Context. Instead of passing props through every level, a parent component provides state and child components consume it directly.
This creates a tighter relationship between components. From the outside, the API becomes very clean because users don’t need to manage all the props manually.
It works especially well for structured UI systems where components are meant to work together (like cards, modals, tabs, etc.).
The trade-off is that the relationship becomes implicit. Subcomponents depend on a parent provider and are not really meant to be used in isolation. Debugging can also be slightly less obvious because the data flow is hidden inside Context.
In short: more structured, cleaner API, but less flexible and more coupled.
import Airbnb from "./Airbnb";
<Airbnb.Card data={el}>
  <Airbnb.ImageWrapper>
    <Airbnb.Image />
    <Airbnb.Heart />
    <Airbnb.Recomendation />
  </Airbnb.ImageWrapper>
  <Airbnb.Content>
    <Airbnb.Title>
      <Airbnb.Valoration />
    </Airbnb.Title>
    <Airbnb.Description />
    <Airbnb.Price />
    <Airbnb.Cancelation />
  </Airbnb.Content>
</Airbnb.Card>;
👉 Try it in practice: Compound Components Challenge
Export styles
There are two common ways to expose these components.
Named exports give maximum flexibility and are easy to tree-shake, but they don’t communicate relationships between components very well.
import { Card, ImageWrapper, Image } from "./Airbnb";
<Card data={el}>
<ImageWrapper>
<Image />
</ImageWrapper>
</Card>;
Namespaced exports group everything under a single object, which makes the structure much clearer and is often preferred for compound components or design systems. The trade-off is slightly more verbosity and sometimes less optimal tree-shaking.
import Airbnb from "./Airbnb";
<Airbnb.Card data={el}>
<Airbnb.ImageWrapper>
<Airbnb.Image />
</Airbnb.ImageWrapper>
</Airbnb.Card>;
When to use each
Use composition when components are independent and you want full flexibility with explicit data flow.
Use compound components when the UI pieces are strongly related, share state, and benefit from a controlled and clean API.
Summary
Composition is about flexibility and explicitness.
Compound components are about structure and shared state.
Both are just different ways of organizing the same idea: composition in React.
AGENTS.md Is Not Enough: Building Project Memory for AI Coding Agents
14 Apr 2026, 8:56 am

Most AI coding workflows still treat project context as chat state.

That works until you switch tools, start a new session, or try to keep multiple agent-specific files aligned:
- `AGENTS.md`
- `CLAUDE.md`
- `.cursorrules`
- `GEMINI.md`
- Copilot instructions
- separate MCP configs
Those files drift. New sessions start from zero. The project keeps
learning, but the next agent never sees it.
I built agentsge to move project intelligence into the
repository itself.
- Site: https://agents.ge
- Repo: https://github.com/larsen66/agentsge
- npm: https://www.npmjs.com/package/agentsge
## The problem
Most teams using AI coding tools are quietly accumulating the same
failure mode:
- One tool gets updated instructions.
- Another tool still reads old rules.
- Session knowledge stays trapped in chat history.
- A new agent has to rediscover the same architecture, conventions, and hidden constraints again.
In practice, the repo ends up with fragmented context.
You might have one file telling Claude Code how to behave, another
file for Cursor, a third for Copilot, and no durable place for
“this is how this project actually works”.
The repo has code.
The team has tacit knowledge.
The agents get a partial, drifting copy of both.
## What I wanted instead
I wanted a simple primitive:
- one repository-owned memory layer
- one place for durable rules and project knowledge
- one source of truth that multiple agent tools can inherit
That became:
- AGENTS.md as the entrypoint
- .agents/ as the durable project memory

The core model is:
> AGENTS.md says what to do. .agents/ remembers what the project learned.
## What agentsge does
agentsge is an open-source CLI that makes a repository agent-
ready.
Run:
```bash
npx agentsge init
```
It creates a versioned .agents/ directory and scaffolds the repo
for AI agent onboarding.
Typical structure:
```
.agents/
  config.yaml
  rules/
    _capture.md
  knowledge/
    _index.md
    architecture/
    patterns/
    lessons/
    conventions/
    dependencies/
  skills/
  mcp/
    config.yaml
```
This gives the repo a place to store:
- project metadata
- mandatory rules
- architecture decisions
- recurring implementation patterns
- bug lessons
- team conventions
- MCP definitions
All in markdown and YAML, stored in git.
## Why not just use AGENTS.md?
AGENTS.md is useful, but by itself it is not enough.
It solves entrypoint instructions.
It does not solve durable project memory.
A single markdown file is fine for:
- “read these rules”
- “start here”
- “follow this workflow”
It is much worse for:
- evolving architecture knowledge
- accumulating lessons from bugs
- keeping typed project memory organized
- syncing multiple agent surfaces over time
A repo needs both:
- a front door
- a memory layer
AGENTS.md is the front door.
.agents/ is the memory layer.
## Why not just keep everything in README?
Because README has a different job.
README should explain the project to humans at a broad level.
Project memory is operational and specific.
Examples of things that belong in durable project knowledge but
not in README:
- “This subsystem looks independent, but breaks if env X is
missing”
- “The bug looked like a frontend issue, but the real cause was a
backend cache race”
- “We always extend this adapter instead of writing directly to
that integration point”
- “This dependency exists because the obvious alternative failed
in production”
That kind of knowledge is too detailed, too volatile, or too
implementation-specific for README, but too important to lose.
## The knowledge model
I kept the memory model intentionally small.
There are five knowledge types:
- architecture
- pattern
- lesson
- convention
- dependency
That gives enough structure to be useful without turning the repo
into a database.
Examples:
- architecture: why a structural decision was made, including
rejected alternatives
- pattern: a reusable implementation shape across multiple files
- lesson: a bug where the symptom pointed away from the cause
- convention: a team rule that is not obvious from code alone
- dependency: a non-obvious reason a package or workaround exists
The goal is not to create documentation overhead.
The goal is to preserve the kinds of context that future agents
and future contributors would otherwise have to rediscover.
## Cross-agent sync
Another problem I wanted to fix was tool fragmentation.
Even if the repo has good context, every tool wants it in a
different shape.
So agentsge can sync agent-facing surfaces from the project’s
source of truth.
That means .agents/ can drive:
- AGENTS.md
- CLAUDE.md
- .cursorrules
- GEMINI.md
- MCP config targets for different tools
Instead of manually maintaining parallel copies, the repo owns the
knowledge and tool-specific files stay thin.
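As a generic illustration of that idea (not agentsge's actual sync mechanism — its real commands and file layout may differ), a minimal shell sketch that derives thin tool-specific entrypoints from one canonical source might look like:

```shell
#!/bin/sh
# Hypothetical sketch: regenerate tool-specific files from a single
# canonical rules file, so only .agents/ is ever edited by hand.
set -eu

SRC=".agents/rules/core.md"   # assumed canonical source (illustrative path)
mkdir -p .agents/rules
[ -f "$SRC" ] || echo "- Follow the project conventions in .agents/" > "$SRC"

for target in AGENTS.md CLAUDE.md .cursorrules GEMINI.md; do
  {
    echo "<!-- generated from $SRC; edit the source, not this file -->"
    cat "$SRC"
  } > "$target"
done
```

The point of the sketch is the direction of the arrow: the repo-owned memory is the source, and every agent surface is a disposable, regenerated view of it.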
## Automatic knowledge capture
One part I found especially interesting was capture.
Most useful project knowledge is not written during setup.
It emerges while doing real work.
A bug gets fixed.
A weird constraint gets discovered.
A pattern becomes obvious after the third repeated change.
So agentsge also supports hook-based capture.
The idea is:
- session starts
- file changes get logged
- session ends
- the diff is analyzed
- candidate knowledge items go into pending/
- a human accepts or rejects them
That keeps the system from becoming either:
- a static config generator
- or an uncontrolled note dump
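The capture flow could be sketched as a session-end hook along these lines — purely illustrative, not the actual agentsge hook, and it assumes the session ran inside a git repository so the working-tree diff stands in for "file changes get logged":

```shell
#!/bin/sh
# Hypothetical session-end hook: snapshot what changed and queue it as
# a pending knowledge candidate for a human to accept or reject.
set -eu

PENDING=".agents/knowledge/pending"
mkdir -p "$PENDING"

# Uncommitted changes relative to HEAD; empty outside a git repo.
changed=$(git diff --name-only HEAD 2>/dev/null || true)

if [ -n "$changed" ]; then
  out="$PENDING/$(date +%Y%m%d-%H%M%S)-session.md"
  {
    echo "# Candidate knowledge item (pending review)"
    echo
    echo "Files touched this session:"
    echo "$changed" | sed 's/^/- /'
  } > "$out"
  echo "queued: $out"
fi
```

The human-review step is what keeps `pending/` from silently becoming the "uncontrolled note dump" the design tries to avoid.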
## What I like about this approach
A few properties matter to me:
### 1. The project owns the context
Not the vendor.
Not the chat thread.
Not one specific tool.
### 2. It stays in git
That means:
- reviewable
- diffable
- portable
- easy to delete
- easy to move between tools
### 3. It is intentionally boring
Markdown and YAML are not magical.
That is the point.
### 4. It separates entrypoint from memory
This turned out to be a much cleaner model than trying to stuff
everything into one file.
## Current state
The public site and docs are here:
- https://agents.ge
- https://agents.ge/docs
The site is prerendered and crawlable, and I also added:
- route-level metadata
- structured data
- robots.txt
- sitemap.xml
- llms.txt
- llms-full.txt
because I wanted the project to be understandable not just to
users, but also to search engines and LLM-based discovery systems.
## Example
The basic starting point is still intentionally simple:
```bash
npx agentsge init
```
After that, the repo can onboard an agent with:
- project config
- required rules
- existing knowledge
- reusable workflows
- synced entrypoints
## What I’m trying to learn
I’m still interested in feedback on a few open questions:
1. Is .agents/ a good primitive, or is this too much structure?
2. Is the five-type knowledge model about right, or too
opinionated?
3. Is cross-agent sync genuinely useful, or just compensating for
poor tooling ecosystems?
4. How much project memory should be explicit versus auto-
captured?
If this sounds useful, I’d love feedback.
- Site: https://agents.ge
- Repo: https://github.com/larsen66/agentsge
- npm: https://www.npmjs.com/package/agentsge
Founders Build, Devs Fix: The Reality of Vibe Coding Tools in 2026
14 Apr 2026, 8:55 am
If you scroll through X or LinkedIn these days, you'll see a recurring boast: a non-technical founder who built a fully functioning SaaS over the weekend just by typing natural language prompts into an AI. Vibe coding has taken the startup world by storm, and on the surface, it’s thrilling. The barrier to entry for software development has never been lower.
But there is a stark difference between a weekend prototype and a production-grade application. As the founder of Redwerk, a software development agency, I'm watching a fascinating and somewhat chaotic trend unfold.
While non-technical founders are spinning up impressive minimum viable products (MVPs) with AI, they inevitably hit a wall when it comes to scaling, complex integrations, or preventing production bugs. This is creating a massive divide between the tools founders use and the tools developers rely on.
To understand this difference, I recently surveyed my team on their AI coding habits. What we found highlights the difference between building a flashy demo and engineering a defensible product.
The Vibe Coding Stack: Dev Tools vs. Founder Tools
Not all vibe coding tools are created equal. I want to draw a clean map of the vibe coding tool landscape because Cursor and Lovable do not belong in the same bucket, even though both are technically vibe coding tools. The ecosystem has splintered into distinct categories, each serving a totally different purpose.
Web-Based App Builders (The Founder’s Playground)
Tools like Lovable, Figma Make, Bolt.new, and Replit are built for people who want to go from idea to a running prototype without ever opening a terminal. You describe a screen; the AI generates the UI and the wiring behind it, and you iterate in a chat panel. They're brilliant for zero-to-one work: validating a concept with real users, building a clickable demo for a pitch deck, or testing whether a UX hypothesis even makes sense before paying anyone to build it properly. This is great for speed and validating ideas, but it is often terrible for scaling.
AI-Powered IDEs (The Developer's Sidekick)
GitHub Copilot, Cursor, and JetBrains Junie live inside the editor where the engineer is already working. They don't try to abstract the code away; they sit next to it. To get value out of them, you have to already understand what you're asking for. They reward people who can read a diff, structure a prompt around an existing module, and recognize when a suggestion is subtly wrong. They require a deep understanding of code architecture to guide the AI effectively. Instead of avoiding code, you are augmenting your ability to write and navigate it.
CLI/Terminal-Based Agents (The Senior Developer's Orchestrator)
Claude Code, OpenAI Codex / ChatGPT in agent mode, Gemini CLI, and various Antigravity-style wrappers run in the terminal and touch the filesystem directly. They can plan, edit multiple files, run tests, and iterate on errors without a human intervening at each step. However, this tier is also where the most spectacular failures happen, because "agentic" means the tool can do real damage between coffee sips.
Open-Source Automation & Autonomous Agents (The Hybrid Toolkit for Both)
For teams that need complex workflows without vendor lock-in, open-source tools are bridging the gap between coding and operations. This category serves both camps. Platforms like n8n allow tech-savvy founders and developers alike to visually orchestrate intricate, AI-driven backend processes. Meanwhile, autonomous frameworks like OpenManus and OpenClaw act almost like independent junior engineers, capable of executing multi-step tasks across your environment. These tools offer the granular control, self-hosting capabilities, and flexibility that experienced teams demand, while still remaining accessible enough for operations-minded founders.
Reality Check: What Devs Actually Think About AI Code
I asked my team how vibe coding is impacting their daily workflows. I ran an internal survey across my engineering team — designers, full-stack devs, mobile devs, and a couple of architects. The respondents had between several months and two years of hands-on time with tools spanning every category above: Cursor, Claude Code, GitHub Copilot, Gemini, ChatGPT/Codex, JetBrains Junie, Figma Make, and a handful of others. Honestly, the results weren’t surprising.
The wins are real, but narrow. When I asked how much AI-generated code they actually ship without modification, the answers clustered between 0–40% for most respondents. The pattern was clear: in tightly scoped, well-defined tasks (boilerplate, CLI scripts, isolated UI components, autocomplete inside an existing project), AI output goes in nearly clean. In anything that touches multiple files, has business logic, or requires architectural judgment, it doesn't.
Almost no one ships AI code unreviewed. 89% spend moderate-to-significant time correcting output: 22% need substantial rework, 67% require regular corrections, and only 11% need minor adjustments, mainly experienced Claude Code users. No one said AI code is good to go as-is. According to a recent CodeRabbit report, AI-generated code amplifies vulnerabilities by 2.74x and is 75% more likely to have logic or correctness issues.
The pain points share a common thread. When I asked about the biggest headaches, the same themes came up over and over:
Hallucination, especially as context grows. One engineer put it bluntly: "Hallucination if the context window becomes too big." Another said the AI confidently suggests wrong API endpoints when working against a real API.
Prompt-language friction. Vague prompts like "create a dashboard" produce useless output. Engineers had to learn to write prompts that read more like technical specs: explicit user flow, expected output format, constraints, and edge cases.
Multi-file refactoring is where it falls apart. Maintaining consistency across a large pull request with many interdependent files is still where AI tools regularly miss.
The realism gap. A designer on the team noted that AI-generated UI can look fine but isn't fully realistic for development, so the hand-off to engineering still requires rework.
Context switching and cost. Several respondents flagged that juggling multiple tools and paying for them adds up faster than expected.
Imagine this: a founder ships an MVP on Lovable or Replit. It works. They get 200 sign-ups, then 2,000, then a customer asks for SSO, then someone reports a bug that only happens on Safari, then the Stripe integration starts double-charging, then the database query that worked fine for 50 rows starts timing out at 50,000.
Now the founder needs to do something the AI never had to do: understand the code well enough to change it without breaking the rest of it. That technical debt won’t show up on day one. It will show up on day 60, when the founder is googling a vibe code cleanup service.
How the Pros Vibe Code: Battle-Tested Workflows
If you take one practical thing from this article, take this section. These are the habits my team naturally adopted on their own.
1. Plan-then-execute, always. Don't prompt and pray. Ask the AI to generate a plan or an API contract first. Read the plan. Push back on it. Then tell it to implement. It costs you ten minutes upfront and saves you an afternoon of unwinding bad decisions.
2. Treat prompts like specs, not wishes. "Build a dashboard" fails. "Build a dashboard that shows X, Y, Z for user role A, returns data in this JSON shape, uses our existing Card component, and handles the empty state like this" works. Wherever possible, attach a Figma frame, a screenshot, or a sample JSON. Maintain a CLAUDE.md (or equivalent rules file) at the root of each project that captures conventions, naming, and architectural decisions, so the AI stops re-inventing them every session.
3. Verify the things AI is bad at. AI output is a draft, not a pull request. Manually check: business-logic correctness, multi-file dependencies, responsiveness, UX edge cases, and anything involving external APIs (where hallucinated endpoints are still a regular event). Treat suspiciously confident output as a smell.
4. Use one tool to write, a different one to review. Several of my team members run two tools in parallel: for example, Claude Code in one terminal pane and Codex in another, with git worktree keeping branches isolated. And use the second one to review what the first one produced. The disagreement between the two is often where the bugs hide.
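The worktree setup mentioned above is plain git — one checkout per tool, so parallel edits never stomp on each other's working tree. A runnable sketch (repo path and branch names are illustrative):

```shell
# Minimal sketch: give each AI tool its own checkout of the same repo.
set -eu

# Throwaway repo so the commands below run end to end.
repo="$(mktemp -d)/demo"
git init -q "$repo"
cd "$repo"
git -c user.email=demo@example.com -c user.name=demo \
  commit -q --allow-empty -m init

# The writer stays on the default branch; the reviewer gets its own
# branch checked out in a sibling directory.
git branch review
git worktree add -q ../demo-review review

git worktree list
```

You would then point the writing tool at `demo/` and the reviewing tool at `demo-review/`; commits land in the same repository either way.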
5. Restart instead of repairing. Sometimes the fastest path to a working solution is to throw away the conversation, rewrite the prompt with everything you've learned, and let the AI start fresh. Trying to debug a tangled AI-generated mess by chatting with the AI that made it is, in my team's experience, a losing game more often than not.
6. Maintain a knowledge base that the AI can read. One of my architects keeps a running document of successful solutions, architectural decisions, and code review rules, and feeds it into the AI during planning. The result is that the tool builds on its past good outputs rather than reinventing the wheel every session.
None of this is rocket science. However, it might be the difference between a 40% useful-as-is rate and a 0% one.
Final Thoughts
Vibe coding isn't a gimmick. The founder tools are real, the dev tools are real, and the productivity gains are real — but only when the person behind the keyboard understands what the AI is doing well enough to catch what it isn't. The next 12 months of this industry will be defined less by which tool "wins" and more by who builds the discipline around the models.
Building and battle-testing a Laravel package with AI peers
14 Apr 2026, 8:55 am
I built laravel-fluent-validation, a fluent rule builder for Laravel. Magic strings like 'required|string|max:255' have always bothered me. I tried PRing expansions to Laravel's fluent API, but even small additions got closed with the usual answer: release it as a package instead. So I did.
Along the way I also fixed a performance problem with wildcard validation and built a Rector companion for automated migration.
The interesting part wasn't the package itself. It was the workflow that built and hardened it.
I used four Claude Code sessions. One owned the package, three owned real Laravel codebases that were adopting it. They reviewed each other's work through claude-peers, a peer messaging MCP server. The codebase peers would test, hit edge cases, report back. The package peer would fix, tag a release, and the codebase peers would re-verify. This compressed release-and-feedback loops from days to minutes.
The Rector companion went through eight functional releases in about 24 hours this way. 108 files converted on one codebase, net -1,426 lines of code, 566 tests green after migration with no behavioral regressions observed. But the Rector cycle is just the most compressed example. The same method shaped the performance benchmarks, the Livewire integration, the error messages, the documentation.
The examples below are Laravel-specific, but the method isn't. Isolated AI agents become far more useful when they review changes against multiple real environments with automated verification.
The workflow
claude-peers is an MCP server for Claude Code. Each instance running on your machine can discover other instances, see what they're working on, and send messages. They don't share context. Each has its own conversation with full codebase access.
In practice it works like this: the package peer tags a new release. It sends a message to the three codebase peers saying "0.4.5 tagged, fixes the parallel-worker race, please re-verify." Each codebase peer receives the message, pulls the new version, runs the migration, runs their tests, and sends back results. If something breaks, the response includes the exact error, the file, and usually a theory about why. The package peer reads that, asks follow-up questions if needed, fixes the issue, and the loop continues.
One thing I didn't expect was how quickly the peers developed their own review dynamic. They would challenge each other's assumptions, ask for evidence, and sometimes reach consensus before coming back with a recommendation.
I had four terminals open:
- The package repo, building features, writing tests, shipping releases
- Three production codebases, each a real Laravel app with its own validation patterns, framework integrations, and test suites
Everything runs locally. Claude Code works on local clones of each codebase, with the same filesystem access you'd have in your terminal. No production servers, no remote environments, no secrets exposed to AI.
Why real codebases beat synthetic fixtures
Running against multiple codebases isn't about redundancy. Each one stresses a different part of the code.
The first app has 108 FormRequests and uses rules() as a naming convention on Actions and Collections, not just validation. The Rector's skip log grew to 2,988 entries and 777KB. The package author expected a near-empty log. At 108 files, it was unusable. On a smaller codebase, you'd never notice. The same app also runs Filament alongside Livewire, and five of its components use Filament's InteractsWithForms trait, which defines its own validate() method. Inserting the package's trait would have created a fatal method collision on first form render. The right fix was to bail and flag those classes for manual review, since the Rector can't know whether the developer intends fluent validation or Filament's form validation.
The second app runs 15 parallel Rector workers. The skip log's "truncate on first write" flag was per-process, so every worker thought it was first and wiped the others' entries. Synthetic test fixtures run single-process. This bug doesn't exist there.
The third app was already on fluent validation with only 7 files left to convert. They tracked Pint code-style fixer counts across releases as an acceptance metric, and found that 5 of their 7 Livewire files had #[Validate] attributes coexisting with explicit validate([...]) calls. Dead-code attributes the package author hadn't anticipated. That drove a whole new hybrid-detection path.
None of these were likely to surface in a fixture-based test suite.
What automated tests still missed
The first app tracked firing counts across every release, how many times each Rector rule fired on their 108-file corpus. On one release, trait-insertion rectors fired zero times. Rector still reported "108 files changed" because the converter rules worked fine. A tester checking that output would have shipped it. The peer tracking counts caught that "108 to 0 on trait rectors" was a regression. The fix landed the same day, and expected counts became a permanent test.
One peer asked a question during a retrospective: "You've tested that the Rector output parses. Have you tested that the runtime semantics match?" Nobody had asked this in nine releases. It led to 16 parameterized test cases asserting that FluentRule and string-form rules produce identical error messages. All 16 passed. But those tests only exist because a peer who didn't write the code asked "prove it."
What peers changed at the design level
Before one release, the package peer was weighing whether to expand detection to handle new Password() constructor calls inside rule arrays. It sounded reasonable, more complete conversion, 30-60 minutes of work. A codebase peer killed it with one observation: the converter is context-free. It runs inside rules() methods and inside attribute arguments. Any expansion would fire in both contexts, silently rewriting code where the developer chose the constructor form intentionally. No test was failing. The feature would have worked in the narrow case it was designed for. The peer prevented it by naming a failure mode the author hadn't considered.
All three codebases reported near-zero ternary rules ($condition ? 'required' : 'nullable'), which was enough to shelve the feature on demand alone. But one peer added a reframe: developers who reach for ternaries in rule arrays are optimizing for terseness, and the closure-form fluent version loses on that axis by construction. Even with demand, the feature might make its target audience's code worse. That moved it from "deferred" to "won't fix."
In both cases, the peer contributed framing, not just evidence.
What made this work
Each Claude instance has full codebase access and its own conversation history. The package peer knows the internals. The codebase peers know their app's patterns, test suites, and integrations. Nobody has to context-switch.
The codebases were real, not demo fixtures. Every bug described above required production-level complexity that doesn't exist in test scenarios.
Automated verification made the loop objective. The package runs PHPStan on level max, Rector, and Pint on every change, with 616 tests and 1,235 assertions. Each codebase peer runs the same stack. When a peer reports "PHPStan clean, 566 tests green, Pint fixer count down from 3 to 2," you can trust the result.
The back-and-forth was fast because it stayed in the same session. Tag a release, three codebases verify, issues come back with exact errors and hypotheses, fix ships, re-verify. The whole cycle in 15-30 minutes. GitHub issues lose context between messages. These peers kept corpus knowledge across every release.
And the peers could challenge scope, not just report failures. The new Password() conversation and the ternary-rule reframe both came from peers who could say "I don't think you should build this" with technical reasoning.
What this workflow costs
Running four Claude Code sessions in parallel means watching your weekly usage limits and session caps burn in front of your eyes. It's worth it for a focused release cycle, but you feel the cost. For a solo contributor, the same process works across sequential sessions. You'd lose the synchronous loop but keep the corpus context.
The workflow also has a blind spot: if all test codebases share the same architectural assumptions, peers can miss the same category of bug together. The three-codebase model worked here because each app had genuinely different patterns: scale, parallel execution, hybrid Livewire attributes. If all three had been small Livewire apps, the skip-log volume and parallel-worker bugs would have shipped uncaught.
When I would and wouldn't use this
I'd use this workflow for packages or tools that modify other people's code: Rector rules, code generators, migration tools, linters. The cost of a silent-rewrite bug is high, and running against codebases you didn't write is the most reliable way to catch them before release.
I'd also use it for packages with integration surface across frameworks. Livewire, Filament, and Inertia all have their own quirks. A peer running on a codebase that actually uses Filament + Livewire together will find trait conflicts and method collisions that your test suite won't.
For a simpler utility package with a narrow API surface, I'd scale it down. One project peer instead of three. You still get the "does this actually work in someone else's codebase" signal without the overhead of a full multi-peer setup.
The surprising part was that multiple isolated peers, each grounded in a different real codebase, acted more like an internal design-and-QA loop than an autocomplete tool. That changed what got built, what got cut, and what got tested.
The package: laravel-fluent-validation -- fluent validation rule builders with up to 160x wildcard performance gains, full Laravel parity, Livewire and Filament support.
The Rector companion: laravel-fluent-validation-rector -- automated migration from string rules. 108 files converted on one production codebase, -1,426 LOC, 566 tests green.
The peer messaging: claude-peers
AI skills for Laravel packages: package-boost -- ships migration guides, optimization hints, and framework-specific gotchas alongside your package so each peer has context without manual setup.
GitHub Just Proved That Remote Terminal Access Matters - Here's the Mobile IDE I Built for It
14 Apr 2026, 8:54 am
GitHub just shipped copilot --remote - the ability to stream a Copilot CLI session to your phone or browser. Claude Code has had something similar with its --remote flag. The idea is the same: you should be able to reach your coding session from wherever you are.
I agree. In fact, I've been building exactly that for the last month.
TermBeam is an open-source tool that turns your terminal into a mobile IDE. Run npx termbeam, scan the QR code, and you get a full terminal on your phone — any shell, any command, any folder. It's a PWA, so after the first scan you add it to your home screen and it's just an app. One tap and you're in, and it works on any device since it's just a web page.
TermBeam's built-in Copilot SDK integration — markdown rendering, tool call visibility, and the terminal right there on my phone.
I can be pushing my daughter on the swings at the park, or sitting in a bomb shelter during a missile alert, and still pull out my phone, open TermBeam, and create a new session in whatever folder I need. Thinking retrospectively, that kind of access isn't a nice-to-have for work-life balance; it's a necessity.
What It Does
npx termbeam
# or
npm install -g termbeam
termbeam
That starts a server, prints a QR code, and when you scan it you get a real terminal session in your browser. Full PTY — whatever shell you use, whatever commands you run. It tunnels through Azure DevTunnels by default so it works from anywhere.
You start it in whatever folder you want, with whatever shell you want. You're not locked into a specific tool's session model. And you can create new sessions from your phone — pick a folder, pick a shell, go.
For something permanent, install it as a background service:
termbeam service install
PM2-managed, starts on boot. Your terminal is always one tap away.
It Runs AI Agents Too
This is where the copilot --remote connection gets interesting. TermBeam has a proper @github/copilot-sdk integration — a dedicated UI pane with markdown rendering, tool call display, and a chat interface, all alongside the terminal. It's built specifically for interacting with Copilot on a small screen.
On top of that, TermBeam auto-detects AI coding tools in your PATH — Copilot CLI, Claude Code, Aider, Codex, OpenCode — and lets you launch any of them from the new session dialog. One tap.
So while copilot --remote gives you remote access to an existing Copilot session, TermBeam gives you the ability to open a full terminal session where Copilot (or any other agent) is just one of the things you can run. Different approach to a similar idea — and they work well together.
What Makes It Usable on a Phone
A terminal on your phone is a gimmick unless the UX is built for touch, which is where the existing SSH apps fall short. Here's what makes the difference:
Touch bar. Two rows of dedicated buttons: Esc, Copy, Paste, Home, End, ↑, ←, ↓, →, Enter, Ctrl, Shift, Tab, ^C, and a microphone for voice commands. These are the keys phone keyboards either don't have or hide behind long-presses. This makes running a Copilot session — or vim, or anything — actually practical on a phone.
Multiple tabs at the top, git and folder context info, and the touch bar above your keyboard. This is a real session on my phone.
Command palette. Tap the Tools button on the top right and you get a panel with six sections: Session (new tab, upload/download files, view markdown, close/rename/split/stop sessions, resume agent sessions), Search (find in terminal), View (font size, themes, port preview, code viewer, git changes), Share (copy shareable link), Notifications (toggle push notifications), and System (refresh, clear terminal, about). Every feature is two taps away.
The full command palette. Upload files, view markdown, preview a port, switch themes, check git changes — all from one panel.
Sessions hub. Each session gets a card showing the working directory, shell, PID, git branch, repo name, and git status — which sessions have staged changes, which are clean, how long ago they were active. The "+ New Session" button lets you create a session in any folder, right from your phone.
Three sessions across different projects, each showing git info and status. Tap to connect, "+ New Session" to start one anywhere.
File browser. Browse the session's working directory, search files, view code, download files, and see git changes — all in a side panel. The Changes tab shows what's modified and staged.
The file browser with Files and Changes tabs, browsing a project directory.
And the rest: 40 themes (yes, I got carried away lol), push notifications when commands finish, and a port preview that proxies your local dev server. It's a PWA with iPhone safe-area support, so it feels native.
Under the Hood
TermBeam spawns a PTY with your shell, serves a React frontend (xterm.js, Zustand, Radix UI) over Express, and bridges them with WebSocket. Azure DevTunnels handles the public URL, or you can choose to work on LAN. Sessions persist when you disconnect.
npx termbeam # default: tunnel + auto password
termbeam --password secret --lan # LAN only, custom password
termbeam --persisted-tunnel # stable URL across restarts
termbeam resume # reconnect from CLI
termbeam service install # always-on background service
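The PTY-to-WebSocket bridge described above comes down to a small message protocol: terminal data flows both ways, and the client tells the server when the viewport resizes. Here is a minimal sketch of what such a protocol layer might look like — the message names and shapes are my assumptions for illustration, not TermBeam's actual wire format:

```typescript
// Hypothetical wire protocol for a PTY <-> WebSocket bridge.
// Shapes and names are illustrative assumptions, not TermBeam's real format.
type TermMsg =
  | { type: "data"; payload: string }               // keystrokes or terminal output
  | { type: "resize"; cols: number; rows: number }; // viewport size changes

// Serialize a message into a WebSocket text frame.
function encode(msg: TermMsg): string {
  return JSON.stringify(msg);
}

// Parse and validate an incoming frame; reject malformed input
// instead of forwarding it to the PTY.
function decode(raw: string): TermMsg {
  const obj = JSON.parse(raw);
  if (obj.type === "data" && typeof obj.payload === "string") {
    return obj;
  }
  if (
    obj.type === "resize" &&
    Number.isInteger(obj.cols) &&
    Number.isInteger(obj.rows)
  ) {
    return obj;
  }
  throw new Error(`unknown message: ${raw}`);
}

// Round trip: the server would call pty.resize(cols, rows) on this.
const frame = encode({ type: "resize", cols: 120, rows: 40 });
const msg = decode(frame);
console.log(msg.type); // "resize"
```

Validating frames at the boundary like this matters because the socket is reachable through a public tunnel; anything that fails the shape check gets dropped rather than executed.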
On security — auto-generated passwords, rate-limited login (5 attempts/min/IP), httpOnly cookies, CSP and X-Frame-Options headers, WebSocket origin validation, and single-use QR auth tokens that expire in 5 minutes. Full threat model in SECURITY.md.
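Single-use tokens with a short expiry are a simple but easy-to-get-wrong pattern. A sketch of how such a store could work — class and method names are hypothetical, not TermBeam's implementation:

```typescript
// Sketch of a single-use, expiring auth-token store, like the QR tokens
// described above. Names are illustrative assumptions.
import { randomUUID } from "node:crypto";

const TOKEN_TTL_MS = 5 * 60 * 1000; // tokens expire after 5 minutes

class TokenStore {
  private issued = new Map<string, number>(); // token -> expiry timestamp

  // Mint a fresh token; `now` is injectable for testing.
  issue(now: number = Date.now()): string {
    const token = randomUUID();
    this.issued.set(token, now + TOKEN_TTL_MS);
    return token;
  }

  // Succeeds at most once per token: the token is deleted on first
  // sight, so replays and expired/unknown tokens are all rejected.
  redeem(token: string, now: number = Date.now()): boolean {
    const expiry = this.issued.get(token);
    this.issued.delete(token); // single-use: remove regardless of outcome
    return expiry !== undefined && now <= expiry;
  }
}
```

The key design choice is deleting the token before checking expiry, so even a failed redemption burns it — one QR scan, one login, no second chances.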
1350+ tests across Windows, macOS, and Linux on Node 20, 22, and 24. The less glamorous work was the mobile edge cases — Safari viewport bugs, PTY behavior differences across platforms, WebSocket reconnection after screen locks.
Try It
TermBeam is open source, MIT licensed, v1.20.1 on npm:
npx termbeam
Scan the QR code, add to home screen, and you've got a terminal on your phone.