Articles

Writing by Alex Hsieh (謝明宏) on cloud infrastructure, DevOps, SRE, and AI automation. Every article is grounded in production experience or careful evaluation, not repackaged vendor blog posts.

Topics

Cloud & Kubernetes

Practical guides on running cloud infrastructure and Kubernetes in real environments:

  • Kubernetes cluster operations: upgrades, node management, and production hardening
  • GitOps workflows and continuous delivery for Kubernetes workloads
  • Multi-cloud architecture patterns and trade-offs
  • Cost optimization strategies for cloud-native teams

DevOps & SRE

  • Incident management and blameless postmortem culture
  • SLO/SLI design that actually drives engineering decisions
  • Toil measurement and reduction through automation
  • Observability: what to measure, what to ignore, and why

AI Automation

  • LLM-powered diagnostics for Kubernetes and cloud infrastructure
  • Runbook automation with AI agents
  • Evaluating when AI tooling adds value vs. when it adds complexity
  • Building guardrails for AI-augmented operations

Where to Read

Newsletter

To receive new articles directly, subscribe through the contact page or the signup form on Alex's sites.


Looking for something specific? Contact Alex with topic suggestions or questions.

Built with VitePress. Deployed on Cloudflare Pages.