Kubernetes: Production-Grade Container Orchestration

A hands-on guide to running Kubernetes in production — from core primitives like Pods and Deployments to advanced topics like HPA, RBAC, resource tuning, and Prometheus monitoring. Based on running 26 microservices on GKE with Istio mesh.

By Jose Nobile | Updated 2026-04-23 | 18 min read

Pods: The Atomic Unit

A Pod is the smallest deployable unit in Kubernetes. It wraps one or more containers that share a network namespace (same IP, same localhost) and storage volumes. In practice, most Pods run a single application container, but sidecar patterns — like Istio's Envoy proxy — add a second container for cross-cutting concerns.

Pods are ephemeral by design. They can be killed, rescheduled, or replaced at any time. This means your application must be stateless at the Pod level, storing persistent data in external systems (databases, object storage, PersistentVolumes). Health checks via livenessProbe and readinessProbe are critical: the liveness probe restarts a stuck container, while the readiness probe removes it from Service traffic until it is healthy.

Pod lifecycle hooks (postStart, preStop) let you run commands during creation and termination. The preStop hook is especially important for graceful shutdown: it gives your application time to finish in-flight requests and close database connections before the SIGTERM arrives.
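A minimal sketch of a preStop hook for graceful shutdown. The 10-second sleep is an illustrative value — it should match your load balancer's endpoint-deregistration delay, and the image name is carried over from the example below:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server
spec:
  terminationGracePeriodSeconds: 30  # total budget: preStop + SIGTERM handling
  containers:
  - name: api
    image: gcr.io/myproject/api:v2.4.1
    lifecycle:
      preStop:
        exec:
          # Pause before SIGTERM so in-flight requests drain and the
          # endpoint is removed from Service routing first.
          command: ["sh", "-c", "sleep 10"]
```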

apiVersion: v1
kind: Pod
metadata:
  name: api-server
spec:
  containers:
  - name: api
    image: gcr.io/myproject/api:v2.4.1
    ports:
    - containerPort: 3000
    livenessProbe:
      httpGet:
        path: /healthz
        port: 3000
      initialDelaySeconds: 10
      periodSeconds: 15
    readinessProbe:
      httpGet:
        path: /ready
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 5

Deployments, StatefulSets, and DaemonSets

You never create bare Pods in production. Instead, you declare a Deployment that manages a ReplicaSet, which in turn manages the Pods. The Deployment controller ensures the desired number of replicas are always running, replacing any Pod that crashes or gets evicted. When you update the container image, the Deployment creates a new ReplicaSet and incrementally scales it up while scaling the old one down; because Services only route to ready Pods, this gives you zero-downtime updates.

Key Deployment fields include replicas (desired Pod count), strategy (RollingUpdate or Recreate), revisionHistoryLimit (how many old ReplicaSets to keep for rollback), and minReadySeconds (delay before a new Pod is considered available). In the platform, every microservice runs as a Deployment with at least 2 replicas for high availability, and the revision history is kept at 5 for fast rollback.

The maxUnavailable and maxSurge parameters control rollout speed. Setting maxUnavailable: 0 and maxSurge: 1 ensures no capacity loss during updates — new Pods spin up before old ones terminate. For latency-sensitive services, this is the safest configuration.

StatefulSets manage stateful workloads that need stable network identities and persistent storage. Unlike Deployments, StatefulSets guarantee ordered, graceful deployment and scaling. Each Pod gets a persistent identifier (e.g., mysql-0, mysql-1) that survives rescheduling. Combined with a headless Service, Pods get stable DNS names. StatefulSets also support volumeClaimTemplates to provision a dedicated PersistentVolumeClaim per replica. Use StatefulSets for databases, message brokers, and any workload where Pod identity matters.
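A sketch of the StatefulSet pattern described above, assuming a headless Service named mysql already exists (names, image, and sizes are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql        # headless Service providing stable DNS names
  replicas: 3               # creates mysql-0, mysql-1, mysql-2 in order
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:     # one dedicated PVC per replica
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi
```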

DaemonSets ensure that a copy of a Pod runs on every node (or a selected subset). They are the right controller for node-level agents: log collectors (Fluentd/Fluent Bit), monitoring exporters (node-exporter), network plugins (Calico, Cilium), and storage drivers. When a new node joins the cluster, the DaemonSet automatically schedules a Pod on it. In production, DaemonSets run Fluent Bit for log forwarding to Cloud Logging and node-exporter for Prometheus host metrics.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 3
  revisionHistoryLimit: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
      - name: payment
        image: gcr.io/myproject/payment:v3.1.0

Deployment: Stateless workloads. Manages ReplicaSets for rolling updates and rollbacks. The default choice for microservices, APIs, and web frontends.

StatefulSet: Stateful workloads needing stable identities and persistent storage. Ordered Pod creation/deletion. Use for databases, Kafka, Elasticsearch, and Redis clusters.

DaemonSet: Runs one Pod per node. Use for log collectors, monitoring agents, network plugins, and storage drivers. Automatically handles node additions.

Services and Ingress

A Service provides a stable network endpoint for a set of Pods. Because Pod IPs change on every restart, Services abstract the routing layer via label selectors. ClusterIP (default) exposes the Service internally, NodePort maps a static port on every node, and LoadBalancer provisions a cloud load balancer. In GKE, LoadBalancer Services automatically create a Google Cloud Network Load Balancer with a public IP.
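A minimal ClusterIP Service fronting the payment Deployment from earlier — the port numbers are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: payment-service
spec:
  selector:
    app: payment-service  # matches the Deployment's Pod labels
  ports:
  - port: 80              # port clients connect to
    targetPort: 3000      # containerPort on the Pods
```

Other Pods reach it at payment-service.production.svc.cluster.local regardless of how often the backing Pods are rescheduled.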

For HTTP/HTTPS traffic, Ingress objects define routing rules that map hostnames and URL paths to backend Services. An Ingress controller (NGINX, Traefik, or Istio Gateway) watches for Ingress resources and configures the actual reverse proxy. In the cluster, Istio's Gateway replaces the traditional Ingress controller, providing mTLS, traffic splitting, and advanced routing in a single layer.

Headless Services (clusterIP: None) return individual Pod IPs via DNS instead of a single virtual IP. This is essential for stateful workloads like databases or caches where clients need to connect to specific instances. Combined with a StatefulSet, headless Services enable stable DNS identities like mysql-0.mysql.default.svc.cluster.local.
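The headless variant differs only in clusterIP — a sketch, with illustrative names:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  clusterIP: None   # headless: DNS returns the individual Pod IPs
  selector:
    app: mysql
  ports:
  - port: 3306
```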

ClusterIP: Default type. Internal-only virtual IP. Other Pods reach the Service via service-name.namespace.svc.cluster.local. Used for all internal microservice communication in production.

NodePort: Exposes the Service on a static port (30000-32767) on every node's IP. Useful for development, on-prem clusters, or when a cloud LB is not available. Builds on ClusterIP automatically.

LoadBalancer: Provisions a cloud LB with a public IP. Use for services that need external access without an Ingress layer. Cost: one LB per Service, so prefer Ingress for HTTP workloads.

Ingress / Gateway: Single entry point for all HTTP traffic. Routes by hostname and path. TLS termination, rate limiting, and CORS headers. The cluster uses Istio Gateway for unified traffic management.

Namespaces and RBAC

Namespaces partition a cluster into virtual sub-clusters. They provide scope for names (two Deployments can share the same name in different namespaces), resource quotas, and network policies. A production cluster typically has namespaces for production, staging, monitoring, and istio-system. The production cluster runs 6 namespaces separating workloads by environment and function.
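Resource quotas are enforced per namespace with a ResourceQuota object — a sketch with illustrative limits:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: staging-quota
  namespace: staging
spec:
  hard:
    requests.cpu: "10"      # sum of all Pod CPU requests in the namespace
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"              # cap on total Pod count
```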

RBAC (Role-Based Access Control) restricts who can do what in the cluster. A Role defines permissions within a namespace (e.g., "can list Pods and read logs"), and a RoleBinding assigns that Role to a user or ServiceAccount. For cluster-wide permissions, use ClusterRole and ClusterRoleBinding. Never give developers cluster-admin — scope permissions to the exact resources and verbs they need.

ServiceAccounts are the identity mechanism for Pods. Each namespace has a default ServiceAccount, but you should create dedicated ServiceAccounts for each workload and bind minimal RBAC roles. This follows the principle of least privilege and limits blast radius if a Pod is compromised. In production, each microservice runs under its own ServiceAccount with permissions scoped to only the Secrets and ConfigMaps it needs.
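A sketch of a dedicated ServiceAccount (the name is illustrative); bind it to a minimal Role and reference it from the workload:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payment-sa
  namespace: production
# Disable the API token mount unless the app actually calls the API server
automountServiceAccountToken: false
---
# In the Deployment's Pod template:
# spec:
#   template:
#     spec:
#       serviceAccountName: payment-sa
```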

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: production
  name: dev-pod-reader
subjects:
- kind: User
  name: developer@example.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

ConfigMaps and Secrets

ConfigMaps store non-sensitive configuration data as key-value pairs. They decouple configuration from container images, letting you change settings without rebuilding. ConfigMaps can be mounted as files or injected as environment variables. For structured config (YAML, JSON, .env files), mount the entire ConfigMap as a volume directory.
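A sketch combining both injection styles — the versioned name and keys are illustrative:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-config-v1   # versioned name: new config = new ConfigMap
data:
  LOG_LEVEL: info       # suitable as an environment variable
  config.yaml: |        # structured config, better mounted as a file
    featureFlags:
      newCheckout: true
---
# In the Pod spec, mount the whole ConfigMap as a directory:
# volumes:
# - name: config
#   configMap:
#     name: api-config-v1
# ...and in the container:
# volumeMounts:
# - name: config
#   mountPath: /etc/app
```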

Secrets store sensitive data (API keys, database credentials, TLS certificates) with base64 encoding. While base64 is not encryption, Kubernetes etcd can be configured with encryption-at-rest (enabled by default on GKE). For stronger security, use external secret managers like Google Secret Manager or HashiCorp Vault with the External Secrets Operator, which syncs external secrets into Kubernetes Secret objects automatically.

Immutable ConfigMaps and Secrets (set immutable: true) prevent accidental changes and improve cluster performance by eliminating the need for the kubelet to watch for updates. In the production cluster, every ConfigMap is immutable — config changes require creating a new ConfigMap with a versioned name and updating the Deployment reference, ensuring a clean rollout.

Never commit Secrets to Git. Use sealed-secrets, External Secrets Operator, or Helm's --set flag with CI/CD variables to inject sensitive values at deploy time. In production, all secrets flow from Google Secret Manager through External Secrets Operator — zero secrets in the Git repository.
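A sketch of the External Secrets Operator flow described above. The store name, secret names, and API version are assumptions — check them against your operator version:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payment-db-credentials
  namespace: production
spec:
  refreshInterval: 1h           # re-sync from the external store hourly
  secretStoreRef:
    name: gcp-secret-manager    # a ClusterSecretStore configured separately
    kind: ClusterSecretStore
  target:
    name: payment-db-credentials  # Kubernetes Secret the operator creates
  data:
  - secretKey: DB_PASSWORD        # key inside the Kubernetes Secret
    remoteRef:
      key: payment-db-password    # name in Google Secret Manager
```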

HPA and VPA Autoscaling

The Horizontal Pod Autoscaler (HPA) automatically scales the number of Pod replicas based on observed metrics. The most common metric is CPU utilization, but HPA v2 supports memory, custom metrics (from Prometheus via the custom metrics API), and external metrics (like queue depth from Cloud Pub/Sub). HPA evaluates metrics every 15 seconds by default and adjusts replicas to keep the target metric near the specified threshold.

Setting the right target utilization is critical. Too low (e.g., 30% CPU) wastes resources by keeping too many replicas. Too high (e.g., 90%) leaves no headroom for traffic spikes. A target of 60-70% CPU is a safe starting point for most HTTP services. For batch workloads, scale on custom metrics like queue length instead of CPU.

HPA requires resource requests to be set on containers — without requests, there is no baseline for calculating utilization percentages. The behavior field lets you control scale-up and scale-down speed. In production, scale-down is conservative (stabilization window of 300 seconds) to avoid flapping during traffic fluctuations, while scale-up is aggressive (0-second stabilization) to handle sudden spikes.

The Vertical Pod Autoscaler (VPA) complements HPA by automatically adjusting CPU and memory requests/limits on individual containers. VPA observes actual resource usage over time and recommends (or applies) right-sized values. It operates in three modes: Off (recommendation only), Initial (sets requests at Pod creation), and Auto (evicts and recreates Pods with updated requests). VPA is invaluable for workloads where resource needs are unpredictable or change over time. On GKE, VPA is available as a cluster add-on.

Do not use HPA and VPA on the same metric (e.g., both scaling on CPU). They will conflict. A common pattern is HPA for horizontal scaling on CPU/custom metrics and VPA in recommendation mode to right-size memory requests. In production, VPA runs in Off mode across all namespaces, feeding recommendations into a weekly resource-tuning review that prevents both overprovisioning and OOM kills.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 65
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
    scaleUp:
      stabilizationWindowSeconds: 0

Resource Requests and Limits

Resource requests and limits are the most misunderstood and most impactful Kubernetes settings. Requests define the minimum resources a container needs — the scheduler uses them to place Pods on nodes with sufficient capacity. Limits define the maximum — the kubelet enforces them with cgroups. If a container exceeds its memory limit, it gets OOM-killed. If it exceeds its CPU limit, it gets throttled (not killed).

The gap between request and limit matters. Setting them equal (Guaranteed QoS class) gives predictable performance but wastes resources. Setting requests lower than limits (Burstable QoS) allows overcommit but risks noisy-neighbor issues. Never omit requests entirely (BestEffort QoS) — these Pods are the first to be evicted under memory pressure.

To find the right values, observe actual usage with kubectl top pods and Prometheus metrics for at least a week under production traffic. Start with generous limits and tighten them iteratively. A common pattern: set request to p50 usage and limit to p99 + 20% headroom. In production, every microservice has tuned requests and limits based on real production data, preventing OOM kills that previously caused cascading failures.
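The p50/p99 sizing rule can be read straight from Prometheus — a sketch, assuming cAdvisor metrics and an illustrative pod label pattern:

```promql
# request: median memory usage over a week
quantile_over_time(0.5,
  container_memory_working_set_bytes{pod=~"payment-.*", container="payment"}[7d])

# limit: p99 usage plus 20% headroom
quantile_over_time(0.99,
  container_memory_working_set_bytes{pod=~"payment-.*", container="payment"}[7d]) * 1.2
```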

OOM kills are silent application killers. A container that hits its memory limit gets terminated with exit code 137 — no graceful shutdown, no log entry before death. Monitor kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} in Prometheus and alert on it. In production, resolving OOM issues on the payment service (by raising limits from 256Mi to 512Mi after profiling) eliminated 98% of the service's intermittent failures.

resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

Rolling Updates and Rollbacks

Kubernetes Deployments use rolling updates by default. When you change the Pod template (usually the container image tag), the Deployment controller creates a new ReplicaSet and incrementally scales it up while scaling down the old one. The maxSurge and maxUnavailable parameters control the pace. Combined with readiness probes, this ensures traffic only reaches healthy new Pods.

Rollbacks are instant. Each Deployment update creates a new revision stored as a ReplicaSet. kubectl rollout undo deployment/api-server reverts to the previous revision. kubectl rollout undo deployment/api-server --to-revision=3 jumps to a specific version. The revisionHistoryLimit controls how many old ReplicaSets are retained — set it to at least 3 in production for safety.

For critical services, add a minReadySeconds value (e.g., 30 seconds) to slow down rollouts. This gives you time to observe error rates before the next batch of Pods is replaced. Combine with Prometheus alerting: if error rate spikes during a rollout, trigger an automatic rollback via CI/CD. In production, Helm + GitLab CI orchestrate canary-style deployments by checking Prometheus error rates between rollout steps.
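The rollout-pacing fields together, as a Deployment spec fragment (values illustrative):

```yaml
spec:
  minReadySeconds: 30      # a new Pod must stay Ready 30s before counting as available
  revisionHistoryLimit: 5  # keep 5 old ReplicaSets for rollback
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0    # never drop below desired capacity
      maxSurge: 1          # add at most one extra Pod during the rollout
```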

# Check rollout status
kubectl rollout status deployment/api-server

# View revision history
kubectl rollout history deployment/api-server

# Rollback to previous version
kubectl rollout undo deployment/api-server

# Rollback to specific revision
kubectl rollout undo deployment/api-server --to-revision=3

Monitoring with Prometheus/Grafana

Prometheus is the de facto standard for Kubernetes monitoring. It scrapes metrics from the Kubernetes API server, kubelet, node-exporter, and your application's /metrics endpoint. The kube-prometheus-stack Helm chart deploys Prometheus, Grafana, Alertmanager, and pre-configured dashboards in one command. On GKE, Google Cloud Managed Service for Prometheus (GMP) offers a fully managed alternative.

Essential metrics to monitor: container_cpu_usage_seconds_total and container_memory_working_set_bytes for resource consumption, kube_pod_status_phase for Pod health, apiserver_request_duration_seconds for control plane latency, and custom application metrics (request rate, error rate, latency percentiles — the RED method). Set alerts on: Pod restart loops (CrashLoopBackOff), OOM kills, node NotReady, PVC capacity thresholds, and certificate expiry.
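With the Prometheus Operator, alerts are declared as PrometheusRule resources — a sketch of the OOM-kill alert (names and thresholds are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: workload-alerts
  namespace: monitoring
spec:
  groups:
  - name: pods
    rules:
    - alert: PodOOMKilled
      expr: kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: "Container {{ $labels.container }} in {{ $labels.pod }} was OOM-killed"
```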

Grafana dashboards visualize Prometheus data. Use the community dashboards for Kubernetes cluster overview (ID 315), node metrics (ID 1860), and namespace resource usage (ID 12740). Build custom dashboards for your application's business metrics. In production, a single Grafana dashboard per microservice shows request rate, error rate, p99 latency, CPU/memory, and database connection pool utilization — enabling engineers to diagnose issues in under 60 seconds.

RED Method: Rate (requests/sec), Errors (failed requests/sec), Duration (latency histogram). The essential triad for monitoring any microservice. Expose via /metrics endpoint.

USE Method: Utilization, Saturation, Errors — for infrastructure resources (CPU, memory, disk, network). Detects resource bottlenecks before they cause outages.

SLO-based Alerts: Alert on error budget burn rate, not individual failures. Multi-window multi-burn-rate alerts reduce false positives while catching real degradation early.

Debugging Techniques

Kubernetes debugging follows a systematic pattern: check Pod status, read events, inspect logs, and exec into the container. Start with kubectl get pods -n production to see status. CrashLoopBackOff means the container keeps crashing — check logs. ImagePullBackOff means the image cannot be pulled — verify the image tag and registry credentials. Pending means no node has enough resources — check node capacity with kubectl describe node.

kubectl describe pod shows events, conditions, and resource allocation. Look at the Events section at the bottom for scheduling failures, probe failures, and OOM kills. kubectl logs pod-name -c container-name --previous shows logs from the previous (crashed) container instance. For multi-container Pods, always specify the container name with -c.

For live debugging, kubectl exec -it pod-name -- /bin/sh opens a shell inside the container. Use kubectl port-forward pod-name 3000:3000 to access a Pod's port from your local machine without an Ingress. kubectl debug creates an ephemeral debug container with tools (curl, dig, strace) attached to a running Pod — invaluable for distroless images that have no shell.

# Pod overview with wide output
kubectl get pods -n production -o wide

# Detailed Pod information and events
kubectl describe pod api-server-7d8f6b5c4-xk2mn -n production

# Logs from current and previous container
kubectl logs api-server-7d8f6b5c4-xk2mn -n production
kubectl logs api-server-7d8f6b5c4-xk2mn -n production --previous

# Interactive shell
kubectl exec -it api-server-7d8f6b5c4-xk2mn -n production -- /bin/sh

# Port-forward for local access
kubectl port-forward pod/api-server-7d8f6b5c4-xk2mn 3000:3000 -n production

# Ephemeral debug container
kubectl debug -it api-server-7d8f6b5c4-xk2mn --image=busybox --target=api

CRDs and Operators

Custom Resource Definitions (CRDs) extend the Kubernetes API with your own resource types. Once a CRD is registered, you can create, read, update, and delete instances of that custom resource using kubectl just like built-in resources. CRDs are the foundation of the Kubernetes extension model — Istio's VirtualService, Cert-Manager's Certificate, and Prometheus Operator's ServiceMonitor are all CRDs.

An Operator is a custom controller that watches CRDs and automates complex application lifecycle management. Instead of manually running database backups, failovers, or upgrades, an Operator encodes that operational knowledge into code. The Operator pattern follows a reconciliation loop: observe the desired state (the custom resource spec), compare it to the actual state, and take action to converge. Popular Operators include the Prometheus Operator, PostgreSQL Operator (Zalando), and Redis Operator.

In production, Cert-Manager's CRDs automate TLS certificate provisioning from Let's Encrypt. The External Secrets Operator syncs secrets from Google Secret Manager into Kubernetes. The Prometheus Operator's ServiceMonitor CRD auto-discovers scrape targets for each microservice — no manual Prometheus config needed when a new service is deployed.
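A sketch of the ServiceMonitor pattern — it assumes the microservice's Service exposes a named port called metrics:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: payment-service
  namespace: production
spec:
  selector:
    matchLabels:
      app: payment-service   # Prometheus discovers Services with this label
  endpoints:
  - port: metrics            # named port on the Service
    path: /metrics
    interval: 30s
```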

# Example: Cert-Manager Certificate CRD
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
  namespace: production
spec:
  secretName: api-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - api.example.com

Network Policies

By default, all Pods in a Kubernetes cluster can communicate with every other Pod. Network Policies are firewall rules at the Pod level that control ingress and egress traffic using label selectors, namespace selectors, and IP blocks. They follow a whitelist model: once a NetworkPolicy selects a Pod, only explicitly allowed traffic reaches it. A CNI plugin that supports NetworkPolicy (Calico, Cilium, Weave Net) is required — GKE uses Dataplane V2 (based on Cilium) for native support.

Start with a default-deny policy per namespace and then add specific allow rules. This zero-trust approach ensures that a compromised Pod cannot reach unrelated services. For example, the payment service should only accept ingress from the API gateway and should only have egress to the database and the Stripe API. Network Policies make this enforceable at the infrastructure level.

In production, every production namespace has a default-deny ingress policy. Each microservice's Helm chart includes a NetworkPolicy that allows ingress only from its known consumers and egress only to its dependencies. This segmentation contained a security incident where a compromised logging sidecar was unable to reach the payment database due to the enforced policy.

# Default deny all ingress in a namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
# Allow API gateway to reach payment service
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gateway-to-payment
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: payment-service
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api-gateway
    ports:
    - port: 3000
      protocol: TCP
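
An egress policy completes the segmentation described above — a sketch restricting the payment service to its database plus DNS (the mysql label is illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: payment-service
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: mysql
    ports:
    - port: 3306
      protocol: TCP
  - ports:             # no "to" clause: allow DNS lookups anywhere
    - port: 53
      protocol: UDP
```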

Storage: PersistentVolumes and PVCs

Containers lose all data when they restart. For stateful workloads, Kubernetes provides the PersistentVolume (PV) and PersistentVolumeClaim (PVC) abstraction. A PV represents a piece of storage in the cluster (a cloud disk, NFS share, or local SSD). A PVC is a request for storage by a Pod. When a PVC is created, Kubernetes binds it to an available PV that satisfies the requested size and access mode. Dynamic provisioning via StorageClasses eliminates the need to pre-create PVs — the cloud provider creates the disk on demand.

Access modes define how a volume can be mounted: ReadWriteOnce (single node read-write, most common), ReadOnlyMany (multiple nodes read-only), and ReadWriteMany (multiple nodes read-write, requires NFS or a distributed filesystem like GlusterFS). Reclaim policies control what happens when a PVC is deleted: Retain keeps the data for manual recovery, Delete removes the underlying storage. Always use Retain for production databases.

StatefulSets use volumeClaimTemplates to automatically create a PVC per replica. This ensures each database instance has its own dedicated disk that persists across Pod rescheduling. In production, MySQL runs as a StatefulSet with pd-ssd StorageClass on GKE, provisioning 100Gi SSD PersistentDisks per replica. Prometheus uses a 500Gi PVC for metrics retention, monitored with alerts at 80% capacity to prevent data loss.

# StorageClass for GKE SSD
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
---
# PVC requesting 100Gi SSD
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi

Real-World: Production GKE Cluster

The platform runs 26 microservices on a GKE cluster with Istio service mesh, serving fitness businesses across multiple countries. This production environment demonstrates every concept in this guide at scale — from resource tuning that eliminated OOM cascades to HPA configurations that handle 10x traffic spikes during peak gym hours.

26 Microservices

Each service runs as a Deployment with tuned resource requests/limits, HPA scaling, and Istio sidecar. Services communicate via gRPC and REST over the mesh with automatic mTLS.

OOM Debugging Victory

The payment service suffered intermittent failures traced to OOM kills. Prometheus memory metrics + heap profiling revealed a connection pool leak. Fixed the leak and raised limits from 256Mi to 512Mi — 98% fewer incidents.

Istio Service Mesh

Full Istio mesh with mTLS, traffic management, and distributed tracing via Jaeger. Circuit breakers protect downstream services. Canary deployments route 5% traffic to new versions before full rollout.

Gateway API: The Successor to Ingress

The Kubernetes Gateway API is the official successor to the Ingress resource, designed by SIG Network to address Ingress limitations. Gateway, GatewayClass, and HTTPRoute graduated to GA (v1.0) in October 2023 and have continued evolving -- v1.4.0 (October 2025) added BackendTLSPolicy to the Standard channel, TLSRoute GA, and improved conformance testing. Gateway API is the recommended path forward for all new Kubernetes deployments.

This is especially urgent because the Ingress NGINX Controller was retired in March 2026. Maintenance has halted with no further releases, bugfixes, or security patches. Given that ~50% of cloud-native environments relied on Ingress NGINX, the Kubernetes Steering Committee and Security Response Committee recommend immediate migration to Gateway API or another supported ingress controller (Envoy Gateway, Istio, Traefik, Contour).

Gateway API introduces a role-oriented model with clear separation of concerns: GatewayClass (infrastructure provider defines available gateway types), Gateway (cluster operator configures listeners, ports, and TLS), and HTTPRoute/TLSRoute/GRPCRoute (application developer defines routing rules). This separation lets platform teams manage infrastructure while developers control their own routing without elevated permissions.

Key Advantages Over Ingress

Gateway API replaces the annotation sprawl of Ingress with typed, portable fields: traffic splitting by backend weight, header and method matching, and cross-namespace routing are part of the spec rather than controller-specific extensions. Routes are split into dedicated kinds (HTTPRoute, TLSRoute, GRPCRoute), and the role-oriented model lets application teams manage their own routes without touching the shared Gateway. The example below shows weighted canary routing, which plain Ingress cannot express natively.

# Gateway API: GatewayClass + Gateway + HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: istio
spec:
  controllerName: istio.io/gateway-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: app-gateway
  namespace: infra
spec:
  gatewayClassName: istio
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      certificateRefs:
      - name: app-tls-cert
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-routes
  namespace: app
spec:
  parentRefs:
  - name: app-gateway
    namespace: infra
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api/v2
    backendRefs:
    - name: api-v2
      port: 8080
      weight: 90
    - name: api-v2-canary
      port: 8080
      weight: 10

If you are running Ingress NGINX, start migrating to Gateway API now. The Ingress NGINX Controller was retired in March 2026 -- no further releases or security patches will be issued. Gateway API is supported by all major ingress controllers and is the recommended standard by Kubernetes SIG Network.

Latest Kubernetes Features (2025-2026)

In-Place Pod Resize (GA in 1.35): You can now change CPU and memory resources on running Pods without restarting them. Memory limit decreases are now permitted, not just increases. This is a major production capability -- previously, any resource change required a Pod restart, causing brief downtime. For stateful workloads and long-running batch jobs, in-place resize eliminates unnecessary disruptions. Combined with VPA in Auto mode, clusters can right-size containers continuously without the evict-and-recreate cycle.

Native Sidecar Containers (GA in 1.33): Kubernetes now has first-class support for sidecar containers via restartPolicy: Always on init containers. Native sidecars start before regular containers and run for the entire Pod lifecycle, solving the long-standing problem of sidecar ordering and shutdown. Istio, logging agents, and other sidecar workloads benefit from proper lifecycle management -- sidecars now terminate after the main container exits, and Jobs with sidecars complete correctly instead of hanging indefinitely.

Dynamic Resource Allocation (DRA) (GA in 1.34): DRA provides a standardized Kubernetes API for requesting and managing GPUs, FPGAs, and other specialized hardware. Instead of relying on device plugins with limited scheduling intelligence, DRA enables fine-grained resource allocation with vendor-specific parameters. This is critical for AI/ML workloads: teams report 20-35% GPU cost reduction through better scheduling and sharing. DRA supports structured parameters for precise resource matching and claim management across Pod restarts.

Pod-Level Resources (Beta in 1.35): A new spec.resources field at the Pod level lets you set aggregate CPU and memory limits for all containers in a Pod. Instead of specifying resources per container and hoping the sum works out, you define a Pod-wide budget. The kubelet enforces the aggregate limit via a shared cgroup. This simplifies resource management for multi-container Pods and sidecars, ensuring the total Pod consumption stays within bounds regardless of which container is active.

Kubernetes v1.36 "Haru" (released April 22, 2026) ships with 70 tracked enhancements: 18 graduating to stable, 25 to beta, and 25 new alpha features. Headline GA promotions include User Namespaces in Pods for truly rootless containers, OCI VolumeSource for running OCI images as volumes, MutatingAdmissionPolicy for webhook-free admission control, SELinux volume mounting (replacing recursive relabeling with mount-time labels), and fine-grained kubelet API authorization for node-level security.

HPAScaleToZero is now enabled by default, letting the Horizontal Pod Autoscaler scale Deployments down to zero replicas when there is no demand -- a feature first introduced in v1.16 (2019). New alpha features target AI/ML workloads with workload-aware preemption for training jobs and sharded API streams for large-scale clusters. The gitRepo volume plugin and IPVS mode in kube-proxy are removed in this release.
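Scale-to-zero requires scaling on an Object or External metric (a resource metric like CPU cannot wake a Deployment with zero Pods). A sketch, with an illustrative Pub/Sub backlog metric:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 0      # permitted now that HPAScaleToZero is on by default
  maxReplicas: 20
  metrics:
  - type: External
    external:
      metric:
        name: pubsub.googleapis.com|subscription|num_undelivered_messages
      target:
        type: AverageValue
        averageValue: "100"   # one replica per ~100 queued messages
```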

Ingress NGINX officially retired: Kubernetes SIG Network and the Security Response Committee retired the Ingress NGINX Controller on March 24, 2026. There will be no further releases, bugfixes, or security patches. Since approximately 50% of cloud-native environments relied on Ingress NGINX, all teams should migrate to the Gateway API or another supported ingress controller (Envoy Gateway, Istio, Traefik, Contour) immediately. Gateway API is the recommended standard going forward, supported by all major controllers.

CRI List Streaming (Alpha in 1.36): On large-scale nodes running hundreds of containers, the kubelet's traditional monolithic List requests to the container runtime caused memory pressure and latency spikes. CRI list streaming replaces these with server-side streaming RPC, significantly reducing memory allocation during container listing operations. This is critical for AI/ML nodes with many sidecar containers and init containers.

In-Place Pod Resize (GA 1.35): Change CPU/memory on running Pods without restart. Memory limit decreases now permitted. Eliminates unnecessary disruptions for stateful workloads.

Native Sidecar Containers (GA 1.33): First-class sidecars with restartPolicy: Always in init containers. Proper startup/shutdown ordering. Jobs with sidecars now complete correctly.

Dynamic Resource Allocation (GA 1.34): Standardized API for GPUs and specialized hardware. 20-35% GPU cost reduction through better scheduling. Critical for AI/ML workloads on Kubernetes.

Pod-Level Resources (Beta 1.35): Aggregate resource limits for all containers in a Pod via spec.resources. Simplifies multi-container and sidecar resource management.

Kubernetes v1.36 Highlights: Released April 22, 2026. User Namespaces GA (rootless containers), OCI VolumeSource GA, HPAScaleToZero default-on, MutatingAdmissionPolicy GA. Ingress NGINX retired -- migrate to Gateway API.

# In-Place Pod Resize: patch resources without restart
# (targets the "resize" subresource, required for in-place resize)
kubectl patch pod api-server --subresource resize --patch '{
  "spec": {"containers": [{"name": "api",
    "resources": {"requests": {"memory": "512Mi"},
      "limits": {"memory": "1Gi", "cpu": "500m"}}}]}
}'

# Native Sidecar Container
apiVersion: v1
kind: Pod
spec:
  initContainers:
  - name: log-collector
    image: fluent/fluent-bit:3.2
    restartPolicy: Always # makes it a native sidecar
  containers:
  - name: app
    image: myapp:v2

# Pod-Level Resources (Beta 1.35)
apiVersion: v1
kind: Pod
spec:
  resources:
    limits:
      cpu: "2"
      memory: 2Gi
  containers:
  - name: app
    image: myapp:v2
  - name: sidecar
    image: envoy:1.32
