Google Cloud Platform: GKE and Beyond
A deep technical guide to running production workloads on Google Cloud Platform. Covers GKE cluster architecture, Cloud Run, Cloud Build, IAM with Workload Identity Federation, Cloud SQL, Cloud Storage, Pub/Sub, VPC networking, Cloud CDN, Secret Manager, observability with tracing, and cost management.
Table of Contents
- 1. GKE: Google Kubernetes Engine
- 2. Cloud Run: Serverless Containers
- 3. Cloud Build and Artifact Registry
- 4. IAM and Workload Identity
- 5. Logging and Monitoring (Cloud Operations)
- 6. Cost Management
- 7. Real-World Experience
- 8. Cloud SQL
- 9. Cloud Storage
- 10. Pub/Sub
- 11. VPC and Firewall Rules
- 12. Cloud CDN
- 13. Secret Manager
- 14. Vertex AI: Google's ML Platform
1. GKE: Google Kubernetes Engine
Cluster Architecture
GKE is Google's managed Kubernetes service. It handles control plane management, automatic upgrades, and node auto-repair. Two modes: Standard (you manage node pools) and Autopilot (Google manages everything).
- Control plane: Managed by Google. Regional clusters replicate the control plane across three zones for HA. Standard mode charges a flat cluster management fee per cluster hour; the free tier covers one zonal or Autopilot cluster per billing account
- Node pools: Groups of VMs with identical configuration. Mix machine types (e2, n2, c2) for cost/performance balance
- Autopilot mode: Google manages nodes, scales per-pod. You only define pod resource requests. Best for teams that want zero node management
- Release channels: Rapid, Regular, Stable. Control how quickly GKE upgrades your cluster. Stable is recommended for production
- Private clusters: Nodes have no public IPs. Control plane accessible only via private endpoint or authorized networks
- VPC-native clusters: Use alias IP ranges for pods and services. Required for private clusters, Shared VPC, and container-native load balancing with NEGs. The default for newly created clusters
# Create a production-ready GKE cluster
gcloud container clusters create myapp-prod \
--region us-central1 \
--release-channel stable \
--enable-private-nodes \
--master-ipv4-cidr 172.16.0.0/28 \
--enable-master-authorized-networks \
--master-authorized-networks 10.0.0.0/8 \
--enable-ip-alias \
--network app-vpc \
--subnetwork app-subnet \
--cluster-secondary-range-name pods \
--services-secondary-range-name services \
--enable-network-policy \
--workload-pool=myapp-prod.svc.id.goog \
--num-nodes 3 \
--machine-type e2-standard-4 \
--disk-size 100 \
--enable-autoscaling \
--min-nodes 2 \
--max-nodes 10 \
--enable-autorepair \
--enable-autoupgrade
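Additional node pools can be added after cluster creation. A sketch of a Spot VM pool with a custom taint so that only workloads with a matching toleration schedule there (the pool name and taint key are illustrative):

```shell
# Add a Spot VM node pool for stateless/batch workloads (name and taint illustrative)
gcloud container node-pools create spot-pool \
  --cluster myapp-prod \
  --region us-central1 \
  --spot \
  --machine-type e2-standard-2 \
  --enable-autoscaling \
  --min-nodes 0 \
  --max-nodes 6 \
  --node-taints workload-type=batch:NoSchedule
```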
Workload Configuration
Kubernetes workloads in GKE require careful resource planning. Under-provisioned pods get OOMKilled; over-provisioned pods waste money.
- Resource requests vs limits: Requests guarantee minimum resources. Limits cap maximum usage. Set requests based on P95 usage, limits at 2x requests
- Horizontal Pod Autoscaler (HPA): Scale pods based on CPU, memory, or custom metrics. Use the behavior field to control scale-up/down speed
- Vertical Pod Autoscaler (VPA): Automatically adjusts resource requests based on historical usage. Run in "Off" mode first to get recommendations
- Pod Disruption Budgets (PDB): Ensure minimum availability during voluntary disruptions (upgrades, scaling). Set minAvailable or maxUnavailable
- Topology spread constraints: Distribute pods across nodes and zones for high availability
- Preemptible/Spot VMs: 60-91% cheaper than regular VMs. Use for stateless workloads with proper PDB and pod anti-affinity
# Kubernetes deployment with production-grade configuration
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-service
namespace: myapp
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app: api-service
template:
metadata:
labels:
app: api-service
spec:
serviceAccountName: api-service-sa
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: api-service
containers:
- name: api
image: us-central1-docker.pkg.dev/myapp-prod/services/api:v2.1.0
ports:
- containerPort: 3000
resources:
requests:
cpu: 250m
memory: 512Mi
limits:
cpu: 500m
memory: 1Gi
readinessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 5
periodSeconds: 10
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 15
periodSeconds: 20
env:
- name: DB_HOST
valueFrom:
secretKeyRef:
name: db-credentials
key: host
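An HPA and PDB can pair with the deployment above; a sketch with illustrative thresholds (70% CPU target, minimum two pods available):

```yaml
# HPA: scale api-service on CPU; behavior slows scale-down to avoid flapping
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service
  namespace: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
---
# PDB: keep at least 2 pods up during node upgrades and scale-downs
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service
  namespace: myapp
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api-service
```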
Networking and Ingress
- GKE Ingress controller: Provisions Google Cloud Load Balancers automatically. Supports HTTP(S), SSL certificates via Google-managed certs
- Istio / Anthos Service Mesh: mTLS between services, traffic management (canary, blue-green), circuit breaking, observability
- Network Policies: Kubernetes-native firewall. Restrict pod-to-pod communication. Default deny + explicit allow is the safest approach
- Internal load balancing: Expose services internally via the cloud.google.com/load-balancer-type: Internal annotation
- Cloud NAT: Provides outbound internet access for private nodes without assigning public IPs
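The default-deny-plus-explicit-allow pattern looks like this in manifest form (the gateway label is a hypothetical upstream caller; namespace and api-service labels follow the earlier examples):

```yaml
# Deny all ingress to pods in the namespace by default...
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: myapp
spec:
  podSelector: {}
  policyTypes: ["Ingress"]
---
# ...then explicitly allow traffic to api-service from gateway pods only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gateway-to-api
  namespace: myapp
spec:
  podSelector:
    matchLabels:
      app: api-service
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: gateway
      ports:
        - port: 3000
```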
2. Cloud Run: Serverless Containers
Cloud Run runs stateless containers without managing infrastructure. It scales from zero to thousands of instances automatically. You only pay for actual request processing time (billed per 100ms).
- Container contract: Listen on PORT env var (default 8080), respond to HTTP requests, stateless (no local disk persistence between requests)
- Concurrency: Each instance handles multiple concurrent requests (default 80, max 1000). Tune based on your app's memory/CPU profile
- Cold starts: First request to a new instance incurs startup latency. Minimize by keeping container images small and using minimum instances > 0
- CPU allocation: "CPU always allocated" for background processing, or "CPU only during requests" for pure request-response workloads (cheaper)
- VPC connector: Access private resources (Cloud SQL, Memorystore, GKE internal services) from Cloud Run via Serverless VPC Access
- Traffic splitting: Route percentage of traffic to new revisions for canary deployments. Instant rollback by shifting 100% back
# Deploy a Cloud Run service
gcloud run deploy api-service \
--image us-central1-docker.pkg.dev/myapp-prod/services/api:v2.1.0 \
--region us-central1 \
--platform managed \
--port 3000 \
--memory 1Gi \
--cpu 1 \
--concurrency 80 \
--min-instances 1 \
--max-instances 100 \
--timeout 60 \
--service-account api-service@myapp-prod.iam.gserviceaccount.com \
--vpc-connector app-vpc-connector \
--set-env-vars "NODE_ENV=production" \
--set-secrets "DB_PASSWORD=db-password:latest"
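The traffic-splitting bullet above maps to a three-step canary; a sketch (the rollback revision name is illustrative):

```shell
# Deploy a new revision with no traffic (canary candidate)
gcloud run deploy api-service \
  --image us-central1-docker.pkg.dev/myapp-prod/services/api:v2.2.0 \
  --region us-central1 \
  --no-traffic
# Shift 10% of traffic to the newest revision
gcloud run services update-traffic api-service \
  --region us-central1 \
  --to-revisions LATEST=10
# Instant rollback: send 100% to the previous known-good revision (name illustrative)
gcloud run services update-traffic api-service \
  --region us-central1 \
  --to-revisions api-service-00041-xyz=100
```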
3. Cloud Build and Artifact Registry
Cloud Build
Cloud Build is GCP's serverless CI/CD platform. It executes build steps as containers, supports parallel steps, and integrates natively with GCP services.
- Build config: YAML file defining steps, each running in its own container. Steps share a persistent /workspace volume
- Triggers: GitHub/GitLab push, pull request, tag, manual, Pub/Sub event. Filter by branch, tag pattern, or file path
- Substitutions: Built-in ($SHORT_SHA, $BRANCH_NAME) and custom variables. Use for image tags, environment names
- Private pools: Run builds in your VPC. Access private registries, internal APIs, and databases during build
- Build caching: Use kaniko for layer caching. Dramatically reduces Docker build times for large images
Artifact Registry
Artifact Registry stores Docker images, npm packages, Maven artifacts, and Python packages. It replaces Container Registry with better security and multi-format support.
- Docker repositories: Regional or multi-regional. Vulnerability scanning built-in (on-push or on-demand)
- Cleanup policies: Automatically delete images older than N days or keep only the last N versions
- IAM-based access: Fine-grained permissions per repository. Workload Identity for pull access from GKE
- Remote repositories: Proxy and cache images from Docker Hub, reducing pull rate limit issues
Cloud Build Pipeline Example
# cloudbuild.yaml - Build, test, push, deploy
steps:
# Run tests
- name: 'node:20-alpine'
entrypoint: 'sh'
args: ['-c', 'npm ci && npm test']
# Build Docker image with kaniko (layer caching)
- name: 'gcr.io/kaniko-project/executor:latest'
args:
- '--destination=us-central1-docker.pkg.dev/$PROJECT_ID/services/api:$SHORT_SHA'
- '--cache=true'
- '--cache-ttl=72h'
# Deploy to GKE
- name: 'gcr.io/cloud-builders/gke-deploy'
args:
- 'run'
- '--filename=k8s/'
- '--image=us-central1-docker.pkg.dev/$PROJECT_ID/services/api:$SHORT_SHA'
- '--cluster=myapp-prod'
- '--location=us-central1'
options:
logging: CLOUD_LOGGING_ONLY
machineType: 'E2_HIGHCPU_8'
timeout: '900s'
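A trigger wiring this pipeline to pushes on main might look like the following (repo owner and name are placeholders; the same pattern applies to GitLab triggers):

```shell
gcloud builds triggers create github \
  --name=deploy-on-main \
  --repo-owner=your-org \
  --repo-name=api-service \
  --branch-pattern='^main$' \
  --build-config=cloudbuild.yaml
```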
4. IAM and Workload Identity
GCP IAM Model
GCP IAM uses a resource hierarchy: Organization > Folders > Projects > Resources. Permissions are inherited downward. Policies at higher levels apply to all resources below.
- Principals: Google accounts, service accounts, groups, domains. Always use groups for human access
- Roles: Predefined (e.g., roles/container.developer) or custom. Avoid basic roles (Owner, Editor, Viewer) -- they are far too broad
- Service accounts: Machine identities. Create per-service SAs with minimal permissions. Never use the default compute SA
- IAM Conditions: Grant permissions only when conditions are met (time-based, resource attributes, IP-based)
- Audit logs: Admin Activity (always on), Data Access (configure per service), System Event. Send to Cloud Logging
Workload Identity
Workload Identity is the recommended way for GKE workloads to access GCP APIs. It binds Kubernetes service accounts to GCP service accounts, eliminating the need for exported service account keys.
- How it works: KSA (Kubernetes Service Account) is annotated with a GSA (Google Service Account). When a pod uses the KSA, GKE's metadata server provides GSA credentials
- No key files: Eliminates the risk of leaked JSON key files. Credentials are short-lived and automatically rotated
- Per-pod identity: Different pods can have different GCP permissions by using different KSAs
- Cross-project access: A GSA in project A can be bound to a KSA in project B's GKE cluster
# Setup Workload Identity for a service
# 1. Create GCP service account
gcloud iam service-accounts create api-service \
--display-name="API Service" \
--project=myapp-prod
# 2. Grant necessary permissions
gcloud projects add-iam-policy-binding myapp-prod \
--member="serviceAccount:api-service@myapp-prod.iam.gserviceaccount.com" \
--role="roles/cloudsql.client"
# 3. Bind KSA to GSA
gcloud iam service-accounts add-iam-policy-binding \
api-service@myapp-prod.iam.gserviceaccount.com \
--role="roles/iam.workloadIdentityUser" \
--member="serviceAccount:myapp-prod.svc.id.goog[myapp/api-service-sa]"
# 4. Annotate the Kubernetes service account
kubectl annotate serviceaccount api-service-sa \
--namespace myapp \
iam.gke.io/gcp-service-account=api-service@myapp-prod.iam.gserviceaccount.com
Workload Identity Federation
Workload Identity Federation extends the same keyless concept beyond GKE. External workloads (GitHub Actions, GitLab CI, AWS, Azure, on-prem) can authenticate to GCP without service account keys by exchanging their native identity tokens.
- Identity pools: Create a pool per trust boundary (e.g., one for GitHub, one for GitLab). Each pool contains providers that map external identities
- Attribute mapping: Map external token claims (e.g., assertion.repository) to Google attributes. Use attribute conditions to restrict which external identities can authenticate
- GitHub Actions: Configure an OIDC provider with issuer token.actions.githubusercontent.com. Restrict to specific repos and branches via attribute conditions
- No long-lived keys: Federation tokens are short-lived (1 hour default). Eliminates the key rotation burden and leaked-key risk entirely
- Service account impersonation: External identity authenticates via the pool, then impersonates a GCP service account to access resources. The SA still governs permissions via IAM
# Setup Workload Identity Federation for GitHub Actions
# 1. Create the identity pool
gcloud iam workload-identity-pools create github-pool \
--location="global" \
--display-name="GitHub Actions Pool"
# 2. Add GitHub as an OIDC provider
gcloud iam workload-identity-pools providers create-oidc github-provider \
--location="global" \
--workload-identity-pool=github-pool \
--issuer-uri="https://token.actions.githubusercontent.com" \
--attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository" \
--attribute-condition="assertion.repository=='your-org/api-service'"
# 3. Allow the provider to impersonate the deploy SA
gcloud iam service-accounts add-iam-policy-binding \
deployer@myapp-prod.iam.gserviceaccount.com \
--role="roles/iam.workloadIdentityUser" \
--member="principalSet://iam.googleapis.com/projects/123456/locations/global/workloadIdentityPools/github-pool/attribute.repository/your-org/api-service"
5. Logging and Monitoring (Cloud Operations)
Cloud Logging
- Structured logging: Write JSON to stdout/stderr from containers. GKE automatically ships to Cloud Logging. Fields like severity, httpRequest, and trace are parsed automatically
- Log-based metrics: Create custom metrics from log patterns. Use for alerting on error rates, specific error messages, or business events
- Log sinks: Route logs to BigQuery (analytics), Cloud Storage (long-term archive), or Pub/Sub (real-time processing)
- Exclusion filters: Reduce logging costs by excluding high-volume, low-value logs (health checks, debug-level logs)
- Log retention: Default 30 days for most logs. Use log buckets for custom retention periods (up to 10 years)
// Structured logging in Node.js for Cloud Logging
const log = (severity, message, extra = {}) => {
const entry = {
severity, // DEBUG, INFO, WARNING, ERROR, CRITICAL
message,
timestamp: new Date().toISOString(),
'logging.googleapis.com/trace': getTraceId(),
'logging.googleapis.com/spanId': getSpanId(),
serviceContext: { service: 'api-service', version: '2.1.0' },
...extra
};
console.log(JSON.stringify(entry));
};
// Usage
log('INFO', 'Booking created', {
userId: '12345',
bookingId: 'bk_789',
httpRequest: { requestMethod: 'POST', requestUrl: '/api/bookings', status: 201 }
});
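The sink and exclusion bullets above translate to commands along these lines (the dataset name and filters are illustrative):

```shell
# Route ERROR+ logs to BigQuery for analysis
gcloud logging sinks create errors-to-bq \
  bigquery.googleapis.com/projects/myapp-prod/datasets/app_logs \
  --log-filter='severity>=ERROR'
# Exclude noisy health-check logs from the default sink to cut ingestion costs
gcloud logging sinks update _Default \
  --add-exclusion=name=healthchecks,filter='httpRequest.requestUrl="/health"'
```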
Cloud Monitoring
- GKE metrics: CPU/memory utilization, pod restarts, node health, container errors -- all collected automatically
- Custom metrics: Push application-specific metrics via OpenTelemetry or the Monitoring API. Use for business KPIs (bookings/minute, payment success rate)
- Alerting policies: Multi-condition alerts with notification channels (email, Slack, PagerDuty, webhooks). Use metric-absence conditions to detect silent failures
- Uptime checks: HTTP(S) probes from multiple global locations. Alert on downtime within 1 minute
- Dashboards: Custom dashboards with MQL (Monitoring Query Language). Embed charts from Cloud Logging and Cloud Trace
- SLOs (Service Level Objectives): Define availability and latency targets. Burn rate alerts notify before SLO is breached
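Burn rate is the ratio of the observed error rate to the error budget allowed by the SLO. A quick sketch of the arithmetic behind a fast-burn alert (the 99.9% target and 30-day window are illustrative):

```javascript
// Burn rate: how fast the error budget is being consumed.
// burnRate = observedErrorRate / (1 - sloTarget)
// A burn rate of 1 exhausts the budget exactly at the end of the SLO window.
function burnRate(observedErrorRate, sloTarget) {
  return observedErrorRate / (1 - sloTarget);
}

// Hours until a 30-day error budget is fully spent at the current burn rate.
function hoursToExhaustion(rate, windowDays = 30) {
  return (windowDays * 24) / rate;
}

// For a 99.9% SLO, a 1.44% observed error rate is a 14.4x burn:
// the 30-day budget would be gone in ~50 hours -- page someone.
console.log(burnRate(0.0144, 0.999).toFixed(1));  // "14.4"
console.log(hoursToExhaustion(14.4).toFixed(0));  // "50"
```

This is the logic behind multi-window burn rate alerting: a high burn rate over a short window pages immediately, while a low burn rate over a long window opens a ticket.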
Cloud Trace
- Distributed tracing: Track requests across microservices. See exactly where latency occurs in a multi-service call chain
- Automatic instrumentation: GKE and Cloud Run automatically generate trace spans for incoming requests. Add OpenTelemetry for custom spans
- Trace-log correlation: Include logging.googleapis.com/trace in structured logs. Cloud Logging links logs to their trace automatically
- Latency analysis: Cloud Trace generates latency distributions and identifies regressions. Set alerts on P99 latency exceeding thresholds
- Sampling: Configure sampling rate to control cost. 1% sampling is sufficient for production latency analysis in high-traffic services
// OpenTelemetry tracing setup for Node.js on GKE
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { TraceExporter } = require('@google-cloud/opentelemetry-cloud-trace-exporter');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
const { ExpressInstrumentation } = require('@opentelemetry/instrumentation-express');
const provider = new NodeTracerProvider();
provider.addSpanProcessor(
new BatchSpanProcessor(new TraceExporter({ projectId: 'myapp-prod' }))
);
provider.register();
registerInstrumentations({
instrumentations: [new HttpInstrumentation(), new ExpressInstrumentation()]
});
6. Cost Management
GCP billing is project-based. Without active management, GKE clusters and persistent disks are the top cost drivers.
- Committed Use Discounts (CUDs): 1-year or 3-year commitments for 20-57% savings on compute. Flexible CUDs apply across machine families
- Spot VMs in node pools: Create a separate node pool with Spot VMs for non-critical workloads. Use taints and tolerations to control scheduling
- Right-size with VPA: Run VPA in recommendation mode. Review and apply suggestions. Most teams over-provision by 50%+ on initial deployment
- Cluster autoscaler: Scale node pools to zero when not needed. Use scale-down-unneeded-time and scale-down-utilization-threshold tuning
- Regional vs zonal: A regional cluster runs nodes in three zones, so --num-nodes applies per zone (3x the node count and cost). Use regional for production HA, zonal for dev/staging
- Persistent disk costs: Delete unused PVCs. Use pd-standard for non-IOPS-sensitive workloads (pd-ssd costs 6x more)
- Logging costs: Cloud Logging charges per GB ingested. Exclude verbose logs, reduce log level in production, and set appropriate retention
- Budget alerts: Set budgets per project with alerts at 50%, 80%, 100%, 150%. Use programmatic budget notifications to auto-scale down or notify team
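The savings arithmetic behind CUDs and Spot pools can be sketched quickly. All rates and discounts below are placeholder assumptions, not published prices -- check the GCP pricing page for real numbers:

```javascript
// Illustrative monthly cost comparison for a node pool.
// Hourly rate and discount percentages are placeholder assumptions.
const HOURS_PER_MONTH = 730;

function monthlyCost(hourlyRate, nodeCount, discount = 0) {
  return hourlyRate * (1 - discount) * nodeCount * HOURS_PER_MONTH;
}

const onDemand = monthlyCost(0.134, 6);       // assumed e2-standard-4 rate
const withCud = monthlyCost(0.134, 6, 0.37);  // assumed 3-year CUD discount
const spot = monthlyCost(0.134, 6, 0.70);     // assumed Spot discount

console.log(`on-demand: $${onDemand.toFixed(0)}/mo`);  // on-demand: $587/mo
console.log(`3yr CUD:   $${withCud.toFixed(0)}/mo`);   // 3yr CUD:   $370/mo
console.log(`spot:      $${spot.toFixed(0)}/mo`);      // spot:      $176/mo
```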
7. Real-World Experience
In production, I designed and managed the entire GCP infrastructure running 26 microservices on GKE. Key architecture decisions and outcomes:
- GKE cluster: Regional cluster in us-central1 with Stable release channel. Three node pools: default (e2-standard-4), high-memory (e2-highmem-4 for data-intensive services), and spot (e2-standard-2 for batch jobs). Cluster autoscaler from 2 to 10 nodes per pool
- Cloud Build: Automated CI/CD pipeline triggered on GitLab merge to main. Build, test, push to Artifact Registry, deploy to GKE with rolling updates. Average build time: 3 minutes with kaniko layer caching
- Workload Identity: Every microservice has its own KSA-GSA binding. API service accesses Cloud SQL, notification service accesses Pub/Sub, export service accesses Cloud Storage -- all without key files
- Observability: Structured JSON logging from all Node.js services. Custom dashboards for booking rates, payment success, and API latency P95. PagerDuty integration for critical alerts
- Cost optimization: Spot VMs for development namespace, CUDs for production baseline, VPA recommendations applied quarterly. Reduced GKE compute costs by ~40% versus initial on-demand pricing
8. Cloud SQL
Cloud SQL is a fully managed relational database service supporting MySQL, PostgreSQL, and SQL Server. It handles replication, backups, encryption, and patching.
- Instance tiers: Shared-core (db-f1-micro, db-g1-small) for dev, dedicated (db-custom-*) for production. Always use dedicated instances in production for consistent performance
- High availability: Regional HA with automatic failover. Primary and standby in different zones. Failover takes 30-120 seconds. Enable for all production databases
- Read replicas: Offload read traffic to cross-zone or cross-region replicas. Useful for analytics queries that should not impact production writes
- Cloud SQL Auth Proxy: Secure connection method using IAM-based authentication. Runs as a sidecar in GKE pods. Eliminates the need to manage SSL certificates or allowlist IPs
- Automated backups: Daily backups with configurable retention (up to 365 days). Point-in-time recovery (PITR) using write-ahead logs for any moment within the retention window
- Connection limits: Each instance has a max connection limit based on tier. Use connection pooling (PgBouncer, ProxySQL) to avoid exhaustion under high concurrency
- Private IP: Deploy Cloud SQL with private IP only. Access from GKE via private services access. No public endpoint reduces attack surface
# Create a production Cloud SQL PostgreSQL instance
gcloud sql instances create app-db \
--database-version=POSTGRES_15 \
--tier=db-custom-4-16384 \
--region=us-central1 \
--availability-type=REGIONAL \
--storage-type=SSD \
--storage-size=100GB \
--storage-auto-increase \
--backup-start-time=03:00 \
--enable-point-in-time-recovery \
--retained-backups-count=30 \
--network=projects/myapp-prod/global/networks/app-vpc \
--no-assign-ip \
--database-flags=log_min_duration_statement=1000,max_connections=200
# Deploy Cloud SQL Auth Proxy as a sidecar in GKE
# (add to pod spec alongside your app container)
# - name: cloud-sql-proxy
# image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2
# args: ["--structured-logs", "myapp-prod:us-central1:app-db"]
# securityContext:
# runAsNonRoot: true
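Application-side, connections go to the proxy sidecar on localhost. A hypothetical helper that builds a pool config with a conservative connection cap, sized against the instance's max_connections (the 20% headroom and defaults are assumptions, not a Cloud SQL requirement):

```javascript
// Hypothetical config builder for a Postgres pool (e.g. passed to `new pg.Pool(...)`)
// that talks to the Cloud SQL Auth Proxy sidecar on localhost.
// Cap per-replica pool size well below the instance max_connections.
function buildPoolConfig(env, replicas = 3, instanceMaxConnections = 200) {
  // Leave ~20% headroom for admin sessions, migrations, and replicas joining.
  const perReplica = Math.floor((instanceMaxConnections * 0.8) / replicas);
  return {
    host: '127.0.0.1',  // the proxy sidecar, not the instance IP
    port: 5432,
    database: env.DB_NAME,
    user: env.DB_USER,
    password: env.DB_PASSWORD,
    max: perReplica,    // 53 connections per replica with the defaults above
    idleTimeoutMillis: 30000,
  };
}

const config = buildPoolConfig(process.env);
console.log(config.max); // 53
```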
9. Cloud Storage
Cloud Storage is GCP's object storage service. It stores any amount of unstructured data with high durability (99.999999999% -- eleven 9s). Four storage classes balance cost and access patterns.
- Storage classes: Standard (frequent access), Nearline (once/month), Coldline (once/quarter), Archive (once/year). Lifecycle rules auto-transition objects between classes
- Uniform bucket-level access: Disable per-object ACLs and use only IAM for access control. Simpler, auditable, and recommended for all new buckets
- Signed URLs: Generate time-limited URLs granting temporary access to private objects. Use for user-facing file downloads without exposing bucket permissions
- Object versioning: Retain previous versions of overwritten or deleted objects. Essential for data protection. Combine with lifecycle rules to delete old versions after N days
- Pub/Sub notifications: Trigger events on object creation, deletion, or metadata update. Drive serverless pipelines (Cloud Functions, Cloud Run) from storage events
- Transfer Service: Migrate data from AWS S3, Azure Blob, HTTP sources, or on-prem to Cloud Storage. Schedule recurring transfers for data synchronization
# Create a Cloud Storage bucket with lifecycle rules
gcloud storage buckets create gs://myapp-prod-uploads \
--location=us-central1 \
--uniform-bucket-level-access \
--public-access-prevention
# Set lifecycle rule: transition to Nearline after 30 days, delete after 365
cat <<'EOF' > lifecycle.json
{
"rule": [
{"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
"condition": {"age": 30, "matchesStorageClass": ["STANDARD"]}},
{"action": {"type": "Delete"},
"condition": {"age": 365}}
]
}
EOF
gcloud storage buckets update gs://myapp-prod-uploads \
--lifecycle-file=lifecycle.json
10. Pub/Sub
Cloud Pub/Sub is a fully managed, real-time messaging service for event-driven architectures. It decouples producers from consumers and guarantees at-least-once delivery at any scale.
- Topics and subscriptions: Publishers send messages to topics. Subscribers consume via pull (poll-based) or push (HTTP endpoint) subscriptions. One topic can have many subscriptions
- Message ordering: Enable ordering keys to guarantee in-order delivery for messages with the same key. Required for event sourcing and stateful processing
- Dead letter topics: Automatically route messages that fail processing after N attempts to a dead letter topic for investigation. Prevents poison messages from blocking the queue
- Exactly-once delivery: Available with pull subscriptions using acknowledgment IDs. Subscribers must handle deduplication for push subscriptions
- Message retention: Retain acknowledged messages for up to 31 days. Replay old messages by seeking to a timestamp -- useful for reprocessing after deploying a bug fix
- Schema validation: Enforce Avro, Protocol Buffer, or JSON schema on messages. Reject invalid messages at publish time to prevent downstream failures
# Create a Pub/Sub topic and subscription with dead lettering
gcloud pubsub topics create booking-events
gcloud pubsub topics create booking-events-dlq
gcloud pubsub subscriptions create booking-processor \
--topic=booking-events \
--ack-deadline=60 \
--message-retention-duration=7d \
--dead-letter-topic=booking-events-dlq \
--max-delivery-attempts=5 \
--enable-exactly-once-delivery
# Push subscription to Cloud Run
gcloud pubsub subscriptions create booking-push \
--topic=booking-events \
--push-endpoint=https://booking-worker-xxxxx.run.app/pubsub \
--push-auth-service-account=pubsub-invoker@myapp-prod.iam.gserviceaccount.com
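A push subscription POSTs a JSON envelope with a base64-encoded payload to the endpoint. A minimal decode-and-dedupe sketch; the in-memory Set stands in for a durable dedup store (Redis, a DB unique constraint) and only dedupes within one instance:

```javascript
// Decode a Pub/Sub push envelope and drop duplicate deliveries by messageId.
// In production, back `seen` with a durable store -- Pub/Sub is at-least-once,
// so the same message can be redelivered to a different instance.
const seen = new Set();

function handlePush(envelope) {
  const { messageId, data, attributes } = envelope.message;
  if (seen.has(messageId)) {
    return { duplicate: true }; // ack without reprocessing
  }
  seen.add(messageId);
  const payload = JSON.parse(Buffer.from(data, 'base64').toString('utf8'));
  return { duplicate: false, payload, attributes };
}

// Simulated delivery, then a redelivery of the same message
const envelope = {
  message: {
    messageId: '123',
    data: Buffer.from(JSON.stringify({ bookingId: 'bk_789' })).toString('base64'),
    attributes: { eventType: 'booking.created' },
  },
};
console.log(handlePush(envelope).payload.bookingId); // "bk_789"
console.log(handlePush(envelope).duplicate);         // true
```

Responding 2xx acks the message; any other status (or a timeout) triggers redelivery, which is why the handler must be idempotent.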
11. VPC and Firewall Rules
VPC (Virtual Private Cloud) is the networking foundation in GCP. Every GKE cluster, Cloud SQL instance, and Memorystore cache runs inside a VPC. Proper network design is critical for security and performance.
- Subnets: Regional resources. Plan CIDR ranges carefully -- GKE needs secondary ranges for pods and services. A /20 for pods and /24 for services supports most clusters
- Firewall rules: Stateful, priority-ordered. Default deny ingress, allow egress. Use tags or service accounts as targets for granular rules
- Shared VPC: Host project owns the network, service projects deploy resources into it. Centralizes network management for multi-team organizations
- VPC peering: Connect VPCs across projects or organizations. Non-transitive (A peered to B, B peered to C, A cannot reach C). Use for cross-project database access
- Private Google Access: Allow VMs without external IPs to reach Google APIs (Cloud Storage, BigQuery, Artifact Registry). Must be enabled per subnet
- VPC Flow Logs: Capture network flow data for security analysis and troubleshooting. Enable at the subnet level with configurable sampling rate
- Cloud Armor: Web application firewall and DDoS protection. Attach security policies to HTTP(S) load balancers. Block by IP, geography, or custom WAF rules (OWASP Top 10)
# Create a VPC with custom subnets for GKE
gcloud compute networks create app-vpc --subnet-mode=custom
gcloud compute networks subnets create app-subnet \
--network=app-vpc \
--region=us-central1 \
--range=10.0.0.0/24 \
--secondary-range pods=10.4.0.0/14,services=10.8.0.0/20 \
--enable-private-ip-google-access \
--enable-flow-logs
# Firewall: allow internal communication, deny all external
gcloud compute firewall-rules create allow-internal \
--network=app-vpc \
--allow=tcp,udp,icmp \
--source-ranges=10.0.0.0/8 \
--priority=1000
gcloud compute firewall-rules create allow-health-checks \
--network=app-vpc \
--allow=tcp:80,tcp:443,tcp:3000 \
--source-ranges=130.211.0.0/22,35.191.0.0/16 \
--target-tags=gke-node \
--priority=1000
gcloud compute firewall-rules create deny-all-ingress \
--network=app-vpc \
--action=DENY \
--rules=all \
--source-ranges=0.0.0.0/0 \
--priority=65534
12. Cloud CDN
Cloud CDN caches HTTP(S) content at Google's global edge locations (over 150 points of presence). It integrates with HTTP(S) Load Balancing and supports GKE, Cloud Run, Cloud Storage, and external backends.
- Cache modes: USE_ORIGIN_HEADERS (respect Cache-Control), CACHE_ALL_STATIC (cache common static types automatically), or FORCE_CACHE_ALL (cache everything). Use USE_ORIGIN_HEADERS for API responses, CACHE_ALL_STATIC for assets
- Cache keys: By default, includes the full URI. Customize to include/exclude query parameters, headers, or cookies. Improves hit rates for APIs with non-significant parameters
- Signed URLs and cookies: Restrict CDN access to authorized users. Time-limited tokens for premium content delivery or private asset distribution
- Cache invalidation: Purge cached content by URL path or tag. Takes effect globally within seconds. Use sparingly -- design cache keys and TTLs to avoid frequent invalidation
- Compression: Automatically serves Brotli or gzip compressed responses when the client supports it. Reduces bandwidth costs and improves load times
# Enable Cloud CDN on an existing backend service
gcloud compute backend-services update app-api-backend \
--enable-cdn \
--cache-mode=USE_ORIGIN_HEADERS \
--default-ttl=3600 \
--max-ttl=86400 \
--global
# For static assets bucket backend
gcloud compute backend-buckets update app-static-backend \
--enable-cdn \
--cache-mode=CACHE_ALL_STATIC \
--default-ttl=86400
# Invalidate cache
gcloud compute url-maps invalidate-cdn-cache app-lb \
--path="/api/v2/config/*" --global
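With USE_ORIGIN_HEADERS, the origin decides cacheability via Cache-Control. A small helper sketch for emitting those headers from a Node.js service (the TTL values are illustrative):

```javascript
// Build Cache-Control values for Cloud CDN's USE_ORIGIN_HEADERS mode.
// `public` + a max-age makes a response cacheable at the edge;
// `private, no-store` keeps user-specific responses out of shared caches.
function cacheControl({ cacheable, edgeTtl = 3600, browserTtl = 60 }) {
  if (!cacheable) return 'private, no-store';
  // s-maxage governs shared caches (the CDN); max-age governs browsers.
  return `public, max-age=${browserTtl}, s-maxage=${edgeTtl}`;
}

console.log(cacheControl({ cacheable: true }));   // "public, max-age=60, s-maxage=3600"
console.log(cacheControl({ cacheable: false }));  // "private, no-store"
```

Splitting s-maxage from max-age lets the CDN hold responses for an hour while browsers revalidate every minute, so a cache invalidation propagates quickly to end users.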
13. Secret Manager
Secret Manager stores API keys, passwords, certificates, and other sensitive data. It provides versioning, automatic rotation, IAM-based access control, and audit logging for every secret access.
- Secret versions: Each secret has numbered versions. Access the latest or a specific version. Old versions can be disabled or destroyed (irreversible)
- IAM access control: Grant roles/secretmanager.secretAccessor to specific service accounts. Audit every access via Cloud Audit Logs
- Automatic rotation: Configure rotation schedules with Pub/Sub notifications. Trigger a Cloud Function to generate and store new credentials
- GKE integration: Mount secrets as volumes or environment variables using the GKE Secret Manager add-on (SecretProviderClass). Avoids syncing to Kubernetes Secrets
- Cloud Run integration: Reference secrets directly via the --set-secrets flag. Cloud Run injects them as environment variables at startup
- Replication: Automatic (Google manages) or user-managed (specify regions). Use user-managed for compliance requirements that restrict data to certain regions
# Create and manage secrets
gcloud secrets create db-password \
--replication-policy="user-managed" \
--locations="us-central1,us-east1"
# Add a secret version
echo -n "s3cur3P@ssw0rd" | gcloud secrets versions add db-password --data-file=-
# Grant access to a GKE workload's service account
gcloud secrets add-iam-policy-binding db-password \
--member="serviceAccount:api-service@myapp-prod.iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"
# Access from GKE using SecretProviderClass (CSI driver)
# apiVersion: secrets-store.csi.x-k8s.io/v1
# kind: SecretProviderClass
# metadata:
# name: app-secrets
# spec:
# provider: gcp
# parameters:
# secrets: |
# - resourceName: "projects/myapp-prod/secrets/db-password/versions/latest"
# path: "db-password"
14. Vertex AI: Google's ML Platform
Vertex AI is Google Cloud's unified platform for building, deploying, and scaling machine learning models and generative AI applications. It combines Model Garden (200+ models including Google's Gemini, partner models like Anthropic's Claude, and open models), MLOps tooling, and enterprise controls into a single managed service.
- Claude models on Vertex AI: Claude Opus 4.6 and Claude Sonnet 4.6 (both GA since February 2026), Claude Opus 4.5, Claude Sonnet 4, and Claude Opus 4.1 are available. Two endpoint types: Global (dynamic routing for optimal latency) and Regional (guaranteed data routing through specific geographic regions)
- Model Garden: Curated catalog of 200+ models -- Google first-party (Gemini, Imagen, Chirp, Veo), third-party (Claude, Llama), and open models (Gemma, Mistral). Integrated with model tuning, evaluation, and serving infrastructure
- Vertex AI Agents, Agent Engine, and ADK: Build AI agents with built-in tool use, grounding, and RAG capabilities. Agents connect to Google Search, enterprise data sources, and custom APIs. Orchestration handled by the platform. Native support for the A2A Protocol across the Agent Development Kit (ADK), Vertex AI Agent Engine, and Agentspace means Vertex-hosted agents are discoverable and callable from any A2A-compliant client out of the box
- MLOps pipeline: Vertex AI Pipelines for orchestrating ML workflows, Feature Store for serving reusable ML features, Model Registry for versioning, and Model Monitoring for detecting training-serving skew and inference drift in production
- Security and compliance: IAM-based access control, VPC Service Controls for data perimeter enforcement, Customer-Managed Encryption Keys (CMEK), and Cloud Audit Logs for every API call. Data stays within your GCP project
- Claude Code integration: Configure Claude Code to use Vertex AI as the provider by setting the CLAUDE_CODE_USE_VERTEX=1 environment variable along with your GCP project and region. Traffic routes through your GCP project with full integration into GCP security, monitoring, and billing
# Invoke Claude on Vertex AI (Python)
import anthropic
client = anthropic.AnthropicVertex(
region="us-east5",
project_id="my-gcp-project"
)
message = client.messages.create(
model="claude-sonnet-4-6@20260217",
max_tokens=4096,
messages=[{
"role": "user",
"content": "Analyze this GKE cluster config for cost optimization."
}]
)
print(message.content[0].text)
Latest GCP Updates (April 2026)
Gemini 3.1 Pro: Now available in preview in Vertex AI and Gemini Enterprise. Represents Google's latest flagship model with improved reasoning and coding capabilities.
Deployment Manager Deprecated: Google Cloud Deployment Manager was deprecated as of March 31, 2026. New deployments via the Marketplace "Deploy" button will fail for services like Apigee Drupal Portal. Organizations should transition to Infrastructure Manager (based on Terraform) for infrastructure-as-code workflows.
Security Operations Feed Migration: The transition from v1 to v2 feed types began April 6, 2026. v1 support will be discontinued September 15, 2026, with full End of Life March 15, 2027. v2 feeds use Google Cloud Storage Transfer Service (STS) for improved performance and scalability.
Apigee API Hub Enhancements: New integration with API Gateway to automatically centralize API metadata into a single control plane. Specification boost add-on now in public preview for improved API documentation generation.
Cloud Next 2026 Announcements: Google committed a $750M partner fund for agentic AI development. Enterprise agents are now available natively in Gemini Enterprise, enabling organizations to build custom AI agents with enterprise-grade security and compliance controls. New TPU v6e variants — TPU 8t for training and TPU 8i for inference — deliver up to 4.7x cost-performance improvement over previous generations. Workspace Intelligence brings AI-powered summarization, drafting, and workflow automation across Gmail, Docs, Sheets, and Meet.