Google Cloud Platform: GKE and Beyond
A deep technical guide to running production workloads on Google Cloud Platform. Covers GKE cluster architecture, Cloud Run, Cloud Build, IAM with Workload Identity Federation, Cloud SQL, Cloud Storage, Pub/Sub, VPC networking, Cloud CDN, Secret Manager, observability with tracing, and cost management.
Table of Contents
- 1. GKE: Google Kubernetes Engine
- 2. Cloud Run: Serverless Containers
- 3. Cloud Build and Artifact Registry
- 4. IAM and Workload Identity
- 5. Logging and Monitoring (Cloud Operations)
- 6. Cost Management
- 7. Real-World Experience
- 8. Cloud SQL
- 9. Cloud Storage
- 10. Pub/Sub
- 11. VPC and Firewall Rules
- 12. Cloud CDN
- 13. Secret Manager
- 14. Vertex AI: Google's ML Platform
1. GKE: Google Kubernetes Engine
Cluster Architecture
GKE is Google's managed Kubernetes service. It handles control plane management, automatic upgrades, and node auto-repair. Two modes: Standard (you manage node pools) and Autopilot (Google manages everything).
- Control plane: Managed by Google. Regional clusters replicate the control plane across three zones for HA. Standard mode charges a flat cluster management fee per cluster hour; the free tier covers one zonal or Autopilot cluster per billing account
- Node pools: Groups of VMs with identical configuration. Mix machine types (e2, n2, c2) for cost/performance balance
- Autopilot mode: Google manages nodes, scales per-pod. You only define pod resource requests. Best for teams that want zero node management
- Release channels: Rapid, Regular, Stable. Control how quickly GKE upgrades your cluster. Stable is recommended for production
- Private clusters: Nodes have no public IPs. Control plane accessible only via private endpoint or authorized networks
- VPC-native clusters: Use alias IP ranges for pods and services. Required for private clusters, Shared VPC, and container-native load balancing with NEGs. The default for newly created clusters
# Create a production-ready GKE cluster
gcloud container clusters create myapp-prod \
--region us-central1 \
--release-channel stable \
--enable-private-nodes \
--master-ipv4-cidr 172.16.0.0/28 \
--enable-master-authorized-networks \
--master-authorized-networks 10.0.0.0/8 \
--enable-ip-alias \
--network app-vpc \
--subnetwork app-subnet \
--cluster-secondary-range-name pods \
--services-secondary-range-name services \
--enable-network-policy \
--workload-pool=myapp-prod.svc.id.goog \
--num-nodes 3 \
--machine-type e2-standard-4 \
--disk-size 100 \
--enable-autoscaling \
--min-nodes 2 \
--max-nodes 10 \
--enable-autorepair \
--enable-autoupgrade
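Additional node pools can be added after cluster creation. A sketch of a Spot VM pool with a custom taint so that only workloads with a matching toleration schedule there (the pool name and taint key are illustrative):

```shell
# Add a Spot VM node pool for stateless/batch workloads (name and taint illustrative)
gcloud container node-pools create spot-pool \
  --cluster myapp-prod \
  --region us-central1 \
  --spot \
  --machine-type e2-standard-2 \
  --enable-autoscaling \
  --min-nodes 0 \
  --max-nodes 6 \
  --node-taints workload-type=batch:NoSchedule
```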
Workload Configuration
Kubernetes workloads in GKE require careful resource planning. Under-provisioned pods get OOMKilled; over-provisioned pods waste money.
- Resource requests vs limits: Requests guarantee minimum resources. Limits cap maximum usage. Set requests based on P95 usage, limits at 2x requests
- Horizontal Pod Autoscaler (HPA): Scale pods based on CPU, memory, or custom metrics. Use the behavior field to control scale-up/down speed
- Vertical Pod Autoscaler (VPA): Automatically adjusts resource requests based on historical usage. Run in "Off" mode first to get recommendations
- Pod Disruption Budgets (PDB): Ensure minimum availability during voluntary disruptions (upgrades, scaling). Set minAvailable or maxUnavailable
- Topology spread constraints: Distribute pods across nodes and zones for high availability
- Preemptible/Spot VMs: 60-91% cheaper than regular VMs. Use for stateless workloads with proper PDB and pod anti-affinity
# Kubernetes deployment with production-grade configuration
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-service
namespace: myapp
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app: api-service
template:
metadata:
labels:
app: api-service
spec:
serviceAccountName: api-service-sa
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: api-service
containers:
- name: api
image: us-central1-docker.pkg.dev/myapp-prod/services/api:v2.1.0
ports:
- containerPort: 3000
resources:
requests:
cpu: 250m
memory: 512Mi
limits:
cpu: 500m
memory: 1Gi
readinessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 5
periodSeconds: 10
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 15
periodSeconds: 20
env:
- name: DB_HOST
valueFrom:
secretKeyRef:
name: db-credentials
key: host
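An HPA and PDB can pair with the deployment above; a sketch with illustrative thresholds (70% CPU target, minimum two pods available):

```yaml
# HPA: scale api-service on CPU; behavior slows scale-down to avoid flapping
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service
  namespace: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
---
# PDB: keep at least 2 pods up during node upgrades and scale-downs
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service
  namespace: myapp
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api-service
```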
Networking and Ingress
- GKE Ingress controller: Provisions Google Cloud Load Balancers automatically. Supports HTTP(S), SSL certificates via Google-managed certs
- Istio / Anthos Service Mesh: mTLS between services, traffic management (canary, blue-green), circuit breaking, observability
- Network Policies: Kubernetes-native firewall. Restrict pod-to-pod communication. Default deny + explicit allow is the safest approach
- Internal load balancing: Expose services internally via the cloud.google.com/load-balancer-type: Internal annotation
- Cloud NAT: Provides outbound internet access for private nodes without assigning public IPs
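The default-deny-plus-explicit-allow pattern looks like this in manifest form (the gateway label is a hypothetical upstream caller; namespace and api-service labels follow the earlier examples):

```yaml
# Deny all ingress to pods in the namespace by default...
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: myapp
spec:
  podSelector: {}
  policyTypes: ["Ingress"]
---
# ...then explicitly allow traffic to api-service from gateway pods only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gateway-to-api
  namespace: myapp
spec:
  podSelector:
    matchLabels:
      app: api-service
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: gateway
      ports:
        - port: 3000
```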
2. Cloud Run: Serverless Containers
Cloud Run runs stateless containers without managing infrastructure. It scales from zero to thousands of instances automatically. You only pay for actual request processing time (billed per 100ms).
- Container contract: Listen on PORT env var (default 8080), respond to HTTP requests, stateless (no local disk persistence between requests)
- Concurrency: Each instance handles multiple concurrent requests (default 80, max 1000). Tune based on your app's memory/CPU profile
- Cold starts: First request to a new instance incurs startup latency. Minimize by keeping container images small and using minimum instances > 0
- CPU allocation: "CPU always allocated" for background processing, or "CPU only during requests" for pure request-response workloads (cheaper)
- VPC connector: Access private resources (Cloud SQL, Memorystore, GKE internal services) from Cloud Run via Serverless VPC Access
- Traffic splitting: Route percentage of traffic to new revisions for canary deployments. Instant rollback by shifting 100% back
# Deploy a Cloud Run service
gcloud run deploy api-service \
--image us-central1-docker.pkg.dev/myapp-prod/services/api:v2.1.0 \
--region us-central1 \
--platform managed \
--port 3000 \
--memory 1Gi \
--cpu 1 \
--concurrency 80 \
--min-instances 1 \
--max-instances 100 \
--timeout 60 \
--service-account api-service@myapp-prod.iam.gserviceaccount.com \
--vpc-connector app-vpc-connector \
--set-env-vars "NODE_ENV=production" \
--set-secrets "DB_PASSWORD=db-password:latest"
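The traffic-splitting bullet above maps to a three-step canary; a sketch (the rollback revision name is illustrative):

```shell
# Deploy a new revision with no traffic (canary candidate)
gcloud run deploy api-service \
  --image us-central1-docker.pkg.dev/myapp-prod/services/api:v2.2.0 \
  --region us-central1 \
  --no-traffic
# Shift 10% of traffic to the newest revision
gcloud run services update-traffic api-service \
  --region us-central1 \
  --to-revisions LATEST=10
# Instant rollback: send 100% to the previous known-good revision (name illustrative)
gcloud run services update-traffic api-service \
  --region us-central1 \
  --to-revisions api-service-00041-xyz=100
```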
3. Cloud Build and Artifact Registry
Cloud Build
Cloud Build is GCP's serverless CI/CD platform. It executes build steps as containers, supports parallel steps, and integrates natively with GCP services.
- Build config: YAML file defining steps, each running in its own container. Steps share a persistent /workspace volume
- Triggers: GitHub/GitLab push, pull request, tag, manual, Pub/Sub event. Filter by branch, tag pattern, or file path
- Substitutions: Built-in ($SHORT_SHA, $BRANCH_NAME) and custom variables. Use for image tags, environment names
- Private pools: Run builds in your VPC. Access private registries, internal APIs, and databases during build
- Build caching: Use kaniko for layer caching. Dramatically reduces Docker build times for large images
Artifact Registry
Artifact Registry stores Docker images, npm packages, Maven artifacts, and Python packages. It replaces Container Registry with better security and multi-format support.
- Docker repositories: Regional or multi-regional. Vulnerability scanning built-in (on-push or on-demand)
- Cleanup policies: Automatically delete images older than N days or keep only the last N versions
- IAM-based access: Fine-grained permissions per repository. Workload Identity for pull access from GKE
- Remote repositories: Proxy and cache images from Docker Hub, reducing pull rate limit issues
Cloud Build Pipeline Example
# cloudbuild.yaml - Build, test, push, deploy
steps:
# Run tests
- name: 'node:20-alpine'
entrypoint: 'sh'
args: ['-c', 'npm ci && npm test']
# Build Docker image with kaniko (layer caching)
- name: 'gcr.io/kaniko-project/executor:latest'
args:
- '--destination=us-central1-docker.pkg.dev/$PROJECT_ID/services/api:$SHORT_SHA'
- '--cache=true'
- '--cache-ttl=72h'
# Deploy to GKE
- name: 'gcr.io/cloud-builders/gke-deploy'
args:
- 'run'
- '--filename=k8s/'
- '--image=us-central1-docker.pkg.dev/$PROJECT_ID/services/api:$SHORT_SHA'
- '--cluster=myapp-prod'
- '--location=us-central1'
options:
logging: CLOUD_LOGGING_ONLY
machineType: 'E2_HIGHCPU_8'
timeout: '900s'
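A trigger wiring this pipeline to pushes on main might look like the following (repo owner and name are placeholders; the same pattern applies to GitLab triggers):

```shell
gcloud builds triggers create github \
  --name=deploy-on-main \
  --repo-owner=your-org \
  --repo-name=api-service \
  --branch-pattern='^main$' \
  --build-config=cloudbuild.yaml
```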
4. IAM and Workload Identity
GCP IAM Model
GCP IAM uses a resource hierarchy: Organization > Folders > Projects > Resources. Permissions are inherited downward. Policies at higher levels apply to all resources below.
- Principals: Google accounts, service accounts, groups, domains. Always use groups for human access
- Roles: Predefined (e.g., roles/container.developer) or custom. Avoid basic roles (Owner, Editor, Viewer) -- they are far too broad
- Service accounts: Machine identities. Create per-service SAs with minimal permissions. Never use the default compute SA
- IAM Conditions: Grant permissions only when conditions are met (time-based, resource attributes, IP-based)
- Audit logs: Admin Activity (always on), Data Access (configure per service), System Event. Send to Cloud Logging
Workload Identity
Workload Identity is the recommended way for GKE workloads to access GCP APIs. It binds Kubernetes service accounts to GCP service accounts, eliminating the need for exported service account keys.
- How it works: KSA (Kubernetes Service Account) is annotated with a GSA (Google Service Account). When a pod uses the KSA, GKE's metadata server provides GSA credentials
- No key files: Eliminates the risk of leaked JSON key files. Credentials are short-lived and automatically rotated
- Per-pod identity: Different pods can have different GCP permissions by using different KSAs
- Cross-project access: A GSA in project A can be bound to a KSA in project B's GKE cluster
# Setup Workload Identity for a service
# 1. Create GCP service account
gcloud iam service-accounts create api-service \
--display-name="API Service" \
--project=myapp-prod
# 2. Grant necessary permissions
gcloud projects add-iam-policy-binding myapp-prod \
--member="serviceAccount:api-service@myapp-prod.iam.gserviceaccount.com" \
--role="roles/cloudsql.client"
# 3. Bind KSA to GSA
gcloud iam service-accounts add-iam-policy-binding \
api-service@myapp-prod.iam.gserviceaccount.com \
--role="roles/iam.workloadIdentityUser" \
--member="serviceAccount:myapp-prod.svc.id.goog[myapp/api-service-sa]"
# 4. Annotate the Kubernetes service account
kubectl annotate serviceaccount api-service-sa \
--namespace myapp \
iam.gke.io/gcp-service-account=api-service@myapp-prod.iam.gserviceaccount.com
Workload Identity Federation
Workload Identity Federation extends the same keyless concept beyond GKE. External workloads (GitHub Actions, GitLab CI, AWS, Azure, on-prem) can authenticate to GCP without service account keys by exchanging their native identity tokens.
- Identity pools: Create a pool per trust boundary (e.g., one for GitHub, one for GitLab). Each pool contains providers that map external identities
- Attribute mapping: Map external token claims (e.g., assertion.repository) to Google attributes. Use attribute conditions to restrict which external identities can authenticate
- GitHub Actions: Configure an OIDC provider with issuer token.actions.githubusercontent.com. Restrict to specific repos and branches via attribute conditions
- No long-lived keys: Federation tokens are short-lived (1 hour default). Eliminates the key rotation burden and leaked-key risk entirely
- Service account impersonation: External identity authenticates via the pool, then impersonates a GCP service account to access resources. The SA still governs permissions via IAM
# Setup Workload Identity Federation for GitHub Actions
# 1. Create the identity pool
gcloud iam workload-identity-pools create github-pool \
--location="global" \
--display-name="GitHub Actions Pool"
# 2. Add GitHub as an OIDC provider
gcloud iam workload-identity-pools providers create-oidc github-provider \
--location="global" \
--workload-identity-pool=github-pool \
--issuer-uri="https://token.actions.githubusercontent.com" \
--attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository" \
--attribute-condition="assertion.repository=='your-org/api-service'"
# 3. Allow the provider to impersonate the deploy SA
gcloud iam service-accounts add-iam-policy-binding \
deployer@myapp-prod.iam.gserviceaccount.com \
--role="roles/iam.workloadIdentityUser" \
--member="principalSet://iam.googleapis.com/projects/123456/locations/global/workloadIdentityPools/github-pool/attribute.repository/your-org/api-service"
5. Logging and Monitoring (Cloud Operations)
Cloud Logging
- Structured logging: Write JSON to stdout/stderr from containers. GKE automatically ships to Cloud Logging. Fields like severity, httpRequest, and trace are parsed automatically
- Log-based metrics: Create custom metrics from log patterns. Use for alerting on error rates, specific error messages, or business events
- Log sinks: Route logs to BigQuery (analytics), Cloud Storage (long-term archive), or Pub/Sub (real-time processing)
- Exclusion filters: Reduce logging costs by excluding high-volume, low-value logs (health checks, debug-level logs)
- Log retention: Default 30 days for most logs. Use log buckets for custom retention periods (up to 10 years)
// Structured logging in Node.js for Cloud Logging
const log = (severity, message, extra = {}) => {
const entry = {
severity, // DEBUG, INFO, WARNING, ERROR, CRITICAL
message,
timestamp: new Date().toISOString(),
'logging.googleapis.com/trace': getTraceId(),
'logging.googleapis.com/spanId': getSpanId(),
serviceContext: { service: 'api-service', version: '2.1.0' },
...extra
};
console.log(JSON.stringify(entry));
};
// Usage
log('INFO', 'Booking created', {
userId: '12345',
bookingId: 'bk_789',
httpRequest: { requestMethod: 'POST', requestUrl: '/api/bookings', status: 201 }
});
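The sink and exclusion bullets above translate to commands along these lines (the dataset name and filters are illustrative):

```shell
# Route ERROR+ logs to BigQuery for analysis
gcloud logging sinks create errors-to-bq \
  bigquery.googleapis.com/projects/myapp-prod/datasets/app_logs \
  --log-filter='severity>=ERROR'
# Exclude noisy health-check logs from the default sink to cut ingestion costs
gcloud logging sinks update _Default \
  --add-exclusion=name=healthchecks,filter='httpRequest.requestUrl="/health"'
```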
Cloud Monitoring
- GKE metrics: CPU/memory utilization, pod restarts, node health, container errors -- all collected automatically
- Custom metrics: Push application-specific metrics via OpenTelemetry or the Monitoring API. Use for business KPIs (bookings/minute, payment success rate)
- Alerting policies: Multi-condition alerts with notification channels (email, Slack, PagerDuty, webhooks). Use metric-absence conditions to detect silent failures
- Uptime checks: HTTP(S) probes from multiple global locations. Alert on downtime within 1 minute
- Dashboards: Custom dashboards with MQL (Monitoring Query Language). Embed charts from Cloud Logging and Cloud Trace
- SLOs (Service Level Objectives): Define availability and latency targets. Burn rate alerts notify before SLO is breached
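Burn rate is the ratio of the observed error rate to the error budget allowed by the SLO. A quick sketch of the arithmetic behind a fast-burn alert (the 99.9% target and 30-day window are illustrative):

```javascript
// Burn rate: how fast the error budget is being consumed.
// burnRate = observedErrorRate / (1 - sloTarget)
// A burn rate of 1 exhausts the budget exactly at the end of the SLO window.
function burnRate(observedErrorRate, sloTarget) {
  return observedErrorRate / (1 - sloTarget);
}

// Hours until a 30-day error budget is fully spent at the current burn rate.
function hoursToExhaustion(rate, windowDays = 30) {
  return (windowDays * 24) / rate;
}

// For a 99.9% SLO, a 1.44% observed error rate is a 14.4x burn:
// the 30-day budget would be gone in ~50 hours -- page someone.
console.log(burnRate(0.0144, 0.999).toFixed(1));  // "14.4"
console.log(hoursToExhaustion(14.4).toFixed(0));  // "50"
```

This is the logic behind multi-window burn rate alerting: a high burn rate over a short window pages immediately, while a low burn rate over a long window opens a ticket.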
Cloud Trace
- Distributed tracing: Track requests across microservices. See exactly where latency occurs in a multi-service call chain
- Automatic instrumentation: GKE and Cloud Run automatically generate trace spans for incoming requests. Add OpenTelemetry for custom spans
- Trace-log correlation: Include logging.googleapis.com/trace in structured logs. Cloud Logging links logs to their trace automatically
- Latency analysis: Cloud Trace generates latency distributions and identifies regressions. Set alerts on P99 latency exceeding thresholds
- Sampling: Configure sampling rate to control cost. 1% sampling is sufficient for production latency analysis in high-traffic services
// OpenTelemetry tracing setup for Node.js on GKE
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { TraceExporter } = require('@google-cloud/opentelemetry-cloud-trace-exporter');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
const { ExpressInstrumentation } = require('@opentelemetry/instrumentation-express');
const provider = new NodeTracerProvider();
provider.addSpanProcessor(
new BatchSpanProcessor(new TraceExporter({ projectId: 'myapp-prod' }))
);
provider.register();
registerInstrumentations({
instrumentations: [new HttpInstrumentation(), new ExpressInstrumentation()]
});
6. Cost Management
GCP billing is project-based. Without active management, GKE clusters and persistent disks are the top cost drivers.
- Committed Use Discounts (CUDs): 1-year or 3-year commitments for 20-57% savings on compute. Flexible CUDs apply across machine families
- Spot VMs in node pools: Create a separate node pool with Spot VMs for non-critical workloads. Use taints and tolerations to control scheduling
- Right-size with VPA: Run VPA in recommendation mode. Review and apply suggestions. Most teams over-provision by 50%+ on initial deployment
- Cluster autoscaler: Scale node pools to zero when not needed. Use scale-down-unneeded-time and scale-down-utilization-threshold tuning
- Regional vs zonal: A regional cluster runs nodes in three zones, so --num-nodes applies per zone (3x the node count and cost). Use regional for production HA, zonal for dev/staging
- Persistent disk costs: Delete unused PVCs. Use pd-standard for non-IOPS-sensitive workloads (pd-ssd costs 6x more)
- Logging costs: Cloud Logging charges per GB ingested. Exclude verbose logs, reduce log level in production, and set appropriate retention
- Budget alerts: Set budgets per project with alerts at 50%, 80%, 100%, 150%. Use programmatic budget notifications to auto-scale down or notify team
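The savings arithmetic behind CUDs and Spot pools can be sketched quickly. All rates and discounts below are placeholder assumptions, not published prices -- check the GCP pricing page for real numbers:

```javascript
// Illustrative monthly cost comparison for a node pool.
// Hourly rate and discount percentages are placeholder assumptions.
const HOURS_PER_MONTH = 730;

function monthlyCost(hourlyRate, nodeCount, discount = 0) {
  return hourlyRate * (1 - discount) * nodeCount * HOURS_PER_MONTH;
}

const onDemand = monthlyCost(0.134, 6);       // assumed e2-standard-4 rate
const withCud = monthlyCost(0.134, 6, 0.37);  // assumed 3-year CUD discount
const spot = monthlyCost(0.134, 6, 0.70);     // assumed Spot discount

console.log(`on-demand: $${onDemand.toFixed(0)}/mo`);  // on-demand: $587/mo
console.log(`3yr CUD:   $${withCud.toFixed(0)}/mo`);   // 3yr CUD:   $370/mo
console.log(`spot:      $${spot.toFixed(0)}/mo`);      // spot:      $176/mo
```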
7. Real-World Experience
In production, I designed and managed the entire GCP infrastructure running 26 microservices on GKE. Key architecture decisions and outcomes:
- GKE cluster: Regional cluster in us-central1 with Stable release channel. Three node pools: default (e2-standard-4), high-memory (e2-highmem-4 for data-intensive services), and spot (e2-standard-2 for batch jobs). Cluster autoscaler from 2 to 10 nodes per pool
- Cloud Build: Automated CI/CD pipeline triggered on GitLab merge to main. Build, test, push to Artifact Registry, deploy to GKE with rolling updates. Average build time: 3 minutes with kaniko layer caching
- Workload Identity: Every microservice has its own KSA-GSA binding. API service accesses Cloud SQL, notification service accesses Pub/Sub, export service accesses Cloud Storage -- all without key files
- Observability: Structured JSON logging from all Node.js services. Custom dashboards for booking rates, payment success, and API latency P95. PagerDuty integration for critical alerts
- Cost optimization: Spot VMs for development namespace, CUDs for production baseline, VPA recommendations applied quarterly. Reduced GKE compute costs by ~40% versus initial on-demand pricing
8. Cloud SQL
Cloud SQL is a fully managed relational database service supporting MySQL, PostgreSQL, and SQL Server. It handles replication, backups, encryption, and patching.
- Instance tiers: Shared-core (db-f1-micro, db-g1-small) for dev, dedicated (db-custom-*) for production. Always use dedicated instances in production for consistent performance
- High availability: Regional HA with automatic failover. Primary and standby in different zones. Failover takes 30-120 seconds. Enable for all production databases
- Read replicas: Offload read traffic to cross-zone or cross-region replicas. Useful for analytics queries that should not impact production writes
- Cloud SQL Auth Proxy: Secure connection method using IAM-based authentication. Runs as a sidecar in GKE pods. Eliminates the need to manage SSL certificates or allowlist IPs
- Automated backups: Daily backups with configurable retention (up to 365 days). Point-in-time recovery (PITR) using write-ahead logs for any moment within the retention window
- Connection limits: Each instance has a max connection limit based on tier. Use connection pooling (PgBouncer, ProxySQL) to avoid exhaustion under high concurrency
- Private IP: Deploy Cloud SQL with private IP only. Access from GKE via private services access. No public endpoint reduces attack surface
# Create a production Cloud SQL PostgreSQL instance
gcloud sql instances create app-db \
--database-version=POSTGRES_15 \
--tier=db-custom-4-16384 \
--region=us-central1 \
--availability-type=REGIONAL \
--storage-type=SSD \
--storage-size=100GB \
--storage-auto-increase \
--backup-start-time=03:00 \
--enable-point-in-time-recovery \
--retained-backups-count=30 \
--network=projects/myapp-prod/global/networks/app-vpc \
--no-assign-ip \
--database-flags=log_min_duration_statement=1000,max_connections=200
# Deploy Cloud SQL Auth Proxy as a sidecar in GKE
# (add to pod spec alongside your app container)
# - name: cloud-sql-proxy
# image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2
# args: ["--structured-logs", "myapp-prod:us-central1:app-db"]
# securityContext:
# runAsNonRoot: true
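Application-side, connections go to the proxy sidecar on localhost. A hypothetical helper that builds a pool config with a conservative connection cap, sized against the instance's max_connections (the 20% headroom and defaults are assumptions, not a Cloud SQL requirement):

```javascript
// Hypothetical config builder for a Postgres pool (e.g. passed to `new pg.Pool(...)`)
// that talks to the Cloud SQL Auth Proxy sidecar on localhost.
// Cap per-replica pool size well below the instance max_connections.
function buildPoolConfig(env, replicas = 3, instanceMaxConnections = 200) {
  // Leave ~20% headroom for admin sessions, migrations, and replicas joining.
  const perReplica = Math.floor((instanceMaxConnections * 0.8) / replicas);
  return {
    host: '127.0.0.1',  // the proxy sidecar, not the instance IP
    port: 5432,
    database: env.DB_NAME,
    user: env.DB_USER,
    password: env.DB_PASSWORD,
    max: perReplica,    // 53 connections per replica with the defaults above
    idleTimeoutMillis: 30000,
  };
}

const config = buildPoolConfig(process.env);
console.log(config.max); // 53
```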
9. Cloud Storage
Cloud Storage is GCP's object storage service. It stores any amount of unstructured data with high durability (99.999999999% -- eleven 9s). Four storage classes balance cost and access patterns.
- Storage classes: Standard (frequent access), Nearline (once/month), Coldline (once/quarter), Archive (once/year). Lifecycle rules auto-transition objects between classes
- Uniform bucket-level access: Disable per-object ACLs and use only IAM for access control. Simpler, auditable, and recommended for all new buckets
- Signed URLs: Generate time-limited URLs granting temporary access to private objects. Use for user-facing file downloads without exposing bucket permissions
- Object versioning: Retain previous versions of overwritten or deleted objects. Essential for data protection. Combine with lifecycle rules to delete old versions after N days
- Pub/Sub notifications: Trigger events on object creation, deletion, or metadata update. Drive serverless pipelines (Cloud Functions, Cloud Run) from storage events
- Transfer Service: Migrate data from AWS S3, Azure Blob, HTTP sources, or on-prem to Cloud Storage. Schedule recurring transfers for data synchronization
# Create a Cloud Storage bucket with lifecycle rules
gcloud storage buckets create gs://myapp-prod-uploads \
--location=us-central1 \
--uniform-bucket-level-access \
--public-access-prevention
# Set lifecycle rule: transition to Nearline after 30 days, delete after 365
cat <<'EOF' > lifecycle.json
{
"rule": [
{"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
"condition": {"age": 30, "matchesStorageClass": ["STANDARD"]}},
{"action": {"type": "Delete"},
"condition": {"age": 365}}
]
}
EOF
gcloud storage buckets update gs://myapp-prod-uploads \
--lifecycle-file=lifecycle.json
10. Pub/Sub
Cloud Pub/Sub is a fully managed, real-time messaging service for event-driven architectures. It decouples producers from consumers and guarantees at-least-once delivery at any scale.
- Topics and subscriptions: Publishers send messages to topics. Subscribers consume via pull (poll-based) or push (HTTP endpoint) subscriptions. One topic can have many subscriptions
- Message ordering: Enable ordering keys to guarantee in-order delivery for messages with the same key. Required for event sourcing and stateful processing
- Dead letter topics: Automatically route messages that fail processing after N attempts to a dead letter topic for investigation. Prevents poison messages from blocking the queue
- Exactly-once delivery: Available with pull subscriptions using acknowledgment IDs. Subscribers must handle deduplication for push subscriptions
- Message retention: Retain acknowledged messages for up to 31 days. Replay old messages by seeking to a timestamp -- useful for reprocessing after deploying a bug fix
- Schema validation: Enforce Avro, Protocol Buffer, or JSON schema on messages. Reject invalid messages at publish time to prevent downstream failures
# Create a Pub/Sub topic and subscription with dead lettering
gcloud pubsub topics create booking-events
gcloud pubsub topics create booking-events-dlq
gcloud pubsub subscriptions create booking-processor \
--topic=booking-events \
--ack-deadline=60 \
--message-retention-duration=7d \
--dead-letter-topic=booking-events-dlq \
--max-delivery-attempts=5 \
--enable-exactly-once-delivery
# Push subscription to Cloud Run
gcloud pubsub subscriptions create booking-push \
--topic=booking-events \
--push-endpoint=https://booking-worker-xxxxx.run.app/pubsub \
--push-auth-service-account=pubsub-invoker@myapp-prod.iam.gserviceaccount.com
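A push subscription POSTs a JSON envelope with a base64-encoded payload to the endpoint. A minimal decode-and-dedupe sketch; the in-memory Set stands in for a durable dedup store (Redis, a DB unique constraint) and only dedupes within one instance:

```javascript
// Decode a Pub/Sub push envelope and drop duplicate deliveries by messageId.
// In production, back `seen` with a durable store -- Pub/Sub is at-least-once,
// so the same message can be redelivered to a different instance.
const seen = new Set();

function handlePush(envelope) {
  const { messageId, data, attributes } = envelope.message;
  if (seen.has(messageId)) {
    return { duplicate: true }; // ack without reprocessing
  }
  seen.add(messageId);
  const payload = JSON.parse(Buffer.from(data, 'base64').toString('utf8'));
  return { duplicate: false, payload, attributes };
}

// Simulated delivery, then a redelivery of the same message
const envelope = {
  message: {
    messageId: '123',
    data: Buffer.from(JSON.stringify({ bookingId: 'bk_789' })).toString('base64'),
    attributes: { eventType: 'booking.created' },
  },
};
console.log(handlePush(envelope).payload.bookingId); // "bk_789"
console.log(handlePush(envelope).duplicate);         // true
```

Responding 2xx acks the message; any other status (or a timeout) triggers redelivery, which is why the handler must be idempotent.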
11. VPC and Firewall Rules
VPC (Virtual Private Cloud) is the networking foundation in GCP. Every GKE cluster, Cloud SQL instance, and Memorystore cache runs inside a VPC. Proper network design is critical for security and performance.
- Subnets: Regional resources. Plan CIDR ranges carefully -- GKE needs secondary ranges for pods and services. A /20 for pods and /24 for services supports most clusters
- Firewall rules: Stateful, priority-ordered. Default deny ingress, allow egress. Use tags or service accounts as targets for granular rules
- Shared VPC: Host project owns the network, service projects deploy resources into it. Centralizes network management for multi-team organizations
- VPC peering: Connect VPCs across projects or organizations. Non-transitive (A peered to B, B peered to C, A cannot reach C). Use for cross-project database access
- Private Google Access: Allow VMs without external IPs to reach Google APIs (Cloud Storage, BigQuery, Artifact Registry). Must be enabled per subnet
- VPC Flow Logs: Capture network flow data for security analysis and troubleshooting. Enable at the subnet level with configurable sampling rate
- Cloud Armor: Web application firewall and DDoS protection. Attach security policies to HTTP(S) load balancers. Block by IP, geography, or custom WAF rules (OWASP Top 10)
# Create a VPC with custom subnets for GKE
gcloud compute networks create app-vpc --subnet-mode=custom
gcloud compute networks subnets create app-subnet \
--network=app-vpc \
--region=us-central1 \
--range=10.0.0.0/24 \
--secondary-range pods=10.4.0.0/14,services=10.8.0.0/20 \
--enable-private-ip-google-access \
--enable-flow-logs
# Firewall: allow internal communication, deny all external
gcloud compute firewall-rules create allow-internal \
--network=app-vpc \
--allow=tcp,udp,icmp \
--source-ranges=10.0.0.0/8 \
--priority=1000
gcloud compute firewall-rules create allow-health-checks \
--network=app-vpc \
--allow=tcp:80,tcp:443,tcp:3000 \
--source-ranges=130.211.0.0/22,35.191.0.0/16 \
--target-tags=gke-node \
--priority=1000
gcloud compute firewall-rules create deny-all-ingress \
--network=app-vpc \
--action=DENY \
--rules=all \
--source-ranges=0.0.0.0/0 \
--priority=65534
12. Cloud CDN
Cloud CDN caches HTTP(S) content at Google's global edge locations (over 150 points of presence). It integrates with HTTP(S) Load Balancing and supports GKE, Cloud Run, Cloud Storage, and external backends.
- Cache modes: USE_ORIGIN_HEADERS (respect Cache-Control), CACHE_ALL_STATIC (cache common static types automatically), or FORCE_CACHE_ALL (cache everything). Use USE_ORIGIN_HEADERS for API responses, CACHE_ALL_STATIC for assets
- Cache keys: By default, includes the full URI. Customize to include/exclude query parameters, headers, or cookies. Improves hit rates for APIs with non-significant parameters
- Signed URLs and cookies: Restrict CDN access to authorized users. Time-limited tokens for premium content delivery or private asset distribution
- Cache invalidation: Purge cached content by URL path or tag. Takes effect globally within seconds. Use sparingly -- design cache keys and TTLs to avoid frequent invalidation
- Compression: Automatically serves Brotli or gzip compressed responses when the client supports it. Reduces bandwidth costs and improves load times
# Enable Cloud CDN on an existing backend service
gcloud compute backend-services update app-api-backend \
--enable-cdn \
--cache-mode=USE_ORIGIN_HEADERS \
--default-ttl=3600 \
--max-ttl=86400 \
--global
# For static assets bucket backend
gcloud compute backend-buckets update app-static-backend \
--enable-cdn \
--cache-mode=CACHE_ALL_STATIC \
--default-ttl=86400
# Invalidate cache
gcloud compute url-maps invalidate-cdn-cache app-lb \
--path="/api/v2/config/*" --global
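With USE_ORIGIN_HEADERS, the origin decides cacheability via Cache-Control. A small helper sketch for emitting those headers from a Node.js service (the TTL values are illustrative):

```javascript
// Build Cache-Control values for Cloud CDN's USE_ORIGIN_HEADERS mode.
// `public` + a max-age makes a response cacheable at the edge;
// `private, no-store` keeps user-specific responses out of shared caches.
function cacheControl({ cacheable, edgeTtl = 3600, browserTtl = 60 }) {
  if (!cacheable) return 'private, no-store';
  // s-maxage governs shared caches (the CDN); max-age governs browsers.
  return `public, max-age=${browserTtl}, s-maxage=${edgeTtl}`;
}

console.log(cacheControl({ cacheable: true }));   // "public, max-age=60, s-maxage=3600"
console.log(cacheControl({ cacheable: false }));  // "private, no-store"
```

Splitting s-maxage from max-age lets the CDN hold responses for an hour while browsers revalidate every minute, so a cache invalidation propagates quickly to end users.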
13. Secret Manager
Secret Manager stores API keys, passwords, certificates, and other sensitive data. It provides versioning, automatic rotation, IAM-based access control, and audit logging for every secret access.
- Secret versions: Each secret has numbered versions. Access the latest or a specific version. Old versions can be disabled or destroyed (irreversible)
- IAM access control: Grant roles/secretmanager.secretAccessor to specific service accounts. Audit every access via Cloud Audit Logs
- Automatic rotation: Configure rotation schedules with Pub/Sub notifications. Trigger a Cloud Function to generate and store new credentials
- GKE integration: Mount secrets as volumes or environment variables using the GKE Secret Manager add-on (SecretProviderClass). Avoids syncing to Kubernetes Secrets
- Cloud Run integration: Reference secrets directly via the --set-secrets flag. Cloud Run injects them as environment variables at startup
- Replication: Automatic (Google manages) or user-managed (specify regions). Use user-managed for compliance requirements that restrict data to certain regions
# Create and manage secrets
gcloud secrets create db-password \
--replication-policy="user-managed" \
--locations="us-central1,us-east1"
# Add a secret version
echo -n "s3cur3P@ssw0rd" | gcloud secrets versions add db-password --data-file=-
# Grant access to a GKE workload's service account
gcloud secrets add-iam-policy-binding db-password \
--member="serviceAccount:api-service@myapp-prod.iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"
# Access from GKE using SecretProviderClass (CSI driver)
# apiVersion: secrets-store.csi.x-k8s.io/v1
# kind: SecretProviderClass
# metadata:
# name: app-secrets
# spec:
# provider: gcp
# parameters:
# secrets: |
# - resourceName: "projects/myapp-prod/secrets/db-password/versions/latest"
# path: "db-password"
14. Vertex AI: Google's ML Platform
Vertex AI is Google Cloud's unified platform for building, deploying, and scaling machine learning models and generative AI applications. It combines Model Garden (200+ models including Google's Gemini, partner models like Anthropic's Claude, and open models), MLOps tooling, and enterprise controls into a single managed service.
- Claude models on Vertex AI: Claude Opus 4.6 and Claude Sonnet 4.6 (both GA since February 2026), Claude Opus 4.5, Claude Sonnet 4, and Claude Opus 4.1 are available. Two endpoint types: Global (dynamic routing for optimal latency) and Regional (guaranteed data routing through specific geographic regions)
- Model Garden: Curated catalog of 200+ models -- Google first-party (Gemini, Imagen, Chirp, Veo), third-party (Claude, Llama), and open models (Gemma, Mistral). Integrated with model tuning, evaluation, and serving infrastructure
- Vertex AI Agents, Agent Engine, and ADK: Build AI agents with built-in tool use, grounding, and RAG capabilities. Agents connect to Google Search, enterprise data sources, and custom APIs. Orchestration handled by the platform. Native support for the A2A Protocol across the Agent Development Kit (ADK), Vertex AI Agent Engine, and Agentspace means Vertex-hosted agents are discoverable and callable from any A2A-compliant client out of the box
- MLOps pipeline: Vertex AI Pipelines for orchestrating ML workflows, Feature Store for serving reusable ML features, Model Registry for versioning, and Model Monitoring for detecting training-serving skew and inference drift in production
- Security and compliance: IAM-based access control, VPC Service Controls for data perimeter enforcement, Customer-Managed Encryption Keys (CMEK), and Cloud Audit Logs for every API call. Data stays within your GCP project
- Claude Code integration: Configure Claude Code to use Vertex AI as the provider by setting the CLAUDE_CODE_USE_VERTEX=1 environment variable along with your GCP project and region. Traffic routes through your GCP project with full integration into GCP security, monitoring, and billing
# Invoke Claude on Vertex AI (Python)
import anthropic
client = anthropic.AnthropicVertex(
region="us-east5",
project_id="my-gcp-project"
)
message = client.messages.create(
model="claude-sonnet-4-6@20260217",
max_tokens=4096,
messages=[{
"role": "user",
"content": "Analyze this GKE cluster config for cost optimization."
}]
)
print(message.content[0].text)
Latest GCP Updates (April 2026)
Gemini 3.1 Pro: Now available in preview in Vertex AI and Gemini Enterprise. Represents Google's latest flagship model with improved reasoning and coding capabilities.
Deployment Manager Deprecated: Google Cloud Deployment Manager was deprecated as of March 31, 2026. New deployments via the Marketplace "Deploy" button will fail for services like Apigee Drupal Portal. Organizations should transition to Infrastructure Manager (based on Terraform) for infrastructure-as-code workflows.
Security Operations Feed Migration: The transition from v1 to v2 feed types began April 6, 2026. v1 support will be discontinued September 15, 2026, with full End of Life March 15, 2027. v2 feeds use Google Cloud Storage Transfer Service (STS) for improved performance and scalability.
Apigee API Hub Enhancements: New integration with API Gateway to automatically centralize API metadata into a single control plane. Specification boost add-on now in public preview for improved API documentation generation.
Cloud Next 2026 Announcements: Google committed a $750M partner fund for agentic AI development. Enterprise agents are now available natively in Gemini Enterprise, enabling organizations to build custom AI agents with enterprise-grade security and compliance controls. New TPU v6e variants — TPU 8t for training and TPU 8i for inference — deliver up to 4.7x cost-performance improvement over previous generations. Workspace Intelligence brings AI-powered summarization, drafting, and workflow automation across Gmail, Docs, Sheets, and Meet.