---
name: hybrid-cloud-architect
type: reference
description: "Designs hybrid cloud architectures connecting on-premises infrastructure with public cloud services. Use when designing systems spanning on-prem and cloud, or when the user mentions hybrid cloud or multi-environment architecture."
effort: 4
allowed-tools: Read, Glob, Grep, Write, Edit, Bash
user-invocable: true
when_to_use: "When designing complex multi-cloud or hybrid cloud solutions across AWS, Azure, GCP, and private clouds"
---
Hybrid Cloud Architect
Designs hybrid and multi-cloud architectures that bridge on-premises infrastructure (OpenStack, VMware, bare metal) with public cloud services (AWS, Azure, GCP).
When to Use
- Designing systems that span on-premises and cloud environments
- Planning workload placement across private and public clouds
- Migrating from on-prem to hybrid architecture
- User mentions hybrid cloud, multi-cloud, or cross-environment architecture
When NOT to Use
- Single-cloud deployment (use cloud-architect instead)
- Pure infrastructure provisioning without architecture decisions (use devops-deploy)
- Application-level architecture without infrastructure concerns (use backend-architect)
Workflow
1. Assess Requirements
Gather constraints before designing:
| Dimension | Questions |
|---|---|
| Compliance | Data sovereignty? Regulatory frameworks (HIPAA, PCI-DSS, GDPR)? |
| Performance | Latency requirements? Data gravity? Real-time vs batch? |
| Budget | TCO targets? Existing licenses? CapEx vs OpEx preference? |
| Skills | Team expertise in cloud platforms? OpenStack experience? |
| Timeline | Migration urgency? Phased approach acceptable? |
2. Classify Workloads
For each workload, determine placement:
| Criteria | On-Prem | Public Cloud | Edge |
|---|---|---|---|
| Data sovereignty | Yes | No unless region-locked | Yes |
| Low latency (less than 10ms) | Yes | No unless co-located | Yes |
| Elastic scaling | No | Yes | No |
| Cost-sensitive steady-state | Yes | No | - |
| Managed services needed | No | Yes | No |
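The matrix above can be read as a small rule set. The sketch below is one hypothetical way to encode it in Python. The `Workload` fields, the environment labels, and the choice to ignore the "unless region-locked" and "unless co-located" exceptions are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload description; field names are illustrative."""
    name: str
    data_sovereignty: bool = False
    needs_low_latency: bool = False          # under 10 ms, per the matrix
    needs_elastic_scaling: bool = False
    steady_state_cost_sensitive: bool = False
    needs_managed_services: bool = False


def candidate_placements(w: Workload) -> set[str]:
    """Return the environments the matrix does not rule out for this workload."""
    candidates = {"on-prem", "public-cloud", "edge"}
    # Sovereignty and low latency exclude public cloud. The region-locked and
    # co-located exceptions from the matrix are not modeled in this sketch.
    if w.data_sovereignty or w.needs_low_latency:
        candidates.discard("public-cloud")
    # Elastic scaling and managed services are "Yes" only for public cloud.
    if w.needs_elastic_scaling or w.needs_managed_services:
        candidates &= {"public-cloud"}
    # Cost-sensitive steady state: On-Prem "Yes", Public Cloud "No", Edge "-".
    if w.steady_state_cost_sensitive:
        candidates.discard("public-cloud")
    return candidates
```

An empty result signals conflicting criteria (for example, data sovereignty plus managed services), which is the point at which the "unless" exceptions in the matrix deserve a closer look.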
3. Design Connectivity
Choose connectivity based on bandwidth, latency, and criticality requirements:
- VPN: Low cost, lower bandwidth, good for non-critical traffic
- Dedicated (Direct Connect / ExpressRoute / Interconnect): High bandwidth, low latency, SLA-backed
- SD-WAN: Multi-site, dynamic path selection, cost optimization
- Service mesh (Istio, Linkerd): Cross-cloud microservices communication, layered on top of one of the network links above
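As a rough illustration of how the options above combine, the following sketch maps three yes/no requirements onto link types. The function name, the boolean inputs, and the rule that SD-WAN is added alongside the base link are assumptions, not a prescribed decision procedure. A service mesh is left out because it addresses service-to-service traffic rather than the site link itself.

```python
def suggest_links(critical: bool, high_bandwidth: bool, multi_site: bool) -> list[str]:
    """Suggest link types from coarse requirements (illustrative rules only)."""
    links = []
    # Critical or high-bandwidth traffic points to a dedicated, SLA-backed link;
    # otherwise a VPN is the low-cost default.
    if critical or high_bandwidth:
        links.append("dedicated")
    else:
        links.append("vpn")
    # Multiple sites benefit from dynamic path selection on top of the base link.
    if multi_site:
        links.append("sd-wan")
    return links
```

For example, `suggest_links(critical=True, high_bandwidth=False, multi_site=True)` returns `["dedicated", "sd-wan"]`.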
4. Design Security Architecture
Apply zero-trust across environments:
- Identity federation: AD/LDAP to cloud IAM (SAML/OIDC)
- Network segmentation: Micro-segmentation, security groups across clouds
- Encryption: In-transit (TLS) + at-rest, key management per environment
- Secret management: Centralized (Vault) or cloud-native (AWS Secrets Manager, Azure Key Vault, Google Secret Manager)
- Compliance: Per-environment compliance controls, audit logging
5. Design Data Strategy
| Pattern | Use When | Tools |
|---|---|---|
| Active-active replication | RPO=0, RTO less than 1min | Database-native replication, Kafka |
| Active-passive | RPO less than 15min, RTO less than 1hr | Cross-cloud backup, DNS failover |
| Data mesh | Domain ownership, distributed teams | Data catalogs, federated queries |
| Edge preprocessing | IoT, real-time analytics | Edge compute to cloud aggregation |
6. Define Infrastructure as Code
Multi-cloud IaC strategy:
- Terraform/OpenTofu: Cross-cloud resource provisioning
- Ansible: Configuration management
- Pulumi/CDK: Complex orchestration logic
- OPA/Conftest: Policy as Code
- GitOps (ArgoCD/Flux): Multi-environment deployment
State management:
- Remote state with locking (S3+DynamoDB, Azure Storage, GCS)
- Separate state per environment, shared modules
- State migration plan for cross-cloud moves
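One way to keep state separate per environment is to derive each backend configuration from a single template. The sketch below emits Terraform JSON-syntax backend blocks for the S3+DynamoDB option. The bucket name, lock table name, region, and key layout are placeholders, and the locking setting should be checked against the backend documentation for the Terraform or OpenTofu version in use.

```python
def backend_config(environment: str, component: str) -> dict:
    """Build a JSON-syntax S3 backend block for one environment and component.

    All resource names below are placeholders, not real infrastructure.
    """
    return {
        "terraform": {
            "backend": {
                "s3": {
                    "bucket": "example-tf-state",          # hypothetical bucket
                    # One state file per environment and component.
                    "key": f"{environment}/{component}/terraform.tfstate",
                    "region": "us-east-1",                  # placeholder region
                    "dynamodb_table": "example-tf-locks",   # hypothetical lock table
                    "encrypt": True,
                }
            }
        }
    }
```

Serializing the result with `json.dumps` and writing it to a `backend.tf.json` file per environment gives each environment its own state key while shared modules stay unchanged.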
7. Design Observability
Unified monitoring across environments:
- Metrics: Prometheus + Thanos / Grafana Mimir (cross-cloud)
- Logs: Centralized logging (ELK/Loki) with per-environment collectors
- Traces: Distributed tracing (Jaeger/Tempo) across service boundaries
- Alerting: Unified alerting with environment-aware routing
- Cost monitoring: Per-cloud cost dashboards, anomaly detection
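Environment-aware routing can be pictured as a lookup keyed on where an alert originated and how severe it is. The sketch below is a minimal illustration. The `Alert` fields, environment labels, and receiver names are hypothetical, and a real deployment would normally express this in the alerting tool's own routing configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    environment: str   # e.g. "onprem", "aws", "azure", "gcp"
    severity: str      # "critical" or "warning"


# Hypothetical routing table: (environment, severity) -> receiver name.
ROUTES = {
    ("onprem", "critical"): "datacenter-oncall",
    ("aws", "critical"): "cloud-oncall",
    ("azure", "critical"): "cloud-oncall",
    ("gcp", "critical"): "cloud-oncall",
}
DEFAULT_RECEIVER = "platform-team"


def route(alert: Alert) -> str:
    """Pick a receiver by environment and severity, falling back to a default."""
    return ROUTES.get((alert.environment, alert.severity), DEFAULT_RECEIVER)
```

Keeping a default receiver means alerts from a newly added environment are still delivered before anyone updates the table.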
8. Plan Disaster Recovery
| Tier | Strategy | RPO | RTO | Cost |
|---|---|---|---|---|
| Tier 1 | Active-active multi-cloud | 0 | less than 1min | High |
| Tier 2 | Active-passive cross-cloud | less than 15min | less than 1hr | Medium-High |
| Tier 3 | Backup + manual failover | less than 24hr | less than 4hr | Medium |
| Tier 4 | Backup only | less than 24hr | less than 24hr | Low |
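The tier table can double as a lookup: given a workload's recovery objectives, pick the cheapest tier that still meets them. The sketch below assumes the bounds in the table are inclusive and that cost decreases strictly from Tier 1 to Tier 4. The function and constant names are illustrative.

```python
from datetime import timedelta

# (tier, RPO bound, RTO bound) taken from the table above, strictest first.
DR_TIERS = [
    ("Tier 1", timedelta(0), timedelta(minutes=1)),
    ("Tier 2", timedelta(minutes=15), timedelta(hours=1)),
    ("Tier 3", timedelta(hours=24), timedelta(hours=4)),
    ("Tier 4", timedelta(hours=24), timedelta(hours=24)),
]


def cheapest_tier(required_rpo: timedelta, required_rto: timedelta) -> str:
    """Return the lowest-cost tier whose RPO and RTO bounds meet the requirement."""
    # Walk from the cheapest tier (Tier 4) toward the most expensive (Tier 1).
    for name, rpo, rto in reversed(DR_TIERS):
        if rpo <= required_rpo and rto <= required_rto:
            return name
    raise ValueError("no tier in the table meets these objectives")
```

A workload that tolerates 30 minutes of data loss and two hours of downtime lands on Tier 2, because Tiers 3 and 4 allow up to 24 hours of data loss.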
DR automation:
- Automated failover triggers (health checks, circuit breakers)
- Runbook automation for failover procedures
- Regular DR testing schedule (quarterly minimum)
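An automated failover trigger usually avoids reacting to a single failed probe. The sketch below shows one common approach, tripping only after a run of consecutive failures, in the spirit of a circuit breaker. The class name, the default threshold of 3, and the manual reset are assumptions rather than recommended values.

```python
class FailoverTrigger:
    """Trip after N consecutive failed health checks (a simple circuit breaker)."""

    def __init__(self, failure_threshold: int = 3) -> None:
        if failure_threshold < 1:
            raise ValueError("failure_threshold must be at least 1")
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.tripped = False

    def record(self, healthy: bool) -> bool:
        """Record one health-check result; return True when failover should start."""
        if self.tripped:
            return False  # already failed over; wait for an explicit reset
        if healthy:
            self.consecutive_failures = 0  # a success breaks the failure run
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.tripped = True
            return True
        return False

    def reset(self) -> None:
        """Re-arm the trigger, for example after failback is complete."""
        self.consecutive_failures = 0
        self.tripped = False
```

`record` returns `True` exactly once per trip, so the caller can start the failover runbook without guarding against repeated invocations.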
Output
Deliver:
- Architecture diagram: showing all environments, connectivity, data flow
- Workload placement matrix: workload to environment with justification
- Connectivity plan: network topology, bandwidth, latency requirements
- Security model: identity, network, data security per environment
- Cost estimate: TCO comparison, per-environment breakdown
- Migration plan: phased approach with rollback procedures (if applicable)
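The workload placement matrix can be generated from structured data so that it stays consistent with the classification step. The helper below is a minimal sketch. The column names and the example row are illustrative, and the output uses the same pipe-table style as the tables in this document.

```python
def placement_matrix(rows: list[tuple[str, str, str]]) -> str:
    """Render (workload, environment, justification) rows as a pipe table."""
    lines = ["| Workload | Environment | Justification |", "|---|---|---|"]
    for workload, environment, justification in rows:
        lines.append(f"| {workload} | {environment} | {justification} |")
    return "\n".join(lines)
```

Calling `placement_matrix([("billing-db", "On-Prem", "Data sovereignty")])` yields a header, a separator, and one row for the hypothetical `billing-db` workload.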
Platform-Specific Notes
OpenStack Integration
- Services: Nova (compute), Neutron (network), Cinder (block storage), Swift (object), Keystone (identity)
- Hybrid identity: Keystone federation with cloud IAM
- Networking: Provider networks, VLAN/VXLAN for multi-tenant isolation
AWS Hybrid
- Outposts: AWS hardware in on-prem data center
- EKS Anywhere: Kubernetes on-prem with EKS compatibility
- Direct Connect: Dedicated network connection
Azure Hybrid
- Azure Arc: Manage resources across environments from Azure
- Azure Stack: On-prem Azure services
- ExpressRoute: Dedicated private connection
GCP Hybrid
- Anthos (since rebranded as GKE Enterprise): Multi-cloud Kubernetes management
- Distributed Cloud: GCP services on-prem
- Cloud Interconnect: Dedicated network connection