Engeem SAS

Multi-Tenant SaaS Control Plane

Designed a tenant provisioning platform with Spring Boot 3, Keycloak OIDC, and HashiCorp Vault, automating onboarding across 15+ microservices.

Spring Boot, Keycloak, HashiCorp Vault, Kubernetes, OIDC, Multi-tenancy

Overview

At Engeem SAS, I architected a multi-tenant SaaS control plane that automates the full lifecycle of tenant provisioning — from organization creation through environment setup, secrets management, and RBAC policy enforcement. The platform manages 15+ microservices and uses domain-driven design to maintain strict service boundaries.

Problem

Enterprise customers needed isolated environments with:

  • Zero-touch onboarding: A new tenant should go from signup to fully provisioned in under 60 seconds
  • Secrets management: No hardcoded credentials anywhere in the stack — everything rotated automatically
  • Fine-grained RBAC: Tenants can define custom roles scoped to their organization
  • Audit compliance: Every provisioning action must be traceable for SOC 2 readiness

The legacy approach was a collection of scripts and manual Jira tickets that took 2–3 days per new tenant.

Architecture

The control plane is decomposed into three domain services (Organization, Environment, and Cluster), each owning its aggregate root.

Data Flow

Tenant Provisioning Sequence

Each step is orchestrated by a saga with compensating transactions. For example, if the Vault step fails, the Keycloak realm created earlier in the sequence is rolled back.
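
The saga idea can be sketched in plain Java. This is a minimal, in-process illustration under stated assumptions, not the production orchestrator: the class name `ProvisioningSaga`, the `Step` record, and the step names are hypothetical, and a real implementation would also persist saga state and drive the steps through messaging.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical in-memory saga orchestrator: runs steps in order, undoes them on failure. */
final class ProvisioningSaga {

    /** One saga step: a forward action plus the compensation that undoes it. */
    record Step(String name, Runnable action, Runnable compensation) {}

    private final List<Step> steps = new ArrayList<>();

    ProvisioningSaga addStep(String name, Runnable action, Runnable compensation) {
        steps.add(new Step(name, action, compensation));
        return this;
    }

    /**
     * Executes the steps in registration order. On the first failure, the
     * compensations of all previously completed steps run in reverse order
     * and the method returns false. The failed step itself is not compensated.
     */
    boolean execute() {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step); // most recently completed step ends up on top
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }
}
```

Registering a "create realm" step followed by a "write secrets" step that throws would cause `execute()` to run the realm compensation and return `false`, which mirrors the Vault failure case described above.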

API Design

The control plane exposes a RESTful API following resource-oriented design:

POST   /api/v1/tenants                        # Create tenant
GET    /api/v1/tenants/:id                    # Get tenant details
POST   /api/v1/tenants/:id/environments       # Provision environment
DELETE /api/v1/tenants/:id/environments/:env  # Deprovision
POST   /api/v1/tenants/:id/roles              # Create custom role
GET    /api/v1/tenants/:id/audit-log          # Audit trail

All endpoints are secured with OIDC bearer tokens issued by Keycloak, scoped per tenant.
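
One way to picture per-tenant scoping is a check that compares the tenant id in the request path with a tenant claim from the already verified token. The sketch below is an assumption-laden illustration: the claim name `tenant_id` and the class `TenantScopeCheck` are hypothetical, and token signature validation is assumed to have happened earlier (for example in the framework's resource-server layer).

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical tenant scope check for paths under /api/v1/tenants/{id}. */
final class TenantScopeCheck {

    // Matches /api/v1/tenants/{id} and anything nested under it.
    private static final Pattern TENANT_PATH =
            Pattern.compile("^/api/v1/tenants/([^/]+)(?:/.*)?$");

    /** Extracts the tenant id from the path, if the path is tenant-scoped. */
    static Optional<String> tenantFromPath(String path) {
        Matcher m = TENANT_PATH.matcher(path);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /**
     * Allows the request only when the tenant id in the path equals the
     * "tenant_id" claim (a hypothetical claim name) of the verified token.
     * Paths without a tenant id, such as the collection endpoint used to
     * create tenants, are rejected here and would need a separate rule.
     */
    static boolean isAllowed(Map<String, Object> verifiedClaims, String path) {
        Object claim = verifiedClaims.get("tenant_id");
        return tenantFromPath(path)
                .map(id -> id.equals(claim))
                .orElse(false);
    }
}
```

With claims containing `tenant_id = "acme"`, a request to `/api/v1/tenants/acme/environments` passes the check, while `/api/v1/tenants/other/roles` does not.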

Scaling Strategy

  • Stateless services: Each microservice is horizontally scalable behind Kubernetes Ingress
  • Event-driven provisioning: Long-running provisioning tasks use async events (RabbitMQ) to avoid HTTP timeout issues
  • Schema per tenant: Each tenant gets an isolated PostgreSQL schema, managed by Flyway migrations triggered at provisioning time
  • Connection pooling: HikariCP with tenant-aware routing via AbstractRoutingDataSource
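
The tenant-aware routing in the last bullet can be sketched without Spring on the classpath. In Spring, a subclass of `AbstractRoutingDataSource` overrides `determineCurrentLookupKey()` to return the key of the target data source. The plain-Java classes below (`TenantContext`, `TenantSchemaRouter`) are hypothetical stand-ins that show only the lookup logic, assuming the tenant id is bound to the current thread per request.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical holder that binds a tenant id to the current thread. */
final class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static String currentTenant() {
        return CURRENT.get();
    }

    /** Runs work with the tenant bound, then restores whatever was bound before. */
    static <T> T callAs(String tenantId, Supplier<T> work) {
        String previous = CURRENT.get();
        CURRENT.set(tenantId);
        try {
            return work.get();
        } finally {
            if (previous == null) {
                CURRENT.remove(); // avoid leaking tenant ids across pooled threads
            } else {
                CURRENT.set(previous);
            }
        }
    }
}

/** Hypothetical router: plays the role of determineCurrentLookupKey(). */
final class TenantSchemaRouter {
    private final Map<String, String> schemaByTenant;
    private final String defaultSchema;

    TenantSchemaRouter(Map<String, String> schemaByTenant, String defaultSchema) {
        this.schemaByTenant = Map.copyOf(schemaByTenant);
        this.defaultSchema = defaultSchema;
    }

    /** Resolves the schema for the bound tenant, or the default when none is bound. */
    String resolveSchema() {
        String tenant = TenantContext.currentTenant();
        if (tenant == null) {
            return defaultSchema;
        }
        String schema = schemaByTenant.get(tenant);
        if (schema == null) {
            throw new IllegalStateException("Unknown tenant: " + tenant);
        }
        return schema;
    }
}
```

Failing loudly on an unknown tenant, rather than falling back to the default schema, is a deliberate choice in this sketch: a silent fallback could route one tenant's queries to shared data.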

Reliability

  • Saga orchestration: Multi-step provisioning with compensating rollbacks prevents partial states
  • Health checks: Each service exposes /actuator/health with dependency-aware status (Keycloak reachable, Vault unsealed, DB connected)
  • Circuit breakers: Resilience4j circuit breakers on all external calls (Keycloak, Vault, K8s API) with fallback strategies
  • Provisioning SLA: < 60 seconds from API call to fully operational tenant environment
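
To make the circuit-breaker bullet concrete, here is a deliberately small, hand-rolled breaker. It is not the Resilience4j API and omits most of what that library offers (sliding windows, metrics, configurable half-open call limits). The class name, thresholds, and injectable clock are assumptions chosen for illustration.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical minimal circuit breaker based on consecutive failures. */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // millisecond clock, injectable for tests

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** Returns the current state, moving OPEN to HALF_OPEN once the wait time has elapsed. */
    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    /** Invokes the remote call unless the breaker is open; uses the fallback on failure. */
    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // short-circuit: the remote call is not attempted
        }
        try {
            T result = remoteCall.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        // A failed trial call in HALF_OPEN reopens the breaker immediately.
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }
}
```

A caller would wrap each external dependency (Keycloak, Vault, the Kubernetes API) in its own breaker instance so that one failing dependency does not trip the others.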

Security

  • OIDC everywhere: Service-to-service authentication via Keycloak client credentials, no shared secrets
  • Vault-managed secrets: Database credentials, API keys, and TLS certificates are all dynamic secrets with automatic rotation
  • Namespace isolation: Kubernetes NetworkPolicies enforce strict tenant-to-tenant isolation
  • Zero hardcoded credentials: Eliminated all static secrets from codebase and CI/CD pipelines via Vault integration
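
From a consuming service's point of view, dynamic secrets come with a lease that expires, so credentials have to be re-fetched before the lease runs out. The sketch below shows that refresh logic only. The `issuer` supplier is a stand-in for whatever call obtains fresh credentials from the secrets backend, and the class name, record, and refresh margin are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical cache for short-lived credentials that refreshes before lease expiry. */
final class LeasedCredentials {

    /** A credential pair plus the instant at which its lease expires. */
    record Lease(String username, String password, Instant expiresAt) {}

    private final Supplier<Lease> issuer;   // stand-in for a call to the secrets backend
    private final Supplier<Instant> clock;  // injectable for tests
    private final Duration refreshMargin;
    private Lease current;

    LeasedCredentials(Supplier<Lease> issuer, Supplier<Instant> clock, Duration refreshMargin) {
        this.issuer = issuer;
        this.clock = clock;
        this.refreshMargin = refreshMargin;
    }

    /**
     * Returns the cached lease while it remains valid for longer than the
     * refresh margin; otherwise requests a new lease from the issuer.
     */
    synchronized Lease get() {
        Instant now = clock.get();
        boolean stillFresh = current != null
                && now.plus(refreshMargin).isBefore(current.expiresAt());
        if (!stillFresh) {
            current = issuer.get();
        }
        return current;
    }
}
```

Refreshing ahead of expiry, rather than at expiry, leaves room for in-flight work to finish with credentials that are still valid.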

Trade-offs

| Decision | Benefit | Cost |
|----------|---------|------|
| Schema-per-tenant | Strong isolation, easy compliance | More complex migrations, higher DB overhead |
| Keycloak as IdP | Standards-compliant OIDC, rich RBAC | Operational complexity of running Keycloak |
| Saga over 2PC | Loose coupling, individual service resilience | Eventually consistent, complex error handling |
| Vault for all secrets | Dynamic rotation, audit trail | Additional infrastructure to operate |

Lessons Learned

  1. Domain boundaries pay off — Strict DDD boundaries between Organization, Environment, and Cluster services meant teams could work in parallel and deploy independently.

  2. Automate day-2 operations early — Tenant deprovisioning and secret rotation were scoped from the start, not bolted on later. This saved weeks of rework.

  3. Keycloak Admin API is powerful but underdocumented — We contributed back several documentation PRs after learning the hard way about realm-level client scope propagation.

  4. Observability-driven development — Every provisioning step emits structured logs and metrics. When a provisioning run fails, we can trace the exact step that failed and the compensating action that followed.

  5. Infrastructure as code for tenant resources — Using Terraform modules for Vault secrets engines and K8s namespaces ensured reproducibility across staging and production.
