
Infrastructure Planning

Updated Sep 15, 2023

Overview

OpenStack needs careful planning because it runs on real infrastructure and depends on many supporting components.

  • OpenStack runs on existing infrastructure
  • Services communicate through APIs
  • Control plane and user plane are separated

OpenStack depends on servers, networks, storage, and supporting software to function reliably. The goal is to design a stable control plane, reliable compute resources, and scalable network and storage backends.

Core Infrastructure Requirements

OpenStack services are mostly API-driven applications running on Linux. They need several core infrastructure components to work correctly.

Component      | Purpose                         | Common Options
Web servers    | Expose service APIs             | Apache, Nginx
Databases      | Store service state and data    | MySQL, MariaDB, PostgreSQL
Message queues | Handle internal async messaging | RabbitMQ
Cache services | Improve performance             | Memcached
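
As a quick sanity check before installing services, the reachability of these supporting components can be verified from a controller node. The following is a minimal sketch using only the Python standard library; the hostname controller and the ports are assumptions and should be adjusted to the actual deployment.

    import socket

    # Hypothetical controller host and the default ports of the supporting
    # services listed above; adjust to match the real deployment.
    CHECKS = {
        "API web server (Apache/Nginx)": ("controller", 80),
        "Database (MariaDB/MySQL)":      ("controller", 3306),
        "Message queue (RabbitMQ)":      ("controller", 5672),
        "Cache (Memcached)":             ("controller", 11211),
    }

    def reachable(host, port, timeout=3):
        """Return True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    for name, (host, port) in CHECKS.items():
        status = "OK" if reachable(host, port) else "UNREACHABLE"
        print(f"{name:32s} {host}:{port:<6d} {status}")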

Control Plane

The control plane contains all management and decision-making components in OpenStack.

  • API endpoints for all services
  • Authentication and authorization
  • Scheduling and coordination logic

Services such as compute and networking split their components between the control plane and the user plane. Agents running on compute nodes cannot access the database directly; instead, conductor services in the control plane relay and filter that communication on their behalf.
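
The conductor pattern can be sketched in a few lines of Python. The queue and dictionary below are stand-ins for RabbitMQ and the service database, not actual OpenStack code; the sketch only illustrates why compute agents never need database credentials.

    import queue

    rpc_queue = queue.Queue()                          # stands in for RabbitMQ
    database = {"instance-1": {"state": "BUILD"}}      # stands in for MariaDB

    def compute_agent_report(instance_id, new_state):
        """Runs on a compute node: no direct DB access, only a queued request."""
        rpc_queue.put({"method": "update_state",
                       "instance_id": instance_id,
                       "state": new_state})

    def conductor_process_one():
        """Runs in the control plane: validates the request and updates the DB."""
        msg = rpc_queue.get()
        if msg["method"] == "update_state" and msg["instance_id"] in database:
            database[msg["instance_id"]]["state"] = msg["state"]

    compute_agent_report("instance-1", "ACTIVE")
    conductor_process_one()
    print(database)   # {'instance-1': {'state': 'ACTIVE'}}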

Control Plane High Availability

The control plane uses several mechanisms to stay stable and highly available. These mechanisms keep OpenStack services accessible during failures, reduce downtime, and support reliable scaling as the cloud grows.

Mechanism               | Purpose                                     | Common Tools
Database clustering     | Prevent data loss and improve availability  | Galera
Load balancing for APIs | Distribute traffic and detect failures      | HAProxy
Service orchestration   | Monitor and restart failed services         | Pacemaker
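
To illustrate the load-balancing row above, the sketch below mimics what an API load balancer such as HAProxy does: rotate across controller endpoints and skip any that fail a health check. The endpoint hostnames and port are placeholders, and a real deployment would rely on HAProxy rather than application code.

    import itertools
    import socket

    # Assumed controller API endpoints behind a virtual IP.
    ENDPOINTS = [("controller1", 5000), ("controller2", 5000), ("controller3", 5000)]

    def healthy(host, port, timeout=2):
        """Simple TCP health check, similar in spirit to an HAProxy 'check'."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    _rr = itertools.cycle(ENDPOINTS)

    def pick_endpoint():
        """Return the next healthy endpoint, or None if all checks fail."""
        for _ in range(len(ENDPOINTS)):
            host, port = next(_rr)
            if healthy(host, port):
                return host, port
        return None

    print(pick_endpoint())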

Compute Resource Pool Planning

OpenStack supports multiple compute models, with virtual machines being the most common. Compute resources can be grouped, scheduled, and sized based on workload requirements and host capabilities.

Hypervisor support
  • Supports multiple hypervisors such as KVM and ESXi
  • The choice depends on cost, team skills, and existing workloads
Compute pools
  • Hosts can be grouped into separate resource pools
Capability-based scheduling
  • Instances are scheduled based on host features like SSDs or GPUs

Availability zones and host aggregates help place workloads on the most suitable hosts, improving performance and efficiency.
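
The filtering idea behind capability-based scheduling can be shown with a short sketch. The host list, zone names, and feature tags below are invented for the example and do not reflect the Nova scheduler's real data structures.

    # Filter candidate hosts by availability zone and required features.
    hosts = [
        {"name": "compute1", "zone": "az1", "features": {"ssd"}},
        {"name": "compute2", "zone": "az1", "features": {"ssd", "gpu"}},
        {"name": "compute3", "zone": "az2", "features": set()},
    ]

    def candidates(required_zone, required_features):
        """Return hosts in the requested zone that offer all required features."""
        return [h["name"] for h in hosts
                if h["zone"] == required_zone
                and required_features <= h["features"]]

    print(candidates("az1", {"gpu"}))   # ['compute2']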

Compute Hardware Considerations

When selecting compute hardware, several factors must be considered to ensure optimal performance and resource utilization.

Instance density & workload patterns
  • Flavors define VM CPU, memory, and storage sizes
  • Server configuration should consider cost and performance
  • Use COTS rack servers, blade servers, or HCI as appropriate
  • Plan for power and cooling density
Overcommit ratios
  • CPU default ratio is 16:1
  • RAM default ratio is 1.5:1
Hardware acceleration
  • GPUs for compute-intensive workloads
  • SSDs and NVMe drives for faster storage performance

These factors define how VMs are sized and how resources are allocated. Overcommitment allows the physical hardware to host more virtual resources than its raw capacity provides, which improves efficiency while keeping performance acceptable as long as workloads do not all peak at the same time.

The table below illustrates how overcommitment works in practice:

Resource          | Physical Hardware | Virtual Allocation | Notes
CPU               | 12 cores          | 192 vCPUs          | With the default 16:1 ratio, each physical core supports 16 virtual CPUs
RAM               | 256 GB            | 384 GB             | With the 1.5:1 ratio, virtual memory exceeds physical memory
VM flavor example | 4 vCPUs, 8 GB RAM | 48 instances       | The scheduler can run 48 M1 Large instances on this node

Overcommitment ratios can be changed based on workload requirements or operational priorities. This helps administrators maintain a balance between efficiency, cost, and performance.
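
The arithmetic in the table can also be written out explicitly; the scheduler effectively takes the smaller of the CPU-limited and RAM-limited instance counts. This sketch simply reproduces the numbers above.

    # Reproducing the overcommit arithmetic from the table above.
    physical_cores = 12
    physical_ram_gb = 256
    cpu_ratio = 16.0     # default CPU allocation ratio
    ram_ratio = 1.5      # default RAM allocation ratio

    virtual_vcpus = physical_cores * cpu_ratio      # 192 vCPUs
    virtual_ram_gb = physical_ram_gb * ram_ratio    # 384 GB

    flavor_vcpus, flavor_ram_gb = 4, 8              # the example flavor
    instances = int(min(virtual_vcpus / flavor_vcpus,
                        virtual_ram_gb / flavor_ram_gb))
    print(instances)   # 48, the instance count shown in the table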

Physical Network Design

The physical network, or underlay, carries all OpenStack traffic. It must be reliable, redundant, and high-performing to support production workloads.

Traffic segmentation
  • Management network
  • External/provider network
  • Storage network
  • Internal/user network
VLAN vs VXLAN
  • VLAN for smaller or legacy deployments
  • VXLAN for large-scale overlay networks
  • VXLAN allows easier network isolation and scaling
Redundant paths
  • Multiple NICs on compute nodes
  • Redundant switching layers
Performance & latency
  • Throughput planning to meet workload demands
  • Low latency for real-time systems
  • Quality of Service for critical traffic
Network acceleration
  • Intelligent NICs
  • SR-IOV
  • Hardware VTEPs
  • VXLAN termination in switches

High network performance is essential for demanding workloads such as telecom or real-time systems. Acceleration technologies improve throughput and reduce latency, which ensures predictable and efficient operations.
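
A first pass at throughput planning is often a simple oversubscription estimate comparing aggregate server NIC bandwidth against rack uplink bandwidth. The figures in the sketch below are assumptions chosen only to show the calculation, not a sizing recommendation.

    # Rough per-rack oversubscription estimate (assumed figures).
    servers_per_rack = 20
    nics_per_server = 2
    nic_gbps = 25

    uplinks = 4
    uplink_gbps = 100

    server_bandwidth = servers_per_rack * nics_per_server * nic_gbps   # 1000 Gbps
    uplink_bandwidth = uplinks * uplink_gbps                           # 400 Gbps
    print(f"Oversubscription ratio: {server_bandwidth / uplink_bandwidth:.1f}:1")  # 2.5:1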

Software-Defined Networking

OpenStack networking is mostly implemented in software, which allows flexible and scalable network management.

Factor           | Details
Virtual networks | OpenStack creates virtual networks and subnets for tenant isolation and traffic control
VXLAN tunneling  | Hosts communicate over VXLAN tunnels to enable overlay networks across physical nodes
SDN options      | Multiple software-defined networking solutions are available

For small deployments, Linux Bridge is commonly used, while Open vSwitch is preferred for larger environments. More complex networks can integrate with external SDN platforms to provide advanced features and improved scalability.
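
As an example of this software-defined approach, tenant networks and subnets can be created programmatically. The sketch below uses the openstacksdk library; the cloud name mycloud and the addresses are placeholders and should match a real clouds.yaml entry.

    import openstack

    # Connect using credentials from a clouds.yaml entry (assumed name "mycloud").
    conn = openstack.connect(cloud="mycloud")

    # Create a virtual network and attach a subnet to it.
    network = conn.network.create_network(name="demo-net")
    subnet = conn.network.create_subnet(
        name="demo-subnet",
        network_id=network.id,
        ip_version=4,
        cidr="192.168.10.0/24",
    )
    print(network.id, subnet.cidr)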

OpenStack supports both open-source and commercial SDN controllers. Examples include:

  • OpenContrail
  • OpenDaylight
  • VMware NSX
  • Cisco ACI

Storage Backend Planning

Storage design should match existing infrastructure and workload needs. A well-planned storage backend provides compute nodes with fast and reliable access to data.

Hardware storage backends
  • Disk arrays for capacity and performance
  • Redundancy and availability
  • Common vendors: Dell EMC, NetApp, HPE
Software-defined storage
  • Scalable and flexible
  • Reliable and highly available
  • Examples: LVM, Ceph, GlusterFS
Data path design
  • Connections between compute and storage nodes
  • Local storage on compute nodes
  • Network options: Fibre Channel, Ethernet, iSCSI, FCoE
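
When the backend is a replicated software-defined store such as Ceph, usable capacity is well below raw capacity. The sketch below shows a common back-of-the-envelope estimate; node counts, disk sizes, replication factor, and fill target are all assumptions for illustration.

    # Back-of-the-envelope capacity planning for a replicated backend (assumed figures).
    osd_nodes = 6
    disks_per_node = 10
    disk_tb = 4

    replica_count = 3        # common replication factor for block storage
    target_fill = 0.75       # keep headroom; do not plan to fill the cluster

    raw_tb = osd_nodes * disks_per_node * disk_tb      # 240 TB raw
    usable_tb = raw_tb / replica_count * target_fill   # 60 TB usable
    print(f"Raw: {raw_tb} TB, plan for about {usable_tb:.0f} TB usable")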

Planning for Additional Services

Core services are only part of a full OpenStack deployment. Additional services need careful planning to ensure scalability and reliability.

  • Monitoring and telemetry

    • Tracks system performance
    • Supports proactive issue detection
    • Adds extra network and storage load (roughly sized in the sketch after this list)
  • Database and container services

    • May require large instances
    • Needs replication for high availability
  • Future expansion needs

    • Planning ahead prevents resource shortages
    • Supports smooth scaling of workloads
    • Prepares for new services and users
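
The telemetry load mentioned above can be roughed out before deployment. The figures in the sketch are invented purely to show the shape of the estimate and are not measurements of any specific monitoring tool.

    # Rough estimate of the extra storage that telemetry adds (assumed figures).
    instances = 500
    metrics_per_instance = 20
    sample_interval_s = 60
    bytes_per_sample = 200
    retention_days = 90

    samples_per_day = instances * metrics_per_instance * (86400 // sample_interval_s)
    storage_gb = samples_per_day * bytes_per_sample * retention_days / 1e9
    print(f"~{storage_gb:.0f} GB of telemetry data over {retention_days} days")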