Architecture Overview
SmolVM follows a clean layered architecture.
Component Responsibilities
| Component | Responsibility |
|---|---|
| facade.py | User-facing SmolVM class with context managers and convenience methods |
| vm.py | Core orchestrator (SmolVMManager class) managing VM lifecycle, networking, and state |
| api.py | Low-level Firecracker HTTP API client over Unix sockets |
| storage.py | SQLite-based state persistence for VMs, IPs, and port mappings |
| network.py | Linux networking (TAP devices, NAT, port forwarding via nftables) |
| host.py | Environment validation and Firecracker binary management |
The SmolVMManager class in vm.py is exported as part of the public API but is primarily for advanced use cases. Most users should use the high-level SmolVM facade class instead.
Data Flow
VM Creation Flow
- facade.SmolVM → Initializes vm.SmolVMManager
- vm.SmolVMManager.create() →
  - Validates configuration
  - Determines effective backend (Firecracker/QEMU)
  - Materializes rootfs (isolated copy if needed)
- storage.StateManager.create_vm() →
  - Creates VM record in SQLite database
  - Sets status to CREATED
- storage.StateManager.reserve_ssh_port() →
  - Allocates port from pool (2200-2999)
- storage.StateManager.allocate_ip() → (Firecracker only)
  - Allocates IP from pool (172.16.0.2-254)
  - Derives TAP device name from last octet (e.g., tap2)
- network.NetworkManager → (Firecracker only)
  - Creates TAP device: ip tuntap add tap2 mode tap
  - Configures TAP: ip addr add 172.16.0.1/32 dev tap2
  - Adds route: ip route add 172.16.0.2 dev tap2
  - Sets up NAT: nft add rule nat postrouting ...
  - Configures SSH forwarding: nft add rule nat prerouting ...
- storage.StateManager.update_vm() →
  - Stores network configuration in database
  - Returns VMInfo with complete state
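The Firecracker-only networking steps above can be sketched as a pair of pure functions: one derives the TAP name from the guest IP, the other builds the three ip commands quoted in the list. This is a minimal illustration rather than SmolVM's actual code; tap_name_for and tap_setup_commands are hypothetical helpers, and the nftables rules are omitted because their arguments are elided in the list.

```python
import ipaddress

GATEWAY_IP = "172.16.0.1"  # host-side address quoted in the creation flow


def tap_name_for(guest_ip: str) -> str:
    """Derive the TAP device name from the last octet of the guest IP."""
    last_octet = ipaddress.IPv4Address(guest_ip).packed[-1]
    return f"tap{last_octet}"


def tap_setup_commands(guest_ip: str) -> list[list[str]]:
    """Build (but do not run) the ip commands listed in the creation flow."""
    tap = tap_name_for(guest_ip)
    return [
        ["ip", "tuntap", "add", tap, "mode", "tap"],            # create TAP device
        ["ip", "addr", "add", f"{GATEWAY_IP}/32", "dev", tap],  # host address on TAP
        ["ip", "route", "add", guest_ip, "dev", tap],           # route to the guest
    ]
```

Returning argument lists instead of executing them keeps the sketch side-effect free; a caller with root privileges could pass each list to subprocess.run.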
VM Start Flow
Firecracker Backend:
- vm.SmolVMManager.start() →
  - Retrieves VM info from database
  - Validates VM is in CREATED or STOPPED state
- vm.SmolVMManager._start_firecracker() →
  - Spawns firecracker --api-sock /tmp/fc-vm-xxxxx.sock
  - Redirects stdout/stderr to log file
  - Detaches from terminal using start_new_session=True
- api.FirecrackerClient.wait_for_socket() →
  - Polls until Unix socket exists
  - Validates socket is responsive (GET /)
- api.FirecrackerClient.set_boot_source() →
  - PUT /boot-source with kernel path and boot args
- api.FirecrackerClient.set_machine_config() →
  - PUT /machine-config with vCPU and memory settings
- api.FirecrackerClient.add_drive() →
  - PUT /drives/rootfs with rootfs path
- api.FirecrackerClient.add_network_interface() →
  - PUT /network-interfaces/eth0 with TAP device and MAC
- api.FirecrackerClient.start_instance() →
  - PUT /actions {"action_type": "InstanceStart"}
  - Firecracker boots Linux kernel
- storage.StateManager.update_vm() →
  - Updates status to RUNNING
  - Stores process PID and socket path
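The order of the Firecracker API calls matters: the instance must be fully configured before InstanceStart is sent. The sketch below lays out the sequence as plain data, without performing any HTTP. The function name, its parameters, and the default vCPU and memory values are hypothetical; the JSON field names follow the public Firecracker API specification and should be checked against the Firecracker version in use.

```python
from typing import Any

Request = tuple[str, str, dict[str, Any]]  # (method, path, JSON body)


def firecracker_boot_requests(
    kernel_path: str,
    boot_args: str,
    rootfs_path: str,
    tap_name: str,
    guest_mac: str,
    vcpu_count: int = 1,      # assumed default, not taken from SmolVM
    mem_size_mib: int = 512,  # assumed default, not taken from SmolVM
) -> list[Request]:
    """Return the configuration requests in the order the start flow issues them."""
    return [
        ("PUT", "/boot-source",
         {"kernel_image_path": kernel_path, "boot_args": boot_args}),
        ("PUT", "/machine-config",
         {"vcpu_count": vcpu_count, "mem_size_mib": mem_size_mib}),
        ("PUT", "/drives/rootfs",
         {"drive_id": "rootfs", "path_on_host": rootfs_path,
          "is_root_device": True, "is_read_only": False}),
        ("PUT", "/network-interfaces/eth0",
         {"iface_id": "eth0", "host_dev_name": tap_name, "guest_mac": guest_mac}),
        # Must come last: boots the guest with the configuration set above.
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]
```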
QEMU Backend:
- vm.SmolVMManager._start_qemu() →
  - Finds qemu-system-aarch64 or qemu-system-x86_64 binary
  - Builds command line arguments
  - Configures user-mode networking (-netdev user,hostfwd=tcp:...)
  - Spawns QEMU process with HVF acceleration (macOS) or KVM (Linux)
- Process warmup check →
  - Polls for 2 seconds to detect immediate crashes
  - Validates process didn’t exit with error
- storage.StateManager.update_vm() →
  - Updates status to RUNNING
  - Stores process PID
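The process warmup check can be illustrated with a short polling loop over a subprocess.Popen handle. This is a sketch under assumptions: warmup_check is a hypothetical name, the 2-second window comes from the flow above, the polling interval is invented, and any exit during the window is treated as a failure (the real check may only reject non-zero exit codes).

```python
import subprocess
import time


def warmup_check(proc: subprocess.Popen, warmup_s: float = 2.0,
                 interval_s: float = 0.05) -> None:
    """Raise if the hypervisor process exits during the warmup window."""
    deadline = time.monotonic() + warmup_s
    while time.monotonic() < deadline:
        exit_code = proc.poll()  # None while the process is still running
        if exit_code is not None:
            raise RuntimeError(
                f"VM process exited during warmup (exit code {exit_code})"
            )
        time.sleep(interval_s)
```

Polling rather than calling proc.wait(timeout=...) keeps the success path simple: surviving the window means returning normally.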
Command Execution Flow
- facade.SmolVM.run() →
  - Delegates to SSH executor
- ssh.SSHExecutor.execute() →
  - Connects to localhost:<ssh_host_port> (QEMU)
  - OR connects to <guest_ip>:22 (Firecracker)
  - Authenticates using default SSH key
  - Executes command via SSH session
  - Captures stdout/stderr
  - Returns CommandResult with exit code and output
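The only backend-specific part of command execution is the SSH endpoint. A minimal sketch of that choice follows; ssh_endpoint is a hypothetical helper and the backend strings are assumed to be "qemu" and "firecracker".

```python
from typing import Optional


def ssh_endpoint(backend: str, ssh_host_port: int,
                 guest_ip: Optional[str] = None) -> tuple[str, int]:
    """Pick the (host, port) pair to open the SSH connection against."""
    if backend == "qemu":
        # User-mode networking: the guest is reached through the forwarded host port.
        return ("localhost", ssh_host_port)
    if backend == "firecracker":
        # TAP networking: the guest IP is routable from the host.
        if guest_ip is None:
            raise ValueError("Firecracker VMs need a guest IP")
        return (guest_ip, 22)
    raise ValueError(f"unknown backend: {backend!r}")
```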
VM Teardown Flow
- vm.SmolVMManager.stop() →
  - Retrieves VM info
  - Validates VM is RUNNING
- Firecracker shutdown:
  - api.FirecrackerClient.send_ctrl_alt_del() (graceful)
  - Waits 0.5s for guest shutdown
  - os.kill(pid, SIGKILL) (force)
  - Unlinks Unix socket
- QEMU shutdown:
  - os.kill(pid, SIGTERM) (graceful)
  - Waits up to timeout
  - os.kill(pid, SIGKILL) if still running
- storage.StateManager.update_vm() →
  - Sets status to STOPPED
  - Clears PID field
- vm.SmolVMManager.delete() →
  - Calls stop() if running
  - vm._cleanup_resources() →
    - Removes nftables rules (Firecracker)
    - Deletes TAP device (Firecracker)
    - Releases IP lease
    - Releases SSH port
    - Removes isolated disk if applicable
  - storage.StateManager.delete_vm() →
    - Deletes VM record (cascades to IP/port leases)
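The QEMU shutdown sequence (SIGTERM, bounded wait, SIGKILL) is a common escalation pattern. The sketch below shows it against a subprocess.Popen handle rather than a raw PID, which is an assumption made to keep the example self-contained; stop_process and its default timeout are hypothetical. It relies on POSIX signals.

```python
import signal
import subprocess


def stop_process(proc: subprocess.Popen, timeout_s: float = 3.0) -> int:
    """Ask the process to exit, then force-kill it if it ignores the request."""
    proc.send_signal(signal.SIGTERM)        # graceful request
    try:
        return proc.wait(timeout=timeout_s)  # wait up to the timeout
    except subprocess.TimeoutExpired:
        proc.kill()                          # SIGKILL cannot be ignored
        return proc.wait()                   # reap the process
```

On POSIX the return value is the negated signal number when the process was terminated by a signal, so a caller can tell a graceful stop from a forced one.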
State Management
SmolVM uses SQLite for durable state persistence across process restarts.
Database Schema
vms table:
State Transitions
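Going by the start and stop flows described earlier (start() accepts CREATED or STOPPED and sets RUNNING; stop() accepts RUNNING and sets STOPPED), the permitted transitions can be sketched as a small lookup table. Only the three states named in this document are modeled, and the enum, its string values, and can_transition are hypothetical names; the real implementation may define additional states.

```python
from enum import Enum


class VMStatus(Enum):
    CREATED = "created"
    RUNNING = "running"
    STOPPED = "stopped"


# Current state -> states reachable from it via start() or stop().
ALLOWED_TRANSITIONS = {
    VMStatus.CREATED: {VMStatus.RUNNING},   # start()
    VMStatus.STOPPED: {VMStatus.RUNNING},   # start() again after a stop
    VMStatus.RUNNING: {VMStatus.STOPPED},   # stop()
}


def can_transition(current: VMStatus, new: VMStatus) -> bool:
    """Return True if moving from current to new is a permitted transition."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```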
Concurrency Safety
SmolVM uses SQLite’s EXCLUSIVE transaction mode for writes to ensure atomic IP/port allocation.
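To show why an exclusive transaction makes allocation atomic, here is a minimal sketch using Python's sqlite3 module. The port_leases table, its columns, and reserve_ssh_port's signature are assumptions for illustration and are not taken from SmolVM's schema; the port range is the 2200-2999 pool mentioned earlier.

```python
import sqlite3

PORT_RANGE = range(2200, 3000)  # pool 2200-2999


def reserve_ssh_port(conn: sqlite3.Connection, vm_id: str) -> int:
    """Pick and record the lowest free port inside one exclusive transaction."""
    # BEGIN EXCLUSIVE takes the database write lock up front, so no other
    # writer can slip in between the SELECT and the INSERT below.
    conn.execute("BEGIN EXCLUSIVE")
    try:
        used = {row[0] for row in conn.execute("SELECT port FROM port_leases")}
        port = next((p for p in PORT_RANGE if p not in used), None)
        if port is None:
            raise RuntimeError("SSH port pool exhausted")
        conn.execute(
            "INSERT INTO port_leases (port, vm_id) VALUES (?, ?)", (port, vm_id)
        )
        conn.execute("COMMIT")
        return port
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None turns off the module's implicit transactions so the
# explicit BEGIN EXCLUSIVE above is the only transaction control in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE port_leases (port INTEGER PRIMARY KEY, vm_id TEXT NOT NULL)"
)
```

Without the exclusive lock, two processes could both read the same set of used ports and then try to claim the same free one.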
Networking Architecture
Firecracker Networking (Linux)
Each VM gets:
- Dedicated TAP device: tap<N>, where N is the last octet of the guest IP
- Private IP: Allocated from the 172.16.0.2-254 pool
- Gateway IP: 172.16.0.1 (host)
- NAT: Outbound internet access via nftables
- SSH forwarding: Host port 2200-2999 → guest port 22
QEMU Networking (macOS/Linux)
QEMU uses user-mode networking (slirp):
- No TAP devices: Networking handled entirely by QEMU
- Guest IP: Fixed at 10.0.2.15
- Gateway IP: 10.0.2.2 (QEMU virtual gateway)
- Port forwarding: Built into QEMU via -netdev hostfwd
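As an illustration of QEMU's built-in port forwarding, the sketch below builds the networking arguments for one VM. The hostfwd syntax is QEMU's documented form (hostfwd=tcp:hostaddr:hostport-guestaddr:guestport); the netdev id net0, the 127.0.0.1 bind address, the virtio-net-pci device, and the function name are assumptions, not SmolVM's actual command line.

```python
def qemu_user_net_args(ssh_host_port: int, guest_ssh_port: int = 22) -> list[str]:
    """Build QEMU arguments for user-mode networking with SSH forwarding."""
    netdev = (
        "user,id=net0,"
        # Forward host 127.0.0.1:<ssh_host_port> to the guest's SSH port.
        f"hostfwd=tcp:127.0.0.1:{ssh_host_port}-:{guest_ssh_port}"
    )
    return [
        "-netdev", netdev,
        "-device", "virtio-net-pci,netdev=net0",  # guest NIC attached to net0
    ]
```

For example, qemu_user_net_args(2200) forwards host port 2200 to guest port 22, which matches the localhost:<ssh_host_port> endpoint used for QEMU in the command execution flow.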
Disk Management
SmolVM supports two disk modes:
Isolated Mode (Default)
Each VM gets a private copy of the rootfs:
- On create(), check if data_dir/disks/<vm_id>.ext4 exists
- If not, shutil.copy2(config.rootfs_path, instance_disk_path)
- Update config.rootfs_path to point to the isolated copy
- Firecracker/QEMU mounts this copy (writable)
- On delete(), remove the copy (unless retain_disk_on_delete=True)
Benefits:
- Complete isolation between VMs
- No state pollution across VM instances
- Safe for concurrent VMs
Trade-offs:
- Disk usage: N × rootfs size
- Copy overhead: ~100-200ms on first boot
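The isolated-mode copy step can be sketched in a few lines. The path layout (data_dir/disks/<vm_id>.ext4) and the use of shutil.copy2 come from the steps above; the function name and signature are hypothetical.

```python
import shutil
from pathlib import Path


def materialize_isolated_disk(data_dir: Path, vm_id: str, base_rootfs: Path) -> Path:
    """Return the per-VM disk path, copying the base rootfs on first use only."""
    instance_disk = data_dir / "disks" / f"{vm_id}.ext4"
    if not instance_disk.exists():
        instance_disk.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(base_rootfs, instance_disk)  # preserves file metadata too
    return instance_disk
```

Because an existing copy is reused rather than overwritten, changes made inside the VM survive a stop and restart, while the base image is never modified.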
Shared Mode
All VMs mount the same rootfs image:
- Firecracker/QEMU mounts config.rootfs_path directly (writable)
- All changes persist in the base image
- Multiple VMs share the same filesystem state
Use cases and benefits:
- Read-only workloads (mount as read-only externally)
- Persistent development environment
- Lower disk usage
Risks:
- Cross-VM contamination
- Concurrent writes can corrupt the filesystem
Backend Abstraction
SmolVM abstracts two backends behind a unified API.
Firecracker Backend
- Platform: Linux only (requires KVM)
- Hypervisor: Firecracker microVM monitor
- Networking: TAP devices + nftables NAT
- Boot time: ~2.1s to SSH ready
- API: HTTP over Unix socket
QEMU Backend
- Platform: macOS (HVF) and Linux (KVM)
- Hypervisor: QEMU full system emulator
- Networking: User-mode networking (slirp)
- Boot time: ~3-5s to SSH ready
- API: Process management (no API socket)
Backend Selection
- Explicit backend parameter
- SMOLVM_BACKEND environment variable
- Auto-detect: Darwin → qemu, Linux → firecracker
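Assuming the three sources above are consulted in the order listed, backend selection reduces to a short fallback chain. resolve_backend and its parameters are hypothetical names; the env and system arguments exist only so the logic can be exercised without touching the real environment.

```python
import os
import platform
from typing import Mapping, Optional


def resolve_backend(explicit: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None,
                    system: Optional[str] = None) -> str:
    """Resolve the backend: explicit argument, then env var, then platform."""
    env = os.environ if env is None else env
    system = platform.system() if system is None else system
    if explicit:
        return explicit
    from_env = env.get("SMOLVM_BACKEND")
    if from_env:
        return from_env
    if system == "Darwin":
        return "qemu"
    if system == "Linux":
        return "firecracker"
    raise RuntimeError(f"no default backend for platform {system!r}")
```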
Security Considerations
Isolation Boundary
- Hardware virtualization: KVM (Linux) or Hypervisor.framework (macOS)
- Separate kernel: Each VM runs its own Linux kernel
- Network isolation: VMs cannot access each other directly
- Process isolation: VM processes run in separate PID namespaces
Attack Surface
Smaller than containers:
- No shared kernel (vs Docker)
- No syscall translation (vs gVisor)
- Minimal device emulation (vs traditional VMs)
Remaining risks:
- Hypervisor bugs (Firecracker/QEMU)
- Kernel vulnerabilities (guest → host escalation)
- Network escapes (TAP device bugs)
SSH Trust Model
See SECURITY.md in the source repository for the full security policy.
Extension Points
Custom State Backend
Replace SQLite with your own database.
Custom Network Manager
Implement alternative networking (e.g., OVS, CNI).
Custom Backend
Add support for other hypervisors (e.g., Cloud Hypervisor, crosvm).
Performance Design
Key optimizations:
- Fast-path shutdown: SIGKILL instead of graceful shutdown for ephemeral VMs
- Lazy resource allocation: Network setup only on first boot
- Connection pooling: Reuse SSH connections for multiple commands
- Minimal device emulation: Firecracker only emulates virtio devices
- Direct kernel boot: No bootloader overhead