Docker for ML Engineers: A Practical Guide from Zero to Production
You trained an ML model. It works in your notebook. Now you need to serve it as an API for the rest of the team to consume. You build a FastAPI API, run it locally, everything looks great. Then you send the repo to a colleague and…
“What Python version are you on? numpy throws an error here. And the .pkl model — where is it?”
This guide takes you from that situation — an API that only works on your machine — to a containerized application ready for production. There are 12 real problems you’ll face when putting a model into production, each one leading to the next Docker concept: Dockerfile, .dockerignore, layer caching, Compose, volumes, networking, Redis, security, container registry, CI/CD, debugging, and cloud deployment.
We’ll use a FastAPI API that serves predictions from an ML model as our running example. If you work on backend for AI applications, you’ll feel right at home.
Want to follow along hands-on? All example files are available on GitHub. Clone the repo, install dependencies with pip install -r requirements.txt, run python train_model.py to generate the model, and follow along.
The Starting Point: An API That Works (on Your Machine)
We have a simple API: it receives data, runs a trained model, and returns a prediction.
# app/main.py
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI(title="ML Prediction API")
model = joblib.load("models/model.pkl")


class PredictionRequest(BaseModel):
    features: list[float]


class PredictionResponse(BaseModel):
    model_config = {"protected_namespaces": ()}
    prediction: float
    model_version: str = "1.0.0"


@app.get("/health")
def health():
    return {"status": "healthy"}


@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    X = np.array(request.features).reshape(1, -1)
    prediction = model.predict(X)[0]
    return PredictionResponse(prediction=float(prediction))
To run locally:
pip install fastapi uvicorn joblib scikit-learn numpy
uvicorn app.main:app --host 0.0.0.0 --port 8000
It works. But try replicating this on another machine. Python 3.11 vs 3.12, a different scikit-learn version, incompatible numpy, a model trained with another library version. Chaos.
First problem: how do you guarantee this API runs the same everywhere?
What Docker Is (and Why It Solves This Problem)
Docker is a platform that packages your application together with everything it needs — code, runtime, libraries, model, configuration — into a container. A container is an isolated, portable environment: it works on your machine, your colleague’s machine, the staging server, and in production on AWS.
The most useful analogy: before shipping containers, every type of cargo required a different transport method. Containers standardized everything — any cargo, any ship, any port. Docker did the same for software.
Essential Concepts (only what you need for now)
- Image: a read-only template with everything your app needs. Think of it as a snapshot of the environment
- Container: a running instance of an image. Think of it as an isolated process
- Dockerfile: the recipe for building an image
- Registry: where images are stored (Docker Hub is the default public one)
More concepts will appear as needed. Let’s get to what matters.
Problem 1: “It Works on My Machine”
Solution: Your First Dockerfile
Create a file called Dockerfile (no extension) at the project root:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Line by line:
| Instruction | What it does |
|---|---|
| FROM python:3.12-slim | Starts from an official Python image (slim version, ~150MB) |
| WORKDIR /app | Creates and enters the /app directory inside the container |
| COPY requirements.txt . | Copies the dependencies file into the image |
| RUN pip install ... | Installs dependencies (during build, not at runtime) |
| COPY . . | Copies all application code |
| EXPOSE 8000 | Documents that the app listens on port 8000 |
| CMD [...] | Command executed when the container starts |
Now build and run:
# Build the image (the trailing dot is the build context — current directory)
docker build -t ml-api:v1 .
# Run the container
docker run -d --name ml-api -p 8000:8000 ml-api:v1
# Test
curl http://localhost:8000/health
# {"status": "healthy"}
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"features": [1.5, 2.3, 0.8, 4.1]}'
# {"prediction": 42.7, "model_version": "1.0.0"}
Done. Anyone with Docker installed runs docker run and has the API working. Same Python version, same libraries, same model.
Basic Commands You’ll Use All the Time
docker ps # Running containers
docker ps -a # All (including stopped)
docker logs ml-api # View logs
docker logs -f ml-api # Follow logs in real time
docker exec -it ml-api bash # Open a shell inside the container
docker stop ml-api # Stop
docker rm ml-api # Remove
But there’s a problem…
Problem 2: Docker Is Copying Junk into the Image
Run the build and notice the first line:
docker build -t ml-api:v1 .
# [+] Building ... transferring context: 500MB
The COPY . . copies everything from the directory into the image: .git, .venv, __pycache__, training datasets, model checkpoints, .env with credentials. Everything.
This causes three problems: slow builds (sending hundreds of MB to the daemon takes time), an image larger than necessary, and — worst of all — secrets leaking into the image you publish.
Solution: .dockerignore
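Its syntax is close to .gitignore, with one difference that matters here: patterns are matched relative to the root of the build context, so a leading **/ is needed to match at any depth. Create it at the project root:
.git
.venv
**/__pycache__
**/*.pyc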
.pytest_cache
.env
.env.*
*.md
LICENSE
.vscode
.idea
notebooks/
data/raw/
Now docker build sends only what matters: code, requirements, and the model. Faster builds, a cleaner image, no leaking secrets.
Tip: Always create .dockerignore alongside your Dockerfile. It’s as important as .gitignore, and frequently forgotten.
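To see how much an ignore list actually saves, the context size can be estimated before running a build. The sketch below is a simplified approximation of .dockerignore matching, not Docker's own implementation: it handles plain names, single-segment wildcards, and **, and it omits "!" exception rules. The function names and the pattern list are illustrative.

```python
import fnmatch
import os

# Illustrative subset of the ignore file shown above
IGNORE_PATTERNS = [".git", ".venv", "**/__pycache__", "**/*.pyc", ".env", "notebooks", "data/raw"]


def _match(parts, pattern_parts):
    """Match path segments against pattern segments; '**' spans zero or more segments."""
    if not pattern_parts:
        return not parts
    if pattern_parts[0] == "**":
        return any(_match(parts[i:], pattern_parts[1:]) for i in range(len(parts) + 1))
    return (
        bool(parts)
        and fnmatch.fnmatchcase(parts[0], pattern_parts[0])
        and _match(parts[1:], pattern_parts[1:])
    )


def is_ignored(rel_path, patterns):
    """True if the path, or any directory above it, matches one of the patterns."""
    parts = rel_path.replace(os.sep, "/").split("/")
    for pattern in patterns:
        pattern_parts = pattern.strip("/").split("/")
        # Excluding a directory also excludes everything below it
        if any(_match(parts[:i], pattern_parts) for i in range(1, len(parts) + 1)):
            return True
    return False


def context_size(root, patterns):
    """Sum the size in bytes of all files under root that are not ignored."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not is_ignored(os.path.relpath(full, root), patterns):
                total += os.path.getsize(full)
    return total
```

Calling context_size(".", []) and context_size(".", IGNORE_PATTERNS) from the project root gives a rough before/after comparison. Note how a root-only pattern such as *.md leaves docs/a.md in the context, which is why the bytecode patterns use **/.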
Problem 3: Every Change Rebuilds Everything
You change one line in main.py, run docker build, and… it reinstalls all dependencies from scratch. The pip install takes 2 minutes every time.
Solution: Understanding Layer Caching
Docker builds images in layers. Each Dockerfile instruction creates a layer. If a layer changes, all layers after it are invalidated.
Look at our Dockerfile:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Notice the order: first COPY requirements.txt and pip install, then COPY . . with the code. This is intentional.
If we did COPY . . before pip install, any change in main.py would invalidate the cache and force reinstallation of all dependencies. With the correct order, changing the code doesn’t affect the pip install layer — rebuild in seconds.
Golden rule: order instructions from least changed to most changed.
Test it: change something in main.py and run docker build again. Notice that the pip install layers come from cache (CACHED), and only the last layers are rebuilt.
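The invalidation rule can be modeled in a few lines. This is a toy model under simplifying assumptions, not how BuildKit computes its cache: each layer's key is a hash of the previous key plus the instruction text, and COPY layers additionally mix in a content hash of what they copy. The names layer_keys and rebuilt_layers are made up for this sketch.

```python
import hashlib


def layer_keys(instructions, file_hashes):
    """Return one cache key per instruction; every key chains the previous one."""
    keys = []
    parent = ""
    for instruction in instructions:
        material = parent + instruction
        if instruction.startswith("COPY "):
            # COPY layers also depend on the content of the copied source
            source = instruction.split()[1]
            material += file_hashes.get(source, "")
        parent = hashlib.sha256(material.encode()).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_layers(instructions, before, after):
    """List the instructions whose cache key changes between two states of the files."""
    old = layer_keys(instructions, before)
    new = layer_keys(instructions, after)
    return [ins for ins, a, b in zip(instructions, old, new) if a != b]


GOOD_ORDER = [
    "FROM python:3.12-slim",
    "WORKDIR /app",
    "COPY requirements.txt .",
    "RUN pip install -r requirements.txt",
    "COPY . .",
]
BAD_ORDER = [
    "FROM python:3.12-slim",
    "WORKDIR /app",
    "COPY . .",
    "RUN pip install -r requirements.txt",
]

# Editing main.py changes the hash of the whole context (".") but not requirements.txt
before = {"requirements.txt": "r1", ".": "c1"}
after = {"requirements.txt": "r1", ".": "c2"}

print(rebuilt_layers(GOOD_ORDER, before, after))
print(rebuilt_layers(BAD_ORDER, before, after))
```

With the good order only the final COPY is rebuilt; with the bad order the pip install layer is rebuilt too, because its parent key changed.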
Problem 4: The API Needs a Database
The API works, but now you need to save predictions for auditing and monitoring. You need PostgreSQL. You could install it on your machine, but… remember “it works on my machine”?
Solution: Docker Compose
Docker Compose lets you define and run multiple containers in a single YAML file. Create a compose.yaml:
name: ml-api
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      DATABASE_URL: postgresql://app:secret@db:5432/predictions
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: predictions
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d predictions"]
      interval: 5s
      timeout: 3s
      retries: 5
docker compose up -d # Start everything in the background
docker compose logs -f # Follow logs
docker compose ps # Service status
docker compose down # Stop everything
Pay attention to depends_on with condition: service_healthy. Without it, the API may try to connect before Postgres is ready — one of the most common causes of errors in multi-container setups.
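depends_on only covers startup order. If the database restarts later, the API is on its own, so some teams pair the healthcheck with a small retry loop in the application. The following is a minimal sketch of that idea using only the standard library; wait_for_port is a hypothetical helper, and an open TCP port is a weaker signal than pg_isready (the server may accept connections before it can serve queries).

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll a TCP port until it accepts a connection or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Each attempt gets a short connect timeout of its own
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or not resolvable yet: wait and retry
            time.sleep(interval)
    return False
```

Inside the Compose network this could be called as wait_for_port("db", 5432) before creating the engine.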
What Changed in the API
Now the API saves predictions:
# app/main.py (updated version; app, model, and the request/response classes are unchanged)
import os

from fastapi import FastAPI
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.orm import sessionmaker

# PredictionLog is the ORM model for the audit table. It is assumed to be
# defined elsewhere in the project and is not shown in this guide.

# SQLAlchemy needs the async driver spelled out in the URL scheme
DATABASE_URL = os.getenv("DATABASE_URL", "").replace(
    "postgresql://", "postgresql+asyncpg://"
)
engine = create_async_engine(DATABASE_URL)
async_session = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)


@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    X = np.array(request.features).reshape(1, -1)
    prediction = float(model.predict(X)[0])
    # Persist every prediction for auditing and monitoring
    async with async_session() as session:
        session.add(PredictionLog(
            features=request.features,
            prediction=prediction,
        ))
        await session.commit()
    return PredictionResponse(prediction=prediction)
But there’s a problem…
Problem 5: Data Disappears on Restart
docker compose down
docker compose up -d
# Empty database! All predictions gone.
Containers are ephemeral. When removed, everything inside them disappears.
Solution: Volumes
Volumes are Docker’s way of persisting data beyond the container lifecycle.
services:
  db:
    image: postgres:16-alpine
    volumes:
      - db_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: predictions
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d predictions"]
      interval: 5s
      timeout: 3s
      retries: 5
volumes:
  db_data:
Now docker compose down keeps the data. Only docker compose down -v removes volumes (careful!).
Mount Types
| Type | When to Use | Example |
|---|---|---|
| Volume | Production data (database, uploads) | db_data:/var/lib/postgresql/data |
| Bind mount | Development (hot-reload for code) | ./app:/app/app |
| tmpfs | Temporary/sensitive data (RAM only) | tmpfs: [/tmp] |
For development, use a bind mount for the source code:
services:
  api:
    build: .
    volumes:
      - ./app:/app/app  # Source code mounted for hot-reload
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
    ports:
      - "8000:8000"
Now code changes reflect instantly without a rebuild.
Problem 6: The API Can’t Find the Database
When you wrote postgresql://app:secret@db:5432/predictions, how does the API know what db is? It’s not a hostname registered in any DNS.
Solution: Docker Networking
Docker Compose automatically creates an isolated network for the services defined in the file. Within that network, each service is reachable by its name — automatic DNS resolution.
compose.yaml:
  services:
    api:  → reachable as "api" on the internal network
    db:   → reachable as "db" on the internal network
That’s why @db:5432 works. Docker resolves db to the internal IP of the PostgreSQL container.
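A quick way to see which part of the connection string is the service name is to take the URL apart. The sketch below uses only the standard library; describe_database_url and resolve are illustrative helper names, not part of the example project.

```python
import socket
from urllib.parse import urlsplit


def describe_database_url(url):
    """Split a connection URL into its components."""
    parts = urlsplit(url)
    return {
        "driver": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,  # for Compose, this is the service name
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


def resolve(host):
    """Return the IPv4 address for a host name, or None if it cannot be resolved."""
    try:
        return socket.gethostbyname(host)
    except OSError:
        return None


info = describe_database_url("postgresql://app:secret@db:5432/predictions")
print(info["host"], info["port"])
```

Run inside the api container, resolve("db") should return the database container's address on the Compose network; run on the host, it will normally return None, because the name only exists in Docker's internal DNS.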
When You Need Custom Networks
As you add more services, network separation becomes a security concern:
services:
  api:
    networks:
      - frontend
      - backend
  db:
    networks:
      - backend  # Only reachable by the API, not from outside
  redis:
    networks:
      - backend
networks:
  frontend:
  backend:
The database is not on the frontend network, so anything attached only to frontend cannot reach it. Only the API, which is on both networks, bridges the gap. (Access from the host is a separate matter, controlled by published ports.)
Port Publishing
Ports inside the Docker network are internal. To access from outside (your machine, the internet), use ports:
services:
  api:
    ports:
      - "8000:8000"  # Exposed to the host
  db:
    expose:
      - "5432"  # Only inside the Docker network (better for security)
Security tip: When a port is needed on the host but shouldn’t be public, bind it to localhost: "127.0.0.1:5432:5432".
Problem 7: Repeated Predictions Are Slow
Your model takes 200ms per prediction. Many clients send the same features. It doesn’t make sense to reprocess — you need caching.
Solution: Add Redis to the Stack
name: ml-api
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      DATABASE_URL: postgresql://app:secret@db:5432/predictions
      REDIS_URL: redis://redis:6379
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    networks:
      - frontend
      - backend
  db:
    image: postgres:16-alpine
    volumes:
      - db_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: predictions
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d predictions"]
      interval: 5s
      timeout: 3s
      retries: 5
    networks:
      - backend
  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 128mb --maxmemory-policy allkeys-lru
    networks:
      - backend
volumes:
  db_data:
networks:
  frontend:
  backend:
And in the code:
import hashlib
import json
import os

import redis.asyncio as redis

redis_client = redis.from_url(os.getenv("REDIS_URL", "redis://localhost:6379"))


@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    # Identical feature vectors hash to the same cache key
    cache_key = hashlib.md5(json.dumps(request.features).encode()).hexdigest()
    cached = await redis_client.get(cache_key)
    if cached:
        # Cache hit: return early (note that hits are not written to the audit log)
        return PredictionResponse(prediction=float(cached))
    X = np.array(request.features).reshape(1, -1)
    prediction = float(model.predict(X)[0])
    # Keep the result for one hour
    await redis_client.setex(cache_key, 3600, str(prediction))
    async with async_session() as session:
        session.add(PredictionLog(features=request.features, prediction=prediction))
        await session.commit()
    return PredictionResponse(prediction=prediction)
Repeated predictions: from 200ms to <1ms. And all with docker compose up -d.
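One design question the cache raises is what goes into the key. The sketch below is a variation on the handler above, not the guide's exact code: it assumes you also want the model version in the key, so that a retrained model does not serve predictions cached by the old one. TTLCache is an in-memory stand-in for the Redis GET and SETEX calls, included only so the example runs without a Redis server.

```python
import hashlib
import json
import time


def cache_key(features, model_version="1.0.0"):
    """Build a deterministic key from the model version and the feature vector."""
    # Normalizing to float makes [1, 2] and [1.0, 2.0] hash identically
    payload = json.dumps({"v": model_version, "x": [float(f) for f in features]})
    return "pred:" + hashlib.sha256(payload.encode()).hexdigest()


class TTLCache:
    """In-memory stand-in for Redis GET/SETEX, for illustration only."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}

    def setex(self, key, ttl_seconds, value):
        # Store the value together with its expiry time
        self._data[key] = (self._clock() + ttl_seconds, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired entries behave like missing keys
            del self._data[key]
            return None
        return value
```

Feature order still matters: [1.0, 2.0] and [2.0, 1.0] are different inputs to the model, so they get different keys.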
Note the depends_on conditions:
- service_healthy: waits for the healthcheck to pass (for the database, which needs to initialize)
- service_started: only waits for the container to start (sufficient for Redis)
Problem 8: “Is It Running as Root?”
You show the setup to the security team. First question:
“Is the container running as root?”
docker exec ml-api whoami
# root
Yes. By default, Docker containers run as root. If someone exploits a vulnerability in your API, they have root access inside the container — and potentially on the host.
Solution: Container Hardening
Update the Dockerfile:
FROM python:3.12-slim
WORKDIR /app
# Create non-root user
RUN groupadd -r appuser && useradd --no-log-init -r -g appuser appuser
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY --chown=appuser:appuser . .
EXPOSE 8000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
# Run as non-root
USER appuser
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
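The inline python -c probe in HEALTHCHECK only checks that the URL can be opened. If you want the probe to also verify the response body, it can live in a small script instead. This is a sketch under the assumption that /health returns the JSON shown earlier; is_healthy is a hypothetical helper, and wiring it into the Dockerfile (for example as a script that exits non-zero on failure) is left to the reader.

```python
import http.client
import json
import urllib.request


def is_healthy(url, timeout=5.0):
    """Return True only if the endpoint answers 200 with {"status": "healthy"}."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            body = json.loads(response.read().decode())
    except (OSError, ValueError, http.client.HTTPException):
        # Connection errors, HTTP error statuses, timeouts, and invalid JSON all count as unhealthy
        return False
    return status == 200 and isinstance(body, dict) and body.get("status") == "healthy"
```

A wrapper script would call is_healthy("http://localhost:8000/health") and exit with status 1 when it returns False, which is what Docker interprets as an unhealthy check.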
What About the Database Password?
Look at compose.yaml: POSTGRES_PASSWORD: secret — the password is in plain text in a file that goes to Git.
Use Docker Secrets for sensitive data:
services:
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: predictions
      POSTGRES_USER: app
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
secrets:
  db_password:
    file: ./secrets/db_password.txt  # This file does NOT go to Git
Add secrets/ to .gitignore and .dockerignore.
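Postgres reads POSTGRES_PASSWORD_FILE by itself, but the API needs the same password to build its connection URL. One possible approach, sketched below, is to read the secret file that Compose mounts under /run/secrets. This assumes the api service is also granted the db_password secret; read_secret and build_database_url are hypothetical helpers, not part of the example repo.

```python
import os
from pathlib import Path
from urllib.parse import quote


def read_secret(name, secrets_dir="/run/secrets"):
    """Read a secret from a mounted file, falling back to an environment variable."""
    # Honor an explicit NAME_FILE override, mirroring the *_FILE convention
    file_override = os.getenv(f"{name.upper()}_FILE")
    path = Path(file_override) if file_override else Path(secrets_dir) / name
    if path.is_file():
        # Secret files often end with a newline; strip it
        return path.read_text().strip()
    return os.getenv(name.upper())


def build_database_url(user, host, database, password, port=5432):
    """Assemble a connection URL, percent-encoding the password."""
    return f"postgresql://{user}:{quote(password, safe='')}@{host}:{port}/{database}"
```

Percent-encoding matters because characters such as @ or / in a password would otherwise be parsed as part of the URL structure.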
Security Checklist
Best practices recommended by OWASP and Docker:
- Run as non-root: never use --privileged in production
- Pin image versions: python:3.12-slim, not python:latest
- Use secrets: never put passwords in ENV or in the image
- Limit resources: containers without limits can take down the host
- Scan images: docker scout cves ml-api:v2
- Read-only filesystem: docker run --read-only --tmpfs /tmp
- Drop capabilities: docker run --cap-drop ALL --cap-add NET_BIND_SERVICE
- Separate networks: database never on the same network as the frontend
- Bind ports to localhost: "127.0.0.1:5432:5432" when not public
- Keep everything updated: Docker, base images, dependencies
Resource Limits in Compose
services:
  api:
    deploy:
      resources:
        limits:
          memory: 1G  # ML models can consume a lot of RAM
          cpus: "2.0"
        reservations:
          memory: 512M
          cpus: "1.0"
Without limits, a model with a memory leak can take down all other containers (and the host).
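From inside the container, the limit is visible through the cgroup filesystem, which makes it possible to sanity-check at startup whether the model will fit. The sketch below assumes a host using cgroup v2, where the limit is exposed at /sys/fs/cgroup/memory.max (cgroup v1 hosts use a different path). Both function names and the 25% headroom figure are illustrative choices.

```python
from pathlib import Path


def read_memory_limit(path="/sys/fs/cgroup/memory.max"):
    """Return the cgroup v2 memory limit in bytes, or None if unlimited or unknown."""
    try:
        raw = Path(path).read_text().strip()
    except OSError:
        return None
    # cgroup v2 writes the literal string "max" when no limit is set
    if raw == "max" or not raw.isdigit():
        return None
    return int(raw)


def fits(model_bytes, limit_bytes, workers=1, headroom=0.25):
    """Check that all worker copies of the model fit, leaving some headroom."""
    if limit_bytes is None:
        return True
    return model_bytes * workers <= limit_bytes * (1 - headroom)
```

For example, a 200 MiB model with 4 workers needs about 800 MiB, which does not fit under a 1G limit once 25% is reserved for everything else, while 3 workers do.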
Problem 9: I Need to Share This Image
You built the image, tested it locally, everything works. Now a colleague wants to run the same API — or you need to deploy on a server. How do you share it?
Sending the code and asking someone to run docker build works, but they’ll need the model.pkl, the dependencies, and hope everything goes right. The whole point of Docker is to avoid that.
Solution: Container Registry
A container registry is like GitHub, but for Docker images. You push the image; anyone who wants to run it does pull. The model, dependencies, code — everything goes together inside the image.
We’ll use the GitHub Container Registry (GHCR), which is free for public repositories:
# 1. Login to GHCR (use a Personal Access Token with write:packages permission)
docker login ghcr.io -u YOUR_USERNAME
# 2. Tag the image with the registry address
docker tag ml-api:v1 ghcr.io/YOUR_USERNAME/ml-api:v1
# 3. Push
docker push ghcr.io/YOUR_USERNAME/ml-api:v1
Done. Now anyone (or any server) can run:
docker pull ghcr.io/YOUR_USERNAME/ml-api:v1
docker run -d -p 8000:8000 ghcr.io/YOUR_USERNAME/ml-api:v1
No installing Python, no installing dependencies, no separate model.pkl needed. Everything is inside the image.
Docker Hub vs GHCR: Docker Hub is the most popular registry (where python:3.12-slim and postgres:16-alpine live). GHCR is convenient if your code is already on GitHub, since permissions can follow the repository. Both are free for public images.
What About Corporate Environments?
GHCR and Docker Hub are great for open-source projects, but if your model is proprietary, you need a private registry. Major clouds offer integrated registries:
| Cloud | Registry | Login |
|---|---|---|
| AWS | Amazon ECR | aws ecr get-login-password \| docker login |
| GCP | Artifact Registry | gcloud auth configure-docker |
Private by default, with access control via IAM — the same permissions your team already uses in the cloud.
In practice, nobody does git clone + docker build on the production server. The real flow is: dev pushes code → CI/CD builds the image and pushes to the registry → production server does docker pull and runs. The registry is the middle ground between code and deployment.
Versioning with Tags
Tags are like versions of your image:
docker tag ml-api:v1 ghcr.io/YOUR_USERNAME/ml-api:v1
docker tag ml-api:v1 ghcr.io/YOUR_USERNAME/ml-api:latest
docker push ghcr.io/YOUR_USERNAME/ml-api:v1
docker push ghcr.io/YOUR_USERNAME/ml-api:latest
This is especially useful for ML models: retrained the model? Build a new image with v2, push it. The old model stays available at v1 if you need to roll back.
Problem 10: I Need to Automate Deployment
Doing docker build and docker push manually works, but it’s error-prone. Forgot to build? Published the wrong image? In production, you want this to be automatic.
Production Dockerfile
Before automating, let’s put together the final Dockerfile — applying everything we’ve learned:
# syntax=docker/dockerfile:1
FROM python:3.12-slim
WORKDIR /app
RUN groupadd -r appuser && useradd --no-log-init -r -g appuser appuser
COPY requirements.txt .
# No --no-cache-dir here: the cache mount only helps if pip is allowed to use its cache
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
COPY --chown=appuser:appuser . .
# Training the model inside the container ensures version compatibility
RUN python train_model.py
ENV PYTHONUNBUFFERED=1
EXPOSE 8000
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
USER appuser
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
Highlights:
- RUN python train_model.py: training inside the container eliminates version incompatibilities (e.g., trained with scikit-learn 1.8 on Mac, container has 1.5 → error)
- --mount=type=cache: pip cache persists across builds (BuildKit)
- PYTHONUNBUFFERED=1: logs appear immediately, no buffering
- --workers 4: multiple workers for production. Each worker process loads its own copy of the model, so size the memory limit accordingly. Installing uvicorn[standard] is optional; it adds a faster event loop and HTTP parser
- Non-root user, health check, slim image
What about models that take hours to train? The RUN python train_model.py step works for small models like ours. For large models, the MLOps pattern is to use a model registry (MLflow, Weights & Biases, or simply S3/GCS). The build then downloads the trained model, for example with RUN aws s3 cp s3://my-bucket/models/model-v2.pkl models/model.pkl (this assumes the AWS CLI and credentials are available to the build). Training happens in another pipeline (with GPU), and the Dockerfile only packages the result.
CI/CD with GitHub Actions
Automate build and push on every commit to main:
name: Build and Push
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write  # required for GITHUB_TOKEN to push to GHCR
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: |
            ghcr.io/${{ github.repository }}:${{ github.sha }}
            ghcr.io/${{ github.repository }}:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
Every push to main: automatic build, smart caching, image versioned by commit SHA.
Production compose.yaml
In production, instead of build: ., use the image from the registry:
name: ml-api-prod
services:
  api:
    image: ghcr.io/your-username/ml-api:latest
    ports:
      - "8000:8000"
    environment:
      DATABASE_URL: postgresql://app@db:5432/predictions
      REDIS_URL: redis://redis:6379
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: "2.0"
    networks:
      - frontend
      - backend
  db:
    image: postgres:16-alpine
    volumes:
      - db_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: predictions
      POSTGRES_USER: app
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d predictions"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"
    networks:
      - backend
  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 128mb --maxmemory-policy allkeys-lru
    volumes:
      - redis_data:/data
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 256M
          cpus: "0.5"
    networks:
      - backend
volumes:
  db_data:
  redis_data:
networks:
  frontend:
  backend:
secrets:
  db_password:
    file: ./secrets/db_password.txt
Problem 11: Something Went Wrong in Production
The API returns 500. The container is running, but predictions fail. How do you investigate?
Solution: Debugging and Troubleshooting
First stop — logs:
docker compose logs api # API logs
docker compose logs -f api # Follow in real time
docker compose logs --tail 50 api # Last 50 lines
Need a shell inside the container:
docker compose exec api bash
# or, if the image doesn't have bash:
docker compose exec api sh
Container has no shell? (minimal images):
docker debug ml-api
# Opens a shell with debug tools, without modifying the container
Container crashes immediately:
# Run interactively to see the error
docker run -it --rm ml-api:v2 bash
# Inside the container, try running manually:
python -c "from app.main import app; print('OK')"
Check resource usage:
docker stats # CPU, memory, network in real time
docker system df # Total disk usage
Common Problems and Solutions
Port already in use:
lsof -i :8000 # Who's using it?
docker ps # Another container on the same port?
Out of disk space:
docker system df # Diagnosis
docker system prune -a # Clean everything unused
docker volume prune # Clean orphaned volumes
docker builder prune # Clean build cache
Permission denied on file:
# Check ownership inside the container
docker exec ml-api ls -la /app/models/
# Fix: ensure COPY uses --chown
# COPY --chown=appuser:appuser . .
Build ignores changes (stale cache):
docker build --no-cache . # Force full rebuild
docker compose build --no-cache # Same effect with Compose
Problem 12: I Need This Running in the Cloud
The image is in the registry, CI/CD works, but the API still runs on your machine. To serve real users, it needs to be in the cloud.
Let’s look at the two simplest options — no Kubernetes, no unnecessary complexity.
Option 1: Google Cloud Run
Cloud Run is the simplest way to deploy a container. Serverless: scales automatically (including to zero — you don’t pay when nobody is using it).
# 1. Authenticate and configure
gcloud auth login
gcloud config set project YOUR_PROJECT
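# 2. Build with Cloud Build and push the image, or use the GHCR image (gcr.io is the legacy Container Registry host; Artifact Registry paths look like REGION-docker.pkg.dev/PROJECT/REPO/IMAGE)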
gcloud builds submit --tag gcr.io/YOUR_PROJECT/ml-api:v1
# 3. Deploy
gcloud run deploy ml-api \
--image gcr.io/YOUR_PROJECT/ml-api:v1 \
--port 8000 \
--region us-central1 \
--allow-unauthenticated
Done. In ~2 minutes you get a public URL. Test:
curl https://ml-api-xxxxx-uc.a.run.app/health
# {"status": "healthy"}
For environment variables and secrets:
gcloud run deploy ml-api \
--image gcr.io/YOUR_PROJECT/ml-api:v1 \
--port 8000 \
--region us-central1 \
--set-env-vars="PYTHONUNBUFFERED=1" \
--set-secrets="DATABASE_URL=db-url:latest"
Option 2: AWS App Runner
App Runner is the AWS equivalent — as simple as Cloud Run.
# 1. Push the image to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789.dkr.ecr.us-east-1.amazonaws.com
docker tag ml-api:v1 123456789.dkr.ecr.us-east-1.amazonaws.com/ml-api:v1
docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/ml-api:v1
# 2. Create the service via CLI
aws apprunner create-service \
  --service-name ml-api \
  --source-configuration '{
    "ImageRepository": {
      "ImageIdentifier": "123456789.dkr.ecr.us-east-1.amazonaws.com/ml-api:v1",
      "ImageRepositoryType": "ECR",
      "ImageConfiguration": {"Port": "8000"}
    },
    "AutoDeploymentsEnabled": true
  }'
App Runner also scales automatically and supports auto-deploy when the image is updated in ECR.
How much does it cost? Both charge by usage (CPU + memory). For ML APIs with irregular traffic, Cloud Run’s “scale to zero” is hard to beat: you pay nothing for compute when there’s no traffic. App Runner keeps at least one provisioned instance and bills its memory while idle; it does not scale to zero on its own, although a service can be paused manually.
What about Kubernetes? EKS (AWS) and GKE (GCP) are options when you need complex orchestration — multiple services, sophisticated auto-scaling, GPU scheduling. But for an ML API serving predictions, Cloud Run or App Runner solve it with a fraction of the complexity and operational cost.
Bonus: Daily Workflow with Docker
Do I Need to Develop Inside Docker?
Not necessarily. In practice, most teams use a mix:
- Code runs locally: faster, native hot-reload, IDE debugger works directly. For fast iteration, nothing beats running in your terminal.
- Dependencies run in Docker: Postgres, Redis, queues. The compose.yaml starts these services while your API runs locally pointing to localhost:5432 (with the database port published to the host).
- Docker for final validation: before committing, run docker compose up to test everything together in the same environment as production.
Developing 100% inside Docker works, but it’s slower and the developer experience is worse. Developing 100% locally is fast, but it’s the path to “it works on my machine.” The middle ground is the industry standard.
Day to Day
After setting everything up, day-to-day is simple:
# Morning: start the environment
docker compose up -d
# Develop normally (bind mount enables hot-reload)
# Change code → API reloads automatically
# Check logs when something looks wrong
docker compose logs -f api
# Run tests inside the container (same environment as production)
docker compose exec api pytest
# End of day: shut down (data persists in the volume)
docker compose down
# Update dependencies
docker compose build
docker compose up -d
Commands You’ll Memorize
# Compose (90% of your usage)
docker compose up -d # Start
docker compose down # Stop
docker compose logs -f # Logs
docker compose exec api bash # Shell in service
docker compose build # Rebuild
docker compose ps # Status
# Images
docker build -t name:tag . # Build
docker image ls # List
docker image prune # Clean unused
# Standalone containers
docker run -d --name x -p 80:80 img # Run
docker stop x && docker rm x # Stop and remove
docker run -it --rm img bash # Disposable shell
# Maintenance
docker system df # Disk usage
docker system prune -a # General cleanup
Docker Under the Hood
You’re already using Docker productively. Now it’s worth understanding what happens underneath — this helps diagnose problems and make better decisions.
Architecture
Docker uses a client-server architecture:
┌──────────────────────────────────────────┐
│ Docker Client (docker CLI) │
│ Sends commands via REST API │
└─────────────────┬────────────────────────┘
│
┌─────────────────▼────────────────────────┐
│ Docker Daemon (dockerd) │
│ Manages images, containers, │
│ networks and volumes │
└─────────────────┬────────────────────────┘
│
┌─────────────────▼────────────────────────┐
│ containerd → runc │
│ Runtime that creates/runs containers │
│ using Linux namespaces and cgroups │
└──────────────────────────────────────────┘
- Docker Client: the CLI you use (docker build, docker run)
- Docker Daemon: the server that does the heavy lifting
- containerd: manages container lifecycle (donated to CNCF)
- runc: creates and runs containers at the lowest level (donated to OCI)
Underlying Linux Technologies
Docker isn’t magic — it’s engineering on top of Linux kernel features:
Namespaces isolate resources: each container has its own view of processes (pid), network (net), filesystem (mnt), hostname (uts), and users (user).
Control Groups (cgroups) limit resources: CPU, memory, disk I/O. This is what makes deploy.resources.limits work in Compose.
Union Filesystems enable the layered image architecture — copy-on-write for efficiency.
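The layering idea can be imitated with Python's ChainMap, which looks keys up through a stack of dictionaries and sends all writes to the top one. This is an analogy only, with made-up file names and contents; a real union filesystem such as overlayfs works on files and also handles deletions, which this sketch does not model.

```python
from collections import ChainMap

# Three read-only image layers, oldest last in the lookup chain
base = {"/usr/bin/python": "python 3.12", "/app/requirements.txt": "fastapi"}
deps = {"/usr/lib/site-packages/fastapi": "fastapi package"}
code = {"/app/main.py": "v1"}

# Lookups go from the top layer down to the base
image = ChainMap(code, deps, base)


def start_container(image_layers):
    """Add an empty writable layer on top of the shared image layers."""
    return image_layers.new_child()


first = start_container(image)
second = start_container(image)

# Writing in one container lands in its own top layer (copy-on-write)
first["/app/main.py"] = "patched"

print(first["/app/main.py"])   # patched
print(second["/app/main.py"])  # v1
print(image["/app/main.py"])   # v1
```

Both containers share the same underlying layers, which is why starting a second container from the same image costs almost no extra disk space.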
Docker vs VMs
A common question:
| Aspect | Containers | VMs |
|---|---|---|
| Virtualizes | Operating system | Hardware |
| Size | Tens of MB | Tens of GB |
| Boot | Seconds | Minutes |
| Isolation | Process (namespaces) | Hardware (hypervisor) |
| Performance | Near-native | Hypervisor overhead |
In practice, most companies use both: containers running inside VMs in the cloud.
Docker Desktop vs Alternatives
Docker Desktop is the official GUI, but since 2021 its license terms require a paid subscription for larger companies (more than 250 employees or more than $10M in annual revenue). Free alternatives:
| Tool | Highlights | Installation |
|---|---|---|
| Podman | Daemonless, rootless by default, compatible CLI | brew install podman |
| Colima | Simpler for Mac, CLI only | brew install colima && colima start |
| Rancher Desktop | GUI, Kubernetes built-in | Download from site |
Podman stands out for those who want security: no daemon running as root, and alias docker=podman works for almost everything.
New Docker Features
Three recent tools worth knowing:
Docker Init — generates Dockerfile, compose.yaml, and .dockerignore with best practices for your language:
docker init
# Detects Python, asks version, port, command — generates everything
Docker Scout — scans vulnerabilities in your images:
docker scout cves ml-api:v2
docker scout recommendations ml-api:v2
Docker Debug — shell in any container, even those without a shell:
docker debug ml-api
# Full toolbox: vim, curl, htop — without modifying the container
Recapping the Journey
We started with a FastAPI API that only worked locally. Throughout the guide, each problem led to a Docker concept:
| Problem | Docker Concept | Solution |
|---|---|---|
| “It works on my machine” | Dockerfile | Package everything in an image |
| Docker copying junk and secrets | .dockerignore | Filter what goes into the image |
| Slow rebuild on every change | Layer caching | Order instructions strategically |
| I need PostgreSQL | Docker Compose | Define multiple services in YAML |
| Data disappears on restart | Volumes | Persist data outside the container |
| API can’t find the database | Networking | Automatic DNS between services |
| Slow predictions for repeated data | Additional services | Redis as cache in the stack |
| Running as root | Security | USER, secrets, resource limits |
| I need to share the image | Container registry | docker push/pull via GHCR |
| I need to automate deployment | CI/CD | GitHub Actions + production compose |
| Error in production | Debugging | Logs, exec, docker debug |
| I need to run in the cloud | Cloud deploy | Cloud Run, App Runner |
Each concept solved a concrete problem. There’s no reason to memorize Docker commands without context — now you know why each one exists.
The ecosystem keeps evolving: Docker Scout, Docker Debug, Docker Init, and Hardened Images show that the platform is more mature than ever. But the fundamentals — images, containers, volumes, networks, security — have been the same since 2013. Master those, and you’ll be ready for whatever comes next.