Kamal has been positioned as the answer to a question Rails developers have asked for years: how do you deploy to your own servers without the overhead of Kubernetes or the cost of a managed platform? And in many ways, it delivers. But the gap between what Kamal promises and what it actually requires in a production environment is wider than most tutorials suggest.
I have been using Kamal to deploy Rails applications across multiple projects. Some of those deployments were clean and straightforward. Others turned into multi-day debugging sessions that made me question whether I should have just stayed on the platform I was migrating from. This is an honest account of both sides.
The Promise: Heroku on Your Own Servers
The pitch is compelling. Kamal, built by the team at 37signals, uses Docker containers and SSH to deploy your Rails application to any VPS. It handles zero-downtime deployments, rolling restarts, and SSL certificates through its built-in kamal-proxy. You write a single deploy.yml configuration file, run kamal setup, and your application is live. No Kubernetes manifests. No Terraform files. No $500/month Heroku bills for what amounts to a few Puma processes and a database.
For a fresh application deployed to a clean VPS, this promise is largely kept. And that is worth acknowledging before diving into the complications.
Where Kamal Genuinely Shines
If you are deploying a new Rails application to a fresh server with no existing infrastructure to work around, Kamal is genuinely excellent. The initial setup is minimal. You configure your server IP, your Docker registry credentials, and your application secrets. You run kamal setup and within a few minutes, your application is running in production with SSL certificates provisioned automatically through Let’s Encrypt.
The zero-downtime deployment model works well. Kamal boots a new container, waits for it to pass health checks, and then switches traffic from the old container to the new one. If the new container fails its health check, the old one keeps running. This is exactly what you want from a deployment tool, and Kamal handles it with very little configuration.
Once a deployment pipeline is set up and stable, the day-to-day experience is smooth. Push your code, run kamal deploy (or let your CI do it), and your changes are live in minutes. The integration with CI tools like GitHub Actions is straightforward. A basic workflow that builds, pushes, and deploys on every merge to main takes about twenty lines of YAML to configure. For teams that want to own their infrastructure without hiring a dedicated DevOps engineer, this is a meaningful improvement over what existed before.
Kamal also handles multiple applications on a single server surprisingly well. Each application gets its own set of containers, and kamal-proxy routes traffic based on hostnames. For side projects or small applications that do not justify their own dedicated server, this multi-tenant approach is practical and cost-effective.
The First Wall: Postgres on the Same Host
Most Rails tutorials for Kamal show you how to set up Postgres as a Kamal “accessory,” a Docker container running on the same host as your application. This works, but it introduces a problem that is rarely discussed in those tutorials: security.
When you define a Postgres accessory in your deploy.yml and expose its port, Docker adds its own iptables rules that bypass your server’s firewall. If you are using UFW (which most Ubuntu servers do), you might assume that running ufw deny 5432 will block external access to your database. It will not. Docker operates on the NAT table of iptables, while UFW operates on the filter table. Traffic destined for your Docker container’s published port never passes through UFW’s rules at all.
This is not a Kamal-specific problem. It is a Docker problem that Kamal inherits. But because Kamal’s documentation encourages you to run Postgres as an accessory on the same host, you are very likely to encounter it. The solution is to bind the Postgres port to localhost only by configuring the port as 127.0.0.1:5432:5432 instead of 5432:5432 in your accessory configuration. But if you do not know to do this, your database is sitting on the public internet with whatever password you chose, and UFW will happily report that port 5432 is blocked while it absolutely is not.
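In deploy.yml, the fix is a small change to the accessory's port mapping. A minimal sketch, with illustrative names (the host IP, volume path, and accessory name are assumptions, not from any particular setup):

```yaml
# Hypothetical Postgres accessory in deploy.yml. The leading 127.0.0.1 in
# the port mapping keeps the published port off the public interface, so
# only processes on the host itself can reach the database.
accessories:
  db:
    image: postgres:16
    host: 192.168.1.10
    port: "127.0.0.1:5432:5432"   # not "5432:5432"
    env:
      secret:
        - POSTGRES_PASSWORD
    directories:
      - data:/var/lib/postgresql/data
```

With this binding in place, your application reaches Postgres over Docker's internal network while the public interface never exposes the port at all.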
If you want additional protection beyond localhost binding, you need to work with the DOCKER-USER iptables chain directly or use an external firewall provided by your cloud provider. Neither of these approaches is covered in Kamal’s documentation, and both require knowledge that goes well beyond what a typical Rails developer is expected to have.
The Health Check Trap: Solid Cache and Rack Attack
Rails 8 ships with the Solid trifecta: Solid Cache, Solid Queue, and Solid Cable. It also encourages the use of Rack Attack for rate limiting and request throttling. Both of these are sensible defaults for production applications. Both of them can break your Kamal deployment in ways that are not immediately obvious.
When Kamal deploys a new container, it hits the /up health check endpoint to verify that the application has booted successfully. If the health check does not return a 200 response within the configured timeout (30 seconds by default), Kamal considers the container unhealthy and rolls back the deployment. The error message you see is: target failed to become healthy within configured timeout.
Here is where it gets interesting. If you are using Rack Attack with strict rate limiting or IP-based throttling, the health check requests coming from kamal-proxy might get blocked. The health check runs frequently during deployment (every few seconds), and depending on your Rack Attack configuration, those rapid successive requests from the same internal IP can trigger your throttling rules. Your application is perfectly healthy, but Rack Attack is doing exactly what you told it to do: blocking what looks like suspicious traffic.
The fix is to add an exception for the health check path in your Rack Attack configuration. Something like excluding requests to /up from your throttling rules. But you only discover this after a failed deployment, and the error message gives you no indication that Rack Attack is the culprit. The logs show a timeout. The container appears to start. The application boots. But the health check never succeeds.
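Rack Attack evaluates safelists before throttles, so a safelist entry for the health-check path is enough. A sketch of the initializer, with the matching logic pulled out into a plain lambda (the constant and lambda names are mine, not from the post; "/up" is the Rails default health-check path):

```ruby
# Sketch for config/initializers/rack_attack.rb.
HEALTH_CHECK_PATH = "/up"
health_check = ->(path) { path == HEALTH_CHECK_PATH }

# In the real initializer, register the safelist before any throttle rules.
# Safelisted requests bypass all throttling:
#
#   Rack::Attack.safelist("allow health checks") do |req|
#     health_check.call(req.path)
#   end
```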
Solid Cache can introduce a similar issue if your cache backend is not yet available during the initial deployment. If your application’s boot sequence tries to connect to a cache store that does not exist yet (because the database has not been migrated, or because the Solid Cache tables have not been created), the /up endpoint can fail or hang. Again, the error message is unhelpful, and the root cause is several layers removed from what Kamal reports.
The general pattern here is that Kamal’s health check system is simple and reliable in isolation, but any middleware, initializer, or framework feature that interferes with the /up endpoint during the critical boot window can cause deployments to fail. You need to audit your entire request pipeline for anything that might affect that single endpoint, and you need to create exceptions for it. This is manageable once you know about it, but discovering it through failed deployments is not a pleasant experience.
The Nginx Question: Do Not Try to Combine Them
This is the most opinionated advice in this post, and I stand by it: do not try to run nginx alongside kamal-proxy, especially if SSL is involved.
The temptation is understandable. Many Rails developers have years of experience with nginx as a reverse proxy. It handles static file serving, request buffering, gzip compression, and SSL termination. You might want nginx in front of your application for features that kamal-proxy does not provide, or because your existing infrastructure already uses nginx for other services on the same server.
The problem is that kamal-proxy and nginx both want to own the same ports (80 and 443) and both want to handle SSL termination. You can configure nginx to listen on the standard ports and forward traffic to kamal-proxy on internal ports, but now you have two reverse proxies in series, each with its own SSL configuration, its own header forwarding behavior, and its own opinions about how health checks should work. The X-Forwarded-For and X-Forwarded-Proto headers get mangled. SSL redirects loop. Health checks that work through kamal-proxy fail when they pass through nginx first. Mismatched HTTP Origin headers cause Rails' forgery protection to reject legitimate POST requests.
I have seen developers spend days trying to make this combination work, including myself. The number of moving parts is simply too high. You are debugging interactions between nginx’s proxy_pass configuration, kamal-proxy’s TLS settings, Rails’ force_ssl behavior, host authorization rules, and Docker’s internal networking, all at the same time. Each component works correctly in isolation. Together, they create a debugging nightmare where fixing one issue introduces another.
If you need nginx features, my recommendation is to choose one approach and commit to it. Either use kamal-proxy for everything and accept its limitations, or disable kamal-proxy entirely and manage nginx yourself outside of Kamal’s lifecycle. Trying to layer them is not worth the complexity.
More Ways It Goes Wrong
Beyond the major issues, there are several smaller problems that collectively add up to significant debugging time during your first few deployments.
The force_ssl redirect loop. Rails production environments enable force_ssl by default. Kamal’s health check hits /up over HTTP internally. The health check gets redirected to HTTPS, which kamal-proxy interprets as a failure. Your application is running, SSL is working for actual users, but deployments fail because the internal health check cannot complete. The fix is to exclude /up from SSL redirects in your production configuration, but this is a non-obvious interaction between two reasonable defaults.
The host authorization block. Rails 7 and later include host authorization that rejects requests from unexpected hostnames. Kamal’s health check may use an IP address or internal Docker hostname that does not match your configured allowed hosts. The fix is another exclusion rule for the /up path, but the error manifests as a generic health check failure with no clear indication of what went wrong.
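Both exclusions live in config/environments/production.rb, and both Rails options (the redirect exclude inside ssl_options, and the exclude key of host_authorization) accept a proc that receives the request. A sketch, using a tiny request stub in place of ActionDispatch::Request so the predicate can be exercised outside Rails (the names are mine):

```ruby
# Sketch of health-check exclusions for config/environments/production.rb.
UP_PATH = "/up"
health_check_exclusion = ->(request) { request.path == UP_PATH }

# In production.rb, the same proc plugs into both Rails options:
#
#   config.ssl_options = { redirect: { exclude: health_check_exclusion } }
#   config.host_authorization = { exclude: health_check_exclusion }

# Minimal stand-in for ActionDispatch::Request, which responds to #path.
RequestStub = Struct.new(:path)
```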
Architecture mismatches. If you develop on an Apple Silicon Mac and deploy to an AMD64 server (which is most VPS providers), you need to configure cross-platform builds. Kamal supports remote builders and multi-architecture builds, but misconfiguring this results in containers that build successfully but crash immediately on the server. The error output is cryptic, and the root cause is not always obvious.
The missing CMD instruction. If your Dockerfile does not include an explicit CMD instruction, the base Ruby image defaults to running IRB. Your container starts, passes the Docker health check (because the container is “running”), but kamal-proxy cannot connect to your application because it is sitting in an interactive Ruby console instead of running Puma. The default Rails Dockerfile includes the correct CMD, but if you have customized your Dockerfile, this is an easy thing to accidentally remove.
Credentials during the build phase. If you use environment-specific credentials (staging vs. production), the Docker build phase uses the production environment regardless of what you specify in your deploy configuration. This means staging-specific credentials are not available during asset precompilation, which can cause the build to fail with errors that look completely unrelated to credential management.
Memory pressure on small VPS instances. Building Docker images is memory-intensive. If your VPS has limited RAM (2GB or less), the build process can push the system into swap, making everything slow enough to trigger deployment timeouts. The container is not unhealthy. The server is just too slow to boot the application within the default timeout window. You can increase the timeout, use a remote builder, or add swap space, but diagnosing this as the root cause requires monitoring server resources during deployment.
Accessory boot ordering. Kamal does not guarantee that accessories (like Postgres or Redis) are fully ready before your application container starts. If your application’s boot process tries to connect to a database that is still initializing, it will fail. You may need to add retry logic to your entrypoint script or configure health checks that account for accessory startup time.
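One way to handle this is a small retry helper in the entrypoint script. A sketch under stated assumptions: the helper name and limits are mine, and pg_isready (from the postgresql-client package) is one possible readiness probe, not the only one:

```shell
#!/bin/bash
# Generic retry helper for bin/docker-entrypoint: run a command until it
# succeeds, giving up after a fixed number of attempts.
retry() {
  local max_attempts=$1; shift
  local attempt=0
  until "$@"; do
    attempt=$((attempt + 1))
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "command failed after ${max_attempts} attempts: $*" >&2
      return 1
    fi
    sleep 1
  done
}

# In the entrypoint, before db:prepare, wait for the accessory to accept
# connections, e.g.:
#
#   retry 30 pg_isready -h "${DB_HOST:-localhost}" -q
```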
Once It Works, It Really Works
Here is the thing that makes all of the above frustrations tolerable: once you have worked through the initial setup issues and your deployment pipeline is stable, Kamal is genuinely pleasant to use.
The daily workflow becomes simple. You push code, your CI pipeline runs tests, and if they pass, it deploys. Zero-downtime. Automatic SSL renewal. Rolling restarts. No platform vendor to negotiate with, no surprise bills, no dependency on someone else’s infrastructure decisions. Your $10/month VPS runs your application just as reliably as the $200/month platform you migrated from.
GitHub Actions integration is particularly smooth. A basic deployment workflow looks roughly like this: check out your code, set up Docker buildx, log into your registry, run kamal deploy. The Kamal team provides a GitHub Action that handles most of the setup. You add your secrets (registry password, Rails master key, server SSH key) to your repository’s GitHub Secrets, reference them in your workflow, and you have a fully automated deployment pipeline in under thirty minutes.
For teams that have been deploying with Capistrano, the shift to containerized deployments through Kamal removes an entire class of “works on my machine” problems. Your production environment is defined by your Dockerfile, not by whatever packages happen to be installed on the server. New team members do not need to understand the server’s history to deploy safely.
Who Should Use Kamal
Kamal is a strong choice if you are deploying a new Rails application to a fresh server and you are comfortable with Docker basics. It is also a good fit if you are migrating away from expensive managed platforms and want to retain a similar developer experience at a fraction of the cost. If your infrastructure is simple (one or two servers, a single application per server, managed database), the setup process is close to what the documentation promises.
It is a harder sell if you have existing infrastructure that includes nginx, custom firewall rules, or services that were not designed to coexist with Docker’s networking model. The more your server already does, the more edge cases you will encounter during the initial setup. This is not because Kamal is poorly designed. It is because deploying containerized applications to shared infrastructure is inherently more complex than deploying to a clean slate.
If you are considering Kamal, my advice is to start with a clean server and a simple application. Get the deployment pipeline working end to end before adding complexity. Add your database accessory and verify the firewall situation. Configure your health check exclusions before enabling Rack Attack or force_ssl. Layer in complexity one piece at a time, and test each addition with a full deployment cycle. The problems I described in this post are all solvable. They are just much easier to solve when you encounter them one at a time instead of all at once.
The Bigger Picture
Kamal represents a meaningful step forward for Rails deployment. It brings the convenience of platform-as-a-service to self-hosted infrastructure without requiring deep DevOps expertise. The rough edges I have described are real, but they are the kind of problems that get solved through better documentation, community knowledge sharing, and incremental improvements to the tool itself.
The Rails community has always valued convention over configuration. Kamal extends that philosophy to deployment, and for the most common case (a new application on a clean server), it succeeds. The challenge is that production infrastructure is rarely the most common case. Real servers have history, existing services, security requirements, and operational constraints that no deployment tool can fully anticipate.
What Kamal does well is give you a foundation. What it requires is the understanding to adapt that foundation to your specific situation. And that understanding, as with most things in software engineering, comes from deploying, breaking things, debugging, and learning the hard way why certain defaults exist.
The Entrypoint File: Where Your Custom Database Setup Lives or Dies
Every Rails application deployed with Kamal runs through a file called bin/docker-entrypoint. This is the script that executes before your application server starts inside the container. The default version that ships with Rails is deceptively simple:
#!/bin/bash -e

# If running the rails server then create or migrate existing database
if [ "${*}" == "./bin/rails server" ]; then
  ./bin/rails db:prepare
fi

exec "${@}"
This checks whether the container is starting the Rails server (as opposed to a background worker or console session), and if so, runs db:prepare to handle schema loading or pending migrations. For a standard Rails application with a straightforward schema, this works perfectly.
It stops working the moment you introduce anything that db:prepare does not handle on its own.
A concrete example: suppose your application uses a custom Postgres function for full-text search or data transformation. You have defined it in a migration, and your schema references it. When Kamal deploys for the first time and db:prepare tries to load the schema, it fails because the function does not exist yet. The schema expects it. The migration that creates it has not run. You are in a circular dependency, and the deployment fails with an error about an unknown function that tells you nothing about the entrypoint being the actual problem.
The fix requires modifying bin/docker-entrypoint to create the function before db:prepare runs:
#!/bin/bash -e

# If running the rails server then create or migrate existing database
if [ "${@: -2:1}" == "./bin/rails" ] && [ "${@: -1:1}" == "server" ]; then
  ./bin/rails db:create 2>/dev/null || true
  ./bin/rails db:create_functions
  ./bin/rails db:prepare
fi

exec "${@}"
The sequence matters. First, db:create ensures the database exists (silencing the error if it already does). Then a custom rake task creates the Postgres functions that the schema depends on. Only then does db:prepare run, finding everything it needs in place.
Notice also the change in the condition check. The default entrypoint uses ${*} for string comparison, but if you are running through Thruster (which Rails 8 defaults to with ./bin/thrust ./bin/rails server), the full command string changes and the condition silently fails. Your migrations never run. The application boots with an empty or outdated database, and you get cryptic errors in production that work fine in development. Using positional parameter checks (${@: -2:1} and ${@: -1:1}) catches the Rails server command regardless of what wraps it.
This pattern extends beyond custom functions. Any application that needs seed data, extension creation (CREATE EXTENSION for pgcrypto, uuid-ossp, or pg_trgm), or multi-database setup for the Solid trifecta needs a customized entrypoint. The default is a starting point, not a complete solution. Treat it as code that you own and modify as your application’s requirements evolve.
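For completeness, here is a hypothetical db:create_functions task of the kind such an entrypoint invokes. The function body and names are illustrative assumptions, not from the original post, and in a real lib/tasks/*.rake file loaded by Rails the require/extend lines are unnecessary; they appear here only to make the sketch self-contained:

```ruby
# Hypothetical lib/tasks/db_functions.rake sketch.
require "rake"
extend Rake::DSL

# SQL the schema depends on; CREATE OR REPLACE makes the task idempotent.
CREATE_FUNCTIONS_SQL = <<~SQL
  CREATE OR REPLACE FUNCTION immutable_lower(text)
  RETURNS text AS $$ SELECT lower($1) $$
  LANGUAGE SQL IMMUTABLE;
SQL

namespace :db do
  desc "Create Postgres functions the schema depends on"
  task create_functions: :environment do
    ActiveRecord::Base.connection.execute(CREATE_FUNCTIONS_SQL)
  end
end
```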
Aliases: The Small Configuration That Saves Hours
One of Kamal 2’s best quality-of-life features is aliases, and I am surprised how many deployment guides skip over them entirely. Without aliases, connecting to a Rails console on your production server requires typing: kamal app exec --interactive --reuse "bin/rails console". That is a lot of keystrokes for something you will do multiple times a day when debugging production issues.
Add the following to your deploy.yml and save yourself the trouble:
aliases:
  console: app exec --interactive --reuse "bin/rails console"
  shell: app exec --interactive --reuse "bash"
  logs: app logs -f
  dbc: app exec --interactive --reuse "bin/rails dbconsole"
Now kamal console drops you into a Rails console. kamal shell gives you a bash session inside the running container. kamal logs streams your application logs. kamal dbc opens a direct database console. These are the commands you will reach for at 2 AM when something is broken in production, and the difference between a quick alias and a long command string matters more than you think when you are under pressure.
If you are running staging and production as separate destination files (deploy.staging.yml and deploy.production.yml), aliases defined in the base deploy.yml are shared across both: running kamal console -d staging resolves the alias from the merged configuration. Only aliases that should behave differently per environment need to live in the destination files.
Staging and Production: Separate Deploy Files Done Right
Speaking of environments, most real projects need at least staging and production deployments, and Kamal handles this through destination-specific deploy files. The naming convention is deploy.staging.yml and deploy.production.yml, invoked with the -d flag: kamal deploy -d staging.
What catches people off guard is that these files are not full overrides of the base deploy.yml. They merge with it. Your base file contains shared configuration (registry credentials, builder settings, aliases), and the destination files contain environment-specific values (server IPs, domain names, environment variables). If you duplicate everything in each file, you end up maintaining three copies of the same configuration and introducing drift between them.
There is a subtlety with credentials that trips people up during staging deployments. Rails credentials are environment-specific, but the Docker build phase always uses the production environment. If your staging configuration references credentials that only exist in your staging credentials file, the asset precompilation step during the build will fail. You need to either structure your credentials to be available in the production context during build, or use SECRET_KEY_BASE_DUMMY=1 to bypass credential loading during asset compilation. The error messages when this goes wrong reference missing keys or nil values deep in initializer chains, giving no hint that the build environment is the actual problem.
Another common mistake is using the same service name across environments. If you deploy both staging and production to the same server (which is common for small projects), the service name must be unique per environment. Otherwise, Kamal treats them as the same application and your staging deployment will replace your production containers. Name them something like myapp-staging and myapp-production and save yourself from an unpleasant surprise.
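A sketch of how the split might look, with shared plumbing in the base file and environment identity in the destination file (every value below is illustrative):

```yaml
# deploy.yml — shared base
image: myorg/myapp
registry:
  username: myorg
  password:
    - KAMAL_REGISTRY_PASSWORD
builder:
  arch: amd64

# deploy.production.yml — merged on top of the base
service: myapp-production
servers:
  web:
    - 203.0.113.10
proxy:
  host: example.com
```

The staging file mirrors the production one with its own service name, hosts, and hostname; everything else comes from the base.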
Secrets Management: What Actually Happens to Your Credentials
Kamal 2 introduced a dedicated .kamal/secrets file that replaced the older .env approach. This file is committed to version control, but it does not contain your actual secrets. Instead, it contains references to where the secrets come from: environment variables, files, or external secret managers.
A typical secrets file looks like this:
KAMAL_REGISTRY_PASSWORD=$KAMAL_REGISTRY_PASSWORD
RAILS_MASTER_KEY=$(cat config/master.key)
POSTGRES_PASSWORD=$POSTGRES_PASSWORD
The important thing to understand is what happens to these values. They get written to an environment file on your server at .kamal/apps/your-app/env/roles/web.env and injected into your Docker containers as plain text environment variables. Anyone with SSH access to your server can read them. Anyone who can exec into your running container can see them with env or printenv.
This is not a flaw in Kamal specifically. It is how Docker environment variables work. But it is worth understanding that Kamal’s secrets management provides organizational convenience, not encryption at rest. If your compliance requirements demand encrypted secrets management, you need an external tool like HashiCorp Vault, AWS Secrets Manager, or 1Password’s integration with Kamal’s secrets adapter.
A practical tip: editing .kamal/secrets locally does nothing on its own. Secrets are resolved on your machine and pushed to the server as part of a deployment, so after rotating a credential or changing a database password you need to run kamal deploy (or kamal redeploy) before the new value takes effect. (Kamal 1 had a separate kamal env push command for this; Kamal 2 folded it into the deploy.) Until you redeploy, running containers keep the old values, and the mismatch manifests as authentication failures that work locally but fail in production.
Background Workers: Sidekiq, Solid Queue, and the Role Configuration
Most production Rails applications need background job processing. Whether you are using Sidekiq with Redis or Solid Queue with the database, Kamal handles this through server roles. You define a separate role for your worker process that uses the same Docker image but runs a different command:
servers:
  web:
    hosts:
      - 192.168.1.10
  job:
    hosts:
      - 192.168.1.10
    cmd: bundle exec sidekiq -C config/sidekiq.yml
If you are using Rails 8’s Solid Queue with the SOLID_QUEUE_IN_PUMA option, you can skip the separate role entirely and run the queue processor inside your Puma server process. This simplifies the deployment at the cost of sharing resources between web requests and background jobs. For small applications and side projects, this trade-off is usually acceptable. For anything processing significant job volumes or long-running tasks, a separate worker role is the better choice.
If you use Sidekiq, you need Redis. Adding Redis as an accessory follows the same pattern as Postgres, with the same firewall caveats. Bind it to localhost, set a password, and make sure your REDIS_URL includes the password and uses the Docker network hostname (which follows the pattern your-service-name-redis). The URL format becomes redis://:yourpassword@myapp-redis:6379/0. Note the colon before the password with no username. Getting this URL format wrong results in connection failures that work in development (where Redis typically has no password) but fail in production.
One more detail about worker roles: Kamal runs migrations only when starting the web role, not the job role. This is because the entrypoint script’s condition checks for the Rails server command. Your worker container boots after the web container has already prepared the database. If you deploy to separate servers for web and job roles, be aware that the job server will pull and start the new container without running migrations. The web server handles that. If your web deployment fails but your job deployment succeeds, you could end up with workers running new code against an old database schema.
Log Management: What Kamal Does Not Do for You
Kamal gives you kamal app logs to tail your application output, and it configures Docker’s log driver with a default max size of 10MB. For debugging, this is sufficient. For production operations, it is not.
Docker’s default JSON file log driver keeps logs on the host filesystem. Without rotation configuration, these logs grow until they fill your disk. Kamal sets a max-size option, but you should verify this is appropriate for your application’s log volume. High-traffic applications can generate 10MB of logs quickly, and when the log file rotates, you lose the older entries. If you need log retention for compliance or debugging historical issues, you need to ship logs to an external service.
You can adjust the log configuration in your deploy.yml:
logging:
  options:
    max-size: 100m
Make sure your Rails application is configured with RAILS_LOG_TO_STDOUT=true in production. Without this, Rails writes logs to files inside the container, which are ephemeral and disappear when the container is replaced during the next deployment. Docker captures stdout and stderr, so logging to stdout is not just a preference in containerized environments. It is a requirement for your logs to be accessible at all.
Running One-Off Tasks: The Commands You Will Need
Beyond the aliases for console and shell access, there are several one-off operations you will need during the lifetime of your deployment.
Running a specific rake task: kamal app exec "bin/rails some:task". This starts a new container, runs the task, and exits. If you need an interactive session (anything that reads from stdin), add the --interactive flag.
Checking your full resolved configuration including secrets: kamal config. This is invaluable when debugging why an environment variable is not being set correctly. It shows you the merged result of your deploy file, destination file, and secrets.
Viewing details about all running containers: kamal details. This shows you which version is deployed, which containers are running, and the state of your accessories.
Rolling back a bad deployment: kamal rollback VERSION. You can find the available versions from your container registry or from the output of previous deployments. Kamal will switch traffic back to the specified version without rebuilding anything.
Cleaning up old Docker images that accumulate on your server: kamal server exec --all "docker system prune -af". Deployed containers and images accumulate over time. On a small VPS with limited disk space, this becomes a real problem after a few dozen deployments. Schedule this periodically or add it to your deployment workflow.
Aliases Revisited: Operational Commands Deserve Shortcuts Too
Earlier I showed console, shell, and log aliases. But if you are adding production rake tasks (and you should), those deserve aliases too. Here is the expanded aliases block I use in the base deploy.yml:
aliases:
  # Interactive sessions
  console: app exec --interactive --reuse "bin/rails console"
  shell: app exec --interactive --reuse "bash"
  dbc: app exec --interactive --reuse "bin/rails dbconsole"

  # Logs
  logs: app logs -f

  # Deployment verification
  verify: app exec --roles=web "bin/rails deploy:verify"

  # Database operations
  db-health: app exec --roles=web "bin/rails db:health"
  integrity: app exec --roles=web "bin/rails integrity:check"

  # Operational tasks
  clear-cache: app exec --roles=web "bin/rails ops:clear_cache"
  env-check: app exec --roles=web "bin/rails ops:env_check"
  pending: app exec --roles=web "bin/rails ops:pending_migrations"
  warm-cache: app exec --roles=web "bin/rails ops:warm_cache"
Notice that every operational alias bakes in --roles=web. This is the entire point. You define the safety constraint once in the alias, and every invocation inherits it. Nobody on your team needs to remember to add the flag. The alias does it for them.
Now your post-deployment check becomes:
kamal verify -d production
kamal db-health -d production
kamal integrity -d production
kamal clear-cache -d production
kamal env-check -d production
kamal pending -d production
Compare that to typing kamal app exec -d production --roles=web "bin/rails deploy:verify" six times. Under pressure at 2 AM, kamal verify -d production is what you want to type.
Alias Inheritance Across Environments
A question that comes up often: do I need to copy these aliases into deploy.staging.yml and deploy.production.yml? No. Aliases defined in the base deploy.yml are inherited by all destination files. The destination-specific files merge with the base configuration, they do not replace it. Your staging and production environments both get every alias you define in the base file.
The -d flag tells Kamal which destination to target, and the alias resolves from the merged configuration. Running kamal verify -d staging uses the same alias definition but executes against your staging servers. Running kamal verify -d production targets production. One definition, every environment.
The only reason to override an alias in a destination file is if you need genuinely different behavior per environment. For example, if your staging deployment runs on a server where you also want to alias a database reset command that you would never want available in production:
# deploy.staging.yml only
aliases:
  reset-db: app exec --roles=web "bin/rails db:drop db:create db:schema:load"
This alias exists only in staging. Production does not inherit it because it is defined in the destination file, not the base. Keep destructive shortcuts out of production configurations. The merge behavior works in your favor here.
Role-Scoped Commands: Why Every Exec Needs a Target
Here is a mistake that bites people once and never again: running a one-off command without specifying a role. When you execute kamal app exec "bin/rails db:schema:load db:seed" without role targeting, Kamal runs that command on every role concurrently. If you have a web role and a job role, both containers will attempt to load the schema and seed the database at the same time. Two simultaneous db:schema:load operations against the same database produce unpredictable results: table creation conflicts, unique constraint violations on seed data, and partial schema states that leave you unsure which run completed and which failed halfway through.
Always scope your commands to a single role:
kamal app exec -d production --roles=web "bin/rails db:schema:load"
The --roles=web flag ensures the command runs only on the web container. This applies to anything that modifies shared state: migrations, schema operations, data corrections, cache clearing, and any custom rake task that writes to the database. Read-only operations like generating a report or checking record counts are safer across roles, but the habit of always specifying --roles will prevent the mistake that matters.
If you are running multiple web servers behind a load balancer, the same principle applies. A data migration running concurrently on three web containers creates three competing writes. Scope it to one host or one role instance. The few extra characters in the command save you from debugging race conditions in production data.
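If you want a guard that holds even when someone forgets the flag, you can make the task itself refuse to run concurrently. Here is a sketch of the idea using an exclusive file lock; the helper name is made up for illustration, and in a real Rails task you would more likely take a Postgres advisory lock (pg_try_advisory_lock) so the guard spans every container sharing the database, not just one host:

```ruby
require "tmpdir"

# Run the block only if no other process on this host holds the lock.
# A file lock only protects a single machine; a DB advisory lock would
# protect all containers sharing the database.
def with_single_runner(name)
  File.open(File.join(Dir.tmpdir, "#{name}.lock"), File::CREAT) do |f|
    if f.flock(File::LOCK_EX | File::LOCK_NB)
      yield
      true
    else
      false # another run is already in progress; skip
    end
  end
end

ran = with_single_runner("schema-load") { puts "loading schema" }
```

The lock is released automatically when the File.open block closes the file, so a crashed run does not leave the task permanently blocked.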
Production Rake Tasks You Should Have From Day One
Seeding is a development concern. In production, you need operational tasks: things that help you verify, diagnose, and maintain your application without opening a Rails console and typing commands from memory under pressure. Here are the rake tasks I add to every production Rails application deployed with Kamal, and the reasoning behind each one.
Deployment Verification
After every deployment, you want a quick way to confirm the application is running correctly beyond what the health check endpoint tells you. A health check says “the process is alive.” A verification task says “the application is functional.”
# lib/tasks/deploy.rake
namespace :deploy do
  desc "Verify deployment health beyond basic health check"
  task verify: :environment do
    checks = []

    # Database connectivity and basic query
    begin
      ActiveRecord::Base.connection.execute("SELECT 1")
      checks << { name: "Database", status: "OK" }
    rescue => e
      checks << { name: "Database", status: "FAIL", error: e.message }
    end

    # Redis connectivity (if using Sidekiq or caching)
    if defined?(Redis)
      begin
        Redis.new(url: ENV["REDIS_URL"]).ping
        checks << { name: "Redis", status: "OK" }
      rescue => e
        checks << { name: "Redis", status: "FAIL", error: e.message }
      end
    end

    # Solid Queue health (if applicable)
    if defined?(SolidQueue)
      begin
        pending = SolidQueue::Job.where(finished_at: nil).count
        checks << { name: "Solid Queue pending jobs", status: "OK", count: pending }
      rescue => e
        checks << { name: "Solid Queue", status: "FAIL", error: e.message }
      end
    end

    # Active Storage (if applicable)
    if defined?(ActiveStorage)
      begin
        ActiveStorage::Blob.count
        checks << { name: "Active Storage", status: "OK" }
      rescue => e
        checks << { name: "Active Storage", status: "FAIL", error: e.message }
      end
    end

    # Application version and environment
    checks << { name: "Rails version", status: Rails.version }
    checks << { name: "Ruby version", status: RUBY_VERSION }
    checks << { name: "Environment", status: Rails.env }

    checks.each do |check|
      line = "#{check[:name]}: #{check[:status]}"
      line += " (#{check[:count]})" if check[:count]
      line += " - #{check[:error]}" if check[:error]
      puts line
    end

    failures = checks.select { |c| c[:status] == "FAIL" }
    exit(1) if failures.any?
  end
end
Run it with kamal app exec -d production --roles=web "bin/rails deploy:verify". In under a second, you know whether your database, Redis, queue system, and storage layer are all reachable from the new container. This is the first thing I run after any deployment, and it catches connection issues that the health check endpoint misses because /up typically only verifies the application process is running.
Database Health and Table Statistics
When something feels slow in production, the first question is usually “what is happening in the database?” Opening a Rails console and writing queries from memory wastes time. A dedicated task gives you the numbers immediately.
# lib/tasks/db_health.rake
namespace :db do
  desc "Show database size, table row counts, and index usage"
  task health: :environment do
    conn = ActiveRecord::Base.connection

    # Database size
    db_size = conn.execute(<<~SQL).first
      SELECT pg_size_pretty(pg_database_size(current_database())) AS size
    SQL
    puts "Database size: #{db_size['size']}"
    puts ""

    # Table sizes and row counts
    tables = conn.execute(<<~SQL)
      SELECT
        schemaname || '.' || relname AS table,
        pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
        n_live_tup AS row_estimate
      FROM pg_stat_user_tables
      ORDER BY pg_total_relation_size(relid) DESC
      LIMIT 20
    SQL
    puts "Top 20 tables by size:"
    puts "-" * 60
    tables.each do |t|
      puts " %-35s %10s ~%d rows" % [t["table"], t["total_size"], t["row_estimate"]]
    end
    puts ""

    # Unused indexes (candidates for removal)
    unused = conn.execute(<<~SQL)
      SELECT
        schemaname || '.' || relname AS table,
        indexrelname AS index,
        pg_size_pretty(pg_relation_size(i.indexrelid)) AS size,
        idx_scan AS scans
      FROM pg_stat_user_indexes ui
      JOIN pg_index i ON ui.indexrelid = i.indexrelid
      WHERE idx_scan < 50
        AND NOT indisunique
        AND NOT indisprimary
        AND pg_relation_size(i.indexrelid) > 1024 * 1024
      ORDER BY pg_relation_size(i.indexrelid) DESC
      LIMIT 10
    SQL
    if unused.any?
      puts "Potentially unused indexes (< 50 scans, > 1MB):"
      puts "-" * 60
      unused.each do |idx|
        puts " %-30s %-35s %8s %d scans" % [idx["table"], idx["index"], idx["size"], idx["scans"]]
      end
    end
    puts ""

    # Long-running queries
    long_queries = conn.execute(<<~SQL)
      SELECT
        pid,
        now() - pg_stat_activity.query_start AS duration,
        query,
        state
      FROM pg_stat_activity
      WHERE (now() - pg_stat_activity.query_start) > interval '30 seconds'
        AND state != 'idle'
        AND query NOT LIKE '%pg_stat_activity%'
      ORDER BY duration DESC
      LIMIT 5
    SQL
    if long_queries.any?
      puts "Long-running queries (> 30s):"
      puts "-" * 60
      long_queries.each do |q|
        puts " PID: #{q['pid']} | Duration: #{q['duration']} | State: #{q['state']}"
        puts " #{q['query'][0..120]}"
        puts ""
      end
    else
      puts "No long-running queries."
    end
  end
end
This surfaces the three things you always want to know: how large is the database, which tables are growing fastest, and is anything stuck. The unused index detection is a bonus that pays for itself the first time you discover a 500MB index that nothing queries. Drop it and your write performance improves overnight.
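Acting on a flagged index is a one-liner, though the index name below is hypothetical and you should watch pg_stat_user_indexes for a few weeks (covering any monthly or quarterly jobs) before dropping anything:

```sql
-- Hypothetical index name. CONCURRENTLY avoids blocking writes while the
-- index is removed, but it cannot run inside a transaction block.
DROP INDEX CONCURRENTLY IF EXISTS index_events_on_legacy_payload;
```

Keep the index definition somewhere recoverable (a down migration works) in case a rare query path turns out to need it.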
Data Integrity Checks
Every application accumulates orphaned records, broken associations, and inconsistencies that ActiveRecord validations cannot prevent at the database level. A periodic integrity check catches these before they become user-visible bugs.
# lib/tasks/integrity.rake
namespace :integrity do
  desc "Check for orphaned records and data inconsistencies"
  task check: :environment do
    # Ensure all models are loaded so descendants is complete
    Rails.application.eager_load!
    issues = []

    # Check for orphaned records across belongs_to associations
    ActiveRecord::Base.descendants.each do |model|
      next if model.abstract_class?
      next unless model.table_exists?

      model.reflect_on_all_associations(:belongs_to).each do |assoc|
        next if assoc.options[:optional]
        next if assoc.options[:polymorphic]

        foreign_key = assoc.foreign_key
        parent_table = assoc.klass.table_name
        child_table = model.table_name

        orphans = ActiveRecord::Base.connection.execute(<<~SQL).first
          SELECT COUNT(*) AS count FROM #{child_table}
          WHERE #{foreign_key} IS NOT NULL
            AND #{foreign_key} NOT IN (SELECT id FROM #{parent_table})
        SQL

        if orphans["count"].to_i > 0
          issues << "#{child_table}.#{foreign_key}: #{orphans['count']} orphaned records (missing #{parent_table})"
        end
      end
    rescue => e
      issues << "Error checking #{model.name}: #{e.message}"
    end

    if issues.any?
      puts "Data integrity issues found:"
      issues.each { |i| puts " #{i}" }
      exit(1)
    else
      puts "No integrity issues found."
    end
  end
end
Run this weekly or after any data migration. The task iterates through every model’s belongs_to associations and checks for foreign keys pointing to records that no longer exist. It is not exhaustive, but it catches the most common form of data corruption in Rails applications: records left behind after a parent was deleted without proper cascading.
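If you are already on Solid Queue, its recurring tasks are a convenient way to get the weekly cadence without adding cron to the host. A sketch, where the job class wrapping the rake task and the schedule are placeholders for your own setup:

```yaml
# config/recurring.yml -- hypothetical entry; adjust names to your app
production:
  weekly_integrity_check:
    class: IntegrityCheckJob   # a job you define that runs the check
    schedule: "0 4 * * 1"      # Mondays at 04:00
```

Running it as a job also means failures land in your normal job error reporting instead of disappearing into a cron mailbox.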
Maintenance Mode and Operational Tasks
These are the tasks I reach for during routine maintenance and incident response. None of them are complex. All of them save time when you need them.
# lib/tasks/ops.rake
namespace :ops do
  desc "Clear all Rails caches (fragment, action, page)"
  task clear_cache: :environment do
    Rails.cache.clear
    puts "Rails cache cleared at #{Time.current}"
  end

  desc "Show environment configuration (redacted secrets)"
  task env_check: :environment do
    critical_vars = %w[
      RAILS_ENV
      DATABASE_URL
      REDIS_URL
      RAILS_LOG_TO_STDOUT
      RAILS_SERVE_STATIC_FILES
      RAILS_MAX_THREADS
      WEB_CONCURRENCY
      SOLID_QUEUE_IN_PUMA
    ]
    critical_vars.each do |var|
      value = ENV[var]
      if value.nil?
        puts " #{var}: NOT SET"
      elsif var.match?(/PASSWORD|SECRET|KEY|TOKEN|URL/)
        puts " #{var}: #{value[0..4]}*****"
      else
        puts " #{var}: #{value}"
      end
    end
  end

  desc "Show pending migrations"
  task pending_migrations: :environment do
    pending = ActiveRecord::Base.connection.migration_context.open.pending_migrations
    if pending.any?
      puts "Pending migrations:"
      pending.each { |m| puts " #{m.version} - #{m.name}" }
    else
      puts "No pending migrations."
    end
  end

  desc "Warm application caches after deployment"
  task warm_cache: :environment do
    puts "Warming critical caches..."
    # Add your application-specific cache warming here
    # Examples:
    # - Preload frequently accessed configuration
    # - Cache expensive database queries
    # - Warm up connection pools
    if defined?(Rails.application.config.cache_store)
      puts " Cache store: #{Rails.application.config.cache_store}"
    end
    puts "Cache warming complete at #{Time.current}"
  end

  desc "Kill stuck Solid Queue jobs older than specified hours"
  task :kill_stuck_jobs, [:hours] => :environment do |_t, args|
    hours = (args[:hours] || 4).to_i
    if defined?(SolidQueue)
      stuck = SolidQueue::ClaimedExecution
        .where("created_at < ?", hours.hours.ago)
      count = stuck.count
      stuck.each(&:release)
      puts "Released #{count} stuck jobs older than #{hours} hours"
    else
      puts "Solid Queue not available"
    end
  end
end
The ops:env_check task is particularly useful when debugging deployment issues. Instead of exec-ing into the container and running printenv | grep, you get a formatted view of the critical variables with secrets redacted. The ops:kill_stuck_jobs task handles a Solid Queue edge case where jobs get claimed but the worker process dies before completing them. Without intervention, these jobs sit in limbo until you manually release them.
With the aliases configured in deploy.yml, your post-deployment routine becomes:
# Post-deployment verification
kamal verify -d production
kamal db-health -d production
kamal pending -d production
# Routine maintenance
kamal integrity -d production
kamal clear-cache -d production
kamal env-check -d production
For tasks that need arguments (like releasing stuck jobs with a custom hour threshold), you still use the full form since aliases do not support parameter passing:
kamal app exec -d production --roles=web "bin/rails ops:kill_stuck_jobs[6]"
Every alias bakes in --roles=web. Every full-form command should too. It becomes muscle memory. The one time you forget and a destructive task runs across all roles simultaneously is the last time you forget.
Database Backups: The Step Everyone Skips Until They Shouldn’t Have
If you are running Postgres as a Kamal accessory, your database lives on a Docker volume on your server. If that server fails, your data goes with it. Kamal does not set up backups for you, and most deployment tutorials mention this only in passing.
The most straightforward approach is adding a backup accessory using an image like kartoza/pg-backup that runs scheduled dumps to S3-compatible storage:
accessories:
  pg-backup:
    image: kartoza/pg-backup:latest
    host: 192.168.1.10
    env:
      clear:
        CRON_SCHEDULE: "@daily"
        REMOVE_BEFORE: 30
        STORAGE_BACKEND: S3
      secret:
        - POSTGRES_USER
        - POSTGRES_PASS
        - POSTGRES_HOST
        - POSTGRES_PORT
        - AWS_ACCESS_KEY_ID
        - AWS_SECRET_ACCESS_KEY
        - AWS_BUCKET_NAME
Test your backup restoration process before you need it. A backup that has never been tested is not a backup. It is a hope. Run a restore to a local database, verify the data integrity, and document the process. When your production database needs recovery at 3 AM, you do not want to be reading documentation for the first time.
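Alongside restore drills, it is worth alerting when dumps quietly stop appearing. A minimal freshness check, sketched against a temp directory standing in for your real backup location; the path, file naming, and the roughly 25-hour threshold are all assumptions to adapt:

```shell
#!/bin/sh
# Demo uses a temp dir with a just-created file standing in for a fresh dump;
# point backup_dir at wherever your dumps actually land (or sync from S3 first).
backup_dir=$(mktemp -d)
touch "$backup_dir/app-demo.dump"

latest=$(ls -t "$backup_dir"/*.dump 2>/dev/null | head -n 1)
if [ -z "$latest" ]; then
  echo "MISSING: no dumps found in $backup_dir"
  exit 1
fi

# mtime lookup: GNU stat first, BSD stat as the fallback
mtime=$(stat -c %Y "$latest" 2>/dev/null || stat -f %m "$latest")
age=$(( $(date +%s) - mtime ))

if [ "$age" -gt 90000 ]; then   # ~25 hours, leaving slack past the daily cron
  echo "STALE: $latest is ${age}s old"
  exit 1
else
  echo "FRESH: $latest"
fi
```

Wire the non-zero exit into whatever already pages you, and a dead backup cron becomes a Tuesday-morning alert instead of a discovery during an outage.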
Asset Bridging: Avoiding 404s During Deployments
When Kamal swaps containers during a zero-downtime deployment, users who loaded a page from the old container may have references to fingerprinted assets (JavaScript and CSS files with unique hashes in their filenames) that no longer exist in the new container. The old container is gone, and the new container only has its own asset versions. Those users get 404 errors on their stylesheets and scripts until they refresh the page.
Kamal solves this with asset bridging. Add the following to your deploy.yml:
asset_path: /rails/public/assets
This tells Kamal to merge assets from the old and new containers into a shared volume during deployment. Users with references to old asset filenames can still load them from the new container. It is a small configuration line that prevents a real user-facing problem, and it should be part of every production deployment configuration.
If you are using Propshaft instead of Sprockets, make sure the asset path matches where Propshaft outputs its compiled files. And if you are using a CDN for asset delivery, asset bridging becomes less critical since the CDN cache retains old versions. But for applications serving assets directly, this is essential.
If you found this useful, I write regularly about software engineering, architecture, and the practical realities of building production systems at ivanturkovic.com. You can follow me on LinkedIn for shorter takes and updates, or reach out directly if you want to discuss deployment strategies, Rails architecture, or anything covered in this post. I would love to hear about your own experience with Kamal, especially the edge cases and workarounds that are not covered in the official documentation.
If this post made you think, you'll probably like the next one. I write about what's actually changing in software engineering, not what LinkedIn wants you to believe. No spam, unsubscribe anytime.