Give backend more healthcheck time on redeploy

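ERPNext gunicorn can take longer to start after migrate than the old
budget of a 180s start_period plus 15 retries at a 15s interval
(~6.75 min). Use a 360s start_period and 20 retries for the backend, and
fall back to SERVICE_FQDN_FRONTEND when currentsite.txt is empty. The
frontend gets the same Host fallback, 20 retries, and a 180s start_period.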
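
The healthcheck budget can be estimated with a little arithmetic. This is a rough sketch, not part of the commit: it assumes Docker ignores failed probes during start_period and then needs `retries` consecutive failures, one per `interval`, before marking the container unhealthy. Probe run time (up to the 10s timeout) is ignored, so real budgets run somewhat longer. The function name `health_budget` is hypothetical.

```shell
#!/bin/sh
# health_budget START_PERIOD INTERVAL RETRIES
# Prints an approximate number of seconds before a container that never
# becomes ready is marked unhealthy. Assumption: probe run time is ignored.
health_budget() {
  start_period=$1
  interval=$2
  retries=$3
  echo $(( start_period + interval * retries ))
}

# Backend values before and after this commit (interval stays at 15s).
echo "old backend budget: $(health_budget 180 15 15)s"
echo "new backend budget: $(health_budget 360 15 20)s"
```

Under these assumptions the backend budget goes from 405s to 660s, which leaves room for a gunicorn start that takes a little over six minutes.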
epistemophiliac 2026-06-16 22:31:39 -04:00
parent faf2d847cf
commit 366da2d3cc
2 changed files with 8 additions and 7 deletions

@@ -156,11 +156,11 @@ services:
migrator:
condition: service_completed_successfully
healthcheck:
-test: ['CMD-SHELL', 'H=$$(tr -d "\r\n" < sites/currentsite.txt 2>/dev/null); curl -sf -H "Host: $$H" http://127.0.0.1:8000/api/method/ping || exit 1']
+test: ['CMD-SHELL', 'H=$$(tr -d "\r\n" < sites/currentsite.txt 2>/dev/null); [ -n "$$H" ] || H="$$SERVICE_FQDN_FRONTEND"; [ -z "$$H" ] && exit 1; curl -sf --max-time 8 -H "Host: $$H" http://127.0.0.1:8000/api/method/ping || exit 1']
interval: 15s
timeout: 10s
-retries: 15
-start_period: 180s
+retries: 20
+start_period: 360s
websocket:
<<: [*depends_on_configurator, *customizable_image, *frappe_platform, *sites_volume]
@@ -179,6 +179,7 @@ services:
 - export FRAPPE_SITE_NAME_HEADER=$$(tr -d '\r\n' < /home/frappe/frappe-bench/sites/currentsite.txt); exec nginx-entrypoint.sh
 environment:
 - SERVICE_URL_FRONTEND_8080
+- SERVICE_FQDN_FRONTEND
 - 'BACKEND=backend:8000'
 - 'SOCKETIO=websocket:9000'
 - 'UPSTREAM_REAL_IP_ADDRESS=${UPSTREAM_REAL_IP_ADDRESS:-127.0.0.1}'
@@ -194,11 +195,11 @@ services:
websocket:
condition: service_started
healthcheck:
-test: ['CMD-SHELL', 'H=$$(tr -d "\r\n" < sites/currentsite.txt 2>/dev/null); curl -sf -H "Host: $$H" http://127.0.0.1:8080/api/method/ping || exit 1']
+test: ['CMD-SHELL', 'H=$$(tr -d "\r\n" < sites/currentsite.txt 2>/dev/null); [ -n "$$H" ] || H="$$SERVICE_FQDN_FRONTEND"; [ -z "$$H" ] && exit 1; curl -sf --max-time 8 -H "Host: $$H" http://127.0.0.1:8080/api/method/ping || exit 1']
interval: 15s
timeout: 10s
-retries: 15
-start_period: 120s
+retries: 20
+start_period: 180s
queue-short:
<<: *backend_defaults

@@ -81,7 +81,7 @@ Login: `https://your-domain` — user `Administrator`, password = `ADMIN_PASSWOR
| Symptom | Fix |
|---------|-----|
| Traefik `404 page not found` / URL unreachable | Domain on service `frontend` port **8080**; compose must declare `SERVICE_URL_FRONTEND_8080` (not `SERVICE_FQDN_*`); `ports_exposes` = 8080 |
-| Backend unhealthy / deploy fails after migrator | Healthcheck must send `Host: <site>` — fixed in compose; redeploy |
+| Backend unhealthy / deploy fails after migrator | Gunicorn can take 6+ min on redeploy — backend `start_period` is 360s; healthcheck uses `Host` from `currentsite.txt` or `SERVICE_FQDN_FRONTEND` |
| `SITE_NAME empty` on create-site | Assign domain on `frontend:8080` before deploy (`SERVICE_FQDN_FRONTEND`) |
| Wrong site / 404 nginx | Delete old `SITE_NAME` in Coolify UI; ensure header matches domain |
| Site created with wrong name | Wipe `sites` volume or rename site manually — env change alone won't rename |
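
To see which `Host` value the healthcheck would send, the fallback logic can be reproduced outside Compose. The sketch below mirrors the probe's steps in plain `sh` (in the compose file each `$` is written `$$` so Compose passes a literal `$` through). The function name `resolve_host` is hypothetical and not part of the repository.

```shell
#!/bin/sh
# resolve_host FILE FALLBACK
# Prints the Host header value the probe would use, or returns 1 if
# neither the site file nor the fallback provides one.
resolve_host() {
  # Strip CR/LF as the healthcheck does; a missing file yields an empty string.
  h=$(tr -d '\r\n' 2>/dev/null < "$1")
  # Fall back to the second argument when the file was empty or unreadable.
  [ -n "$h" ] || h=${2:-}
  # No usable host at all: fail, so the probe would count as unhealthy.
  [ -n "$h" ] || return 1
  printf '%s\n' "$h"
}
```

Inside the backend container, from the bench directory, something like `resolve_host sites/currentsite.txt "$SERVICE_FQDN_FRONTEND"` should print the site name; a non-zero exit with no output means both sources are empty, so the healthcheck would fail before reaching `curl`.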