KB-002: Hyperdrive Binding Connected but database Service Reports down
KB-002: Hyperdrive Binding Connected but database Service Reports down
Status: ✅ Active Affected version: v0.75.0 Resolved in: PR fixing
PrismaClientConfigSchemato acceptpostgres://+ enhanced/api/healthprobe Date: 2026-03-25
Session Log — 2026-03-25
This section captures the live troubleshooting session that led to the discovery of this KB article and the subsequent hardening work in v0.76.0+.
Symptoms Observed
- UI showed “Degraded performance — v0.75.0” and “Data may be stale” banners on every page load
/api/healthreturneddatabase.status: "down"withlatency_ms: 0- Cloudflare Hyperdrive admin page showed zero traffic despite Neon dashboard showing migration activity
- Health response showed
hyperdrive_host: "11f7f957eaae03a9fe9365c78e6eb4ed.hyperdrive.local"— which is the correct Hyperdrive local proxy address (not a bug)
Diagnosis Steps Taken
# Step 1: Inspect the full health responsecurl -s https://bloqr-frontend.jk-com.workers.dev/api/health | jq .services.database# Result:# {# "status": "down",# "latency_ms": 0,# "hyperdrive_host": "11f7f957eaae03a9fe9365c78e6eb4ed.hyperdrive.local"# }
# Step 2: Check the Hyperdrive binding configurationwrangler hyperdrive get 800f7e2edc86488ab24e8621982e9ad7# Result showed "scheme": "postgres" — Hyperdrive uses postgres://, not postgresql://
# Step 3: Check deployed versioncurl -s https://bloqr-frontend.jk-com.workers.dev/api/health | jq .version# "0.76.0" — schema fix was deployed
# Step 4: Tail live worker logswrangler tail --format=pretty# Revealed: ZodError thrown before any network call when parsing connectionStringKey Diagnostic Insight: .hyperdrive.local Is Correct
The hyperdrive_host: "11f7f957eaae03a9fe9365c78e6eb4ed.hyperdrive.local" host is the Cloudflare-managed local proxy address inside a deployed Worker. This is expected and correct — it means the HYPERDRIVE binding is wired up properly. The failure was happening before any query reached this proxy.
Root Cause Found
PrismaClientConfigSchema in worker/lib/prisma-config.ts was only accepting postgresql:// as a valid scheme. Cloudflare Hyperdrive returns postgres:// from env.HYPERDRIVE.connectionString. This caused a ZodError to be thrown synchronously before any TCP connection was attempted — hence latency_ms: 0.
The fix (accepting both schemes via .regex(/^postgre(?:s|sql):\/\//)) was already applied in the first PR.
Remaining Issue at v0.76.0
Even after the schema fix was deployed at v0.76.0, the database was still down. The latency_ms: 0 pattern continued, meaning the error was still at instantiation time, not at query time. At this point there were two hypotheses:
- The schema fix wasn’t actually deployed —
wrangler deployments listshould confirm the version - Query-level failure — the
PrismaClientwas being created but the query itself was failing silently (catch block swallowed the error)
The original databaseProbe catch block did not capture error_code or error_message, making it impossible to distinguish these cases from the health response alone.
Resolution Path
The hardening work in this PR added:
error_codeanderror_messagein thedatabaseProbecatch block — redacted of any connection string fragments- 5-second timeout via
Promise.race— a hung Hyperdrive connection no longer blocks the health response indefinitely $disconnect()in afinallyblock — prevents connection pool leaks after each probe- New
GET /api/health/db-smokeendpoint — runscurrent_database(),pg_version,now(), andtable_countas a richer smoke test. This is the canonical way to verify DB connectivity after every production deploy.
Using the New Smoke Test Endpoint
After deploying, run:
curl -s https://bloqr-frontend.jk-com.workers.dev/api/health/db-smoke | jq .Expected healthy output:
{ "ok": true, "db_name": "bloqr-backend", "pg_version": "PostgreSQL 16.x ...", "server_time": "2026-03-25T21:59:15.917Z", "table_count": 17, "latency_ms": 42, "hyperdrive_host": "ep-winter-term-a8rxh2a9-pooler.eastus2.azure.neon.tech"}If it returns ok: false, the error field will now contain a redacted error message that pinpoints the failure layer.
Symptom
The live site at https://bloqr-frontend.jk-com.workers.dev/ displays two error banners:
- “Degraded performance — v0.75.0”
- “Data may be stale”
Hitting the health endpoint returns:
{ "status": "down", "version": "0.75.0", "timestamp": "2026-03-25T21:59:15.917Z", "services": { "gateway": { "status": "healthy" }, "database": { "status": "down", "latency_ms": 0 }, "compiler": { "status": "healthy" }, "auth": { "status": "healthy", "provider": "better-auth" }, "cache": { "status": "healthy", "latency_ms": 132 } }}Key tell: latency_ms: 0 on the database service.
A real network failure or timeout always returns a non-zero latency. An instant 0 ms failure means the probe threw before any connection attempt — i.e., at the validation layer.
The Cloudflare Hyperdrive dashboard shows zero queries/connections despite the Neon dashboard showing migration activity.
Diagnostic Commands
# 1. Inspect the full health responsecurl -s https://<your-worker>.workers.dev/api/health | jq .
# 2. Check the Hyperdrive binding configurationnpx wrangler hyperdrive get <hyperdrive-id>
# 3. Tail the live worker log to catch Zod validation errorswrangler tailLook for lines like ZodError: Invalid url or Expected string, received undefined in the tail output. A ZodError thrown during PrismaClientConfigSchema.parse() is definitive proof that the failure is at the config-validation layer, not the network.
Root Cause Decision Tree
❶ Is latency_ms exactly 0?
If YES — the database probe threw before opening any connection. This points to a config-validation failure, not a network failure. Proceed to ❷.
If NO (latency is non-zero) — the connection was attempted but timed out or was refused. Check Neon project status, Hyperdrive binding ID, and network egress. This article does not cover that case.
❷ What scheme does the Hyperdrive connection string use?
npx wrangler hyperdrive get <hyperdrive-id>The output will show a "scheme" field. Cloudflare Hyperdrive always returns postgres:// (not postgresql://) from the env.HYPERDRIVE.connectionString binding property.
{ "id": "...", "name": "adblock-hyperdrive", "origin": { "scheme": "postgres", "host": "...", "port": 5432, "database": "neondb" }}If the scheme is postgres — and PrismaClientConfigSchema only accepts postgresql://, the schema parse throws a ZodError instantly. This is the root cause. Proceed to Resolution.
❸ Does PrismaClientConfigSchema accept postgres://?
Open worker/lib/prisma-config.ts and check the connectionString validator:
// Before the fix — rejects Hyperdrive's actual schemeconnectionString: z.string().url().startsWith('postgresql://'),
// After the fix — accepts both schemesconnectionString: z.string().url().regex(/^postgre(?:s|sql):\/\//),If the schema only allows postgresql://, every request that tries to build a PrismaClient from the Hyperdrive binding will fail at parse time with zero network activity.
Resolution
Step 1 — Update PrismaClientConfigSchema to accept both URL schemes
In worker/lib/prisma-config.ts, change the connectionString validation to accept both postgres:// and postgresql://:
// BeforeconnectionString: z.string().url().startsWith('postgresql://'),
// AfterconnectionString: z.string().url().regex(/^postgre(?:s|sql):\/\//),This accepts postgres://... (Hyperdrive short alias) and postgresql://... (standard long form) while rejecting anything else.
Step 2 — Deploy
wrangler deployAfter deploying, hit the health endpoint again:
curl -s https://<your-worker>.workers.dev/api/health | jq .You should see database.status: "healthy" with a non-zero latency_ms.
Step 3 — Verify with the enhanced health probe
The fix also introduced an enhanced health check in worker/handlers/health.ts that runs SELECT current_database() instead of SELECT 1. This surfaces db_name and hyperdrive_host in the health response:
{ "services": { "database": { "status": "healthy", "latency_ms": 42, "db_name": "neondb", "hyperdrive_host": "...-pooler.us-east-2.aws.neon.tech" } }}Confirm that db_name matches the expected Neon database name. If it returns a different database name, the Hyperdrive binding is pointed at the wrong Neon project or branch.
Prevention
- The new
db_namefield in the health response acts as a continuous assertion that the correct database is connected. Monitor this value in your observability dashboards. - When configuring a new Hyperdrive binding, always run
wrangler hyperdrive get <id>to confirm theschemefield. If it is"postgres", ensure all Zod schemas that validate the connection string acceptpostgres://. - Add an integration test that builds
PrismaClientConfigSchema.parse()with apostgres://URL to catch future regressions.
Worker Code Reference
| File | Relevance |
|---|---|
worker/lib/prisma-config.ts | PrismaClientConfigSchema — validates the Hyperdrive connection string before PrismaClient is created |
worker/handlers/health.ts | handleHealth — runs the database probe; surface db_name and hyperdrive_host from the enhanced SELECT current_database() query |
ZTA Security Note
env.HYPERDRIVE.connectionString is a runtime binding secret — it is never logged, committed, or exposed in the health response. Only the host portion (hyperdrive_host) is surfaced for diagnostic purposes. The db_name field is also safe to expose: it is a non-secret label, not a credential.
Resolution Summary
| Symptom | Root Cause | Fix |
|---|---|---|
database: down, latency_ms: 0 | PrismaClientConfigSchema rejected postgres:// | Accept both postgres:// and postgresql:// in the regex |
| Hyperdrive dashboard shows zero activity | PrismaClient never created — Zod threw before any network call | Same fix as above |
Health shows wrong db_name | Hyperdrive binding points to wrong Neon project/branch | Update Hyperdrive binding origin in Cloudflare dashboard |
Related KB Articles
- KB-001 — “Getting API is not available” on the main page
- KB-003 — Database Down After Deploy — Live Debugging Session (2026-03-25)
- KB-004 —
WebAssembly.Module(): Wasm code generation disallowed by embedder— Prismaruntime = "deno"generates code blocked by Cloudflare Workers; fix isruntime = "cloudflare"+ regenerate - KB-005 — Better Auth Cloudflare integration issues: Worker CPU timeout on session endpoints and silent rate limiting bypass
- (planned) KB-006 — Cloudflare Queue consumer not processing messages
- (planned) KB-007 — Angular SPA serves stale build after worker deploy
Feedback & Contribution
If you discovered a new failure mode while using this article, please open an issue tagged troubleshooting and documentation in jaypatrick/bloqr-backend with the details so it can be captured in a follow-up KB entry.