Neon PostgreSQL Troubleshooting
Quick Reference — Diagnosis and resolution for common Neon PostgreSQL, Hyperdrive, Prisma, and Better Auth issues in the bloqr-backend stack.
Table of Contents
- Connection Architecture & Failure Points
- Connection Pool Exhaustion (Hyperdrive)
- Prisma Migration Errors
- Hyperdrive Timeout Handling
- Local Development Issues
- Neon Branching Issues
- Better Auth + Neon Issues
- Diagnostic Commands
Connection Architecture & Failure Points
Every database request passes through multiple layers, each with its own failure modes. The diagram below highlights where things typically go wrong:
flowchart LR
A["Client Request"] --> B["Cloudflare Worker<br/>(Hono)"]
B --> C["createPrismaClient()"]
C --> D["@prisma/adapter-pg<br/>(PrismaPg)"]
D --> E["Cloudflare Hyperdrive<br/>(Connection Pool)"]
E --> F["Neon PostgreSQL<br/>(Azure East US 2)"]
B -.-x|"❶ HYPERDRIVE undefined"| C
C -.-x|"❷ Zod validation fail"| D
D -.-x|"❸ Pool exhausted"| E
E -.-x|"❹ Timeout / TLS error"| F
F -.-x|"❺ Migration lock / schema drift"| F
style A fill:#e8f4f8,stroke:#2196F3
style B fill:#fff3e0,stroke:#FF9800
style C fill:#f3e5f5,stroke:#9C27B0
style D fill:#f3e5f5,stroke:#9C27B0
style E fill:#e8f5e9,stroke:#4CAF50
style F fill:#e3f2fd,stroke:#1565C0
| Failure Point | Layer | Typical Symptom |
|---|---|---|
| ❶ Binding missing | Worker → Prisma | TypeError: Cannot read properties of undefined (reading 'connectionString') |
| ❷ Bad connection string | Prisma factory | ZodError: String must be a valid URL |
| ❸ Pool exhausted | Hyperdrive | Error: too many connections or request hangs |
| ❹ Timeout / TLS | Hyperdrive → Neon | Error: connect ETIMEDOUT or SSL handshake failures |
| ❺ Schema drift | Neon database | Error: column "X" does not exist or migration lock errors |
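When triaging Worker logs, the symptom column above can be matched mechanically. The sketch below is a hypothetical helper (not part of bloqr-backend) that maps an error message to one of the five failure points; the patterns are taken from the table and are assumptions about what your logs contain.

```typescript
// Hypothetical triage helper: maps a logged error message to a failure point
// from the table above. Not part of the codebase; patterns are illustrative.
type FailurePoint =
  | "binding-missing"
  | "bad-connection-string"
  | "pool-exhausted"
  | "timeout-or-tls"
  | "schema-drift"
  | "unknown";

// Order matters: the first matching pattern wins.
const FAILURE_PATTERNS: ReadonlyArray<[RegExp, FailurePoint]> = [
  [/Cannot read properties of undefined \(reading 'connectionString'\)/, "binding-missing"],
  [/ZodError|must be a valid URL/i, "bad-connection-string"],
  [/too many connections/i, "pool-exhausted"],
  [/ETIMEDOUT|\bSSL\b|\bTLS\b/i, "timeout-or-tls"],
  [/column ".*" does not exist|migration lock/i, "schema-drift"],
];

function classifyFailure(message: string): FailurePoint {
  for (const [pattern, point] of FAILURE_PATTERNS) {
    if (pattern.test(message)) return point;
  }
  return "unknown";
}
```

A request that simply hangs produces no message to classify, so treat "unknown" as a prompt to check the pool and timeout sections rather than as a clean bill of health.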
Connection Pool Exhaustion (Hyperdrive)
Symptoms
- Requests hang for 10–30 seconds then fail
- Error: too many connections for role "..." in Worker logs
- Intermittent 500 errors under moderate load
- Neon dashboard shows connection count at or near limit
Root Causes
- PrismaClient not disconnected: forgetting await prisma.$disconnect() keeps the Hyperdrive proxy socket open
- Multiple PrismaClient instances per request: creating separate clients for auth and route handlers without sharing via middleware
- Long-running transactions: holding connections during slow operations
Resolution
Always disconnect in a finally block:
    const prisma = createPrismaClient(c.env.HYPERDRIVE!.connectionString);
    try {
      const user = await prisma.user.findUnique({ where: { id } });
      return c.json({ user });
    } finally {
      await prisma.$disconnect();
    }

Use the Prisma middleware for request-scoped sharing:

    // worker/middleware/prisma-middleware.ts handles this automatically
    app.use('/api/*', prismaMiddleware());

    // Both Better Auth and route handlers share the same PrismaClient
    app.get('/api/users', async (c) => {
      const prisma = c.get('prisma'); // shared instance
      // ...
    });

Check connection count in Neon:

    -- Run via Neon SQL Editor or psql
    SELECT count(*) FROM pg_stat_activity WHERE datname = 'neondb';

    -- See who's holding connections
    SELECT pid, usename, application_name, state, query_start
    FROM pg_stat_activity
    WHERE datname = 'neondb'
    ORDER BY query_start DESC;

Prisma Migration Errors
"prepared statement already exists"
Symptom: prisma migrate dev or prisma migrate deploy fails with pooling errors.
Cause: Migration commands use interactive transactions that conflict with connection pooling (PgBouncer in Neon’s pooler endpoint).
Fix: Always use DIRECT_DATABASE_URL (the non-pooled endpoint) for migrations:
    # .env.local (note: NO "-pooler" in the hostname)
    DIRECT_DATABASE_URL=postgresql://<user>:<password>@ep-winter-term-a8rxh2a9.eastus2.azure.neon.tech/neondb?sslmode=require

The prisma.config.ts automatically prefers DIRECT_DATABASE_URL over DATABASE_URL.
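The preference order can be pictured as a small selection function. This is a minimal sketch, not the contents of prisma.config.ts: the MigrationEnv shape, the function name, and the pooler guard are all assumptions added for illustration.

```typescript
// Hypothetical sketch of "prefer DIRECT_DATABASE_URL over DATABASE_URL".
// The real prisma.config.ts may resolve these variables differently.
interface MigrationEnv {
  DIRECT_DATABASE_URL?: string;
  DATABASE_URL?: string;
}

function pickMigrationUrl(env: MigrationEnv): string {
  // `||` rather than `??` so that an empty string also falls through.
  const url = env.DIRECT_DATABASE_URL || env.DATABASE_URL;
  if (!url) {
    throw new Error("Set DIRECT_DATABASE_URL (or DATABASE_URL) before running migrations");
  }
  // Extra guard (an assumption, not confirmed behavior): refuse pooled hosts,
  // since migrations conflict with connection pooling.
  if (/-pooler\./.test(url)) {
    throw new Error("Migration URL points at the Neon pooler; use the direct endpoint");
  }
  return url;
}
```

Failing fast on a pooled hostname turns the "prepared statement already exists" error into a clear message before Prisma ever connects.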
"migration failed - column already exists"
Symptom: A migration tries to add a column that already exists in the database.
Cause: Schema was pushed with db push (which doesn’t create migration files), then
a migration was created that duplicates the change.
Fix:
    # Mark the problematic migration as already applied
    deno run -A npm:prisma migrate resolve --applied <migration_name>

    # Or reset the migration history (local dev only!)
    deno task db:local:reset

"P3009: migrate found failed migrations"
Symptom: A previous migration failed partway, leaving the database in a dirty state.
Fix:
    # 1. Check which migration failed
    deno run -A npm:prisma migrate status

    # 2. Fix the underlying issue, then mark it resolved
    deno run -A npm:prisma migrate resolve --rolled-back <migration_name>

    # 3. Re-run migrations
    deno task db:migrate

"Cannot find module 'prisma/generated/client.ts'"
Symptom: Import error after schema changes or fresh clone.
Fix: Always use the Deno task (not raw npx prisma generate):
    deno task db:generate

This runs prisma generate followed by scripts/prisma-fix-imports.ts, which rewrites import paths for Deno compatibility.
Hyperdrive Timeout Handling
Request Timeout (Worker-Side)
Symptom: Routes that involve database queries return 524 (timeout) after ~30 seconds.
Cause: Neon’s serverless compute may need a cold start (~1–5 seconds), and if the Worker is placed far from Neon, round-trip latency adds up.
Fix:
- Verify Smart Placement is enabled in wrangler.toml:

      [placement]
      mode = "smart"

- Add query timeouts to prevent unbounded database calls:

      // findUnique accepts no per-query timeout option, so this call is unbounded as written.
      const user = await prisma.user.findUnique({
        where: { id },
      });
      // Note: Prisma driver adapter doesn't support statement_timeout natively;
      // set it at the PostgreSQL level if needed:
      // ALTER ROLE your_user SET statement_timeout = '10s';

- Monitor Neon cold starts in the Neon dashboard → Monitoring → Compute activity
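On the Worker side, a query can also be bounded by racing it against a timer. The sketch below is one way to do that; withTimeout is a hypothetical helper, not an existing function in the codebase or in Prisma. It only stops the Worker from waiting: the query itself keeps running on the database until statement_timeout (if set) ends it.

```typescript
// Hypothetical helper: rejects if `work` has not settled within `ms` milliseconds.
// It does not cancel the underlying query; pair it with statement_timeout.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Query exceeded ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Usage sketch inside a route handler (prisma and id come from the surrounding code):
// const user = await withTimeout(prisma.user.findUnique({ where: { id } }), 10_000);
```

Choosing a limit well below the ~30 second mark lets the route return a clear error instead of a 524.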
Stale Connections
Symptom: Queries intermittently fail with Error: Connection terminated unexpectedly
or Error: This socket has been ended by the other party.
Cause: Neon scales compute to zero after 5 minutes of inactivity. If Hyperdrive holds a connection that Neon closed, the next query on that socket fails.
Fix: Hyperdrive handles reconnection automatically. If you see persistent issues:
    # Verify Hyperdrive config
    npx wrangler hyperdrive get 800f7e2edc86488ab24e8621982e9ad7

    # Update the Hyperdrive origin settings if needed
    npx wrangler hyperdrive update 800f7e2edc86488ab24e8621982e9ad7 \
      --origin-host=ep-winter-term-a8rxh2a9-pooler.eastus2.azure.neon.tech \
      --origin-port=5432 \
      --database=neondb \
      --origin-user=<user> \
      --origin-password=<password>

SSL / Channel Binding Errors
Symptom: Error: channel binding is required but server did not offer it
Fix: Add channel_binding=require to the connection string (Neon’s pooler
requires it in some configurations):
    postgresql://user:pass@...-pooler.eastus2.azure.neon.tech/db?sslmode=require&channel_binding=require

Local Development Issues
"HYPERDRIVE is undefined" in Local Dev
Symptom: TypeError: Cannot read properties of undefined (reading 'connectionString')
Cause: .dev.vars is missing or has an empty Hyperdrive local override.
Fix:
    # .dev.vars (gitignored): point at your personal Neon dev branch
    CLOUDFLARE_HYPERDRIVE_LOCAL_CONNECTION_STRING_HYPERDRIVE=postgresql://<user>:<password>@<branch-host>.neon.tech/<dbname>?sslmode=require

Create a personal dev branch at console.neon.tech → your project → Branches → New Branch. Use the Direct connection string (not pooled). See Local Dev Setup.

⚠️ Restart wrangler dev after changing .dev.vars; it only reads the file at startup.
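A guard at the point where the binding is read turns the opaque TypeError into an actionable message. This is a minimal sketch; requireHyperdriveUrl and the Env shape are hypothetical, and the real Worker types may differ.

```typescript
// Hypothetical guard; the interfaces mirror only the fields used in this guide.
interface HyperdriveBinding {
  connectionString: string;
}

interface Env {
  HYPERDRIVE?: HyperdriveBinding;
}

function requireHyperdriveUrl(env: Env): string {
  const url = env.HYPERDRIVE?.connectionString;
  if (!url) {
    throw new Error(
      "HYPERDRIVE binding is missing or empty. For local dev, set " +
        "CLOUDFLARE_HYPERDRIVE_LOCAL_CONNECTION_STRING_HYPERDRIVE in .dev.vars " +
        "and restart wrangler dev.",
    );
  }
  return url;
}
```

Calling this in place of c.env.HYPERDRIVE!.connectionString removes the non-null assertion and makes the failure mode self-describing.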
"SSL required" / sslmode errors
Symptom: Error: SSL required or Error: PGGSSENCMODE / TLS handshake failure.
Cause: The connection string is missing ?sslmode=require (required by Neon).
Fix: Append ?sslmode=require to every Neon connection string in .dev.vars and .env.local:
    # Replace <user>, <password>, <branch-host>, and <dbname> with your Neon branch values
    CLOUDFLARE_HYPERDRIVE_LOCAL_CONNECTION_STRING_HYPERDRIVE=postgresql://<user>:<password>@<branch-host>.neon.tech/<dbname>?sslmode=require
    DATABASE_URL="postgresql://<user>:<password>@<branch-host>.neon.tech/<dbname>?sslmode=require"
    DIRECT_DATABASE_URL="postgresql://<user>:<password>@<branch-host>.neon.tech/<dbname>?sslmode=require"

"Cannot connect to server" / ETIMEDOUT
Symptom: Connections time out during wrangler dev or prisma migrate.
Cause: The Neon branch may be suspended or the connection string is incorrect.
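An incorrect connection string can often be caught before any network call. The sketch below is a hypothetical lint function (not part of the codebase) that checks the three properties this guide cares about for local dev: a PostgreSQL scheme, sslmode=require, and a non-pooler hostname.

```typescript
// Hypothetical lint for local-dev Neon connection strings.
// Returns a list of problems; an empty list means nothing obvious is wrong.
function lintNeonUrl(raw: string, allowPooler = false): string[] {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return ["not a parseable URL"];
  }

  const problems: string[] = [];
  if (url.protocol !== "postgresql:" && url.protocol !== "postgres:") {
    problems.push(`unexpected scheme ${url.protocol}`);
  }
  if (url.searchParams.get("sslmode") !== "require") {
    problems.push("missing sslmode=require");
  }
  if (!allowPooler && url.hostname.includes("-pooler.")) {
    problems.push("pooler hostname; use the direct endpoint for local dev");
  }
  return problems;
}
```

A clean result does not prove the branch is reachable (a suspended branch still times out), so the manual checks under Fix still apply.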
Fix:
    # 1. Verify your Neon branch is active
    #    https://console.neon.tech → your project → Branches → check branch status

    # 2. Test the connection directly
    #    Replace <user>, <password>, <branch-host>, and <dbname> with your Neon branch values
    psql "postgresql://<user>:<password>@<branch-host>.neon.tech/<dbname>?sslmode=require" -c "SELECT 1 AS ok;"

    # 3. Ensure your connection string uses the direct (non-pooler) hostname
    #    ✅ ep-<name>.<region>.neon.tech         (direct: use for local dev)
    #    ❌ ep-<name>-pooler.<region>.neon.tech  (pooler: only for production Hyperdrive)

Neon Branching Issues
Branch Creation Fails
Symptom: NeonApiService.createBranch() returns a 400 or 422 error.
Common causes:
| Error | Cause | Fix |
|---|---|---|
| branches_limit_exceeded | Free plan allows max 10 branches | Delete unused branches |
| parent branch not found | Invalid parent branch ID | Use listBranches() to find correct ID |
| project not found | Wrong project ID or API key scope | Verify NEON_PROJECT_ID |
Cleanup stale branches:
    // Using the NeonApiService
    const neon = createNeonApiService({ apiKey: env.NEON_API_KEY });
    const branches = await neon.listBranches('twilight-river-73901472');

    // Find branches older than 7 days (excluding main)
    const stale = branches.filter(b =>
      b.name !== 'main' &&
      new Date(b.created_at) < new Date(Date.now() - 7 * 24 * 60 * 60 * 1000)
    );

    for (const branch of stale) {
      await neon.deleteBranch('twilight-river-73901472', branch.id);
    }

Branch Endpoint Not Ready
Symptom: Branch was created but connection fails with timeout.
Cause: Neon endpoints take a few seconds to become active after branch creation.
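Because the endpoint needs a few seconds to come up, callers generally need some form of bounded polling. The helper below is a generic sketch, not part of NeonApiService: pollUntil, its option names, and the current_state field in the usage comment are all assumptions.

```typescript
// Hypothetical polling helper: calls `check` until it returns a value,
// waiting `delayMs` between attempts, and gives up after `attempts` tries.
async function pollUntil<T>(
  check: () => Promise<T | undefined>,
  { attempts = 10, delayMs = 500 }: { attempts?: number; delayMs?: number } = {},
): Promise<T> {
  for (let i = 0; i < attempts; i++) {
    const result = await check();
    if (result !== undefined) return result;
    // No need to sleep after the final attempt.
    if (i < attempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Not ready after ${attempts} attempts`);
}

// Usage sketch (assumes endpoint objects expose a `current_state` field):
// const ep = await pollUntil(async () => {
//   const endpoints = await neon.listEndpoints(projectId);
//   return endpoints.find(e => e.branch_id === branch.id && e.current_state === 'active');
// });
```

Bounding the number of attempts keeps a stuck endpoint from holding a preview deployment open indefinitely.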
Fix: Poll the endpoint status before connecting:
    const { branch } = await neon.createBranch(projectId, { name: 'preview/pr-42' });
    const endpoints = await neon.listEndpoints(projectId);
    const ep = endpoints.find(e => e.branch_id === branch.id);
    // Wait for endpoint to become active (Neon Operations API handles this)

Better Auth + Neon Issues
Session Creation Fails (500 on /api/auth/sign-in)
Symptom: Login returns 500, Worker logs show a database error.
Diagnosis checklist:
- Is HYPERDRIVE configured?

      # Check wrangler.toml has [[hyperdrive]] section
      grep -A2 'hyperdrive' wrangler.toml

- Is BETTER_AUTH_SECRET set?

      # Local dev: check .dev.vars
      grep 'BETTER_AUTH_SECRET' .dev.vars

- Do the auth tables exist?

      -- Check via Neon SQL Editor
      SELECT table_name FROM information_schema.tables
      WHERE table_schema = 'public'
      AND table_name IN ('users', 'sessions', 'account', 'verification');

- Run migrations if tables are missing:

      deno task db:migrate
Adapter Errors (“Cannot read property of undefined”)
Symptom: prismaAdapter throws during Better Auth initialization.
Cause: The PrismaClient was created with an invalid connection string, or the Prisma schema doesn’t match the database.
Fix:
    # 1. Regenerate the Prisma client
    deno task db:generate

    # 2. Verify schema matches the database
    deno run -A npm:prisma db pull         # introspect live DB (rewrites the schema file; commit first)
    deno run -A npm:prisma migrate status  # check drift

Token Validation Fails After Session Creation
Symptom: Sign-in succeeds but subsequent API calls return 401.
Cause: The session table’s token column may not be indexed, or the session
was created but the expiresAt is in the past.
Debug:
    -- Check session exists and is not expired
    SELECT id, token, expires_at, created_at
    FROM sessions
    WHERE user_id = '<user-id>'
    ORDER BY created_at DESC
    LIMIT 5;

Better Auth + Clerk Fallback Conflict
Symptom: Authenticated requests intermittently fail when both providers are active.
Cause: A client sends a Clerk JWT, Better Auth rejects it (not its format), and the Clerk fallback path is disabled.
Fix: Either:
- Enable the fallback: ensure DISABLE_CLERK_FALLBACK is not set to true
- Or migrate the client to use Better Auth credentials
See Auth Chain Reference for the full authentication flow.
Diagnostic Commands
Quick Health Check
    # Worker dev server running?
    curl -s http://localhost:8787/health | jq .

    # Database reachable? (via Prisma)
    deno task db:studio  # opens GUI at http://localhost:5555

    # Neon API reachable? (the response nests the project under a "project" key)
    curl -s -H "Authorization: Bearer $NEON_API_KEY" \
      "https://console.neon.tech/api/v2/projects/twilight-river-73901472" | jq .project.name

Database Connectivity
    # Test direct PostgreSQL connection to your Neon dev branch (requires psql)
    psql "$DIRECT_DATABASE_URL" -c "SELECT 1 AS ok;"

    # Check Prisma migration status against your Neon dev branch
    deno run -A npm:prisma migrate status

Wrangler / Hyperdrive
    # Verify Hyperdrive binding
    npx wrangler hyperdrive get 800f7e2edc86488ab24e8621982e9ad7

    # Check Worker environment
    npx wrangler dev --log-level=debug

Further Reading
- Neon Setup — Production Neon configuration
- Local Dev Guide — Neon branching setup for local development
- Auth Chain Reference — Authentication flow details
- Better Auth + Prisma — Prisma adapter configuration
- Neon Documentation — Official Neon docs
- Cloudflare Hyperdrive — Hyperdrive troubleshooting
Live Troubleshooting Session — 2026-03-25
This section documents the full sequence of events from the live debugging session that occurred on 2026-03-25. It is captured here as a reference for future on-call engineers.
Sequence of Events
- User noticed UI banners ("Degraded performance — v0.75.0" and "Data may be stale") on every page visit, regardless of login state
- Initial diagnosis: curl /api/health showed database.status: "down" with latency_ms: 0
- Checked Hyperdrive dashboard: zero traffic. Neon dashboard showed migration activity (migrations had run directly via DIRECT_DATABASE_URL, not through Hyperdrive)
- Ran wrangler hyperdrive get: confirmed "scheme": "postgres" in the Hyperdrive binding config
- Found root cause: PrismaClientConfigSchema only accepted postgresql://; Hyperdrive returns postgres://
- Applied schema fix: updated the regex to accept both schemes; deployed at v0.76.0
- Database still down at v0.76.0: latency_ms: 0 persisted; catch block was silently swallowing errors
- Added error surfacing: new error_code and error_message fields in the health response catch block
- Added smoke test endpoint: GET /api/health/db-smoke for detailed post-deploy diagnostics
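The scheme mismatch at the heart of this incident is easy to reproduce in isolation. The actual PrismaClientConfigSchema is not shown in this document, so the following is only a plain-regex sketch of a check that accepts both schemes; the constant and function names are hypothetical.

```typescript
// Sketch of a scheme check that accepts both spellings.
// Hyperdrive hands the Worker postgres://, while Neon's own strings use postgresql://.
const POSTGRES_URL = /^postgres(?:ql)?:\/\/\S+$/;

function isPostgresUrl(value: string): boolean {
  return POSTGRES_URL.test(value);
}
```

A check that only allowed postgresql:// would reject every Hyperdrive connection string before a single query ran, which matches the zero-traffic symptom in the sequence above.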
What .hyperdrive.local Means (Important)
When you see hyperdrive_host: "11f7f957eaae03a9fe9365c78e6eb4ed.hyperdrive.local" in the health response, this is correct and expected for a deployed Cloudflare Worker. It is the Cloudflare-managed internal proxy socket address that Hyperdrive injects at runtime. It is NOT a sign of a misconfiguration.
The .hyperdrive.local host is Hyperdrive’s local connection proxy. When your Worker runs env.HYPERDRIVE.connectionString, Cloudflare resolves this to a postgres://...hyperdrive.local/... URL that routes through the Hyperdrive proxy to your origin database. This is the architecture working as intended.
Bottom line: seeing .hyperdrive.local in hyperdrive_host means the binding IS connected to the proxy. The issue is always further down the stack.
Zero Hyperdrive Dashboard Traffic vs Non-Zero Neon Traffic
If the Hyperdrive dashboard shows zero queries but Neon shows activity, it means:
- Migrations ran directly via DIRECT_DATABASE_URL / DATABASE_URL (which connect straight to Neon, bypassing Hyperdrive)
- Worker queries are failing at instantiation: the Prisma client is never created, so no queries ever reach Hyperdrive
This pattern is consistent with a Zod validation failure (latency_ms: 0) — the probe throws before opening any connection.
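To see why a validation failure reports zero latency, and what surfacing the error looks like, consider the sketch below. It is not the project's health handler: probeDatabase, the injected clock, and the choice of the error's name as error_code are assumptions; only the field names status, latency_ms, error_code, and error_message come from this guide.

```typescript
// Hypothetical probe: times a connection attempt and surfaces the error
// instead of swallowing it in the catch block.
interface ProbeResult {
  status: "up" | "down";
  latency_ms: number;
  error_code?: string;
  error_message?: string;
}

async function probeDatabase(
  connect: () => Promise<void>,
  now: () => number = Date.now, // injectable clock, for deterministic tests
): Promise<ProbeResult> {
  const started = now();
  try {
    await connect();
    return { status: "up", latency_ms: now() - started };
  } catch (err) {
    const e = err instanceof Error ? err : new Error(String(err));
    return {
      status: "down",
      latency_ms: now() - started, // ~0 when connect() throws before any I/O
      error_code: e.name,
      error_message: e.message,
    };
  }
}
```

If connect throws during schema validation, no time elapses between the two clock reads, so latency_ms is 0 and error_code names the validation error. Real error messages may contain connection strings, so redact them before returning them in a response.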
The latency_ms: 0 Instant Failure Pattern
| latency_ms value | What it means |
|---|---|
| 0 | Failure happened synchronously, at schema validation or client construction. No network I/O occurred. |
| 1–50 | Very fast failure: connection was attempted but immediately refused (port unreachable, wrong host). |
| 1000–5000 | TCP timeout: host is reachable but not responding. |
| 5000 (exactly) | Probe timeout: Worker hit the 5s timeout guard added in the hardening PR. |
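The table can be encoded directly for use in dashboards or alert rules. This is an illustrative sketch with a hypothetical function name; values the table does not cover (for example 51 to 999 ms) are reported as unclassified rather than guessed at.

```typescript
// Hypothetical encoding of the latency_ms table above.
type LatencyReading =
  | "synchronous-failure"
  | "connection-refused"
  | "tcp-timeout"
  | "probe-timeout"
  | "unclassified";

function interpretLatency(latencyMs: number): LatencyReading {
  if (latencyMs === 0) return "synchronous-failure";
  // Check the exact 5000 case before the 1000-5000 range it overlaps with.
  if (latencyMs === 5000) return "probe-timeout";
  if (latencyMs >= 1 && latencyMs <= 50) return "connection-refused";
  if (latencyMs >= 1000 && latencyMs < 5000) return "tcp-timeout";
  return "unclassified";
}
```

These readings apply to failed probes only; a healthy probe with a latency in the same range means nothing is wrong.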
wrangler tail as the First Step for Production Debugging
Always run wrangler tail first when diagnosing a production issue. It shows the actual exception thrown by your Worker in real time, including ZodError messages with field paths that pinpoint exactly which validation failed.
    wrangler tail --format=pretty
    # Look for: ZodError, connection refused, P2024, PROBE_TIMEOUT, etc.

Without wrangler tail, the health response's status: "down" is the only signal; with it, you get the full stack trace.
The Smoke Test Endpoint as the Canonical Post-Deploy Check
After every production deploy, run:
    curl -s https://<your-worker>.workers.dev/api/health/db-smoke | jq .

This endpoint (GET /api/health/db-smoke) runs current_database(), version(), now(), and COUNT(*) on information_schema.tables to verify:
- The Hyperdrive connection reaches Neon
- The correct database is selected
- The public schema has tables (migrations have run)
- The query latency is reasonable
If it returns ok: false, the error field (redacted of any credentials) will identify the failure layer immediately — no wrangler tail required.
Note on Future Sentry Integration
The current hardening captures error_code and error_message in the health response. A future improvement is to emit these as Sentry events via withSentryWorker() so production failures are automatically alerted rather than requiring manual health checks. See the Sentry integration docs for details.