fix: fail fast on upload stall instead of retrying on broken client

Previously a single TDLib event-stream degradation cost ~45 minutes
per archive: 3 retries x 15-min minimum timeout, all on the same
broken client. The retries had no chance of succeeding because the
underlying issue (missing updateMessageSendSucceeded events) is a
client-level problem, not a transient send failure.

Now the first stall throws UploadStallError immediately. The caller
in processArchiveSets already recreates the TDLib client on
UploadStallError, so we drop from ~45 min recovery to ~15 min
(one timeout cycle) per stalled archive.
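
A minimal TypeScript sketch of the caller-side recovery described above.
processArchives, UploadClient, createClient and recordSkipped are
hypothetical stand-ins for processArchiveSets and the TDLib client
wrapper, which this diff does not show. Only the UploadStallError name
comes from the commit.

```typescript
// Hypothetical stand-ins: the real processArchiveSets and TDLib client
// wrapper are not part of this diff.
class UploadStallError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "UploadStallError";
    // Keep instanceof working when compiling to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface UploadClient {
  upload(archive: string): Promise<void>;
  close(): Promise<void>;
}

async function processArchives(
  archives: string[],
  createClient: () => Promise<UploadClient>,
  recordSkipped: (archive: string) => void,
): Promise<void> {
  let client = await createClient();
  try {
    for (const archive of archives) {
      try {
        await client.upload(archive);
      } catch (err) {
        // Anything other than a stall is not treated as a client-health problem.
        if (!(err instanceof UploadStallError)) throw err;
        // Event stream presumed degraded: skip this archive for now and
        // continue the remaining ones on a fresh client.
        recordSkipped(archive);
        const stale = client;
        client = await createClient();
        await stale.close();
      }
    }
  } finally {
    // Close whichever client is current when the loop ends or throws.
    await client.close();
  }
}
```

Creating the replacement client before closing the stale one means a
failed reconnect still leaves the finally block with exactly one open
client to close.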

The stalled set is recorded in SkippedPackage; with the watermark
cap from d99a506 it gets retried on the next ingestion cycle with
a fresh client.
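
One way a watermark cap can guarantee that retry, sketched under
assumptions: sets are ordered by a numeric ID, and the watermark marks
the highest ID already handled. The SkippedPackage fields and
nextWatermark are hypothetical; the actual change in d99a506 may work
differently.

```typescript
// Hypothetical shape: the real SkippedPackage record and the d99a506
// watermark logic are not shown in this commit.
interface SkippedPackage {
  setId: number; // assumed: sets are keyed by an increasing message ID
  reason: string;
}

// Returns the watermark to persist after a cycle. Sets with IDs at or
// below the watermark are not scanned again on the next cycle.
function nextWatermark(
  previous: number,
  processedIds: number[],
  skipped: SkippedPackage[],
): number {
  const highestProcessed = processedIds.reduce(
    (max, id) => Math.max(max, id),
    previous,
  );
  if (skipped.length === 0) return highestProcessed;
  const lowestSkipped = skipped.reduce(
    (min, s) => Math.min(min, s.setId),
    Number.POSITIVE_INFINITY,
  );
  // Cap just below the earliest skipped set so the next cycle re-scans it.
  return Math.min(highestProcessed, lowestSkipped - 1);
}
```

With a skipped set at 103 and processed sets up to 104, the watermark
stops at 102, so 104 is scanned again as well. This shape assumes that
re-processing an already-uploaded set is deduplicated downstream.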

FLOOD_WAIT retries inside sendWithRetry are unchanged — those handle
legitimate rate limiting, not stalls.
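
The resulting split between the two failure classes, as a hedged
sketch. retryUpload, MAX_UPLOAD_RETRIES = 3 and the "FLOOD_WAIT_<seconds>"
message format are assumptions for illustration; the real sendWithRetry
detects FLOOD_WAIT in code outside this hunk. The sleep function is
injected so the policy can be exercised without real delays.

```typescript
// Hypothetical sketch of the post-commit policy; names, constants and the
// FLOOD_WAIT message format are assumptions, not the real sendWithRetry.
class UploadStallError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "UploadStallError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

const MAX_UPLOAD_RETRIES = 3; // assumed value

// Assumed format "FLOOD_WAIT_<seconds>"; returns null for other errors.
function floodWaitSeconds(message: string): number | null {
  const match = /FLOOD_WAIT_(\d+)/.exec(message);
  return match ? Number(match[1]) : null;
}

async function retryUpload<T>(
  fileName: string,
  send: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await send();
    } catch (err) {
      const errMsg = err instanceof Error ? err.message : "";
      const waitSeconds = floodWaitSeconds(errMsg);
      // Rate limiting is transient: wait as requested, then try again.
      if (waitSeconds !== null && attempt + 1 < MAX_UPLOAD_RETRIES) {
        await sleep(waitSeconds * 1000);
        continue;
      }
      // Stall or timeout: no retry on this client; the caller replaces it.
      if (errMsg.includes("stalled") || errMsg.includes("timed out")) {
        throw new UploadStallError(`Upload stalled for ${fileName}: ${errMsg}`);
      }
      throw err;
    }
  }
}
```

A FLOOD_WAIT that outlasts the retry budget falls through and is
rethrown as-is, so it is never mistaken for a stall.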

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 22:47:08 +02:00
parent d99a506b10
commit 84cc8d995b

@@ -119,22 +119,20 @@ async function sendWithRetry(
         continue;
       }
-      // Stall or timeout — retry with a cooldown
+      // Stall or timeout — fail fast and let the caller recreate the TDLib
+      // client. Retrying on the same degraded event stream wastes ~15 min
+      // per attempt because the underlying issue (missing send-success
+      // events) is client-level, not transient. The set ends up in
+      // SkippedPackage and the caller's watermark cap ensures it gets
+      // retried next cycle on a fresh client.
       const errMsg = err instanceof Error ? err.message : "";
       if (errMsg.includes("stalled") || errMsg.includes("timed out")) {
-        if (!isLastAttempt) {
-          log.warn(
-            { fileName, attempt: attempt + 1, maxRetries: MAX_UPLOAD_RETRIES },
-            "Upload stalled/timed out — retrying"
-          );
-          await sleep(10_000);
-          continue;
-        }
-        // All stall retries exhausted — throw UploadStallError so the caller
-        // knows the TDLib client's event stream is likely degraded and can
-        // recreate the client before continuing.
+        log.warn(
+          { fileName, attempt: attempt + 1 },
+          "Upload stalled — failing fast so caller can recreate TDLib client"
+        );
         throw new UploadStallError(
-          `Upload stalled after ${MAX_UPLOAD_RETRIES} retries for ${fileName}`
+          `Upload stalled for ${fileName}: ${errMsg}`
         );
       }