mirror of
https://github.com/xCyanGrizzly/DragonsStash.git
synced 2026-06-13 12:41:16 +00:00
feat(worker): per-account safeguards for second-account upload failures
Driven by a real production case: secondary account was attached to 17
source channels but ingesting only ~2-3 archives per cycle. Log analysis
showed five distinct issues that this commit addresses.
1. Auto-retry cap (WORKER_MAX_SKIP_ATTEMPTS, default 5)
processArchiveSets now filters out SkippedPackage rows whose
attemptCount has reached the cap. Removing them from the working
list means they are not tracked in minFailedId, so the watermark
cap from d99a506 does not pin progress below them anymore. A bad
file no longer blocks the rest of the channel forever; the user
can manually retry via the UI to reset the count.
2. Account phone in error messages
Every SkippedPackage row and SystemNotification produced from a
failure is now prefixed with [<phone>] in errorMessage / message,
and the JSON context includes accountPhone. When two accounts
share a source channel and only one is failing, the UI tells you
which one.
3. Explicit getChat for destination at run start
loadChats only loads main/archive/folder chat lists. If an account
archived or moved the destination chat, sendMessage failed silently
per-archive. Now we getChat the destination once per cycle; on
failure we record a SystemNotification and skip the account's
entire ingestion cycle (no point downloading what we can't upload).
4. Retry on transient Telegram server errors
The "Turnbase Delivery Folder.7z" failure on the secondary and
"10. Kingdom of the Depth.part1.rar" on the main were both
"Internal Server Error during file upload" — a TG-side hiccup, not
a stall or FLOOD_WAIT. These now retry up to MAX_UPLOAD_RETRIES
with linear backoff (15s, 30s, 45s + jitter) before giving up.
5. Channel-access-lost notification
"Iridium 2 w/ Add-ons [Completed]" has been throwing
"Can't access the chat" every cycle for the secondary. The worker
now surfaces a CHANNEL_ACCESS_LOST notification (deduped to once per
24h per channel/account) so the admin sees it and can re-join or
unlink the channel instead of just losing visibility into the loop.
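
As a rough illustration of item 1, the sketch below shows how dropping capped rows before computing the lowest failed id lets the watermark move past a permanently bad file. The `SkippedPackage` shape, the `retryable` and `minFailedId` helpers, and the `MAX_SKIP_ATTEMPTS` constant are hypothetical stand-ins, not the worker's actual code; only the cap default of 5 and the attemptCount field come from the message above.

```typescript
// Hypothetical row shape; the real SkippedPackage model likely has more fields.
interface SkippedPackage {
  messageId: number;
  attemptCount: number;
}

// Assumed to mirror WORKER_MAX_SKIP_ATTEMPTS (default 5 per the commit message).
const MAX_SKIP_ATTEMPTS = 5;

// Keep only rows that still have automatic retries left.
function retryable(
  rows: SkippedPackage[],
  cap: number = MAX_SKIP_ATTEMPTS,
): SkippedPackage[] {
  return rows.filter((row) => row.attemptCount < cap);
}

// Lowest message id among the rows being tracked, or null if there are none.
// In this sketch the watermark is assumed never to advance past this id.
function minFailedId(rows: SkippedPackage[]): number | null {
  return rows.length === 0
    ? null
    : Math.min(...rows.map((row) => row.messageId));
}

const skipped: SkippedPackage[] = [
  { messageId: 100, attemptCount: 5 }, // reached the cap, no longer retried
  { messageId: 250, attemptCount: 2 }, // still eligible for auto-retry
];

const working = retryable(skipped);
const pinBefore = minFailedId(skipped); // pinned by the capped row
const pinAfter = minFailedId(working);  // pinned only by the retryable row
```

With the capped row filtered out, the pin moves from message 100 to message 250, so archives between the two are no longer held back by a file that will not succeed without manual intervention.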
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -136,6 +136,28 @@ async function sendWithRetry(
        );
      }

      // Transient Telegram server-side error (HTTP 5xx returned via
      // updateMessageSendFailed). These are NOT FLOOD_WAIT, NOT stalls — just
      // TG having a bad moment. They typically resolve on a short backoff, so
      // retry up to MAX_UPLOAD_RETRIES with linear backoff before giving up.
      const lowerMsg = errMsg.toLowerCase();
      const isTransientServerError =
        lowerMsg.includes("internal server error") ||
        lowerMsg.includes("internal error") ||
        lowerMsg.includes("server error") ||
        lowerMsg.includes("bad gateway") ||
        lowerMsg.includes("service unavailable") ||
        lowerMsg.includes("gateway timeout");
      if (isTransientServerError && !isLastAttempt) {
        const backoffMs = 15_000 * (attempt + 1) + Math.random() * 5_000;
        log.warn(
          { fileName, attempt: attempt + 1, maxRetries: MAX_UPLOAD_RETRIES, backoffMs: Math.round(backoffMs) },
          `Transient Telegram server error — retrying after backoff`
        );
        await sleep(backoffMs);
        continue;
      }

      throw err;
    }
  }

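
To make the retry schedule from the hunk above easier to check, here is a minimal standalone sketch of the same classification and backoff arithmetic. The function names, the `TRANSIENT_MARKERS` array, and the injectable `random` parameter are assumptions added for illustration; the substrings and the `15_000 * (attempt + 1)` formula with up to 5 s of jitter are taken from the diff.

```typescript
// Substrings copied from the diff; matching is case-insensitive.
const TRANSIENT_MARKERS: string[] = [
  "internal server error",
  "internal error",
  "server error",
  "bad gateway",
  "service unavailable",
  "gateway timeout",
];

// Hypothetical helper: true if the message looks like a server-side 5xx.
function isTransientServerError(errMsg: string): boolean {
  const lower = errMsg.toLowerCase();
  return TRANSIENT_MARKERS.some((marker) => lower.includes(marker));
}

// Linear backoff with jitter. `attempt` is zero-based, as in the diff.
// `random` is injectable (an assumption of this sketch) so the schedule
// can be checked deterministically.
function backoffMs(
  attempt: number,
  random: () => number = Math.random,
): number {
  return 15_000 * (attempt + 1) + random() * 5_000;
}

// Base delays with jitter pinned to zero: 15 s, 30 s, 45 s.
const schedule = [0, 1, 2].map((attempt) => backoffMs(attempt, () => 0));
```

Note that any message containing "internal server error" also contains "server error", so the first marker is subsumed by the third; keeping both is harmless and arguably documents intent.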