Driven by a real production case: a secondary account was attached to 17
source channels but was ingesting only ~2-3 archives per cycle. Log analysis
surfaced the five issues listed below, all addressed by this commit.
1. Auto-retry cap (WORKER_MAX_SKIP_ATTEMPTS, default 5)
processArchiveSets now filters out SkippedPackage rows whose
attemptCount has reached the cap. Removing them from the working
list means they are not tracked in minFailedId, so the watermark
cap from d99a506 does not pin progress below them anymore. A bad
file no longer blocks the rest of the channel forever; the user
can manually retry via the UI to reset the count.
2. Account phone in error messages
Every SkippedPackage row and SystemNotification produced from a
failure is now prefixed with [<phone>] in errorMessage / message,
and the JSON context includes accountPhone. When two accounts
share a source channel and only one is failing, the UI tells you
which one.
3. Explicit getChat for destination at run start
loadChats only loads main/archive/folder chat lists. If an account
archived or moved the destination chat, sendMessage failed silently
per-archive. Now we getChat the destination once per cycle; on
failure we record a SystemNotification and skip the account's
entire ingestion cycle (no point downloading what we can't upload).
4. Retry on transient Telegram server errors
The "Turnbase Delivery Folder.7z" failure on the secondary and
"10. Kingdom of the Depth.part1.rar" on the main were both
"Internal Server Error during file upload" — a TG-side hiccup, not
a stall or FLOOD_WAIT. These now retry up to MAX_UPLOAD_RETRIES
with linear backoff (15s, 30s, 45s + jitter) before giving up.
5. Channel-access-lost notification
"Iridium 2 w/ Add-ons [Completed]" has been throwing
"Can't access the chat" every cycle for the secondary. The worker
now surfaces a CHANNEL_ACCESS_LOST notification (deduped to once per
24h per channel/account) so the admin sees it and can re-join or
unlink the channel instead of the failure repeating unnoticed every cycle.
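Items 1 and 4 above can be sketched roughly as follows. This is a minimal
illustration, not the worker's actual code: the SkippedRow shape, the
sourceMessageId field, the helper names, the exact watermark rule, and the
jitter range are all assumptions. Only attemptCount, minFailedId, and
WORKER_MAX_SKIP_ATTEMPTS come from the text.

```typescript
// Hypothetical row shape; only attemptCount is named in the commit message.
interface SkippedRow {
  sourceMessageId: number; // assumed: the ID the watermark is compared against
  attemptCount: number;
}

const WORKER_MAX_SKIP_ATTEMPTS = 5; // default stated in item 1

// Item 1: rows that reached the cap are dropped from the working list.
function retryableSkips(rows: SkippedRow[], cap = WORKER_MAX_SKIP_ATTEMPTS): SkippedRow[] {
  return rows.filter((row) => row.attemptCount < cap);
}

// Assumed watermark rule: never advance past the lowest still-retryable failure.
function nextWatermark(highestSeenId: number, retryable: SkippedRow[]): number {
  if (retryable.length === 0) return highestSeenId;
  const minFailedId = Math.min(...retryable.map((row) => row.sourceMessageId));
  return Math.min(highestSeenId, minFailedId - 1);
}

// Item 4: linear backoff of 15s per attempt, plus jitter (range is assumed).
function uploadBackoffMs(attempt: number, jitterMs = Math.random() * 1000): number {
  return attempt * 15_000 + jitterMs;
}
```

With a capped row at ID 100 and a retryable row at ID 200, the watermark can
advance to 199; without the filter it would stay pinned at 99.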
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sendMessage resolves with the temporary message ID inside a .then()
microtask. If TDLib emits updateMessageSendSucceeded synchronously
(cached file, already-known media), the event handler fires while
tempMsgId is still null — the success is dropped and the promise hangs
until the 15-min upload timeout fires.
Buffer success/failure events that arrive before tempMsgId is known,
then replay them in the .then() callback once tempMsgId is set.
Extract completeWithSuccess / completeWithFailure helpers so the
resolution path is shared between live events and replayed events.
This race matters more now that stalls fail fast — without the buffer,
a fast-completing upload could still hang for 15 min before recovery.
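The buffering approach described above could look something like this
sketch. The PendingSend class, the SendEvent shape, and the result field
are hypothetical stand-ins; only tempMsgId, completeWithSuccess, and
completeWithFailure are names from this commit, and the real code resolves
a promise rather than storing a result.

```typescript
// Hypothetical, simplified view of the two TDLib outcomes we care about.
type SendEvent =
  | { kind: "success"; oldMessageId: number; finalMessageId: number }
  | { kind: "failure"; oldMessageId: number; error: string };

class PendingSend {
  private tempMsgId: number | null = null;
  private buffered: SendEvent[] = [];
  result: { ok: true; messageId: number } | { ok: false; error: string } | null = null;

  // Called from the update handler; may run before tempMsgId is known.
  onEvent(event: SendEvent): void {
    if (this.tempMsgId === null) {
      this.buffered.push(event); // too early to match, keep it for replay
      return;
    }
    this.dispatch(event);
  }

  // Called from the .then() callback once sendMessage has resolved.
  setTempMsgId(id: number): void {
    this.tempMsgId = id;
    const early = this.buffered;
    this.buffered = [];
    early.forEach((event) => this.dispatch(event));
  }

  // Shared by live and replayed events; ignores other messages and repeats.
  private dispatch(event: SendEvent): void {
    if (this.result !== null || event.oldMessageId !== this.tempMsgId) return;
    if (event.kind === "success") this.completeWithSuccess(event.finalMessageId);
    else this.completeWithFailure(event.error);
  }

  private completeWithSuccess(messageId: number): void {
    this.result = { ok: true, messageId };
  }

  private completeWithFailure(error: string): void {
    this.result = { ok: false, error };
  }
}
```

An event delivered before setTempMsgId is held in the buffer and takes
effect as soon as the ID arrives, so the early success is no longer dropped.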
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously a single TDLib event-stream degradation cost ~45 minutes
per archive: 3 retries x 15-min minimum timeout, all on the same
broken client. The retries had no chance of succeeding because the
underlying issue (missing updateMessageSendSucceeded events) is a
client-level problem, not a transient send failure.
Now the first stall throws UploadStallError immediately. The caller
in processArchiveSets already recreates the TDLib client on
UploadStallError, so we drop from ~45 min recovery to ~15 min
(one timeout cycle) per stalled archive.
The stalled set is recorded in SkippedPackage; with the watermark
cap from d99a506 it gets retried on the next ingestion cycle with
a fresh client.
FLOOD_WAIT retries inside sendWithRetry are unchanged — those handle
legitimate rate limiting, not stalls.
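As a rough sketch of the decision this commit changes, the error handling
can be pictured as a small classifier. The UploadAction type and the
actionForUploadError helper are hypothetical, and matching FLOOD_WAIT by
substring is an assumption about how the error text looks; only
UploadStallError and FLOOD_WAIT are taken from the commit text.

```typescript
class UploadStallError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "UploadStallError";
  }
}

// Hypothetical set of outcomes for a failed upload attempt.
type UploadAction = "recreate-client" | "retry-same-client" | "give-up";

// Checks the error name rather than instanceof, so the sketch also behaves
// when compiled for older targets where subclassing Error is unreliable.
function isUploadStall(err: Error): boolean {
  return err.name === "UploadStallError";
}

function actionForUploadError(err: Error): UploadAction {
  // A stall is a client-level problem: retrying on the same client cannot help.
  if (isUploadStall(err)) return "recreate-client";
  // Rate limiting is legitimate and transient: wait and retry as before.
  if (/FLOOD_WAIT/.test(err.message)) return "retry-same-client";
  return "give-up";
}
```

The point of the split is the cost: three same-client retries at a 15-minute
minimum timeout add up to ~45 minutes, while failing on the first stall costs
one timeout cycle.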
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When TDLib's event stream degrades, uploads complete (bytes sent) but
confirmations never arrive. Previously the worker retried 3x with the
same broken client, wasting 60+ min per archive and holding the mutex.
- Add UploadStallError class to distinguish stalls from other failures
- Reduce stall detection timeout from 5min to 3min (faster detection)
- Recreate TDLib client after consecutive upload stalls instead of
retrying on the same degraded connection
- Add forceReleaseMutex() to prevent cascade failures when one account
blocks others via stuck mutex after cycle timeout
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Multi-part send fix:
- Add destMessageIds BigInt[] to Package schema with backfill migration
- Worker uploadToChannel now returns all message IDs, stored in DB
- Bot forwards all parts of multi-part archives (not just the first)
- Add retry logic for upload rate limits (429) and download stalls
Kickstarter package linking:
- Add package search/linking queries and API routes
- Add PackageLinkerDialog with search + checkbox selection
- Add "Link Packages" and "Send All" actions to kickstarter table
- Add sendAllKickstarterPackages server action
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add downloadStarted flag to prevent false "stopped unexpectedly" errors
when TDLib emits initial updateFile before download is active
- Add 5-minute stall detection for both downloads and uploads
- Reduce max split part size from 2GiB to 1950MiB to stay under
Telegram's internal upload part count limits
- Increase timeouts from max(10min, 15min/GB) to max(15min, 20min/GB)
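The numbers in the list above can be worked through with a short sketch.
The function names are hypothetical, and reading "GB" as GiB (1024^3
bytes) is an assumption; the 1950MiB part size, the 5-minute stall window,
and the max(15min, 20min/GB) formula come from this commit.

```typescript
const MIB = 1024 * 1024;
const GIB = 1024 * MIB;
const MINUTE_MS = 60_000;
const MAX_PART_BYTES = 1950 * MIB; // new split limit, down from 2GiB

// Number of split parts needed for a file of the given size.
function partCount(fileBytes: number): number {
  return Math.max(1, Math.ceil(fileBytes / MAX_PART_BYTES));
}

// Timeout = max(15min, 20min per GB); "GB" is treated as GiB here (assumed).
function transferTimeoutMs(fileBytes: number): number {
  return Math.max(15 * MINUTE_MS, (fileBytes / GIB) * 20 * MINUTE_MS);
}

// A transfer counts as stalled when no progress was seen for the stall window.
function isStalled(lastProgressAtMs: number, nowMs: number, stallMs = 5 * MINUTE_MS): boolean {
  return nowMs - lastProgressAtMs >= stallMs;
}
```

For example, a 4GiB archive splits into 3 parts and a 2GiB transfer gets a
40-minute timeout, while anything under 0.75GiB stays at the 15-minute floor.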
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Distinguish failure reasons: inspect error messages to label skipped
packages as DOWNLOAD_FAILED, UPLOAD_FAILED, or EXTRACT_FAILED
instead of the catch-all DOWNLOAD_FAILED.
2. Detect orphaned uploads: before uploading, check if the same content
hash already has a successful upload on the destination channel. Reuse
the existing message ID instead of re-uploading (prevents duplicates
when the worker crashes between upload and DB write).
3. Increase timeouts: download from max(5min, GB*10min) to
max(10min, GB*15min), upload from GB*10min to GB*15min.
Prevents premature timeouts on slow connections.
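Item 1 above amounts to a message-based classifier along these lines. The
keywords and the classifyFailure name are assumptions made for
illustration; the three reason labels are from this commit, and falling
back to DOWNLOAD_FAILED mirrors the previous catch-all behavior.

```typescript
type SkipReason = "DOWNLOAD_FAILED" | "UPLOAD_FAILED" | "EXTRACT_FAILED";

// Keyword lists are illustrative guesses, not the worker's actual patterns.
function classifyFailure(errorMessage: string): SkipReason {
  const text = errorMessage.toLowerCase();
  if (/upload|sendmessage/.test(text)) return "UPLOAD_FAILED";
  if (/extract|unzip|corrupt/.test(text)) return "EXTRACT_FAILED";
  // Anything unrecognized keeps the old label.
  return "DOWNLOAD_FAILED";
}
```

Checking upload keywords first is a design choice in this sketch: an upload
error that happens to mention the archive name should still count as an
upload failure.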
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a full Telegram ZIP ingestion pipeline: the TDLib worker service scans
source channels for archive files, deduplicates by content hash, extracts
metadata, uploads to the archive channel, and indexes in Postgres. Forum
supergroups are scanned per-topic with topic names used as the creator.
Filename-based creator extraction (e.g. "Mammoth Factory - 2026-01.zip")
serves as a fallback.
Includes admin UI for managing accounts/channels, simplified account setup
(API credentials via env vars), auth code/password submission dialog,
package browser with creator column, and live ingestion activity tracking.
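The creator fallback described above might be sketched like this. Both
function names and the " - " separator rule are assumptions inferred from
the single example filename; the actual extraction rules may be broader.

```typescript
// Takes the text before the first " - " separator as the creator.
// Returns null when the filename does not follow that pattern.
function creatorFromFilename(filename: string): string | null {
  const match = /^(.+?)\s+-\s+.+\.[A-Za-z0-9]+$/.exec(filename);
  return match ? match[1].trim() : null;
}

// Topic name wins when present; the filename is only a fallback.
function resolveCreator(topicName: string | null, filename: string): string | null {
  if (topicName !== null && topicName.trim() !== "") return topicName.trim();
  return creatorFromFilename(filename);
}
```

Under these assumptions, "Mammoth Factory - 2026-01.zip" yields
"Mammoth Factory", and a plain name such as "archive.zip" yields no creator.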
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>