Commit Graph

11 Commits

Author SHA1 Message Date
379bf246cd feat(worker): per-account safeguards for second-account upload failures
Driven by a real production case: secondary account was attached to 17
source channels but ingesting only ~2-3 archives per cycle. Log analysis
showed five distinct issues that this commit addresses.

1. Auto-retry cap (WORKER_MAX_SKIP_ATTEMPTS, default 5)
   processArchiveSets now filters out SkippedPackage rows whose
   attemptCount has reached the cap. Removing them from the working
   list means they are not tracked in minFailedId, so the watermark
   cap from d99a506 does not pin progress below them anymore. A bad
   file no longer blocks the rest of the channel forever; the user
   can manually retry via the UI to reset the count.

2. Account phone in error messages
   Every SkippedPackage row and SystemNotification produced from a
   failure is now prefixed with [<phone>] in errorMessage / message,
   and the JSON context includes accountPhone. When two accounts
   share a source channel and only one is failing, the UI tells you
   which one.

3. Explicit getChat for destination at run start
   loadChats only loads main/archive/folder chat lists. If an account
   archived or moved the destination chat, sendMessage failed silently
   per-archive. Now we getChat the destination once per cycle; on
   failure we record a SystemNotification and skip the account's
   entire ingestion cycle (no point downloading what we can't upload).

4. Retry on transient Telegram server errors
   The "Turnbase Delivery Folder.7z" failure on the secondary and
   "10. Kingdom of the Depth.part1.rar" on the main were both
   "Internal Server Error during file upload" — a TG-side hiccup, not
   a stall or FLOOD_WAIT. These now retry up to MAX_UPLOAD_RETRIES
   with linear backoff (15s, 30s, 45s + jitter) before giving up.

5. Channel-access-lost notification
   "Iridium 2 w/ Add-ons [Completed]" has been throwing
   "Can't access the chat" every cycle for the secondary. The worker
   now surfaces a CHANNEL_ACCESS_LOST notification (deduped to once per
   24h per channel/account) so the admin sees it and can re-join or
   unlink the channel instead of just losing visibility into the loop.
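
A minimal sketch of the retry in item 4, assuming MAX_UPLOAD_RETRIES is 3
(inferred from the three delays listed) and a jitter of up to 2s. The helper
names backoffDelayMs, isTransientUploadError and uploadWithRetry, and the
string match on the error text, are hypothetical and not taken from the
worker source.

```typescript
// Assumed values: the commit names MAX_UPLOAD_RETRIES but not its value,
// and does not state the jitter range.
const MAX_UPLOAD_RETRIES = 3;
const BASE_DELAY_MS = 15_000;
const MAX_JITTER_MS = 2_000;

// Linear backoff: 15s, 30s, 45s for attempts 1, 2, 3, plus random jitter.
function backoffDelayMs(attempt: number, random: () => number = Math.random): number {
  return attempt * BASE_DELAY_MS + Math.floor(random() * MAX_JITTER_MS);
}

// Assumed classifier: only the error text quoted in the commit is transient.
function isTransientUploadError(err: unknown): boolean {
  return err instanceof Error && err.message.includes("Internal Server Error");
}

async function uploadWithRetry<T>(
  upload: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await upload();
    } catch (err) {
      // Non-transient errors and exhausted retries propagate to the caller.
      if (!isTransientUploadError(err) || attempt > MAX_UPLOAD_RETRIES) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

With these assumptions an upload is attempted at most four times (one
initial try plus three retries) before the error reaches the caller.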

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 23:07:57 +02:00
26e2cba69d fix: buffer upload confirmation events to close tempMsgId race
sendMessage resolves with the temporary message ID inside a .then()
microtask. If TDLib emits updateMessageSendSucceeded synchronously
(cached file, already-known media), the event handler fires while
tempMsgId is still null — the success is dropped and the promise hangs
until the 15-min upload timeout fires.

Buffer success/failure events that arrive before tempMsgId is known,
then replay them in the .then() callback once tempMsgId is set.
Extract completeWithSuccess / completeWithFailure helpers so the
resolution path is shared between live events and replayed events.
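
The buffering can be sketched roughly as below. Only completeWithSuccess and
completeWithFailure are names from this commit; PendingUpload, SendEvent,
onEvent, setTempMsgId and getResult are hypothetical, and the event shape is
a simplified stand-in for the TDLib updates, not their real schema.

```typescript
// Simplified stand-in for updateMessageSendSucceeded / updateMessageSendFailed.
type SendEvent =
  | { kind: "success"; oldMessageId: number; finalMessageId: number }
  | { kind: "failure"; oldMessageId: number; error: string };

type UploadResult = { ok: true; messageId: number } | { ok: false; error: string };

class PendingUpload {
  private tempMsgId: number | null = null;
  private buffered: SendEvent[] = [];
  private result: UploadResult | null = null;

  getResult(): UploadResult | null {
    return this.result;
  }

  // Called from the update handler for every send success/failure event.
  onEvent(ev: SendEvent): void {
    if (this.tempMsgId === null) {
      this.buffered.push(ev); // temporary ID not known yet: keep for replay
      return;
    }
    this.dispatch(ev);
  }

  // Called from the sendMessage .then() callback once the temporary ID is known.
  setTempMsgId(id: number): void {
    this.tempMsgId = id;
    const pending = this.buffered;
    this.buffered = [];
    for (const ev of pending) this.dispatch(ev);
  }

  // Shared resolution path for live and replayed events.
  private dispatch(ev: SendEvent): void {
    if (this.result !== null || ev.oldMessageId !== this.tempMsgId) return;
    if (ev.kind === "success") this.completeWithSuccess(ev.finalMessageId);
    else this.completeWithFailure(ev.error);
  }

  private completeWithSuccess(messageId: number): void {
    this.result = { ok: true, messageId };
  }

  private completeWithFailure(error: string): void {
    this.result = { ok: false, error };
  }
}
```

Events for other messages are buffered too, but they are discarded on replay
because their oldMessageId does not match the temporary ID.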

This race matters more now that stalls fail fast — without the buffer,
a fast-completing upload could still hang for 15 min before recovery.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 22:48:38 +02:00
84cc8d995b fix: fail fast on upload stall instead of retrying on broken client
Previously a single TDLib event-stream degradation cost ~45 minutes
per archive: 3 retries x 15-min minimum timeout, all on the same
broken client. The retries had no chance of succeeding because the
underlying issue (missing updateMessageSendSucceeded events) is a
client-level problem, not a transient send failure.

Now the first stall throws UploadStallError immediately. The caller
in processArchiveSets already recreates the TDLib client on
UploadStallError, so we drop from ~45 min recovery to ~15 min
(one timeout cycle) per stalled archive.

The stalled set is recorded in SkippedPackage; with the watermark
cap from d99a506 it gets retried on the next ingestion cycle with
a fresh client.

FLOOD_WAIT retries inside sendWithRetry are unchanged — those handle
legitimate rate limiting, not stalls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 22:47:08 +02:00
d6c82ede1e fix: auto-recover from TDLib upload stalls by recreating client
When TDLib's event stream degrades, uploads complete (bytes sent) but
confirmations never arrive. Previously the worker retried 3x with the
same broken client, wasting 60+ min per archive and holding the mutex.

- Add UploadStallError class to distinguish stalls from other failures
- Reduce stall detection timeout from 5min to 3min (faster detection)
- Recreate TDLib client after consecutive upload stalls instead of
  retrying on the same degraded connection
- Add forceReleaseMutex() to prevent cascade failures when one account
  blocks others via stuck mutex after cycle timeout
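
One way the stall handling above could look, as a sketch only.
UploadStallError is named in this commit; StallTracker and its threshold
parameter are hypothetical, since the commit does not say how many
consecutive stalls trigger a client recreation.

```typescript
// Marks a stall (bytes sent, confirmation never received) as distinct
// from other upload failures.
class UploadStallError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "UploadStallError";
    // Keeps instanceof working when compiled to older JS targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical helper: counts consecutive stalls and reports when the
// caller should recreate the client.
class StallTracker {
  private consecutive = 0;
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  recordSuccess(): void {
    this.consecutive = 0;
  }

  // Returns true when the client should be recreated.
  recordFailure(err: unknown): boolean {
    if (!(err instanceof UploadStallError)) {
      this.consecutive = 0; // a different failure breaks the streak
      return false;
    }
    this.consecutive += 1;
    if (this.consecutive < this.threshold) return false;
    this.consecutive = 0;
    return true;
  }
}
```

A caller would check the return value of recordFailure in its catch block
and, when it is true, close the current client and create a new one before
moving on to the next archive.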

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 18:02:42 +02:00
718007446f feat: fix multi-part archive forwarding and add kickstarter package linking
Multi-part send fix:
- Add destMessageIds BigInt[] to Package schema with backfill migration
- Worker uploadToChannel now returns all message IDs, stored in DB
- Bot forwards all parts of multi-part archives (not just the first)
- Add retry logic for upload rate limits (429) and download stalls

Kickstarter package linking:
- Add package search/linking queries and API routes
- Add PackageLinkerDialog with search + checkbox selection
- Add "Link Packages" and "Send All" actions to kickstarter table
- Add sendAllKickstarterPackages server action

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 18:11:35 +01:00
d6386209be fix: improve download/upload reliability and fix FILE_PARTS_INVALID
- Add downloadStarted flag to prevent false "stopped unexpectedly" errors
  when TDLib emits initial updateFile before download is active
- Add 5-minute stall detection for both downloads and uploads
- Reduce max split part size from 2GiB to 1950MiB to stay under
  Telegram's internal upload part count limits
- Increase timeouts from max(10min, 15min/GB) to max(15min, 20min/GB)
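
The new limits can be expressed as a small calculation. This is a sketch
under assumptions: "GB" is read as GiB, and the function names
transferTimeoutMs and splitPartCount are hypothetical, not the worker's own.

```typescript
const MINUTE_MS = 60_000;
const MIB = 1024 ** 2;
const GIB = 1024 ** 3;

// New maximum size of a single split part (was 2 GiB).
const MAX_PART_BYTES = 1950 * MIB;

// max(15min, 20min/GB): small files get the 15-minute floor,
// larger files get 20 minutes per GiB.
function transferTimeoutMs(sizeBytes: number): number {
  const perSizeMs = Math.ceil((sizeBytes / GIB) * 20 * MINUTE_MS);
  return Math.max(15 * MINUTE_MS, perSizeMs);
}

// Number of parts needed so that no part exceeds MAX_PART_BYTES.
function splitPartCount(sizeBytes: number): number {
  return Math.max(1, Math.ceil(sizeBytes / MAX_PART_BYTES));
}
```

Under these assumptions a 3 GiB archive gets a 60-minute timeout, and a file
of exactly 2 GiB, which previously fit in one part, is now split into two.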

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 21:40:00 +01:00
fe28c31b9e fix: improve worker error handling and reliability
1. Distinguish failure reasons: inspect error messages to label skipped
   packages as DOWNLOAD_FAILED, UPLOAD_FAILED, or EXTRACT_FAILED
   instead of catch-all DOWNLOAD_FAILED.

2. Detect orphaned uploads: before uploading, check if the same content
   hash already has a successful upload on the destination channel. Reuse
   the existing message ID instead of re-uploading (prevents duplicates
   when worker crashed between upload and DB write).

3. Increase timeouts: download from max(5min, GB*10min) to
   max(10min, GB*15min), upload from GB*10min to GB*15min.
   Prevents premature timeouts on slow connections.
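
A rough sketch of the labelling in item 1. The three labels come from this
commit; the function name classifyFailure and the matched substrings are
assumptions, since the commit does not list which error messages map to
which label.

```typescript
type SkipReason = "DOWNLOAD_FAILED" | "UPLOAD_FAILED" | "EXTRACT_FAILED";

// Inspects the error message to pick a label. The patterns below are
// illustrative guesses, not the worker's actual matching rules.
function classifyFailure(err: unknown): SkipReason {
  const message = (err instanceof Error ? err.message : String(err)).toLowerCase();
  if (/upload|sendmessage/.test(message)) return "UPLOAD_FAILED";
  if (/extract|unzip|unrar/.test(message)) return "EXTRACT_FAILED";
  // Assumed fallback: keep the previous catch-all label.
  return "DOWNLOAD_FAILED";
}
```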

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 02:37:23 +01:00
xCyanGrizzly
d7bbb7587e Update tg issues 2026-03-16 16:51:30 +01:00
2763de2711 Fix multiple issues 2026-03-07 21:33:40 +01:00
xCyanGrizzly
4d0df6b1a4 add TG integration 2026-03-02 11:57:17 +01:00
xCyanGrizzly
b427193d17 feat: add Telegram integration with forum topic support and creator tracking
Adds full Telegram ZIP ingestion pipeline: TDLib worker service scans source
channels for archive files, deduplicates by content hash, extracts metadata,
uploads to archive channel, and indexes in Postgres. Forum supergroups are
scanned per-topic with topic names used as creator. Filename-based creator
extraction (e.g. "Mammoth Factory - 2026-01.zip") serves as fallback.
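
The filename fallback could be sketched as follows, assuming the creator is
whatever precedes the first " - " separator, as in the example above. The
function names creatorFromFilename and resolveCreator are hypothetical.

```typescript
// Extracts a creator name from filenames shaped like "<creator> - <rest>.<ext>".
// Returns null when the filename does not follow that pattern.
function creatorFromFilename(filename: string): string | null {
  const stem = filename.replace(/\.[^.]+$/, ""); // drop the last extension
  const separatorIndex = stem.indexOf(" - ");
  if (separatorIndex <= 0) return null;
  const creator = stem.slice(0, separatorIndex).trim();
  return creator.length > 0 ? creator : null;
}

// The forum topic name wins; the filename is only consulted as a fallback.
function resolveCreator(topicName: string | null, filename: string): string | null {
  return topicName?.trim() || creatorFromFilename(filename);
}
```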

Includes admin UI for managing accounts/channels, simplified account setup
(API credentials via env vars), auth code/password submission dialog,
package browser with creator column, and live ingestion activity tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 16:02:06 +01:00