55 Commits

4f6a6f0f75 feat(worker): forum-topic scan-skip + getForumTopic short-circuit
All checks were successful
continuous-integration/drone/push Build is passing
Mirror of the non-forum guards from 1a4bc6f, scoped to forum topics
inside the topic loop:

  - Top-of-topic-loop recency/backoff skip
  - getForumTopic short-circuit after the SkippedPackage retry pass
  - upsertTopicScanState for end-of-scan persistence (both the
    archives-found path and the no-archives path)

Same trulyIdle definition throughout: no archives this scan, no
failures this scan, no retryable SkippedPackage rows pending. Topics
with chronic failures stay out of backoff because their counter
never increments.
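A minimal sketch of the trulyIdle bookkeeping above. It assumes a hypothetical per-scope ScanOutcome shape; isTrulyIdle and nextScanState are illustrative names, not the worker's actual helpers. It shows why chronic failures never reach backoff: any failure resets the counter instead of incrementing it.

```typescript
// Hypothetical per-scope result of one scan (channel or forum topic).
interface ScanOutcome {
  archivesFound: number;    // archive sets seen this scan
  failures: number;         // sets that failed this scan
  retryablePending: number; // SkippedPackage rows still below the attempt cap
}

interface ScanStateUpdate {
  lastScanFoundArchives: boolean;
  consecutiveEmptyScans: number;
}

// trulyIdle: no archives, no failures, no retryable SkippedPackage rows pending.
function isTrulyIdle(o: ScanOutcome): boolean {
  return o.archivesFound === 0 && o.failures === 0 && o.retryablePending === 0;
}

// Increment the empty-scan counter only when truly idle. Any work, failure,
// or pending retry resets it, so failing scopes never reach the backoff threshold.
function nextScanState(prevEmptyScans: number, o: ScanOutcome): ScanStateUpdate {
  return {
    lastScanFoundArchives: o.archivesFound > 0 || o.retryablePending > 0,
    consecutiveEmptyScans: isTrulyIdle(o) ? prevEmptyScans + 1 : 0,
  };
}
```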

For MPE specifically (1,086 forum topics), per-cycle searchChatMessages
calls drop from ~1,086 to roughly the count of topics with new
activity in the last 5 minutes — typically <50.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 19:58:44 +02:00
1a4bc6f9f3 feat(worker): non-forum channel-scan-skip + getChat short-circuit
For non-forum channels in runWorkerForAccount, three guards:

  1. Top-of-loop recency/backoff skip — if recently scanned with no
     pending work, or in backoff and not its turn, skip entirely.
     Bypassed when retryable SkippedPackages exist.

  2. After the SkippedPackage retry pass, a getChat short-circuit —
     if TDLib's local cache says the channel's last_message.id <= our
     effective watermark, skip the paginated searchChatMessages.

  3. End-of-scan persists lastScannedAt + lastScanFoundArchives +
     consecutiveEmptyScans via the new upsertChannelScanState helper.
     trulyIdle requires: no archives, no failures, no retryable pending.

scheduler.ts exposes getCurrentCycle() so the backoff "every Nth cycle"
modulo can be applied.
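This is roughly how the top-of-loop guard and the "every Nth cycle" modulo could combine. It is a sketch only: shouldSkipScan, ScopeScanState and SkipConfig are hypothetical names. The config fields mirror the WORKER_* env vars from 6652fb8, and the cycle argument stands in for getCurrentCycle().

```typescript
interface ScopeScanState {
  lastScannedAt: Date | null;
  lastScanFoundArchives: boolean;
  consecutiveEmptyScans: number;
}

interface SkipConfig {
  recentWindowMs: number;   // WORKER_SKIP_RECENT_SCAN_WINDOW_MS
  backoffThreshold: number; // WORKER_EMPTY_SCAN_BACKOFF_THRESHOLD
  backoffEveryNth: number;  // WORKER_EMPTY_SCAN_BACKOFF_EVERY_NTH
}

// Returns true when the scope can be skipped entirely this cycle.
function shouldSkipScan(
  state: ScopeScanState,
  hasRetryablePending: boolean,
  cycle: number,
  now: Date,
  cfg: SkipConfig,
): boolean {
  // Failure retries are never skipped.
  if (hasRetryablePending) return false;
  // No recorded scan (new scope): always scan.
  if (state.lastScannedAt === null) return false;

  // Recently scanned and the last scan left nothing to revisit.
  const ageMs = now.getTime() - state.lastScannedAt.getTime();
  if (ageMs < cfg.recentWindowMs && !state.lastScanFoundArchives) return true;

  // Cold scope in backoff: only scan on every Nth cycle.
  if (state.consecutiveEmptyScans >= cfg.backoffThreshold) {
    return cycle % cfg.backoffEveryNth !== 0;
  }
  return false;
}
```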

Forum-topic branch lands in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 19:56:54 +02:00
c6b23715e8 feat(tdlib): add getChannelLastMessageId / getForumTopicLastMessageId
Both read the server-side last message ID from TDLib's local cache.
Used by the channel-scan-skip guard to short-circuit a paginated
searchChatMessages when last_message.id <= our watermark.

getForumTopic uses forum_topic_id (renamed from message_thread_id in
TDLib 1.8.64, same pattern as searchChatMessages / getForumTopics).

Returns null on any failure so the caller can fall back to scanning —
we'd rather waste a scan than miss new content.
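A sketch of the short-circuit decision. It assumes the helper hands back the cached last message ID as a bigint, or null on failure; both function names here are hypothetical. A null on either side always means "scan", which keeps the fail-open behaviour described above.

```typescript
// Wraps a cache lookup so any failure or missing field becomes null.
async function getLastMessageIdOrNull(
  fetchChat: () => Promise<{ last_message?: { id: number } | null }>,
): Promise<bigint | null> {
  try {
    const chat = await fetchChat();
    return chat.last_message ? BigInt(chat.last_message.id) : null;
  } catch {
    return null; // rather waste a scan than miss new content
  }
}

// True when the paginated searchChatMessages call can be skipped.
function canSkipPaginatedScan(lastMessageId: bigint | null, effectiveWatermark: bigint | null): boolean {
  if (lastMessageId === null) return false;      // no cached value: scan
  if (effectiveWatermark === null) return false; // never completed a scan: scan
  return lastMessageId <= effectiveWatermark;
}
```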

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 19:54:53 +02:00
3111d658f8 feat(db): add upsertChannelScanState / upsertTopicScanState helpers
Wraps the existing watermark write with the three new scan-state
columns from the previous commit. Single transaction, sets
lastScannedAt=NOW() server-side. Caller is responsible for computing
the trulyIdle bool and the new consecutiveEmptyScans value
(pre-increment vs reset).

Existing updateLastProcessedMessage / upsertTopicProgress are kept for
callers that don't need the new fields (the SkippedPackage retry pass,
which only adjusts the watermark).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 19:53:50 +02:00
6652fb8bc4 feat(config): add three scan-skip tuning env vars
  WORKER_SKIP_RECENT_SCAN_WINDOW_MS    (default 300000 = 5 min)
  WORKER_EMPTY_SCAN_BACKOFF_THRESHOLD  (default 5 cycles)
  WORKER_EMPTY_SCAN_BACKOFF_EVERY_NTH  (default 5)

All optional with safe defaults. Not yet read by any code — the worker
integration lands in follow-up commits.
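One way the three knobs could be read with their defaults is shown below. The policy that invalid or non-positive values fall back to the default rather than crash the worker is an assumption, and so is the readScanSkipConfig name.

```typescript
interface ScanSkipConfig {
  recentWindowMs: number;
  backoffThreshold: number;
  backoffEveryNth: number;
}

function readPositiveInt(env: Record<string, string | undefined>, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  // Non-integers and values <= 0 fall back to the safe default.
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

function readScanSkipConfig(env: Record<string, string | undefined>): ScanSkipConfig {
  return {
    recentWindowMs: readPositiveInt(env, "WORKER_SKIP_RECENT_SCAN_WINDOW_MS", 300000),
    backoffThreshold: readPositiveInt(env, "WORKER_EMPTY_SCAN_BACKOFF_THRESHOLD", 5),
    backoffEveryNth: readPositiveInt(env, "WORKER_EMPTY_SCAN_BACKOFF_EVERY_NTH", 5),
  };
}
```

In the worker this would be called as readScanSkipConfig(process.env).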

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 19:53:12 +02:00
ff846b8e8e feat(db): add scan-state columns to AccountChannelMap + TopicProgress
Three new fields on each table:
  - lastScannedAt          — when the worker last touched this scope
  - lastScanFoundArchives  — true if last scan had archives OR pending
                             retryables; tracks "work might need revisit"
  - consecutiveEmptyScans  — counter for cold-channel backoff

Schema change only. Worker logic in follow-up commits. Migration is a
metadata-only ALTER (NOT NULL with default), so there is no table
rewrite and it runs in ms regardless of row count.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 19:52:28 +02:00
3be3509151 docs: add implementation plan for channel-scan skip optimization
7-task plan covering schema migration, config knobs, DB + TDLib helpers,
and wiring the skip guards + getChat/getForumTopic short-circuits into
both the forum and non-forum branches of runWorkerForAccount.

Each task ends with a type-check step before its commit so the tree
compiles after every step. Task 7 is manual verification covering
restart safety, failure-retry preservation, and backoff behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 19:45:46 +02:00
6223c47549 docs: add channel-scan skip optimization design spec
Design for adding lastScannedAt + lastScanFoundArchives +
consecutiveEmptyScans columns to AccountChannelMap and TopicProgress,
plus a getChat / getForumTopic short-circuit before
searchChatMessages.

Goal: on restart and during cold-channel cycles, skip scanning channels
and forum topics that have nothing new. For MPE specifically, drops
the per-cycle API call count from ~1,086 to ~50.

Key safety rule: "truly idle" requires both no new archives AND no
retryable SkippedPackage rows pending. The 901f32f retry pass continues
to run unchanged. Failure retries are never skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 19:26:36 +02:00
13b261c0c8 fix(worker): make pre-upload integrity test advisory, not a hard gate
Diagnosed from production: main was rejecting almost every 7z file with
exit code 137 (128 + SIGKILL): the kernel OOM-killed 7z t mid-test. p7zip needs to
decompress into memory to verify CRCs; ~1.5GB+ 7z archives with solid
compression exhaust the container's RAM and get SIGKILL'd.

Plus the multipart ZIP false-positive from yesterday (unzip -t can't
span .zip.001 chunks).

Both failure modes are tool limitations, not actual corruption. But
the integrity test in 04effed was a hard gate that THREW on any
non-success, blocking the upload. Result: dozens of valid archives
downloaded then thrown away over the past 6 hours.

This commit demotes the test from gate → advisory:

  - Failures get logged at warn level with the actual reason
  - A SystemNotification is emitted so the admin sees them in the UI
  - Encrypted archives get a clearer notification title but STILL
    proceed (the existing UI gives the user a way to see what's
    encrypted and decide what to do)
  - Upload proceeds normally — we have hash verification + archive
    metadata parse for the structural integrity signals we actually
    need

Multipart ZIPs are still skipped entirely (they can't be tested at
all without concatenation).
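A sketch of the advisory behaviour. It assumes the tester's exit code, signal and stderr are available after the CLI run. interpretIntegrityExit and reportIntegrity are hypothetical names, and the password/encrypted stderr match is a heuristic assumption, not a documented output format. Exit code 137 decodes as 128 + 9 (SIGKILL).

```typescript
type IntegrityVerdict =
  | { ok: true }
  | { ok: false; encrypted: boolean; reason: string };

function interpretIntegrityExit(exitCode: number | null, signal: string | null, stderr: string): IntegrityVerdict {
  if (exitCode === 0) return { ok: true };
  // 137 = 128 + 9: the tester was SIGKILL'd (here, by the OOM killer), not a corruption verdict.
  if (exitCode === 137 || signal === "SIGKILL") {
    return { ok: false, encrypted: false, reason: "tester killed (SIGKILL, likely OOM)" };
  }
  // Heuristic (assumption): password prompts/errors mention these words.
  const encrypted = /password|encrypted/i.test(stderr);
  return { ok: false, encrypted, reason: stderr.trim() || `exit code ${exitCode}` };
}

// Advisory only: warn (and, in the worker, emit a SystemNotification), then proceed.
function reportIntegrity(v: IntegrityVerdict, warn: (msg: string) => void): "proceed" {
  if (v.ok === false) {
    warn(v.encrypted ? `encrypted archive: ${v.reason}` : `integrity test failed: ${v.reason}`);
  }
  return "proceed";
}
```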

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 17:57:12 +02:00
25a6196262 fix(worker): skip integrity test for multipart ZIPs — unzip -t can't span them
All checks were successful
continuous-integration/drone/push Build is passing
Diagnosed from production: main downloaded several 28 GB ZIP sets
(CA 3D STUDIOS 2023-07.zip.001..007, 2023-08.zip.001..006, ...) and
rejected every one of them with:

  "Archive integrity check failed: Command failed:
   unzip -tqq /tmp/zips/.../CA 3D STUDIOS 2023-07.zip.001"

Root cause: the integrity test I added in 04effed passed `uploadPaths[0]`
to the archive tester. For byte-split multipart ZIPs (`.zip.001`,
`.zip.002`, ...), the first chunk isn't a valid ZIP on its own — the
central directory only exists at the END of the assembled archive.
Info-ZIP split archives use `.z01/.z02/.../.zip` naming (what `zip -s`
produces), not `.zip.001/.002`, and unzip 6.0 can't test split archives
directly in either form, so pointing it at any other part wouldn't help.

Three correctness changes:

  1. Test runs on `tempPaths[0]` (the original downloaded file) instead
     of `uploadPaths[0]` (which may be byte-split chunks we created).
     For single-file ZIPs that we re-split ourselves, this still tests
     the unsplit original.

  2. Skip the test entirely when archiveType=ZIP AND tempPaths.length>1
     — these are source multipart ZIPs we can't validate without
     concatenating, and the hash check + central-directory parse we
     already do are sufficient structural signals.

  3. RAR and 7Z multipart still ARE tested — `unrar t` and `7z t` both
     auto-discover sibling parts when pointed at the first one.
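The three rules above reduce to a small selection function. This is a sketch with a hypothetical name (integrityTestTarget); returning null means "skip the test".

```typescript
type ArchiveType = "ZIP" | "RAR" | "SEVEN_Z";

// Returns the path to hand to the CLI tester, or null to skip testing.
function integrityTestTarget(archiveType: ArchiveType, tempPaths: string[]): string | null {
  if (tempPaths.length === 0) return null;
  // Source multipart ZIPs (.zip.001, .zip.002, ...) can't be tested without concatenating.
  if (archiveType === "ZIP" && tempPaths.length > 1) return null;
  // Test the original download, never our own re-split upload chunks.
  // For RAR / 7Z, `unrar t` and `7z t` find sibling volumes from the first part.
  return tempPaths[0] ?? null;
}
```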

This unblocks all multipart-ZIP ingestion for the main account. Hours
of downloaded archives that were being rejected will now pass through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 11:15:07 +02:00
166dc556c9 fix(worker): match old General-topic progress rows under new TDLib forum_topic_id
All checks were successful
continuous-integration/drone/push Build is passing
After the TDLib 1.8.50 → 1.8.64 upgrade, the worker now correctly
enumerates all forum topics in MPE (1,086 of them) — a huge win. But a
data-shape mismatch was about to bite us: TDLib changed how the
General topic is identified.

  TDLib 1.8.50: info.message_thread_id = 1048576  (magic constant)
  TDLib 1.8.64: info.forum_topic_id    = 1

Existing topic_progress rows for General carry topicId=1048576. The
worker looks up progress via `topicProgressList.find(tp => tp.topicId === topic.topicId)`,
which fails for General under the new TDLib → progress becomes null →
the scan starts from message 0.

For MPE specifically, that means re-scanning all ~378k General-topic
messages. Dedup catches the previously-ingested ones (no double upload),
but it burns hours of bandwidth before the watermark catches up.

Fix: when topicId lookup misses for a topic named "General", fall back
to a name match. The first watermark write after that saves under the
new ID (1), so future runs hit the topicId match directly without the
fallback. The orphaned 1048576 row stays as harmless dead data — we
don't delete it in case a TDLib downgrade or revert ever happens.
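A sketch of the lookup order. It assumes the progress row keeps the topic name alongside topicId; the topicName field and findTopicProgress are hypothetical, and IDs are plain numbers here for simplicity.

```typescript
interface TopicProgressRow {
  topicId: number;
  topicName: string | null; // assumption: the name is stored with the row
  lastProcessedMessageId: number | null;
}

interface ForumTopic {
  topicId: number;
  name: string;
}

// TDLib 1.8.50 identified General as 1048576 (1 << 20); 1.8.64 uses 1.
function findTopicProgress(list: TopicProgressRow[], topic: ForumTopic): TopicProgressRow | null {
  const byId = list.find((tp) => tp.topicId === topic.topicId);
  if (byId) return byId;
  // Fallback only for General: match the legacy row by name.
  if (topic.name === "General") {
    return list.find((tp) => tp.topicName === "General") ?? null;
  }
  return null;
}
```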

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 21:43:39 +02:00
e8daabd28d fix(tdlib): handle 1.8.64 renames in searchChatMessages + message reply_to
All checks were successful
continuous-integration/drone/push Build is passing
Audit of every TDLib call site against the live 1.8.64 schema in
node_modules/@prebuilt-tdlib/types/tdlib-types.d.ts surfaced further
silent breakages beyond the getForumTopics fix in 106700b.

1. searchChatMessages parameter restructure
   The top-level `message_thread_id` and `saved_messages_topic_id`
   request fields were collapsed into a single tagged-union
   `topic_id: MessageTopic$Input`. Three call sites affected:

   - topics.ts getTopicMessages — was passing message_thread_id, now
     sends topic_id with the messageTopicForum variant carrying
     forum_topic_id. Without this the topic scan returns the whole
     channel (or nothing) instead of just the topic.
   - download.ts getChannelMessages — used to pass message_thread_id: 0;
     now omits the topic_id field entirely for a flat scan.
   - rebuild.ts — same treatment.

2. message.reply_to_message_id replaced with reply_to tagged union
   On incoming messages, the flat `reply_to_message_id` field was
   replaced with `reply_to: MessageReplyTo` (messageReplyToMessage or
   messageReplyToStory). Our reply-chain grouping needs the message-ID
   case.

   Added extractReplyToMessageId() that reads both old and new shapes
   so a transition build or future downgrade still works.
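A sketch of both changes, assuming a plain-object request like the ones passed to invoke. Only the topic_id / messageTopicForum / forum_topic_id and reply_to / messageReplyToMessage shapes come from the text above. The other searchChatMessages fields are shown for context (remaining optional fields omitted), and both function bodies are illustrative rather than the committed code.

```typescript
interface SearchChatMessagesRequest {
  _: "searchChatMessages";
  chat_id: number;
  topic_id?: { _: "messageTopicForum"; forum_topic_id: number };
  query: string;
  from_message_id: number;
  offset: number;
  limit: number;
}

function buildSearchRequest(chatId: number, fromMessageId: number, limit: number, forumTopicId?: number): SearchChatMessagesRequest {
  const req: SearchChatMessagesRequest = {
    _: "searchChatMessages", chat_id: chatId, query: "", from_message_id: fromMessageId, offset: 0, limit,
  };
  // Flat (whole-channel) scans omit topic_id entirely.
  if (forumTopicId !== undefined) {
    req.topic_id = { _: "messageTopicForum", forum_topic_id: forumTopicId };
  }
  return req;
}

// Reads the replied-to message ID from either the old or the new message shape.
function extractReplyToMessageId(msg: {
  reply_to_message_id?: number;
  reply_to?: { _: string; message_id?: number } | null;
}): number | null {
  // New shape: tagged union; only the message variant carries a message ID.
  if (msg.reply_to && msg.reply_to._ === "messageReplyToMessage") {
    return msg.reply_to.message_id ?? null;
  }
  // Old shape: flat field, where 0 means "not a reply".
  return msg.reply_to_message_id ? msg.reply_to_message_id : null;
}
```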

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 16:45:06 +02:00
106700b13f fix(topics): handle TDLib 1.8.64 renamed forum-topic fields
After the TDLib upgrade in 18a0efb, getForumTopicList returned 0 topics
for every forum channel. Confirmed in production logs:

  "title":"Model Printing Emporium","topicCount":0
  "title":"GB_Butler_Bot2","topicCount":0
  "title":"Darnascus 2 : Flamigos Miniatures","topicCount":0

Cycle results: messagesScanned=0, zipsFound=0 — main account's entire
ingestion pipeline was a no-op because all source channels are forums.

Root cause: TDLib 1.8.64 renamed three fields with no breaking-change
signal we'd notice:

  Request  offset_message_thread_id           → offset_forum_topic_id
  Response next_offset_message_thread_id      → next_offset_forum_topic_id
  Response topics[].info.message_thread_id    → topics[].info.forum_topic_id

The old field names became no-ops in the new TDLib, so every request
came back with an empty topic list and the "stuck pagination" detection
correctly bailed out.

Fix: send the new field name on the request side, read both old and
new names on the response side (so a future TDLib version change in
either direction stays handled).
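Reading both names can be done with nullish fallbacks. This is a sketch with hypothetical helper names, and the response types are trimmed to the fields discussed above.

```typescript
interface RawTopicInfo {
  forum_topic_id?: number;    // TDLib 1.8.64
  message_thread_id?: number; // TDLib 1.8.50
}

interface RawForumTopicsPage {
  topics: { info: RawTopicInfo }[];
  next_offset_forum_topic_id?: number;
  next_offset_message_thread_id?: number;
}

function readTopicId(info: RawTopicInfo): number | null {
  return info.forum_topic_id ?? info.message_thread_id ?? null;
}

function readNextOffsetTopicId(page: RawForumTopicsPage): number {
  return page.next_offset_forum_topic_id ?? page.next_offset_message_thread_id ?? 0;
}
```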

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 16:18:08 +02:00
04effed825 feat(verify): pre-upload integrity test, post-upload read-back, batched recovery
All checks were successful
continuous-integration/drone/push Build is passing
Three independent verification improvements landing together.

1. Pre-upload archive integrity test (testArchiveIntegrity)
   Before sending an archive to the destination channel, runs the
   appropriate CLI test:
     - unzip -t   for ZIP
     - unrar t    for RAR
     - 7z t       for SEVEN_Z
   Catches truncated downloads, internal CRC errors, bad central
   directories, and password-protected archives BEFORE we burn upload
   bandwidth on a file that can't be extracted. Encrypted archives are
   specifically flagged so the SkippedPackage error message is clear.

2. Post-upload destination read-back
   updateMessageSendSucceeded tells us Telegram accepted the upload,
   but says nothing about whether the destination message actually
   contains the file we sent. After each successful upload, we call
   getMessage for each destMessageId and confirm that document.size
   matches the on-disk size of uploadPaths[i].

   Mismatches don't abort ingestion — they surface as
   HASH_MISMATCH / UPLOAD_FAILED SystemNotifications so the admin can
   see them in the UI and decide whether to recover.

3. Batched recovery (verifyMessagesBatch)
   recoverIncompleteUploads previously called getMessage (singular)
   per Package — at 20k packages that's 20k round-trips. Switched to
   TDLib's getMessages (plural) with a batch size of 100, which cuts
   20k packages down to 200 round-trips (~100x fewer).

   Per-message fallback if a whole batch errors out, so one bad batch
   never loses all verification.
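A sketch of the batch-plus-fallback loop. The TDLib calls are injected as functions so the control flow stands on its own: fetchMany would wrap getMessages and fetchOne getMessage, and both signatures here are assumptions. fetchOne is expected to map its own errors to a result, for example an "unknown" verdict.

```typescript
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

async function verifyMessagesBatch<R>(
  ids: number[],
  fetchMany: (ids: number[]) => Promise<R[]>,
  fetchOne: (id: number) => Promise<R>,
  batchSize = 100,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(ids, batchSize)) {
    try {
      results.push(...(await fetchMany(batch)));
    } catch {
      // One bad batch must not lose all verification: fall back per message.
      for (const id of batch) results.push(await fetchOne(id));
    }
  }
  return results;
}
```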

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:56:50 +02:00
c4d9be83bd feat(worker): auto-tag packages with the slicer(s) their files target
Indexes 86k+ Lychee Slicer (.lys/.lyt), 23k+ ChituBox (.chitubox/.ctb/
.cbddlp), 1k+ Anycubic (.photon/.pwmo/.pwmx), and Bambu (.3mf)
slicer-specific files. Until now they were just generic extensions in
PackageFile.

After this commit:
  - Newly-ingested packages get tags derived from their file list
    ("lychee", "chitubox", "anycubic", "bambu", "fdm", "mango")
  - The `backfill_filelists` listener also applies tags to re-indexed
    packages
  - A new pure-DB listener `backfill_slicer_tags` walks existing
    Packages with file lists and applies tags retroactively — no
    downloads, no TDLib, takes seconds for thousands of rows.

Trigger the one-shot retroactive backfill with:
  SELECT pg_notify('backfill_slicer_tags', '{"limit":5000}');
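A sketch of the extension-to-tag mapping for the four slicers whose extensions are listed above. The rules behind the "fdm" and "mango" tags aren't spelled out here, so they're deliberately left out. deriveSlicerTags is a hypothetical name.

```typescript
const SLICER_EXTENSIONS: Record<string, string[]> = {
  lychee: [".lys", ".lyt"],
  chitubox: [".chitubox", ".ctb", ".cbddlp"],
  anycubic: [".photon", ".pwmo", ".pwmx"],
  bambu: [".3mf"],
};

// Derives a sorted, de-duplicated tag list from a package's file list.
function deriveSlicerTags(fileNames: string[]): string[] {
  const tags = new Set<string>();
  for (const name of fileNames) {
    const lower = name.toLowerCase();
    for (const tag of Object.keys(SLICER_EXTENSIONS)) {
      if (SLICER_EXTENSIONS[tag].some((ext) => lower.endsWith(ext))) tags.add(tag);
    }
  }
  return Array.from(tags).sort();
}
```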

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:53:18 +02:00
7d39a13310 feat(worker): use TDLib remote.unique_id as zero-false-positive dedup signal
The fileName + size repost detection from ff4e150 works but has a
theoretical false-positive: two unrelated files in the same channel
with identical names and identical total sizes get treated as duplicates.

TDLib's document.remote.unique_id is a persistent identifier for a
file on Telegram's servers: every repost of that same file across
messages keeps the same unique_id. Using it as the first dedup check
eliminates the false-positive risk entirely.

Schema:
  - Package.remoteUniqueId (nullable, since existing rows lack it)
  - Index on (sourceChannelId, remoteUniqueId)

Pipeline:
  1. Capture remoteUniqueId in getChannelMessages + getTopicMessages
  2. Pass through TelegramMessage type
  3. processOneArchiveSet checks findPackageByRemoteUniqueId FIRST
     (before packageExistsBySourceMessage / findRepostedPackage)
  4. createPackageStub stores it on the new Package row

Existing 19,952 Packages have remoteUniqueId = NULL — they fall through
to the existing checks (source-msg-id, name+size, content-hash). New
ingestions populate it and benefit from the strong signal immediately.
Old Packages get backfilled organically when their content is
re-encountered and a new Package would otherwise be created.
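The lookup order can be sketched against an in-memory list, assuming a hypothetical PackageRow shape. The real checks are DB queries (findPackageByRemoteUniqueId, packageExistsBySourceMessage, findRepostedPackage), and the post-download content-hash check is not shown.

```typescript
interface PackageRow {
  id: string;
  sourceChannelId: number;
  sourceMessageId: bigint;
  fileName: string;
  fileSize: number;
  remoteUniqueId: string | null; // NULL on rows created before this commit
}

type IncomingSet = Omit<PackageRow, "id">;

type DedupHit = { via: "remoteUniqueId" | "sourceMessage" | "nameAndSize"; pkg: PackageRow };

// Strongest signal first; legacy rows with NULL remoteUniqueId fall through.
function findExistingPackage(pkgs: PackageRow[], s: IncomingSet): DedupHit | null {
  const sameChannel = pkgs.filter((p) => p.sourceChannelId === s.sourceChannelId);
  if (s.remoteUniqueId !== null) {
    const hit = sameChannel.find((p) => p.remoteUniqueId === s.remoteUniqueId);
    if (hit) return { via: "remoteUniqueId", pkg: hit };
  }
  const bySource = sameChannel.find((p) => p.sourceMessageId === s.sourceMessageId);
  if (bySource) return { via: "sourceMessage", pkg: bySource };
  const byNameSize = sameChannel.find((p) => p.fileName === s.fileName && p.fileSize === s.fileSize);
  if (byNameSize) return { via: "nameAndSize", pkg: byNameSize };
  return null;
}
```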

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:50:24 +02:00
18a0efb3d4 chore(tdlib): upgrade tdl 8.0.0 → 8.1.0 and prebuilt-tdlib 1.8.50 → 1.8.64
12 versions of TDLib bug fixes, performance improvements, and stricter
type definitions in @prebuilt-tdlib/types.

Two API breakages handled:

1. `getChatFolders` (plural) was removed — folder IDs now arrive via
   the `updateChatFolders` update event. Replaced the synchronous call
   with a 200ms event listener; if no folders arrive, we proceed with
   just main + archive lists. Chats inside folders are still reachable
   from chatListMain so this isn't a functional regression.

2. The new tdl `Client.invoke` signature requires a literal `_` field
   and rejects `Record<string, any>` shapes. Our `invokeWithTimeout`
   wrapper is intentionally generic — cast through `any` at the call
   site with a comment explaining why.
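A sketch of the 200ms wait. It assumes an EventEmitter-style "update" event; tdl's Client exposes one, but a plain EventEmitter stands in here so the snippet runs on its own. waitForChatFolderIds is a hypothetical name.

```typescript
import { EventEmitter } from "node:events";

interface UpdateChatFolders {
  _: "updateChatFolders";
  chat_folders: { id: number }[];
}

// Resolves with folder IDs from the first updateChatFolders, or [] after timeoutMs.
function waitForChatFolderIds(updates: EventEmitter, timeoutMs = 200): Promise<number[]> {
  return new Promise((resolve) => {
    const onUpdate = (u: { _: string }) => {
      if (u._ !== "updateChatFolders") return;
      clearTimeout(timer);
      updates.off("update", onUpdate);
      resolve((u as UpdateChatFolders).chat_folders.map((f) => f.id));
    };
    const timer = setTimeout(() => {
      updates.off("update", onUpdate);
      resolve([]); // no folders: proceed with main + archive lists only
    }, timeoutMs);
    updates.on("update", onUpdate);
  });
}
```

If updateChatFolders can arrive before the listener is attached, caching the latest one from the main update handler is the more robust variant.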

Both worker and bot type-check + build cleanly with the new versions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:45:44 +02:00
2ccc9820cd fix(recovery): distinguish 'message gone' from 'TDLib couldn't tell us'
The old verifyMessageExists returned a bare boolean. Any error other
than HTTP 404 was treated as "exists" — meaning a TDLib connection
problem or transient TG hiccup at recovery time caused the worker to
declare "all destination messages verified" when it had actually
verified nothing.

Replaced with a discriminated VerifyResult:
  - exists         — message present and is a document, keep Package
  - deleted        — TG confirms it's gone (404 / MESSAGE_ID_INVALID /
                     "Message not found"), reset Package for re-upload
  - wrong-content  — message exists but isn't messageDocument, reset
  - unknown        — TDLib threw a non-404 error; do NOT reset, retry
                     next startup

Recovery summary now reports all four counts and switches to a
non-success message when unknownCount > 0, so a degraded TDLib run
doesn't hide behind a green log line.
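A sketch of the four-way classification. The error strings match the ones listed above; the TdMessage/TdError shapes and the function names are assumptions.

```typescript
type VerifyResult =
  | { kind: "exists" }
  | { kind: "deleted" }
  | { kind: "wrong-content"; contentType: string }
  | { kind: "unknown"; error: string };

// Simplified TDLib shapes for this sketch.
interface TdMessage { content: { _: string } }
interface TdError { code?: number; message?: string }

function classifyMessage(msg: TdMessage): VerifyResult {
  return msg.content._ === "messageDocument"
    ? { kind: "exists" }
    : { kind: "wrong-content", contentType: msg.content._ };
}

function classifyError(err: TdError): VerifyResult {
  const text = err.message ?? "";
  if (err.code === 404 || text.includes("MESSAGE_ID_INVALID") || text.includes("Message not found")) {
    return { kind: "deleted" }; // TG confirms it's gone: reset for re-upload
  }
  // Anything else (connection trouble, timeouts): don't reset, retry next startup.
  return { kind: "unknown", error: text || `code ${err.code ?? "?"}` };
}
```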

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 00:43:09 +02:00
c72b5a4b48 feat(worker): add backfill_filelists pg_notify listener
Companion to 0bdd4ba (RAR parser fix). 4,380 RAR packages and ~450
ZIP/7Z packages in the DB have fileCount=0 because of the old broken
parser (and a handful of edge cases). This adds an on-demand backfill
that re-indexes their file lists.

Triggered by:
  SELECT pg_notify('backfill_filelists', '{"limit":50,"archiveType":"RAR"}');

Both payload fields are optional. archiveType filters to ZIP/RAR/SEVEN_Z;
default limit is 100. Multiple notifications queue sequentially so
TDLib downloads don't compete for the per-account mutex.
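A sketch of payload handling with the defaults described above. parseBackfillPayload is hypothetical, and treating malformed JSON as an empty payload is an assumed policy.

```typescript
type BackfillArchiveType = "ZIP" | "RAR" | "SEVEN_Z";

interface BackfillRequest {
  limit: number;
  archiveType?: BackfillArchiveType;
}

const BACKFILL_ARCHIVE_TYPES: readonly string[] = ["ZIP", "RAR", "SEVEN_Z"];

function parseBackfillPayload(payload: string | undefined): BackfillRequest {
  let raw: unknown = {};
  try {
    if (payload) raw = JSON.parse(payload);
  } catch {
    raw = {}; // malformed payload: use defaults
  }
  const obj = (typeof raw === "object" && raw !== null ? raw : {}) as Record<string, unknown>;
  const limitRaw = obj.limit;
  const typeRaw = obj.archiveType;
  // Both fields optional; default limit is 100.
  const limit = typeof limitRaw === "number" && Number.isInteger(limitRaw) && limitRaw > 0 ? limitRaw : 100;
  if (typeof typeRaw === "string" && BACKFILL_ARCHIVE_TYPES.includes(typeRaw)) {
    return { limit, archiveType: typeRaw as BackfillArchiveType };
  }
  return { limit };
}
```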

For each candidate:
  1. Resolve destChannel.telegramId from the Package
  2. getMessage for each destMessageId in destMessageIds[] (handles
     multipart) to recover the file_id from Telegram
  3. downloadFile (uses TDLib cache when available — most are fast)
  4. Run readZipCentralDirectory / readRarContents / read7zContents
  5. Transactionally replace PackageFile rows + update fileCount

Re-check of fileCount inside the transaction ensures a concurrent
backfill from another worker (or a fresh ingestion of the same archive)
doesn't get clobbered.

Prefers the Premium account when both are linked, for faster downloads
and to avoid the speed-limit throttling on the secondary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 00:42:01 +02:00
0bdd4ba0cc fix(rar-reader): use unrar lt (technical) so file listings actually work
Diagnosed from production: all 4,380 RAR packages in the database have
fileCount = 0. The old parser used `unrar l -v` and a regex that
expected an 8-column `Attributes Size Packed Ratio% Date Time CRC32 Name`
output. unrar 6.21's actual `l -v` output is 5 columns: `Attributes
Size Date Time Name` (no Packed, no Ratio, no CRC32), so every RAR
silently parsed to zero entries.

Switch to `unrar lt` (list technical), which emits one block per file
with key:value lines:

         Name: Lost Kingdom 2023 01 January/Nagas/NagaCaptainBody.stl
         Type: File
         Size: 22503584
  Packed size: 21430123
        CRC32: A1B2C3D4
         ...

The new parser tokenizes blocks on blank lines and matches "key: value"
lines per block. Handles multi-word keys ("Packed size", "Host OS") and
gracefully skips Directory entries and the archive header block. Also
tolerates BLAKE2sp checksums for newer RAR archives.

Verified against a live 644MB RAR with 201 entries (194 files, 7 dirs);
parser returns 194 entries with correct paths, sizes, and CRC32s.
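A minimal sketch of the block tokenizer described above. It assumes `unrar lt` output shaped like the sample (right-aligned key, ": ", value). parseUnrarLt and RarEntry are hypothetical names, and only entries with Type: File are kept.

```typescript
interface RarEntry {
  path: string;
  size: number;
  crc32: string | null; // absent when the archive uses BLAKE2sp instead
}

function parseUnrarLt(output: string): RarEntry[] {
  const entries: RarEntry[] = [];
  // Blocks are separated by blank lines.
  for (const block of output.split(/\r?\n\s*\r?\n/)) {
    const fields = new Map<string, string>();
    for (const line of block.split(/\r?\n/)) {
      // "key: value"; keys may contain spaces ("Packed size", "Host OS").
      const m = /^\s*([A-Za-z0-9][A-Za-z0-9 ]*?):\s(.*)$/.exec(line);
      if (m) fields.set(m[1], m[2].trim());
    }
    // Skips the archive header block and Directory entries.
    if (fields.get("Type") !== "File") continue;
    const name = fields.get("Name");
    const size = Number(fields.get("Size"));
    if (!name || !Number.isFinite(size)) continue;
    entries.push({ path: name, size, crc32: fields.get("CRC32") ?? null });
  }
  return entries;
}
```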

Future RAR ingestions will populate fileCount and PackageFile rows
correctly. Backfilling existing 4,380 packages requires a separate
pass — added in a follow-up commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 00:38:46 +02:00
901f32ff41 feat(worker): retry old SkippedPackages + prefer specific topics over General
Three connected safeguards driven by user feedback after deploying the
incremental watermark and repost-detection fixes.

1. SkippedPackage retry pass (watermark pull-back)
   The auto-retry chain (d99a506 + watermark cap) only works for failures
   that occur AFTER the fix is deployed. Pre-existing SkippedPackages may
   sit below the current watermark — example from prod: secondary's
   "Turnbase Delivery Folder.7z" at msgId 37,109,104,640 vs watermark
   37,111,201,792. The auto-retry never sees it.

   Before scanning each channel/topic, we now query SkippedPackages with
   attemptCount < cap for that scope and pull the watermark back to
   (lowestSkippedMsgId - 1n) when needed. Both forum and non-forum
   branches handle this.

2. Topic scan order: specific topics first, General last
   In forum channels, files often appear in both a specific topic (e.g.,
   "Artisan Guild January 2022") AND in General. The first encounter
   created the Package and locked in the topic context. If we happened
   to scan General first, the Package recorded the less-informative
   topic.

   We now sort topics so General is processed last. New Packages get
   the more specific topic name as their context by default.

3. Backfill specific topic on existing Packages
   For Packages that were already created with General topic context,
   when findRepostedPackage matches and the current scan is in a more
   specific topic, update the existing Package's sourceTopicId (and
   creator, if it was derived from "General") to the more specific one.
   Audit log shows both old and new topic IDs.

The findRepostedPackage query also got an ORDER BY so it returns the
most-specific existing match (non-null sourceTopicId first) when
multiple Packages share the same filename + size in a channel — giving
the audit log richer context.
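Both behaviours are small pure functions. Below is a sketch with hypothetical names, with the attempt cap passed in (WORKER_MAX_SKIP_ATTEMPTS, default 5 per 379bf24). Rows that hit the cap are ignored, so they no longer pin the watermark.

```typescript
interface SkippedRow {
  sourceMessageId: bigint;
  attemptCount: number;
}

// Pull the watermark back to (lowestSkippedMsgId - 1n) when a retryable row sits at or below it.
function pullBackWatermark(watermark: bigint | null, skipped: SkippedRow[], maxAttempts: number): bigint | null {
  if (watermark === null) return null; // no watermark: the scan starts from the beginning anyway
  let lowest: bigint | null = null;
  for (const row of skipped) {
    if (row.attemptCount >= maxAttempts) continue; // hit the cap: no auto-retry
    if (lowest === null || row.sourceMessageId < lowest) lowest = row.sourceMessageId;
  }
  if (lowest === null) return watermark;
  const pulled = lowest - 1n;
  return pulled < watermark ? pulled : watermark;
}

// Specific topics first, General last, so new Packages get the more specific context.
function orderTopicsGeneralLast<T extends { name: string }>(topics: T[]): T[] {
  return [...topics.filter((t) => t.name !== "General"), ...topics.filter((t) => t.name === "General")];
}
```

With the prod example, a retryable row at msgId 37,109,104,640 pulls a watermark of 37,111,201,792 back to 37,109,104,639.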

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 09:02:54 +02:00
ff4e150544 fix: skip download when the same file was already uploaded from this channel
Diagnosed from production: in 8 hours of main's current run, zero
uploads happened despite the worker being busy 100% of the time. Logs
showed continuous "Downloading archive part" entries with no
corresponding upload activity.

Root cause: the source channel ("Model Printing Emporium") frequently
reposts the same file at new Telegram message IDs. Concrete example
from the DB:
  - "(EN) PaintGuides All.zip"  → present 6 times, msgIds 44B → 92B
  - "00 Welcome Pack.7z"        → present 2 times, msgIds 91B and 177B
  - "FanteZi April 2022-...zip" → uploaded May 8 at msgId 24,697,110,528;
                                  current run re-downloading at 87,488,987,136

packageExistsBySourceMessage(channelId, msgId) correctly misses because
the msgId is different. We download the (potentially gigabyte-sized)
file, hash it, then packageExistsByHash hits and we discard the
download. ~30 seconds wasted per repost x thousands of reposts = whole
runs spent uploading nothing.

Fix: add findRepostedPackage(sourceChannelId, fileName, fileSize) — a
pre-download check that catches reposts by the strong (channel + name
+ total size) signal. On hit, skip the set entirely. Watermark
advances normally (no minFailedId tracking) so the next cycle sees
the channel as caught up.

False-positive risk: two unrelated files in the same channel with
identical name AND identical total fileSize. Extremely rare in
practice; if it ever happens, the new file is silently treated as a
duplicate. Logged at info level with the existing Package ID and dest
message ID so the user can audit if a file is mysteriously missing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:54:20 +02:00
77aeb4cc00 fix: advance channel/topic watermark incrementally per successful set
All checks were successful
continuous-integration/drone/push Build is passing
Diagnosed from production logs for the main (Premium) account:

  RUNNING   2026-05-21 → in progress, 22h     ingested: 0
  FAILED    2026-05-14 → 2026-05-21 (7.4d)    ingested: 5,426 (killed by restart)
  FAILED    2026-05-06 → 2026-05-14 (7.7d)    ingested: 8,300 (killed by restart)

Main's two source channels have 378k+ messages each. A full scan takes
days, but the worker gets restarted (container update, cycle timeout,
etc.) every few days. updateLastProcessedMessage was only called at the
END of a channel's scan — so the watermark on AccountChannelMap stayed
NULL through restart after restart, and every new run re-scanned from
message 0.

That explains the user's symptom: "main wasn't uploading although it
said it did". The dashboard showed currentStep alternating through
downloading / hashing / deduplicating, but zipsIngested stayed at 0
because every archive the run encountered was already a hash-duplicate
of something uploaded by a previous run.

Fix: processArchiveSets now accepts an onWatermarkAdvance callback.
After each successful set (ingested OR confirmed duplicate), the callback
fires with a watermark capped below the current minFailedId. Both call
sites (forum/topic and non-forum) wire it to upsertTopicProgress /
updateLastProcessedMessage. The end-of-scan write is retained for the
no-archives and all-failures-with-fallback cases.

Worst-case progress loss on restart now is one in-flight archive set,
not the entire scan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 23:20:20 +02:00
3b327eb3f3 feat(app): show attempt count column on the Skipped Packages tab
attemptCount goes through SkippedPackageItem and SkippedRow into a new
column on the data table. Badge color cues:
  - outline (1)        first failure, will auto-retry next cycle
  - secondary (2-4)    has retried but still below cap
  - destructive (>=5)  hit the cap; will not auto-retry until reset

The "Skipped" column is renamed to "Last Skipped" since the timestamp
now reflects the most recent attempt, not the first.
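The badge thresholds map to a three-way switch. This is a sketch with a hypothetical helper; the variant names match the ones listed above.

```typescript
type BadgeVariant = "outline" | "secondary" | "destructive";

function attemptBadgeVariant(attemptCount: number, maxAttempts = 5): BadgeVariant {
  if (attemptCount >= maxAttempts) return "destructive"; // hit the cap: no auto-retry until reset
  if (attemptCount >= 2) return "secondary";            // retried, still below the cap
  return "outline";                                      // first failure
}
```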

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 23:08:08 +02:00
379bf246cd feat(worker): per-account safeguards for second-account upload failures
Driven by a real production case: secondary account was attached to 17
source channels but ingesting only ~2-3 archives per cycle. Log analysis
showed several distinct issues; this commit adds five safeguards for them.

1. Auto-retry cap (WORKER_MAX_SKIP_ATTEMPTS, default 5)
   processArchiveSets now filters out SkippedPackage rows whose
   attemptCount has reached the cap. Removing them from the working
   list means they are not tracked in minFailedId, so the watermark
   cap from d99a506 does not pin progress below them anymore. A bad
   file no longer blocks the rest of the channel forever; the user
   can manually retry via the UI to reset the count.

2. Account phone in error messages
   Every SkippedPackage row and SystemNotification produced from a
   failure is now prefixed with [<phone>] in errorMessage / message,
   and the JSON context includes accountPhone. When two accounts
   share a source channel and only one is failing, the UI tells you
   which one.

3. Explicit getChat for destination at run start
   loadChats only loads main/archive/folder chat lists. If an account
   archived or moved the destination chat, sendMessage failed silently
   per-archive. Now we getChat the destination once per cycle; on
   failure we record a SystemNotification and skip the account's
   entire ingestion cycle (no point downloading what we can't upload).

4. Retry on transient Telegram server errors
   The "Turnbase Delivery Folder.7z" failure on the secondary and
   "10. Kingdom of the Depth.part1.rar" on the main were both
   "Internal Server Error during file upload" — a TG-side hiccup, not
   a stall or FLOOD_WAIT. These now retry up to MAX_UPLOAD_RETRIES
   with linear backoff (15s, 30s, 45s + jitter) before giving up.

5. Channel-access-lost notification
   "Iridium 2 w/ Add-ons [Completed]" has been throwing
   "Can't access the chat" every cycle for the secondary. The worker
   now surfaces a CHANNEL_ACCESS_LOST notification (deduped to once per
   24h per channel/account) so the admin sees it and can re-join or
   unlink the channel instead of just losing visibility into the loop.
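Below are sketches of the transient-error retry delay (item 4) and the 24h notification dedupe (item 5), with hypothetical names. The jitter range is an assumption (the text only says "+ jitter"). The error match is a heuristic built on the message quoted above, and the in-memory Map stands in for whatever store the worker actually uses.

```typescript
// Heuristic: the TG-side hiccup quoted above.
function isTransientUploadError(message: string): boolean {
  return /Internal Server Error/i.test(message);
}

// Linear backoff: 15s, 30s, 45s, ... plus up to jitterMs of random jitter.
function uploadRetryDelayMs(attempt: number, jitterMs = 5000, random: () => number = Math.random): number {
  return 15000 * attempt + Math.floor(random() * jitterMs);
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Emit CHANNEL_ACCESS_LOST at most once per 24h per (channel, account).
function shouldNotifyAccessLost(lastNotified: Map<string, number>, channelId: number, accountPhone: string, now: number): boolean {
  const key = `${channelId}:${accountPhone}`;
  const prev = lastNotified.get(key);
  if (prev !== undefined && now - prev < DAY_MS) return false;
  lastNotified.set(key, now);
  return true;
}
```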

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 23:07:57 +02:00
7a79b52baf feat(db): add attemptCount on SkippedPackage + CHANNEL_ACCESS_LOST enum
attemptCount tracks how many times the worker has tried each failed
source message. Combined with WORKER_MAX_SKIP_ATTEMPTS (default 5), the
worker will auto-retry across cycles but eventually let the watermark
advance past a chronically failing file so cycles aren't pinned forever.
The SkippedPackage row stays so the user can manually retry via the UI.

CHANNEL_ACCESS_LOST is a new notification type the worker emits when a
source channel becomes inaccessible (account got removed, channel
deleted, etc.) — surfaces the issue instead of silently failing every
cycle as we've been doing with "Iridium 2 w/ Add-ons [Completed]".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 23:07:40 +02:00
26e2cba69d fix: buffer upload confirmation events to close tempMsgId race
sendMessage resolves with the temporary message ID inside a .then()
microtask. If TDLib emits updateMessageSendSucceeded synchronously
(cached file, already-known media), the event handler fires while
tempMsgId is still null — the success is dropped and the promise hangs
until the 15-min upload timeout fires.

Buffer success/failure events that arrive before tempMsgId is known,
then replay them in the .then() callback once tempMsgId is set.
Extract completeWithSuccess / completeWithFailure helpers so the
resolution path is shared between live events and replayed events.
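A sketch of the buffer-and-replay pattern. A simplified SendEvent type stands in for the TDLib updateMessageSendSucceeded / updateMessageSendFailed payloads, and its field names are assumptions. Events that arrive before tempMsgId is known are queued, then replayed once setTempMessageId runs.

```typescript
type SendEvent = {
  kind: "succeeded" | "failed";
  oldMessageId: number; // the temporary ID
  newMessageId?: number;
  error?: string;
};

class UploadConfirmation {
  private tempMsgId: number | null = null;
  private buffered: SendEvent[] = [];
  private settled = false;
  private resolveFn!: (id: number) => void;
  private rejectFn!: (err: Error) => void;
  readonly result: Promise<number>;

  constructor() {
    this.result = new Promise<number>((resolve, reject) => {
      this.resolveFn = resolve;
      this.rejectFn = reject;
    });
  }

  // Called from the TDLib update handler.
  onEvent(e: SendEvent): void {
    if (this.settled) return;
    if (this.tempMsgId === null) {
      this.buffered.push(e); // arrived before sendMessage's .then(): keep it
      return;
    }
    this.handle(e);
  }

  // Called from sendMessage(...).then(...) once the temporary ID is known.
  setTempMessageId(id: number): void {
    this.tempMsgId = id;
    const pending = this.buffered;
    this.buffered = [];
    for (const e of pending) this.handle(e);
  }

  private handle(e: SendEvent): void {
    if (this.settled || e.oldMessageId !== this.tempMsgId) return;
    if (e.kind === "succeeded") this.completeWithSuccess(e.newMessageId ?? e.oldMessageId);
    else this.completeWithFailure(new Error(e.error ?? "send failed"));
  }

  private completeWithSuccess(id: number): void {
    this.settled = true;
    this.resolveFn(id);
  }

  private completeWithFailure(err: Error): void {
    this.settled = true;
    this.rejectFn(err);
  }
}
```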

This race matters more now that stalls fail fast — without the buffer,
a fast-completing upload could still hang for 15 min before recovery.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 22:48:38 +02:00
84cc8d995b fix: fail fast on upload stall instead of retrying on broken client
Previously a single TDLib event-stream degradation cost ~45 minutes
per archive: 3 retries x 15-min minimum timeout, all on the same
broken client. The retries had no chance of succeeding because the
underlying issue (missing updateMessageSendSucceeded events) is a
client-level problem, not a transient send failure.

Now the first stall throws UploadStallError immediately. The caller
in processArchiveSets already recreates the TDLib client on
UploadStallError, so we drop from ~45 min recovery to ~15 min
(one timeout cycle) per stalled archive.

The stalled set is recorded in SkippedPackage; with the watermark
cap from d99a506 it gets retried on the next ingestion cycle with
a fresh client.

FLOOD_WAIT retries inside sendWithRetry are unchanged — those handle
legitimate rate limiting, not stalls.
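The sketch below shows the retry policy: FLOOD_WAIT is retried after the requested delay, while a stall is rethrown on the first occurrence so the caller can recreate the client. `UploadStallError` mirrors the class named in these commits. The retry loop and both rate-limit message formats it matches are assumptions.

```typescript
// Stand-in for the worker's UploadStallError.
class UploadStallError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "UploadStallError";
  }
}

/** Seconds to wait before retrying, or null if the error must not be retried here. */
function retryDelaySeconds(err: unknown): number | null {
  if (err instanceof UploadStallError) return null; // client-level problem: fail fast
  const message = err instanceof Error ? err.message : String(err);
  // Assumed rate-limit formats: "FLOOD_WAIT_30" or "Too Many Requests: retry after 30".
  const match = /FLOOD_WAIT_(\d+)|retry after (\d+)/i.exec(message);
  return match ? Number(match[1] ?? match[2]) : null;
}

/** Retries only rate limits; everything else (including stalls) propagates immediately. */
async function sendWithRetry<T>(send: () => Promise<T>, maxFloodRetries = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await send();
    } catch (err) {
      const waitSeconds = retryDelaySeconds(err);
      if (waitSeconds === null || attempt >= maxFloodRetries) throw err;
      await new Promise<void>((resolve) => setTimeout(resolve, waitSeconds * 1000));
    }
  }
}
```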

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 22:47:08 +02:00
d99a506b10 fix: cap watermark below failed sets so failures retry next cycle
Previously the channel/topic watermark could advance past failed
archive sets in two ways:

1. A later successful set raised maxProcessedId past a failed earlier
   set within the same scan.
2. scanResult.maxScannedMessageId was used as fallback even when
   archives in the scan had failed (added in 77c26ad to prevent
   re-scanning empty channels).

Both paths buried failed archives below the watermark on the next
cycle — they sat permanently in SkippedPackage with no auto-recovery.

Now processArchiveSets returns the lowest failed source message ID
alongside the highest processed one. The caller caps the watermark at
(minFailedId - 1n) so the next scan re-includes the failed messages
and processOneArchiveSet retries them. Successful sets above the
failure boundary are not re-uploaded — packageExistsBySourceMessage
early-skips them on the second pass.
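The capping rule, including the 77c26ad scan-boundary fallback, can be sketched as a pure function over message IDs. `nextWatermark` is a hypothetical name. The commit only states that the caller caps at `minFailedId - 1n`.

```typescript
/**
 * Next watermark after one scan.
 * maxProcessedId: highest source ID of a successfully processed set
 * maxScannedId:   highest message ID the scan saw (the 77c26ad fallback)
 * minFailedId:    lowest source ID of a failed set, if any
 */
function nextWatermark(
  current: bigint,
  maxProcessedId: bigint | null,
  maxScannedId: bigint | null,
  minFailedId: bigint | null,
): bigint {
  let next = current;
  for (const candidate of [maxProcessedId, maxScannedId]) {
    if (candidate !== null && candidate > next) next = candidate;
  }
  if (minFailedId !== null && next >= minFailedId) {
    next = minFailedId - 1n; // keep the failed set above the watermark so it is rescanned
  }
  return next;
}

console.log(nextWatermark(1000n, 1500n, 1600n, null)); // 1600n
console.log(nextWatermark(1000n, 1500n, 1600n, 1200n)); // 1199n
```

Sets between 1200 and 1500 are rescanned on the next cycle, but the `packageExistsBySourceMessage` early-skip keeps that cheap.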

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 22:46:26 +02:00
59038889ae fix: prevent pool exhaustion that caused 4-hour duplicate check stall
All checks were successful
continuous-integration/drone/push Build is passing
The pg pool had max=5 connections shared between Prisma operations and
advisory locks. With 2 account locks held permanently and hash locks
from timed-out (but still running) background work, pool.connect()
would block forever — causing the Turnbase.7z stall.

- Increase pool max from 5 to 15 for headroom
- Add 30s connectionTimeoutMillis so pool.connect() throws instead of
  hanging forever when the pool is exhausted
- On startup, terminate zombie PostgreSQL sessions from previous worker
  instances that hold stale advisory locks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 20:39:00 +02:00
77c26adb31 perf: set watermarks even when no archives found to prevent re-scanning
All checks were successful
continuous-integration/drone/push Build is passing
Previously, channels/topics with no new archives never had their
watermark updated. This meant every cycle re-scanned all messages from
scratch just to discover nothing new — especially costly for the 1079-
topic Model Printing Emporium forum.

- Add maxScannedMessageId to ChannelScanResult (highest msg ID seen)
- Set channel watermark to scan boundary when no archives are found
- Set topic watermark to scan boundary when no archives are found
- Fall back to scan watermark when archive processing doesn't advance it

After one full cycle, subsequent cycles will skip already-scanned
messages via the early-exit boundary check, dramatically reducing
TDLib API calls on channels with mostly non-archive content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 20:37:42 +02:00
35cce3151c perf: early-exit channel scan when all messages are below watermark
searchChatMessages returns newest-first. Once the oldest message on a
page is at or below the lastProcessedMessageId boundary, all remaining
pages are even older. Stop scanning immediately instead of reading every
message in the channel.

This was already implemented for topic scans but missing from channel
scans. On a test run, total messages scanned dropped from 3805 to 1615
(57% reduction) for an account with no new archives.
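The early exit relies only on newest-first ordering. This is a sketch under two assumptions: `fetchPage(i)` stands in for successive `searchChatMessages` calls (the real API pages by a `from_message_id` offset, not a page index), and message IDs are plain numbers.

```typescript
interface ScannedMessage {
  id: number;
}

/** Collect messages above the watermark, stopping as soon as a page reaches it. */
function scanAboveWatermark(
  fetchPage: (pageIndex: number) => ScannedMessage[],
  watermark: number,
): { fresh: ScannedMessage[]; pagesFetched: number } {
  const fresh: ScannedMessage[] = [];
  let pagesFetched = 0;
  for (let page = 0; ; page++) {
    const messages = fetchPage(page);
    pagesFetched++;
    if (messages.length === 0) break; // reached the start of the chat
    for (const m of messages) if (m.id > watermark) fresh.push(m);
    // Pages are newest-first, so the last entry is the oldest on this page.
    // Once it is at or below the watermark, every later page is older still.
    if (messages[messages.length - 1].id <= watermark) break;
  }
  return { fresh, pagesFetched };
}

const pages = [
  [{ id: 50 }, { id: 49 }, { id: 48 }],
  [{ id: 47 }, { id: 46 }, { id: 45 }],
  [{ id: 44 }, { id: 43 }],
];
const result = scanAboveWatermark((i) => pages[i] ?? [], 46);
console.log(result.fresh.map((m) => m.id), result.pagesFetched); // [ 50, 49, 48, 47 ] 2
```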

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 19:58:30 +02:00
d6c82ede1e fix: auto-recover from TDLib upload stalls by recreating client
When TDLib's event stream degrades, uploads complete (bytes sent) but
confirmations never arrive. Previously the worker retried 3x with the
same broken client, wasting 60+ min per archive and holding the mutex.

- Add UploadStallError class to distinguish stalls from other failures
- Reduce stall detection timeout from 5min to 3min (faster detection)
- Recreate TDLib client after consecutive upload stalls instead of
  retrying on the same degraded connection
- Add forceReleaseMutex() to prevent cascade failures when one account
  blocks others via stuck mutex after cycle timeout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 18:02:42 +02:00
7e48131f67 fix: clear timeout on race settlement to prevent orphaned timers
All checks were successful
continuous-integration/drone/push Build is passing
2026-05-02 23:44:18 +02:00
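The subject line refers to a common pitfall: a `Promise.race` timeout whose timer is never cleared keeps running after the real work settles. Below is a minimal sketch of the fixed pattern. `withTimeout` is a hypothetical helper name.

```typescript
/**
 * Race `work` against a timeout. The finally block clears the timer whichever
 * side wins, so a fast `work` leaves no orphaned timer (and closure) behind.
 */
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

The race only stops *waiting*. After a timeout, `work` keeps running, which is the same "timed-out (but still running)" situation described in 59038889ae.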
a79cb4749b fix: use per-account mutex keys in fetch/extract listeners, add cycle timeout and error logging 2026-05-02 23:40:37 +02:00
e9017fc518 feat: parallel account ingestion via per-key TDLib mutex 2026-05-02 23:31:02 +02:00
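A per-key mutex can be built from promise chaining: each key keeps a tail promise, and new work waits on it. This is a minimal sketch, not the worker's mutex. `KeyedMutex` and `runExclusive` are hypothetical names, and the account ID is assumed to be the key.

```typescript
/** Work for the same key runs one at a time in arrival order; other keys never wait. */
class KeyedMutex {
  private tails = new Map<string, Promise<void>>();

  async runExclusive<T>(key: string, fn: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    let release!: () => void;
    const done = new Promise<void>((resolve) => {
      release = resolve;
    });
    // The next caller for this key waits for us and everyone queued before us.
    const tail = previous.then(() => done);
    this.tails.set(key, tail);
    await previous;
    try {
      return await fn();
    } finally {
      release();
      // Nobody queued behind us: drop the entry so the map doesn't keep every key forever.
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}
```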
4f59d19ac2 feat: apply per-account Premium 4GB upload limit to bypass repacking 2026-05-02 23:28:00 +02:00
579276ee2d fix: widen hash lock try/finally to prevent lock leak on error paths 2026-05-02 23:24:08 +02:00
b48cc510a4 feat: add two-phase DB write and hash advisory lock to prevent double-uploads 2026-05-02 23:13:55 +02:00
614c8e5b74 feat: add createPackageStub and updatePackageWithMetadata for two-phase DB write 2026-05-02 23:06:17 +02:00
3019c23f70 feat: add per-content-hash advisory lock to prevent concurrent duplicate uploads
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 23:04:43 +02:00
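PostgreSQL advisory locks are keyed by a 64-bit integer (`pg_advisory_lock(bigint)`), so a content hash has to be folded into one. The commit doesn't say how the key is derived. The sketch below assumes a hex SHA-256 digest and takes its first 8 bytes as a signed bigint. `advisoryKeyFromHash` is a hypothetical name.

```typescript
/**
 * Fold a hex content digest into a signed 64-bit advisory-lock key by taking
 * its first 16 hex digits (8 bytes). Assumed derivation, for illustration.
 */
function advisoryKeyFromHash(hexDigest: string): bigint {
  if (!/^[0-9a-f]{16,}$/i.test(hexDigest)) {
    throw new Error("expected a hex digest of at least 16 characters");
  }
  const unsigned = BigInt("0x" + hexDigest.slice(0, 16));
  // Postgres bigint is signed, so reinterpret the top bit as the sign.
  return BigInt.asIntN(64, unsigned);
}

// Typical use, shown as SQL (parameter passing depends on the driver):
//   SELECT pg_try_advisory_lock($1)   -- $1 = advisoryKeyFromHash(contentHash).toString()
//   ... duplicate check, upload, stub write ...
//   SELECT pg_advisory_unlock($1)
console.log(advisoryKeyFromHash("0000000000000001" + "ab".repeat(24))); // 1n
console.log(advisoryKeyFromHash("ffffffffffffffff" + "00".repeat(24))); // -1n
```

Assuming the duplicate check itself still compares full hashes, a 64-bit prefix collision would only serialize two unrelated uploads. Session-level advisory locks belong to the connection that took them, so the unlock has to run on the same pooled client.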
436a576085 feat: detect and persist Telegram Premium status after authentication
After TDLib login completes, calls getMe() to detect isPremium, persists
it to DB via updateAccountPremiumStatus, and returns { client, isPremium }
from createTdlibClient. All callers updated to destructure accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 23:02:46 +02:00
f454303352 feat: add isPremium field to TelegramAccount
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 22:58:53 +02:00
e29bd79d66 chore: ignore .worktrees directory 2026-05-02 22:54:56 +02:00
61e61d0085 docs: add worker improvements implementation plan
7-task plan covering double-upload fix (hash lock + two-phase write),
parallel account ingestion (per-key mutex), and Premium 4GB upload
limit with automatic detection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 22:47:52 +02:00
925d916a3c Merge branch 'main' of https://github.com/xCyanGrizzly/DragonsStash 2026-05-02 22:38:32 +02:00
27bacaf24c docs: add worker improvements design spec
Covers double-upload fix (two-phase DB write + hash advisory lock),
parallel account processing (remove TDLib mutex), and per-account
Premium 4GB upload limit with automatic is_premium detection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 22:35:27 +02:00
be4daf950b fix: correct User table reference in manual_uploads migration
All checks were successful
continuous-integration/drone/push Build is passing
The FK referenced "users" but the actual table is "User" (no @@map in Prisma schema).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 21:29:55 +02:00
af7094637d feat: file upload from UI, notification dismiss, audit false positive fix
Manual file upload:
- Upload dialog in STL page with drag-and-drop file picker
- Files saved to shared Docker volume (/data/uploads)
- Worker processes via pg_notify('manual_upload') channel
- Hashes, reads metadata, splits >2GB, uploads to Telegram
- Multiple files automatically grouped
- Status polling shows upload/processing/complete states

Notification fixes:
- Add dismiss (X) button on each notification
- Add "Clear" button to remove all notifications
- Fix false positive MISSING_PART alerts from legacy packages
  (only flag when >1 destMessageIds stored but count wrong,
  not when only 1 ID from backfill)
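The narrowed MISSING_PART rule, written as a predicate. The field names follow `isMultipart`, `partCount`, and `destMessageIds` from the bot query diff. `isMissingPart` itself is hypothetical.

```typescript
// Subset of Package fields the audit needs.
interface PackagePartsInfo {
  isMultipart: boolean;
  partCount: number;
  destMessageIds: bigint[];
}

/**
 * Flag only when more than one destination ID is stored and the count is
 * wrong. A single stored ID usually means a legacy row whose other part IDs
 * were never recorded by the backfill, so it is not alerted on.
 */
function isMissingPart(pkg: PackagePartsInfo): boolean {
  if (!pkg.isMultipart) return false;
  const stored = pkg.destMessageIds.length;
  if (stored <= 1) return false; // legacy/backfilled row: can't tell, don't alert
  return stored !== pkg.partCount;
}

console.log(isMissingPart({ isMultipart: true, partCount: 3, destMessageIds: [11n] })); // false
console.log(isMissingPart({ isMultipart: true, partCount: 3, destMessageIds: [11n, 12n] })); // true
```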

Infrastructure:
- ManualUpload + ManualUploadFile schema + migration
- Shared manual_uploads Docker volume between app and worker
- Upload API routes (POST /api/uploads, GET /api/uploads/[id])
- Worker manual-upload processor with full pipeline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 20:26:06 +02:00
f4aa9d9a2f feat: complete remaining features — training, FTS, bot groups, repair, re-tag
All checks were successful
continuous-integration/drone/push Build is passing
Manual override training (GroupingRule):
- Learn patterns from manual group creation (common filename prefix or creator)
- Apply learned rules as first auto-grouping pass (highest confidence after albums)
- GroupingRule model stores pattern, channel, signal type, confidence
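Learning a "common filename prefix" rule can be as simple as a case-insensitive longest common prefix with trailing separators trimmed. This is a sketch, not the worker's implementation. `commonFilenamePrefix` and its `minLength` threshold of 4 are assumptions.

```typescript
/**
 * Case-insensitive longest common prefix of several filenames, with trailing
 * separators trimmed. Returns null when the shared part is too short to be a
 * useful pattern.
 */
function commonFilenamePrefix(names: string[], minLength = 4): string | null {
  if (names.length < 2) return null;
  let prefix = names[0];
  for (const name of names.slice(1)) {
    let i = 0;
    while (
      i < prefix.length &&
      i < name.length &&
      prefix.charAt(i).toLowerCase() === name.charAt(i).toLowerCase()
    ) {
      i++;
    }
    prefix = prefix.slice(0, i);
  }
  const trimmed = prefix.replace(/[\s_.-]+$/, ""); // drop trailing spaces, "_", ".", "-"
  return trimmed.length >= minLength ? trimmed : null;
}

console.log(commonFilenamePrefix(["Orc Warband - Sept.7z", "Orc Warband - Oct.7z"])); // Orc Warband
```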

Hash verification after upload:
- Re-hash upload files on disk before indexing to catch disk corruption
- Creates HASH_MISMATCH notification on discrepancy

Grouping conflict detection:
- After all grouping passes, check if grouped packages match rules from different groups
- Creates GROUPING_CONFLICT notification for manual review

Per-channel grouping flags:
- Add autoGroupEnabled boolean to TelegramChannel (default true)
- Auto-grouping passes (all except album) gated behind this flag
- Album grouping always runs as it reflects Telegram's native behavior

Full-text search (tsvector):
- Add searchVector tsvector column with GIN index and auto-update trigger
- Backfill 1870 existing packages
- FTS with ts_rank for ranked results, ILIKE fallback for short/failed queries
- Applied to both web app and bot search

Bot group awareness:
- /group <query> — view group info or search groups by name
- /sendgroup <id> — send all packages in a group to linked Telegram account

Bulk repair:
- repairPackageAction clears dest info and resets watermark for re-processing
- Repair button in notification bell for MISSING_PART and HASH_MISMATCH alerts
- /api/notifications/repair endpoint

Retroactive category re-tagging:
- When channel category changes, auto-update tags on all existing packages
- Removes old category tag, adds new one

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 14:34:14 +02:00
7f9a03d4ee feat: group merge, ZIP/reply/caption grouping, integrity audit
Group merge UI:
- Add mergeGroups query and mergeGroupsAction server action
- Add "Start Merge" / "Merge Here" buttons to group row actions
- Two-step UX: click Start on source, click Merge Here on target

ZIP path prefix grouping (Signal 7):
- Compare PackageFile.path root folders across ungrouped packages
- Auto-group if 2+ packages share the same dominant root folder
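A sketch of the root-folder signal: pick each package's most common top-level folder from its `PackageFile.path` values, then keep folders shared by two or more packages. The function names, case-folding, and first-seen tie-breaking are assumptions.

```typescript
/** Most common top-level folder among a package's file paths (lowercased), or null. */
function dominantRootFolder(paths: string[]): string | null {
  const counts = new Map<string, number>();
  for (const p of paths) {
    const segments = p.replace(/\\/g, "/").split("/").filter((s) => s.length > 0);
    if (segments.length < 2) continue; // file at archive root: no folder
    const root = segments[0].toLowerCase();
    counts.set(root, (counts.get(root) ?? 0) + 1);
  }
  let best: string | null = null;
  let bestCount = 0;
  for (const [folder, n] of counts) {
    if (n > bestCount) { best = folder; bestCount = n; } // ties: first seen wins
  }
  return best;
}

/** Package IDs keyed by dominant root folder, keeping only folders shared by 2+ packages. */
function groupByRootFolder(pkgs: { id: string; paths: string[] }[]): Map<string, string[]> {
  const byFolder = new Map<string, string[]>();
  for (const pkg of pkgs) {
    const root = dominantRootFolder(pkg.paths);
    if (root !== null) byFolder.set(root, [...(byFolder.get(root) ?? []), pkg.id]);
  }
  for (const [folder, ids] of byFolder) if (ids.length < 2) byFolder.delete(folder);
  return byFolder;
}

const groups = groupByRootFolder([
  { id: "a", paths: ["Elves/base.stl", "Elves/arm.stl"] },
  { id: "b", paths: ["elves/supported/x.stl"] },
  { id: "c", paths: ["Orcs/a.stl"] },
  { id: "d", paths: ["readme.txt"] },
]);
console.log(groups.size, groups.get("elves")); // 1 [ 'a', 'b' ]
```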

Reply chain grouping (Signal 6):
- Capture reply_to_message_id during channel scanning
- Group archives that reply to the same root message
- Add replyToMessageId field to Package schema

Caption fuzzy match grouping (Signal 8):
- Capture source caption during channel scanning
- Normalize captions (strip extensions, extract significant words)
- Group packages with matching normalized caption keys
- Add sourceCaption field to Package schema
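One way to build a normalized caption key: strip archive extensions, keep significant words, then sort them so word order doesn't matter. The stopword list, the 3-character minimum, and `captionKey` are all assumptions. The commit doesn't define them.

```typescript
// Assumed stopwords; the commit does not list the actual ones.
const STOPWORDS = new Set(["the", "and", "for", "with", "part", "files", "stl"]);

/** Caption -> comparable key: lowercase significant words, deduplicated and sorted. */
function captionKey(caption: string): string | null {
  const words = caption
    .toLowerCase()
    .replace(/\.(zip|rar|7z)\b/g, " ") // strip archive extensions
    .split(/[^a-z0-9]+/)
    // keep 3+ char words that are not stopwords or bare numbers (part/volume numbers)
    .filter((w) => w.length >= 3 && !STOPWORDS.has(w) && !/^\d+$/.test(w));
  if (words.length === 0) return null;
  return [...new Set(words)].sort().join(" ");
}

console.log(captionKey("Dwarf Forge Guards.zip - Part 1")); // dwarf forge guards
console.log(captionKey("Guards of the Dwarf Forge (part 2).rar")); // dwarf forge guards
```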

Periodic integrity audit:
- Check multipart packages for completeness (parts vs destMessageIds)
- Detect orphaned indexes (destChannelId set but no destMessageId)
- Runs after each ingestion cycle, deduplicates notifications

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 14:19:36 +02:00
2c46ab0843 feat: pattern/creator grouping, notification UI, failure alerts
Pattern grouping (Signal 3):
- Extract YYYY-MM dates, month names, and project prefixes from filenames
- Auto-group packages sharing the same pattern within a channel
- Groups created with groupingSource=AUTO_PATTERN
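A sketch of the date part of pattern extraction: find a `YYYY-MM`-style date or a month name, normalized to a shared key. The regexes, the `month-MM` fallback, and `extractDatePattern` are assumptions, and project-prefix extraction is not covered here.

```typescript
const MONTHS = [
  "january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december",
];

/** Normalized "YYYY-MM" (or "month-MM" when no year is present), else null. */
function extractDatePattern(fileName: string): string | null {
  const lower = fileName.toLowerCase();
  // "2024-03", "2024_03" or "2024.03", not embedded in a longer digit run
  const ym = /(^|[^0-9])(20\d{2})[-_.](0[1-9]|1[0-2])(?![0-9])/.exec(lower);
  if (ym) return `${ym[2]}-${ym[3]}`;
  for (let i = 0; i < MONTHS.length; i++) {
    // Require non-letters around the name so "may" doesn't match "mayhem".
    if (new RegExp(`(^|[^a-z])${MONTHS[i]}([^a-z]|$)`).test(lower)) {
      const year = /(^|[^0-9])(20\d{2})(?![0-9])/.exec(lower);
      const mm = String(i + 1).padStart(2, "0");
      return year ? `${year[2]}-${mm}` : `month-${mm}`;
    }
  }
  return null;
}

console.log(extractDatePattern("Titan_Forge_2024_03_Release.zip")); // 2024-03
console.log(extractDatePattern("Loot Studios - March 2023 - Elves.7z")); // 2023-03
```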

Creator grouping (Signal 4):
- Auto-group 3+ ungrouped packages from the same creator within a channel
- Runs after pattern grouping as lowest-priority automatic signal

Notification UI:
- Add NotificationBell component to header with unread badge
- Popover panel shows recent notifications with severity icons
- Mark individual or all notifications as read
- Polls every 30 seconds for updates

Failure notifications:
- Upload/download failures now create SystemNotification records
- Visible in the notification bell alongside hash mismatch alerts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 13:43:55 +02:00
9e78cc5d19 feat: grouping phase 1 — schema, ungrouped tab, time-window grouping, hash verification
Schema:
- Add GroupingSource enum (ALBUM, MANUAL, AUTO_TIME, AUTO_PATTERN, etc.)
- Add groupingSource field to PackageGroup with backfill
- Add SystemNotification model for persistent alerts
- Add NotificationType and NotificationSeverity enums

Ungrouped staging tab:
- Add listUngroupedPackages/countUngroupedPackages queries
- Add "Ungrouped" tab to STL page showing packages without a group

Time-window auto-grouping:
- After album grouping, cluster ungrouped packages within configurable
  time window (default 5 min, AUTO_GROUP_TIME_WINDOW_MINUTES env var)
- Groups named from common filename prefix
- Groups created with groupingSource=AUTO_TIME
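"Cluster within a time window" has more than one reading. The sketch below assumes gap-based clustering: sort by post time and start a new cluster whenever the gap to the previous package exceeds the window. `clusterByTimeWindow` and the `postedAt` field are hypothetical.

```typescript
interface UngroupedPackage {
  id: string;
  postedAt: Date;
}

/** Gap-based clusters of package IDs; only clusters of 2+ packages are returned. */
function clusterByTimeWindow(pkgs: UngroupedPackage[], windowMinutes = 5): string[][] {
  const windowMs = windowMinutes * 60_000;
  const sorted = [...pkgs].sort((a, b) => a.postedAt.getTime() - b.postedAt.getTime());
  const clusters: string[][] = [];
  let current: string[] = [];
  let lastTime = -Infinity;
  for (const pkg of sorted) {
    const t = pkg.postedAt.getTime();
    if (current.length > 0 && t - lastTime > windowMs) {
      if (current.length >= 2) clusters.push(current); // close the previous cluster
      current = [];
    }
    current.push(pkg.id);
    lastTime = t;
  }
  if (current.length >= 2) clusters.push(current);
  return clusters;
}

const at = (h: number, m: number) => new Date(Date.UTC(2026, 2, 30, h, m));
const clusters = clusterByTimeWindow([
  { id: "a", postedAt: at(12, 0) },
  { id: "d", postedAt: at(12, 30) },
  { id: "b", postedAt: at(12, 3) },
  { id: "c", postedAt: at(12, 7) },
  { id: "e", postedAt: at(12, 31) },
  { id: "f", postedAt: at(14, 0) },
]);
console.log(clusters); // [ [ 'a', 'b', 'c' ], [ 'd', 'e' ] ]
```

Chaining on gaps means a busy burst can span more than the window end to end (here a to c covers 7 minutes). Anchoring each cluster on its first package would cap the span instead. The window value would come from `AUTO_GROUP_TIME_WINDOW_MINUTES`.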

Hash verification after split:
- Re-hash split parts and compare to original contentHash
- Log error and create SystemNotification on mismatch
- Prevents silently corrupted split uploads
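For a byte-level split, hashing the parts in order with incremental updates gives the same digest as hashing the original file, so no reassembly is needed. This sketch assumes SHA-256 for `contentHash` and a plain byte split (concatenated parts reproduce the file). `partsMatchOriginal` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

/** True if the parts, hashed in order as one stream, reproduce the original digest. */
function partsMatchOriginal(parts: Uint8Array[], originalHexDigest: string): boolean {
  const hash = createHash("sha256");
  for (const part of parts) hash.update(part); // incremental update == hashing the concatenation
  return hash.digest("hex") === originalHexDigest.toLowerCase();
}

// In-memory demo; the worker would stream each part file from disk instead.
const original = Buffer.from("example archive bytes ".repeat(1000));
const digest = createHash("sha256").update(original).digest("hex");
const parts = [original.subarray(0, 7000), original.subarray(7000)];
console.log(partsMatchOriginal(parts, digest)); // true
```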

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 13:00:27 +02:00
194c87a256 fix: raise size limit and make MAX_PART_SIZE configurable
All checks were successful
continuous-integration/drone/push Build is passing
- Raise WORKER_MAX_ZIP_SIZE_MB from 4GB to 200GB (production .env)
- Make MAX_PART_SIZE configurable via MAX_PART_SIZE_MB env var
  (default 1950 MiB, set to 3900 for Premium accounts)
- Remove hardcoded 1950 MiB constants in split.ts and worker.ts
- Add grouping system audit report with real-world failure cases
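The part-size arithmetic as a sketch, treating `MAX_PART_SIZE_MB` as MiB (default 1950, 3900 for Premium as described above). The function names are hypothetical. A caller might pass `Number(process.env.MAX_PART_SIZE_MB ?? 1950)`, but that parsing line is illustrative only.

```typescript
const MIB = 1024 * 1024;

/** Part size in bytes from a MiB setting (default 1950 MiB). */
function partSizeBytes(maxPartSizeMiB = 1950): number {
  return maxPartSizeMiB * MIB;
}

/** Number of parts needed for an archive; 1 means it is uploaded unsplit. */
function partCount(fileSizeBytes: number, partBytes: number): number {
  return Math.max(1, Math.ceil(fileSizeBytes / partBytes));
}

const tenGiB = 10 * 1024 * MIB;
console.log(partCount(tenGiB, partSizeBytes())); // 6 (1950 MiB parts)
console.log(partCount(tenGiB, partSizeBytes(3900))); // 3 (3900 MiB parts)
```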

10 archives were blocked by the 4GB limit (the largest is 70.5GB).
They will be retried on the next ingestion cycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 12:41:37 +02:00
718007446f feat: fix multi-part archive forwarding and add kickstarter package linking
All checks were successful
continuous-integration/drone/push Build is passing
Multi-part send fix:
- Add destMessageIds BigInt[] to Package schema with backfill migration
- Worker uploadToChannel now returns all message IDs, stored in DB
- Bot forwards all parts of multi-part archives (not just the first)
- Add retry logic for upload rate limits (429) and download stalls

Kickstarter package linking:
- Add package search/linking queries and API routes
- Add PackageLinkerDialog with search + checkbox selection
- Add "Link Packages" and "Send All" actions to kickstarter table
- Add sendAllKickstarterPackages server action

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 18:11:35 +01:00
77 changed files with 10501 additions and 471 deletions

1
.gitignore vendored

@@ -54,3 +54,4 @@ src/generated
# temp files
nul
tmpclaude-*
.worktrees/

94
bot/package-lock.json generated

@@ -12,8 +12,8 @@
"@prisma/client": "^7.4.0",
"pg": "^8.18.0",
"pino": "^9.6.0",
-"prebuilt-tdlib": "^0.1008050.0",
-"tdl": "^8.0.0"
"prebuilt-tdlib": "^0.1008064.0",
"tdl": "^8.1.0"
},
"devDependencies": {
"@types/node": "^20",
@@ -566,9 +566,9 @@
"license": "MIT"
},
"node_modules/@prebuilt-tdlib/darwin-arm64": {
-"version": "0.1008050.0",
-"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/darwin-arm64/-/darwin-arm64-0.1008050.0.tgz",
-"integrity": "sha512-XrWN7M1gfvnzOBRX0YdXVfhSxIDSs/ZJ16QJ0ILDKe+grOFl/cfl7lwB/hK/MlHC6Rev56f5X7xaWnjMh0vktQ==",
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/darwin-arm64/-/darwin-arm64-0.1008064.0.tgz",
"integrity": "sha512-Oq5us+o0g68Jag74RIV3LdLkZxQxJMcOdrVbgmyE7Unk+WcifqTb/gZw1rS6BrW+2SX2LNeGY4zQqqBTNDr17Q==",
"cpu": [
"arm64"
],
@@ -579,9 +579,9 @@
]
},
"node_modules/@prebuilt-tdlib/darwin-x64": {
-"version": "0.1008050.0",
-"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/darwin-x64/-/darwin-x64-0.1008050.0.tgz",
-"integrity": "sha512-a1UfBW0lYx4tUy5viMPtsbqBfBncCAgDu3FPjljfYTHjP8wfkKFxpp5+8wdxhyqdy3QriWaipVtUXQgOeEWMJg==",
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/darwin-x64/-/darwin-x64-0.1008064.0.tgz",
"integrity": "sha512-Pz11xjET2Y3uUJKxkWKBc0dmOtlykmBdZ9D6Ahh+EsoLDLIWHm7M91p6nZT396YZ4n2BL+FtDYK65Ae3LDIA5g==",
"cpu": [
"x64"
],
@@ -592,9 +592,22 @@
]
},
"node_modules/@prebuilt-tdlib/linux-arm64-glibc": {
-"version": "0.1008050.0",
-"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/linux-arm64-glibc/-/linux-arm64-glibc-0.1008050.0.tgz",
-"integrity": "sha512-HRGspdQYzaBkU+W2M8uY5OgOkmgfTkyHkTYan/dn7EE/38QdIFW0YTvmGrl3DoFV2PA+SeJQw0xqK8tMSyHKaA==",
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/linux-arm64-glibc/-/linux-arm64-glibc-0.1008064.0.tgz",
"integrity": "sha512-1kML9+RCfTOTWLzxq2klCN962/XwYhd+SGd4BxOwcmvPniYDNRUXtgMi3qRyV/Flola8dchGFrqZJU4kNZNLuQ==",
"cpu": [
"arm64"
],
"license": "0BSD",
"optional": true,
"os": [
"linux"
]
},
"node_modules/@prebuilt-tdlib/linux-arm64-musl": {
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/linux-arm64-musl/-/linux-arm64-musl-0.1008064.0.tgz",
"integrity": "sha512-tN9FJOR8VDfmOoTHMivAqBfQ/d9Bry9T/9cGSTcms3H4ORun/WO5U5zT8VqadAsqjuiQ8Y9HaUqqz65xBDtcgw==",
"cpu": [
"arm64"
],
@@ -605,9 +618,9 @@
]
},
"node_modules/@prebuilt-tdlib/linux-x64-glibc": {
-"version": "0.1008050.0",
-"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/linux-x64-glibc/-/linux-x64-glibc-0.1008050.0.tgz",
-"integrity": "sha512-Yf6ve3Dzxc66kV1cijFLn7EXKhPN5YHTjtJABEaCR5euetCI2wZp/1uBsXvyYTuFXqQbMfjO3xUCXUIBhLoChw==",
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/linux-x64-glibc/-/linux-x64-glibc-0.1008064.0.tgz",
"integrity": "sha512-7fyCp2uk0BdeHKJ9PyQOCditC9vBXeeIjYPAKKBcrkum5bi1e9txy2g5kkGjqwUkN0ntIniS5QfHEyr17Idr9g==",
"cpu": [
"x64"
],
@@ -617,10 +630,30 @@
"linux"
]
},
"node_modules/@prebuilt-tdlib/linux-x64-musl": {
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/linux-x64-musl/-/linux-x64-musl-0.1008064.0.tgz",
"integrity": "sha512-e2zRucrRrrK6M04iQWMfwtrts+VvVtyUwtTP1hF2g3a6jW+AHMzFoB9Wu8fWr+vuJflLIQ6sG9r3lI07Q8NenQ==",
"cpu": [
"x64"
],
"license": "0BSD",
"optional": true,
"os": [
"linux"
]
},
"node_modules/@prebuilt-tdlib/types": {
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/types/-/types-0.1008064.0.tgz",
"integrity": "sha512-eqr1+fiHZ+Gj4lwcITzMp6FwPg8UrxlxxaFjhiJRHL9BlbmD2QkCRHac4wW1Sx8Dzwzd7f+xO21Pgi7TBRSwmw==",
"license": "0BSD",
"optional": true
},
"node_modules/@prebuilt-tdlib/win32-x64": {
-"version": "0.1008050.0",
-"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/win32-x64/-/win32-x64-0.1008050.0.tgz",
-"integrity": "sha512-4v8tU5bodMcLhzrWWXzIzqdHBIpq0wim+7sDmQWQIMy3kDeIzVtpuM+vQjxrGoeH9oWr2WXSRKuj93ld7G5NbQ==",
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/win32-x64/-/win32-x64-0.1008064.0.tgz",
"integrity": "sha512-rkacZWexQw52/EUaLAmbsu2+P3C1/AtinlCjfiX07oQAEg3327BCEZqrcY0ER83D8+MMf2pfwMPCDJKytr4hcg==",
"cpu": [
"x64"
],
@@ -1669,16 +1702,19 @@
}
},
"node_modules/prebuilt-tdlib": {
-"version": "0.1008050.0",
-"resolved": "https://registry.npmjs.org/prebuilt-tdlib/-/prebuilt-tdlib-0.1008050.0.tgz",
-"integrity": "sha512-CfeQE1rG51d2iC6m72fzrbCW4mqI17ugil9pVurWHtfUJi1Fcn7zadpTzDoUl4oc1dEtKgM7S24DVP67gcl4SQ==",
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/prebuilt-tdlib/-/prebuilt-tdlib-0.1008064.0.tgz",
"integrity": "sha512-jJLowKZoH4slXYrkTkKlEgyGsIGv61AWjDZcxxVxJYu21X3kmukGwbCpk4ML99cJp2CwRsD41GCEQBkKJAwCUg==",
"license": "MIT",
"optionalDependencies": {
-"@prebuilt-tdlib/darwin-arm64": "0.1008050.0",
-"@prebuilt-tdlib/darwin-x64": "0.1008050.0",
-"@prebuilt-tdlib/linux-arm64-glibc": "0.1008050.0",
-"@prebuilt-tdlib/linux-x64-glibc": "0.1008050.0",
-"@prebuilt-tdlib/win32-x64": "0.1008050.0"
"@prebuilt-tdlib/darwin-arm64": "0.1008064.0",
"@prebuilt-tdlib/darwin-x64": "0.1008064.0",
"@prebuilt-tdlib/linux-arm64-glibc": "0.1008064.0",
"@prebuilt-tdlib/linux-arm64-musl": "0.1008064.0",
"@prebuilt-tdlib/linux-x64-glibc": "0.1008064.0",
"@prebuilt-tdlib/linux-x64-musl": "0.1008064.0",
"@prebuilt-tdlib/types": "0.1008064.0",
"@prebuilt-tdlib/win32-x64": "0.1008064.0"
}
},
"node_modules/prisma": {
@@ -1971,13 +2007,13 @@
"license": "MIT"
},
"node_modules/tdl": {
-"version": "8.0.2",
-"resolved": "https://registry.npmjs.org/tdl/-/tdl-8.0.2.tgz",
-"integrity": "sha512-KYxlJ4eao7FUu91U1dCDkaHmK70JAyZ1KqitkKqpPC7rxAiXWhaYxddWvt84UxIYoWbgdd0B70FYJ4p/YqpFCA==",
"version": "8.1.0",
"resolved": "https://registry.npmjs.org/tdl/-/tdl-8.1.0.tgz",
"integrity": "sha512-idpw60gjJdiJALQg0+6UbxtJTMxVhzZAgCO6QzL81gqBYCkEFjm9zM9HwTTQGeOaAavw4yRHymR68yUUiCoKrA==",
"hasInstallScript": true,
"license": "MIT",
"dependencies": {
-"debug": "^4.4.0",
"debug": "^4.4.3",
"node-addon-api": "^7.1.1",
"node-gyp-build": "^4.8.4"
},

@@ -13,8 +13,8 @@
"@prisma/client": "^7.4.0",
"pg": "^8.18.0",
"pino": "^9.6.0",
-"prebuilt-tdlib": "^0.1008050.0",
-"tdl": "^8.0.0"
"prebuilt-tdlib": "^0.1008064.0",
"tdl": "^8.1.0"
},
"devDependencies": {
"@types/node": "^20",

@@ -10,7 +10,10 @@ import {
getSubscriptions,
addSubscription,
removeSubscription,
getGroupById,
searchGroups,
} from "./db/queries.js";
import { db } from "./db/client.js";
import { sendTextMessage, sendPhotoMessage } from "./tdlib/client.js";
const log = childLogger("commands");
@@ -78,6 +81,12 @@ export async function handleMessage(msg: IncomingMessage): Promise<void> {
case "/status":
await handleStatus(chatId, userId);
break;
case "/group":
await handleGroup(chatId, args);
break;
case "/sendgroup":
await handleSendGroup(chatId, userId, args);
break;
default:
await sendTextMessage(
chatId,
@@ -117,6 +126,8 @@ async function handleStart(
`/search &lt;query&gt; — Search packages`,
`/latest [n] — Show latest packages`,
`/package &lt;id&gt; — Package details`,
`/group &lt;id or name&gt; — View group info and package list`,
`/sendgroup &lt;id&gt; — Send all packages in a group to yourself`,
`/link &lt;code&gt; — Link your Telegram to your web account`,
`/subscribe &lt;keyword&gt; — Get notified for new packages`,
`/subscriptions — View your subscriptions`,
@@ -136,6 +147,8 @@ async function handleHelp(chatId: bigint): Promise<void> {
`/search &lt;query&gt; — Search by filename or creator`,
`/latest [n] — Show n most recent packages (default: 5)`,
`/package &lt;id&gt; — View package details and file list`,
`/group &lt;id or name&gt; — View group info and package list`,
`/sendgroup &lt;id&gt; — Send all packages in a group to yourself`,
``,
`🔗 <b>Account Linking</b>`,
`/link &lt;code&gt; — Link Telegram to your web account`,
@@ -432,6 +445,168 @@ async function handleStatus(chatId: bigint, userId: bigint): Promise<void> {
}
}
async function handleGroup(chatId: bigint, query: string): Promise<void> {
if (!query) {
await sendTextMessage(
chatId,
"Usage: /group &lt;id or name&gt;\n\nProvide a group ID (starts with 'c') or a name to search.",
"textParseModeHTML"
);
return;
}
const trimmed = query.trim();
// If it looks like a cuid (starts with 'c', ~25 chars), look up by ID directly
if (/^c[a-z0-9]{20,}$/i.test(trimmed)) {
const group = await getGroupById(trimmed);
if (!group) {
await sendTextMessage(chatId, "Group not found.", "textParseModeHTML");
return;
}
const packageLines = group.packages.slice(0, 20).map((pkg, i) => {
const size = formatSize(pkg.fileSize);
return ` ${i + 1}. <b>${escapeHtml(pkg.fileName)}</b> (${size}, ${pkg.fileCount} files) — <code>${pkg.id}</code>`;
});
const more = group.packages.length > 20
? `\n ... and ${group.packages.length - 20} more`
: "";
const response = [
`📦 <b>Group: ${escapeHtml(group.name)}</b>`,
``,
`Packages: ${group.packages.length}`,
`ID: <code>${group.id}</code>`,
``,
`<b>Contents:</b>`,
...packageLines,
more,
``,
`Use /sendgroup ${group.id} to receive all packages.`,
]
.filter((l) => l !== "")
.join("\n");
await sendTextMessage(chatId, response, "textParseModeHTML");
return;
}
// Otherwise search by name
const groups = await searchGroups(trimmed, 5);
if (groups.length === 0) {
await sendTextMessage(
chatId,
`No groups found matching "<b>${escapeHtml(trimmed)}</b>".`,
"textParseModeHTML"
);
return;
}
const lines = groups.map(
(g, i) =>
`${i + 1}. <b>${escapeHtml(g.name)}</b> — ${g._count.packages} package(s)\n ID: <code>${g.id}</code>`
);
const response = [
`🔍 <b>Groups matching "${escapeHtml(trimmed)}":</b>`,
``,
...lines,
``,
`Use /group &lt;id&gt; for full details.`,
].join("\n");
await sendTextMessage(chatId, response, "textParseModeHTML");
}
async function handleSendGroup(
chatId: bigint,
userId: bigint,
args: string
): Promise<void> {
if (!args) {
await sendTextMessage(
chatId,
"Usage: /sendgroup &lt;group-id&gt;",
"textParseModeHTML"
);
return;
}
const groupId = args.trim();
const group = await getGroupById(groupId);
if (!group) {
await sendTextMessage(chatId, "Group not found.", "textParseModeHTML");
return;
}
// Require account linking
const link = await findLinkByTelegramUserId(userId);
if (!link) {
await sendTextMessage(
chatId,
"You must link your account before receiving packages.\nUse /link &lt;code&gt; to connect.",
"textParseModeHTML"
);
return;
}
// Only send packages that have been uploaded to the destination channel
const sendable = group.packages.filter(
(pkg) => pkg.destChannelId && pkg.destMessageId
);
if (sendable.length === 0) {
await sendTextMessage(
chatId,
`No packages in group "<b>${escapeHtml(group.name)}</b>" are ready to send yet.`,
"textParseModeHTML"
);
return;
}
// Create a BotSendRequest for each sendable package
const requests = await Promise.all(
sendable.map((pkg) =>
db.botSendRequest.create({
data: {
packageId: pkg.id,
telegramLinkId: link.id,
requestedByUserId: link.userId,
status: "PENDING",
},
})
)
);
// Fire pg_notify for each request so the send listener picks them up
for (const req of requests) {
await db.$queryRawUnsafe(
`SELECT pg_notify('bot_send', $1)`,
req.id
).catch(() => {
// Best-effort — the bot also processes PENDING requests on its send queue
});
}
await sendTextMessage(
chatId,
[
`✅ <b>Queued ${requests.length} package(s) from "${escapeHtml(group.name)}"</b>`,
``,
`You'll receive each archive shortly. Use /package &lt;id&gt; to check individual packages.`,
].join("\n"),
"textParseModeHTML"
);
log.info(
{ groupId, packageCount: requests.length, userId: userId.toString() },
"Group send queued"
);
}
function escapeHtml(text: string): string {
return text
.replace(/&/g, "&amp;")

@@ -53,7 +53,52 @@ export async function createTelegramLink(
// ── Package search ──
export async function searchPackages(query: string, limit = 10) {
-const packages = await db.package.findMany({
// Try full-text search first
if (query.length >= 3) {
const tsQuery = query
.trim()
.split(/\s+/)
.filter((w) => w.length >= 2)
.map((w) => w.replace(/[^a-zA-Z0-9]/g, ""))
.filter(Boolean)
.join(" & ");
if (tsQuery) {
try {
const ftsResults = await db.$queryRawUnsafe<{ id: string }[]>(
`SELECT id FROM packages
WHERE "searchVector" @@ to_tsquery('english', $1)
ORDER BY ts_rank("searchVector", to_tsquery('english', $1)) DESC
LIMIT $2`,
tsQuery,
limit
);
if (ftsResults.length > 0) {
return db.package.findMany({
where: { id: { in: ftsResults.map((r) => r.id) } },
orderBy: { indexedAt: "desc" },
select: {
id: true,
fileName: true,
fileSize: true,
archiveType: true,
fileCount: true,
creator: true,
indexedAt: true,
destChannelId: true,
destMessageId: true,
},
});
}
} catch {
// FTS failed — fall back to ILIKE
}
}
}
// Fallback: ILIKE search
return db.package.findMany({
where: {
OR: [
{ fileName: { contains: query, mode: "insensitive" } },
@@ -74,7 +119,44 @@ export async function searchPackages(query: string, limit = 10) {
destMessageId: true,
},
});
-return packages;
}
// ── Group queries ──
export async function getGroupById(groupId: string) {
return db.packageGroup.findUnique({
where: { id: groupId },
include: {
packages: {
orderBy: { indexedAt: "desc" },
select: {
id: true,
fileName: true,
fileSize: true,
archiveType: true,
fileCount: true,
creator: true,
destChannelId: true,
destMessageId: true,
},
},
},
});
}
export async function searchGroups(query: string, limit = 5) {
return db.packageGroup.findMany({
where: {
name: { contains: query, mode: "insensitive" },
},
orderBy: { createdAt: "desc" },
take: limit,
select: {
id: true,
name: true,
_count: { select: { packages: true } },
},
});
}
export async function getLatestPackages(limit = 5) {
@@ -122,6 +204,9 @@ export async function getPendingSendRequest(requestId: string) {
archiveType: true,
destChannelId: true,
destMessageId: true,
destMessageIds: true,
isMultipart: true,
partCount: true,
previewData: true,
sourceChannel: { select: { title: true, telegramId: true } },
},

@@ -7,7 +7,7 @@ import {
findMatchingSubscriptions,
getGlobalDestinationChannel,
} from "./db/queries.js";
-import { copyMessageToUser, sendTextMessage, sendPhotoMessage } from "./tdlib/client.js";
import { copyMessageToUser, copyMultipleMessagesToUser, sendTextMessage, sendPhotoMessage } from "./tdlib/client.js";
import { sleep } from "./util/flood-wait.js";
const log = childLogger("send-listener");
@@ -154,11 +154,25 @@ async function processSendRequest(requestId: string): Promise<void> {
}
// Forward the actual archive file(s) from destination channel
const messageIds = pkg.destMessageIds as bigint[] | undefined;
if (messageIds && messageIds.length > 1) {
log.info(
{ requestId, parts: messageIds.length },
"Sending multi-part archive"
);
await copyMultipleMessagesToUser(
destChannel.telegramId,
messageIds,
targetUserId
);
} else {
// Single part or legacy (no destMessageIds populated)
await copyMessageToUser(
destChannel.telegramId,
pkg.destMessageId,
targetUserId
);
}
await updateSendRequest(requestId, "SENT");
log.info({ requestId }, "Send request completed successfully");

@@ -121,6 +121,25 @@ export async function copyMessageToUser(
}, fileName);
}
/**
* Send multiple document messages from a channel to a user's DM.
* Used for multi-part archives where each part is a separate Telegram message.
* Sends parts sequentially with a small delay to avoid rate limits.
*/
export async function copyMultipleMessagesToUser(
fromChatId: bigint,
messageIds: bigint[],
toUserId: bigint
): Promise<void> {
for (let i = 0; i < messageIds.length; i++) {
await copyMessageToUser(fromChatId, messageIds[i], toUserId);
// Small delay between parts to avoid rate limits
if (i < messageIds.length - 1) {
await new Promise((resolve) => setTimeout(resolve, 1000));
}
}
}
/**
* Send a message and wait for Telegram to confirm delivery.
* Returns when updateMessageSendSucceeded fires for the temp message.

@@ -28,6 +28,8 @@ services:
timeout: 5s
retries: 3
start_period: 60s
volumes:
- manual_uploads:/data/uploads
restart: unless-stopped
deploy:
resources:
@@ -54,6 +56,7 @@ services:
volumes:
- tdlib_state:/data/tdlib
- tmp_zips:/tmp/zips
- manual_uploads:/data/uploads
depends_on:
db:
condition: service_healthy
@@ -121,6 +124,7 @@ volumes:
tdlib_state:
tdlib_bot_state:
tmp_zips:
manual_uploads:
networks:
frontend:

@@ -0,0 +1,964 @@
# Multi-Part Send Fix & Kickstarter Package Linking
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Fix multi-part package forwarding so all archive parts reach the user, and add UI to link STL packages to kickstarters with "send all" capability.
**Architecture:** Two independent subsystems. (A) Store all destination message IDs when the worker uploads multi-part archives, then have the bot forward every part. (B) Add a package-linker dialog in the kickstarter UI using the existing `linkPackages` action, plus a "send all" action that queues every linked package.
**Tech Stack:** Prisma (schema + migration), TypeScript worker/bot services, Next.js App Router (server actions + React client components), shadcn/ui, TanStack Table.
---
## File Map
### Subsystem A — Multi-Part Send Fix
| Action | File | Responsibility |
|--------|------|----------------|
| Modify | `prisma/schema.prisma` | Add `destMessageIds BigInt[]` to Package |
| Create | `prisma/migrations/<ts>_add_dest_message_ids/migration.sql` | Migration SQL |
| Modify | `worker/src/upload/channel.ts` | Return all message IDs from `uploadToChannel` |
| Modify | `worker/src/db/queries.ts` | Add `destMessageIds` to `CreatePackageInput` and `createPackageWithFiles` |
| Modify | `worker/src/worker.ts` | Pass all message IDs when creating package |
| Modify | `bot/src/db/queries.ts` | Include `destMessageIds` in `getPendingSendRequest` |
| Modify | `bot/src/send-listener.ts` | Forward all parts, not just the first |
### Subsystem B — Kickstarter Package Linking UI
| Action | File | Responsibility |
|--------|------|----------------|
| Create | `src/app/(app)/kickstarters/_components/package-linker-dialog.tsx` | Dialog with package search + selection for linking |
| Modify | `src/app/(app)/kickstarters/_components/kickstarter-columns.tsx` | Add "Link Packages" and "Send All" actions to row menu |
| Modify | `src/app/(app)/kickstarters/_components/kickstarter-table.tsx` | Wire up new dialogs + state |
| Modify | `src/app/(app)/kickstarters/actions.ts` | Add `sendAllKickstarterPackages` action |
| Modify | `src/data/kickstarter.queries.ts` | Add query to search packages for linking |
---
## Task 1: Add `destMessageIds` to Prisma Schema + Migration
**Files:**
- Modify: `prisma/schema.prisma:470-471`
- Create: migration SQL
- [ ] **Step 1: Add field to schema**
In `prisma/schema.prisma`, add `destMessageIds` after `destMessageId`:
```prisma
destMessageId BigInt?
destMessageIds BigInt[] @default([])
```
- [ ] **Step 2: Create migration SQL manually**
Create the migration directory and SQL file. The migration adds the column with a default and backfills existing rows by copying `destMessageId` into the array where it's non-null:
```sql
-- AlterTable
ALTER TABLE "packages" ADD COLUMN "destMessageIds" BIGINT[] DEFAULT ARRAY[]::BIGINT[];
-- Backfill: copy existing destMessageId into the array
UPDATE "packages"
SET "destMessageIds" = ARRAY["destMessageId"]
WHERE "destMessageId" IS NOT NULL;
```
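- [ ] **Step 3: Apply migration to database**
`docker exec` needs `-i` so that `psql` can read the SQL file from stdin:
```bash
docker exec -i dragonsstash-db psql -U dragons -d dragonsstash -f - < prisma/migrations/<ts>_add_dest_message_ids/migration.sql
```
Because this bypasses `prisma migrate`, the migration is not recorded in `_prisma_migrations`. If the stack later runs `prisma migrate deploy`, first mark it as applied (`npx prisma migrate resolve --applied <ts>_add_dest_message_ids`) so it is not executed a second time.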
- [ ] **Step 4: Regenerate Prisma client**
Use the app container (which has node/prisma) to regenerate:
```bash
docker exec dragonsstash npx prisma generate
```
Or, if running locally with node: `npx prisma generate`
- [ ] **Step 5: Commit**
```bash
git add prisma/schema.prisma prisma/migrations/
git commit -m "feat: add destMessageIds field to Package for multi-part forwarding"
```
---
## Task 2: Worker — Return All Message IDs from Upload
**Files:**
- Modify: `worker/src/upload/channel.ts:10-12,25-74`
- [ ] **Step 1: Update UploadResult interface**
In `worker/src/upload/channel.ts`, change the interface to include all IDs:
```typescript
export interface UploadResult {
messageId: bigint;
messageIds: bigint[];
}
```
- [ ] **Step 2: Collect all message IDs in uploadToChannel**
Replace the upload loop to track all message IDs:
```typescript
export async function uploadToChannel(
client: Client,
chatId: bigint,
filePaths: string[],
caption?: string
): Promise<UploadResult> {
const allMessageIds: bigint[] = [];
for (let i = 0; i < filePaths.length; i++) {
const filePath = filePaths[i];
const fileCaption = i === 0 && caption ? caption : undefined;
const fileName = path.basename(filePath);
let fileSizeMB = 0;
try {
const s = await stat(filePath);
fileSizeMB = Math.round(s.size / (1024 * 1024));
} catch {
// Non-critical
}
log.info(
{ chatId: Number(chatId), fileName, sizeMB: fileSizeMB, part: i + 1, total: filePaths.length },
"Uploading file to channel"
);
const serverMsgId = await sendWithRetry(client, chatId, filePath, fileCaption, fileName, fileSizeMB);
allMessageIds.push(serverMsgId);
// Rate limit delay between uploads
if (i < filePaths.length - 1) {
await sleep(config.apiDelayMs);
}
}
if (allMessageIds.length === 0) {
throw new Error("Upload failed: no messages sent");
}
log.info(
{ chatId: Number(chatId), messageId: Number(allMessageIds[0]), files: filePaths.length },
"All uploads confirmed by Telegram"
);
return { messageId: allMessageIds[0], messageIds: allMessageIds };
}
```
- [ ] **Step 3: Commit**
```bash
git add worker/src/upload/channel.ts
git commit -m "feat: return all message IDs from uploadToChannel for multi-part"
```
---
## Task 3: Worker — Store All Message IDs in Database
**Files:**
- Modify: `worker/src/db/queries.ts:104-155`
- Modify: `worker/src/worker.ts:1056-1086`
- [ ] **Step 1: Add destMessageIds to CreatePackageInput**
In `worker/src/db/queries.ts`, add the field to the interface:
```typescript
export interface CreatePackageInput {
// ... existing fields ...
destMessageId?: bigint;
destMessageIds?: bigint[];
// ... rest ...
}
```
- [ ] **Step 2: Store destMessageIds in createPackageWithFiles**
In the `db.package.create` call inside `createPackageWithFiles`, add:
```typescript
destMessageIds: input.destMessageIds ?? (input.destMessageId ? [input.destMessageId] : []),
```
- [ ] **Step 3: Pass messageIds from worker pipeline**
In `worker/src/worker.ts`, the upload section (around line 1068-1085) currently does:
```typescript
destResult = await uploadToChannel(client, destChannelTelegramId, uploadPaths);
```
After this, when calling `createPackageWithFiles`, add `destMessageIds`:
```typescript
const pkg = await createPackageWithFiles({
// ... existing fields ...
destMessageId: destResult.messageId,
destMessageIds: destResult.messageIds,
// ... rest ...
});
```
- [ ] **Step 4: Commit**
```bash
git add worker/src/db/queries.ts worker/src/worker.ts
git commit -m "feat: store all multi-part message IDs in package record"
```
---
## Task 4: Bot — Forward All Parts
**Files:**
- Modify: `bot/src/db/queries.ts:110-132`
- Modify: `bot/src/send-listener.ts:105-169`
- Modify: `bot/src/tdlib/client.ts:66-122`
- [ ] **Step 1: Include destMessageIds in bot query**
In `bot/src/db/queries.ts`, add `destMessageIds` to the `getPendingSendRequest` select:
```typescript
package: {
select: {
id: true,
fileName: true,
fileSize: true,
fileCount: true,
creator: true,
tags: true,
archiveType: true,
destChannelId: true,
destMessageId: true,
destMessageIds: true, // <-- ADD THIS
isMultipart: true, // <-- ADD THIS (for logging)
partCount: true, // <-- ADD THIS (for logging)
previewData: true,
sourceChannel: { select: { title: true, telegramId: true } },
},
},
```
- [ ] **Step 2: Add copyMultipleMessagesToUser helper**
In `bot/src/tdlib/client.ts`, add a new export after `copyMessageToUser`:
```typescript
/**
* Send multiple document messages from a channel to a user's DM.
* Used for multi-part archives where each part is a separate Telegram message.
* Sends parts sequentially with a small delay to avoid rate limits.
*/
export async function copyMultipleMessagesToUser(
fromChatId: bigint,
messageIds: bigint[],
toUserId: bigint
): Promise<void> {
for (let i = 0; i < messageIds.length; i++) {
await copyMessageToUser(fromChatId, messageIds[i], toUserId);
// Small delay between parts to avoid rate limits
if (i < messageIds.length - 1) {
await new Promise((resolve) => setTimeout(resolve, 1000));
}
}
}
```
- [ ] **Step 3: Update processSendRequest to forward all parts**
In `bot/src/send-listener.ts`, update the import to include the new function:
```typescript
import { copyMessageToUser, copyMultipleMessagesToUser, sendTextMessage, sendPhotoMessage } from "./tdlib/client.js";
```
Then replace the single `copyMessageToUser` call (around line 157) with logic that forwards all parts:
```typescript
// Forward the actual archive file(s) from destination channel
const messageIds = pkg.destMessageIds as bigint[] | undefined;
if (messageIds && messageIds.length > 1) {
log.info(
{ requestId, parts: messageIds.length },
"Sending multi-part archive"
);
await copyMultipleMessagesToUser(
destChannel.telegramId,
messageIds,
targetUserId
);
} else {
// Single part or legacy (no destMessageIds populated)
await copyMessageToUser(
destChannel.telegramId,
pkg.destMessageId,
targetUserId
);
}
```
- [ ] **Step 4: Commit**
```bash
git add bot/src/db/queries.ts bot/src/send-listener.ts bot/src/tdlib/client.ts
git commit -m "feat: forward all parts of multi-part archives via bot"
```
---
## Task 5: Rebuild & Deploy Worker + Bot
- [ ] **Step 1: Rebuild worker image**
```bash
docker compose -f docker-compose.dev.yml build worker
docker tag dragonsstash-worker:latest git.samagsteribbe.nl/admin/dragonsstash-worker:latest
docker compose -p dragonsstash -f /opt/stacks/DragonsStash/docker-compose.yml up -d worker
```
- [ ] **Step 2: Rebuild bot image**
```bash
docker compose -f docker-compose.dev.yml build bot
docker tag dragonsstash-bot:latest git.samagsteribbe.nl/admin/dragonsstash-bot:latest
docker compose -p dragonsstash -f /opt/stacks/DragonsStash/docker-compose.yml up -d bot
```
- [ ] **Step 3: Verify bot startup**
```bash
docker logs dragonsstash-bot --tail=20
```
Expected: Bot starts cleanly, "Send listener started" message.
---
## Task 6: Kickstarter — Package Search Query
**Files:**
- Modify: `src/data/kickstarter.queries.ts`
- [ ] **Step 1: Add searchPackagesForLinking query**
Append to `src/data/kickstarter.queries.ts`:
```typescript
export async function searchPackagesForLinking(query: string, limit = 20) {
if (!query || query.length < 2) return [];
return prisma.package.findMany({
where: {
OR: [
{ fileName: { contains: query, mode: "insensitive" } },
{ creator: { contains: query, mode: "insensitive" } },
],
},
orderBy: { indexedAt: "desc" },
take: limit,
select: {
id: true,
fileName: true,
fileSize: true,
archiveType: true,
creator: true,
fileCount: true,
},
});
}
export async function getLinkedPackageIds(kickstarterId: string): Promise<string[]> {
const links = await prisma.kickstarterPackage.findMany({
where: { kickstarterId },
select: { packageId: true },
});
return links.map((l) => l.packageId);
}
```
- [ ] **Step 2: Commit**
```bash
git add src/data/kickstarter.queries.ts
git commit -m "feat: add package search query for kickstarter linking"
```
---
## Task 7: Kickstarter — Package Linker Dialog Component
**Files:**
- Create: `src/app/(app)/kickstarters/_components/package-linker-dialog.tsx`
- [ ] **Step 1: Create the package linker dialog**
This component provides a search input to find packages and checkboxes to select/deselect them. It calls the existing `linkPackages` action on save.
```tsx
"use client";
import { useState, useTransition, useCallback, useEffect } from "react";
import { Search, Package, X, Loader2 } from "lucide-react";
import { toast } from "sonner";
import { linkPackages } from "../actions";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { Badge } from "@/components/ui/badge";
import { Checkbox } from "@/components/ui/checkbox";
import {
Dialog,
DialogContent,
DialogDescription,
DialogFooter,
DialogHeader,
DialogTitle,
} from "@/components/ui/dialog";
import { ScrollArea } from "@/components/ui/scroll-area";
interface PackageResult {
id: string;
fileName: string;
fileSize: string; // BigInt serialized as a string by /api/packages/search
archiveType: string;
creator: string | null;
fileCount: number;
}
interface PackageLinkerDialogProps {
open: boolean;
onOpenChange: (open: boolean) => void;
kickstarterId: string;
kickstarterName: string;
initialPackageIds: string[];
}
function formatSize(bytes: bigint | number | string): string {
const b = Number(bytes);
if (b >= 1024 * 1024 * 1024) return `${(b / (1024 * 1024 * 1024)).toFixed(1)} GB`;
if (b >= 1024 * 1024) return `${(b / (1024 * 1024)).toFixed(0)} MB`;
return `${(b / 1024).toFixed(0)} KB`;
}
export function PackageLinkerDialog({
open,
onOpenChange,
kickstarterId,
kickstarterName,
initialPackageIds,
}: PackageLinkerDialogProps) {
const [isPending, startTransition] = useTransition();
const [searchQuery, setSearchQuery] = useState("");
const [searchResults, setSearchResults] = useState<PackageResult[]>([]);
const [isSearching, setIsSearching] = useState(false);
const [selectedIds, setSelectedIds] = useState<Set<string>>(new Set(initialPackageIds));
// Reset state when dialog opens
useEffect(() => {
if (open) {
setSelectedIds(new Set(initialPackageIds));
setSearchQuery("");
setSearchResults([]);
}
}, [open, initialPackageIds]);
const doSearch = useCallback(async (query: string) => {
if (query.length < 2) {
setSearchResults([]);
return;
}
setIsSearching(true);
try {
const res = await fetch(`/api/packages/search?q=${encodeURIComponent(query)}&limit=20`);
if (res.ok) {
const data = await res.json();
setSearchResults(data.packages ?? []);
}
} catch {
// Ignore search errors
} finally {
setIsSearching(false);
}
}, []);
// Debounced search
useEffect(() => {
const timer = setTimeout(() => doSearch(searchQuery), 300);
return () => clearTimeout(timer);
}, [searchQuery, doSearch]);
function togglePackage(id: string) {
setSelectedIds((prev) => {
const next = new Set(prev);
if (next.has(id)) next.delete(id);
else next.add(id);
return next;
});
}
function handleSave() {
startTransition(async () => {
const result = await linkPackages(kickstarterId, Array.from(selectedIds));
if (result.success) {
toast.success(`Linked ${selectedIds.size} package(s) to "${kickstarterName}"`);
onOpenChange(false);
} else {
toast.error(result.error);
}
});
}
return (
<Dialog open={open} onOpenChange={onOpenChange}>
<DialogContent className="sm:max-w-lg">
<DialogHeader>
<DialogTitle>Link Packages</DialogTitle>
<DialogDescription>
Search and select STL packages to link to &ldquo;{kickstarterName}&rdquo;.
</DialogDescription>
</DialogHeader>
<div className="space-y-3">
{/* Selected count */}
{selectedIds.size > 0 && (
<div className="flex items-center gap-2 text-sm text-muted-foreground">
<Package className="h-4 w-4" />
{selectedIds.size} package(s) selected
<Button
variant="ghost"
size="sm"
className="h-6 px-2 text-xs"
onClick={() => setSelectedIds(new Set())}
>
Clear all
</Button>
</div>
)}
{/* Search input */}
<div className="relative">
<Search className="absolute left-2.5 top-2.5 h-4 w-4 text-muted-foreground" />
<Input
placeholder="Search packages by name or creator..."
value={searchQuery}
onChange={(e) => setSearchQuery(e.target.value)}
className="pl-9"
autoFocus
/>
{isSearching && (
<Loader2 className="absolute right-2.5 top-2.5 h-4 w-4 animate-spin text-muted-foreground" />
)}
</div>
{/* Results */}
<ScrollArea className="h-[300px] rounded-md border">
<div className="p-2 space-y-1">
{searchResults.length === 0 && searchQuery.length >= 2 && !isSearching && (
<p className="text-sm text-muted-foreground text-center py-8">
No packages found
</p>
)}
{searchQuery.length < 2 && (
<p className="text-sm text-muted-foreground text-center py-8">
Type at least 2 characters to search
</p>
)}
{searchResults.map((pkg) => (
<label
key={pkg.id}
className="flex items-center gap-3 p-2 rounded-md hover:bg-muted/50 cursor-pointer"
>
<Checkbox
checked={selectedIds.has(pkg.id)}
onCheckedChange={() => togglePackage(pkg.id)}
/>
<div className="flex-1 min-w-0">
<p className="text-sm font-medium truncate">{pkg.fileName}</p>
<div className="flex items-center gap-2 text-xs text-muted-foreground">
{pkg.creator && <span>{pkg.creator}</span>}
<span>{formatSize(pkg.fileSize)}</span>
<Badge variant="outline" className="text-[10px] h-4 px-1">
{pkg.archiveType}
</Badge>
{pkg.fileCount > 0 && <span>{pkg.fileCount} files</span>}
</div>
</div>
{selectedIds.has(pkg.id) && (
<X className="h-3.5 w-3.5 text-muted-foreground shrink-0" />
)}
</label>
))}
</div>
</ScrollArea>
</div>
<DialogFooter>
<Button variant="outline" onClick={() => onOpenChange(false)}>
Cancel
</Button>
<Button onClick={handleSave} disabled={isPending}>
{isPending ? <Loader2 className="h-4 w-4 animate-spin mr-1" /> : null}
Save ({selectedIds.size})
</Button>
</DialogFooter>
</DialogContent>
</Dialog>
);
}
```
- [ ] **Step 2: Commit**
```bash
git add src/app/(app)/kickstarters/_components/package-linker-dialog.tsx
git commit -m "feat: add package linker dialog for kickstarters"
```
---
## Task 8: Package Search API Route
**Files:**
- Create: `src/app/api/packages/search/route.ts`
- [ ] **Step 1: Create the API route**
The package linker dialog needs a client-side fetch for debounced search. Create a lightweight API route:
```typescript
import { NextResponse } from "next/server";
import { auth } from "@/lib/auth";
import { searchPackagesForLinking } from "@/data/kickstarter.queries";

export const dynamic = "force-dynamic";

export async function GET(request: Request) {
  const session = await auth();
  if (!session?.user?.id) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

  const { searchParams } = new URL(request.url);
  const query = searchParams.get("q") ?? "";
  // Clamp limit to 1..50 and fall back to 20 on missing or non-numeric input
  // (Number("abc") is NaN, and Math.min(NaN, 50) is still NaN).
  const parsedLimit = Number.parseInt(searchParams.get("limit") ?? "", 10);
  const limit = Number.isFinite(parsedLimit) && parsedLimit > 0 ? Math.min(parsedLimit, 50) : 20;

  const packages = await searchPackagesForLinking(query, limit);

  // BigInt is not JSON-serializable, so send fileSize as a string
  const serialized = packages.map((p) => ({
    ...p,
    fileSize: p.fileSize.toString(),
  }));
  return NextResponse.json({ packages: serialized });
}
```
- [ ] **Step 2: Commit**
```bash
git add src/app/api/packages/search/route.ts
git commit -m "feat: add package search API route for kickstarter linking"
```
---
## Task 9: Kickstarter — Send All Packages Action
**Files:**
- Modify: `src/app/(app)/kickstarters/actions.ts`
- [ ] **Step 1: Add sendAllKickstarterPackages action**
Append to `src/app/(app)/kickstarters/actions.ts`:
```typescript
export async function sendAllKickstarterPackages(
kickstarterId: string
): Promise<ActionResult<{ queued: number }>> {
const session = await auth();
if (!session?.user?.id) return { success: false, error: "Unauthorized" };
try {
const telegramLink = await prisma.telegramLink.findUnique({
where: { userId: session.user.id },
});
if (!telegramLink) {
return { success: false, error: "No linked Telegram account. Link one in Settings." };
}
const kickstarter = await prisma.kickstarter.findFirst({
where: { id: kickstarterId, userId: session.user.id },
select: {
packages: {
select: {
package: {
select: { id: true, destChannelId: true, destMessageId: true, fileName: true },
},
},
},
},
});
if (!kickstarter) {
return { success: false, error: "Kickstarter not found" };
}
const sendablePackages = kickstarter.packages
.map((lnk) => lnk.package)
.filter((p) => p.destChannelId && p.destMessageId);
if (sendablePackages.length === 0) {
return { success: false, error: "No linked packages are available for sending" };
}
let queued = 0;
for (const pkg of sendablePackages) {
const existing = await prisma.botSendRequest.findFirst({
where: {
packageId: pkg.id,
telegramLinkId: telegramLink.id,
status: { in: ["PENDING", "SENDING"] },
},
});
if (!existing) {
const sendRequest = await prisma.botSendRequest.create({
data: {
packageId: pkg.id,
telegramLinkId: telegramLink.id,
requestedByUserId: session.user.id,
status: "PENDING",
},
});
try {
await prisma.$queryRawUnsafe(
`SELECT pg_notify('bot_send', $1)`,
sendRequest.id
);
} catch {
// Best-effort
}
queued++;
}
}
revalidatePath(REVALIDATE_PATH);
return { success: true, data: { queued } };
} catch {
return { success: false, error: "Failed to send packages" };
}
}
```
- [ ] **Step 2: Commit**
```bash
git add src/app/(app)/kickstarters/actions.ts
git commit -m "feat: add sendAllKickstarterPackages action"
```
---
## Task 10: Kickstarter Table — Wire Up Link & Send Actions
**Files:**
- Modify: `src/app/(app)/kickstarters/_components/kickstarter-columns.tsx`
- Modify: `src/app/(app)/kickstarters/_components/kickstarter-table.tsx`
- [ ] **Step 1: Add actions to column menu**
In `kickstarter-columns.tsx`, add `Link2` and `Send` imports from lucide-react, add `onLinkPackages` and `onSendAll` to props, and add menu items:
```typescript
import { MoreHorizontal, Pencil, Trash2, ExternalLink, Link2, Send } from "lucide-react";
// Update interface:
interface KickstarterColumnsProps {
onEdit: (kickstarter: KickstarterRow) => void;
onDelete: (id: string) => void;
onLinkPackages: (kickstarter: KickstarterRow) => void;
onSendAll: (kickstarter: KickstarterRow) => void;
}
```
In the actions column dropdown, add between Edit and the separator:
```tsx
<DropdownMenuItem onClick={() => onLinkPackages(row.original)}>
<Link2 className="mr-2 h-3.5 w-3.5" />
Link Packages
</DropdownMenuItem>
{row.original._count.packages > 0 && (
<DropdownMenuItem onClick={() => onSendAll(row.original)}>
<Send className="mr-2 h-3.5 w-3.5" />
Send All ({row.original._count.packages})
</DropdownMenuItem>
)}
```
Update the function signature to destructure the new props:
```typescript
export function getKickstarterColumns({
onEdit,
onDelete,
onLinkPackages,
onSendAll,
}: KickstarterColumnsProps): ColumnDef<KickstarterRow, unknown>[] {
```
- [ ] **Step 2: Wire up state in kickstarter-table.tsx**
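Add imports and state for the linker dialog, plus a transition for the send-all call (send-all runs immediately, so it needs no target state of its own):
```typescript
import { useState, useTransition } from "react"; // merge into the existing react import
import { toast } from "sonner"; // skip if already imported
import { PackageLinkerDialog } from "./package-linker-dialog";
import { sendAllKickstarterPackages } from "../actions";

// Inside KickstarterTable:
const [linkTarget, setLinkTarget] = useState<KickstarterRow | null>(null);
// Skip if the component already declares a transition:
const [, startTransition] = useTransition();
```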
Update the columns call:
```typescript
const columns = getKickstarterColumns({
onEdit: (kickstarter) => {
setEditKickstarter(kickstarter);
setModalOpen(true);
},
onDelete: (id) => setDeleteId(id),
onLinkPackages: (kickstarter) => setLinkTarget(kickstarter),
onSendAll: (kickstarter) => {
startTransition(async () => {
const result = await sendAllKickstarterPackages(kickstarter.id);
if (result.success) {
toast.success(`Queued ${result.data!.queued} package(s) for delivery`);
} else {
toast.error(result.error);
}
});
},
});
```
Add the `PackageLinkerDialog` before the closing `</div>` of the component's return:
```tsx
{linkTarget && (
<PackageLinkerDialog
open={!!linkTarget}
onOpenChange={(open) => !open && setLinkTarget(null)}
kickstarterId={linkTarget.id}
kickstarterName={linkTarget.name}
initialPackageIds={[]}
/>
)}
```
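Note: `initialPackageIds` is `[]` because the table does not fetch linked packages, so the dialog opens with nothing selected. Do not ship this step on its own: if `linkPackages` replaces the kickstarter's existing links with the submitted list, saving from an empty selection unlinks every package that was not re-selected. Step 3 loads the currently linked IDs when the dialog opens.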
- [ ] **Step 3: Fetch initial linked packages when dialog opens**
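To populate the dialog with the packages that are already linked, fetch their IDs each time the dialog opens, via a small API route (a server action would also work).
In `package-linker-dialog.tsx`, replace the reset `useEffect` (the one that depends on `open` and `initialPackageIds`) with the version below. Dropping the `initialPackageIds` dependency also matters because the table passes a new `[]` literal on every render, which would otherwise reset the selection whenever the table re-renders:
```typescript
useEffect(() => {
  if (!open) return;
  setSearchQuery("");
  setSearchResults([]);
  let cancelled = false;
  // Fetch the packages currently linked to this kickstarter
  fetch(`/api/packages/linked?kickstarterId=${encodeURIComponent(kickstarterId)}`)
    .then((res) => (res.ok ? res.json() : null))
    .then((data: { packageIds?: string[] } | null) => {
      if (!cancelled && data?.packageIds) {
        setSelectedIds(new Set(data.packageIds));
      }
    })
    .catch(() => {
      // Non-fatal: the current selection is left as-is
    });
  // Ignore a late response if the dialog closes or the target changes
  return () => {
    cancelled = true;
  };
}, [open, kickstarterId]);
```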
Create the API route at `src/app/api/packages/linked/route.ts`:
```typescript
import { NextResponse } from "next/server";
import { auth } from "@/lib/auth";
import { getLinkedPackageIds } from "@/data/kickstarter.queries";
export const dynamic = "force-dynamic";
export async function GET(request: Request) {
const session = await auth();
if (!session?.user?.id) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
const { searchParams } = new URL(request.url);
const kickstarterId = searchParams.get("kickstarterId");
if (!kickstarterId) {
return NextResponse.json({ error: "kickstarterId required" }, { status: 400 });
}
const packageIds = await getLinkedPackageIds(kickstarterId);
return NextResponse.json({ packageIds });
}
```
- [ ] **Step 4: Commit**
```bash
git add src/app/(app)/kickstarters/_components/ src/app/api/packages/
git commit -m "feat: wire up package linking and send-all in kickstarter table"
```
---
## Task 11: Rebuild & Deploy App
- [ ] **Step 1: Rebuild app image**
```bash
docker compose build app # or equivalent for the production compose
docker tag dragonsstash:latest git.samagsteribbe.nl/admin/dragonsstash:latest
docker compose -p dragonsstash -f /opt/stacks/DragonsStash/docker-compose.yml up -d app
```
- [ ] **Step 2: Verify app startup**
```bash
docker logs dragonsstash --tail=20
```
Expected: App starts cleanly, health check passes.
- [ ] **Step 3: Manual test**
1. Go to Kickstarters tab
2. Open a kickstarter's row menu → "Link Packages"
3. Search for a package, select it, save
4. Verify the package count column updates
5. Use "Send All" to queue all linked packages for Telegram delivery

@@ -0,0 +1,472 @@
# Dragonstash Grouping System Audit & Enhancement Report
## Appendix: Real-World Failure Cases (2026-03-29/30)
These skipped packages reveal two concrete issues:
### Issue A: `WORKER_MAX_ZIP_SIZE_MB` was 4 GB — blocking all large multipart archives
| File | Parts | Total Size | Status |
|------|-------|-----------|--------|
| DM-Stash - Guide to Tharador - Complete STL | 19 | 70.5 GB | SIZE_LIMIT |
| DM-Stash - 2023-05 - Greywinds All-in | 16 | 58.9 GB | SIZE_LIMIT |
| Axolote Gaming - Castle of the Vampire Lord | 10 | 18 GB | SIZE_LIMIT |
| Dungeon Blocks - THE ULTIMATE DUNGEON | 5 | 7.6 GB | SIZE_LIMIT |
| Dungeon Blocks - The Toxic sewer | 4 | 6.2 GB | SIZE_LIMIT |
| Soulmist | 4 | 6.3 GB | SIZE_LIMIT |
| Medieval Town PT1 | 3 | 5.7 GB | SIZE_LIMIT |
| Knight Models - Game Of Thrones | 3 | 5.5 GB | SIZE_LIMIT |
| Dungeon Blocks - The Lost Cave | 3 | 4.9 GB | SIZE_LIMIT |
| El Miniaturista 2025-05 Fulgrim Part II and III | 5 | 4.7 GB | SIZE_LIMIT |
**Root cause:** Production env had `WORKER_MAX_ZIP_SIZE_MB=4096`. The default in code is 204800 (200 GB), but docker-compose.yml defaulted to 4096.
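**Fix applied:** Raised to 204800 in `/opt/stacks/DragonsStash/.env` and restarted the worker. These archives will be retried on the next ingestion cycle. The worker downloads the parts individually (each at most 2-4 GB), concatenates them, and re-splits the result into 1950 MiB pieces for upload. Peak temp disk usage for the 70.5 GB archive is therefore ~211 GB, about 3x the archive size, because the downloaded parts, the concatenated file, and the re-split parts can all be on disk at once (353 GB available).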
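The ~211 GB peak is consistent with three full copies of the data coexisting at the worst moment. A minimal sketch of that back-of-the-envelope check, assuming the parts, the concatenated archive, and the re-split output are all on disk before any cleanup runs; `estimatePeakTempGB` and `fitsOnDisk` are hypothetical helpers, not worker code:

```typescript
/**
 * Hypothetical helper: rough peak temp-disk estimate for one multipart archive.
 * Assumes the downloaded parts, the concatenated archive and the re-split
 * upload parts all exist on disk at the same time (3 full copies).
 */
export function estimatePeakTempGB(archiveSizeGB: number): number {
  const downloadedParts = archiveSizeGB; // original parts as downloaded
  const concatenated = archiveSizeGB; // single joined archive
  const resplit = archiveSizeGB; // 1950 MiB upload parts
  return downloadedParts + concatenated + resplit;
}

/** True if the estimated peak fits in the available temp space. */
export function fitsOnDisk(archiveSizeGB: number, availableGB: number): boolean {
  return estimatePeakTempGB(archiveSizeGB) <= availableGB;
}

console.log(estimatePeakTempGB(70.5)); // 211.5
console.log(fitsOnDisk(70.5, 353)); // true
```

The same estimate shows why a single archive above roughly 117 GB (353 / 3) would not fit with the current free space.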
**Code fix:** `MAX_PART_SIZE` is now configurable via `MAX_PART_SIZE_MB` env var (was hardcoded at 1950). Set to 3900 for Telegram Premium accounts to avoid unnecessary splitting.
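A minimal sketch of how a `MAX_PART_SIZE_MB` override could be parsed, assuming the worker reads it as a plain integer number of MiB and falls back to 1950 on bad input. `resolveMaxPartSizeBytes` is hypothetical, and capping at the report's Premium value (3900) is an assumption made only for illustration:

```typescript
// Values taken from this report; the cap is an illustrative assumption.
const DEFAULT_PART_SIZE_MB = 1950; // below 2 GiB to avoid FILE_PARTS_INVALID
const MAX_ALLOWED_PART_SIZE_MB = 3900; // hypothetical guard: the Premium value above

/**
 * Hypothetical: resolve the split size in bytes from an env-like object.
 * Missing, malformed, non-positive, or too-large values fall back to 1950 MiB.
 */
export function resolveMaxPartSizeBytes(env: Record<string, string | undefined>): number {
  const raw = env.MAX_PART_SIZE_MB;
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  const mb =
    Number.isFinite(parsed) && parsed > 0 && parsed <= MAX_ALLOWED_PART_SIZE_MB
      ? parsed
      : DEFAULT_PART_SIZE_MB;
  return mb * 1024 * 1024;
}
```

In the worker this would be called as `resolveMaxPartSizeBytes(process.env)`; passing the environment in explicitly keeps the function easy to test.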
### Issue B: Download failure at 98% (DE1-Supported.7z)
| File | Size | Error |
|------|------|-------|
| DE1-Supported.7z | 1.9 GB | Download stopped unexpectedly at 2043674624/2078338541 bytes (98%) |
**Root cause:** Download stalled near completion with no retry mechanism.
**Fix applied:** Download retry logic was added earlier in the same work session: up to 3 retries, with `cancelDownloadFile` called before each retry to clear the stalled download. This file will be retried automatically on the next ingestion cycle.
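The retry behaviour described above can be sketched generically. This is an illustration under assumptions, not the worker's code: `startDownload` and `cancelDownload` are hypothetical callbacks that, in the worker, would wrap TDLib's `downloadFile` and `cancelDownloadFile` requests, and "max 3 retries" is read as one initial attempt plus three retries:

```typescript
/**
 * Hypothetical retry wrapper: cancel any stalled transfer, then try again.
 * Makes 1 initial attempt plus up to `maxRetries` retries.
 */
export async function downloadWithRetry<T>(
  startDownload: () => Promise<T>,
  cancelDownload: () => Promise<void>,
  maxRetries = 3,
  delayMs = 0
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    if (attempt > 0) {
      // Clear the stalled/partial download state before retrying
      await cancelDownload();
      if (delayMs > 0) await new Promise((r) => setTimeout(r, delayMs));
    }
    try {
      return await startDownload();
    } catch (err) {
      lastError = err; // remember the most recent failure
    }
  }
  throw lastError;
}
```

Cancelling before each retry (rather than after each failure) ensures the next attempt never races a transfer that TDLib still considers active.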
---
## Deliverable 1: Audit Report — Current State
### 1.1 Grouping Signal Stack (Current)
The system currently uses exactly **one automatic grouping signal**:
| Priority | Signal | Status | Location |
|----------|--------|--------|----------|
| 1 | `mediaAlbumId` | Implemented | `worker/src/grouping.ts:26-33` |
| 2 | Manual override | Implemented | `src/lib/telegram/queries.ts:606-639` |
**How it works:**
- `processAlbumGroups()` in `worker/src/grouping.ts` groups indexed packages by `mediaAlbumId` (filtering out "0" and null)
- For albums with 2+ members: creates `PackageGroup`, links packages, assigns name from album photo caption or first filename
- Manual grouping via UI: select 2+ packages, enter name, creates group in `createManualGroup()`
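A simplified sketch of the album rule, assuming packages arrive with `mediaAlbumId` as a string and that album captions have already been looked up into a map. `buildAlbumGroups` is a hypothetical pure function; the real `processAlbumGroups()` also writes `PackageGroup` rows, which is omitted here:

```typescript
/** Minimal shape of an indexed package for this sketch (hypothetical). */
interface IndexedPackage {
  id: string;
  fileName: string;
  mediaAlbumId: string | null;
}

interface AlbumGroup {
  mediaAlbumId: string;
  name: string;
  packageIds: string[];
}

/**
 * Bucket packages by mediaAlbumId (ignoring "0" and null), keep albums with
 * 2+ members, and name each group from its caption, else the first filename.
 */
export function buildAlbumGroups(
  packages: IndexedPackage[],
  captionsByAlbum: Map<string, string> = new Map()
): AlbumGroup[] {
  const buckets = new Map<string, IndexedPackage[]>();
  for (const pkg of packages) {
    const albumId = pkg.mediaAlbumId;
    if (albumId === null || albumId === "0") continue; // not part of an album
    const bucket = buckets.get(albumId) ?? [];
    bucket.push(pkg);
    buckets.set(albumId, bucket);
  }
  const groups: AlbumGroup[] = [];
  buckets.forEach((members, albumId) => {
    if (members.length < 2) return; // single-item albums are not grouped
    const caption = captionsByAlbum.get(albumId)?.trim();
    groups.push({
      mediaAlbumId: albumId,
      name: caption || members[0].fileName,
      packageIds: members.map((m) => m.id),
    });
  });
  return groups;
}
```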
**What does NOT exist:**
- No `message_thread_id` (forum topic) scoping
- No project/month pattern extraction from filenames
- No creator/sender grouping
- No time-window + sender clustering
- No reply chain analysis
- No ZIP internal path prefix matching
- No caption fuzzy matching
- No staging queue for ungrouped files
### 1.2 Multipart Archive Detection (`worker/src/archive/multipart.ts`)
This is a **separate system** from display grouping. `groupArchiveSets()` groups Telegram messages into `ArchiveSet[]` based on filename patterns:
- `.zip.001`, `.zip.002` → ZIP_NUMBERED
- `.z01`, `.z02`, `.zip` → ZIP_LEGACY
- `.part1.rar`, `.part2.rar` → RAR_PART
- `.r00`, `.r01`, `.rar` → RAR_LEGACY
These are grouped by `format:baseName.toLowerCase()` key. This is about **reassembling split archives**, not UI grouping. An `ArchiveSet` becomes a single `Package` in the database.
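The filename rules above can be illustrated with a few regular expressions. This is a sketch, not the contents of `multipart.ts`: the exact patterns are assumptions, and as written a standalone `.zip` or `.rar` lands in a one-member `ZIP_LEGACY` or `RAR_LEGACY` set. The key format follows the `format:baseName.toLowerCase()` scheme described above:

```typescript
type ArchiveFormat = "ZIP_NUMBERED" | "ZIP_LEGACY" | "RAR_PART" | "RAR_LEGACY";

interface PartInfo {
  format: ArchiveFormat;
  baseName: string;
}

// Order matters: ".part1.rar" must be tested before the plain ".rar" rule.
const PATTERNS: Array<[ArchiveFormat, RegExp]> = [
  ["ZIP_NUMBERED", /^(.+)\.zip\.\d{3}$/i], // name.zip.001
  ["RAR_PART", /^(.+)\.part\d+\.rar$/i], // name.part1.rar
  ["ZIP_LEGACY", /^(.+)\.(?:z\d{2}|zip)$/i], // name.z01 ... name.zip
  ["RAR_LEGACY", /^(.+)\.(?:r\d{2}|rar)$/i], // name.r00 ... name.rar
];

/** Hypothetical: classify one filename, or null if it is not an archive part. */
export function classifyPart(fileName: string): PartInfo | null {
  for (const [format, re] of PATTERNS) {
    const m = re.exec(fileName);
    if (m) return { format, baseName: m[1] };
  }
  return null;
}

/** Hypothetical: group filenames into sets keyed by "FORMAT:basename". */
export function groupByArchiveKey(fileNames: string[]): Map<string, string[]> {
  const sets = new Map<string, string[]>();
  for (const name of fileNames) {
    const info = classifyPart(name);
    if (!info) continue;
    const key = `${info.format}:${info.baseName.toLowerCase()}`;
    const members = sets.get(key) ?? [];
    members.push(name);
    sets.set(key, members);
  }
  return sets;
}
```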
### 1.3 TDLib Ingestion Handler
**Pipeline in `worker/src/worker.ts:801-1197`:**
```
processOneArchiveSet():
1. Early skip check (source message ID)
2. Size guard (maxZipSizeMB)
3. Download all parts
4. Compute SHA-256 hash
5. Check hash dedup
6. Read archive metadata
7. Split/repack if needed
8. Upload to destination
9. Download preview
10. Extract fallback preview
11. Resolve creator
12. Index in database
13. Cleanup temp files
```
**Post-indexing:** `processAlbumGroups()` is called once per channel/topic scan to create album-based groups.
**Failure handling and completeness:**
- Messages are never dropped silently: failures go to the `SkippedPackage` table with a reason
- The watermark only advances past successfully processed sets (a failed set blocks advancement)
- **Gap:** by construction no messages should be missed within a channel, but there is no audit to verify completeness after the fact
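One way to read the watermark rule, as a sketch under assumptions: each processed set reports its highest source message ID and whether it succeeded, and the watermark advances through successes in message-ID order until the first failure. `SetOutcome` and `nextWatermark` are hypothetical names:

```typescript
/** Outcome of processing one archive set (hypothetical shape). */
interface SetOutcome {
  maxMessageId: bigint; // highest source message ID in the set
  ok: boolean; // true if the set was processed successfully
}

/**
 * Advance the watermark through successful sets in message-ID order and stop
 * at the first failure, so the failed set (and everything after it) is
 * seen again on the next scan.
 */
export function nextWatermark(current: bigint, outcomes: SetOutcome[]): bigint {
  const sorted = outcomes
    .slice()
    .sort((a, b) => (a.maxMessageId < b.maxMessageId ? -1 : a.maxMessageId > b.maxMessageId ? 1 : 0));
  let watermark = current;
  for (const o of sorted) {
    if (!o.ok) break; // a failed set blocks advancement
    if (o.maxMessageId > watermark) watermark = o.maxMessageId;
  }
  return watermark;
}
```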
### 1.4 Hash Verification
**What IS verified:**
| Check | Where | When |
|-------|-------|------|
| Download file size | `download.ts:verifyAndMove()` | After each file download |
| SHA-256 content hash | `worker.ts:952` | After download, used for dedup |
| Telegram upload confirmation | `channel.ts:updateMessageSendSucceeded` | Waits for server ACK |
**What is NOT verified:**
| Gap | Impact |
|-----|--------|
| No hash after upload | Can't detect Telegram-side corruption |
| No hash after split | Split files could be silently corrupted |
| CRC-32 extracted but never checked | ZIP/RAR per-file integrity not validated |
| No end-to-end hash | Split files have different hash than original |
| No periodic audit job | Stale/missing data never detected |
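For the "no hash after split" and "no end-to-end hash" gaps, one option is to stream the split parts, in order, through a single SHA-256 and compare the result with the pre-split hash. This assumes `split.ts` performs a plain byte split, so concatenating the parts reproduces the original file exactly; `sha256OfFiles` and `splitMatchesOriginal` are hypothetical helpers built on Node's `crypto` and `fs` modules:

```typescript
import { createHash } from "crypto";
import { createReadStream } from "fs";

type Sha256 = ReturnType<typeof createHash>;

/** Feed one file into an existing hash; resolves once the file is fully read. */
function hashFileInto(hash: Sha256, filePath: string): Promise<void> {
  return new Promise((resolve, reject) => {
    createReadStream(filePath)
      .on("data", (chunk) => hash.update(chunk))
      .on("error", reject)
      .on("end", () => resolve());
  });
}

/** SHA-256 (hex) of the byte-wise concatenation of `paths`, in order. */
export async function sha256OfFiles(paths: string[]): Promise<string> {
  const hash = createHash("sha256");
  for (const p of paths) {
    await hashFileInto(hash, p); // sequential, so part order is preserved
  }
  return hash.digest("hex");
}

/** True if the split parts, rejoined in order, hash to the original. */
export async function splitMatchesOriginal(originalPath: string, partPaths: string[]): Promise<boolean> {
  const [original, rejoined] = await Promise.all([
    sha256OfFiles([originalPath]),
    sha256OfFiles(partPaths),
  ]);
  return original === rejoined;
}
```

Because the combined hash never materializes a joined file, this check adds no temp-disk usage on top of the split itself.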
### 1.5 File Size Limit
| Setting | Value | Configurable? | Location |
|---------|-------|---------------|----------|
| `MAX_PART_SIZE` | 1950 MiB | **Hardcoded** | `worker/src/archive/split.ts:14` |
| `MAX_UPLOAD_SIZE` | 1950 MiB | **Hardcoded** | `worker/src/worker.ts:1023` |
| `maxZipSizeMB` | 200 GB | `WORKER_MAX_ZIP_SIZE_MB` env var | `worker/src/util/config.ts:6` |
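The 1950 MiB limit is deliberately below 2 GiB to avoid TDLib's `FILE_PARTS_INVALID` error. At the time of this audit there was **no Premium awareness**: all accounts were treated as non-Premium. (The Appendix records that `MAX_PART_SIZE` has since been made configurable via `MAX_PART_SIZE_MB`.)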
### 1.6 Search Implementation
- **No fuzzy search** — uses Prisma's `contains` with `mode: "insensitive"` (translates to PostgreSQL `ILIKE`)
- **No full-text search infrastructure** — no `tsvector`, no GiST/GIN indexes
- **Indexes:** B-tree on `fileName`, `creator`, `archiveType`, `indexedAt`, plus `PackageFile.fileName` and `extension`
- Search works for substring matching but won't match typos or similar names
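To illustrate the typo gap: `ILIKE '%tharadr%'` cannot match "Tharador", whereas trigram similarity (the approach behind PostgreSQL's `pg_trgm` extension, whose default similarity threshold is 0.3) still scores the pair highly. The sketch below is a simplified re-implementation for illustration only, not a byte-exact port of `pg_trgm`:

```typescript
/**
 * Simplified trigram extraction in the spirit of pg_trgm: lowercase, split
 * into alphanumeric words, pad each word with two leading spaces and one
 * trailing space, then collect every 3-character window.
 */
function trigrams(text: string): Set<string> {
  const out = new Set<string>();
  for (const word of text.toLowerCase().split(/[^a-z0-9]+/)) {
    if (!word) continue;
    const padded = `  ${word} `;
    for (let i = 0; i + 3 <= padded.length; i++) out.add(padded.slice(i, i + 3));
  }
  return out;
}

/** Shared trigrams divided by total distinct trigrams (0..1). */
export function trigramSimilarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared++;
  });
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}

/** What the current search does: case-insensitive substring (ILIKE '%q%'). */
export function ilikeContains(haystack: string, needle: string): boolean {
  return haystack.toLowerCase().includes(needle.toLowerCase());
}
```

For "tharadr" vs "tharador" the two words share 6 of 11 distinct trigrams (about 0.55), comfortably above a 0.3 threshold, while the substring test fails outright.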
### 1.7 Notification Infrastructure
- **pg_notify channels:** `bot_send`, `new_package` (bot), plus 7 worker channels
- **Bot subscriptions:** pattern-match (case-insensitive substring) on `fileName` and `creator`
- **UI notifications:** Sonner toast (ephemeral only)
- **No persistent notification store** — no database model for notifications
- **No notification UI panel** in the web app
- **No alerts for:** grouping conflicts, hash mismatches, missing parts, upload failures (beyond SkippedPackage table)
---
## Deliverable 2: Revised Grouping Signal Stack
### Recommended Implementation Plan
I recommend an **incremental approach** — implement signals in phases, starting with highest-value/lowest-risk.
### Phase 1: Foundation (Required Before Other Signals)
#### Signal 9: Manual Override Persistence
**Status:** Partially implemented. Manual groups exist but don't influence future auto-grouping.
**Implementation:**
- Add `groupingSource` field to `PackageGroup`: `"ALBUM" | "MANUAL" | "AUTO_PATTERN" | "AUTO_TIME" | "AUTO_REPLY" | "AUTO_ZIP" | "AUTO_CAPTION"`
- Manual groups already persist. What's missing is the **training feedback** where a manual grouping teaches the system to auto-group similar future files.
- This requires a `GroupingRule` model (see schema diff below) that stores learned patterns from manual overrides.
#### Ungrouped Staging Queue
**Implementation:**
- After ingestion, packages without a `packageGroupId` are naturally "ungrouped"
- Add a filter/tab to the STL page: "Ungrouped" showing packages where `packageGroupId IS NULL`
- No schema change needed — just a query filter
### Phase 2: High-Value Automatic Signals
#### Signal 1: `mediaAlbumId` (Already Implemented)
No changes needed. This is working correctly.
#### Signal 2: `message_thread_id` Forum Topic Scoping
**Status:** Already used for scan scoping (worker scans by topic), but not used as a grouping signal.
**Implementation:**
- `sourceTopicId` is already stored on `Package` (schema line 469)
- Use it as a **scoping constraint** for all other signals: time-window, caption matching, etc. only apply within the same topic
- No additional schema changes needed
#### Signal 5: Time Window + Sender Grouping
**Implementation:**
- After album grouping, find ungrouped packages from the same source channel + topic
- Within a configurable window (default 5 min), cluster by proximity
- For channel posts, TDLib's `sender_id` is the channel itself (`messageSenderChat`), so it carries no per-uploader signal. This signal therefore becomes **time-window within topic/channel**
- New config: `AUTO_GROUP_TIME_WINDOW_MINUTES` (default: 5)
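Below is a minimal sketch of the time-window pass. It assumes each ungrouped package carries its source topic ID and the Telegram message `date` in Unix seconds. The `UngroupedPackage` shape and `clusterByTimeWindow` name are hypothetical. Packages are bucketed by channel and topic so clusters never cross scopes. Within a bucket they are sorted by date, and a new cluster starts whenever the gap to the previous package exceeds the window.

```typescript
/** Hypothetical shape: an ungrouped package plus its source message's date. */
interface UngroupedPackage {
  id: string;
  sourceChannelId: string;
  sourceTopicId: string | null; // null for non-forum channels
  messageDate: number; // Telegram message.date, Unix seconds
}

/**
 * Clusters packages in the same channel/topic whose consecutive posts are at
 * most `windowMinutes` apart. Only clusters of 2+ packages are returned;
 * singletons stay ungrouped.
 */
function clusterByTimeWindow(
  packages: UngroupedPackage[],
  windowMinutes: number,
): UngroupedPackage[][] {
  const windowSec = windowMinutes * 60;

  // Scope by channel + topic (Signal 2) so grouping never crosses topics.
  const scopes = new Map<string, UngroupedPackage[]>();
  for (const pkg of packages) {
    const key = `${pkg.sourceChannelId}:${pkg.sourceTopicId ?? "-"}`;
    const bucket = scopes.get(key) ?? [];
    bucket.push(pkg);
    scopes.set(key, bucket);
  }

  const clusters: UngroupedPackage[][] = [];
  for (const bucket of scopes.values()) {
    bucket.sort((a, b) => a.messageDate - b.messageDate);
    let current: UngroupedPackage[] = [];
    for (const pkg of bucket) {
      const prev = current[current.length - 1];
      // Gap larger than the window: close the current cluster.
      if (prev && pkg.messageDate - prev.messageDate > windowSec) {
        if (current.length >= 2) clusters.push(current);
        current = [];
      }
      current.push(pkg);
    }
    if (current.length >= 2) clusters.push(current);
  }
  return clusters;
}
```

Because of the gap-based chaining, a long burst of closely spaced posts can span more than the window end to end. Anchoring each cluster to its first message is the stricter alternative.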
#### Signal 3: Project/Month Pattern Extraction
**Implementation:**
- Extract date patterns from filenames/captions: `YYYY-MM`, `YYYY_MM`, `MonthName Year`
- Extract project slugs: common prefix before separator (e.g., "ProjectName - File1.zip" and "ProjectName - File2.zip")
- Group packages with matching patterns from the same channel
- This should run as a **post-processing pass** after time-window grouping, merging small time-window groups that share a pattern
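The extractors below are a minimal sketch of the pattern signals. They assume that project names are separated from the rest of the file name by `" - "`, as in the example above. Only the date forms listed above (numeric `YYYY-MM`/`YYYY_MM` and full English month names) are recognized. The function names are hypothetical. Lookarounds are used instead of `\b` because `_` counts as a word character, so `\b` would miss `Model_2024_03`.

```typescript
const MONTHS = [
  "january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december",
];
// 2024-03 or 2024_03, month 01-12, not embedded in a longer digit run.
const NUMERIC_YM = /(?<!\d)(\d{4})[-_](0[1-9]|1[0-2])(?!\d)/;
// "March 2024", "march_2024", "March-2024" (case-insensitive).
const NAMED_YM = new RegExp(`(?<![a-z])(${MONTHS.join("|")})[ _-]*(\\d{4})(?!\\d)`, "i");

/** Returns a normalized "YYYY-MM" key, or null if no date pattern is found. */
function extractYearMonth(text: string): string | null {
  const numeric = NUMERIC_YM.exec(text);
  if (numeric) return `${numeric[1]}-${numeric[2]}`;
  const named = NAMED_YM.exec(text);
  if (named) {
    const month = MONTHS.indexOf(named[1].toLowerCase()) + 1;
    return `${named[2]}-${String(month).padStart(2, "0")}`;
  }
  return null;
}

/** Returns the lowercased text before the first " - ", or null if absent. */
function extractProjectSlug(fileName: string): string | null {
  const sep = fileName.indexOf(" - ");
  if (sep <= 0) return null;
  return fileName.slice(0, sep).trim().toLowerCase();
}
```

A grouping key could then combine the source channel with whichever of the two values is present.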
#### Signal 4: Creator Grouping
**Implementation:**
- The `creator` field is already extracted from filenames and stored per-package
- Within a channel, if multiple ungrouped packages have the same `creator` and were indexed within the same ingestion run, auto-group them
- Lower priority than time-window (might create overly broad groups)
### Phase 3: Advanced Signals
#### Signal 6: Reply Chain
**Implementation:**
- TDLib messages have `reply_to_message_id` but this isn't currently captured during scanning
- Would need to modify `getChannelMessages()` in `download.ts` to extract `reply_to_message_id`
- Then: if message B replies to message A, and both are archives, group them
- **Moderate complexity**, deferred to Phase 3
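One way to turn reply edges into groups is union-find, sketched below. It assumes the scanner has already captured each archive message's replied-to message ID. Depending on the TDLib version in use, that ID may come from `reply_to_message_id` or from a `reply_to` object. The `ScannedArchive` shape and `groupByReplyChain` name are hypothetical. Chains such as C → B → A collapse into one group. Replies to messages that are not archives in the same batch are ignored.

```typescript
/** Hypothetical minimal view of a scanned archive message. */
interface ScannedArchive {
  messageId: number;
  replyToMessageId: number | null;
}

/** Groups archives connected by reply edges; returns groups of 2+ message IDs. */
function groupByReplyChain(archives: ScannedArchive[]): number[][] {
  const parent = new Map<number, number>();
  for (const a of archives) parent.set(a.messageId, a.messageId);

  const find = (x: number): number => {
    let root = x;
    while (parent.get(root)! !== root) root = parent.get(root)!;
    // Path compression: point every node on the path directly at the root.
    while (parent.get(x)! !== root) {
      const next = parent.get(x)!;
      parent.set(x, root);
      x = next;
    }
    return root;
  };

  for (const a of archives) {
    // Link only when the replied-to message is itself an archive in this batch.
    if (a.replyToMessageId !== null && parent.has(a.replyToMessageId)) {
      parent.set(find(a.messageId), find(a.replyToMessageId));
    }
  }

  const groups = new Map<number, number[]>();
  for (const a of archives) {
    const root = find(a.messageId);
    const members = groups.get(root) ?? [];
    members.push(a.messageId);
    groups.set(root, members);
  }
  return [...groups.values()].filter((g) => g.length >= 2);
}
```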
#### Signal 7: ZIP Internal Path Prefix
**Implementation:**
- Already have `PackageFile.path` stored for each file inside an archive
- After indexing, find the common root folder across all files
- If two packages share the same root prefix and same channel, suggest grouping
- This is a **post-hoc analysis** that could run as a background job
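Here is a minimal sketch of the prefix check, assuming `PackageFile.path` values use `/` or `\` separators relative to the archive root. The `IndexedPackage` shape and both function names are hypothetical. A package qualifies only if every file sits under one top-level folder. Matching on that folder name (case-insensitive, same channel) yields suggestions rather than automatic groups.

```typescript
/** Returns the single top-level folder shared by all paths, or null. */
function commonRootFolder(paths: string[]): string | null {
  let root: string | null = null;
  for (const raw of paths) {
    // Normalize Windows separators and strip a leading "./" or "/".
    const parts = raw.replace(/\\/g, "/").replace(/^\.?\/+/, "").split("/");
    if (parts.length < 2) return null; // a file at the archive root
    if (root === null) root = parts[0];
    else if (parts[0] !== root) return null; // several top-level folders
  }
  return root;
}

/** Hypothetical: a package with the paths of the files inside it. */
interface IndexedPackage {
  id: string;
  sourceChannelId: string;
  filePaths: string[];
}

/** Suggests groups of 2+ packages sharing a channel and root folder. */
function suggestZipPrefixGroups(packages: IndexedPackage[]): string[][] {
  const byKey = new Map<string, string[]>();
  for (const pkg of packages) {
    const root = commonRootFolder(pkg.filePaths);
    if (root === null) continue;
    const key = `${pkg.sourceChannelId}:${root.toLowerCase()}`;
    byKey.set(key, [...(byKey.get(key) ?? []), pkg.id]);
  }
  return [...byKey.values()].filter((ids) => ids.length >= 2);
}
```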
#### Signal 8: Caption Fuzzy Match
**Implementation:**
- Currently captions from source messages are NOT stored (only photo captions for preview matching)
- Would need to capture `msg.content?.caption?.text` during scanning and store on Package
- Then: fuzzy-match captions from nearby messages in same channel
- **Requires schema change + scan modification**, deferred to Phase 3
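This design does not pick a specific fuzzy-matching algorithm. The sketch below uses token-set Jaccard similarity as a simple stand-in; trigram similarity or edit distance are alternatives. The function names are hypothetical. Any cut-off (for example, treating scores ≥ 0.6 as a match) is a tuning assumption. Captions are lowercased, accents are stripped, and text is split on anything that is not a Unicode letter or digit, so non-Latin captions still produce tokens.

```typescript
/** Lowercased, accent-stripped tokens of length >= 2. */
function captionTokens(caption: string): Set<string> {
  return new Set(
    caption
      .toLowerCase()
      .normalize("NFKD")
      .replace(/[\u0300-\u036f]/g, "") // drop combining diacritics
      .split(/[^\p{L}\p{N}]+/u)
      .filter((t) => t.length >= 2),
  );
}

/** Jaccard similarity of the two captions' token sets, in [0, 1]. */
function captionSimilarity(a: string, b: string): number {
  const ta = captionTokens(a);
  const tb = captionTokens(b);
  if (ta.size === 0 || tb.size === 0) return 0;
  let shared = 0;
  for (const t of ta) if (tb.has(t)) shared++;
  return shared / (ta.size + tb.size - shared);
}
```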
---
## Deliverable 3: Schema Diff
All changes are **additive** — no columns dropped, no types changed.
```prisma
// ── PackageGroup additions ──
model PackageGroup {
  // ... existing fields ...
  groupingSource GroupingSource @default(MANUAL) // NEW: how this group was created
}

// NEW enum
enum GroupingSource {
  ALBUM        // From Telegram mediaAlbumId
  MANUAL       // User-created via UI
  AUTO_PATTERN // Filename/date pattern matching
  AUTO_TIME    // Time-window clustering
  AUTO_REPLY   // Reply chain
  AUTO_ZIP     // ZIP path prefix
  AUTO_CAPTION // Caption fuzzy match
}

// ── Package additions ──
model Package {
  // ... existing fields ...
  sourceCaption String? // NEW: caption text from source Telegram message
}

// ── New model: GroupingRule (training from manual overrides) ──
model GroupingRule {
  id               String         @id @default(cuid())
  sourceChannelId  String
  pattern          String         // Regex or glob pattern learned from manual grouping
  signalType       GroupingSource // Which signal this rule applies to
  confidence       Float          @default(1.0)
  createdAt        DateTime       @default(now())
  createdByGroupId String?        // The manual group that spawned this rule

  // Prisma also requires the opposite relation field on TelegramChannel,
  // e.g. `groupingRules GroupingRule[]`.
  sourceChannel TelegramChannel @relation(fields: [sourceChannelId], references: [id], onDelete: Cascade)

  @@index([sourceChannelId])
  @@map("grouping_rules")
}

// ── New model: SystemNotification ──
model SystemNotification {
  id        String               @id @default(cuid())
  type      NotificationType
  severity  NotificationSeverity @default(INFO)
  title     String
  message   String
  context   Json?                // Structured data: packageId, groupId, sourceMessageId, etc.
  isRead    Boolean              @default(false)
  createdAt DateTime             @default(now())

  @@index([isRead, createdAt])
  @@index([type])
  @@map("system_notifications")
}

enum NotificationType {
  HASH_MISMATCH
  MISSING_PART
  UPLOAD_FAILED
  DOWNLOAD_FAILED
  GROUPING_CONFLICT
  INTEGRITY_AUDIT
}

enum NotificationSeverity {
  INFO
  WARNING
  ERROR
}

// ── Config additions (worker/src/util/config.ts) ──
// maxPartSizeMB: parseInt(process.env.MAX_PART_SIZE_MB ?? "1950", 10)
// autoGroupTimeWindowMinutes: parseInt(process.env.AUTO_GROUP_TIME_WINDOW_MINUTES ?? "5", 10)
// telegramPremium: process.env.TELEGRAM_PREMIUM === "true"
```
**Migration notes:**
- All new fields are optional/have defaults — zero-risk to existing data
- `GroupingSource` enum added with `@default(MANUAL)` — existing groups unaffected
- `GroupingRule` and `SystemNotification` are new tables — no impact on existing
- Backfill: set `groupingSource = ALBUM` for groups where `mediaAlbumId IS NOT NULL`
---
## Deliverable 4: Notification Contract
### Event Shape
```typescript
interface SystemNotificationEvent {
  type: NotificationType;
  severity: "INFO" | "WARNING" | "ERROR";
  title: string;
  message: string;
  context: {
    packageId?: string;
    groupId?: string;
    sourceChannelId?: string;
    sourceMessageId?: bigint;
    fileName?: string;
    partNumber?: number;
    totalParts?: number;
    expectedHash?: string;
    actualHash?: string;
    reason?: string;
  };
}
```
### Where Notifications Fire
| Event | Where | Trigger |
|-------|-------|---------|
| `HASH_MISMATCH` | `worker/src/worker.ts` after split | SHA-256 of concatenated split parts != original hash |
| `MISSING_PART` | Periodic audit job (new) | Group has `partCount > 1` but fewer than `partCount` dest messages exist |
| `UPLOAD_FAILED` | `worker/src/worker.ts` catch block | Upload fails after all retries exhausted |
| `DOWNLOAD_FAILED` | `worker/src/worker.ts` catch block | Download fails after all retries |
| `GROUPING_CONFLICT` | Auto-grouping pass (new) | Two signals suggest different groups for the same package |
| `INTEGRITY_AUDIT` | Periodic job (new) | Scheduled check finds inconsistencies |
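The `MISSING_PART` condition can be written as a pure function over rows the audit job has already loaded, as in the minimal sketch below. It assumes `destMessageIds` holds one Telegram message ID per uploaded part. The `AuditPackage` shape and `findMissingParts` name are hypothetical. It only compares counts. Confirming that each recorded message still exists in Telegram would be a separate step.

```typescript
/** Hypothetical row shape loaded by the audit job. */
interface AuditPackage {
  id: string;
  fileName: string;
  isMultipart: boolean;
  partCount: number;
  destMessageIds: bigint[]; // one destination message per uploaded part
}

/** Context for a MISSING_PART notification. */
interface MissingPartFinding {
  packageId: string;
  fileName: string;
  totalParts: number;
  presentParts: number;
}

/** Multipart packages with fewer recorded destination messages than parts. */
function findMissingParts(packages: AuditPackage[]): MissingPartFinding[] {
  return packages
    .filter((p) => p.isMultipart && p.partCount > 1 && p.destMessageIds.length < p.partCount)
    .map((p) => ({
      packageId: p.id,
      fileName: p.fileName,
      totalParts: p.partCount,
      presentParts: p.destMessageIds.length,
    }));
}
```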
### Delivery
1. **Database:** Always persisted to `SystemNotification` table
2. **pg_notify:** `SELECT pg_notify('system_notification', jsonPayload)` for real-time
3. **Web UI:** Notification bell/panel that polls or listens for new notifications
4. **Telegram (optional):** Forward critical notifications to admin via bot
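Two practical details affect the `pg_notify` step. First, `JSON.stringify` throws a `TypeError` on `bigint` values such as `sourceMessageId`. Second, PostgreSQL rejects `NOTIFY` payloads of 8000 bytes or more in its default configuration. The minimal sketch below converts bigints to strings and falls back to a small `{ id, type }` payload when the full event is too large. Listeners would then read the persisted row. It assumes the row is written first so its ID is known. `toNotifyPayload` and the simplified event type are hypothetical.

```typescript
type Severity = "INFO" | "WARNING" | "ERROR";

/** Simplified event shape; `type` is a NotificationType value. */
interface NotificationEvent {
  type: string;
  severity: Severity;
  title: string;
  message: string;
  context: Record<string, string | number | bigint | undefined>;
}

/** Default PostgreSQL limit: payload must be shorter than 8000 bytes. */
const MAX_NOTIFY_BYTES = 7999;

function toNotifyPayload(event: NotificationEvent, notificationId: string): string {
  // Convert bigint values to decimal strings so JSON.stringify doesn't throw.
  const full = JSON.stringify({ id: notificationId, ...event }, (_key, value) =>
    typeof value === "bigint" ? value.toString() : value,
  );
  if (Buffer.byteLength(full, "utf8") <= MAX_NOTIFY_BYTES) return full;
  // Too large: send a pointer; listeners load the row by ID.
  return JSON.stringify({ id: notificationId, type: event.type, truncated: true });
}
```

The resulting string would be passed as a bind parameter to `SELECT pg_notify('system_notification', $1)` rather than interpolated into SQL.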
---
## Deliverable 5: Feature Flag Plan
### Runtime Configuration (Environment Variables)
| Flag | Type | Default | Purpose |
|------|------|---------|---------|
| `TELEGRAM_PREMIUM` | boolean | `false` | Enable 4GB upload limit |
| `MAX_PART_SIZE_MB` | number | `1950` | Split threshold in MiB (overrides hardcoded value) |
| `AUTO_GROUP_ENABLED` | boolean | `false` | Enable automatic grouping beyond album |
| `AUTO_GROUP_TIME_WINDOW_MINUTES` | number | `5` | Time-window clustering threshold |
| `AUTO_GROUP_PATTERN_ENABLED` | boolean | `false` | Enable filename/date pattern grouping |
| `INTEGRITY_AUDIT_ENABLED` | boolean | `false` | Enable periodic integrity audit |
| `INTEGRITY_AUDIT_INTERVAL_HOURS` | number | `24` | How often to run the audit |
### Premium Mode Behavior
When `TELEGRAM_PREMIUM=true`:
1. `MAX_PART_SIZE_MB` defaults to `3900` (safely under 4 GiB) instead of `1950`
2. Files at or below the 3900 MiB threshold: uploaded as-is (no splitting)
3. Files above the threshold: split using existing `byteLevelSplit()` at the new threshold
4. Existing split/rejoin logic is **kept as fallback** — never removed
5. `isMultipart` and `partCount` continue to track actual upload state
### Implementation in `split.ts`:
```typescript
// Replace hardcoded constant with config-driven:
const MAX_PART_SIZE = BigInt(config.maxPartSizeMB) * 1024n * 1024n;
```
And in `config.ts`:
```typescript
maxPartSizeMB: parseInt(
  process.env.MAX_PART_SIZE_MB ??
    (process.env.TELEGRAM_PREMIUM === "true" ? "3900" : "1950"),
  10
),
```
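`parseInt` accepts a numeric prefix, so `"3.9GB"` parses as `3`, and it returns `NaN` for non-numeric input. Either result flows silently into the split threshold. The sketch below is a stricter alternative. It uses `Number()` and falls back to the Premium-aware default when the value is unset or invalid. Failing fast is equally defensible. `positiveIntFromEnv` and `loadSplitConfig` are hypothetical names.

```typescript
/** Positive integer from an env value; `fallback` when unset, blank, or invalid. */
function positiveIntFromEnv(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw); // rejects trailing junk such as "3900MB"
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

function loadSplitConfig(env: Record<string, string | undefined>) {
  const telegramPremium = env.TELEGRAM_PREMIUM === "true";
  const maxPartSizeMB = positiveIntFromEnv(env.MAX_PART_SIZE_MB, telegramPremium ? 3900 : 1950);
  return {
    telegramPremium,
    maxPartSizeMB,
    // bigint to match the `BigInt(...) * 1024n * 1024n` used in split.ts
    maxPartSizeBytes: BigInt(maxPartSizeMB) * 1024n * 1024n,
  };
}
```

In the worker this would be called as `loadSplitConfig(process.env)`.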
### Rollout Strategy
1. **All flags default to off** — zero behavior change on deploy
2. Enable `TELEGRAM_PREMIUM` first (simple, well-understood)
3. Enable `AUTO_GROUP_ENABLED` on a **per-channel basis** (see test plan) before globally
4. Enable `INTEGRITY_AUDIT_ENABLED` after manual validation
5. Pattern-based grouping enabled last (highest complexity)
---
## Deliverable 6: Test Plan
### Phase 0: Pre-Implementation Validation
Before touching any code, verify the current system baseline:
1. **Pick one test channel** with known content (a mix of albums, single files, and multipart archives)
2. Run an ingestion cycle and record: number of packages, groups, skipped
3. Verify all album-based groups are correct
4. Note any ungrouped files that "should" be grouped
5. This becomes the **regression baseline**
### Phase 1: Premium Mode Testing
1. Set `TELEGRAM_PREMIUM=true` and `MAX_PART_SIZE_MB=3900`
2. Manually upload a 3 GB test file to a source channel
3. Trigger ingestion — verify it uploads as a single message (not split)
4. Manually upload a 5 GB test file
5. Trigger ingestion — verify it splits at ~3.9 GB threshold
6. Verify `isMultipart`, `partCount`, `destMessageIds` are correct
7. Send the package via bot — verify all parts arrive
### Phase 2: Time-Window Grouping Testing
1. Enable `AUTO_GROUP_ENABLED=true` on the test channel only
2. Post 3 files to the channel within 2 minutes (no album)
3. Trigger ingestion — verify they auto-group
4. Post 2 files 10 minutes apart
5. Trigger ingestion — verify they stay ungrouped
6. Manually group them — verify `GroupingRule` is created
7. Post similar files — verify auto-grouping kicks in
### Phase 3: Manual QA via API
Add a **test endpoint** (dev-only) that accepts a fake message payload and runs it through the grouping pipeline without hitting Telegram:
```
POST /api/dev/test-grouping
Body: { messages: [...], channelId: "..." }
Response: { suggestedGroups: [...] }
```
This allows testing grouping logic against crafted scenarios without waiting for real Telegram messages.
### Phase 4: Integrity Audit Testing
1. Enable `INTEGRITY_AUDIT_ENABLED=true`
2. Manually corrupt a record (set wrong `contentHash` in DB)
3. Run audit — verify `HASH_MISMATCH` notification is created
4. Delete one `destMessageId` from a multipart package's `destMessageIds`
5. Run audit — verify `MISSING_PART` notification is created
6. Check notification UI shows both
### Regression Checks After Each Phase
- Re-run ingestion on test channel — same number of packages/groups as baseline
- Search for known filenames — still returns correct results
- Send a package via bot — still delivers correctly
- Album groups unchanged
- Manual groups unchanged

# Grouping Phase 1: Foundation + Time-Window Grouping
> **For agentic workers:** Use superpowers:subagent-driven-development to implement this plan.
**Goal:** Add grouping infrastructure (schema, enums, notifications model), an ungrouped staging queue in the UI, and time-window auto-grouping as the first automatic signal beyond album grouping.
**Architecture:** Schema changes lay the foundation. Ungrouped tab is a query filter. Time-window grouping runs as a post-processing pass after album grouping in the worker pipeline.
**Tech Stack:** Prisma schema + migration, worker TypeScript, Next.js App Router.
---
## Task 1: Schema Migration
**Files:**
- Modify: `prisma/schema.prisma`
- Create: migration SQL
Add:
1. `GroupingSource` enum: `ALBUM`, `MANUAL`, `AUTO_TIME`, `AUTO_PATTERN`, `AUTO_REPLY`, `AUTO_ZIP`, `AUTO_CAPTION`
2. `groupingSource GroupingSource @default(MANUAL)` on `PackageGroup`
3. `SystemNotification` model with `type`, `severity`, `title`, `message`, `context` (Json), `isRead`
4. `NotificationType` enum: `HASH_MISMATCH`, `MISSING_PART`, `UPLOAD_FAILED`, `DOWNLOAD_FAILED`, `GROUPING_CONFLICT`, `INTEGRITY_AUDIT`
5. `NotificationSeverity` enum: `INFO`, `WARNING`, `ERROR`
Backfill: `UPDATE package_groups SET "groupingSource" = 'ALBUM' WHERE "mediaAlbumId" IS NOT NULL`
---
## Task 2: Ungrouped Staging Tab in STL Page
**Files:**
- Modify: `src/lib/telegram/queries.ts` — add `listUngroupedPackages()` query
- Modify: `src/app/(app)/stls/page.tsx` — add tab parameter support
- Modify: `src/app/(app)/stls/_components/stl-table.tsx` — add "Ungrouped" tab
Add a tab next to the existing "Skipped" tab that shows packages where `packageGroupId IS NULL`. Uses the existing `PackageListItem` type and table rendering. This gives users a clear view of files that need manual grouping.
---
## Task 3: Time-Window Auto-Grouping in Worker
**Files:**
- Create: `worker/src/grouping.ts` — add `processTimeWindowGroups()` after existing `processAlbumGroups()`
- Modify: `worker/src/worker.ts` — call time-window grouping after album grouping
- Modify: `worker/src/util/config.ts` — add `autoGroupTimeWindowMinutes` config
After album grouping completes, find the remaining ungrouped packages from the same channel scan. Cluster packages whose source messages were posted within the configured window of each other (default 5 minutes). Use the Telegram message `date` for this, not `sourceMessageId`, because message IDs are sequential rather than timestamps. Create groups for clusters of 2+ with `groupingSource = AUTO_TIME`, named after the common filename prefix or, failing that, the first file's base name.
---
## Task 4: Hash Verification After Split
**Files:**
- Modify: `worker/src/worker.ts` — add hash re-check after concat+split
- Reuse: `worker/src/archive/hash.ts` (no changes needed; reuse the existing `hashParts`)
After `concatenateFiles()` + `byteLevelSplit()`, re-hash the split parts and compare to the original `contentHash`. If mismatch, log error and create a `SystemNotification` (once that table exists). This closes the integrity gap identified in the audit.
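The check amounts to hashing the parts as one continuous stream, in order, and comparing the result with the original hash. The minimal sketch below assumes `contentHash` is the hex SHA-256 of the original file (as used for dedup) and that the parts are local files listed in order. `sha256OfParts` and `verifySplitParts` are hypothetical stand-ins for the existing `hashParts`, whose signature isn't shown here.

```typescript
import { createHash } from "node:crypto";
import { createReadStream } from "node:fs";

/** Lowercase-hex SHA-256 over the byte-wise concatenation of the parts, in order. */
async function sha256OfParts(partPaths: string[]): Promise<string> {
  const hash = createHash("sha256");
  for (const partPath of partPaths) {
    // A ReadStream is async-iterable; chunks arrive in file order.
    for await (const chunk of createReadStream(partPath)) {
      hash.update(chunk as Buffer);
    }
  }
  return hash.digest("hex");
}

interface SplitHashMismatch {
  expectedHash: string;
  actualHash: string;
}

/** Null when the parts reassemble to the expected hash, else the mismatch. */
async function verifySplitParts(
  partPaths: string[],
  expectedContentHash: string,
): Promise<SplitHashMismatch | null> {
  const actualHash = await sha256OfParts(partPaths);
  return actualHash === expectedContentHash.toLowerCase()
    ? null
    : { expectedHash: expectedContentHash, actualHash };
}
```

A non-null result maps directly onto the `expectedHash`/`actualHash` context fields of a `HASH_MISMATCH` event.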
---
## Task 5: Build & Deploy
Rebuild worker and app images. Deploy. Verify:
- Worker logs show `maxPartSizeMB` and new `autoGroupTimeWindowMinutes` in config
- Ungrouped tab visible in STL page
- Previously-skipped large archives begin processing

# Worker Improvements Design
**Date:** 2026-05-02
**Status:** Approved
**Scope:** Dragon's Stash Telegram ingestion worker
## Problem Statement
Three issues to address:
1. **Double-uploads**: The same archive occasionally appears twice in the destination Telegram channel. Root causes: (a) the worker crashes between `uploadToChannel()` confirming success and `createPackageWithFiles()` writing to the DB — no DB record means `recoverIncompleteUploads()` can't detect the orphaned Telegram message, and the next cycle re-uploads; (b) two accounts scanning the same source channel can both pass the hash dedup check before either creates a DB record, racing to upload the same file.
2. **Sequential account processing**: Both Telegram accounts are processed one after another via `withTdlibMutex`, even though TDLib fully supports multiple concurrent clients in the same process (each with separate `databaseDirectory` and `filesDirectory`). This halves throughput unnecessarily.
3. **Premium upload limit not used**: The Premium account can upload up to 4 GB per file, but `MAX_UPLOAD_SIZE` is hardcoded at ~1,950 MB. This causes unnecessary file splitting and expensive repack operations for files that could upload directly.
## Solution Overview
Three targeted changes, no architectural overhaul:
1. Two-phase DB write + hash advisory lock (fixes double-uploads)
2. Remove TDLib mutex from the scheduler loop (enables parallel accounts)
3. Per-account `maxUploadSize` from `getMe().is_premium` (enables 4 GB for Premium)
---
## Section 1: Double-Upload Fix
### 1a. Two-Phase DB Write
**Current flow:**
```
uploadToChannel() → preview download → metadata extraction → createPackageWithFiles()
```
If the worker crashes anywhere between upload confirmation and `createPackageWithFiles()`, no DB record exists. `recoverIncompleteUploads()` only checks packages with an existing `destMessageId` in the DB — it cannot find an orphaned Telegram message with no corresponding row.
**New flow:**
```
uploadToChannel()
  → createPackageStub()          ← minimal record, destMessageId set immediately
  → preview download
  → metadata extraction
  → updatePackageWithMetadata()  ← adds file list, preview, creator, tags
```
`createPackageStub()` writes: `contentHash`, `fileName`, `fileSize`, `archiveType`, `sourceChannelId`, `sourceMessageId`, `destChannelId`, `destMessageId`, `isMultipart`, `partCount`, `ingestionRunId`. File list and preview are left empty.
If the worker crashes after the stub is written:
- `recoverIncompleteUploads()` finds the record (has `destMessageId`), verifies the Telegram message exists, keeps it.
- Next cycle: `packageExistsByHash()` returns true → skips re-upload.
- The stub has `fileCount = 0` and no file listing. The UI shows "metadata pending" rather than failing silently.
Stubs with `fileCount = 0` are valid deliverable packages (the bot can still send the file). Backfilling metadata on stubs is out of scope for this change — the crash case is rare and the stub is functional.
### 1b. Hash Advisory Lock
**The race (two accounts, shared source channel):**
```
Worker A: packageExistsByHash(X) → false (no record yet)
Worker B: packageExistsByHash(X) → false (no record yet)
Worker A: uploads file → destMessageId_A
Worker B: uploads file → destMessageId_B ← duplicate Telegram message
Worker A: createPackageStub() → succeeds (contentHash @unique satisfied)
Worker B: createPackageStub() → fails unique constraint on contentHash
```
Result: two Telegram messages, one DB record. Worker B's upload is wasted.
**Fix:** Before calling `uploadToChannel()`, acquire a PostgreSQL session advisory lock keyed on the content hash:
```sql
SELECT pg_try_advisory_lock(hash_bigint)
```
Where `hash_bigint` is the first 8 bytes of the SHA-256 content hash interpreted as a signed bigint.
- `pg_try_advisory_lock` is non-blocking. If another worker holds the lock (same file, shared channel), return `false` → treat as duplicate, skip.
- After acquiring the lock, **re-run `packageExistsByHash()`** before uploading. This catches the case where another worker finished and released the lock between the first check and this one — without the re-check, the current worker would proceed to re-upload.
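- The lock is session-scoped: it is released automatically when the DB session ends, so no manual cleanup is needed after a crash. Because of this, acquire and release must run on the same connection. With a pooled client, route both through one dedicated connection, or the unlock may run on a different session and leave the lock held.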
- The lock is released explicitly after `createPackageStub()` completes (or on any error path).
**Implementation location:** New helper `tryAcquireHashLock(contentHash)` / `releaseHashLock(contentHash)` in `worker/src/db/locks.ts`, reusing the existing DB client pattern.
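The key derivation and the lock / re-check / upload / release sequence can be sketched as follows. This is a minimal sketch, assuming the content hash is hex-encoded. The dependency interface is hypothetical; in the worker, `tryLock`/`unlock` would wrap `SELECT pg_try_advisory_lock($1)` / `SELECT pg_advisory_unlock($1)` on one dedicated connection. `BigInt.asIntN(64, …)` reinterprets the first 8 bytes as a two's-complement signed 64-bit value, matching the `bigint` argument PostgreSQL expects.

```typescript
/** First 8 bytes of a hex SHA-256, big-endian, as a signed 64-bit lock key. */
function hashLockKey(contentHashHex: string): bigint {
  if (!/^[0-9a-fA-F]{16,}$/.test(contentHashHex)) {
    throw new Error(`not a hex content hash: ${contentHashHex}`);
  }
  return BigInt.asIntN(64, BigInt(`0x${contentHashHex.slice(0, 16)}`));
}

/** Hypothetical dependencies wrapping db/locks.ts and db/queries.ts. */
interface UploadDeps {
  tryLock(key: bigint): Promise<boolean>;
  unlock(key: bigint): Promise<void>;
  existsByHash(hash: string): Promise<boolean>;
  uploadAndStub(): Promise<void>; // uploadToChannel() + createPackageStub()
}

type UploadOutcome = "uploaded" | "locked-elsewhere" | "duplicate";

async function uploadWithHashLock(contentHash: string, deps: UploadDeps): Promise<UploadOutcome> {
  const key = hashLockKey(contentHash);
  // Non-blocking: another worker holds the lock, so treat this as a duplicate.
  if (!(await deps.tryLock(key))) return "locked-elsewhere";
  try {
    // Re-check under the lock: another worker may have finished and released
    // the lock between our first existence check and acquiring it.
    if (await deps.existsByHash(contentHash)) return "duplicate";
    await deps.uploadAndStub();
    return "uploaded";
  } finally {
    await deps.unlock(key); // runs on success, duplicate, and error paths
  }
}
```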
---
## Section 2: Parallel Account Processing
### Current Constraint
`withTdlibMutex` in `scheduler.ts` serializes all TDLib operations across accounts. This was a conservative guard, but TDLib explicitly supports multiple concurrent clients in the same process provided each has its own `databaseDirectory` and `filesDirectory`.
The codebase already satisfies this requirement:
```typescript
// worker/src/tdlib/client.ts
const dbPath = path.join(config.tdlibStateDir, account.id);
const client = createClient({
  databaseDirectory: dbPath,
  filesDirectory: path.join(dbPath, "files"),
});
```
Each account gets `<TDLIB_STATE_DIR>/<account.id>/` — fully isolated.
### Change
Replace the sequential `for` loop in `scheduler.ts` with `Promise.allSettled()`:
```typescript
// Before
for (const account of accounts) {
  await withTdlibMutex(`ingest:${account.phone}`, () => runWorkerForAccount(account));
}

// After
await Promise.allSettled(accounts.map((account) => runWorkerForAccount(account)));
```
The per-account PostgreSQL advisory lock in `db/locks.ts` already prevents any account from being processed twice simultaneously. `Promise.allSettled()` ensures one account's failure doesn't abort the other.
The `withTdlibMutex` wrapper can be removed from the ingest path entirely. The auth path (`authenticateAccount`) can likely run in parallel too, but keep it guarded if TDLib auth flows turn out to have ordering dependencies; verify this during implementation.
**No Docker Compose changes needed.** Both accounts run in the same container.
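`Promise.allSettled()` never rejects, so a failing account disappears silently unless its result is inspected. The minimal sketch below logs each rejected account and returns the failure count. The `Account` shape, the `log` interface, and `runAllAccounts` are hypothetical stand-ins for the scheduler's real types.

```typescript
interface Account {
  id: string;
  phone: string;
}

/** Runs all accounts concurrently; logs and counts the ones that failed. */
async function runAllAccounts(
  accounts: Account[],
  run: (account: Account) => Promise<void>,
  log: { error(msg: string, err: unknown): void },
): Promise<number> {
  const results = await Promise.allSettled(accounts.map((account) => run(account)));
  let failures = 0;
  results.forEach((result, i) => {
    if (result.status === "rejected") {
      failures++;
      // Results are in input order, so index i identifies the account.
      log.error(`ingest failed for account ${accounts[i].id}`, result.reason);
    }
  });
  return failures;
}
```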
### Speed Limit Notifications
TDLib fires `updateSpeedLimitNotification` when an account's upload or download speed is throttled (non-Premium accounts). Log this event at `warn` level in the client update handler so it's visible in logs without being actionable.
---
## Section 3: Per-Account Premium Upload Limit
### Premium Detection
After successful authentication, call `getMe()` and read `is_premium: bool` from the returned `user` object. Store this on `TelegramAccount.isPremium` (new boolean field, default `false`, updated on each successful auth).
```typescript
const me = await client.invoke({ _: 'getMe' }) as { is_premium?: boolean };
await updateAccountPremiumStatus(account.id, me.is_premium ?? false);
```
### Upload Size Limits
| Account type | `maxUploadSize` | Effect |
|---|---|---|
| Premium | 3,950 MB | Parts ≤ 3.95 GB upload as-is; repack only for parts >3.95 GB (extremely rare) |
| Non-Premium | 1,950 MB | Current behavior unchanged |
Pass `maxUploadSize` into `processOneArchiveSet()` as a parameter (currently hardcoded as `MAX_UPLOAD_SIZE` at `worker.ts:1023` and in `archive/split.ts`).
The `hasOversizedPart` check and `byteLevelSplit` call both use this value, so the repack step is effectively eliminated for Premium accounts in practice — no separate "skip repack" flag needed.
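A quick way to see why repacking effectively disappears for Premium is to compute the part count per account type. The sketch below treats the table's "MB" figures as MiB, consistent with the 1950 MiB constant elsewhere. It also assumes `byteLevelSplit` emits fixed-size chunks of at most `maxUploadSize` bytes. Both function names are hypothetical.

```typescript
const MIB = 1024 * 1024;

/** Per-account upload ceiling in bytes (3,950 vs 1,950 MiB). */
function maxUploadSizeBytes(isPremium: boolean): number {
  return (isPremium ? 3950 : 1950) * MIB;
}

/** Parts a file needs at a given ceiling, assuming fixed-size chunks. */
function partCountFor(fileSize: number, maxUploadSize: number): number {
  return fileSize <= maxUploadSize ? 1 : Math.ceil(fileSize / maxUploadSize);
}
```

For a 3 GiB archive this gives two parts on a non-Premium account and one on Premium.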
### Migration
```prisma
model TelegramAccount {
  // ... existing fields
  isPremium Boolean @default(false)
}
```
One migration, one new query `updateAccountPremiumStatus(accountId, isPremium)`.
---
## Files to Change
| File | Change |
|---|---|
| `prisma/schema.prisma` | Add `isPremium Boolean @default(false)` to `TelegramAccount` |
| `worker/src/db/queries.ts` | Add `updateAccountPremiumStatus()`, `createPackageStub()`, `updatePackageWithMetadata()` |
| `worker/src/db/locks.ts` | Add `tryAcquireHashLock()`, `releaseHashLock()` |
| `worker/src/tdlib/client.ts` | Call `getMe()` after auth, return `isPremium` from `createTdlibClient()` |
| `worker/src/worker.ts` | Two-phase write, hash lock acquire/release, pass `maxUploadSize` per account |
| `worker/src/archive/split.ts` | Accept `maxPartSize` parameter instead of hardcoded constant |
| `worker/src/scheduler.ts` | Replace sequential loop with `Promise.allSettled()`, remove `withTdlibMutex` from ingest path |
---
## What Is Explicitly Out of Scope
- Backfilling metadata on stub records (rare crash case, functional without it)
- Download pre-fetching / pipeline parallelism within one account
- Two separate worker containers (single container is sufficient)
- Bot or app changes (worker-only)

# Channel-Scan Skip Optimization — Design
**Goal:** stop the worker from re-scanning channels and forum topics that haven't changed since the last scan, especially on restart. Reduce the per-cycle API call count for the Model Printing Emporium channel (1,086 forum topics) from ~1,000+ to ~50.
**Non-goals:**
- Replacing polling with event-driven ingestion (`updateNewMessage`). That's a separate, larger design (Phase 2 in the original brainstorm).
- Surfacing per-channel scan history in the UI (also a separate, observability-only design).
**Architecture sketch:**
Add three persisted columns to `AccountChannelMap` and `TopicProgress`, plus one runtime `getChat`/`getForumTopic` lookup before each scan. The new state survives restarts because it's in PostgreSQL; the lookup is a cheap TDLib call. Failure-retry semantics (`d99a506` + `901f32f`) must be preserved: a channel sitting on retryable `SkippedPackage` rows is never considered idle.
---
## Problem statement
### Today's behavior
Every ingestion cycle the worker walks every linked source channel for every authenticated account. For each channel/topic it calls TDLib's `searchChatMessages` paginated from `lastProcessedMessageId`. Even when nothing has changed since the previous scan:
- One `searchChatMessages` call (sometimes paginated) is still made
- For Model Printing Emporium, that's ~1,086 calls per cycle (one per forum topic)
- The 1-second `apiDelayMs` between pages multiplies the cost
- Most calls return zero new messages — the work is wasted
The cost is most acute right after a restart: the worker boots, runs recovery, then issues 1,000+ effectively-empty calls before any productive work happens.
### What we already track
- `AccountChannelMap.lastProcessedMessageId` — highest processed message ID (per non-forum channel, per account)
- `TopicProgress.lastProcessedMessageId` — same per forum topic
- Both are advanced incrementally per archive set (`77aeb4c`)
- Both are pulled back below failed messages by the `SkippedPackage` retry pass (`901f32f`)
### What we don't track and want to add
- When was the last scan?
- Did the last scan find any archives, OR is there outstanding retry work?
- How many cycles in a row have been totally idle?
These let us skip the scan entirely when nothing has changed.
---
## High-level approach
Three guards at the top of the per-channel and per-topic processing loops:
1. **DB-persistent "skip if recently scanned and truly idle"** — checks `lastScannedAt`, `lastScanFoundArchives`, and a `retryableSkippedCount` query. If all three say "nothing new, nothing failing", skip without any TDLib call.
2. **Adaptive backoff for cold channels**: a `consecutiveEmptyScans` counter. After it crosses a threshold, scan only every Nth cycle. Reset it to 0 whenever the channel is "not idle".
3. **`chat.last_message.id` short-circuit** — if (1) and (2) don't skip but the channel's last server-side message ID matches our watermark, skip the `searchChatMessages` paginated call. This runs after the existing `SkippedPackage` retry pass, which pulls the watermark back below failures, so it correctly forces a scan when retries are pending.
The retry pass from `901f32f` is preserved untouched — it runs in front of these guards and adjusts the watermark, so retries always happen.
---
## Schema changes
### `AccountChannelMap` (worker/src/db/schema.prisma)
```prisma
model AccountChannelMap {
  // ... existing fields ...
  lastScannedAt         DateTime?
  lastScanFoundArchives Boolean   @default(false)
  consecutiveEmptyScans Int       @default(0)
}
```
### `TopicProgress`
```prisma
model TopicProgress {
  // ... existing fields ...
  lastScannedAt         DateTime?
  lastScanFoundArchives Boolean   @default(false)
  consecutiveEmptyScans Int       @default(0)
}
```
### Migration
```sql
-- Both tables get the same three columns. Existing rows get defaults:
--   lastScannedAt         = NULL  (next scan will populate)
--   lastScanFoundArchives = false (safe default; overwritten by next scan)
--   consecutiveEmptyScans = 0     (resets backoff for existing channels)
ALTER TABLE "account_channel_map"
  ADD COLUMN "lastScannedAt" TIMESTAMP(3),
  ADD COLUMN "lastScanFoundArchives" BOOLEAN NOT NULL DEFAULT false,
  ADD COLUMN "consecutiveEmptyScans" INTEGER NOT NULL DEFAULT 0;

ALTER TABLE "topic_progress"
  ADD COLUMN "lastScannedAt" TIMESTAMP(3),
  ADD COLUMN "lastScanFoundArchives" BOOLEAN NOT NULL DEFAULT false,
  ADD COLUMN "consecutiveEmptyScans" INTEGER NOT NULL DEFAULT 0;
```
NULL `lastScannedAt` means "never scanned" — every channel will be scanned the first cycle after deploy. Subsequent cycles benefit from the new fields.
---
## Configuration
Three new env vars in `worker/src/util/config.ts`:
```typescript
/** Window in which a recent successful empty scan lets us skip. Default 5 min. */
skipRecentScanWindowMs:
  parseInt(process.env.WORKER_SKIP_RECENT_SCAN_WINDOW_MS ?? "300000", 10),
/** After this many consecutive empty scans, channel enters backoff mode. */
emptyScanBackoffThreshold:
  parseInt(process.env.WORKER_EMPTY_SCAN_BACKOFF_THRESHOLD ?? "5", 10),
/** Backoff factor: N means "scan every Nth cycle once in backoff". */
emptyScanBackoffEveryNth:
  parseInt(process.env.WORKER_EMPTY_SCAN_BACKOFF_EVERY_NTH ?? "5", 10),
```
All three are tunable per deployment without code changes.
---
## Decision logic per channel / topic
The skip decision sits at the top of each channel/topic iteration in `runWorkerForAccount`. It runs BEFORE the existing `SkippedPackage` retry pass.
```text
For each channel (or topic):
  1. Query retryableSkippedCount for this scope (already a query we do elsewhere)
  2. If retryableSkippedCount > 0:
       Force scan (don't skip: failures need retry)
       Proceed to existing flow (retry pass → scan)
  3. Else if lastScannedAt is NULL:
       Force scan (we've never touched this)
       Proceed to existing flow
  4. Else if Date.now() - lastScannedAt.getTime() < skipRecentScanWindowMs
          AND lastScanFoundArchives === false:
       Skip: recently scanned and truly idle
  5. Else if consecutiveEmptyScans >= emptyScanBackoffThreshold
          AND (cycleCount % emptyScanBackoffEveryNth !== 0):
       Skip: channel is cold, not its turn to scan
  6. Else:
       Run the existing flow:
         a. SkippedPackage retry pass (901f32f): may pull watermark back
         b. NEW: getChat (or getForumTopic for topics); if last_message.id <= watermark, skip
         c. searchChatMessages scan
```
`cycleCount` is the global ingestion-cycle counter from `scheduler.ts`. It already increments per cycle.
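Steps 1 to 5 above are a pure function of persisted state plus two runtime inputs, so they can be unit-tested without TDLib. The sketch below shows one way to do that. The type and function names are hypothetical. `now` is injectable for tests. One addition goes beyond the design text: the modulo divisor is clamped to at least 1, because `cycleCount % 0` is `NaN`, and without the clamp a misconfigured `EVERY_NTH=0` would keep every cold channel skipped forever.

```typescript
interface ScanState {
  lastScannedAt: Date | null;
  lastScanFoundArchives: boolean;
  consecutiveEmptyScans: number;
}

interface SkipConfig {
  skipRecentScanWindowMs: number;
  emptyScanBackoffThreshold: number;
  emptyScanBackoffEveryNth: number;
}

type ScanDecision =
  | { action: "scan"; reason: "retry-pending" | "never-scanned" | "due" }
  | { action: "skip"; reason: "recent-idle" | "backoff" };

function decideScan(
  state: ScanState,
  retryableSkippedCount: number,
  cycleCount: number,
  cfg: SkipConfig,
  now: number = Date.now(),
): ScanDecision {
  if (retryableSkippedCount > 0) return { action: "scan", reason: "retry-pending" };
  if (state.lastScannedAt === null) return { action: "scan", reason: "never-scanned" };
  if (
    now - state.lastScannedAt.getTime() < cfg.skipRecentScanWindowMs &&
    !state.lastScanFoundArchives
  ) {
    return { action: "skip", reason: "recent-idle" };
  }
  const everyNth = Math.max(1, cfg.emptyScanBackoffEveryNth);
  if (state.consecutiveEmptyScans >= cfg.emptyScanBackoffThreshold && cycleCount % everyNth !== 0) {
    return { action: "skip", reason: "backoff" };
  }
  // Caller continues with step 6: retry pass, last-message check, then search.
  return { action: "scan", reason: "due" };
}
```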
---
## End-of-scan bookkeeping
After every scan (whether it found archives or not), update the three new fields atomically with the existing watermark write:
```typescript
// "Truly idle" means: nothing new this scan AND nothing failed AND no leftover
// retryable failures. The retry-pending check is critical: without it, a
// scan that found no new archives but left SkippedPackage retries pending
// would be marked idle and incorrectly skipped next cycle.
const retryablePending = await getRetryableSkippedMessageIds({
  accountId, sourceChannelId, topicId, cap: maxSkipAttempts,
});
const trulyIdle =
  scanResult.archives.length === 0
  && minFailedId === null
  && retryablePending.length === 0;
const newConsecutiveEmpty = trulyIdle
  ? (prev.consecutiveEmptyScans ?? 0) + 1
  : 0;
await upsertChannelOrTopicScanState({
  // ... existing watermark fields ...
  lastScannedAt: new Date(),
  lastScanFoundArchives: !trulyIdle,
  consecutiveEmptyScans: newConsecutiveEmpty,
});
```
The `consecutiveEmptyScans` counter resets to 0 the moment *anything* happens — archives found, archives failed, or unresolved retries pending. A channel with a chronically-failing archive (whose attemptCount is still below the cap) keeps the counter at 0 and never enters backoff.
If a SkippedPackage hits `attemptCount === maxSkipAttempts`, it's no longer "retryable pending" (it's been given up on), so the counter increments correctly. Same for SkippedPackages that get deleted via the UI's "retry" button — the counter behaves correctly without special-casing.
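The counter rules above reduce to a small state transition. This sketch is hypothetical: `ScanOutcome` and `nextScanStateFields` are illustrative names, not helpers from `queries.ts`. It assumes `retryablePendingCount` already excludes rows at the attempt cap, as `getRetryableSkippedMessageIds` does with `cap: maxSkipAttempts`.

```typescript
// Hypothetical summary of what one scan produced.
interface ScanOutcome {
  archivesFound: number;          // scanResult.archives.length
  minFailedId: bigint | null;     // lowest failed source message ID, or null
  retryablePendingCount: number;  // SkippedPackage rows still below the attempt cap
}

// The three scan-state fields written alongside the watermark.
interface ScanStateFields {
  lastScannedAt: Date;
  lastScanFoundArchives: boolean;
  consecutiveEmptyScans: number;
}

function nextScanStateFields(
  prevConsecutiveEmpty: number | null | undefined,
  outcome: ScanOutcome,
  now: Date = new Date(),
): ScanStateFields {
  // Truly idle: nothing found, nothing failed, nothing left to retry.
  const trulyIdle =
    outcome.archivesFound === 0 &&
    outcome.minFailedId === null &&
    outcome.retryablePendingCount === 0;
  return {
    lastScannedAt: now,
    // "Found archives" really means "has work worth revisiting".
    lastScanFoundArchives: !trulyIdle,
    // Any activity resets the streak; idle scans extend it.
    consecutiveEmptyScans: trulyIdle ? (prevConsecutiveEmpty ?? 0) + 1 : 0,
  };
}
```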
---
## `getChat` / `getForumTopicInfo` short-circuit
After the retry pass has finalized the effective watermark, but before `searchChatMessages`:
```typescript
// For non-forum channels:
const chat = await client.invoke({ _: "getChat", chat_id: Number(channel.telegramId) });
const channelLastMessageId = chat.last_message?.id;
if (channelLastMessageId && BigInt(channelLastMessageId) <= effectiveWatermark) {
// Nothing new server-side — skip the paginated search entirely.
// Still update lastScannedAt / consecutiveEmptyScans so the recent-scan
// skip kicks in next cycle.
await persistScanState({ trulyIdle: true });
continue;
}
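// For forum topics (the "getForumTopicInfo" step): getForumTopicInfo returns a
// bare forumTopicInfo (name, icon, flags) with no last-message field, so call
// getForumTopic instead; its forumTopic.last_message carries the ID to compare.
const forumTopic = await client.invoke({
_: "getForumTopic",
chat_id: Number(channel.telegramId),
message_thread_id: Number(topic.topicId),
});
const topicLastMessageId = forumTopic.last_message?.id;
if (topicLastMessageId && BigInt(topicLastMessageId) <= effectiveWatermark) {
await persistScanState({ trulyIdle: true });
continue;
}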
```
`getChat` is served from TDLib's local cache (no network) for chats we've already loaded, which we do up front via `loadChats`. `getForumTopicInfo` is a single round-trip but much cheaper than a paginated `searchChatMessages` call.
The comparison is `<=` because the watermark is the highest message we've fully processed — if the server's last is the same, we're caught up.
This step is correct in the failure-retry case because the retry pass runs FIRST: if there were retryable failures, the retry pass pulled the watermark back below them, and `channelLastMessageId > effectiveWatermark` (since the failed message exists in TG), so we don't skip — we scan and re-pick-up the failure.
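The `<=` comparison and its guards can be isolated into one hypothetical predicate. It assumes TDLib returns message IDs as JavaScript numbers (TDLib's `int53`) and that the watermark is a `bigint` read from the `BigInt?` column. It treats a missing last message or a NULL watermark as "not caught up", so the normal scan runs in those cases.

```typescript
// Returns true when the server's newest message is already covered by the watermark.
function isCaughtUp(
  serverLastMessageId: number | null | undefined,
  effectiveWatermark: bigint | null,
): boolean {
  // Unknown or empty last message: don't guess, let the normal scan run.
  if (!serverLastMessageId) return false;
  // NULL watermark: nothing has ever been processed, so we can't be caught up.
  if (effectiveWatermark === null) return false;
  // `<=` because the watermark is the highest fully processed message.
  return BigInt(serverLastMessageId) <= effectiveWatermark;
}
```

In the failure-retry case, the retry pass has already lowered `effectiveWatermark` below the failed message. The predicate therefore returns `false`, and the scan picks the failure up again.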
---
## Restart behavior
The improvements compose for restart safety:
| Scenario | Today | After this change |
|---|---|---|
| Restart 5 min after a clean cycle | ~2,000 API calls for MPE | ~10 calls (only retryable + truly-active topics) |
| Restart 1 hour later (recency window expired) | ~2,000 API calls | `getChat` per channel + scan only those where `last_message.id > watermark` (≈ 50 for MPE) |
| Restart after long downtime (12h) | ~2,000 calls + lots of new content | `getChat` per channel, scan everything with new activity |
The three new columns are in PostgreSQL — they survive container restarts directly. `consecutiveEmptyScans = 47` for a cold topic stays at 47 across restart, so backoff continues to apply.
---
## Edge cases and their handling
### 1. Manual SkippedPackage retry via UI between cycles
The UI's `retrySkippedPackageAction` lowers the watermark and deletes the SkippedPackage. Next cycle: `retryableSkippedCount === 0` (the row is gone), but the watermark is lower than `chat.last_message.id` (the retried message exists in TG). So step 6 in the decision tree triggers a scan via the `getChat` check. ✓
### 2. SkippedPackage hits the attempt cap mid-cycle
Once `attemptCount === maxSkipAttempts`, the row is no longer in `getRetryableSkippedMessageIds` results. The channel correctly becomes idle-eligible. The capped SkippedPackage stays in the table as "permanently failed (manual retry only)" — that's the existing behavior. ✓
### 3. New SkippedPackage is created mid-cycle (e.g., an upload fails)
At the end of that scan, `retryablePending` includes the new row → `trulyIdle = false` → `lastScanFoundArchives = true` → next cycle does NOT skip. ✓
### 4. Channel/topic added after deploy
New rows in `AccountChannelMap` / `TopicProgress` have `lastScannedAt = NULL`, so step 3 in the decision tree always triggers a scan. After the first scan, the fields are populated normally. ✓
### 5. Clock skew / drift
The "scanned less than 5 min ago" check uses `Date.now() - lastScannedAt.getTime()`. Both timestamps come from the worker's Node.js clock: `lastScannedAt` is written as `new Date()` in the end-of-scan bookkeeping, not by PostgreSQL `NOW()`. A few seconds of drift doesn't matter. A clock jump of an hour (rare but possible) just means one cycle either skips or re-scans, which is recoverable.
### 6. TDLib `getChat` returns stale data
TDLib's local cache could theoretically be stale (e.g., the account hasn't received the latest update yet). If `channelLastMessageId` is stale (lower than server reality), we'd skip a scan that should have happened. Mitigation: the next cycle's `getChat` likely has fresh data; the watermark guards correctness (we don't lose data, we just process it one cycle later). Acceptable.
### 7. `getForumTopicInfo` rate limit
Calling it per-topic could add up for channels with 1000+ topics. Mitigation: skip-on-recent-scan (step 4) eliminates the call for most topics; only "stale-but-was-active" topics get the call. Worst case is ~50 calls per cycle for MPE, comfortably under the 30 req/sec global limit.
### 8. Channel becomes a forum (or vice versa) between cycles
Existing code handles this — `isChatForum` is rechecked each cycle and `setChannelForum` updates the DB. The new fields live on the same rows, so no extra handling needed.
---
## File-level changes
### New / modified
- `prisma/schema.prisma` — add the six new columns
- `prisma/migrations/<timestamp>_channel_scan_state/migration.sql` — the ALTER TABLE
- `worker/src/util/config.ts` — three new env vars
- `worker/src/db/queries.ts` — new helpers:
- `getChannelScanState(mappingId)` and `getTopicScanState(topicProgressId)`
- `upsertChannelScanState(...)` and `upsertTopicScanState(...)`
- Both wrap the existing `updateLastProcessedMessage` / `upsertTopicProgress` so callers don't need to remember to update the new fields too.
- `worker/src/worker.ts` — top-of-loop skip checks in both the forum and non-forum branches, plus end-of-scan state writes
- `worker/src/tdlib/chats.ts` — small helper `getChatLastMessageId(client, chatId)` and `getForumTopicLastMessageId(client, chatId, topicId)` wrapping the TDLib calls with the existing `invokeWithTimeout` pattern
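The design doesn't show the signature of the existing `invokeWithTimeout`. What follows is only a hypothetical sketch of the behavior the new helpers need. It resolves the server-side last message ID, or returns `null` when the TDLib call throws (for example, because a method is missing from the TDLib build) or times out. Callers treat `null` as "unknown" and fall through to the normal scan, which matches the fallback listed under Risks.

```typescript
// Hypothetical helper: last message ID, or null if the call fails or is too slow.
async function lastMessageIdOrNull(
  fetchLastId: () => Promise<number | undefined>,
  timeoutMs: number,
): Promise<number | null> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  // Resolves to null once the deadline passes.
  const timeout = new Promise<null>((resolve) => {
    timer = setTimeout(() => resolve(null), timeoutMs);
  });
  try {
    // Whichever settles first wins; a late rejection is absorbed by race().
    const result = await Promise.race([fetchLastId(), timeout]);
    return result ?? null;
  } catch {
    // Any TDLib error (missing method, access lost, ...) means "unknown".
    return null;
  } finally {
    clearTimeout(timer);
  }
}
```

A caller would pass something like `() => client.invoke({ _: "getChat", chat_id }).then((c) => c.last_message?.id)`. It would skip the scan only on a non-null result at or below the watermark.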
### Untouched
- `recovery.ts` — recovery is per-startup and one-shot; not affected
- `scheduler.ts`: `cycleCount` is already there; just expose it where needed
- The existing `SkippedPackage` retry pass logic in `runWorkerForAccount` is unchanged
---
## Testing plan
The project has no automated tests, so verification is manual via Docker logs after deploy:
1. **Build cleanly:** `docker compose up -d --build worker` — no migration errors
2. **First cycle after deploy:** all channels scan (NULL `lastScannedAt`), all fields populated at end of cycle. Log lines confirm normal scan flow.
3. **Second cycle 5 min later:**
- Check logs for `"Skipping recently-scanned idle channel"` — should appear for any channel/topic that was empty last cycle
- Total `searchChatMessages` calls per cycle should drop dramatically (compare to first cycle)
4. **Failure-retry preservation:**
- Find a SkippedPackage with `attemptCount < cap`
- Run a cycle — confirm the channel/topic is NOT skipped (log says it's scanned)
- Confirm the SkippedPackage gets re-tried
5. **Backoff:**
- Pick a cold channel, wait for it to scan 5+ cycles cleanly
- Confirm `consecutiveEmptyScans` climbs to 5+
- Confirm subsequent cycles skip it (only scan every 5th)
6. **`getChat` short-circuit:**
- Pick an active channel
- Trigger an immediate cycle (UI button)
- If `last_message.id <= watermark`, expect log `"Channel caught up via getChat — skipping searchChatMessages"`
7. **Restart safety:**
- Push the change, restart worker
- First cycle after restart should log multiple "Skipping recently-scanned idle channel" lines (because the DB state survived)
- Total cycle time should be a fraction of a baseline restart
---
## Risks and mitigations
| Risk | Mitigation |
|---|---|
| Skip incorrectly applied → real failures never retried | Rule 1 (truly-idle requires `retryablePending.length === 0`) + dedicated test step 4 |
| `getChat` returns stale data | Next cycle's `getChat` corrects it; watermark guards correctness (no data loss) |
| `getForumTopicInfo` not available in TDLib 1.8.64 | Verify the method exists in the schema; fall back to scan if it throws |
| Backoff applies during legitimate activity bursts | Counter resets to 0 the moment any archive is found OR any retry is pending |
| Migration takes too long on the live DB | `lastScannedAt` is nullable with no default, and the other two columns have constant NOT NULL defaults; PostgreSQL 11 and later adds both kinds as metadata-only changes (no table rewrite) |
---
## What's explicitly NOT in this design
To keep scope tight:
- **Event-driven ingestion via `updateNewMessage`.** Bigger design, addressed separately. This design is compatible with it — when (D) lands, polling becomes a 4-hour safety net using these same skip rules.
- **Per-channel scan history UI.** Observability layer; separate design.
- **Surfacing the new counters in the admin dashboard.** Can come after the worker-side change is verified.
- **Backfilling `consecutiveEmptyScans` from historical `IngestionRun` data.** Not worth it — it'll converge to the correct value within ~6 cycles.
---
## Open questions
None — the failure-retry interaction was the main risk and is handled by Rule 1 + the existing retry pass.


@@ -0,0 +1,7 @@
-- AlterTable
ALTER TABLE "packages" ADD COLUMN "destMessageIds" BIGINT[] DEFAULT ARRAY[]::BIGINT[];
-- Backfill: copy existing destMessageId into the array
UPDATE "packages"
SET "destMessageIds" = ARRAY["destMessageId"]
WHERE "destMessageId" IS NOT NULL;


@@ -0,0 +1,32 @@
-- CreateEnum GroupingSource
CREATE TYPE "GroupingSource" AS ENUM ('ALBUM', 'MANUAL', 'AUTO_TIME', 'AUTO_PATTERN', 'AUTO_REPLY', 'AUTO_ZIP', 'AUTO_CAPTION');
-- CreateEnum NotificationType
CREATE TYPE "NotificationType" AS ENUM ('HASH_MISMATCH', 'MISSING_PART', 'UPLOAD_FAILED', 'DOWNLOAD_FAILED', 'GROUPING_CONFLICT', 'INTEGRITY_AUDIT');
-- CreateEnum NotificationSeverity
CREATE TYPE "NotificationSeverity" AS ENUM ('INFO', 'WARNING', 'ERROR');
-- AlterTable: add groupingSource to package_groups
ALTER TABLE "package_groups" ADD COLUMN "groupingSource" "GroupingSource" NOT NULL DEFAULT 'MANUAL';
-- Backfill: mark album-based groups
UPDATE "package_groups" SET "groupingSource" = 'ALBUM' WHERE "mediaAlbumId" IS NOT NULL;
-- CreateTable: system_notifications
CREATE TABLE "system_notifications" (
"id" TEXT NOT NULL,
"type" "NotificationType" NOT NULL,
"severity" "NotificationSeverity" NOT NULL DEFAULT 'INFO',
"title" TEXT NOT NULL,
"message" TEXT NOT NULL,
"context" JSONB,
"isRead" BOOLEAN NOT NULL DEFAULT false,
"createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT "system_notifications_pkey" PRIMARY KEY ("id")
);
-- CreateIndex
CREATE INDEX "system_notifications_isRead_createdAt_idx" ON "system_notifications"("isRead", "createdAt");
CREATE INDEX "system_notifications_type_idx" ON "system_notifications"("type");


@@ -0,0 +1,3 @@
-- AlterTable: add sourceCaption and replyToMessageId to packages
ALTER TABLE "packages" ADD COLUMN "sourceCaption" TEXT;
ALTER TABLE "packages" ADD COLUMN "replyToMessageId" BIGINT;


@@ -0,0 +1,47 @@
-- AlterTable: add autoGroupEnabled to telegram_channels
ALTER TABLE "telegram_channels" ADD COLUMN "autoGroupEnabled" BOOLEAN NOT NULL DEFAULT true;
-- CreateTable: grouping_rules
CREATE TABLE "grouping_rules" (
"id" TEXT NOT NULL,
"sourceChannelId" TEXT NOT NULL,
"pattern" TEXT NOT NULL,
"signalType" "GroupingSource" NOT NULL,
"confidence" DOUBLE PRECISION NOT NULL DEFAULT 1.0,
"createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
"createdByGroupId" TEXT,
CONSTRAINT "grouping_rules_pkey" PRIMARY KEY ("id")
);
-- CreateIndex
CREATE INDEX "grouping_rules_sourceChannelId_idx" ON "grouping_rules"("sourceChannelId");
-- AddForeignKey
ALTER TABLE "grouping_rules" ADD CONSTRAINT "grouping_rules_sourceChannelId_fkey" FOREIGN KEY ("sourceChannelId") REFERENCES "telegram_channels"("id") ON DELETE CASCADE ON UPDATE CASCADE;
-- Full-text search: add tsvector column and GIN index
ALTER TABLE "packages" ADD COLUMN IF NOT EXISTS "searchVector" tsvector;
UPDATE "packages" SET "searchVector" = to_tsvector('english',
coalesce("fileName", '') || ' ' || coalesce("creator", '') || ' ' || coalesce("sourceCaption", '')
) WHERE "searchVector" IS NULL;
CREATE INDEX IF NOT EXISTS "packages_search_vector_idx" ON "packages" USING GIN ("searchVector");
-- Trigger to auto-update searchVector on insert/update
CREATE OR REPLACE FUNCTION packages_search_vector_update() RETURNS trigger AS $$
BEGIN
NEW."searchVector" := to_tsvector('english',
coalesce(NEW."fileName", '') || ' ' || coalesce(NEW."creator", '') || ' ' || coalesce(NEW."sourceCaption", '')
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS packages_search_vector_trigger ON "packages";
CREATE TRIGGER packages_search_vector_trigger
BEFORE INSERT OR UPDATE OF "fileName", "creator", "sourceCaption"
ON "packages"
FOR EACH ROW
EXECUTE FUNCTION packages_search_vector_update();


@@ -0,0 +1,30 @@
-- CreateEnum
CREATE TYPE "ManualUploadStatus" AS ENUM ('PENDING', 'PROCESSING', 'COMPLETED', 'FAILED');
-- CreateTable
CREATE TABLE "manual_uploads" (
"id" TEXT NOT NULL,
"status" "ManualUploadStatus" NOT NULL DEFAULT 'PENDING',
"groupName" TEXT,
"userId" TEXT NOT NULL,
"errorMessage" TEXT,
"createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
"completedAt" TIMESTAMP(3),
CONSTRAINT "manual_uploads_pkey" PRIMARY KEY ("id")
);
CREATE TABLE "manual_upload_files" (
"id" TEXT NOT NULL,
"uploadId" TEXT NOT NULL,
"fileName" TEXT NOT NULL,
"filePath" TEXT NOT NULL,
"fileSize" BIGINT NOT NULL,
"packageId" TEXT,
CONSTRAINT "manual_upload_files_pkey" PRIMARY KEY ("id")
);
CREATE INDEX "manual_uploads_status_idx" ON "manual_uploads"("status");
CREATE INDEX "manual_upload_files_uploadId_idx" ON "manual_upload_files"("uploadId");
ALTER TABLE "manual_uploads" ADD CONSTRAINT "manual_uploads_userId_fkey" FOREIGN KEY ("userId") REFERENCES "User"("id") ON DELETE RESTRICT ON UPDATE CASCADE;
ALTER TABLE "manual_upload_files" ADD CONSTRAINT "manual_upload_files_uploadId_fkey" FOREIGN KEY ("uploadId") REFERENCES "manual_uploads"("id") ON DELETE CASCADE ON UPDATE CASCADE;


@@ -0,0 +1,2 @@
-- AlterTable
ALTER TABLE "telegram_accounts" ADD COLUMN "isPremium" BOOLEAN NOT NULL DEFAULT false;


@@ -0,0 +1,8 @@
-- AlterTable: track how many times the worker has tried each skipped source message.
-- Existing rows default to 1 (they represent a single past attempt that the worker
-- chose to record). Future failures increment via upsertSkippedPackage.
ALTER TABLE "skipped_packages" ADD COLUMN "attemptCount" INTEGER NOT NULL DEFAULT 1;
-- AlterEnum: add CHANNEL_ACCESS_LOST so the worker can surface a notification
-- when a source channel becomes inaccessible (account removed, channel deleted, etc.)
ALTER TYPE "NotificationType" ADD VALUE 'CHANNEL_ACCESS_LOST';


@@ -0,0 +1,10 @@
-- AlterTable: capture TDLib's stable per-content identifier for new packages.
-- Existing rows are NULL; they fall through to the other dedup checks until
-- they're re-encountered organically.
ALTER TABLE "packages" ADD COLUMN "remoteUniqueId" TEXT;
-- CreateIndex: scoped to source channel because we want to dedup
-- per-channel (the same file appearing in two different channels is still
-- worth indexing twice — they're different ingestion sources).
CREATE INDEX "packages_sourceChannelId_remoteUniqueId_idx"
ON "packages"("sourceChannelId", "remoteUniqueId");


@@ -0,0 +1,11 @@
-- AlterTable: per-channel scan-state columns
ALTER TABLE "account_channel_map"
ADD COLUMN "lastScannedAt" TIMESTAMP(3),
ADD COLUMN "lastScanFoundArchives" BOOLEAN NOT NULL DEFAULT false,
ADD COLUMN "consecutiveEmptyScans" INTEGER NOT NULL DEFAULT 0;
-- AlterTable: per-topic scan-state columns (forum channels)
ALTER TABLE "topic_progress"
ADD COLUMN "lastScannedAt" TIMESTAMP(3),
ADD COLUMN "lastScanFoundArchives" BOOLEAN NOT NULL DEFAULT false,
ADD COLUMN "consecutiveEmptyScans" INTEGER NOT NULL DEFAULT 0;


@@ -42,6 +42,7 @@ model User {
inviteCodes InviteCode[] @relation("InviteCreator")
usedInvite InviteCode? @relation("InviteUser", fields: [usedInviteId], references: [id], onDelete: SetNull)
usedInviteId String?
manualUploads ManualUpload[]
}
model Account {
@@ -405,6 +406,7 @@ model TelegramAccount {
isActive Boolean @default(true)
authState AuthState @default(PENDING)
authCode String?
isPremium Boolean @default(false)
lastSeenAt DateTime?
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
@@ -429,10 +431,13 @@ model TelegramChannel {
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
autoGroupEnabled Boolean @default(true)
accountMaps AccountChannelMap[]
packages Package[]
skippedPackages SkippedPackage[]
packageGroups PackageGroup[]
groupingRules GroupingRule[]
@@index([type, isActive])
@@index([category])
@@ -445,6 +450,17 @@ model AccountChannelMap {
channelId String
role ChannelRole @default(READER)
lastProcessedMessageId BigInt?
/// When this channel was last scanned (any reason, including skipped scans
/// that bumped the timestamp). Used by the recency-skip guard.
lastScannedAt DateTime?
/// True if the last scan found archives OR left retryable SkippedPackages
/// pending. Tracks "this channel has work I might need to revisit" — not
/// just "I uploaded something this cycle".
lastScanFoundArchives Boolean @default(false)
/// Number of consecutive cycles where this channel was trulyIdle (no
/// archives, no failures, no retryables). Drives the backoff that lets
/// cold channels skip cycles entirely.
consecutiveEmptyScans Int @default(0)
createdAt DateTime @default(now())
account TelegramAccount @relation(fields: [accountId], references: [id], onDelete: Cascade)
@@ -467,12 +483,20 @@ model Package {
sourceChannelId String
sourceMessageId BigInt
sourceTopicId BigInt?
/// TDLib's `remote.unique_id` for the FIRST part's file. Stable across
/// reposts of identical content in the same channel — used as the
/// strongest pre-download dedup signal (no false positives unlike
/// fileName + size matching).
remoteUniqueId String?
destChannelId String?
destMessageId BigInt?
destMessageIds BigInt[] @default([])
isMultipart Boolean @default(false)
partCount Int @default(1)
fileCount Int @default(0)
tags String[] @default([])
sourceCaption String? // Caption text from source Telegram message
replyToMessageId BigInt? // reply_to_message_id from source message (for reply chain grouping)
previewData Bytes? // JPEG thumbnail from nearby Telegram photo (stored as raw bytes)
previewMsgId BigInt? // Telegram message ID of the matched photo
packageGroupId String?
@@ -495,6 +519,7 @@ model Package {
@@index([archiveType])
@@index([creator])
@@index([packageGroupId])
@@index([sourceChannelId, remoteUniqueId])
@@map("packages")
}
@@ -521,6 +546,7 @@ model PackageGroup {
name String
mediaAlbumId String?
sourceChannelId String
groupingSource GroupingSource @default(MANUAL)
previewData Bytes?
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
@@ -572,6 +598,14 @@ model TopicProgress {
topicId BigInt
topicName String?
lastProcessedMessageId BigInt?
/// When this topic was last scanned (any reason). Used by recency-skip.
lastScannedAt DateTime?
/// True if the last scan found archives OR has retryable SkippedPackages
/// pending for this topic. See AccountChannelMap doc for details.
lastScanFoundArchives Boolean @default(false)
/// Number of consecutive cycles where this topic was trulyIdle. Drives
/// backoff for cold topics.
consecutiveEmptyScans Int @default(0)
accountChannelMap AccountChannelMap @relation(fields: [accountChannelMapId], references: [id], onDelete: Cascade)
@@ -732,6 +766,12 @@ model SkippedPackage {
sourceTopicId BigInt?
isMultipart Boolean @default(false)
partCount Int @default(1)
/// How many times the worker has tried to process this source message.
/// The worker auto-retries failures across cycles up to a configurable cap
/// (WORKER_MAX_SKIP_ATTEMPTS, default 5). After the cap, the watermark is
/// allowed to advance past the failure so cycles aren't pinned forever;
/// the user can manually retry via the UI to reset and try again.
attemptCount Int @default(1)
accountId String
account TelegramAccount @relation(fields: [accountId], references: [id], onDelete: Cascade)
createdAt DateTime @default(now())
@@ -801,3 +841,98 @@ model KickstarterPackage {
@@id([kickstarterId, packageId])
@@map("kickstarter_packages")
}
// ── Grouping & Notifications ──
enum GroupingSource {
ALBUM
MANUAL
AUTO_TIME
AUTO_PATTERN
AUTO_REPLY
AUTO_ZIP
AUTO_CAPTION
}
enum NotificationType {
HASH_MISMATCH
MISSING_PART
UPLOAD_FAILED
DOWNLOAD_FAILED
GROUPING_CONFLICT
INTEGRITY_AUDIT
CHANNEL_ACCESS_LOST
}
enum NotificationSeverity {
INFO
WARNING
ERROR
}
model SystemNotification {
id String @id @default(cuid())
type NotificationType
severity NotificationSeverity @default(INFO)
title String
message String
context Json?
isRead Boolean @default(false)
createdAt DateTime @default(now())
@@index([isRead, createdAt])
@@index([type])
@@map("system_notifications")
}
model GroupingRule {
id String @id @default(cuid())
sourceChannelId String
pattern String // Regex or keyword pattern learned from manual grouping
signalType GroupingSource // Which grouping signal this rule applies to
confidence Float @default(1.0)
createdAt DateTime @default(now())
createdByGroupId String? // The manual group that spawned this rule
sourceChannel TelegramChannel @relation(fields: [sourceChannelId], references: [id], onDelete: Cascade)
@@index([sourceChannelId])
@@map("grouping_rules")
}
enum ManualUploadStatus {
PENDING
PROCESSING
COMPLETED
FAILED
}
model ManualUpload {
id String @id @default(cuid())
status ManualUploadStatus @default(PENDING)
groupName String? // Group name if multiple files
userId String
errorMessage String?
createdAt DateTime @default(now())
completedAt DateTime?
files ManualUploadFile[]
user User @relation(fields: [userId], references: [id])
@@index([status])
@@map("manual_uploads")
}
model ManualUploadFile {
id String @id @default(cuid())
uploadId String
fileName String
filePath String // Path on shared volume
fileSize BigInt
packageId String? // Set after processing
upload ManualUpload @relation(fields: [uploadId], references: [id], onDelete: Cascade)
@@index([uploadId])
@@map("manual_upload_files")
}


@@ -1,7 +1,7 @@
"use client";
import { type ColumnDef } from "@tanstack/react-table";
import { MoreHorizontal, Pencil, Trash2, ExternalLink } from "lucide-react";
import { MoreHorizontal, Pencil, Trash2, ExternalLink, Link2, Send } from "lucide-react";
import { DataTableColumnHeader } from "@/components/shared/data-table-column-header";
import { Badge } from "@/components/ui/badge";
import { Button } from "@/components/ui/button";
@@ -32,6 +32,8 @@ export interface KickstarterRow {
interface KickstarterColumnsProps {
onEdit: (kickstarter: KickstarterRow) => void;
onDelete: (id: string) => void;
onLinkPackages: (kickstarter: KickstarterRow) => void;
onSendAll: (kickstarter: KickstarterRow) => void;
}
const deliveryConfig: Record<string, { label: string; className: string }> = {
@@ -63,6 +65,8 @@ const paymentConfig: Record<string, { label: string; className: string }> = {
export function getKickstarterColumns({
onEdit,
onDelete,
onLinkPackages,
onSendAll,
}: KickstarterColumnsProps): ColumnDef<KickstarterRow, unknown>[] {
return [
{
@@ -170,6 +174,16 @@ export function getKickstarterColumns({
<Pencil className="mr-2 h-3.5 w-3.5" />
Edit
</DropdownMenuItem>
<DropdownMenuItem onClick={() => onLinkPackages(row.original)}>
<Link2 className="mr-2 h-3.5 w-3.5" />
Link Packages
</DropdownMenuItem>
{row.original._count.packages > 0 && (
<DropdownMenuItem onClick={() => onSendAll(row.original)}>
<Send className="mr-2 h-3.5 w-3.5" />
Send All ({row.original._count.packages})
</DropdownMenuItem>
)}
<DropdownMenuSeparator />
<DropdownMenuItem
onClick={() => onDelete(row.original.id)}


@@ -7,7 +7,8 @@ import { toast } from "sonner";
import { useDataTable } from "@/hooks/use-data-table";
import { getKickstarterColumns, type KickstarterRow } from "./kickstarter-columns";
import { KickstarterModal } from "./kickstarter-modal";
import { deleteKickstarter } from "../actions";
import { PackageLinkerDialog } from "./package-linker-dialog";
import { deleteKickstarter, sendAllKickstarterPackages } from "../actions";
import { DataTable } from "@/components/shared/data-table";
import { DataTablePagination } from "@/components/shared/data-table-pagination";
import { DataTableViewOptions } from "@/components/shared/data-table-view-options";
@@ -50,6 +51,7 @@ export function KickstarterTable({
const [modalOpen, setModalOpen] = useState(false);
const [editKickstarter, setEditKickstarter] = useState<KickstarterRow | undefined>();
const [deleteId, setDeleteId] = useState<string | null>(null);
const [linkTarget, setLinkTarget] = useState<KickstarterRow | null>(null);
const [searchValue, setSearchValue] = useState(searchParams.get("search") ?? "");
@@ -88,6 +90,17 @@ export function KickstarterTable({
setModalOpen(true);
},
onDelete: (id) => setDeleteId(id),
onLinkPackages: (kickstarter) => setLinkTarget(kickstarter),
onSendAll: (kickstarter) => {
startTransition(async () => {
const result = await sendAllKickstarterPackages(kickstarter.id);
if (result.success) {
toast.success(`Queued ${result.data!.queued} package(s) for delivery`);
} else {
toast.error(result.error);
}
});
},
});
const { table } = useDataTable({ data, columns, pageCount });
@@ -188,6 +201,15 @@ export function KickstarterTable({
onConfirm={handleDelete}
isLoading={isPending}
/>
{linkTarget && (
<PackageLinkerDialog
open={!!linkTarget}
onOpenChange={(open) => !open && setLinkTarget(null)}
kickstarterId={linkTarget.id}
kickstarterName={linkTarget.name}
/>
)}
</div>
);
}


@@ -0,0 +1,211 @@
"use client";
import { useState, useTransition, useCallback, useEffect } from "react";
import { Search, Package, X, Loader2 } from "lucide-react";
import { toast } from "sonner";
import { linkPackages } from "../actions";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { Badge } from "@/components/ui/badge";
import { Checkbox } from "@/components/ui/checkbox";
import {
Dialog,
DialogContent,
DialogDescription,
DialogFooter,
DialogHeader,
DialogTitle,
} from "@/components/ui/dialog";
import { ScrollArea } from "@/components/ui/scroll-area";
interface PackageResult {
id: string;
fileName: string;
fileSize: string;
archiveType: string;
creator: string | null;
fileCount: number;
}
interface PackageLinkerDialogProps {
open: boolean;
onOpenChange: (open: boolean) => void;
kickstarterId: string;
kickstarterName: string;
}
function formatSize(bytes: string | number): string {
const b = Number(bytes);
if (b >= 1024 * 1024 * 1024) return `${(b / (1024 * 1024 * 1024)).toFixed(1)} GB`;
if (b >= 1024 * 1024) return `${(b / (1024 * 1024)).toFixed(0)} MB`;
return `${(b / 1024).toFixed(0)} KB`;
}
export function PackageLinkerDialog({
open,
onOpenChange,
kickstarterId,
kickstarterName,
}: PackageLinkerDialogProps) {
const [isPending, startTransition] = useTransition();
const [searchQuery, setSearchQuery] = useState("");
const [searchResults, setSearchResults] = useState<PackageResult[]>([]);
const [isSearching, setIsSearching] = useState(false);
const [selectedIds, setSelectedIds] = useState<Set<string>>(new Set());
// Fetch currently linked packages when dialog opens
useEffect(() => {
if (open) {
setSearchQuery("");
setSearchResults([]);
fetch(`/api/packages/linked?kickstarterId=${kickstarterId}`)
.then((res) => res.json())
.then((data) => {
if (data.packageIds) {
setSelectedIds(new Set(data.packageIds));
}
})
.catch(() => {});
}
}, [open, kickstarterId]);
const doSearch = useCallback(async (query: string) => {
if (query.length < 2) {
setSearchResults([]);
return;
}
setIsSearching(true);
try {
const res = await fetch(`/api/packages/search?q=${encodeURIComponent(query)}&limit=20`);
if (res.ok) {
const data = await res.json();
setSearchResults(data.packages ?? []);
}
} catch {
// Ignore search errors
} finally {
setIsSearching(false);
}
}, []);
// Debounced search
useEffect(() => {
const timer = setTimeout(() => doSearch(searchQuery), 300);
return () => clearTimeout(timer);
}, [searchQuery, doSearch]);
function togglePackage(id: string) {
setSelectedIds((prev) => {
const next = new Set(prev);
if (next.has(id)) next.delete(id);
else next.add(id);
return next;
});
}
function handleSave() {
startTransition(async () => {
const result = await linkPackages(kickstarterId, Array.from(selectedIds));
if (result.success) {
toast.success(`Linked ${selectedIds.size} package(s) to "${kickstarterName}"`);
onOpenChange(false);
} else {
toast.error(result.error);
}
});
}
return (
<Dialog open={open} onOpenChange={onOpenChange}>
<DialogContent className="sm:max-w-lg">
<DialogHeader>
<DialogTitle>Link Packages</DialogTitle>
<DialogDescription>
Search and select STL packages to link to &ldquo;{kickstarterName}&rdquo;.
</DialogDescription>
</DialogHeader>
<div className="space-y-3">
{selectedIds.size > 0 && (
<div className="flex items-center gap-2 text-sm text-muted-foreground">
<Package className="h-4 w-4" />
{selectedIds.size} package(s) selected
<Button
variant="ghost"
size="sm"
className="h-6 px-2 text-xs"
onClick={() => setSelectedIds(new Set())}
>
Clear all
</Button>
</div>
)}
<div className="relative">
<Search className="absolute left-2.5 top-2.5 h-4 w-4 text-muted-foreground" />
<Input
placeholder="Search packages by name or creator..."
value={searchQuery}
onChange={(e) => setSearchQuery(e.target.value)}
className="pl-9"
autoFocus
/>
{isSearching && (
<Loader2 className="absolute right-2.5 top-2.5 h-4 w-4 animate-spin text-muted-foreground" />
)}
</div>
<ScrollArea className="h-[300px] rounded-md border">
<div className="p-2 space-y-1">
{searchResults.length === 0 && searchQuery.length >= 2 && !isSearching && (
<p className="text-sm text-muted-foreground text-center py-8">
No packages found
</p>
)}
{searchQuery.length < 2 && (
<p className="text-sm text-muted-foreground text-center py-8">
Type at least 2 characters to search
</p>
)}
{searchResults.map((pkg) => (
<label
key={pkg.id}
className="flex items-center gap-3 p-2 rounded-md hover:bg-muted/50 cursor-pointer"
>
<Checkbox
checked={selectedIds.has(pkg.id)}
onCheckedChange={() => togglePackage(pkg.id)}
/>
<div className="flex-1 min-w-0">
<p className="text-sm font-medium truncate">{pkg.fileName}</p>
<div className="flex items-center gap-2 text-xs text-muted-foreground">
{pkg.creator && <span>{pkg.creator}</span>}
<span>{formatSize(pkg.fileSize)}</span>
<Badge variant="outline" className="text-[10px] h-4 px-1">
{pkg.archiveType}
</Badge>
{pkg.fileCount > 0 && <span>{pkg.fileCount} files</span>}
</div>
</div>
{selectedIds.has(pkg.id) && (
<X className="h-3.5 w-3.5 text-muted-foreground shrink-0" />
)}
</label>
))}
</div>
</ScrollArea>
</div>
<DialogFooter>
<Button variant="outline" onClick={() => onOpenChange(false)}>
Cancel
</Button>
<Button onClick={handleSave} disabled={isPending}>
{isPending ? <Loader2 className="h-4 w-4 animate-spin mr-1" /> : null}
Save ({selectedIds.size})
</Button>
</DialogFooter>
</DialogContent>
</Dialog>
);
}


@@ -146,3 +146,83 @@ export async function linkPackages(
return { success: false, error: "Failed to link packages" };
}
}
export async function sendAllKickstarterPackages(
kickstarterId: string
): Promise<ActionResult<{ queued: number }>> {
const session = await auth();
if (!session?.user?.id) return { success: false, error: "Unauthorized" };
try {
const telegramLink = await prisma.telegramLink.findUnique({
where: { userId: session.user.id },
});
if (!telegramLink) {
return { success: false, error: "No linked Telegram account. Link one in Settings." };
}
const kickstarter = await prisma.kickstarter.findFirst({
where: { id: kickstarterId, userId: session.user.id },
select: {
packages: {
select: {
package: {
select: { id: true, destChannelId: true, destMessageId: true, fileName: true },
},
},
},
},
});
if (!kickstarter) {
return { success: false, error: "Kickstarter not found" };
}
const sendablePackages = kickstarter.packages
.map((lnk) => lnk.package)
.filter((p) => p.destChannelId && p.destMessageId);
if (sendablePackages.length === 0) {
return { success: false, error: "No linked packages are available for sending" };
}
let queued = 0;
for (const pkg of sendablePackages) {
const existing = await prisma.botSendRequest.findFirst({
where: {
packageId: pkg.id,
telegramLinkId: telegramLink.id,
status: { in: ["PENDING", "SENDING"] },
},
});
if (!existing) {
const sendRequest = await prisma.botSendRequest.create({
data: {
packageId: pkg.id,
telegramLinkId: telegramLink.id,
requestedByUserId: session.user.id,
status: "PENDING",
},
});
try {
await prisma.$queryRawUnsafe(
`SELECT pg_notify('bot_send', $1)`,
sendRequest.id
);
} catch {
// Best-effort
}
queued++;
}
}
revalidatePath(REVALIDATE_PATH);
return { success: true, data: { queued } };
} catch {
return { success: false, error: "Failed to send packages" };
}
}
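
`sendAllKickstarterPackages` runs a `findFirst` and then a `create` for every package. That costs two round trips per package, and because the check and the insert are separate steps, two quick clicks can still queue the same package twice. Below is a minimal sketch of the selection step as a pure function. It assumes the caller first loads the already-queued package ids for this Telegram link with a single `findMany`. `CandidatePackage` and `selectPackagesToQueue` are hypothetical names, not part of this codebase. The sketch does not make the check-then-insert atomic; that would also need a database-level constraint.

```typescript
// Hypothetical shape: only the fields the selection step reads.
interface CandidatePackage {
  id: string;
  destChannelId: string | null;
  // Left as `unknown` because the sketch only checks that a value is present.
  destMessageId: unknown;
}

/**
 * Returns the packages that should get a new PENDING send request: those
 * already forwarded to a destination channel whose id is not in
 * `alreadyQueuedIds`. A package linked more than once is returned once.
 */
function selectPackagesToQueue(
  packages: readonly CandidatePackage[],
  alreadyQueuedIds: readonly string[],
): CandidatePackage[] {
  const skip = new Set(alreadyQueuedIds);
  const toQueue: CandidatePackage[] = [];
  for (const pkg of packages) {
    // Same truthiness test as the `.filter` in the action.
    const sendable = Boolean(pkg.destChannelId) && Boolean(pkg.destMessageId);
    if (!sendable || skip.has(pkg.id)) continue;
    skip.add(pkg.id); // later duplicates of this package are skipped
    toQueue.push(pkg);
  }
  return toQueue;
}
```

The caller would then create send requests only for the returned packages. This replaces the per-package `findFirst` with one lookup, and it also covers a package that is linked to the kickstarter twice.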

@@ -1,7 +1,7 @@
"use client";
import { type ColumnDef } from "@tanstack/react-table";
-import { FileArchive, Eye, ChevronRight, Layers, Ungroup, Send, ImagePlus } from "lucide-react";
+import { FileArchive, Eye, ChevronRight, Layers, Ungroup, Send, ImagePlus, GitMerge } from "lucide-react";
import { DataTableColumnHeader } from "@/components/shared/data-table-column-header";
import { Badge } from "@/components/ui/badge";
import { Button } from "@/components/ui/button";
@@ -69,6 +69,9 @@ interface PackageColumnsProps {
onGroupPreviewUpload: (groupId: string) => void;
selectedPackages: Set<string>;
onToggleSelect: (packageId: string) => void;
mergeSourceId: string | null;
onStartMerge: (groupId: string) => void;
onCompleteMerge: (targetGroupId: string) => void;
}
export function formatBytes(bytesStr: string): string {
@@ -148,6 +151,9 @@ export function getPackageColumns({
onGroupPreviewUpload,
selectedPackages,
onToggleSelect,
mergeSourceId,
onStartMerge,
onCompleteMerge,
}: PackageColumnsProps): ColumnDef<StlTableRow, unknown>[] {
return [
{
@@ -392,6 +398,8 @@ export function getPackageColumns({
cell: ({ row }) => {
const data = row.original;
if (isGroupRow(data)) {
const isMergeSource = mergeSourceId === data.id;
const canMergeHere = mergeSourceId !== null && mergeSourceId !== data.id;
return (
<div className="flex items-center gap-0.5">
<Button
@@ -403,6 +411,26 @@ export function getPackageColumns({
>
<Send className="h-4 w-4" />
</Button>
<Button
variant="ghost"
size="icon"
className={`h-8 w-8 ${isMergeSource ? "text-amber-500 bg-amber-500/10 hover:bg-amber-500/20" : ""}`}
onClick={() => onStartMerge(data.id)}
title={isMergeSource ? "Cancel merge (this group is the merge source)" : "Start merge — mark this group as merge source"}
>
<GitMerge className="h-4 w-4" />
</Button>
{canMergeHere && (
<Button
variant="ghost"
size="icon"
className="h-8 w-8 text-primary bg-primary/10 hover:bg-primary/20"
onClick={() => onCompleteMerge(data.id)}
title="Merge source group into this group"
>
<Layers className="h-4 w-4" />
</Button>
)}
<Button
variant="ghost"
size="icon"

@@ -20,6 +20,7 @@ export interface SkippedRow {
sourceChannel: { id: string; title: string };
isMultipart: boolean;
partCount: number;
attemptCount: number;
createdAt: string;
}
@@ -107,9 +108,22 @@ export function getSkippedColumns({
),
accessorFn: (row) => row.sourceChannel.title,
},
{
accessorKey: "attemptCount",
header: ({ column }) => <DataTableColumnHeader column={column} title="Attempts" />,
cell: ({ row }) => {
const count = row.original.attemptCount;
const variant = count >= 5 ? "destructive" : count > 1 ? "secondary" : "outline";
return (
<Badge variant={variant} className="text-[10px]">
{count}
</Badge>
);
},
},
{
accessorKey: "createdAt",
-header: ({ column }) => <DataTableColumnHeader column={column} title="Skipped" />,
+header: ({ column }) => <DataTableColumnHeader column={column} title="Last Skipped" />,
cell: ({ row }) => (
<span className="text-sm text-muted-foreground">
{new Date(row.original.createdAt).toLocaleDateString()}

@@ -3,7 +3,8 @@
import { useState, useCallback, useTransition, useMemo, useRef } from "react";
import { useRouter, usePathname, useSearchParams } from "next/navigation";
import { toast } from "sonner";
-import { Search, Layers } from "lucide-react";
+import { Search, Layers, Upload } from "lucide-react";
import { UploadDialog } from "./upload-dialog";
import { useDataTable } from "@/hooks/use-data-table";
import {
getPackageColumns,
@@ -38,7 +39,7 @@ import {
} from "@/components/ui/dialog";
import { Tabs, TabsContent, TabsList, TabsTrigger } from "@/components/ui/tabs";
import { Badge } from "@/components/ui/badge";
-import type { DisplayItem, IngestionAccountStatus } from "@/lib/telegram/types";
+import type { DisplayItem, IngestionAccountStatus, PackageListItem } from "@/lib/telegram/types";
import type { SkippedRow } from "./skipped-columns";
import {
updatePackageCreator,
@@ -49,6 +50,7 @@ import {
removeFromGroupAction,
sendAllInGroupAction,
updateGroupPreviewAction,
mergeGroupsAction,
} from "../actions";
interface StlTableProps {
@@ -61,6 +63,9 @@ interface StlTableProps {
skippedData: SkippedRow[];
skippedPageCount: number;
skippedTotalCount: number;
ungroupedData: PackageListItem[];
ungroupedPageCount: number;
ungroupedTotalCount: number;
}
export function StlTable({
@@ -73,6 +78,9 @@ export function StlTable({
skippedData,
skippedPageCount,
skippedTotalCount,
ungroupedData,
ungroupedPageCount,
ungroupedTotalCount,
}: StlTableProps) {
const router = useRouter();
const pathname = usePathname();
@@ -96,6 +104,12 @@ export function StlTable({
const previewInputRef = useRef<HTMLInputElement>(null);
const [uploadGroupId, setUploadGroupId] = useState<string | null>(null);
// Group merge state
const [mergeSourceId, setMergeSourceId] = useState<string | null>(null);
// Upload dialog state
const [uploadOpen, setUploadOpen] = useState(false);
const toggleGroup = useCallback((groupId: string) => {
setExpandedGroups((prev) => {
const next = new Set(prev);
@@ -334,6 +348,35 @@ export function StlTable({
[uploadGroupId, router]
);
const handleStartMerge = useCallback((groupId: string) => {
setMergeSourceId((prev) => {
if (prev === groupId) {
toast.info("Merge cancelled");
return null;
}
toast.info("Merge source selected — click the merge-here button on the target group");
return groupId;
});
}, []);
const handleMergeGroups = useCallback(
(targetGroupId: string) => {
if (!mergeSourceId) return;
const sourceId = mergeSourceId;
startTransition(async () => {
const result = await mergeGroupsAction(targetGroupId, sourceId);
if (result.success) {
toast.success("Groups merged successfully");
setMergeSourceId(null);
router.refresh();
} else {
toast.error(result.error);
}
});
},
[mergeSourceId, router]
);
const columns = getPackageColumns({
onViewFiles: (pkg) => setViewPkg(pkg),
searchTerm,
@@ -375,10 +418,30 @@ export function StlTable({
onGroupPreviewUpload: handleGroupPreviewUpload,
selectedPackages,
onToggleSelect: toggleSelect,
mergeSourceId,
onStartMerge: handleStartMerge,
onCompleteMerge: handleMergeGroups,
});
const { table } = useDataTable({ data: tableRows, columns, pageCount });
const ungroupedRows: StlTableRow[] = useMemo(
() =>
ungroupedData.map((pkg) => ({
...pkg,
_rowType: "package" as const,
_groupId: null,
_isGroupMember: false,
})),
[ungroupedData]
);
const { table: ungroupedTable } = useDataTable({
data: ungroupedRows,
columns,
pageCount: ungroupedPageCount,
});
const activeTag = searchParams.get("tag") ?? "";
return (
@@ -401,6 +464,14 @@ export function StlTable({
</Badge>
)}
</TabsTrigger>
<TabsTrigger value="ungrouped" className="gap-1.5">
Ungrouped
{ungroupedTotalCount > 0 && (
<Badge variant="secondary" className="h-5 px-1.5 text-[10px]">
{ungroupedTotalCount}
</Badge>
)}
</TabsTrigger>
</TabsList>
<TabsContent value="packages" className="space-y-4">
@@ -430,6 +501,10 @@ export function StlTable({
</Select>
)}
<DataTableViewOptions table={table} />
<Button variant="outline" size="sm" className="h-9" onClick={() => setUploadOpen(true)}>
<Upload className="mr-2 h-4 w-4" />
Upload Files
</Button>
{selectedPackages.size >= 2 && (
<Button
variant="outline"
@@ -472,6 +547,11 @@ export function StlTable({
totalCount={skippedTotalCount}
/>
</TabsContent>
<TabsContent value="ungrouped" className="space-y-4">
<DataTable table={ungroupedTable} emptyMessage="All packages are grouped!" />
<DataTablePagination table={ungroupedTable} totalCount={ungroupedTotalCount} />
</TabsContent>
</Tabs>
<PackageFilesDrawer
@@ -515,6 +595,8 @@ export function StlTable({
</DialogContent>
</Dialog>
<UploadDialog open={uploadOpen} onOpenChange={setUploadOpen} />
{/* Hidden file input for group preview upload (Task 12) */}
<input
ref={previewInputRef}
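
`handleStartMerge` calls `toast.info` from inside the `setMergeSourceId` updater. React expects updater functions to be pure, and when Strict Mode is on it calls them twice during development, so the toast can appear twice. Below is a sketch that moves the decision into a pure helper and leaves the side effect in the handler. `nextMergeSource` and `MergeTransition` are hypothetical names. The trade-off is that the handler now reads `mergeSourceId` from its closure, so `mergeSourceId` has to be listed in the `useCallback` dependencies.

```typescript
type MergeTransition =
  | { next: null; kind: "cancelled" }
  | { next: string; kind: "selected" };

// Pure: decides the new merge source without touching toasts or React state.
function nextMergeSource(current: string | null, clickedGroupId: string): MergeTransition {
  return current === clickedGroupId
    ? { next: null, kind: "cancelled" }
    : { next: clickedGroupId, kind: "selected" };
}

// Usage inside StlTable (React and sonner imports omitted). The toast now runs
// exactly once per click, outside any updater:
//
// const handleStartMerge = useCallback((groupId: string) => {
//   const { next, kind } = nextMergeSource(mergeSourceId, groupId);
//   setMergeSourceId(next);
//   toast.info(kind === "cancelled" ? "Merge cancelled" : "Merge source selected");
// }, [mergeSourceId]);
```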

@@ -0,0 +1,243 @@
"use client";
import { useState, useRef, useTransition, useEffect } from "react";
import { Upload, File, X, Loader2, CheckCircle2, AlertCircle } from "lucide-react";
import { toast } from "sonner";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import {
Dialog,
DialogContent,
DialogDescription,
DialogFooter,
DialogHeader,
DialogTitle,
} from "@/components/ui/dialog";
interface UploadDialogProps {
open: boolean;
onOpenChange: (open: boolean) => void;
}
function formatSize(bytes: number): string {
if (bytes >= 1024 * 1024 * 1024) return `${(bytes / (1024 * 1024 * 1024)).toFixed(1)} GB`;
if (bytes >= 1024 * 1024) return `${(bytes / (1024 * 1024)).toFixed(0)} MB`;
return `${(bytes / 1024).toFixed(0)} KB`;
}
type UploadStatus = "idle" | "uploading" | "processing" | "done" | "error";
export function UploadDialog({ open, onOpenChange }: UploadDialogProps) {
const [files, setFiles] = useState<File[]>([]);
const [groupName, setGroupName] = useState("");
const [status, setStatus] = useState<UploadStatus>("idle");
const [error, setError] = useState<string | null>(null);
const [isPending, startTransition] = useTransition();
const fileInputRef = useRef<HTMLInputElement>(null);
const pollRef = useRef<ReturnType<typeof setInterval> | null>(null);
useEffect(() => {
if (open) {
setFiles([]);
setGroupName("");
setStatus("idle");
setError(null);
}
return () => {
if (pollRef.current) clearInterval(pollRef.current);
};
}, [open]);
function handleFileChange(e: React.ChangeEvent<HTMLInputElement>) {
if (e.target.files) {
setFiles(Array.from(e.target.files));
}
}
function removeFile(index: number) {
setFiles((prev) => prev.filter((_, i) => i !== index));
}
function handleUpload() {
if (files.length === 0) return;
startTransition(async () => {
setStatus("uploading");
setError(null);
try {
const formData = new FormData();
for (const file of files) {
formData.append("files", file);
}
if (groupName.trim()) {
formData.append("groupName", groupName.trim());
}
const res = await fetch("/api/uploads", {
method: "POST",
body: formData,
});
const data = await res.json();
if (!res.ok) {
setStatus("error");
setError(data.error ?? "Upload failed");
return;
}
setStatus("processing");
// Poll for completion
pollRef.current = setInterval(async () => {
try {
const statusRes = await fetch(`/api/uploads/${data.uploadId}`);
const statusData = await statusRes.json();
if (statusData.status === "COMPLETED") {
setStatus("done");
toast.success(`${files.length} file(s) uploaded and indexed`);
if (pollRef.current) clearInterval(pollRef.current);
} else if (statusData.status === "FAILED") {
setStatus("error");
setError(statusData.errorMessage ?? "Processing failed");
if (pollRef.current) clearInterval(pollRef.current);
}
} catch {
// Keep polling
}
}, 3000);
// Stop polling after 10 minutes
setTimeout(() => {
if (pollRef.current) {
clearInterval(pollRef.current);
pollRef.current = null;
setStatus((s) => s === "processing" ? "done" : s);
}
}, 600_000);
} catch {
setStatus("error");
setError("Network error");
}
});
}
return (
<Dialog open={open} onOpenChange={onOpenChange}>
<DialogContent className="sm:max-w-lg">
<DialogHeader>
<DialogTitle>Upload Files</DialogTitle>
<DialogDescription>
Upload archive files to be processed and indexed. Multiple files will be automatically grouped.
</DialogDescription>
</DialogHeader>
{status === "idle" && (
<div className="space-y-4">
<div
className="border-2 border-dashed rounded-lg p-8 text-center cursor-pointer hover:border-primary/50 transition-colors"
onClick={() => fileInputRef.current?.click()}
>
<Upload className="h-8 w-8 mx-auto mb-2 text-muted-foreground" />
<p className="text-sm text-muted-foreground">
Click to select files or drag & drop
</p>
<p className="text-xs text-muted-foreground mt-1">
ZIP, RAR, 7Z files up to 4GB each
</p>
<input
ref={fileInputRef}
type="file"
multiple
accept=".zip,.rar,.7z,.pdf,.stl"
onChange={handleFileChange}
className="hidden"
/>
</div>
{files.length > 0 && (
<div className="space-y-2">
{files.map((file, i) => (
<div key={i} className="flex items-center gap-2 p-2 rounded bg-muted/30">
<File className="h-4 w-4 shrink-0 text-muted-foreground" />
<span className="text-sm flex-1 truncate">{file.name}</span>
<span className="text-xs text-muted-foreground">{formatSize(file.size)}</span>
<button onClick={() => removeFile(i)} className="p-0.5 hover:text-destructive">
<X className="h-3.5 w-3.5" />
</button>
</div>
))}
</div>
)}
{files.length > 1 && (
<div>
<Label htmlFor="groupName" className="text-sm">Group Name (optional)</Label>
<Input
id="groupName"
value={groupName}
onChange={(e) => setGroupName(e.target.value)}
placeholder="Auto-generated from filenames"
className="mt-1"
/>
</div>
)}
</div>
)}
{(status === "uploading" || status === "processing") && (
<div className="flex items-center gap-3 p-6 rounded-lg bg-muted/30 border">
<Loader2 className="h-6 w-6 animate-spin text-primary" />
<div>
<p className="text-sm font-medium">
{status === "uploading" ? "Uploading files..." : "Processing & uploading to Telegram..."}
</p>
<p className="text-xs text-muted-foreground mt-0.5">
{status === "uploading"
? "Sending files to server"
: "Hashing, extracting metadata, uploading to destination channel"}
</p>
</div>
</div>
)}
{status === "done" && (
<div className="flex items-center gap-3 p-6 rounded-lg bg-green-500/10 border border-green-500/20">
<CheckCircle2 className="h-6 w-6 text-green-500" />
<div>
<p className="text-sm font-medium text-green-500">Upload complete!</p>
<p className="text-xs text-muted-foreground">Files have been indexed and uploaded to Telegram.</p>
</div>
</div>
)}
{status === "error" && (
<div className="flex items-center gap-3 p-6 rounded-lg bg-destructive/10 border border-destructive/20">
<AlertCircle className="h-6 w-6 text-destructive" />
<div>
<p className="text-sm font-medium text-destructive">Upload failed</p>
<p className="text-xs text-muted-foreground">{error}</p>
</div>
</div>
)}
<DialogFooter>
{status === "idle" && (
<>
<Button variant="outline" onClick={() => onOpenChange(false)}>Cancel</Button>
<Button onClick={handleUpload} disabled={files.length === 0 || isPending}>
<Upload className="h-4 w-4 mr-1" />
Upload {files.length > 0 ? `(${files.length})` : ""}
</Button>
</>
)}
{(status === "done" || status === "error") && (
<Button variant="outline" onClick={() => onOpenChange(false)}>Close</Button>
)}
</DialogFooter>
</DialogContent>
</Dialog>
);
}
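
The upload dialog polls with `setInterval` around an `async` callback, so a status request that takes longer than 3 s can overlap the next one. The 10-minute `setTimeout` is never cleared. When it fires, it also flips an upload that is still `processing` to `done`, reporting a success that was never confirmed. Below is a sketch of a sequential polling loop with an explicit deadline and cancellation. It assumes the `/api/uploads/[id]` response keeps its current `status`/`errorMessage` shape. `pollUpload`, `PollOutcome` and `UploadStatusResponse` are hypothetical names.

```typescript
interface UploadStatusResponse {
  status: string; // e.g. "PENDING", "COMPLETED", "FAILED"
  errorMessage?: string | null;
}

type PollOutcome =
  | { kind: "completed" }
  | { kind: "failed"; error: string }
  | { kind: "timeout" }
  | { kind: "aborted" };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Polls until the upload reaches a terminal status, the deadline passes, or
 * `signal` is aborted. Requests run one at a time, so a slow response never
 * overlaps the next poll.
 */
async function pollUpload(
  fetchStatus: () => Promise<UploadStatusResponse>,
  opts: { intervalMs: number; timeoutMs: number; signal?: AbortSignal },
): Promise<PollOutcome> {
  const deadline = Date.now() + opts.timeoutMs;
  while (Date.now() < deadline) {
    if (opts.signal?.aborted) return { kind: "aborted" };
    try {
      const res = await fetchStatus();
      if (res.status === "COMPLETED") return { kind: "completed" };
      if (res.status === "FAILED") {
        return { kind: "failed", error: res.errorMessage ?? "Processing failed" };
      }
    } catch {
      // Transient error (network, bad JSON): keep polling until the deadline.
    }
    // The sleep is not interrupted; an abort takes effect at the next check.
    await sleep(opts.intervalMs);
  }
  return { kind: "timeout" };
}
```

In the dialog, this could be called with `intervalMs: 3000`, `timeoutMs: 600_000` and the signal of an `AbortController` that the effect cleanup aborts. A `timeout` outcome would then be shown as "still processing" rather than as success.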

@@ -10,6 +10,7 @@ import {
createManualGroup,
removePackageFromGroup,
dissolveGroup,
mergeGroups,
} from "@/lib/telegram/queries";
const ALLOWED_IMAGE_TYPES = [
@@ -185,6 +186,62 @@ export async function setPreviewFromExtract(
}
}
export async function repairPackageAction(
packageId: string
): Promise<ActionResult> {
const session = await auth();
if (!session?.user?.id) return { success: false, error: "Unauthorized" };
try {
const pkg = await prisma.package.findUnique({
where: { id: packageId },
select: {
id: true,
fileName: true,
sourceChannelId: true,
sourceMessageId: true,
destChannelId: true,
destMessageId: true,
},
});
if (!pkg) return { success: false, error: "Package not found" };
// Clear the destination info so the worker re-processes it
await prisma.package.update({
where: { id: packageId },
data: {
destMessageId: null,
destMessageIds: [],
destChannelId: null,
},
});
// Reset the channel watermark to before this message so worker picks it up
await prisma.accountChannelMap.updateMany({
where: {
channelId: pkg.sourceChannelId,
lastProcessedMessageId: { gte: pkg.sourceMessageId },
},
data: { lastProcessedMessageId: pkg.sourceMessageId - BigInt(1) },
});
// Mark related notifications as read
await prisma.systemNotification.updateMany({
where: {
context: { path: ["packageId"], equals: packageId },
isRead: false,
},
data: { isRead: true },
});
revalidatePath("/stls");
return { success: true, data: undefined };
} catch {
return { success: false, error: "Failed to schedule repair" };
}
}
export async function retrySkippedPackageAction(
id: string
): Promise<ActionResult> {
@@ -435,6 +492,26 @@ export async function updateGroupPreviewAction(
}
}
export async function mergeGroupsAction(
targetGroupId: string,
sourceGroupId: string
): Promise<ActionResult> {
const session = await auth();
if (!session?.user?.id) return { success: false, error: "Unauthorized" };
if (targetGroupId === sourceGroupId) {
return { success: false, error: "Cannot merge a group with itself" };
}
try {
await mergeGroups(targetGroupId, sourceGroupId);
revalidatePath("/stls");
return { success: true, data: undefined };
} catch {
return { success: false, error: "Failed to merge groups" };
}
}
export async function sendAllInGroupAction(
groupId: string
): Promise<ActionResult> {

@@ -1,6 +1,6 @@
import { auth } from "@/lib/auth";
import { redirect } from "next/navigation";
-import { listDisplayItems, searchPackages, getIngestionStatus, getAllPackageTags, listSkippedPackages, countSkippedPackages } from "@/lib/telegram/queries";
+import { listDisplayItems, searchPackages, getIngestionStatus, getAllPackageTags, listSkippedPackages, countSkippedPackages, listUngroupedPackages, countUngroupedPackages } from "@/lib/telegram/queries";
import { StlTable } from "./_components/stl-table";
import type { DisplayItem, PackageListItem } from "@/lib/telegram/types";
@@ -24,7 +24,7 @@ export default async function StlFilesPage({ searchParams }: Props) {
const tab = (params.tab as string) ?? "packages";
// Fetch packages, ingestion status, tags, and skipped count in parallel
-const [result, ingestionStatus, availableTags, skippedCount] = await Promise.all([
+const [result, ingestionStatus, availableTags, skippedCount, ungroupedCount] = await Promise.all([
search
? searchPackages({
query: search,
@@ -43,6 +43,7 @@ export default async function StlFilesPage({ searchParams }: Props) {
getIngestionStatus(),
getAllPackageTags(),
countSkippedPackages(),
countUngroupedPackages(),
]);
// For search results, wrap as DisplayItem[]; for non-search, already DisplayItem[]
@@ -55,6 +56,11 @@ export default async function StlFilesPage({ searchParams }: Props) {
? await listSkippedPackages({ page, limit: perPage })
: null;
// Fetch ungrouped packages only if on that tab
const ungroupedResult = tab === "ungrouped"
? await listUngroupedPackages({ page, limit: perPage })
: null;
return (
<StlTable
data={displayItems}
@@ -66,6 +72,9 @@ export default async function StlFilesPage({ searchParams }: Props) {
skippedData={skippedResult?.items ?? []}
skippedPageCount={skippedResult?.pagination.totalPages ?? 0}
skippedTotalCount={skippedCount}
ungroupedData={ungroupedResult?.items ?? []}
ungroupedPageCount={ungroupedResult?.pagination.totalPages ?? 0}
ungroupedTotalCount={ungroupedCount}
/>
);
}
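
On the Skipped and Ungrouped tabs, the page awaits the tab's list query only after the `Promise.all` has settled, which adds one sequential round trip. Below is a sketch that folds the tab-specific query into the same `Promise.all` and resolves to `null` for inactive tabs. `forTab`, `loadStlPage`, `Paged` and the injected query functions are hypothetical stand-ins for the helpers the page imports from `@/lib/telegram/queries`.

```typescript
interface Paged<T> {
  items: T[];
  pagination: { totalPages: number };
}

// Resolves to null without querying when `tab` is not the active tab, so the
// call can sit inside the same Promise.all as the always-needed queries.
function forTab<T>(activeTab: string, tab: string, load: () => Promise<T>): Promise<T | null> {
  return activeTab === tab ? load() : Promise.resolve(null);
}

async function loadStlPage(
  activeTab: string,
  q: {
    countSkipped: () => Promise<number>;
    countUngrouped: () => Promise<number>;
    listSkipped: () => Promise<Paged<string>>;
    listUngrouped: () => Promise<Paged<string>>;
  },
) {
  // All four requests start together; inactive tabs cost nothing.
  const [skippedCount, ungroupedCount, skipped, ungrouped] = await Promise.all([
    q.countSkipped(),
    q.countUngrouped(),
    forTab(activeTab, "skipped", q.listSkipped),
    forTab(activeTab, "ungrouped", q.listUngrouped),
  ]);
  return { skippedCount, ungroupedCount, skipped, ungrouped };
}
```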

@@ -291,10 +291,25 @@ export async function setChannelCategory(
if (!admin.success) return admin;
try {
const existing = await prisma.telegramChannel.findUnique({
where: { id },
select: { category: true },
});
if (!existing) return { success: false, error: "Channel not found" };
const oldCategory = existing.category;
const newCategory = category?.trim() || null;
await prisma.telegramChannel.update({
where: { id },
-data: { category: category?.trim() || null },
+data: { category: newCategory },
});
// Retroactively re-tag packages from this channel when category changes
if (oldCategory !== newCategory && newCategory) {
await retagChannelPackages(id, oldCategory, newCategory);
}
revalidatePath("/telegram");
return { success: true, data: undefined };
} catch {
@@ -302,6 +317,50 @@ export async function setChannelCategory(
}
}
export async function retagChannelPackages(
channelId: string,
oldCategory: string | null,
newCategory: string
): Promise<ActionResult<{ updated: number }>> {
const session = await auth();
if (!session?.user?.id) return { success: false, error: "Unauthorized" };
try {
// Find packages from this channel that have the old category tag (or no category tag)
const packages = await prisma.package.findMany({
where: { sourceChannelId: channelId },
select: { id: true, tags: true },
});
let updated = 0;
for (const pkg of packages) {
const tags = [...pkg.tags];
// Remove old category tag if present
if (oldCategory) {
const idx = tags.indexOf(oldCategory);
if (idx !== -1) tags.splice(idx, 1);
}
// Add new category tag if not already present
if (!tags.includes(newCategory)) {
tags.push(newCategory);
}
// Only update if tags actually changed
if (JSON.stringify(tags) !== JSON.stringify(pkg.tags)) {
await prisma.package.update({
where: { id: pkg.id },
data: { tags },
});
updated++;
}
}
revalidatePath("/stls");
return { success: true, data: { updated } };
} catch {
return { success: false, error: "Failed to re-tag packages" };
}
}
export async function setChannelType(
id: string,
type: "SOURCE" | "DESTINATION"
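
Two points about the re-tagging path. First, `setChannelCategory` re-tags only when `newCategory` is non-empty, so clearing a channel's category leaves the old category tag on its packages. Second, `retagChannelPackages` is exported with only a session check, while `setChannelCategory` requires an admin. If this module is a server-actions file (the directive is outside the hunk), any signed-in user can call that export directly. Below is a sketch of the per-package tag rewrite as a pure function. It also handles a cleared category and removes every occurrence of the old tag, whereas the current `splice` removes only the first. `retagForCategoryChange` is a hypothetical name.

```typescript
/**
 * New tag list for a package whose channel category changes from
 * `oldCategory` to `newCategory` (either may be null), or null when the
 * tags would stay the same and no write is needed.
 */
function retagForCategoryChange(
  tags: readonly string[],
  oldCategory: string | null,
  newCategory: string | null,
): string[] | null {
  if (oldCategory === newCategory) return null;
  // Remove every occurrence of the old category tag, not just the first.
  const next = oldCategory ? tags.filter((t) => t !== oldCategory) : [...tags];
  if (newCategory && !next.includes(newCategory)) next.push(newCategory);
  const unchanged = next.length === tags.length && next.every((t, i) => t === tags[i]);
  return unchanged ? null : next;
}
```

Returning `null` for unchanged tags replaces the `JSON.stringify` comparison and tells the caller it can skip the `update` for that package.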

@@ -0,0 +1,33 @@
import { NextResponse } from "next/server";
import { auth } from "@/lib/auth";
import {
markNotificationRead,
markAllNotificationsRead,
dismissNotification,
clearAllNotifications,
} from "@/data/notification.queries";
export const dynamic = "force-dynamic";
export async function POST(request: Request) {
const session = await auth();
if (!session?.user?.id) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
const body = await request.json().catch(() => ({}));
const id = body.id as string | undefined;
const action = (body.action as string) ?? "read";
if (action === "dismiss" && id) {
await dismissNotification(id);
} else if (action === "clear") {
await clearAllNotifications();
} else if (id) {
await markNotificationRead(id);
} else {
await markAllNotificationsRead();
}
return NextResponse.json({ success: true });
}
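
The read route treats any body without an `id` as "mark all read". That includes `{ action: "dismiss" }` with a missing id, and any misspelled action. Below is a sketch that parses the body into an explicit command and rejects anything it does not recognise. It keeps the empty-body behaviour the bell relies on for "mark all read". `parseNotificationCommand` and `NotificationCommand` are hypothetical names.

```typescript
type NotificationCommand =
  | { kind: "markRead"; id: string }
  | { kind: "markAllRead" }
  | { kind: "dismiss"; id: string }
  | { kind: "clearAll" };

function parseNotificationCommand(body: unknown): NotificationCommand | { error: string } {
  const obj: Record<string, unknown> =
    typeof body === "object" && body !== null ? (body as Record<string, unknown>) : {};
  const id = typeof obj.id === "string" && obj.id.length > 0 ? obj.id : undefined;
  // Same default as the route: no action means "read".
  const action = obj.action ?? "read";
  switch (action) {
    case "read":
      return id ? { kind: "markRead", id } : { kind: "markAllRead" };
    case "dismiss":
      return id ? { kind: "dismiss", id } : { error: "dismiss requires an id" };
    case "clear":
      return { kind: "clearAll" };
    default:
      return { error: `Unknown action: ${String(action)}` };
  }
}
```

The route would answer 400 for the `error` branch and switch on `kind` to call `markNotificationRead`, `markAllNotificationsRead`, `dismissNotification` or `clearAllNotifications`.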

@@ -0,0 +1,43 @@
import { NextResponse } from "next/server";
import { auth } from "@/lib/auth";
import { prisma } from "@/lib/prisma";
export const dynamic = "force-dynamic";
export async function POST(request: Request) {
const session = await auth();
if (!session?.user?.id) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
const body = await request.json().catch(() => ({}));
const notificationId = body.notificationId as string;
if (!notificationId) {
return NextResponse.json({ error: "notificationId required" }, { status: 400 });
}
const notification = await prisma.systemNotification.findUnique({
where: { id: notificationId },
});
if (!notification) {
return NextResponse.json({ error: "Notification not found" }, { status: 404 });
}
const context = notification.context as Record<string, unknown> | null;
const packageId = context?.packageId as string | undefined;
if (!packageId) {
return NextResponse.json({ error: "Notification has no associated package" }, { status: 400 });
}
// Import and call the repair action
const { repairPackageAction } = await import("@/app/(app)/stls/actions");
const result = await repairPackageAction(packageId);
if (!result.success) {
return NextResponse.json({ error: result.error }, { status: 500 });
}
return NextResponse.json({ success: true });
}
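
`notification.context` is a JSON column, and the two `as` casts in the repair route only silence the type checker. A numeric `packageId` or an array-valued context would pass through unchecked. A runtime guard along these lines keeps the 400 response accurate. `getPackageIdFromContext` is a hypothetical helper name. Separately, the route maps every repair failure to 500, including the "Package not found" case, which would arguably fit a 404.

```typescript
// Extracts a non-empty string `packageId` from an arbitrary JSON value,
// returning null for anything else (null, arrays, primitives, wrong types).
function getPackageIdFromContext(context: unknown): string | null {
  if (typeof context !== "object" || context === null || Array.isArray(context)) {
    return null;
  }
  const value = (context as Record<string, unknown>).packageId;
  return typeof value === "string" && value.length > 0 ? value : null;
}
```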

@@ -0,0 +1,27 @@
import { NextResponse } from "next/server";
import { auth } from "@/lib/auth";
import {
getRecentNotifications,
getUnreadNotificationCount,
} from "@/data/notification.queries";
export const dynamic = "force-dynamic";
export async function GET() {
const session = await auth();
if (!session?.user?.id) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
const [notifications, unreadCount] = await Promise.all([
getRecentNotifications(30),
getUnreadNotificationCount(),
]);
const serialized = notifications.map((n) => ({
...n,
createdAt: n.createdAt.toISOString(),
}));
return NextResponse.json({ notifications: serialized, unreadCount });
}

@@ -0,0 +1,21 @@
import { NextResponse } from "next/server";
import { auth } from "@/lib/auth";
import { getLinkedPackageIds } from "@/data/kickstarter.queries";
export const dynamic = "force-dynamic";
export async function GET(request: Request) {
const session = await auth();
if (!session?.user?.id) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
const { searchParams } = new URL(request.url);
const kickstarterId = searchParams.get("kickstarterId");
if (!kickstarterId) {
return NextResponse.json({ error: "kickstarterId required" }, { status: 400 });
}
const packageIds = await getLinkedPackageIds(kickstarterId);
return NextResponse.json({ packageIds });
}

@@ -0,0 +1,26 @@
import { NextResponse } from "next/server";
import { auth } from "@/lib/auth";
import { searchPackagesForLinking } from "@/data/kickstarter.queries";
export const dynamic = "force-dynamic";
export async function GET(request: Request) {
const session = await auth();
if (!session?.user?.id) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
const { searchParams } = new URL(request.url);
const query = searchParams.get("q") ?? "";
const limit = Math.min(Number(searchParams.get("limit") ?? "20"), 50);
const packages = await searchPackagesForLinking(query, limit);
// Serialize BigInt for JSON
const serialized = packages.map((p) => ({
...p,
fileSize: p.fileSize.toString(),
}));
return NextResponse.json({ packages: serialized });
}
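
`Math.min(Number(searchParams.get("limit") ?? "20"), 50)` yields `NaN` for a non-numeric `limit`, because `Math.min` propagates `NaN`. It also passes zero and negative values through unchanged. Below is a small parser that falls back to the default for anything that is not a positive integer. `parseLimit` is a hypothetical helper.

```typescript
// Accepts only plain positive integers; anything else uses `fallback`.
// Valid values are capped at `max`.
function parseLimit(raw: string | null, fallback = 20, max = 50): number {
  if (raw === null) return fallback;
  const trimmed = raw.trim();
  if (!/^\d+$/.test(trimmed)) return fallback;
  const n = Number(trimmed);
  if (n < 1) return fallback;
  return Math.min(n, max);
}
```

In the route this would read `const limit = parseLimit(searchParams.get("limit"));`.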

@@ -0,0 +1,43 @@
import { NextResponse } from "next/server";
import { auth } from "@/lib/auth";
import { prisma } from "@/lib/prisma";
export const dynamic = "force-dynamic";
export async function GET(
_request: Request,
{ params }: { params: Promise<{ id: string }> }
) {
const session = await auth();
if (!session?.user?.id) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
const { id } = await params;
const upload = await prisma.manualUpload.findUnique({
where: { id },
include: {
files: {
select: { id: true, fileName: true, fileSize: true, packageId: true },
},
},
});
if (!upload || upload.userId !== session.user.id) {
return NextResponse.json({ error: "Not found" }, { status: 404 });
}
return NextResponse.json({
id: upload.id,
status: upload.status,
groupName: upload.groupName,
errorMessage: upload.errorMessage,
files: upload.files.map((f) => ({
...f,
fileSize: f.fileSize.toString(),
})),
createdAt: upload.createdAt.toISOString(),
completedAt: upload.completedAt?.toISOString() ?? null,
});
}

@@ -0,0 +1,83 @@
import { NextResponse } from "next/server";
import { auth } from "@/lib/auth";
import { prisma } from "@/lib/prisma";
import { writeFile, mkdir } from "fs/promises";
import path from "path";
export const dynamic = "force-dynamic";
const UPLOAD_DIR = process.env.UPLOAD_DIR ?? "/data/uploads";
const MAX_FILE_SIZE = 4 * 1024 * 1024 * 1024; // 4GB per file
export async function POST(request: Request) {
const session = await auth();
if (!session?.user?.id) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
try {
const formData = await request.formData();
const files = formData.getAll("files") as File[];
const groupName = formData.get("groupName") as string | null;
if (!files.length) {
return NextResponse.json({ error: "No files provided" }, { status: 400 });
}
// Create the upload record
const upload = await prisma.manualUpload.create({
data: {
userId: session.user.id,
groupName: groupName || (files.length > 1 ? files[0].name.replace(/\.[^.]+$/, "") : null),
status: "PENDING",
},
});
// Save files to shared volume
const uploadDir = path.join(UPLOAD_DIR, upload.id);
await mkdir(uploadDir, { recursive: true });
for (const file of files) {
if (file.size > MAX_FILE_SIZE) {
return NextResponse.json(
{ error: `File "${file.name}" exceeds 4GB limit` },
{ status: 400 }
);
}
const filePath = path.join(uploadDir, file.name);
const buffer = Buffer.from(await file.arrayBuffer());
await writeFile(filePath, buffer);
await prisma.manualUploadFile.create({
data: {
uploadId: upload.id,
fileName: file.name,
filePath,
fileSize: BigInt(file.size),
},
});
}
// Notify worker
try {
await prisma.$queryRawUnsafe(
`SELECT pg_notify('manual_upload', $1)`,
upload.id
);
} catch {
// Best-effort
}
return NextResponse.json({
uploadId: upload.id,
fileCount: files.length,
status: "PENDING",
});
} catch (err) {
return NextResponse.json(
{ error: err instanceof Error ? err.message : "Upload failed" },
{ status: 500 }
);
}
}
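
The upload route checks each file's size only inside the write loop, after the `manualUpload` row and the upload directory already exist. An oversized second file therefore returns 400 but leaves a PENDING record and a partially written directory behind. The route also joins the client-supplied `file.name` straight into the path, so a crafted multipart filename such as `../../x.zip` could escape `uploadDir`. Two files with the same name would overwrite each other on disk while still producing two rows. The server does not check extensions at all, and the dialog's `accept` list (`.zip,.rar,.7z,.pdf,.stl`) is already wider than its "ZIP, RAR, 7Z" hint. Below is a sketch that validates every file before anything is created. The extension allow-list mirrors the dialog's `accept` attribute and is an assumption about what the worker can ingest. `toSafeFileName` and `validateUploadFiles` are hypothetical names.

```typescript
import * as path from "node:path";

const MAX_FILE_SIZE = 4 * 1024 * 1024 * 1024; // same 4GB limit as the route
// Assumption: mirrors the dialog's `accept` attribute.
const ALLOWED_EXTENSIONS = new Set([".zip", ".rar", ".7z", ".pdf", ".stl"]);

interface IncomingFile {
  name: string;
  size: number;
}

type Validation = { ok: true; safeNames: string[] } | { ok: false; error: string };

// Reduces a client-supplied name to a bare file name so that
// path.join(uploadDir, name) stays inside uploadDir. Backslashes are
// normalised first because path.basename on POSIX does not split on "\".
function toSafeFileName(name: string): string | null {
  const base = path.basename(name.replace(/\\/g, "/")).trim();
  if (base === "" || base === "." || base === "..") return null;
  return base;
}

// Validates every file up front, so a rejected request leaves no DB row or directory.
function validateUploadFiles(files: IncomingFile[]): Validation {
  if (files.length === 0) return { ok: false, error: "No files provided" };
  const seen = new Set<string>();
  const safeNames: string[] = [];
  for (const file of files) {
    const safe = toSafeFileName(file.name);
    if (!safe) return { ok: false, error: `Invalid file name "${file.name}"` };
    if (file.size > MAX_FILE_SIZE) return { ok: false, error: `File "${safe}" exceeds 4GB limit` };
    const ext = path.extname(safe).toLowerCase();
    if (!ALLOWED_EXTENSIONS.has(ext)) {
      return { ok: false, error: `File type "${ext || "(none)"}" is not allowed` };
    }
    if (seen.has(safe)) return { ok: false, error: `Duplicate file name "${safe}"` };
    seen.add(safe);
    safeNames.push(safe);
  }
  return { ok: true, safeNames };
}
```

The route would call this before `prisma.manualUpload.create`, return 400 on `ok: false`, and use `safeNames[i]` for both `fileName` and `filePath`. Separately, `Buffer.from(await file.arrayBuffer())` holds each file fully in memory, which is unlikely to be workable for files near the 4GB limit. Streaming `file.stream()` to disk would avoid that.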

@@ -6,6 +6,7 @@ import { Button } from "@/components/ui/button";
import { Sheet, SheetContent, SheetTrigger } from "@/components/ui/sheet";
import { UserMenu } from "./user-menu";
import { MobileSidebar } from "./mobile-sidebar";
import { NotificationBell } from "./notification-bell";
const routeTitles: Record<string, string> = {
"/dashboard": "Dashboard",
@@ -38,7 +39,8 @@ export function Header() {
<h1 className="text-lg font-semibold">{title}</h1>
-<div className="ml-auto">
+<div className="ml-auto flex items-center gap-1">
<NotificationBell />
<UserMenu />
</div>
</header>
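
In the `NotificationBell` mounted here, `handleMarkRead` and `handleDismiss` decrement `unreadCount` unconditionally. Marking an already-read item, or dismissing a read one, therefore pushes the badge below the true count until the next poll. Both handlers also update local state without checking `res.ok`, because `fetch` rejects only on network failure, not on HTTP error statuses. Below is a sketch of the local updates as pure functions that decrement only when the target was unread. It assumes the list and the count live in one state object so they always change together. `BellState`, `NotificationLike`, `markReadLocally` and `dismissLocally` are hypothetical names.

```typescript
interface NotificationLike {
  id: string;
  isRead: boolean;
}

interface BellState<T extends NotificationLike> {
  notifications: T[];
  unreadCount: number; // server-side total; may exceed the loaded list
}

function markReadLocally<T extends NotificationLike>(state: BellState<T>, id: string): BellState<T> {
  const target = state.notifications.find((n) => n.id === id);
  // Already read or not loaded: nothing changes, so the count stays accurate.
  if (!target || target.isRead) return state;
  return {
    notifications: state.notifications.map((n) => (n.id === id ? { ...n, isRead: true } : n)),
    unreadCount: Math.max(0, state.unreadCount - 1),
  };
}

function dismissLocally<T extends NotificationLike>(state: BellState<T>, id: string): BellState<T> {
  const target = state.notifications.find((n) => n.id === id);
  if (!target) return state;
  return {
    notifications: state.notifications.filter((n) => n.id !== id),
    // Only an unread item contributes to the badge.
    unreadCount: target.isRead ? state.unreadCount : Math.max(0, state.unreadCount - 1),
  };
}
```

In the component, each handler would check `res.ok` and then call, for example, `setState((s) => markReadLocally(s, id))`. Both functions are pure, so they are safe to use as state updaters.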

@@ -0,0 +1,268 @@
"use client";
import { useState, useEffect, useCallback } from "react";
import { Bell, AlertTriangle, AlertCircle, Info, CheckCircle2, X, Trash2 } from "lucide-react";
import { Button } from "@/components/ui/button";
import { Badge } from "@/components/ui/badge";
import {
Popover,
PopoverContent,
PopoverTrigger,
} from "@/components/ui/popover";
import { ScrollArea } from "@/components/ui/scroll-area";
import { toast } from "sonner";
interface Notification {
id: string;
type: string;
severity: "INFO" | "WARNING" | "ERROR";
title: string;
message: string;
isRead: boolean;
createdAt: string;
}
const severityIcon = {
INFO: Info,
WARNING: AlertTriangle,
ERROR: AlertCircle,
};
const severityColor = {
INFO: "text-blue-400",
WARNING: "text-orange-400",
ERROR: "text-red-400",
};
export function NotificationBell() {
const [notifications, setNotifications] = useState<Notification[]>([]);
const [unreadCount, setUnreadCount] = useState(0);
const [open, setOpen] = useState(false);
const fetchNotifications = useCallback(async () => {
try {
const res = await fetch("/api/notifications");
if (res.ok) {
const data = await res.json();
setNotifications(data.notifications ?? []);
setUnreadCount(data.unreadCount ?? 0);
}
} catch {
// Ignore fetch errors
}
}, []);
// Poll every 30 seconds + on mount
useEffect(() => {
fetchNotifications();
const interval = setInterval(fetchNotifications, 30_000);
return () => clearInterval(interval);
}, [fetchNotifications]);
// Refresh when popover opens
useEffect(() => {
if (open) fetchNotifications();
}, [open, fetchNotifications]);
async function handleMarkAllRead() {
try {
await fetch("/api/notifications/read", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({}),
});
setNotifications((prev) => prev.map((n) => ({ ...n, isRead: true })));
setUnreadCount(0);
} catch {
// Ignore
}
}
async function handleMarkRead(id: string) {
try {
await fetch("/api/notifications/read", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ id }),
});
setNotifications((prev) =>
prev.map((n) => (n.id === id ? { ...n, isRead: true } : n))
);
setUnreadCount((c) => Math.max(0, c - 1));
} catch {
// Ignore
}
}
async function handleDismiss(id: string) {
try {
await fetch("/api/notifications/read", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ id, action: "dismiss" }),
});
// Only an unread notification contributes to the badge count
const wasUnread = notifications.some((n) => n.id === id && !n.isRead);
setNotifications((prev) => prev.filter((n) => n.id !== id));
if (wasUnread) setUnreadCount((c) => Math.max(0, c - 1));
} catch {
// Ignore
}
}
async function handleClearAll() {
try {
await fetch("/api/notifications/read", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ action: "clear" }),
});
setNotifications([]);
setUnreadCount(0);
} catch {
// Ignore
}
}
async function handleRepair(notificationId: string) {
try {
const res = await fetch("/api/notifications/repair", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ notificationId }),
});
if (res.ok) {
toast.success("Repair scheduled — package will be re-processed on next cycle");
fetchNotifications();
}
} catch {
// Ignore
}
}
function formatTime(iso: string): string {
const d = new Date(iso);
const now = new Date();
const diffMs = now.getTime() - d.getTime();
const diffMin = Math.floor(diffMs / 60_000);
if (diffMin < 1) return "just now";
if (diffMin < 60) return `${diffMin}m ago`;
const diffHr = Math.floor(diffMin / 60);
if (diffHr < 24) return `${diffHr}h ago`;
const diffDay = Math.floor(diffHr / 24);
return `${diffDay}d ago`;
}
return (
<Popover open={open} onOpenChange={setOpen}>
<PopoverTrigger asChild>
<Button variant="ghost" size="icon" className="relative h-9 w-9">
<Bell className="h-4 w-4" />
{unreadCount > 0 && (
<Badge
variant="destructive"
className="absolute -top-1 -right-1 h-4 min-w-4 px-1 text-[10px] leading-none"
>
{unreadCount > 99 ? "99+" : unreadCount}
</Badge>
)}
</Button>
</PopoverTrigger>
<PopoverContent className="w-96 p-0" align="end">
<div className="flex items-center justify-between border-b px-4 py-3">
<h3 className="text-sm font-semibold">Notifications</h3>
<div className="flex items-center gap-1">
{unreadCount > 0 && (
<Button
variant="ghost"
size="sm"
className="h-7 text-xs"
onClick={handleMarkAllRead}
>
Mark all read
</Button>
)}
{notifications.length > 0 && (
<Button
variant="ghost"
size="sm"
className="h-7 text-xs text-muted-foreground"
onClick={handleClearAll}
>
<Trash2 className="h-3 w-3 mr-1" />
Clear
</Button>
)}
</div>
</div>
<ScrollArea className="max-h-[400px]">
{notifications.length === 0 ? (
<div className="flex flex-col items-center justify-center py-8 text-muted-foreground">
<CheckCircle2 className="h-8 w-8 mb-2 opacity-50" />
<p className="text-sm">All clear!</p>
</div>
) : (
<div className="divide-y">
{notifications.map((n) => {
const Icon = severityIcon[n.severity] ?? Info;
const color = severityColor[n.severity] ?? "text-muted-foreground";
return (
<div
key={n.id}
className={`flex w-full gap-3 px-4 py-3 text-left hover:bg-muted/50 transition-colors ${
!n.isRead ? "bg-muted/20" : ""
}`}
role="button"
tabIndex={0}
onClick={() => !n.isRead && handleMarkRead(n.id)}
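onKeyDown={(e) => {
// Ignore keys bubbling up from the nested Dismiss / Repair buttons
if (e.target !== e.currentTarget) return;
if (e.key === "Enter" || e.key === " ") {
e.preventDefault(); // stop Space from scrolling the list
if (!n.isRead) handleMarkRead(n.id);
}
}}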
>
<Icon className={`h-4 w-4 mt-0.5 shrink-0 ${color}`} />
<div className="flex-1 min-w-0">
<div className="flex items-center gap-2">
<p className={`text-sm truncate ${!n.isRead ? "font-medium" : ""}`}>
{n.title}
</p>
{!n.isRead && (
<span className="h-2 w-2 rounded-full bg-primary shrink-0" />
)}
<button
className="ml-auto shrink-0 p-0.5 rounded hover:bg-muted text-muted-foreground hover:text-foreground"
onClick={(e) => { e.stopPropagation(); handleDismiss(n.id); }}
title="Dismiss"
>
<X className="h-3 w-3" />
</button>
</div>
<p className="text-xs text-muted-foreground line-clamp-2 mt-0.5">
{n.message}
</p>
<p className="text-[10px] text-muted-foreground mt-1">
{formatTime(n.createdAt)}
</p>
{(n.type === "MISSING_PART" || n.type === "HASH_MISMATCH") && (
<Button
variant="outline"
size="sm"
className="h-6 px-2 text-xs mt-1"
onClick={(e) => {
e.stopPropagation();
handleRepair(n.id);
}}
>
Repair
</Button>
)}
</div>
</div>
);
})}
</div>
)}
</ScrollArea>
</PopoverContent>
</Popover>
);
}
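The bell keeps `notifications` and `unreadCount` in two `useState` hooks and updates both by hand in every handler, which makes it easy to decrement the badge for an item that was already read. A minimal sketch of the same transitions as a pure reducer (the kind of function one might hand to React's `useReducer`); `bellReducer`, `BellState` and `BellAction` are hypothetical names, and the server round-trips are left out:

```typescript
// Minimal notification shape: only the fields the transitions need.
interface NotificationLite {
  id: string;
  isRead: boolean;
}

interface BellState {
  notifications: NotificationLite[];
  unreadCount: number;
}

type BellAction =
  | { kind: "markRead"; id: string }
  | { kind: "markAllRead" }
  | { kind: "dismiss"; id: string }
  | { kind: "clear" };

// Pure reducer: the unread count only drops when an unread item changes state.
function bellReducer(state: BellState, action: BellAction): BellState {
  switch (action.kind) {
    case "markRead": {
      const target = state.notifications.find((n) => n.id === action.id);
      if (!target || target.isRead) return state; // nothing to change
      return {
        notifications: state.notifications.map((n) =>
          n.id === action.id ? { ...n, isRead: true } : n
        ),
        unreadCount: Math.max(0, state.unreadCount - 1),
      };
    }
    case "markAllRead":
      return {
        notifications: state.notifications.map((n) => ({ ...n, isRead: true })),
        unreadCount: 0,
      };
    case "dismiss": {
      const target = state.notifications.find((n) => n.id === action.id);
      if (!target) return state;
      return {
        notifications: state.notifications.filter((n) => n.id !== action.id),
        // A read item never counted toward the badge
        unreadCount: target.isRead ? state.unreadCount : Math.max(0, state.unreadCount - 1),
      };
    }
    case "clear":
      return { notifications: [], unreadCount: 0 };
  }
}
```

Because `unreadCount` comes from the server and can exceed the number of items held in the list (for example when `getRecentNotifications` is called with its default `limit` of 20), the reducer adjusts the count relative to its current value instead of recounting `notifications`.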

@@ -95,3 +95,34 @@ export async function getKickstarterHosts() {
include: { _count: { select: { kickstarters: true } } },
});
}
export async function searchPackagesForLinking(query: string, limit = 20) {
if (!query || query.length < 2) return [];
return prisma.package.findMany({
where: {
OR: [
{ fileName: { contains: query, mode: "insensitive" } },
{ creator: { contains: query, mode: "insensitive" } },
],
},
orderBy: { indexedAt: "desc" },
take: limit,
select: {
id: true,
fileName: true,
fileSize: true,
archiveType: true,
creator: true,
fileCount: true,
},
});
}
export async function getLinkedPackageIds(kickstarterId: string): Promise<string[]> {
const links = await prisma.kickstarterPackage.findMany({
where: { kickstarterId },
select: { packageId: true },
});
return links.map((l) => l.packageId);
}
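`getLinkedPackageIds` returns plain IDs, so a caller can flag search hits that are already attached to a Kickstarter without another query. A small sketch, assuming the UI receives both arrays; `annotateLinked` and `PackageHit` are hypothetical:

```typescript
// Subset of the fields returned by searchPackagesForLinking.
interface PackageHit {
  id: string;
  fileName: string;
}

// Tag each hit with whether it is already linked; the Set gives O(1) lookups.
function annotateLinked(
  hits: PackageHit[],
  linkedIds: string[]
): Array<PackageHit & { alreadyLinked: boolean }> {
  const linked = new Set(linkedIds);
  return hits.map((hit) => ({ ...hit, alreadyLinked: linked.has(hit.id) }));
}
```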

@@ -0,0 +1,45 @@
import { prisma } from "@/lib/prisma";
export async function getUnreadNotificationCount(): Promise<number> {
return prisma.systemNotification.count({
where: { isRead: false },
});
}
export async function getRecentNotifications(limit = 20) {
return prisma.systemNotification.findMany({
orderBy: { createdAt: "desc" },
take: limit,
select: {
id: true,
type: true,
severity: true,
title: true,
message: true,
isRead: true,
createdAt: true,
},
});
}
export async function markNotificationRead(id: string) {
return prisma.systemNotification.update({
where: { id },
data: { isRead: true },
});
}
export async function markAllNotificationsRead() {
return prisma.systemNotification.updateMany({
where: { isRead: false },
data: { isRead: true },
});
}
export async function dismissNotification(id: string) {
return prisma.systemNotification.delete({ where: { id } });
}
export async function clearAllNotifications() {
return prisma.systemNotification.deleteMany({});
}
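The bell component posts four body shapes to `/api/notifications/read`: `{}`, `{ id }`, `{ id, action: "dismiss" }` and `{ action: "clear" }`. The route handler itself is not part of this diff; the sketch below assumes it dispatches on exactly those shapes and maps each intent onto one helper above (`markAllNotificationsRead`, `markNotificationRead`, `dismissNotification`, `clearAllNotifications`). `parseReadRequest` and `ReadRequest` are hypothetical names:

```typescript
// The four intents the bell component can express.
type ReadRequest =
  | { op: "markAll" }
  | { op: "markOne"; id: string }
  | { op: "dismiss"; id: string }
  | { op: "clearAll" };

// Map a loosely typed JSON body onto one intent; null means "reject with 400".
function parseReadRequest(body: unknown): ReadRequest | null {
  if (typeof body !== "object" || body === null) return null;
  const { id, action } = body as { id?: unknown; action?: unknown };
  if (action === "clear") return { op: "clearAll" };
  if (action === "dismiss") {
    return typeof id === "string" ? { op: "dismiss", id } : null;
  }
  if (action !== undefined) return null; // unknown action
  if (id === undefined) return { op: "markAll" }; // empty body
  return typeof id === "string" ? { op: "markOne", id } : null;
}
```

Parsing into a discriminated union first keeps the database calls out of the validation logic, so a malformed body (for example `{ action: "dismiss" }` with no `id`) can never fall through to `markAllNotificationsRead`.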

@@ -340,6 +340,30 @@ export async function listPackageFiles(options: {
};
}
async function fullTextSearchPackageIds(query: string, limit: number): Promise<string[]> {
// Convert user query to tsquery — handle multi-word by joining with &
const tsQuery = query
.trim()
.split(/\s+/)
.filter((w) => w.length >= 2)
.map((w) => w.replace(/[^a-zA-Z0-9]/g, ""))
.filter(Boolean)
.join(" & ");
if (!tsQuery) return [];
const results = await prisma.$queryRawUnsafe<{ id: string }[]>(
`SELECT id FROM packages
WHERE "searchVector" @@ to_tsquery('english', $1)
ORDER BY ts_rank("searchVector", to_tsquery('english', $1)) DESC
LIMIT $2`,
tsQuery,
limit
);
return results.map((r) => r.id);
}
export async function searchPackages(options: {
query: string;
page: number;
@@ -366,9 +390,21 @@ export async function searchPackages(options: {
);
const fileMatchedIds = fileMatches.map((f) => f.packageId);
// Try full-text search first (better ranking, handles word stemming)
let ftsPackageNameIds: string[] = [];
if (options.searchIn === "both" && q.length >= 3) {
try {
ftsPackageNameIds = await fullTextSearchPackageIds(q, 200);
} catch {
// FTS failed — fall back to ILIKE below
}
}
const packageNameIds =
options.searchIn === "both"
? (
? ftsPackageNameIds.length > 0
? ftsPackageNameIds
: (
await prisma.package.findMany({
where: { fileName: { contains: q, mode: "insensitive" } },
select: { id: true },
@@ -553,6 +589,7 @@ export async function listSkippedPackages(options: {
sourceMessageId: s.sourceMessageId.toString(),
isMultipart: s.isMultipart,
partCount: s.partCount,
attemptCount: s.attemptCount,
createdAt: s.createdAt.toISOString(),
}));
@@ -571,6 +608,72 @@ export async function countSkippedPackages(): Promise<number> {
return prisma.skippedPackage.count();
}
export async function listUngroupedPackages(options: {
page: number;
limit: number;
}) {
const { page, limit } = options;
const skip = (page - 1) * limit;
const where = { packageGroupId: null, destMessageId: { not: null } };
const [items, total] = await Promise.all([
prisma.package.findMany({
where,
orderBy: { indexedAt: "desc" },
skip,
take: limit,
select: {
id: true,
fileName: true,
fileSize: true,
archiveType: true,
creator: true,
fileCount: true,
isMultipart: true,
partCount: true,
tags: true,
indexedAt: true,
previewData: true,
sourceChannel: { select: { id: true, title: true } },
},
}),
prisma.package.count({ where }),
]);
return {
items: items.map((p) => ({
id: p.id,
fileName: p.fileName,
fileSize: p.fileSize.toString(),
contentHash: "",
archiveType: p.archiveType,
creator: p.creator,
fileCount: p.fileCount,
isMultipart: p.isMultipart,
partCount: p.partCount,
tags: p.tags,
indexedAt: p.indexedAt.toISOString(),
hasPreview: !!p.previewData,
sourceChannel: p.sourceChannel,
matchedFileCount: 0,
matchedByContent: false,
})),
pagination: {
total,
totalPages: Math.ceil(total / limit),
page,
limit,
},
};
}
export async function countUngroupedPackages(): Promise<number> {
return prisma.package.count({
where: { packageGroupId: null, destMessageId: { not: null } },
});
}
export async function getPackageGroup(groupId: string) {
return prisma.packageGroup.findUnique({
where: { id: groupId },
@@ -630,6 +733,53 @@ export async function createManualGroup(name: string, packageIds: string[]) {
data: { packageGroupId: group.id },
});
// Learn a grouping rule from the manual override
try {
const linkedPkgs = await prisma.package.findMany({
where: { id: { in: packageIds } },
select: { fileName: true, creator: true },
});
// Extract the common filename pattern
const fileNames = linkedPkgs.map((p) => p.fileName);
let pattern = "";
if (fileNames.length > 1) {
// Find longest common prefix
let prefix = fileNames[0];
for (let i = 1; i < fileNames.length; i++) {
while (!fileNames[i].startsWith(prefix)) {
prefix = prefix.slice(0, -1);
if (!prefix) break;
}
}
const trimmed = prefix.replace(/[\s\-_.(]+$/, "");
if (trimmed.length >= 4) {
pattern = trimmed;
}
}
// Fall back to shared creator
if (!pattern) {
const creators = [...new Set(linkedPkgs.map((p) => p.creator).filter(Boolean))];
if (creators.length === 1 && creators[0]) {
pattern = creators[0];
}
}
if (pattern) {
await prisma.groupingRule.create({
data: {
sourceChannelId: firstPkg.sourceChannelId,
pattern,
signalType: "MANUAL",
createdByGroupId: group.id,
},
});
}
} catch {
// Best-effort — don't fail the group creation if rule learning fails
}
// Clean up empty groups left behind
await prisma.packageGroup.deleteMany({
where: { packages: { none: {} }, id: { not: group.id } },
@@ -670,3 +820,13 @@ export async function dissolveGroup(groupId: string) {
});
await prisma.packageGroup.delete({ where: { id: groupId } });
}
export async function mergeGroups(targetGroupId: string, sourceGroupId: string) {
// Merging a group into itself would delete the group its packages still point at
if (targetGroupId === sourceGroupId) return;
// Move all packages from source group to target group, then delete the
// now-empty source group, as one atomic unit
await prisma.$transaction([
prisma.package.updateMany({
where: { packageGroupId: sourceGroupId },
data: { packageGroupId: targetGroupId },
}),
prisma.packageGroup.delete({ where: { id: sourceGroupId } }),
]);
}
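Two pieces of inline logic in this file are pure string manipulation and can be tested without a database: the tsquery construction in `fullTextSearchPackageIds` and the common-prefix pattern that `createManualGroup` learns from a manual group. A sketch of both as standalone helpers reproducing the same rules; `buildTsQuery` and `commonNamePattern` are hypothetical names:

```typescript
// Same rules as fullTextSearchPackageIds: drop 1-char words, strip
// non-alphanumerics, drop words that became empty, AND the rest together.
function buildTsQuery(query: string): string {
  return query
    .trim()
    .split(/\s+/)
    .filter((w) => w.length >= 2)
    .map((w) => w.replace(/[^a-zA-Z0-9]/g, ""))
    .filter(Boolean)
    .join(" & ");
}

// Same rules as createManualGroup: longest common prefix of the file names,
// trailing separators trimmed, rejected if shorter than minLength.
function commonNamePattern(fileNames: string[], minLength = 4): string {
  if (fileNames.length < 2) return "";
  let prefix = fileNames[0];
  for (const name of fileNames.slice(1)) {
    while (prefix && !name.startsWith(prefix)) prefix = prefix.slice(0, -1);
  }
  const trimmed = prefix.replace(/[\s\-_.(]+$/, "");
  return trimmed.length >= minLength ? trimmed : "";
}
```

The `[^a-zA-Z0-9]` filter also strips accented and non-Latin letters, so a query made only of such words yields an empty tsquery; `fullTextSearchPackageIds` then returns `[]` and `searchPackages` falls back to ILIKE.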

@@ -55,6 +55,8 @@ export interface SkippedPackageItem {
sourceMessageId: string;
isMultipart: boolean;
partCount: number;
/** How many times the worker has tried this source message across cycles. */
attemptCount: number;
createdAt: string;
}

@@ -12,8 +12,8 @@
"@prisma/client": "^7.4.0",
"pg": "^8.18.0",
"pino": "^9.6.0",
"prebuilt-tdlib": "^0.1008050.0",
"tdl": "^8.0.0",
"prebuilt-tdlib": "^0.1008064.0",
"tdl": "^8.1.0",
"yauzl": "^3.2.0"
},
"devDependencies": {
@@ -568,9 +568,9 @@
"license": "MIT"
},
"node_modules/@prebuilt-tdlib/darwin-arm64": {
"version": "0.1008050.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/darwin-arm64/-/darwin-arm64-0.1008050.0.tgz",
"integrity": "sha512-XrWN7M1gfvnzOBRX0YdXVfhSxIDSs/ZJ16QJ0ILDKe+grOFl/cfl7lwB/hK/MlHC6Rev56f5X7xaWnjMh0vktQ==",
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/darwin-arm64/-/darwin-arm64-0.1008064.0.tgz",
"integrity": "sha512-Oq5us+o0g68Jag74RIV3LdLkZxQxJMcOdrVbgmyE7Unk+WcifqTb/gZw1rS6BrW+2SX2LNeGY4zQqqBTNDr17Q==",
"cpu": [
"arm64"
],
@@ -581,9 +581,9 @@
]
},
"node_modules/@prebuilt-tdlib/darwin-x64": {
"version": "0.1008050.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/darwin-x64/-/darwin-x64-0.1008050.0.tgz",
"integrity": "sha512-a1UfBW0lYx4tUy5viMPtsbqBfBncCAgDu3FPjljfYTHjP8wfkKFxpp5+8wdxhyqdy3QriWaipVtUXQgOeEWMJg==",
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/darwin-x64/-/darwin-x64-0.1008064.0.tgz",
"integrity": "sha512-Pz11xjET2Y3uUJKxkWKBc0dmOtlykmBdZ9D6Ahh+EsoLDLIWHm7M91p6nZT396YZ4n2BL+FtDYK65Ae3LDIA5g==",
"cpu": [
"x64"
],
@@ -594,9 +594,22 @@
]
},
"node_modules/@prebuilt-tdlib/linux-arm64-glibc": {
"version": "0.1008050.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/linux-arm64-glibc/-/linux-arm64-glibc-0.1008050.0.tgz",
"integrity": "sha512-HRGspdQYzaBkU+W2M8uY5OgOkmgfTkyHkTYan/dn7EE/38QdIFW0YTvmGrl3DoFV2PA+SeJQw0xqK8tMSyHKaA==",
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/linux-arm64-glibc/-/linux-arm64-glibc-0.1008064.0.tgz",
"integrity": "sha512-1kML9+RCfTOTWLzxq2klCN962/XwYhd+SGd4BxOwcmvPniYDNRUXtgMi3qRyV/Flola8dchGFrqZJU4kNZNLuQ==",
"cpu": [
"arm64"
],
"license": "0BSD",
"optional": true,
"os": [
"linux"
]
},
"node_modules/@prebuilt-tdlib/linux-arm64-musl": {
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/linux-arm64-musl/-/linux-arm64-musl-0.1008064.0.tgz",
"integrity": "sha512-tN9FJOR8VDfmOoTHMivAqBfQ/d9Bry9T/9cGSTcms3H4ORun/WO5U5zT8VqadAsqjuiQ8Y9HaUqqz65xBDtcgw==",
"cpu": [
"arm64"
],
@@ -607,9 +620,9 @@
]
},
"node_modules/@prebuilt-tdlib/linux-x64-glibc": {
"version": "0.1008050.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/linux-x64-glibc/-/linux-x64-glibc-0.1008050.0.tgz",
"integrity": "sha512-Yf6ve3Dzxc66kV1cijFLn7EXKhPN5YHTjtJABEaCR5euetCI2wZp/1uBsXvyYTuFXqQbMfjO3xUCXUIBhLoChw==",
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/linux-x64-glibc/-/linux-x64-glibc-0.1008064.0.tgz",
"integrity": "sha512-7fyCp2uk0BdeHKJ9PyQOCditC9vBXeeIjYPAKKBcrkum5bi1e9txy2g5kkGjqwUkN0ntIniS5QfHEyr17Idr9g==",
"cpu": [
"x64"
],
@@ -619,10 +632,30 @@
"linux"
]
},
"node_modules/@prebuilt-tdlib/linux-x64-musl": {
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/linux-x64-musl/-/linux-x64-musl-0.1008064.0.tgz",
"integrity": "sha512-e2zRucrRrrK6M04iQWMfwtrts+VvVtyUwtTP1hF2g3a6jW+AHMzFoB9Wu8fWr+vuJflLIQ6sG9r3lI07Q8NenQ==",
"cpu": [
"x64"
],
"license": "0BSD",
"optional": true,
"os": [
"linux"
]
},
"node_modules/@prebuilt-tdlib/types": {
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/types/-/types-0.1008064.0.tgz",
"integrity": "sha512-eqr1+fiHZ+Gj4lwcITzMp6FwPg8UrxlxxaFjhiJRHL9BlbmD2QkCRHac4wW1Sx8Dzwzd7f+xO21Pgi7TBRSwmw==",
"license": "0BSD",
"optional": true
},
"node_modules/@prebuilt-tdlib/win32-x64": {
"version": "0.1008050.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/win32-x64/-/win32-x64-0.1008050.0.tgz",
"integrity": "sha512-4v8tU5bodMcLhzrWWXzIzqdHBIpq0wim+7sDmQWQIMy3kDeIzVtpuM+vQjxrGoeH9oWr2WXSRKuj93ld7G5NbQ==",
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/@prebuilt-tdlib/win32-x64/-/win32-x64-0.1008064.0.tgz",
"integrity": "sha512-rkacZWexQw52/EUaLAmbsu2+P3C1/AtinlCjfiX07oQAEg3327BCEZqrcY0ER83D8+MMf2pfwMPCDJKytr4hcg==",
"cpu": [
"x64"
],
@@ -1696,16 +1729,19 @@
}
},
"node_modules/prebuilt-tdlib": {
"version": "0.1008050.0",
"resolved": "https://registry.npmjs.org/prebuilt-tdlib/-/prebuilt-tdlib-0.1008050.0.tgz",
"integrity": "sha512-CfeQE1rG51d2iC6m72fzrbCW4mqI17ugil9pVurWHtfUJi1Fcn7zadpTzDoUl4oc1dEtKgM7S24DVP67gcl4SQ==",
"version": "0.1008064.0",
"resolved": "https://registry.npmjs.org/prebuilt-tdlib/-/prebuilt-tdlib-0.1008064.0.tgz",
"integrity": "sha512-jJLowKZoH4slXYrkTkKlEgyGsIGv61AWjDZcxxVxJYu21X3kmukGwbCpk4ML99cJp2CwRsD41GCEQBkKJAwCUg==",
"license": "MIT",
"optionalDependencies": {
"@prebuilt-tdlib/darwin-arm64": "0.1008050.0",
"@prebuilt-tdlib/darwin-x64": "0.1008050.0",
"@prebuilt-tdlib/linux-arm64-glibc": "0.1008050.0",
"@prebuilt-tdlib/linux-x64-glibc": "0.1008050.0",
"@prebuilt-tdlib/win32-x64": "0.1008050.0"
"@prebuilt-tdlib/darwin-arm64": "0.1008064.0",
"@prebuilt-tdlib/darwin-x64": "0.1008064.0",
"@prebuilt-tdlib/linux-arm64-glibc": "0.1008064.0",
"@prebuilt-tdlib/linux-arm64-musl": "0.1008064.0",
"@prebuilt-tdlib/linux-x64-glibc": "0.1008064.0",
"@prebuilt-tdlib/linux-x64-musl": "0.1008064.0",
"@prebuilt-tdlib/types": "0.1008064.0",
"@prebuilt-tdlib/win32-x64": "0.1008064.0"
}
},
"node_modules/prisma": {
@@ -1998,13 +2034,13 @@
"license": "MIT"
},
"node_modules/tdl": {
"version": "8.0.2",
"resolved": "https://registry.npmjs.org/tdl/-/tdl-8.0.2.tgz",
"integrity": "sha512-KYxlJ4eao7FUu91U1dCDkaHmK70JAyZ1KqitkKqpPC7rxAiXWhaYxddWvt84UxIYoWbgdd0B70FYJ4p/YqpFCA==",
"version": "8.1.0",
"resolved": "https://registry.npmjs.org/tdl/-/tdl-8.1.0.tgz",
"integrity": "sha512-idpw60gjJdiJALQg0+6UbxtJTMxVhzZAgCO6QzL81gqBYCkEFjm9zM9HwTTQGeOaAavw4yRHymR68yUUiCoKrA==",
"hasInstallScript": true,
"license": "MIT",
"dependencies": {
"debug": "^4.4.0",
"debug": "^4.4.3",
"node-addon-api": "^7.1.1",
"node-gyp-build": "^4.8.4"
},

@@ -13,8 +13,8 @@
"@prisma/client": "^7.4.0",
"pg": "^8.18.0",
"pino": "^9.6.0",
"prebuilt-tdlib": "^0.1008050.0",
"tdl": "^8.0.0",
"prebuilt-tdlib": "^0.1008064.0",
"tdl": "^8.1.0",
"yauzl": "^3.2.0"
},
"devDependencies": {

@@ -0,0 +1,93 @@
import { execFile } from "child_process";
import { promisify } from "util";
import { childLogger } from "../util/logger.js";
const execFileAsync = promisify(execFile);
const log = childLogger("integrity");
export type IntegrityResult =
| { ok: true }
| { ok: false; reason: string };
/**
* Test that the archive can be read end-to-end without errors, BEFORE we
* spend bandwidth uploading it to the destination channel. Catches:
* - Truncated downloads (rare given our size check, but cheap to confirm)
* - CRC errors inside the archive
* - Bad central directories
* - Encrypted archives (we report them as failures rather than upload
* a file users can't extract)
*
* Returns { ok: true } if the archive is intact, or { ok: false, reason }
* otherwise. Only the non-zero-exit path logs (at debug level); callers
* decide how to surface a failure.
*
* For multipart archives, pass the first part. unrar and 7z auto-discover
* sibling parts; Info-ZIP unzip does not support split (multi-disk) ZIPs.
*
* archiveType "DOCUMENT" is a pass-through — there's no container to test.
*/
export async function testArchiveIntegrity(
archiveType: "ZIP" | "RAR" | "SEVEN_Z" | "DOCUMENT",
firstPartPath: string
): Promise<IntegrityResult> {
if (archiveType === "DOCUMENT") {
return { ok: true };
}
try {
if (archiveType === "ZIP") {
// -t = test, -qq = very quiet (errors only)
const { stderr } = await execFileAsync("unzip", ["-tqq", firstPartPath], {
timeout: 300_000, // 5 min for very large archives
maxBuffer: 10 * 1024 * 1024,
});
if (stderr && stderr.trim()) {
return { ok: false, reason: `unzip -t reported: ${stderr.slice(0, 500)}` };
}
return { ok: true };
}
if (archiveType === "RAR") {
const { stdout, stderr } = await execFileAsync("unrar", ["t", firstPartPath], {
timeout: 300_000,
maxBuffer: 10 * 1024 * 1024,
});
// unrar uses non-zero exit code on errors, which becomes a throw.
// If it succeeds, "All OK" is in stdout.
const combined = `${stdout}\n${stderr}`;
if (/All OK/i.test(combined)) {
return { ok: true };
}
return { ok: false, reason: `unrar t did not report "All OK": ${combined.slice(-500)}` };
}
if (archiveType === "SEVEN_Z") {
const { stdout, stderr } = await execFileAsync("7z", ["t", firstPartPath], {
timeout: 300_000,
maxBuffer: 10 * 1024 * 1024,
});
const combined = `${stdout}\n${stderr}`;
if (/Everything is Ok/i.test(combined)) {
return { ok: true };
}
return { ok: false, reason: `7z t did not report "Everything is Ok": ${combined.slice(-500)}` };
}
return { ok: false, reason: `Unknown archive type: ${archiveType}` };
} catch (err) {
const msg = err instanceof Error ? err.message : String(err);
// execFile throws on non-zero exit. Try to extract the most useful part.
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const stderr = (err as any)?.stderr as string | undefined;
const detail = stderr ? `: ${stderr.slice(0, 500)}` : "";
// Specifically flag encrypted archives so the caller can record a more
// specific SkipReason / notification.
if (/password|encrypted/i.test(`${msg}${detail}`)) {
return { ok: false, reason: `Archive is encrypted (password protected): ${msg}${detail}` };
}
log.debug({ err, archiveType, firstPartPath }, "Archive integrity test failed");
return { ok: false, reason: `Integrity test failed: ${msg}${detail}` };
}
}
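`testArchiveIntegrity` repeats the same `execFile` call three times with a different binary, arguments and success marker. A sketch of that knowledge as data, assuming one shared `execFile` path would then run whatever command the table returns; `integrityCommandFor` and `looksEncrypted` are hypothetical, and the commands and patterns are copied from the function above:

```typescript
type ArchiveKind = "ZIP" | "RAR" | "SEVEN_Z";

interface IntegrityCommand {
  cmd: string;
  args: string[];
  // Marker required in combined stdout/stderr; null means the exit status
  // plus an empty stderr decide (the unzip case).
  okPattern: RegExp | null;
}

// One entry per supported format, mirroring testArchiveIntegrity.
function integrityCommandFor(kind: ArchiveKind, firstPartPath: string): IntegrityCommand {
  switch (kind) {
    case "ZIP":
      return { cmd: "unzip", args: ["-tqq", firstPartPath], okPattern: null };
    case "RAR":
      return { cmd: "unrar", args: ["t", firstPartPath], okPattern: /All OK/i };
    case "SEVEN_Z":
      return { cmd: "7z", args: ["t", firstPartPath], okPattern: /Everything is Ok/i };
  }
}

// Same heuristic the catch block applies to error text.
function looksEncrypted(errorText: string): boolean {
  return /password|encrypted/i.test(errorText);
}
```

Supporting another format then means adding one `case`, and the compiler flags any `ArchiveKind` left without an entry.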

@@ -11,6 +11,11 @@ export interface TelegramMessage {
fileSize: bigint;
date: Date;
mediaAlbumId?: string;
replyToMessageId?: bigint;
caption?: string;
/** TDLib's `remote.unique_id` for the file — stable across reposts of
* the exact same content. Empty string if the message didn't expose it. */
remoteUniqueId?: string;
}
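The `remoteUniqueId` comment says the ID is stable across reposts of identical content, which makes it usable as a duplicate key. This diff does not show how the worker consumes it; the sketch below assumes a simple policy of keeping the earliest message per ID and treating a missing or empty ID as unique. `MessageLite` and `dropReposts` are hypothetical:

```typescript
// Subset of TelegramMessage relevant to repost detection.
interface MessageLite {
  id: bigint;
  fileName: string;
  remoteUniqueId?: string;
}

// Keep the earliest message (lowest id) for each remoteUniqueId. Messages
// without an ID cannot be compared, so they are always kept.
function dropReposts(messages: MessageLite[]): MessageLite[] {
  const seen = new Set<string>();
  const sorted = messages.slice().sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  return sorted.filter((m) => {
    if (!m.remoteUniqueId) return true;
    if (seen.has(m.remoteUniqueId)) return false;
    seen.add(m.remoteUniqueId);
    return true;
  });
}
```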
export interface ArchiveSet {

@@ -8,83 +8,122 @@ const execFileAsync = promisify(execFile);
const log = childLogger("rar-reader");
/**
* Parse output of `unrar l -v <file>` to extract file metadata.
* unrar automatically discovers sibling parts when they're co-located.
* Parse output of `unrar lt <file>` to extract file metadata.
*
* `lt` (list technical) emits one block per archived file with key:value
* lines — far more reliable than the column-based default `l -v` output,
* which has changed format twice across unrar versions.
*
* unrar automatically discovers sibling multipart files when they're
* co-located (e.g. *.part1.rar + *.part2.rar in the same directory).
*
* Returns [] on any failure (best-effort: ingestion still succeeds with
* an empty file list rather than failing the whole archive).
*/
export async function readRarContents(
firstPartPath: string
): Promise<FileEntry[]> {
try {
const { stdout } = await execFileAsync("unrar", ["l", "-v", firstPartPath], {
timeout: 30000,
maxBuffer: 10 * 1024 * 1024, // 10MB for very large archives
const { stdout } = await execFileAsync("unrar", ["lt", firstPartPath], {
timeout: 60_000,
maxBuffer: 50 * 1024 * 1024, // 50MB for archives with very many files
});
return parseUnrarOutput(stdout);
const entries = parseUnrarTechnical(stdout);
if (entries.length === 0) {
// Log a sample of the output so we can diagnose format changes
log.warn(
{ file: firstPartPath, sample: stdout.slice(0, 500) },
"unrar lt returned no parseable entries"
);
}
return entries;
} catch (err) {
log.warn({ err, file: firstPartPath }, "Failed to read RAR contents");
return []; // Fallback: return empty on error
return [];
}
}
/**
* Parse the tabular output of `unrar l -v`.
* Parse `unrar lt` output: header followed by per-file key:value blocks
* separated by blank lines.
*
* Example output format:
* Archive: test.rar
* Details: RAR 5
* Example block:
*
* Attributes Size Packed Ratio Date Time CRC-32 Name
* ----------- --------- --------- ----- -------- ----- -------- ----
* ...A.... 12345 10234 83% 2024-01-15 10:30 DEADBEEF folder/file.stl
* ----------- --------- --------- ----- -------- ----- -------- ----
* Name: folder/file.stl
* Type: File
* Size: 12345
* Packed size: 10234
* Ratio: 83%
* mtime: 2024-01-15 10:30:00,000000000
* Attributes: ..A....
* CRC32: DEADBEEF
* Host OS: Windows
* Compression: RAR 5.0(v50) -m3 -md=32M
*/
function parseUnrarOutput(output: string): FileEntry[] {
function parseUnrarTechnical(output: string): FileEntry[] {
const entries: FileEntry[] = [];
const lines = output.split("\n");
// Split into blocks on blank lines, then on each block read key:value pairs.
const blocks = output.split(/\r?\n\s*\r?\n/);
let inFileList = false;
let separatorCount = 0;
for (const block of blocks) {
const fields = parseBlock(block);
if (!fields) continue;
for (const line of lines) {
const trimmed = line.trim();
// Detect separator lines (------- pattern)
if (/^-{5,}/.test(trimmed)) {
separatorCount++;
if (separatorCount === 1) {
inFileList = true;
} else if (separatorCount >= 2) {
inFileList = false;
}
continue;
}
if (!inFileList) continue;
// Parse file entry line
// Format: Attributes Size Packed Ratio Date Time CRC Name
const match = trimmed.match(
/^\S+\s+(\d+)\s+(\d+)\s+\d+%\s+\S+\s+\S+\s+([0-9A-Fa-f]+)\s+(.+)$/
);
if (match) {
const [, uncompressedStr, compressedStr, crc32, filePath] = match;
// Skip directory entries (typically end with / or have size 0 with dir attributes)
if (filePath.endsWith("/") || filePath.endsWith("\\")) continue;
// Only File entries (skip Directory, and anything missing the basics)
if (fields.type && fields.type.toLowerCase() !== "file") continue;
if (!fields.name || fields.size === undefined) continue;
const filePath = fields.name;
const ext = path.extname(filePath).toLowerCase();
entries.push({
path: filePath,
fileName: path.basename(filePath),
extension: ext ? ext.slice(1) : null,
compressedSize: BigInt(compressedStr),
uncompressedSize: BigInt(uncompressedStr),
crc32: crc32.toLowerCase(),
uncompressedSize: BigInt(fields.size),
compressedSize: fields.packedSize !== undefined
? BigInt(fields.packedSize)
: BigInt(fields.size),
crc32: fields.crc32 ? fields.crc32.toLowerCase() : null,
});
}
}
return entries;
}
interface BlockFields {
name?: string;
type?: string;
size?: string;
packedSize?: string;
crc32?: string;
}
function parseBlock(block: string): BlockFields | null {
// Skip the archive-header block (contains "Archive:" / "Details:" lines
// and lacks a Name field).
if (!/^\s*Name:/m.test(block)) return null;
const fields: BlockFields = {};
const lines = block.split(/\r?\n/);
for (const line of lines) {
// Match " key: value" with arbitrary leading whitespace and a multi-word
// key (e.g. "Packed size", "Host OS").
const m = line.match(/^\s*([A-Za-z][A-Za-z0-9 ]*?)\s*:\s*(.*)$/);
if (!m) continue;
const key = m[1].trim().toLowerCase();
const value = m[2].trim();
if (key === "name") fields.name = value;
else if (key === "type") fields.type = value;
else if (key === "size") fields.size = value;
else if (key === "packed size") fields.packedSize = value;
else if (key === "crc32" || key === "blake2sp" || key === "checksum") {
// unrar may report BLAKE2sp for newer archives instead of CRC32.
// Either way we just store it as a hex string in our crc32 field.
fields.crc32 = value;
}
}
return fields;
}
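A compact restatement of the block-based parsing above, exercised against a hand-made sample in the `unrar lt` layout shown in the doc comment (the sample is illustrative, not captured from a real run). `parseLtOutput` and `LtEntry` are hypothetical names. Unlike `parseUnrarTechnical`, it skips blocks whose sizes are not plain digits instead of letting `BigInt` throw, since a throw inside `readRarContents` is caught and empties the whole listing:

```typescript
interface LtEntry {
  name: string;
  size: bigint;
  packedSize: bigint;
  checksum: string | null;
}

const DIGITS = /^\d+$/;

// Split on blank lines, read "Key: value" pairs from each block, and keep
// only complete "Type: File" blocks.
function parseLtOutput(output: string): LtEntry[] {
  const entries: LtEntry[] = [];
  for (const block of output.split(/\r?\n\s*\r?\n/)) {
    const fields = new Map<string, string>();
    for (const line of block.split(/\r?\n/)) {
      const m = line.match(/^\s*([A-Za-z][A-Za-z0-9 ]*?)\s*:\s*(.*)$/);
      if (m) fields.set(m[1].trim().toLowerCase(), m[2].trim());
    }
    const name = fields.get("name");
    const size = fields.get("size");
    // Header block (no Name) or an entry without a usable size
    if (!name || size === undefined || !DIGITS.test(size)) continue;
    const type = fields.get("type");
    if (type !== undefined && type.toLowerCase() !== "file") continue; // e.g. Directory
    const packed = fields.get("packed size");
    const checksum = fields.get("crc32") ?? fields.get("blake2sp") ?? null;
    entries.push({
      name,
      size: BigInt(size),
      packedSize: BigInt(packed !== undefined && DIGITS.test(packed) ? packed : size),
      checksum: checksum ? checksum.toLowerCase() : null,
    });
  }
  return entries;
}
```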

@@ -0,0 +1,58 @@
import type { FileEntry } from "./zip-reader.js";
/**
* Mapping from file extensions to slicer tags. Each tag groups a family of
* extensions that mean the same thing for end users — "this archive contains
* files I can open in <slicer>".
*
* Extensions are matched case-insensitively without the leading dot.
*/
const SLICER_EXTENSION_MAP: Record<string, string> = {
// Lychee Slicer
lys: "lychee",
lyt: "lychee",
lyc: "lychee",
// ChituBox / Anycubic / Phrozen / Elegoo (resin printers)
chitubox: "chitubox",
ctb: "chitubox",
cbddlp: "chitubox",
// Anycubic Photon family
photon: "anycubic",
pwmo: "anycubic",
pwmx: "anycubic",
pwmb: "anycubic",
pwma: "anycubic",
pws: "anycubic",
pwsq: "anycubic",
phz: "anycubic",
// Bambu / Prusa
"3mf": "bambu",
bgcode: "bambu",
// FDM gcode (generic)
gcode: "fdm",
// Mango / generic resin formats sometimes seen in releases
mfp: "mango",
mfpv: "mango",
osla: "mango",
};
/**
* Derive a deduplicated list of slicer tags from an archive's file listing.
* Returns an empty array if no recognised slicer-specific files are present
* (e.g., the archive is just STLs without pre-supports).
*/
export function extractSlicerTags(entries: FileEntry[]): string[] {
const tags = new Set<string>();
for (const entry of entries) {
if (!entry.extension) continue;
const ext = entry.extension.toLowerCase();
const tag = SLICER_EXTENSION_MAP[ext];
if (tag) tags.add(tag);
}
return [...tags].sort();
}
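`extractSlicerTags` relies on `FileEntry.extension` having been filled in by the archive readers. A self-contained sketch of the same lookup starting from raw paths, using a small subset of the map for illustration; `extensionOf` and `slicerTagsFromPaths` are hypothetical, and `extensionOf` follows Node's `path.extname` in treating a leading-dot name such as `.hidden` as having no extension:

```typescript
// Illustrative subset of SLICER_EXTENSION_MAP.
const SLICER_MAP: Record<string, string> = {
  ctb: "chitubox",
  lys: "lychee",
  pwmx: "anycubic",
  "3mf": "bambu",
  gcode: "fdm",
};

// Lowercase extension of the last path segment, or null if there is none.
function extensionOf(filePath: string): string | null {
  const base = filePath.split(/[\\/]/).pop() ?? "";
  const dot = base.lastIndexOf(".");
  return dot > 0 && dot < base.length - 1 ? base.slice(dot + 1).toLowerCase() : null;
}

// Deduplicated, sorted tags, matching extractSlicerTags' output order.
function slicerTagsFromPaths(paths: string[]): string[] {
  const tags = new Set<string>();
  for (const p of paths) {
    const ext = extensionOf(p);
    const tag = ext ? SLICER_MAP[ext] : undefined;
    if (tag) tags.add(tag);
  }
  return Array.from(tags).sort();
}
```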

@@ -3,31 +3,37 @@ import { stat } from "fs/promises";
import path from "path";
import { pipeline } from "stream/promises";
import { childLogger } from "../util/logger.js";
import { config } from "../util/config.js";
const log = childLogger("split");
/**
* 1950 MiB — safely under Telegram's 2GB upload limit.
* At exactly 2GiB, TDLib's internal 512KB chunking can exceed Telegram's
* Maximum part size for Telegram upload. Configurable via MAX_PART_SIZE_MB env var.
* Default: 1950 MiB (safely under 2GB non-Premium limit).
* Premium: set to 3900 MiB (safely under 4GB Premium limit).
*
* At exactly 2/4 GiB, TDLib's internal 512 KiB chunking can exceed Telegram's
* part-count limit (4000 parts, or 8000 for Premium), causing FILE_PARTS_INVALID errors.
*/
const MAX_PART_SIZE = 1950n * 1024n * 1024n;
const MAX_PART_SIZE = BigInt(config.maxPartSizeMB) * 1024n * 1024n;
/**
* Split a file into ≤2GB parts using byte-level splitting.
* Returns paths to the split parts. If the file is already ≤2GB, returns the original path.
* Split a file into parts using byte-level splitting.
* Returns paths to the split parts. If the file fits in one part, returns the original path.
* Pass maxPartSize to override the global default (e.g., 3900 MiB for Premium accounts).
*/
export async function byteLevelSplit(filePath: string): Promise<string[]> {
export async function byteLevelSplit(filePath: string, maxPartSize?: bigint): Promise<string[]> {
const effectiveMax = maxPartSize ?? MAX_PART_SIZE;
const stats = await stat(filePath);
const fileSize = BigInt(stats.size);
if (fileSize <= MAX_PART_SIZE) {
if (fileSize <= effectiveMax) {
return [filePath];
}
const dir = path.dirname(filePath);
const baseName = path.basename(filePath);
const partSize = Number(MAX_PART_SIZE);
const partSize = Number(effectiveMax);
const totalParts = Math.ceil(Number(fileSize) / partSize);
const parts: string[] = [];

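The part-size limits reduce to simple arithmetic. Assuming, as the comment above states, that uploads go out in 512 KiB chunks, and that Telegram caps a file at 4000 parts (8000 for Premium accounts, per Telegram's file-upload documentation as we understand it), the ceilings are 4000 × 512 KiB = 2000 MiB and 8000 × 512 KiB = 4000 MiB. The 1950 and 3900 MiB defaults therefore leave 50 and 100 MiB of headroom. A sketch that checks this; every name in it is hypothetical:

```typescript
const KiB = 1024;
const MiB = 1024 * KiB;

// Assumed upload limits (see lead-in).
const CHUNK_SIZE = 512 * KiB;
const MAX_CHUNKS_DEFAULT = 4000;
const MAX_CHUNKS_PREMIUM = 8000;

// Number of 512 KiB upload chunks needed for one file of `bytes` bytes.
function uploadChunks(bytes: number): number {
  return Math.ceil(bytes / CHUNK_SIZE);
}

// Number of byte-level split parts, mirroring byteLevelSplit's rule.
function splitPartCount(fileBytes: number, maxPartBytes: number): number {
  return fileBytes <= maxPartBytes ? 1 : Math.ceil(fileBytes / maxPartBytes);
}

// Whether a single part stays within the chunk-count limit.
function fitsTelegramLimit(partBytes: number, premium: boolean): boolean {
  const limit = premium ? MAX_CHUNKS_PREMIUM : MAX_CHUNKS_DEFAULT;
  return uploadChunks(partBytes) <= limit;
}
```

At exactly 2 GiB (2048 MiB) a part needs 4096 chunks, which is why the defaults sit below the round GiB figures.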
worker/src/audit.ts

@@ -0,0 +1,119 @@
import { db } from "./db/client.js";
import { childLogger } from "./util/logger.js";
const log = childLogger("audit");
/**
* Periodic integrity audit: checks all packages for consistency.
* Creates SystemNotification records for any issues found.
*
* Checks performed:
* 1. Multipart completeness: when more than one destination message ID is
*    stored, destMessageIds.length should match partCount
* 2. Missing destination: packages with destChannelId but no destMessageId
*/
export async function runIntegrityAudit(): Promise<{ checked: number; issues: number }> {
log.info("Starting integrity audit");
let checked = 0;
let issues = 0;
// Check 1: Multipart packages with wrong number of destination message IDs
const multipartPackages = await db.package.findMany({
where: {
isMultipart: true,
partCount: { gt: 1 },
destMessageId: { not: null },
},
select: {
id: true,
fileName: true,
partCount: true,
destMessageIds: true,
sourceChannelId: true,
sourceChannel: { select: { title: true } },
},
});
checked += multipartPackages.length;
for (const pkg of multipartPackages) {
const actualParts = pkg.destMessageIds.length;
// Only flag when we have >1 stored IDs but count doesn't match.
// Packages with exactly 1 ID are legacy (backfilled from single destMessageId) — not actionable.
if (actualParts > 1 && actualParts !== pkg.partCount) {
issues++;
// Check if we already have a notification for this
const existing = await db.systemNotification.findFirst({
where: {
type: "MISSING_PART",
context: { path: ["packageId"], equals: pkg.id },
},
select: { id: true },
});
if (!existing) {
await db.systemNotification.create({
data: {
type: "MISSING_PART",
severity: "WARNING",
title: `Incomplete multipart: ${pkg.fileName}`,
message: `Expected ${pkg.partCount} parts but only ${actualParts} destination message IDs stored`,
context: {
packageId: pkg.id,
fileName: pkg.fileName,
expectedParts: pkg.partCount,
actualParts,
sourceChannelId: pkg.sourceChannelId,
channelTitle: pkg.sourceChannel.title,
},
},
});
log.warn(
{ packageId: pkg.id, fileName: pkg.fileName, expected: pkg.partCount, actual: actualParts },
"Multipart package has mismatched part count"
);
}
}
}
// Check 2: Packages with dest channel but no dest message (orphaned index)
const orphanedCount = await db.package.count({
where: {
destChannelId: { not: null },
destMessageId: null,
},
});
if (orphanedCount > 0) {
issues++;
const existing = await db.systemNotification.findFirst({
where: {
type: "INTEGRITY_AUDIT",
context: { path: ["check"], equals: "orphaned_index" },
createdAt: { gte: new Date(Date.now() - 24 * 60 * 60 * 1000) },
},
select: { id: true },
});
if (!existing) {
await db.systemNotification.create({
data: {
type: "INTEGRITY_AUDIT",
severity: "INFO",
title: `${orphanedCount} packages with missing destination message`,
message: `Found ${orphanedCount} packages that have a destination channel set but no destination message ID. These may be from interrupted uploads.`,
context: {
check: "orphaned_index",
count: orphanedCount,
},
},
});
}
}
log.info({ checked, issues }, "Integrity audit complete");
return { checked, issues };
}
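Check 1's flagging rule is easy to lose inside the query loop. Below is a minimal pure-function sketch of it that leaves out the database lookups and the notification de-duplication. `classifyMultipart` and `MultipartStatus` are hypothetical names and are not part of the worker.

```typescript
type MultipartStatus = "complete" | "unchecked" | "mismatch";

// Mirrors Check 1: only packages with more than one stored destination
// message ID are compared against partCount.
function classifyMultipart(
  partCount: number,
  destMessageIds: readonly bigint[]
): MultipartStatus {
  const actualParts = destMessageIds.length;
  // 0 or 1 stored IDs are skipped; a single ID is the legacy case where
  // destMessageIds was backfilled from the scalar destMessageId.
  if (actualParts <= 1) return "unchecked";
  return actualParts === partCount ? "complete" : "mismatch";
}

console.log(classifyMultipart(3, [101n, 102n, 103n])); // prints: complete
console.log(classifyMultipart(3, [101n]));             // prints: unchecked
console.log(classifyMultipart(3, [101n, 102n]));       // prints: mismatch
```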

worker/src/backfill.ts

@@ -0,0 +1,343 @@
import path from "path";
import { mkdir, rm } from "fs/promises";
import { db } from "./db/client.js";
import { config } from "./util/config.js";
import { childLogger } from "./util/logger.js";
import { withTdlibMutex } from "./util/mutex.js";
import { createTdlibClient, closeTdlibClient } from "./tdlib/client.js";
import { downloadFile } from "./tdlib/download.js";
import { getActiveAccounts } from "./db/queries.js";
import { readZipCentralDirectory } from "./archive/zip-reader.js";
import { readRarContents } from "./archive/rar-reader.js";
import { read7zContents } from "./archive/sevenz-reader.js";
import { extractSlicerTags } from "./archive/slicer-tags.js";
import type { FileEntry } from "./archive/zip-reader.js";
const log = childLogger("backfill");
/**
* Re-extract file listings for Packages whose fileCount is 0 — usually
* caused by historical bugs in the archive readers (e.g. the RAR parser
* that silently returned [] for every archive before 0bdd4ba).
*
* For each candidate Package:
* 1. Download all destMessageIds from the destination channel
* 2. Run the appropriate reader (ZIP / RAR / 7Z) on the assembled files
* 3. Insert PackageFile rows + update Package.fileCount
* 4. Clean up the temp files
*
* Triggered via pg_notify "backfill_filelists" with optional payload
* `{"limit": N, "archiveType": "RAR"}` — both fields optional, defaults
* are limit=100, archiveType=any.
*/
export async function processBackfillRequest(payloadJson: string): Promise<void> {
let limit = 100;
let archiveTypeFilter: "ZIP" | "RAR" | "SEVEN_Z" | undefined;
try {
const parsed = JSON.parse(payloadJson) as { limit?: number; archiveType?: string };
if (typeof parsed.limit === "number" && parsed.limit > 0) limit = parsed.limit;
if (parsed.archiveType === "ZIP" || parsed.archiveType === "RAR" || parsed.archiveType === "SEVEN_Z") {
archiveTypeFilter = parsed.archiveType;
}
} catch {
// Empty / invalid payload — use defaults
}
const candidates = await db.package.findMany({
where: {
fileCount: 0,
destChannelId: { not: null },
destMessageId: { not: null },
archiveType: archiveTypeFilter
? archiveTypeFilter
: { in: ["ZIP", "RAR", "SEVEN_Z"] },
},
select: {
id: true,
fileName: true,
fileSize: true,
archiveType: true,
destChannelId: true,
destMessageId: true,
destMessageIds: true,
isMultipart: true,
partCount: true,
},
orderBy: { createdAt: "asc" },
take: limit,
});
if (candidates.length === 0) {
log.info({ archiveTypeFilter }, "Backfill: no candidates with fileCount=0");
return;
}
log.info(
{ count: candidates.length, archiveTypeFilter },
"Backfill: starting batch"
);
const accounts = await getActiveAccounts();
if (accounts.length === 0) {
log.warn("Backfill: no authenticated accounts — aborting");
return;
}
// Prefer the Premium account if available (faster downloads, larger files)
const account = accounts.find((a) => a.isPremium) ?? accounts[0];
await withTdlibMutex(account.phone, "backfill", async () => {
const { client } = await createTdlibClient({ id: account.id, phone: account.phone });
try {
// Load chats so TDLib knows about the destination chat
try {
await client.invoke({
_: "getChats",
chat_list: { _: "chatListMain" },
limit: 1000,
});
} catch {
// May already be loaded
}
let processed = 0;
let succeeded = 0;
let failed = 0;
for (const pkg of candidates) {
processed++;
const ctx = { packageId: pkg.id, fileName: pkg.fileName };
try {
await processOnePackage(client, pkg, ctx);
succeeded++;
} catch (err) {
failed++;
log.warn({ err, ...ctx }, "Backfill failed for package");
}
}
log.info(
{ processed, succeeded, failed, archiveTypeFilter },
"Backfill batch complete"
);
} finally {
await closeTdlibClient(client).catch(() => {});
}
});
}
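The payload rules from the doc comment above can be unit-tested without Postgres or TDLib. The following is a standalone sketch of them. `parseBackfillPayload` is a hypothetical name, and the integer check on `limit` is an extra assumption that the original does not make.

```typescript
type ArchiveTypeFilter = "ZIP" | "RAR" | "SEVEN_Z";

interface BackfillOptions {
  limit: number;
  archiveType?: ArchiveTypeFilter; // undefined = any of the three types
}

const ARCHIVE_TYPES: readonly string[] = ["ZIP", "RAR", "SEVEN_Z"];

function parseBackfillPayload(payloadJson: string): BackfillOptions {
  const options: BackfillOptions = { limit: 100 };
  try {
    const parsed = JSON.parse(payloadJson) as { limit?: unknown; archiveType?: unknown };
    // Assumption beyond the original: require a positive integer limit.
    if (typeof parsed.limit === "number" && Number.isInteger(parsed.limit) && parsed.limit > 0) {
      options.limit = parsed.limit;
    }
    if (typeof parsed.archiveType === "string" && ARCHIVE_TYPES.includes(parsed.archiveType)) {
      options.archiveType = parsed.archiveType as ArchiveTypeFilter;
    }
  } catch {
    // Empty or invalid JSON (or a JSON null) keeps the defaults.
  }
  return options;
}
```

The listener passes `"{}"` when a notification has no payload. A narrowed run would be triggered with, for example, `SELECT pg_notify('backfill_filelists', '{"limit":25,"archiveType":"RAR"}');`.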
interface BackfillPackage {
id: string;
fileName: string;
fileSize: bigint;
archiveType: "ZIP" | "RAR" | "SEVEN_Z" | "DOCUMENT" | string;
destChannelId: string | null;
destMessageId: bigint | null;
destMessageIds: bigint[];
isMultipart: boolean;
partCount: number;
}
async function processOnePackage(
// eslint-disable-next-line @typescript-eslint/no-explicit-any
client: any,
pkg: BackfillPackage,
ctx: { packageId: string; fileName: string }
): Promise<void> {
if (!pkg.destChannelId || !pkg.destMessageId) {
log.debug(ctx, "Skipping: no destination channel/message");
return;
}
// Look up the destination channel's Telegram ID
const destChannel = await db.telegramChannel.findUnique({
where: { id: pkg.destChannelId },
select: { telegramId: true },
});
if (!destChannel) {
throw new Error("Destination channel not found in DB");
}
const chatId = Number(destChannel.telegramId);
// Resolve which message IDs to download. The Package may carry a
// single destMessageId or multiple destMessageIds (for multipart).
const messageIds: bigint[] =
pkg.destMessageIds.length > 0
? pkg.destMessageIds
: pkg.destMessageId
? [pkg.destMessageId]
: [];
if (messageIds.length === 0) {
throw new Error("Package has no destination message IDs");
}
const tempDir = path.join(config.tempDir, `backfill_${pkg.id}`);
await mkdir(tempDir, { recursive: true });
try {
const partPaths: string[] = [];
for (let i = 0; i < messageIds.length; i++) {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const message = (await client.invoke({
_: "getMessage",
chat_id: chatId,
message_id: Number(messageIds[i]),
})) as unknown as {
content?: { document?: { file_name?: string; document?: { id: number; size: number } } };
};
const doc = message?.content?.document;
if (!doc?.document?.id) {
throw new Error(`Destination message ${messageIds[i]} has no document`);
}
const fileId = String(doc.document.id);
const fileName = doc.file_name ?? `${pkg.id}.part${i + 1}`;
const localPath = path.join(tempDir, fileName);
await downloadFile(
client,
fileId,
localPath,
BigInt(doc.document.size),
fileName
);
partPaths.push(localPath);
}
// Run the appropriate reader on the assembled file(s)
let entries: FileEntry[] = [];
if (pkg.archiveType === "ZIP") {
entries = await readZipCentralDirectory(partPaths);
} else if (pkg.archiveType === "RAR") {
// unrar auto-discovers sibling parts when in the same directory
entries = await readRarContents(partPaths[0]);
} else if (pkg.archiveType === "SEVEN_Z") {
entries = await read7zContents(partPaths[0]);
} else {
log.debug({ ...ctx, archiveType: pkg.archiveType }, "Skipping unsupported archive type");
return;
}
if (entries.length === 0) {
log.warn(ctx, "Reader returned 0 entries — archive may be encrypted or corrupt");
return;
}
// Also derive slicer tags from the file list so the backfilled packages
// gain the same search/filter context as newly-ingested ones.
const slicerTags = extractSlicerTags(entries);
// Write everything in a single transaction so a partial backfill never
// leaves the Package half-indexed.
const wrote = await db.$transaction(async (tx) => {
// Re-check fileCount inside the transaction: another worker might
// have backfilled this package between our read and write.
const current = await tx.package.findUnique({
where: { id: pkg.id },
select: { fileCount: true, tags: true },
});
if (current && current.fileCount > 0) {
log.debug({ ...ctx, existingFileCount: current.fileCount }, "Already backfilled by another worker — skipping");
return false;
}
await tx.packageFile.deleteMany({ where: { packageId: pkg.id } });
await tx.packageFile.createMany({
data: entries.map((e) => ({
packageId: pkg.id,
path: e.path,
fileName: e.fileName,
extension: e.extension,
compressedSize: e.compressedSize,
uncompressedSize: e.uncompressedSize,
crc32: e.crc32,
})),
});
// Merge slicer tags with whatever's already on the Package (preserve
// channel category, manual tags, etc.).
const existingTags = current?.tags ?? [];
const mergedTags = [...new Set([...existingTags, ...slicerTags])];
await tx.package.update({
where: { id: pkg.id },
data: { fileCount: entries.length, tags: mergedTags },
});
return true;
});
// Only log success when this worker actually wrote the file list.
if (wrote) {
log.info({ ...ctx, fileCount: entries.length }, "Backfilled file list");
}
} finally {
await rm(tempDir, { recursive: true, force: true }).catch(() => {});
}
}
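The message-ID fallback in processOnePackage packs two rules into a nested ternary. Here is a minimal sketch that spells them out. `resolveDestMessageIds` and `toTdlibMessageId` are hypothetical helpers. TDLib's JSON interface types `message_id` as int53, so stored IDs should already fit in a JS number. The guard only makes that assumption explicit instead of letting `Number()` lose precision silently.

```typescript
// Prefer the multipart list, fall back to the legacy scalar ID, and treat
// "neither" as an error, as processOnePackage does.
function resolveDestMessageIds(
  destMessageIds: readonly bigint[],
  destMessageId: bigint | null
): bigint[] {
  if (destMessageIds.length > 0) return [...destMessageIds];
  if (destMessageId !== null) return [destMessageId];
  throw new Error("Package has no destination message IDs");
}

// Convert a stored bigint to the number TDLib's getMessage expects,
// refusing values that a JS number cannot represent exactly.
function toTdlibMessageId(id: bigint): number {
  if (id > BigInt(Number.MAX_SAFE_INTEGER)) {
    throw new RangeError(`message id ${id} exceeds Number.MAX_SAFE_INTEGER`);
  }
  return Number(id);
}
```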
/**
* Cheap pure-DB backfill: walk Packages that already have PackageFile rows
* but no slicer tags, recompute the tags from their extensions, and merge
* with the existing tag list. No downloads, no TDLib.
*
* Trigger:
* SELECT pg_notify('backfill_slicer_tags', '{"limit":1000}');
*/
export async function processSlicerTagBackfill(payloadJson: string): Promise<void> {
let limit = 1000;
try {
const parsed = JSON.parse(payloadJson) as { limit?: number };
if (typeof parsed.limit === "number" && parsed.limit > 0) limit = parsed.limit;
} catch {
// Use default
}
// KNOWN_TAGS = the slicer tags we know how to derive. A Package missing
// all of these is a candidate for recompute. extractSlicerTags is safe
// to run on every package (returns [] for archives with no slicer files),
// but filtering up-front avoids walking the entire DB.
const KNOWN_TAGS = ["lychee", "chitubox", "anycubic", "bambu", "fdm", "mango"];
const candidates = await db.package.findMany({
where: {
fileCount: { gt: 0 },
NOT: { tags: { hasSome: KNOWN_TAGS } },
},
select: {
id: true,
tags: true,
files: { select: { extension: true } },
},
orderBy: { createdAt: "asc" },
take: limit,
});
if (candidates.length === 0) {
log.info("Slicer tag backfill: no candidates");
return;
}
log.info({ count: candidates.length }, "Slicer tag backfill: starting");
let updated = 0;
for (const pkg of candidates) {
const fileEntries = pkg.files.map((f) => ({
path: "",
fileName: "",
extension: f.extension,
compressedSize: 0n,
uncompressedSize: 0n,
crc32: null as string | null,
}));
const slicerTags = extractSlicerTags(fileEntries);
if (slicerTags.length === 0) continue;
const merged = [...new Set([...pkg.tags, ...slicerTags])];
if (merged.length === pkg.tags.length) continue;
await db.package.update({ where: { id: pkg.id }, data: { tags: merged } });
updated++;
}
log.info({ candidates: candidates.length, updated }, "Slicer tag backfill: done");
}
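Both backfills use the same tag-merge rule. Existing tags such as the channel category or manual tags stay first, any new slicer tags are appended, and an unchanged length means the UPDATE can be skipped. The sketch below isolates that rule. `mergeTags` is a hypothetical helper, and the tag values in the tests are example data only.

```typescript
function mergeTags(
  existing: readonly string[],
  derived: readonly string[]
): { tags: string[]; changed: boolean } {
  // Set preserves first-insertion order: existing tags first, then new ones.
  const tags = [...new Set([...existing, ...derived])];
  // Same no-op test as processSlicerTagBackfill; it assumes `existing` has
  // no duplicates (otherwise a dedupe alone counts as a change).
  return { tags, changed: tags.length !== existing.length };
}
```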

@@ -5,7 +5,14 @@ import { config } from "../util/config.js";
const pool = new pg.Pool({
connectionString: config.databaseUrl,
max: 5,
// Pool needs headroom for: 2 account advisory locks (held for entire cycle),
// up to 2 concurrent hash locks, plus Prisma operations from both accounts.
// Previously max=5 caused pool exhaustion and indefinite hangs.
max: 15,
// Prevent pool.connect() from blocking forever when pool is exhausted.
// Throws an error after 30s so the operation can fail and retry instead of
// silently hanging for hours (as happened with the Turnbase.7z stall).
connectionTimeoutMillis: 30_000,
});
const adapter = new PrismaPg(pool);

@@ -79,3 +79,66 @@ export async function releaseLock(accountId: string): Promise<void> {
client.release();
}
}
/**
* Derive a lock ID for a content hash. Prefixes with "hash:" so the resulting
* 32-bit integer does not collide with account advisory lock IDs.
*/
function contentHashToLockId(contentHash: string): number {
return hashToLockId(`hash:${contentHash}`);
}
/**
* Acquire a per-content-hash advisory lock before uploading.
* Prevents two concurrent workers from uploading the same archive
* when both scan a shared source channel.
*
* Returns true if acquired (proceed with upload).
* Returns false if already held (another worker is handling this archive — skip).
*
* MUST be released via releaseHashLock() after createPackageStub() completes,
* including on all error paths (use try/finally).
*/
export async function tryAcquireHashLock(contentHash: string): Promise<boolean> {
const lockId = contentHashToLockId(contentHash);
const client = await pool.connect();
try {
const result = await client.query<{ pg_try_advisory_lock: boolean }>(
"SELECT pg_try_advisory_lock($1)",
[lockId]
);
const acquired = result.rows[0]?.pg_try_advisory_lock ?? false;
if (acquired) {
heldConnections.set(`hash:${contentHash}`, client);
log.debug({ hash: contentHash.slice(0, 16), lockId }, "Hash lock acquired");
return true;
} else {
client.release();
log.debug({ hash: contentHash.slice(0, 16), lockId }, "Hash lock held by another worker — skipping");
return false;
}
} catch (err) {
client.release();
throw err;
}
}
/**
* Release the per-content-hash advisory lock.
* Call after createPackageStub() completes (or on any error path).
*/
export async function releaseHashLock(contentHash: string): Promise<void> {
const lockId = contentHashToLockId(contentHash);
const client = heldConnections.get(`hash:${contentHash}`);
if (!client) {
log.warn({ hash: contentHash.slice(0, 16) }, "No held connection for hash lock release");
return;
}
try {
await client.query("SELECT pg_advisory_unlock($1)", [lockId]);
log.debug({ hash: contentHash.slice(0, 16) }, "Hash lock released");
} finally {
heldConnections.delete(`hash:${contentHash}`);
client.release();
}
}
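The contract in tryAcquireHashLock's doc comment can be exercised without Postgres: acquisition is non-blocking, a caller skips on contention, and it always releases in `finally`. The sketch below swaps the advisory lock for an in-memory `Set`. `InMemoryHashLocks` and `uploadOnce` are hypothetical. Postgres session-level advisory locks are re-entrant within a session and visible across processes, and this stand-in is neither.

```typescript
class InMemoryHashLocks {
  private held = new Set<string>();

  // Non-blocking, like pg_try_advisory_lock: false means "someone else has it".
  tryAcquire(contentHash: string): boolean {
    const key = `hash:${contentHash}`;
    if (this.held.has(key)) return false;
    this.held.add(key);
    return true;
  }

  release(contentHash: string): void {
    this.held.delete(`hash:${contentHash}`);
  }
}

// Hypothetical upload step showing the required shape: skip on contention,
// and release whether stub creation succeeds or throws.
async function uploadOnce(
  locks: InMemoryHashLocks,
  contentHash: string,
  createStub: () => Promise<void>
): Promise<"uploaded" | "skipped"> {
  if (!locks.tryAcquire(contentHash)) return "skipped";
  try {
    await createStub();
    return "uploaded";
  } finally {
    locks.release(contentHash);
  }
}
```

In the real implementation each held hash lock also keeps one pooled connection checked out until release (it is stored in `heldConnections`), which is one reason the pool size above had to grow.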

@@ -70,7 +70,109 @@ export async function packageExistsByHash(contentHash: string) {
export async function getUploadedPackageByHash(contentHash: string) {
return db.package.findFirst({
where: { contentHash, destMessageId: { not: null }, destChannelId: { not: null } },
select: { destChannelId: true, destMessageId: true },
select: { destChannelId: true, destMessageId: true, destMessageIds: true },
});
}
export interface CreatePackageStubInput {
contentHash: string;
fileName: string;
fileSize: bigint;
archiveType: ArchiveType;
sourceChannelId: string;
sourceMessageId: bigint;
sourceTopicId?: bigint | null;
/** TDLib remote.unique_id of the first part — for future dedup. */
remoteUniqueId?: string | null;
destChannelId: string;
destMessageId: bigint;
destMessageIds: bigint[];
isMultipart: boolean;
partCount: number;
ingestionRunId: string;
creator?: string | null;
tags?: string[];
}
/**
* Write a minimal Package record immediately after Telegram confirms the upload.
* Call this before preview/metadata extraction so recoverIncompleteUploads() can
* detect and verify the package if the worker crashes mid-metadata.
*
* Follow with updatePackageWithMetadata() once file entries and preview are ready.
*/
export async function createPackageStub(
input: CreatePackageStubInput
): Promise<{ id: string }> {
const pkg = await db.package.create({
data: {
contentHash: input.contentHash,
fileName: input.fileName,
fileSize: input.fileSize,
archiveType: input.archiveType,
sourceChannelId: input.sourceChannelId,
sourceMessageId: input.sourceMessageId,
sourceTopicId: input.sourceTopicId ?? undefined,
remoteUniqueId: input.remoteUniqueId ?? undefined,
destChannelId: input.destChannelId,
destMessageId: input.destMessageId,
destMessageIds: input.destMessageIds,
isMultipart: input.isMultipart,
partCount: input.partCount,
fileCount: 0,
ingestionRunId: input.ingestionRunId,
creator: input.creator ?? undefined,
tags: input.tags?.length ? input.tags : undefined,
},
select: { id: true },
});
try {
await db.$queryRawUnsafe(
`SELECT pg_notify('new_package', $1)`,
JSON.stringify({
packageId: pkg.id,
fileName: input.fileName,
creator: input.creator ?? null,
tags: input.tags ?? [],
})
);
} catch {
// Best-effort
}
return pkg;
}
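A small in-memory model of the two-phase write makes the crash window concrete. `InMemoryPackageStore` is hypothetical. The recovery predicate (`fileCount === 0`) is an assumption, because recoverIncompleteUploads() is not part of this diff. It does match the `fileCount: 0` that the stub writes and the filter that the file-list backfill uses.

```typescript
interface PackageRow {
  id: string;
  destMessageId: bigint;
  fileCount: number;
  files: string[];
}

class InMemoryPackageStore {
  private rows = new Map<string, PackageRow>();
  private nextId = 1;

  // Phase 1: record the upload as soon as Telegram confirms it.
  createStub(destMessageId: bigint): string {
    const id = `pkg_${this.nextId++}`;
    this.rows.set(id, { id, destMessageId, fileCount: 0, files: [] });
    return id;
  }

  // Phase 2: attach the extracted file list.
  updateWithMetadata(id: string, files: string[]): void {
    const row = this.rows.get(id);
    if (!row) throw new Error(`unknown package ${id}`);
    row.files = files;
    row.fileCount = files.length;
  }

  // Rows that never completed phase 2. An archive whose listing genuinely
  // has zero entries is indistinguishable from a crash here.
  incompleteIds(): string[] {
    return [...this.rows.values()].filter((r) => r.fileCount === 0).map((r) => r.id);
  }
}
```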
/**
* Update a stub Package with file entries and preview after metadata extraction.
* Called as Phase 2 of the two-phase write after createPackageStub().
*/
export async function updatePackageWithMetadata(
packageId: string,
input: {
files: {
path: string;
fileName: string;
extension: string | null;
compressedSize: bigint;
uncompressedSize: bigint;
crc32: string | null;
}[];
previewData?: Buffer | null;
previewMsgId?: bigint | null;
}
): Promise<void> {
await db.package.update({
where: { id: packageId },
data: {
fileCount: input.files.length,
previewData: input.previewData ? new Uint8Array(input.previewData) : undefined,
previewMsgId: input.previewMsgId ?? undefined,
files: {
create: input.files,
},
},
});
}
@@ -90,6 +192,73 @@ export async function packageExistsBySourceMessage(
return pkg !== null;
}
/**
* Strongest pre-download dedup signal: a Package in this channel already
* has a matching TDLib remote.unique_id. The unique_id is stable across
* reposts of the exact same file content, so a hit is a guaranteed
* (lossless) duplicate. No false positives.
*
* Falls back to the older findRepostedPackage (name + size) for packages
* that were ingested before we started capturing remote.unique_id.
*/
export async function findPackageByRemoteUniqueId(
sourceChannelId: string,
remoteUniqueId: string
): Promise<{
id: string;
destMessageId: bigint | null;
sourceTopicId: bigint | null;
} | null> {
return db.package.findFirst({
where: {
sourceChannelId,
remoteUniqueId,
destMessageId: { not: null },
},
orderBy: { sourceTopicId: { sort: "desc", nulls: "last" } },
select: { id: true, destMessageId: true, sourceTopicId: true },
});
}
/**
* Detect a likely repost: same source channel + same fileName + same total
* fileSize already exists with destMessageId set. Used to skip downloads
* when the channel admin re-posts the same file under a new message ID
* (which `packageExistsBySourceMessage` cannot catch because the message ID
* is different).
*
* Returns the existing package's destMessageId for logging/observability,
* or null if no match. Approximate: same name + same total size is an
* extremely strong signal that it's the same content, but theoretically
* two unrelated files could collide. If that ever happens, the new file
* gets treated as a duplicate and is lost; the user can manually re-link
* via the UI by removing the existing Package.
*/
export async function findRepostedPackage(
sourceChannelId: string,
fileName: string,
fileSize: bigint
): Promise<{
id: string;
destMessageId: bigint | null;
sourceTopicId: bigint | null;
} | null> {
return db.package.findFirst({
where: {
sourceChannelId,
fileName,
fileSize,
destMessageId: { not: null },
},
// Prefer the existing Package with the most specific (non-NULL)
// sourceTopicId, so when the user is re-scanning and the file already
// exists in a specific topic, the audit log shows the most informative
// match. NULLS LAST in DESC order achieves that for any non-null IDs.
orderBy: { sourceTopicId: { sort: "desc", nulls: "last" } },
select: { id: true, destMessageId: true, sourceTopicId: true },
});
}
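The two pre-download dedup lookups can be modelled in memory to show their precedence and tie-breaking. The order is an exact `remoteUniqueId` match first, then the approximate name-plus-size match for older rows. Among matches, a non-null `sourceTopicId` wins, highest first, which mirrors `orderBy: { sourceTopicId: { sort: "desc", nulls: "last" } }`. `KnownPackage`, `findExisting` and `byTopicDescNullsLast` are hypothetical names for this sketch.

```typescript
interface KnownPackage {
  id: string;
  sourceChannelId: string;
  remoteUniqueId: string | null;
  fileName: string;
  fileSize: bigint;
  destMessageId: bigint | null;
  sourceTopicId: bigint | null;
}

// DESC with NULLS LAST, as in the Prisma orderBy above.
function byTopicDescNullsLast(a: KnownPackage, b: KnownPackage): number {
  if (a.sourceTopicId === b.sourceTopicId) return 0;
  if (a.sourceTopicId === null) return 1;
  if (b.sourceTopicId === null) return -1;
  return a.sourceTopicId > b.sourceTopicId ? -1 : 1;
}

function findExisting(
  known: readonly KnownPackage[],
  incoming: { sourceChannelId: string; remoteUniqueId: string | null; fileName: string; fileSize: bigint }
): { match: KnownPackage; via: "remoteUniqueId" | "nameAndSize" } | null {
  // Both lookups only consider uploaded packages from the same channel.
  const uploaded = known.filter(
    (p) => p.sourceChannelId === incoming.sourceChannelId && p.destMessageId !== null
  );
  if (incoming.remoteUniqueId !== null) {
    const exact = uploaded
      .filter((p) => p.remoteUniqueId === incoming.remoteUniqueId)
      .sort(byTopicDescNullsLast);
    if (exact.length > 0) return { match: exact[0], via: "remoteUniqueId" };
  }
  const approx = uploaded
    .filter((p) => p.fileName === incoming.fileName && p.fileSize === incoming.fileSize)
    .sort(byTopicDescNullsLast);
  return approx.length > 0 ? { match: approx[0], via: "nameAndSize" } : null;
}
```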
/**
* Delete orphaned Package rows that have the same content hash but never
* completed the upload (destMessageId is null). Called before creating a
@@ -111,6 +280,7 @@ export interface CreatePackageInput {
sourceTopicId?: bigint | null;
destChannelId?: string;
destMessageId?: bigint;
destMessageIds?: bigint[];
isMultipart: boolean;
partCount: number;
ingestionRunId: string;
@@ -118,6 +288,8 @@ export interface CreatePackageInput {
tags?: string[];
previewData?: Buffer | null;
previewMsgId?: bigint | null;
sourceCaption?: string | null;
replyToMessageId?: bigint | null;
files: {
path: string;
fileName: string;
@@ -140,6 +312,7 @@ export async function createPackageWithFiles(input: CreatePackageInput) {
sourceTopicId: input.sourceTopicId ?? undefined,
destChannelId: input.destChannelId,
destMessageId: input.destMessageId,
destMessageIds: input.destMessageIds ?? (input.destMessageId ? [input.destMessageId] : []),
isMultipart: input.isMultipart,
partCount: input.partCount,
fileCount: input.files.length,
@@ -148,6 +321,8 @@ export async function createPackageWithFiles(input: CreatePackageInput) {
tags: input.tags && input.tags.length > 0 ? input.tags : undefined,
previewData: input.previewData ? new Uint8Array(input.previewData) : undefined,
previewMsgId: input.previewMsgId ?? undefined,
sourceCaption: input.sourceCaption ?? undefined,
replyToMessageId: input.replyToMessageId ?? undefined,
files: {
create: input.files,
},
@@ -280,6 +455,73 @@ export async function updateLastProcessedMessage(
});
}
export interface ScanStateUpdate {
/** New watermark to persist. Use the same value the caller would have
* passed to updateLastProcessedMessage / upsertTopicProgress. */
lastProcessedMessageId: bigint | null;
/** True if the scan found archives OR has retryable SkippedPackages
* pending. The caller computes this via the trulyIdle formula. */
lastScanFoundArchives: boolean;
/** Pre-incremented value of consecutiveEmptyScans. Caller passes:
* trulyIdle ? prev + 1 : 0
* We do the arithmetic outside the helper so the helper stays a pure
* setter — easier to reason about. */
consecutiveEmptyScans: number;
}
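The ScanStateUpdate doc comments leave the arithmetic to the caller. The sketch below shows that caller-side step. It uses the trulyIdle definition from the commit messages: no archives, no failures, and no retryable SkippedPackages pending. `ScanOutcome` and `nextScanState` are hypothetical. `lastScanFoundArchives` follows the field's doc comment literally, and the real caller may derive it differently.

```typescript
interface ScanOutcome {
  archivesFound: number;
  failures: number;
  retryablePending: number;
}

interface ScanStateUpdate {
  lastProcessedMessageId: bigint | null;
  lastScanFoundArchives: boolean;
  consecutiveEmptyScans: number;
}

function nextScanState(
  prevConsecutiveEmptyScans: number,
  watermark: bigint | null,
  outcome: ScanOutcome
): ScanStateUpdate {
  const trulyIdle =
    outcome.archivesFound === 0 &&
    outcome.failures === 0 &&
    outcome.retryablePending === 0;
  return {
    lastProcessedMessageId: watermark,
    // Literal reading of the field's doc comment: archives OR retryable rows.
    lastScanFoundArchives: outcome.archivesFound > 0 || outcome.retryablePending > 0,
    // Any failure or pending retry resets the counter to 0, so channels
    // with chronic failures never accumulate backoff.
    consecutiveEmptyScans: trulyIdle ? prevConsecutiveEmptyScans + 1 : 0,
  };
}
```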
/**
* Atomically update an AccountChannelMap's watermark and scan-state fields.
* Replaces the older updateLastProcessedMessage for the post-scan write.
* Sets lastScannedAt = NOW() server-side.
*/
export async function upsertChannelScanState(
mappingId: string,
update: ScanStateUpdate
) {
return db.accountChannelMap.update({
where: { id: mappingId },
data: {
lastProcessedMessageId: update.lastProcessedMessageId ?? undefined,
lastScannedAt: new Date(),
lastScanFoundArchives: update.lastScanFoundArchives,
consecutiveEmptyScans: update.consecutiveEmptyScans,
},
});
}
/**
* Atomically upsert a TopicProgress row with the new watermark + scan-state
* fields. Same semantics as upsertChannelScanState but for forum topics.
*/
export async function upsertTopicScanState(
accountChannelMapId: string,
topicId: bigint,
topicName: string | null,
update: ScanStateUpdate
) {
return db.topicProgress.upsert({
where: {
accountChannelMapId_topicId: { accountChannelMapId, topicId },
},
create: {
accountChannelMapId,
topicId,
topicName,
lastProcessedMessageId: update.lastProcessedMessageId,
lastScannedAt: new Date(),
lastScanFoundArchives: update.lastScanFoundArchives,
consecutiveEmptyScans: update.consecutiveEmptyScans,
},
update: {
topicName,
lastProcessedMessageId: update.lastProcessedMessageId ?? undefined,
lastScannedAt: new Date(),
lastScanFoundArchives: update.lastScanFoundArchives,
consecutiveEmptyScans: update.consecutiveEmptyScans,
},
});
}
export async function markStaleRunsAsFailed() {
return db.ingestionRun.updateMany({
where: { status: "RUNNING" },
@@ -302,6 +544,16 @@ export async function updateAccountAuthState(
});
}
export async function updateAccountPremiumStatus(
accountId: string,
isPremium: boolean
): Promise<void> {
await db.telegramAccount.update({
where: { id: accountId },
data: { isPremium },
});
}
export async function getAccountAuthCode(accountId: string) {
const account = await db.telegramAccount.findUnique({
where: { id: accountId },
@@ -510,6 +762,7 @@ export async function upsertSkippedPackage(data: {
errorMessage: data.errorMessage ?? null,
fileName: data.fileName,
fileSize: data.fileSize,
attemptCount: { increment: 1 },
createdAt: new Date(),
},
create: {
@@ -527,6 +780,26 @@ export async function upsertSkippedPackage(data: {
});
}
/**
* Return source-message IDs in a channel whose SkippedPackage attemptCount has
* reached or exceeded the cap — these are treated as "permanently failed for
* now" so the watermark can advance past them. The user can manually retry via
* the UI to reset the SkippedPackage record.
*/
export async function getCappedSkippedMessageIds(
sourceChannelId: string,
cap: number
): Promise<Set<bigint>> {
const rows = await db.skippedPackage.findMany({
where: {
sourceChannelId,
attemptCount: { gte: cap },
},
select: { sourceMessageId: true },
});
return new Set(rows.map((r) => r.sourceMessageId));
}
export async function deleteSkippedPackage(
sourceChannelId: string,
sourceMessageId: bigint
@@ -536,6 +809,63 @@ export async function deleteSkippedPackage(
});
}
/**
* Find SkippedPackages for a given account+channel that are still eligible
* for auto-retry (attemptCount below the cap). Used at the start of a scan
* to pull the watermark back so we don't strand failed messages forever
* after the watermark has advanced past them.
*
* For non-forum channels, pass `topicId: null` to get rows with NULL topic.
* For forum channels, pass the topic ID to scope to that topic only.
*/
export async function getRetryableSkippedMessageIds(args: {
accountId: string;
sourceChannelId: string;
topicId: bigint | null;
cap: number;
}): Promise<bigint[]> {
const rows = await db.skippedPackage.findMany({
where: {
accountId: args.accountId,
sourceChannelId: args.sourceChannelId,
sourceTopicId: args.topicId,
attemptCount: { lt: args.cap },
},
select: { sourceMessageId: true },
orderBy: { sourceMessageId: "asc" },
});
return rows.map((r) => r.sourceMessageId);
}
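getCappedSkippedMessageIds and getRetryableSkippedMessageIds split SkippedPackage rows on the same `attemptCount` cap. The following sketch shows that split together with one plausible reading of "pull the watermark back": start the scan just below the oldest retryable message. `partitionSkipped` and `effectiveWatermark` are hypothetical. The exact watermark arithmetic in the worker is not part of this diff. The sketch assumes that a scan fetches messages with `id > watermark` and that a null watermark already means "scan from the beginning".

```typescript
interface SkippedRow {
  sourceMessageId: bigint;
  attemptCount: number;
}

const compareBigInt = (a: bigint, b: bigint): number => (a < b ? -1 : a > b ? 1 : 0);

// Same split as the two queries: attemptCount < cap is retryable,
// attemptCount >= cap is capped ("permanently failed for now").
function partitionSkipped(rows: readonly SkippedRow[], cap: number) {
  const retryable = rows
    .filter((r) => r.attemptCount < cap)
    .map((r) => r.sourceMessageId)
    .sort(compareBigInt); // the DB query also orders ascending
  const capped = new Set(
    rows.filter((r) => r.attemptCount >= cap).map((r) => r.sourceMessageId)
  );
  return { retryable, capped };
}

function effectiveWatermark(
  stored: bigint | null,
  retryable: readonly bigint[]
): bigint | null {
  if (stored === null || retryable.length === 0) return stored;
  const pulledBack = retryable[0] - 1n; // oldest retryable message minus one
  return pulledBack < stored ? pulledBack : stored;
}
```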
/**
* Update a Package's source topic when a more specific topic context is
* discovered for the same content. Used when findRepostedPackage matches
* an existing Package whose topic is less specific (e.g., "General") than
* the topic we just encountered the file in.
*
* Also updates the creator if the new topic name is more informative than
* the existing creator (i.e., the existing creator was derived from a
* less-specific topic name like "General").
*/
export async function updatePackageTopicContext(
packageId: string,
newTopicId: bigint,
newTopicName: string | null
): Promise<void> {
await db.package.update({
where: { id: packageId },
data: {
sourceTopicId: newTopicId,
// Only overwrite creator when the new topic name is meaningful (non-empty,
// non-General). For "General" or a missing name, the existing creator
// (e.g. one taken from the filename or admin input) is left intact.
...(newTopicName && newTopicName !== "General"
? { creator: newTopicName }
: {}),
},
});
}
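The conditional spread in updatePackageTopicContext is easier to check in isolation. The topic always moves, but the creator changes only for a meaningful name. `buildTopicContextData` is a hypothetical name for a standalone copy of the `data` object the function builds.

```typescript
function buildTopicContextData(
  newTopicId: bigint,
  newTopicName: string | null
): { sourceTopicId: bigint; creator?: string } {
  return {
    sourceTopicId: newTopicId,
    // null, "" and "General" all leave the creator key out entirely, so
    // Prisma does not touch the existing creator column.
    ...(newTopicName && newTopicName !== "General" ? { creator: newTopicName } : {}),
  };
}
```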
export async function createOrFindPackageGroup(input: {
mediaAlbumId: string;
sourceChannelId: string;
@@ -585,3 +915,46 @@ export async function linkPackagesToGroup(
data: { packageGroupId: groupId },
});
}
export async function createTimeWindowGroup(input: {
sourceChannelId: string;
name: string;
packageIds: string[];
}): Promise<string> {
const group = await db.packageGroup.create({
data: {
sourceChannelId: input.sourceChannelId,
name: input.name,
groupingSource: "AUTO_TIME",
},
});
await db.package.updateMany({
where: { id: { in: input.packageIds } },
data: { packageGroupId: group.id },
});
return group.id;
}
export async function createAutoGroup(input: {
sourceChannelId: string;
name: string;
packageIds: string[];
groupingSource: "ALBUM" | "MANUAL" | "AUTO_TIME" | "AUTO_PATTERN" | "AUTO_ZIP" | "AUTO_CAPTION" | "AUTO_REPLY";
}): Promise<string> {
const group = await db.packageGroup.create({
data: {
sourceChannelId: input.sourceChannelId,
name: input.name,
groupingSource: input.groupingSource,
},
});
await db.package.updateMany({
where: { id: { in: input.packageIds } },
data: { packageGroupId: group.id },
});
return group.id;
}

@@ -101,16 +101,14 @@ export async function processExtractRequest(requestId: string): Promise<void> {
try {
await mkdir(tempDir, { recursive: true });
// Wrap the entire TDLib session in the mutex so no other TDLib
// operation can run concurrently (TDLib is single-session).
await withTdlibMutex("extract", async () => {
const accounts = await getActiveAccounts();
if (accounts.length === 0) {
throw new Error("No authenticated Telegram accounts available");
}
const account = accounts[0];
const client = await createTdlibClient({ id: account.id, phone: account.phone });
await withTdlibMutex(account.phone, "extract", async () => {
const { client } = await createTdlibClient({ id: account.id, phone: account.phone });
try {
// Load chat list so TDLib can find the dest channel

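The changes in these hunks key withTdlibMutex by account phone (`account.phone`, or `"global"` when no account is known). Keying this way should serialize TDLib work for the same account while letting different accounts proceed concurrently. The actual implementation in `./util/mutex.js` is not shown. The `KeyedMutex` below is a hypothetical promise-chain sketch of that behavior, not the worker's code.

```typescript
class KeyedMutex {
  private tails = new Map<string, Promise<void>>();

  async run<T>(key: string, fn: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(key) ?? Promise.resolve();
    let release!: () => void;
    const current = new Promise<void>((resolve) => (release = resolve));
    // The next caller for this key waits for the previous holder and for us.
    const tail = prev.then(() => current);
    this.tails.set(key, tail);
    await prev;
    try {
      return await fn();
    } finally {
      release();
      // Drop the entry if nobody queued behind us, so the map does not grow.
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}
```

With this shape, a long `extract` session on one account no longer blocks a `join-channel` request on another, which a single global mutex would have done.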
@@ -5,6 +5,8 @@ import { withTdlibMutex } from "./util/mutex.js";
import { processFetchRequest } from "./worker.js";
import { processExtractRequest } from "./extract-listener.js";
import { rebuildPackageDatabase } from "./rebuild.js";
import { processManualUpload } from "./manual-upload.js";
import { processBackfillRequest, processSlicerTagBackfill } from "./backfill.js";
import { generateInviteLink, createSupergroup, searchPublicChat } from "./tdlib/chats.js";
import { createTdlibClient, closeTdlibClient } from "./tdlib/client.js";
import { triggerImmediateCycle } from "./scheduler.js";
@@ -13,6 +15,7 @@ import {
getGlobalSetting,
setGlobalSetting,
getActiveAccounts,
getChannelFetchRequest,
upsertChannel,
ensureAccountChannelLink,
updateFetchRequestStatus,
@@ -55,6 +58,9 @@ async function connectListener(): Promise<void> {
await pgClient.query("LISTEN join_channel");
await pgClient.query("LISTEN archive_extract");
await pgClient.query("LISTEN rebuild_packages");
await pgClient.query("LISTEN manual_upload");
await pgClient.query("LISTEN backfill_filelists");
await pgClient.query("LISTEN backfill_slicer_tags");
pgClient.on("notification", (msg) => {
if (msg.channel === "channel_fetch" && msg.payload) {
@@ -71,6 +77,12 @@ async function connectListener(): Promise<void> {
handleArchiveExtract(msg.payload);
} else if (msg.channel === "rebuild_packages" && msg.payload) {
handleRebuildPackages(msg.payload);
} else if (msg.channel === "manual_upload" && msg.payload) {
handleManualUpload(msg.payload);
} else if (msg.channel === "backfill_filelists") {
handleBackfillFilelists(msg.payload ?? "{}");
} else if (msg.channel === "backfill_slicer_tags") {
handleBackfillSlicerTags(msg.payload ?? "{}");
}
});
@@ -96,7 +108,7 @@ async function connectListener(): Promise<void> {
}
});
log.info("Fetch listener started (channel_fetch, generate_invite, create_destination, ingestion_trigger, join_channel, archive_extract, rebuild_packages)");
log.info("Fetch listener started (channel_fetch, generate_invite, create_destination, ingestion_trigger, join_channel, archive_extract, rebuild_packages, manual_upload, backfill_filelists, backfill_slicer_tags)");
} catch (err) {
log.error({ err }, "Failed to start fetch listener — retrying");
scheduleReconnect();
@@ -129,7 +141,9 @@ let fetchQueue: Promise<void> = Promise.resolve();
function handleChannelFetch(requestId: string): void {
fetchQueue = fetchQueue.then(async () => {
try {
await withTdlibMutex("fetch-channels", () =>
const request = await getChannelFetchRequest(requestId);
const key = request?.account?.phone ?? "global";
await withTdlibMutex(key, "fetch-channels", () =>
processFetchRequest(requestId)
);
} catch (err) {
@@ -143,22 +157,20 @@ function handleChannelFetch(requestId: string): void {
function handleGenerateInvite(channelId: string): void {
fetchQueue = fetchQueue.then(async () => {
try {
await withTdlibMutex("generate-invite", async () => {
const accounts = await getActiveAccounts();
if (accounts.length === 0) {
log.warn("No authenticated accounts to generate invite link");
return;
}
const account = accounts[0];
await withTdlibMutex(account.phone, "generate-invite", async () => {
const destChannel = await getGlobalDestinationChannel();
if (!destChannel || destChannel.id !== channelId) {
log.warn({ channelId }, "Destination channel mismatch, skipping invite generation");
return;
}
// Use the first available authenticated account to generate the link
const accounts = await getActiveAccounts();
if (accounts.length === 0) {
log.warn("No authenticated accounts to generate invite link");
return;
}
const account = accounts[0];
const client = await createTdlibClient({ id: account.id, phone: account.phone });
const { client } = await createTdlibClient({ id: account.id, phone: account.phone });
try {
const link = await generateInviteLink(client, destChannel.telegramId);
@@ -183,7 +195,13 @@ function handleCreateDestination(payload: string): void {
const parsed = JSON.parse(payload) as { requestId: string; title: string };
requestId = parsed.requestId;
await withTdlibMutex("create-destination", async () => {
const accounts = await getActiveAccounts();
if (accounts.length === 0) {
throw new Error("No authenticated accounts available to create the group");
}
const account = accounts[0];
await withTdlibMutex(account.phone, "create-destination", async () => {
const { db } = await import("./db/client.js");
// Mark the request as in-progress
@@ -192,14 +210,7 @@ function handleCreateDestination(payload: string): void {
data: { status: "IN_PROGRESS" },
});
// Use the first available authenticated account
const accounts = await getActiveAccounts();
if (accounts.length === 0) {
throw new Error("No authenticated accounts available to create the group");
}
const account = accounts[0];
const client = await createTdlibClient({ id: account.id, phone: account.phone });
const { client } = await createTdlibClient({ id: account.id, phone: account.phone });
try {
// Create the supergroup via TDLib
@@ -324,16 +335,16 @@ function handleJoinChannel(payload: string): void {
const parsed = JSON.parse(payload) as { requestId: string; input: string; accountId: string };
requestId = parsed.requestId;
await withTdlibMutex("join-channel", async () => {
await updateFetchRequestStatus(requestId!, "IN_PROGRESS");
const accounts = await getActiveAccounts();
const account = accounts.find((a) => a.id === parsed.accountId) ?? accounts[0];
if (!account) {
throw new Error("No authenticated accounts available");
}
const client = await createTdlibClient({ id: account.id, phone: account.phone });
await withTdlibMutex(account.phone, "join-channel", async () => {
await updateFetchRequestStatus(requestId!, "IN_PROGRESS");
const { client } = await createTdlibClient({ id: account.id, phone: account.phone });
try {
const linkInfo = parseTelegramInput(parsed.input);
@@ -503,7 +514,12 @@ function handleIngestionTrigger(): void {
function handleRebuildPackages(requestId: string): void {
fetchQueue = fetchQueue.then(async () => {
try {
await withTdlibMutex("rebuild-packages", () =>
const accounts = await getActiveAccounts();
if (accounts.length === 0) {
log.warn("No authenticated accounts to rebuild packages");
return;
}
await withTdlibMutex(accounts[0].phone, "rebuild-packages", () =>
rebuildPackageDatabase(requestId)
);
} catch (err) {
@@ -511,3 +527,38 @@ function handleRebuildPackages(requestId: string): void {
}
});
}
// ── Manual upload handler ──
function handleManualUpload(uploadId: string): void {
fetchQueue = fetchQueue
.then(() => processManualUpload(uploadId))
.catch((err) => log.error({ err, uploadId }, "Manual upload processing failed"));
}
// ── Backfill file-list handler ──
//
// Trigger via:
// SELECT pg_notify('backfill_filelists', '{"limit":50,"archiveType":"RAR"}');
//
// Both fields are optional. archiveType filters to one of ZIP/RAR/SEVEN_Z.
// Default limit is 100. Each request is chained onto fetchQueue, so multiple
// notifications run sequentially (no concurrent TDLib downloads competing for the mutex).
function handleBackfillFilelists(payload: string): void {
fetchQueue = fetchQueue
.then(() => processBackfillRequest(payload))
.catch((err) => log.error({ err, payload }, "Backfill request failed"));
}
// ── Slicer tag backfill handler ──
//
// Trigger:
// SELECT pg_notify('backfill_slicer_tags', '{"limit":1000}');
//
// Pure-DB pass over Packages that have file lists but no slicer tags.
// No downloads, no TDLib involvement — fast and safe.
function handleBackfillSlicerTags(payload: string): void {
fetchQueue = fetchQueue
.then(() => processSlicerTagBackfill(payload))
.catch((err) => log.error({ err, payload }, "Slicer tag backfill failed"));
}
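The handlers above now take the TDLib mutex per account (`account.phone`, or `request?.account?.phone ?? "global"` for channel fetches) instead of one process-wide lock. The real `withTdlibMutex` is not part of this diff. The following is a minimal sketch of one way a keyed async mutex with the same `(key, label, fn)` call shape could work, assuming per-key promise chaining; `KeyedMutex` and `run` are hypothetical names.

```typescript
// Hypothetical keyed async mutex: calls sharing a key run one at a time,
// calls with different keys may interleave freely.
class KeyedMutex {
  // Tail of the wait chain for each key.
  private tails = new Map<string, Promise<void>>();

  async run<T>(key: string, _label: string, fn: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(key) ?? Promise.resolve();
    let release: () => void = () => {};
    const next = new Promise<void>((resolve) => {
      release = () => resolve();
    });
    // The new tail settles only after the previous holder AND this call finish.
    const tail = prev.then(() => next);
    this.tails.set(key, tail);
    await prev; // never rejects: every tail is built from non-rejecting promises
    try {
      return await fn();
    } finally {
      release();
      // Drop the entry if nobody queued behind us, so the map does not grow forever.
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}
```

`_label` is unused here and only mirrors the call shape; a real implementation might log it. Errors thrown by `fn` still propagate to the caller, and the `finally` block guarantees the next waiter for that key is released.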


@@ -1,7 +1,8 @@
import type { Client } from "tdl";
import type { TelegramPhoto } from "./preview/match.js";
import { downloadPhotoThumbnail } from "./tdlib/download.js";
import { createOrFindPackageGroup, linkPackagesToGroup } from "./db/queries.js";
import { createOrFindPackageGroup, linkPackagesToGroup, createTimeWindowGroup, createAutoGroup } from "./db/queries.js";
import { config } from "./util/config.js";
import { childLogger } from "./util/logger.js";
import { db } from "./db/client.js";
@@ -77,3 +78,591 @@ export async function processAlbumGroups(
}
}
}
/**
* Apply learned GroupingRules from manual overrides.
* For each rule, find ungrouped packages whose fileName contains the pattern.
*/
export async function processRuleBasedGroups(
sourceChannelId: string,
indexedPackages: IndexedPackageRef[]
): Promise<void> {
const rules = await db.groupingRule.findMany({
where: { sourceChannelId },
orderBy: { confidence: "desc" },
});
if (rules.length === 0) return;
const ungrouped = await db.package.findMany({
where: {
id: { in: indexedPackages.map((p) => p.packageId) },
packageGroupId: null,
},
select: { id: true, fileName: true, creator: true },
});
if (ungrouped.length < 2) return;
for (const rule of rules) {
const matches = ungrouped.filter((pkg) => {
const lower = rule.pattern.toLowerCase();
return pkg.fileName.toLowerCase().includes(lower) ||
(pkg.creator && pkg.creator.toLowerCase().includes(lower));
});
if (matches.length < 2) continue;
// Check if any are already grouped (by a previous rule in this loop)
const stillUngrouped = await db.package.findMany({
where: {
id: { in: matches.map((m) => m.id) },
packageGroupId: null,
},
select: { id: true },
});
if (stillUngrouped.length < 2) continue;
try {
const groupId = await createAutoGroup({
sourceChannelId,
name: rule.pattern,
packageIds: stillUngrouped.map((m) => m.id),
groupingSource: "MANUAL",
});
log.info(
{ groupId, ruleId: rule.id, pattern: rule.pattern, memberCount: stillUngrouped.length },
"Applied learned grouping rule"
);
} catch (err) {
log.warn({ err, ruleId: rule.id }, "Failed to apply grouping rule");
}
}
}
/**
* After album grouping, cluster remaining ungrouped packages from the same channel
* that were posted within a configurable time window.
* Only groups packages that were just indexed in this scan cycle (the `indexedPackages` list).
*/
export async function processTimeWindowGroups(
sourceChannelId: string,
indexedPackages: IndexedPackageRef[]
): Promise<void> {
if (config.autoGroupTimeWindowMinutes <= 0) return;
// Find which of the just-indexed packages are still ungrouped
const ungrouped = await db.package.findMany({
where: {
id: { in: indexedPackages.map((p) => p.packageId) },
packageGroupId: null,
},
orderBy: { sourceMessageId: "asc" },
select: {
id: true,
fileName: true,
sourceMessageId: true,
indexedAt: true,
},
});
if (ungrouped.length < 2) return;
const windowMs = config.autoGroupTimeWindowMinutes * 60 * 1000;
// Cluster by time proximity: walk through sorted list, start new cluster when gap > window
const clusters: typeof ungrouped[] = [];
let current: typeof ungrouped = [ungrouped[0]];
for (let i = 1; i < ungrouped.length; i++) {
const prev = current[current.length - 1];
const gap = Math.abs(ungrouped[i].indexedAt.getTime() - prev.indexedAt.getTime());
if (gap <= windowMs) {
current.push(ungrouped[i]);
} else {
clusters.push(current);
current = [ungrouped[i]];
}
}
clusters.push(current);
// Create groups for clusters with 2+ packages
for (const cluster of clusters) {
if (cluster.length < 2) continue;
// Derive group name from common filename prefix
const name = findCommonPrefix(cluster.map((p) => p.fileName)) || cluster[0].fileName;
try {
const groupId = await createTimeWindowGroup({
sourceChannelId,
name,
packageIds: cluster.map((p) => p.id),
});
log.info(
{ groupId, name, memberCount: cluster.length },
"Created time-window group"
);
} catch (err) {
log.warn({ err, clusterSize: cluster.length }, "Failed to create time-window group");
}
}
}
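The clustering loop in `processTimeWindowGroups` is a single pass: a new cluster starts whenever the gap to the previous member exceeds the window. Below is a standalone sketch of that gap-based pass over plain timestamps so the behavior can be checked in isolation; `clusterByGap` and `Stamped` are hypothetical names. One difference is deliberate: the source sorts by `sourceMessageId` but measures gaps on `indexedAt` (a name that suggests indexing time rather than posting time), whereas this sketch sorts by the same timestamp it measures, so each comparison is between true neighbours in time.

```typescript
// Items carrying an id and a numeric timestamp in milliseconds.
interface Stamped {
  id: string;
  at: number;
}

// Split items into clusters where consecutive gaps are <= windowMs.
function clusterByGap<T extends Stamped>(items: T[], windowMs: number): T[][] {
  if (items.length === 0) return [];
  const sorted = [...items].sort((a, b) => a.at - b.at);
  const clusters: T[][] = [];
  let current: T[] = [sorted[0]];
  for (let i = 1; i < sorted.length; i++) {
    const gap = sorted[i].at - current[current.length - 1].at;
    if (gap <= windowMs) {
      current.push(sorted[i]);
    } else {
      clusters.push(current);
      current = [sorted[i]];
    }
  }
  clusters.push(current);
  return clusters;
}

// Usage: with a 5-minute window, a, b and c chain together (each gap is 4 min)
// even though a and c are 8 min apart; d starts a new cluster.
const min = 60_000;
const groups = clusterByGap(
  [
    { id: "a", at: 0 },
    { id: "c", at: 8 * min },
    { id: "b", at: 4 * min },
    { id: "d", at: 20 * min },
  ],
  5 * min
);
// group ids: [["a", "b", "c"], ["d"]]
```

Because membership is decided pairwise between neighbours, a cluster can span far more than the window itself; this chaining behavior is shared by the loop in the source.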
/**
* Group ungrouped packages that share a date pattern (YYYY-MM, YYYY_MM, etc.)
* or project slug extracted from their filenames.
*/
export async function processPatternGroups(
sourceChannelId: string,
indexedPackages: IndexedPackageRef[]
): Promise<void> {
const ungrouped = await db.package.findMany({
where: {
id: { in: indexedPackages.map((p) => p.packageId) },
packageGroupId: null,
},
select: { id: true, fileName: true },
});
if (ungrouped.length < 2) return;
// Group by extracted pattern
const patternMap = new Map<string, typeof ungrouped>();
for (const pkg of ungrouped) {
const pattern = extractPattern(pkg.fileName);
if (!pattern) continue;
const group = patternMap.get(pattern) ?? [];
group.push(pkg);
patternMap.set(pattern, group);
}
for (const [pattern, members] of patternMap) {
if (members.length < 2) continue;
try {
const groupId = await createAutoGroup({
sourceChannelId,
name: pattern,
packageIds: members.map((m) => m.id),
groupingSource: "AUTO_PATTERN",
});
log.info(
{ groupId, pattern, memberCount: members.length },
"Created pattern-based group"
);
} catch (err) {
log.warn({ err, pattern }, "Failed to create pattern group");
}
}
}
/**
* Extract a grouping pattern from a filename.
* Matches: YYYY-MM, YYYY_MM, "Month Year", or a project prefix before common separators.
* Returns null if no usable pattern found.
*/
function extractPattern(fileName: string): string | null {
// Strip extension for matching
const name = fileName.replace(/\.(zip|rar|7z|pdf|stl)(\.\d+)?$/i, "");
// Match YYYY-MM or YYYY_MM patterns
const dateMatch = name.match(/(\d{4})[\-_](\d{2})/);
if (dateMatch) {
return `${dateMatch[1]}-${dateMatch[2]}`;
}
// Match "Month Year" patterns (e.g., "January 2025", "Jan 2025")
const months = "(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|jun(?:e)?|jul(?:y)?|aug(?:ust)?|sep(?:tember)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)";
const monthYearMatch = name.match(new RegExp(`(${months})\\s*(\\d{4})`, "i"));
if (monthYearMatch) {
const monthStr = monthYearMatch[1].toLowerCase().slice(0, 3);
const monthNum = ["jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec"].indexOf(monthStr) + 1;
if (monthNum > 0) {
return `${monthYearMatch[2]}-${String(monthNum).padStart(2, "0")}`;
}
}
// Match project prefix: at least 5 chars of text before a " - " separator or an opening "(".
const prefixMatch = name.match(/^(.{5,}?)(?:\s*[\-]\s|\s*\()/);
if (prefixMatch) {
return prefixMatch[1].trim();
}
return null;
}
/**
* Group ungrouped packages that share the same creator within a channel.
* Only groups if there are 3+ packages from the same creator (to avoid
* over-grouping when a creator only has a couple files).
*/
export async function processCreatorGroups(
sourceChannelId: string,
indexedPackages: IndexedPackageRef[]
): Promise<void> {
const ungrouped = await db.package.findMany({
where: {
id: { in: indexedPackages.map((p) => p.packageId) },
packageGroupId: null,
creator: { not: null },
},
select: { id: true, fileName: true, creator: true },
});
if (ungrouped.length < 3) return;
// Group by creator
const creatorMap = new Map<string, typeof ungrouped>();
for (const pkg of ungrouped) {
if (!pkg.creator) continue;
const key = pkg.creator.toLowerCase();
const group = creatorMap.get(key) ?? [];
group.push(pkg);
creatorMap.set(key, group);
}
for (const [, members] of creatorMap) {
if (members.length < 3) continue;
const creatorName = members[0].creator!;
const name = findCommonPrefix(members.map((m) => m.fileName)) || creatorName;
try {
const groupId = await createAutoGroup({
sourceChannelId,
name,
packageIds: members.map((m) => m.id),
groupingSource: "AUTO_PATTERN",
});
log.info(
{ groupId, creator: creatorName, memberCount: members.length },
"Created creator-based group"
);
} catch (err) {
log.warn({ err, creator: creatorName }, "Failed to create creator group");
}
}
}
/**
* Group ungrouped packages that share the same root folder inside their archives.
* E.g., if two packages both contain files under "ProjectX/", they're likely related.
* Only considers packages with 3+ files (to avoid false positives from flat archives).
*/
export async function processZipPathGroups(
sourceChannelId: string,
indexedPackages: IndexedPackageRef[]
): Promise<void> {
// Find ungrouped packages that have indexed files
const ungrouped = await db.package.findMany({
where: {
id: { in: indexedPackages.map((p) => p.packageId) },
packageGroupId: null,
fileCount: { gte: 3 },
},
select: {
id: true,
fileName: true,
files: {
select: { path: true },
take: 50,
},
},
});
if (ungrouped.length < 2) return;
// Extract the dominant root folder for each package
const packageRoots = new Map<string, { id: string; fileName: string }[]>();
for (const pkg of ungrouped) {
const root = extractRootFolder(pkg.files.map((f) => f.path));
if (!root) continue;
const key = root.toLowerCase();
const group = packageRoots.get(key) ?? [];
group.push({ id: pkg.id, fileName: pkg.fileName });
packageRoots.set(key, group);
}
// Create groups for roots shared by 2+ packages
for (const [root, members] of packageRoots) {
if (members.length < 2) continue;
try {
const groupId = await createAutoGroup({
sourceChannelId,
name: root,
packageIds: members.map((m) => m.id),
groupingSource: "AUTO_ZIP",
});
log.info(
{ groupId, rootFolder: root, memberCount: members.length },
"Created ZIP path prefix group"
);
} catch (err) {
log.warn({ err, rootFolder: root }, "Failed to create ZIP path group");
}
}
}
/**
* Group ungrouped packages that reply to the same root message.
* If message B and C both reply to message A, they're grouped together.
*/
export async function processReplyChainGroups(
sourceChannelId: string,
indexedPackages: IndexedPackageRef[]
): Promise<void> {
const ungrouped = await db.package.findMany({
where: {
id: { in: indexedPackages.map((p) => p.packageId) },
packageGroupId: null,
replyToMessageId: { not: null },
},
select: {
id: true,
fileName: true,
replyToMessageId: true,
},
});
if (ungrouped.length < 2) return;
// Group by replyToMessageId
const replyMap = new Map<string, typeof ungrouped>();
for (const pkg of ungrouped) {
if (!pkg.replyToMessageId) continue;
const key = pkg.replyToMessageId.toString();
const group = replyMap.get(key) ?? [];
group.push(pkg);
replyMap.set(key, group);
}
for (const [replyId, members] of replyMap) {
if (members.length < 2) continue;
const name = findCommonPrefix(members.map((m) => m.fileName)) || members[0].fileName;
try {
const groupId = await createAutoGroup({
sourceChannelId,
name,
packageIds: members.map((m) => m.id),
groupingSource: "AUTO_REPLY" as const,
});
log.info(
{ groupId, replyToMessageId: replyId, memberCount: members.length },
"Created reply-chain group"
);
} catch (err) {
log.warn({ err, replyToMessageId: replyId }, "Failed to create reply-chain group");
}
}
}
/**
* Group ungrouped packages with similar captions from the same channel.
* Uses normalized caption comparison — two captions match if they share
* the same significant words (ignoring common words and file extensions).
*/
export async function processCaptionGroups(
sourceChannelId: string,
indexedPackages: IndexedPackageRef[]
): Promise<void> {
const ungrouped = await db.package.findMany({
where: {
id: { in: indexedPackages.map((p) => p.packageId) },
packageGroupId: null,
sourceCaption: { not: null },
},
select: {
id: true,
fileName: true,
sourceCaption: true,
},
});
if (ungrouped.length < 2) return;
// Group by normalized caption key
const captionMap = new Map<string, typeof ungrouped>();
for (const pkg of ungrouped) {
if (!pkg.sourceCaption) continue;
const key = normalizeCaptionKey(pkg.sourceCaption);
if (!key) continue;
const group = captionMap.get(key) ?? [];
group.push(pkg);
captionMap.set(key, group);
}
for (const [, members] of captionMap) {
if (members.length < 2) continue;
const name = members[0].sourceCaption!.slice(0, 80);
try {
const groupId = await createAutoGroup({
sourceChannelId,
name,
packageIds: members.map((m) => m.id),
groupingSource: "AUTO_CAPTION" as const,
});
log.info(
{ groupId, memberCount: members.length },
"Created caption-match group"
);
} catch (err) {
log.warn({ err }, "Failed to create caption group");
}
}
}
/**
 * Normalize a caption for grouping: lowercase, strip archive/model file extensions
 * and punctuation, keep significant words (3+ chars, minus a short stopword list),
 * sort, and join. Digits are kept.
* Two captions with the same key are considered a match.
*/
function normalizeCaptionKey(caption: string): string | null {
const stripped = caption
.toLowerCase()
.replace(/\.(zip|rar|7z|stl|pdf|obj|gcode)(\.\d+)?/gi, "")
.replace(/[^a-z0-9\s]/g, " ");
const words = stripped
.split(/\s+/)
.filter((w) => w.length >= 3)
.filter((w) => !["the", "and", "for", "with", "from", "part", "file", "files"].includes(w));
if (words.length < 2) return null;
return words.sort().join(" ");
}
/**
* Extract the dominant root folder from a list of archive file paths.
* Returns the first path segment that appears in >50% of files.
* Returns null for flat archives or archives with no common root.
*/
function extractRootFolder(paths: string[]): string | null {
if (paths.length === 0) return null;
// Count first path segments
const segmentCounts = new Map<string, number>();
for (const p of paths) {
// Normalize separators and get first segment
const normalized = p.replace(/\\/g, "/");
const firstSlash = normalized.indexOf("/");
if (firstSlash <= 0) continue; // Skip root-level files
const segment = normalized.slice(0, firstSlash);
// Skip common noise folders
if (segment === "__MACOSX" || segment === ".DS_Store" || segment === "Thumbs.db") continue;
segmentCounts.set(segment, (segmentCounts.get(segment) ?? 0) + 1);
}
if (segmentCounts.size === 0) return null;
// Find the most common segment
let maxSegment = "";
let maxCount = 0;
for (const [seg, count] of segmentCounts) {
if (count > maxCount) {
maxSegment = seg;
maxCount = count;
}
}
// Must appear in >50% of files and be at least 3 chars
if (maxCount < paths.length * 0.5 || maxSegment.length < 3) return null;
return maxSegment;
}
/**
* Detect packages that could have been grouped differently.
* Checks if any grouped package's filename matches a GroupingRule
* that would place it in a different group.
*/
export async function detectGroupingConflicts(
sourceChannelId: string,
indexedPackages: IndexedPackageRef[]
): Promise<void> {
const rules = await db.groupingRule.findMany({
where: { sourceChannelId },
});
if (rules.length === 0) return;
const grouped = await db.package.findMany({
where: {
id: { in: indexedPackages.map((p) => p.packageId) },
packageGroupId: { not: null },
},
select: {
id: true,
fileName: true,
packageGroupId: true,
packageGroup: { select: { name: true, groupingSource: true } },
},
});
for (const pkg of grouped) {
for (const rule of rules) {
if (pkg.fileName.toLowerCase().includes(rule.pattern.toLowerCase())) {
// Check if the rule's source group is different from current group
if (rule.createdByGroupId && rule.createdByGroupId !== pkg.packageGroupId) {
try {
await db.systemNotification.create({
data: {
type: "GROUPING_CONFLICT",
severity: "INFO",
title: `Potential grouping conflict: ${pkg.fileName}`,
message: `Grouped by ${pkg.packageGroup?.groupingSource ?? "unknown"} into "${pkg.packageGroup?.name}", but also matches rule "${rule.pattern}" from a different manual group`,
context: {
packageId: pkg.id,
fileName: pkg.fileName,
currentGroupId: pkg.packageGroupId,
matchedRuleId: rule.id,
matchedPattern: rule.pattern,
},
},
});
} catch {
// Best-effort
}
break; // One notification per package
}
}
}
}
}
/**
* Find the longest common prefix among a list of filenames,
 * trimming trailing separator characters (whitespace, "-", "_", ".", "(").
 * Returns "" when the trimmed prefix is shorter than 3 characters.
 */
function findCommonPrefix(names: string[]): string {
if (names.length === 0) return "";
if (names.length === 1) return names[0];
let prefix = names[0];
for (let i = 1; i < names.length; i++) {
while (!names[i].startsWith(prefix)) {
prefix = prefix.slice(0, -1);
if (prefix.length === 0) return "";
}
}
// Trim trailing separator characters
const trimmed = prefix.replace(/[\s\-_.(]+$/, "");
return trimmed.length >= 3 ? trimmed : "";
}
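`findCommonPrefix` shrinks the candidate one character at a time against each name. An equivalent formulation, sketched here as an alternative rather than the project's code, uses the fact that the longest common prefix of a set of strings equals the common prefix of its lexicographically smallest and largest members. `commonPrefixMinMax` is a hypothetical name; the post-processing copies the source's trim-and-minimum-length rule.

```typescript
// Longest common prefix via lexicographic extremes. JS string comparison and
// indexing both work on UTF-16 code units, matching startsWith in the original.
function commonPrefixMinMax(names: string[]): string {
  if (names.length === 0) return "";
  if (names.length === 1) return names[0];
  let lo = names[0];
  let hi = names[0];
  for (const n of names) {
    if (n < lo) lo = n;
    if (n > hi) hi = n;
  }
  let i = 0;
  while (i < lo.length && i < hi.length && lo[i] === hi[i]) i++;
  // Same post-processing as findCommonPrefix: trim trailing separators, require 3+ chars.
  const trimmed = lo.slice(0, i).replace(/[\s\-_.(]+$/, "");
  return trimmed.length >= 3 ? trimmed : "";
}

// Usage: three parts of one set share "Dragon Set - Part ", trimmed to "Dragon Set - Part".
const setName = commonPrefixMinMax([
  "Dragon Set - Part 2.zip",
  "Dragon Set - Part 1.zip",
  "Dragon Set - Part 10.zip",
]);
```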


@@ -27,6 +27,33 @@ async function main(): Promise<void> {
await cleanupTempDir();
await markStaleRunsAsFailed();
// Release any advisory locks orphaned by a previous worker instance.
// When Docker kills a container, PostgreSQL may keep the session alive
// (zombie connections), holding advisory locks that block the new worker.
try {
const result = await pool.query(`
SELECT pid, state, left(query, 80) as query, age(clock_timestamp(), state_change) as idle_time
FROM pg_stat_activity
WHERE datname = current_database()
AND pid != pg_backend_pid()
AND state = 'idle'
AND query LIKE '%pg_try_advisory_lock%'
AND state_change < clock_timestamp() - interval '5 minutes'
`);
for (const row of result.rows) {
log.warn(
{ pid: row.pid, idleTime: row.idle_time, query: row.query },
"Terminating stale advisory lock session from previous worker"
);
await pool.query("SELECT pg_terminate_backend($1)", [row.pid]);
}
if (result.rows.length > 0) {
log.info({ terminated: result.rows.length }, "Cleaned up stale advisory lock sessions");
}
} catch (err) {
log.warn({ err }, "Failed to clean up stale advisory locks (non-fatal)");
}
// Verify destination messages exist for all "uploaded" packages.
// Resets any packages whose dest message is missing so they get re-processed.
await recoverIncompleteUploads();

worker/src/manual-upload.ts (new file, 211 lines)

@@ -0,0 +1,211 @@
import path from "path";
import { rm } from "fs/promises";
import { db } from "./db/client.js";
import { childLogger } from "./util/logger.js";
import { config } from "./util/config.js";
import { hashParts } from "./archive/hash.js";
import { byteLevelSplit } from "./archive/split.js";
import { uploadToChannel } from "./upload/channel.js";
import { createTdlibClient, closeTdlibClient } from "./tdlib/client.js";
import { readZipCentralDirectory } from "./archive/zip-reader.js";
import { readRarContents } from "./archive/rar-reader.js";
import { read7zContents } from "./archive/sevenz-reader.js";
import { getActiveAccounts } from "./db/queries.js";
const log = childLogger("manual-upload");
export async function processManualUpload(uploadId: string): Promise<void> {
log.info({ uploadId }, "Processing manual upload");
const upload = await db.manualUpload.findUnique({
where: { id: uploadId },
include: { files: true },
});
if (!upload || upload.status !== "PENDING") {
log.warn({ uploadId }, "Manual upload not found or not pending");
return;
}
await db.manualUpload.update({
where: { id: uploadId },
data: { status: "PROCESSING" },
});
try {
// Get destination channel
const destSetting = await db.globalSetting.findUnique({
where: { key: "destination_channel_id" },
});
if (!destSetting) throw new Error("No destination channel configured");
const destChannel = await db.telegramChannel.findFirst({
where: { id: destSetting.value, type: "DESTINATION", isActive: true },
});
if (!destChannel) throw new Error("Destination channel not found or inactive");
// Get a TDLib client (use first active account)
const accounts = await getActiveAccounts();
const account = accounts[0];
if (!account) throw new Error("No authenticated Telegram account available");
const { client } = await createTdlibClient({ id: account.id, phone: account.phone });
try {
const packageIds: string[] = [];
for (const file of upload.files) {
try {
const filePath = file.filePath;
const fileName = file.fileName;
const fileSize = file.fileSize;
log.info({ fileName, fileSize: Number(fileSize) }, "Processing file");
// Determine archive type
let archiveType: "ZIP" | "RAR" | "SEVEN_Z" | "DOCUMENT" = "DOCUMENT";
const ext = fileName.toLowerCase();
if (ext.endsWith(".zip")) archiveType = "ZIP";
else if (ext.endsWith(".rar")) archiveType = "RAR";
else if (ext.endsWith(".7z")) archiveType = "SEVEN_Z";
// Hash the file
const contentHash = await hashParts([filePath]);
// Check for duplicates
const existing = await db.package.findFirst({
where: { contentHash, destMessageId: { not: null } },
select: { id: true },
});
if (existing) {
log.info({ fileName, contentHash }, "Duplicate file, skipping upload");
await db.manualUploadFile.update({
where: { id: file.id },
data: { packageId: existing.id },
});
packageIds.push(existing.id);
continue;
}
// Read archive metadata
let entries: {
path: string;
fileName: string;
extension: string | null;
compressedSize: bigint;
uncompressedSize: bigint;
crc32: string | null;
}[] = [];
try {
if (archiveType === "ZIP") entries = await readZipCentralDirectory([filePath]);
else if (archiveType === "RAR") entries = await readRarContents(filePath);
else if (archiveType === "SEVEN_Z") entries = await read7zContents(filePath);
} catch {
log.debug({ fileName }, "Could not read archive metadata");
}
// Split if needed
const MAX_UPLOAD_SIZE = BigInt(config.maxPartSizeMB) * 1024n * 1024n;
let uploadPaths = [filePath];
if (fileSize > MAX_UPLOAD_SIZE) {
uploadPaths = await byteLevelSplit(filePath);
}
// Upload to Telegram
const destResult = await uploadToChannel(
client,
destChannel.telegramId,
uploadPaths
);
// Create package record
const pkg = await db.package.create({
data: {
contentHash,
fileName,
fileSize,
archiveType,
sourceChannelId: destChannel.id,
sourceMessageId: destResult.messageId,
destChannelId: destChannel.id,
destMessageId: destResult.messageId,
destMessageIds: destResult.messageIds,
isMultipart: uploadPaths.length > 1,
partCount: uploadPaths.length,
fileCount: entries.length,
files: entries.length > 0 ? { create: entries } : undefined,
},
});
await db.manualUploadFile.update({
where: { id: file.id },
data: { packageId: pkg.id },
});
packageIds.push(pkg.id);
log.info({ fileName, packageId: pkg.id }, "File processed and uploaded");
// Clean up split files (but not the original)
if (uploadPaths.length > 1) {
for (const splitPath of uploadPaths) {
if (splitPath !== filePath) {
await rm(splitPath, { force: true }).catch(() => {});
}
}
}
} catch (fileErr) {
log.error({ err: fileErr, fileName: file.fileName }, "Failed to process file");
}
}
// Group packages if multiple files
if (packageIds.length >= 2) {
const groupName =
upload.groupName ?? upload.files[0].fileName.replace(/\.[^.]+$/, "");
const group = await db.packageGroup.create({
data: {
name: groupName,
sourceChannelId: destChannel.id,
groupingSource: "MANUAL",
},
});
await db.package.updateMany({
where: { id: { in: packageIds } },
data: { packageGroupId: group.id },
});
log.info(
{ groupId: group.id, groupName, packageCount: packageIds.length },
"Created group for uploaded files"
);
}
await db.manualUpload.update({
where: { id: uploadId },
data: { status: "COMPLETED", completedAt: new Date() },
});
log.info(
{ uploadId, fileCount: upload.files.length, packageCount: packageIds.length },
"Manual upload completed"
);
} finally {
await closeTdlibClient(client);
}
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
log.error({ err, uploadId }, "Manual upload failed");
await db.manualUpload.update({
where: { id: uploadId },
data: { status: "FAILED", errorMessage: message },
});
}
// Clean up uploaded files
try {
const uploadDir = path.join("/data/uploads", uploadId);
await rm(uploadDir, { recursive: true, force: true });
} catch {
// Best-effort cleanup
}
}
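`processManualUpload` splits any file larger than `config.maxPartSizeMB` with `byteLevelSplit`, whose internals are not in this diff. As an illustration of the size arithmetic only, here is a sketch that plans fixed-size byte ranges with `bigint` (the source compares `fileSize` as a `bigint`). `planByteRanges` is hypothetical, and it assumes every part except the last is exactly the maximum size; the real splitter may choose sizes differently.

```typescript
// Hypothetical helper: half-open byte ranges [start, end) for a fixed-size split.
// Assumes all parts are exactly maxPartBytes except possibly the last one.
function planByteRanges(fileSize: bigint, maxPartBytes: bigint): Array<[bigint, bigint]> {
  if (maxPartBytes <= 0n) throw new Error("maxPartBytes must be positive");
  // Mirrors the source's condition: only split when fileSize > the limit.
  if (fileSize <= maxPartBytes) return [[0n, fileSize]];
  const ranges: Array<[bigint, bigint]> = [];
  for (let start = 0n; start < fileSize; start += maxPartBytes) {
    const end = start + maxPartBytes < fileSize ? start + maxPartBytes : fileSize;
    ranges.push([start, end]);
  }
  return ranges;
}

// Usage: a 4500 MiB file with a 2000 MiB limit needs 3 parts (2000 + 2000 + 500 MiB).
const MiB = 1024n * 1024n;
const parts = planByteRanges(4500n * MiB, 2000n * MiB);
```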


@@ -63,7 +63,7 @@ export async function rebuildPackageDatabase(
}
const account = accounts[0];
const client = await createTdlibClient({
const { client } = await createTdlibClient({
id: account.id,
phone: account.phone,
});
@@ -308,14 +308,15 @@ async function scanDestinationChannel(
}>(client, {
_: "searchChatMessages",
chat_id: Number(chatId),
// No topic context for a flat destination scan. TDLib 1.8.64+ replaced
// `message_thread_id` / `saved_messages_topic_id` with a single
// optional `topic_id`; for a flat scan we just omit it.
query: "",
from_message_id: currentFromId,
offset: 0,
limit: 100,
filter: { _: "searchMessagesFilterDocument" },
sender_id: null,
message_thread_id: 0,
saved_messages_topic_id: 0,
});
if (!result.messages || result.messages.length === 0) break;
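`scanDestinationChannel` pages through `searchChatMessages` with `from_message_id` as a cursor and stops on an empty page. Below is a sketch of that cursor loop against a stand-in page fetcher, so the termination logic can be tested without TDLib. `FetchPage` and `collectAll` are hypothetical; the sketch assumes each page is newest-first and that `0` means "start from the newest message". Because it does not rely on whether the boundary message is included, it deduplicates by ID and also stops if the cursor fails to move.

```typescript
interface Msg {
  id: number;
}

// Stand-in for client.invoke({ _: "searchChatMessages", from_message_id, limit, ... }).
type FetchPage = (fromMessageId: number, limit: number) => Promise<Msg[]>;

async function collectAll(fetchPage: FetchPage, limit = 100): Promise<Msg[]> {
  const seen = new Map<number, Msg>();
  let cursor = 0; // 0 = start from the newest message
  for (;;) {
    const page = await fetchPage(cursor, limit);
    if (page.length === 0) break; // same exit as the source loop
    for (const m of page) seen.set(m.id, m);
    const oldest = Math.min(...page.map((m) => m.id));
    // Guard against an inclusive boundary returning the same message forever.
    if (cursor !== 0 && oldest >= cursor) break;
    cursor = oldest;
  }
  return [...seen.values()];
}
```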


@@ -63,7 +63,7 @@ export async function recoverIncompleteUploads(): Promise<void> {
let client: Client | undefined;
try {
client = await createTdlibClient({ id: account.id, phone: account.phone });
({ client } = await createTdlibClient({ id: account.id, phone: account.phone }));
// Load the chat list so TDLib can resolve chat IDs
try {
@@ -78,18 +78,31 @@ export async function recoverIncompleteUploads(): Promise<void> {
let resetCount = 0;
let verifiedCount = 0;
let unknownCount = 0;
let wrongContentCount = 0;
// Batch size for getMessages. TDLib accepts up to ~100 IDs per call.
// Using 100 means 20k packages → ~200 round-trips instead of 20k.
const BATCH_SIZE = 100;
for (const [, channelPackages] of byChannel) {
for (const pkg of channelPackages) {
const exists = await verifyMessageExists(
// Group packages by destChannelId (already done) — within each group,
// process in batches via getMessages (plural).
for (let i = 0; i < channelPackages.length; i += BATCH_SIZE) {
const batch = channelPackages.slice(i, i + BATCH_SIZE);
const batchResults = await verifyMessagesBatch(
client,
destChannel.telegramId,
pkg.destMessageId!
batch.map((p) => p.destMessageId!)
);
if (exists) {
for (let j = 0; j < batch.length; j++) {
const pkg = batch[j];
const result = batchResults[j];
if (result.state === "exists") {
verifiedCount++;
} else {
} else if (result.state === "deleted") {
log.warn(
{
packageId: pkg.id,
@@ -100,21 +113,50 @@ export async function recoverIncompleteUploads(): Promise<void> {
);
await resetPackageDestination(pkg.id);
resetCount++;
} else if (result.state === "wrong-content") {
// The message exists but isn't a document anymore (got cleared /
// replaced). Treat as missing so we re-upload.
log.warn(
{
packageId: pkg.id,
fileName: pkg.fileName,
destMessageId: Number(pkg.destMessageId),
contentType: result.contentType,
},
"Destination message is not a document, resetting package for re-upload"
);
await resetPackageDestination(pkg.id);
wrongContentCount++;
} else {
// Unknown — TDLib couldn't tell us. Don't reset, but DO count this
// so the summary line shows recovery wasn't 100% successful.
unknownCount++;
log.warn(
{
packageId: pkg.id,
fileName: pkg.fileName,
destMessageId: Number(pkg.destMessageId),
reason: result.reason.slice(0, 200),
},
"Could not verify destination message — will retry on next startup"
);
}
}
}
}
if (resetCount > 0 || wrongContentCount > 0 || unknownCount > 0) {
log.info(
{ resetCount, verifiedCount, totalChecked: packages.length },
"Upload recovery complete — packages reset for re-processing"
{
verifiedCount,
resetCount,
wrongContentCount,
unknownCount,
totalChecked: packages.length,
},
unknownCount === 0
? "Upload recovery complete"
: "Upload recovery complete — some packages could not be verified, will retry next startup"
);
} else {
log.info(
{ verifiedCount, totalChecked: packages.length },
"Upload recovery complete — all destination messages verified"
);
}
} catch (err) {
log.error({ err }, "Upload recovery failed (non-fatal, will retry next startup)");
} finally {
@@ -124,15 +166,77 @@ export async function recoverIncompleteUploads(): Promise<void> {
}
}
type VerifyResult =
| { state: "exists" }
| { state: "deleted" }
| { state: "wrong-content"; contentType: string }
| { state: "unknown"; reason: string };
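The `VerifyResult` union lets the caller tell "gone" from "couldn't check". A sketch of tallying such results with a compile-time exhaustiveness check, plus a summary rule under which "all verified" is reported only when nothing was reset or left unverified. `Tally`, `tally` and `summaryMessage` are hypothetical, and the message strings are illustrative rather than the worker's actual log lines.

```typescript
type VerifyResult =
  | { state: "exists" }
  | { state: "deleted" }
  | { state: "wrong-content"; contentType: string }
  | { state: "unknown"; reason: string };

interface Tally {
  verified: number;
  deleted: number;
  wrongContent: number;
  unknown: number;
}

function tally(results: VerifyResult[]): Tally {
  const t: Tally = { verified: 0, deleted: 0, wrongContent: 0, unknown: 0 };
  for (const r of results) {
    switch (r.state) {
      case "exists": t.verified++; break;
      case "deleted": t.deleted++; break;
      case "wrong-content": t.wrongContent++; break;
      case "unknown": t.unknown++; break;
      default: {
        // Adding a new state without a case makes this assignment a type error.
        const unreachable: never = r;
        throw new Error(`unhandled state: ${JSON.stringify(unreachable)}`);
      }
    }
  }
  return t;
}

// Illustrative summary: never claim "all verified" if anything was reset or unverified.
function summaryMessage(t: Tally): string {
  if (t.deleted + t.wrongContent + t.unknown === 0) return "all destination messages verified";
  return t.unknown === 0 ? "recovery complete" : "recovery complete, some packages unverified";
}
```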
/**
* Check whether a message exists in a Telegram chat.
* Returns false if the message was deleted or never existed.
* Batch version of verifyMessageExists. Calls TDLib's getMessages (plural)
* with up to ~100 message IDs at once. Returns one VerifyResult per input
* ID, in input order. Missing messages come back as null in TDLib's response
* — translated to {state: "deleted"} here.
*
* Falls back to per-message verification on any error so that one bad batch
* doesn't lose all verification for that chunk.
*/
async function verifyMessagesBatch(
client: Client,
chatTelegramId: bigint,
messageIds: bigint[]
): Promise<VerifyResult[]> {
try {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const result = (await withFloodWait(
() =>
client.invoke({
_: "getMessages",
chat_id: Number(chatTelegramId),
message_ids: messageIds.map((id) => Number(id)),
}),
"getMessages:verify"
)) as { messages?: (null | { content?: { _: string } })[] };
const messages = result.messages ?? [];
return messageIds.map((_id, i) => {
const m = messages[i];
if (!m || !m.content) return { state: "deleted" };
if (m.content._ !== "messageDocument") {
return { state: "wrong-content", contentType: String(m.content._) };
}
return { state: "exists" };
});
} catch (err) {
// If the whole batch errors out, fall back to per-message verification.
log.warn(
{ err, batchSize: messageIds.length, chatTelegramId: chatTelegramId.toString() },
"getMessages batch failed, falling back to per-message verification"
);
const out: VerifyResult[] = [];
for (const id of messageIds) {
out.push(await verifyMessageExists(client, chatTelegramId, id));
}
return out;
}
}
/**
* Check whether a message exists in a Telegram chat and is the document we
* uploaded. Returns a discriminated result instead of a bare boolean so the
* caller can distinguish "definitely gone" (reset) from "couldn't reach TG"
* (leave alone, try again next startup).
*
* Previous version conflated all non-404 errors with "exists", which masked
* recovery completely when TDLib had a degraded connection — the worker
* would log "all destination messages verified" even though it had answered
* questions it couldn't actually answer.
*/
async function verifyMessageExists(
client: Client,
chatTelegramId: bigint,
messageId: bigint
): Promise<boolean> {
): Promise<VerifyResult> {
try {
const result = await withFloodWait(
() =>
@@ -144,44 +248,37 @@ async function verifyMessageExists(
"getMessage:verify"
);
// TDLib returns the message object if it exists.
// A deleted message may return with content type "messageChatDeleteMessage"
// or the call may throw. Check that we got a real message with content.
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const msg = result as any;
if (!msg || !msg.content) {
return false;
return { state: "deleted" };
}
// Check that the message has document content (our uploads are documents)
// A message that exists but has no document content was likely cleared/replaced
if (msg.content._ !== "messageDocument") {
log.debug(
{
messageId: Number(messageId),
contentType: msg.content._,
},
"Destination message exists but is not a document"
);
return false;
return { state: "wrong-content", contentType: String(msg.content._) };
}
return true;
return { state: "exists" };
} catch (err) {
// TDLib throws "Message not found" (error code 404) for deleted messages
const message = err instanceof Error ? err.message : String(err);
const errMessage = err instanceof Error ? err.message : String(err);
const code = (err as { code?: number })?.code;
if (code === 404 || message.includes("not found") || message.includes("Not Found")) {
return false;
// Hard "the message is definitely gone" signals from TDLib:
// - error code 404 (a TDLib error code, not an HTTP response)
// - "Message not found" / "MESSAGE_ID_INVALID" error strings
const lower = errMessage.toLowerCase();
if (
code === 404 ||
lower.includes("message not found") ||
lower.includes("message_id_invalid") ||
lower.includes("messageidinvalid") ||
lower.includes("not found")
) {
return { state: "deleted" };
}
// For other errors (network issues, etc.), assume the message exists
// to avoid incorrectly resetting packages due to transient failures
log.warn(
{ err, messageId: Number(messageId) },
"Could not verify message (assuming it exists)"
);
return true;
// Everything else (network, connection, TDLib internal) is genuinely
// unknown — do NOT claim "verified".
return { state: "unknown", reason: errMessage };
}
}
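
With `VerifyResult` in place, the caller can tally outcomes without treating an unreachable Telegram as confirmation. The following is a minimal sketch of that tally plus the batching step, assuming `wrong-content` is reset just like `deleted` (as the old boolean version effectively did). `summarizeVerification`, `chunk` and the `toReset` list are hypothetical names, not the project's code.

```typescript
type VerifyResult =
  | { state: "exists" }
  | { state: "deleted" }
  | { state: "wrong-content"; contentType: string }
  | { state: "unknown"; reason: string };

interface VerificationSummary {
  verifiedCount: number;
  resetCount: number;
  wrongContentCount: number;
  unknownCount: number;
  /** Indices (into the input) of packages whose upload must be redone. */
  toReset: number[];
}

// Split IDs into getMessages-sized batches (the diff uses roughly 100).
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Hypothetical tally: "deleted" and "wrong-content" are reset (assumed);
// "unknown" is left alone so the next startup can try again.
function summarizeVerification(results: VerifyResult[]): VerificationSummary {
  const summary: VerificationSummary = {
    verifiedCount: 0,
    resetCount: 0,
    wrongContentCount: 0,
    unknownCount: 0,
    toReset: [],
  };
  results.forEach((r, i) => {
    switch (r.state) {
      case "exists":
        summary.verifiedCount++;
        break;
      case "wrong-content":
        summary.wrongContentCount++;
        summary.resetCount++;
        summary.toReset.push(i);
        break;
      case "deleted":
        summary.resetCount++;
        summary.toReset.push(i);
        break;
      case "unknown":
        summary.unknownCount++;
        break;
    }
  });
  return summary;
}
```

With this split, `unknownCount > 0` maps directly onto the "some packages could not be verified" branch of the recovery log.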

@@ -1,8 +1,9 @@
import { config } from "./util/config.js";
import { childLogger } from "./util/logger.js";
import { withTdlibMutex } from "./util/mutex.js";
import { withTdlibMutex, forceReleaseMutex } from "./util/mutex.js";
import { getActiveAccounts, getPendingAccounts } from "./db/queries.js";
import { runWorkerForAccount, authenticateAccount } from "./worker.js";
import { runIntegrityAudit } from "./audit.js";
const log = childLogger("scheduler");
@@ -18,13 +19,19 @@ let activeCyclePromise: Promise<void> | null = null;
*/
const CYCLE_TIMEOUT_MS = (parseInt(process.env.WORKER_CYCLE_TIMEOUT_MINUTES ?? "240", 10)) * 60 * 1000;
/** Read-only access to the current cycle counter for code that needs to
* apply per-cycle modulo logic (e.g. the cold-channel backoff). */
export function getCurrentCycle(): number {
return cycleCount;
}
/**
* Run one ingestion cycle:
* 1. Authenticate any PENDING accounts (triggers SMS code flow + auto-fetch channels)
* 2. Process all active AUTHENTICATED accounts for ingestion
*
* All TDLib operations are wrapped in the mutex to ensure only one client
* runs at a time (also shared with the fetch listener for on-demand requests).
* Each account's TDLib operations are wrapped in a per-key mutex so different
* accounts run concurrently while the same account is still serialized.
*
* The cycle has a configurable timeout (WORKER_CYCLE_TIMEOUT_MINUTES, default 4h).
* Once the timeout elapses, no new accounts will be started but any in-progress
@@ -54,7 +61,7 @@ async function runCycle(): Promise<void> {
log.warn("Cycle timeout reached during authentication phase, stopping");
break;
}
await withTdlibMutex(`auth:${account.phone}`, () =>
await withTdlibMutex(account.phone, `auth:${account.phone}`, () =>
authenticateAccount(account)
);
}
@@ -70,23 +77,54 @@ async function runCycle(): Promise<void> {
log.info({ accountCount: accounts.length }, "Processing accounts");
for (const account of accounts) {
if (Date.now() - cycleStart > CYCLE_TIMEOUT_MS) {
log.warn(
{ elapsed: Math.round((Date.now() - cycleStart) / 60_000), timeoutMinutes: CYCLE_TIMEOUT_MS / 60_000 },
"Cycle timeout reached, skipping remaining accounts"
);
break;
}
await withTdlibMutex(`ingest:${account.phone}`, () =>
const results = await Promise.allSettled(
accounts.map((account) => {
let timer: ReturnType<typeof setTimeout>;
return Promise.race([
withTdlibMutex(account.phone, `ingest:${account.phone}`, () =>
runWorkerForAccount(account)
),
new Promise<never>((_, reject) => {
timer = setTimeout(
() => reject(new Error(`Account ${account.phone} ingestion timed out after ${CYCLE_TIMEOUT_MS / 60_000}min`)),
CYCLE_TIMEOUT_MS
);
}),
]).finally(() => clearTimeout(timer));
})
);
for (let i = 0; i < results.length; i++) {
if (results[i].status === "rejected") {
const reason = (results[i] as PromiseRejectedResult).reason;
log.error(
{ phone: accounts[i].phone, err: reason },
"Account ingestion failed"
);
// If the cycle timed out, force-release the mutex so the next cycle
// (or other operations like fetch-channels) can proceed immediately
// instead of waiting 30 minutes for the mutex timeout.
const errMsg = reason instanceof Error ? reason.message : String(reason);
if (errMsg.includes("timed out") || errMsg.includes("mutex wait timeout")) {
forceReleaseMutex(accounts[i].phone);
}
}
}
log.info(
{ elapsed: Math.round((Date.now() - cycleStart) / 1000) },
"Ingestion cycle complete"
);
// Run integrity audit after all accounts are processed
try {
const auditResult = await runIntegrityAudit();
if (auditResult.issues > 0) {
log.info({ ...auditResult }, "Integrity audit found issues");
}
} catch (auditErr) {
log.warn({ err: auditErr }, "Integrity audit failed");
}
} catch (err) {
log.error({ err }, "Ingestion cycle failed");
} finally {
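
The new scheduler runs accounts concurrently with `Promise.allSettled`, serializes each account behind a per-key mutex, and races each run against `CYCLE_TIMEOUT_MS`. The sketch below shows that shape, assuming an in-process mutex built from one promise chain per key. `KeyedMutex` and `withTimeout` are hypothetical stand-ins, not the project's `withTdlibMutex`.

```typescript
// Hypothetical per-key mutex: tasks that share a key run one after another,
// tasks with different keys run concurrently.
class KeyedMutex {
  private tails = new Map<string, Promise<void>>();

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(key) ?? Promise.resolve();
    // Start only after the previous holder of this key has settled.
    const result = prev.then(task);
    // The stored tail never rejects, so one failed task can't block the key.
    const tail = result.then(
      () => undefined,
      () => undefined
    );
    this.tails.set(key, tail);
    // Forget idle keys so the map doesn't grow without bound.
    void tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return result;
  }
}

// Race a task against a timer and always clear the timer. The losing task is
// NOT cancelled; it keeps running in the background.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Because `Promise.race` only stops waiting, a timed-out run still holds its lock. That is why the scheduler above calls `forceReleaseMutex` when it sees a timeout.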

@@ -5,6 +5,36 @@ import { withFloodWait } from "../util/retry.js";
const log = childLogger("chats");
/**
* Collect chat folder IDs to widen the loadChats sweep across all folder
* chat lists. In TDLib 1.8.64+ there's no synchronous getChatFolders call —
* the folder list arrives via updateChatFolders. We listen for it briefly
* (200ms) and fall back to an empty list if nothing arrives; chats inside
* folders are still reachable via chatListMain so this only loses some
* preemptive cache warming.
*/
async function collectFolderIds(
client: Client
): Promise<{ _: "chatListFolder"; chat_folder_id: number }[]> {
return new Promise((resolve) => {
const ids: number[] = [];
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const handler = (update: any) => {
if (update?._ === "updateChatFolders") {
const folders = update.chat_folders as { id: number }[] | undefined;
if (folders) {
for (const f of folders) ids.push(f.id);
}
}
};
client.on("update", handler);
setTimeout(() => {
client.off("update", handler);
resolve(ids.map((id) => ({ _: "chatListFolder" as const, chat_folder_id: id })));
}, 200);
});
}
export interface TelegramChatInfo {
chatId: bigint;
title: string;
@@ -37,21 +67,16 @@ export async function getAccountChats(
// First, load all chats into TDLib's cache using loadChats (the proper API).
// loadChats returns 404 when all chats have been loaded.
// Then use getChats to retrieve the IDs for enrichment.
// Load from main, archive, AND chat folders to cover all chat types.
const folderLists: { _: "chatListFolder"; chat_folder_id: number }[] = [];
try {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const folders = (await client.invoke({ _: "getChatFolders" })) as any;
if (folders?.chat_folders) {
for (const f of folders.chat_folders) {
folderLists.push({ _: "chatListFolder", chat_folder_id: f.id });
}
}
} catch {
// getChatFolders may not be available in older TDLib versions
}
//
// TDLib 1.8.64+ no longer exposes getChatFolders as a callable method;
// folder IDs only arrive via the updateChatFolders event, which
// collectFolderIds listens for briefly. If none arrive, the chats inside
// folders are still reachable via chatListMain, so this isn't a functional
// regression.
const folderLists: { _: "chatListFolder"; chat_folder_id: number }[] =
await collectFolderIds(client);
const chatLists: Record<string, unknown>[] = [
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const chatLists: any[] = [
{ _: "chatListMain" },
{ _: "chatListArchive" },
...folderLists,
@@ -282,3 +307,63 @@ export async function searchPublicChat(
function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
/**
* Return the chat's server-side last_message.id from TDLib's local cache.
* Used by the channel-scan-skip guard to short-circuit a paginated
* searchChatMessages when nothing has changed since our watermark.
*
* Returns null when the chat has no last_message (empty channel) or the
* call fails — callers must treat null as "unknown" and run the scan.
*/
export async function getChannelLastMessageId(
client: Client,
chatId: bigint
): Promise<bigint | null> {
try {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const chat = (await client.invoke({
_: "getChat",
chat_id: Number(chatId),
})) as { last_message?: { id?: number } };
const id = chat.last_message?.id;
return id ? BigInt(id) : null;
} catch (err) {
log.debug({ err, chatId: chatId.toString() }, "getChannelLastMessageId failed");
return null;
}
}
/**
* Return the forum topic's last_message_id from TDLib. Same purpose as
* getChannelLastMessageId but scoped to a single topic in a forum
* supergroup. TDLib's `getForumTopic` returns a `forumTopic` whose
* `last_message` field holds the newest message; `info.last_message_id` is
* read only as a fallback.
*
* Returns null on failure or empty topic — caller treats as "unknown".
*/
export async function getForumTopicLastMessageId(
client: Client,
chatId: bigint,
topicId: bigint
): Promise<bigint | null> {
try {
// TDLib 1.8.64 uses `forum_topic_id` (renamed from `message_thread_id`
// in the request) — consistent with the rest of the forum-topic API
// surface in this version.
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const topic = (await client.invoke({
_: "getForumTopic",
chat_id: Number(chatId),
forum_topic_id: Number(topicId),
})) as { last_message?: { id?: number }; info?: { last_message_id?: number } };
const id = topic.last_message?.id ?? topic.info?.last_message_id;
return id ? BigInt(id) : null;
} catch (err) {
log.debug(
{ err, chatId: chatId.toString(), topicId: topicId.toString() },
"getForumTopicLastMessageId failed"
);
return null;
}
}
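
Both helpers return `null` for "unknown", and unknown must always mean "run the scan". The sketch below shows the comparison the skip guard performs, plus the kind of per-cycle modulo that `getCurrentCycle()` exists to support. It assumes a watermark that is `null` before the first successful scan. `canSkipScan`, `isScanTurn` and both thresholds are hypothetical and illustrative, not values from the worker.

```typescript
// Skip the paginated searchChatMessages only when both IDs are known and
// nothing is newer than the watermark. Any null means "unknown": scan.
function canSkipScan(lastMessageId: bigint | null, watermark: bigint | null): boolean {
  if (lastMessageId === null || watermark === null) return false;
  return lastMessageId <= watermark;
}

// Hypothetical backoff gate: after `coldAfter` truly idle scans in a row,
// only scan on every `everyNth` cycle. Thresholds are illustrative only.
function isScanTurn(
  cycle: number,
  consecutiveEmptyScans: number,
  coldAfter = 3,
  everyNth = 6
): boolean {
  if (consecutiveEmptyScans < coldAfter) return true;
  return cycle % everyNth === 0;
}
```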

@@ -6,6 +6,7 @@ import { childLogger } from "../util/logger.js";
import {
updateAccountAuthState,
getAccountAuthCode,
updateAccountPremiumStatus,
} from "../db/queries.js";
const log = childLogger("tdlib-client");
@@ -27,7 +28,7 @@ interface AccountConfig {
*/
export async function createTdlibClient(
account: AccountConfig
): Promise<Client> {
): Promise<{ client: Client; isPremium: boolean }> {
const dbPath = path.join(config.tdlibStateDir, account.id);
const client = createClient({
@@ -78,7 +79,30 @@ export async function createTdlibClient(
await updateAccountAuthState(account.id, "AUTHENTICATED");
log.info({ accountId: account.id }, "TDLib client authenticated");
return client;
let isPremium = false;
try {
const me = await client.invoke({ _: "getMe" }) as { is_premium?: boolean };
isPremium = me.is_premium ?? false;
await updateAccountPremiumStatus(account.id, isPremium);
log.info({ accountId: account.id, isPremium }, "Account Premium status detected");
} catch (err) {
log.warn({ err, accountId: account.id }, "Could not detect Premium status, defaulting to false");
}
client.on("update", (update: unknown) => {
const u = update as { _?: string; is_upload?: boolean };
if (u?._ === "updateSpeedLimitNotification") {
log.warn(
{ accountId: account.id, isUpload: u.is_upload },
u.is_upload
? "Upload speed limited by Telegram (account is not Premium)"
: "Download speed limited by Telegram (account is not Premium)"
);
}
});
return { client, isPremium };
} catch (err) {
log.error({ err, accountId: account.id }, "TDLib authentication failed");
await updateAccountAuthState(account.id, "EXPIRED");
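
The speed-limit listener narrows `unknown` with a cast. A type predicate does the same narrowing but checks the shape first. The sketch below assumes only the two fields the listener reads (`_` and `is_upload`). `isSpeedLimitUpdate` and `describeSpeedLimit` are hypothetical helpers.

```typescript
interface SpeedLimitUpdate {
  _: "updateSpeedLimitNotification";
  is_upload: boolean;
}

// Narrow an arbitrary TDLib update to the speed-limit notification.
function isSpeedLimitUpdate(update: unknown): update is SpeedLimitUpdate {
  if (typeof update !== "object" || update === null) return false;
  const u = update as { _?: unknown; is_upload?: unknown };
  return u._ === "updateSpeedLimitNotification" && typeof u.is_upload === "boolean";
}

// Same log wording as the listener in createTdlibClient.
function describeSpeedLimit(update: SpeedLimitUpdate): string {
  return update.is_upload
    ? "Upload speed limited by Telegram (account is not Premium)"
    : "Download speed limited by Telegram (account is not Premium)";
}
```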

@@ -2,13 +2,16 @@ import type { Client } from "tdl";
import { readFile, rename, copyFile, unlink, stat } from "fs/promises";
import { config } from "../util/config.js";
import { childLogger } from "../util/logger.js";
import { withFloodWait } from "../util/retry.js";
import { withFloodWait, extractFloodWaitSeconds } from "../util/retry.js";
import { isArchiveAttachment } from "../archive/detect.js";
import type { TelegramMessage } from "../archive/multipart.js";
import type { TelegramPhoto } from "../preview/match.js";
const log = childLogger("download");
/** Maximum retry attempts for stalled/failed downloads */
const MAX_DOWNLOAD_RETRIES = 3;
/** Maximum number of pages to scan per channel/topic to prevent infinite loops */
export const MAX_SCAN_PAGES = 5000;
@@ -36,6 +39,15 @@ interface TdMessage {
id: number;
date: number;
media_album_id?: string;
// TDLib 1.8.50 exposed `reply_to_message_id` directly on the message.
// 1.8.64+ replaced it with a tagged-union `reply_to: MessageReplyTo`.
// Read both for resilience across versions.
reply_to_message_id?: number;
reply_to?: {
_: string;
chat_id?: number;
message_id?: number;
};
content: {
_: string;
document?: {
@@ -47,6 +59,10 @@ interface TdMessage {
path?: string;
is_downloading_completed?: boolean;
};
remote?: {
/** Stable identifier across reposts of the same file content. */
unique_id?: string;
};
};
};
photo?: {
@@ -58,6 +74,24 @@ interface TdMessage {
};
}
/**
* Pick the right "the message I'm replying to" ID across TDLib versions.
* - 1.8.50 and earlier expose it directly as `reply_to_message_id`.
* - 1.8.64+ expose `reply_to: MessageReplyTo` (tagged union); a reply to
* a regular message has `_: "messageReplyToMessage"` with `message_id`.
* - Story replies (`_: "messageReplyToStory"`) intentionally return
* undefined here; they aren't useful for our reply-chain grouping.
*/
function extractReplyToMessageId(msg: TdMessage): bigint | undefined {
if (msg.reply_to_message_id) {
return BigInt(msg.reply_to_message_id);
}
if (msg.reply_to && msg.reply_to._ === "messageReplyToMessage" && msg.reply_to.message_id) {
return BigInt(msg.reply_to.message_id);
}
return undefined;
}
interface TdFile {
id: number;
size: number;
@@ -75,6 +109,8 @@ export interface ChannelScanResult {
archives: TelegramMessage[];
photos: TelegramPhoto[];
totalScanned: number;
/** Highest message ID seen during scan (for watermark, even when no archives found). */
maxScannedMessageId: bigint | null;
}
export type ScanProgressCallback = (messagesScanned: number) => void;
@@ -109,7 +145,10 @@ export async function invokeWithTimeout<T>(
}
}, timeoutMs);
(client.invoke(request) as Promise<T>)
// The tdl 8.1+ types are very strict about the literal `_` field;
// our generic wrapper passes arbitrary requests, so cast through any.
// eslint-disable-next-line @typescript-eslint/no-explicit-any
(client.invoke(request as any) as Promise<T>)
.then((result) => {
if (!settled) {
settled = true;
@@ -154,6 +193,7 @@ export async function getChannelMessages(
const archives: TelegramMessage[] = [];
const photos: TelegramPhoto[] = [];
const boundary = lastProcessedMessageId ? Number(lastProcessedMessageId) : null;
let maxScannedMessageId: bigint | null = null;
// Open the chat so TDLib can access it
try {
@@ -188,18 +228,26 @@ export async function getChannelMessages(
const result = await invokeWithTimeout<{ messages: TdMessage[]; total_count?: number }>(client, {
_: "searchChatMessages",
chat_id: Number(chatId),
// No topic_id for a flat (non-forum) channel scan. TDLib 1.8.64+
// dropped the top-level `message_thread_id: 0` we used to pass; outside
// a topic the field is now simply omitted.
query: "",
from_message_id: fromMessageId,
offset: 0,
limit: Math.min(limit, 100),
filter,
message_thread_id: 0,
});
if (!result.messages || result.messages.length === 0) break;
totalScanned += result.messages.length;
// Track highest message ID (first message in batch = newest, since results are newest-first)
const batchMaxId = BigInt(result.messages[0].id);
if (maxScannedMessageId === null || batchMaxId > maxScannedMessageId) {
maxScannedMessageId = batchMaxId;
}
for (const msg of result.messages) {
// Check for archive documents
const doc = msg.content?.document;
@@ -213,6 +261,9 @@ export async function getChannelMessages(
fileSize: BigInt(doc.document.size),
date: new Date(msg.date * 1000),
mediaAlbumId: msg.media_album_id && msg.media_album_id !== "0" ? msg.media_album_id : undefined,
replyToMessageId: extractReplyToMessageId(msg),
caption: msg.content?.caption?.text || undefined,
remoteUniqueId: doc.document.remote?.unique_id || undefined,
});
continue;
}
@@ -240,6 +291,11 @@ export async function getChannelMessages(
fromMessageId = result.messages[result.messages.length - 1].id;
if (result.messages.length < Math.min(limit, 100)) break;
// Early exit: searchChatMessages returns newest-first. Once the oldest
// message on this page is at or below the boundary, all remaining pages
// are even older — no new messages exist, stop scanning immediately.
if (boundary && fromMessageId <= boundary) break;
await sleep(config.apiDelayMs);
}
}
@@ -260,6 +316,7 @@ export async function getChannelMessages(
archives: archives.reverse(),
photos: photos.reverse(),
totalScanned,
maxScannedMessageId,
};
}
@@ -353,6 +410,75 @@ export async function downloadFile(
isComplete: false,
});
for (let attempt = 0; attempt <= MAX_DOWNLOAD_RETRIES; attempt++) {
try {
return await downloadFileAttempt(client, numericId, fileId, destPath, totalBytes, fileName, onProgress);
} catch (err) {
const isLastAttempt = attempt >= MAX_DOWNLOAD_RETRIES;
// Rate limit from Telegram
const waitSeconds = extractFloodWaitSeconds(err);
if (waitSeconds !== null && !isLastAttempt) {
const jitter = 1000 + Math.random() * 4000;
const waitMs = waitSeconds * 1000 + jitter;
log.warn(
{ fileName, attempt: attempt + 1, maxRetries: MAX_DOWNLOAD_RETRIES, waitSeconds },
`Download rate-limited — sleeping ${waitSeconds}s before retry`
);
await cancelDownload(client, numericId);
await sleep(waitMs);
continue;
}
// Stall, timeout, or unexpected stop — cancel and retry
const errMsg = err instanceof Error ? err.message : "";
if (
(errMsg.includes("stalled") || errMsg.includes("timed out") || errMsg.includes("stopped unexpectedly")) &&
!isLastAttempt
) {
log.warn(
{ fileName, attempt: attempt + 1, maxRetries: MAX_DOWNLOAD_RETRIES },
"Download failed — cancelling and retrying"
);
await cancelDownload(client, numericId);
await sleep(5_000);
continue;
}
throw err;
}
}
throw new Error(`Download failed after ${MAX_DOWNLOAD_RETRIES} retries for ${fileName}`);
}
/**
* Cancel an active TDLib download so it can be retried cleanly.
*/
async function cancelDownload(client: Client, fileId: number): Promise<void> {
try {
await client.invoke({
_: "cancelDownloadFile",
file_id: fileId,
only_if_pending: false,
});
log.debug({ fileId }, "Cancelled TDLib download for retry");
} catch {
// Best-effort
}
}
/**
* Single download attempt with progress tracking, stall detection, and verification.
*/
async function downloadFileAttempt(
client: Client,
numericId: number,
fileId: string,
destPath: string,
totalBytes: number,
fileName: string,
onProgress?: ProgressCallback
): Promise<void> {
return new Promise<void>((resolve, reject) => {
let lastLoggedPercent = 0;
let settled = false;
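
The `downloadFile` retry loop, like `sendWithRetry` in the upload module, relies on `extractFloodWaitSeconds` from `../util/retry.js`. That helper is not shown in this diff. After extracting the wait, each loop adds 1 to 5 seconds of jitter. The sketch below is one way such a parser could work. It assumes the error text carries either a `FLOOD_WAIT_<n>` token or a "retry after <n>" phrase, which are the formats named in the upload comments. `parseFloodWaitSeconds` and `floodWaitDelayMs` are hypothetical, not the project's implementation.

```typescript
// Hypothetical parser: pull the server-requested wait out of a TDLib error.
// Returns null when the error is not recognised as a rate limit.
function parseFloodWaitSeconds(err: unknown): number | null {
  const message = err instanceof Error ? err.message : String(err);
  const flood = /FLOOD_WAIT_(\d+)/i.exec(message);
  if (flood) return Number(flood[1]);
  const retryAfter = /retry after (\d+)/i.exec(message);
  if (retryAfter) return Number(retryAfter[1]);
  return null;
}

// Same jitter shape as the retry loops: the requested wait plus 1s..5s of
// random slack, so workers limited together don't all wake at once.
function floodWaitDelayMs(waitSeconds: number, random: () => number = Math.random): number {
  return waitSeconds * 1000 + 1000 + random() * 4000;
}
```

In this sketch, a bare 429 without a number yields `null`. The real helper may handle that case differently.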

@@ -64,7 +64,11 @@ export async function getForumTopicList(
const topics: ForumTopic[] = [];
let offsetDate = 0;
let offsetMessageId = 0;
let offsetMessageThreadId = 0;
// TDLib 1.8.64+ renamed `offset_message_thread_id` → `offset_forum_topic_id`
// in the getForumTopics request, and `next_offset_message_thread_id` →
// `next_offset_forum_topic_id` in the response. Individual topic infos
// also moved from `info.message_thread_id` → `info.forum_topic_id`.
let offsetForumTopicId = 0;
let pageCount = 0;
// eslint-disable-next-line no-constant-condition
@@ -80,12 +84,16 @@ export async function getForumTopicList(
const prevOffsetDate = offsetDate;
const prevOffsetMessageId = offsetMessageId;
const prevOffsetMessageThreadId = offsetMessageThreadId;
const prevOffsetForumTopicId = offsetForumTopicId;
const result = await invokeWithTimeout<{
topics?: {
info?: {
// Both names — 1.8.50 used the first, 1.8.64+ uses the second.
// Read both so a future TDLib downgrade or transition build is
// still handled.
message_thread_id?: number;
forum_topic_id?: number;
name?: string;
is_general?: boolean;
};
@@ -93,45 +101,49 @@ export async function getForumTopicList(
next_offset_date?: number;
next_offset_message_id?: number;
next_offset_message_thread_id?: number;
next_offset_forum_topic_id?: number;
}>(client, {
_: "getForumTopics",
chat_id: Number(chatId),
query: "",
offset_date: offsetDate,
offset_message_id: offsetMessageId,
offset_message_thread_id: offsetMessageThreadId,
offset_forum_topic_id: offsetForumTopicId,
limit: 100,
});
if (!result.topics || result.topics.length === 0) break;
for (const t of result.topics) {
if (!t.info?.message_thread_id) continue;
const topicId = t.info?.forum_topic_id ?? t.info?.message_thread_id;
if (!topicId) continue;
topics.push({
topicId: BigInt(t.info.message_thread_id),
name: t.info.is_general ? "General" : (t.info.name ?? "Unnamed"),
topicId: BigInt(topicId),
name: t.info?.is_general ? "General" : (t.info?.name ?? "Unnamed"),
});
}
// Check if there are more pages
const nextForumTopicId =
result.next_offset_forum_topic_id ?? result.next_offset_message_thread_id;
if (
!result.next_offset_date &&
!result.next_offset_message_id &&
!result.next_offset_message_thread_id
!nextForumTopicId
) {
break;
}
offsetDate = result.next_offset_date ?? 0;
offsetMessageId = result.next_offset_message_id ?? 0;
offsetMessageThreadId = result.next_offset_message_thread_id ?? 0;
offsetForumTopicId = nextForumTopicId ?? 0;
// Stuck detection: if offsets didn't advance, break
if (
offsetDate === prevOffsetDate &&
offsetMessageId === prevOffsetMessageId &&
offsetMessageThreadId === prevOffsetMessageThreadId
offsetForumTopicId === prevOffsetForumTopicId
) {
log.warn(
{ chatId: chatId.toString(), topicCount: topics.length },
@@ -178,6 +190,7 @@ export async function getTopicMessages(
const archives: TelegramMessage[] = [];
const photos: TelegramPhoto[] = [];
const boundary = lastProcessedMessageId ? Number(lastProcessedMessageId) : null;
let maxScannedMessageId: bigint | null = null;
let currentFromId = 0;
let totalScanned = 0;
@@ -209,6 +222,7 @@ export async function getTopicMessages(
document?: {
id: number;
size: number;
remote?: { unique_id?: string };
};
};
photo?: {
@@ -225,20 +239,32 @@ export async function getTopicMessages(
}>(client, {
_: "searchChatMessages",
chat_id: Number(chatId),
// TDLib 1.8.64+ replaced the top-level `message_thread_id` and
// `saved_messages_topic_id` parameters with a single tagged-union
// `topic_id: MessageTopic$Input`. For a forum topic, use the
// messageTopicForum variant carrying the forum_topic_id.
topic_id: {
_: "messageTopicForum",
forum_topic_id: Number(topicId),
},
query: "",
message_thread_id: Number(topicId),
from_message_id: currentFromId,
offset: 0,
limit: Math.min(limit, 100),
filter: null,
sender_id: null,
saved_messages_topic_id: 0,
});
if (!result.messages || result.messages.length === 0) break;
totalScanned += result.messages.length;
// Track highest message ID (first message = newest, since results are newest-first)
const batchMaxId = BigInt(result.messages[0].id);
if (maxScannedMessageId === null || batchMaxId > maxScannedMessageId) {
maxScannedMessageId = batchMaxId;
}
for (const msg of result.messages) {
// Check for archive documents
const doc = msg.content?.document;
@@ -250,6 +276,7 @@ export async function getTopicMessages(
fileSize: BigInt(doc.document.size),
date: new Date(msg.date * 1000),
mediaAlbumId: msg.media_album_id && msg.media_album_id !== "0" ? msg.media_album_id : undefined,
remoteUniqueId: doc.document.remote?.unique_id || undefined,
});
continue;
}
@@ -302,6 +329,7 @@ export async function getTopicMessages(
archives: archives.reverse(),
photos: photos.reverse(),
totalScanned,
maxScannedMessageId,
};
}
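
`getChannelMessages` and `getTopicMessages` both page `searchChatMessages` newest-first. Each now records the first (newest) ID of every page as `maxScannedMessageId`, so the watermark can advance even when no archives turn up. `getChannelMessages` also stops once a page reaches the stored boundary. The loop below is a minimal sketch over an in-memory list. It uses numbers instead of bigint for brevity, and `fetchPage` stands in for the TDLib call. Skipping per-message IDs at or below the boundary is an assumption, because the diff doesn't show that part. `scanNewestFirst` and `makeFetcher` are hypothetical.

```typescript
interface Msg {
  id: number;
}

type FetchPage = (fromMessageId: number, limit: number) => Msg[];

interface ScanResult {
  scanned: Msg[];
  maxScannedMessageId: number | null;
  pages: number;
}

// Walk newest-first pages until an empty page, a short page, or a page whose
// oldest message is at or below the boundary (everything after is older).
function scanNewestFirst(fetchPage: FetchPage, boundary: number | null, limit = 100, maxPages = 5000): ScanResult {
  const scanned: Msg[] = [];
  let maxScannedMessageId: number | null = null;
  let fromMessageId = 0;
  let pages = 0;
  while (pages < maxPages) {
    const page = fetchPage(fromMessageId, limit);
    pages++;
    if (page.length === 0) break;
    // First message on the page is the newest one.
    if (maxScannedMessageId === null || page[0].id > maxScannedMessageId) {
      maxScannedMessageId = page[0].id;
    }
    for (const m of page) {
      if (boundary !== null && m.id <= boundary) continue; // assumed: already processed
      scanned.push(m);
    }
    fromMessageId = page[page.length - 1].id;
    if (page.length < limit) break;
    if (boundary !== null && fromMessageId <= boundary) break;
  }
  return { scanned, maxScannedMessageId, pages };
}

// In-memory stand-in: IDs strictly older than fromMessageId (0 = newest), newest-first.
function makeFetcher(ids: number[]): FetchPage {
  const sorted = [...ids].sort((a, b) => b - a);
  return (fromMessageId, limit) =>
    sorted
      .filter((id) => fromMessageId === 0 || id < fromMessageId)
      .slice(0, limit)
      .map((id) => ({ id }));
}
```

The exclusive `id < fromMessageId` rule belongs to the stub only. It is not a claim about TDLib's own paging semantics.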

@@ -3,12 +3,25 @@ import { stat } from "fs/promises";
import type { Client } from "tdl";
import { config } from "../util/config.js";
import { childLogger } from "../util/logger.js";
import { withFloodWait } from "../util/retry.js";
import { withFloodWait, extractFloodWaitSeconds } from "../util/retry.js";
const log = childLogger("upload");
/**
* Custom error class to distinguish upload stalls from other errors.
* When consecutive stalls occur, the caller can use this signal to
* recreate the TDLib client (whose event stream may have degraded).
*/
export class UploadStallError extends Error {
constructor(message: string) {
super(message);
this.name = "UploadStallError";
}
}
export interface UploadResult {
messageId: bigint;
messageIds: bigint[];
}
/**
@@ -28,7 +41,7 @@ export async function uploadToChannel(
filePaths: string[],
caption?: string
): Promise<UploadResult> {
let firstMessageId: bigint | null = null;
const allMessageIds: bigint[] = [];
for (let i = 0; i < filePaths.length; i++) {
const filePath = filePaths[i];
@@ -49,11 +62,9 @@ export async function uploadToChannel(
"Uploading file to channel"
);
const serverMsgId = await sendAndWaitForUpload(client, chatId, filePath, fileCaption, fileName, fileSizeMB);
const serverMsgId = await sendWithRetry(client, chatId, filePath, fileCaption, fileName, fileSizeMB);
if (i === 0) {
firstMessageId = serverMsgId;
}
allMessageIds.push(serverMsgId);
// Rate limit delay between uploads
if (i < filePaths.length - 1) {
@@ -61,16 +72,96 @@ export async function uploadToChannel(
}
}
if (firstMessageId === null) {
if (allMessageIds.length === 0) {
throw new Error("Upload failed: no messages sent");
}
log.info(
{ chatId: Number(chatId), messageId: Number(firstMessageId), files: filePaths.length },
{ chatId: Number(chatId), messageId: Number(allMessageIds[0]), files: filePaths.length },
"All uploads confirmed by Telegram"
);
return { messageId: firstMessageId };
return { messageId: allMessageIds[0], messageIds: allMessageIds };
}
/**
* Retry wrapper for sendAndWaitForUpload.
* Handles:
* - Rate limits (429 / FLOOD_WAIT) from updateMessageSendFailed — waits and retries
* - Stall / timeout — retries with a cooldown
*/
const MAX_UPLOAD_RETRIES = 3;
async function sendWithRetry(
client: Client,
chatId: bigint,
filePath: string,
caption: string | undefined,
fileName: string,
fileSizeMB: number
): Promise<bigint> {
for (let attempt = 0; attempt <= MAX_UPLOAD_RETRIES; attempt++) {
try {
return await sendAndWaitForUpload(client, chatId, filePath, caption, fileName, fileSizeMB);
} catch (err) {
const isLastAttempt = attempt >= MAX_UPLOAD_RETRIES;
// Rate limit from Telegram (429 / FLOOD_WAIT / "retry after N")
const waitSeconds = extractFloodWaitSeconds(err);
if (waitSeconds !== null && !isLastAttempt) {
const jitter = 1000 + Math.random() * 4000;
const waitMs = waitSeconds * 1000 + jitter;
log.warn(
{ fileName, attempt: attempt + 1, maxRetries: MAX_UPLOAD_RETRIES, waitSeconds },
`Upload rate-limited — sleeping ${waitSeconds}s before retry`
);
await sleep(waitMs);
continue;
}
// Stall or timeout — fail fast and let the caller recreate the TDLib
// client. Retrying on the same degraded event stream wastes ~15 min
// per attempt because the underlying issue (missing send-success
// events) is client-level, not transient. The set ends up in
// SkippedPackage and the caller's watermark cap ensures it gets
// retried next cycle on a fresh client.
const errMsg = err instanceof Error ? err.message : "";
if (errMsg.includes("stalled") || errMsg.includes("timed out")) {
log.warn(
{ fileName, attempt: attempt + 1 },
"Upload stalled — failing fast so caller can recreate TDLib client"
);
throw new UploadStallError(
`Upload stalled for ${fileName}: ${errMsg}`
);
}
// Transient Telegram server-side error (HTTP 5xx returned via
// updateMessageSendFailed). These are NOT FLOOD_WAIT, NOT stalls — just
// TG having a bad moment. They typically resolve on a short backoff, so
// retry up to MAX_UPLOAD_RETRIES with linear backoff before giving up.
const lowerMsg = errMsg.toLowerCase();
const isTransientServerError =
lowerMsg.includes("internal server error") ||
lowerMsg.includes("internal error") ||
lowerMsg.includes("server error") ||
lowerMsg.includes("bad gateway") ||
lowerMsg.includes("service unavailable") ||
lowerMsg.includes("gateway timeout");
if (isTransientServerError && !isLastAttempt) {
const backoffMs = 15_000 * (attempt + 1) + Math.random() * 5_000;
log.warn(
{ fileName, attempt: attempt + 1, maxRetries: MAX_UPLOAD_RETRIES, backoffMs: Math.round(backoffMs) },
`Transient Telegram server error — retrying after backoff`
);
await sleep(backoffMs);
continue;
}
throw err;
}
}
throw new Error(`Upload failed after ${MAX_UPLOAD_RETRIES} retries for ${fileName}`);
}
/**
@@ -94,8 +185,15 @@ async function sendAndWaitForUpload(
let lastLoggedPercent = 0;
let tempMsgId: number | null = null;
let uploadStarted = false;
let lastProgressBytes = 0;
let lastProgressTime = Date.now();
// Events for our message can arrive before `sendMessage` resolves
// (TDLib emits them while our .then() is still in the microtask queue).
// Buffer them and replay once tempMsgId is known.
let pendingSuccess: { oldMsgId: number; finalId: number } | null = null;
let pendingFailure: { oldMsgId: number; errorMsg: string; code?: number } | null = null;
// Timeout: 20 minutes per GB, minimum 15 minutes
const timeoutMs = Math.max(
15 * 60_000,
@@ -114,8 +212,10 @@ async function sendAndWaitForUpload(
}
}, timeoutMs);
// Stall detection: no progress for 5 minutes after upload started → reject
const STALL_TIMEOUT_MS = 5 * 60_000;
// Stall detection: no progress for 3 minutes after upload started → reject
// (reduced from 5min — once data is fully sent, confirmation should arrive quickly;
// a 3min silence strongly indicates a degraded TDLib event stream)
const STALL_TIMEOUT_MS = 3 * 60_000;
const stallChecker = setInterval(() => {
if (settled || !uploadStarted) return;
const stallMs = Date.now() - lastProgressTime;
@@ -130,6 +230,26 @@ async function sendAndWaitForUpload(
}
}, 30_000);
const completeWithSuccess = (finalId: number) => {
if (settled) return;
settled = true;
cleanup();
log.info(
{ fileName, tempMsgId, finalMsgId: finalId },
"Upload confirmed by Telegram"
);
resolve(BigInt(finalId));
};
const completeWithFailure = (errorMsg: string, code?: number) => {
if (settled) return;
settled = true;
cleanup();
const error = new Error(`Upload failed for ${fileName}: ${errorMsg}`);
(error as Error & { code?: number }).code = code;
reject(error);
};
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const handleUpdate = (update: any) => {
// Track upload progress via updateFile events
@@ -137,9 +257,14 @@ async function sendAndWaitForUpload(
const file = update.file;
if (file?.remote?.is_uploading_active && file.expected_size > 0) {
uploadStarted = true;
lastProgressTime = Date.now();
const uploaded = file.remote.uploaded_size ?? 0;
// Only reset stall timer when bytes actually advance
if (uploaded > lastProgressBytes) {
lastProgressBytes = uploaded;
lastProgressTime = Date.now();
}
const total = file.expected_size;
const percent = Math.round((uploaded / total) * 100);
if (percent >= lastLoggedPercent + 20) {
@@ -155,31 +280,29 @@ async function sendAndWaitForUpload(
// The money event: upload succeeded, we get the final server message ID
if (update?._ === "updateMessageSendSucceeded") {
const msg = update.message;
const oldMsgId = update.old_message_id;
if (tempMsgId !== null && oldMsgId === tempMsgId) {
if (!settled) {
settled = true;
cleanup();
const finalId = BigInt(msg.id);
log.info(
{ fileName, tempMsgId, finalMsgId: Number(finalId) },
"Upload confirmed by Telegram"
);
resolve(finalId);
const oldMsgId: number = update.old_message_id;
if (tempMsgId === null) {
// Race: event arrived before our .then() assigned tempMsgId.
// Buffer it and process once tempMsgId is known.
pendingSuccess = { oldMsgId, finalId: msg.id };
return;
}
if (oldMsgId === tempMsgId) {
completeWithSuccess(msg.id);
}
}
// Upload failed
if (update?._ === "updateMessageSendFailed") {
const oldMsgId = update.old_message_id;
if (tempMsgId !== null && oldMsgId === tempMsgId) {
if (!settled) {
settled = true;
cleanup();
const errorMsg = update.error?.message ?? "Unknown upload error";
reject(new Error(`Upload failed for ${fileName}: ${errorMsg}`));
const oldMsgId: number = update.old_message_id;
const errorMsg: string = update.error?.message ?? "Unknown upload error";
const code: number | undefined = update.error?.code;
if (tempMsgId === null) {
pendingFailure = { oldMsgId, errorMsg, code };
return;
}
if (oldMsgId === tempMsgId) {
completeWithFailure(errorMsg, code);
}
}
};
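The `pendingSuccess` / `pendingFailure` fields handle a race. TDLib can emit `updateMessageSendSucceeded` or `updateMessageSendFailed` before the `sendMessage` promise's `.then()` has stored `tempMsgId`, so the handler cannot yet tell whether the event belongs to it. Below is a sketch of that buffer-and-replay pattern in isolation. `SendWaiter` and `SendEvent` are hypothetical names. Events are reduced to the fields the diff reads: the old message id, the final id, and the error message and code.

```typescript
// Hypothetical model of the buffer-and-replay fix; not the worker's code.
type SendEvent =
  | { kind: "succeeded"; oldMsgId: number; finalId: number }
  | { kind: "failed"; oldMsgId: number; errorMsg: string; code?: number };

class SendWaiter {
  readonly result: Promise<number>;
  private tempMsgId: number | null = null;
  private settled = false;
  private early: SendEvent[] = []; // events seen before tempMsgId was known
  private resolveFn!: (finalId: number) => void;
  private rejectFn!: (err: Error) => void;

  constructor() {
    this.result = new Promise<number>((resolve, reject) => {
      this.resolveFn = resolve;
      this.rejectFn = reject;
    });
  }

  // Called for every send-result update, possibly before setTempId().
  onEvent(ev: SendEvent): void {
    if (this.tempMsgId === null) {
      this.early.push(ev); // cannot tell yet whether this event is ours
      return;
    }
    if (ev.oldMsgId === this.tempMsgId) this.settle(ev);
  }

  // Called once sendMessage resolves with the temporary message id.
  setTempId(id: number): void {
    this.tempMsgId = id;
    const mine = this.early.find((ev) => ev.oldMsgId === id);
    this.early = [];
    if (mine) this.settle(mine); // replay the event we could not match earlier
  }

  private settle(ev: SendEvent): void {
    if (this.settled) return; // first outcome wins, like the diff's `settled` flag
    this.settled = true;
    if (ev.kind === "succeeded") {
      this.resolveFn(ev.finalId);
    } else {
      const err = new Error(`Upload failed: ${ev.errorMsg}`) as Error & { code?: number };
      err.code = ev.code;
      this.rejectFn(err);
    }
  }
}
```

This sketch differs from the diff in one respect. It buffers every early event in a list, while the diff keeps one slot per outcome. With a single slot, a send event for a different message that arrives in the same window would overwrite ours. Whether that can happen depends on how many sends share one update handler, which this diff does not show.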
@@ -223,6 +346,13 @@ async function sendAndWaitForUpload(
{ fileName, tempMsgId },
"Message queued, waiting for upload confirmation"
);
// Replay any event that arrived before we knew tempMsgId
if (pendingSuccess && pendingSuccess.oldMsgId === tempMsgId) {
completeWithSuccess(pendingSuccess.finalId);
} else if (pendingFailure && pendingFailure.oldMsgId === tempMsgId) {
completeWithFailure(pendingFailure.errorMsg, pendingFailure.code);
}
})
.catch((err) => {
if (!settled) {

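The new stall check resets its timer only when `uploaded_size` actually grows. As a result, a stream of `updateFile` events that repeat the same byte count no longer keeps a dead upload looking alive. Below is a minimal model of that rule, with the clock passed in so it can be tested without timers. `StallDetector` is a hypothetical name, and the 3-minute threshold comes from `STALL_TIMEOUT_MS`. The strict `>` comparison is an assumption, because the diff cuts off before the actual check.

```typescript
// Hypothetical, timer-free model of the stall rule; not the worker's code.
const STALL_TIMEOUT_MS = 3 * 60_000;

class StallDetector {
  private started = false;
  private lastBytes = 0;
  private lastProgressAt: number;
  private readonly timeoutMs: number;

  constructor(now: number, timeoutMs: number = STALL_TIMEOUT_MS) {
    this.lastProgressAt = now;
    this.timeoutMs = timeoutMs;
  }

  // Feed every upload progress event. Events that report the same (or fewer)
  // bytes mark the upload as started but do NOT reset the stall timer.
  onProgress(uploadedBytes: number, now: number): void {
    this.started = true;
    if (uploadedBytes > this.lastBytes) {
      this.lastBytes = uploadedBytes;
      this.lastProgressAt = now;
    }
  }

  // Polled periodically (the diff's checker runs every 30 s).
  isStalled(now: number): boolean {
    return this.started && now - this.lastProgressAt > this.timeoutMs;
  }
}
```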
@@ -7,6 +7,11 @@ export const config = {
logLevel: (process.env.LOG_LEVEL ?? "info") as "debug" | "info" | "warn" | "error",
telegramApiId: parseInt(process.env.TELEGRAM_API_ID ?? "0", 10),
telegramApiHash: process.env.TELEGRAM_API_HASH ?? "",
/** Maximum file part size for Telegram upload (in MiB). Default 1950 (under 2GB non-Premium limit).
* Set to 3900 for Premium accounts (under 4GB limit). */
maxPartSizeMB: parseInt(process.env.MAX_PART_SIZE_MB ?? "1950", 10),
/** Time window for auto-grouping ungrouped packages from the same channel (minutes). 0 = disabled. */
autoGroupTimeWindowMinutes: parseInt(process.env.AUTO_GROUP_TIME_WINDOW_MINUTES ?? "5", 10),
/** Maximum jitter added to scheduler interval (in minutes) */
jitterMinutes: 5,
/** Maximum time span for multipart archive parts (in hours). 0 = no limit. */
@@ -15,4 +20,26 @@ export const config = {
apiDelayMs: 1000,
/** Max retries for rate-limited requests */
maxRetries: 5,
/** After this many failed attempts on the same source message, the worker
* stops auto-retrying and lets the watermark advance past it. The user can
* manually retry via the UI to reset and try again. */
maxSkipAttempts: parseInt(process.env.WORKER_MAX_SKIP_ATTEMPTS ?? "5", 10),
/** Window in which a recent successful empty scan lets us skip the next
* scan entirely. Default 5 minutes. */
skipRecentScanWindowMs: parseInt(
process.env.WORKER_SKIP_RECENT_SCAN_WINDOW_MS ?? "300000",
10
),
/** After this many consecutive empty scans, a channel/topic enters
* backoff and is only scanned every Nth cycle. */
emptyScanBackoffThreshold: parseInt(
process.env.WORKER_EMPTY_SCAN_BACKOFF_THRESHOLD ?? "5",
10
),
/** While in backoff, scan only every Nth cycle. Default 5 = scan every
* fifth cycle = once every ~5 hours given the 60-min default interval. */
emptyScanBackoffEveryNth: parseInt(
process.env.WORKER_EMPTY_SCAN_BACKOFF_EVERY_NTH ?? "5",
10
),
} as const;
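The three new settings drive the scan-skip logic described in the commit messages above. A channel or topic is skipped in two cases: its last scan was empty and happened within `skipRecentScanWindowMs`, or it has reached `emptyScanBackoffThreshold` consecutive empty scans and it is not its turn in the "every Nth cycle" rotation. A scan is never skipped while retryable SkippedPackage rows are pending. Below is a minimal sketch of that decision as a pure function. `shouldSkipScan`, `ScanState` and `SkipConfig` are hypothetical. The state fields take their names from the commit messages (`lastScannedAt`, `lastScanFoundArchives`, `consecutiveEmptyScans`), and `currentCycle` stands in for `getCurrentCycle()`. The `cycle % N === 0` convention for "its turn" is an assumption.

```typescript
// Hypothetical sketch of the skip decision; not the worker's code.
interface ScanState {
  lastScannedAt: number | null; // epoch ms of the last completed scan
  lastScanFoundArchives: boolean;
  consecutiveEmptyScans: number;
}

interface SkipConfig {
  skipRecentScanWindowMs: number; // default 300000
  emptyScanBackoffThreshold: number; // default 5
  emptyScanBackoffEveryNth: number; // default 5
}

function shouldSkipScan(
  state: ScanState | null,
  hasRetryablePending: boolean,
  currentCycle: number,
  now: number,
  cfg: SkipConfig,
): boolean {
  // Never skip something we have no state for, or that has retryable work.
  if (!state || hasRetryablePending) return false;

  // 1. A recent empty scan: nothing new is likely yet.
  const recentlyEmpty =
    state.lastScannedAt !== null &&
    !state.lastScanFoundArchives &&
    now - state.lastScannedAt < cfg.skipRecentScanWindowMs;
  if (recentlyEmpty) return true;

  // 2. Backoff: after enough empty scans, only scan on every Nth cycle.
  //    (Math.max guards against a 0 or negative env value; an addition of this sketch.)
  const inBackoff = state.consecutiveEmptyScans >= cfg.emptyScanBackoffThreshold;
  const everyNth = Math.max(1, cfg.emptyScanBackoffEveryNth);
  return inBackoff && currentCycle % everyNth !== 0;
}
```

Under this sketch's convention and the default settings, a channel with six consecutive empty scans is scanned on cycles 10, 15, 20 and so on, and skipped on the cycles in between.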

@@ -2,39 +2,66 @@ import { childLogger } from "./logger.js";
const log = childLogger("mutex");
let locked = false;
let holder = "";
const queue: Array<{ resolve: () => void; reject: (err: Error) => void; label: string }> = [];
/**
* Maximum time to wait for the TDLib mutex (ms).
* If the mutex is not available within this time, the operation is rejected.
* Default: 30 minutes (long enough for large downloads, short enough to detect hangs).
*/
const MUTEX_WAIT_TIMEOUT_MS = 30 * 60 * 1000;
const locks = new Map<string, boolean>();
const holders = new Map<string, string>();
const queues = new Map<
string,
Array<{ resolve: () => void; reject: (err: Error) => void; label: string }>
>();
/**
* Ensures only one TDLib client runs at a time across the entire worker process.
* Both the scheduler (auth, ingestion) and the fetch listener acquire this
* before creating any TDLib client.
* Force-release a stuck mutex.
* This should only be called when the holder is known to be stuck (e.g. after
* a cycle timeout). It releases the lock and lets the next queued waiter proceed.
*/
export function forceReleaseMutex(key: string): void {
if (!locks.has(key)) return;
const holder = holders.get(key);
log.warn({ key, holder }, "Force-releasing stuck TDLib mutex");
locks.delete(key);
holders.delete(key);
const next = queues.get(key)?.shift();
if (next) {
log.info({ key, next: next.label }, "TDLib mutex force-released to next waiter");
next.resolve();
} else {
queues.delete(key);
log.info({ key }, "TDLib mutex force-released (no waiters)");
}
}
/**
* Ensures only one TDLib operation runs at a time FOR THE SAME KEY.
* Different keys run concurrently — this allows two accounts to ingest in parallel
* while still preventing concurrent use of the same account's TDLib state dir.
*
* Includes a wait timeout to prevent indefinite blocking if the current holder hangs.
* key: the account phone number for account-specific ops (auth, ingest),
* or 'global' for ops that don't belong to a specific account.
* label: human-readable name for logging.
*/
export async function withTdlibMutex<T>(
key: string,
label: string,
fn: () => Promise<T>
): Promise<T> {
if (locked) {
log.info({ waiting: label, holder }, "Waiting for TDLib mutex");
if (locks.get(key)) {
log.info({ waiting: label, key, holder: holders.get(key) }, "Waiting for TDLib mutex");
await new Promise<void>((resolve, reject) => {
const timer = setTimeout(() => {
const idx = queue.indexOf(entry);
const q = queues.get(key) ?? [];
const idx = q.indexOf(entry);
if (idx !== -1) {
queue.splice(idx, 1);
reject(new Error(
q.splice(idx, 1);
reject(
new Error(
`TDLib mutex wait timeout after ${MUTEX_WAIT_TIMEOUT_MS / 60_000}min ` +
`(waiting: ${label}, holder: ${holder})`
));
`(waiting: ${label}, key: ${key}, holder: ${holders.get(key)})`
)
);
}
}, MUTEX_WAIT_TIMEOUT_MS);
@@ -46,25 +73,28 @@ export async function withTdlibMutex<T>(
reject,
label,
};
queue.push(entry);
if (!queues.has(key)) queues.set(key, []);
queues.get(key)!.push(entry);
});
}
locked = true;
holder = label;
log.debug({ label }, "TDLib mutex acquired");
locks.set(key, true);
holders.set(key, label);
log.debug({ key, label }, "TDLib mutex acquired");
try {
return await fn();
} finally {
locked = false;
holder = "";
const next = queue.shift();
locks.delete(key);
holders.delete(key);
const next = queues.get(key)?.shift();
if (next) {
log.debug({ next: next.label }, "TDLib mutex releasing to next waiter");
log.debug({ key, next: next.label }, "TDLib mutex releasing to next waiter");
next.resolve();
} else {
log.debug({ label }, "TDLib mutex released");
queues.delete(key);
log.debug({ key, label }, "TDLib mutex released");
}
}
}
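`withTdlibMutex` now keeps one lock, holder and wait queue per key. Different accounts can therefore ingest in parallel, while calls for the same account run one at a time in FIFO order, subject to a wait timeout. The class below is a minimal standalone sketch of that design, not the worker's code, and `KeyedMutex` and its `run` method are hypothetical names. It differs from the diff in one deliberate way. On release, it hands the key straight to the next waiter by recording that waiter as holder before resolving it. The diff instead clears the lock and lets the waiter set it again after it resumes. Handing off directly means a caller that arrives in between can never see the key as free.

```typescript
// Hypothetical per-key async mutex with FIFO hand-off and a wait timeout.
type Waiter = { resolve: () => void; reject: (err: Error) => void; label: string };

class KeyedMutex {
  private readonly holders = new Map<string, string>(); // key -> holder label
  private readonly queues = new Map<string, Waiter[]>();
  private readonly waitTimeoutMs: number;

  constructor(waitTimeoutMs: number) {
    this.waitTimeoutMs = waitTimeoutMs;
  }

  async run<T>(key: string, label: string, fn: () => Promise<T>): Promise<T> {
    if (this.holders.has(key)) {
      // Queue up; the releaser makes us holder before resolving this promise.
      await new Promise<void>((resolve, reject) => {
        const timer = setTimeout(() => {
          const q = this.queues.get(key) ?? [];
          const idx = q.indexOf(entry);
          if (idx !== -1) {
            q.splice(idx, 1);
            reject(new Error(`wait timeout (waiting: ${label}, key: ${key})`));
          }
        }, this.waitTimeoutMs);
        const entry: Waiter = {
          label,
          resolve: () => {
            clearTimeout(timer); // handed the lock: the timeout no longer applies
            resolve();
          },
          reject,
        };
        if (!this.queues.has(key)) this.queues.set(key, []);
        this.queues.get(key)!.push(entry);
      });
    } else {
      this.holders.set(key, label);
    }
    try {
      return await fn();
    } finally {
      const next = this.queues.get(key)?.shift();
      if (next) {
        this.holders.set(key, next.label); // transfer ownership directly
        next.resolve();
      } else {
        this.holders.delete(key);
        this.queues.delete(key);
      }
    }
  }
}
```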

File diff suppressed because it is too large