67 Commits

Author SHA1 Message Date
18a0efb3d4 chore(tdlib): upgrade tdl 8.0.0 → 8.1.0 and prebuilt-tdlib 1.8.50 → 1.8.64
Brings in 12 versions of TDLib bug fixes, performance improvements, and
stricter type definitions in @prebuilt-tdlib/types.

Two API breakages handled:

1. `getChatFolders` (plural) was removed — folder IDs now arrive via
   the `updateChatFolders` update event. Replaced the synchronous call
   with a 200ms event listener; if no folders arrive, we proceed with
   just main + archive lists. Chats inside folders are still reachable
   from chatListMain so this isn't a functional regression.

2. The new tdl `Client.invoke` signature requires a literal `_` field
   and rejects `Record<string, any>` shapes. Our `invokeWithTimeout`
   wrapper is intentionally generic — cast through `any` at the call
   site with a comment explaining why.

Both worker and bot type-check + build cleanly with the new versions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:45:44 +02:00
2ccc9820cd fix(recovery): distinguish 'message gone' from 'TDLib couldn't tell us'
The old verifyMessageExists returned a bare boolean. Any error other
than HTTP 404 was treated as "exists" — meaning a TDLib connection
problem or transient TG hiccup at recovery time caused the worker to
declare "all destination messages verified" when it had actually
verified nothing.

Replaced with a discriminated VerifyResult:
  - exists         — message present and is a document, keep Package
  - deleted        — TG confirms it's gone (404 / MESSAGE_ID_INVALID /
                     "Message not found"), reset Package for re-upload
  - wrong-content  — message exists but isn't messageDocument, reset
  - unknown        — TDLib threw a non-404 error; do NOT reset, retry
                     next startup

Recovery summary now reports all four counts and switches to a
non-success message when unknownCount > 0, so a degraded TDLib run
doesn't hide behind a green log line.
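
A minimal sketch of the four-way result described above, assuming TDLib errors surface as objects with a numeric `code` and a `message`. The names `classifyVerifyError`, `shouldReset`, and `summarizeRecovery` are illustrative and not taken from the worker's source.

```typescript
// Discriminated union with the four outcomes listed in the commit message.
type VerifyResult =
  | { kind: "exists" }
  | { kind: "deleted" }
  | { kind: "wrong-content" }
  | { kind: "unknown"; error: string };

// Assumed error shape; the real error object may carry more fields.
interface TdError {
  code?: number;
  message?: string;
}

// Maps a thrown error to "deleted" only when Telegram confirms the message
// is gone; every other error becomes "unknown" so nothing is reset on a guess.
function classifyVerifyError(err: TdError): VerifyResult {
  const msg = err.message ?? "";
  const gone =
    err.code === 404 ||
    msg.includes("MESSAGE_ID_INVALID") ||
    msg.includes("Message not found");
  return gone ? { kind: "deleted" } : { kind: "unknown", error: msg };
}

// Only confirmed-gone or wrong-content messages reset the Package.
function shouldReset(result: VerifyResult): boolean {
  return result.kind === "deleted" || result.kind === "wrong-content";
}

// Counts all four outcomes; the run is only "ok" when nothing is unknown.
function summarizeRecovery(results: VerifyResult[]): {
  counts: Record<VerifyResult["kind"], number>;
  ok: boolean;
} {
  const counts: Record<VerifyResult["kind"], number> = {
    exists: 0,
    deleted: 0,
    "wrong-content": 0,
    unknown: 0,
  };
  for (const r of results) counts[r.kind] += 1;
  return { counts, ok: counts.unknown === 0 };
}
```

Keeping `unknown` as its own variant means the compiler forces every caller to decide what to do with it, rather than letting it collapse into a boolean.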

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 00:43:09 +02:00
c72b5a4b48 feat(worker): add backfill_filelists pg_notify listener
Companion to 0bdd4ba (RAR parser fix). 4,380 RAR packages and ~450
ZIP/7Z packages in the DB have fileCount=0 because of the old broken
parser (and a handful of edge cases). This adds an on-demand backfill
that re-indexes their file lists.

Triggered by:
  SELECT pg_notify('backfill_filelists', '{"limit":50,"archiveType":"RAR"}');

Both payload fields are optional. archiveType filters to ZIP/RAR/SEVEN_Z;
default limit is 100. Multiple notifications queue sequentially so
TDLib downloads don't compete for the per-account mutex.
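
A sketch of how the optional payload fields could be parsed, assuming the notification payload arrives as a JSON string. `parseBackfillPayload` is a hypothetical helper, and falling back to defaults on malformed JSON is an assumption rather than documented behaviour.

```typescript
type ArchiveType = "ZIP" | "RAR" | "SEVEN_Z";

interface BackfillRequest {
  limit: number;
  archiveType?: ArchiveType;
}

const ARCHIVE_TYPES: readonly string[] = ["ZIP", "RAR", "SEVEN_Z"];

function isArchiveType(value: unknown): value is ArchiveType {
  return typeof value === "string" && ARCHIVE_TYPES.includes(value);
}

// Both fields are optional: limit defaults to 100, archiveType to "no filter".
function parseBackfillPayload(payload: string | null | undefined): BackfillRequest {
  let parsed: unknown = null;
  try {
    parsed = payload ? JSON.parse(payload) : null;
  } catch {
    parsed = null; // assumption: malformed JSON is treated like an empty payload
  }
  const fields = (typeof parsed === "object" && parsed !== null ? parsed : {}) as Record<string, unknown>;
  const rawLimit = fields["limit"];
  const rawType = fields["archiveType"];
  const limit =
    typeof rawLimit === "number" && Number.isInteger(rawLimit) && rawLimit > 0 ? rawLimit : 100;
  return isArchiveType(rawType) ? { limit, archiveType: rawType } : { limit };
}
```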

For each candidate:
  1. Resolve destChannel.telegramId from the Package
  2. getMessage for each destMessageId in destMessageIds[] (handles
     multipart) to recover the file_id from Telegram
  3. downloadFile (uses TDLib cache when available — most are fast)
  4. Run readZipCentralDirectory / readRarContents / read7zContents
  5. Transactionally replace PackageFile rows + update fileCount

Re-check of fileCount inside the transaction ensures a concurrent
backfill from another worker (or a fresh ingestion of the same archive)
doesn't get clobbered.

Prefers the Premium account when both are linked, for faster downloads
and to avoid the speed-limit throttling on the secondary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 00:42:01 +02:00
0bdd4ba0cc fix(rar-reader): use unrar lt (technical) so file listings actually work
Diagnosed from production: all 4,380 RAR packages in the database have
fileCount = 0. The old parser used `unrar l -v` and a regex that
expected an 8-column `Attributes Size Packed Ratio% Date Time CRC32 Name`
output. unrar 6.21's actual `l -v` output is 5 columns: `Attributes
Size Date Time Name`, with no Packed, no Ratio, no CRC32. So every RAR
silently parsed to zero entries.

Switch to `unrar lt` (list technical), which emits one block per file
with key:value lines:

         Name: Lost Kingdom 2023 01 January/Nagas/NagaCaptainBody.stl
         Type: File
         Size: 22503584
  Packed size: 21430123
        CRC32: A1B2C3D4
         ...

The new parser tokenizes blocks on blank lines and matches "key: value"
lines per block. Handles multi-word keys ("Packed size", "Host OS") and
gracefully skips Directory entries and the archive header block. Also
tolerates BLAKE2sp checksums for newer RAR archives.
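
The block-and-key approach described above can be sketched roughly as follows. This is not the actual rar-reader code: `parseUnrarTechnicalListing` and `RarEntry` are illustrative names, and the sketch assumes each block uses `Name`, `Type`, `Size`, and `CRC32` keys as in the excerpt.

```typescript
interface RarEntry {
  path: string;
  size: number;
  crc32?: string;
}

// Splits `unrar lt` output into blank-line-separated blocks, reads
// "key: value" lines in each block, and keeps only blocks with Type "File".
function parseUnrarTechnicalListing(output: string): RarEntry[] {
  const entries: RarEntry[] = [];
  for (const block of output.split(/\r?\n\s*\r?\n/)) {
    const fields = new Map<string, string>();
    for (const line of block.split(/\r?\n/)) {
      // The key is everything before the first colon, so multi-word keys
      // such as "Packed size" work and colons inside values are preserved.
      const match = /^\s*([^:]+):\s?(.*)$/.exec(line);
      if (match) fields.set(match[1].trim(), match[2].trim());
    }
    const name = fields.get("Name");
    // The archive header block has no Type; directories have Type "Directory".
    if (!name || fields.get("Type") !== "File") continue;
    const size = Number(fields.get("Size") ?? "0");
    entries.push({
      path: name,
      size: Number.isFinite(size) ? size : 0,
      crc32: fields.get("CRC32"), // undefined when another checksum is used
    });
  }
  return entries;
}
```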

Verified against a live 644MB RAR with 201 entries (194 files, 7 dirs);
parser returns 194 entries with correct paths, sizes, and CRC32s.

Future RAR ingestions will populate fileCount and PackageFile rows
correctly. Backfilling existing 4,380 packages requires a separate
pass — added in a follow-up commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 00:38:46 +02:00
901f32ff41 feat(worker): retry old SkippedPackages + prefer specific topics over General
Three connected safeguards driven by user feedback after deploying the
incremental watermark and repost-detection fixes.

1. SkippedPackage retry pass (watermark pull-back)
   The auto-retry chain (d99a506 + watermark cap) only works for failures
   that occur AFTER the fix is deployed. Pre-existing SkippedPackages may
   sit below the current watermark — example from prod: secondary's
   "Turnbase Delivery Folder.7z" at msgId 37,109,104,640 vs watermark
   37,111,201,792. The auto-retry never sees it.

   Before scanning each channel/topic, we now query SkippedPackages with
   attemptCount < cap for that scope and pull the watermark back to
   (lowestSkippedMsgId - 1n) when needed. Both forum and non-forum
   branches handle this.

2. Topic scan order: specific topics first, General last
   In forum channels, files often appear in both a specific topic (e.g.,
   "Artisan Guild January 2022") AND in General. The first encounter
   created the Package and locked in the topic context. If we happened
   to scan General first, the Package recorded the less-informative
   topic.

   We now sort topics so General is processed last. New Packages get
   the more specific topic name as their context by default.

3. Backfill specific topic on existing Packages
   For Packages that were already created with General topic context,
   when findRepostedPackage matches and the current scan is in a more
   specific topic, update the existing Package's sourceTopicId (and
   creator, if it was derived from "General") to the more specific one.
   Audit log shows both old and new topic IDs.

The findRepostedPackage query also got an ORDER BY so it returns the
most-specific existing match (non-null sourceTopicId first) when
multiple Packages share the same filename + size in a channel — giving
the audit log richer context.
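
A small sketch of the "General last" ordering from point 2, assuming each topic carries a flag marking the General topic. The `Topic` shape and `isGeneral` field are hypothetical.

```typescript
interface Topic {
  id: number;
  name: string;
  isGeneral: boolean; // hypothetical flag marking the forum's General topic
}

// Returns a new array with specific topics first and General last.
// Array.prototype.sort is stable, so specific topics keep their relative order.
function orderTopicsForScan(topics: Topic[]): Topic[] {
  return [...topics].sort((a, b) => Number(a.isGeneral) - Number(b.isGeneral));
}
```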

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 09:02:54 +02:00
ff4e150544 fix: skip download when the same file was already uploaded from this channel
Diagnosed from production: in 8 hours of main's current run, zero
uploads happened despite the worker being busy 100% of the time. Logs
showed continuous "Downloading archive part" entries with no
corresponding upload activity.

Root cause: the source channel ("Model Printing Emporium") frequently
reposts the same file at new Telegram message IDs. Concrete example
from the DB:
  - "(EN) PaintGuides All.zip"  → present 6 times, msgIds 44B → 92B
  - "00 Welcome Pack.7z"        → present 2 times, msgIds 91B and 177B
  - "FanteZi April 2022-...zip" → uploaded May 8 at msgId 24,697,110,528;
                                  current run re-downloading at 87,488,987,136

packageExistsBySourceMessage(channelId, msgId) correctly misses because
the msgId is different. We download the (potentially gigabyte-sized)
file, hash it, then packageExistsByHash hits and we discard the
download. ~30 seconds wasted per repost x thousands of reposts = whole
runs spent uploading nothing.

Fix: add findRepostedPackage(sourceChannelId, fileName, fileSize) — a
pre-download check that catches reposts by the strong (channel + name
+ total size) signal. On hit, skip the set entirely. Watermark
advances normally (no minFailedId tracking) so the next cycle sees
the channel as caught up.

False-positive risk: two unrelated files in the same channel with
identical name AND identical total fileSize. Extremely rare in
practice; if it ever happens, the new file is silently treated as a
duplicate. Logged at info level with the existing Package ID and dest
message ID so the user can audit if a file is mysteriously missing.
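
To illustrate the (channel + name + total size) signal, here is an in-memory stand-in for the pre-download check. The real findRepostedPackage is a database query; `RepostIndex` and `PackageRef` below are hypothetical and only show how the key is formed.

```typescript
interface PackageRef {
  packageId: string;
  destMessageId: bigint;
}

class RepostIndex {
  private readonly byKey = new Map<string, PackageRef>();

  // JSON-encodes the tuple so separators inside file names cannot collide.
  private static key(channelId: string, fileName: string, fileSize: bigint): string {
    return JSON.stringify([channelId, fileName, fileSize.toString()]);
  }

  add(channelId: string, fileName: string, fileSize: bigint, ref: PackageRef): void {
    this.byKey.set(RepostIndex.key(channelId, fileName, fileSize), ref);
  }

  // A hit means "same channel, same name, same total size": skip the download.
  find(channelId: string, fileName: string, fileSize: bigint): PackageRef | undefined {
    return this.byKey.get(RepostIndex.key(channelId, fileName, fileSize));
  }
}
```

All three components must match for a hit, which is what keeps the false-positive window as narrow as the commit message describes.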

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:54:20 +02:00
77aeb4cc00 fix: advance channel/topic watermark incrementally per successful set
CI: continuous-integration/drone/push, build passing
Diagnosed from production logs for the main (Premium) account:

  RUNNING   2026-05-21 → in progress, 22h     ingested: 0
  FAILED    2026-05-14 → 2026-05-21 (7.4d)    ingested: 5,426 (killed by restart)
  FAILED    2026-05-06 → 2026-05-14 (7.7d)    ingested: 8,300 (killed by restart)

Main's two source channels have 378k+ messages each. A full scan takes
days, but the worker gets restarted (container update, cycle timeout,
etc.) every few days. updateLastProcessedMessage was only called at the
END of a channel's scan — so the watermark on AccountChannelMap stayed
NULL through restart after restart, and every new run re-scanned from
message 0.

That explains the user's symptom: "main wasn't uploading although it
said it did". The dashboard showed currentStep alternating through
downloading / hashing / deduplicating, but zipsIngested stayed at 0
because every archive the run encountered was already a hash-duplicate
of something uploaded by a previous run.

Fix: processArchiveSets now accepts an onWatermarkAdvance callback.
After each successful set (ingested OR confirmed duplicate), the callback
fires with a watermark capped below the current minFailedId. Both call
sites (forum/topic and non-forum) wire it to upsertTopicProgress /
updateLastProcessedMessage. The end-of-scan write is retained for the
no-archives and all-failures-with-fallback cases.

Worst-case progress loss on restart is now one in-flight archive set,
not the entire scan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 23:20:20 +02:00
379bf246cd feat(worker): per-account safeguards for second-account upload failures
Driven by a real production case: the secondary account was attached to
17 source channels but ingesting only ~2-3 archives per cycle. Log
analysis showed the distinct issues below, which this commit addresses.

1. Auto-retry cap (WORKER_MAX_SKIP_ATTEMPTS, default 5)
   processArchiveSets now filters out SkippedPackage rows whose
   attemptCount has reached the cap. Removing them from the working
   list means they are not tracked in minFailedId, so the watermark
   cap from d99a506 does not pin progress below them anymore. A bad
   file no longer blocks the rest of the channel forever; the user
   can manually retry via the UI to reset the count.

2. Account phone in error messages
   Every SkippedPackage row and SystemNotification produced from a
   failure is now prefixed with [<phone>] in errorMessage / message,
   and the JSON context includes accountPhone. When two accounts
   share a source channel and only one is failing, the UI tells you
   which one.

3. Explicit getChat for destination at run start
   loadChats only loads main/archive/folder chat lists. If an account
   archived or moved the destination chat, sendMessage failed silently
   per-archive. Now we getChat the destination once per cycle; on
   failure we record a SystemNotification and skip the account's
   entire ingestion cycle (no point downloading what we can't upload).

4. Retry on transient Telegram server errors
   The "Turnbase Delivery Folder.7z" failure on the secondary and
   "10. Kingdom of the Depth.part1.rar" on the main were both
   "Internal Server Error during file upload" — a TG-side hiccup, not
   a stall or FLOOD_WAIT. These now retry up to MAX_UPLOAD_RETRIES
   with linear backoff (15s, 30s, 45s + jitter) before giving up.

5. Channel-access-lost notification
   "Iridium 2 w/ Add-ons [Completed]" has been throwing
   "Can't access the chat" every cycle for the secondary. The worker
   now surfaces a CHANNEL_ACCESS_LOST notification (deduped to once per
   24h per channel/account) so the admin sees it and can re-join or
   unlink the channel instead of just losing visibility into the loop.
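
The linear backoff from point 4 (15s, 30s, 45s + jitter) might look like the sketch below. The jitter ceiling of 3 seconds and the injectable `random` parameter are assumptions made for the example; the commit message does not state the jitter range.

```typescript
// attempt is 1-based: 1 -> ~15s, 2 -> ~30s, 3 -> ~45s, each plus jitter.
function uploadRetryDelayMs(
  attempt: number,
  baseMs: number = 15000,
  maxJitterMs: number = 3000, // assumed jitter ceiling
  random: () => number = Math.random,
): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  return attempt * baseMs + Math.floor(random() * maxJitterMs);
}
```

Passing `random` in makes the delay deterministic in tests while production code keeps the `Math.random` default.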

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 23:07:57 +02:00
26e2cba69d fix: buffer upload confirmation events to close tempMsgId race
sendMessage resolves with the temporary message ID inside a .then()
microtask. If TDLib emits updateMessageSendSucceeded synchronously
(cached file, already-known media), the event handler fires while
tempMsgId is still null — the success is dropped and the promise hangs
until the 15-min upload timeout fires.

Buffer success/failure events that arrive before tempMsgId is known,
then replay them in the .then() callback once tempMsgId is set.
Extract completeWithSuccess / completeWithFailure helpers so the
resolution path is shared between live events and replayed events.

This race matters more now that stalls fail fast — without the buffer,
a fast-completing upload could still hang for 15 min before recovery.
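
A simplified sketch of the buffer-and-replay idea, with the promise plumbing left out so the ordering is easy to follow. `UploadConfirmation`, `SendEvent`, and the field names are hypothetical; the real handler works on TDLib update objects.

```typescript
type SendEvent =
  | { type: "succeeded"; oldMessageId: number; finalMessageId: number }
  | { type: "failed"; oldMessageId: number; error: string };

type Outcome = { ok: true; messageId: number } | { ok: false; error: string };

class UploadConfirmation {
  private tempMsgId: number | null = null;
  private buffered: SendEvent[] = [];
  private outcome: Outcome | null = null;

  // Called from the update handler; may fire before tempMsgId is known.
  onEvent(event: SendEvent): void {
    if (this.outcome !== null) return;
    if (this.tempMsgId === null) {
      this.buffered.push(event); // too early to match: keep for replay
      return;
    }
    this.apply(event);
  }

  // Called once sendMessage has resolved with the temporary message ID.
  setTempMsgId(id: number): void {
    this.tempMsgId = id;
    const pending = this.buffered;
    this.buffered = [];
    for (const event of pending) this.apply(event);
  }

  result(): Outcome | null {
    return this.outcome;
  }

  // Shared resolution path for live and replayed events.
  private apply(event: SendEvent): void {
    if (this.outcome !== null || event.oldMessageId !== this.tempMsgId) return;
    this.outcome =
      event.type === "succeeded"
        ? { ok: true, messageId: event.finalMessageId }
        : { ok: false, error: event.error };
  }
}
```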

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 22:48:38 +02:00
84cc8d995b fix: fail fast on upload stall instead of retrying on broken client
Previously a single TDLib event-stream degradation cost ~45 minutes
per archive: 3 retries x 15-min minimum timeout, all on the same
broken client. The retries had no chance of succeeding because the
underlying issue (missing updateMessageSendSucceeded events) is a
client-level problem, not a transient send failure.

Now the first stall throws UploadStallError immediately. The caller
in processArchiveSets already recreates the TDLib client on
UploadStallError, so we drop from ~45 min recovery to ~15 min
(one timeout cycle) per stalled archive.

The stalled set is recorded in SkippedPackage; with the watermark
cap from d99a506 it gets retried on the next ingestion cycle with
a fresh client.

FLOOD_WAIT retries inside sendWithRetry are unchanged — those handle
legitimate rate limiting, not stalls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 22:47:08 +02:00
d99a506b10 fix: cap watermark below failed sets so failures retry next cycle
Previously the channel/topic watermark could advance past failed
archive sets in two ways:

1. A later successful set raised maxProcessedId past a failed earlier
   set within the same scan.
2. scanResult.maxScannedMessageId was used as fallback even when
   archives in the scan had failed (added in 77c26ad to prevent
   re-scanning empty channels).

Both paths buried failed archives below the watermark on the next
cycle — they sat permanently in SkippedPackage with no auto-recovery.

Now processArchiveSets returns the lowest failed source message ID
alongside the highest processed one. The caller caps the watermark at
(minFailedId - 1n) so the next scan re-includes the failed messages
and processOneArchiveSet retries them. Successful sets above the
failure boundary are not re-uploaded — packageExistsBySourceMessage
early-skips them on the second pass.
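
The capping rule can be sketched as follows, assuming source message IDs are handled as `bigint`. `summarizeSets` and `capWatermark` are illustrative names; only the `(minFailedId - 1n)` rule comes from the text above.

```typescript
interface SetResult {
  sourceMessageId: bigint;
  ok: boolean;
}

// Tracks the highest processed ID and the lowest failed ID across a scan.
function summarizeSets(results: SetResult[]): {
  maxProcessedId: bigint | null;
  minFailedId: bigint | null;
} {
  let maxProcessedId: bigint | null = null;
  let minFailedId: bigint | null = null;
  for (const r of results) {
    if (r.ok) {
      if (maxProcessedId === null || r.sourceMessageId > maxProcessedId) maxProcessedId = r.sourceMessageId;
    } else if (minFailedId === null || r.sourceMessageId < minFailedId) {
      minFailedId = r.sourceMessageId;
    }
  }
  return { maxProcessedId, minFailedId };
}

// Never lets the watermark reach the lowest failed message, so the next
// scan re-includes it.
function capWatermark(candidate: bigint, minFailedId: bigint | null): bigint {
  if (minFailedId === null) return candidate;
  const cap = minFailedId - 1n;
  return candidate < cap ? candidate : cap;
}
```

For example, with sets at IDs 10 (ok), 20 (failed), and 30 (ok), the candidate is 30 but the stored watermark becomes 19, so message 20 is scanned again next cycle.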

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 22:46:26 +02:00
59038889ae fix: prevent pool exhaustion that caused 4-hour duplicate check stall
CI: continuous-integration/drone/push, build passing
The pg pool had max=5 connections shared between Prisma operations and
advisory locks. With 2 account locks held permanently and hash locks
from timed-out (but still running) background work, pool.connect()
would block forever — causing the Turnbase.7z stall.

- Increase pool max from 5 to 15 for headroom
- Add 30s connectionTimeoutMillis so pool.connect() throws instead of
  hanging forever when the pool is exhausted
- On startup, terminate zombie PostgreSQL sessions from previous worker
  instances that hold stale advisory locks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 20:39:00 +02:00
77c26adb31 perf: set watermarks even when no archives found to prevent re-scanning
CI: continuous-integration/drone/push, build passing
Previously, channels/topics with no new archives never had their
watermark updated. This meant every cycle re-scanned all messages from
scratch just to discover nothing new — especially costly for the 1079-
topic Model Printing Emporium forum.

- Add maxScannedMessageId to ChannelScanResult (highest msg ID seen)
- Set channel watermark to scan boundary when no archives are found
- Set topic watermark to scan boundary when no archives are found
- Fall back to scan watermark when archive processing doesn't advance it

After one full cycle, subsequent cycles will skip already-scanned
messages via the early-exit boundary check, dramatically reducing
TDLib API calls on channels with mostly non-archive content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 20:37:42 +02:00
35cce3151c perf: early-exit channel scan when all messages are below watermark
searchChatMessages returns newest-first. Once the oldest message on a
page is at or below the lastProcessedMessageId boundary, all remaining
pages are even older. Stop scanning immediately instead of reading every
message in the channel.

This was already implemented for topic scans but missing from channel
scans. On a test run, total messages scanned dropped from 3805 to 1615
(57% reduction) for an account with no new archives.
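
The early exit can be sketched like this, with an in-memory array of pages standing in for successive searchChatMessages calls. `scanNewMessages` and the `Msg` shape are hypothetical.

```typescript
interface Msg {
  id: bigint;
}

// pages are newest-first, and messages within a page are newest-first too.
function scanNewMessages(
  pages: Msg[][],
  lastProcessedId: bigint | null,
): { fresh: Msg[]; pagesRead: number } {
  const fresh: Msg[] = [];
  let pagesRead = 0;
  for (const page of pages) {
    pagesRead += 1;
    for (const m of page) {
      if (lastProcessedId === null || m.id > lastProcessedId) fresh.push(m);
    }
    const oldest = page[page.length - 1];
    // Once the oldest message on this page is at or below the boundary,
    // every later page is older still, so stop reading.
    if (oldest !== undefined && lastProcessedId !== null && oldest.id <= lastProcessedId) break;
  }
  return { fresh, pagesRead };
}
```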

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 19:58:30 +02:00
d6c82ede1e fix: auto-recover from TDLib upload stalls by recreating client
When TDLib's event stream degrades, uploads complete (bytes sent) but
confirmations never arrive. Previously the worker retried 3x with the
same broken client, wasting 60+ min per archive and holding the mutex.

- Add UploadStallError class to distinguish stalls from other failures
- Reduce stall detection timeout from 5min to 3min (faster detection)
- Recreate TDLib client after consecutive upload stalls instead of
  retrying on the same degraded connection
- Add forceReleaseMutex() to prevent cascade failures when one account
  blocks others via stuck mutex after cycle timeout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 18:02:42 +02:00
7e48131f67 fix: clear timeout on race settlement to prevent orphaned timers
CI: continuous-integration/drone/push, build passing
2026-05-02 23:44:18 +02:00
a79cb4749b fix: use per-account mutex keys in fetch/extract listeners, add cycle timeout and error logging
2026-05-02 23:40:37 +02:00
e9017fc518 feat: parallel account ingestion via per-key TDLib mutex
2026-05-02 23:31:02 +02:00
4f59d19ac2 feat: apply per-account Premium 4GB upload limit to bypass repacking
2026-05-02 23:28:00 +02:00
579276ee2d fix: widen hash lock try/finally to prevent lock leak on error paths
2026-05-02 23:24:08 +02:00
b48cc510a4 feat: add two-phase DB write and hash advisory lock to prevent double-uploads
2026-05-02 23:13:55 +02:00
614c8e5b74 feat: add createPackageStub and updatePackageWithMetadata for two-phase DB write
2026-05-02 23:06:17 +02:00
3019c23f70 feat: add per-content-hash advisory lock to prevent concurrent duplicate uploads
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 23:04:43 +02:00
436a576085 feat: detect and persist Telegram Premium status after authentication
After TDLib login completes, calls getMe() to detect isPremium, persists
it to DB via updateAccountPremiumStatus, and returns { client, isPremium }
from createTdlibClient. All callers updated to destructure accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 23:02:46 +02:00
f454303352 feat: add isPremium field to TelegramAccount
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 22:58:53 +02:00
af7094637d feat: file upload from UI, notification dismiss, audit false positive fix
Manual file upload:
- Upload dialog in STL page with drag-and-drop file picker
- Files saved to shared Docker volume (/data/uploads)
- Worker processes via pg_notify('manual_upload') channel
- Hashes, reads metadata, splits >2GB, uploads to Telegram
- Multiple files automatically grouped
- Status polling shows upload/processing/complete states

Notification fixes:
- Add dismiss (X) button on each notification
- Add "Clear" button to remove all notifications
- Fix false positive MISSING_PART alerts from legacy packages
  (only flag when >1 destMessageIds stored but count wrong,
  not when only 1 ID from backfill)

Infrastructure:
- ManualUpload + ManualUploadFile schema + migration
- Shared manual_uploads Docker volume between app and worker
- Upload API routes (POST /api/uploads, GET /api/uploads/[id])
- Worker manual-upload processor with full pipeline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 20:26:06 +02:00
f4aa9d9a2f feat: complete remaining features — training, FTS, bot groups, repair, re-tag
CI: continuous-integration/drone/push, build passing
Manual override training (GroupingRule):
- Learn patterns from manual group creation (common filename prefix or creator)
- Apply learned rules as first auto-grouping pass (highest confidence after albums)
- GroupingRule model stores pattern, channel, signal type, confidence

Hash verification after upload:
- Re-hash upload files on disk before indexing to catch disk corruption
- Creates HASH_MISMATCH notification on discrepancy

Grouping conflict detection:
- After all grouping passes, check if grouped packages match rules from different groups
- Creates GROUPING_CONFLICT notification for manual review

Per-channel grouping flags:
- Add autoGroupEnabled boolean to TelegramChannel (default true)
- Auto-grouping passes (all except album) gated behind this flag
- Album grouping always runs as it reflects Telegram's native behavior

Full-text search (tsvector):
- Add searchVector tsvector column with GIN index and auto-update trigger
- Backfill 1870 existing packages
- FTS with ts_rank for ranked results, ILIKE fallback for short/failed queries
- Applied to both web app and bot search

Bot group awareness:
- /group <query> — view group info or search groups by name
- /sendgroup <id> — send all packages in a group to linked Telegram account

Bulk repair:
- repairPackageAction clears dest info and resets watermark for re-processing
- Repair button in notification bell for MISSING_PART and HASH_MISMATCH alerts
- /api/notifications/repair endpoint

Retroactive category re-tagging:
- When channel category changes, auto-update tags on all existing packages
- Removes old category tag, adds new one

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 14:34:14 +02:00
7f9a03d4ee feat: group merge, ZIP/reply/caption grouping, integrity audit
Group merge UI:
- Add mergeGroups query and mergeGroupsAction server action
- Add "Start Merge" / "Merge Here" buttons to group row actions
- Two-step UX: click Start on source, click Merge Here on target

ZIP path prefix grouping (Signal 7):
- Compare PackageFile.path root folders across ungrouped packages
- Auto-group if 2+ packages share the same dominant root folder

Reply chain grouping (Signal 6):
- Capture reply_to_message_id during channel scanning
- Group archives that reply to the same root message
- Add replyToMessageId field to Package schema

Caption fuzzy match grouping (Signal 8):
- Capture source caption during channel scanning
- Normalize captions (strip extensions, extract significant words)
- Group packages with matching normalized caption keys
- Add sourceCaption field to Package schema

Periodic integrity audit:
- Check multipart packages for completeness (parts vs destMessageIds)
- Detect orphaned indexes (destChannelId set but no destMessageId)
- Runs after each ingestion cycle, deduplicates notifications

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 14:19:36 +02:00
2c46ab0843 feat: pattern/creator grouping, notification UI, failure alerts
Pattern grouping (Signal 3):
- Extract YYYY-MM dates, month names, and project prefixes from filenames
- Auto-group packages sharing the same pattern within a channel
- Groups created with groupingSource=AUTO_PATTERN

Creator grouping (Signal 4):
- Auto-group 3+ ungrouped packages from the same creator within a channel
- Runs after pattern grouping as lowest-priority automatic signal

Notification UI:
- Add NotificationBell component to header with unread badge
- Popover panel shows recent notifications with severity icons
- Mark individual or all notifications as read
- Polls every 30 seconds for updates

Failure notifications:
- Upload/download failures now create SystemNotification records
- Visible in the notification bell alongside hash mismatch alerts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 13:43:55 +02:00
9e78cc5d19 feat: grouping phase 1 — schema, ungrouped tab, time-window grouping, hash verification
Schema:
- Add GroupingSource enum (ALBUM, MANUAL, AUTO_TIME, AUTO_PATTERN, etc.)
- Add groupingSource field to PackageGroup with backfill
- Add SystemNotification model for persistent alerts
- Add NotificationType and NotificationSeverity enums

Ungrouped staging tab:
- Add listUngroupedPackages/countUngroupedPackages queries
- Add "Ungrouped" tab to STL page showing packages without a group

Time-window auto-grouping:
- After album grouping, cluster ungrouped packages within configurable
  time window (default 5 min, AUTO_GROUP_TIME_WINDOW_MINUTES env var)
- Groups named from common filename prefix
- Groups created with groupingSource=AUTO_TIME
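
One way to read the time-window clustering is sketched below. It assumes a cluster continues while the gap between consecutive packages stays within the window, and that only clusters of two or more packages become groups. Both are assumptions, as are the `Pkg` shape and the `clusterByTimeWindow` name.

```typescript
interface Pkg {
  fileName: string;
  postedAtMs: number; // source message timestamp in milliseconds
}

function clusterByTimeWindow(pkgs: Pkg[], windowMinutes: number = 5): Pkg[][] {
  const windowMs = windowMinutes * 60 * 1000;
  const sorted = [...pkgs].sort((a, b) => a.postedAtMs - b.postedAtMs);
  const clusters: Pkg[][] = [];
  let current: Pkg[] = [];
  for (const p of sorted) {
    const last = current[current.length - 1];
    // A gap larger than the window closes the current cluster.
    if (last !== undefined && p.postedAtMs - last.postedAtMs > windowMs) {
      clusters.push(current);
      current = [];
    }
    current.push(p);
  }
  if (current.length > 0) clusters.push(current);
  // Assumption: a lone package is left ungrouped.
  return clusters.filter((c) => c.length >= 2);
}
```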

Hash verification after split:
- Re-hash split parts and compare to original contentHash
- Log error and create SystemNotification on mismatch
- Prevents silently corrupted split uploads

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 13:00:27 +02:00
194c87a256 fix: raise size limit and make MAX_PART_SIZE configurable
CI: continuous-integration/drone/push, build passing
- Raise WORKER_MAX_ZIP_SIZE_MB from 4GB to 200GB (production .env)
- Make MAX_PART_SIZE configurable via MAX_PART_SIZE_MB env var
  (default 1950 MiB, set to 3900 for Premium accounts)
- Remove hardcoded 1950 MiB constants in split.ts and worker.ts
- Add grouping system audit report with real-world failure cases
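
A sketch of reading MAX_PART_SIZE_MB with the 1950 MiB default, plus the resulting part count. The environment is passed in as a plain record so the helper stays self-contained; `maxPartSizeBytes` and `partCount` are hypothetical names.

```typescript
const MIB = 1024 * 1024;

// Falls back to 1950 MiB when the variable is unset, empty, or not positive.
function maxPartSizeBytes(env: Record<string, string | undefined>): number {
  const raw = Number(env["MAX_PART_SIZE_MB"]);
  const mib = Number.isFinite(raw) && raw > 0 ? Math.floor(raw) : 1950;
  return mib * MIB;
}

// Number of parts needed to upload totalBytes with the given part size.
function partCount(totalBytes: number, partBytes: number): number {
  return Math.max(1, Math.ceil(totalBytes / partBytes));
}
```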

10 archives were blocked by the 4GB limit (up to 70.5GB).
They will be retried on next ingestion cycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 12:41:37 +02:00
718007446f feat: fix multi-part archive forwarding and add kickstarter package linking
CI: continuous-integration/drone/push, build passing
Multi-part send fix:
- Add destMessageIds BigInt[] to Package schema with backfill migration
- Worker uploadToChannel now returns all message IDs, stored in DB
- Bot forwards all parts of multi-part archives (not just the first)
- Add retry logic for upload rate limits (429) and download stalls

Kickstarter package linking:
- Add package search/linking queries and API routes
- Add PackageLinkerDialog with search + checkbox selection
- Add "Link Packages" and "Send All" actions to kickstarter table
- Add sendAllKickstarterPackages server action

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 18:11:35 +01:00
a4156b2ac6 fix: add race condition guard and null check in group queries
- createOrFindPackageGroup: catch unique constraint violation from
  concurrent creates and fall back to findFirst
- createManualGroup: guard against empty package results before
  accessing first element

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 22:45:29 +01:00
218ccb9282 feat: add album grouping post-processing to worker pipeline
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 22:28:19 +01:00
b632533f54 feat: add createOrFindPackageGroup and linkPackagesToGroup worker queries
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 22:24:31 +01:00
4baf5aad83 feat: capture media_album_id from TDLib messages during scanning
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 22:23:47 +01:00
ad7790c07b feat: add mediaAlbumId to TelegramMessage and TelegramPhoto interfaces
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 22:23:11 +01:00
d6386209be fix: improve download/upload reliability and fix FILE_PARTS_INVALID
- Add downloadStarted flag to prevent false "stopped unexpectedly" errors
  when TDLib emits initial updateFile before download is active
- Add 5-minute stall detection for both downloads and uploads
- Reduce max split part size from 2GiB to 1950MiB to stay under
  Telegram's internal upload part count limits
- Increase timeouts from max(10min, 15min/GB) to max(15min, 20min/GB)
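
The max(15min, 20min/GB) rule works out as in the sketch below. Treating "GB" as GiB is an assumption, and `transferTimeoutMs` is an illustrative name.

```typescript
const MINUTE_MS = 60 * 1000;
const GIB = 1024 * 1024 * 1024; // assumption: "GB" in the rule means GiB

// 15 minutes at minimum, otherwise 20 minutes per GiB of file size.
function transferTimeoutMs(sizeBytes: number): number {
  const scaled = Math.ceil((sizeBytes / GIB) * 20 * MINUTE_MS);
  return Math.max(15 * MINUTE_MS, scaled);
}
```

A 0.5 GiB file would scale to 10 minutes, so the 15-minute floor applies; a 2 GiB file gets 40 minutes.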

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 21:40:00 +01:00
fe28c31b9e fix: improve worker error handling and reliability
CI: continuous-integration/drone/push, build passing
1. Distinguish failure reasons: inspect error messages to label skipped
   packages as DOWNLOAD_FAILED, UPLOAD_FAILED, or EXTRACT_FAILED
   instead of catch-all DOWNLOAD_FAILED.

2. Detect orphaned uploads: before uploading, check if the same content
   hash already has a successful upload on the destination channel. Reuse
   the existing message ID instead of re-uploading (prevents duplicates
   when worker crashed between upload and DB write).

3. Increase timeouts: download from max(5min, GB*10min) to
   max(10min, GB*15min), upload from GB*10min to GB*15min.
   Prevents premature timeouts on slow connections.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 02:37:23 +01:00
d53e581623 feat: record skipped/failed archives in database for UI visibility
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 16:16:12 +01:00
9642adaba7 feat: raise default ingestion size limit from 4GB to 200GB
Multipart archives where individual parts fit under Telegram's 2GB limit
but total size exceeds 4GB were being silently skipped. These can now be
processed up to 200GB total, with each part uploading directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 16:01:41 +01:00
1425db8774 fix: use loadChats API and load chat folders for complete chat discovery
CI: continuous-integration/drone/push, build failing
- Switch from getChats pagination to loadChats (the TDLib-recommended
  API) which properly loads all chats into TDLib's cache and signals
  completion with a 404 error
- Discover and load chat folders via getChatFolders so chats in
  user-created folders are included
- Load from main + archive + all folders in both worker startup and
  getAccountChats channel discovery
- After loading, use getChats with high limit to retrieve all cached IDs
- This ensures private chats, 1-on-1 conversations, Saved Messages,
  basic groups, and archived/folder chats are all discoverable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 22:38:49 +01:00
aef76828ef fix: support large accounts and archived chats in channel discovery
CI: continuous-integration/drone/push, build failing
- Increase getChats pagination from 50 pages (5K chats) to 500 pages
  (50K chats) to support accounts with many channels/groups
- Load from both chatListMain AND chatListArchive so older/archived
  chats are discovered and scannable
- Deduplicate chat IDs across both lists
- Worker startup also loads both lists before scanning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 19:50:14 +01:00
29e95f780c feat: support all chat types in channel discovery and enrich bot messages
Channel Discovery:
- Remove channel/supergroup filter from getAccountChats — all chat types
  (private, groups, Saved Messages, etc.) are now discoverable as sources
- Detect and label the self-chat as "Saved Messages" via getMe
- Update channel picker dialog to accept any chat type string

Bot Rich Messages:
- Enhance package send preview with creator, file count, tags, and source
  channel info in MarkdownV2 caption
- Include tags in new_package subscription notifications
- Expand getPendingSendRequest to fetch richer package data

Performance:
- Reviewed pipeline for many-channel load: the getChats pagination fix and
  per-channel getChat pre-load from the prior commit address the main concerns
- Channels with no new messages are skipped after 2-3 API calls

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 18:27:48 +01:00
5fd341dfc4 feat: fix channel scanning bugs, add package tags, and kickstarters tab
Bug fixes:
- Fix channels not being scanned by paginating TDLib getChats (previously only
  the first batch was loaded, so additional channels were unknown to TDLib)
- Add per-channel getChat pre-load as safety net before scanning
- Fix preview pictures not loading by checking previewData instead of
  previewMsgId for hasPreview flag
- Prevent previewMsgId from being set when preview download fails

Package Tags:
- Add tags Text[] column to Package with migration backfilling from
  channel categories
- Worker auto-inherits source channel category as initial tag
- Tag filter dropdown and Tags column in STL Files table
- Server actions for individual and bulk tag editing

Kickstarters Tab:
- New KickstarterHost, Kickstarter, and KickstarterPackage models
- Full CRUD with delivery status, payment status, host management
- Package linking (many-to-many with existing packages)
- Sidebar entry with Gift icon
- Table with search, filters, modal forms

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 18:17:44 +01:00
admin
7cd84dbf02 fix: map ArchiveFormat '7Z' to ArchiveType 'SEVEN_Z' in rebuild
continuous-integration/drone/push Build is passing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 11:03:09 +01:00
admin
ab558e00f5 feat: add preview management, channel controls, invite polish, and recovery
- Auto-extract preview images from ZIP/RAR/7z archives during ingestion
- Upload custom preview images via package drawer
- Select preview from archive contents with on-demand extraction UI
- Manually add Telegram channels by t.me link, username, or invite link
- Invite code UX: bulk create, copy link, usage tracking, delete confirm
- Incomplete upload recovery: verify dest messages on worker startup
- Rebuild package DB by scanning destination channel with live progress

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 00:09:59 +01:00
admin
bf093cdfca fix: 7z parser handles solid archives with empty Compressed column
2026-03-21 21:18:33 +01:00
admin
a90f653314 feat: add 7z archive content listing via p7zip
- Add p7zip-full to worker Docker image
- New read7zContents() parser using 7z l output
- 7z archives now get full file listings like ZIP/RAR
- Standalone DOCUMENT types still show as a single entry
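
A column-based parser for `7z l` output could be sketched as follows. This
is an illustration only, not the repository's read7zContents(): the function
and type names are hypothetical, and it assumes the usual p7zip table layout
(a dashed ruler line before and after the rows, with columns Date/Time, Attr,
Size, Compressed, Name). Slicing by ruler position rather than splitting on
whitespace keeps rows parseable when the Compressed column is blank, as it is
for most files in a solid archive.

```typescript
interface SevenZipEntry {
  path: string;
  size: number;
  compressedSize: number | null; // null when the column is blank
  isDirectory: boolean;
}

// Hypothetical parser; assumes the standard `7z l` table layout.
function parse7zListing(output: string): SevenZipEntry[] {
  const lines = output.split(/\r?\n/);
  const isRuler = (line: string): boolean => /^-{5,}[- ]*$/.test(line);

  const first = lines.findIndex(isRuler);
  if (first === -1) return [];

  // Each run of dashes in the ruler marks one column's character span.
  const spans: Array<[number, number]> = [];
  const run = /-+/g;
  let match: RegExpExecArray | null;
  while ((match = run.exec(lines[first])) !== null) {
    spans.push([match.index, match.index + match[0].length]);
  }
  if (spans.length < 5) return [];
  const [, attrSpan, sizeSpan, compressedSpan, nameSpan] = spans;

  const entries: SevenZipEntry[] = [];
  for (let i = first + 1; i < lines.length; i++) {
    const line = lines[i];
    if (isRuler(line)) break; // closing ruler: the summary row follows
    if (line.length <= nameSpan[0]) continue; // too short to hold a name

    const sizeText = line.slice(sizeSpan[0], sizeSpan[1]).trim();
    const compressedText = line.slice(compressedSpan[0], compressedSpan[1]).trim();
    entries.push({
      path: line.slice(nameSpan[0]), // names may contain spaces
      size: sizeText === "" ? 0 : Number(sizeText),
      compressedSize: compressedText === "" ? null : Number(compressedText),
      isDirectory: line.slice(attrSpan[0], attrSpan[1]).charAt(0) === "D",
    });
  }
  return entries;
}
```

Sizes wider than the ruler's column would break the fixed-position slicing,
so a production parser would likely need a fallback for that case.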
2026-03-21 21:13:58 +01:00
admin
36a7e3d5f4 feat: add channel categories and improved creator detection
- Add category field to TelegramChannel (filterable tag like STL, PDF, D&D)
- Category column in channels table with edit via dropdown menu
- Improved creator extraction: filename patterns + channel title fallback
- extractCreatorFromChannelTitle strips [Completed], (Paid), emoji, etc.
- Fix ArchiveType in PackageListItem and PackageRow for new types
- Add Prisma migration for category column
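
The title clean-up behind the channel title fallback could be approximated
as below. This is a hedged sketch rather than the actual
extractCreatorFromChannelTitle: the function name is hypothetical, and the
emoji pattern is a rough approximation (surrogate pairs plus a few symbol
ranges), not a complete emoji matcher.

```typescript
// Hypothetical sketch: strip bracketed/parenthesised status markers and
// emoji from a channel title, returning null if nothing usable remains.
function creatorFromTitleSketch(title: string): string | null {
  const cleaned = title
    .replace(/\[[^\]]*\]/g, " ") // e.g. "[Completed]"
    .replace(/\([^)]*\)/g, " ") // e.g. "(Paid)"
    // Rough emoji approximation: astral-plane surrogate pairs, misc symbols,
    // dingbats, variation selector, zero-width joiner.
    .replace(/[\uD83C-\uDBFF][\uDC00-\uDFFF]|[\u2600-\u27BF\uFE0F\u200D]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  return cleaned === "" ? null : cleaned;
}
```

Returning null for an empty result lets a caller fall through to whatever
default it uses when no creator can be inferred.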
2026-03-21 20:37:44 +01:00