dragonsstash

mirror of https://github.com/xCyanGrizzly/DragonsStash.git synced 2026-06-13 04:31:16 +00:00

Author	SHA1	Message	Date
xCyanGrizzly	25a6196262	fix(worker): skip integrity test for multipart ZIPs — unzip -t can't span them Diagnosed from production: main downloaded several 28 GB ZIP sets (CA 3D STUDIOS 2023-07.zip.001..007, 2023-08.zip.001..006, ...) and rejected every one of them with: "Archive integrity check failed: Command failed: unzip -tqq /tmp/zips/.../CA 3D STUDIOS 2023-07.zip.001" Root cause: the integrity test I added in `04effed` passed `uploadPaths[0]` to the archive tester. For byte-split multipart ZIPs (`.zip.001`, `.zip.002`, ...), the first chunk isn't a valid ZIP on its own — the central directory only exists at the END of the assembled archive. unzip's spanned-ZIP support uses `.z01/.z02/.../.zip` naming, not `.zip.001/.002`, so even pointing at the assembled-form parts wouldn't help. Three correctness changes: 1. Test runs on `tempPaths[0]` (the original downloaded file) instead of `uploadPaths[0]` (which may be byte-split chunks we created). For single-file ZIPs we re-split, this still tests the unsplit original. 2. Skip the test entirely when archiveType=ZIP AND tempPaths.length>1 — these are source multipart ZIPs we can't validate without concatenating, and the hash check + central-directory parse we already do are sufficient structural signals. 3. RAR and 7Z multipart still ARE tested — `unrar t` and `7z t` both auto-discover sibling parts when pointed at the first one. This unblocks all multipart-ZIP ingestion for the main account. Hours of downloaded archives that were being rejected will now pass through. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 11:15:07 +02:00
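The three rules in `25a6196262` can be read as one small decision function. The sketch below is an illustration only: `pickIntegrityTestPath` and the `ArchiveType` union are hypothetical names, and the real worker may structure this differently.

```typescript
// Hypothetical names; only the rules themselves come from the commit message.
type ArchiveType = "ZIP" | "RAR" | "SEVEN_Z";

// Returns the path to hand to the CLI tester, or null when the test is skipped.
function pickIntegrityTestPath(
  archiveType: ArchiveType,
  tempPaths: readonly string[],
): string | null {
  const [first] = tempPaths;
  if (first === undefined) return null;
  // Source multipart ZIPs (.zip.001, .zip.002, ...) cannot be tested part by part.
  if (archiveType === "ZIP" && tempPaths.length > 1) return null;
  // Single files and RAR/7Z multipart sets are tested from the first downloaded part.
  return first;
}
```

Passing `tempPaths` rather than `uploadPaths` keeps the test on the original download even when the worker has re-split the file for upload.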
xCyanGrizzly	166dc556c9	fix(worker): match old General-topic progress rows under new TDLib forum_topic_id After the TDLib 1.8.50 → 1.8.64 upgrade, the worker now correctly enumerates all forum topics in MPE (1,086 of them) — a huge win. But a data-shape mismatch was about to bite us: TDLib changed how the General topic is identified. TDLib 1.8.50: info.message_thread_id = 1048576 (magic constant) TDLib 1.8.64: info.forum_topic_id = 1 Existing topic_progress rows for General carry topicId=1048576. The worker looks up progress via `topicProgressList.find(tp => tp.topicId === topic.topicId)`, which fails for General under the new TDLib → progress becomes null → the scan starts from message 0. For MPE specifically, that means re-scanning all ~378k General-topic messages. Dedup catches the previously-ingested ones (no double upload), but it burns hours of bandwidth before the watermark catches up. Fix: when topicId lookup misses for a topic named "General", fall back to a name match. The first watermark write after that saves under the new ID (1), so future runs hit the topicId match directly without the fallback. The orphaned 1048576 row stays as harmless dead data — we don't delete it in case a TDLib downgrade or revert ever happens. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 21:43:39 +02:00
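A minimal sketch of the lookup-with-fallback described in `166dc556c9`. The row shape (`TopicProgressRow`, including a stored `topicName`) and the function name are assumptions for illustration; the commit only states that the ID lookup falls back to a name match for "General".

```typescript
// Hypothetical row and topic shapes; field names are assumed, not taken from the repo.
interface TopicProgressRow {
  topicId: number;
  topicName: string | null;
  lastProcessedMessageId: number;
}

interface ForumTopicRef {
  topicId: number;
  name: string;
}

function findTopicProgress(
  rows: readonly TopicProgressRow[],
  topic: ForumTopicRef,
): TopicProgressRow | null {
  // Normal path: the stored topicId matches the one TDLib reports now.
  const byId = rows.find((tp) => tp.topicId === topic.topicId);
  if (byId) return byId;
  // Fallback only for General, whose ID changed from 1048576 to 1.
  if (topic.name === "General") {
    return rows.find((tp) => tp.topicName === "General") ?? null;
  }
  return null;
}
```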
xCyanGrizzly	e8daabd28d	fix(tdlib): handle 1.8.64 renames in searchChatMessages + message reply_to Audit of every TDLib call site against the live 1.8.64 schema in node_modules/@prebuilt-tdlib/types/tdlib-types.d.ts surfaced three additional silent breakages beyond the getForumTopics fix in `106700b`. 1. searchChatMessages parameter restructure The top-level `message_thread_id` and `saved_messages_topic_id` request fields were collapsed into a single tagged-union `topic_id: MessageTopic$Input`. Three call sites affected: - topics.ts getTopicMessages — was passing message_thread_id, now sends topic_id with the messageTopicForum variant carrying forum_topic_id. Without this the topic scan returns the whole channel (or nothing) instead of just the topic. - download.ts getChannelMessages — used to pass message_thread_id: 0; just omit the topic_id field entirely for a flat scan. - rebuild.ts — same treatment. 2. message.reply_to_message_id replaced with reply_to tagged union On incoming messages, the flat `reply_to_message_id` field was replaced with `reply_to: MessageReplyTo` (messageReplyToMessage or messageReplyToStory). Our reply-chain grouping needs the message-ID case. Added extractReplyToMessageId() that reads both old and new shapes so a transition build or future downgrade still works. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 16:45:06 +02:00
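The dual-shape read in `e8daabd28d` might look roughly like this. The function name comes from the commit message, but the body and the `MessageLike` type are a sketch, not the repository's code; it assumes the old flat field used 0 for "not a reply".

```typescript
// Loose structural type covering both message shapes (assumed, simplified).
interface MessageLike {
  reply_to_message_id?: number;
  reply_to?: { _: string; message_id?: number } | null;
}

function extractReplyToMessageId(message: MessageLike): number | null {
  // New shape: tagged union; only the message variant carries a message ID.
  const replyTo = message.reply_to;
  if (
    replyTo &&
    replyTo._ === "messageReplyToMessage" &&
    typeof replyTo.message_id === "number"
  ) {
    return replyTo.message_id;
  }
  // Old shape: flat field, treated here as absent when 0.
  const legacy = message.reply_to_message_id;
  return typeof legacy === "number" && legacy !== 0 ? legacy : null;
}
```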
xCyanGrizzly	106700b13f	fix(topics): handle TDLib 1.8.64 renamed forum-topic fields After the TDLib upgrade in `18a0efb`, getForumTopicList returned 0 topics for every forum channel. Confirmed in production logs: "title":"Model Printing Emporium","topicCount":0 "title":"GB_Butler_Bot2","topicCount":0 "title":"Darnascus 2 : Flamigos Miniatures","topicCount":0 Cycle results: messagesScanned=0, zipsFound=0 — main account's entire ingestion pipeline was a no-op because all source channels are forums. Root cause: TDLib 1.8.64 renamed three fields without bumping the breaking-change indicator we'd notice: Request offset_message_thread_id → offset_forum_topic_id Response next_offset_message_thread_id → next_offset_forum_topic_id Response topics[].info.message_thread_id → topics[].info.forum_topic_id The old field names became no-ops in the new TDLib, so every request came back with an empty topic list and the "stuck pagination" detection correctly bailed out. Fix: send the new field name on the request side, read both old and new names on the response side (so a future TDLib version change in either direction stays handled). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 16:18:08 +02:00
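On the response side, reading both the old and the new field names from `106700b13f` can be done with a pair of small accessors. The helper and interface names below are hypothetical; only the field names are taken from the commit message.

```typescript
// Simplified response shapes that tolerate either TDLib version.
interface ForumTopicInfoLike {
  forum_topic_id?: number;
  message_thread_id?: number;
}

interface ForumTopicsPageLike {
  next_offset_forum_topic_id?: number;
  next_offset_message_thread_id?: number;
}

// Prefer the 1.8.64 name, fall back to the 1.8.50 name.
function readForumTopicId(info: ForumTopicInfoLike): number | null {
  return info.forum_topic_id ?? info.message_thread_id ?? null;
}

function readNextOffsetTopicId(page: ForumTopicsPageLike): number | null {
  return page.next_offset_forum_topic_id ?? page.next_offset_message_thread_id ?? null;
}
```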
xCyanGrizzly	04effed825	feat(verify): pre-upload integrity test, post-upload read-back, batched recovery Three independent verification improvements landing together. 1. Pre-upload archive integrity test (testArchiveIntegrity) Before sending an archive to the destination channel, runs the appropriate CLI test: - unzip -t for ZIP - unrar t for RAR - 7z t for SEVEN_Z Catches truncated downloads, internal CRC errors, bad central directories, and password-protected archives BEFORE we burn upload bandwidth on a file that can't be extracted. Encrypted archives are specifically flagged so the SkippedPackage error message is clear. 2. Post-upload destination read-back updateMessageSendSucceeded tells us Telegram accepted the upload, but says nothing about whether the destination message actually contains the file we sent. After each successful upload, getMessage each destMessageId and confirm document.size matches uploadPaths[i]'s on-disk size. Mismatches don't abort ingestion — they surface as HASH_MISMATCH / UPLOAD_FAILED SystemNotifications so the admin can see them in the UI and decide whether to recover. 3. Batched recovery (verifyMessagesBatch) recoverIncompleteUploads previously called getMessage (singular) per Package — at 20k packages that's 20k round-trips. Switched to TDLib's getMessages (plural) with batch size 100 → 200 round-trips. On 20k packages this is ~100x faster. Per-message fallback if a whole batch errors out, so one bad batch never loses all verification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 08:56:50 +02:00
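The batching arithmetic in item 3 of `04effed825` (20k IDs at 100 per batch gives 200 round-trips) and the per-message fallback can be sketched generically. `chunkIds` and `verifyInBatches` are hypothetical names, and the two fetchers are injected stand-ins for the batch and single-message calls rather than real TDLib bindings.

```typescript
// Split a list of IDs into fixed-size batches.
function chunkIds<T>(ids: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    batches.push(ids.slice(i, i + batchSize));
  }
  return batches;
}

// One call per batch; if a whole batch errors, retry its IDs one at a time
// so a single bad batch does not lose verification for everything else.
async function verifyInBatches<T, R>(
  ids: readonly T[],
  batchSize: number,
  fetchBatch: (batch: T[]) => Promise<R[]>,
  fetchOne: (id: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunkIds(ids, batchSize)) {
    try {
      results.push(...(await fetchBatch(batch)));
    } catch {
      for (const id of batch) results.push(await fetchOne(id));
    }
  }
  return results;
}
```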
xCyanGrizzly	c4d9be83bd	feat(worker): auto-tag packages with the slicer(s) their files target Indexes 86k+ Lychee Slicer (.lys/.lyt), 23k+ ChituBox (.chitubox/.ctb/ .cbddlp), 1k+ Anycubic (.photon/.pwmo/.pwmx), and Bambu (.3mf) slicer-specific files. Until now they were just generic extensions in PackageFile. After this commit: - Newly-ingested packages get tags derived from their file list ("lychee", "chitubox", "anycubic", "bambu", "fdm", "mango") - The `backfill_filelists` listener also applies tags to re-indexed packages - A new pure-DB listener `backfill_slicer_tags` walks existing Packages with file lists and applies tags retroactively — no downloads, no TDLib, takes seconds for thousands of rows. Trigger the one-shot retroactive backfill with: SELECT pg_notify('backfill_slicer_tags', '{"limit":5000}'); Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 08:53:18 +02:00
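Deriving tags from a file list, as `c4d9be83bd` describes, reduces to an extension lookup. This sketch covers only the four extension groups the commit message spells out; the rules behind the "fdm" and "mango" tags are not given there, so they are left out. `deriveSlicerTags` and the table name are hypothetical.

```typescript
// Extension-to-tag table built from the extensions listed in the commit message.
const SLICER_TAG_BY_EXTENSION: Readonly<Record<string, string>> = {
  ".lys": "lychee",
  ".lyt": "lychee",
  ".chitubox": "chitubox",
  ".ctb": "chitubox",
  ".cbddlp": "chitubox",
  ".photon": "anycubic",
  ".pwmo": "anycubic",
  ".pwmx": "anycubic",
  ".3mf": "bambu",
};

function deriveSlicerTags(fileNames: readonly string[]): string[] {
  const tags = new Set<string>();
  for (const fileName of fileNames) {
    const dot = fileName.lastIndexOf(".");
    if (dot < 0) continue;
    // Case-insensitive match on the final extension only.
    const tag = SLICER_TAG_BY_EXTENSION[fileName.slice(dot).toLowerCase()];
    if (tag !== undefined) tags.add(tag);
  }
  return Array.from(tags).sort();
}
```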
xCyanGrizzly	7d39a13310	feat(worker): use TDLib remote.unique_id as zero-false-positive dedup signal The fileName + size repost detection from `ff4e150` works but has a theoretical false-positive: two unrelated files in the same channel with identical names and identical total sizes get treated as duplicates. TDLib's document.remote.unique_id is a stable identifier per file content — every repost of the exact same file across messages keeps the same unique_id. Using it as the first dedup check eliminates the false-positive risk entirely. Schema: - Package.remoteUniqueId (nullable, since existing rows lack it) - Index on (sourceChannelId, remoteUniqueId) Pipeline: 1. Capture remoteUniqueId in getChannelMessages + getTopicMessages 2. Pass through TelegramMessage type 3. processOneArchiveSet checks findPackageByRemoteUniqueId FIRST (before packageExistsBySourceMessage / findRepostedPackage) 4. createPackageStub stores it on the new Package row Existing 19,952 Packages have remoteUniqueId = NULL — they fall through to the existing checks (source-msg-id, name+size, content-hash). New ingestions populate it and benefit from the strong signal immediately. Old Packages get backfilled organically when their content is re-encountered and a new Package would otherwise be created. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 08:50:24 +02:00
xCyanGrizzly	18a0efb3d4	chore(tdlib): upgrade tdl 8.0.0 → 8.1.0 and prebuilt-tdlib 1.8.50 → 1.8.64 12 versions of TDLib bug fixes, performance improvements, and stricter type definitions in @prebuilt-tdlib/types. Two API breakages handled: 1. `getChatFolders` (plural) was removed — folder IDs now arrive via the `updateChatFolders` update event. Replaced the synchronous call with a 200ms event listener; if no folders arrive, we proceed with just main + archive lists. Chats inside folders are still reachable from chatListMain so this isn't a functional regression. 2. The new tdl `Client.invoke` signature requires a literal `_` field and rejects `Record<string, any>` shapes. Our `invokeWithTimeout` wrapper is intentionally generic — cast through `any` at the call site with a comment explaining why. Both worker and bot type-check + build cleanly with the new versions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 08:45:44 +02:00
xCyanGrizzly	2ccc9820cd	fix(recovery): distinguish 'message gone' from 'TDLib couldn't tell us' The old verifyMessageExists returned a bare boolean. Any error other than HTTP 404 was treated as "exists" — meaning a TDLib connection problem or transient TG hiccup at recovery time caused the worker to declare "all destination messages verified" when it had actually verified nothing. Replaced with a discriminated VerifyResult: - exists — message present and is a document, keep Package - deleted — TG confirms it's gone (404 / MESSAGE_ID_INVALID / "Message not found"), reset Package for re-upload - wrong-content — message exists but isn't messageDocument, reset - unknown — TDLib threw a non-404 error; do NOT reset, retry next startup Recovery summary now reports all four counts and switches to a non-success message when unknownCount > 0, so a degraded TDLib run doesn't hide behind a green log line. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 00:43:09 +02:00
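The four-way result from `2ccc9820cd` can be modelled as a discriminated union. The `status` field name, the error shape, and the helper names are assumptions; the four cases and the "deleted" triggers are the ones listed in the commit message.

```typescript
type VerifyResult =
  | { status: "exists" }
  | { status: "deleted" }
  | { status: "wrong-content"; contentType: string }
  | { status: "unknown"; reason: string };

// Assumed minimal shape of an error thrown by the TDLib client.
interface TdErrorLike {
  code?: number;
  message?: string;
}

// Only errors that positively say "gone" count as deleted; anything else is unknown.
function classifyLookupError(err: TdErrorLike): VerifyResult {
  const message = err.message ?? "";
  if (
    err.code === 404 ||
    message.includes("MESSAGE_ID_INVALID") ||
    message.includes("Message not found")
  ) {
    return { status: "deleted" };
  }
  return { status: "unknown", reason: message || "no error message" };
}

function classifyContent(contentType: string): VerifyResult {
  return contentType === "messageDocument"
    ? { status: "exists" }
    : { status: "wrong-content", contentType };
}

// "unknown" must never reset a Package; it is retried on the next startup.
function shouldResetPackage(result: VerifyResult): boolean {
  return result.status === "deleted" || result.status === "wrong-content";
}
```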
xCyanGrizzly	c72b5a4b48	feat(worker): add backfill_filelists pg_notify listener Companion to `0bdd4ba` (RAR parser fix). 4,380 RAR packages and ~450 ZIP/7Z packages in the DB have fileCount=0 because of the old broken parser (and a handful of edge cases). This adds an on-demand backfill that re-indexes their file lists. Triggered by: SELECT pg_notify('backfill_filelists', '{"limit":50,"archiveType":"RAR"}'); Both payload fields are optional. archiveType filters to ZIP/RAR/SEVEN_Z; default limit is 100. Multiple notifications queue sequentially so TDLib downloads don't compete for the per-account mutex. For each candidate: 1. Resolve destChannel.telegramId from the Package 2. getMessage for each destMessageId in destMessageIds[] (handles multipart) to recover the file_id from Telegram 3. downloadFile (uses TDLib cache when available — most are fast) 4. Run readZipCentralDirectory / readRarContents / read7zContents 5. Transactionally replace PackageFile rows + update fileCount Re-check of fileCount inside the transaction ensures a concurrent backfill from another worker (or a fresh ingestion of the same archive) doesn't get clobbered. Prefers the Premium account when both are linked, for faster downloads and to avoid the speed-limit throttling on the secondary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 00:42:01 +02:00
xCyanGrizzly	0bdd4ba0cc	fix(rar-reader): use unrar lt (technical) so file listings actually work Diagnosed from production: all 4,380 RAR packages in the database have fileCount = 0. The old parser used `unrar l -v` and a regex that expected an 8-column `Attributes Size Packed Ratio% Date Time CRC32 Name` output. unrar 6.21's actual `l -v` output is 5 columns: `Attributes Size Date Time Name` — no Packed, no Ratio, no CRC32. So every RAR silently parsed to zero entries. Switch to `unrar lt` (list technical), which emits one block per file with key:value lines: Name: Lost Kingdom 2023 01 January/Nagas/NagaCaptainBody.stl Type: File Size: 22503584 Packed size: 21430123 CRC32: A1B2C3D4 ... The new parser tokenizes blocks on blank lines and matches "key: value" lines per block. Handles multi-word keys ("Packed size", "Host OS") and gracefully skips Directory entries and the archive header block. Also tolerates BLAKE2sp checksums for newer RAR archives. Verified against a live 644MB RAR with 201 entries (194 files, 7 dirs); parser returns 194 entries with correct paths, sizes, and CRC32s. Future RAR ingestions will populate fileCount and PackageFile rows correctly. Backfilling existing 4,380 packages requires a separate pass — added in a follow-up commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 00:38:46 +02:00
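A block-oriented parser of the kind `0bdd4ba0cc` describes could look like the sketch below. It assumes the `unrar lt` layout quoted in the commit message (blank-line-separated blocks of "key: value" lines); `parseUnrarTechnicalListing` and `RarEntry` are hypothetical names, and BLAKE2sp handling is omitted.

```typescript
interface RarEntry {
  path: string;
  size: number;
  crc32: string | null;
}

function parseUnrarTechnicalListing(output: string): RarEntry[] {
  const entries: RarEntry[] = [];
  // Blocks are separated by one or more blank lines.
  for (const block of output.split(/\r?\n\s*\r?\n/)) {
    const fields = new Map<string, string>();
    for (const line of block.split(/\r?\n/)) {
      // Key is everything before the first colon, so "Packed size" works.
      const match = /^\s*([^:]+):\s(.*)$/.exec(line);
      const key = match?.[1];
      const value = match?.[2];
      if (key !== undefined && value !== undefined) {
        fields.set(key.trim(), value.trim());
      }
    }
    // Skips the archive header block and Directory entries.
    if (fields.get("Type") !== "File") continue;
    const path = fields.get("Name");
    const size = Number(fields.get("Size"));
    if (path === undefined || !Number.isFinite(size)) continue;
    entries.push({ path, size, crc32: fields.get("CRC32") ?? null });
  }
  return entries;
}
```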
xCyanGrizzly	901f32ff41	feat(worker): retry old SkippedPackages + prefer specific topics over General Three connected safeguards driven by user feedback after deploying the incremental watermark and repost-detection fixes. 1. SkippedPackage retry pass (watermark pull-back) The auto-retry chain (`d99a506` + watermark cap) only works for failures that occur AFTER the fix is deployed. Pre-existing SkippedPackages may sit below the current watermark — example from prod: secondary's "Turnbase Delivery Folder.7z" at msgId 37,109,104,640 vs watermark 37,111,201,792. The auto-retry never sees it. Before scanning each channel/topic, we now query SkippedPackages with attemptCount < cap for that scope and pull the watermark back to (lowestSkippedMsgId - 1n) when needed. Both forum and non-forum branches handle this. 2. Topic scan order: specific topics first, General last In forum channels, files often appear in both a specific topic (e.g., "Artisan Guild January 2022") AND in General. The first encounter created the Package and locked in the topic context. If we happened to scan General first, the Package recorded the less-informative topic. We now sort topics so General is processed last. New Packages get the more specific topic name as their context by default. 3. Backfill specific topic on existing Packages For Packages that were already created with General topic context, when findRepostedPackage matches and the current scan is in a more specific topic, update the existing Package's sourceTopicId (and creator, if it was derived from "General") to the more specific one. Audit log shows both old and new topic IDs. The findRepostedPackage query also got an ORDER BY so it returns the most-specific existing match (non-null sourceTopicId first) when multiple Packages share the same filename + size in a channel — giving the audit log richer context. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 09:02:54 +02:00
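The "General last" ordering from item 2 of `901f32ff41` is a stable sort on a single boolean. A minimal sketch, assuming topics expose a `name` field and that General is identified by that exact name; the function name is hypothetical.

```typescript
// Specific topics keep their relative order; General moves to the end.
function sortTopicsGeneralLast<T extends { name: string }>(topics: readonly T[]): T[] {
  return [...topics].sort(
    (a, b) => Number(a.name === "General") - Number(b.name === "General"),
  );
}
```

This relies on `Array.prototype.sort` being stable, which ECMAScript has guaranteed since ES2019.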
xCyanGrizzly	ff4e150544	fix: skip download when the same file was already uploaded from this channel Diagnosed from production: in 8 hours of main's current run, zero uploads happened despite the worker being busy 100% of the time. Logs showed continuous "Downloading archive part" entries with no corresponding upload activity. Root cause: the source channel ("Model Printing Emporium") frequently reposts the same file at new Telegram message IDs. Concrete example from the DB: - "(EN) PaintGuides All.zip" → present 6 times, msgIds 44B → 92B - "00 Welcome Pack.7z" → present 2 times, msgIds 91B and 177B - "FanteZi April 2022-...zip" → uploaded May 8 at msgId 24,697,110,528; current run re-downloading at 87,488,987,136 packageExistsBySourceMessage(channelId, msgId) correctly misses because the msgId is different. We download the (potentially gigabyte-sized) file, hash it, then packageExistsByHash hits and we discard the download. ~30 seconds wasted per repost x thousands of reposts = whole runs spent uploading nothing. Fix: add findRepostedPackage(sourceChannelId, fileName, fileSize) — a pre-download check that catches reposts by the strong (channel + name + total size) signal. On hit, skip the set entirely. Watermark advances normally (no minFailedId tracking) so the next cycle sees the channel as caught up. False-positive risk: two unrelated files in the same channel with identical name AND identical total fileSize. Extremely rare in practice; if it ever happens, the new file is silently treated as a duplicate. Logged at info level with the existing Package ID and dest message ID so the user can audit if a file is mysteriously missing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 08:54:20 +02:00
xCyanGrizzly	77aeb4cc00	fix: advance channel/topic watermark incrementally per successful set Diagnosed from production logs for the main (Premium) account: RUNNING 2026-05-21 → in progress, 22h ingested: 0 FAILED 2026-05-14 → 2026-05-21 (7.4d) ingested: 5,426 (killed by restart) FAILED 2026-05-06 → 2026-05-14 (7.7d) ingested: 8,300 (killed by restart) Main's two source channels have 378k+ messages each. A full scan takes days, but the worker gets restarted (container update, cycle timeout, etc.) every few days. updateLastProcessedMessage was only called at the END of a channel's scan — so the watermark on AccountChannelMap stayed NULL through restart after restart, and every new run re-scanned from message 0. That explains the user's symptom: "main wasn't uploading although it said it did". The dashboard showed currentStep alternating through downloading / hashing / deduplicating, but zipsIngested stayed at 0 because every archive the run encountered was already a hash-duplicate of something uploaded by a previous run. Fix: processArchiveSets now accepts an onWatermarkAdvance callback. After each successful set (ingested OR confirmed duplicate), the callback fires with a watermark capped below the current minFailedId. Both call sites (forum/topic and non-forum) wire it to upsertTopicProgress / updateLastProcessedMessage. The end-of-scan write is retained for the no-archives and all-failures-with-fallback cases. Worst-case progress loss on restart now is one in-flight archive set, not the entire scan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 23:20:20 +02:00
xCyanGrizzly	3b327eb3f3	feat(app): show attempt count column on the Skipped Packages tab attemptCount goes through SkippedPackageItem and SkippedRow into a new column on the data table. Badge color cues: - outline (1) first failure, will auto-retry next cycle - secondary (2-4) has retried but still below cap - destructive (>=5) hit the cap; will not auto-retry until reset The "Skipped" column is renamed to "Last Skipped" since the timestamp now reflects the most recent attempt, not the first. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 23:08:08 +02:00
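The badge colour cues in `3b327eb3f3` map directly onto attempt-count ranges. A small sketch, with a hypothetical function name and the cap of 5 taken from the commit message as a default parameter:

```typescript
type BadgeVariant = "outline" | "secondary" | "destructive";

function attemptBadgeVariant(attemptCount: number, cap: number = 5): BadgeVariant {
  if (attemptCount >= cap) return "destructive"; // hit the cap, no auto-retry until reset
  if (attemptCount >= 2) return "secondary"; // has retried, still below the cap
  return "outline"; // first failure, will auto-retry next cycle
}
```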
xCyanGrizzly	379bf246cd	feat(worker): per-account safeguards for second-account upload failures Driven by a real production case: secondary account was attached to 17 source channels but ingesting only ~2-3 archives per cycle. Log analysis showed three distinct issues that this commit addresses. 1. Auto-retry cap (WORKER_MAX_SKIP_ATTEMPTS, default 5) processArchiveSets now filters out SkippedPackage rows whose attemptCount has reached the cap. Removing them from the working list means they are not tracked in minFailedId, so the watermark cap from `d99a506` does not pin progress below them anymore. A bad file no longer blocks the rest of the channel forever; the user can manually retry via the UI to reset the count. 2. Account phone in error messages Every SkippedPackage row and SystemNotification produced from a failure is now prefixed with [<phone>] in errorMessage / message, and the JSON context includes accountPhone. When two accounts share a source channel and only one is failing, the UI tells you which one. 3. Explicit getChat for destination at run start loadChats only loads main/archive/folder chat lists. If an account archived or moved the destination chat, sendMessage failed silently per-archive. Now we getChat the destination once per cycle; on failure we record a SystemNotification and skip the account's entire ingestion cycle (no point downloading what we can't upload). 4. Retry on transient Telegram server errors The "Turnbase Delivery Folder.7z" failure on the secondary and "10. Kingdom of the Depth.part1.rar" on the main were both "Internal Server Error during file upload" — a TG-side hiccup, not a stall or FLOOD_WAIT. These now retry up to MAX_UPLOAD_RETRIES with linear backoff (15s, 30s, 45s + jitter) before giving up. 5. Channel-access-lost notification "Iridium 2 w/ Add-ons [Completed]" has been throwing "Can't access the chat" every cycle for the secondary. The worker now surfaces a CHANNEL_ACCESS_LOST notification (deduped to once per 24h per channel/account) so the admin sees it and can re-join or unlink the channel instead of just losing visibility into the loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 23:07:57 +02:00
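The linear backoff in item 4 of `379bf246cd` (15s, 30s, 45s plus jitter) amounts to one line of arithmetic. In this sketch the function name and the 2-second jitter ceiling are assumptions; the commit message does not state the jitter range. The random source is injectable so the delays can be checked deterministically.

```typescript
// attempt is 1-based: 1 -> ~15s, 2 -> ~30s, 3 -> ~45s.
function uploadRetryDelayMs(
  attempt: number,
  baseMs: number = 15000,
  maxJitterMs: number = 2000, // assumed value, not from the commit
  random: () => number = Math.random,
): number {
  return attempt * baseMs + Math.floor(random() * maxJitterMs);
}
```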
xCyanGrizzly	7a79b52baf	feat(db): add attemptCount on SkippedPackage + CHANNEL_ACCESS_LOST enum attemptCount tracks how many times the worker has tried each failed source message. Combined with WORKER_MAX_SKIP_ATTEMPTS (default 5), the worker will auto-retry across cycles but eventually let the watermark advance past a chronically failing file so cycles aren't pinned forever. The SkippedPackage row stays so the user can manually retry via the UI. CHANNEL_ACCESS_LOST is a new notification type the worker emits when a source channel becomes inaccessible (account got removed, channel deleted, etc.) — surfaces the issue instead of silently failing every cycle as we've been doing with "Iridium 2 w/ Add-ons [Completed]". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 23:07:40 +02:00
xCyanGrizzly	26e2cba69d	fix: buffer upload confirmation events to close tempMsgId race sendMessage resolves with the temporary message ID inside a .then() microtask. If TDLib emits updateMessageSendSucceeded synchronously (cached file, already-known media), the event handler fires while tempMsgId is still null — the success is dropped and the promise hangs until the 15-min upload timeout fires. Buffer success/failure events that arrive before tempMsgId is known, then replay them in the .then() callback once tempMsgId is set. Extract completeWithSuccess / completeWithFailure helpers so the resolution path is shared between live events and replayed events. This race matters more now that stalls fail fast — without the buffer, a fast-completing upload could still hang for 15 min before recovery. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 22:48:38 +02:00
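The buffering fix in `26e2cba69d` can be illustrated with a small state holder: events that arrive before the temporary message ID is known are queued, then replayed once it is set. The class and member names are hypothetical, and the promise and timeout wiring of the real upload path is left out.

```typescript
interface SendEvent {
  tempMessageId: number;
  ok: boolean;
}

class UploadConfirmation {
  private tempMsgId: number | null = null;
  private readonly buffered: SendEvent[] = [];
  private result: boolean | null = null;

  // Called from the update handler; may fire before tempMsgId is known.
  onEvent(event: SendEvent): void {
    if (this.tempMsgId === null) {
      this.buffered.push(event);
      return;
    }
    this.apply(event);
  }

  // Called once sendMessage resolves; replays anything that arrived early.
  setTempMsgId(id: number): void {
    this.tempMsgId = id;
    for (const event of this.buffered.splice(0)) this.apply(event);
  }

  // null while pending, true on success, false on failure.
  outcome(): boolean | null {
    return this.result;
  }

  private apply(event: SendEvent): void {
    if (this.result === null && event.tempMessageId === this.tempMsgId) {
      this.result = event.ok;
    }
  }
}
```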
xCyanGrizzly	84cc8d995b	fix: fail fast on upload stall instead of retrying on broken client Previously a single TDLib event-stream degradation cost ~45 minutes per archive: 3 retries x 15-min minimum timeout, all on the same broken client. The retries had no chance of succeeding because the underlying issue (missing updateMessageSendSucceeded events) is a client-level problem, not a transient send failure. Now the first stall throws UploadStallError immediately. The caller in processArchiveSets already recreates the TDLib client on UploadStallError, so we drop from ~45 min recovery to ~15 min (one timeout cycle) per stalled archive. The stalled set is recorded in SkippedPackage; with the watermark cap from `d99a506` it gets retried on the next ingestion cycle with a fresh client. FLOOD_WAIT retries inside sendWithRetry are unchanged — those handle legitimate rate limiting, not stalls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 22:47:08 +02:00
xCyanGrizzly	d99a506b10	fix: cap watermark below failed sets so failures retry next cycle Previously the channel/topic watermark could advance past failed archive sets in two ways: 1. A later successful set raised maxProcessedId past a failed earlier set within the same scan. 2. scanResult.maxScannedMessageId was used as fallback even when archives in the scan had failed (added in `77c26ad` to prevent re-scanning empty channels). Both paths buried failed archives below the watermark on the next cycle — they sat permanently in SkippedPackage with no auto-recovery. Now processArchiveSets returns the lowest failed source message ID alongside the highest processed one. The caller caps the watermark at (minFailedId - 1n) so the next scan re-includes the failed messages and processOneArchiveSet retries them. Successful sets above the failure boundary are not re-uploaded — packageExistsBySourceMessage early-skips them on the second pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 22:46:26 +02:00
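The cap described in `d99a506b10` is a minimum of two values. A sketch using `bigint` IDs to match the `(minFailedId - 1n)` expression in the commit message; the function name and the null conventions (null meaning "nothing processed" or "no failures") are assumptions.

```typescript
function capWatermark(
  maxProcessedId: bigint | null,
  minFailedId: bigint | null,
): bigint | null {
  if (maxProcessedId === null) return null; // nothing processed, nothing to advance to
  if (minFailedId === null) return maxProcessedId; // no failures, no cap
  const cap = minFailedId - 1n;
  // Never advance past the message just before the lowest failure.
  return maxProcessedId < cap ? maxProcessedId : cap;
}
```

With the watermark held at `minFailedId - 1n`, the next scan re-includes the failed message, while sets above it that already succeeded are skipped by the existing source-message check.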
xCyanGrizzly	59038889ae	fix: prevent pool exhaustion that caused 4-hour duplicate check stall The pg pool had max=5 connections shared between Prisma operations and advisory locks. With 2 account locks held permanently and hash locks from timed-out (but still running) background work, pool.connect() would block forever — causing the Turnbase.7z stall. - Increase pool max from 5 to 15 for headroom - Add 30s connectionTimeoutMillis so pool.connect() throws instead of hanging forever when the pool is exhausted - On startup, terminate zombie PostgreSQL sessions from previous worker instances that hold stale advisory locks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 20:39:00 +02:00
xCyanGrizzly	77c26adb31	perf: set watermarks even when no archives found to prevent re-scanning Previously, channels/topics with no new archives never had their watermark updated. This meant every cycle re-scanned all messages from scratch just to discover nothing new — especially costly for the 1079- topic Model Printing Emporium forum. - Add maxScannedMessageId to ChannelScanResult (highest msg ID seen) - Set channel watermark to scan boundary when no archives are found - Set topic watermark to scan boundary when no archives are found - Fall back to scan watermark when archive processing doesn't advance it After one full cycle, subsequent cycles will skip already-scanned messages via the early-exit boundary check, dramatically reducing TDLib API calls on channels with mostly non-archive content. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-04 20:37:42 +02:00
xCyanGrizzly	35cce3151c	perf: early-exit channel scan when all messages are below watermark searchChatMessages returns newest-first. Once the oldest message on a page is at or below the lastProcessedMessageId boundary, all remaining pages are even older. Stop scanning immediately instead of reading every message in the channel. This was already implemented for topic scans but missing from channel scans. On a test run, total messages scanned dropped from 3805 to 1615 (57% reduction) for an account with no new archives. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-04 19:58:30 +02:00
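The early exit in `35cce3151c` depends on pages arriving newest-first: once the oldest ID on a page is at or below the boundary, no later page can contain anything new. A sketch over in-memory pages, with hypothetical names and plain numbers standing in for message IDs:

```typescript
interface ScanOutcome {
  newMessageIds: number[];
  pagesRead: number;
}

// Each page is ordered newest-first, and pages themselves go from newest to oldest.
function scanUntilWatermark(
  pages: readonly (readonly number[])[],
  lastProcessedMessageId: number,
): ScanOutcome {
  const newMessageIds: number[] = [];
  let pagesRead = 0;
  for (const page of pages) {
    pagesRead += 1;
    for (const id of page) {
      if (id > lastProcessedMessageId) newMessageIds.push(id);
    }
    // The last element is the oldest message on this page.
    const oldestOnPage = page[page.length - 1];
    if (oldestOnPage !== undefined && oldestOnPage <= lastProcessedMessageId) break;
  }
  return { newMessageIds, pagesRead };
}
```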
xCyanGrizzly	d6c82ede1e	fix: auto-recover from TDLib upload stalls by recreating client When TDLib's event stream degrades, uploads complete (bytes sent) but confirmations never arrive. Previously the worker retried 3x with the same broken client, wasting 60+ min per archive and holding the mutex. - Add UploadStallError class to distinguish stalls from other failures - Reduce stall detection timeout from 5min to 3min (faster detection) - Recreate TDLib client after consecutive upload stalls instead of retrying on the same degraded connection - Add forceReleaseMutex() to prevent cascade failures when one account blocks others via stuck mutex after cycle timeout Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-04 18:02:42 +02:00
xCyanGrizzly	7e48131f67	fix: clear timeout on race settlement to prevent orphaned timers	2026-05-02 23:44:18 +02:00
xCyanGrizzly	a79cb4749b	fix: use per-account mutex keys in fetch/extract listeners, add cycle timeout and error logging	2026-05-02 23:40:37 +02:00
xCyanGrizzly	e9017fc518	feat: parallel account ingestion via per-key TDLib mutex	2026-05-02 23:31:02 +02:00
xCyanGrizzly	4f59d19ac2	feat: apply per-account Premium 4GB upload limit to bypass repacking	2026-05-02 23:28:00 +02:00
xCyanGrizzly	579276ee2d	fix: widen hash lock try/finally to prevent lock leak on error paths	2026-05-02 23:24:08 +02:00
xCyanGrizzly	b48cc510a4	feat: add two-phase DB write and hash advisory lock to prevent double-uploads	2026-05-02 23:13:55 +02:00
xCyanGrizzly	614c8e5b74	feat: add createPackageStub and updatePackageWithMetadata for two-phase DB write	2026-05-02 23:06:17 +02:00
xCyanGrizzly	3019c23f70	feat: add per-content-hash advisory lock to prevent concurrent duplicate uploads Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 23:04:43 +02:00
xCyanGrizzly	436a576085	feat: detect and persist Telegram Premium status after authentication After TDLib login completes, calls getMe() to detect isPremium, persists it to DB via updateAccountPremiumStatus, and returns { client, isPremium } from createTdlibClient. All callers updated to destructure accordingly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 23:02:46 +02:00
xCyanGrizzly	f454303352	feat: add isPremium field to TelegramAccount Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 22:58:53 +02:00
xCyanGrizzly	e29bd79d66	chore: ignore .worktrees directory	2026-05-02 22:54:56 +02:00
xCyanGrizzly	61e61d0085	docs: add worker improvements implementation plan 7-task plan covering double-upload fix (hash lock + two-phase write), parallel account ingestion (per-key mutex), and Premium 4GB upload limit with automatic detection. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 22:47:52 +02:00
xCyanGrizzly	925d916a3c	Merge branch 'main' of https://github.com/xCyanGrizzly/DragonsStash	2026-05-02 22:38:32 +02:00
xCyanGrizzly	27bacaf24c	docs: add worker improvements design spec Covers double-upload fix (two-phase DB write + hash advisory lock), parallel account processing (remove TDLib mutex), and per-account Premium 4GB upload limit with automatic is_premium detection. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 22:35:27 +02:00
xCyanGrizzly	be4daf950b	fix: correct User table reference in manual_uploads migration The FK referenced "users" but the actual table is "User" (no @@map in Prisma schema). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 21:29:55 +02:00
xCyanGrizzly	af7094637d	feat: file upload from UI, notification dismiss, audit false positive fix Manual file upload: - Upload dialog in STL page with drag-and-drop file picker - Files saved to shared Docker volume (/data/uploads) - Worker processes via pg_notify('manual_upload') channel - Hashes, reads metadata, splits >2GB, uploads to Telegram - Multiple files automatically grouped - Status polling shows upload/processing/complete states Notification fixes: - Add dismiss (X) button on each notification - Add "Clear" button to remove all notifications - Fix false positive MISSING_PART alerts from legacy packages (only flag when >1 destMessageIds stored but count wrong, not when only 1 ID from backfill) Infrastructure: - ManualUpload + ManualUploadFile schema + migration - Shared manual_uploads Docker volume between app and worker - Upload API routes (POST /api/uploads, GET /api/uploads/[id]) - Worker manual-upload processor with full pipeline Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 20:26:06 +02:00
xCyanGrizzly	f4aa9d9a2f	feat: complete remaining features: training, FTS, bot groups, repair, re-tag Manual override training (GroupingRule): - Learn patterns from manual group creation (common filename prefix or creator) - Apply learned rules as first auto-grouping pass (highest confidence after albums) - GroupingRule model stores pattern, channel, signal type, confidence Hash verification after upload: - Re-hash upload files on disk before indexing to catch disk corruption - Creates HASH_MISMATCH notification on discrepancy Grouping conflict detection: - After all grouping passes, check if grouped packages match rules from different groups - Creates GROUPING_CONFLICT notification for manual review Per-channel grouping flags: - Add autoGroupEnabled boolean to TelegramChannel (default true) - Auto-grouping passes (all except album) gated behind this flag - Album grouping always runs as it reflects Telegram's native behavior Full-text search (tsvector): - Add searchVector tsvector column with GIN index and auto-update trigger - Backfill 1870 existing packages - FTS with ts_rank for ranked results, ILIKE fallback for short/failed queries - Applied to both web app and bot search Bot group awareness: - /group <query>: view group info or search groups by name - /sendgroup <id>: send all packages in a group to linked Telegram account Bulk repair: - repairPackageAction clears dest info and resets watermark for re-processing - Repair button in notification bell for MISSING_PART and HASH_MISMATCH alerts - /api/notifications/repair endpoint Retroactive category re-tagging: - When channel category changes, auto-update tags on all existing packages - Removes old category tag, adds new one Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 14:34:14 +02:00
xCyanGrizzly	7f9a03d4ee	feat: group merge, ZIP/reply/caption grouping, integrity audit Group merge UI: - Add mergeGroups query and mergeGroupsAction server action - Add "Start Merge" / "Merge Here" buttons to group row actions - Two-step UX: click Start on source, click Merge Here on target ZIP path prefix grouping (Signal 7): - Compare PackageFile.path root folders across ungrouped packages - Auto-group if 2+ packages share the same dominant root folder Reply chain grouping (Signal 6): - Capture reply_to_message_id during channel scanning - Group archives that reply to the same root message - Add replyToMessageId field to Package schema Caption fuzzy match grouping (Signal 8): - Capture source caption during channel scanning - Normalize captions (strip extensions, extract significant words) - Group packages with matching normalized caption keys - Add sourceCaption field to Package schema Periodic integrity audit: - Check multipart packages for completeness (parts vs destMessageIds) - Detect orphaned indexes (destChannelId set but no destMessageId) - Runs after each ingestion cycle, deduplicates notifications Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 14:19:36 +02:00
xCyanGrizzly	2c46ab0843	feat: pattern/creator grouping, notification UI, failure alerts Pattern grouping (Signal 3): - Extract YYYY-MM dates, month names, and project prefixes from filenames - Auto-group packages sharing the same pattern within a channel - Groups created with groupingSource=AUTO_PATTERN Creator grouping (Signal 4): - Auto-group 3+ ungrouped packages from the same creator within a channel - Runs after pattern grouping as lowest-priority automatic signal Notification UI: - Add NotificationBell component to header with unread badge - Popover panel shows recent notifications with severity icons - Mark individual or all notifications as read - Polls every 30 seconds for updates Failure notifications: - Upload/download failures now create SystemNotification records - Visible in the notification bell alongside hash mismatch alerts Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 13:43:55 +02:00
xCyanGrizzly	9e78cc5d19	feat: grouping phase 1 — schema, ungrouped tab, time-window grouping, hash verification Schema: - Add GroupingSource enum (ALBUM, MANUAL, AUTO_TIME, AUTO_PATTERN, etc.) - Add groupingSource field to PackageGroup with backfill - Add SystemNotification model for persistent alerts - Add NotificationType and NotificationSeverity enums Ungrouped staging tab: - Add listUngroupedPackages/countUngroupedPackages queries - Add "Ungrouped" tab to STL page showing packages without a group Time-window auto-grouping: - After album grouping, cluster ungrouped packages within configurable time window (default 5 min, AUTO_GROUP_TIME_WINDOW_MINUTES env var) - Groups named from common filename prefix - Groups created with groupingSource=AUTO_TIME Hash verification after split: - Re-hash split parts and compare to original contentHash - Log error and create SystemNotification on mismatch - Prevents silently corrupted split uploads Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 13:00:27 +02:00
xCyanGrizzly	194c87a256	fix: raise size limit and make MAX_PART_SIZE configurable - Raise WORKER_MAX_ZIP_SIZE_MB from 4GB to 200GB (production .env) - Make MAX_PART_SIZE configurable via MAX_PART_SIZE_MB env var (default 1950 MiB, set to 3900 for Premium accounts) - Remove hardcoded 1950 MiB constants in split.ts and worker.ts - Add grouping system audit report with real-world failure cases 10 archives were blocked by the 4GB limit (up to 70.5GB). They will be retried on next ingestion cycle. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 12:41:37 +02:00
xCyanGrizzly	718007446f	feat: fix multi-part archive forwarding and add kickstarter package linking Multi-part send fix: - Add destMessageIds BigInt[] to Package schema with backfill migration - Worker uploadToChannel now returns all message IDs, stored in DB - Bot forwards all parts of multi-part archives (not just the first) - Add retry logic for upload rate limits (429) and download stalls Kickstarter package linking: - Add package search/linking queries and API routes - Add PackageLinkerDialog with search + checkbox selection - Add "Link Packages" and "Send All" actions to kickstarter table - Add sendAllKickstarterPackages server action Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 18:11:35 +01:00
xCyanGrizzly	527aca7c25	feat: add package grouping for Telegram album files Groups related packages posted together in Telegram channels. Auto-detects albums via media_album_id, supports manual grouping from UI. Groups appear as collapsible rows in STL files table. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-25 22:46:52 +01:00
xCyanGrizzly	a4156b2ac6	fix: add race condition guard and null check in group queries - createOrFindPackageGroup: catch unique constraint violation from concurrent creates and fall back to findFirst - createManualGroup: guard against empty package results before accessing first element Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-25 22:45:29 +01:00
xCyanGrizzly	d50c68f67c	feat: add package grouping UI with expand/collapse, selection, and manual grouping - Update STL page to use listDisplayItems query for mixed package/group display - Rewrite package-columns to handle StlTableRow union type (group headers + packages) - Add group expand/collapse with chevron toggle and indented member rows - Add checkbox selection with "Group N Selected" toolbar button and dialog - Add inline group actions: rename, dissolve, send all, remove member - Add clickable group preview thumbnail with file upload for preview images - Extend DataTable with optional rowClassName prop for group row styling Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-25 22:39:23 +01:00
xCyanGrizzly	f6e7f5ed3c	feat: add server actions for group management Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-25 22:34:29 +01:00


195 Commits