Diagnosed from production: in 8 hours of main's current run, zero
uploads happened despite the worker being busy 100% of the time. Logs
showed continuous "Downloading archive part" entries with no
corresponding upload activity.
Root cause: the source channel ("Model Printing Emporium") frequently
reposts the same file at new Telegram message IDs. Concrete example
from the DB:
- "(EN) PaintGuides All.zip" → present 6 times, msgIds 44B → 92B
- "00 Welcome Pack.7z" → present 2 times, msgIds 91B and 177B
- "FanteZi April 2022-...zip" → uploaded May 8 at msgId 24,697,110,528;
current run re-downloading at 87,488,987,136
packageExistsBySourceMessage(channelId, msgId) correctly misses because
the msgId is different. We download the (potentially gigabyte-sized)
file, hash it, then packageExistsByHash hits and we discard the
download. ~30 seconds wasted per repost x thousands of reposts = whole
runs spent uploading nothing.
Fix: add findRepostedPackage(sourceChannelId, fileName, fileSize) — a
pre-download check that catches reposts by the strong (channel + name
+ total size) signal. On hit, skip the set entirely. Watermark
advances normally (no minFailedId tracking) so the next cycle sees
the channel as caught up.
False-positive risk: two unrelated files in the same channel with
identical name AND identical total fileSize. Extremely rare in
practice; if it ever happens, the new file is silently treated as a
duplicate. Logged at info level with the existing Package ID and dest
message ID so the user can audit if a file is mysteriously missing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Driven by a real production case: secondary account was attached to 17
source channels but ingesting only ~2-3 archives per cycle. Log analysis
showed three distinct issues that this commit addresses.
1. Auto-retry cap (WORKER_MAX_SKIP_ATTEMPTS, default 5)
processArchiveSets now filters out SkippedPackage rows whose
attemptCount has reached the cap. Removing them from the working
list means they are not tracked in minFailedId, so the watermark
cap from d99a506 does not pin progress below them anymore. A bad
file no longer blocks the rest of the channel forever; the user
can manually retry via the UI to reset the count.
2. Account phone in error messages
Every SkippedPackage row and SystemNotification produced from a
failure is now prefixed with [<phone>] in errorMessage / message,
and the JSON context includes accountPhone. When two accounts
share a source channel and only one is failing, the UI tells you
which one.
3. Explicit getChat for destination at run start
loadChats only loads main/archive/folder chat lists. If an account
archived or moved the destination chat, sendMessage failed silently
per-archive. Now we getChat the destination once per cycle; on
failure we record a SystemNotification and skip the account's
entire ingestion cycle (no point downloading what we can't upload).
4. Retry on transient Telegram server errors
The "Turnbase Delivery Folder.7z" failure on the secondary and
"10. Kingdom of the Depth.part1.rar" on the main were both
"Internal Server Error during file upload" — a TG-side hiccup, not
a stall or FLOOD_WAIT. These now retry up to MAX_UPLOAD_RETRIES
with linear backoff (15s, 30s, 45s + jitter) before giving up.
5. Channel-access-lost notification
"Iridium 2 w/ Add-ons [Completed]" has been throwing
"Can't access the chat" every cycle for the secondary. The worker
now surfaces a CHANNEL_ACCESS_LOST notification (deduped to once per
24h per channel/account) so the admin sees it and can re-join or
unlink the channel instead of just losing visibility into the loop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Manual override training (GroupingRule):
- Learn patterns from manual group creation (common filename prefix or creator)
- Apply learned rules as first auto-grouping pass (highest confidence after albums)
- GroupingRule model stores pattern, channel, signal type, confidence
Hash verification after upload:
- Re-hash upload files on disk before indexing to catch disk corruption
- Creates HASH_MISMATCH notification on discrepancy
Grouping conflict detection:
- After all grouping passes, check if grouped packages match rules from different groups
- Creates GROUPING_CONFLICT notification for manual review
Per-channel grouping flags:
- Add autoGroupEnabled boolean to TelegramChannel (default true)
- Auto-grouping passes (all except album) gated behind this flag
- Album grouping always runs as it reflects Telegram's native behavior
Full-text search (tsvector):
- Add searchVector tsvector column with GIN index and auto-update trigger
- Backfill 1870 existing packages
- FTS with ts_rank for ranked results, ILIKE fallback for short/failed queries
- Applied to both web app and bot search
Bot group awareness:
- /group <query> — view group info or search groups by name
- /sendgroup <id> — send all packages in a group to linked Telegram account
Bulk repair:
- repairPackageAction clears dest info and resets watermark for re-processing
- Repair button in notification bell for MISSING_PART and HASH_MISMATCH alerts
- /api/notifications/repair endpoint
Retroactive category re-tagging:
- When channel category changes, auto-update tags on all existing packages
- Removes old category tag, adds new one
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Group merge UI:
- Add mergeGroups query and mergeGroupsAction server action
- Add "Start Merge" / "Merge Here" buttons to group row actions
- Two-step UX: click Start on source, click Merge Here on target
ZIP path prefix grouping (Signal 7):
- Compare PackageFile.path root folders across ungrouped packages
- Auto-group if 2+ packages share the same dominant root folder
Reply chain grouping (Signal 6):
- Capture reply_to_message_id during channel scanning
- Group archives that reply to the same root message
- Add replyToMessageId field to Package schema
Caption fuzzy match grouping (Signal 8):
- Capture source caption during channel scanning
- Normalize captions (strip extensions, extract significant words)
- Group packages with matching normalized caption keys
- Add sourceCaption field to Package schema
Periodic integrity audit:
- Check multipart packages for completeness (parts vs destMessageIds)
- Detect orphaned indexes (destChannelId set but no destMessageId)
- Runs after each ingestion cycle, deduplicates notifications
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pattern grouping (Signal 3):
- Extract YYYY-MM dates, month names, and project prefixes from filenames
- Auto-group packages sharing the same pattern within a channel
- Groups created with groupingSource=AUTO_PATTERN
Creator grouping (Signal 4):
- Auto-group 3+ ungrouped packages from the same creator within a channel
- Runs after pattern grouping as lowest-priority automatic signal
Notification UI:
- Add NotificationBell component to header with unread badge
- Popover panel shows recent notifications with severity icons
- Mark individual or all notifications as read
- Polls every 30 seconds for updates
Failure notifications:
- Upload/download failures now create SystemNotification records
- Visible in the notification bell alongside hash mismatch alerts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Schema:
- Add GroupingSource enum (ALBUM, MANUAL, AUTO_TIME, AUTO_PATTERN, etc.)
- Add groupingSource field to PackageGroup with backfill
- Add SystemNotification model for persistent alerts
- Add NotificationType and NotificationSeverity enums
Ungrouped staging tab:
- Add listUngroupedPackages/countUngroupedPackages queries
- Add "Ungrouped" tab to STL page showing packages without a group
Time-window auto-grouping:
- After album grouping, cluster ungrouped packages within configurable
time window (default 5 min, AUTO_GROUP_TIME_WINDOW_MINUTES env var)
- Groups named from common filename prefix
- Groups created with groupingSource=AUTO_TIME
Hash verification after split:
- Re-hash split parts and compare to original contentHash
- Log error and create SystemNotification on mismatch
- Prevents silently corrupted split uploads
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Multi-part send fix:
- Add destMessageIds BigInt[] to Package schema with backfill migration
- Worker uploadToChannel now returns all message IDs, stored in DB
- Bot forwards all parts of multi-part archives (not just the first)
- Add retry logic for upload rate limits (429) and download stalls
Kickstarter package linking:
- Add package search/linking queries and API routes
- Add PackageLinkerDialog with search + checkbox selection
- Add "Link Packages" and "Send All" actions to kickstarter table
- Add sendAllKickstarterPackages server action
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- createOrFindPackageGroup: catch unique constraint violation from
concurrent creates and fall back to findFirst
- createManualGroup: guard against empty package results before
accessing first element
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Distinguish failure reasons: inspect error messages to label skipped
packages as DOWNLOAD_FAILED, UPLOAD_FAILED, or EXTRACT_FAILED
instead of catch-all DOWNLOAD_FAILED.
2. Detect orphaned uploads: before uploading, check if the same content
hash already has a successful upload on the destination channel. Reuse
the existing message ID instead of re-uploading (prevents duplicates
when worker crashed between upload and DB write).
3. Increase timeouts: download from max(5min, GB*10min) to
max(10min, GB*15min), upload from GB*10min to GB*15min.
Prevents premature timeouts on slow connections.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Channel Discovery:
- Remove channel/supergroup filter from getAccountChats — all chat types
(private, groups, Saved Messages, etc.) are now discoverable as sources
- Detect and label the self-chat as "Saved Messages" via getMe
- Update channel picker dialog to accept any chat type string
Bot Rich Messages:
- Enhance package send preview with creator, file count, tags, and source
channel info in MarkdownV2 caption
- Include tags in new_package subscription notifications
- Expand getPendingSendRequest to fetch richer package data
Performance:
- Reviewed pipeline for many-channel load — getChats pagination fix and
per-channel getChat pre-load from prior commit address the main concerns
- Channels with no new messages skip in 2-3 API calls
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug fixes:
- Fix channels not being scanned by paginating TDLib getChats (was only
loading first batch, additional channels were unknown to TDLib)
- Add per-channel getChat pre-load as safety net before scanning
- Fix preview pictures not loading by checking previewData instead of
previewMsgId for hasPreview flag
- Prevent previewMsgId from being set when preview download fails
Package Tags:
- Add tags Text[] column to Package with migration backfilling from
channel categories
- Worker auto-inherits source channel category as initial tag
- Tag filter dropdown and Tags column in STL Files table
- Server actions for individual and bulk tag editing
Kickstarters Tab:
- New KickstarterHost, Kickstarter, and KickstarterPackage models
- Full CRUD with delivery status, payment status, host management
- Package linking (many-to-many with existing packages)
- Sidebar entry with Gift icon
- Table with search, filters, modal forms
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Auto-extract preview images from ZIP/RAR/7z archives during ingestion
- Upload custom preview images via package drawer
- Select preview from archive contents with on-demand extraction UI
- Manually add Telegram channels by t.me link, username, or invite link
- Invite code UX: bulk create, copy link, usage tracking, delete confirm
- Incomplete upload recovery: verify dest messages on worker startup
- Rebuild package DB by scanning destination channel with live progress
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Worker trigger: Add ingestion_trigger pg_notify listener so the worker
picks up on-demand triggers from the UI and runs an immediate cycle with
full activity tracking (currentActivity, currentStep, etc).
2. Remove orphaned IngestionRun creation from triggerIngestion server action.
Previously the UI created RUNNING runs without activity fields, causing
the UI to show "Working..." with no details. Now only the worker creates
runs with proper activity tracking.
3. Default channels to disabled (isActive: false) in schema and all creation
paths. Destination channels are explicitly set to active since they must
receive uploads. Includes Prisma migration.
Co-authored-by: xCyanGrizzly <53275238+xCyanGrizzly@users.noreply.github.com>
Adds full Telegram ZIP ingestion pipeline: TDLib worker service scans source
channels for archive files, deduplicates by content hash, extracts metadata,
uploads to archive channel, and indexes in Postgres. Forum supergroups are
scanned per-topic with topic names used as creator. Filename-based creator
extraction (e.g. "Mammoth Factory - 2026-01.zip") serves as fallback.
Includes admin UI for managing accounts/channels, simplified account setup
(API credentials via env vars), auth code/password submission dialog,
package browser with creator column, and live ingestion activity tracking.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>