24 Commits

Author SHA1 Message Date
59038889ae fix: prevent pool exhaustion that caused 4-hour duplicate check stall
The pg pool had max=5 connections shared between Prisma operations and
advisory locks. With 2 account locks held permanently and hash locks
from timed-out (but still running) background work, pool.connect()
would block forever — causing the Turnbase.7z stall.

- Increase pool max from 5 to 15 for headroom
- Add 30s connectionTimeoutMillis so pool.connect() throws instead of
  hanging forever when the pool is exhausted
- On startup, terminate zombie PostgreSQL sessions from previous worker
  instances that hold stale advisory locks (sketched below)
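A minimal sketch of these changes, assuming the worker's advisory-lock pool uses node-postgres; the application_name filter used to find zombie sessions is illustrative, not taken from the repository:

```typescript
import { Pool } from "pg";

// Pool used for advisory locks, separate from Prisma's own connections.
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 15,                         // was 5; headroom for account + hash locks
  connectionTimeoutMillis: 30_000, // pool.connect() now throws instead of hanging
});

// Startup cleanup: kill sessions left over from a previous worker instance so
// their session-scoped advisory locks are released. The application_name tag
// is an assumed convention for identifying worker sessions.
export async function terminateZombieSessions(appName = "dragons-stash-worker") {
  await pool.query(
    `SELECT pg_terminate_backend(pid)
       FROM pg_stat_activity
      WHERE application_name = $1
        AND pid <> pg_backend_pid()`,
    [appName]
  );
}
```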

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 20:39:00 +02:00
77c26adb31 perf: set watermarks even when no archives found to prevent re-scanning
Previously, channels/topics with no new archives never had their
watermark updated. This meant every cycle re-scanned all messages from
scratch just to discover nothing new — especially costly for the 1079-
topic Model Printing Emporium forum.

- Add maxScannedMessageId to ChannelScanResult (highest msg ID seen)
- Set channel watermark to scan boundary when no archives are found
- Set topic watermark to scan boundary when no archives are found
- Fall back to scan watermark when archive processing doesn't advance it

After one full cycle, subsequent cycles will skip already-scanned
messages via the early-exit boundary check, dramatically reducing
TDLib API calls on channels with mostly non-archive content.
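A hedged sketch of the fallback; ChannelScanResult and maxScannedMessageId come from this commit, while updateChannelWatermark and the exact field shapes are assumed:

```typescript
// Illustrative types and helper names only.
interface ChannelScanResult {
  archivesFound: number;
  maxScannedMessageId: bigint | null; // highest message ID seen during the scan
}

async function advanceWatermark(
  channelMapId: string,
  result: ChannelScanResult,
  updateChannelWatermark: (mapId: string, id: bigint) => Promise<void>
): Promise<void> {
  // Even when nothing new was archived, record how far the scan got so the
  // next cycle can early-exit instead of re-reading the same messages.
  if (result.archivesFound === 0 && result.maxScannedMessageId !== null) {
    await updateChannelWatermark(channelMapId, result.maxScannedMessageId);
  }
}
```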

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 20:37:42 +02:00
35cce3151c perf: early-exit channel scan when all messages are below watermark
searchChatMessages returns newest-first. Once the oldest message on a
page is at or below the lastProcessedMessageId boundary, all remaining
pages are even older. Stop scanning immediately instead of reading every
message in the channel.
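A sketch of that early-exit check; only the newest-first paging and the lastProcessedMessageId boundary come from the commit, the helper names are illustrative:

```typescript
async function scanChannelMessages(
  fetchPage: (fromMessageId: bigint) => Promise<{ id: bigint }[]>, // wraps searchChatMessages
  lastProcessedMessageId: bigint,
  onCandidate: (msg: { id: bigint }) => void
): Promise<void> {
  let fromMessageId = 0n; // 0 = start from the newest message
  for (;;) {
    const page = await fetchPage(fromMessageId);
    if (page.length === 0) return;
    for (const msg of page) {
      if (msg.id <= lastProcessedMessageId) {
        // Pages are newest-first, so every remaining message is older than the
        // watermark: stop scanning this channel entirely.
        return;
      }
      onCandidate(msg);
    }
    fromMessageId = page[page.length - 1].id; // continue below the oldest message seen
  }
}
```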

This was already implemented for topic scans but missing from channel
scans. On a test run, total messages scanned dropped from 3805 to 1615
(57% reduction) for an account with no new archives.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 19:58:30 +02:00
d6c82ede1e fix: auto-recover from TDLib upload stalls by recreating client
When TDLib's event stream degrades, uploads complete (bytes sent) but
confirmations never arrive. Previously the worker retried 3x with the
same broken client, wasting 60+ min per archive and holding the mutex.

- Add UploadStallError class to distinguish stalls from other failures
- Reduce stall detection timeout from 5min to 3min (faster detection)
- Recreate TDLib client after consecutive upload stalls instead of
  retrying on the same degraded connection (see the sketch below)
- Add forceReleaseMutex() to prevent cascade failures when one account
  blocks others via stuck mutex after cycle timeout
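A rough sketch of the recovery flow; UploadStallError and the retry count follow this commit, the client-recreation plumbing is assumed:

```typescript
// Thrown by the upload path when bytes were sent but TDLib never confirms
// within the stall timeout (3 min per this commit).
export class UploadStallError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "UploadStallError";
  }
}

async function uploadWithStallRecovery<TClient>(
  client: TClient,
  upload: (client: TClient) => Promise<void>,
  recreateClient: () => Promise<TClient>,
  maxAttempts = 3
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await upload(client);
      return;
    } catch (err) {
      if (err instanceof UploadStallError && attempt < maxAttempts) {
        // The event stream is degraded; retrying on the same client just stalls
        // again, so tear it down and continue with a fresh one.
        client = await recreateClient();
        continue;
      }
      throw err;
    }
  }
}
```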

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 18:02:42 +02:00
7e48131f67 fix: clear timeout on race settlement to prevent orphaned timers
2026-05-02 23:44:18 +02:00
a79cb4749b fix: use per-account mutex keys in fetch/extract listeners, add cycle timeout and error logging 2026-05-02 23:40:37 +02:00
e9017fc518 feat: parallel account ingestion via per-key TDLib mutex 2026-05-02 23:31:02 +02:00
4f59d19ac2 feat: apply per-account Premium 4GB upload limit to bypass repacking 2026-05-02 23:28:00 +02:00
579276ee2d fix: widen hash lock try/finally to prevent lock leak on error paths 2026-05-02 23:24:08 +02:00
b48cc510a4 feat: add two-phase DB write and hash advisory lock to prevent double-uploads 2026-05-02 23:13:55 +02:00
614c8e5b74 feat: add createPackageStub and updatePackageWithMetadata for two-phase DB write 2026-05-02 23:06:17 +02:00
3019c23f70 feat: add per-content-hash advisory lock to prevent concurrent duplicate uploads
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 23:04:43 +02:00
436a576085 feat: detect and persist Telegram Premium status after authentication
After TDLib login completes, calls getMe() to detect isPremium, persists
it to DB via updateAccountPremiumStatus, and returns { client, isPremium }
from createTdlibClient. All callers updated to destructure accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 23:02:46 +02:00
f454303352 feat: add isPremium field to TelegramAccount
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 22:58:53 +02:00
e29bd79d66 chore: ignore .worktrees directory 2026-05-02 22:54:56 +02:00
61e61d0085 docs: add worker improvements implementation plan
7-task plan covering double-upload fix (hash lock + two-phase write),
parallel account ingestion (per-key mutex), and Premium 4GB upload
limit with automatic detection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 22:47:52 +02:00
925d916a3c Merge branch 'main' of https://github.com/xCyanGrizzly/DragonsStash 2026-05-02 22:38:32 +02:00
27bacaf24c docs: add worker improvements design spec
Covers double-upload fix (two-phase DB write + hash advisory lock),
parallel account processing (remove TDLib mutex), and per-account
Premium 4GB upload limit with automatic is_premium detection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 22:35:27 +02:00
be4daf950b fix: correct User table reference in manual_uploads migration
The FK referenced "users" but the actual table is "User" (no @@map in Prisma schema).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 21:29:55 +02:00
af7094637d feat: file upload from UI, notification dismiss, audit false positive fix
Manual file upload:
- Upload dialog in STL page with drag-and-drop file picker
- Files saved to shared Docker volume (/data/uploads)
- Worker processes via pg_notify('manual_upload') channel
- Hashes, reads metadata, splits >2GB, uploads to Telegram
- Multiple files automatically grouped
- Status polling shows upload/processing/complete states

Notification fixes:
- Add dismiss (X) button on each notification
- Add "Clear" button to remove all notifications
- Fix false positive MISSING_PART alerts from legacy packages
  (only flag when >1 destMessageIds stored but count wrong,
  not when only 1 ID from backfill)

Infrastructure:
- ManualUpload + ManualUploadFile schema + migration
- Shared manual_uploads Docker volume between app and worker
- Upload API routes (POST /api/uploads, GET /api/uploads/[id])
- Worker manual-upload processor with full pipeline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 20:26:06 +02:00
f4aa9d9a2f feat: complete remaining features — training, FTS, bot groups, repair, re-tag
Manual override training (GroupingRule):
- Learn patterns from manual group creation (common filename prefix or creator)
- Apply learned rules as first auto-grouping pass (highest confidence after albums)
- GroupingRule model stores pattern, channel, signal type, confidence

Hash verification after upload:
- Re-hash upload files on disk before indexing to catch disk corruption
- Creates HASH_MISMATCH notification on discrepancy

Grouping conflict detection:
- After all grouping passes, check if grouped packages match rules from different groups
- Creates GROUPING_CONFLICT notification for manual review

Per-channel grouping flags:
- Add autoGroupEnabled boolean to TelegramChannel (default true)
- Auto-grouping passes (all except album) gated behind this flag
- Album grouping always runs as it reflects Telegram's native behavior

Full-text search (tsvector):
- Add searchVector tsvector column with GIN index and auto-update trigger
- Backfill 1870 existing packages
- FTS with ts_rank for ranked results, ILIKE fallback for short/failed queries
- Applied to both web app and bot search

Bot group awareness:
- /group <query> — view group info or search groups by name
- /sendgroup <id> — send all packages in a group to linked Telegram account

Bulk repair:
- repairPackageAction clears dest info and resets watermark for re-processing
- Repair button in notification bell for MISSING_PART and HASH_MISMATCH alerts
- /api/notifications/repair endpoint

Retroactive category re-tagging:
- When channel category changes, auto-update tags on all existing packages
- Removes old category tag, adds new one

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 14:34:14 +02:00
7f9a03d4ee feat: group merge, ZIP/reply/caption grouping, integrity audit
Group merge UI:
- Add mergeGroups query and mergeGroupsAction server action
- Add "Start Merge" / "Merge Here" buttons to group row actions
- Two-step UX: click Start on source, click Merge Here on target

ZIP path prefix grouping (Signal 7):
- Compare PackageFile.path root folders across ungrouped packages
- Auto-group if 2+ packages share the same dominant root folder

Reply chain grouping (Signal 6):
- Capture reply_to_message_id during channel scanning
- Group archives that reply to the same root message
- Add replyToMessageId field to Package schema

Caption fuzzy match grouping (Signal 8):
- Capture source caption during channel scanning
- Normalize captions (strip extensions, extract significant words)
- Group packages with matching normalized caption keys
- Add sourceCaption field to Package schema

Periodic integrity audit:
- Check multipart packages for completeness (parts vs destMessageIds)
- Detect orphaned indexes (destChannelId set but no destMessageId)
- Runs after each ingestion cycle, deduplicates notifications

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 14:19:36 +02:00
2c46ab0843 feat: pattern/creator grouping, notification UI, failure alerts
Pattern grouping (Signal 3):
- Extract YYYY-MM dates, month names, and project prefixes from filenames
- Auto-group packages sharing the same pattern within a channel
- Groups created with groupingSource=AUTO_PATTERN

Creator grouping (Signal 4):
- Auto-group 3+ ungrouped packages from the same creator within a channel
- Runs after pattern grouping as lowest-priority automatic signal

Notification UI:
- Add NotificationBell component to header with unread badge
- Popover panel shows recent notifications with severity icons
- Mark individual or all notifications as read
- Polls every 30 seconds for updates

Failure notifications:
- Upload/download failures now create SystemNotification records
- Visible in the notification bell alongside hash mismatch alerts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 13:43:55 +02:00
9e78cc5d19 feat: grouping phase 1 — schema, ungrouped tab, time-window grouping, hash verification
Schema:
- Add GroupingSource enum (ALBUM, MANUAL, AUTO_TIME, AUTO_PATTERN, etc.)
- Add groupingSource field to PackageGroup with backfill
- Add SystemNotification model for persistent alerts
- Add NotificationType and NotificationSeverity enums

Ungrouped staging tab:
- Add listUngroupedPackages/countUngroupedPackages queries
- Add "Ungrouped" tab to STL page showing packages without a group

Time-window auto-grouping:
- After album grouping, cluster ungrouped packages within configurable
  time window (default 5 min, AUTO_GROUP_TIME_WINDOW_MINUTES env var)
- Groups named from common filename prefix
- Groups created with groupingSource=AUTO_TIME

Hash verification after split:
- Re-hash split parts and compare to original contentHash
- Log error and create SystemNotification on mismatch
- Prevents silently corrupted split uploads

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 13:00:27 +02:00
49 changed files with 4702 additions and 212 deletions

.gitignore vendored

@@ -54,3 +54,4 @@ src/generated
# temp files
nul
tmpclaude-*
.worktrees/


@@ -10,7 +10,10 @@ import {
getSubscriptions,
addSubscription,
removeSubscription,
getGroupById,
searchGroups,
} from "./db/queries.js";
import { db } from "./db/client.js";
import { sendTextMessage, sendPhotoMessage } from "./tdlib/client.js";
const log = childLogger("commands");
@@ -78,6 +81,12 @@ export async function handleMessage(msg: IncomingMessage): Promise<void> {
case "/status":
await handleStatus(chatId, userId);
break;
case "/group":
await handleGroup(chatId, args);
break;
case "/sendgroup":
await handleSendGroup(chatId, userId, args);
break;
default:
await sendTextMessage(
chatId,
@@ -117,6 +126,8 @@ async function handleStart(
`/search &lt;query&gt; — Search packages`,
`/latest [n] — Show latest packages`,
`/package &lt;id&gt; — Package details`,
`/group &lt;id or name&gt; — View group info and package list`,
`/sendgroup &lt;id&gt; — Send all packages in a group to yourself`,
`/link &lt;code&gt; — Link your Telegram to your web account`,
`/subscribe &lt;keyword&gt; — Get notified for new packages`,
`/subscriptions — View your subscriptions`,
@@ -136,6 +147,8 @@ async function handleHelp(chatId: bigint): Promise<void> {
`/search &lt;query&gt; — Search by filename or creator`,
`/latest [n] — Show n most recent packages (default: 5)`,
`/package &lt;id&gt; — View package details and file list`,
`/group &lt;id or name&gt; — View group info and package list`,
`/sendgroup &lt;id&gt; — Send all packages in a group to yourself`,
``,
`🔗 <b>Account Linking</b>`,
`/link &lt;code&gt; — Link Telegram to your web account`,
@@ -432,6 +445,168 @@ async function handleStatus(chatId: bigint, userId: bigint): Promise<void> {
}
}
async function handleGroup(chatId: bigint, query: string): Promise<void> {
if (!query) {
await sendTextMessage(
chatId,
"Usage: /group &lt;id or name&gt;\n\nProvide a group ID (starts with 'c') or a name to search.",
"textParseModeHTML"
);
return;
}
const trimmed = query.trim();
// If it looks like a cuid (starts with 'c', ~25 chars), look up by ID directly
if (/^c[a-z0-9]{20,}$/i.test(trimmed)) {
const group = await getGroupById(trimmed);
if (!group) {
await sendTextMessage(chatId, "Group not found.", "textParseModeHTML");
return;
}
const packageLines = group.packages.slice(0, 20).map((pkg, i) => {
const size = formatSize(pkg.fileSize);
return ` ${i + 1}. <b>${escapeHtml(pkg.fileName)}</b> (${size}, ${pkg.fileCount} files) — <code>${pkg.id}</code>`;
});
const more = group.packages.length > 20
? `\n ... and ${group.packages.length - 20} more`
: "";
const response = [
`📦 <b>Group: ${escapeHtml(group.name)}</b>`,
``,
`Packages: ${group.packages.length}`,
`ID: <code>${group.id}</code>`,
``,
`<b>Contents:</b>`,
...packageLines,
more,
``,
`Use /sendgroup ${group.id} to receive all packages.`,
]
.filter((l) => l !== "")
.join("\n");
await sendTextMessage(chatId, response, "textParseModeHTML");
return;
}
// Otherwise search by name
const groups = await searchGroups(trimmed, 5);
if (groups.length === 0) {
await sendTextMessage(
chatId,
`No groups found matching "<b>${escapeHtml(trimmed)}</b>".`,
"textParseModeHTML"
);
return;
}
const lines = groups.map(
(g, i) =>
`${i + 1}. <b>${escapeHtml(g.name)}</b> — ${g._count.packages} package(s)\n ID: <code>${g.id}</code>`
);
const response = [
`🔍 <b>Groups matching "${escapeHtml(trimmed)}":</b>`,
``,
...lines,
``,
`Use /group &lt;id&gt; for full details.`,
].join("\n");
await sendTextMessage(chatId, response, "textParseModeHTML");
}
async function handleSendGroup(
chatId: bigint,
userId: bigint,
args: string
): Promise<void> {
if (!args) {
await sendTextMessage(
chatId,
"Usage: /sendgroup &lt;group-id&gt;",
"textParseModeHTML"
);
return;
}
const groupId = args.trim();
const group = await getGroupById(groupId);
if (!group) {
await sendTextMessage(chatId, "Group not found.", "textParseModeHTML");
return;
}
// Require account linking
const link = await findLinkByTelegramUserId(userId);
if (!link) {
await sendTextMessage(
chatId,
"You must link your account before receiving packages.\nUse /link &lt;code&gt; to connect.",
"textParseModeHTML"
);
return;
}
// Only send packages that have been uploaded to the destination channel
const sendable = group.packages.filter(
(pkg) => pkg.destChannelId && pkg.destMessageId
);
if (sendable.length === 0) {
await sendTextMessage(
chatId,
`No packages in group "<b>${escapeHtml(group.name)}</b>" are ready to send yet.`,
"textParseModeHTML"
);
return;
}
// Create a BotSendRequest for each sendable package
const requests = await Promise.all(
sendable.map((pkg) =>
db.botSendRequest.create({
data: {
packageId: pkg.id,
telegramLinkId: link.id,
requestedByUserId: link.userId,
status: "PENDING",
},
})
)
);
// Fire pg_notify for each request so the send listener picks them up
for (const req of requests) {
await db.$queryRawUnsafe(
`SELECT pg_notify('bot_send', $1)`,
req.id
).catch(() => {
// Best-effort — the bot also processes PENDING requests on its send queue
});
}
await sendTextMessage(
chatId,
[
`✅ <b>Queued ${requests.length} package(s) from "${escapeHtml(group.name)}"</b>`,
``,
`You'll receive each archive shortly. Use /package &lt;id&gt; to check individual packages.`,
].join("\n"),
"textParseModeHTML"
);
log.info(
{ groupId, packageCount: requests.length, userId: userId.toString() },
"Group send queued"
);
}
function escapeHtml(text: string): string {
return text
.replace(/&/g, "&amp;")


@@ -53,7 +53,52 @@ export async function createTelegramLink(
// ── Package search ──
export async function searchPackages(query: string, limit = 10) {
// Try full-text search first
if (query.length >= 3) {
const tsQuery = query
.trim()
.split(/\s+/)
.filter((w) => w.length >= 2)
.map((w) => w.replace(/[^a-zA-Z0-9]/g, ""))
.filter(Boolean)
.join(" & ");
if (tsQuery) {
try {
const ftsResults = await db.$queryRawUnsafe<{ id: string }[]>(
`SELECT id FROM packages
WHERE "searchVector" @@ to_tsquery('english', $1)
ORDER BY ts_rank("searchVector", to_tsquery('english', $1)) DESC
LIMIT $2`,
tsQuery,
limit
);
if (ftsResults.length > 0) {
return db.package.findMany({
where: { id: { in: ftsResults.map((r) => r.id) } },
orderBy: { indexedAt: "desc" },
select: {
id: true,
fileName: true,
fileSize: true,
archiveType: true,
fileCount: true,
creator: true,
indexedAt: true,
destChannelId: true,
destMessageId: true,
},
});
}
} catch {
// FTS failed — fall back to ILIKE
}
}
}
// Fallback: ILIKE search
return db.package.findMany({
where: {
OR: [
{ fileName: { contains: query, mode: "insensitive" } },
@@ -74,7 +119,44 @@ export async function searchPackages(query: string, limit = 10) {
destMessageId: true,
},
});
}
// ── Group queries ──
export async function getGroupById(groupId: string) {
return db.packageGroup.findUnique({
where: { id: groupId },
include: {
packages: {
orderBy: { indexedAt: "desc" },
select: {
id: true,
fileName: true,
fileSize: true,
archiveType: true,
fileCount: true,
creator: true,
destChannelId: true,
destMessageId: true,
},
},
},
});
}
export async function searchGroups(query: string, limit = 5) {
return db.packageGroup.findMany({
where: {
name: { contains: query, mode: "insensitive" },
},
orderBy: { createdAt: "desc" },
take: limit,
select: {
id: true,
name: true,
_count: { select: { packages: true } },
},
});
}
export async function getLatestPackages(limit = 5) {


@@ -28,6 +28,8 @@ services:
timeout: 5s
retries: 3
start_period: 60s
volumes:
- manual_uploads:/data/uploads
restart: unless-stopped
deploy:
resources:
@@ -54,6 +56,7 @@ services:
volumes:
- tdlib_state:/data/tdlib
- tmp_zips:/tmp/zips
- manual_uploads:/data/uploads
depends_on:
db:
condition: service_healthy
@@ -121,6 +124,7 @@ volumes:
tdlib_state:
tdlib_bot_state:
tmp_zips:
manual_uploads:
networks:
frontend:


@@ -0,0 +1,67 @@
# Grouping Phase 1: Foundation + Time-Window Grouping
> **For agentic workers:** Use superpowers:subagent-driven-development to implement this plan.
**Goal:** Add grouping infrastructure (schema, enums, notifications model), an ungrouped staging queue in the UI, and time-window auto-grouping as the first automatic signal beyond album grouping.
**Architecture:** Schema changes lay the foundation. Ungrouped tab is a query filter. Time-window grouping runs as a post-processing pass after album grouping in the worker pipeline.
**Tech Stack:** Prisma schema + migration, worker TypeScript, Next.js App Router.
---
## Task 1: Schema Migration
**Files:**
- Modify: `prisma/schema.prisma`
- Create: migration SQL
Add:
1. `GroupingSource` enum: `ALBUM`, `MANUAL`, `AUTO_TIME`, `AUTO_PATTERN`, `AUTO_REPLY`, `AUTO_ZIP`, `AUTO_CAPTION`
2. `groupingSource GroupingSource @default(MANUAL)` on `PackageGroup`
3. `SystemNotification` model with `type`, `severity`, `title`, `message`, `context` (Json), `isRead`
4. `NotificationType` enum: `HASH_MISMATCH`, `MISSING_PART`, `UPLOAD_FAILED`, `DOWNLOAD_FAILED`, `GROUPING_CONFLICT`, `INTEGRITY_AUDIT`
5. `NotificationSeverity` enum: `INFO`, `WARNING`, `ERROR`
Backfill: `UPDATE package_groups SET "groupingSource" = 'ALBUM' WHERE "mediaAlbumId" IS NOT NULL`
---
## Task 2: Ungrouped Staging Tab in STL Page
**Files:**
- Modify: `src/lib/telegram/queries.ts` — add `listUngroupedPackages()` query
- Modify: `src/app/(app)/stls/page.tsx` — add tab parameter support
- Modify: `src/app/(app)/stls/_components/stl-table.tsx` — add "Ungrouped" tab
Add a tab next to the existing "Skipped" tab that shows packages where `packageGroupId IS NULL`. Uses the existing `PackageListItem` type and table rendering. This gives users a clear view of files that need manual grouping.
---
## Task 3: Time-Window Auto-Grouping in Worker
**Files:**
- Create: `worker/src/grouping.ts` — add `processTimeWindowGroups()` after existing `processAlbumGroups()`
- Modify: `worker/src/worker.ts` — call time-window grouping after album grouping
- Modify: `worker/src/util/config.ts` — add `autoGroupTimeWindowMinutes` config
After album grouping completes, find remaining ungrouped packages from the same channel scan. Cluster packages whose `sourceMessageId` timestamps are within the configured window (default 5 minutes). Create groups for clusters of 2+ with `groupingSource = AUTO_TIME` and name derived from the common filename prefix or first file's base name.
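A hedged sketch of that clustering pass, assuming each ungrouped package exposes its source message timestamp; names are illustrative:

```typescript
interface UngroupedPackage {
  id: string;
  fileName: string;
  messageDate: Date; // timestamp of the source Telegram message
}

// Chain packages whose neighbouring timestamps fall within the window; only
// clusters of 2+ become AUTO_TIME groups.
export function clusterByTimeWindow(
  packages: UngroupedPackage[],
  windowMinutes = 5
): UngroupedPackage[][] {
  const windowMs = windowMinutes * 60_000;
  const sorted = [...packages].sort(
    (a, b) => a.messageDate.getTime() - b.messageDate.getTime()
  );
  const clusters: UngroupedPackage[][] = [];
  let current: UngroupedPackage[] = [];
  for (const pkg of sorted) {
    const prev = current[current.length - 1];
    if (prev && pkg.messageDate.getTime() - prev.messageDate.getTime() <= windowMs) {
      current.push(pkg);
    } else {
      if (current.length >= 2) clusters.push(current);
      current = [pkg];
    }
  }
  if (current.length >= 2) clusters.push(current);
  return clusters;
}
```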
---
## Task 4: Hash Verification After Split
**Files:**
- Modify: `worker/src/worker.ts` — add hash re-check after concat+split
- Modify: `worker/src/archive/hash.ts` — (no changes needed, reuse `hashParts`)
After `concatenateFiles()` + `byteLevelSplit()`, re-hash the split parts and compare to the original `contentHash`. If mismatch, log error and create a `SystemNotification` (once that table exists). This closes the integrity gap identified in the audit.
---
## Task 5: Build & Deploy
Rebuild worker and app images. Deploy. Verify:
- Worker logs show `maxPartSizeMB` and new `autoGroupTimeWindowMinutes` in config
- Ungrouped tab visible in STL page
- Previously-skipped large archives begin processing

File diff suppressed because it is too large


@@ -0,0 +1,184 @@
# Worker Improvements Design
**Date:** 2026-05-02
**Status:** Approved
**Scope:** Dragon's Stash Telegram ingestion worker
## Problem Statement
Three issues to address:
1. **Double-uploads**: The same archive occasionally appears twice in the destination Telegram channel. Root causes: (a) the worker crashes between `uploadToChannel()` confirming success and `createPackageWithFiles()` writing to the DB — no DB record means `recoverIncompleteUploads()` can't detect the orphaned Telegram message, and the next cycle re-uploads; (b) two accounts scanning the same source channel can both pass the hash dedup check before either creates a DB record, racing to upload the same file.
2. **Sequential account processing**: Both Telegram accounts are processed one after another via `withTdlibMutex`, even though TDLib fully supports multiple concurrent clients in the same process (each with separate `databaseDirectory` and `filesDirectory`). This halves throughput unnecessarily.
3. **Premium upload limit not used**: The Premium account can upload up to 4 GB per file, but `MAX_UPLOAD_SIZE` is hardcoded at ~1,950 MB. This causes unnecessary file splitting and expensive repack operations for files that could upload directly.
## Solution Overview
Three targeted changes, no architectural overhaul:
1. Two-phase DB write + hash advisory lock (fixes double-uploads)
2. Remove TDLib mutex from the scheduler loop (enables parallel accounts)
3. Per-account `maxUploadSize` from `getMe().is_premium` (enables 4 GB for Premium)
---
## Section 1: Double-Upload Fix
### 1a. Two-Phase DB Write
**Current flow:**
```
uploadToChannel() → preview download → metadata extraction → createPackageWithFiles()
```
If the worker crashes anywhere between upload confirmation and `createPackageWithFiles()`, no DB record exists. `recoverIncompleteUploads()` only checks packages with an existing `destMessageId` in the DB — it cannot find an orphaned Telegram message with no corresponding row.
**New flow:**
```
uploadToChannel()
→ createPackageStub() ← minimal record, destMessageId set immediately
→ preview download
→ metadata extraction
→ updatePackageWithMetadata() ← adds file list, preview, creator, tags
```
`createPackageStub()` writes: `contentHash`, `fileName`, `fileSize`, `archiveType`, `sourceChannelId`, `sourceMessageId`, `destChannelId`, `destMessageId`, `isMultipart`, `partCount`, `ingestionRunId`. File list and preview are left empty.
If the worker crashes after the stub is written:
- `recoverIncompleteUploads()` finds the record (has `destMessageId`), verifies the Telegram message exists, keeps it.
- Next cycle: `packageExistsByHash()` returns true → skips re-upload.
- The stub has `fileCount = 0` and no file listing. The UI shows "metadata pending" rather than failing silently.
Stubs with `fileCount = 0` are valid deliverable packages (the bot can still send the file). Backfilling metadata on stubs is out of scope for this change — the crash case is rare and the stub is functional.
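A minimal sketch of the two helpers, using the field list above; the exact Prisma model shape and the metadata payload are assumptions:

```typescript
import { db } from "./client.js"; // existing worker Prisma client (assumed path)

export async function createPackageStub(data: {
  contentHash: string;
  fileName: string;
  fileSize: bigint;
  archiveType: string;
  sourceChannelId: string;
  sourceMessageId: bigint;
  destChannelId: string;
  destMessageId: bigint;
  isMultipart: boolean;
  partCount: number;
  ingestionRunId: string;
}) {
  // Phase one: written right after uploadToChannel() confirms, before preview
  // download or metadata extraction. fileCount stays 0 and no files are attached.
  return db.package.create({ data });
}

export async function updatePackageWithMetadata(
  packageId: string,
  metadata: { fileCount: number; creator?: string; tags?: string[]; previewData?: Buffer }
) {
  // Phase two: fills in everything the stub left empty.
  return db.package.update({ where: { id: packageId }, data: metadata });
}
```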
### 1b. Hash Advisory Lock
**The race (two accounts, shared source channel):**
```
Worker A: packageExistsByHash(X) → false (no record yet)
Worker B: packageExistsByHash(X) → false (no record yet)
Worker A: uploads file → destMessageId_A
Worker B: uploads file → destMessageId_B ← duplicate Telegram message
Worker A: createPackageStub() → succeeds (contentHash @unique satisfied)
Worker B: createPackageStub() → fails unique constraint on contentHash
```
Result: two Telegram messages, one DB record. Worker B's upload is wasted.
**Fix:** Before calling `uploadToChannel()`, acquire a PostgreSQL session advisory lock keyed on the content hash:
```sql
SELECT pg_try_advisory_lock(hash_bigint)
```
Where `hash_bigint` is the first 8 bytes of the SHA-256 content hash interpreted as a signed bigint.
- `pg_try_advisory_lock` is non-blocking. If another worker holds the lock (same file, shared channel), return `false` → treat as duplicate, skip.
- After acquiring the lock, **re-run `packageExistsByHash()`** before uploading. This catches the case where another worker finished and released the lock between the first check and this one — without the re-check, the current worker would proceed to re-upload.
- The lock is session-scoped: released automatically on DB session end. No manual cleanup needed on crash.
- The lock is released explicitly after `createPackageStub()` completes (or on any error path).
**Implementation location:** New helper `tryAcquireHashLock(contentHash)` / `releaseHashLock(contentHash)` in `worker/src/db/locks.ts`, reusing the existing DB client pattern.
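A sketch of those helpers; the hash-to-key mapping follows the design above, the pg client plumbing is assumed:

```typescript
import type { PoolClient } from "pg";

// First 16 hex chars = first 8 bytes of the SHA-256 digest, reinterpreted as a
// signed 64-bit integer (the advisory lock key space).
function hashToLockKey(contentHash: string): bigint {
  return BigInt.asIntN(64, BigInt("0x" + contentHash.slice(0, 16)));
}

export async function tryAcquireHashLock(
  client: PoolClient,
  contentHash: string
): Promise<boolean> {
  const res = await client.query<{ locked: boolean }>(
    "SELECT pg_try_advisory_lock($1) AS locked",
    [hashToLockKey(contentHash).toString()]
  );
  return res.rows[0].locked;
}

export async function releaseHashLock(
  client: PoolClient,
  contentHash: string
): Promise<void> {
  await client.query("SELECT pg_advisory_unlock($1)", [
    hashToLockKey(contentHash).toString(),
  ]);
}
```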
---
## Section 2: Parallel Account Processing
### Current Constraint
`withTdlibMutex` in `scheduler.ts` serializes all TDLib operations across accounts. This was a conservative guard, but TDLib explicitly supports multiple concurrent clients in the same process provided each has its own `databaseDirectory` and `filesDirectory`.
The codebase already satisfies this requirement:
```typescript
// worker/src/tdlib/client.ts
const dbPath = path.join(config.tdlibStateDir, account.id);
const client = createClient({
databaseDirectory: dbPath,
filesDirectory: path.join(dbPath, "files"),
});
```
Each account gets `<TDLIB_STATE_DIR>/<account.id>/` — fully isolated.
### Change
Replace the sequential `for` loop in `scheduler.ts` with `Promise.allSettled()`:
```typescript
// Before
for (const account of accounts) {
await withTdlibMutex(`ingest:${account.phone}`, () => runWorkerForAccount(account));
}
// After
await Promise.allSettled(accounts.map((account) => runWorkerForAccount(account)));
```
The per-account PostgreSQL advisory lock in `db/locks.ts` already prevents any account from being processed twice simultaneously. `Promise.allSettled()` ensures one account's failure doesn't abort the other.
The `withTdlibMutex` wrapper can be removed from the ingest path entirely. The auth path (`authenticateAccount`) should also be run in parallel but may remain guarded if TDLib auth flows have ordering dependencies — verify during implementation.
**No Docker Compose changes needed.** Both accounts run in the same container.
### Speed Limit Notifications
TDLib fires `updateSpeedLimitNotification` when an account's upload or download speed is throttled (non-Premium accounts). Log this event at `warn` level in the client update handler so it's visible in logs without being actionable.
---
## Section 3: Per-Account Premium Upload Limit
### Premium Detection
After successful authentication, call `getMe()` and read `is_premium: bool` from the returned `user` object. Store this on `TelegramAccount.isPremium` (new boolean field, default `false`, updated on each successful auth).
```typescript
const me = await client.invoke({ _: 'getMe' }) as { is_premium?: boolean };
await updateAccountPremiumStatus(account.id, me.is_premium ?? false);
```
### Upload Size Limits
| Account type | `maxUploadSize` | Effect |
|---|---|---|
| Premium | 3,950 MB | Parts ≤ 3.95 GB upload as-is; repack only for parts >3.95 GB (extremely rare) |
| Non-Premium | 1,950 MB | Current behavior unchanged |
Pass `maxUploadSize` into `processOneArchiveSet()` as a parameter (currently hardcoded as `MAX_UPLOAD_SIZE` at `worker.ts:1023` and in `archive/split.ts`).
The `hasOversizedPart` check and `byteLevelSplit` call both use this value, so the repack step is effectively eliminated for Premium accounts in practice — no separate "skip repack" flag needed.
### Migration
```prisma
model TelegramAccount {
// ... existing fields
isPremium Boolean @default(false)
}
```
One migration, one new query `updateAccountPremiumStatus(accountId, isPremium)`.
---
## Files to Change
| File | Change |
|---|---|
| `prisma/schema.prisma` | Add `isPremium Boolean @default(false)` to `TelegramAccount` |
| `worker/src/db/queries.ts` | Add `updateAccountPremiumStatus()`, `createPackageStub()`, `updatePackageWithMetadata()` |
| `worker/src/db/locks.ts` | Add `tryAcquireHashLock()`, `releaseHashLock()` |
| `worker/src/tdlib/client.ts` | Call `getMe()` after auth, return `isPremium` from `createTdlibClient()` |
| `worker/src/worker.ts` | Two-phase write, hash lock acquire/release, pass `maxUploadSize` per account |
| `worker/src/archive/split.ts` | Accept `maxPartSize` parameter instead of hardcoded constant |
| `worker/src/scheduler.ts` | Replace sequential loop with `Promise.allSettled()`, remove `withTdlibMutex` from ingest path |
---
## What Is Explicitly Out of Scope
- Backfilling metadata on stub records (rare crash case, functional without it)
- Download pre-fetching / pipeline parallelism within one account
- Two separate worker containers (single container is sufficient)
- Bot or app changes (worker-only)


@@ -0,0 +1,32 @@
-- CreateEnum GroupingSource
CREATE TYPE "GroupingSource" AS ENUM ('ALBUM', 'MANUAL', 'AUTO_TIME', 'AUTO_PATTERN', 'AUTO_REPLY', 'AUTO_ZIP', 'AUTO_CAPTION');
-- CreateEnum NotificationType
CREATE TYPE "NotificationType" AS ENUM ('HASH_MISMATCH', 'MISSING_PART', 'UPLOAD_FAILED', 'DOWNLOAD_FAILED', 'GROUPING_CONFLICT', 'INTEGRITY_AUDIT');
-- CreateEnum NotificationSeverity
CREATE TYPE "NotificationSeverity" AS ENUM ('INFO', 'WARNING', 'ERROR');
-- AlterTable: add groupingSource to package_groups
ALTER TABLE "package_groups" ADD COLUMN "groupingSource" "GroupingSource" NOT NULL DEFAULT 'MANUAL';
-- Backfill: mark album-based groups
UPDATE "package_groups" SET "groupingSource" = 'ALBUM' WHERE "mediaAlbumId" IS NOT NULL;
-- CreateTable: system_notifications
CREATE TABLE "system_notifications" (
"id" TEXT NOT NULL,
"type" "NotificationType" NOT NULL,
"severity" "NotificationSeverity" NOT NULL DEFAULT 'INFO',
"title" TEXT NOT NULL,
"message" TEXT NOT NULL,
"context" JSONB,
"isRead" BOOLEAN NOT NULL DEFAULT false,
"createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT "system_notifications_pkey" PRIMARY KEY ("id")
);
-- CreateIndex
CREATE INDEX "system_notifications_isRead_createdAt_idx" ON "system_notifications"("isRead", "createdAt");
CREATE INDEX "system_notifications_type_idx" ON "system_notifications"("type");


@@ -0,0 +1,3 @@
-- AlterTable: add sourceCaption and replyToMessageId to packages
ALTER TABLE "packages" ADD COLUMN "sourceCaption" TEXT;
ALTER TABLE "packages" ADD COLUMN "replyToMessageId" BIGINT;


@@ -0,0 +1,47 @@
-- AlterTable: add autoGroupEnabled to telegram_channels
ALTER TABLE "telegram_channels" ADD COLUMN "autoGroupEnabled" BOOLEAN NOT NULL DEFAULT true;
-- CreateTable: grouping_rules
CREATE TABLE "grouping_rules" (
"id" TEXT NOT NULL,
"sourceChannelId" TEXT NOT NULL,
"pattern" TEXT NOT NULL,
"signalType" "GroupingSource" NOT NULL,
"confidence" DOUBLE PRECISION NOT NULL DEFAULT 1.0,
"createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
"createdByGroupId" TEXT,
CONSTRAINT "grouping_rules_pkey" PRIMARY KEY ("id")
);
-- CreateIndex
CREATE INDEX "grouping_rules_sourceChannelId_idx" ON "grouping_rules"("sourceChannelId");
-- AddForeignKey
ALTER TABLE "grouping_rules" ADD CONSTRAINT "grouping_rules_sourceChannelId_fkey" FOREIGN KEY ("sourceChannelId") REFERENCES "telegram_channels"("id") ON DELETE CASCADE ON UPDATE CASCADE;
-- Full-text search: add tsvector column and GIN index
ALTER TABLE "packages" ADD COLUMN IF NOT EXISTS "searchVector" tsvector;
UPDATE "packages" SET "searchVector" = to_tsvector('english',
coalesce("fileName", '') || ' ' || coalesce("creator", '') || ' ' || coalesce("sourceCaption", '')
) WHERE "searchVector" IS NULL;
CREATE INDEX IF NOT EXISTS "packages_search_vector_idx" ON "packages" USING GIN ("searchVector");
-- Trigger to auto-update searchVector on insert/update
CREATE OR REPLACE FUNCTION packages_search_vector_update() RETURNS trigger AS $$
BEGIN
NEW."searchVector" := to_tsvector('english',
coalesce(NEW."fileName", '') || ' ' || coalesce(NEW."creator", '') || ' ' || coalesce(NEW."sourceCaption", '')
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS packages_search_vector_trigger ON "packages";
CREATE TRIGGER packages_search_vector_trigger
BEFORE INSERT OR UPDATE OF "fileName", "creator", "sourceCaption"
ON "packages"
FOR EACH ROW
EXECUTE FUNCTION packages_search_vector_update();


@@ -0,0 +1,30 @@
-- CreateEnum
CREATE TYPE "ManualUploadStatus" AS ENUM ('PENDING', 'PROCESSING', 'COMPLETED', 'FAILED');
-- CreateTable
CREATE TABLE "manual_uploads" (
"id" TEXT NOT NULL,
"status" "ManualUploadStatus" NOT NULL DEFAULT 'PENDING',
"groupName" TEXT,
"userId" TEXT NOT NULL,
"errorMessage" TEXT,
"createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
"completedAt" TIMESTAMP(3),
CONSTRAINT "manual_uploads_pkey" PRIMARY KEY ("id")
);
CREATE TABLE "manual_upload_files" (
"id" TEXT NOT NULL,
"uploadId" TEXT NOT NULL,
"fileName" TEXT NOT NULL,
"filePath" TEXT NOT NULL,
"fileSize" BIGINT NOT NULL,
"packageId" TEXT,
CONSTRAINT "manual_upload_files_pkey" PRIMARY KEY ("id")
);
CREATE INDEX "manual_uploads_status_idx" ON "manual_uploads"("status");
CREATE INDEX "manual_upload_files_uploadId_idx" ON "manual_upload_files"("uploadId");
ALTER TABLE "manual_uploads" ADD CONSTRAINT "manual_uploads_userId_fkey" FOREIGN KEY ("userId") REFERENCES "User"("id") ON DELETE RESTRICT ON UPDATE CASCADE;
ALTER TABLE "manual_upload_files" ADD CONSTRAINT "manual_upload_files_uploadId_fkey" FOREIGN KEY ("uploadId") REFERENCES "manual_uploads"("id") ON DELETE CASCADE ON UPDATE CASCADE;


@@ -0,0 +1,2 @@
-- AlterTable
ALTER TABLE "telegram_accounts" ADD COLUMN "isPremium" BOOLEAN NOT NULL DEFAULT false;


@@ -39,9 +39,10 @@ model User {
settings UserSettings?
telegramLink TelegramLink?
kickstarters Kickstarter[]
inviteCodes InviteCode[] @relation("InviteCreator")
usedInvite InviteCode? @relation("InviteUser", fields: [usedInviteId], references: [id], onDelete: SetNull)
usedInviteId String?
manualUploads ManualUpload[]
}
model Account {
@@ -405,6 +406,7 @@ model TelegramAccount {
isActive Boolean @default(true)
authState AuthState @default(PENDING)
authCode String?
isPremium Boolean @default(false)
lastSeenAt DateTime?
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
@@ -429,10 +431,13 @@ model TelegramChannel {
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
autoGroupEnabled Boolean @default(true)
accountMaps AccountChannelMap[]
packages Package[]
skippedPackages SkippedPackage[]
packageGroups PackageGroup[]
groupingRules GroupingRule[]
@@index([type, isActive])
@@index([category])
@@ -474,6 +479,8 @@ model Package {
partCount Int @default(1)
fileCount Int @default(0)
tags String[] @default([])
sourceCaption String? // Caption text from source Telegram message
replyToMessageId BigInt? // reply_to_message_id from source message (for reply chain grouping)
previewData Bytes? // JPEG thumbnail from nearby Telegram photo (stored as raw bytes)
previewMsgId BigInt? // Telegram message ID of the matched photo
packageGroupId String?
@@ -522,6 +529,7 @@ model PackageGroup {
name String
mediaAlbumId String?
sourceChannelId String
groupingSource GroupingSource @default(MANUAL)
previewData Bytes?
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
@@ -802,3 +810,97 @@ model KickstarterPackage {
@@id([kickstarterId, packageId])
@@map("kickstarter_packages")
}
// ── Grouping & Notifications ──
enum GroupingSource {
ALBUM
MANUAL
AUTO_TIME
AUTO_PATTERN
AUTO_REPLY
AUTO_ZIP
AUTO_CAPTION
}
enum NotificationType {
HASH_MISMATCH
MISSING_PART
UPLOAD_FAILED
DOWNLOAD_FAILED
GROUPING_CONFLICT
INTEGRITY_AUDIT
}
enum NotificationSeverity {
INFO
WARNING
ERROR
}
model SystemNotification {
id String @id @default(cuid())
type NotificationType
severity NotificationSeverity @default(INFO)
title String
message String
context Json?
isRead Boolean @default(false)
createdAt DateTime @default(now())
@@index([isRead, createdAt])
@@index([type])
@@map("system_notifications")
}
model GroupingRule {
id String @id @default(cuid())
sourceChannelId String
pattern String // Regex or keyword pattern learned from manual grouping
signalType GroupingSource // Which grouping signal this rule applies to
confidence Float @default(1.0)
createdAt DateTime @default(now())
createdByGroupId String? // The manual group that spawned this rule
sourceChannel TelegramChannel @relation(fields: [sourceChannelId], references: [id], onDelete: Cascade)
@@index([sourceChannelId])
@@map("grouping_rules")
}
enum ManualUploadStatus {
PENDING
PROCESSING
COMPLETED
FAILED
}
model ManualUpload {
id String @id @default(cuid())
status ManualUploadStatus @default(PENDING)
groupName String? // Group name if multiple files
userId String
errorMessage String?
createdAt DateTime @default(now())
completedAt DateTime?
files ManualUploadFile[]
user User @relation(fields: [userId], references: [id])
@@index([status])
@@map("manual_uploads")
}
model ManualUploadFile {
id String @id @default(cuid())
uploadId String
fileName String
filePath String // Path on shared volume
fileSize BigInt
packageId String? // Set after processing
upload ManualUpload @relation(fields: [uploadId], references: [id], onDelete: Cascade)
@@index([uploadId])
@@map("manual_upload_files")
}


@@ -1,7 +1,7 @@
"use client"; "use client";
import { type ColumnDef } from "@tanstack/react-table"; import { type ColumnDef } from "@tanstack/react-table";
import { FileArchive, Eye, ChevronRight, Layers, Ungroup, Send, ImagePlus } from "lucide-react"; import { FileArchive, Eye, ChevronRight, Layers, Ungroup, Send, ImagePlus, GitMerge } from "lucide-react";
import { DataTableColumnHeader } from "@/components/shared/data-table-column-header"; import { DataTableColumnHeader } from "@/components/shared/data-table-column-header";
import { Badge } from "@/components/ui/badge"; import { Badge } from "@/components/ui/badge";
import { Button } from "@/components/ui/button"; import { Button } from "@/components/ui/button";
@@ -69,6 +69,9 @@ interface PackageColumnsProps {
onGroupPreviewUpload: (groupId: string) => void;
selectedPackages: Set<string>;
onToggleSelect: (packageId: string) => void;
mergeSourceId: string | null;
onStartMerge: (groupId: string) => void;
onCompleteMerge: (targetGroupId: string) => void;
}
export function formatBytes(bytesStr: string): string {
@@ -148,6 +151,9 @@ export function getPackageColumns({
onGroupPreviewUpload,
selectedPackages,
onToggleSelect,
mergeSourceId,
onStartMerge,
onCompleteMerge,
}: PackageColumnsProps): ColumnDef<StlTableRow, unknown>[] {
return [
{
@@ -392,6 +398,8 @@ export function getPackageColumns({
cell: ({ row }) => {
const data = row.original;
if (isGroupRow(data)) {
const isMergeSource = mergeSourceId === data.id;
const canMergeHere = mergeSourceId !== null && mergeSourceId !== data.id;
return (
<div className="flex items-center gap-0.5">
<Button
@@ -403,6 +411,26 @@ export function getPackageColumns({
>
<Send className="h-4 w-4" />
</Button>
<Button
variant="ghost"
size="icon"
className={`h-8 w-8 ${isMergeSource ? "text-amber-500 bg-amber-500/10 hover:bg-amber-500/20" : ""}`}
onClick={() => onStartMerge(data.id)}
title={isMergeSource ? "Cancel merge (this group is the merge source)" : "Start merge — mark this group as merge source"}
>
<GitMerge className="h-4 w-4" />
</Button>
{canMergeHere && (
<Button
variant="ghost"
size="icon"
className="h-8 w-8 text-primary bg-primary/10 hover:bg-primary/20"
onClick={() => onCompleteMerge(data.id)}
title="Merge source group into this group"
>
<Layers className="h-4 w-4" />
</Button>
)}
<Button
variant="ghost"
size="icon"


@@ -3,7 +3,8 @@
import { useState, useCallback, useTransition, useMemo, useRef } from "react";
import { useRouter, usePathname, useSearchParams } from "next/navigation";
import { toast } from "sonner";
import { Search, Layers, Upload } from "lucide-react";
import { UploadDialog } from "./upload-dialog";
import { useDataTable } from "@/hooks/use-data-table";
import {
getPackageColumns,
@@ -38,7 +39,7 @@ import {
} from "@/components/ui/dialog";
import { Tabs, TabsContent, TabsList, TabsTrigger } from "@/components/ui/tabs";
import { Badge } from "@/components/ui/badge";
import type { DisplayItem, IngestionAccountStatus, PackageListItem } from "@/lib/telegram/types";
import type { SkippedRow } from "./skipped-columns";
import {
updatePackageCreator,
@@ -49,6 +50,7 @@ import {
removeFromGroupAction,
sendAllInGroupAction,
updateGroupPreviewAction,
mergeGroupsAction,
} from "../actions";
interface StlTableProps {
@@ -61,6 +63,9 @@ interface StlTableProps {
skippedData: SkippedRow[];
skippedPageCount: number;
skippedTotalCount: number;
ungroupedData: PackageListItem[];
ungroupedPageCount: number;
ungroupedTotalCount: number;
}
export function StlTable({
@@ -73,6 +78,9 @@ export function StlTable({
skippedData,
skippedPageCount,
skippedTotalCount,
ungroupedData,
ungroupedPageCount,
ungroupedTotalCount,
}: StlTableProps) {
const router = useRouter();
const pathname = usePathname();
@@ -96,6 +104,12 @@ export function StlTable({
const previewInputRef = useRef<HTMLInputElement>(null);
const [uploadGroupId, setUploadGroupId] = useState<string | null>(null);
// Group merge state
const [mergeSourceId, setMergeSourceId] = useState<string | null>(null);
// Upload dialog state
const [uploadOpen, setUploadOpen] = useState(false);
const toggleGroup = useCallback((groupId: string) => {
setExpandedGroups((prev) => {
const next = new Set(prev);
@@ -334,6 +348,35 @@ export function StlTable({
[uploadGroupId, router]
);
const handleStartMerge = useCallback((groupId: string) => {
setMergeSourceId((prev) => {
if (prev === groupId) {
toast.info("Merge cancelled");
return null;
}
toast.info("Merge source selected — click the merge-here button on the target group");
return groupId;
});
}, []);
const handleMergeGroups = useCallback(
(targetGroupId: string) => {
if (!mergeSourceId) return;
const sourceId = mergeSourceId;
startTransition(async () => {
const result = await mergeGroupsAction(targetGroupId, sourceId);
if (result.success) {
toast.success("Groups merged successfully");
setMergeSourceId(null);
router.refresh();
} else {
toast.error(result.error);
}
});
},
[mergeSourceId, router]
);
const columns = getPackageColumns({
onViewFiles: (pkg) => setViewPkg(pkg),
searchTerm,
@@ -375,10 +418,30 @@ export function StlTable({
onGroupPreviewUpload: handleGroupPreviewUpload,
selectedPackages,
onToggleSelect: toggleSelect,
mergeSourceId,
onStartMerge: handleStartMerge,
onCompleteMerge: handleMergeGroups,
});
const { table } = useDataTable({ data: tableRows, columns, pageCount });
const ungroupedRows: StlTableRow[] = useMemo(
() =>
ungroupedData.map((pkg) => ({
...pkg,
_rowType: "package" as const,
_groupId: null,
_isGroupMember: false,
})),
[ungroupedData]
);
const { table: ungroupedTable } = useDataTable({
data: ungroupedRows,
columns,
pageCount: ungroupedPageCount,
});
const activeTag = searchParams.get("tag") ?? "";
return (
@@ -401,6 +464,14 @@ export function StlTable({
</Badge>
)}
</TabsTrigger>
<TabsTrigger value="ungrouped" className="gap-1.5">
Ungrouped
{ungroupedTotalCount > 0 && (
<Badge variant="secondary" className="h-5 px-1.5 text-[10px]">
{ungroupedTotalCount}
</Badge>
)}
</TabsTrigger>
</TabsList>
<TabsContent value="packages" className="space-y-4">
@@ -430,6 +501,10 @@ export function StlTable({
</Select>
)}
<DataTableViewOptions table={table} />
<Button variant="outline" size="sm" className="h-9" onClick={() => setUploadOpen(true)}>
<Upload className="mr-2 h-4 w-4" />
Upload Files
</Button>
{selectedPackages.size >= 2 && (
<Button
variant="outline"
@@ -472,6 +547,11 @@ export function StlTable({
totalCount={skippedTotalCount}
/>
</TabsContent>
<TabsContent value="ungrouped" className="space-y-4">
<DataTable table={ungroupedTable} emptyMessage="All packages are grouped!" />
<DataTablePagination table={ungroupedTable} totalCount={ungroupedTotalCount} />
</TabsContent>
</Tabs>
<PackageFilesDrawer
@@ -515,6 +595,8 @@ export function StlTable({
</DialogContent>
</Dialog>
<UploadDialog open={uploadOpen} onOpenChange={setUploadOpen} />
{/* Hidden file input for group preview upload (Task 12) */}
<input
ref={previewInputRef}


@@ -0,0 +1,243 @@
"use client";
import { useState, useRef, useTransition, useEffect } from "react";
import { Upload, File, X, Loader2, CheckCircle2, AlertCircle } from "lucide-react";
import { toast } from "sonner";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import {
Dialog,
DialogContent,
DialogDescription,
DialogFooter,
DialogHeader,
DialogTitle,
} from "@/components/ui/dialog";
interface UploadDialogProps {
open: boolean;
onOpenChange: (open: boolean) => void;
}
function formatSize(bytes: number): string {
if (bytes >= 1024 * 1024 * 1024) return `${(bytes / (1024 * 1024 * 1024)).toFixed(1)} GB`;
if (bytes >= 1024 * 1024) return `${(bytes / (1024 * 1024)).toFixed(0)} MB`;
return `${(bytes / 1024).toFixed(0)} KB`;
}
type UploadStatus = "idle" | "uploading" | "processing" | "done" | "error";
export function UploadDialog({ open, onOpenChange }: UploadDialogProps) {
const [files, setFiles] = useState<File[]>([]);
const [groupName, setGroupName] = useState("");
const [status, setStatus] = useState<UploadStatus>("idle");
const [error, setError] = useState<string | null>(null);
const [isPending, startTransition] = useTransition();
const fileInputRef = useRef<HTMLInputElement>(null);
const pollRef = useRef<ReturnType<typeof setInterval> | null>(null);
useEffect(() => {
if (open) {
setFiles([]);
setGroupName("");
setStatus("idle");
setError(null);
}
return () => {
if (pollRef.current) clearInterval(pollRef.current);
};
}, [open]);
function handleFileChange(e: React.ChangeEvent<HTMLInputElement>) {
if (e.target.files) {
setFiles(Array.from(e.target.files));
}
}
function removeFile(index: number) {
setFiles((prev) => prev.filter((_, i) => i !== index));
}
function handleUpload() {
if (files.length === 0) return;
startTransition(async () => {
setStatus("uploading");
setError(null);
try {
const formData = new FormData();
for (const file of files) {
formData.append("files", file);
}
if (groupName.trim()) {
formData.append("groupName", groupName.trim());
}
const res = await fetch("/api/uploads", {
method: "POST",
body: formData,
});
const data = await res.json();
if (!res.ok) {
setStatus("error");
setError(data.error ?? "Upload failed");
return;
}
setStatus("processing");
// Poll for completion
pollRef.current = setInterval(async () => {
try {
const statusRes = await fetch(`/api/uploads/${data.uploadId}`);
const statusData = await statusRes.json();
if (statusData.status === "COMPLETED") {
setStatus("done");
toast.success(`${files.length} file(s) uploaded and indexed`);
if (pollRef.current) clearInterval(pollRef.current);
} else if (statusData.status === "FAILED") {
setStatus("error");
setError(statusData.errorMessage ?? "Processing failed");
if (pollRef.current) clearInterval(pollRef.current);
}
} catch {
// Keep polling
}
}, 3000);
// Stop polling after 10 minutes
setTimeout(() => {
if (pollRef.current) {
clearInterval(pollRef.current);
pollRef.current = null;
setStatus((s) => s === "processing" ? "done" : s);
}
}, 600_000);
} catch {
setStatus("error");
setError("Network error");
}
});
}
return (
<Dialog open={open} onOpenChange={onOpenChange}>
<DialogContent className="sm:max-w-lg">
<DialogHeader>
<DialogTitle>Upload Files</DialogTitle>
<DialogDescription>
Upload archive files to be processed and indexed. Multiple files will be automatically grouped.
</DialogDescription>
</DialogHeader>
{status === "idle" && (
<div className="space-y-4">
<div
className="border-2 border-dashed rounded-lg p-8 text-center cursor-pointer hover:border-primary/50 transition-colors"
onClick={() => fileInputRef.current?.click()}
>
<Upload className="h-8 w-8 mx-auto mb-2 text-muted-foreground" />
<p className="text-sm text-muted-foreground">
Click to select files or drag & drop
</p>
<p className="text-xs text-muted-foreground mt-1">
ZIP, RAR, 7Z files up to 4GB each
</p>
<input
ref={fileInputRef}
type="file"
multiple
accept=".zip,.rar,.7z,.pdf,.stl"
onChange={handleFileChange}
className="hidden"
/>
</div>
{files.length > 0 && (
<div className="space-y-2">
{files.map((file, i) => (
<div key={i} className="flex items-center gap-2 p-2 rounded bg-muted/30">
<File className="h-4 w-4 shrink-0 text-muted-foreground" />
<span className="text-sm flex-1 truncate">{file.name}</span>
<span className="text-xs text-muted-foreground">{formatSize(file.size)}</span>
<button onClick={() => removeFile(i)} className="p-0.5 hover:text-destructive">
<X className="h-3.5 w-3.5" />
</button>
</div>
))}
</div>
)}
{files.length > 1 && (
<div>
<Label htmlFor="groupName" className="text-sm">Group Name (optional)</Label>
<Input
id="groupName"
value={groupName}
onChange={(e) => setGroupName(e.target.value)}
placeholder="Auto-generated from filenames"
className="mt-1"
/>
</div>
)}
</div>
)}
{(status === "uploading" || status === "processing") && (
<div className="flex items-center gap-3 p-6 rounded-lg bg-muted/30 border">
<Loader2 className="h-6 w-6 animate-spin text-primary" />
<div>
<p className="text-sm font-medium">
{status === "uploading" ? "Uploading files..." : "Processing & uploading to Telegram..."}
</p>
<p className="text-xs text-muted-foreground mt-0.5">
{status === "uploading"
? "Sending files to server"
: "Hashing, extracting metadata, uploading to destination channel"}
</p>
</div>
</div>
)}
{status === "done" && (
<div className="flex items-center gap-3 p-6 rounded-lg bg-green-500/10 border border-green-500/20">
<CheckCircle2 className="h-6 w-6 text-green-500" />
<div>
<p className="text-sm font-medium text-green-500">Upload complete!</p>
<p className="text-xs text-muted-foreground">Files have been indexed and uploaded to Telegram.</p>
</div>
</div>
)}
{status === "error" && (
<div className="flex items-center gap-3 p-6 rounded-lg bg-destructive/10 border border-destructive/20">
<AlertCircle className="h-6 w-6 text-destructive" />
<div>
<p className="text-sm font-medium text-destructive">Upload failed</p>
<p className="text-xs text-muted-foreground">{error}</p>
</div>
</div>
)}
<DialogFooter>
{status === "idle" && (
<>
<Button variant="outline" onClick={() => onOpenChange(false)}>Cancel</Button>
<Button onClick={handleUpload} disabled={files.length === 0 || isPending}>
<Upload className="h-4 w-4 mr-1" />
Upload {files.length > 0 ? `(${files.length})` : ""}
</Button>
</>
)}
{(status === "done" || status === "error") && (
<Button variant="outline" onClick={() => onOpenChange(false)}>Close</Button>
)}
</DialogFooter>
</DialogContent>
</Dialog>
);
}

View File

@@ -10,6 +10,7 @@ import {
createManualGroup,
removePackageFromGroup,
dissolveGroup,
mergeGroups,
} from "@/lib/telegram/queries";
const ALLOWED_IMAGE_TYPES = [
@@ -185,6 +186,62 @@ export async function setPreviewFromExtract(
}
}
export async function repairPackageAction(
packageId: string
): Promise<ActionResult> {
const session = await auth();
if (!session?.user?.id) return { success: false, error: "Unauthorized" };
try {
const pkg = await prisma.package.findUnique({
where: { id: packageId },
select: {
id: true,
fileName: true,
sourceChannelId: true,
sourceMessageId: true,
destChannelId: true,
destMessageId: true,
},
});
if (!pkg) return { success: false, error: "Package not found" };
// Clear the destination info so the worker re-processes it
await prisma.package.update({
where: { id: packageId },
data: {
destMessageId: null,
destMessageIds: [],
destChannelId: null,
},
});
// Reset the channel watermark to before this message so worker picks it up
await prisma.accountChannelMap.updateMany({
where: {
channelId: pkg.sourceChannelId,
lastProcessedMessageId: { gte: pkg.sourceMessageId },
},
data: { lastProcessedMessageId: pkg.sourceMessageId - BigInt(1) },
});
// Mark related notifications as read
await prisma.systemNotification.updateMany({
where: {
context: { path: ["packageId"], equals: packageId },
isRead: false,
},
data: { isRead: true },
});
revalidatePath("/stls");
return { success: true, data: undefined };
} catch {
return { success: false, error: "Failed to schedule repair" };
}
}
export async function retrySkippedPackageAction(
id: string
): Promise<ActionResult> {
@@ -435,6 +492,26 @@ export async function updateGroupPreviewAction(
}
}
export async function mergeGroupsAction(
targetGroupId: string,
sourceGroupId: string
): Promise<ActionResult> {
const session = await auth();
if (!session?.user?.id) return { success: false, error: "Unauthorized" };
if (targetGroupId === sourceGroupId) {
return { success: false, error: "Cannot merge a group with itself" };
}
try {
await mergeGroups(targetGroupId, sourceGroupId);
revalidatePath("/stls");
return { success: true, data: undefined };
} catch {
return { success: false, error: "Failed to merge groups" };
}
}
export async function sendAllInGroupAction(
groupId: string
): Promise<ActionResult> {

View File

@@ -1,6 +1,6 @@
import { auth } from "@/lib/auth"; import { auth } from "@/lib/auth";
import { redirect } from "next/navigation"; import { redirect } from "next/navigation";
import { listDisplayItems, searchPackages, getIngestionStatus, getAllPackageTags, listSkippedPackages, countSkippedPackages } from "@/lib/telegram/queries"; import { listDisplayItems, searchPackages, getIngestionStatus, getAllPackageTags, listSkippedPackages, countSkippedPackages, listUngroupedPackages, countUngroupedPackages } from "@/lib/telegram/queries";
import { StlTable } from "./_components/stl-table"; import { StlTable } from "./_components/stl-table";
import type { DisplayItem, PackageListItem } from "@/lib/telegram/types"; import type { DisplayItem, PackageListItem } from "@/lib/telegram/types";
@@ -24,7 +24,7 @@ export default async function StlFilesPage({ searchParams }: Props) {
const tab = (params.tab as string) ?? "packages"; const tab = (params.tab as string) ?? "packages";
// Fetch packages, ingestion status, tags, and skipped count in parallel // Fetch packages, ingestion status, tags, and skipped count in parallel
const [result, ingestionStatus, availableTags, skippedCount] = await Promise.all([ const [result, ingestionStatus, availableTags, skippedCount, ungroupedCount] = await Promise.all([
search search
? searchPackages({ ? searchPackages({
query: search, query: search,
@@ -43,6 +43,7 @@ export default async function StlFilesPage({ searchParams }: Props) {
getIngestionStatus(),
getAllPackageTags(),
countSkippedPackages(),
countUngroupedPackages(),
]);
// For search results, wrap as DisplayItem[]; for non-search, already DisplayItem[]
@@ -55,6 +56,11 @@ export default async function StlFilesPage({ searchParams }: Props) {
? await listSkippedPackages({ page, limit: perPage })
: null;
// Fetch ungrouped packages only if on that tab
const ungroupedResult = tab === "ungrouped"
? await listUngroupedPackages({ page, limit: perPage })
: null;
return (
<StlTable
data={displayItems}
@@ -66,6 +72,9 @@ export default async function StlFilesPage({ searchParams }: Props) {
skippedData={skippedResult?.items ?? []}
skippedPageCount={skippedResult?.pagination.totalPages ?? 0}
skippedTotalCount={skippedCount}
ungroupedData={ungroupedResult?.items ?? []}
ungroupedPageCount={ungroupedResult?.pagination.totalPages ?? 0}
ungroupedTotalCount={ungroupedCount}
/>
);
}

View File

@@ -291,10 +291,25 @@ export async function setChannelCategory(
if (!admin.success) return admin;
try {
const existing = await prisma.telegramChannel.findUnique({
where: { id },
select: { category: true },
});
if (!existing) return { success: false, error: "Channel not found" };
const oldCategory = existing.category;
const newCategory = category?.trim() || null;
await prisma.telegramChannel.update({
where: { id },
data: { category: newCategory },
});
// Retroactively re-tag packages from this channel when category changes
if (oldCategory !== newCategory && newCategory) {
await retagChannelPackages(id, oldCategory, newCategory);
}
revalidatePath("/telegram"); revalidatePath("/telegram");
return { success: true, data: undefined }; return { success: true, data: undefined };
} catch { } catch {
@@ -302,6 +317,50 @@ export async function setChannelCategory(
}
}
export async function retagChannelPackages(
channelId: string,
oldCategory: string | null,
newCategory: string
): Promise<ActionResult<{ updated: number }>> {
const session = await auth();
if (!session?.user?.id) return { success: false, error: "Unauthorized" };
try {
// Fetch all packages from this channel, then swap the old category tag for the new one where needed
const packages = await prisma.package.findMany({
where: { sourceChannelId: channelId },
select: { id: true, tags: true },
});
let updated = 0;
for (const pkg of packages) {
const tags = [...pkg.tags];
// Remove old category tag if present
if (oldCategory) {
const idx = tags.indexOf(oldCategory);
if (idx !== -1) tags.splice(idx, 1);
}
// Add new category tag if not already present
if (!tags.includes(newCategory)) {
tags.push(newCategory);
}
// Only update if tags actually changed
if (JSON.stringify(tags) !== JSON.stringify(pkg.tags)) {
await prisma.package.update({
where: { id: pkg.id },
data: { tags },
});
updated++;
}
}
revalidatePath("/stls");
return { success: true, data: { updated } };
} catch {
return { success: false, error: "Failed to re-tag packages" };
}
}
export async function setChannelType(
id: string,
type: "SOURCE" | "DESTINATION"

View File

@@ -0,0 +1,33 @@
import { NextResponse } from "next/server";
import { auth } from "@/lib/auth";
import {
markNotificationRead,
markAllNotificationsRead,
dismissNotification,
clearAllNotifications,
} from "@/data/notification.queries";
export const dynamic = "force-dynamic";
export async function POST(request: Request) {
const session = await auth();
if (!session?.user?.id) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
const body = await request.json().catch(() => ({}));
const id = body.id as string | undefined;
const action = (body.action as string) ?? "read";
if (action === "dismiss" && id) {
await dismissNotification(id);
} else if (action === "clear") {
await clearAllNotifications();
} else if (id) {
await markNotificationRead(id);
} else {
await markAllNotificationsRead();
}
return NextResponse.json({ success: true });
}

View File

@@ -0,0 +1,43 @@
import { NextResponse } from "next/server";
import { auth } from "@/lib/auth";
import { prisma } from "@/lib/prisma";
export const dynamic = "force-dynamic";
export async function POST(request: Request) {
const session = await auth();
if (!session?.user?.id) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
const body = await request.json().catch(() => ({}));
const notificationId = body.notificationId as string;
if (!notificationId) {
return NextResponse.json({ error: "notificationId required" }, { status: 400 });
}
const notification = await prisma.systemNotification.findUnique({
where: { id: notificationId },
});
if (!notification) {
return NextResponse.json({ error: "Notification not found" }, { status: 404 });
}
const context = notification.context as Record<string, unknown> | null;
const packageId = context?.packageId as string | undefined;
if (!packageId) {
return NextResponse.json({ error: "Notification has no associated package" }, { status: 400 });
}
// Import and call the repair action
const { repairPackageAction } = await import("@/app/(app)/stls/actions");
const result = await repairPackageAction(packageId);
if (!result.success) {
return NextResponse.json({ error: result.error }, { status: 500 });
}
return NextResponse.json({ success: true });
}

View File

@@ -0,0 +1,27 @@
import { NextResponse } from "next/server";
import { auth } from "@/lib/auth";
import {
getRecentNotifications,
getUnreadNotificationCount,
} from "@/data/notification.queries";
export const dynamic = "force-dynamic";
export async function GET() {
const session = await auth();
if (!session?.user?.id) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
const [notifications, unreadCount] = await Promise.all([
getRecentNotifications(30),
getUnreadNotificationCount(),
]);
const serialized = notifications.map((n) => ({
...n,
createdAt: n.createdAt.toISOString(),
}));
return NextResponse.json({ notifications: serialized, unreadCount });
}

View File

@@ -0,0 +1,43 @@
import { NextResponse } from "next/server";
import { auth } from "@/lib/auth";
import { prisma } from "@/lib/prisma";
export const dynamic = "force-dynamic";
export async function GET(
_request: Request,
{ params }: { params: Promise<{ id: string }> }
) {
const session = await auth();
if (!session?.user?.id) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
const { id } = await params;
const upload = await prisma.manualUpload.findUnique({
where: { id },
include: {
files: {
select: { id: true, fileName: true, fileSize: true, packageId: true },
},
},
});
if (!upload || upload.userId !== session.user.id) {
return NextResponse.json({ error: "Not found" }, { status: 404 });
}
return NextResponse.json({
id: upload.id,
status: upload.status,
groupName: upload.groupName,
errorMessage: upload.errorMessage,
files: upload.files.map((f) => ({
...f,
fileSize: f.fileSize.toString(),
})),
createdAt: upload.createdAt.toISOString(),
completedAt: upload.completedAt?.toISOString() ?? null,
});
}

View File

@@ -0,0 +1,83 @@
import { NextResponse } from "next/server";
import { auth } from "@/lib/auth";
import { prisma } from "@/lib/prisma";
import { writeFile, mkdir } from "fs/promises";
import path from "path";
export const dynamic = "force-dynamic";
const UPLOAD_DIR = process.env.UPLOAD_DIR ?? "/data/uploads";
const MAX_FILE_SIZE = 4 * 1024 * 1024 * 1024; // 4GB per file
export async function POST(request: Request) {
const session = await auth();
if (!session?.user?.id) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
try {
const formData = await request.formData();
const files = formData.getAll("files") as File[];
const groupName = formData.get("groupName") as string | null;
if (!files.length) {
return NextResponse.json({ error: "No files provided" }, { status: 400 });
}
// Create the upload record
const upload = await prisma.manualUpload.create({
data: {
userId: session.user.id,
groupName: groupName || (files.length > 1 ? files[0].name.replace(/\.[^.]+$/, "") : null),
status: "PENDING",
},
});
// Save files to shared volume
const uploadDir = path.join(UPLOAD_DIR, upload.id);
await mkdir(uploadDir, { recursive: true });
for (const file of files) {
if (file.size > MAX_FILE_SIZE) {
return NextResponse.json(
{ error: `File "${file.name}" exceeds 4GB limit` },
{ status: 400 }
);
}
const filePath = path.join(uploadDir, file.name);
const buffer = Buffer.from(await file.arrayBuffer());
await writeFile(filePath, buffer);
await prisma.manualUploadFile.create({
data: {
uploadId: upload.id,
fileName: file.name,
filePath,
fileSize: BigInt(file.size),
},
});
}
// Notify worker
try {
await prisma.$queryRawUnsafe(
`SELECT pg_notify('manual_upload', $1)`,
upload.id
);
} catch {
// Best-effort
}
return NextResponse.json({
uploadId: upload.id,
fileCount: files.length,
status: "PENDING",
});
} catch (err) {
return NextResponse.json(
{ error: err instanceof Error ? err.message : "Upload failed" },
{ status: 500 }
);
}
}

View File

@@ -6,6 +6,7 @@ import { Button } from "@/components/ui/button";
import { Sheet, SheetContent, SheetTrigger } from "@/components/ui/sheet";
import { UserMenu } from "./user-menu";
import { MobileSidebar } from "./mobile-sidebar";
import { NotificationBell } from "./notification-bell";
const routeTitles: Record<string, string> = {
"/dashboard": "Dashboard",
@@ -38,7 +39,8 @@ export function Header() {
<h1 className="text-lg font-semibold">{title}</h1>
<div className="ml-auto flex items-center gap-1">
<NotificationBell />
<UserMenu />
</div>
</header>

View File

@@ -0,0 +1,268 @@
"use client";
import { useState, useEffect, useCallback } from "react";
import { Bell, AlertTriangle, AlertCircle, Info, CheckCircle2, X, Trash2 } from "lucide-react";
import { Button } from "@/components/ui/button";
import { Badge } from "@/components/ui/badge";
import {
Popover,
PopoverContent,
PopoverTrigger,
} from "@/components/ui/popover";
import { ScrollArea } from "@/components/ui/scroll-area";
import { toast } from "sonner";
interface Notification {
id: string;
type: string;
severity: "INFO" | "WARNING" | "ERROR";
title: string;
message: string;
isRead: boolean;
createdAt: string;
}
const severityIcon = {
INFO: Info,
WARNING: AlertTriangle,
ERROR: AlertCircle,
};
const severityColor = {
INFO: "text-blue-400",
WARNING: "text-orange-400",
ERROR: "text-red-400",
};
export function NotificationBell() {
const [notifications, setNotifications] = useState<Notification[]>([]);
const [unreadCount, setUnreadCount] = useState(0);
const [open, setOpen] = useState(false);
const fetchNotifications = useCallback(async () => {
try {
const res = await fetch("/api/notifications");
if (res.ok) {
const data = await res.json();
setNotifications(data.notifications ?? []);
setUnreadCount(data.unreadCount ?? 0);
}
} catch {
// Ignore fetch errors
}
}, []);
// Poll every 30 seconds + on mount
useEffect(() => {
fetchNotifications();
const interval = setInterval(fetchNotifications, 30_000);
return () => clearInterval(interval);
}, [fetchNotifications]);
// Refresh when popover opens
useEffect(() => {
if (open) fetchNotifications();
}, [open, fetchNotifications]);
async function handleMarkAllRead() {
try {
await fetch("/api/notifications/read", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({}),
});
setNotifications((prev) => prev.map((n) => ({ ...n, isRead: true })));
setUnreadCount(0);
} catch {
// Ignore
}
}
async function handleMarkRead(id: string) {
try {
await fetch("/api/notifications/read", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ id }),
});
setNotifications((prev) =>
prev.map((n) => (n.id === id ? { ...n, isRead: true } : n))
);
setUnreadCount((c) => Math.max(0, c - 1));
} catch {
// Ignore
}
}
async function handleDismiss(id: string) {
try {
await fetch("/api/notifications/read", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ id, action: "dismiss" }),
});
setNotifications((prev) => prev.filter((n) => n.id !== id));
setUnreadCount((c) => Math.max(0, c - 1));
} catch {
// Ignore
}
}
async function handleClearAll() {
try {
await fetch("/api/notifications/read", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ action: "clear" }),
});
setNotifications([]);
setUnreadCount(0);
} catch {
// Ignore
}
}
async function handleRepair(notificationId: string) {
try {
const res = await fetch("/api/notifications/repair", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ notificationId }),
});
if (res.ok) {
toast.success("Repair scheduled — package will be re-processed on next cycle");
fetchNotifications();
}
} catch {
// Ignore
}
}
function formatTime(iso: string): string {
const d = new Date(iso);
const now = new Date();
const diffMs = now.getTime() - d.getTime();
const diffMin = Math.floor(diffMs / 60_000);
if (diffMin < 1) return "just now";
if (diffMin < 60) return `${diffMin}m ago`;
const diffHr = Math.floor(diffMin / 60);
if (diffHr < 24) return `${diffHr}h ago`;
const diffDay = Math.floor(diffHr / 24);
return `${diffDay}d ago`;
}
return (
<Popover open={open} onOpenChange={setOpen}>
<PopoverTrigger asChild>
<Button variant="ghost" size="icon" className="relative h-9 w-9">
<Bell className="h-4 w-4" />
{unreadCount > 0 && (
<Badge
variant="destructive"
className="absolute -top-1 -right-1 h-4 min-w-4 px-1 text-[10px] leading-none"
>
{unreadCount > 99 ? "99+" : unreadCount}
</Badge>
)}
</Button>
</PopoverTrigger>
<PopoverContent className="w-96 p-0" align="end">
<div className="flex items-center justify-between border-b px-4 py-3">
<h3 className="text-sm font-semibold">Notifications</h3>
<div className="flex items-center gap-1">
{unreadCount > 0 && (
<Button
variant="ghost"
size="sm"
className="h-7 text-xs"
onClick={handleMarkAllRead}
>
Mark all read
</Button>
)}
{notifications.length > 0 && (
<Button
variant="ghost"
size="sm"
className="h-7 text-xs text-muted-foreground"
onClick={handleClearAll}
>
<Trash2 className="h-3 w-3 mr-1" />
Clear
</Button>
)}
</div>
</div>
<ScrollArea className="max-h-[400px]">
{notifications.length === 0 ? (
<div className="flex flex-col items-center justify-center py-8 text-muted-foreground">
<CheckCircle2 className="h-8 w-8 mb-2 opacity-50" />
<p className="text-sm">All clear!</p>
</div>
) : (
<div className="divide-y">
{notifications.map((n) => {
const Icon = severityIcon[n.severity] ?? Info;
const color = severityColor[n.severity] ?? "text-muted-foreground";
return (
<div
key={n.id}
className={`flex w-full gap-3 px-4 py-3 text-left hover:bg-muted/50 transition-colors ${
!n.isRead ? "bg-muted/20" : ""
}`}
role="button"
tabIndex={0}
onClick={() => !n.isRead && handleMarkRead(n.id)}
onKeyDown={(e) => {
if (e.key === "Enter" || e.key === " ") {
if (!n.isRead) handleMarkRead(n.id);
}
}}
>
<Icon className={`h-4 w-4 mt-0.5 shrink-0 ${color}`} />
<div className="flex-1 min-w-0">
<div className="flex items-center gap-2">
<p className={`text-sm truncate ${!n.isRead ? "font-medium" : ""}`}>
{n.title}
</p>
{!n.isRead && (
<span className="h-2 w-2 rounded-full bg-primary shrink-0" />
)}
<button
className="ml-auto shrink-0 p-0.5 rounded hover:bg-muted text-muted-foreground hover:text-foreground"
onClick={(e) => { e.stopPropagation(); handleDismiss(n.id); }}
title="Dismiss"
>
<X className="h-3 w-3" />
</button>
</div>
<p className="text-xs text-muted-foreground line-clamp-2 mt-0.5">
{n.message}
</p>
<p className="text-[10px] text-muted-foreground mt-1">
{formatTime(n.createdAt)}
</p>
{(n.type === "MISSING_PART" || n.type === "HASH_MISMATCH") && (
<Button
variant="outline"
size="sm"
className="h-6 px-2 text-xs mt-1"
onClick={(e) => {
e.stopPropagation();
handleRepair(n.id);
}}
>
Repair
</Button>
)}
</div>
</div>
);
})}
</div>
)}
</ScrollArea>
</PopoverContent>
</Popover>
);
}

View File

@@ -0,0 +1,45 @@
import { prisma } from "@/lib/prisma";
export async function getUnreadNotificationCount(): Promise<number> {
return prisma.systemNotification.count({
where: { isRead: false },
});
}
export async function getRecentNotifications(limit = 20) {
return prisma.systemNotification.findMany({
orderBy: { createdAt: "desc" },
take: limit,
select: {
id: true,
type: true,
severity: true,
title: true,
message: true,
isRead: true,
createdAt: true,
},
});
}
export async function markNotificationRead(id: string) {
return prisma.systemNotification.update({
where: { id },
data: { isRead: true },
});
}
export async function markAllNotificationsRead() {
return prisma.systemNotification.updateMany({
where: { isRead: false },
data: { isRead: true },
});
}
export async function dismissNotification(id: string) {
return prisma.systemNotification.delete({ where: { id } });
}
export async function clearAllNotifications() {
return prisma.systemNotification.deleteMany({});
}

View File

@@ -340,6 +340,30 @@ export async function listPackageFiles(options: {
};
}
async function fullTextSearchPackageIds(query: string, limit: number): Promise<string[]> {
// Convert user query to tsquery — handle multi-word by joining with &
const tsQuery = query
.trim()
.split(/\s+/)
.filter((w) => w.length >= 2)
.map((w) => w.replace(/[^a-zA-Z0-9]/g, ""))
.filter(Boolean)
.join(" & ");
if (!tsQuery) return [];
const results = await prisma.$queryRawUnsafe<{ id: string }[]>(
`SELECT id FROM packages
WHERE "searchVector" @@ to_tsquery('english', $1)
ORDER BY ts_rank("searchVector", to_tsquery('english', $1)) DESC
LIMIT $2`,
tsQuery,
limit
);
return results.map((r) => r.id);
}
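As a rough illustration of the sanitisation above (hypothetical queries, not taken from real data):
// "Dragon Bust v2!"  -> tokens ["Dragon", "Bust", "v2"] -> to_tsquery('english', 'Dragon & Bust & v2')
// "x *"              -> no token of 2+ characters survives -> returns [] and the ILIKE fallback below is used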
export async function searchPackages(options: {
query: string;
page: number;
@@ -366,14 +390,26 @@ export async function searchPackages(options: {
);
const fileMatchedIds = fileMatches.map((f) => f.packageId);
// Try full-text search first (better ranking, handles word stemming)
let ftsPackageNameIds: string[] = [];
if (options.searchIn === "both" && q.length >= 3) {
try {
ftsPackageNameIds = await fullTextSearchPackageIds(q, 200);
} catch {
// FTS failed — fall back to ILIKE below
}
}
const packageNameIds =
options.searchIn === "both"
? ftsPackageNameIds.length > 0
? ftsPackageNameIds
: (
await prisma.package.findMany({
where: { fileName: { contains: q, mode: "insensitive" } },
select: { id: true },
})
).map((p) => p.id)
: [];
// Also match by group name
@@ -571,6 +607,72 @@ export async function countSkippedPackages(): Promise<number> {
return prisma.skippedPackage.count();
}
export async function listUngroupedPackages(options: {
page: number;
limit: number;
}) {
const { page, limit } = options;
const skip = (page - 1) * limit;
const where = { packageGroupId: null, destMessageId: { not: null } };
const [items, total] = await Promise.all([
prisma.package.findMany({
where,
orderBy: { indexedAt: "desc" },
skip,
take: limit,
select: {
id: true,
fileName: true,
fileSize: true,
archiveType: true,
creator: true,
fileCount: true,
isMultipart: true,
partCount: true,
tags: true,
indexedAt: true,
previewData: true,
sourceChannel: { select: { id: true, title: true } },
},
}),
prisma.package.count({ where }),
]);
return {
items: items.map((p) => ({
id: p.id,
fileName: p.fileName,
fileSize: p.fileSize.toString(),
contentHash: "",
archiveType: p.archiveType,
creator: p.creator,
fileCount: p.fileCount,
isMultipart: p.isMultipart,
partCount: p.partCount,
tags: p.tags,
indexedAt: p.indexedAt.toISOString(),
hasPreview: !!p.previewData,
sourceChannel: p.sourceChannel,
matchedFileCount: 0,
matchedByContent: false,
})),
pagination: {
total,
totalPages: Math.ceil(total / limit),
page,
limit,
},
};
}
export async function countUngroupedPackages(): Promise<number> {
return prisma.package.count({
where: { packageGroupId: null, destMessageId: { not: null } },
});
}
export async function getPackageGroup(groupId: string) {
return prisma.packageGroup.findUnique({
where: { id: groupId },
@@ -630,6 +732,53 @@ export async function createManualGroup(name: string, packageIds: string[]) {
data: { packageGroupId: group.id },
});
// Learn a grouping rule from the manual override
try {
const linkedPkgs = await prisma.package.findMany({
where: { id: { in: packageIds } },
select: { fileName: true, creator: true },
});
// Extract the common filename pattern
const fileNames = linkedPkgs.map((p) => p.fileName);
let pattern = "";
if (fileNames.length > 1) {
// Find longest common prefix
let prefix = fileNames[0];
for (let i = 1; i < fileNames.length; i++) {
while (!fileNames[i].startsWith(prefix)) {
prefix = prefix.slice(0, -1);
if (!prefix) break;
}
}
const trimmed = prefix.replace(/[\s\-_.(]+$/, "");
if (trimmed.length >= 4) {
pattern = trimmed;
}
}
// Fall back to shared creator
if (!pattern) {
const creators = [...new Set(linkedPkgs.map((p) => p.creator).filter(Boolean))];
if (creators.length === 1 && creators[0]) {
pattern = creators[0];
}
}
if (pattern) {
await prisma.groupingRule.create({
data: {
sourceChannelId: firstPkg.sourceChannelId,
pattern,
signalType: "MANUAL",
createdByGroupId: group.id,
},
});
}
} catch {
// Best-effort — don't fail the group creation if rule learning fails
}
// Clean up empty groups left behind
await prisma.packageGroup.deleteMany({
where: { packages: { none: {} }, id: { not: group.id } },
@@ -670,3 +819,13 @@ export async function dissolveGroup(groupId: string) {
});
await prisma.packageGroup.delete({ where: { id: groupId } });
}
export async function mergeGroups(targetGroupId: string, sourceGroupId: string) {
// Move all packages from source group to target group
await prisma.package.updateMany({
where: { packageGroupId: sourceGroupId },
data: { packageGroupId: targetGroupId },
});
// Delete the now-empty source group
await prisma.packageGroup.delete({ where: { id: sourceGroupId } });
}

View File

@@ -11,6 +11,8 @@ export interface TelegramMessage {
fileSize: bigint;
date: Date;
mediaAlbumId?: string;
replyToMessageId?: bigint; // NEW
caption?: string; // NEW
}
export interface ArchiveSet {

View File

@@ -18,20 +18,22 @@ const log = childLogger("split");
const MAX_PART_SIZE = BigInt(config.maxPartSizeMB) * 1024n * 1024n;
/**
* Split a file into parts using byte-level splitting.
* Returns paths to the split parts. If the file fits in one part, returns the original path.
* Pass maxPartSize to override the global default (e.g., 3950 MiB for Premium accounts).
*/
export async function byteLevelSplit(filePath: string, maxPartSize?: bigint): Promise<string[]> {
const effectiveMax = maxPartSize ?? MAX_PART_SIZE;
const stats = await stat(filePath);
const fileSize = BigInt(stats.size);
if (fileSize <= effectiveMax) {
return [filePath];
}
const dir = path.dirname(filePath);
const baseName = path.basename(filePath);
const partSize = Number(effectiveMax);
const totalParts = Math.ceil(Number(fileSize) / partSize);
const parts: string[] = [];
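A minimal sketch of how a caller might use the new override. The 3950 MiB figure is the one quoted in the doc comment above, and `account.isPremium` refers to the flag added elsewhere in this change; the exact call site is an assumption:

// Sketch only — `archivePath` and `account.isPremium` are assumptions, not the real call site.
const PREMIUM_PART_SIZE = 3950n * 1024n * 1024n; // per the doc comment above
const parts = await byteLevelSplit(
  archivePath,
  account.isPremium ? PREMIUM_PART_SIZE : undefined // undefined -> fall back to MAX_PART_SIZE
);
// parts.length === 1 means the file already fits in a single upload.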

worker/src/audit.ts Normal file
View File

@@ -0,0 +1,119 @@
import { db } from "./db/client.js";
import { childLogger } from "./util/logger.js";
const log = childLogger("audit");
/**
* Periodic integrity audit: checks all packages for consistency.
* Creates SystemNotification records for any issues found.
*
* Checks performed:
* 1. Multipart completeness: destMessageIds.length should match partCount
* 2. Missing destination: packages with destChannelId but no destMessageId
*/
export async function runIntegrityAudit(): Promise<{ checked: number; issues: number }> {
log.info("Starting integrity audit");
let checked = 0;
let issues = 0;
// Check 1: Multipart packages with wrong number of destination message IDs
const multipartPackages = await db.package.findMany({
where: {
isMultipart: true,
partCount: { gt: 1 },
destMessageId: { not: null },
},
select: {
id: true,
fileName: true,
partCount: true,
destMessageIds: true,
sourceChannelId: true,
sourceChannel: { select: { title: true } },
},
});
checked += multipartPackages.length;
for (const pkg of multipartPackages) {
const actualParts = pkg.destMessageIds.length;
// Only flag when we have >1 stored IDs but count doesn't match.
// Packages with exactly 1 ID are legacy (backfilled from single destMessageId) — not actionable.
if (actualParts > 1 && actualParts !== pkg.partCount) {
issues++;
// Check if we already have a notification for this
const existing = await db.systemNotification.findFirst({
where: {
type: "MISSING_PART",
context: { path: ["packageId"], equals: pkg.id },
},
select: { id: true },
});
if (!existing) {
await db.systemNotification.create({
data: {
type: "MISSING_PART",
severity: "WARNING",
title: `Incomplete multipart: ${pkg.fileName}`,
message: `Expected ${pkg.partCount} parts but only ${actualParts} destination message IDs stored`,
context: {
packageId: pkg.id,
fileName: pkg.fileName,
expectedParts: pkg.partCount,
actualParts,
sourceChannelId: pkg.sourceChannelId,
channelTitle: pkg.sourceChannel.title,
},
},
});
log.warn(
{ packageId: pkg.id, fileName: pkg.fileName, expected: pkg.partCount, actual: actualParts },
"Multipart package has mismatched part count"
);
}
}
}
// Check 2: Packages with dest channel but no dest message (orphaned index)
const orphanedCount = await db.package.count({
where: {
destChannelId: { not: null },
destMessageId: null,
},
});
if (orphanedCount > 0) {
issues++;
const existing = await db.systemNotification.findFirst({
where: {
type: "INTEGRITY_AUDIT",
context: { path: ["check"], equals: "orphaned_index" },
createdAt: { gte: new Date(Date.now() - 24 * 60 * 60 * 1000) },
},
select: { id: true },
});
if (!existing) {
await db.systemNotification.create({
data: {
type: "INTEGRITY_AUDIT",
severity: "INFO",
title: `${orphanedCount} packages with missing destination message`,
message: `Found ${orphanedCount} packages that have a destination channel set but no destination message ID. These may be from interrupted uploads.`,
context: {
check: "orphaned_index",
count: orphanedCount,
},
},
});
}
}
log.info({ checked, issues }, "Integrity audit complete");
return { checked, issues };
}
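A minimal sketch of how the worker loop could schedule this audit; the interval and wiring are assumptions, since the actual scheduler is not part of this diff:

// Hypothetical wiring — runs the audit hourly and logs failures without crashing the worker.
const AUDIT_INTERVAL_MS = 60 * 60 * 1000;
setInterval(() => {
  runIntegrityAudit().catch((err) => log.error({ err }, "Integrity audit failed"));
}, AUDIT_INTERVAL_MS);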

View File

@@ -5,7 +5,14 @@ import { config } from "../util/config.js";
const pool = new pg.Pool({
connectionString: config.databaseUrl,
// Pool needs headroom for: 2 account advisory locks (held for entire cycle),
// up to 2 concurrent hash locks, plus Prisma operations from both accounts.
// Previously max=5 caused pool exhaustion and indefinite hangs.
max: 15,
// Prevent pool.connect() from blocking forever when pool is exhausted.
// Throws an error after 30s so the operation can fail and retry instead of
// silently hanging for hours (as happened with the Turnbase.7z stall).
connectionTimeoutMillis: 30_000,
});
const adapter = new PrismaPg(pool);
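With connectionTimeoutMillis set, pool.connect() now rejects once the pool stays exhausted for 30 seconds; a hedged sketch of what a caller sees (not the project's actual retry logic):

// Sketch — demonstrates the new failure mode only.
try {
  const client = await pool.connect(); // rejects after ~30s instead of hanging forever
  try {
    await client.query("SELECT 1");
  } finally {
    client.release();
  }
} catch (err) {
  // Surface the exhaustion instead of stalling; the operation can be retried next cycle.
  console.warn("pool.connect() timed out", err);
}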

View File

@@ -79,3 +79,66 @@ export async function releaseLock(accountId: string): Promise<void> {
client.release();
}
}
/**
* Derive a lock ID for a content hash. Prefixes with "hash:" so the resulting
* 32-bit integer does not collide with account advisory lock IDs.
*/
function contentHashToLockId(contentHash: string): number {
return hashToLockId(`hash:${contentHash}`);
}
/**
* Acquire a per-content-hash advisory lock before uploading.
* Prevents two concurrent workers from uploading the same archive
* when both scan a shared source channel.
*
* Returns true if acquired (proceed with upload).
* Returns false if already held (another worker is handling this archive — skip).
*
* MUST be released via releaseHashLock() after createPackageStub() completes,
* including on all error paths (use try/finally).
*/
export async function tryAcquireHashLock(contentHash: string): Promise<boolean> {
const lockId = contentHashToLockId(contentHash);
const client = await pool.connect();
try {
const result = await client.query<{ pg_try_advisory_lock: boolean }>(
"SELECT pg_try_advisory_lock($1)",
[lockId]
);
const acquired = result.rows[0]?.pg_try_advisory_lock ?? false;
if (acquired) {
heldConnections.set(`hash:${contentHash}`, client);
log.debug({ hash: contentHash.slice(0, 16), lockId }, "Hash lock acquired");
return true;
} else {
client.release();
log.debug({ hash: contentHash.slice(0, 16), lockId }, "Hash lock held by another worker — skipping");
return false;
}
} catch (err) {
client.release();
throw err;
}
}
/**
* Release the per-content-hash advisory lock.
* Call after createPackageStub() completes (or on any error path).
*/
export async function releaseHashLock(contentHash: string): Promise<void> {
const lockId = contentHashToLockId(contentHash);
const client = heldConnections.get(`hash:${contentHash}`);
if (!client) {
log.warn({ hash: contentHash.slice(0, 16) }, "No held connection for hash lock release");
return;
}
try {
await client.query("SELECT pg_advisory_unlock($1)", [lockId]);
log.debug({ hash: contentHash.slice(0, 16) }, "Hash lock released");
} finally {
heldConnections.delete(`hash:${contentHash}`);
client.release();
}
}
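The contract in the doc comments above (acquire, upload, release in a finally) condenses to roughly this caller shape; everything except the lock helpers, getUploadedPackageByHash, and createPackageStub is an assumption:

// Sketch of a compliant caller — uploadToDestination() and stubInput are hypothetical names.
if (await tryAcquireHashLock(contentHash)) {
  try {
    const existing = await getUploadedPackageByHash(contentHash);
    if (!existing) {
      await uploadToDestination(archivePath);   // hypothetical upload step
      await createPackageStub(stubInput);       // persist before metadata extraction
    }
  } finally {
    await releaseHashLock(contentHash); // released on every path, per the contract above
  }
} else {
  // Another worker holds the lock for this archive — skip it this cycle.
}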

View File

@@ -74,6 +74,105 @@ export async function getUploadedPackageByHash(contentHash: string) {
});
}
export interface CreatePackageStubInput {
contentHash: string;
fileName: string;
fileSize: bigint;
archiveType: ArchiveType;
sourceChannelId: string;
sourceMessageId: bigint;
sourceTopicId?: bigint | null;
destChannelId: string;
destMessageId: bigint;
destMessageIds: bigint[];
isMultipart: boolean;
partCount: number;
ingestionRunId: string;
creator?: string | null;
tags?: string[];
}
/**
* Write a minimal Package record immediately after Telegram confirms the upload.
* Call this before preview/metadata extraction so recoverIncompleteUploads() can
* detect and verify the package if the worker crashes mid-metadata.
*
* Follow with updatePackageWithMetadata() once file entries and preview are ready.
*/
export async function createPackageStub(
input: CreatePackageStubInput
): Promise<{ id: string }> {
const pkg = await db.package.create({
data: {
contentHash: input.contentHash,
fileName: input.fileName,
fileSize: input.fileSize,
archiveType: input.archiveType,
sourceChannelId: input.sourceChannelId,
sourceMessageId: input.sourceMessageId,
sourceTopicId: input.sourceTopicId ?? undefined,
destChannelId: input.destChannelId,
destMessageId: input.destMessageId,
destMessageIds: input.destMessageIds,
isMultipart: input.isMultipart,
partCount: input.partCount,
fileCount: 0,
ingestionRunId: input.ingestionRunId,
creator: input.creator ?? undefined,
tags: input.tags?.length ? input.tags : undefined,
},
select: { id: true },
});
try {
await db.$queryRawUnsafe(
`SELECT pg_notify('new_package', $1)`,
JSON.stringify({
packageId: pkg.id,
fileName: input.fileName,
creator: input.creator ?? null,
tags: input.tags ?? [],
})
);
} catch {
// Best-effort
}
return pkg;
}
/**
* Update a stub Package with file entries and preview after metadata extraction.
* Called as Phase 2 of the two-phase write after createPackageStub().
*/
export async function updatePackageWithMetadata(
packageId: string,
input: {
files: {
path: string;
fileName: string;
extension: string | null;
compressedSize: bigint;
uncompressedSize: bigint;
crc32: string | null;
}[];
previewData?: Buffer | null;
previewMsgId?: bigint | null;
}
): Promise<void> {
await db.package.update({
where: { id: packageId },
data: {
fileCount: input.files.length,
previewData: input.previewData ? new Uint8Array(input.previewData) : undefined,
previewMsgId: input.previewMsgId ?? undefined,
files: {
create: input.files,
},
},
});
}
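Put together, the two-phase write reads roughly like the sketch below; the extraction helper and several field values are assumptions for illustration:

// Phase 1: persist a minimal record as soon as Telegram confirms the upload.
const { id } = await createPackageStub({
  contentHash,
  fileName,
  fileSize,
  archiveType: "ZIP",            // assumption — whichever ArchiveType was detected
  sourceChannelId,
  sourceMessageId,
  destChannelId,
  destMessageId,
  destMessageIds: [destMessageId],
  isMultipart: false,
  partCount: 1,
  ingestionRunId,
});

// Phase 2: fill in file entries and preview once extraction finishes. If the worker
// crashes between the phases, the stub is still discoverable because destMessageId
// is already set.
const entries = await listArchiveEntries(localPath); // hypothetical extraction helper
await updatePackageWithMetadata(id, { files: entries, previewData: null });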
/**
* Check if a package already exists for a given source message ID
* AND was successfully uploaded to the destination (destMessageId is set).
@@ -119,6 +218,8 @@ export interface CreatePackageInput {
tags?: string[];
previewData?: Buffer | null;
previewMsgId?: bigint | null;
sourceCaption?: string | null;
replyToMessageId?: bigint | null;
files: {
path: string;
fileName: string;
@@ -150,6 +251,8 @@ export async function createPackageWithFiles(input: CreatePackageInput) {
tags: input.tags && input.tags.length > 0 ? input.tags : undefined,
previewData: input.previewData ? new Uint8Array(input.previewData) : undefined,
previewMsgId: input.previewMsgId ?? undefined,
sourceCaption: input.sourceCaption ?? undefined,
replyToMessageId: input.replyToMessageId ?? undefined,
files: {
create: input.files,
},
@@ -304,6 +407,16 @@ export async function updateAccountAuthState(
});
}
export async function updateAccountPremiumStatus(
accountId: string,
isPremium: boolean
): Promise<void> {
await db.telegramAccount.update({
where: { id: accountId },
data: { isPremium },
});
}
export async function getAccountAuthCode(accountId: string) {
const account = await db.telegramAccount.findUnique({
where: { id: accountId },
@@ -587,3 +700,46 @@ export async function linkPackagesToGroup(
data: { packageGroupId: groupId },
});
}
export async function createTimeWindowGroup(input: {
sourceChannelId: string;
name: string;
packageIds: string[];
}): Promise<string> {
const group = await db.packageGroup.create({
data: {
sourceChannelId: input.sourceChannelId,
name: input.name,
groupingSource: "AUTO_TIME",
},
});
await db.package.updateMany({
where: { id: { in: input.packageIds } },
data: { packageGroupId: group.id },
});
return group.id;
}
export async function createAutoGroup(input: {
sourceChannelId: string;
name: string;
packageIds: string[];
groupingSource: "ALBUM" | "MANUAL" | "AUTO_TIME" | "AUTO_PATTERN" | "AUTO_ZIP" | "AUTO_CAPTION" | "AUTO_REPLY";
}): Promise<string> {
const group = await db.packageGroup.create({
data: {
sourceChannelId: input.sourceChannelId,
name: input.name,
groupingSource: input.groupingSource,
},
});
await db.package.updateMany({
where: { id: { in: input.packageIds } },
data: { packageGroupId: group.id },
});
return group.id;
}

View File

@@ -101,16 +101,14 @@ export async function processExtractRequest(requestId: string): Promise<void> {
try {
await mkdir(tempDir, { recursive: true });
const accounts = await getActiveAccounts();
if (accounts.length === 0) {
throw new Error("No authenticated Telegram accounts available");
}
const account = accounts[0];
await withTdlibMutex(account.phone, "extract", async () => {
const { client } = await createTdlibClient({ id: account.id, phone: account.phone });
try {
// Load chat list so TDLib can find the dest channel

View File

@@ -5,6 +5,7 @@ import { withTdlibMutex } from "./util/mutex.js";
import { processFetchRequest } from "./worker.js"; import { processFetchRequest } from "./worker.js";
import { processExtractRequest } from "./extract-listener.js"; import { processExtractRequest } from "./extract-listener.js";
import { rebuildPackageDatabase } from "./rebuild.js"; import { rebuildPackageDatabase } from "./rebuild.js";
import { processManualUpload } from "./manual-upload.js";
import { generateInviteLink, createSupergroup, searchPublicChat } from "./tdlib/chats.js";
import { createTdlibClient, closeTdlibClient } from "./tdlib/client.js";
import { triggerImmediateCycle } from "./scheduler.js";
@@ -13,6 +14,7 @@ import {
getGlobalSetting,
setGlobalSetting,
getActiveAccounts,
getChannelFetchRequest,
upsertChannel,
ensureAccountChannelLink,
updateFetchRequestStatus,
@@ -55,6 +57,7 @@ async function connectListener(): Promise<void> {
await pgClient.query("LISTEN join_channel"); await pgClient.query("LISTEN join_channel");
await pgClient.query("LISTEN archive_extract"); await pgClient.query("LISTEN archive_extract");
await pgClient.query("LISTEN rebuild_packages"); await pgClient.query("LISTEN rebuild_packages");
await pgClient.query("LISTEN manual_upload");
pgClient.on("notification", (msg) => { pgClient.on("notification", (msg) => {
if (msg.channel === "channel_fetch" && msg.payload) { if (msg.channel === "channel_fetch" && msg.payload) {
@@ -71,6 +74,8 @@ async function connectListener(): Promise<void> {
handleArchiveExtract(msg.payload);
} else if (msg.channel === "rebuild_packages" && msg.payload) {
handleRebuildPackages(msg.payload);
} else if (msg.channel === "manual_upload" && msg.payload) {
handleManualUpload(msg.payload);
}
});
@@ -96,7 +101,7 @@ async function connectListener(): Promise<void> {
}
});
log.info("Fetch listener started (channel_fetch, generate_invite, create_destination, ingestion_trigger, join_channel, archive_extract, rebuild_packages, manual_upload)");
} catch (err) {
log.error({ err }, "Failed to start fetch listener — retrying");
scheduleReconnect();
@@ -129,7 +134,9 @@ let fetchQueue: Promise<void> = Promise.resolve();
function handleChannelFetch(requestId: string): void {
fetchQueue = fetchQueue.then(async () => {
try {
const request = await getChannelFetchRequest(requestId);
const key = request?.account?.phone ?? "global";
await withTdlibMutex(key, "fetch-channels", () =>
processFetchRequest(requestId)
);
} catch (err) {
@@ -143,22 +150,20 @@ function handleChannelFetch(requestId: string): void {
function handleGenerateInvite(channelId: string): void {
fetchQueue = fetchQueue.then(async () => {
try {
const accounts = await getActiveAccounts();
if (accounts.length === 0) {
log.warn("No authenticated accounts to generate invite link");
return;
}
const account = accounts[0];
await withTdlibMutex(account.phone, "generate-invite", async () => {
const destChannel = await getGlobalDestinationChannel();
if (!destChannel || destChannel.id !== channelId) {
log.warn({ channelId }, "Destination channel mismatch, skipping invite generation");
return;
}
const { client } = await createTdlibClient({ id: account.id, phone: account.phone });
try {
const link = await generateInviteLink(client, destChannel.telegramId);
@@ -183,7 +188,13 @@ function handleCreateDestination(payload: string): void {
const parsed = JSON.parse(payload) as { requestId: string; title: string };
requestId = parsed.requestId;
const accounts = await getActiveAccounts();
if (accounts.length === 0) {
throw new Error("No authenticated accounts available to create the group");
}
const account = accounts[0];
await withTdlibMutex(account.phone, "create-destination", async () => {
const { db } = await import("./db/client.js");
// Mark the request as in-progress
@@ -192,14 +203,7 @@ function handleCreateDestination(payload: string): void {
data: { status: "IN_PROGRESS" }, data: { status: "IN_PROGRESS" },
}); });
// Use the first available authenticated account const { client } = await createTdlibClient({ id: account.id, phone: account.phone });
const accounts = await getActiveAccounts();
if (accounts.length === 0) {
throw new Error("No authenticated accounts available to create the group");
}
const account = accounts[0];
const client = await createTdlibClient({ id: account.id, phone: account.phone });
try { try {
// Create the supergroup via TDLib // Create the supergroup via TDLib
@@ -324,16 +328,16 @@ function handleJoinChannel(payload: string): void {
const parsed = JSON.parse(payload) as { requestId: string; input: string; accountId: string };
requestId = parsed.requestId;
const accounts = await getActiveAccounts();
const account = accounts.find((a) => a.id === parsed.accountId) ?? accounts[0];
if (!account) {
throw new Error("No authenticated accounts available");
}
await withTdlibMutex(account.phone, "join-channel", async () => {
await updateFetchRequestStatus(requestId!, "IN_PROGRESS");
const { client } = await createTdlibClient({ id: account.id, phone: account.phone });
try {
const linkInfo = parseTelegramInput(parsed.input);
@@ -503,7 +507,12 @@ function handleIngestionTrigger(): void {
function handleRebuildPackages(requestId: string): void {
fetchQueue = fetchQueue.then(async () => {
try {
const accounts = await getActiveAccounts();
if (accounts.length === 0) {
log.warn("No authenticated accounts to rebuild packages");
return;
}
await withTdlibMutex(accounts[0].phone, "rebuild-packages", () =>
rebuildPackageDatabase(requestId)
);
} catch (err) {
@@ -511,3 +520,11 @@ function handleRebuildPackages(requestId: string): void {
}
});
}
// ── Manual upload handler ──
function handleManualUpload(uploadId: string): void {
fetchQueue = fetchQueue
.then(() => processManualUpload(uploadId))
.catch((err) => log.error({ err, uploadId }, "Manual upload processing failed"));
}

View File

@@ -1,7 +1,8 @@
import type { Client } from "tdl"; import type { Client } from "tdl";
import type { TelegramPhoto } from "./preview/match.js"; import type { TelegramPhoto } from "./preview/match.js";
import { downloadPhotoThumbnail } from "./tdlib/download.js"; import { downloadPhotoThumbnail } from "./tdlib/download.js";
import { createOrFindPackageGroup, linkPackagesToGroup } from "./db/queries.js"; import { createOrFindPackageGroup, linkPackagesToGroup, createTimeWindowGroup, createAutoGroup } from "./db/queries.js";
import { config } from "./util/config.js";
import { childLogger } from "./util/logger.js"; import { childLogger } from "./util/logger.js";
import { db } from "./db/client.js"; import { db } from "./db/client.js";
@@ -77,3 +78,591 @@ export async function processAlbumGroups(
}
}
}
/**
* Apply learned GroupingRules from manual overrides.
* For each rule, find ungrouped packages whose fileName contains the pattern.
*/
export async function processRuleBasedGroups(
sourceChannelId: string,
indexedPackages: IndexedPackageRef[]
): Promise<void> {
const rules = await db.groupingRule.findMany({
where: { sourceChannelId },
orderBy: { confidence: "desc" },
});
if (rules.length === 0) return;
const ungrouped = await db.package.findMany({
where: {
id: { in: indexedPackages.map((p) => p.packageId) },
packageGroupId: null,
},
select: { id: true, fileName: true, creator: true },
});
if (ungrouped.length < 2) return;
for (const rule of rules) {
const matches = ungrouped.filter((pkg) => {
const lower = rule.pattern.toLowerCase();
return pkg.fileName.toLowerCase().includes(lower) ||
(pkg.creator && pkg.creator.toLowerCase().includes(lower));
});
if (matches.length < 2) continue;
// Check if any are already grouped (by a previous rule in this loop)
const stillUngrouped = await db.package.findMany({
where: {
id: { in: matches.map((m) => m.id) },
packageGroupId: null,
},
select: { id: true },
});
if (stillUngrouped.length < 2) continue;
try {
const groupId = await createAutoGroup({
sourceChannelId,
name: rule.pattern,
packageIds: stillUngrouped.map((m) => m.id),
groupingSource: "MANUAL",
});
log.info(
{ groupId, ruleId: rule.id, pattern: rule.pattern, memberCount: stillUngrouped.length },
"Applied learned grouping rule"
);
} catch (err) {
log.warn({ err, ruleId: rule.id }, "Failed to apply grouping rule");
}
}
}
/**
* After album grouping, cluster remaining ungrouped packages from the same channel
* that were posted within a configurable time window.
* Only groups packages that were just indexed in this scan cycle (the `indexedPackages` list).
*/
export async function processTimeWindowGroups(
sourceChannelId: string,
indexedPackages: IndexedPackageRef[]
): Promise<void> {
if (config.autoGroupTimeWindowMinutes <= 0) return;
// Find which of the just-indexed packages are still ungrouped
const ungrouped = await db.package.findMany({
where: {
id: { in: indexedPackages.map((p) => p.packageId) },
packageGroupId: null,
},
orderBy: { sourceMessageId: "asc" },
select: {
id: true,
fileName: true,
sourceMessageId: true,
indexedAt: true,
},
});
if (ungrouped.length < 2) return;
const windowMs = config.autoGroupTimeWindowMinutes * 60 * 1000;
// Cluster by time proximity: walk through sorted list, start new cluster when gap > window
const clusters: typeof ungrouped[] = [];
let current: typeof ungrouped = [ungrouped[0]];
for (let i = 1; i < ungrouped.length; i++) {
const prev = current[current.length - 1];
const gap = Math.abs(ungrouped[i].indexedAt.getTime() - prev.indexedAt.getTime());
if (gap <= windowMs) {
current.push(ungrouped[i]);
} else {
clusters.push(current);
current = [ungrouped[i]];
}
}
clusters.push(current);
// Create groups for clusters with 2+ packages
for (const cluster of clusters) {
if (cluster.length < 2) continue;
// Derive group name from common filename prefix
const name = findCommonPrefix(cluster.map((p) => p.fileName)) || cluster[0].fileName;
try {
const groupId = await createTimeWindowGroup({
sourceChannelId,
name,
packageIds: cluster.map((p) => p.id),
});
log.info(
{ groupId, name, memberCount: cluster.length },
"Created time-window group"
);
} catch (err) {
log.warn({ err, clusterSize: cluster.length }, "Failed to create time-window group");
}
}
}
/**
* Group ungrouped packages that share a date pattern (YYYY-MM, YYYY_MM, etc.)
* or project slug extracted from their filenames.
*/
export async function processPatternGroups(
sourceChannelId: string,
indexedPackages: IndexedPackageRef[]
): Promise<void> {
const ungrouped = await db.package.findMany({
where: {
id: { in: indexedPackages.map((p) => p.packageId) },
packageGroupId: null,
},
select: { id: true, fileName: true },
});
if (ungrouped.length < 2) return;
// Group by extracted pattern
const patternMap = new Map<string, typeof ungrouped>();
for (const pkg of ungrouped) {
const pattern = extractPattern(pkg.fileName);
if (!pattern) continue;
const group = patternMap.get(pattern) ?? [];
group.push(pkg);
patternMap.set(pattern, group);
}
for (const [pattern, members] of patternMap) {
if (members.length < 2) continue;
try {
const groupId = await createAutoGroup({
sourceChannelId,
name: pattern,
packageIds: members.map((m) => m.id),
groupingSource: "AUTO_PATTERN",
});
log.info(
{ groupId, pattern, memberCount: members.length },
"Created pattern-based group"
);
} catch (err) {
log.warn({ err, pattern }, "Failed to create pattern group");
}
}
}
/**
* Extract a grouping pattern from a filename.
* Matches: YYYY-MM, YYYY_MM, "Month Year", or a project prefix before common separators.
* Returns null if no usable pattern found.
*/
function extractPattern(fileName: string): string | null {
// Strip extension for matching
const name = fileName.replace(/\.(zip|rar|7z|pdf|stl)(\.\d+)?$/i, "");
// Match YYYY-MM or YYYY_MM patterns
const dateMatch = name.match(/(\d{4})[\-_](\d{2})/);
if (dateMatch) {
return `${dateMatch[1]}-${dateMatch[2]}`;
}
// Match "Month Year" patterns (e.g., "January 2025", "Jan 2025")
const months = "(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|jun(?:e)?|jul(?:y)?|aug(?:ust)?|sep(?:tember)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)";
const monthYearMatch = name.match(new RegExp(`(${months})\\s*(\\d{4})`, "i"));
if (monthYearMatch) {
const monthStr = monthYearMatch[1].toLowerCase().slice(0, 3);
const monthNum = ["jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec"].indexOf(monthStr) + 1;
if (monthNum > 0) {
return `${monthYearMatch[2]}-${String(monthNum).padStart(2, "0")}`;
}
}
// Match project prefix: text before " - " or " (". Prefix must be at least 5 chars.
const prefixMatch = name.match(/^(.{5,}?)(?:\s*[\-]\s|\s*\()/);
if (prefixMatch) {
return prefixMatch[1].trim();
}
return null;
}
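// Illustrative examples (added for clarity, not in the original source; outputs follow
// from the regexes above):
//   extractPattern("ProjectX 2024-03 Release.zip")  -> "2024-03"      (YYYY-MM match)
//   extractPattern("March 2024 Collection.rar")     -> "2024-03"      (Month Year match)
//   extractPattern("Dragon Bust - Supported.stl")   -> "Dragon Bust"  (prefix before " - ")
//   extractPattern("short.zip")                     -> null           (no usable pattern)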
/**
* Group ungrouped packages that share the same creator within a channel.
* Only groups if there are 3+ packages from the same creator (to avoid
* over-grouping when a creator only has a couple files).
*/
export async function processCreatorGroups(
sourceChannelId: string,
indexedPackages: IndexedPackageRef[]
): Promise<void> {
const ungrouped = await db.package.findMany({
where: {
id: { in: indexedPackages.map((p) => p.packageId) },
packageGroupId: null,
creator: { not: null },
},
select: { id: true, fileName: true, creator: true },
});
if (ungrouped.length < 3) return;
// Group by creator
const creatorMap = new Map<string, typeof ungrouped>();
for (const pkg of ungrouped) {
if (!pkg.creator) continue;
const key = pkg.creator.toLowerCase();
const group = creatorMap.get(key) ?? [];
group.push(pkg);
creatorMap.set(key, group);
}
for (const [, members] of creatorMap) {
if (members.length < 3) continue;
const creatorName = members[0].creator!;
const name = findCommonPrefix(members.map((m) => m.fileName)) || creatorName;
try {
const groupId = await createAutoGroup({
sourceChannelId,
name,
packageIds: members.map((m) => m.id),
groupingSource: "AUTO_PATTERN",
});
log.info(
{ groupId, creator: creatorName, memberCount: members.length },
"Created creator-based group"
);
} catch (err) {
log.warn({ err, creator: creatorName }, "Failed to create creator group");
}
}
}
/**
* Group ungrouped packages that share the same root folder inside their archives.
* E.g., if two packages both contain files under "ProjectX/", they're likely related.
* Only considers packages with 3+ files (to avoid false positives from flat archives).
*/
export async function processZipPathGroups(
sourceChannelId: string,
indexedPackages: IndexedPackageRef[]
): Promise<void> {
// Find ungrouped packages that have indexed files
const ungrouped = await db.package.findMany({
where: {
id: { in: indexedPackages.map((p) => p.packageId) },
packageGroupId: null,
fileCount: { gte: 3 },
},
select: {
id: true,
fileName: true,
files: {
select: { path: true },
take: 50,
},
},
});
if (ungrouped.length < 2) return;
// Extract the dominant root folder for each package
const packageRoots = new Map<string, { id: string; fileName: string }[]>();
for (const pkg of ungrouped) {
const root = extractRootFolder(pkg.files.map((f) => f.path));
if (!root) continue;
const key = root.toLowerCase();
const group = packageRoots.get(key) ?? [];
group.push({ id: pkg.id, fileName: pkg.fileName });
packageRoots.set(key, group);
}
// Create groups for roots shared by 2+ packages
for (const [root, members] of packageRoots) {
if (members.length < 2) continue;
try {
const groupId = await createAutoGroup({
sourceChannelId,
name: root,
packageIds: members.map((m) => m.id),
groupingSource: "AUTO_ZIP",
});
log.info(
{ groupId, rootFolder: root, memberCount: members.length },
"Created ZIP path prefix group"
);
} catch (err) {
log.warn({ err, rootFolder: root }, "Failed to create ZIP path group");
}
}
}
/**
* Group ungrouped packages that reply to the same root message.
* If message B and C both reply to message A, they're grouped together.
*/
export async function processReplyChainGroups(
sourceChannelId: string,
indexedPackages: IndexedPackageRef[]
): Promise<void> {
const ungrouped = await db.package.findMany({
where: {
id: { in: indexedPackages.map((p) => p.packageId) },
packageGroupId: null,
replyToMessageId: { not: null },
},
select: {
id: true,
fileName: true,
replyToMessageId: true,
},
});
if (ungrouped.length < 2) return;
// Group by replyToMessageId
const replyMap = new Map<string, typeof ungrouped>();
for (const pkg of ungrouped) {
if (!pkg.replyToMessageId) continue;
const key = pkg.replyToMessageId.toString();
const group = replyMap.get(key) ?? [];
group.push(pkg);
replyMap.set(key, group);
}
for (const [replyId, members] of replyMap) {
if (members.length < 2) continue;
const name = findCommonPrefix(members.map((m) => m.fileName)) || members[0].fileName;
try {
const groupId = await createAutoGroup({
sourceChannelId,
name,
packageIds: members.map((m) => m.id),
groupingSource: "AUTO_REPLY" as const,
});
log.info(
{ groupId, replyToMessageId: replyId, memberCount: members.length },
"Created reply-chain group"
);
} catch (err) {
log.warn({ err, replyToMessageId: replyId }, "Failed to create reply-chain group");
}
}
}
/**
* Group ungrouped packages with similar captions from the same channel.
* Uses normalized caption comparison — two captions match if they share
* the same significant words (ignoring common words and file extensions).
*/
export async function processCaptionGroups(
sourceChannelId: string,
indexedPackages: IndexedPackageRef[]
): Promise<void> {
const ungrouped = await db.package.findMany({
where: {
id: { in: indexedPackages.map((p) => p.packageId) },
packageGroupId: null,
sourceCaption: { not: null },
},
select: {
id: true,
fileName: true,
sourceCaption: true,
},
});
if (ungrouped.length < 2) return;
// Group by normalized caption key
const captionMap = new Map<string, typeof ungrouped>();
for (const pkg of ungrouped) {
if (!pkg.sourceCaption) continue;
const key = normalizeCaptionKey(pkg.sourceCaption);
if (!key) continue;
const group = captionMap.get(key) ?? [];
group.push(pkg);
captionMap.set(key, group);
}
for (const [, members] of captionMap) {
if (members.length < 2) continue;
const name = members[0].sourceCaption!.slice(0, 80);
try {
const groupId = await createAutoGroup({
sourceChannelId,
name,
packageIds: members.map((m) => m.id),
groupingSource: "AUTO_CAPTION" as const,
});
log.info(
{ groupId, memberCount: members.length },
"Created caption-match group"
);
} catch (err) {
log.warn({ err }, "Failed to create caption group");
}
}
}
/**
* Normalize a caption for grouping: lowercase, strip extensions and numbers,
* extract significant words (3+ chars), sort, and join.
* Two captions with the same key are considered a match.
*/
function normalizeCaptionKey(caption: string): string | null {
const stripped = caption
.toLowerCase()
.replace(/\.(zip|rar|7z|stl|pdf|obj|gcode)(\.\d+)?/gi, "")
.replace(/[^a-z0-9\s]/g, " ");
const words = stripped
.split(/\s+/)
.filter((w) => w.length >= 3)
.filter((w) => !["the", "and", "for", "with", "from", "part", "file", "files"].includes(w));
if (words.length < 2) return null;
return words.sort().join(" ");
}
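// Illustrative examples (added for clarity, not in the original source):
//   normalizeCaptionKey("ProjectX March Release - Part 2.zip")
//     -> "march projectx release"  (extension, short words and stop words dropped, rest sorted)
//   normalizeCaptionKey("Part 1 of 2")
//     -> null                      (fewer than two significant words remain)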
/**
* Extract the dominant root folder from a list of archive file paths.
* Returns the most common first path segment if it appears in at least half of the files.
* Returns null for flat archives or archives with no common root.
*/
function extractRootFolder(paths: string[]): string | null {
if (paths.length === 0) return null;
// Count first path segments
const segmentCounts = new Map<string, number>();
for (const p of paths) {
// Normalize separators and get first segment
const normalized = p.replace(/\\/g, "/");
const firstSlash = normalized.indexOf("/");
if (firstSlash <= 0) continue; // Skip root-level files
const segment = normalized.slice(0, firstSlash);
// Skip common noise folders
if (segment === "__MACOSX" || segment === ".DS_Store" || segment === "Thumbs.db") continue;
segmentCounts.set(segment, (segmentCounts.get(segment) ?? 0) + 1);
}
if (segmentCounts.size === 0) return null;
// Find the most common segment
let maxSegment = "";
let maxCount = 0;
for (const [seg, count] of segmentCounts) {
if (count > maxCount) {
maxSegment = seg;
maxCount = count;
}
}
// Must appear in >50% of files and be at least 3 chars
if (maxCount < paths.length * 0.5 || maxSegment.length < 3) return null;
return maxSegment;
}
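// Illustrative examples (added for clarity, not in the original source):
//   extractRootFolder(["ProjectX/model.stl", "ProjectX/readme.txt", "photo.jpg"]) -> "ProjectX"
//     (2 of 3 paths share the root folder, which clears the 50% threshold)
//   extractRootFolder(["a.stl", "b.stl"]) -> null  (flat archive, no folder segments)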
/**
* Detect packages that could have been grouped differently.
* Checks if any grouped package's filename matches a GroupingRule
* that would place it in a different group.
*/
export async function detectGroupingConflicts(
sourceChannelId: string,
indexedPackages: IndexedPackageRef[]
): Promise<void> {
const rules = await db.groupingRule.findMany({
where: { sourceChannelId },
});
if (rules.length === 0) return;
const grouped = await db.package.findMany({
where: {
id: { in: indexedPackages.map((p) => p.packageId) },
packageGroupId: { not: null },
},
select: {
id: true,
fileName: true,
packageGroupId: true,
packageGroup: { select: { name: true, groupingSource: true } },
},
});
for (const pkg of grouped) {
for (const rule of rules) {
if (pkg.fileName.toLowerCase().includes(rule.pattern.toLowerCase())) {
// Check if the rule's source group is different from current group
if (rule.createdByGroupId && rule.createdByGroupId !== pkg.packageGroupId) {
try {
await db.systemNotification.create({
data: {
type: "GROUPING_CONFLICT",
severity: "INFO",
title: `Potential grouping conflict: ${pkg.fileName}`,
message: `Grouped by ${pkg.packageGroup?.groupingSource ?? "unknown"} into "${pkg.packageGroup?.name}", but also matches rule "${rule.pattern}" from a different manual group`,
context: {
packageId: pkg.id,
fileName: pkg.fileName,
currentGroupId: pkg.packageGroupId,
matchedRuleId: rule.id,
matchedPattern: rule.pattern,
},
},
});
} catch {
// Best-effort
}
break; // One notification per package
}
}
}
}
}
/**
* Find the longest common prefix among a list of filenames,
* trimming trailing separator characters (whitespace, hyphens, underscores, dots, parentheses).
*/
function findCommonPrefix(names: string[]): string {
if (names.length === 0) return "";
if (names.length === 1) return names[0];
let prefix = names[0];
for (let i = 1; i < names.length; i++) {
while (!names[i].startsWith(prefix)) {
prefix = prefix.slice(0, -1);
if (prefix.length === 0) return "";
}
}
// Trim trailing separators and partial words
const trimmed = prefix.replace(/[\s\-_.(]+$/, "");
return trimmed.length >= 3 ? trimmed : "";
}
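// Illustrative examples (added for clarity, not in the original source):
//   findCommonPrefix(["Dragon Bust - Part1.zip", "Dragon Bust - Part2.zip"]) -> "Dragon Bust - Part"
//   findCommonPrefix(["Alpha.zip", "Beta.zip"]) -> ""  (no common prefix of 3+ chars)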

View File

@@ -27,6 +27,33 @@ async function main(): Promise<void> {
await cleanupTempDir();
await markStaleRunsAsFailed();
// Release any advisory locks orphaned by a previous worker instance.
// When Docker kills a container, PostgreSQL may keep the session alive
// (zombie connections), holding advisory locks that block the new worker.
try {
const result = await pool.query(`
SELECT pid, state, left(query, 80) as query, age(clock_timestamp(), state_change) as idle_time
FROM pg_stat_activity
WHERE datname = current_database()
AND pid != pg_backend_pid()
AND state = 'idle'
AND query LIKE '%pg_try_advisory_lock%'
AND state_change < clock_timestamp() - interval '5 minutes'
`);
for (const row of result.rows) {
log.warn(
{ pid: row.pid, idleTime: row.idle_time, query: row.query },
"Terminating stale advisory lock session from previous worker"
);
await pool.query("SELECT pg_terminate_backend($1)", [row.pid]);
}
if (result.rows.length > 0) {
log.info({ terminated: result.rows.length }, "Cleaned up stale advisory lock sessions");
}
} catch (err) {
log.warn({ err }, "Failed to clean up stale advisory locks (non-fatal)");
}
// Verify destination messages exist for all "uploaded" packages.
// Resets any packages whose dest message is missing so they get re-processed.
await recoverIncompleteUploads();

211
worker/src/manual-upload.ts Normal file
View File

@@ -0,0 +1,211 @@
import path from "path";
import { rm } from "fs/promises";
import { db } from "./db/client.js";
import { childLogger } from "./util/logger.js";
import { config } from "./util/config.js";
import { hashParts } from "./archive/hash.js";
import { byteLevelSplit } from "./archive/split.js";
import { uploadToChannel } from "./upload/channel.js";
import { createTdlibClient, closeTdlibClient } from "./tdlib/client.js";
import { readZipCentralDirectory } from "./archive/zip-reader.js";
import { readRarContents } from "./archive/rar-reader.js";
import { read7zContents } from "./archive/sevenz-reader.js";
import { getActiveAccounts } from "./db/queries.js";
const log = childLogger("manual-upload");
export async function processManualUpload(uploadId: string): Promise<void> {
log.info({ uploadId }, "Processing manual upload");
const upload = await db.manualUpload.findUnique({
where: { id: uploadId },
include: { files: true },
});
if (!upload || upload.status !== "PENDING") {
log.warn({ uploadId }, "Manual upload not found or not pending");
return;
}
await db.manualUpload.update({
where: { id: uploadId },
data: { status: "PROCESSING" },
});
try {
// Get destination channel
const destSetting = await db.globalSetting.findUnique({
where: { key: "destination_channel_id" },
});
if (!destSetting) throw new Error("No destination channel configured");
const destChannel = await db.telegramChannel.findFirst({
where: { id: destSetting.value, type: "DESTINATION", isActive: true },
});
if (!destChannel) throw new Error("Destination channel not found or inactive");
// Get a TDLib client (use first active account)
const accounts = await getActiveAccounts();
const account = accounts[0];
if (!account) throw new Error("No authenticated Telegram account available");
const { client } = await createTdlibClient({ id: account.id, phone: account.phone });
try {
const packageIds: string[] = [];
for (const file of upload.files) {
try {
const filePath = file.filePath;
const fileName = file.fileName;
const fileSize = file.fileSize;
log.info({ fileName, fileSize: Number(fileSize) }, "Processing file");
// Determine archive type
let archiveType: "ZIP" | "RAR" | "SEVEN_Z" | "DOCUMENT" = "DOCUMENT";
const ext = fileName.toLowerCase();
if (ext.endsWith(".zip")) archiveType = "ZIP";
else if (ext.endsWith(".rar")) archiveType = "RAR";
else if (ext.endsWith(".7z")) archiveType = "SEVEN_Z";
// Hash the file
const contentHash = await hashParts([filePath]);
// Check for duplicates
const existing = await db.package.findFirst({
where: { contentHash, destMessageId: { not: null } },
select: { id: true },
});
if (existing) {
log.info({ fileName, contentHash }, "Duplicate file, skipping upload");
await db.manualUploadFile.update({
where: { id: file.id },
data: { packageId: existing.id },
});
packageIds.push(existing.id);
continue;
}
// Read archive metadata
let entries: {
path: string;
fileName: string;
extension: string | null;
compressedSize: bigint;
uncompressedSize: bigint;
crc32: string | null;
}[] = [];
try {
if (archiveType === "ZIP") entries = await readZipCentralDirectory([filePath]);
else if (archiveType === "RAR") entries = await readRarContents(filePath);
else if (archiveType === "SEVEN_Z") entries = await read7zContents(filePath);
} catch {
log.debug({ fileName }, "Could not read archive metadata");
}
// Split if needed
const MAX_UPLOAD_SIZE = BigInt(config.maxPartSizeMB) * 1024n * 1024n;
let uploadPaths = [filePath];
if (fileSize > MAX_UPLOAD_SIZE) {
uploadPaths = await byteLevelSplit(filePath);
}
// Upload to Telegram
const destResult = await uploadToChannel(
client,
destChannel.telegramId,
uploadPaths
);
// Create package record
const pkg = await db.package.create({
data: {
contentHash,
fileName,
fileSize,
archiveType,
sourceChannelId: destChannel.id,
sourceMessageId: destResult.messageId,
destChannelId: destChannel.id,
destMessageId: destResult.messageId,
destMessageIds: destResult.messageIds,
isMultipart: uploadPaths.length > 1,
partCount: uploadPaths.length,
fileCount: entries.length,
files: entries.length > 0 ? { create: entries } : undefined,
},
});
await db.manualUploadFile.update({
where: { id: file.id },
data: { packageId: pkg.id },
});
packageIds.push(pkg.id);
log.info({ fileName, packageId: pkg.id }, "File processed and uploaded");
// Clean up split files (but not the original)
if (uploadPaths.length > 1) {
for (const splitPath of uploadPaths) {
if (splitPath !== filePath) {
await rm(splitPath, { force: true }).catch(() => {});
}
}
}
} catch (fileErr) {
log.error({ err: fileErr, fileName: file.fileName }, "Failed to process file");
}
}
// Group packages if multiple files
if (packageIds.length >= 2) {
const groupName =
upload.groupName ?? upload.files[0].fileName.replace(/\.[^.]+$/, "");
const group = await db.packageGroup.create({
data: {
name: groupName,
sourceChannelId: destChannel.id,
groupingSource: "MANUAL",
},
});
await db.package.updateMany({
where: { id: { in: packageIds } },
data: { packageGroupId: group.id },
});
log.info(
{ groupId: group.id, groupName, packageCount: packageIds.length },
"Created group for uploaded files"
);
}
await db.manualUpload.update({
where: { id: uploadId },
data: { status: "COMPLETED", completedAt: new Date() },
});
log.info(
{ uploadId, fileCount: upload.files.length, packageCount: packageIds.length },
"Manual upload completed"
);
} finally {
await closeTdlibClient(client);
}
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
log.error({ err, uploadId }, "Manual upload failed");
await db.manualUpload.update({
where: { id: uploadId },
data: { status: "FAILED", errorMessage: message },
});
}
// Clean up uploaded files
try {
const uploadDir = path.join("/data/uploads", uploadId);
await rm(uploadDir, { recursive: true, force: true });
} catch {
// Best-effort cleanup
}
}

View File

@@ -63,7 +63,7 @@ export async function rebuildPackageDatabase(
}
const account = accounts[0];
- const client = await createTdlibClient({
const { client } = await createTdlibClient({
id: account.id,
phone: account.phone,
});

View File

@@ -63,7 +63,7 @@ export async function recoverIncompleteUploads(): Promise<void> {
let client: Client | undefined;
try {
- client = await createTdlibClient({ id: account.id, phone: account.phone });
({ client } = await createTdlibClient({ id: account.id, phone: account.phone }));
// Load the chat list so TDLib can resolve chat IDs
try {

View File

@@ -1,8 +1,9 @@
import { config } from "./util/config.js";
import { childLogger } from "./util/logger.js";
- import { withTdlibMutex } from "./util/mutex.js";
import { withTdlibMutex, forceReleaseMutex } from "./util/mutex.js";
import { getActiveAccounts, getPendingAccounts } from "./db/queries.js";
import { runWorkerForAccount, authenticateAccount } from "./worker.js";
import { runIntegrityAudit } from "./audit.js";
const log = childLogger("scheduler");
@@ -23,8 +24,8 @@ const CYCLE_TIMEOUT_MS = (parseInt(process.env.WORKER_CYCLE_TIMEOUT_MINUTES ?? "
* 1. Authenticate any PENDING accounts (triggers SMS code flow + auto-fetch channels)
* 2. Process all active AUTHENTICATED accounts for ingestion
*
- * All TDLib operations are wrapped in the mutex to ensure only one client
- * runs at a time (also shared with the fetch listener for on-demand requests).
* Each account's TDLib operations are wrapped in a per-key mutex so different
* accounts run concurrently while the same account is still serialized.
*
* The cycle has a configurable timeout (WORKER_CYCLE_TIMEOUT_MINUTES, default 4h).
* Once the timeout elapses, no new accounts will be started but any in-progress
@@ -54,7 +55,7 @@ async function runCycle(): Promise<void> {
log.warn("Cycle timeout reached during authentication phase, stopping"); log.warn("Cycle timeout reached during authentication phase, stopping");
break; break;
} }
await withTdlibMutex(`auth:${account.phone}`, () => await withTdlibMutex(account.phone, `auth:${account.phone}`, () =>
authenticateAccount(account) authenticateAccount(account)
); );
} }
@@ -70,23 +71,54 @@ async function runCycle(): Promise<void> {
log.info({ accountCount: accounts.length }, "Processing accounts");
- for (const account of accounts) {
- if (Date.now() - cycleStart > CYCLE_TIMEOUT_MS) {
- log.warn(
- { elapsed: Math.round((Date.now() - cycleStart) / 60_000), timeoutMinutes: CYCLE_TIMEOUT_MS / 60_000 },
- "Cycle timeout reached, skipping remaining accounts"
- );
- break;
- }
- await withTdlibMutex(`ingest:${account.phone}`, () =>
- runWorkerForAccount(account)
- );
- }
const results = await Promise.allSettled(
accounts.map((account) => {
let timer: ReturnType<typeof setTimeout>;
return Promise.race([
withTdlibMutex(account.phone, `ingest:${account.phone}`, () =>
runWorkerForAccount(account)
),
new Promise<never>((_, reject) => {
timer = setTimeout(
() => reject(new Error(`Account ${account.phone} ingestion timed out after ${CYCLE_TIMEOUT_MS / 60_000}min`)),
CYCLE_TIMEOUT_MS
);
}),
]).finally(() => clearTimeout(timer));
})
);
for (let i = 0; i < results.length; i++) {
if (results[i].status === "rejected") {
const reason = (results[i] as PromiseRejectedResult).reason;
log.error(
{ phone: accounts[i].phone, err: reason },
"Account ingestion failed"
);
// If the cycle timed out, force-release the mutex so the next cycle
// (or other operations like fetch-channels) can proceed immediately
// instead of waiting 30 minutes for the mutex timeout.
const errMsg = reason instanceof Error ? reason.message : String(reason);
if (errMsg.includes("timed out") || errMsg.includes("mutex wait timeout")) {
forceReleaseMutex(accounts[i].phone);
}
}
}
log.info(
{ elapsed: Math.round((Date.now() - cycleStart) / 1000) },
"Ingestion cycle complete"
);
// Run integrity audit after all accounts are processed
try {
const auditResult = await runIntegrityAudit();
if (auditResult.issues > 0) {
log.info({ ...auditResult }, "Integrity audit found issues");
}
} catch (auditErr) {
log.warn({ err: auditErr }, "Integrity audit failed");
}
} catch (err) {
log.error({ err }, "Ingestion cycle failed");
} finally {

View File

@@ -6,6 +6,7 @@ import { childLogger } from "../util/logger.js";
import {
updateAccountAuthState,
getAccountAuthCode,
updateAccountPremiumStatus,
} from "../db/queries.js";
const log = childLogger("tdlib-client");
@@ -27,7 +28,7 @@ interface AccountConfig {
*/
export async function createTdlibClient(
account: AccountConfig
- ): Promise<Client> {
): Promise<{ client: Client; isPremium: boolean }> {
const dbPath = path.join(config.tdlibStateDir, account.id);
const client = createClient({
@@ -78,7 +79,30 @@ export async function createTdlibClient(
await updateAccountAuthState(account.id, "AUTHENTICATED");
log.info({ accountId: account.id }, "TDLib client authenticated");
- return client;
let isPremium = false;
try {
const me = await client.invoke({ _: "getMe" }) as { is_premium?: boolean };
isPremium = me.is_premium ?? false;
await updateAccountPremiumStatus(account.id, isPremium);
log.info({ accountId: account.id, isPremium }, "Account Premium status detected");
} catch (err) {
log.warn({ err, accountId: account.id }, "Could not detect Premium status, defaulting to false");
}
client.on("update", (update: unknown) => {
const u = update as { _?: string; is_upload?: boolean };
if (u?._ === "updateSpeedLimitNotification") {
log.warn(
{ accountId: account.id, isUpload: u.is_upload },
u.is_upload
? "Upload speed limited by Telegram (account is not Premium)"
: "Download speed limited by Telegram (account is not Premium)"
);
}
});
return { client, isPremium };
} catch (err) {
log.error({ err, accountId: account.id }, "TDLib authentication failed");
await updateAccountAuthState(account.id, "EXPIRED");

View File

@@ -39,6 +39,7 @@ interface TdMessage {
id: number;
date: number;
media_album_id?: string;
reply_to_message_id?: number;
content: {
_: string;
document?: {
@@ -78,6 +79,8 @@ export interface ChannelScanResult {
archives: TelegramMessage[];
photos: TelegramPhoto[];
totalScanned: number;
/** Highest message ID seen during scan (for watermark, even when no archives found). */
maxScannedMessageId: bigint | null;
}
export type ScanProgressCallback = (messagesScanned: number) => void;
@@ -157,6 +160,7 @@ export async function getChannelMessages(
const archives: TelegramMessage[] = [];
const photos: TelegramPhoto[] = [];
const boundary = lastProcessedMessageId ? Number(lastProcessedMessageId) : null;
let maxScannedMessageId: bigint | null = null;
// Open the chat so TDLib can access it
try {
@@ -203,6 +207,12 @@ export async function getChannelMessages(
totalScanned += result.messages.length;
// Track highest message ID (first message in batch = newest, since results are newest-first)
const batchMaxId = BigInt(result.messages[0].id);
if (maxScannedMessageId === null || batchMaxId > maxScannedMessageId) {
maxScannedMessageId = batchMaxId;
}
for (const msg of result.messages) {
// Check for archive documents
const doc = msg.content?.document;
@@ -216,6 +226,8 @@ export async function getChannelMessages(
fileSize: BigInt(doc.document.size),
date: new Date(msg.date * 1000),
mediaAlbumId: msg.media_album_id && msg.media_album_id !== "0" ? msg.media_album_id : undefined,
replyToMessageId: msg.reply_to_message_id ? BigInt(msg.reply_to_message_id) : undefined,
caption: msg.content?.caption?.text || undefined,
});
continue;
}
@@ -243,6 +255,11 @@ export async function getChannelMessages(
fromMessageId = result.messages[result.messages.length - 1].id;
if (result.messages.length < Math.min(limit, 100)) break;
// Early exit: searchChatMessages returns newest-first. Once the oldest
// message on this page is at or below the boundary, all remaining pages
// are even older — no new messages exist, stop scanning immediately.
if (boundary && fromMessageId <= boundary) break;
await sleep(config.apiDelayMs);
}
}
@@ -263,6 +280,7 @@ export async function getChannelMessages(
archives: archives.reverse(),
photos: photos.reverse(),
totalScanned,
maxScannedMessageId,
};
}

View File

@@ -178,6 +178,7 @@ export async function getTopicMessages(
const archives: TelegramMessage[] = [];
const photos: TelegramPhoto[] = [];
const boundary = lastProcessedMessageId ? Number(lastProcessedMessageId) : null;
let maxScannedMessageId: bigint | null = null;
let currentFromId = 0;
let totalScanned = 0;
@@ -239,6 +240,12 @@ export async function getTopicMessages(
totalScanned += result.messages.length;
// Track highest message ID (first message = newest, since results are newest-first)
const batchMaxId = BigInt(result.messages[0].id);
if (maxScannedMessageId === null || batchMaxId > maxScannedMessageId) {
maxScannedMessageId = batchMaxId;
}
for (const msg of result.messages) {
// Check for archive documents
const doc = msg.content?.document;
@@ -302,6 +309,7 @@ export async function getTopicMessages(
archives: archives.reverse(),
photos: photos.reverse(),
totalScanned,
maxScannedMessageId,
};
}

View File

@@ -7,6 +7,18 @@ import { withFloodWait, extractFloodWaitSeconds } from "../util/retry.js";
const log = childLogger("upload"); const log = childLogger("upload");
/**
* Custom error class to distinguish upload stalls from other errors.
* When consecutive stalls occur, the caller can use this signal to
* recreate the TDLib client (whose event stream may have degraded).
*/
export class UploadStallError extends Error {
constructor(message: string) {
super(message);
this.name = "UploadStallError";
}
}
export interface UploadResult {
messageId: bigint;
messageIds: bigint[];
@@ -109,13 +121,21 @@ async function sendWithRetry(
// Stall or timeout — retry with a cooldown
const errMsg = err instanceof Error ? err.message : "";
- if ((errMsg.includes("stalled") || errMsg.includes("timed out")) && !isLastAttempt) {
- log.warn(
- { fileName, attempt: attempt + 1, maxRetries: MAX_UPLOAD_RETRIES },
- "Upload stalled/timed out — retrying"
- );
- await sleep(10_000);
- continue;
if (errMsg.includes("stalled") || errMsg.includes("timed out")) {
if (!isLastAttempt) {
log.warn(
{ fileName, attempt: attempt + 1, maxRetries: MAX_UPLOAD_RETRIES },
"Upload stalled/timed out — retrying"
);
await sleep(10_000);
continue;
}
// All stall retries exhausted — throw UploadStallError so the caller
// knows the TDLib client's event stream is likely degraded and can
// recreate the client before continuing.
throw new UploadStallError(
`Upload stalled after ${MAX_UPLOAD_RETRIES} retries for ${fileName}`
);
}
throw err;
@@ -166,8 +186,10 @@ async function sendAndWaitForUpload(
}
}, timeoutMs);
// Stall detection: no progress for 5 minutes after upload started → reject // Stall detection: no progress for 3 minutes after upload started → reject
const STALL_TIMEOUT_MS = 5 * 60_000; // (reduced from 5min — once data is fully sent, confirmation should arrive quickly;
// a 3min silence strongly indicates a degraded TDLib event stream)
const STALL_TIMEOUT_MS = 3 * 60_000;
const stallChecker = setInterval(() => {
if (settled || !uploadStarted) return;
const stallMs = Date.now() - lastProgressTime;

View File

@@ -10,6 +10,8 @@ export const config = {
/** Maximum file part size for Telegram upload (in MiB). Default 1950 (under 2GB non-Premium limit).
* Set to 3900 for Premium accounts (under 4GB limit). */
maxPartSizeMB: parseInt(process.env.MAX_PART_SIZE_MB ?? "1950", 10),
/** Time window for auto-grouping ungrouped packages from the same channel (minutes). 0 = disabled. */
autoGroupTimeWindowMinutes: parseInt(process.env.AUTO_GROUP_TIME_WINDOW_MINUTES ?? "5", 10),
/** Maximum jitter added to scheduler interval (in minutes) */
jitterMinutes: 5,
/** Maximum time span for multipart archive parts (in hours). 0 = no limit. */

View File

@@ -2,39 +2,66 @@ import { childLogger } from "./logger.js";
const log = childLogger("mutex"); const log = childLogger("mutex");
let locked = false;
let holder = "";
const queue: Array<{ resolve: () => void; reject: (err: Error) => void; label: string }> = [];
/**
* Maximum time to wait for the TDLib mutex (ms).
* If the mutex is not available within this time, the operation is rejected.
* Default: 30 minutes (long enough for large downloads, short enough to detect hangs).
*/
const MUTEX_WAIT_TIMEOUT_MS = 30 * 60 * 1000;
const locks = new Map<string, boolean>();
const holders = new Map<string, string>();
const queues = new Map<
string,
Array<{ resolve: () => void; reject: (err: Error) => void; label: string }>
>();
/**
- * Ensures only one TDLib client runs at a time across the entire worker process.
- * Both the scheduler (auth, ingestion) and the fetch listener acquire this
- * before creating any TDLib client.
* Force-release a stuck mutex.
* This should only be called when the holder is known to be stuck (e.g. after
* a cycle timeout). It releases the lock and lets the next queued waiter proceed.
*/
export function forceReleaseMutex(key: string): void {
if (!locks.has(key)) return;
const holder = holders.get(key);
log.warn({ key, holder }, "Force-releasing stuck TDLib mutex");
locks.delete(key);
holders.delete(key);
const next = queues.get(key)?.shift();
if (next) {
log.info({ key, next: next.label }, "TDLib mutex force-released to next waiter");
next.resolve();
} else {
queues.delete(key);
log.info({ key }, "TDLib mutex force-released (no waiters)");
}
}
/**
* Ensures only one TDLib operation runs at a time FOR THE SAME KEY.
* Different keys run concurrently — this allows two accounts to ingest in parallel
* while still preventing concurrent use of the same account's TDLib state dir.
*
- * Includes a wait timeout to prevent indefinite blocking if the current holder hangs.
* key: the account phone number for account-specific ops (auth, ingest),
* or 'global' for ops that don't belong to a specific account.
* label: human-readable name for logging.
*/
export async function withTdlibMutex<T>(
key: string,
label: string,
fn: () => Promise<T>
): Promise<T> {
- if (locked) {
- log.info({ waiting: label, holder }, "Waiting for TDLib mutex");
if (locks.get(key)) {
log.info({ waiting: label, key, holder: holders.get(key) }, "Waiting for TDLib mutex");
await new Promise<void>((resolve, reject) => {
const timer = setTimeout(() => {
- const idx = queue.indexOf(entry);
const q = queues.get(key) ?? [];
const idx = q.indexOf(entry);
if (idx !== -1) {
- queue.splice(idx, 1);
- reject(new Error(
- `TDLib mutex wait timeout after ${MUTEX_WAIT_TIMEOUT_MS / 60_000}min ` +
- `(waiting: ${label}, holder: ${holder})`
- ));
q.splice(idx, 1);
reject(
new Error(
`TDLib mutex wait timeout after ${MUTEX_WAIT_TIMEOUT_MS / 60_000}min ` +
`(waiting: ${label}, key: ${key}, holder: ${holders.get(key)})`
)
);
}
}, MUTEX_WAIT_TIMEOUT_MS);
@@ -46,25 +73,28 @@ export async function withTdlibMutex<T>(
reject,
label,
};
- queue.push(entry);
if (!queues.has(key)) queues.set(key, []);
queues.get(key)!.push(entry);
});
}
- locked = true;
- holder = label;
- log.debug({ label }, "TDLib mutex acquired");
locks.set(key, true);
holders.set(key, label);
log.debug({ key, label }, "TDLib mutex acquired");
try {
return await fn();
} finally {
- locked = false;
- holder = "";
- const next = queue.shift();
locks.delete(key);
holders.delete(key);
const next = queues.get(key)?.shift();
if (next) {
- log.debug({ next: next.label }, "TDLib mutex releasing to next waiter");
log.debug({ key, next: next.label }, "TDLib mutex releasing to next waiter");
next.resolve();
} else {
- log.debug({ label }, "TDLib mutex released");
queues.delete(key);
log.debug({ key, label }, "TDLib mutex released");
}
}
}
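A minimal usage sketch of the per-key mutex introduced above (illustrative only; the phone numbers and work callbacks are invented, but the withTdlibMutex(key, label, fn) signature is the one shown in this diff):

import { withTdlibMutex } from "./util/mutex.js";

async function example(): Promise<void> {
  // Different keys: both accounts acquire independent locks and run concurrently.
  await Promise.all([
    withTdlibMutex("+10000000001", "ingest:+10000000001", async () => { /* account A work */ }),
    withTdlibMutex("+10000000002", "ingest:+10000000002", async () => { /* account B work */ }),
  ]);
  // Same key: the second call queues behind the first and starts only after it releases.
  await Promise.all([
    withTdlibMutex("+10000000001", "ingest:+10000000001", async () => { /* cycle work */ }),
    withTdlibMutex("+10000000001", "fetch:+10000000001", async () => { /* on-demand fetch */ }),
  ]);
}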

View File

@@ -2,13 +2,14 @@ import path from "path";
import { unlink, readdir, mkdir, rm } from "fs/promises";
import { config } from "./util/config.js";
import { childLogger } from "./util/logger.js";
- import { tryAcquireLock, releaseLock } from "./db/locks.js";
import { tryAcquireLock, releaseLock, tryAcquireHashLock, releaseHashLock } from "./db/locks.js";
import {
getSourceChannelMappings,
getGlobalDestinationChannel,
packageExistsByHash,
packageExistsBySourceMessage,
- createPackageWithFiles,
createPackageStub,
updatePackageWithMetadata,
createIngestionRun,
completeIngestionRun,
failIngestionRun,
@@ -46,8 +47,9 @@ import { readZipCentralDirectory } from "./archive/zip-reader.js";
import { readRarContents } from "./archive/rar-reader.js";
import { read7zContents } from "./archive/sevenz-reader.js";
import { byteLevelSplit, concatenateFiles } from "./archive/split.js";
- import { uploadToChannel } from "./upload/channel.js";
import { uploadToChannel, UploadStallError } from "./upload/channel.js";
- import { processAlbumGroups, type IndexedPackageRef } from "./grouping.js";
import { processAlbumGroups, processRuleBasedGroups, processTimeWindowGroups, processPatternGroups, processCreatorGroups, processZipPathGroups, processReplyChainGroups, processCaptionGroups, detectGroupingConflicts, type IndexedPackageRef } from "./grouping.js";
import { db } from "./db/client.js";
import type { TelegramAccount, TelegramChannel } from "@prisma/client";
import type { Client } from "tdl";
@@ -72,10 +74,10 @@ export async function authenticateAccount(
let client: Client | undefined;
try {
- client = await createTdlibClient({
client = (await createTdlibClient({
id: account.id,
phone: account.phone,
- });
})).client;
aLog.info("Authentication successful");
// Auto-fetch channels and create a fetch request result
@@ -130,7 +132,7 @@ export async function processFetchRequest(requestId: string): Promise<void> {
await updateFetchRequestStatus(requestId, "IN_PROGRESS");
aLog.info({ accountId: request.accountId }, "Processing fetch request");
- const client = await createTdlibClient({
const { client } = await createTdlibClient({
id: request.account.id,
phone: request.account.phone,
});
@@ -284,6 +286,7 @@ interface PipelineContext {
client: Client;
runId: string;
accountId: string;
accountPhone: string;
channelTitle: string;
channel: TelegramChannel;
destChannelTelegramId: bigint;
@@ -300,6 +303,9 @@ interface PipelineContext {
/** Forum topic ID (null for non-forum). */
sourceTopicId: bigint | null;
accountLog: ReturnType<typeof childLogger>;
maxUploadSize: bigint;
/** How many consecutive upload stalls have occurred (resets on success). */
consecutiveStalls: number;
}
/**
@@ -335,10 +341,14 @@ export async function runWorkerForAccount(
currentStep: "connecting", currentStep: "connecting",
}); });
const client = await createTdlibClient({ // Use let so the client can be replaced on TDLib recreation after stalls
let { client, isPremium } = await createTdlibClient({
id: account.id, id: account.id,
phone: account.phone, phone: account.phone,
}); });
const maxUploadSize = isPremium
? 3950n * 1024n * 1024n
: BigInt(config.maxPartSizeMB) * 1024n * 1024n;
// Load all chats into TDLib's local cache using loadChats (the recommended API).
// Without this, getChat/searchChatMessages fail with "Chat not found".
@@ -442,6 +452,7 @@ export async function runWorkerForAccount(
client,
runId: activeRunId,
accountId: account.id,
accountPhone: account.phone,
channelTitle: channel.title,
channel,
destChannelTelegramId: destChannel.telegramId,
@@ -451,6 +462,8 @@ export async function runWorkerForAccount(
topicCreator: null,
sourceTopicId: null,
accountLog,
maxUploadSize,
consecutiveStalls: 0,
};
if (forum) {
@@ -525,6 +538,15 @@ export async function runWorkerForAccount(
{ channelId: channel.id, topic: topic.name, totalScanned: scanResult.totalScanned },
"No new archives in topic"
);
// Still advance topic watermark so we don't re-scan these messages next cycle
if (scanResult.maxScannedMessageId) {
await upsertTopicProgress(
mapping.id,
topic.topicId,
topic.name,
scanResult.maxScannedMessageId
);
}
continue;
}
@@ -539,14 +561,17 @@ export async function runWorkerForAccount(
pipelineCtx.channelTitle = `${channel.title} ${topic.name}`;
const maxProcessedId = await processArchiveSets(pipelineCtx, scanResult, run.id, progress?.lastProcessedMessageId);
// Sync client back in case it was recreated during upload stall recovery
client = pipelineCtx.client;
- // Only advance progress to the highest successfully processed message
- if (maxProcessedId) {
// Advance progress: use archive watermark if available, fall back to scan watermark
const topicWatermark = maxProcessedId ?? scanResult.maxScannedMessageId;
if (topicWatermark) {
await upsertTopicProgress(
mapping.id,
topic.topicId,
topic.name,
- maxProcessedId
topicWatermark
);
}
} catch (topicErr) {
@@ -596,6 +621,11 @@ export async function runWorkerForAccount(
if (scanResult.archives.length === 0) {
accountLog.info({ channelId: channel.id, title: channel.title, totalScanned: scanResult.totalScanned }, "No new archives in channel");
// Still advance watermark to highest scanned message so we don't
// re-scan these messages next cycle
if (scanResult.maxScannedMessageId) {
await updateLastProcessedMessage(mapping.id, scanResult.maxScannedMessageId);
}
continue;
}
@@ -610,10 +640,13 @@ export async function runWorkerForAccount(
pipelineCtx.channelTitle = channel.title;
const maxProcessedId = await processArchiveSets(pipelineCtx, scanResult, run.id, mapping.lastProcessedMessageId);
// Sync client back in case it was recreated during upload stall recovery
client = pipelineCtx.client;
- // Only advance progress to the highest successfully processed message
- if (maxProcessedId) {
- await updateLastProcessedMessage(mapping.id, maxProcessedId);
// Advance progress: use archive watermark if available, fall back to scan watermark
const channelWatermark = maxProcessedId ?? scanResult.maxScannedMessageId;
if (channelWatermark) {
await updateLastProcessedMessage(mapping.id, channelWatermark);
}
}
} catch (channelErr) {
@@ -753,12 +786,68 @@ async function processArchiveSets(
if (setMaxId > (maxProcessedId ?? 0n)) {
maxProcessedId = setMaxId;
}
// Reset stall counter on any successful upload
ctx.consecutiveStalls = 0;
} catch (setErr) {
// If a set fails, do NOT advance the watermark past it
accountLog.warn(
{ err: setErr, baseName: archiveSets[setIdx].baseName },
"Archive set failed, watermark will not advance past this set"
);
// ── TDLib client recreation on repeated upload stalls ──
// When the TDLib event stream degrades, uploads complete (bytes sent)
// but confirmations never arrive. Retrying with the same broken client
// is futile. Recreate the client to get a fresh connection.
if (setErr instanceof UploadStallError) {
ctx.consecutiveStalls++;
accountLog.warn(
{ consecutiveStalls: ctx.consecutiveStalls },
"Upload stall detected — TDLib event stream may be degraded"
);
// After 1 stalled set (= 3 failed retry attempts already), recreate the client
if (ctx.consecutiveStalls >= 1) {
accountLog.info("Recreating TDLib client after consecutive upload stalls");
try {
await closeTdlibClient(ctx.client);
} catch (closeErr) {
accountLog.warn({ err: closeErr }, "Error closing stale TDLib client");
}
try {
const { client: newClient } = await createTdlibClient({
id: ctx.accountId,
phone: ctx.accountPhone,
});
ctx.client = newClient;
// Reload chats so the new client can access channels
try {
for (let page = 0; page < 500; page++) {
await newClient.invoke({
_: "loadChats",
chat_list: { _: "chatListMain" },
limit: 100,
});
}
} catch {
// 404 = all loaded (expected)
}
ctx.consecutiveStalls = 0;
accountLog.info("TDLib client recreated successfully — continuing ingestion");
} catch (recreateErr) {
accountLog.error(
{ err: recreateErr },
"Failed to recreate TDLib client — aborting remaining uploads"
);
break;
}
}
}
// Record the failure for visibility in the UI
try {
const archiveSet = archiveSets[setIdx];
@@ -776,6 +865,22 @@ async function processArchiveSets(
partCount: archiveSet.parts.length,
accountId: ctx.accountId,
});
// Also create a persistent notification
await db.systemNotification.create({
data: {
type: inferSkipReason(errMsg) === "UPLOAD_FAILED" ? "UPLOAD_FAILED" : "DOWNLOAD_FAILED",
severity: "WARNING",
title: `Failed to process ${archiveSet.parts[0].fileName}`,
message: errMsg,
context: {
fileName: archiveSet.parts[0].fileName,
sourceChannelId: ctx.channel.id,
sourceMessageId: Number(archiveSet.parts[0].id),
channelTitle: ctx.channelTitle,
reason: inferSkipReason(errMsg),
},
},
});
} catch {
// Best-effort — don't fail the run if skip recording fails
}
@@ -790,6 +895,38 @@ async function processArchiveSets(
indexedPackageRefs,
scanResult.photos
);
// Auto-grouping passes (gated by per-channel flag)
const channelRecord = await db.telegramChannel.findUnique({
where: { id: channel.id },
select: { autoGroupEnabled: true },
});
if (channelRecord?.autoGroupEnabled !== false) {
// Learned rule-based grouping (from manual overrides)
await processRuleBasedGroups(channel.id, indexedPackageRefs);
// Time-window grouping for remaining ungrouped packages
await processTimeWindowGroups(channel.id, indexedPackageRefs);
// Pattern-based grouping (date patterns, project slugs)
await processPatternGroups(channel.id, indexedPackageRefs);
// Creator-based grouping (3+ files from same creator)
await processCreatorGroups(channel.id, indexedPackageRefs);
// ZIP path prefix grouping (shared root folder inside archives)
await processZipPathGroups(channel.id, indexedPackageRefs);
// Reply chain grouping (messages replying to same root)
await processReplyChainGroups(channel.id, indexedPackageRefs);
// Caption fuzzy match grouping
await processCaptionGroups(channel.id, indexedPackageRefs);
}
// Check for potential grouping conflicts
await detectGroupingConflicts(channel.id, indexedPackageRefs);
}
return maxProcessedId;
@@ -978,6 +1115,35 @@ async function processOneArchiveSet(
return null;
}
// ── Hash lock: prevent concurrent workers racing on shared-channel archives ──
const hashLockAcquired = await tryAcquireHashLock(contentHash);
if (!hashLockAcquired) {
counters.zipsDuplicate++;
accountLog.info(
{ fileName: archiveName, hash: contentHash.slice(0, 16) },
"Hash lock held by another worker — skipping concurrent duplicate"
);
return null;
}
let entries: { path: string; fileName: string; extension: string | null; compressedSize: bigint; uncompressedSize: bigint; crc32: string | null }[] = [];
let creator: string | null = null;
const tags: string[] = [];
let stub: { id: string } | null = null;
try {
// Re-check after acquiring lock: another worker may have finished between
// the first check above and this point.
const existsAfterLock = await packageExistsByHash(contentHash);
if (existsAfterLock) {
counters.zipsDuplicate++;
accountLog.debug(
{ fileName: archiveName, hash: contentHash.slice(0, 16) },
"Duplicate detected after acquiring hash lock — skipping"
);
return null;
}
// ── Reading metadata ──
await updateRunActivity(runId, {
currentActivity: `Reading file list from ${archiveName}`,
@@ -988,7 +1154,6 @@ async function processOneArchiveSet(
totalFiles: totalSets,
});
- let entries: { path: string; fileName: string; extension: string | null; compressedSize: bigint; uncompressedSize: bigint; crc32: string | null }[] = [];
try {
if (archiveSet.type === "ZIP") {
entries = await readZipCentralDirectory(tempPaths);
@@ -1020,7 +1185,7 @@ async function processOneArchiveSet(
(sum, p) => sum + p.fileSize,
0n
);
- const MAX_UPLOAD_SIZE = BigInt(config.maxPartSizeMB) * 1024n * 1024n;
const MAX_UPLOAD_SIZE = ctx.maxUploadSize;
const hasOversizedPart = archiveSet.parts.some((p) => p.fileSize > MAX_UPLOAD_SIZE);
if (hasOversizedPart) {
@@ -1035,7 +1200,7 @@ async function processOneArchiveSet(
});
const concatPath = path.join(setDir, `${archiveSet.baseName}.concat`);
await concatenateFiles(tempPaths, concatPath);
- splitPaths = await byteLevelSplit(concatPath);
splitPaths = await byteLevelSplit(concatPath, ctx.maxUploadSize);
uploadPaths = splitPaths;
// Clean up the concat intermediate file
await unlink(concatPath).catch(() => {});
@@ -1049,48 +1214,153 @@ async function processOneArchiveSet(
currentFileNum: setIdx + 1,
totalFiles: totalSets,
});
- splitPaths = await byteLevelSplit(tempPaths[0]);
splitPaths = await byteLevelSplit(tempPaths[0], ctx.maxUploadSize);
uploadPaths = splitPaths;
}
- // ── Uploading ──
- // Check if a prior run already uploaded this file (orphaned upload scenario:
- // file reached Telegram but DB write failed or worker crashed before indexing)
- const existingUpload = await getUploadedPackageByHash(contentHash);
- let destResult: { messageId: bigint; messageIds: bigint[] };
- if (existingUpload && existingUpload.destMessageId) {
- accountLog.info(
- { fileName: archiveName, destMessageId: Number(existingUpload.destMessageId) },
- "Reusing existing upload (file already on destination channel)"
- );
- destResult = {
- messageId: existingUpload.destMessageId,
- messageIds: existingUpload.destMessageIds?.length
- ? (existingUpload.destMessageIds as bigint[])
- : [existingUpload.destMessageId],
- };
- } else {
- const uploadLabel = uploadPaths.length > 1
- ? ` (${uploadPaths.length} parts)`
- : "";
- await updateRunActivity(runId, {
- currentActivity: `Uploading ${archiveName} to archive channel${uploadLabel}`,
- currentStep: "uploading",
- currentChannel: channelTitle,
- currentFile: archiveName,
- currentFileNum: setIdx + 1,
- totalFiles: totalSets,
- });
- destResult = await uploadToChannel(
- client,
- destChannelTelegramId,
- uploadPaths
- );
- }
// ── Hash verification after split ──
// If we split/repacked, verify the split parts hash matches the original
if (splitPaths.length > 0) {
const splitHash = await hashParts(splitPaths);
if (splitHash !== contentHash) {
accountLog.error(
{ fileName: archiveName, originalHash: contentHash, splitHash, parts: splitPaths.length },
"Hash mismatch after split — file may be corrupted"
);
// Record notification for visibility
try {
await db.systemNotification.create({
data: {
type: "HASH_MISMATCH",
severity: "ERROR",
title: `Hash mismatch after splitting ${archiveName}`,
message: `Expected ${contentHash.slice(0, 16)}… but got ${splitHash.slice(0, 16)}… after splitting into ${splitPaths.length} parts`,
context: {
fileName: archiveName,
originalHash: contentHash,
splitHash,
partCount: splitPaths.length,
sourceChannelId: channel.id,
},
},
});
} catch {
// Best-effort notification
}
throw new Error(`Hash mismatch after split for ${archiveName}: expected ${contentHash}, got ${splitHash}`);
}
accountLog.debug(
{ fileName: archiveName, hash: contentHash.slice(0, 16), parts: splitPaths.length },
"Split hash verified — matches original"
);
}
+  // ── Uploading ──
+  // Check if a prior run already uploaded this file (orphaned upload scenario:
+  // file reached Telegram but DB write failed or worker crashed before indexing)
+  const existingUpload = await getUploadedPackageByHash(contentHash);
+  let destResult: { messageId: bigint; messageIds: bigint[] };
+  if (existingUpload && existingUpload.destMessageId) {
+    accountLog.info(
+      { fileName: archiveName, destMessageId: Number(existingUpload.destMessageId) },
+      "Reusing existing upload (file already on destination channel)"
+    );
+    destResult = {
+      messageId: existingUpload.destMessageId,
+      messageIds: existingUpload.destMessageIds?.length
+        ? (existingUpload.destMessageIds as bigint[])
+        : [existingUpload.destMessageId],
+    };
+  } else {
+    const uploadLabel = uploadPaths.length > 1
+      ? ` (${uploadPaths.length} parts)`
+      : "";
+    await updateRunActivity(runId, {
+      currentActivity: `Uploading ${archiveName} to archive channel${uploadLabel}`,
+      currentStep: "uploading",
+      currentChannel: channelTitle,
+      currentFile: archiveName,
+      currentFileNum: setIdx + 1,
+      totalFiles: totalSets,
+    });
+    destResult = await uploadToChannel(
+      client,
+      destChannelTelegramId,
+      uploadPaths
+    );
+  }
+  // ── Post-upload integrity check ──
+  // Verify the files on disk still match before we index
+  if (uploadPaths.length > 0 && !existingUpload) {
+    try {
+      const postUploadHash = await hashParts(uploadPaths);
+      if (splitPaths.length > 0) {
+        // Split files — hash should match the split hash (already verified above)
+        // No additional check needed since we verified split hash = original hash
+      } else if (postUploadHash !== contentHash) {
+        accountLog.error(
+          { fileName: archiveName, originalHash: contentHash, postUploadHash },
+          "Hash changed between hashing and upload — possible disk corruption"
+        );
+        await db.systemNotification.create({
+          data: {
+            type: "HASH_MISMATCH",
+            severity: "ERROR",
+            title: `Post-upload hash mismatch: ${archiveName}`,
+            message: `Hash changed between download and upload. Original: ${contentHash.slice(0, 16)}…, post-upload: ${postUploadHash.slice(0, 16)}`,
+            context: { fileName: archiveName, originalHash: contentHash, postUploadHash, sourceChannelId: channel.id },
+          },
+        });
+      }
+    } catch {
+      // Best-effort — don't fail the ingestion
+    }
+  }
+  // ── Phase 1: Stub record — persisted immediately after upload ──
+  await deleteOrphanedPackageByHash(contentHash);
+  creator =
+    topicCreator ??
+    extractCreatorFromFileName(archiveName) ??
+    extractCreatorFromChannelTitle(channelTitle) ??
+    null;
+  if (channel.category) {
+    tags.push(channel.category);
+  }
+  stub = await createPackageStub({
+    contentHash,
+    fileName: archiveName,
+    fileSize: totalSize,
+    archiveType: archiveSet.type === "7Z" ? "SEVEN_Z" : archiveSet.type,
+    sourceChannelId: channel.id,
+    sourceMessageId: archiveSet.parts[0].id,
+    sourceTopicId,
+    destChannelId,
+    destMessageId: destResult.messageId,
+    destMessageIds: destResult.messageIds,
+    isMultipart: archiveSet.parts.length > 1 || uploadPaths.length > 1,
+    partCount: uploadPaths.length,
+    ingestionRunId,
+    creator,
+    tags,
+  });
+  counters.zipsIngested++;
+  await deleteSkippedPackage(channel.id, archiveSet.parts[0].id);
+} finally {
+  await releaseHashLock(contentHash);
+}
+if (!stub) return null;
   // ── Preview thumbnail ──
+  // (moved here from before stub creation — lock is released, preview doesn't need it)
   let previewData: Buffer | null = null;
   let previewMsgId: bigint | null = null;
   const matchedPhoto = previewMatches.get(archiveSet.baseName);
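The hash checks added above only hold if hashParts(paths) digests the parts as one continuous byte stream, so the result can equal the hash of the original unsplit file. hashParts is not shown in this diff; a minimal sketch under that assumption (SHA-256 is a guess, the algorithm is never named here) might be:

import { createHash } from "node:crypto";
import { createReadStream } from "node:fs";

// Sketch: stream each part through a single hash context in order, so that
// hash(parts of X) === hash(X) as long as the parts concatenate back to X.
async function hashPartsSketch(paths: string[]): Promise<string> {
  const hash = createHash("sha256");
  for (const p of paths) {
    await new Promise<void>((resolve, reject) => {
      createReadStream(p)
        .on("data", (chunk) => hash.update(chunk))
        .on("end", resolve)
        .on("error", reject);
    });
  }
  return hash.digest("hex");
}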
@@ -1104,8 +1374,6 @@ async function processOneArchiveSet(
       totalFiles: totalSets,
     });
     previewData = await downloadPhotoThumbnail(client, matchedPhoto.fileId);
-    // Only set previewMsgId if we actually got the image data —
-    // otherwise the UI thinks there's a preview but the API returns 404
     if (previewData) {
       previewMsgId = matchedPhoto.id;
     }
@@ -1128,13 +1396,7 @@ async function processOneArchiveSet(
     }
   }
-  // ── Resolve creator: topic name > filename extraction > channel title > null ──
-  const creator = topicCreator
-    ?? extractCreatorFromFileName(archiveName)
-    ?? extractCreatorFromChannelTitle(channelTitle)
-    ?? null;
-  // ── Indexing ──
+  // ── Phase 2: Update stub with file entries and preview ──
   await updateRunActivity(runId, {
     currentActivity: `Saving metadata for ${archiveName} (${entries.length} files)`,
     currentStep: "indexing",
@@ -1144,41 +1406,12 @@ async function processOneArchiveSet(
     totalFiles: totalSets,
   });
-  // Clean up any orphaned record (same hash but no dest upload) before creating
-  await deleteOrphanedPackageByHash(contentHash);
-  // Auto-inherit source channel category as initial tag
-  const tags: string[] = [];
-  if (channel.category) {
-    tags.push(channel.category);
-  }
-  const pkg = await createPackageWithFiles({
-    contentHash,
-    fileName: archiveName,
-    fileSize: totalSize,
-    archiveType: archiveSet.type === "7Z" ? "SEVEN_Z" : archiveSet.type,
-    sourceChannelId: channel.id,
-    sourceMessageId: archiveSet.parts[0].id,
-    sourceTopicId,
-    destChannelId,
-    destMessageId: destResult.messageId,
-    destMessageIds: destResult.messageIds,
-    isMultipart:
-      archiveSet.parts.length > 1 || uploadPaths.length > 1,
-    partCount: uploadPaths.length,
-    ingestionRunId,
-    creator,
-    tags,
+  await updatePackageWithMetadata(stub.id, {
+    files: entries,
     previewData,
     previewMsgId,
-    files: entries,
   });
-  counters.zipsIngested++;
-  // Clean up any prior skip record for this archive
-  await deleteSkippedPackage(channel.id, archiveSet.parts[0].id);
   await updateRunActivity(runId, {
     currentActivity: `Ingested ${archiveName} (${entries.length} files indexed)`,
     currentStep: "complete",
@@ -1194,7 +1427,7 @@ async function processOneArchiveSet(
       "Archive ingested"
     );
-    return pkg.id;
+    return stub.id;
   } finally {
     // ALWAYS delete temp files and the set directory
     await deleteFiles([...tempPaths, ...splitPaths]);