docs: add package grouping design spec and implementation plan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-10 22:01:16 +00:00 · 2026-03-25 21:40:13 +01:00
parent d6386209be
commit 6eb7129637
4 changed files with 3410 additions and 0 deletions
--- a/docs/superpowers/plans/2026-03-24-search-indicators-size-limit-skipped-files.md
+++ b/docs/superpowers/plans/2026-03-24-search-indicators-size-limit-skipped-files.md
--- a/docs/superpowers/plans/2026-03-25-package-grouping.md
+++ b/docs/superpowers/plans/2026-03-25-package-grouping.md
--- a/docs/superpowers/specs/2026-03-24-search-indicators-size-limit-skipped-files-design.md
+++ b/docs/superpowers/specs/2026-03-24-search-indicators-size-limit-skipped-files-design.md
@@ -0,0 +1,241 @@
+# Design: Search Match Indicators, Size Limit Increase, Skipped/Failed Files Overview
+
+**Date:** 2026-03-24
+**Status:** Approved
+
+## Overview
+
+Three related improvements to the STL packages system:
+
+1. **Size limit increase**: Raise the ingestion limit from 4 GB to 200 GB so large multipart archives aren't skipped
+2. **Search match indicators**: Show which internal files matched a search query, with matching files highlighted in the drawer
+3. **Skipped/failed files overview**: Track and display archives that were skipped or failed, with retry capability
+
+---
+
+## Feature 1: Size Limit Increase
+
+### Change
+
+`worker/src/util/config.ts` line 6: change the default from `"4096"` to `"204800"` (the value is in megabytes: 4096 MB = 4 GB, 204800 MB = 200 GB).
+
+One-line change. The split/upload pipeline already handles arbitrary sizes. The 2 GB per-part Telegram API limit is a separate hard-coded constant and stays as-is.
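+The sketch below shows the comparison in bytes. It assumes the setting is in megabytes and that the worker compares it against the `bigint` `fileSize` it already carries. The helper names are hypothetical, not the actual `config.ts` API.
+
+```typescript
+// Hypothetical helpers: the real config.ts may parse and name this differently.
+const BYTES_PER_MB = 1024n * 1024n;
+
+// Convert a megabyte setting such as "204800" into a byte limit.
+function maxArchiveBytes(maxSizeMb: string): bigint {
+  return BigInt(maxSizeMb) * BYTES_PER_MB;
+}
+
+// True when an archive (for multipart sets, the summed size of all parts)
+// is over the limit and would be skipped.
+function exceedsSizeLimit(totalSize: bigint, maxSizeMb: string): boolean {
+  return totalSize > maxArchiveBytes(maxSizeMb);
+}
+
+console.log(maxArchiveBytes("4096"), maxArchiveBytes("204800")); // 4294967296n 214748364800n
+```
+
+Using `bigint` end to end matches the existing `fileSize: bigint` field and avoids mixing number and bigint in the check.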
+
+### Impact
+
+- Archives up to 200 GB will now be attempted
+- Multipart archives where individual parts are under 2 GB (but total exceeds 4 GB) will no longer be skipped — these upload directly without any splitting
+- Single files over 2 GB are automatically split into 2 GB parts (existing behavior)
+- Temp disk usage during processing can now reach up to ~200 GB per archive
+
+---
+
+## Feature 2: Search Match Indicators
+
+### Backend Changes
+
+**File:** `src/lib/telegram/queries.ts` — `searchPackages()`
+
+When `searchIn` is `"files"` or `"both"`, change the PackageFile query from `distinct` to a **grouped count**:
+
+```typescript
+// Current: findMany with select: { packageId }, distinct: ["packageId"]
+// New: groupBy packageId with _count
+// `q` is the search term passed to searchPackages()
+const fileMatches = await prisma.packageFile.groupBy({
+  by: ["packageId"],
+  where: {
+    OR: [
+      { fileName: { contains: q, mode: "insensitive" } },
+      { path: { contains: q, mode: "insensitive" } },
+    ],
+  },
+  _count: { _all: true },
+});
+```
+
+This returns `{ packageId: string, _count: { _all: number } }[]`.
+
+Note: `PackageRow` in `package-columns.tsx` mirrors `PackageListItem`, so it must also receive the two new fields (`matchedFileCount` and `matchedByContent`, defined below).
+
+**File:** `src/lib/telegram/types.ts` — `PackageListItem`
+
+Add two fields:
+- `matchedFileCount: number` — how many files inside matched (0 if matched by package name only)
+- `matchedByContent: boolean` — true if any files inside matched
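+The sketch below shows how `searchPackages()` could fold the `groupBy` rows into the two new fields. The helper names (`buildMatchIndex`, `matchInfoFor`) are hypothetical. `FileMatchRow` mirrors the row shape described above.
+
+```typescript
+// Row shape returned by the packageFile.groupBy call above.
+interface FileMatchRow {
+  packageId: string;
+  _count: { _all: number };
+}
+
+interface MatchInfo {
+  matchedFileCount: number;
+  matchedByContent: boolean;
+}
+
+// Index the grouped counts once so each package lookup is O(1).
+function buildMatchIndex(rows: FileMatchRow[]): Map<string, number> {
+  return new Map(rows.map((r): [string, number] => [r.packageId, r._count._all]));
+}
+
+// Packages matched only by name have no entry, so they get 0 / false.
+function matchInfoFor(packageId: string, index: Map<string, number>): MatchInfo {
+  const matchedFileCount = index.get(packageId) ?? 0;
+  return { matchedFileCount, matchedByContent: matchedFileCount > 0 };
+}
+```
+
+Deriving `matchedByContent` from the count means the two fields cannot disagree. The index's keys (`Array.from(index.keys())`) could also supply an `id: { in: [...] }` condition that pulls content-matched packages into the result set.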
+
+### Frontend Changes
+
+**File:** `src/app/(app)/stls/page.tsx`
+
+Pass the search term to `StlTable` as a new prop.
+
+**File:** `src/app/(app)/stls/_components/stl-table.tsx`
+
+Pass search term to columns via TanStack Table column meta.
+
+**File:** `src/app/(app)/stls/_components/package-columns.tsx`
+
+When search is active and `matchedByContent` is true, render a clickable badge below the filename showing `matchedFileCount` (e.g., "3 file matches"). Clicking the badge opens the `PackageFilesDrawer` with its `highlightTerm` prop set to the search term.
+
+**File:** `src/app/(app)/stls/_components/package-files-drawer.tsx`
+
+- Accept optional `highlightTerm: string` prop
+- Render full file tree as normal (all files visible)
+- Files whose `fileName` or `path` case-insensitively contains `highlightTerm` get a subtle highlight (amber/yellow background on the row)
+- Auto-expand folders that contain highlighted files
+- The drawer's own search input remains independent
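+The sketch below covers the drawer's highlight and auto-expand logic. It assumes each file's `path` is its full `/`-separated path including the file name. `computeHighlights` and `DrawerFile` are hypothetical names. The match is a case-insensitive substring test on `fileName` or `path`, closely mirroring the server-side `contains` filter.
+
+```typescript
+interface DrawerFile {
+  fileName: string;
+  path: string; // assumed full path, e.g. "minis/dragons/dragon.stl"
+}
+
+interface HighlightState {
+  highlighted: Set<string>; // paths of files to tint
+  expandedFolders: Set<string>; // folder paths to open on mount
+}
+
+function computeHighlights(files: DrawerFile[], highlightTerm?: string): HighlightState {
+  const highlighted = new Set<string>();
+  const expandedFolders = new Set<string>();
+  const term = highlightTerm?.toLowerCase();
+  if (!term) return { highlighted, expandedFolders };
+
+  for (const file of files) {
+    const hit =
+      file.fileName.toLowerCase().includes(term) || file.path.toLowerCase().includes(term);
+    if (!hit) continue;
+    highlighted.add(file.path);
+    // Every ancestor folder of a highlighted file must be expanded.
+    const segments = file.path.split("/");
+    for (let i = 1; i < segments.length; i++) {
+      expandedFolders.add(segments.slice(0, i).join("/"));
+    }
+  }
+  return { highlighted, expandedFolders };
+}
+```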
+
+### Data Flow
+
+1. User types search term in STL table search input
+2. URL updates with `?search=value`, page reloads
+3. `page.tsx` calls `searchPackages()` with `searchIn: "both"`
+4. Query returns packages with `matchedFileCount` and `matchedByContent`
+5. Table renders "N file matches" badge on content-matched rows
+6. User clicks badge -> drawer opens with full tree, matching files highlighted
+7. Folders containing matches auto-expanded
+
+---
+
+## Feature 3: Skipped/Failed Files Overview
+
+### Database Schema
+
+New model in `prisma/schema.prisma`:
+
+```prisma
+enum SkipReason {
+  SIZE_LIMIT
+  DOWNLOAD_FAILED
+  EXTRACT_FAILED
+  UPLOAD_FAILED
+}
+
+model SkippedPackage {
+  id              String           @id @default(cuid())
+  fileName        String
+  fileSize        BigInt
+  reason          SkipReason
+  errorMessage    String?
+  sourceChannelId String
+  sourceChannel   TelegramChannel  @relation(fields: [sourceChannelId], references: [id], onDelete: Cascade)
+  sourceMessageId BigInt
+  sourceTopicId   BigInt?
+  isMultipart     Boolean          @default(false)
+  partCount       Int              @default(1)
+  accountId       String
+  account         TelegramAccount  @relation(fields: [accountId], references: [id], onDelete: Cascade)
+  createdAt       DateTime         @default(now())
+
+  @@unique([sourceChannelId, sourceMessageId])
+  @@index([reason])
+  @@index([accountId])
+  @@map("skipped_packages")
+}
+```
+
+Reverse relations must be added to `TelegramChannel` and `TelegramAccount` models:
+```prisma
+// In TelegramChannel:
+skippedPackages SkippedPackage[]
+
+// In TelegramAccount:
+skippedPackages SkippedPackage[]
+```
+
+### Worker Changes
+
+**File:** `worker/src/worker.ts`
+
+Extend `PipelineContext` interface to include `accountId` (derived from the ingestion run's account).
+
+At each skip/failure point, upsert a `SkippedPackage` record:
+
+- **Size limit skip** (line 784): reason `SIZE_LIMIT`, no error message
+- **Download failure** (catch in download loop): reason `DOWNLOAD_FAILED` + error text
+- **Extract/metadata failure** (catch in extract): reason `EXTRACT_FAILED` + error text
+- **Upload failure** (catch in upload): reason `UPLOAD_FAILED` + error text
+
+On **successful ingestion** of a package, delete any existing `SkippedPackage` with the same `(sourceChannelId, sourceMessageId)` — so successful retries clean up after themselves.
+
+**File:** `worker/src/db/queries.ts`
+
+Add functions:
+- `upsertSkippedPackage(data)` — create or update skip record
+- `deleteSkippedPackage(sourceChannelId, sourceMessageId)` — remove on success
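+The sketch below covers the two helpers. It assumes Prisma's default compound-unique selector `sourceChannelId_sourceMessageId`, generated from the `@@unique` above. `SkipStore` is a hypothetical structural slice of `prisma.skippedPackage`, so the sketch runs without a generated client.
+
+The delete uses `deleteMany` rather than `delete`. Prisma's `delete` throws when no row matches, and most successful ingestions will have no skip record to clean up.
+
+```typescript
+type SkipReason = "SIZE_LIMIT" | "DOWNLOAD_FAILED" | "EXTRACT_FAILED" | "UPLOAD_FAILED";
+
+interface SkippedPackageData {
+  fileName: string;
+  fileSize: bigint;
+  reason: SkipReason;
+  errorMessage?: string | null;
+  sourceChannelId: string;
+  sourceMessageId: bigint;
+  sourceTopicId?: bigint | null;
+  isMultipart?: boolean;
+  partCount?: number;
+  accountId: string;
+}
+
+// Hypothetical structural slice of prisma.skippedPackage (not the generated types).
+interface SkipStore {
+  upsert(args: {
+    where: { sourceChannelId_sourceMessageId: { sourceChannelId: string; sourceMessageId: bigint } };
+    create: SkippedPackageData;
+    update: Partial<SkippedPackageData>;
+  }): Promise<unknown>;
+  deleteMany(args: { where: { sourceChannelId: string; sourceMessageId: bigint } }): Promise<{ count: number }>;
+}
+
+async function upsertSkippedPackage(store: SkipStore, data: SkippedPackageData): Promise<void> {
+  await store.upsert({
+    where: {
+      sourceChannelId_sourceMessageId: {
+        sourceChannelId: data.sourceChannelId,
+        sourceMessageId: data.sourceMessageId,
+      },
+    },
+    create: data,
+    // A repeat failure overwrites reason and message with the latest attempt.
+    update: { reason: data.reason, errorMessage: data.errorMessage ?? null, fileSize: data.fileSize },
+  });
+}
+
+// Called after a package is indexed; returns 0 when nothing had been skipped.
+async function deleteSkippedPackage(store: SkipStore, sourceChannelId: string, sourceMessageId: bigint): Promise<number> {
+  const { count } = await store.deleteMany({ where: { sourceChannelId, sourceMessageId } });
+  return count;
+}
+```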
+
+### Retry Mechanism
+
+Retrying a skipped package:
+1. Delete the `SkippedPackage` record
+2. Find the `AccountChannelMap` record using both `accountId` and `sourceChannelId`, then reset its `lastProcessedMessageId` to `sourceMessageId - 1` (only if less than current watermark)
+3. If `sourceTopicId` is non-null, also reset the corresponding `TopicProgress.lastProcessedMessageId` for that topic
+4. The next ingestion cycle picks up the message and re-attempts processing
+
+For "Retry All" (e.g., all `SIZE_LIMIT` skips after raising the limit):
+- Delete all matching `SkippedPackage` records
+- For each affected (account, channel) pair, reset `AccountChannelMap` watermark to the minimum `sourceMessageId - 1` among deleted records
+- For each affected (account, channel, topic) triple, reset `TopicProgress` watermark similarly
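+The sketch below gives the watermark arithmetic for both cases. The names (`resetWatermark`, `bulkResetTargets`, `SkipRecord`) are hypothetical. Composite string keys stand in for the real `AccountChannelMap` and `TopicProgress` lookups.
+
+```typescript
+interface SkipRecord {
+  accountId: string;
+  sourceChannelId: string;
+  sourceTopicId: bigint | null;
+  sourceMessageId: bigint;
+}
+
+// Single retry: the watermark only ever moves back, never forward.
+function resetWatermark(current: bigint, sourceMessageId: bigint): bigint {
+  const target = sourceMessageId - 1n;
+  return target < current ? target : current;
+}
+
+// Retry All: one target per (account, channel) and per (account, channel, topic),
+// each the lowest sourceMessageId - 1 among the deleted records.
+function bulkResetTargets(records: SkipRecord[]) {
+  const channelTargets = new Map<string, bigint>();
+  const topicTargets = new Map<string, bigint>();
+  const keepLowest = (map: Map<string, bigint>, key: string, value: bigint) => {
+    const prev = map.get(key);
+    if (prev === undefined || value < prev) map.set(key, value);
+  };
+  for (const r of records) {
+    const target = r.sourceMessageId - 1n;
+    keepLowest(channelTargets, `${r.accountId}|${r.sourceChannelId}`, target);
+    if (r.sourceTopicId !== null) {
+      keepLowest(topicTargets, `${r.accountId}|${r.sourceChannelId}|${r.sourceTopicId}`, target);
+    }
+  }
+  return { channelTargets, topicTargets };
+}
+```
+
+Each bulk target still needs the same "only if lower" rule as a single retry. Putting `lastProcessedMessageId: { gt: target }` in an `updateMany` filter enforces that in the database itself. It then stays correct even if the worker advances the watermark between the read and the write.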
+
+**Note on behavioral distinction:** `DOWNLOAD_FAILED`, `EXTRACT_FAILED`, and `UPLOAD_FAILED` archives already retry naturally, because the worker does not advance the watermark past failed sets; for them, the `SkippedPackage` record provides visibility. The explicit retry/watermark reset is only strictly needed for `SIZE_LIMIT` skips, where the watermark does advance past the skipped message. The UI should present both types, but the retry button is most impactful for `SIZE_LIMIT` skips.
+
+**Performance note:** "Retry All" can cause the worker to re-scan large message ranges. The existing dedup logic (`packageExistsBySourceMessage`) ensures already-ingested packages are skipped quickly, but there is a scanning cost proportional to the number of messages between the reset watermark and the current position.
+
+### Frontend Changes
+
+**File:** `src/app/(app)/stls/_components/stl-table.tsx`
+
+Add a "Skipped / Failed" tab alongside the main packages table.
+
+**New file:** `src/app/(app)/stls/_components/skipped-packages-tab.tsx`
+
+Table columns:
+- **fileName** — archive name
+- **fileSize** — formatted size
+- **reason** — color-coded badge: `SIZE_LIMIT` (yellow), `DOWNLOAD_FAILED` (red), `EXTRACT_FAILED` (red), `UPLOAD_FAILED` (red)
+- **errorMessage** — truncated with expandable tooltip/popover for full text
+- **channel** — source channel title
+- **createdAt** — when the skip/failure was recorded
+
+Actions:
+- **Retry** button per row — server action that deletes record + resets watermark
+- **Retry All** button in the header — bulk retry, filterable by reason
+
+**File:** `src/app/(app)/stls/page.tsx`
+
+Fetch skipped packages count (for tab badge) alongside existing queries.
+
+**File:** `src/data/` or `src/lib/telegram/queries.ts`
+
+Add query functions:
+- `listSkippedPackages(options)` — paginated list with reason filter
+- `countSkippedPackages()` — for tab badge
+- `retrySkippedPackage(id)` — delete record + reset watermark
+- `retryAllSkippedPackages(reason?)` — bulk retry
+
+**File:** `src/app/(app)/stls/actions.ts`
+
+Add server actions:
+- `retrySkippedPackageAction(id)`
+- `retryAllSkippedPackagesAction(reason?)`
+
+---
+
+## Files to Create/Modify
+
+### Create
+- `src/app/(app)/stls/_components/skipped-packages-tab.tsx` — skipped packages table UI
+- Prisma migration for `SkippedPackage` model
+
+### Modify
+- `worker/src/util/config.ts` — raise default max size
+- `worker/src/worker.ts` — record skips/failures, clean up on success
+- `worker/src/db/queries.ts` — add skip record CRUD functions
+- `prisma/schema.prisma` — add `SkippedPackage` model and `SkipReason` enum
+- `src/lib/telegram/queries.ts` — modify `searchPackages()` for match counts, add skipped package queries
+- `src/lib/telegram/types.ts` — add `matchedFileCount`/`matchedByContent` to `PackageListItem`, add skipped package types
+- `src/app/(app)/stls/page.tsx` — fetch skipped count; pass search term and skipped count to `StlTable`
+- `src/app/(app)/stls/_components/stl-table.tsx` — accept search prop, render tabs
+- `src/app/(app)/stls/_components/package-columns.tsx` — render match badge
+- `src/app/(app)/stls/_components/package-files-drawer.tsx` — accept highlightTerm, highlight matching files, auto-expand matched folders
+- `src/app/(app)/stls/actions.ts` — add retry server actions
--- a/docs/superpowers/specs/2026-03-25-package-grouping-design.md
+++ b/docs/superpowers/specs/2026-03-25-package-grouping-design.md
@@ -0,0 +1,246 @@
+# Package Grouping Design
+
+## Overview
+
+Add the ability to group related packages that were posted together in a Telegram channel (e.g., "DUNGEON BLOCKS - Colossal Dungeon" with 6 separate archive files). Groups appear as collapsible rows in the STL files table, with support for both automatic detection via Telegram album IDs and manual grouping through the UI.
+
+## Goals
+
+- Automatically detect and group files posted together in Telegram (same `media_album_id`)
+- Display groups as collapsed rows in the STL table with aggregated metadata
+- Allow manual grouping/ungrouping of packages via the UI
+- Support editable group names and preview images
+- Enable "Send All" to deliver every package in a group via the bot
+
+## Non-Goals
+
+- Merging grouped packages into a single Package record (each stays independent)
+- Time-proximity heuristics for grouping (too error-prone)
+- Grouping across different source channels
+
+---
+
+## Data Model
+
+### New `PackageGroup` Table
+
+```prisma
+model PackageGroup {
+  id              String           @id @default(cuid())
+  name            String
+  mediaAlbumId    String?
+  sourceChannelId String
+  previewData     Bytes?
+  createdAt       DateTime         @default(now())
+  updatedAt       DateTime         @updatedAt
+
+  packages        Package[]
+  sourceChannel   TelegramChannel  @relation(fields: [sourceChannelId], references: [id], onDelete: Cascade)
+
+  @@unique([mediaAlbumId, sourceChannelId])
+  @@index([sourceChannelId])
+  @@map("package_groups")
+}
+```
+
+### Package Model Changes
+
+Add optional group membership:
+
+```prisma
+model Package {
+  // ... existing fields ...
+  packageGroupId  String?
+  packageGroup    PackageGroup?    @relation(fields: [packageGroupId], references: [id], onDelete: SetNull)
+
+  @@index([packageGroupId])
+}
+```
+
+### TelegramChannel Model Changes
+
+Add back-relation for the new `PackageGroup` model:
+
+```prisma
+model TelegramChannel {
+  // ... existing fields and relations ...
+  packageGroups   PackageGroup[]
+}
+```
+
+### Key Decisions
+
+- `mediaAlbumId` is `String?` (TDLib int64 stringified) — only used for dedup lookups, avoids BigInt complexity
+- `@@unique([mediaAlbumId, sourceChannelId])` prevents duplicate album-derived groups when re-scanning. PostgreSQL treats NULLs as distinct in unique constraints, so manually-created groups (with `mediaAlbumId = null`) are not constrained by this — which is correct behavior
+- Idempotency for album groups uses `findFirst({ where: { mediaAlbumId, sourceChannelId } })` + conditional `create`, not `upsert`, because Prisma does not support `upsert` on compound unique keys with nullable fields
+- `onDelete: SetNull` on `Package.packageGroup` means dissolving a group automatically unlinks all members
+- `onDelete: Cascade` on `PackageGroup.sourceChannel` means deleting a channel cleans up its groups
+- `sourceTopicId` is omitted from `PackageGroup` — it can be inferred from member packages, and manual groups may span topics
+- `@@map("package_groups")` follows the project's snake_case table naming convention
+- `previewData` stores JPEG thumbnail bytes directly on the group (same pattern as Package)
+
+---
+
+## Worker Changes
+
+### TelegramMessage Interface
+
+Add optional `mediaAlbumId` field:
+
+```typescript
+export interface TelegramMessage {
+  id: bigint;
+  fileName: string;
+  fileId: string;
+  fileSize: bigint;
+  date: Date;
+  mediaAlbumId?: string;  // Absent or "0" when not part of an album
+}
+```
+
+The field is optional to minimize call-site changes. The grouping step treats `undefined` and `"0"` equivalently as "not part of an album."
+
+### TelegramPhoto Interface
+
+Add optional `mediaAlbumId` field:
+
+```typescript
+export interface TelegramPhoto {
+  id: bigint;
+  date: Date;
+  caption: string;
+  fileId: string;
+  fileSize: number;
+  mediaAlbumId?: string;  // For album-to-preview correlation
+}
+```
+
+### Channel Scanning
+
+In `getChannelMessages()`, read `media_album_id` from the TDLib message object (already present in TDLib responses, just not captured today). Add `media_album_id?: string` to the `TdMessage` interface and pass through to both `TelegramMessage` and `TelegramPhoto`.
+
+The document pass and photo pass already run as separate loops over `searchChatMessages`. Both loops capture `media_album_id` independently. Correlation happens at grouping time: album photos are matched to album documents by comparing their `mediaAlbumId` values, not at scan time.
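+The correlation described above amounts to bucketing both passes by album ID. In this sketch, `ScannedDocument` and `ScannedPhoto` are hypothetical narrowed stand-ins for `TelegramMessage` and `TelegramPhoto`. `correlateAlbums` and `albumKey` are likewise illustrative names.
+
+```typescript
+interface ScannedDocument {
+  id: bigint;
+  fileName: string;
+  mediaAlbumId?: string;
+}
+
+interface ScannedPhoto {
+  id: bigint;
+  caption: string;
+  mediaAlbumId?: string;
+}
+
+interface AlbumBatch {
+  mediaAlbumId: string;
+  messageIds: bigint[];
+  photo?: ScannedPhoto;
+}
+
+// undefined and "0" both mean "not part of an album".
+function albumKey(mediaAlbumId?: string): string | undefined {
+  return mediaAlbumId && mediaAlbumId !== "0" ? mediaAlbumId : undefined;
+}
+
+function correlateAlbums(documents: ScannedDocument[], photos: ScannedPhoto[]): AlbumBatch[] {
+  const albums = new Map<string, AlbumBatch>();
+  for (const doc of documents) {
+    const key = albumKey(doc.mediaAlbumId);
+    if (key === undefined) continue; // standalone package, never auto-grouped
+    const batch: AlbumBatch = albums.get(key) ?? { mediaAlbumId: key, messageIds: [] };
+    batch.messageIds.push(doc.id);
+    albums.set(key, batch);
+  }
+  // Attach the first photo whose album also contains at least one document.
+  for (const photo of photos) {
+    const key = albumKey(photo.mediaAlbumId);
+    const batch = key === undefined ? undefined : albums.get(key);
+    if (batch && !batch.photo) batch.photo = photo;
+  }
+  return Array.from(albums.values());
+}
+```
+
+This sketch keeps albums that contain a single document. Filtering on `messageIds.length >= 2` would be a one-line change if one-member groups are unwanted.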
+
+### Group Creation (Post-Processing)
+
+After each scan cycle's packages are individually processed (downloaded, hashed, uploaded, indexed), a post-processing step handles grouping:
+
+1. Collect all packages from the current scan batch that share the same non-zero `mediaAlbumId`
+2. For each distinct `mediaAlbumId`, check if a `PackageGroup` already exists via `findFirst({ where: { mediaAlbumId, sourceChannelId } })`
+3. If no group exists, create one:
+   - **Name:** caption of the first message in the album (falls back to first file's base name)
+   - **Preview:** find a `TelegramPhoto` from the scan's `photos[]` array with the same `mediaAlbumId`. If found, download via `downloadPhotoThumbnail`. If not, the group starts with no preview (can be added in UI later)
+4. Link all member packages via an idempotent `updateMany` — sets `packageGroupId` on all packages whose `sourceMessageId` is in the album's message set. This handles both newly-indexed packages and previously-indexed ones that were created in an earlier partial scan (e.g., if one package failed and was retried later)
+
+The per-package pipeline is unchanged — each file is still downloaded, hashed, deduped, split, uploaded, and indexed independently. Grouping is a layer on top.
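+Steps 2 to 4 of the grouping can be sketched as one idempotent helper. Everything named here is hypothetical:
+
+- `GroupStore` and `PackageStore` are structural stand-ins for `prisma.packageGroup` and `prisma.package`.
+- The album caption is passed in rather than read from a particular message.
+- "Base name" is taken to mean the file name minus its last extension.
+- `Package` is assumed to carry `sourceChannelId` alongside `sourceMessageId`.
+- `previewData` is typed as `Uint8Array` (Node's `Buffer` is a subclass).
+
+```typescript
+interface GroupStore {
+  findFirst(args: { where: { mediaAlbumId: string; sourceChannelId: string } }): Promise<{ id: string } | null>;
+  create(args: {
+    data: { name: string; mediaAlbumId: string; sourceChannelId: string; previewData?: Uint8Array };
+  }): Promise<{ id: string }>;
+}
+
+interface PackageStore {
+  updateMany(args: {
+    where: { sourceChannelId: string; sourceMessageId: { in: bigint[] } };
+    data: { packageGroupId: string };
+  }): Promise<{ count: number }>;
+}
+
+interface AlbumInput {
+  mediaAlbumId: string;
+  sourceChannelId: string;
+  messageIds: bigint[];
+  caption?: string;
+  firstFileName: string;
+}
+
+// Assumed reading of "first file's base name": strip the last extension.
+function fallbackGroupName(fileName: string): string {
+  const dot = fileName.lastIndexOf(".");
+  return dot > 0 ? fileName.slice(0, dot) : fileName;
+}
+
+async function ensureAlbumGroup(
+  groups: GroupStore,
+  packages: PackageStore,
+  album: AlbumInput,
+  previewData?: Uint8Array,
+): Promise<{ groupId: string; linked: number }> {
+  const where = { mediaAlbumId: album.mediaAlbumId, sourceChannelId: album.sourceChannelId };
+  const group =
+    (await groups.findFirst({ where })) ??
+    (await groups.create({
+      data: { ...where, name: album.caption?.trim() || fallbackGroupName(album.firstFileName), previewData },
+    }));
+  // Re-running relinks the same rows, so packages indexed in an earlier partial scan are picked up too.
+  const { count } = await packages.updateMany({
+    where: { sourceChannelId: album.sourceChannelId, sourceMessageId: { in: album.messageIds } },
+    data: { packageGroupId: group.id },
+  });
+  return { groupId: group.id, linked: count };
+}
+```
+
+Because `findFirst` and `create` are separate calls, two workers handling the same album at once could both miss the lookup. In that case the `@@unique([mediaAlbumId, sourceChannelId])` constraint turns the second `create` into a Prisma `P2002` error. That error can be caught and followed by another `findFirst`.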
+
+---
+
+## Query Layer
+
+### Paginated Listing with Groups
+
+The STL table shows "display items" — either a group (collapsed) or a standalone package. Pagination operates on display items so that a group occupies exactly one slot regardless of member count.
+
+**Two-step query approach** (handles filters correctly):
+
+**Step 1 — Find matching display item IDs:**
+
+```sql
+-- One row per display item: a group with at least one matching member, or a
+-- standalone package. GROUP BY already de-duplicates, so DISTINCT is unnecessary.
+SELECT COALESCE(p."packageGroupId", p.id) AS display_id,
+       CASE WHEN p."packageGroupId" IS NOT NULL THEN 'group' ELSE 'package' END AS display_type,
+       MAX(p."indexedAt") AS sort_date
+FROM packages p
+-- Joined so that search can also match on pg.name
+LEFT JOIN package_groups pg ON pg.id = p."packageGroupId"
+WHERE 1=1
+  -- Optional filters applied here (creator, tags, search text, channelId)
+GROUP BY COALESCE(p."packageGroupId", p.id),
+         CASE WHEN p."packageGroupId" IS NOT NULL THEN 'group' ELSE 'package' END
+-- display_id breaks ties so LIMIT/OFFSET pages stay stable
+ORDER BY sort_date DESC, display_id
+LIMIT $1 OFFSET $2
+```
+
+**Step 2 — Fetch full data:**
+
+For groups on the current page, fetch all member packages (including those that didn't match filters — the group appears because at least one member matched, but the expanded view shows all members). For standalone packages, fetch the full package data.
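+Step 2 then reduces to two batched fetches and an in-memory reassembly that preserves step 1's order. The names in this sketch (`partitionIds`, `assemblePage`, `PackageSummary`) are hypothetical. `members` is assumed to come from one unfiltered query on `packageGroupId IN (groupIds)`. `standalone` is assumed to come from `id IN (packageIds)`.
+
+```typescript
+interface DisplayRow {
+  display_id: string;
+  display_type: "group" | "package";
+}
+
+interface PackageSummary {
+  id: string;
+  packageGroupId: string | null;
+  fileName: string;
+}
+
+type DisplayItem =
+  | { kind: "group"; groupId: string; members: PackageSummary[] }
+  | { kind: "package"; pkg: PackageSummary };
+
+// Split step-1 rows into the two ID lists that step 2 fetches in batch.
+function partitionIds(rows: DisplayRow[]) {
+  const groupIds = rows.filter((r) => r.display_type === "group").map((r) => r.display_id);
+  const packageIds = rows.filter((r) => r.display_type === "package").map((r) => r.display_id);
+  return { groupIds, packageIds };
+}
+
+// Rebuild the page in step-1 order from the two batched results.
+function assemblePage(rows: DisplayRow[], members: PackageSummary[], standalone: PackageSummary[]): DisplayItem[] {
+  const byGroup = new Map<string, PackageSummary[]>();
+  for (const m of members) {
+    if (m.packageGroupId === null) continue;
+    const list = byGroup.get(m.packageGroupId) ?? [];
+    list.push(m);
+    byGroup.set(m.packageGroupId, list);
+  }
+  const byId = new Map(standalone.map((p): [string, PackageSummary] => [p.id, p]));
+  const items: DisplayItem[] = [];
+  for (const r of rows) {
+    if (r.display_type === "group") {
+      items.push({ kind: "group", groupId: r.display_id, members: byGroup.get(r.display_id) ?? [] });
+    } else {
+      const pkg = byId.get(r.display_id);
+      if (pkg) items.push({ kind: "package", pkg }); // skip rows deleted between the two steps
+    }
+  }
+  return items;
+}
+```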
+
+**Count query** (for pagination total):
+
+```sql
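+SELECT COUNT(*) FROM (
+  SELECT DISTINCT COALESCE(p."packageGroupId", p.id)
+  FROM packages p
+  LEFT JOIN package_groups pg ON pg.id = p."packageGroupId"
+  WHERE 1=1
+  -- Same filters as step 1 (the join lets group-name search count here too)
+) AS display_items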
+
+### Group Row Aggregates
+
+Computed in the step 2 fetch: total file size (sum), total file count (sum), combined tags (array union), member package count per group. These populate the collapsed group row.
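+Because step 2 already loads every member of the page's groups, the aggregates can be computed in memory. The field names in this sketch are hypothetical; for example, tags may really be a relation rather than a string array. The "Mixed" archive type and latest `indexedAt` follow the collapsed group row described under UI Changes.
+
+```typescript
+interface MemberPackage {
+  fileSize: bigint;
+  fileCount: number;
+  tags: string[];
+  archiveType: string;
+  indexedAt: Date;
+}
+
+interface GroupAggregate {
+  totalSize: bigint;
+  totalFiles: number;
+  tags: string[];
+  memberCount: number;
+  archiveType: string; // "Mixed" when members are heterogeneous
+  latestIndexedAt: Date | null;
+}
+
+function aggregateGroup(members: MemberPackage[]): GroupAggregate {
+  const tags = new Set<string>();
+  const types = new Set<string>();
+  let totalSize = 0n;
+  let totalFiles = 0;
+  let latest: Date | null = null;
+  for (const m of members) {
+    totalSize += m.fileSize;
+    totalFiles += m.fileCount;
+    for (const t of m.tags) tags.add(t);
+    types.add(m.archiveType);
+    if (latest === null || m.indexedAt.getTime() > latest.getTime()) latest = m.indexedAt;
+  }
+  const [onlyType] = Array.from(types);
+  return {
+    totalSize,
+    totalFiles,
+    tags: Array.from(tags).sort(), // union of member tags, de-duplicated
+    memberCount: members.length,
+    archiveType: types.size > 1 ? "Mixed" : onlyType ?? "",
+    latestIndexedAt: latest,
+  };
+}
+```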
+
+### Search
+
+`searchPackages` adds `PackageGroup.name` to search targets via a `LEFT JOIN` to `package_groups`. If any package in a group matches by name/file content, or the group name matches, the whole group appears.
+
+### Filtering
+
+Creator/tag filters apply to member packages. A group appears if any member matches the filter. The group row shows aggregates of all members (not just matching ones).
+
+### New Query Functions
+
+| Function | Purpose |
+|----------|---------|
+| `listDisplayItems(page, limit, filters)` | Two-step paginated query returning groups + standalone packages |
+| `getDisplayItemCount(filters)` | Count of display items for pagination total |
+| `getPackageGroup(groupId)` | Group metadata + all member packages |
+| `updatePackageGroupName(groupId, name)` | Rename group |
+| `updatePackageGroupPreview(groupId, previewData)` | Replace group preview |
+| `addPackagesToGroup(packageIds, groupId)` | Manual grouping — add to existing group |
+| `removePackageFromGroup(packageId)` | Ungroup single package |
+| `createManualGroup(name, packageIds)` | Create new group from UI |
+| `dissolveGroup(groupId)` | Ungroup all members, delete group record |
+
+For manual grouping of packages that already belong to different groups, the UI first dissolves any source group whose members are all among the selected packages, since it would otherwise be left empty. It then links the selected packages to the target group. Non-selected members of source groups remain in their original group.
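+Which source groups to dissolve can be computed before anything is written. The names in this sketch (`planManualGroup`, `GroupedPackage`) are hypothetical. Running the resulting dissolve and link writes inside one `prisma.$transaction` would keep a failure from leaving packages half-moved.
+
+```typescript
+interface GroupedPackage {
+  id: string;
+  packageGroupId: string | null;
+}
+
+interface ManualGroupPlan {
+  groupsToDissolve: string[]; // source groups that would be left with no members
+  packageIdsToLink: string[];
+}
+
+// `sourceGroupMembers` must hold EVERY current member of each group the
+// selection touches, not just the selected packages, or the emptiness check is wrong.
+function planManualGroup(
+  selectedIds: string[],
+  sourceGroupMembers: GroupedPackage[],
+  targetGroupId?: string,
+): ManualGroupPlan {
+  const selected = new Set(selectedIds);
+  const remaining = new Map<string, number>(); // groupId -> members NOT being moved
+  for (const p of sourceGroupMembers) {
+    if (p.packageGroupId === null || p.packageGroupId === targetGroupId) continue;
+    const left = remaining.get(p.packageGroupId) ?? 0;
+    remaining.set(p.packageGroupId, selected.has(p.id) ? left : left + 1);
+  }
+  const groupsToDissolve = Array.from(remaining)
+    .filter(([, left]) => left === 0)
+    .map(([groupId]) => groupId);
+  return { groupsToDissolve, packageIdsToLink: selectedIds };
+}
+```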
+
+---
+
+## UI Changes
+
+### STL Table — Group Rows
+
+- **Collapsed (default):** Single row showing preview thumbnail, group name (editable inline), archive type badge ("Mixed" if heterogeneous), combined size, combined file count, combined tags (editable), source channel, latest `indexedAt`, actions
+- **Expanded:** Chevron toggle reveals member packages as indented sub-rows with their existing columns and per-package actions
+- Chevron icon on the left of the row toggles expand/collapse
+
+**Loading strategy:** Member packages for all groups on the current page are prefetched in a single batched query during the step 2 fetch. This means expand/collapse is instant (no on-demand loading) and avoids per-row loading states.
+
+### Group Row Actions
+
+- **Send All** — Queues bot send requests for every package in the group. Checks for existing PENDING/SENDING requests per package to avoid duplicates.
+- **View Files** — Opens file drawer showing all member packages' files, separated by package name headers
+- **Dissolve Group** — Ungroups all members (confirmation required)
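+The duplicate check for **Send All** can be sketched as a filter over existing requests. `SendRequest` and `packagesToQueue` are hypothetical names. `SENT` and `FAILED` are assumed names for terminal states; only `PENDING` and `SENDING` appear in this spec.
+
+```typescript
+type SendStatus = "PENDING" | "SENDING" | "SENT" | "FAILED";
+
+interface SendRequest {
+  packageId: string;
+  status: SendStatus;
+}
+
+// Queue only group members that have no request already in flight.
+function packagesToQueue(memberIds: string[], existing: SendRequest[]): string[] {
+  const inFlight = new Set(
+    existing.filter((r) => r.status === "PENDING" || r.status === "SENDING").map((r) => r.packageId),
+  );
+  return memberIds.filter((id) => !inFlight.has(id));
+}
+```
+
+A check like this avoids re-queueing on an ordinary second click. Ruling out duplicates from truly concurrent requests would additionally need a database constraint or serialized transaction.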
+
+### Individual Package Actions (Within a Group)
+
+- Existing: Send, View Files
+- New: "Remove from group" in dropdown menu
+
+### Manual Grouping
+
+- Checkbox selection column on package rows
+- When 2+ packages selected, a "Group Selected" button appears in the table toolbar
+- Prompts for a group name, creates the group
+- If selected packages belong to existing groups, those packages are moved to the new group. Source groups that become empty are automatically dissolved.
+
+### Preview Editing
+
+- Click the group's preview thumbnail to upload a replacement image
+- Same upload flow as individual packages (existing component reuse)
+
+### No Changes To
+
+- Skipped/failed packages tab
+- Package detail drawer internals
+- Search UI (just broader matching behind the scenes)