Audit of every TDLib call site against the live 1.8.64 schema in
node_modules/@prebuilt-tdlib/types/tdlib-types.d.ts surfaced three
additional silent breakages beyond the getForumTopics fix in 106700b.
1. searchChatMessages parameter restructure
The top-level `message_thread_id` and `saved_messages_topic_id`
request fields were collapsed into a single tagged-union
`topic_id: MessageTopic$Input`. Three call sites affected:
- topics.ts getTopicMessages — was passing message_thread_id, now
sends topic_id with the messageTopicForum variant carrying
forum_topic_id. Without this the topic scan returns the whole
channel (or nothing) instead of just the topic.
- download.ts getChannelMessages — used to pass message_thread_id: 0;
just omit the topic_id field entirely for a flat scan.
- rebuild.ts — same treatment.
2. message.reply_to_message_id replaced with reply_to tagged union
On incoming messages, the flat `reply_to_message_id` field was
replaced with `reply_to: MessageReplyTo` (messageReplyToMessage or
messageReplyToStory). Our reply-chain grouping needs the message-ID
case.
Added extractReplyToMessageId() that reads both old and new shapes
so a transition build or future downgrade still works.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the TDLib upgrade in 18a0efb, getForumTopicList returned 0 topics
for every forum channel. Confirmed in production logs:
"title":"Model Printing Emporium","topicCount":0
"title":"GB_Butler_Bot2","topicCount":0
"title":"Darnascus 2 : Flamigos Miniatures","topicCount":0
Cycle results: messagesScanned=0, zipsFound=0 — main account's entire
ingestion pipeline was a no-op because all source channels are forums.
Root cause: TDLib 1.8.64 renamed three fields without bumping the
breaking-change indicator we'd notice:
Request offset_message_thread_id → offset_forum_topic_id
Response next_offset_message_thread_id → next_offset_forum_topic_id
Response topics[].info.message_thread_id → topics[].info.forum_topic_id
The old field names became no-ops in the new TDLib, so every request
came back with an empty topic list and the "stuck pagination" detection
correctly bailed out.
Fix: send the new field name on the request side, read both old and
new names on the response side (so a future TDLib version change in
either direction stays handled).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The fileName + size repost detection from ff4e150 works but has a
theoretical false-positive: two unrelated files in the same channel
with identical names and identical total sizes get treated as duplicates.
TDLib's document.remote.unique_id is a stable identifier per file
content — every repost of the exact same file across messages keeps
the same unique_id. Using it as the first dedup check eliminates the
false-positive risk entirely.
Schema:
- Package.remoteUniqueId (nullable, since existing rows lack it)
- Index on (sourceChannelId, remoteUniqueId)
Pipeline:
1. Capture remoteUniqueId in getChannelMessages + getTopicMessages
2. Pass through TelegramMessage type
3. processOneArchiveSet checks findPackageByRemoteUniqueId FIRST
(before packageExistsBySourceMessage / findRepostedPackage)
4. createPackageStub stores it on the new Package row
Existing 19,952 Packages have remoteUniqueId = NULL — they fall through
to the existing checks (source-msg-id, name+size, content-hash). New
ingestions populate it and benefit from the strong signal immediately.
Old Packages get backfilled organically when their content is
re-encountered and a new Package would otherwise be created.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously, channels/topics with no new archives never had their
watermark updated. This meant every cycle re-scanned all messages from
scratch just to discover nothing new — especially costly for the 1079-
topic Model Printing Emporium forum.
- Add maxScannedMessageId to ChannelScanResult (highest msg ID seen)
- Set channel watermark to scan boundary when no archives are found
- Set topic watermark to scan boundary when no archives are found
- Fall back to scan watermark when archive processing doesn't advance it
After one full cycle, subsequent cycles will skip already-scanned
messages via the early-exit boundary check, dramatically reducing
TDLib API calls on channels with mostly non-archive content.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add invokeWithTimeout wrapper for TDLib API calls (2min timeout per call)
- Add stuck detection to getChannelMessages: break if from_message_id doesn't advance
- Add stuck detection to getTopicMessages: same protection for topic scanning
- Add stuck detection to getForumTopicList: break if pagination offsets don't advance
- Add max page limit (5000) to all scanning loops to prevent infinite pagination
- Add mutex wait timeout (30min) to prevent indefinite blocking when holder hangs
- Add cycle timeout (4h default, configurable via WORKER_CYCLE_TIMEOUT_MINUTES)
- Fix end-of-page detection to use actual limit value instead of hardcoded 100
Co-authored-by: xCyanGrizzly <53275238+xCyanGrizzly@users.noreply.github.com>
- Add progress callbacks to getChannelMessages and getTopicMessages that
fire after each page of messages is fetched
- Worker now shows channel progress (e.g. "[2/5] Channel Name") when
processing multiple source channels
- Worker now shows topic progress (e.g. "topic 3/12") when scanning forums
- Worker now shows live message scanning count during channel/topic scans
(e.g. "Scanning Channel — 300 messages scanned")
- UI stats line now always shows messagesScanned count
- messagesScanned counter now increments during the scanning phase, not
just during archive processing
Co-authored-by: xCyanGrizzly <53275238+xCyanGrizzly@users.noreply.github.com>
Adds full Telegram ZIP ingestion pipeline: TDLib worker service scans source
channels for archive files, deduplicates by content hash, extracts metadata,
uploads to archive channel, and indexes in Postgres. Forum supergroups are
scanned per-topic with topic names used as creator. Filename-based creator
extraction (e.g. "Mammoth Factory - 2026-01.zip") serves as fallback.
Includes admin UI for managing accounts/channels, simplified account setup
(API credentials via env vars), auth code/password submission dialog,
package browser with creator column, and live ingestion activity tracking.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>