mirror of
https://github.com/xCyanGrizzly/DragonsStash.git
synced 2026-06-13 12:41:16 +00:00
feat(worker): use TDLib remote.unique_id as zero-false-positive dedup signal
The fileName + size repost detection from ff4e150 works but has a
theoretical false positive: two unrelated files in the same channel
with identical names and identical total sizes are treated as duplicates.
TDLib's document.remote.unique_id is a stable identifier per file
content — every repost of the exact same file across messages keeps
the same unique_id. Using it as the first dedup check eliminates the
false-positive risk entirely.
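
To make the false-positive case concrete, here is a minimal sketch, not the worker's actual code. The `ArchiveMessage` shape, the helper names, and the identifier strings are all hypothetical. It only assumes what the message states: reposts of the same file share a unique_id, and unrelated files do not.

```typescript
// Hypothetical, simplified message shape; the real TelegramMessage type may differ.
interface ArchiveMessage {
  fileName: string;
  fileSize: number;
  remoteUniqueId: string | null; // null when no unique_id was captured
}

// Weak key: collides whenever two unrelated files share a name and a size.
function nameSizeKey(m: ArchiveMessage): string {
  return m.fileName + ":" + m.fileSize;
}

// Strong check: only matches when both sides carry the same unique_id.
function sameByUniqueId(a: ArchiveMessage, b: ArchiveMessage): boolean {
  return a.remoteUniqueId !== null && a.remoteUniqueId === b.remoteUniqueId;
}

// Made-up identifier values, for illustration only.
const original: ArchiveMessage = { fileName: "pack.zip", fileSize: 1048576, remoteUniqueId: "uid-aaa" };
const repost: ArchiveMessage = { fileName: "pack.zip", fileSize: 1048576, remoteUniqueId: "uid-aaa" };
const unrelated: ArchiveMessage = { fileName: "pack.zip", fileSize: 1048576, remoteUniqueId: "uid-bbb" };

// Name + size cannot tell the unrelated file apart from the original.
const nameSizeCollides = nameSizeKey(original) === nameSizeKey(unrelated);
// The unique_id comparison separates them and still recognises the repost.
const uniqueIdMatchesUnrelated = sameByUniqueId(original, unrelated);
const uniqueIdMatchesRepost = sameByUniqueId(original, repost);

console.log({ nameSizeCollides, uniqueIdMatchesUnrelated, uniqueIdMatchesRepost });
```
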
Schema:
- Package.remoteUniqueId (nullable, since existing rows lack it)
- Index on (sourceChannelId, remoteUniqueId)
Pipeline:
1. Capture remoteUniqueId in getChannelMessages + getTopicMessages
2. Pass through TelegramMessage type
3. processOneArchiveSet checks findPackageByRemoteUniqueId FIRST
(before packageExistsBySourceMessage / findRepostedPackage)
4. createPackageStub stores it on the new Package row
Existing 19,952 Packages have remoteUniqueId = NULL — they fall through
to the existing checks (source-msg-id, name+size, content-hash). New
ingestions populate it and benefit from the strong signal immediately.
Old Packages get backfilled organically when their content is
re-encountered and a new Package would otherwise be created.
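
The check order and the NULL fall-through described above can be sketched as follows. This is an illustrative in-memory model, not the real implementation. `PackageRow`, `IncomingArchive`, and `checkDuplicate` are hypothetical names, the real lookups (findPackageByRemoteUniqueId, packageExistsBySourceMessage, findRepostedPackage) query the database with signatures not shown here, and `sourceMessageId` is simplified to a number although the schema column is BigInt. The post-download content-hash check is left out.

```typescript
// Hypothetical, simplified row and message shapes for illustration.
interface PackageRow {
  id: number;
  sourceChannelId: string;
  sourceMessageId: number; // BigInt in the real schema
  fileName: string;
  fileSize: number;
  remoteUniqueId: string | null; // NULL on rows created before this change
}

interface IncomingArchive {
  sourceChannelId: string;
  sourceMessageId: number;
  fileName: string;
  fileSize: number;
  remoteUniqueId: string | null;
}

type DedupResult =
  | { duplicate: false }
  | { duplicate: true; via: "remoteUniqueId" | "sourceMessage" | "nameAndSize"; packageId: number };

function checkDuplicate(rows: PackageRow[], msg: IncomingArchive): DedupResult {
  const inChannel = rows.filter((r) => r.sourceChannelId === msg.sourceChannelId);

  // 1. Strongest signal first. Legacy rows hold NULL and can never match here.
  if (msg.remoteUniqueId !== null) {
    const byUid = inChannel.filter((r) => r.remoteUniqueId === msg.remoteUniqueId)[0];
    if (byUid) return { duplicate: true, via: "remoteUniqueId", packageId: byUid.id };
  }

  // 2. Same source message already ingested.
  const byMsg = inChannel.filter((r) => r.sourceMessageId === msg.sourceMessageId)[0];
  if (byMsg) return { duplicate: true, via: "sourceMessage", packageId: byMsg.id };

  // 3. Weaker name + size match, which still covers legacy rows.
  const byNameSize = inChannel.filter(
    (r) => r.fileName === msg.fileName && r.fileSize === msg.fileSize
  )[0];
  if (byNameSize) return { duplicate: true, via: "nameAndSize", packageId: byNameSize.id };

  return { duplicate: false };
}
```

In this model, a repost of a legacy Package (remoteUniqueId = NULL) is still caught by step 3, while a repost of a newly ingested Package is caught at step 1 even if it was renamed.
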
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -472,6 +472,11 @@ model Package {
   sourceChannelId String
   sourceMessageId BigInt
   sourceTopicId BigInt?
+  /// TDLib's `remote.unique_id` for the FIRST part's file. Stable across
+  /// reposts of identical content in the same channel — used as the
+  /// strongest pre-download dedup signal (no false positives unlike
+  /// fileName + size matching).
+  remoteUniqueId String?
   destChannelId String?
   destMessageId BigInt?
   destMessageIds BigInt[] @default([])
@@ -503,6 +508,7 @@ model Package {
   @@index([archiveType])
   @@index([creator])
   @@index([packageGroupId])
+  @@index([sourceChannelId, remoteUniqueId])
   @@map("packages")
 }
 