fix(worker): skip integrity test for multipart ZIPs — unzip -t can't span them

Diagnosed from production: main downloaded several 28 GB ZIP sets
(CA 3D STUDIOS 2023-07.zip.001..007, 2023-08.zip.001..006, ...) and
rejected every one of them with:

  "Archive integrity check failed: Command failed:
   unzip -tqq /tmp/zips/.../CA 3D STUDIOS 2023-07.zip.001"

Root cause: the integrity test I added in 04effed passed `uploadPaths[0]`
to the archive tester. For byte-split multipart ZIPs (`.zip.001`,
`.zip.002`, ...), the first chunk isn't a valid ZIP on its own: the
central directory only exists at the END of the assembled archive.
Nor can unzip read the parts as a set. Info-ZIP's split-archive naming
is `.z01/.z02/.../.zip`, not `.zip.001/.002`, and stock unzip 6.0 doesn't
read split archives without reassembling them first anyway, so there is
no way to point it at these parts directly.
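To make the root cause concrete: a ZIP reader starts from the End-of-Central-Directory (EOCD) record at the tail of the file, so a byte-split first chunk gives it nothing to start from. The sketch below (TypeScript, to match the worker) only searches a buffer for that record. The helper names and the filler data are illustrative, and it ignores ZIP64 and does none of the per-entry CRC checking that `unzip -t` performs.

```typescript
// Sketch: locate a ZIP End-of-Central-Directory (EOCD) record in a buffer.
// The EOCD is 22 fixed bytes plus an optional comment (at most 65535 bytes)
// and starts with the signature "PK\x05\x06". No EOCD, no central directory.
// Ignores ZIP64 and does not validate entries the way `unzip -t` does.
const EOCD_SIGNATURE = [0x50, 0x4b, 0x05, 0x06];
const EOCD_FIXED_SIZE = 22;
const MAX_COMMENT_LENGTH = 0xffff;

function findEocdOffset(data: Uint8Array): number {
  // The record must begin within the last 22 + 65535 bytes; scan backwards.
  const earliest = Math.max(0, data.length - EOCD_FIXED_SIZE - MAX_COMMENT_LENGTH);
  for (let i = data.length - EOCD_FIXED_SIZE; i >= earliest; i--) {
    const signatureMatches = EOCD_SIGNATURE.every((b, k) => data[i + k] === b);
    // Bytes 20-21 hold the little-endian comment length, which must account
    // for exactly the bytes left after the fixed part.
    const commentLength = data[i + 20] | (data[i + 21] << 8);
    if (signatureMatches && i + EOCD_FIXED_SIZE + commentLength === data.length) {
      return i;
    }
  }
  return -1;
}

// Cut a buffer into fixed-size pieces, like `.zip.001`, `.zip.002`, ...
function byteSplit(data: Uint8Array, chunkSize: number): Uint8Array[] {
  const parts: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    parts.push(data.subarray(offset, offset + chunkSize));
  }
  return parts;
}

// Reassemble parts in order (the `cat x.zip.001 x.zip.002 ... > x.zip` step).
function concat(parts: Uint8Array[]): Uint8Array {
  const out = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

// Usage with stand-in data: 1000 filler bytes (NOT real ZIP entries)
// followed by a 22-byte EOCD whose counts and offsets are all zero.
const eocd = new Uint8Array(EOCD_FIXED_SIZE);
eocd.set(EOCD_SIGNATURE, 0);
const whole = concat([new Uint8Array(1000).fill(0x41), eocd]);
const parts = byteSplit(whole, 400); // 3 parts: 400 + 400 + 222 bytes
console.log(findEocdOffset(parts[0]));      // -1: the first part has no EOCD
console.log(findEocdOffset(parts[2]));      // 200: it sits in the last part
console.log(findEocdOffset(concat(parts))); // 1000: found once reassembled
```

Here the record lands entirely in the last part; with a split boundary inside the final 22 + comment bytes it would straddle the last two parts, which is another reason a per-part test can't work.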

Two correctness changes, plus one case that is unchanged:

  1. Test runs on `tempPaths[0]` (the original downloaded file) instead
     of `uploadPaths[0]` (which may be byte-split chunks we created).
     For single-file ZIPs that we re-split for upload, this still tests
     the unsplit original.

  2. Skip the test entirely when archType is ZIP AND tempPaths.length > 1.
     These are source multipart ZIPs that can't be validated without
     concatenating them first, and the hash check + central-directory
     parse we already do are sufficient structural signals.

  3. RAR and 7Z multipart still ARE tested — `unrar t` and `7z t` both
     auto-discover sibling parts when pointed at the first one.
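The three rules above reduce to one decision: which downloaded file (if any) goes to the tester. A minimal sketch of that decision as a pure function, assuming a hypothetical `integrityTarget` helper and an assumed `ArchiveType` union; the actual change inlines this logic in `processOneArchiveSet`.

```typescript
// Hypothetical extraction of the gating rules; the real code inlines this
// in processOneArchiveSet. The ArchiveType union is an assumption.
type ArchiveType = "ZIP" | "RAR" | "SEVEN_Z";

// Returns the file to hand to the archive tester, or null to skip the test.
function integrityTarget(archType: ArchiveType, tempPaths: string[]): string | null {
  // Guard added for the sketch: there must be at least one downloaded file.
  if (tempPaths.length === 0) {
    throw new Error("no downloaded files for this archive set");
  }
  // Rule 2: byte-split source ZIPs (.zip.001, .zip.002, ...). No single part
  // is a valid ZIP and unzip can't span them, so there's nothing to test.
  if (archType === "ZIP" && tempPaths.length > 1) {
    return null;
  }
  // Rule 1: always test an original download, never our own upload chunks
  // (uploadPaths). For single-file archives that's the unsplit file.
  // Rule 3: for multipart RAR/7z the first part is enough, since the CLI
  // discovers sibling parts itself.
  return tempPaths[0];
}

// Usage (file names are illustrative):
console.log(integrityTarget("ZIP", ["set.zip.001", "set.zip.002"]));     // null
console.log(integrityTarget("ZIP", ["single.zip"]));                     // single.zip
console.log(integrityTarget("RAR", ["set.part1.rar", "set.part2.rar"])); // set.part1.rar
```

Returning `null` instead of a separate boolean keeps "whether to test" and "what to test" in one place, mirroring the diff's `isMultipartZip` guard around a call on `tempPaths[0]`.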

This unblocks all multipart-ZIP ingestion for the main account.
Hours' worth of downloaded archives that were being rejected will now
pass through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 11:15:07 +02:00
parent 166dc556c9
commit 25a6196262


@@ -1682,16 +1682,28 @@ async function processOneArchiveSet(
 }

 // ── Pre-upload integrity test ──
-// Catch broken/encrypted archives before we burn upload bandwidth on
-// them. Cheap (unzip -t / unrar t / 7z t) compared to a multi-GB upload.
-// Skipped when we're reusing an existing upload — no point testing the
-// file again.
-const integrity = await testArchiveIntegrity(
-  archiveSet.type === "7Z" ? "SEVEN_Z" : archiveSet.type,
-  uploadPaths[0]
-);
-if (!integrity.ok) {
-  throw new Error(`Archive integrity check failed: ${integrity.reason}`);
+// Catch broken/encrypted archives before we burn upload bandwidth.
+//
+// Important nuance: ZIP multipart archives use byte-level chunk naming
+// (`.zip.001`, `.zip.002`, ...). Individual chunks aren't valid ZIPs
+// — the central directory only exists in the last chunk and unzip can't
+// span the `.zip.001` naming convention. Testing the first chunk alone
+// always fails with "no central directory found". Skip the test for
+// those.
+//
+// RAR and 7z CLI tools auto-discover sibling parts when pointed at the
+// first part, so `unrar t` / `7z t` work for multipart RAR/7z.
+//
+// Single-file archives (regardless of whether WE re-split them for
+// upload size limits) are always testable on the original tempPaths[0]
+// since that's the unsplit downloaded file.
+const archType = archiveSet.type === "7Z" ? "SEVEN_Z" : archiveSet.type;
+const isMultipartZip = archType === "ZIP" && tempPaths.length > 1;
+if (!isMultipartZip) {
+  const integrity = await testArchiveIntegrity(archType, tempPaths[0]);
+  if (!integrity.ok) {
+    throw new Error(`Archive integrity check failed: ${integrity.reason}`);
+  }
 }

 // ── Uploading ──
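`testArchiveIntegrity` itself isn't part of this diff. As a hedged sketch only, assuming it shells out to the three CLIs named in the old comment, a version built on Node's `execFile` might look like the following. `TEST_COMMANDS`, `testArchiveIntegritySketch`, and the `{ ok, reason }` result shape (inferred from how the diff uses the result) are assumptions. Node's `execFile` rejects with an error whose message begins "Command failed:" followed by the command and its arguments joined by spaces, which matches the unquoted path in the production error above, but that is an inference rather than something this commit shows.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Assumed union; the caller has already mapped "7Z" to "SEVEN_Z".
type ArchiveType = "ZIP" | "RAR" | "SEVEN_Z";

type IntegrityResult = { ok: true } | { ok: false; reason: string };

// Hypothetical command table: each CLI's "test" mode, pointed at one file.
// unrar and 7z find sibling volumes themselves; unzip does not.
const TEST_COMMANDS: Record<ArchiveType, (path: string) => [string, string[]]> = {
  ZIP: (path) => ["unzip", ["-tqq", path]],
  RAR: (path) => ["unrar", ["t", path]],
  SEVEN_Z: (path) => ["7z", ["t", path]],
};

async function testArchiveIntegritySketch(
  type: ArchiveType,
  path: string,
): Promise<IntegrityResult> {
  const [command, args] = TEST_COMMANDS[type](path);
  try {
    // execFile passes args directly (no shell), so paths containing spaces
    // need no quoting. A non-zero exit status rejects the promise.
    // `unrar t` / `7z t` print a line per entry; raising maxBuffer above the
    // 1 MiB default keeps a large, healthy set from failing on output size.
    await execFileAsync(command, args, { maxBuffer: 64 * 1024 * 1024 });
    return { ok: true };
  } catch (err) {
    return { ok: false, reason: err instanceof Error ? err.message : String(err) };
  }
}
```

Using `execFile` rather than `exec` avoids shell quoting entirely, which matters for names like the `CA 3D STUDIOS ...` sets in this incident.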