From 25a619626236aeabd6b023636c7a4f1592cd35f3 Mon Sep 17 00:00:00 2001
From: xCyanGrizzly
Date: Tue, 26 May 2026 11:15:07 +0200
Subject: [PATCH] =?UTF-8?q?fix(worker):=20skip=20integrity=20test=20for=20?=
 =?UTF-8?q?multipart=20ZIPs=20=E2=80=94=20unzip=20-t=20can't=20span=20them?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Diagnosed from production: the main account downloaded several 28 GB
ZIP sets (CA 3D STUDIOS 2023-07.zip.001..007, 2023-08.zip.001..006,
...) and rejected every one of them with:

  "Archive integrity check failed: Command failed: unzip -tqq /tmp/zips/.../CA 3D STUDIOS 2023-07.zip.001"

Root cause: the integrity test I added in 04effed passed
`uploadPaths[0]` to the archive tester. For byte-split multipart ZIPs
(`.zip.001`, `.zip.002`, ...), the first chunk isn't a valid ZIP on its
own: the central directory only exists at the END of the assembled
archive. unzip's spanned-ZIP support uses `.z01/.z02/.../.zip` naming,
not `.zip.001/.002`, so even pointing at the assembled-form parts
wouldn't help.

Three correctness changes:

1. The test runs on `tempPaths[0]` (the original downloaded file)
   instead of `uploadPaths[0]` (which may be byte-split chunks we
   created). For single-file ZIPs that we re-split, this still tests
   the unsplit original.

2. The test is skipped entirely when archiveType=ZIP AND
   tempPaths.length>1. These are source multipart ZIPs we can't
   validate without concatenating, and the hash check +
   central-directory parse we already do are sufficient structural
   signals.

3. Multipart RAR and 7Z archives ARE still tested: `unrar t` and `7z t`
   both auto-discover sibling parts when pointed at the first one.

This unblocks all multipart-ZIP ingestion for the main account. Hours
of downloaded archives that were being rejected will now pass through.
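For reviewers, the gating rule from changes 1-3 can be sketched in isolation. This is only an illustration: `shouldRunIntegrityTest` and the `ArchiveType` union are hypothetical names that do not exist in worker.ts, and the file names are made up. Only the rule itself (skip when the type is ZIP and more than one file was downloaded) comes from this patch.

```typescript
// Hypothetical helper; the real change is inlined in processOneArchiveSet.
type ArchiveType = "ZIP" | "RAR" | "SEVEN_Z";

/**
 * Decide whether the pre-upload integrity test should run.
 * `tempPaths` are the files as downloaded, before any re-splitting for upload.
 */
function shouldRunIntegrityTest(type: ArchiveType, tempPaths: string[]): boolean {
  // More than one downloaded ZIP file means a source multipart ZIP,
  // which this patch skips instead of testing the first chunk.
  const isMultipartZip = type === "ZIP" && tempPaths.length > 1;
  return !isMultipartZip;
}

// Single downloaded ZIP: tested, even if it is later re-split for upload.
console.log(shouldRunIntegrityTest("ZIP", ["set.zip"])); // true

// Source multipart ZIP: skipped.
console.log(shouldRunIntegrityTest("ZIP", ["set.zip.001", "set.zip.002"])); // false

// Multipart RAR: still tested, starting from the first part.
console.log(shouldRunIntegrityTest("RAR", ["set.part1.rar", "set.part2.rar"])); // true
```

When the helper returns true, the file handed to the tester would be `tempPaths[0]`, never `uploadPaths[0]`.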

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 worker/src/worker.ts | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/worker/src/worker.ts b/worker/src/worker.ts
index 8356b2e..63fc00b 100644
--- a/worker/src/worker.ts
+++ b/worker/src/worker.ts
@@ -1682,16 +1682,28 @@ async function processOneArchiveSet(
     }
 
     // ── Pre-upload integrity test ──
-    // Catch broken/encrypted archives before we burn upload bandwidth on
-    // them. Cheap (unzip -t / unrar t / 7z t) compared to a multi-GB upload.
-    // Skipped when we're reusing an existing upload — no point testing the
-    // file again.
-    const integrity = await testArchiveIntegrity(
-      archiveSet.type === "7Z" ? "SEVEN_Z" : archiveSet.type,
-      uploadPaths[0]
-    );
-    if (!integrity.ok) {
-      throw new Error(`Archive integrity check failed: ${integrity.reason}`);
+    // Catch broken/encrypted archives before we burn upload bandwidth.
+    //
+    // Important nuance: ZIP multipart archives use byte-level chunk naming
+    // (`.zip.001`, `.zip.002`, ...). Individual chunks aren't valid ZIPs
+    // — the central directory only exists in the last chunk and unzip can't
+    // span the `.zip.001` naming convention. Testing the first chunk alone
+    // always fails with "no central directory found". Skip the test for
+    // those.
+    //
+    // RAR and 7z CLI tools auto-discover sibling parts when pointed at the
+    // first part, so `unrar t` / `7z t` work for multipart RAR/7z.
+    //
+    // Single-file archives (regardless of whether WE re-split them for
+    // upload size limits) are always testable on the original tempPaths[0]
+    // since that's the unsplit downloaded file.
+    const archType = archiveSet.type === "7Z" ? "SEVEN_Z" : archiveSet.type;
+    const isMultipartZip = archType === "ZIP" && tempPaths.length > 1;
+    if (!isMultipartZip) {
+      const integrity = await testArchiveIntegrity(archType, tempPaths[0]);
+      if (!integrity.ok) {
+        throw new Error(`Archive integrity check failed: ${integrity.reason}`);
+      }
     }
 
     // ── Uploading ──