
What Survives the Zip? A Practical Guide to Metadata in Archives

Files don’t travel alone when you compress them: they bring names, timestamps, permissions, and other metadata along. This guide explains what ZIP (and a few other formats) actually preserves, where things get lost in translation across operating systems, and how to avoid surprises.


Why metadata matters more than you think

When you archive a folder, you’re packaging more than file contents. Timestamps tell build systems what to rebuild. Executable bits decide whether a script can run. Filenames and encodings determine whether files even unpack correctly on another machine. If you’re moving code between macOS and Linux, sending design assets to a Windows team, or creating backups, understanding which metadata survives the trip can save hours of head-scratching. ZIP is ubiquitous because it’s easy and well-supported, but its handling of metadata varies by tool and platform. Knowing the rules—and their exceptions—helps you choose the right format and settings for your task.

Time is tricky: timestamps and time zones in ZIP

ZIP historically stores a DOS-style timestamp with two-second precision and no time-zone information, which is coarse compared to modern filesystems. Many tools add extended fields with higher-resolution or UTC timestamps, but not all extractors read them. As a result, files can emerge with times rounded, shifted by time zone, or flattened to the moment of extraction. This matters for build systems and reproducible workflows because a shifted or truncated timestamp can trigger needless rebuilds or break byte-for-byte comparisons. To reduce surprises, keep source times consistent (e.g., set a stable modification time during packaging), avoid mixing time zones across machines, and prefer tools that write and read UTC-capable extended timestamps. If exact timing matters, verify after extraction, especially when moving between Windows (NTFS), macOS (APFS/HFS+), and Linux (ext4, XFS), which track time at different resolutions.
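
If you package with Python, a minimal sketch along these lines pins every entry’s timestamp to a fixed value at packaging time; the zip_with_fixed_times helper, the directory names, and the chosen date are illustrative, and the standard zipfile module shown here writes only the DOS-style local-time field rather than UTC-capable extended timestamps.

```python
# Minimal sketch: pin every entry's timestamp to a fixed value while zipping.
# Uses only Python's standard zipfile module, which writes the DOS-style
# local-time timestamp; it does not add the UTC-capable extra fields that
# some archivers use. Names and the chosen date are illustrative.
import zipfile
from pathlib import Path

FIXED_TIME = (2020, 1, 1, 0, 0, 0)  # year, month, day, hour, minute, second

def zip_with_fixed_times(src_dir: str, out_path: str) -> None:
    src = Path(src_dir)
    with zipfile.ZipFile(out_path, "w") as zf:
        # Sort for a stable entry order as well as stable times.
        for path in sorted(p for p in src.rglob("*") if p.is_file()):
            info = zipfile.ZipInfo.from_file(path, path.relative_to(src).as_posix())
            info.date_time = FIXED_TIME                # override the on-disk mtime
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, path.read_bytes())

zip_with_fixed_times("my_project", "my_project.zip")
```

Sorting the inputs here also helps with the reproducibility goals discussed later in this guide.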

Names and encodings: the cross-platform minefield

Filenames are more than characters; they’re encodings and normalization rules. Classic ZIP assumes CP437 for names unless a UTF‑8 flag is set. Modern tools typically set that flag, but older extractors may misread non‑ASCII characters. macOS often stores names in a decomposed form (NFD), while most other systems prefer composed (NFC). A name that looks identical can differ at the byte level, causing duplicates or extraction errors in edge cases. Case-sensitivity is another trap: Linux treats “Readme” and “README” as separate files; Windows usually collapses them. Long paths can also fail on older Windows setups. To maximize compatibility, stick to UTF‑8, avoid look‑alike names that differ only in case or normalization, keep path lengths reasonable, and test on the target OS. If you’re archiving macOS packages or design asset folders, be aware of macOS-specific resource data that may produce extra “._” files when unpacked elsewhere.
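
As a rough illustration, a small pre-packaging audit can catch normalization and case problems before they reach an archive; the audit_names helper and its reporting policy below are hypothetical and rely only on the standard unicodedata and pathlib modules.

```python
# Minimal sketch: audit a source tree for archive-unfriendly names before
# packaging. Flags names that are not NFC-normalized (common on macOS, which
# often hands back NFD) and names that collide when case is ignored (a
# problem on Windows and default macOS volumes). Names and policy are
# illustrative.
import unicodedata
from pathlib import Path

def audit_names(src_dir: str) -> None:
    base = Path(src_dir)
    seen = {}
    for path in base.rglob("*"):
        rel = path.relative_to(base).as_posix()
        if unicodedata.normalize("NFC", rel) != rel:
            print(f"not NFC-normalized: {rel!r}")
        key = rel.casefold()
        if key in seen and seen[key] != rel:
            print(f"case-only collision: {rel!r} vs {seen[key]!r}")
        seen[key] = rel

audit_names("assets")
```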

Permissions, symlinks, and special files: what gets preserved

ZIP can carry POSIX mode bits (including the executable bit) in each entry’s external attributes and, with some tools, in extra fields, but not all archivers write them and not all extractors interpret them. It’s common to lose executable permissions when an archive passes through Windows and back to Linux unless the archiver preserves the bits and the extractor honors them. Symbolic links are supported by many UNIX-aware ZIP tools as special entries, yet some extractors, particularly on Windows, may dereference them or convert them to regular files, changing behavior. Advanced metadata such as extended ACLs, device files, and Windows alternate data streams generally doesn’t survive in plain ZIP. If you need to preserve full UNIX metadata (owner, group, permissions, symlinks), a tar-based workflow (e.g., .tar.gz or .tar.zst) is often more reliable across UNIX-like systems. Use ZIP when broad end-user compatibility is the priority; use tar-based archives when faithful reproduction of a POSIX tree is required.
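
To make the executable-bit problem concrete, here is a sketch that restores POSIX modes after extracting with Python’s zipfile, which does not apply them by itself; it assumes the archive was written by a UNIX-aware tool that records st_mode in the upper bits of external_attr, and the extract_with_modes name and paths are made up for the example.

```python
# Minimal sketch: restore POSIX mode bits (e.g., the executable bit) after
# extracting a ZIP with Python, which does not apply them on its own.
# Assumes the archive was written by a UNIX-aware tool that stored st_mode
# in the upper 16 bits of each entry's external_attr; names are illustrative.
import os
import zipfile

def extract_with_modes(archive: str, dest: str) -> None:
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            extracted = zf.extract(info, dest)
            mode = info.external_attr >> 16   # upper bits hold the UNIX st_mode
            if mode:                          # zero means no mode was recorded
                os.chmod(extracted, mode & 0o7777)

extract_with_modes("tools.zip", "tools")
```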

Reproducible and verifiable archives

Reproducible archives unpack identically every time, a key property for build pipelines, cache hits, and supply-chain verification. Non-determinism usually comes from variable timestamps, differing file order, or tool-specific extra fields. To get closer to reproducible results, normalize modification times (e.g., set all files to a fixed UTC time), sort inputs by a stable rule before archiving, and avoid embedding volatile metadata. After creation, verify by hashing the archive or, better yet, unpacking and hashing the directory tree to confirm content identity across runs. If you distribute artifacts, consider signing the archive and publishing a checksum. The goal isn’t just bit-for-bit identical ZIP files, but reliably identical contents when extracted on the target systems.
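
As one way to verify, the sketch below hashes an extracted tree rather than the archive itself; the tree_digest helper and directory names are illustrative, and the digest intentionally covers only relative paths and file bytes so it reflects content rather than metadata.

```python
# Minimal sketch: hash an extracted directory tree so two runs (or two
# machines) can be compared for content identity. Only relative paths and
# file bytes are hashed; timestamps and permissions are deliberately ignored.
# The sorted traversal keeps the digest stable across runs.
import hashlib
from pathlib import Path

def tree_digest(root: str) -> str:
    h = hashlib.sha256()
    base = Path(root)
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        h.update(path.relative_to(base).as_posix().encode("utf-8"))
        h.update(b"\0")
        h.update(path.read_bytes())
        h.update(b"\0")
    return h.hexdigest()

print(tree_digest("unpacked_a") == tree_digest("unpacked_b"))
```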