I've just started using System.IO.Compression in .NET 4.5 and found a problem: it stores files with their local modification time, not UTC.

So if you zip files in one time zone and unzip them in another, it uses the local modification time (say 1 PM) from the original file, and extracts the file with the same modification time (also 1 PM), even though it should be hours earlier or later.

I assume the same problem would exist with files zipped during Standard Time and unzipped later during Daylight Saving Time, or vice versa.

It appears that there is a missing setting during zipping, since other methods of unzipping (WinZip, compressed folder extract) produce the same wrong modification time.

I've tested using WinZip to zip and unzip files in different time zones, and it doesn't have this problem. It must use UTC internally for the modification times.

Is there any way around this other than building my own time-shifting routines during Zip and Unzip?

This project can't use any external apps or libraries. We are limited to using just .NET functions.

Kraang Prime
  • Can you post short sample code that reproduces and demonstrates the problem? – Peter Ritchie Sep 25 '14 at 18:50
  • Only thing I can see that you have access to (in .NET 4.5) is `ZipArchiveEntry.LastWriteTime` which is of type `DateTimeOffset`. So, the `ZipArchive` code should have all it needs to write out the information correctly. Without some way of overriding how that particular information gets written to disk, I don't see any way around it. – Peter Ritchie Sep 25 '14 at 19:32
  • Mind you, this http://en.wikipedia.org/wiki/Zip_(file_format)#File_headers seems to show that the date/time is just 2 bytes; I don't see how it could write out something that could be displayed in local time in different locales. Writing out a UTC date/time doesn't cause the date time to be displayed properly, it just displays the UTC time (e.g. 19:38 at 3:38pm my local time). – Peter Ritchie Sep 25 '14 at 19:39
  • The ZIP archive format is ancient. Those time and date fields are encoded in MS-DOS format. DOS never had a notion of UTC, it strictly worked with local time. – Hans Passant Sep 25 '14 at 22:16
  • I did manage to add an offset value from UTC ("+07:00:00") in a file in the root of the zip file. I extract it and compare it to the target's offset ("+04:00:00"), and then built a routine which updated all the unzipped files by the offset difference. It worked! Good enough for this project. Enough timey-wimey stuff for today. – Scott Bakker Sep 26 '14 at 20:54
  • Sorry, got my signs wrong again. -07:00:00 and -04:00:00. Ah, well, that's what testing is for. – Scott Bakker Sep 26 '14 at 21:14
  • @ScottBakker for my own curiosity, you changed a field in the zip header? Which one? – Peter Ritchie Sep 27 '14 at 15:17
  • @PeterRitchie, No, I left the Zip file alone. I created a file in the root of the directory I was zipping. This is the routine "ZipAllFiles": ## Dim CurrUTCOffset As New DateTimeOffset(Now) ## File.WriteAllText(TempDir + "\.Local2UTCOffset", CurrUTCOffset.Offset.ToString) ## ZipFile.CreateFromDirectory(TempDir, ZipFileName) ## ...continued... – Scott Bakker Sep 29 '14 at 15:28
  • Then I unzipped the whole directory at the other end in "UnzipAllFiles" and retrieved the original Offset: ## Dim ZipFilesOffset As String ## ZipFile.ExtractToDirectory(ZipFileLocalName, TargetDir) ## ZipFilesOffset = File.ReadAllText(TargetDir + "\.Local2UTCOffset") ## ...continued... – Scott Bakker Sep 29 '14 at 15:34
  • And got the difference between the two Offsets: ## Dim CurrUTCOffset As New DateTimeOffset(Now) ## Dim ZipOffsetTimespan As TimeSpan ## ZipOffsetTimespan = TimeSpan.Parse(CurrUTCOffset.Offset.ToString.Replace("+", "")) ## ZipOffsetTimespan = ZipOffsetTimespan.Subtract(TimeSpan.Parse(ZipFilesOffset.Replace("+", ""))) ## ...continued... – Scott Bakker Sep 29 '14 at 15:34
  • Finally applying it to every unzipped file, spinning through all files and subdirectories in "UpdateFileDateTimeRecursive": ## Dim UpdatedLastWriteTimeUTC As DateTimeOffset ## UpdatedLastWriteTimeUTC = DateTimeOffset.Parse(CurrSourceFile.LastWriteTimeUtc.ToString + " +00:00") ## UpdatedLastWriteTimeUTC = UpdatedLastWriteTimeUTC.Add(ZipOffsetTimespan) ## CurrSourceFile.LastWriteTimeUtc = CDate(UpdatedLastWriteTimeUTC.ToString) – Scott Bakker Sep 29 '14 at 15:36
  • There may have been a simpler way to do this, without all the conversion/parsing to and from strings, but this worked and I left it at that. – Scott Bakker Sep 29 '14 at 15:39
  • The thing that required the most debugging was getting the proper offsets on the two sides of the "ZipOffsetTimespan.Subtract", realizing that I had to replace the "+" in positive offsets with "" so it would parse, and putting the " +00:00" at the end of each file's UTC string. TimeSpan.Parse doesn't want "+", but DateTimeOffset.Parse does. Hope this helps! – Scott Bakker Sep 29 '14 at 15:50
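The offset arithmetic described in the comments above can be sketched without the string round-trips. This is only an illustration in Python, not the asker's VB.NET routine; `parse_offset` and `corrected_utc` are hypothetical names. It assumes one offset applies to every file in the archive, which may not hold for files last modified under a different daylight saving offset.

```python
from datetime import datetime, timedelta, timezone


def parse_offset(text: str) -> timedelta:
    """Parse an offset such as '-07:00:00', '+07:00:00' or '07:00:00'."""
    sign = -1 if text.startswith("-") else 1
    hours, minutes, seconds = (int(part) for part in text.lstrip("+-").split(":"))
    return sign * timedelta(hours=hours, minutes=minutes, seconds=seconds)


def corrected_utc(extracted_utc: datetime,
                  zip_offset: timedelta,
                  local_offset: timedelta) -> datetime:
    """Shift an extracted file's UTC timestamp by (local offset - zip offset).

    The archive stored local wall-clock time, so the extractor interpreted
    that wall-clock time in the wrong zone; the difference between the two
    offsets is exactly the error.
    """
    return extracted_utc + (local_offset - zip_offset)


# A file modified at 1 PM in a -07:00 zone (20:00 UTC) is extracted in a
# -04:00 zone as 1 PM local, which is 17:00 UTC. The correction restores 20:00 UTC.
wrong = datetime(2014, 9, 26, 17, 0, tzinfo=timezone.utc)
fixed = corrected_utc(wrong, parse_offset("-07:00:00"), parse_offset("-04:00:00"))
```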

1 Answer

As mentioned by Hans Passant in a comment, the zip file format makes use of an MS-DOS Date & Time structure.

This structure is defined as two separate unsigned short values like so:

wFatDate

The MS-DOS date. The date is a packed value with the following format.
Bits    Description
0-4     Day of the month (1–31)
5-8     Month (1 = January, 2 = February, and so on)
9-15    Year offset from 1980 (add 1980 to get actual year)

wFatTime

The MS-DOS time. The time is a packed value with the following format.
Bits    Description
0-4     Second divided by 2
5-10    Minute (0–59)
11-15   Hour (0–23 on a 24-hour clock)
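To make the bit layout concrete, here is a minimal sketch in Python that packs and unpacks the two 16-bit values. The function names are hypothetical. Note that the result carries no time zone at all, and that seconds are stored in 2-second steps.

```python
from datetime import datetime
from typing import Tuple


def decode_dos_datetime(fat_date: int, fat_time: int) -> datetime:
    """Unpack wFatDate/wFatTime into a naive datetime (no zone information)."""
    day = fat_date & 0x1F                    # bits 0-4
    month = (fat_date >> 5) & 0x0F           # bits 5-8
    year = ((fat_date >> 9) & 0x7F) + 1980   # bits 9-15, offset from 1980
    second = (fat_time & 0x1F) * 2           # bits 0-4, stored as seconds / 2
    minute = (fat_time >> 5) & 0x3F          # bits 5-10
    hour = (fat_time >> 11) & 0x1F           # bits 11-15
    return datetime(year, month, day, hour, minute, second)


def encode_dos_datetime(dt: datetime) -> Tuple[int, int]:
    """Pack a datetime into (wFatDate, wFatTime); odd seconds are rounded down."""
    fat_date = ((dt.year - 1980) << 9) | (dt.month << 5) | dt.day
    fat_time = (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)
    return fat_date, fat_time


fat_date, fat_time = encode_dos_datetime(datetime(2014, 9, 25, 13, 0, 7))
round_trip = decode_dos_datetime(fat_date, fat_time)  # 13:00:06, the odd second is lost
```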

When MS-DOS was created, time zones were not used on those computers (Unix, though, had already had the concept since 1970). People who used MS-DOS were often in their office or at home and did not communicate by computer with people in other states, let alone other countries. Networking was pretty expensive too at the time.

The company that created the zip file format made the mistake of using the FAT file system date format, and it stuck. So zip files are created using local time (they don't have to be, but that is at least the expected behavior).

The zip format offers ways to add extensions (link from @user3342816 who posted a comment), though, including various timestamps.

0x000a        NTFS (Win9x/WinNT FileTimes)
0x000d        Unix

The NTFS block is described like so:

     -PKWARE Win95/WinNT Extra Field:
      ==============================

      The following description covers PKWARE's "NTFS" attributes
      "extra" block, introduced with the release of PKZIP 2.50 for
      Windows. (Last Revision 20001118)

      (Note: At this time the Mtime, Atime and Ctime values may
      be used on any WIN32 system.)
      [Info-ZIP note: In the current implementations, this field has
      a fixed total data size of 32 bytes and is only stored as local
      extra field.]

      Value         Size        Description
      -----         ----        -----------
      0x000a        Short       Tag (NTFS) for this "extra" block type
      TSize         Short       Total Data Size for this block
      Reserved      Long        for future use
      Tag1          Short       NTFS attribute tag value #1
      Size1         Short       Size of attribute #1, in bytes
      (var.)        SubSize1    Attribute #1 data
      .
      .
      .
      TagN          Short       NTFS attribute tag value #N
      SizeN         Short       Size of attribute #N, in bytes
      (var.)        SubSize1    Attribute #N data

      For NTFS, values for Tag1 through TagN are as follows:
      (currently only one set of attributes is defined for NTFS)

      Tag        Size       Description
      -----      ----       -----------
      0x0001     2 bytes    Tag for attribute #1
      Size1      2 bytes    Size of attribute #1, in bytes (24)
      Mtime      8 bytes    64-bit NTFS file last modification time
      Atime      8 bytes    64-bit NTFS file last access time
      Ctime      8 bytes    64-bit NTFS file creation time

      The total length for this block is 28 bytes, resulting in a
      fixed size value of 32 for the TSize field of the NTFS block.

      The NTFS filetimes are 64-bit unsigned integers, stored in Intel
      (least significant byte first) byte order. They determine the
      number of 1.0E-07 seconds (1/10th microseconds!) past WinNT "epoch",
      which is "01-Jan-1601 00:00:00 UTC".
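As a rough illustration of the layout quoted above, the following Python sketch walks one NTFS extra block and converts the three file times to UTC datetimes. The function names are hypothetical. It assumes `block` holds the complete block, including the 4-byte tag/size header, and it drops the final 100 ns digit because Python datetimes only hold microseconds.

```python
import struct
from datetime import datetime, timedelta, timezone

NTFS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a count of 100 ns ticks since 1601-01-01 UTC to a datetime."""
    return NTFS_EPOCH + timedelta(microseconds=filetime // 10)


def parse_ntfs_extra(block: bytes):
    """Return (mtime, atime, ctime) from a 0x000a extra block, or None."""
    if len(block) < 8:
        return None
    tag, tsize = struct.unpack_from("<HH", block, 0)
    if tag != 0x000A:
        return None
    end = min(4 + tsize, len(block))
    pos = 8  # skip the 4-byte header and the 4-byte Reserved field
    while pos + 4 <= end:
        attr_tag, attr_size = struct.unpack_from("<HH", block, pos)
        pos += 4
        if attr_tag == 0x0001 and attr_size >= 24 and pos + 24 <= end:
            ticks = struct.unpack_from("<QQQ", block, pos)  # Mtime, Atime, Ctime
            return tuple(filetime_to_datetime(t) for t in ticks)
        pos += attr_size  # unknown attribute: skip its data
    return None


# Build a sample block whose three times are all 1970-01-01 00:00:00 UTC.
UNIX_EPOCH_AS_FILETIME = 116444736000000000
sample = struct.pack("<HHIHHQQQ", 0x000A, 32, 0, 0x0001, 24,
                     UNIX_EPOCH_AS_FILETIME, UNIX_EPOCH_AS_FILETIME,
                     UNIX_EPOCH_AS_FILETIME)
times = parse_ntfs_extra(sample)
```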

The Unix block includes two timestamps as well:

     -PKWARE Unix Extra Field:
      ========================

      The following is the layout of PKWARE's Unix "extra" block.
      It was introduced with the release of PKZIP for Unix 2.50.
      Note: all fields are stored in Intel low-byte/high-byte order.
      (Last Revision 19980901)

      This field has a minimum data size of 12 bytes and is only stored
      as local extra field.

      Value         Size        Description
      -----         ----        -----------
      0x000d        Short       Tag (Unix0) for this "extra" block type
      TSize         Short       Total Data Size for this block
      AcTime        Long        time of last access (UTC/GMT)
      ModTime       Long        time of last modification (UTC/GMT)
      UID           Short       Unix user ID
      GID           Short       Unix group ID
      (var)         variable    Variable length data field

      The variable length data field will contain file type
      specific data.  Currently the only values allowed are
      the original "linked to" file names for hard or symbolic
      links, and the major and minor device node numbers for
      character and block device nodes.  Since device nodes
      cannot be either symbolic or hard links, only one set of
      variable length data is stored.  Link files will have the
      name of the original file stored.  This name is NOT NULL
      terminated.  Its size can be determined by checking TSize -
      12.  Device entries will have eight bytes stored as two 4
      byte entries (in little-endian format).  The first entry
      will be the major device number, and the second the minor
      device number.

      [Info-ZIP note: The fixed part of this field has the same layout as
      Info-ZIP's abandoned "Unix1 timestamps & owner ID info" extra field;
      only the two tag bytes are different.]
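The fixed part of the Unix block can be read in the same way. The Python sketch below uses a hypothetical function name and assumes the two "Long" time fields are unsigned 32-bit values, which the quoted text does not state. It ignores the variable-length tail.

```python
import struct
from datetime import datetime, timezone


def parse_unix_extra(block: bytes):
    """Return the fixed fields of a 0x000d extra block as a dict, or None.

    `block` is assumed to include the 4-byte tag/size header.
    """
    if len(block) < 16:
        return None
    tag, tsize = struct.unpack_from("<HH", block, 0)
    if tag != 0x000D or tsize < 12:
        return None
    # Order per the layout: AcTime, ModTime, UID, GID (all little-endian).
    atime, mtime, uid, gid = struct.unpack_from("<IIHH", block, 4)
    return {
        "atime": datetime.fromtimestamp(atime, tz=timezone.utc),
        "mtime": datetime.fromtimestamp(mtime, tz=timezone.utc),
        "uid": uid,
        "gid": gid,
    }


# Sample block: access time 0 s, modification time 86400 s after the Unix epoch.
sample = struct.pack("<HHIIHH", 0x000D, 12, 0, 86400, 1000, 100)
fields = parse_unix_extra(sample)
```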

As we can see, the NTFS and Unix blocks clearly define their timestamps as UTC. The NTFS timestamp has more precision (100 ns) than the Unix timestamps (1 s). It will also survive much longer since it uses 64 bits (see the Year 2038 Problem for further details on 32-bit timestamps).
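The difference in range can be checked with a few lines of arithmetic. This Python sketch assumes the 32-bit Unix value is treated as signed, which is the case the Year 2038 Problem refers to.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Last second representable by a signed 32-bit Unix timestamp.
last_signed_32 = UNIX_EPOCH + timedelta(seconds=2**31 - 1)

# Span of a 64-bit count of 100 ns ticks, in (Julian) years.
TICKS_PER_SECOND = 10**7
SECONDS_PER_YEAR = 365.25 * 86400
ntfs_span_years = (2**64 / TICKS_PER_SECOND) / SECONDS_PER_YEAR

print(last_signed_32.isoformat())  # 2038-01-19T03:14:07+00:00
print(round(ntfs_span_years))      # tens of thousands of years
```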

Alexis Wilke
  • Extended Timestamp Extra Field dates back to at least 1997. NTFS timestamps was released in 1999. https://opensource.apple.com/source/zip/zip-6/unzip/unzip/proginfo/extra.fld.auto.html – user3342816 Apr 15 '21 at 20:58
  • @user3342816 Oh! Great document. I added the details about the two blocks with timestamps (I suppose there are more, but that's probably the two used the most at this point). – Alexis Wilke Apr 16 '21 at 15:56