
December 4, 2007

Handling filenames in Zip archives that contain characters illegal in DOS/Windows

Question:
We process thousands of different zip files every week. I recently came across some ZIP files, created in a non-DOS environment, whose filenames contain characters that are illegal in DOS.

Currently, with the CkZip classes, I can either Extract() a zip entry into a directory, or Inflate() a zip entry into memory and write it out to disk myself. Unfortunately, neither approach works when the entry is extremely large and its filename contains illegal DOS characters.

The crux of my problem is twofold. First, after writing the entry to disk, I need to "process" it and then move it elsewhere. If CkZip changes the name to comply with DOS naming conventions, my program has no way of figuring out what you’ve renamed it to. Second, because the uncompressed file is so large, decompressing it into a CkByteArray gives me an out-of-memory error.

So I’d like to suggest adding another overload of Extract() that takes a full path and filename to decompress into, or some property on CkZipEntry that tells my program what the DOS filename would be if the entry were written to disk.

FWIW, the entry in the zip file has a filename of "*.*" (without quotes). I’ve also seen filenames with question marks and other illegal DOS characters in them.

Answer:
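This C++ example shows how to handle filenames inside .zip archives that contain characters that are illegal in DOS and Windows filenames. CkZip keeps each entry's original filename unchanged, and the application can rename the entries itself before unzipping, so it knows exactly what each extracted file is called: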

#include <stdio.h>
#include "CkZip.h"
#include "CkZipEntry.h"
#include "CkString.h"

void BadFilenameCharsInZip(void)
    {
    // First, create a .zip with some invalid DOS filename characters:
    CkZip zip;
    if (!zip.UnlockComponent("anything"))
	{
	printf("%s\n",zip.lastErrorText());
	return;
	}
    zip.NewZip("badCharsInFilename.zip");
	
    // The question-mark character is not allowed in Windows filenames.
    CkZipEntry *entryAdded = zip.AppendString("?abc?.txt","this is a test this is a test this is a test");
    delete entryAdded;
	
    // The asterisk character is not allowed in Windows filenames.
    entryAdded = zip.AppendString("*xyz.txt","this is a test this is a test this is a test");
    delete entryAdded;
    if (!zip.WriteZipAndClose())
	{
	printf("%s\n",zip.lastErrorText());
	return;
	}
	
    // badCharsInFilename.zip is now written. A typical Windows zip utility
    // cannot extract these entries under their original names.
	
    // Open the .zip with CkZip:
    CkZip zip2;
    if (!zip2.OpenZip("badCharsInFilename.zip"))
	{
	printf("%s\n",zip2.lastErrorText());
	return;
	}
    int i;
    int n = zip2.get_NumEntries();
	
    // Iterate over the entries and examine the filenames (to confirm that the
    // filenames have not changed in any way).
    for (i=0; i<n; i++)
	{
	CkZipEntry *e = zip2.GetEntryByIndex(i);
	if (e)
	    {
	    printf("%s\n",e->fileName());
	    delete e;
	    }
	}
	
    // Now try to unzip.
    int numUnzipped = zip2.Unzip("badCharsDir");
    printf("numUnzipped = %d\n",numUnzipped);
    printf("%s\n",zip2.lastErrorText());
	
    // The Unzip method returns a value of -1, and this is what we find in the log:
    /*
    ChilkatLog:
      Unzip:
	DllDate: Dec  4 2007
	Username: Chilkat
	Component: Visual C++ 6.0
	UnzipDir: badCharsDir
	UnzipFailedFilename: badCharsDir\?abc?.txt
	UnzipFailedFilename: badCharsDir\*xyz.txt
	NumUnzipped: 0
	Not all files extracted successfully.
    */
	
    // Loop over the zip entries and rename each one using a scheme our app chooses.
    // Because the app picks the new names, it knows what each file will be called on disk.
    for (i=0; i<n; i++)
	{
	CkZipEntry *e = zip2.GetEntryByIndex(i);
	if (e)
	    {
	    CkString strFilename;
	    e->get_FileName(strFilename);
	    strFilename.replaceChar('?','Q');
	    strFilename.replaceChar('*','A');
	    e->put_FileName(strFilename.getString());
	
	    printf("%s\n",e->fileName());
	    delete e;
	    }
	}
	
    // Now try unzipping again. Everything works: the entries are extracted as QabcQ.txt and Axyz.txt.
    numUnzipped = zip2.Unzip("badCharsDir");
    printf("numUnzipped = %d\n",numUnzipped);
    printf("%s\n",zip2.lastErrorText());
	
    }
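The renaming loop above uses a simple ad hoc scheme ('?' becomes 'Q', '*' becomes 'A'). Archives like the one in the question can contain other illegal characters too, and two different entries could end up with the same new name. Below is a minimal, standalone sketch (plain C++, no Chilkat calls) of one way to compute the new names up front, under these assumptions: entry names use '/' as the directory separator; the illegal set is Windows' `< > : " \ | ? *` plus control characters; trailing dots and spaces are replaced because Windows strips them; reserved device names such as CON or NUL get a '_' prefix; and collisions are detected case-insensitively and resolved by inserting _2, _3, ... before the extension. The helper names `sanitizeEntryName` and `buildSafeNames` are hypothetical, not part of the Chilkat API.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Upper-case copy, used because Windows filenames compare case-insensitively.
static std::string toUpper(std::string s)
{
    for (char &ch : s)
        ch = (char)std::toupper((unsigned char)ch);
    return s;
}

// Characters Windows does not allow in a file or directory name.
// '/' is not listed: zip entry names use it as the directory separator,
// so each path component is cleaned separately.
static bool isIllegalWindowsChar(unsigned char c)
{
    return c < 32 || std::string("<>:\"\\|?*").find((char)c) != std::string::npos;
}

// Windows reserves these device names even with an extension ("CON.txt").
static bool isReservedDeviceName(const std::string &component)
{
    static const std::set<std::string> reserved = [] {
        std::set<std::string> s = {"CON", "PRN", "AUX", "NUL"};
        for (int i = 1; i <= 9; ++i)
        {
            s.insert("COM" + std::to_string(i));
            s.insert("LPT" + std::to_string(i));
        }
        return s;
    }();
    return reserved.count(toUpper(component.substr(0, component.find('.')))) != 0;
}

// Clean one path component: replace illegal characters, then trailing dots
// and spaces, then prefix reserved device names with '_'.
static std::string sanitizeComponent(std::string comp)
{
    for (char &ch : comp)
        if (isIllegalWindowsChar((unsigned char)ch))
            ch = '_';
    for (std::size_t i = comp.size(); i > 0 && (comp[i - 1] == '.' || comp[i - 1] == ' '); --i)
        comp[i - 1] = '_';
    if (isReservedDeviceName(comp))
        comp = "_" + comp;
    return comp;
}

// Sanitize a full entry name component by component. Empty components
// (leading '/', "a//b", trailing '/') are dropped.
std::string sanitizeEntryName(const std::string &entryName)
{
    std::string result;
    std::size_t start = 0;
    while (start <= entryName.size())
    {
        std::size_t slash = entryName.find('/', start);
        if (slash == std::string::npos)
            slash = entryName.size();
        std::string comp = entryName.substr(start, slash - start);
        if (!comp.empty())
        {
            if (!result.empty())
                result += '/';
            result += sanitizeComponent(comp);
        }
        start = slash + 1;
    }
    return result.empty() ? "_" : result;
}

// Return one Windows-safe name per input name, in the same order, so index i
// matches zip entry i. Names that would collide (case-insensitively) get
// "_2", "_3", ... inserted before the extension of the last component.
std::vector<std::string> buildSafeNames(const std::vector<std::string> &names)
{
    std::vector<std::string> safeNames;
    std::set<std::string> used;  // upper-cased names already handed out
    for (const std::string &name : names)
    {
        std::string safe = sanitizeEntryName(name);
        std::size_t lastSlash = safe.rfind('/');
        std::size_t compStart = (lastSlash == std::string::npos) ? 0 : lastSlash + 1;
        std::size_t dot = safe.rfind('.');
        if (dot == std::string::npos || dot <= compStart)
            dot = safe.size();  // no extension: the suffix goes at the end
        std::string candidate = safe;
        for (int n = 2; used.count(toUpper(candidate)) != 0; ++n)
            candidate = safe.substr(0, dot) + "_" + std::to_string(n) + safe.substr(dot);
        used.insert(toUpper(candidate));
        safeNames.push_back(candidate);
    }
    return safeNames;
}
```

In a program like the example above, the names could be collected with `fileName()` in the first loop, and each result passed to `put_FileName` for the entry at the same index in place of the `replaceChar` calls; since the program computed every name itself, it knows where each file lands. With this scheme, "?abc?.txt" becomes "_abc_.txt" and the "*.*" entry from the question becomes "_._". As a side effect, `..` path components become `__`, which also keeps such entries inside the unzip directory.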
	


Copyright 2000-2011 Chilkat Software, Inc. All rights reserved.