acefile 0.6 API Documentation

Read/test/extract ACE 1.0 and 2.0 archives in pure python.

This single-file, pure python 3, no-dependencies implementation is intended to be used as a library, but also provides a stand-alone unace utility. As a mostly pure-python implementation, it is significantly slower than native implementations, but more robust against vulnerabilities.

This implementation supports up to version 2.0 of the ACE archive format, including the EXE, DELTA, PIC and SOUND modes of ACE 2.0, password protected archives and multi-volume archives. It does not support writing to archives. It is an implementation from scratch, based on the 1998 document titled “Technical information of the archiver ACE v1.2” by Marcel Lemke, using unace 2.5 and WinAce 2.69 by Marcel Lemke as reference implementations.

For more information, API documentation, source code, packages and release notifications, refer to:

API

Typical use of acefile has the following structure:

import acefile
with acefile.open('example.ace') as f:
    # operations on AceArchive f
    for member in f:
        # operations on AceArchive f and each AceMember member
        pass

See acefile.AceArchive and acefile.AceMember for the complete descriptions of the methods supported by these two classes.

Functions

acefile.is_acefile(file, *, search=524288)

Return True iff file refers to an ACE archive by filename or seekable file-like object. If search is 0, the archive must start at position 0 in file, otherwise the first search bytes are searched for the magic bytes **ACE** that mark the ACE main header. For 1:1 compatibility with the official unace, 1024 sectors are searched by default, even though none of the SFX stubs that come with ACE compressors are that large.
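The search described above can be sketched as follows. This is a simplified illustration, not acefile's implementation. find_main_header is a hypothetical helper that works on an in-memory bytes object. It relies on the magic bytes sitting at offset +7 of the MAIN header (see File Structure). It returns the first candidate without validating the header checksum, which a robust reader would do before accepting a match.

```python
ACE_MAGIC = b'**ACE**'
MAGIC_OFFSET = 7  # the magic bytes sit at offset +7 within the MAIN header


def find_main_header(data: bytes, search: int = 524288) -> int:
    """Return the offset of the first candidate MAIN header, or -1.

    Hypothetical helper: it only looks for the magic bytes and does not
    verify the header checksum.
    """
    if search == 0:
        # The archive must start at position 0 of data.
        end = MAGIC_OFFSET + len(ACE_MAGIC)
        return 0 if data[MAGIC_OFFSET:end] == ACE_MAGIC else -1
    # Only consider the first `search` bytes; starting at MAGIC_OFFSET
    # guarantees the derived header offset is never negative.
    pos = data.find(ACE_MAGIC, MAGIC_OFFSET, search)
    return pos - MAGIC_OFFSET if pos != -1 else -1
```

With an SFX stub in front of the archive, the returned offset is the length of the stub.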

acefile.open(file, mode='r', *, search=524288)

Open archive from file, which is either a filename or seekable file-like object, and return an instance of AceArchive representing the opened archive that can function as a context manager. Only mode ‘r’ is implemented. If search is 0, the archive must start at position 0 in file, otherwise the first search bytes are searched for the magic bytes **ACE** that mark the ACE main header. For 1:1 compatibility with the official unace, 1024 sectors are searched by default, even though none of the SFX stubs that come with ACE compressors are that large.

Multi-volume archives are represented to the caller by a single AceArchive object; all operations transparently read into subsequent volumes as required. To load a multi-volume archive, either open the first volume of the series by filename, or pass as file a list or tuple containing all volumes (file-like objects or filenames) in the correct order.

AceArchive Class

class acefile.AceArchive(file, mode='r', *, search=524288)

Represents an ACE archive, possibly consisting of multiple volumes. AceArchive is not directly instantiated; instead, instances are returned by acefile.open().

When used as a context manager, AceArchive ensures that AceArchive.close() is called after the block. When used as an iterator, AceArchive yields instances of AceMember representing all archive members in order of appearance in the archive.

close()

Close the archive and all open files. No other methods may be called after having called AceArchive.close(), but calling AceArchive.close() multiple times is permitted.

dumpheaders(file=sys.stdout)

Dump all ACE file format headers in this archive and all its volumes to file.

extract(member, *, path=None, pwd=None, restore=False)

Extract an archive member to path or the current working directory. Member can refer to an AceMember object, a member name or an index into the archive member list. Password pwd is used to decrypt the archive member if it is encrypted. Raises EncryptedArchiveError if an archive member is encrypted but no password was provided. Iff restore is True, restore mtime and atime for non-dir members, file attributes and NT security information as far as supported by the platform.

Note

For solid archives, extracting members in a different order than they appear in the archive works, but is potentially very slow, because the decompressor needs to restart decompression at the beginning of the solid archive to restore internal decompressor state. For encrypted solid archives, out of order access may fail when archive members use different passwords.

extractall(*, path=None, members=None, pwd=None, restore=False)

Extract the members listed in members, or all members if members is None, from the archive to path or the current working directory. Members can contain AceMember objects, member names or indexes into the archive member list. Password pwd is used to decrypt encrypted archive members. To extract archives that use multiple different passwords for different archive members, you must use AceArchive.extract() instead. Raises EncryptedArchiveError if an archive member is encrypted but no password was provided. Iff restore is True, restore mtime and atime for non-dir members, file attributes and NT security information as far as supported by the platform.
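Following the advice above, an archive whose members use different passwords can be extracted member by member with AceArchive.extract(). This is a minimal sketch. It assumes a caller-supplied mapping from member filename to password, which is a hypothetical convention and not part of acefile. It iterates in archive order so that solid archives do not trigger decompression restarts.

```python
from typing import Mapping, Optional


def extract_with_passwords(archive, passwords: Mapping[str, str],
                           default: Optional[str] = None, path=None) -> None:
    """Extract each member in archive order, using a per-member password.

    `archive` is expected to behave like an AceArchive (iteration and
    extract()); `passwords` maps member filenames to passwords and is a
    hypothetical convention for this sketch.
    """
    for member in archive:  # archive order avoids solid-archive restarts
        # Only encrypted members get a password; fall back to `default`.
        pwd = passwords.get(member.filename, default) if member.is_enc() else None
        archive.extract(member, path=path, pwd=pwd)
```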

getmember(member)

Return an AceMember object corresponding to archive member member. Raise KeyError or IndexError if member is not found in archive. Member can refer to an AceMember object, a member name or an index into the archive member list. If member is a name and it occurs multiple times in the archive, then the last member with matching filename is returned.

getmembers()

Return a list of AceMember objects for the members of the archive. The objects are in the same order as they are in the archive. For simply iterating over the members of an archive, it is more concise and functionally equivalent to directly iterate over the AceArchive instance instead of over the list returned by AceArchive.getmembers().

getnames()

Return a list of the (file)names of all the members in the archive in the order they are in the archive.

is_locked()

Return True iff archive is locked for further modifications. Since this implementation does not support writing to archives, presence or absence of the flag in an archive does not change any behaviour of acefile.

is_multivolume()

Return True iff archive is a multi-volume archive as determined by the archive headers. When opening the last volume of a multi-volume archive, this returns True even though only a single volume was loaded.

is_solid()

Return True iff archive is a solid archive, i.e. iff the archive members are linked to each other by sharing the same LZ77 dictionary. Members of solid archives should always be read/tested/extracted in the order they appear in the archive in order to avoid costly decompression restarts from the beginning of the archive.

read(member, *, pwd=None)

Read the decompressed bytes of an archive member. Member can refer to an AceMember object, a member name or an index into the archive member list. Password pwd is used to decrypt the archive member if it is encrypted. Raises EncryptedArchiveError if the archive member is encrypted but no password was provided.

Note

For solid archives, reading members in a different order than they appear in the archive works, but is potentially very slow, because the decompressor needs to restart decompression at the beginning of the solid archive to restore internal decompressor state. For encrypted solid archives, out of order access may fail when archive members use different passwords.

Note

Using AceArchive.read() for large files is inefficient and may fail for very large files. Using AceArchive.readblocks() to write the data to disk in blocks ensures that large files can be handled efficiently.

readblocks(member, *, pwd=None)

Read the archive member by yielding blocks of decompressed bytes. Member can refer to an AceMember object, a member name or an index into the archive member list. Password pwd is used to decrypt the archive member if it is encrypted. Raises EncryptedArchiveError if the archive member is encrypted but no password was provided.

Note

For solid archives, reading members in a different order than they appear in the archive works, but is potentially very slow, because the decompressor needs to restart decompression at the beginning of the solid archive to restore internal decompressor state. For encrypted solid archives, out of order access may fail when archive members use different passwords.
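Blocks from AceArchive.readblocks() can be checked on the fly as they are written, so the member never has to be held in memory. This is a sketch. It assumes, as stated for AceMember.crc32, that ACE CRC-32 is the bitwise inverse of the standard CRC-32 computed by zlib.crc32(). write_blocks is a hypothetical helper, and the in-memory blocks stand in for real readblocks() output.

```python
import io
import zlib
from typing import BinaryIO, Iterable


def write_blocks(blocks: Iterable[bytes], out: BinaryIO) -> int:
    """Write blocks to `out` and return the ACE CRC-32 of all data written."""
    crc = 0
    for block in blocks:
        out.write(block)
        crc = zlib.crc32(block, crc)  # running standard CRC-32
    return crc ^ 0xFFFFFFFF  # ACE CRC-32 is the bitwise inverse


# Usage with in-memory blocks standing in for readblocks() output:
sink = io.BytesIO()
crc = write_blocks([b'1234', b'56789'], sink)
```

For a real member, the returned value would be compared with member.crc32.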

test(member, *, pwd=None)

Test an archive member. Returns False if any corruption was found, True if the header and decompression were okay. Member can refer to an AceMember object, a member name or an index into the archive member list. Password pwd is used to decrypt the archive member if it is encrypted. Raises EncryptedArchiveError if the archive member is encrypted but no password was provided.

Note

For solid archives, testing members in a different order than they appear in the archive works, but is potentially very slow, because the decompressor needs to restart decompression at the beginning of the solid archive to restore internal decompressor state. For encrypted solid archives, out of order access may fail when archive members use different passwords.

testall(*, pwd=None)

Test all the members in the archive. Returns the name of the first archive member with a failing header or content CRC, or None if all members were okay. Password pwd is used to decrypt encrypted archive members. To test archives that use multiple different passwords for different archive members, use AceArchive.test() instead. Raises EncryptedArchiveError if an archive member is encrypted but no password was provided.
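Because AceArchive.testall() stops at the first failure, a loop over AceArchive.test() can report every failing member instead. This is a minimal sketch with a hypothetical helper name. It uses a single password and skips directories, like the test example in the Examples section.

```python
def failing_members(archive, pwd=None) -> list:
    """Return the filenames of all members whose test fails, not just the first.

    `archive` is expected to behave like an AceArchive (iteration and test()).
    """
    failed = []
    for member in archive:  # archive order, as recommended for solid archives
        if member.is_dir():
            continue  # directories carry no data to check
        if not archive.test(member, pwd=pwd):
            failed.append(member.filename)
    return failed
```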

property advert

ACE archive advert string as str. Unregistered versions of ACE compressors communicate that they are unregistered by including an advert string of *UNREGISTERED VERSION* in archives they create. If absent, empty str.

property comment

ACE archive level comment as str. If absent, empty str.

property cversion

ACE creator version. This is equal to the major version of the ACE compressor used to create the archive, which equals the highest version of the ACE format supported by the ACE compressor which produced the archive.

property datetime

Archive timestamp as datetime.datetime object.

property eversion

ACE extractor version. This is the version of the ACE decompressor required to extract, which equals the version of the ACE format this archive is compliant with.

property filename

ACE archive filename. This is not a property of the archive but rather just the filename passed to acefile.open().

property platform

String describing the platform on which the ACE archive was created. This is derived from the host field in the archive header.

property volume

ACE archive volume number of the first volume of this ACE archive.

property volumes_loaded

Number of loaded volumes in this archive. When opening a subsequent volume of a multi-volume archive, this may be lower than the theoretical volume count.

AceMember Class

class acefile.AceMember

Represents a single archive member, potentially spanning multiple archive volumes. AceMember is not directly instantiated; instead, instances are returned by AceArchive.getmember() and AceArchive.getmembers().

is_dir()

True iff AceMember instance describes a directory.

is_enc()

True iff AceMember instance describes an encrypted archive member.

is_reg()

True iff AceMember instance describes a regular file.

property attribs

DOS/Windows file attribute bit field, as int, as produced by the Windows GetFileAttributes() API.

property comment

File-level comment, as str. If absent, empty str.

property compqual

Compression quality used; one of QUAL_NONE, QUAL_FASTEST, QUAL_FAST, QUAL_NORMAL, QUAL_GOOD or QUAL_BEST.

property comptype

Compression type used; one of COMP_STORED, COMP_LZ77 or COMP_BLOCKED.

property crc32

ACE CRC-32 checksum of decompressed data as recorded in the archive, as int. ACE CRC-32 is the bitwise inverse of standard CRC-32.

property datetime

Timestamp as recorded in the archive, as datetime.datetime instance.

property dicsize

LZ77 dictionary size required for extraction of this archive member in literal symbols, ranging from 1K to 4M.

property dicsizebits

LZ77 dictionary size bit length, i.e. the base-two logarithm of the dictionary size required for extraction of this archive member.
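The two properties are related by dicsize = 2 ** dicsizebits, so the documented range of 1K to 4M corresponds to bit lengths 10 to 22. The following sketch of that relationship is illustrative, not acefile code. Rejecting values outside the documented range is an assumption of the sketch.

```python
MIN_DICSIZEBITS = 10  # 1K = 2**10, lower end of the documented range
MAX_DICSIZEBITS = 22  # 4M = 2**22, upper end of the documented range


def dicsize_from_bits(dicsizebits: int) -> int:
    """Return the dictionary size in literal symbols for a given bit length."""
    # Assumption: values outside the documented range are treated as errors.
    if not MIN_DICSIZEBITS <= dicsizebits <= MAX_DICSIZEBITS:
        raise ValueError("dictionary size outside the documented 1K..4M range")
    return 1 << dicsizebits
```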

property filename

Sanitized filename, as str, safe for use with file operations on the current platform.

property ntsecurity

NT security descriptor as bytes, describing the owner, primary group and discretionary access control list (DACL) of the archive member, as produced by the Windows GetFileSecurity() API with the OWNER_SECURITY_INFORMATION, GROUP_SECURITY_INFORMATION and DACL_SECURITY_INFORMATION flags set. If absent, empty bytes.

property packsize

Size before decompression (packed size).

property raw_filename

Raw, unsanitized filename, as bytes, not safe for use with file operations and possibly using path syntax from other platforms.

property size

Size after decompression (original size).

Constants

acefile.COMP_STORED

The compression type constant for no compression.

acefile.COMP_LZ77

The compression type constant for ACE 1.0 LZ77 mode.

acefile.COMP_BLOCKED

The compression type constant for ACE 2.0 blocked mode.

acefile.QUAL_NONE

The compression quality constant for no compression.

acefile.QUAL_FASTEST

The compression quality constant for fastest compression.

acefile.QUAL_FAST

The compression quality constant for fast compression.

acefile.QUAL_NORMAL

The compression quality constant for normal compression.

acefile.QUAL_GOOD

The compression quality constant for good compression.

acefile.QUAL_BEST

The compression quality constant for best compression.

Exceptions

exception acefile.AceError

Base class for all acefile exceptions.

exception acefile.CorruptedArchiveError

Bases: AceError

Archive is corrupted. Either a header or data CRC check failed, an invalid value was read from the archive or the archive is truncated.

exception acefile.EncryptedArchiveError

Bases: AceError

Archive member is encrypted but either no password was provided, or decompression failed with the given password. Also raised when processing an encrypted solid archive member out of order, when any previous archive member uses a different password than the archive member currently being accessed.

Note

Due to the lack of a password verifier in the ACE file format, there is no straightforward way to distinguish a wrong password from a corrupted archive. If the CRC check of an encrypted archive member fails or a CorruptedArchiveError is encountered during decompression, it is assumed that the password was wrong and as a consequence, EncryptedArchiveError is raised.

exception acefile.MainHeaderNotFoundError

Bases: AceError

The main ACE header marked by the magic bytes **ACE** could not be found. Either the search argument was too small or the archive is not an ACE format archive.

exception acefile.MultiVolumeArchiveError

Bases: AceError

A multi-volume archive was expected but a normal archive was found, or mismatching volumes were provided, or while reading a member from a multi-volume archive, the member headers indicate that the member continues in the next volume, but no next volume was found or provided.

exception acefile.UnknownCompressionMethodError

Bases: AceError

Data was compressed using an unknown compression method and therefore cannot be decompressed using this implementation. This should not happen for ACE 1.0 or ACE 2.0 archives since this implementation implements all existing compression methods.

Examples

Extract all files in the archive, with directories, to current working dir:

import acefile
with acefile.open('example.ace') as f:
    f.extractall()

Walk all files in the archive and test each one of them:

import acefile
with acefile.open('example.ace') as f:
    for member in f:
        if member.is_dir():
            continue
        if f.test(member):
            print("CRC OK:     %s" % member.filename)
        else:
            print("CRC FAIL:   %s" % member.filename)

In-memory decompression of a specific archive member:

import acefile
import io

filelike = io.BytesIO(b'\x73\x83\x31\x00\x00\x00\x90**ACE**\x14\x14' ...)
with acefile.open(filelike) as f:
    data = f.read('example.txt')

Handle archives potentially containing large members in chunks to avoid fully reading them into memory:

import acefile

with acefile.open('large.ace') as fi:
    with open('large.iso', 'wb') as fo:
        for block in fi.readblocks('large.iso'):
            fo.write(block)

ACE File Format

Due to the lack of documentation on the ACE file format, a high-level overview of the ACE file format is given here, in the hope that it is useful.

File Structure

ACE archives are a series of headers and optional associated data. The first header is called MAIN header; it contains the magic bytes **ACE** at offset +7 and describes the archive volume. Subsequent headers are either FILE or RECOVERY headers. FILE headers describe archive members and precede the compressed data bytes, while RECOVERY headers contain error correction data. Originally, in ACE 1.0, all headers used 32 bit length fields. With ACE 2.0, alternative 64 bit versions of these headers were introduced to support files larger than 2 GB.

In multi-volume archives, each volume begins with a MAIN header that carries a volume number. When archive members span multiple volumes, each segment has its own FILE header. The first volume has the filename extension *.ACE, subsequent archive volumes use *.C00 to *.C99.
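The naming scheme can be sketched as a small helper that maps a volume index to a filename. volume_filename is hypothetical. Mirroring the case of the first volume's extension is an assumption, and names beyond *.C99 are not covered.

```python
import os.path


def volume_filename(first_volume: str, index: int) -> str:
    """Return the filename of volume `index` (0 is the *.ACE volume)."""
    if not 0 <= index <= 100:
        raise ValueError("only *.ACE and *.C00 to *.C99 are covered")
    if index == 0:
        return first_volume
    base, ext = os.path.splitext(first_volume)
    # Assumption: mirror the case of the first volume's extension.
    letter = 'C' if ext.isupper() else 'c'
    return '%s.%s%02d' % (base, letter, index - 1)
```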

Archives can have a main comment and each archive member can have a file comment. Additionally, archives can have an advert string, which is used by unregistered versions of the ACE compressor to signal that the archive was created using an unregistered version by setting it to *UNREGISTERED VERSION*.

Integrity Checks

Each header contains a 16 bit checksum over the header bytes. Each archive member has a 32 bit checksum over the decompressed bytes. ACE uses a bitwise inverted version of standard CRC-32 with polynomial 0x04C11DB7 as the 32 bit checksum, and a truncated version of that for the 16 bit checksum.
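The following sketch expresses the two checksums using Python's zlib module, which computes standard CRC-32. Inverting the final value is equivalent to omitting CRC-32's final XOR. This sketch assumes that "truncated" means keeping the low 16 bits, and it does not model the exact byte range each header checksum covers.

```python
import zlib


def ace_crc32(data: bytes) -> int:
    """ACE CRC-32: the bitwise inverse of standard (zlib) CRC-32."""
    return zlib.crc32(data) ^ 0xFFFFFFFF


def ace_crc16(data: bytes) -> int:
    """ACE 16 bit header checksum, assumed to be the low 16 bits of ACE CRC-32."""
    return ace_crc32(data) & 0xFFFF
```

For the common check input b'123456789', standard CRC-32 is 0xCBF43926, so ace_crc32() yields 0x340BC6D9.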

Compression Methods

Archive members are compressed using one of the following methods, as indicated in their FILE header:

stored

Data is stored as-is without any compression applied.

LZ77

ACE 1.0 plain LZ77 compression over a Huffman coded symbol stream, with configurable dictionary size of 1K..4M literals.

blocked

ACE 2.0 blocked mode compresses data in separate blocks, each block using one of the following submodes with different lossless compression techniques.

LZ77

Plain LZ77 over a Huffman coded symbol stream, with configurable dictionary size of 1K..4M literals.

EXE

LZ77 over Huffman with a preprocessor that converts the relative target addresses of x86 relative JMP and CALL instructions to absolute addresses before LZ77 compression in order to achieve a higher LZ77 compression ratio for executables.

DELTA

LZ77 over Huffman with a preprocessor that rearranges chunks of data and calculates differences between byte values, resulting in a higher LZ77 compression ratio for some inputs.

SOUND

Multi-channel audio predictor over Huffman coding, resulting in a higher compression ratio for uncompressed mono/stereo 8/16 bit sound data.

PIC

Two-dimensional pixel value predictor over Huffman coding, resulting in a higher compression ratio for uncompressed picture data.
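The EXE and DELTA preprocessors described above can be illustrated with generic filters of the same kind. This is a sketch, not a compatible implementation. The x86 filter is the common E8/E9 (CALL/JMP rel32) transform used by several compressors. ACE's exact variant may differ, for example in address base or in limits on which operands it converts. The delta filter models only the byte differences, not DELTA's chunk rearrangement, and its distance parameter is an assumption.

```python
def x86_rel_to_abs(data: bytes, base: int = 0) -> bytes:
    """Rewrite rel32 operands of E8 (CALL) / E9 (JMP) to absolute targets."""
    buf = bytearray(data)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] in (0xE8, 0xE9):
            rel = int.from_bytes(buf[i + 1:i + 5], 'little', signed=True)
            # The target is relative to the address of the next instruction.
            target = (base + i + 5 + rel) & 0xFFFFFFFF
            buf[i + 1:i + 5] = target.to_bytes(4, 'little')
            i += 5  # skip the operand that was just rewritten
        else:
            i += 1
    return bytes(buf)


def x86_abs_to_rel(data: bytes, base: int = 0) -> bytes:
    """Inverse transform, as a decompressor would apply after LZ77."""
    buf = bytearray(data)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] in (0xE8, 0xE9):
            target = int.from_bytes(buf[i + 1:i + 5], 'little')
            rel = (target - (base + i + 5)) & 0xFFFFFFFF
            buf[i + 1:i + 5] = rel.to_bytes(4, 'little')
            i += 5
        else:
            i += 1
    return bytes(buf)


def delta_encode(data: bytes, distance: int = 1) -> bytes:
    """Replace each byte by its difference (mod 256) to the byte `distance` back."""
    out = bytearray(len(data))
    for i, b in enumerate(data):
        prev = data[i - distance] if i >= distance else 0
        out[i] = (b - prev) & 0xFF
    return bytes(out)


def delta_decode(data: bytes, distance: int = 1) -> bytes:
    """Inverse of delta_encode: a running sum (mod 256) with the same distance."""
    out = bytearray(len(data))
    for i, d in enumerate(data):
        prev = out[i - distance] if i >= distance else 0
        out[i] = (d + prev) & 0xFF
    return bytes(out)
```

After the x86 transform, repeated calls to the same function carry identical operand bytes. After the delta transform, slowly varying data becomes many small, repeated values. Both effects give LZ77 more matches to exploit.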

Blocks are of variable length. Mode switch instructions are encoded into each mode’s Huffman symbol stream as a mode switch symbol followed by the target mode identifier and parameters.

Solid archives use a single dictionary for the whole archive, while non-solid archives use a separate dictionary per archive member. The compression quality parameter, ranging from 0 (none) to 5 (best), influences how much CPU time the compressor spends searching for an optimal compression; it has no influence on decompression.

Comments are compressed using LZP over a Huffman coded symbol stream. Advert strings and other header information are stored uncompressed.

Encryption

Optional encryption is applied to the compressed data stream after compression. The user-supplied password of up to 50 characters is transformed into a 160 bit Blowfish encryption key using a single application of SHA-1 with non-standard block padding. Blowfish is applied in CBC mode with a constant zero IV to each archive member separately (a cryptographic design flaw). Each archive member can have a different password, but in practice most encrypted archives use a single password for all members. There is no password verifier in the file format; the only way to verify a password is to decrypt and decompress the archive member and check the CRC.
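A few lines of code show why a constant zero IV is a flaw. With the same password, members whose compressed streams begin with identical bytes produce identical leading ciphertext blocks. To keep the sketch dependency-free, Blowfish is replaced by a toy XOR placeholder, which is not a real cipher. Only the CBC chaining and the 8 byte block size are meant to be representative.

```python
BLOCK = 8  # Blowfish block size in bytes


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Placeholder for Blowfish: XOR with the key. Insecure, illustration only."""
    return bytes(b ^ k for b, k in zip(block, key))


def cbc_encrypt(key: bytes, plaintext: bytes, iv: bytes = bytes(BLOCK)) -> bytes:
    """CBC mode: each plaintext block is XORed with the previous ciphertext block."""
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be a multiple of the block size")
    prev, out = iv, bytearray()
    for i in range(0, len(plaintext), BLOCK):
        block = bytes(p ^ c for p, c in zip(plaintext[i:i + BLOCK], prev))
        prev = toy_encrypt_block(key, block)
        out += prev
    return bytes(out)
```

With the default all-zero IV, two members starting with the same 8 bytes share their first ciphertext block. A per-member random IV would hide this.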