acefile 0.6 API Documentation
Read/test/extract ACE 1.0 and 2.0 archives in pure python.
This single-file, pure python 3, no-dependencies implementation is intended to be used as a library, but also provides a stand-alone unace utility. As a mostly pure-python implementation, it is significantly slower than native implementations, but more robust against vulnerabilities.
This implementation supports up to version 2.0 of the ACE archive format, including the EXE, DELTA, PIC and SOUND modes of ACE 2.0, password protected archives and multi-volume archives. It does not support writing to archives. It is an implementation from scratch, based on the 1998 document titled “Technical information of the archiver ACE v1.2” by Marcel Lemke, using unace 2.5 and WinAce 2.69 by Marcel Lemke as reference implementations.
For more information, API documentation, source code, packages and release notifications, refer to:
- https://www.roe.ch/acefile
- https://apidoc.roe.ch/acefile
- https://github.com/droe/acefile
- https://pypi.python.org/pypi/acefile
- https://twitter.com/droethlisberger
API
Typical use of acefile has the following structure:
import acefile
with acefile.open('example.ace') as f:
    # operations on AceArchive f
    for member in f:
        # operations on AceArchive f and each AceMember member
        pass
See acefile.AceArchive and acefile.AceMember for the complete descriptions of the methods and attributes supported by these two classes.
Functions
- acefile.is_acefile(file, *, search=524288)
  Return True iff file refers to an ACE archive, given either a filename or a seekable file-like object. If search is 0, the archive must start at position 0 in file; otherwise, the first search bytes are searched for the magic bytes **ACE** that mark the ACE main header. For 1:1 compatibility with the official unace, 1024 sectors (524288 bytes) are searched by default, even though none of the SFX stubs that come with ACE compressors are that large.
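The search behaviour described above can be illustrated with a small sketch. The helper `find_main_header` and the constants below are hypothetical, not part of acefile's API. The sketch assumes that the magic sits 7 bytes into the MAIN header (as stated under File Structure) and treats search as the number of candidate header start offsets; unlike a real implementation, it does not validate the header CRC at each candidate.

```python
# Hypothetical sketch, not acefile's implementation.
MAGIC = b"**ACE**"
MAGIC_OFFSET = 7  # the magic starts 7 bytes into the MAIN header


def find_main_header(data: bytes, search: int = 524288) -> int:
    """Return the offset of the MAIN header in data, or -1 if not found."""
    if search == 0:
        # The archive must start at position 0, so the magic must sit at +7.
        return 0 if data.startswith(MAGIC, MAGIC_OFFSET) else -1
    # A header starting at s in [0, search) puts the magic at [s + 7, s + 14).
    end = search + MAGIC_OFFSET + len(MAGIC) - 1
    pos = data.find(MAGIC, MAGIC_OFFSET, end)
    return -1 if pos == -1 else pos - MAGIC_OFFSET
```

With a 100-byte SFX stub in front of the archive, the default search finds the header at offset 100, while search=0 rejects the file.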
- acefile.open(file, mode='r', *, search=524288)
  Open the archive from file, which is either a filename or a seekable file-like object, and return an AceArchive instance representing the opened archive; the instance can be used as a context manager. Only mode 'r' is implemented. If search is 0, the archive must start at position 0 in file; otherwise, the first search bytes are searched for the magic bytes **ACE** that mark the ACE main header. For 1:1 compatibility with the official unace, 1024 sectors (524288 bytes) are searched by default, even though none of the SFX stubs that come with ACE compressors are that large.
  Multi-volume archives are represented to the caller by a single AceArchive object; all operations transparently read into subsequent volumes as required. To load a multi-volume archive, either open the first volume of the series by filename, or pass a list or tuple of all file-like objects or filenames, in the correct order, as file.
AceArchive Class
- class acefile.AceArchive(file, mode='r', *, search=524288)
  Represents an ACE archive, possibly consisting of multiple volumes. AceArchive is not instantiated directly; instead, instances are returned by acefile.open().
  When used as a context manager, AceArchive ensures that AceArchive.close() is called after the block. When used as an iterator, AceArchive yields AceMember instances representing all archive members in order of appearance in the archive.
- close()
  Close the archive and all open files. No other methods may be called after calling AceArchive.close(), but calling AceArchive.close() multiple times is permitted.
- dumpheaders(file=sys.stdout)
  Dump all ACE file format headers in this archive and all its volumes to file.
- extract(member, *, path=None, pwd=None, restore=False)
  Extract an archive member to path or, if path is None, to the current working directory. Member can refer to an AceMember object, a member name or an index into the archive member list. Password pwd is used to decrypt the archive member if it is encrypted. Raises EncryptedArchiveError if the archive member is encrypted but no password was provided. Iff restore is True, restore mtime and atime for non-dir members, file attributes and NT security information as far as supported by the platform.
  Note: For solid archives, extracting members in a different order than they appear in the archive works, but is potentially very slow, because the decompressor needs to restart decompression at the beginning of the solid archive to restore internal decompressor state. For encrypted solid archives, out of order access may fail when archive members use different passwords.
- extractall(*, path=None, members=None, pwd=None, restore=False)
  Extract members, or all members if members is None, from the archive to path or the current working directory. Members can contain AceMember objects, member names or indexes into the archive member list. Password pwd is used to decrypt encrypted archive members. To extract archives that use multiple different passwords for different archive members, you must use AceArchive.extract() instead. Raises EncryptedArchiveError if an archive member is encrypted but no password was provided. Iff restore is True, restore mtime and atime for non-dir members, file attributes and NT security information as far as supported by the platform.
- getmember(member)
  Return an AceMember object corresponding to archive member member. Raises KeyError or IndexError if member is not found in the archive. Member can refer to an AceMember object, a member name or an index into the archive member list. If member is a name and it occurs multiple times in the archive, the last member with a matching filename is returned.
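The lookup rules above, in particular "last match wins" for duplicate names, can be modelled over a plain list of names. This is a hypothetical sketch of the described semantics, not acefile's code; `resolve_index` is an illustrative name, and AceMember objects are left out for brevity.

```python
def resolve_index(names, member):
    """Return the list index that member (an int or a name) refers to.

    Hypothetical model of the documented getmember() lookup rules.
    """
    if isinstance(member, int):
        # range() indexing normalizes negative indexes and raises IndexError.
        return range(len(names))[member]
    # Building the dict in archive order lets later duplicates overwrite
    # earlier ones, so the last matching name wins; a miss raises KeyError.
    last_index = {name: i for i, name in enumerate(names)}
    return last_index[member]
```

For names ["a.txt", "dir", "a.txt", "b.txt"], looking up "a.txt" yields index 2, not 0.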
- getmembers()
  Return a list of AceMember objects for the members of the archive, in the same order as they appear in the archive. For simply iterating over the members of an archive, iterating directly over the AceArchive instance is more concise and functionally equivalent to iterating over the list returned by AceArchive.getmembers().
- getnames()
  Return a list of the (file)names of all the members in the archive, in the order they appear in the archive.
- is_locked()
  Return True iff the archive is locked against further modifications. Since this implementation does not support writing to archives, the presence or absence of this flag does not change any behaviour of acefile.
- is_multivolume()
  Return True iff the archive is a multi-volume archive as determined by the archive headers. When opening the last volume of a multi-volume archive, this returns True even though only a single volume was loaded.
- is_solid()
  Return True iff the archive is a solid archive, i.e. iff the archive members are linked to each other by sharing the same LZ77 dictionary. Members of solid archives should always be read/tested/extracted in the order they appear in the archive in order to avoid costly decompression restarts from the beginning of the archive.
- read(member, *, pwd=None)
  Read the decompressed bytes of an archive member. Member can refer to an AceMember object, a member name or an index into the archive member list. Password pwd is used to decrypt the archive member if it is encrypted. Raises EncryptedArchiveError if the archive member is encrypted but no password was provided.
  Note: For solid archives, reading members in a different order than they appear in the archive works, but is potentially very slow, because the decompressor needs to restart decompression at the beginning of the solid archive to restore internal decompressor state. For encrypted solid archives, out of order access may fail when archive members use different passwords.
  Note: Using AceArchive.read() for large files is inefficient and may fail for very large files. Using AceArchive.readblocks() to write the data to disk in blocks ensures that large files can be handled efficiently.
- readblocks(member, *, pwd=None)
  Read the archive member by yielding blocks of decompressed bytes. Member can refer to an AceMember object, a member name or an index into the archive member list. Password pwd is used to decrypt the archive member if it is encrypted. Raises EncryptedArchiveError if the archive member is encrypted but no password was provided.
  Note: For solid archives, reading members in a different order than they appear in the archive works, but is potentially very slow, because the decompressor needs to restart decompression at the beginning of the solid archive to restore internal decompressor state. For encrypted solid archives, out of order access may fail when archive members use different passwords.
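Some APIs expect a file object rather than an iterator of blocks. The sketch below wraps any iterator of bytes blocks, such as the one returned by readblocks(), in a read-only binary stream. `BlockStream` is a hypothetical adapter built on the standard io module; it is not part of acefile.

```python
import io
from typing import Iterable, Iterator


class BlockStream(io.RawIOBase):
    """Hypothetical adapter exposing an iterator of byte blocks as a file."""

    def __init__(self, blocks: Iterable[bytes]):
        super().__init__()
        self._blocks: Iterator[bytes] = iter(blocks)
        self._pending = b""  # unread remainder of the current block

    def readable(self) -> bool:
        return True

    def readinto(self, buf) -> int:
        # Pull blocks until data is available or the iterator is exhausted;
        # empty blocks are skipped.
        while not self._pending:
            try:
                self._pending = next(self._blocks)
            except StopIteration:
                return 0  # EOF
        n = min(len(buf), len(self._pending))
        buf[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n
```

Wrapped as io.BufferedReader(BlockStream(f.readblocks('large.iso'))), the result should be usable with functions such as shutil.copyfileobj() while only ever holding one block in memory.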
- test(member, *, pwd=None)
  Test an archive member. Returns False if any corruption was found, True if the header and decompression were okay. Member can refer to an AceMember object, a member name or an index into the archive member list. Password pwd is used to decrypt the archive member if it is encrypted. Raises EncryptedArchiveError if the archive member is encrypted but no password was provided.
  Note: For solid archives, testing members in a different order than they appear in the archive works, but is potentially very slow, because the decompressor needs to restart decompression at the beginning of the solid archive to restore internal decompressor state. For encrypted solid archives, out of order access may fail when archive members use different passwords.
- testall(*, pwd=None)
  Test all the members in the archive. Returns the name of the first archive member with a failing header or content CRC, or None if all members were okay. Password pwd is used to decrypt encrypted archive members. To test archives that use multiple different passwords for different archive members, use AceArchive.test() instead. Raises EncryptedArchiveError if an archive member is encrypted but no password was provided.
- advert
  ACE archive advert string as str. Unregistered versions of ACE compressors signal that they are unregistered by including the advert string *UNREGISTERED VERSION* in archives they create. If absent, empty str.
- comment
  ACE archive level comment as str. If absent, empty str.
- cversion
  ACE creator version: the major version of the ACE compressor that created the archive, which equals the highest ACE format version supported by that compressor.
- datetime
  Archive timestamp as datetime.datetime object.
- eversion
  ACE extractor version: the version of the ACE decompressor required for extraction, which equals the version of the ACE format this archive complies with.
- filename
  ACE archive filename. This is not a property of the archive but rather just the filename passed to acefile.open().
- platform
  String describing the platform on which the ACE archive was created. This is derived from the host field in the archive header.
- volume
  ACE archive volume number of the first volume of this ACE archive.
- volumes_loaded
  Number of volumes loaded for this archive. When opening a subsequent volume of a multi-volume archive, this may be lower than the theoretical volume count.
AceMember Class
- class acefile.AceMember
  Represents a single archive member, potentially spanning multiple archive volumes. AceMember is not instantiated directly; instead, instances are returned by AceArchive.getmember() and AceArchive.getmembers().
- attribs
  DOS/Windows file attribute bit field, as int, as produced by the Windows GetFileAttributes() API.
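Because attribs is a raw bit field, decoding it means testing individual bits. This sketch uses the FILE_ATTRIBUTE_* constants from Python's standard stat module; the helper `describe_attribs` and its choice of which common bits to report are illustrative assumptions, not part of acefile.

```python
import stat

# Common DOS/Windows attribute bits, as defined in Python's stat module.
_ATTRIB_NAMES = (
    ("READONLY", stat.FILE_ATTRIBUTE_READONLY),
    ("HIDDEN", stat.FILE_ATTRIBUTE_HIDDEN),
    ("SYSTEM", stat.FILE_ATTRIBUTE_SYSTEM),
    ("DIRECTORY", stat.FILE_ATTRIBUTE_DIRECTORY),
    ("ARCHIVE", stat.FILE_ATTRIBUTE_ARCHIVE),
)


def describe_attribs(attribs: int) -> list:
    """Return the names of the common attribute bits set in attribs."""
    return [name for name, bit in _ATTRIB_NAMES if attribs & bit]
```

For example, describe_attribs(member.attribs) for a read-only file with the archive bit set (0x21) should list READONLY and ARCHIVE.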
- comment
  File-level comment, as str. If absent, empty str.
- compqual
  Compression quality used; one of QUAL_NONE, QUAL_FASTEST, QUAL_FAST, QUAL_NORMAL, QUAL_GOOD or QUAL_BEST.
- comptype
  Compression type used; one of COMP_STORED, COMP_LZ77 or COMP_BLOCKED.
- crc32
  ACE CRC-32 checksum of the decompressed data as recorded in the archive, as int. ACE CRC-32 is the bitwise inverse of standard CRC-32.
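Since ACE CRC-32 is the bitwise inverse of standard CRC-32, it can be computed with zlib.crc32 from the standard library, either over a whole buffer or incrementally over blocks. The function names below are illustrative, not acefile API.

```python
import zlib


def ace_crc32(data: bytes) -> int:
    """ACE CRC-32: the bitwise inverse of standard (zlib) CRC-32."""
    return zlib.crc32(data) ^ 0xFFFFFFFF


def ace_crc32_blocks(blocks) -> int:
    """Same checksum, computed incrementally over an iterable of blocks."""
    crc = 0
    for block in blocks:
        crc = zlib.crc32(block, crc)  # running standard CRC-32
    return crc ^ 0xFFFFFFFF
```

For an intact member, ace_crc32_blocks(f.readblocks(member)) would be expected to equal member.crc32; acefile already performs this check itself in AceArchive.test().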
- datetime
  Timestamp as recorded in the archive, as datetime.datetime instance.
- dicsize
  LZ77 dictionary size, in literal symbols, required for extraction of this archive member, ranging from 1K to 4M.
- dicsizebits
  LZ77 dictionary size bit length, i.e. the base-two logarithm of the dictionary size required for extraction of this archive member.
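The relation between dicsize and dicsizebits is dicsize = 2**dicsizebits, so the documented range of 1K to 4M literals corresponds to 10 to 22 bits. The helpers below are hypothetical; the range check is an assumption derived from that documented range.

```python
def dicsize_from_bits(bits: int) -> int:
    """Dictionary size in literal symbols for a dicsizebits value."""
    # Assumed valid range: 1K (2**10) to 4M (2**22) literals.
    if not 10 <= bits <= 22:
        raise ValueError(f"dicsizebits {bits} outside 10..22")
    return 1 << bits


def dicsizebits_from_size(size: int) -> int:
    """Inverse mapping; size must be a positive power of two."""
    if size <= 0 or size & (size - 1):
        raise ValueError(f"{size} is not a power of two")
    return size.bit_length() - 1
```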
- filename
  Sanitized filename, as str, safe for use with file operations on the current platform.
- ntsecurity
  NT security descriptor as bytes, describing the owner, primary group and discretionary access control list (DACL) of the archive member, as produced by the Windows GetFileSecurity() API with the OWNER_SECURITY_INFORMATION, GROUP_SECURITY_INFORMATION and DACL_SECURITY_INFORMATION flags set. If absent, empty bytes.
- packsize
  Size before decompression (packed size).
- raw_filename
  Raw, unsanitized filename, as bytes, not safe for use with file operations and possibly using path syntax from other platforms.
- size
  Size after decompression (original size).
Constants
- acefile.COMP_STORED
  The compression type constant for no compression.
- acefile.COMP_LZ77
  The compression type constant for ACE 1.0 LZ77 mode.
- acefile.COMP_BLOCKED
  The compression type constant for ACE 2.0 blocked mode.
- acefile.QUAL_NONE
  The compression quality constant for no compression.
- acefile.QUAL_FASTEST
  The compression quality constant for fastest compression.
- acefile.QUAL_FAST
  The compression quality constant for fast compression.
- acefile.QUAL_NORMAL
  The compression quality constant for normal compression.
- acefile.QUAL_GOOD
  The compression quality constant for good compression.
- acefile.QUAL_BEST
  The compression quality constant for best compression.
Exceptions
- exception acefile.CorruptedArchiveError
  Bases: acefile.AceError
  Archive is corrupted. Either a header or data CRC check failed, an invalid value was read from the archive, or the archive is truncated.
- exception acefile.EncryptedArchiveError
  Bases: acefile.AceError
  Archive member is encrypted but either no password was provided or decompression failed with the given password. Also raised when processing an encrypted solid archive member out of order, when any previous archive member uses a different password than the archive member currently being accessed.
  Note: Due to the lack of a password verifier in the ACE file format, there is no straightforward way to distinguish a wrong password from a corrupted archive. If the CRC check of an encrypted archive member fails or a CorruptedArchiveError is encountered during decompression, the password is assumed to be wrong and, as a consequence, EncryptedArchiveError is raised.
- exception acefile.MainHeaderNotFoundError
  Bases: acefile.AceError
  The main ACE header marked by the magic bytes **ACE** could not be found. Either the search argument was too small or the archive is not an ACE format archive.
- exception acefile.MultiVolumeArchiveError
  Bases: acefile.AceError
  A multi-volume archive was expected but a normal archive was found; or mismatching volumes were provided; or, while reading a member from a multi-volume archive, the member headers indicate that the member continues in the next volume, but no next volume was found or provided.
- exception acefile.UnknownCompressionMethodError
  Bases: acefile.AceError
  Data was compressed using an unknown compression method and therefore cannot be decompressed using this implementation. This should not happen for ACE 1.0 or ACE 2.0 archives, since this implementation implements all existing compression methods.
Examples
Extract all files in the archive, with directories, to the current working directory:
import acefile
with acefile.open('example.ace') as f:
    f.extractall()
Walk all files in the archive and test each one of them:
import acefile
with acefile.open('example.ace') as f:
    for member in f:
        if member.is_dir():
            continue
        if f.test(member):
            print("CRC OK: %s" % member.filename)
        else:
            print("CRC FAIL: %s" % member.filename)
In-memory decompression of a specific archive member:
import acefile
import io
filelike = io.BytesIO(b'\x73\x83\x31\x00\x00\x00\x90**ACE**\x14\x14' ...)
with acefile.open(filelike) as f:
    data = f.read('example.txt')
Handle archives potentially containing large members in chunks to avoid fully reading them into memory:
import acefile
with acefile.open('large.ace') as fi:
    with open('large.iso', 'wb') as fo:
        for block in fi.readblocks('large.iso'):
            fo.write(block)
ACE File Format
Due to the lack of documentation on the ACE file format, a high-level overview of the ACE file format is given here, in the hope that it is useful.
File Structure
ACE archives are a series of headers and optional associated data. The first header is called the MAIN header; it contains the magic bytes **ACE** at offset +7 and describes the archive volume. Subsequent headers are either FILE or RECOVERY headers. FILE headers describe archive members and precede the compressed data bytes, while RECOVERY headers contain error correction data. Originally, in ACE 1.0, all headers used 32 bit length fields. With ACE 2.0, alternative 64 bit versions of these headers were introduced to support files larger than 2 GB.
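The first 16 bytes of the MAIN header shown in the in-memory example can be decoded with the standard struct module. The field layout below (16 bit header CRC, 16 bit header size, 8 bit header type, 16 bit flags, the 7 magic bytes, then extractor and creator version bytes, all little-endian) is an assumption consistent with the magic sitting at offset +7; `parse_main_prefix` is a hypothetical helper, not acefile API.

```python
import struct
from typing import NamedTuple


class MainHeaderPrefix(NamedTuple):
    crc16: int
    size: int
    head_type: int
    flags: int
    eversion: int
    cversion: int


# Assumed layout: crc16, size, type, flags, magic (7 bytes), eversion, cversion.
_PREFIX = struct.Struct("<HHBH7sBB")  # 16 bytes, no padding with "<"


def parse_main_prefix(data: bytes) -> MainHeaderPrefix:
    """Decode the fixed-size start of a MAIN header (hypothetical sketch)."""
    crc16, size, htype, flags, magic, ever, cver = _PREFIX.unpack_from(data, 0)
    if magic != b"**ACE**":
        raise ValueError("no ACE magic at offset +7")
    return MainHeaderPrefix(crc16, size, htype, flags, ever, cver)
```

Applied to the example bytes, this reports a header type byte of 0, a header size of 49 and version bytes of 20 (0x14).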
In multi-volume archives, each volume begins with a MAIN header that carries a volume number. When archive members span multiple volumes, each segment has its own FILE header. The first volume has the filename extension *.ACE; subsequent archive volumes use *.C00 to *.C99.
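The volume naming scheme maps volume n (counting the first volume as 0) to the extension .C followed by n - 1 as two digits. The helper below is a hypothetical sketch of that mapping; preserving the case of the first volume's extension is an assumption, not something the format description specifies.

```python
import os


def volume_filename(first: str, n: int) -> str:
    """Filename of volume n (0-based), given the first volume's filename.

    Hypothetical helper: volume 0 keeps its .ACE name, volumes 1..100 map
    to .C00..C99; the extension case follows the first volume (assumption).
    """
    if n == 0:
        return first
    if not 1 <= n <= 100:
        raise ValueError("ACE volume extensions only cover .C00 to .C99")
    base, ext = os.path.splitext(first)
    c = "c" if ext.islower() else "C"
    return f"{base}.{c}{n - 1:02d}"
```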
Archives can have a main comment, and each archive member can have a file comment. Additionally, archives can have an advert string, which unregistered versions of the ACE compressor set to *UNREGISTERED VERSION* to signal that the archive was created using an unregistered version.
Integrity Checks
Each header contains a 16 bit checksum over the header bytes. Each archive member has a 32 bit checksum over the decompressed bytes. ACE uses a bitwise inverted version of standard CRC-32 with polynomial 0x04C11DB7 as the 32 bit checksum, and a truncated version of that for the 16 bit checksum.
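A header check along these lines can be sketched as follows. Two points are assumptions rather than facts stated above: that "truncated" means keeping the low 16 bits of the ACE CRC-32, and that a header is laid out as a little-endian 16 bit CRC and a 16 bit size, followed by size bytes covered by the CRC. `ace_crc16` and `header_crc_ok` are hypothetical names.

```python
import struct
import zlib


def ace_crc16(data: bytes) -> int:
    # Assumption: "truncated" means the low 16 bits of the ACE CRC-32.
    return (zlib.crc32(data) ^ 0xFFFFFFFF) & 0xFFFF


def header_crc_ok(buf: bytes, offset: int = 0) -> bool:
    """Check a header assumed to be: crc16, size, then size covered bytes."""
    crc, size = struct.unpack_from("<HH", buf, offset)
    body = buf[offset + 4:offset + 4 + size]
    # A truncated header cannot pass, regardless of its CRC.
    return len(body) == size and ace_crc16(body) == crc
```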
Compression Methods
Archive members are compressed using one of the following methods, as indicated in their FILE header:
- stored: Data is stored as-is without any compression applied.
- LZ77: ACE 1.0 plain LZ77 compression over a Huffman coded symbol stream, with a configurable dictionary size of 1K..4M literals.
- blocked: ACE 2.0 blocked mode compresses data in separate blocks, each block using one of the following submodes, which apply different lossless compression techniques:
  - LZ77: Plain LZ77 over a Huffman coded symbol stream, with a configurable dictionary size of 1K..4M literals.
  - EXE: LZ77 over Huffman with a preprocessor that converts the relative target addresses of x86 relative JMP and CALL instructions to absolute addresses before LZ77 compression, in order to achieve a higher LZ77 compression ratio for executables.
  - DELTA: LZ77 over Huffman with a preprocessor that rearranges chunks of data and calculates differences between byte values, resulting in a higher LZ77 compression ratio for some inputs.
  - SOUND: Multi-channel audio predictor over Huffman coding, resulting in a higher compression ratio for uncompressed mono/stereo 8/16 bit sound data.
  - PIC: Two-dimensional pixel value predictor over Huffman coding, resulting in a higher compression ratio for uncompressed picture data.
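The idea behind the EXE preprocessor can be shown with a generic "E8/E9" filter: the 32 bit operand of each x86 CALL (0xE8) and JMP (0xE9) opcode is rewritten from relative to absolute form, so repeated calls to the same function become identical byte strings that LZ77 can match. This is a toy illustration of the general technique only; ACE's exact EXE mode encoding (opcode detection, position base, block boundaries) is not described here, and `exe_filter` is a hypothetical name.

```python
def exe_filter(data: bytes, encode: bool = True) -> bytes:
    """Toy E8/E9 transform between relative and absolute operands.

    Not ACE's exact EXE mode: opcode bytes are never modified, so encoder
    and decoder visit the same positions and the transform is reversible.
    """
    out = bytearray(data)
    i = 0
    while i + 5 <= len(out):
        if out[i] in (0xE8, 0xE9):  # x86 CALL rel32 / JMP rel32
            operand = int.from_bytes(out[i + 1:i + 5], "little")
            next_ip = i + 5  # relative targets count from the next instruction
            if encode:
                value = (operand + next_ip) & 0xFFFFFFFF  # relative -> absolute
            else:
                value = (operand - next_ip) & 0xFFFFFFFF  # absolute -> relative
            out[i + 1:i + 5] = value.to_bytes(4, "little")
            i += 5  # skip the operand we just rewrote
        else:
            i += 1
    return bytes(out)
```

Two CALLs to target 0x100 placed at offsets 0 and 5 carry different relative operands (0xFB and 0xF6) but identical absolute operands after encoding.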
Blocks are of variable length. Mode switch instructions are encoded into each mode’s Huffman symbol stream as a mode switch symbol followed by the target mode identifier and parameters.
Solid archives use a single dictionary for the whole archive, while non-solid archives use a separate dictionary per archive member. The compression quality parameter, ranging from 0 (none) to 5 (best), influences how many CPU cycles the compressor spends searching for an optimal compression; it has no influence on decompression.
Comments are compressed using LZP over a Huffman coded symbol stream. Advert strings and other header information are stored uncompressed.
Encryption
Optional encryption is applied to the compressed data stream after compression. The user-supplied password of up to 50 characters is transformed into a 160 bit Blowfish encryption key by a single application of SHA-1 with non-standard block padding. Blowfish is applied in CBC mode with a constant zero IV to each archive member separately (a cryptographic design flaw). Each archive member can have a different password, but in practice most encrypted archives use a single password for all members. There is no password verifier in the file format; the only way to verify a password is to decrypt and decompress the archive member and check the CRC.