LZ4 Data Compression

(C) 2013 by Antoine VIGNAU and Olivier ZARDINI

> About the article

This technical article talks about the recent LZ4 Data Compression algorithm (April 2011) and its capability to become a good multi-purpose lossless compression format for our beloved 16 bits Apple IIgs computer (1986).  Well known to use an asymmetry data compression algorithm (complex / slow to compress, easy / fast to decompress), we see the LZ4 to be a perfect candidate for our Cross Development Tools Project, where modern platforms like PC running Windows are used to create the new Apple IIgs software (instead of using the native Apple IIgs tools, as we did in the past). We can compress the data using a modern PC and we only need the Apple IIgs to be able to uncompress the LZ4 data stream. Mostly for games programming area, where we have to handle multi-types of data (Graphic, Sound, Music, Code, Misc data...), the availability of ONE generic compression / decompression algorithm would be perfect.

LZ4 Compression for Apple IIgs is part of the Brutal Deluxe's Cross Development Tools Project, a full set of utilities available on Windows (and other) platforms to enable the creation of new Apple IIgs software : 65c816 Assembler, 65c816 Disassembler, 65c816 Simulator, Graphic File Converter, Resource Catcher...


> Common Data Compression

Like many computers of the 80 / 90's area, the Apple IIgs has used most of the well popular lossless compression algorithms. Each of them where used to compress a specific type of data :

 RLE : Run Length Encoding

The idea is to search for consecutive bytes having the same value and replace the sequence by the number of times the value has been found :

CB CB CB CB CB CB CB CB CB CB CB CB A1 11 F0 D3 D3 D3 D3 D3 D3 D3 D3 D3 D3 D3 45 89 A1 F8 C6

is compressed as :

[12] CB [3] A1 11 F0 [11] D3 [5] 45 89 A1 F8 C6

We compact the same value bytes (CB, D3) and we let the other bytes as is. The algorithm gives good results for bitmap graphics, where the probability to have same consecutive bytes is high. On the other end, it is inefficient to encode Text , Music or Code because we nearly never encounter several consecutive bytes with the same value there.

The RLE has been used in most of the Graphic file formats used at that time :
        - PCX, BMP for the IBM PC
        - IFF / ILBM for the Amiga
        - PC1, NEO for the Atari ST
        - Packbyte, Apple Preferred for the Apple IIgs

RLE
is fast for both packing and unpacking. You can encode a stream while you read it, no need to pre-compute something first (requiring the file to be read once).

HUFFMAN

The idea is to encode bytes using a variable number of bit (instead of 8) depending of the frequency of the value in the sequence of bytes to encode :

Ascii  :  Estelle est belle !
Hexa   :  45 73 74 65 6C 6C 65 20 65 73 74 20 62 65 6C 6C 65 20 21
Binary :  0100 0101,0111 0011,0111 0100,0110 0101,0110 1100,0110 1100,0110 0101,0010 0000,0110 0101,0111 0011,0111 0100,0010 0000,...


is compressed as :

BINARY :  1101 101 1100 00 01 01 00 100 10 101 1100 100 1110 00 01 01 00 100 1111 + Encoding Table

The first thing to do is to count how many different Byte values we have in the stream to be compressed. Here, we use only 8 different values (E,s,t,e,l,Space, b, !). So we can use only 3 bits per byte value instead of 8. The initial length was 19 characters x 8 bits = 152 bits. We can easily drop down to 19 x 3 = 57 bits ! But we can do even better !

Let's count how many times we found each letter in the sentence :
e=5 times, l=4 times, Space=3 times, s and t=2 times, E, b  and !=1 times

So, instead of using the same number of bit to encode each value (3 bits), we use less bit for the most used values
(2 bits) and we agree to use more bits for the rare ones (4 bits) :

Encoding Table :

e
=5         00
l
=4         01
Space=3     100
s=2         101
t=2         1100
E=1         1101
b=1         1110
!=1         1111

Once compressed with this new encoding table, the number of bit required drop down to 53 (instead of 57).

The Huffman compression is efficient with Text data (used in Apple IIgs software to compress large text file like dictionary). It is not very good for pictures unless you have some colors which are very used and some other not at all.

The Huffman is fast for unpacking the data but the compression step requires to read ALL the data stream in order to compute statistics on it BEFORE starting the compress process. You also have to store the encoding Table at the begining of the compressed data, which increase a little bit the coompressed file.

LZW

The idea is to search for redundancy in the data to be compressed and to build a dictionary of such patterns. In the compressed stream, we put the index of the pattern in the dictionary :

C1 25 12 18 2A 00 00 C1 25 12 18 2A 01 2F 45 45 45 00 00 01 2F 45 45 45

is compressed as :

[1] [2] [1] [3] [2] [3] + Dictionary

The complex job of the encoding algorithm is to identify patterns used in the dictionary :

C1 25 12 18 2A     [1]
00 00              [2]
01 2F 45 45 45     [3]

The LZW family (LZ*) is efficient for all kind of data (Graphic, Text, Code...) because there is always redundant patterns in the source data stream. It offers the best compression rate (comparing to other lossless algoithms like RLE or HUFFMAN).

LZW is the compression algorithm used in several graphic file formats such as :
    - GIF
    - PNG
    - DreamGraphix file format on the Apple IIgs.

The LZW algorithm has also been used in the most popular Archive File formats :
    - Zip, RAR for the IBM PC
    - LHA, LZX for the Amiga
    - ZIP, LHARC, PACK for the Atari ST
    - GS Shrinkit for the Apple IIgs

The LZW compression process requires a lot of computations to find patterns and encode the compressed data stream (as we encode a bit stream, a lot of bit manipulation functions are required). The uncompression process is faster but requires many bit manipulations to rebuilt the original data. The LZW has been used mostly for archive files, because the compression rate was more important than the time required to compress / uncompress.The primary goal was to minimize the time required for transferring the file over the network (BBS / FTP sites).

Sound Compression

If you have to compress a sound (8 bits Wave), there is an easy way to divide by 2 the space required to store the data. A sound Wave, as we use it for the Apple IIgs, uses 8 bits unsigned numbers (00-FF). If the frequency is 22 KHz, we need  22000 bytes of data for each second of sound. It requires a lot of memory space to store the sound. Because a sound is a wave, the byte values are always ordered to be pretty close from the previous one (+/- delta) :



00 04 05 04 03 04 06 07 05 03 03 04 04 03

is compressed as :

00 +4 +1 -1 -1 +1 +2 +1 -2 -2 +0 +1 +0 -1

The idea of the delta compression is to replace the byte values by the delta values to added or subtracted from the previous (known) value. Instead of using 8 bits to code each value, we can use 3 bits to encode the Delta value and 1 bit for the Sign. So each value can be -8 or +7 from the previous one. We drop down from 8 bits to 4 bits ! Depending of the wave, we can spend more bits for the Delta. If we agree to change some of the wave values, we can enforce a 3 bits delta value but in this case we damage a little bit the signal. Most of the time, you don't notice it if the number of modified bytes is limited.. 


> LZ4 File Format

Unlike other file formats (Graphic file, Archive file...) the LZ4 file format is very easy to handle. It has only two parts, the File Header and the Compressed Stream :


The Header part has a fixed size and the Compressed Data part can be any size. The Header is usually 16 bytes long. We find there the Uncompressed Data Size (4 bytes,  Little endian) and the Compressed Data Size (4 bytes, Little endian). We can notice that Header Size (16 bytes) + Compressed Database Size = LZ4 file size.

The following example comes from the Cogito.lz4 file (7659 bytes long = 0x1DEB). The uncompressed file was 32 768 bytes long (= 
0x8000). The 0x1F code is the first byte of the Compressed Data part.


Because our primary target is to compress / uncompress Apple IIgs files (pictures, object code, sound, music...), the uncompressed file size are <= 64 KB and the LZ4 files are of course < 64 KB.  So we can read the sizes from the Header as 16 bits values (the higher bytes are always 00) :

Uncompressed Data size : Offset = Begin + 4,       Length (bytes) = 2
Compressed Data Size :     Offset = Begin + 12,     Length (bytes) = 2
Compressed Data :             Offset = Begin + 16 ,    Length (bytes) = Compressed Data Size


The Compressed data stream is divided into a set of Compressed Data Blocks. The number of Blocks is not limited and their sizes may vary. All the Blocks have the same Data structure, except for the last one  which is slightly different :


The anatomy of a Compressed Data block is described in the following picture :

 

A block is made up of 3 sections :

- A mandatory Token section (one byte long). The high 4 bits (4-7) define the Length of the Literal part. The low 4 bits (0-3) define the Length of the Match part.
- An optional Literal section (0-N bytes long). If the section is there, it may contains extra bytes used to extend the Literal Length (0-N bytes long) and it contains Literal Data bytes (0-N. bytes long).
- A mandatory Match section (2-N bytes long). The section start with a mandatory Match Offset value (2 bytes long, Little Endian) and may contain extra bytes used to extend the Match Length (0-N bytes long).

The size of the Literal section is computed as follow :

- Extract the 4 high bits of the Token byte to get the initial Literal Data size (0x0-0xF).
        - If the value is 0x0, the whole Literal section is missing. So the Literal section size is 0.
        - If the value is 0x1 to 0xE, the Length part of the Literal section is missing and the size of the Data part of the Literal section is the value (0x1 to 0xE). So the Literal section size = value.
        - If the value is 0xF, the Length part of the Literal section is there. We process the bytes UNTIL we found a byte having a value NOT EQUAL to 0xFF. That will be the LAST byte of the Literal Length part. The value of the Literal Length is the sum of all the Literal Length bytes values + initial 0xF length.
So the Literal section size = Number of bytes required to encore the Literal LengthLiteral Length value.

Here are few examples of Literal sections :

Raw Block Data :  0F ...
    - Token byte is 0x0F, so the initial Length value = 0, the Literal section is not there.

Raw Block Data :  1F 11 ...
    - Token byte is 0x1F, so the initial Length value = 0x1, the Literal Length part is not there and the Literal Data size is 0x1. The Literal section size is 0x1.
    - The Literal Data byte is : 11.

Raw Block Data :  9F DD 66 FF FF FF FF 66 DD 11 ...
    - Token byte is 0x9F, so the initial Length value = 0x9, the Literal Length part is not there and the Literal Data size is 0x9.
The Literal section size is 0x9.
    - The Literal Data bytes are : DD 66 FF FF FF FF 66 DD 11.

Raw Block Data :  F5 03 E3 E3 EE 2E 22 22 29 29 29 99 88 87 23 45 54 0F 50 30 ...
    - Token byte is 0xF5, so the initial Length value = 0xF,
    - The Literal Length part is there, so the Literal Data size = 0xF + 0x03 = 0x12.
The Literal section size = 0x1 (number of bytes used in the Literal Length part) + 0x12 (Literal Data size).
    - The Literal Data bytes are : E3 E3 EE 2E 22 22 29 29 29 99 88 87 23 45 54 0F 50 30.

Raw Block Data :  F8 FF 1C 03 20 22 29 29 99 98 98 78 77 73 45 66 66 66 65 06 66 46 46 46 44 44 44 43 43 43 33 33 06 45 45 ...
    - Token byte is 0xF8, so the initial Length value = 0xF,
    - The Literal Length part is there, so the Literal Data size = 0xF + 0xFF + 0x1C = 0x12A.
The Literal section size = 0x2 (number of bytes used in the Literal Length part) + 0x12A (Literal Data size).
    - The Literal Data bytes are : 03 20 22 29 29 99 98 98 78 77 73 45 66 66 66 65 06 66 46 46 46 44 44 44 43 43 43 33 33 06 45 45 ...

The size of the Match section is computed as follow :

- Extract the 4 low bits of the Token byte to get the initial Match Data size (0x0-0xF).
        - If the value is 0x0 to 0xE, the Length part of the Match section is missing. So the Match section size = 0x2 (2 bytes to encode the Match Offset value).
        - If the value is 0xF, the Length part of the Match section is there. We process the bytes UNTIL we found a byte having a value NOT EQUAL to 0xFF. That will be the LAST byte of the Match Length part. The value of the Match Length is the sum of all the Match Length bytes values + initial 0xF length.
So the Match section size = 0x2 (2 bytes to encode the Match Offset value) + Number of bytes required to encore the Match Length.

- Because the minium value for Match Length is 0x4, we have to add a 0x4 to the previously computed Match Length value.

Here are few examples of Match sections :

Raw Block Data :  07 [Literal Part]  9E 00
    - Token byte is
0x07, so the initial Length value = 0x7. The Match Length is 0x7 + 0x4 (Minimun Length) = 0xB.
    - Match Offset is 0x9E.
    - 
The Match section size = 0x2 (number of bytes used for the Match Offset).

Raw Block Data :  0F [Literal Part]  A3 00 64
    - Token byte is
0x0F, so the initial Length value = 0xF.
    - Match Offset is 0xA3.
    - The Match Length part is there, so the Match Length = 0xF + 0x64 + 0x4 (Minimun Length) = 0x77.
    - The Match section size = 0x2 
(number of bytes used for the Match Offset) + 0x1 (number of bytes used in the Match Length part).

Raw Block Data :  0F [Literal Part]  2D 01 FF 1C
    - Token byte is
0x0F, so the initial Length value = 0xF.
    - Match Offset is 0x12D.
    - The Match Length part is there, so the Match Length = 0xF + 0xFF + 0x1C + 0x4 (Minimun Length) = 0x1EE.
    - The Match section size = 0x2 
(number of bytes used for the Match Offset) + 0x2 (number of bytes used in the Match Length part).

The LAST block of the compressed data stream is a little bit different from the others blocks :


The Literal section is there (with or without the extra bytes for the Literal Length) and the Match section is missing.


> LZ4 Decompression Algorithm

The LZ4 Decompresion algorithm is easy to write :

/* Init */
X = 0            ; Offset in the LZ4 compressed data stream
Y = 0            ; Offset in the output uncompressed data

/*** Decode Compressed Data Stream ***/
For All Compressed Data Block
        Decode Token Byte
        X++

        /** Process Literal Bytes **/
        If Literal part is here
            Compute Literal Length
            X += Number of bytes used in Literal Length
            MemoryCopy : Source = Compressed Data[X], Destination = Uncompressed Data [Y], Length =
Literal Length
            X +=
Literal Length
            Y += Literal Length
        End If

        /** Process Match Bytes **/

        If Match part is here

            Get Match Offset
            X += 2
       
    Compute Match Length
            X += Number of bytes used in Match Length
           MemoryCopy : Source = Uncompressed Data[Y - Match Offset], Destination = Uncompressed Data [Y], Length = Match Length
            Y += Match Length
        End If
End For

We can summarize the LZ4 decompression as 2 Memory Copy operations from 2 sources :
        - The Literal Data coming from the Compressed Data stream
        - The Match Data coming form the already Uncompressed Data stream.

Because the Match Offset is 2 bytes long, we are working within a window of 64 KB (from current Destination position) when we copy Match Data.
Unlike other LZ* family algorithms (LZW...), the LZ4 decompression algorithm do not require any Dictionary Lookups or complex Bit Manipulation operations (bit shift). The algorithm only deals with Bytes and unsigned integer numbers (8 bits or 16 bits).


> LZ4 Decompression in 65c816 Assembly Language

The LZ4 seems to be taylor made for 16 bits processors like the 65c816 because of the usage of unsigned 8 bits / 16 bits numbers and 64 KB window limit. So creating the 65816 unpacking code was not a real challenge. One option was to re-use the 6502 / 65c02 code from Peter Ferrie (by moving some parts into 16 bits native code like the ADC) but we have chosen another direction. The 65c816 is much more than a 16 bits 6502 and we could probably took advantage of the Copy Memory Block functions (MVM / MVP) and the 16 bits index (X and Y).

Our LZ4 Unpacker source code is 65 lines long and once assembled, takes only 133 bytes in memory. It can unpack up to 64 KB of compressed LZ4 data from any memory Bank to another one. The code can run from any memory location and do not use the Direct Page. It has to be called in native mode (16 bits). Here are few explanations about how the unpack job is done :

- The 16 bits X register is the current offset in the Compressed Data stream (in source bank).
- The 16 bits Y register is the current offset in the Uncompressed Data stream (in destination bank).
- We use MVN function to copy Data from Compressed Data stream to Uncompressed Data stream (
Literal Data copy)  and to copy Data from Uncompressed Data stream to itself (Match Data copy).
- We don't need to increment X and Y registers after the copy operations because MVN already does it for us !
- We check the End Of File after processing the Literal section because the LAST block doesn't have a Match section.
- The same function (
LZ4_GetLengthLit / LZ4_GetLengthMat) computes the Literal or Match length.

Here is the LZ4 unpacker source code, ready to be assembled into Merlin 16+ or Orca M :

                 MX   %00
*-------------------------------------------------------------------------------
LZ4_Unpack       STA  LZ4_Literal_3+1   ; Uncompress a LZ4 Packed Data buffer (64 KB max)
                 SEP  #$20              ; A = Bank Src,Bank Dst
                 STA  LZ4_Match_5+1     ; X = Header Size = 1st Packed Byte offset
                 STA  LZ4_Match_5+2     ; Y = Pack Data Size
                 XBA                    ;  => Return in A the length of unpacked Data
                 STA  LZ4_ReadToken+3  
                 STA  LZ4_Match_1+3    
                 STA  LZ4_GetLength_1+3
                 REP  #$30
                 STY  LZ4_Limit+1
*--
                 LDY  #$0000            ; Init Target unpacked Data offset
LZ4_ReadToken    LDAL $AA0000,X         ; Read Token Byte
                 INX
                 STA  LZ4_Match_2+1
*----------------
LZ4_Literal      AND  #$00F0            ; >>> Process Literal Bytes <<<
                 BEQ  LZ4_Limit         ; No Literal
                 CMP  #$00F0
                 BNE  LZ4_Literal_1
                 JSR  LZ4_GetLengthLit  ; Compute Literal Length with next bytes
                 BRA  LZ4_Literal_2
LZ4_Literal_1    LSR                    ; Literal Length use the 4 bit
                 LSR
                 LSR
                 LSR
*--
LZ4_Literal_2    DEC                    ; Copy A+1 Bytes
LZ4_Literal_3    MVN  $AA,$BB           ; Copy Literal Bytes from packed data buffer
                 PHK                    ; X and Y are auto incremented
                 PLB
*----------------
LZ4_Limit        CPX  #$AAAA            ; End Of Packed Data buffer ?
                 BEQ  LZ4_End
*----------------
LZ4_Match        TYA                    ; >>> Process Match Bytes <<<
                 SEC
LZ4_Match_1      SBCL $AA0000,X         ; Match Offset
                 INX
                 INX
                 STA  LZ4_Match_4+1
*--
LZ4_Match_2      LDA  #$0000            ; Current Token Value
                 AND  #$000F
                 CMP  #$000F
                 BNE  LZ4_Match_3
                 JSR  LZ4_GetLengthMat  ; Compute Match Length with next bytes
LZ4_Match_3      CLC
                 ADC  #$0003            ; Minimum Match Length is 4 (-1 for the MVN)
*--
                 PHX
LZ4_Match_4      LDX  #$AAAA            ; Match Byte Offset
LZ4_Match_5      MVN  $BB,$BB           ; Copy Match Bytes from unpacked data buffer
                 PHK                    ; X and Y are auto incremented
                 PLB
                 PLX
*----------------
                 BRA  LZ4_ReadToken
*----------------
LZ4_GetLengthLit LDA  #$000F            ; Compute Variable Length (Literal or Match)
LZ4_GetLengthMat STA  LZ4_GetLength_2+1
LZ4_GetLength_1  LDAL $AA0000,X         ; Read Length Byte
                 INX
                 AND  #$00FF
                 CMP  #$00FF
                 BNE  LZ4_GetLength_3
                 CLC
LZ4_GetLength_2  ADC  #$000F
                 STA  LZ4_GetLength_2+1
                 BRA  LZ4_GetLength_1
LZ4_GetLength_3  ADC  LZ4_GetLength_2+1
                 RTS
*----------------
LZ4_End          TYA                    ; A = Length of Unpack Data
                 RTS
*-------------------------------------------------------------------------------


Before calling the unpack code function, you have to allocate 2 memory banks of 64 KB, one containing the LZ4 file (Source Bank) and one ready to receive the Uncompressed Data (Destination Bank). Because we are only processing file with a size < 64 KB, we don't need more space and we don't have to deal with bank boundary. The LZ4 file will be loaded at the beginning of Source bank and the uncompressed data will be written at the beginning of the Destination bank. We also need to provide the Header size (16 bytes) and the LZ4 file size (used to stop the unpacking process) :

UnpackGraphic    LDA  #$0405            ; A = Bank Source ($04/0000), Bank Destination ($05/0000)
                 LDX  #$0010            ; 
X = LZ4 file Header Size = 1st Packed Byte offset
                 LDY  LZ4FileSize       ; 
Y = LZ4 File Size
                 JSR  
LZ4_Unpack        ; Uncompress a LZ4 Packed Data buffer (64 KB max)
                 CMP  #$8000            ;  => Return in A the length of unpacked Data
                 BNE  Error  
                 ...    


> Current Limitations

The current limitations of the LZ4 decompression code for the Apple IIgs are the following ones :

- The packed data size can't be larger than 64 KB. 
- The unpacked data size can't be larger than 64 KB.
- The check for bad packed data and/or buffer overflow is limited. If you are not sure about the LZ4 file, add some error checkings.
- The code has to run in RAM due to the usage of self modified code parts.

The memory organization of the Apple IIgs forces the developer to cut the code and the data into 64 KB size segments. In other way, the data we manipulate in the Apple IIgs world is usually < 64 KB : The graphic page size is 32 KB, the Wave bank for the Ensonic sound Memory is 64 KB., the code Segment in GS/OS is < 64 KB... The 64 KB size limit of our LZ4 unpacker is not a real limit because everything we will pack will be already < 64 KB.

The lack of extensive tests about the buffer overflow or bad compressed stream format is due to the kind of usage of such compressed files : they are part of a set of files WE have previously compressed, so we guarantee the format.


> Performance Considerations

It was interesting to compare the LZ4 results to existing compression algorithms. We have first chosen to work on pictures compression, where we have already the choice of several file formats and algorithms :

- RLE used in PackByte format
- LZW used in DreamGraphix format
- LZ4 applied to a raw 32 KB picture buffer (32000 byte of pixel data + 256 bytes of SCB + 512 bytes of Palettes) 
- LZW used in GIF file format 

We have applied these compression schemes to 19 Apple IIgs pictures. They were all 320x200, using 16 colors (1 palette) or 256 colors (16 palettes). Each picture, if unpacked, was 32 KB size. We have chosen a set of different pictures, including grey scale digitalization, colored digitalization, draw, game screen shot...

For each of the file formats, we have focused on :

- The compression rate  : we give the size of the pictures once compressed and the % compared to the 32 KB uncompressed data. Smaller is the compressed data, better is the algorithm.
- The unpack duration     : we give the number of 65c816 processor cycles required to unpack the data. We also give an output KB/s speed rate (based on a 2,5 MHz Apple IIgs).

For each picture, we turn in red the best compression rate and in blue the best unpack duration rate.

In order to compute the number of 65816 cycles required for the unpacking of each file format, we have programmed a 65816 simulator and we have run the unpacker code in the simulator :

- The source code for PackByte unpacker is the one used in the Apple IIgs Rom (Misc Tool)
- The source code for DreamGraphix unpacker is the one ripped from DreamGraphix
- The source code for  LZ4 unpacker is the one described below
- The source code for GIF unpacker is the one used in Convert 3200.

All of these source codes are 100% assembly language so we can really compare them in term of speed. The compressed data was loaded in bank 04 and the uncompressed data was written in bank 05. The unpacker was running from bank 03. There were no I/O involved during the unpack process. The unpacked data size was always 32 KB (size of the uncompressed picture) except for GIF where the native output was 64 KB (8 bits / pixels for a 320*200 picture). The size of the compressed data was everytime different, due to the quality of the compression algorithm.

We have chosen to add the GIF file format in the comparison table because GIF is known to have the best compression rate for a lossless compression algorithm, so it was interesting to compare Apple IIgs native solutions with industry standards. GIF has never been a native Apple IIgs graphic file format because it doesn't support Apple IIgs features like SCB or multi palettes and also because the output of GIF is a 8 bits / pixel encoding (where the Apple IIgs uses only 4 bits / pixels) and the color map can use 8 bits per component (16 Million colors) where the IIgs palette is limited to 4 bits per component (4096 colors). Even if the GIF pictures use only 16 colors, some extra work has to be done to convert the unpacked GIF picture data into native Apple IIgs picture data. In the following table, the number of cycles for the GIF unpacking code DO NOT  take into account the extra work required to remap the data into a 4 bits / pixel encoding. We only count there the basic GIF unpacking into an 8 bits / pixels picture. The real number of cycles would be higher for a 4 bit / pixel rendering.

Apple IIgs pictures, 320*200 pixels, 16 or 256 colors, 32 KB
  Pack Byte
Dream Graphix
LZ4
GIF

10 296 bytes
31,4 %

1 360 570 cycles
58,7 KB/s
6 249 bytes
19,0 %

2 231 064 cycles
35,8 KB/s
6 505 bytes
19,8 %

332 466 cycles
240,6 KB/s
6 126 bytes
18,6 %

6 152 697 cycles
13,0 KB/s

26 375 bytes
80,4 %

1 068 277 cycles
74,8 KB/s
21 815 bytes
66,5 %

4 871 780 cycles
16,4 KB/s
23 517 bytes
71,7 %

388 026 cycles
206,1 KB/s
19 545 bytes
59,6 %

8 958 474 cycles
8,9 KB/s

15 797 bytes
48,2 %

1 037 821 cycles
77,0 KB/s
15 435 bytes
47,1 %

3 803 489 cycles
21,0 KB/s
14 799 bytes
45,1 %

299 801 cycles
266,8 KB/s
14 003 bytes
42,7 %

7 810 314 cycles
10,2 KB/s

5 589 bytes
17,0 %

1 060 387 cycles
75,4 KB/s
3 625 bytes
11,0 %

1 801 482 cycles
44,4 KB/s
2 800 bytes
8,5 %

273 653 cycles
292,3 KB/s
3 355 bytes
10,2 %

5 633 328 cycles
14,2 KB/s

10 500 bytes
32,0 %

1 079 768 cycles
74,0 KB/s
8 488 bytes
25,9 %

2 613 486 cycles
30,6 KB/s
8 862 bytes
27,0 %

308 914 cycles
258,9 KB/s
7 634 bytes
23,2 %

6 480 595 cycles
12,3 KB/s

11 236 bytes
34,2 %

1 380 135 cycles
57,9 KB/s
6 908 bytes
21,0 %

2 346 779 cycles
34,0 KB/s
6 651 bytes
20,2 %

332 758 cycles
240,4 KB/s
6 510 bytes
19,8 %

6 249 860 cycles
12,8 KB/s

24 135 bytes
73,6 %

1 678 614 cycles
47,6 KB/s
16 406 bytes
50,0 %

3 972 969 cycles
20,1 KB/s
18 873 bytes
57,5 %

454 973 cycles
175,8 KB/s
16 196 bytes
49,4 %

8 249 420 cycles
9,6 KB/s

22 842 bytes
69,7 %

1 581 637 cycles
50,5 KB/s
12 657 bytes
38,6 %

3 349 851 cycles
23,8 KB/s
7 659 bytes
23,3 %

337 687 cycles
236,9 KB/s
15 275 bytes
46,6 %

7 970 263 cycles
10,0 KB/s

21 571 bytes
65,8 %

1 774 829 cycles
45,0 KB/s
14 130 bytes
43,1 %

3 593 395 cycles
22,2 KB/s
15 297 bytes
46,6 %

438 263 cycles
182,5 KB/s
14 410 bytes
43,9 %

7 894 609 cycles
10,1 KB/s

15 635 bytes
47,7 %

1 280 057 cycles
62,4 KB/s
11 876 bytes
36,2 %

3 214 526 cycles
24,8 KB/s
13 099 bytes
39,9 %

373 387 cycles
214,2 KB/s
11 010 bytes
33,5 %

7 163 882 cycles
11,1 KB/s

18 584 bytes
56,7 %

1 729 744 cycles
46,2 KB/s
11 948 bytes
36,4 %

3 233 915 cycles
24,7 KB/s
13 217 bytes
40,3 %

418 503 cycles
191,1 KB/s
11 107 bytes
33,8 %

7 187 650 cycles
11,1 KB/s

14 285 bytes
43,5 %

1 519 662 cycles
52,6 KB/s
8 729 bytes
26,6 %

2 658 058 cycles
30,0 KB/s
9 970 bytes
30,4 %

385 497 cycles
207,5 KB/s
8 003 bytes
24,4 %

6 552 497 cycles
12,2 KB/s

23 153 bytes
70,6 %

1 781 584 cycles
44,9 KB/s
15 168 bytes
46,2 %

3 766 558 cycles
21,2 KB/s
14 807 bytes
45,1 %

422 706 cycles
189,2 KB/s
15 341 bytes
46,8 %

8 082 557 cycles
9,8 KB/s

26 166 bytes
79,8 %

1 347 634 cycles
59,3 KB/s
17 923 bytes
54,6 %

4 259 943 cycles
18,7 KB/s
20 258 bytes
61,8 %

470 362 cycles
170,0 KB/s
17 049 bytes
52,0 %

8 410 343 cycles
9,5 KB/s

13 219 bytes
40,3 %

1 562 617 cycles
51,1 KB/s
7 678 bytes
23,4 %

2 479 857 cycles
32,5 KB/s
8 640 bytes
26,3 %

362 520 cycles
220,6 KB/s
7 217 bytes
22,0 %

6 387 125 cycles
12,5 KB/s

22 741 bytes
69,4 %

1 082 730 cycles
73,8 KB/s
14 622 bytes
44,6 %

3 679 626 cycles
21,7 KB/s
18 471 bytes
56,3 %

395 589 cycles
202,2 KB/s
14 606 bytes
44,5 %

7 934 587 cycles
10,0 KB/s

25 524 bytes
77,8 %

1 326 426 cycles
60,3 KB/s
18 684 bytes
57,0 %

4 389 130 cycles
18,2 KB/s
20 592 bytes
62,8 %

422 764 cycles
189,2 KB/s
17 256 bytes
52,6 %

8 458 843 cycles
9,4 KB/s

23 288 bytes
71,0 %

1 653 214 cycles
48,3 KB/s
14 427 bytes
44,0 %

3 647 405 cycles
21,9 KB/s
16 303 bytes
49,7 %

453 836 cycles
176,2 KB/s
14 683 bytes
44,8 %

7 954 603 cycles
10,0 KB/s

15 848 bytes
48,3 %

1 287 696 cycles
62,1 KB/s
12 239 bytes
37,3 %

3 275 280 cycles
24,4 KB/s
12 548 bytes
38,2 %

354 660 cycles
225,5 KB/s
11 723 bytes
35,7 %

7 345 132 cycles
10,8 KB/s
Total / Average
346 784 bytes
55,7 %

26 593 402 cycles
57,1 KB/s
239 007 bytes
38,3 %

63 188 593 cycles
24,0 KB/s
252 868 bytes
40,6 %

7 226 365 cycles
210,3 KB/s
231 049 bytes
37,1 %

  140 876 779 cycles
10,7 KB/s


Based on this set of pictures, we can see that :

- LZW / LZ4 offer a much better compression rate than the RLE (not a real surprise).
- GIF has the best result, but some pictures may be better compressed using DreamGraphix or LZ4.
- GIF / DreamGraphix / LZ4 are very close for compression rate (around 40 % of the initial size).
- LZ4 blows up the other algorithms for the unpack speed (8 times faster than the DreamGraphix file format and 19 times faster than GIF file format, for a similar compression rate).
- The uncompression perfomance for GIF is bad. The algorithm is very slow compared to the other ones.

The unpack speed is given in KB/s, which is a good way to compare the speed of the 4 algorithms we have here. But what is the maximum speed we can have on a 2,5 MHz Apple IIgs ? If we consider the MVN instruction to be the fastest way to copy a block of data from one memory bank to another one, the maximum speed rate is 348,7 KB/s (2 500 000 cycles / second at 2,5 MHz, 7 cycles to copy 1 byte for the MVN). So if the picture was not compressed, the uncompression algorithm would be a basic memory copy from one bank to another one. Because the LZ4 algorithm is based on 2 MVN (one for the Literal bytes, one for the Match bytes), the LZ4 unpack speed is not that far (60%) for the maximum theoretical speed.

If now we consider all these 19 pictures to be located on a hard drive and we want to load them into memory and unpack them (19 * 32 KB = 608 KB), which algorithm will be faster ? Of course DreamGraphix files are smaller (233 KB) than PackByte files (338 KB), so the load step will be faster. But the unpack step is faster for PackByte (57 KB/s) than for DreamGraphix (24 KB/s). For an Apple IIgs running at 2,5 MHz, the work done with BenchmarkeD showed us than we read a file under GS/OS at about 60 KB/s (from a hard drive or a CFFA card). Of course, if the physical media is a floppy drive (3,5 inches 800 KB), the read is much more slower than the hard drive and all the computations have to be done again.

- Not Compressed : Read Data (608 KB / 60 KB/s = 10,1 s) + Unpack Data (608 KB / 348,7 KB/s = 1,7 s) = 11,8 s
- PackByte :               Read Data (338 KB / 60 KB/s =   5,6 s) + Unpack Data (608 KB / 57,1 KB/s = 10,6 s) = 16,2 s
- DreamGraphix    Read Data (233 KB / 60 KB/s =   3,8 s) + Unpack Data (608 KB / 24,0 KB/s = 25,3 s) = 29,1 s
- LZ4 :                          Read Data (246 KB / 60 KB/s =   4,1 s) + Unpack Data (608 KB / 210,3 KB/s = 2,8 s) =   6,9 s
- GIF :                            Read Data (225 KB / 60 KB/s =  3,7 s) + Unpack Data (608 KB / 10,7 KB/s = 56,8 s) = 60,5 s
 
So even if DreamGraphix lets you put much more data on a disk than PackByte, it is faster to work with PackByte files than with DreamGraphix ones (if your data are stored on a hard drive).The good news about LZ4 is that it takes the best from the two worlds : good compression rate and very fast unpack process. No doubt that the LZ4 is today the best choice for compressing Apple IIgs graphic files. The LZ4 algorithm is the only one to have an unpack speed (210 KB/s) higher than the read from disk speed (60 KB/s). It is the only one we could use for a direct-from-disk software where the data read from disk would go in real time into the Apple IIgs memory (digitalized sound, video...) with a data file size bigger than the Apple IIgs memory size.

We have performed extra tests using LZ4 compression (LZ4.exe -c2) on other type of data we can find in Apple IIgs software, especially the ones we did not compressed back in the days, by lack of generic compression tool :

- Sound : Sounds can be used as a Wave Bank for Music software or directly as digitalized sounds for games.

Midi.Wav           SynthLab Wave File                               Uncompressed Size = 67 840 bytes        LZ4 Size = 58 535 bytes        Compression Rate = 86,2 %
Cambodia        SoundSmith Instrument File               Uncompressed Size = 66 304 bytes        LZ4 Size = 41 760 bytes        Compression Rate = 62,9 %
Music.W            SoundSmith Instrument File               Uncompressed Size = 66 396 bytes        LZ4 Size = 36 939 bytes        Compression Rate = 55,6 %
Music2.W          SoundSmith Instrument File               Uncompressed Size = 66 672 bytes        LZ4 Size = 52 469 bytes        Compression Rate = 78,7 %
Sounds             LemminGS game sounds                    Uncompressed Size = 53 248 bytes        LZ4 Size = 45 558 bytes        Compression Rate = 85,5 %
Sound               Tinies game sounds                              Uncompressed Size = 55 808 bytes        LZ4 Size = 51 235 bytes        Compression Rate = 91,8 %
Sound.2            Cogito game sounds                             Uncompressed Size = 60 168 bytes        LZ4 Size = 52 912 bytes        Compression Rate = 87,9 %

- Music : The Apple IIgs games can use many Music file formats like SynthLab (Midi) or SoundSmith / NoiseTracker (Digitalized Sounds).

Music01            SynthLab Music File                              Uncompressed Size = 43 396 bytes        LZ4 Size = 22 582 bytes        Compression Rate = 52,0 %
Music02            SynthLab Music File                              Uncompressed Size = 27 804 bytes        LZ4 Size = 15 120 bytes        Compression Rate = 54,3 %
Music03            SynthLab Music File                              Uncompressed Size = 14 100 bytes        LZ4 Size =   8 203 bytes        Compression Rate = 58,1 %
Music04            SynthLab Music File                              Uncompressed Size = 15 564 bytes        LZ4 Size =   8 740 bytes        Compression Rate = 56,1 %
Music05            SynthLab Music File                              Uncompressed Size = 15 804 bytes        LZ4 Size =   9 131 bytes        Compression Rate = 57,7 %
Music06            SynthLab Music File                              Uncompressed Size = 17 300 bytes        LZ4 Size = 10 343 bytes        Compression Rate = 59,7 %
Music07            SynthLab Music File                              Uncompressed Size = 33 108 bytes        LZ4 Size = 18 109 bytes        Compression Rate = 54,7 %
Music                 SoundSmith Music File                        Uncompressed Size = 35 574 bytes        LZ4 Size =   4 640 bytes        Compression Rate = 13,0 %
Music2              SoundSmith Music File                         Uncompressed Size = 43 638 bytes        LZ4 Size =   1 790 bytes        Compression Rate =   4,1 %
Music                 SoundSmith Music File                        Uncompressed Size = 24 822 bytes        LZ4 Size =   2 448 bytes        Compression Rate =    9,8 %
Music                 NoisteTracker Music File                    Uncompressed Size = 120 382 bytes      LZ4 Size = 80 939 bytes        Compression Rate = 66,9 %

- Code : The OMF format do not let you compress the code Segment you may have inside a S16 file, but there are other types of object code we could compress, like the output files from Mr Sprite (64 KB file of 65c816 code used to draw sprites).

Adon00.bin        MrSprite Compiled Sprite Code        Uncompressed Size = 65 528 bytes        LZ4 Size = 29 775 bytes        Compression Rate = 45,4 %
Adon01.bin        MrSprite Compiled Sprite Code        Uncompressed Size = 64 916 bytes        LZ4 Size = 31 029 bytes        Compression Rate = 47,7 %
Adon02.bin        MrSprite Compiled Sprite Code        Uncompressed Size = 65 531 bytes        LZ4 Size = 31 563 bytes        Compression Rate = 48,1 %
Adon03.bin        MrSprite Compiled Sprite Code        Uncompressed Size = 65 518 bytes        LZ4 Size = 32 666 bytes        Compression Rate = 49,8 %
Adon04.bin        MrSprite Compiled Sprite Code        Uncompressed Size = 64 467 bytes        LZ4 Size = 31 421 bytes        Compression Rate = 48,7 %
Adon05.bin        MrSprite Compiled Sprite Code        Uncompressed Size = 65 508 bytes        LZ4 Size = 34 014 bytes        Compression Rate = 51,9 %
LEMMING.bin    LemminGS game Code                       Uncompressed Size = 63 653 bytes        LZ4 Size = 32 165 bytes        Compression Rate = 50,5 %
Convert.bin        Convert 3200 utility code                    U
ncompressed Size = 63 320 bytes        LZ4 Size = 28 953 bytes        Compression Rate = 45,7 %

Text : Software may use Text files, especially as an online help.

Doc                     Convert 3200 documentation            Uncompressed Size = 55 605 bytes        LZ4 Size = 20 488 bytes        Compression Rate = 36,8 %

Except for digitalized sounds where a dedicated compression algorithm could provide better results, the LZ4 compression algorithm appears to be very good for compressing any kind of data.


> F.A.Q

What about a LZ4 Compression utility for the Apple IIgs ?

There is no plan in our side to release a LZ4 compressor for the Apple IIgs because we are in the mood to use the modern platform to do the job. With the availability of C source code (see below), it could probably be done without any problem.

Which LZ4 compression software are you using ?

We are using a Windows command line tool from Yann Collet names LZ4.exe (v 1.3.3, 89 KB). We simply use it as : LZ4.exe -c2  <source_file_path>  <output_file_path> 


> External References

The LZ4 project Page, including Source code for many languages / platforms :  https://code.google.com/p/lz4/
The  Fast Compression Blog, written by Yann Collet the creator of the LZ4 algorithm : http://fastcompression.blogspot.fr
The Peter Ferrie Apple II stuff page, where you can get LZ4 unpacker source code for both 6502 and 65c02 :  http://pferrie.host22.com/misc/appleii.htm
A little hello to Dagen Brock who spoke about the LZ4 in comp.sys.apple2.programmer on March 2013.  Our interest for the LZ4 has started from there !


> Download

In this archive file (.zip) you will find all the materials used for this study (Source code, Pictures, Executable...)       LZ4 Files