0.Intro
Exploring the disassembly I finded an interesing function at address 0x0040f852 (I researched FF8 PC patch v1.2 executable). It gets two buffer pointers and copies data from one to another. But size of these buffers differs in general. After brief disassembly reading I understood that 0x0040f852 is address of unLZS function.
Ficedula's LZS description is accurate, but it's mostly a view from algorithm side. And we'll look on it from assembly side.
1. Developer's footprints
FF8 code is fullfilled with those things. The whole brunches of code never run. This function also contains "hello" at address 0x0040f883:
.text:0040F883 loc_40F883: ;jmp reference
.text:0040F883 mov ecx, 1 ;ecx = 1
.text:0040F888 test ecx, ecx ;ecx == 0 ?
.text:0040F88A jz loc_40F9D1 ;if true, jump to exit
Nice thing, isn't it?
2. Virtual circular buffer
LZS alorithm uses 4k circular buffer for packing/unpacking. It was described by Ficedula. I wouldn't repeat well-known things. If you don't know what we're speak about, just read that article. Maybe PSX platform implements circular buffers, I don't know, but on PC usage additional buffer is a great speed loss. FF8 developers solved this problem by buffer emulating. You need buffer when you're meet reference bit (0) in control byte and why you can't use your output buffer as circular when you need it? The main problem in this case is calculating right referenced data pointer.
;ecx == refLen (length of referenced sequence stored in lower half-byte of reference word)
.text:0040F942 mov edx, [ebp+pDestBuffPos]
.text:0040F945 lea eax, [edx+ecx+3]
.text:0040F949 mov [ebp+pDestBuffRefEnd], eax
.text:0040F94C mov ecx, [ebp+pDestBuffPos]
.text:0040F94F sub ecx, [ebp+pDestBuff] ; after this ecx == current decoded data size
.text:0040F952 mov edx, [ebp+refData]
.text:0040F955 sub edx, 0FEEh ; 0x0fee
; + 0x0012 (max referenced data size)
; = 0x1000 (circular buffer size)
.text:0040F95B sub ecx, edx
.text:0040F95D and ecx, 0FFFh
.text:0040F963 mov eax, [ebp+pDestBuffPos]
.text:0040F966 sub eax, ecx
.text:0040F968 mov [ebp+pRefData], eax
Here's all like listed in LZS description. But much effective way to implement circular buffer is implementing of "floating window".
mov edx, [ebp+refData]
add edx, 12h
mov eax, [ebp+pDestBuffPos]
sub eax, edx
mov [ebp+pRefData], eax
Following native FF8 code successfully process all error cases (it will be listed in next section).
3. Negative offsets in virtual circular buffer
LZS algorithm that looks pretty on circular buffer is full with hacks on plain buffer. When you use virtual circular buffer you should remember about negative offsets.
.text:0040F96B loc_40F96B: ; jmp reference
.text:0040F96B mov ecx, [ebp+pRefData]
.text:0040F96E cmp ecx, [ebp+pDestBuff]
.text:0040F971 jnb short loc_40F995
.text:0040F973 mov edx, [ebp+pDestBuffPos]
.text:0040F976 cmp edx, [ebp+pDestBuffRefEnd]
.text:0040F979 jnb short loc_40F995
.text:0040F97B mov eax, [ebp+pDestBuffPos]
.text:0040F97E mov byte ptr [eax], 0
.text:0040F981 mov ecx, [ebp+pDestBuffPos]
.text:0040F984 add ecx, 1
.text:0040F987 mov [ebp+pDestBuffPos], ecx
.text:0040F98A mov edx, [ebp+pRefData]
.text:0040F98D add edx, 1
.text:0040F990 mov [ebp+pRefData], edx
.text:0040F993 jmp short loc_40F96B
I suggest, FF8 was developed on C and compiled without code optimization flags or maybe 10 years ago compilers were weak in optimization. But this code looks very strange. We know all about our buffers. Why do we need to check bounds on every step? Why we need to copy zeroes byte-by-byte? Programmer could call memset function or implement own
mov ecx, [ebp+pRefData]
cmp ecx, [ebp+pDestBuff]
jnb short loc_40F995
push edi
mov edi, [ebp+pDestBuffPos]
mov ecx, [ebp+pDestBuff]
sub ecx, [ebp+pRefData] ; we already checked bounds, so, pDestBuff > pRefData
mov eax, [ebp+pDestBuffPos]
add eax, ecx
mov [ebp+pDestBuffPos], eax
xor eax, eax
rep stosb
pop edi
This looks a little better. We use (9 + zero_data_size) memory i/o operations instead of (10 * zero_data_size). It's faster when zero_data_size > 1.
4. Outro
I don't want to say that FF8 is a nasty-coded game. But there are many tricking and surgery bypassing code, there are a lot of temporary buffers and copying operations from one to other. All these things make reverse engineering hard but interesting. I hope this article was interesting for you too.