Well, after deciding to produce a wrapper, only 3 megs of source code were left: the
complete ICU API in C and C++ headers. The C part of the API was easy, just elementary
bindings with the external keyword, and not worth discussing further here. The real
challenge was the C++ class API, which makes up the majority of ICU and provides the most
interesting functionality (like the Layout engine). So the question was how to bind to
C++ classes and provide the same API in Object Pascal.
In some existing approaches, Pascal authors create new classes whose methods may differ
from or resemble the original implementation. In the end these new classes call the C
library functions through the elementary external linking mechanism. This kind of class
wrapping can be used with external C/C++ libraries that export only C API functions.
Another approach, called "flattening", exposes each C++ method as a plain C function that
takes the object as a parameter, so it too ends up binding to C functions. To meet the
requirement of binding directly to C++ class calls, let's first look at what is inside the
C++ shared library (a DLL on Windows, an SO on Linux).
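To make the "flattening" approach concrete, here is a minimal sketch of the C++ side. The Greeter class and the Greeter_* functions are hypothetical names invented for this illustration and are not part of ICU; the point is only that every method becomes a plain C function taking an opaque handle.

```cpp
#include <cassert>
#include <string>

// Hypothetical C++ class that we want to expose to a non-C++ language.
class Greeter {
public:
    explicit Greeter(const char* name) : name_(name) {}
    int nameLength() const { return static_cast<int>(name_.size()); }
private:
    std::string name_;
};

// Flattened API: unmangled C functions, with the object passed explicitly
// as an opaque handle instead of an implicit "this".
extern "C" {
    typedef void* GreeterHandle;

    // Replaces the constructor call.
    GreeterHandle Greeter_create(const char* name) {
        return new Greeter(name);
    }

    // Replaces the method call greeter->nameLength().
    int Greeter_nameLength(GreeterHandle h) {
        return static_cast<Greeter*>(h)->nameLength();
    }

    // Replaces the destructor call.
    void Greeter_destroy(GreeterHandle h) {
        delete static_cast<Greeter*>(h);
    }
}
```

Functions like these can be imported from Pascal with the ordinary external mechanism, at the price of writing and maintaining an extra C++ layer for every class.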
Opening icuuc36.dll with a PE resource editing program, we can see traditional C function
exports such as u_isdigit_3_6 in the export table. There are also C++ functions, exported
under mangled names, one of which demangles to:
UnicodeString::append(wchar_t const*, int, int );
The mangled export name is what we need in order to call the C++ class code. What is
specific to this kind of call is how the pointer to the already initialized class instance
gets passed in as part of the calling convention. At the assembler level on 32-bit x86,
MS Visual C++ passes it in the ecx register, while GCC passes it as the first parameter on
the stack. Thus, before making such a call, we need a properly initialized instance of the
class whose method we want to call. So before revealing how to make such calls, we must
first figure out how to obtain properly initialized instances of the C++ classes in Pascal.
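The idea that a method call is an ordinary call with one extra instance argument can be sketched in C++ itself. The Counter class and the call_add helper below are hypothetical and only illustrate the principle; which register or stack slot actually carries the pointer is decided by the compiler's calling convention and is not visible at this level.

```cpp
#include <cassert>

// Hypothetical class standing in for any C++ class with a method.
struct Counter {
    int value = 0;
    int add(int n) { value += n; return value; }
};

// Calls Counter::add through a pointer-to-member, which makes the
// instance pointer an explicit argument of the call.
int call_add(Counter* self, int n) {
    int (Counter::*method)(int) = &Counter::add;
    return (self->*method)(n);
}
```

A foreign caller has to do by hand what the compiler does here: supply a valid instance pointer in exactly the place the callee expects it.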
Exploring the contents of the DLL, we can see that it also exports the constructors and
destructors of its classes. In the following explanation we focus on the ICU UnicodeString
class. One of its exported constructors is the default constructor, which takes no
parameters.
C/C++ compilers know how to generate constructor and destructor calls in the final
executable. Pascal compilers do not, but an asm block lets us simulate the behaviour of the
C/C++ compiler.
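What a C++ compiler generates for a new expression can be split into two explicit steps: allocate raw memory, then run the constructor on it. The sketch below shows that split with a hypothetical Buffer class (not an ICU type) and the global operator new; it is these two steps that have to be reproduced by hand from another language.

```cpp
#include <cassert>
#include <new>

// Hypothetical class with a default constructor that fills in its fields.
struct Buffer {
    int length;
    int capacity;
    Buffer() : length(0), capacity(7) {}
};

// Equivalent of "new Buffer()", written as the two calls the compiler emits.
Buffer* create_buffer() {
    void* raw = ::operator new(sizeof(Buffer));  // step 1: allocate raw memory
    return new (raw) Buffer();                   // step 2: run the constructor in place
}

// Equivalent of "delete b": destructor first, then release the memory.
void destroy_buffer(Buffer* b) {
    b->~Buffer();
    ::operator delete(b);
}
```

Until step 2 has run, the memory returned by step 1 is not yet a usable object.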
Now we will try to create and properly initialize a UnicodeString class instance in Pascal,
so that it can be passed as the instance pointer (self/this) to the method calls of the
C++ library class. The following code does half of that job:
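const US_STACKBUF_SIZE = 7;
type
  uint16_t = Word;
  int32_t = Longint;
  UChar = WideChar;
  UChar_ptr = ^UChar;
  UnicodeString = class
    fLength : int32_t;
    fCapacity : int32_t;
    fArray : UChar_ptr;
    fFlags : uint16_t;
    fStackBuffer : array[0..US_STACKBUF_SIZE - 1] of UChar;
  end;
// Each import also needs a name '...' clause carrying the mangled export name.
procedure UMemory_New; external 'icuuc36.dll';
procedure UnicodeString_UnicodeString; external 'icuuc36.dll';
var
  str : UnicodeString;
begin
  asm
    push 32                           // instance size in bytes, inferred from the layout above
    call UMemory_New                  // UMemory::new(), returns the memory in eax
    add esp ,4                        // remove the size argument from the stack
    mov dword ptr [str] ,eax
    mov ecx ,dword ptr [str]          // instance pointer goes into ecx
    call UnicodeString_UnicodeString  // default constructor
  end;
end.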
The code above really works. Try cutting and pasting it into your Delphi and see. However,
we are approaching a problem. As I said, this is just half of the job of creating and
properly initializing the C++ class instance. So far, we have called the UMemory::new()
operator and the UnicodeString::UnicodeString() default constructor. If you inspect the
content of the str variable now, you will see that:
fLength is "0",
fCapacity is "7",
fArray points to fStackBuffer (+18 from pointer(str))
and fFlags is "2".
No exception, a balanced stack, and the presence of the above values in the fields of str
are evidence that we have succeeded in creating the C++ class instance.
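The +18 offset quoted above can be checked with a little layout arithmetic. The sketch below models the 32-bit instance as a plain struct; the name UnicodeStringLayout32 and the vtablePtr field are assumptions made for this illustration (the first slot stands for the hidden pointer at offset 0), and the pointer-sized slots are written as 32-bit integers so the numbers are the same on a 64-bit build machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the 32-bit instance layout described in the text.
struct UnicodeStringLayout32 {
    std::uint32_t vtablePtr;        // hidden slot: pointer to the virtual method table
    std::int32_t  fLength;          // offset 4
    std::int32_t  fCapacity;        // offset 8
    std::uint32_t fArray;           // offset 12, a 32-bit pointer to the characters
    std::uint16_t fFlags;           // offset 16
    std::uint16_t fStackBuffer[7];  // offset 18, US_STACKBUF_SIZE = 7 UChar units
};

// Byte offset of the inline stack buffer from the start of the instance.
constexpr std::size_t stackBufferOffset() {
    return offsetof(UnicodeStringLayout32, fStackBuffer);
}

// Total instance size in bytes for this modelled layout.
constexpr std::size_t instanceSize() {
    return sizeof(UnicodeStringLayout32);
}
```

Under these assumptions the buffer sits 4 + 4 + 4 + 4 + 2 = 18 bytes into the instance, and the seven 16-bit units bring the total to 32 bytes.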
Now we should finish the initialization part of our simulation of C++ behaviour.
The call to the C++ class constructor did all of the initialization except for setting
up the pointer to the virtual method table. So we should continue with code like:
mov eax ,dword ptr [str]
mov dword ptr [eax] ,offset icu_3_6::UnicodeString::'local vftable'
But we can't. In Pascal, it is not possible to link externally to data in a DLL. That is
also the reason why all constants have to be redefined in Pascal when creating binding
ports. So what's next? Are we in a blind alley? The answer is no, but we had to get here to
understand why certain design decisions must be made when creating a fully functional Direct