Nowhere is this more true than in unit testing and test-driven development. The ideal is to write the tests before the code, which will only happen if the tests are simple to write and the test framework can be swung into action without trouble.
I have tried several unit test frameworks and the simplest I have found is UnitTest++.
Writing a test could not be simpler:
TEST( network_addnode )
{
    cNetwork net;
    CHECK_EQUAL( 1, net.AddNode( string("A") ) );
    CHECK_EQUAL( 2, net.AddNode( string("B") ) );
}
Invoking the test framework is similarly to the point:
int _tmain()
{
    return UnitTest::RunAllTests();
}
The snag is that the source distribution contains almost 40 separate files. This is absurd: managing the source control and build tasks for a simple project is dominated by looking after all these unit testing files.
Some of this complexity is due to the requirement for portability to different operating systems and hardware. However, many of the files contain only a line or two of code and it is hard to see the justification for their existence.
By dropping the portability to anything other than Windows, removing the alternative reporters, and merging the smaller files together, I have reduced the source to a single header file and a single C++ source file.
I have not changed the API in any way. This means that any test code you write will still compile and link with the full UnitTest++ system, if the portability is needed later.
I have renamed the system Class Unit Tests, for short: CUTEST. To use it in a project, all that is required is to add the files cutest.h and cutest.cpp.
I have moved the framework into the raven::set namespace. The only change this requires is that code using the framework must invoke the tests with:
return raven::set::UnitTest::RunAllTests();
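Putting it all together, a complete minimal test program looks something like this ( a sketch, assuming only that cutest.h provides the usual UnitTest++ TEST and CHECK_EQUAL macros ):
#include "cutest.h"

TEST( arithmetic )
{
    CHECK_EQUAL( 4, 2 + 2 );
}

int main()
{
    return raven::set::UnitTest::RunAllTests();
}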
The CUTEST framework is released as open source and can be obtained, along with a code timing profiler, from here.
For nearly a decade now I have been developing high-performance, elegant Windows desktop applications.
In the meantime the rest of the world has been moving ‘into the cloud’. Such web applications are sluggish and clunky, but the simplicity of no installation, universal availability and instant upgrades of the entire user base has trumped all other considerations.
So today I delivered my first web application. Guitar informs the user when their guitar was built and by whom, if they type in the serial number. Behind the scenes there is a SQLite database to store all the information about guitars and their makers, a Python script to decode the serial numbers and find the exact information in the database, the web2py framework to generate the dynamic web pages and a lighttpd web server to communicate with the user. It is hardly elegant, and there is no need for any great performance, but the result is rather neat.
Perhaps it will be the first of many?
Automatically self documenting code – the programmer’s dream!
This is almost possible with a painless three step process using two free tools: doxygen and fossil.
Step 1: Edit code to describe what it does
/**
Conversion between UTF-8 and UTF-16 strings.
UTF-8 is used by web pages. It is a variable byte length encoding
of UNICODE characters which is independent of the byte order in a computer word.
UTF-16 is the native Windows UNICODE encoding.
The class stores two copies of the string, one in each encoding,
so should only exist briefly while conversion is done.
This is a wrapper for the WideCharToMultiByte and MultiByteToWideChar API calls.
*/
class cUTF
{
    wchar_t * myString16;   ///< string in UTF-16
    char * myString8;       ///< string in UTF-8
public:
    /// Construct from UTF-16
    cUTF( const wchar_t * ws );
    /// Construct from UTF-8
    cUTF( const char * s );
    /// get UTF-16 version
    const wchar_t * get16() { return myString16; }
    /// get UTF-8 version
    const char * get8() { return myString8; }
    /// free buffers
    ~cUTF() { free( myString8 ); free( myString16 ); }
};
Step 2: Run doxygen to generate HTML documentation from the code. Doxygen hunts down all your code and works out the relationships, all in a flash.
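If you have never run doxygen on the project before, generate a default configuration file first and then run doxygen itself; something like this works ( the settings in the generated Doxyfile, such as INPUT and OUTPUT_DIRECTORY, are yours to choose ):
doxygen -g
doxygen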
Step 3: Do a fossil check-in. Fossil finds all the code you have modified, and the new HTML documentation files generated by doxygen, and stores them in an online repository, so that they are immediately available to anyone with a browser. You can see the result here.
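The check-in itself is just a couple of commands ( a sketch; the commit message is up to you ):
fossil addremove
fossil commit -m "regenerated documentation"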
It is a big old world, full of many varied characters.
There are familiar friends like A, B and C. There are also Chinese characters and Cyrillic characters. We might even want to use Klingon characters!
There are over 65536 different characters that a computer might have to handle. Every possible character has been assigned its own number, called a ‘Unicode code point’. You can see them all here http://www.unicode.org/charts/charindex.html
A byte is a collection of 8 ones or zeros, the basic unit of computer memory. A byte can hold 256 different values. In the old days when the world was smaller and simpler, about 20 or 30 years ago, this was sufficient to store A, B and C and their familiar friends. So each character was stored in one byte.
Nowadays we need to include the whole world and even the occasional Klingon. We need more space for our characters and their new friends.
Two bytes are called a word, and can be handled conveniently by most computers. A word can hold 65536 different values, which is almost enough for every character. So, in the system called UTF-16 ( because a word contains 16 ones or zeroes ), almost every character is stored in one word, with the overflow stored in two words.
The Microsoft Windows operating system uses UTF-16 to handle Unicode characters. Here is how the C programming language creates a UTF-16 encoded Unicode string:
wchar_t * ws = L"Hello World";
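The few characters that do not fit in a single word are stored as two words, called a surrogate pair. For example, the rare Chinese character with code point 0x20BB7 becomes the pair 0xD842 0xDFB7:
wchar_t rare[] = L"\xD842\xDFB7";    // one character, stored in two words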
There is a snag. Although computers use UTF-16 internally, they cannot communicate easily with each other using UTF-16. This is because, although every computer agrees on the order in which the ones and zeros of a byte should be arranged, they do not all agree on the order of the bytes in a word. In a reference to Jonathan Swift’s novel ‘Gulliver’s Travels’, in which characters fought over which end an egg should be opened from, the two ways of arranging the bytes in a word are called ‘Big Endian’ and ‘Little Endian’. When communicating with each other, computers use another standard called UTF-8, where each Unicode character is encoded by a series of bytes in a specified order which is the same whether the computer is ‘Big Endian’ or ‘Little Endian’.
When a computer program needs to communicate with another computer, perhaps by reading or writing a web page, it must constantly convert back and forth between UTF-8 and UTF-16 encoded character strings. The Windows API provides routines for doing this: WideCharToMultiByte() and MultiByteToWideChar(). However, they are a pain to use. Each conversion requires two calls to the routines, and you have to look after allocating and freeing memory and making sure the strings are correctly terminated. We need a wrapper!
Here is the interface to my wrapper:
/**
Conversion between UTF-8 and UTF-16 strings.
UTF-8 is used by web pages. It is a variable byte length encoding
of UNICODE characters which is independent of the byte order in a computer word.
UTF-16 is the native Windows UNICODE encoding.
The class stores two copies of the string, one in each encoding,
so should only exist briefly while conversion is done.
This is a wrapper for the WideCharToMultiByte and MultiByteToWideChar API calls.
*/
class cUTF
{
    wchar_t * myString16;   ///< string in UTF-16
    char * myString8;       ///< string in UTF-8
public:
    /// Construct from UTF-16
    cUTF( const wchar_t * ws );
    /// Construct from UTF-8
    cUTF( const char * s );
    /// get UTF-16 version
    const wchar_t * get16() { return myString16; }
    /// get UTF-8 version
    const char * get8() { return myString8; }
    /// free buffers
    ~cUTF() { free( myString8 ); free( myString16 ); }
};
Here is the code to implement this interface ( windows.h and the standard C string headers must be included ):
/// Construct from UTF-16
cUTF::cUTF( const wchar_t * ws )
{
    // store a copy of the UTF-16 string
    myString16 = (wchar_t *) malloc( ( wcslen( ws ) + 1 ) * sizeof( wchar_t ) );
    wcscpy( myString16, ws );
    // how long will the UTF-8 string be?
    int len = WideCharToMultiByte( CP_UTF8, 0,
                ws, (int) wcslen( ws ),
                NULL, 0, NULL, NULL );
    // allocate a buffer
    myString8 = (char *) malloc( len + 1 );
    // convert to UTF-8
    WideCharToMultiByte( CP_UTF8, 0,
                ws, (int) wcslen( ws ),
                myString8, len, NULL, NULL );
    // null terminate
    myString8[ len ] = '\0';
}
/// Construct from UTF-8
cUTF::cUTF( const char * s )
{
    // store a copy of the UTF-8 string
    myString8 = (char *) malloc( strlen( s ) + 1 );
    strcpy( myString8, s );
    // how long will the UTF-16 string be?
    int len = MultiByteToWideChar( CP_UTF8, 0,
                s, (int) strlen( s ),
                NULL, 0 );
    // allocate a buffer
    myString16 = (wchar_t *) malloc( ( len + 1 ) * sizeof( wchar_t ) );
    // convert to UTF-16
    MultiByteToWideChar( CP_UTF8, 0,
                s, (int) strlen( s ),
                myString16, len );
    // null terminate
    myString16[ len ] = '\0';
}
And here is some code to test the wrapper:
// create a native unicode string with some chinese characters
wchar_t * unicode_string = L"String with some chinese characters \x751f\x4ea7\x8bbe\x7f6e ";
// convert to UTF8
cUTF utf( unicode_string );
// create a web page
FILE * fp = fopen("test_unicode.html","w");
// let browser know we are using UTF-8
fprintf(fp,"<head><meta http-equiv=\"Content-Type\" content=\"text/html;charset=UTF-8\"></head>\n");
// output the converted string
fprintf(fp, "After conversion using cUTF - %s<p>\n", utf.get8() );
fclose(fp);
Raven’s Point has clients all round the world, and sometimes my clients have customers all round the world. Although my clients all communicate with me in English, my clients’ customers much prefer to use the applications I deliver in their own language. Some of my clients’ customers are in China, which presents a particular challenge.
It is important to be able to switch the language the user sees quickly and easily. The requirement is that the user can select their preferred language while the program is running, and the entire user interface should instantly change to the new language without changing or interrupting anything that is going on. It is not satisfactory to stop the program and restart it, or run another version, to change the language displayed.
This week I added support for the German language, in addition to English and Chinese. This went very smoothly, once I had obtained the German translation. So, here is my recipe for multi-language support in a C++ program built with Microsoft Visual Studio.
Create a table which has every character string displayed by the user interface assigned to a number. Each language has its own base number and the translations of each string are assigned a unique number which has the same offset from the language base. For a program that supports English and German, I might choose that the English base number is 40000 and German is 70000. So the English string “Run” might be given the number 40131 and the German string "Geführt" the number 70131.
The numbers are arbitrary, but there are a couple of things to watch out for. The numbers 1000 and upwards are used by Microsoft Visual Studio for all sorts of purposes, so it is best to stay away from this area – starting at 40000 works fine. The language base numbers must be far enough apart that there is no chance you will run out of room between them – a separation of 10000 should be enough.
The table of numbered strings is saved in a text file which looks like this:
STRINGTABLE
BEGIN
40131 "Run"
END

STRINGTABLE
BEGIN
70131 "Geführt"
END
The text file containing the numbered string table is a resource which is compiled by the resource compiler and linked to the rest of the program. However, it is maintained and edited using a text editor and must be protected from being changed by the Microsoft Visual Studio resource editor. Do this by naming the file language.rc and storing it in the res subfolder of the project directory. The resource compiler reaches the file through an include in <project folder>/res/<projectname>.rc2.
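The include itself is a single line in that .rc2 file ( a sketch, assuming the file layout just described ):
#include "language.rc"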
The numbered string table is used by code like this:
SetDlgItemText( IDC_RUN,
    CString( MAKEINTRESOURCE( myLanguage + 131 ) ) );
The global variable myLanguage contains the base number of the currently selected language. This code must be called every time the GUI is redrawn and also each time the user changes the selected language.
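One convenient arrangement is to gather all these calls into a single routine and call it from both places ( a sketch; the dialog class cMyDialog, the control IDC_STOP and the offset 132 are hypothetical ):
void cMyDialog::RefreshLanguage()
{
    // one call per user interface string
    SetDlgItemText( IDC_RUN,
        CString( MAKEINTRESOURCE( myLanguage + 131 ) ) );
    SetDlgItemText( IDC_STOP,    // hypothetical control ID
        CString( MAKEINTRESOURCE( myLanguage + 132 ) ) );    // hypothetical offset
}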
It is convenient for the user if, when the program starts, it remembers the language that was selected last time it was run. When the user changes the language, call this line:
AfxGetApp()->WriteProfileInt(L"startup", L"language", myLanguage );
And when the program starts:
myLanguage = AfxGetApp()->GetProfileInt(L"startup", L"language", 40000 );
There is a temptation to use defines to replace the string number offsets ( e.g. 131 ) with symbolic constants ( STR_RUN ). I recommend against doing this. It is just another table which must be maintained, and once there are more than a few dozen strings, maintenance becomes a pain. The numbered string table is self documenting and, if you are careful assigning the resource IDs ( IDC_RUN ) and use plenty of comments, the code will be self documenting, despite the sprinkling of mysterious numbers ( 131 ) throughout.
Now, we come to the support of Chinese and other East Asian languages. Out of the box, Windows will not even display East Asian characters. Here is a link to advice from Robert Y Eng on switching on this support.
The next problem is how to represent the Chinese characters. There are several alternatives here and many technical details. It is easy to get lost for many days in researching and evaluating the alternatives ( I did! ). I am simply going to describe what I do.
The Chinese character strings are represented by 16-bit Unicode code points, written as escaped hexadecimal. They look like this:
60131 L"\x8FD0\x884C"
This produces a couple of hieroglyphics which, I am assured, mean “Run” to anyone who can read them.
The advantage of this method is that you just have to add another language base number ( in my case 60000 ) for Chinese and immediately, magically the program displays Chinese characters in all the appropriate places on any computer with East Asian languages switched on. No new code is required.
The disadvantage of this method is that you probably will not receive the Chinese strings from the translator in this form. Since there are so many different ways to represent Chinese characters, this problem will probably arise no matter what scheme you choose. I have been doing this for less than a year, and already have received Chinese translations in several different formats which require some hacking about to decode. I cannot give details of all the different possibilities, but here is some general advice.
The first thing is to determine whether the characters are being represented as fixed width 16-bit numbers. If they are, then you need to convert them into escaped hexadecimal ASCII character strings.
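A minimal sketch of that conversion, assuming the translated strings can be read into memory as wchar_t strings:
#include <stdio.h>
#include <wchar.h>

// print a UTF-16 string as an escaped hexadecimal literal,
// ready to paste into the numbered string table
void PrintEscaped( const wchar_t * ws )
{
    printf( "L\"" );
    for( size_t i = 0; i < wcslen( ws ); i++ )
        printf( "\\x%04X", (unsigned int) ws[i] );
    printf( "\"\n" );
}

int main()
{
    PrintEscaped( L"\x8FD0\x884C" );    // prints L"\x8FD0\x884C"
    return 0;
}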
The other format that you will often see is variable width multibyte numbers, usually UTF-8. These need to be converted too. Here is a straightforward manual procedure:
• Paste into the Notepad editor
• Clean up so that everything is as regular as possible
• Save as Unicode big-endian
• Open in a hex editor
• Copy and paste the required code string into the string table file, escaping as you go.
Obviously, this procedure is only feasible for a small number of strings. If you need to automate this procedure, contact me.