16 Jan

16 bits COM Oddity

I can’t even pinpoint what a 16 bits COM Oddity really means, but I think the idea is in there, somehow. Previously, I explained how to code a simple “hello, world” program using the DEBUG tool that shipped with DOS. Revisiting this obsolete knowledge was unexpectedly fun. We’ll retrieve the hexadecimal version of “hello, world” (well, “hello, world!!”) from that post:

EB 13 0D 0A 68 65 6C 6C 6F 2C 20 77 6F 72 6C 64
21 21 0D 0A 24 B4 09 BA 02 01 CD 21 B4 00 CD 21

That’s all we need for our “hello, world!!” binary. 32 bytes exactly. We could create that file bit by bit, but that’d be excessive, I think. Let’s use the echo command instead. This is the full command I entered in my Windows 10 cmd.exe prompt:

echo|set /p="Ù‼♪◙hello, world!!♪◙$┤○║☻☺═!1└═!">hello.com

After that you’ll get a 16-bit COM, hello.com, that will display the “hello, world!!” message. Funny 🙂
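
A quick way to check the result is to dump the file back as hexadecimal and compare it with the listing. Here’s a minimal Python sketch; hex_dump is just a name I made up, and the script assumes hello.com sits in the current directory. Don’t be surprised if the dump ends in 31 C0 CD 21 instead of B4 00 CD 21: that substitution is explained further down.

```python
from pathlib import Path


def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of two-digit uppercase hex values."""
    rows = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        rows.append(" ".join(f"{b:02X}" for b in chunk))
    return "\n".join(rows)


if __name__ == "__main__":
    # Assumption: hello.com was created in the current directory.
    path = Path("hello.com")
    if path.exists():
        data = path.read_bytes()
        print(hex_dump(data))
        print(len(data), "bytes")
```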

What are those weird characters?

First, a little explanation. We want our hello.com file to be, byte after byte, an exact representation of the hexadecimal sequence presented above. We’ll use cmd.exe commands to dump characters into the file and, if we choose our characters carefully so that they match the target hexadecimal values, we’ll end up with the exact representation we’re looking for. For instance, the first 2-byte block, EB 13, is the “jmp 115” instruction. Then comes the newline (0D 0A), and so on. If we convert our hexadecimal to decimal, we get:

235 19 13 10 104 101 108 108 111 44 32 119 111 114 108 100 
 33 33 13 10  36 180   9 186   2  1 205 33 180   0 205  33
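
The conversion is easy to reproduce. The sketch below (Python, with names of my own choosing) turns the hexadecimal listing into decimal values. It also recomputes the jump target, assuming the usual COM load offset of 100h: a short jump lands at the address of the next instruction plus the displacement byte.

```python
HEX_LISTING = """
EB 13 0D 0A 68 65 6C 6C 6F 2C 20 77 6F 72 6C 64
21 21 0D 0A 24 B4 09 BA 02 01 CD 21 B4 00 CD 21
"""


def to_decimal(listing: str) -> list[int]:
    """Parse whitespace-separated hex byte values into integers."""
    return [int(token, 16) for token in listing.split()]


values = to_decimal(HEX_LISTING)
print(values)

# EB is a 2-byte short jump; the second byte is the displacement.
# Assumption: the program is loaded at offset 0x100, as COM files are.
jump_target = 0x100 + 2 + values[1]
print(f"jmp {jump_target:X}")
```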

The first byte in hello.com must be EB, or 235 in decimal. In order to dump our characters from the command line, we’ll convert that decimal value to a character. I’m trying this on a Windows 10 (64-bit) machine, with cmd.exe using code page 850 (Multilingual Latin 1). In that code page, character 235 is Ù. And 19 is ‼. And, luckily, 13 is ♪ and 10 is ◙. Those two characters are especially important because they represent the carriage return and the line feed, respectively, and some shells won’t convert them to characters. Happily, cmd.exe with my default code page handles them as we need. To input those characters you can type the usual ALT + decimal value.
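
The same lookup can be sketched in Python. One caveat: Python’s cp850 codec decodes bytes below 32 as control characters, not as the glyphs the console shows, so the glyph table for 1 to 31 is typed by hand here (it’s the classic IBM PC set). Treat that table and the function name as assumptions of this sketch.

```python
# Glyphs the console shows for codes 1..31 (index 0 is code 1).
CONTROL_GLYPHS = "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"


def to_console_chars(data: bytes, codepage: str = "cp850") -> str:
    """Map each byte to the character you would type for it."""
    chars = []
    for b in data:
        if b == 0:
            # There is no character we can type for a zero byte.
            raise ValueError("byte 0 cannot be typed as a character")
        if b < 32:
            chars.append(CONTROL_GLYPHS[b - 1])
        else:
            chars.append(bytes([b]).decode(codepage))
    return "".join(chars)


# First four bytes of the program: EB 13 0D 0A
print(to_console_chars(bytes([235, 19, 13, 10])))
```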

There are a few important things to notice:

  1. The echo command appends a newline automatically. We don’t want that. We want hello.com to comprise 32 bytes exactly. That’s why we’re using set /p=text, which is essentially a hack. set /p reads the value for an environment variable from standard input, i.e., it prompts for user input, displaying the provided text to the user without adding a newline, and then sits waiting for that input. We use a pipe from a bare echo just to provide the “input” set /p is waiting for, so the command effectively returns the text we want, without a newline. That text is redirected to hello.com. I found that set /p hack here.
  2. Notice that set /p is only available on Windows cmd.exe. For DOS I think you could try this uber-hacky thing: echo Ù‼>hello.com&echo hello, world!!>>hello.com&echo $┤○║☻☺═!1└═!>>hello.com. This line comprises 3 invocations of echo. The first one creates the file, and the others append to it. I split them precisely to illustrate the fact that echo appends newline characters, so we can omit the ♪◙ characters from our input. We got lucky, though: & is not a character of our program, so we’re able to chain multiple commands in a single line. With code page 850, & is decimal 38. No 38 in our program, phew! (As far as I know, & is a cmd.exe separator that plain DOS COMMAND.COM doesn’t understand; there, you’d enter the three echo commands one after another.) Obviously, the final echo will add a newline, yielding a hello.com with 34 bytes, but it should run anyway, because the two extra bytes come after the terminating INT 21h and are never executed.
  3. If you’re really following along, you should have noticed that we have a 0 (zero) in our sequence of bytes. We cannot represent that zero in our list of characters. Well, let’s apply a tiny hack. That zero belongs to the 2-byte block 180 0, which represents the instruction mov ah, 0. Remember from our previous post that this instruction sets up the INT 21h call that gracefully terminates our program. Let’s replace it with xor ax, ax, which also leaves AH at zero. The 2-byte block for that is 31 C0, or in decimal: 49 192. That’s why we have 1└═! at the end of our string.
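
Points 2 and 3 can be checked with a few lines of Python. This is only a model, not a shell: echo_lines is a made-up helper that treats each echo as “text plus CR LF”, and the byte patch simply swaps B4 00 for 31 C0.

```python
ORIGINAL = bytes.fromhex(
    "EB 13 0D 0A 68 65 6C 6C 6F 2C 20 77 6F 72 6C 64 "
    "21 21 0D 0A 24 B4 09 BA 02 01 CD 21 B4 00 CD 21"
)

MOV_AH_0 = bytes.fromhex("B4 00")   # mov ah, 0
XOR_AX_AX = bytes.fromhex("31 C0")  # xor ax, ax

# Point 3: swap the instruction that contains the zero byte.
patched = ORIGINAL.replace(MOV_AH_0, XOR_AX_AX)
print(len(patched), 0 in patched, ord("&") in patched)


def echo_lines(*lines: bytes) -> bytes:
    """Model echo: every invocation writes its text followed by CR LF."""
    return b"".join(line + b"\r\n" for line in lines)


# Point 2: split at the embedded newlines and let "echo" re-add them.
parts = patched.split(b"\r\n")
via_echo = echo_lines(*parts)
print(len(parts), len(via_echo))
```

The patched program has no zero byte and no & in it, and the echo variant ends up being the same 32 bytes plus one trailing CR LF.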

The 16 bits COM Oddity does not run

We’ll get a 16-bit binary and it won’t run on 64-bit Windows. But it should run on 32-bit Windows and in DOS. For 64-bit Windows 10 I executed the oddity on the fantastic DOSBox. I placed hello.com under directory d:\ikigames\dos and then mounted that directory from DOSBox with the command mount c d:\ikigames\dos. Then, I executed the program. Success.

Figure: 16 bits COM Oddity on DOSBox under Windows 10 64 bits, printing HELLO, WORLD!!. Notice that I’m using uppercase chars here.

Warning: Copy-pasting the above commands might yield a broken hello.com. You have to be sure that the characters you’re pasting into your command line correspond to the proper decimal values in your console’s code page. For instance, on a Windows 7 (32-bit) machine I tested this on, the character for 235 is δ, not Ù, and therefore the correct command for that machine was echo|set /p="δ‼♪◙hello, world!!♪◙$┤○║☻☺═!1└═!">hello.com
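
To see which characters change between code pages, you can compare them byte by byte. The sketch below assumes the Windows 7 machine was using code page 437 (where 235 is δ); I didn’t record that, so take it as a guess. Control codes below 32 are skipped because Python’s codecs don’t map them to console glyphs. differing_bytes is a name invented for this example.

```python
PROGRAM = bytes.fromhex(
    "EB 13 0D 0A 68 65 6C 6C 6F 2C 20 77 6F 72 6C 64 "
    "21 21 0D 0A 24 B4 09 BA 02 01 CD 21 31 C0 CD 21"
)


def differing_bytes(data: bytes, page_a: str = "cp437",
                    page_b: str = "cp850") -> dict[int, tuple[str, str]]:
    """Return the byte values whose character differs between two code pages."""
    diffs = {}
    for b in sorted(set(data)):
        if b < 32:
            continue  # console glyphs for control codes are not in the codecs
        char_a = bytes([b]).decode(page_a)
        char_b = bytes([b]).decode(page_b)
        if char_a != char_b:
            diffs[b] = (char_a, char_b)
    return diffs


print(differing_bytes(PROGRAM))
```

For this program only byte 235 should show up, which matches the single character that changed between the two commands.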