Turning myself into vacation mode, a small summer coding challenge is always fun and a good way to relax. This time I chose how to convert arabic to roman numerals.
It took me a while to figure out the rules regarding which letters can be combined. E.g. I can only be before V or X, and not before L or higher numbers, and so on. So I ended up in this translation table where the combinations are present. In the diagram below, that is the table in the center of the image with rows labeled as roman and arabic.
![](https://matspetterss.wordpress.com/wp-content/uploads/2023/07/230703.drawio.png?w=881)
As I try to illustrate above using 14 as an example. I walk through the row with arabic numerals until I find a value where 14 is greater than or equal to current value.
In this case, that would be 10. So I put X into the return string that I am building up. So next step is to subtract 10 from 14, which gives 4 as the new starting point. Taking it another round, I find the number 4 which 4 is greater than or equal to. Corresponding roman numeral i IV, which I add to the return string. Resulting in XIV.
The assembly routine
I am by no means a skilled assembly language coder, but I like to code using NASM in 64-bit Linux which this example is using.
global arabicToString:function
section .bss
retstr db 25 dup(0)
section .rodata
roman db 'M', 0, 'CM', 0, 'D', 0, 'CD', 0, 'C', 0, 'XC', 0, 'L', 0, 'XL', 0, 'X', 0, 'IX', 0, 'V', 0, 'IV', 0, 'I', 0 ; Roman symbols
arabic dw 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1, 0 ; Corresponding integer values
section .text
; rdi = int
arabicToString:
xor r10, r10 ; index for return string
xor rcx, rcx ; counter for index in arabic
lea r8, roman
lea r9, arabic
mov rdx, rdi ; holds the remainder
loopa: ; find the position in the table
cmp rdi, 0
je end
xor rbx, rbx
mov bx, [r9 + rcx * 2]
cmp rdi, rbx
jge found0
inc rcx ; increase counter
jmp loopa
found0:
push rdi
push rbx
push rcx
mov rdi, rcx ; rcx holds the index in the array
call getRomanNumber
pop rcx
pop rbx
pop rdi
sub rdi, rbx ; subtract the value found
xor rcx, rcx ; counter for index in arabic
jmp loopa
end:
lea rax, retstr
ret ; return string in rax
getRomanNumber: ; rdi holds index of string in array
push rdi
xor rcx, rcx
inc rcx
xor r11, r11
lea r12, roman ; r12 contains pointer to roman string
loopr1:
mov r11b, byte [r12]
cmp r11b, 0
je foundnull
inc r12
jmp loopr1
foundnull:
cmp rcx, rdi
jge ptrdone
inc r12
inc rcx
jmp loopr1
ptrdone:
mov rax, r12
cmp rdi, 0
je g1
inc rax
jmp g2
g1:
lea rax, roman
g2:
mov rdi, rax
lea rcx, retstr
call copyRoman
pop rdi
ret
copyRoman:
xor r13, r13
copyR1:
mov r13b, [rdi] ; rdi = source, rcx = target
cmp r13b, 0
je copyRomanDone
mov [rcx + r10], r13b
inc r10
inc rdi
jmp copyR1
copyRomanDone:
mov rax, rcx
ret
The C code that calls the assembly routine
I use a simple C program to call the assembly routine. It takes a command line argument and passes it on to the assembler routine.
#include <stdio.h>
#include <stdlib.h>
extern char * arabicToString(int);
int main(int argc, char *argv[]) {
if(argc == 2) {
printf("Arabic number: %s\n", argv[1]);
int a = atoi(argv[1]);
char *str = arabicToString(a);
printf("str=%s\n", str);
} else {
printf("Enter arabic number as parameter...\n");
}
}
Running the program
Compiling, linking and then running it results in this output. 3999 is the highest number that can be represented using these numerals. There are higher ones that can be represented, but I save that for another day.
![](https://matspetterss.wordpress.com/wp-content/uploads/2023/07/roman-c.png?w=805)
Testing it out using python
Taking the same logic into a python program, I ended up with this code:
import sys
roman = ["M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"]
arabic = [1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1]
arabicrest = 0
retstr = ""
if __name__ == "__main__":
if(len(sys.argv) == 2):
arabicrest = int(sys.argv[1])
else:
print("Specify argument.... I give you a 1")
arabicrest = 1
while arabicrest > 0:
i = 0
for x in arabic:
if(arabicrest >= x):
print("{:d} is greater or equal to {:d} {:d}".format(arabicrest, x, i))
arabicrest = arabicrest - x
retstr = retstr + roman[i]
break
i = i + 1
print("Roman: {}".format(retstr))
Running the python program
Running the code results in this output, which is showing the same result as the assembly routine.
![](https://matspetterss.wordpress.com/wp-content/uploads/2023/07/roman-py.png?w=820)
Using the time command in Linux to get an understanding of the execution is not really fair, since I print out more stuff in python. But 0.001 vs 0.017 is still quite a difference I think.
Anyway, it was big fun to do some coding. 🙂 Here is the code on GitHub if you are interested: https://github.com/matspettersson/assemblyplay/tree/main/roman
Wishing you all a great summer!
Leave a comment