Exploring roman numerals in assembly and Python

Turning myself into vacation mode, a small summer coding challenge is always fun and a good way to relax. This time I chose how to convert arabic to roman numerals.

It took me a while to figure out the rules regarding which letters can be combined. E.g. I can only be before V or X, and not before L or higher numbers, and so on. So I ended up in this translation table where the combinations are present. In the diagram below, that is the table in the center of the image with rows labeled as roman and arabic.

As I try to illustrate above using 14 as an example. I walk through the row with arabic numerals until I find a value where 14 is greater than or equal to current value.

In this case, that would be 10. So I put X into the return string that I am building up. So next step is to subtract 10 from 14, which gives 4 as the new starting point. Taking it another round, I find the number 4 which 4 is greater than or equal to. Corresponding roman numeral i IV, which I add to the return string. Resulting in XIV.

The assembly routine

I am by no means a skilled assembly language coder, but I like to code using NASM in 64-bit Linux which this example is using.

global arabicToString:function

section .bss
retstr      db    25 dup(0)

section .rodata
roman       db    'M', 0, 'CM', 0, 'D', 0, 'CD', 0, 'C', 0, 'XC', 0, 'L', 0, 'XL', 0, 'X', 0, 'IX', 0, 'V', 0, 'IV', 0, 'I', 0      ; Roman symbols
arabic      dw    1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1, 0           ; Corresponding integer values

section .text

; rdi = int
arabicToString:
      xor   r10, r10                                  ; index for return string
      xor   rcx, rcx                                  ; counter for index in arabic

      lea   r8, roman
      lea   r9, arabic

      mov   rdx, rdi                                  ; holds the remainder
loopa:                                                ; find the position in the table
      cmp   rdi, 0
      je    end
      xor   rbx, rbx
      mov   bx, [r9 + rcx * 2]
      cmp   rdi, rbx
      jge   found0
      inc   rcx                                       ; increase counter
      jmp   loopa
found0:
      push  rdi
      push  rbx
      push  rcx
      mov   rdi, rcx                                  ; rcx holds the index in the array
      call  getRomanNumber
      pop   rcx
      pop   rbx
      pop   rdi
      sub   rdi, rbx                                  ; subtract the value found
      xor   rcx, rcx                                  ; counter for index in arabic
      jmp   loopa
end:
      lea   rax, retstr
      ret                                             ; return string in rax

getRomanNumber:                                       ; rdi holds index of string in array
      push  rdi

      xor   rcx, rcx
      inc   rcx
      xor   r11, r11
      lea   r12, roman                                ; r12 contains pointer to roman string
loopr1:
      mov   r11b, byte [r12]
      cmp   r11b, 0
      je    foundnull
      inc   r12
      jmp   loopr1
foundnull:
      cmp   rcx, rdi
      jge   ptrdone
      inc   r12
      inc   rcx
      jmp   loopr1
ptrdone:
      mov   rax, r12
      cmp   rdi, 0
      je    g1
      inc   rax
      jmp   g2
g1:
      lea   rax, roman
g2:      
      mov   rdi, rax
      lea   rcx, retstr
      call  copyRoman
      pop   rdi

      ret

copyRoman:
      xor   r13, r13
copyR1:
	mov   r13b, [rdi] ; rdi = source, rcx = target
      cmp   r13b, 0
      je    copyRomanDone

	mov   [rcx + r10], r13b
      inc   r10
      inc   rdi
      jmp   copyR1
copyRomanDone:
      mov   rax, rcx
	ret

The C code that calls the assembly routine

I use a simple C program to call the assembly routine. It takes a command line argument and passes it on to the assembler routine.

#include <stdio.h>
#include <stdlib.h>

extern char *  arabicToString(int);

int main(int argc, char *argv[]) {

if(argc == 2) {
	printf("Arabic number: %s\n", argv[1]);
	int a = atoi(argv[1]);
	char *str = arabicToString(a);
	printf("str=%s\n", str);
} else { 
	printf("Enter arabic number as parameter...\n");
}

}

Running the program

Compiling, linking and then running it results in this output. 3999 is the highest number that can be represented using these numerals. There are higher ones that can be represented, but I save that for another day.

Testing it out using python

Taking the same logic into a python program, I ended up with this code:

import sys

roman = ["M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"]
arabic = [1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1]
arabicrest = 0
retstr = ""

if __name__ == "__main__":
    if(len(sys.argv) == 2):
        arabicrest = int(sys.argv[1])
    else:
        print("Specify argument.... I give you a 1")
        arabicrest = 1
    while arabicrest > 0:
        i = 0
        for x in arabic:
            if(arabicrest >= x):
                print("{:d} is greater or equal to {:d}   {:d}".format(arabicrest, x, i))
                arabicrest = arabicrest - x
                retstr = retstr + roman[i]
                break
            i = i + 1
    print("Roman: {}".format(retstr))

Running the python program

Running the code results in this output, which is showing the same result as the assembly routine.

Using the time command in Linux to get an understanding of the execution is not really fair, since I print out more stuff in python. But 0.001 vs 0.017 is still quite a difference I think.

Anyway, it was big fun to do some coding. 🙂 Here is the code on GitHub if you are interested: https://github.com/matspettersson/assemblyplay/tree/main/roman

Wishing you all a great summer!


Posted

in

by

Tags:

Comments

Leave a comment

Design a site like this with WordPress.com
Get started