Assembler progress

I’ve spent the last few evenings tweaking the LLVM backend’s MC code to fix a lot of minor issues, and it’s now at the point where it can generate correct output from ECLair assembly input! It understands that different instructions can have different lengths, it can tell half-width registers apart from full-width registers, and the first few instructions are integrated and working.

One challenge is that the CPU’s own regression test framework takes ASCII-binary input (text such as 01010110 rather than actual binary values) with comments mixed in, and it reads those comments to decide which checks to run at which PC steps. I thought about a few ways to handle this. The main one was to have llvm-mc generate ELF object output and then write a tool to turn that into ASCII-binary and re-integrate the comments, but that felt overly complicated. Instead, I settled on llvm-mc’s assembly-input/assembly-output mode. With the -preserve-comments flag, the comments in the input are left in place in the output, and with the -show-encoding flag, the output includes extra comments showing the exact hex values that would have been emitted if the output format were binary rather than text assembly. I then wrote a quick little tool that parses this output and uses the show-encoding comments to generate ASCII-binary. Since the comments are already in place, that’s all the tool really has to do.
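A rough sketch of that conversion step, in Python, is below. It is not the actual tool, just an illustration under a few assumptions: that each encoding comment has llvm-mc’s usual "encoding: [0x12,0x05]" shape, that the test-check comments sit on their own lines starting with #, and that the test framework wants each instruction as space-separated 8-bit groups on one line. The ldi mnemonic and the CHECK comment in the sample are made up for illustration.

```python
import re

# Assumption: the target prints encodings as "... # encoding: [0x12,0x05]".
ENCODING_RE = re.compile(r"encoding:\s*\[([^\]]*)\]")


def to_ascii_binary(llvm_mc_output: str) -> str:
    """Convert llvm-mc -show-encoding/-preserve-comments text into ASCII-binary."""
    out_lines = []
    for line in llvm_mc_output.splitlines():
        stripped = line.strip()
        match = ENCODING_RE.search(line)
        if match:
            tokens = [t.strip() for t in match.group(1).split(",") if t.strip()]
            bits = []
            for token in tokens:
                # Unresolved fixups appear as letters (e.g. "A"), not hex bytes.
                if not token.lower().startswith("0x"):
                    raise ValueError(f"unresolved fixup in encoding: {line!r}")
                bits.append(f"{int(token, 16):08b}")
            # Hypothetical output format: one instruction per line, bytes space-separated.
            out_lines.append(" ".join(bits))
        elif stripped.startswith("#"):
            # Comment preserved from the input: pass it through for the test harness.
            out_lines.append(stripped)
        # Anything else (directives such as .text, blank lines) is dropped.
    return "\n".join(out_lines) + "\n"


# Made-up sample of llvm-mc output; real ECLair syntax may differ.
sample = "\t.text\n# CHECK: pc=0 r1=5\n\tldi r1, 5 # encoding: [0x12,0x05]\n"
print(to_ascii_binary(sample), end="")
```

Bytes are kept in the order llvm-mc prints them, and unresolved fixups (which llvm-mc shows as placeholder letters instead of hex bytes) are rejected rather than guessed at. Wrapping the function with stdin/stdout handling turns it into a filter that llvm-mc’s output can be piped through.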

I’ve converted 10 of the 109 test files from seasm to llvm-mc format so far, and it all seems to be working well. I’ll convert the rest bit by bit over the next little while as I add support for each instruction to the LLVM backend in turn.

It’s really great to see the LLVM backend actually generating useful output and being integrated into the development process after so long and so much work!