MOS 6502 CPU emulator in C++

This is my C++ emulator of the MOS Technology 6502 CPU. The code emulates a fully functional 6502 CPU and it seems to be pretty fast too. Some minor tricks have been introduced to greatly reduce the overall execution time.

Interrupt and bus read/write operations are emulated as well.

Github repo: https://github.com/gianlucag/mos6502


What’s a 6502???

Here is a brief description: http://en.wikipedia.org/wiki/MOS_Technology_6502

Main features:

  • 100% coverage of legal opcodes
  • decimal mode implemented
  • read/write bus callback
  • jump table opcode selection

Still to implement

  • 100% cycle accuracy
  • illegal opcodes
  • hardware glitches, the known ones of course 🙂

The emulator was extensively tested against this test suite:

https://github.com/Klaus2m5/6502_65C02_functional_tests

and in parallel emulation with fake6502 http://rubbermallet.org/fake6502.c

so expect nearly 100% compliance with the real deal… at least on the normal behavior: as I said stuff like illegal opcodes or hardware glitches are currently not implemented.

Why yet another 6502 emulator?
Just for fun :). This CPU (and its many derivatives) powered machines such as:

  • Apple II
  • Nintendo Entertainment System (NES)
  • Atari 2600
  • Commodore 64
  • BBC Micro

and many other embedded devices still used today. You can use this emulator in your machine emulator project. However cycle accuracy is not yet implemented so mid-frame register update tricks cannot be reliably emulated.

Some theory behind emulators: emulator types
You can group all the CPU emulators out there in 4 main categories:

  • switch-case based
  • jump-table based
  • PLA or microcode emulation based
  • graph based

Graph-based emulators are the most accurate, as they emulate the connections between the transistors on the die of the CPU. They reproduce even the unwanted glitches, both known and still unknown. However, the complexity of such emulators grows non-linearly with the number of transistors: in other words, you don’t want to emulate a modern Intel quad core using this approach!!!

For an example, check this out: http://visual6502.org/JSSim/index.html

The PLA/microcode-based ones are the best, as they offer both speed and limited complexity. The switch-case-based ones are the simplest but also the slowest: the opcode value is thrown into a huge switch statement which selects the code snippet to execute. Compilers can optimize a switch to reach near O(log(n)) complexity, but they hardly ever do so when dealing with sparse integers (like most CPU opcode tables).
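As an illustration of the switch-case style, here is a minimal sketch. `TinyCpu` and `RunSwitch` are hypothetical names invented for this example, and only four real 6502 opcodes are handled; it is not how any particular emulator is written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical miniature CPU state: only what this sketch needs.
struct TinyCpu {
    uint8_t a = 0;    // accumulator
    uint8_t x = 0;    // X index register
    uint16_t pc = 0;  // program counter
};

// Executes `steps` instructions from `mem` with one big switch statement.
// Every opcode not listed is treated as a NOP (an assumption of this sketch).
void RunSwitch(TinyCpu& cpu, const std::vector<uint8_t>& mem, std::size_t steps) {
    // A full 64 KiB image guarantees the 16-bit pc can never index out of range.
    assert(mem.size() == 0x10000);
    for (std::size_t i = 0; i < steps; ++i) {
        uint8_t opcode = mem[cpu.pc++];
        switch (opcode) {
            case 0xA9: cpu.a = mem[cpu.pc++]; break;  // LDA #imm
            case 0xAA: cpu.x = cpu.a; break;          // TAX
            case 0xE8: ++cpu.x; break;                // INX
            case 0xEA: break;                         // NOP
            default:   break;                         // unhandled: acts as NOP
        }
    }
}
```

With a memory image that starts with A9 41 AA E8, three steps load 0x41 into A, copy it to X and increment X.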

Emulator features
My project is a simple jump-table based emulator: the actual value of the opcode (let’s say 0x80) is used to index a function pointer table; each entry of the table is a C++ function which emulates the behavior of the corresponding real instruction.
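The idea can be sketched as follows. This is a simplified illustration under my own assumptions, not the emulator's actual code: `TableCpu`, `Load`, `Step` and the `Op_*` handlers are hypothetical names, and only three opcodes are filled in.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// Hypothetical miniature CPU, not the emulator's real class.
class TableCpu {
public:
    uint8_t a = 0;
    uint8_t x = 0;
    uint16_t pc = 0;
    std::array<uint8_t, 0x10000> mem{};

    TableCpu() {
        table.fill(&TableCpu::Op_NOP);        // every unassigned opcode falls back to NOP
        table[0xA9] = &TableCpu::Op_LDA_IMM;  // LDA #imm
        table[0xAA] = &TableCpu::Op_TAX;      // TAX
        table[0xE8] = &TableCpu::Op_INX;      // INX
    }

    // Copies a short program into memory starting at address `at`.
    void Load(uint16_t at, std::initializer_list<uint8_t> bytes) {
        assert(at + bytes.size() <= mem.size());
        std::size_t i = at;
        for (uint8_t b : bytes) mem[i++] = b;
    }

    // One instruction: the opcode value is the index into the table.
    void Step() {
        uint8_t opcode = mem[pc++];
        (this->*table[opcode])();
    }

private:
    using Handler = void (TableCpu::*)();
    std::array<Handler, 256> table;

    void Op_NOP() {}
    void Op_LDA_IMM() { a = mem[pc++]; }
    void Op_TAX() { x = a; }
    void Op_INX() { ++x; }
};
```

Because the table has exactly 256 entries and the opcode is an 8-bit value, no bounds check is needed before the indirect call.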

All the 13 addressing modes are emulated:

// addressing modes
uint16_t Addr_ACC(); // ACCUMULATOR
uint16_t Addr_IMM(); // IMMEDIATE
uint16_t Addr_ABS(); // ABSOLUTE
uint16_t Addr_ZER(); // ZERO PAGE
uint16_t Addr_ZEX(); // INDEXED-X ZERO PAGE
uint16_t Addr_ZEY(); // INDEXED-Y ZERO PAGE
uint16_t Addr_ABX(); // INDEXED-X ABSOLUTE
uint16_t Addr_ABY(); // INDEXED-Y ABSOLUTE
uint16_t Addr_IMP(); // IMPLIED
uint16_t Addr_REL(); // RELATIVE
uint16_t Addr_INX(); // INDEXED-X INDIRECT
uint16_t Addr_INY(); // INDEXED-Y INDIRECT
uint16_t Addr_ABI(); // ABSOLUTE INDIRECT
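To give an idea of what such functions compute, here is a hedged sketch of three of the modes. `AddrDemo` and its member names are hypothetical and the bodies are my own illustration of the documented 6502 behavior, not copied from the emulator.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper type for this sketch only.
struct AddrDemo {
    std::array<uint8_t, 0x10000> mem{};
    uint16_t pc = 0;  // points at the operand bytes of the current instruction
    uint8_t x = 0;
    uint8_t y = 0;

    uint8_t Read(uint16_t addr) const { return mem[addr]; }

    // INDEXED-X ZERO PAGE: operand byte plus X, wrapping around inside page 0.
    uint16_t ZeroPageX() {
        return static_cast<uint8_t>(Read(pc++) + x);
    }

    // ABSOLUTE: a 16-bit little-endian address follows the opcode.
    uint16_t Absolute() {
        uint16_t lo = Read(pc++);
        uint16_t hi = Read(pc++);
        return static_cast<uint16_t>(lo | (hi << 8));
    }

    // INDEXED-Y INDIRECT: the operand names a zero-page pointer, then Y is added.
    uint16_t IndirectY() {
        uint8_t zp = Read(pc++);
        uint16_t lo = Read(zp);
        uint16_t hi = Read(static_cast<uint8_t>(zp + 1));  // the pointer's high byte also wraps in page 0
        return static_cast<uint16_t>((lo | (hi << 8)) + y);
    }
};
```

Each function returns the effective address and advances `pc` past the operand, so the instruction body only has to read or write that address.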

All the 151 opcodes are emulated. Since the 6502 CPU uses 8 bits to encode the opcode value, it also has a lot of “illegal opcodes” (i.e. opcode values other than the 151 designed ones). Such opcodes perform weird operations, write multiple registers at the same time, and are sometimes a combination of two or more “valid” opcodes. The illegals were used to enforce software copy protection or to detect the exact CPU type.

The illegals are not supported yet, so instead a simple NOP is executed.

Inner main loop
It’s a classic fetch-decode-execute loop:

while(start + n > cycles && !illegalOpcode)
{
   // fetch
   opcode = Read(pc++);

   // decode
   instr = InstrTable[opcode];

   // execute
   Exec(instr);
}

The next instruction (the opcode value) is retrieved from memory. Then it’s decoded (i.e. the opcode is used to address the instruction table) and the resulting code block is executed.
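A self-contained sketch of such a loop is shown below. It assumes that each table entry pairs an addressing-mode function with an operation function and that the execute step calls the first and then the second; that split, and the names `MiniCpu`, `Instr` and `executed`, are assumptions made for this example.

```cpp
#include <array>
#include <cstdint>

// Hypothetical miniature CPU used only to illustrate the loop.
class MiniCpu {
private:
    // One decoded instruction: how to find the operand, and what to do with it.
    struct Instr {
        uint16_t (MiniCpu::*addr)();
        void (MiniCpu::*code)(uint16_t);
    };
    std::array<Instr, 256> table;

    uint16_t Addr_IMP() { return 0; }          // implied: no operand
    uint16_t Addr_IMM() { return pc++; }       // immediate: operand is the next byte
    uint16_t Addr_ZER() { return mem[pc++]; }  // zero page: next byte is the address

    void Op_NOP(uint16_t) {}
    void Op_LDA(uint16_t src) { a = mem[src]; }
    void Op_STA(uint16_t src) { mem[src] = a; }

public:
    std::array<uint8_t, 0x10000> mem{};
    uint8_t a = 0;
    uint16_t pc = 0;
    uint32_t executed = 0;  // instructions executed so far

    MiniCpu() {
        Instr nop{&MiniCpu::Addr_IMP, &MiniCpu::Op_NOP};
        table.fill(nop);
        table[0xA9] = {&MiniCpu::Addr_IMM, &MiniCpu::Op_LDA};  // LDA #imm
        table[0xA5] = {&MiniCpu::Addr_ZER, &MiniCpu::Op_LDA};  // LDA zp
        table[0x85] = {&MiniCpu::Addr_ZER, &MiniCpu::Op_STA};  // STA zp
    }

    // Runs the next n instructions.
    void Run(uint32_t n) {
        uint32_t start = executed;
        while (start + n > executed) {
            uint8_t opcode = mem[pc++];            // fetch
            Instr instr = table[opcode];           // decode
            uint16_t src = (this->*instr.addr)();  // execute: resolve the operand address
            (this->*instr.code)(src);              // execute: perform the operation
            ++executed;
        }
    }
};
```

Sharing one `Op_LDA` between the immediate and zero-page forms is the main benefit of splitting addressing from operation: the table grows, but the number of handler functions stays small.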

Public methods
The emulator comes as a single C++ class with five public methods:

  • mos6502(BusRead r, BusWrite w);
  • void NMI();
  • void IRQ();
  • void Reset();
  • void Run(uint32_t n);

mos6502(BusRead r, BusWrite w);

This is the class constructor. It requires you to pass two external functions:

uint8_t MemoryRead(uint16_t address);
void MemoryWrite(uint16_t address, uint8_t value);
These respectively read from and write to a memory location (16-bit address, 8-bit value). Inside these functions you can define your address decoding logic (if any) to handle memory-mapped I/O, external virtual devices and such.
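A minimal pair of such functions might look like the sketch below. The flat 64 KiB RAM, the `console` buffer and the port address `kConsolePort` are hypothetical choices for this example; only the two function signatures come from the text above.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Hypothetical machine: 64 KiB of RAM plus one write-only output port.
static std::array<uint8_t, 0x10000> ram{};
static std::string console;                // collects bytes written to the port
constexpr uint16_t kConsolePort = 0xF001;  // arbitrary address picked for this sketch

uint8_t MemoryRead(uint16_t address) {
    return ram[address];
}

void MemoryWrite(uint16_t address, uint8_t value) {
    if (address == kConsolePort) {
        // Memory-mapped I/O: the byte goes to the virtual device, not to RAM.
        console.push_back(static_cast<char>(value));
        return;
    }
    ram[address] = value;
}

// With the emulator's header included, the callbacks are passed to the constructor:
//   mos6502 cpu(MemoryRead, MemoryWrite);
```

All address decoding lives in these two functions, so adding another device is just one more address check before the RAM access.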

void NMI();
Triggers a Non-Maskable Interrupt request, as done by the external pin of the real chip.

void IRQ();
Triggers an Interrupt ReQuest, as done by the external pin of the real chip.

void Reset();
Performs a hardware reset, as done by the external pin of the real chip.
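For readers unfamiliar with the 6502, the three entry points roughly correspond to the following vector logic. This is a simplified sketch with hypothetical names (`IrqDemo`, `Enter`, `Push`), written from the documented vector addresses ($FFFA for NMI, $FFFC for reset, $FFFE for IRQ); it ignores timing and is not the emulator's code. The stack pointer value after reset is the one commonly used by emulators.

```cpp
#include <array>
#include <cstdint>

constexpr uint8_t kFlagI = 0x04;       // interrupt-disable flag
constexpr uint8_t kFlagB = 0x10;       // break flag (clear when pushed by a hardware interrupt)
constexpr uint8_t kFlagUnused = 0x20;  // bit 5, always pushed as 1

// Hypothetical type for this sketch only.
struct IrqDemo {
    std::array<uint8_t, 0x10000> mem{};
    uint16_t pc = 0;
    uint8_t sp = 0xFD;
    uint8_t status = kFlagUnused | kFlagI;

    void Reset() {
        sp = 0xFD;                 // simplified: common emulator convention
        status |= kFlagI;          // interrupts are disabled after reset
        pc = ReadVector(0xFFFC);
    }

    void NMI() { Enter(0xFFFA); }  // cannot be masked

    void IRQ() {
        if (!(status & kFlagI)) Enter(0xFFFE);  // ignored while the I flag is set
    }

private:
    uint16_t ReadVector(uint16_t addr) const {
        uint16_t lo = mem[addr];
        uint16_t hi = mem[static_cast<uint16_t>(addr + 1)];
        return static_cast<uint16_t>(lo | (hi << 8));
    }

    void Push(uint8_t v) {
        mem[0x0100 + sp] = v;  // the stack lives in page 1
        --sp;
    }

    // Common interrupt entry: save pc and status, mask IRQs, jump through the vector.
    void Enter(uint16_t vector) {
        Push(static_cast<uint8_t>(pc >> 8));
        Push(static_cast<uint8_t>(pc & 0xFF));
        Push(static_cast<uint8_t>((status | kFlagUnused) & ~kFlagB));
        status |= kFlagI;
        pc = ReadVector(vector);
    }
};
```

A machine emulator would typically call `IRQ()` or `NMI()` between calls to `Run()`, when one of its virtual devices raises the corresponding line.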

void Run(uint32_t n);
Runs the CPU for the next ‘n’ machine instructions.

Links
Some useful stuff I used…

http://en.wikipedia.org/wiki/MOS_Technology_6502

http://www.6502.org/documents/datasheets/mos/

http://www.mdawson.net/vic20chrome/cpu/mos_6500_mpu_preliminary_may_1976.pdf

http://rubbermallet.org/fake6502.c


Comments
LGB

I am not sure why you state that switch-case is slower. Actually it’s MUCH faster, with the right compiler of course. It usually compiles into a single jump instruction indexed by the opcode to be emulated (scaled by 4, on 32-bit architectures at least). With function pointers, entering and leaving functions takes time: pushing/popping the PC, and maybe even stack frame bookkeeping. So, in a nutshell, selecting the right native code to emulate an opcode is, in the ideal case, no more than a single indexed JMP opcode. I can hardly understand how the function pointer solution can beat this 🙂 Sparseness, which you mention, can be a valid reason, but you can assign a “case” to every opcode, even one without much useful code in it, and then it’s not sparse any more, simple enough 🙂 Actually I’ve just tried it with gcc on 64 bit and got a *single opcode* solution, this one: jmp *.L4(,%rdi,8) In this case I didn’t have a “default” case, but all the 256 values (0…255) had a case and the input had an 8-bit data type (unsigned char), so it was optimized quite well. gcc (-Ofast option, version 5.4.0) may even have noticed that all cases are handled, so there is no need to check the jump table boundaries; it’s really just one x86 opcode, nothing else. And this way we didn’t even use the stack, which again saves time.

I have coded some emulators already, and I always generate assembly output to see what’s happening. Of course, this may not be optimal with another compiler or on a non-x86 target: with portable C code, you may end up in a situation where some architecture + compiler combination handles it differently. However, that’s true for function pointers as well, so I don’t think it’s a valid reason to choose those over switch-case. I think it is always good advice to compile with -S, look at the assembly output and judge from that (especially when you distribute the work mostly in binary form, so it’s up to you what others will get exactly).

LGB

I’ve been busy with my emulators for a while; I found your page just by “accident” while browsing for even more advanced solutions, like JIT stuff 🙂 Now I am about to write a Python script to generate the C code 🙂 After all, the 65xx instruction set can be emulated by interpreting the combination of bits of the opcode itself, for example, and I don’t need to look at C code with 256 cases 🙂 Moreover, I can have options for the NMOS 6502, 65C02 and even 65CE02 (for my Commodore 65 and Mega 65 emulators). It’s easier to hack on shorter Python code which then emits C source that can be compiled into hopefully efficient native code. I also wrote a JavaScript 65C02 emulator based on this “generator” idea (from Python). But in general, it’s always fun (well, at least for me) to work on emulation-related tasks 🙂

LGB

https://github.com/lgblgblgb/xemu
But please note that the CPU emulation especially is a total mess currently; I am about to rewrite it (following the “generator” idea). It was already the result of some generation, but for JavaScript, back then for my Commodore LCD online emulator (see here: http://commodore-lcd.lgb.hu/jsemu/ ). Then it was hacked by hand a lot, then converted into C, etc. etc.; now even I can’t find things any more, so it’s time to re-code from zero 🙂 But it’s 65C02 and 65CE02, not plain 6502 (so no illegal opcodes etc., since these CPUs have “valid” opcodes there). My emulation is not exact at the cycle level (i.e. in the timing of the execution within the opcode itself); only the total number of cycles used per opcode is accounted for. But this is also deliberate, i.e. it would be too slow to be more precise: for the Mega65 I will need to emulate a CPU really clocked at 48MHz, not counting the emulation of the other hardware parts, including the VIC-III and VIC-IV, DMAgic, etc. But the theory is simple anyway: I’m going for switch-case since, as I’ve already said, it’s great with decent compilers, i.e. it compiles to a single jmp plus a jump table. I couldn’t do it faster even by writing the emulation myself in, for example, x86 assembly 🙂 Well, at least the “skeleton” of the emulation; maybe there can be tricks in the implementation of the opcodes themselves (I’m on the way to optionally providing inline assembler parts when running on x86_64, for the speed, e.g. for the Mega65 stuff again). But if you’re interested in my blah-blah style of writing, you can read about this here: http://cubed-borka.blogspot.hu/

LGB

Oh, one thing. When I say “cycle exact” I mean having a way to call the emulation on a per clock “tick” basis, and being able to reproduce the same “inner steps of the opcodes” exactly when they happen. Thus it’s a “sub-opcode precision level” or whatever we want to name it 🙂

LGB

Indeed, you’re absolutely right! A good 65xx CPU emulator should be precise. One example: the behaviour of RMW opcodes. (This is a point where the Commodore 65 fails too, and it accounts for one source of C64 incompatibility issues, that’s why I mention this example!) RMW works as “write old data” then “write modified data” on the good old NMOS 6502, but that’s not the case with the CMOS line of 65xx, it seems (I’m not sure whether the SuperCPU is affected, in the case of the 65816). Many C64 programs used e.g. INC $D019 and similar tricks to acknowledge the VIC-2 interrupt, and those don’t work if you don’t emulate the behaviour of RMW opcodes. And this was only *ONE* example, not so much related to the timing we talked about, but still. My only “excuse” for not dealing with cycle accuracy is, first of all, that I don’t emulate the NMOS CPUs but the 65CE02 (and 65C02). Secondly, since we are talking mainly about unreleased / new computers like the Commodore LCD, Commodore 65 and the Mega 65, there is not much information on how these worked exactly at the hardware level anyway (OK, you can guess, and the 65C02 at least is documented well enough, but not the 65CE02 …). Also, emulators of systems like these have “more important” problems than exact timing, i.e. working at all 😀 😀 For a C64 emulator, for example, the situation would be different, and some could say with good reason that it’s a *must* to have sane emulation with exact timing. In my C65/M65 emulator I currently don’t even allow any of the tricks a basic demo would do, like raster effects etc., since the focus of the emulator’s construction is now “to have the needed minimal features”, and the rendering of VIC-3/VIC-4 frames is done in one step. That’s ugly enough, I know, don’t say anything, please 🙂

But so as not to be too off-topic: http://yapesdl.codeplex.com/SourceControl/latest#cpu.cpp
Surely, Yape is not my work (though the author is Hungarian too, like me, at least, hehehehe), but it has a saner way of doing cycle-exact emulation, which may be interesting to check out. My only note would be that the construction of this emulator, with its “embedded” switch, is not optimal in my opinion; the best would be to have one “giant” switch covering not only all possible opcodes, but all opcodes linearly combined with all of their “steps”. That would be a nightmare to maintain, I admit, which is why I would use a generator script to emit that C code. But anyway, with an emulated CPU clock around 1-2MHz it cannot be a major bottleneck for a modern PC, though it does become one when I have to write a Mega65 emulator dealing with 48MHz, combined with the lots of extra features the M65 has.

Koen van Vliet

Hi, I used your emulator to emulate the CBS6000 computer that I designed and built.
https://hackaday.io/project/4406-cbs6000-8-bit-computer/log/28096-more-about-the-emulator

The emulator is able to run the same operating system as the real hardware which is really exciting. It is going to make debugging code a lot easier!