MOS 6502 CPU EMULATOR IN C++

16/05/2015

This is my C++ emulator of the MOS Technology 6502 CPU. The code emulates a fully functional 6502 CPU and it seems to be pretty fast too. Some minor tricks have been introduced to greatly reduce the overall execution time.

Interrupt and bus read/write operations are emulated as well.

Github repo: https://github.com/gianlucag/mos6502


What's a 6502???

Here is a brief description: http://en.wikipedia.org/wiki/MOS_Technology_6502

Main features:

Still to implement

The emulator was extensively tested against this test suite:

https://github.com/Klaus2m5/6502_65C02_functional_tests

and in parallel emulation with fake6502 http://rubbermallet.org/fake6502.c

so expect nearly 100% compliance with the real deal, at least for normal behavior: as I said, things like illegal opcodes and hardware glitches are currently not implemented.

Why yet another 6502 emulator?

Just for fun :). This CPU (and its many derivatives) powered machines such as:

and many other embedded devices still used today. You can use this emulator in your machine emulator project. However, cycle accuracy is not yet implemented, so mid-frame register update tricks cannot be reliably emulated.

Some theory behind emulators: emulator types

You can group all the CPU emulators out there in 4 main categories:

Graph-based emulators are the most accurate, as they emulate the connections between the transistors inside the die of the CPU. They even emulate the unwanted glitches, both known and still unknown. However, the complexity of such emulators grows non-linearly with the number of transistors: in other words, you don't want to emulate a modern Intel quad core using this approach!

For an example, check this out: http://visual6502.org/JSSim/index.html

The PLA/microcode-based emulators are the best, as they offer both speed and limited complexity. The switch-case-based ones are the simplest but also the slowest: the opcode value is fed into a huge switch statement which selects the code snippet to execute; compilers can optimize a switch to reach near O(log(n)) complexity, but they rarely do so when dealing with sparse integers (like most CPU opcode tables).

Emulator features

My project is a simple jump-table-based emulator: the actual value of the opcode (let's say 0x80) is used to index a function pointer table. Each entry of this table is a C++ function which emulates the behavior of the corresponding real instruction.
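To make the idea concrete, here is a minimal sketch of table-driven dispatch. It is not the code from the repository: ToyCpu, the Op* handlers, BuildTable and RunToy are hypothetical names invented for this illustration, flag updates are left out, and every opcode the sketch does not know simply halts the toy CPU (the real emulator executes a NOP instead).

```cpp
#include <array>
#include <cstdint>

// Toy CPU state: a hypothetical, heavily reduced stand-in for the real class.
struct ToyCpu {
    uint8_t a = 0;        // accumulator
    uint8_t x = 0;        // X index register
    uint16_t pc = 0;      // program counter
    bool halted = false;  // sketch-only stop condition
    std::array<uint8_t, 65536> mem{};
};

using Handler = void (*)(ToyCpu&);

// Handlers for a few real 6502 opcodes (status flag updates omitted).
void OpLdaImm(ToyCpu& c) { c.a = c.mem[c.pc++]; }  // 0xA9: LDA #imm
void OpTax(ToyCpu& c)    { c.x = c.a; }            // 0xAA: TAX
void OpInx(ToyCpu& c)    { ++c.x; }                // 0xE8: INX
void OpNop(ToyCpu&)      {}                        // 0xEA: NOP
void OpHalt(ToyCpu& c)   { c.halted = true; }      // anything else: stop the sketch

// Build the 256-entry table: default every slot, then fill in known opcodes.
std::array<Handler, 256> BuildTable() {
    std::array<Handler, 256> table;
    table.fill(OpHalt);
    table[0xA9] = OpLdaImm;
    table[0xAA] = OpTax;
    table[0xE8] = OpInx;
    table[0xEA] = OpNop;
    return table;
}

// Fetch an opcode, use it as an index into the table, call the handler.
void RunToy(ToyCpu& c) {
    static const std::array<Handler, 256> table = BuildTable();
    while (!c.halted) {
        uint8_t opcode = c.mem[c.pc++];
        table[opcode](c);
    }
}
```

Because the table has exactly 256 entries and the opcode is a uint8_t, no bounds check is needed before the indirect call.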

All the 13 addressing modes are emulated:

// addressing modes
uint16_t Addr_ACC(); // ACCUMULATOR
uint16_t Addr_IMM(); // IMMEDIATE
uint16_t Addr_ABS(); // ABSOLUTE
uint16_t Addr_ZER(); // ZERO PAGE
uint16_t Addr_ZEX(); // INDEXED-X ZERO PAGE
uint16_t Addr_ZEY(); // INDEXED-Y ZERO PAGE
uint16_t Addr_ABX(); // INDEXED-X ABSOLUTE
uint16_t Addr_ABY(); // INDEXED-Y ABSOLUTE
uint16_t Addr_IMP(); // IMPLIED
uint16_t Addr_REL(); // RELATIVE
uint16_t Addr_INX(); // INDEXED-X INDIRECT
uint16_t Addr_INY(); // INDEXED-Y INDIRECT
uint16_t Addr_ABI(); // ABSOLUTE INDIRECT
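As an illustration of what such functions have to compute, here is a sketch of the effective-address arithmetic for four of the indexed modes, following the documented 6502 behavior. The function names and signatures below are made up for this example (the declarations above take no arguments and presumably read their operands from the instruction stream), and Memory is just a flat 64 KiB array assumed for the sketch.

```cpp
#include <array>
#include <cstdint>

using Memory = std::array<uint8_t, 65536>;

// Zero page,X: the sum wraps inside page zero (the carry is discarded).
uint16_t EffZeroPageX(uint8_t operand, uint8_t x) {
    return static_cast<uint8_t>(operand + x);
}

// Absolute,X: 16-bit base address plus X, wrapping at 64 KiB.
uint16_t EffAbsoluteX(uint16_t base, uint8_t x) {
    return static_cast<uint16_t>(base + x);
}

// (zp,X): add X to the operand inside page zero, then read a
// little-endian 16-bit pointer from that location.
uint16_t EffIndexedIndirectX(const Memory& m, uint8_t operand, uint8_t x) {
    uint8_t ptr = static_cast<uint8_t>(operand + x);
    uint8_t lo = m[ptr];
    uint8_t hi = m[static_cast<uint8_t>(ptr + 1)];
    return static_cast<uint16_t>(lo | (hi << 8));
}

// (zp),Y: read a little-endian pointer from page zero, then add Y to it.
uint16_t EffIndirectIndexedY(const Memory& m, uint8_t operand, uint8_t y) {
    uint8_t lo = m[operand];
    uint8_t hi = m[static_cast<uint8_t>(operand + 1)];
    return static_cast<uint16_t>((lo | (hi << 8)) + y);
}
```

For instance, with the pointer $2074 stored at $24/$25, EffIndexedIndirectX(m, 0x20, 0x04) resolves to $2074, while EffIndirectIndexedY(m, 0x24, 0x10) resolves to $2084.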

All the 151 opcodes are emulated. Since the 6502 CPU uses 8 bits to encode the opcode value, it also has a lot of "illegal opcodes" (i.e. opcode values other than the 151 designed ones). Such opcodes perform weird operations, write multiple registers at the same time, and are sometimes a combination of two or more "valid" opcodes. Illegal opcodes were used to enforce software copy protection or to discover the exact CPU type.

The illegals are not supported yet, so instead a simple NOP is executed.

Inner main loop

It's a classic fetch-decode-execute loop:

while(start + n > cycles && !illegalOpcode)
{
   // fetch
   opcode = Read(pc++);

   // decode
   instr = InstrTable[opcode];

   // execute
   Exec(instr);
}

The next instruction (the opcode value) is retrieved from memory. Then it's decoded (i.e. the opcode is used to address the instruction table) and the resulting code block is executed.

Public methods

The emulator comes as a single C++ class with five public methods:

mos6502(BusRead r, BusWrite w);

it's the class constructor. It requires you to pass two external functions:

uint8_t MemoryRead(uint16_t address);
void MemoryWrite(uint16_t address, uint8_t value);

which respectively read from and write to a memory location (16-bit address, 8-bit value). In these functions you can define your address decoding logic (if any) to handle memory-mapped I/O, external virtual devices and such.
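Here is a minimal sketch of what such a pair of callbacks might look like. The memory map is an assumption made up for this example: 64 KiB of RAM plus a single write-only character port at 0xD012 (an address chosen arbitrarily here). kCharOutPort, g_ram and g_output are hypothetical names.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Assumed memory map for this sketch only: flat RAM with one
// write-only character output port.
constexpr uint16_t kCharOutPort = 0xD012;

std::array<uint8_t, 65536> g_ram{};
std::string g_output;  // collects every byte written to the port

uint8_t MemoryRead(uint16_t address) {
    if (address == kCharOutPort) {
        return 0x00;  // in this sketch the port always reads back as zero
    }
    return g_ram[address];
}

void MemoryWrite(uint16_t address, uint8_t value) {
    if (address == kCharOutPort) {
        g_output.push_back(static_cast<char>(value));  // device access
        return;                                        // RAM is not touched
    }
    g_ram[address] = value;  // plain RAM
}
```

Going by the constructor signature shown above, these two functions would then be handed over as mos6502 cpu(MemoryRead, MemoryWrite);.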

void NMI();

triggers a Non-Maskable Interrupt request, as done by the external pin of the real chip

void IRQ();

triggers an Interrupt ReQuest, as done by the external pin of the real chip

void Reset();

performs a hardware reset, as done by the external pin of the real chip

void Run(uint32_t n);

It runs the CPU for the next 'n' machine instructions.
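For readers wondering what NMI(), IRQ() and Reset() have to do internally, the following sketch follows the documented behavior of the real chip: push the program counter and status register onto the stack in page 0x01, set the interrupt-disable flag, and load the program counter from the vector at 0xFFFA (NMI), 0xFFFC (reset) or 0xFFFE (IRQ). It is not the repository's implementation: the Cpu struct and all function names are hypothetical, and the sketch ignores details such as cycle counts and what reset does to the stack pointer.

```cpp
#include <array>
#include <cstdint>

// Hypothetical, reduced CPU state used only for this sketch.
struct Cpu {
    uint16_t pc = 0;
    uint8_t sp = 0xFF;      // stack pointer, an offset into page 0x01
    uint8_t status = 0x20;  // bit 5 always reads as 1
    std::array<uint8_t, 65536> mem{};
};

constexpr uint8_t kFlagInterruptDisable = 0x04;
constexpr uint8_t kFlagBreak = 0x10;
constexpr uint8_t kFlagUnused = 0x20;

// The stack grows downwards inside 0x0100..0x01FF.
void Push(Cpu& c, uint8_t value) {
    c.mem[0x0100 + c.sp] = value;
    --c.sp;
}

// Vectors are stored little-endian.
uint16_t ReadVector(const Cpu& c, uint16_t addr) {
    return static_cast<uint16_t>(c.mem[addr] | (c.mem[addr + 1] << 8));
}

// Common entry sequence for hardware interrupts: the pushed status
// has the B flag clear, which distinguishes it from a BRK instruction.
void EnterInterrupt(Cpu& c, uint16_t vector) {
    Push(c, static_cast<uint8_t>(c.pc >> 8));
    Push(c, static_cast<uint8_t>(c.pc & 0xFF));
    Push(c, static_cast<uint8_t>((c.status & ~kFlagBreak) | kFlagUnused));
    c.status |= kFlagInterruptDisable;
    c.pc = ReadVector(c, vector);
}

// IRQ is ignored while the interrupt-disable flag is set; NMI never is.
void Irq(Cpu& c) {
    if (!(c.status & kFlagInterruptDisable)) EnterInterrupt(c, 0xFFFE);
}
void Nmi(Cpu& c) { EnterInterrupt(c, 0xFFFA); }

// Reset pushes nothing in this sketch: it masks IRQs and jumps.
void Reset(Cpu& c) {
    c.status |= kFlagInterruptDisable;
    c.pc = ReadVector(c, 0xFFFC);
}
```

Note that an IRQ raised right after a reset is ignored in this sketch, because reset leaves the interrupt-disable flag set until the program clears it.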

Links

Some useful stuff I used...

http://en.wikipedia.org/wiki/MOS_Technology_6502

http://www.6502.org/documents/datasheets/mos/

http://www.mdawson.net/vic20chrome/cpu/mos_6500_mpu_preliminary_may_1976.pdf

http://rubbermallet.org/fake6502.c



Comments

12 comments


Koen van Vliet (8by8mail@gmail.com)
on 19 November 2015 at 15:55

Hi, I used your emulator to emulate the CBS6000 computer that I designed and built.
https://hackaday.io/project/4406-cbs6000-8-bit-computer/log/28096-more-about-the-emulator

The emulator is able to run the same operating system as the real hardware which is really exciting. It is going to make debugging code a lot easier!



Gianluca (Admin)
on 19 November 2015 at 23:00

Hi Koen,
the main loop of the CPU emulator executes "cycles" instructions in one go (or until it finds an illegal opcode). You can change this behavior to make it execute just one instruction at a time.
I have implemented the while loop to speed up the emulation for my Raspberry SID player which uses a real SID chip with an emulated MOS 6502 CPU running on the rPi.



LGB (spam@lgb.hu)
on 3 September 2016 at 02:45

I am not sure why you state that switch-case is slower. Actually it's MUCH faster. With the right compiler of course, indeed. But basically it usually compiles into a jump instruction indexed by the opcode to be emulated (and scaled by 4, on 32 bit arch at least). For function pointers, entering/leaving functions takes time, pushing/popping pc, and maybe even stack frame keeping stuff, whatever. So, in a nutshell, selecting the right native code to emulate an opcode is, in the ideal case, no more than a single indexed JMP opcode. I can hardly understand how the function pointer solution can beat this :) Sparse stuff, which is what you mean, can be a valid reason, but you can assign a "case" to each opcode even without much valuable code, just having something, and it's not sparse any more, simple enough :) Actually I've just tried, with gcc on 64 bit: I got a single opcode solution, this one: jmp *.L4(,%rdi,8) In this case I didn't have a "default" case, but all the 256 values (0...255) had cases and the input had an 8 bit data type (unsigned char), so it was optimized quite well. gcc (-Ofast option, version 5.4.0) may even have noticed that all cases are handled, so there is no need to check the jump table boundaries, and it's really just one x86 opcode, nothing else. And this way, we didn't even use the stack, which again costs time.

I have coded some emulators already, and I always generate assembly output to see what's happening. Of course, this may not be optimal if another compiler is used, or a non-x86 target etc.: if you have portable C code, you may end up in a situation where some architecture + compiler handles it differently. However that's true for function pointers as well, so I don't think it's a valid enough reason to choose those over switch-case. I think it is always good advice to compile with -S, look at the assembly output and judge from that (especially when you distribute the work mostly in binary form, so it's up to you what others will get exactly).



Gianluca (Admin)
on 3 September 2016 at 11:35

You're absolutely right! Usually a switch case is translated into a jump table. I used the function pointer solution just to make the code more readable, but yes, a switch case is usually the way to go. Feel free to fork and improve the code, there's so much room for improvement!



LGB (spam@lgb.hu)
on 3 September 2016 at 13:14

I've been busy with my emulators for a while, and I just found your page by "accident" while browsing for even more advanced solutions, like JIT stuff :) Now I am about to write a Python script to generate the C code :) After all, the 65xx instruction set can be emulated by interpreting the combination of bits of the opcode itself, for example, and I also don't need to look at C code with 256 cases :) Moreover, I can have options for the NMOS 6502, 65C02 and even 65CE02 (for my Commodore 65 and Mega 65 emulators). It's easier to hack a shorter piece of Python code which then emits a C source that can hopefully be compiled into efficient native code. I also wrote a JavaScript 65C02 emulator with this "generator" theory (from Python). But in general, it's always fun (well, at least for me) to work on emulation related tasks :)



Gianluca (Admin)
on 4 September 2016 at 11:17

Great, could I have a look at your projects? I should make my emulator cycle exact so I could use it on some machine emulator but, you know, there are many cpu emulators available out there and they are pretty good too



LGB (spam@lgb.hu)
on 4 September 2016 at 12:25

https://github.com/lgblgblgb/xemu
But please note that the CPU emulation especially is a total mess currently, and I am about to rewrite it (following the "generator" idea). It was already the result of some generation, but for JavaScript, back then, for my Commodore LCD online emulator (see here: http://commodore-lcd.lgb.hu/jsemu/ ). Then it was hacked by hand a lot, then converted into C, etc. etc., and now even I can't find things any more, so it's time to re-code from zero :) But it's 65C02 and 65CE02, not plain 6502 (so no illegal opcodes etc., since these CPUs have "valid" opcodes there). My emulation is not cycle-level exact (i.e. the timing of execution within the opcode itself), just the total amount of cycles used per opcode is booked. But this is also on purpose, i.e. it would be too slow to be more precise: for the Mega65 I will need to emulate a CPU clocked at 48MHz for real, not counting the emulation of other hardware parts, including the VIC-III and VIC-IV, DMAgic, etc. But the theory is simple anyway, I go for switch-case: as I've already said, it's great with decent compilers, i.e. a single jmp + jump table is what it's compiled to, and I couldn't do that faster even by writing the emulation myself in, for example, x86 assembly :) Well, at least the "skeleton" of the emulation, maybe there can be tricks with the opcode realization itself (I'm on the way to optionally providing inline assembler parts when running on x86_64 for speed, e.g. the Mega65 stuff again). But if you're interested in my blah-blah style of writing, you can read about this here: http://cubed-borka.blogspot.hu/



LGB (spam@lgb.hu)
on 4 September 2016 at 13:00

Oh, one thing. When I say "cycle exact" I mean to have a way to call emulation on per clock "tick" basis, and having the possibility to provide the same "inner step of the opcodes" exactly, when it is done. Thus it's a "sub-opcode precision level" or whatever we want to name it :)



Gianluca (Admin)
on 4 September 2016 at 14:00

Thanks for the links! Yes, by cycle accuracy we mean the same thing, that is being able to 'step' the emulator on a clock tick basis (or whatever sub-tick level). As you surely know, many demos and games update VIC registers mid-frame at very precise moments to show more sprites, have more sounds and other tricks.



LGB (spam@lgb.hu)
on 4 September 2016 at 15:32

Indeed, you're absolutely right! A good CPU emulator of the 65xx should be precise. One example: the behaviour of RMW opcodes (this is the point where the Commodore 65 fails too, which accounts for one source of C64 incompatibility, that's why I mentioned this example!). RMW works as "write old data" then "write modified data" on the good old NMOS 6502, but that's not the case with the CMOS line of 65xx, it seems (I'm not sure if the SuperCPU is affected in the case of the 65816), and a lot of software for the C64 used e.g. INC $D019 and similar tricks to acknowledge the VIC-2 interrupt, which doesn't work if you don't emulate the behaviour of RMW opcodes. And this was only ONE example, not as timing related as what we talked about, but still. My only "excuse" for not dealing with cycle accuracy is that, first of all, I don't emulate the NMOS CPUs but the 65CE02 (and 65C02). Secondly, since we are talking mainly about unreleased / new computers like the Commodore LCD, Commodore 65 and the Mega 65, there is not much information about how these worked exactly at the hardware level anyway (ok, you can guess, the 65C02 at least is documented enough, but not the 65CE02 ...). Also, emulators of systems like these have "more important" problems than exact timing, i.e. to work at all :D :D For a C64 emulator, for example, the situation would be different, and some would be able to say, with valid reason, that it's a must to have sane emulation with exact timing. In my C65/M65 emulator I currently don't even allow any of the tricks a basic demo would do, like raster effects etc., since the focus of the emulator is now "to have the needed minimal features", and rendering VIC-3/VIC-4 frames is done in one step. That's ugly enough, I know, don't say anything, please :)

But so as not to be too off-topic: http://yapesdl.codeplex.com/SourceControl/latest#cpu.cpp
Surely, Yape is not my work (though the author is Hungarian too, like me, at least hehehehe), but it has a more sane way of doing cycle exact emulation, which may be interesting to check out. My only note would be that the construction of this emulator with "embedded" switch statements is not optimal in my opinion: the best would be to have one "giant" switch with not only all the possible opcodes, but all opcodes linearly combined with all their "steps". That would be a nightmare to maintain, I admit, which is why I would use a generator script to emit that C code. But anyway, with an emulated CPU clock of around 1-2MHz, it cannot be a major bottleneck for a modern PC, though it becomes one when I have to write a Mega65 emulator dealing with 48MHz, combined with lots of extra features that the M65 has.



Gianluca (Admin)
on 5 September 2016 at 18:53

Your knowledge of this stuff is absolutely stunning. Do you recommend any website on the subject? If you are interested, check out my SID player project on this blog, it makes use of the emulator. I guess you'll like it :-)



6502 Microprocessor Internals | My Technical Blog ()
on 6 October 2019 at 03:07

