4x 68SEC000 at 40 MHz, with the MODE pin tied to ground so each starts up in 8-bit mode.
512K SRAM per CPU, mapped in from $0000 to $7FFF (2K to 512K)
The DPRAM is mapped to $0 - $7FF, mirroring the SRAM, during "config" state.
The DPRAM is mapped to $7800 - $7FFF, still mirrored by SRAM, during "execute" state.
There is a single-byte, readable and writable config register that controls the DPRAM location and reset state of each of the four cores.
Code:
Config register bits:

  Bit:    7    6    5    4    3    2    1    0
         RS3  RM3  RS2  RM2  RS1  RM1  RS0  RM0
        |-- CPU3 --|-- CPU2 --|-- CPU1 --|-- CPU0 --|
RSx = 1, /RESET low (reset) for CPU x
RSx = 0, /RESET high (run) for CPU x
RMx = 1, DPRAM mapped to $0000 for CPU x
RMx = 0, DPRAM mapped to $7800 for CPU x
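To make the bit layout concrete, here is a minimal sketch in C of building a config byte from per-CPU reset and map masks (the `config_byte` helper and macro names are my own, not part of the design):

```c
#include <stdint.h>

/* Bit positions per CPU: each CPU x owns two bits,
 * RSx (reset) at bit 2x+1 and RMx (DPRAM map) at bit 2x. */
#define RS(x) (1u << ((x) * 2 + 1))   /* 1 = /RESET held low (CPU halted) */
#define RM(x) (1u << ((x) * 2))       /* 1 = DPRAM mapped at $0000        */

/* Build a config byte: rs_mask/rm_mask are 4-bit masks, one bit per CPU. */
static uint8_t config_byte(uint8_t rs_mask, uint8_t rm_mask)
{
    uint8_t v = 0;
    for (int x = 0; x < 4; x++) {
        if (rs_mask & (1u << x)) v |= RS(x);
        if (rm_mask & (1u << x)) v |= RM(x);
    }
    return v;
}
```

For example, `config_byte(0x0F, 0x0F)` gives 11111111 (all CPUs held in reset, DPRAM at $0000), and `config_byte(0x0E, 0x0F)` gives 11111101 (CPU0 released).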
The start-up sequence would be to set the config register to 11111111, so all CPUs are halted with DPRAM mapped to $0000. The next step is to load the vector table and initial PC into the lower bytes of the DPRAM, plus a "loader" utility that fits in the remaining bytes from the top of the vector table to $3FF (the bottom 1K, minus the top two words). Then the config register is set to 11111101, taking CPU0 out of reset.
The loader utility uses two words, at $3FC and $3FE. $3FC holds a pointer to a 1K block of SRAM; $3FE is a status word for signalling copy completion and other status changes. Only bit 0 is defined: 0 = done. The QL setting bit 0 to 1 tells the loader to act on the pointer in $3FC, copying the data from $400 - $7FF to it. When the block has been copied, the loader sets bit 0 back to 0.
I haven't made any provision for partial transfers of less than 1K, so if the last block is short, the remainder must be zero-filled for predictable behavior.
Conversely, bit 1 of $3FE is used for copying 1K blocks in the other direction, from SRAM to DPRAM, so large results can be copied back to the host.
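As a sketch of the handshake, here is a C model of the QL-to-SRAM direction. The `dpram`/`sram` arrays and the `loader_service` function merely simulate the card and the remote CPU's loader so the logic can run stand-alone; on real hardware the QL would poke the card's DPRAM window directly, and the remote loader would run as 68K code:

```c
#include <stdint.h>
#include <string.h>

#define BLOCK 1024u

/* Simulated shared DPRAM window and private SRAM, for illustration only;
 * on real hardware dpram[] is the card's base+$0000..$07FF window. */
static uint8_t dpram[0x800];
static uint8_t sram[0x8000];

static uint16_t rd16(unsigned o) { return (uint16_t)(dpram[o] << 8 | dpram[o + 1]); }
static void     wr16(unsigned o, uint16_t v) { dpram[o] = (uint8_t)(v >> 8); dpram[o + 1] = (uint8_t)v; }

/* What the loader on the 68SEC000 does when bit 0 of $3FE is set:
 * copy $400-$7FF to the SRAM address held in $3FC, then clear bit 0. */
static void loader_service(void)
{
    if (rd16(0x3FE) & 1) {
        memcpy(&sram[rd16(0x3FC)], &dpram[0x400], BLOCK);
        wr16(0x3FE, rd16(0x3FE) & ~1u);
    }
}

/* Host (QL) side: push an image into the CPU's SRAM one 1K block at a time. */
static void send_image(const uint8_t *img, unsigned len, uint16_t dest)
{
    for (unsigned off = 0; off < len; off += BLOCK) {
        unsigned n = len - off < BLOCK ? len - off : BLOCK;
        memcpy(&dpram[0x400], img + off, n);
        memset(&dpram[0x400 + n], 0, BLOCK - n);  /* zero-pad a short last block */
        wr16(0x3FC, dest + off);
        wr16(0x3FE, rd16(0x3FE) | 1);             /* signal: block ready */
        while (rd16(0x3FE) & 1)                   /* poll until loader clears bit 0 */
            loader_service();                     /* (stands in for the remote CPU) */
    }
}
```

The zero-pad in `send_image` is the "blank the remainder to 0's" rule for a short final block.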
One area where design simplicity can cause problems: the DPRAM and SRAM shadow each other. If the DPRAM is relocated from $0000 to $7800 or vice versa, its contents changed, and then it is mapped back to the previous location, the contents of DPRAM and SRAM will differ, and reads will return the two OR'd together. This could be used as a large "OR" machine, if that has utility, but the behavior is noted here because simple programming trickery can be used to allow for or eliminate it.
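The merged read can be modeled in one line: when both memories respond to the same address, the host sees the bitwise OR of the two bytes (an illustration of the stated behavior only, not hardware-accurate timing):

```c
#include <stdint.h>

/* When DPRAM and SRAM shadow each other and their contents differ,
 * a read returns the bitwise OR of the two bytes. */
static uint8_t shadowed_read(uint8_t sram_byte, uint8_t dpram_byte)
{
    return (uint8_t)(sram_byte | dpram_byte);
}
```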
Once the entire image to be transferred has been copied into SRAM, the last step is for the loader to overwrite itself by copying the intended bottom 1K of the image to $0 in SRAM.
At this point, CPU0 is reset by setting the config register to 11111111, waiting 0.2 seconds, then setting it to 11111101 again. CPU0 will now restart and proceed with its job.
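Assuming a hypothetical `write_config` routine that pokes the config register (on real hardware, a single byte write to base+$3FFF), the restart step looks like this:

```c
#include <stdint.h>

/* Stand-in for the config register at base+$3FFF; on real hardware
 * write_config would be a byte poke to that address. */
static uint8_t config_reg;
static void write_config(uint8_t v) { config_reg = v; }

/* Restart CPU0 once its image is in place: reset, settle, release.
 * 11111111 = all four CPUs held in reset, DPRAM at $0000;
 * 11111101 = CPU0 released (RS0 = 0), everything else unchanged. */
static void restart_cpu0(void)
{
    write_config(0xFF);   /* assert /RESET on all CPUs                */
    /* wait ~0.2 s here (platform-specific delay)                     */
    write_config(0xFD);   /* release CPU0; it boots from the vectors  */
                          /* now sitting at the bottom of its SRAM    */
}
```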
Using this method, all four CPUs can be given the same or different jobs. Code could be loaded on a per-CPU basis, or a common image loaded to all four CPUs. A 68K-compatible OS, e.g. Minerva, could be copied in and booted. Responses from Microdrives could be emulated, keyboard input simulated, etc., though I anticipate the OS may need some sections replaced with NOPs or JMPs to prevent unfortunate hangs.
All four CPUs fit in a single expansion slot, physically and logically. With the expansion at address base, the memory layout is as follows:
Code:
CPU0 DPRAM base+$0000 - $07FF
CPU1 DPRAM base+$0800 - $0FFF
CPU2 DPRAM base+$1000 - $17FF
CPU3 DPRAM base+$1800 - $1FFF
Config Reg base+$3FFF (byte)
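The per-CPU windows fall at consecutive 2K offsets from the card's base, so the map can be captured in a couple of macros (names are mine, for illustration):

```c
#include <stdint.h>

/* Each CPU's 2K DPRAM window sits at base + cpu * $800;
 * the config register is the single byte at base + $3FFF. */
#define DPRAM_OFFSET(cpu)  ((uint32_t)(cpu) * 0x800u)
#define CONFIG_REG_OFFSET  0x3FFFu
```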
No shared memory between CPUs.
No provision at this time for 68882 co-processors (though this could be added if serious number crunching were required.)
No provision for any kind of interrupt.
Things I might play with in the future:
68EC030 instead of 68SEC000, with 32-bit private RAM. For the "power user."
64Kx16 DPRAM, for a much larger window into the CPU - or to be the CPU's entire memory map.
A fifth CPU, with a video generator attached, so it could offload all video processing.
Obviously this is a simple design with some limitations - mostly that the CPUs have no shared memory with each other, only with the host system. This means any transfer of data between the CPUs has to be managed by the host system under program control, and the lack of an interrupt system means a CPU has no way of requesting attention or alerting the host to a result; the host must poll.
Example uses for this type of simple expansion are:
Gaming: if you have, for example, a flight simulator, one CPU could track the flight dynamics, one the fuel/mass/gravity, one location/mapping, etc.
Other stuff: anything you have an idea for and run with.
I said up top, "some assembly required." Yes, you will probably need to know assembly to take full advantage of this type of device. However, I suspect that after a short period of time, there would be a custom Minerva version that could be loaded onto the CPUs, with video, storage and IO removed. This could be issued jobs/tasks by the host OS. This is strictly theory, because someone has to produce it. However, the idea of a total of 160MHz worth of clock cycles available to a single QL must be appealing to some.
FEEDBACK/IDEAS WELCOMED!