4x 68SEC000 at 40 MHz, with the MODE pin tied to ground so each starts up in 8-bit mode.
512K SRAM per CPU, mapped in from $0000 to $7FFF (2K to 512K)
The DPRAM is mapped to $0 - $7FF, mirroring the SRAM, during "config" state.
The DPRAM is mapped to $7800 - $7FFF, still mirrored by SRAM, during "execute" state.
There is a single-byte, readable and writable config register that controls the DPRAM location and reset state of each of the four cores.
Code:
Config register bits:

  Bit:    7    6    5    4    3    2    1    0
         RS3  RM3  RS2  RM2  RS1  RM1  RS0  RM0
        |-- CPU3 --|-- CPU2 --|-- CPU1 --|-- CPU0 --|
RSx = 1, /RESET low (reset) for CPU x
RSx = 0, /RESET high (run) for CPU x
RMx = 1, DPRAM mapped to $0000 for CPU x
RMx = 0, DPRAM mapped to $7800 for CPU x
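To make the bit layout concrete, here is a minimal sketch in C of building a config byte from per-CPU reset and map masks (the `config_byte` helper and macro names are my own, not part of the design):

```c
#include <stdint.h>

/* Bit positions per CPU: each CPU x owns two bits,
 * RSx (reset) at bit 2x+1 and RMx (DPRAM map) at bit 2x. */
#define RS(x) (1u << ((x) * 2 + 1))   /* 1 = /RESET held low (CPU halted) */
#define RM(x) (1u << ((x) * 2))       /* 1 = DPRAM mapped at $0000        */

/* Build a config byte: rs_mask/rm_mask are 4-bit masks, one bit per CPU. */
static uint8_t config_byte(uint8_t rs_mask, uint8_t rm_mask)
{
    uint8_t v = 0;
    for (int x = 0; x < 4; x++) {
        if (rs_mask & (1u << x)) v |= RS(x);
        if (rm_mask & (1u << x)) v |= RM(x);
    }
    return v;
}
```

For example, `config_byte(0x0F, 0x0F)` gives 11111111 (all CPUs held in reset, DPRAM at $0000), and `config_byte(0x0E, 0x0F)` gives 11111101 (CPU0 released).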
The start-up sequence would be to set the config register to 11111111, so all CPUs are halted with DPRAM mapped to $0000. The next step is to load the vector table and initial PC into the lower bytes of the DPRAM, plus a "loader" utility that fits in the remaining bytes from the top of the vector table to $3FF (the bottom 1K, minus the top two words). Then the config register is set to 11111101, taking CPU0 out of reset.
The loader utility uses two words, at $3FC and $3FE. $3FC holds a pointer to a 1K block of SRAM; $3FE is a status word for signalling copy completion and other status changes. Only bit 0 is defined: 0 = done. The QL setting bit 0 to 1 tells the loader to act on the pointer in $3FC, copying the data from $400 - $7FF to it. When the block has been copied, the loader sets bit 0 back to 0.
I haven't made any provision for partial transfers of less than 1K, so if the last block is short, the remainder must be zero-filled for predictable behavior.
Conversely, bit 1 of $3FE is used for copying 1K blocks in the other direction, from SRAM to DPRAM, so large results can be copied back to the host.
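As a sketch of the handshake, here is a C model of the QL-to-SRAM direction. The `dpram`/`sram` arrays and the `loader_service` function merely simulate the card and the remote CPU's loader so the logic can run stand-alone; on real hardware the QL would poke the card's DPRAM window directly, and the remote loader would run as 68K code:

```c
#include <stdint.h>
#include <string.h>

#define BLOCK 1024u

/* Simulated shared DPRAM window and private SRAM, for illustration only;
 * on real hardware dpram[] is the card's base+$0000..$07FF window. */
static uint8_t dpram[0x800];
static uint8_t sram[0x8000];

static uint16_t rd16(unsigned o) { return (uint16_t)(dpram[o] << 8 | dpram[o + 1]); }
static void     wr16(unsigned o, uint16_t v) { dpram[o] = (uint8_t)(v >> 8); dpram[o + 1] = (uint8_t)v; }

/* What the loader on the 68SEC000 does when bit 0 of $3FE is set:
 * copy $400-$7FF to the SRAM address held in $3FC, then clear bit 0. */
static void loader_service(void)
{
    if (rd16(0x3FE) & 1) {
        memcpy(&sram[rd16(0x3FC)], &dpram[0x400], BLOCK);
        wr16(0x3FE, rd16(0x3FE) & ~1u);
    }
}

/* Host (QL) side: push an image into the CPU's SRAM one 1K block at a time. */
static void send_image(const uint8_t *img, unsigned len, uint16_t dest)
{
    for (unsigned off = 0; off < len; off += BLOCK) {
        unsigned n = len - off < BLOCK ? len - off : BLOCK;
        memcpy(&dpram[0x400], img + off, n);
        memset(&dpram[0x400 + n], 0, BLOCK - n);  /* zero-pad a short last block */
        wr16(0x3FC, dest + off);
        wr16(0x3FE, rd16(0x3FE) | 1);             /* signal: block ready */
        while (rd16(0x3FE) & 1)                   /* poll until loader clears bit 0 */
            loader_service();                     /* (stands in for the remote CPU) */
    }
}
```

The zero-pad in `send_image` is the "blank the remainder to 0's" rule for a short final block.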
One area where design simplicity can cause problems: the DPRAM and SRAM shadow each other. If the DPRAM is relocated from $0000 to $7800 or vice versa, its contents changed, and then it is mapped back to the previous location, the contents of DPRAM and SRAM will differ, and reads will return the two OR'd together. This could be used as a large "OR" machine, if that has utility, but the behavior is noted here because simple programming trickery can be used to allow for or eliminate it.
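The merged read can be modeled in one line: when both memories respond to the same address, the host sees the bitwise OR of the two bytes (an illustration of the stated behavior only, not hardware-accurate timing):

```c
#include <stdint.h>

/* When DPRAM and SRAM shadow each other and their contents differ,
 * a read returns the bitwise OR of the two bytes. */
static uint8_t shadowed_read(uint8_t sram_byte, uint8_t dpram_byte)
{
    return (uint8_t)(sram_byte | dpram_byte);
}
```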
Once the entire image to be transferred has been copied into SRAM, the last step is for the loader to overwrite itself by copying the intended bottom 1K of the image to $0 in SRAM.
At this point, CPU0 is reset by setting the config register to 11111111, waiting 0.2 seconds, then setting it to 11111101 again. CPU0 will now restart and proceed with its job.
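Assuming a hypothetical `write_config` routine that pokes the config register (on real hardware, a single byte write to base+$3FFF), the restart step looks like this:

```c
#include <stdint.h>

/* Stand-in for the config register at base+$3FFF; on real hardware
 * write_config would be a byte poke to that address. */
static uint8_t config_reg;
static void write_config(uint8_t v) { config_reg = v; }

/* Restart CPU0 once its image is in place: reset, settle, release.
 * 11111111 = all four CPUs held in reset, DPRAM at $0000;
 * 11111101 = CPU0 released (RS0 = 0), everything else unchanged. */
static void restart_cpu0(void)
{
    write_config(0xFF);   /* assert /RESET on all CPUs                */
    /* wait ~0.2 s here (platform-specific delay)                     */
    write_config(0xFD);   /* release CPU0; it boots from the vectors  */
                          /* now sitting at the bottom of its SRAM    */
}
```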
Using this method, all four CPUs can be given the same or different jobs. Code could be loaded on a per-CPU basis, or a common image loaded to all four CPUs. A 68K-compatible OS, e.g. Minerva, could be copied in and booted. Responses from Microdrives could be emulated, keyboard input simulated, etc., though I anticipate the OS may need some sections replaced with NOPs or JMPs to prevent unfortunate hangs.
All four CPUs fit in a single expansion slot, physically and logically. With the expansion at address base, the memory layout is as follows:
Code:
CPU0 DPRAM base+$0000 - $07FF
CPU1 DPRAM base+$0800 - $0FFF
CPU2 DPRAM base+$1000 - $17FF
CPU3 DPRAM base+$1800 - $1FFF
Config Reg base+$3FFF (byte)
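The per-CPU windows fall at consecutive 2K offsets from the card's base, so the map can be captured in a couple of macros (names are mine, for illustration):

```c
#include <stdint.h>

/* Each CPU's 2K DPRAM window sits at base + cpu * $800;
 * the config register is the single byte at base + $3FFF. */
#define DPRAM_OFFSET(cpu)  ((uint32_t)(cpu) * 0x800u)
#define CONFIG_REG_OFFSET  0x3FFFu
```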
No shared memory between CPUs.
No provision at this time for 68882 co-processors (though this could be added if serious number crunching were required.)
No provision for any kind of interrupt.
Things I might play with in the future:
68EC030 instead of 68SEC000, with 32-bit private RAM. For the "power user."
64Kx16 DPRAM, for a much larger window into the CPU - or to be the CPU's entire memory map.
A fifth CPU, with a video generator attached, so it could offload all video processing.
Obviously this is a simple design with some limitations - mostly that the CPUs have no shared memory with each other, only with the host system. This means any transfer of data between the CPUs has to be managed by the host system under program control, and the lack of an interrupt system means a CPU has no way of requesting attention or alerting the host to a result; the host must poll.
Example uses for this type of simple expansion are:
Gaming: if you have, for example, a flight simulator, one CPU could track the flight dynamics, one the fuel/mass/gravity, one location/mapping, etc.
Other stuff: anything you have an idea for and run with.
I said up top, "some assembly required." Yes, you will probably need to know assembly to take full advantage of this type of device. However, I suspect that after a short period of time, there would be a custom Minerva version that could be loaded onto the CPUs, with video, storage and IO removed. This could be issued jobs/tasks by the host OS. This is strictly theory, because someone has to produce it. However, the idea of a total of 160MHz worth of clock cycles available to a single QL must be appealing to some.
FEEDBACK/IDEAS WELCOMED!