In this case the problem with changing the clock is even worse, because it's NOT only a matter of switching at pre-defined intervals (otherwise one gets illegal pulse lengths on the clock line). The real problem is that the whole idea rests on wrong assumptions.
First, the 020 does not do anything in 5 clocks. It does it in 3 clocks MINIMUM. The maximum is highly extendable using DTACKL. So there's that.
Before a design decision is made, it is important to understand how the CPUs work on the bus (and then specifically the QL bus, mostly due to some assumptions made by the 8301).
1) 68008 and 68000 of any kind: a bus access is minimum 4 clocks, divided into 8 phases, one half clock each. (*)
2) 68020: a bus access is minimum 3 clocks, divided into 6 phases, one half clock each.
3) 68030: a bus cycle can have two forms; the synchronous version can in theory be as short as 2 clocks, and the regular asynchronous one is the same as on the 68020.
All the CPUs operate in terms of phases, which are basically half cycles.
HOWEVER and this is very important, a bus cycle can start at ANY full clock and they do not have to be successive, so a number of clock cycles can occur between two bus accesses, including zero. For regular 68k (including the 008 sub-species) there is a maximum which comes from the longest possible instruction execution time. For the 68020 and 030 in theory it can be almost any number if the cache is enabled. In other words, there is no guarantee whatsoever that a convenient point to switch clocks and an access request will coincide - assuming the clock can even be switched on non-static versions of the CPU, only the 68SEC000 being that sort, within the required clock rates.
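To put rough numbers on those minimum cycle times, here is a small Python sketch using only the clock counts listed above; the wait_clocks parameter stands in for waits inserted by delaying /DTACK:

```python
def min_bus_cycle_ns(cpu, clock_mhz, wait_clocks=0):
    """Minimum bus access time in ns: 4 clocks for a 68000/68008,
    3 for the 68020, plus any wait clocks added via /DTACK."""
    base = {"68000": 4, "68008": 4, "68020": 3}[cpu]
    period_ns = 1000.0 / clock_mhz
    return (base + wait_clocks) * period_ns

# 68008 at the QL's 7.5 MHz: 4 clocks = ~533 ns
print(round(min_bus_cycle_ns("68008", 7.5)))   # 533
# 68020 at 24 MHz: 3 clocks = 125 ns
print(round(min_bus_cycle_ns("68020", 24.0)))  # 125
```

The point being: a 68020 at 24 MHz finishes its minimum bus cycle more than four times faster than the 68008 the 8301 was designed around.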
The whole point of defining the bus protocol the way it is, is to let the periphery run WITHOUT sharing the CPU's clock. The clocks are only given in order to derive the basic requirements for the signals: how long something is (or has to be) valid before the signal that qualifies it, and how long it has to stay (or will stay) valid after the signal that qualifies it.
Note that the QL peripherals (except the 8301) have NO clock reference at all, just address, data, read/write, data strobe, data transfer acknowledge.
Why is the 8301 the exception? Because it is possible to figure out what the CPU is doing at any given time ASSUMING the logic doing the figuring is also supplying the CPU clock. In some cases it can then anticipate the CPU's response and generate signals like DTACKL somewhat in advance of their actually being needed, in order to get the CPU to respond as quickly as possible IN THAT PARTICULAR CASE. As soon as the CPU clock is not coupled to the 8301 clock but runs from its own source, the assumption can fail and data gets corrupted. Also, it pays to see what signals the 8301 actually uses to synchronize CPU accesses to its own internal workings (and, by association, to the DRAM it controls): it looks at DSL from the CPU and it generates DTACKL to the CPU. That is ALL. That's also how the CPU designers envisioned it to work and built the bus protocol around it.
So, assuming (and it's a big assumption) you actually can switch clocks (which requires its own separate logic), what are the problems in designing this into a QL system?
1) The only practical way to do this on a clock-by-clock basis is to have the clock generated from a master clock, and this must be true for both the CPU and 8301. This approach cannot work reliably without some very special and complex logic if the clocks are completely independent.
2) The idea is to 'slow down the clock' when the video area is being accessed. This requires the address to be known before the slow-down happens, because it must not happen for every address - the CPU has to run at its default high clock the rest of the time, or it's being slowed down permanently. The only sure-fire way on every 68k family member to know that the address is real is to wait for /AS to go low. That happens after a certain clock edge, and by the time it's detected, you have already spent at least one half clock at the higher speed before you can even start figuring out the point at which to switch over to the slow rate. The amount of logic multiplies and there is additional slowdown. On 68k, read and write cycles are not the same - the write data appears after the address - so that has to be catered for as well, making it even more complex, unless RAM shadowing is performed so that only write cycles need to be looked at.
3) Since all 68k CPUs, and especially the 68020+, overlap instruction fetching, instruction execution and (if needed) writing data back into RAM, slowing down the clock to slow down a bus access slows down the WHOLE CPU. This is particularly problematic on write cycles, which is what must always be slowed down when an 8301 is in the picture. It is even more of a problem on 68020+ where the data write can be initiated after the next instruction has been fetched, and indeed may already be executing (fetching can come from cache on 68020+ and never even appear on the bus). While a simple 'wait' on the bus, done by extending the time before /DTACK is generated, would allow overlapping of the next instruction(s) at full speed, slowing down the clock slows down everything.
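Point 2 above (the half clock already lost by the time /AS is detected) can be put into numbers with a tiny sketch. The 15 ns detection delay is a pure assumption of mine standing in for the external logic's own propagation time:

```python
def time_before_switch_ns(cpu_clock_mhz, detect_delay_ns=15.0):
    """/AS goes low roughly one half clock into the bus cycle;
    only after external logic detects it can a clock switch even
    begin. detect_delay_ns is an assumed detection-logic delay."""
    half_clock_ns = 500.0 / cpu_clock_mhz
    return half_clock_ns + detect_delay_ns

print(round(time_before_switch_ns(24.0), 1))  # ~35.8 ns already gone at 24 MHz
```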
Now let's go back to the 8301. What does it do in order to solve this exact problem, since it has to make the CPU wait until it's done with reading the RAM in order to generate the screen? Does it slow down or gate off the clock? Well, no - it just delays the moment /DTACK goes low. Given how 'frugal' Sinclair was about any complexity in hardware, this should be telling.
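The difference between extending /DTACK and slowing the whole clock can be shown with a toy comparison. All the numbers here are illustrative, not measured on real hardware - the point is only the structure of the calculation:

```python
def run_time_ns(n_writes, exec_clocks_per_op, fast_mhz, slow_mhz, bus_clocks=4):
    """Toy model: n_writes operations, each overlapping instruction
    execution with one write cycle to slow (video) RAM."""
    fast_ns = 1000.0 / fast_mhz
    slow_ns = 1000.0 / slow_mhz
    # Wait-state approach: execution continues at full speed; only the
    # bus cycle is stretched (by delaying /DTACK) to the slow bus time.
    wait_states = n_writes * max(exec_clocks_per_op * fast_ns, bus_clocks * slow_ns)
    # Clock-switch approach: the whole CPU runs at the slow clock for
    # the duration of each access, so nothing overlaps at full speed.
    clock_switch = n_writes * max(exec_clocks_per_op, bus_clocks) * slow_ns
    return wait_states, clock_switch

ws, cs = run_time_ns(1000, 6, 24.0, 7.5)
print(ws < cs)  # True: wait states win in this toy scenario
```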
It's not like we don't have two existing examples of this - the GC and SGC. They both run the CPU at a clock rate completely independent of the 8301. They also only generate write accesses to the 8301 DRAM. And do they mess around with the CPU clock? Well, no - in fact on the SGC the oscillator is connected directly to the clock pin on the CPU. What do they do? Surprise, surprise: they ignore the DTACKL signal until the appropriate time for that clock rate. In other words, they again implement a delay until /DTACK is presented to the CPU; they just use the 8301-supplied DTACKL and add a correction.
Why all this? Well, because that's the whole POINT of /DTACK. To avoid all sorts of complexity in gating or switching clock, the designers of the CPU reduced that problem to one single signal, exactly to solve this whole problem. Of course, one might want to attempt the difficult and arguably pointless in order to learn how difficult and pointless it is, so I'd propose a homework to attempt synthesizing that kind of logic. Then by comparison, make logic to delay a single signal a known amount. The latter seems MUCH simpler to me.
But I digress, so back to the 8301 again. WHY are there problems in using a different (and higher, as lower would be rather pointless) clock for the CPU? Well, because the 8301 logic assumes the 8301 is generating the CPU clock. It also assumes it can use that clock (or rather 2x that clock) to generate all required delays, or predict all delays in the CPU operation - remember, as I said at the start, the CPU generates bus timing at half clock periods, i.e. at 2x the clock rate - which the 8301 conveniently has at its disposal.
The logic design assumes a delay between its generating DTACKL=L and the CPU actually recognizing it at the proper point, so it activates DTACKL at a specific clock edge before the one the CPU uses to detect DTACKL, to be sure it's detected at the right point in time. In principle, it could activate it at any point, but then there could be anything between some tens of ns and one full CPU clock of delay until the CPU figures it's got a valid DTACKL, so the timing would vary - which we don't want if we need to be very precise in generating the picture. There is also the assumption that the clock runs at the given speed, so when DTACKL goes low, it's counting on the other signals, including the data written or read, to be present on the bus for certain lengths of time, on which the DRAM timings it generates are based.
So when the CPU runs on its own clock, the logic inside the 8301 works wrong, because it's looking at the wrong clock to make its assumptions - its own rather than the real CPU clock. The 8301 is designed around the fact that data read from memory is sampled by the CPU one clock after it detects DTACKL being low, so it counts on having a bit less time than that until the RAM actually has to present valid data. It also expects data from the CPU not to be removed for a defined while after the CPU detects DTACKL going low. When the CPU runs from a different clock, this is no longer the case: the edges the 8301 is using and the ones the CPU is using no longer coincide, so an error can happen. If the CPU clock is fairly similar to that of the 8301, the built-in leeway in the 8301 logic and the data access speed of the RAM will tolerate the difference and all should be fine. However, if the clock is faster, the CPU will react faster unbeknownst to the 8301: it will expect the data from the 8301-controlled RAM earlier than it's available (the 8301 taking its sweet time, running at a slower clock and assuming it's still got time) and read gibberish, or it will expect the 8301-controlled RAM to have accepted the data provided before it actually has (again, the 8301 taking its sweet time etc...) and remove the data too early. All of this happens because the 8301 activates DTACKL TOO EARLY, expecting the CPU to be slow to react, and assuming how slow it will be based on the clock it thinks it's providing to the CPU.
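The read-side failure can be sketched in a few lines of Python. The one-8301-clock figure for when the RAM data is actually valid is an assumed simplification, not a data-sheet number; the structure of the mismatch is what matters:

```python
def read_margin_ns(cpu_mhz, ula_mhz=7.5, ram_ready_clocks=1.0):
    """The 8301 asserts DTACKL counting on the CPU to sample read
    data about one clock later -- but it reckons in its OWN clock.
    ram_ready_clocks (in 8301 clocks after DTACKL) is an assumed
    figure for when the RAM data is actually valid."""
    ula_ns = 1000.0 / ula_mhz
    cpu_ns = 1000.0 / cpu_mhz
    data_ready = ram_ready_clocks * ula_ns   # when the 8301 has data valid
    cpu_samples = 1.0 * cpu_ns               # when the CPU actually latches it
    return cpu_samples - data_ready          # negative = CPU reads gibberish

print(read_margin_ns(7.5))       # 0.0 - the designed-for case just fits
print(read_margin_ns(24.0) < 0)  # True - a 24 MHz CPU samples too early
```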
So, what is the solution? Well, very simple - because the problem is the 8301 is providing DTACKL too early, it follows that DTACKL from the 8301 needs to be delayed until the appropriate point where the CPU will react as quickly... or, to be precise, as slowly, as the old 68008.
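The size of the needed delay follows directly from the two clock rates. This is a back-of-the-envelope sketch, reducing the real half-clock-granular timing to one 'reaction clock':

```python
def dtack_delay_ns(cpu_mhz, original_mhz=7.5, react_clocks=1.0):
    """How long to delay the 8301's DTACKL so a CPU at cpu_mhz reacts
    no sooner than the original 68008 at 7.5 MHz would have.
    react_clocks (clocks from DTACKL low to data sampling) is a
    simplification of the real half-clock-granular timing."""
    original_react = react_clocks * 1000.0 / original_mhz
    fast_react = react_clocks * 1000.0 / cpu_mhz
    return max(0.0, original_react - fast_react)

print(round(dtack_delay_ns(24.0), 1))  # 91.7 ns of delay needed at 24 MHz
```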
I think there is an obvious advantage of providing a delay to ONE signal rather than figuring out a whole bunch of complex logic to switch between two synchronous clocks based on a third asynchronous one. Like... what happens when one input of an AND gate goes from 0 to 1 at the same time the other goes from 1 to 0? Or, what happens when the clock input to a flip-flop gets an edge coinciding with the change of the input to the flip-flop? Well, this sort of problem is what is implied in the switching clocks solution.
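The kind of hazard in question can even be demonstrated with a crude discrete-time simulation: a bare multiplexer between two free-running clocks produces a runt pulse when the select changes at an arbitrary instant. This is a toy model with unit time steps and made-up periods, not real analog behaviour, but the runt shows up all the same:

```python
def naive_mux_pulses(period_a=10, period_b=16, switch_t=7, total=200):
    """Switch between two free-running square waves with a bare mux
    (sel ? clk_a : clk_b) at an arbitrary instant switch_t.
    Returns (level, length) for every output pulse; any pulse much
    shorter than half of either period is an illegal runt."""
    def clk(t, period):
        return (t % period) < (period // 2)
    pulses, run, prev = [], 0, None
    for t in range(total):
        out = clk(t, period_a) if t < switch_t else clk(t, period_b)
        if prev is None or out == prev:
            run += 1
        else:
            pulses.append((prev, run))
            run = 1
        prev = out
    pulses.append((prev, run))
    return pulses

# With these periods, switching at t=7 yields a runt pulse of length 1,
# far shorter than the legal half-periods of 5 and 8.
print(any(length == 1 for _, length in naive_mux_pulses()))  # True
```

Real glitch-free clock muxes need back-to-back synchronizer flip-flops per clock domain precisely to avoid this; that is the "very special and complex logic" mentioned above.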
Adding a delay is exactly what the (S)GC does. Note that the approach works equally well on 68k and 68020 even though the bus timing and the clock frequency are different. However, the (S)GC has one advantage - it can apply a pre-calculated delay because it only has to cater for a single chosen CPU clock, 16 or 24MHz. Things do get slightly more complex when the clock can be almost anything.
One more important point has to do with how the DRAM works. DRAM latches addresses by default - and it also works a bit differently than ordinary static RAM. For a static RAM data is written as the address for it to be written to is decoded. The access time given is always pertinent to the time all conditions needed for data to be written are valid simultaneously. So, address, data, chip select and write must all be valid for (access time) duration for data to be written. IF any of those signals are active before or after the time when they are all coincident in the proper write configuration, the SRAM does not care.
DRAM has one odd characteristic that it decodes the place the data is to be written to in two parts, first the row, then the column in the memory cell array. SRAM also has rows and columns but it decodes them at the same time (and the row and column address is provided as a unified address all at the same time).
Because DRAM does it in two stages, the data does not have to be available for writing until the second stage. This means that, unlike for SRAM, it can arrive 'late'.
The 8301 was constructed to drive DRAMs that used this feature but did not latch data. They do latch the address by default since it's provided multiplexed. What this implies is, that the CPU can actually remove the address altogether if the column address has been latched into the DRAM, this happens a bit after the /CAS signal on the DRAM has gone low. Once that's done, the DRAM does not care what the state of the address lines is any more.
In contrast, older DRAMs required the data to be written to remain available until a short time after /CAS goes high again, which is what signals the end of the access cycle - much later than the point up to which the address must still be available.
If only write cycles are performed to 8301 controlled DRAM, this is where it will fall short if the CPU runs faster - since the data availability scales with CPU speed, the CPU can end up removing the data before it's actually written into the DRAM.
However, newer DRAMs, in fact those proposed on this replacement motherboard, have a 'data latch' feature. IF they detect the /WE signal, which tells the DRAM data is to be written, to be active at the point where /CAS goes low, they will not only latch the column address but also the data to be written, after which the data can be removed from the bus even though it will actually be written into the RAM cells when /CAS goes high again - the DRAM chip itself buffers the data.
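As a sketch of why the latch feature matters, here is the timing check reduced to a comparison. All the figures in ns are illustrative, not taken from any data sheet:

```python
def write_ok(data_removed_ns, cas_fall_ns, cas_rise_ns, latching_dram):
    """Does a DRAM write succeed? Older DRAMs need write data held
    until shortly after /CAS rises; 'early write' latching DRAMs
    grab the data at the /CAS falling edge (with /WE already low).
    All times are measured from the start of the memory cycle."""
    if latching_dram:
        return data_removed_ns > cas_fall_ns   # data only needed at /CAS fall
    return data_removed_ns > cas_rise_ns       # data needed until /CAS rise

# A fast CPU removing its data 150 ns into a cycle where /CAS falls at
# 100 ns and rises at 250 ns: fails on old DRAM, fine with a data latch.
print(write_ok(150, 100, 250, latching_dram=False))  # False
print(write_ok(150, 100, 250, latching_dram=True))   # True
```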
Unfortunately, as tempting as this is regarding the 'fast CPU problem' because it may solve that problem all by itself, there is the case of writing to the 8301 internal control register, which is done by faking a DRAM write without actually doing anything to the DRAM. It's difficult to figure out what assumptions about timing have been done here, but there is a way around it, and I repeat, ONLY if RAM shadowing is implemented so ONLY write cycles are done to the 8301.
The trick is to latch data going to the 8301. This is easy to do by replacing the bidirectional buffer (74LS245) between the CPU bus and the 8301 RAM data bus with a unidirectional latch - unidirectional in the direction from CPU to 8301, as now we only need to write data. This would be a 74LS373 chip, which can also have its output 3-stated (disconnected from the 8301 side) when the 8301 needs to read data and wants no interference from the CPU.
The 373 is a transparent latch which freely passes data while the latch enable is active, but when it's disabled, the output freezes in the state the input was in at that moment. Because of this we can use /DS from the CPU to latch the data - while /DS is low, data from the CPU is valid and stable, and is passed straight through to the 8301 side if the output of the latch is enabled. When the CPU deactivates /DS because it has detected DTACKL low and finishes the access (signaled by /DS going high again), the data presented by the CPU is still guaranteed to be valid long enough for the latch to freeze it, so it remains at the disposal of the 8301 even though the CPU may have removed it earlier than the 8301 expected. In fact, it will remain available to the 8301 until the next access to the 8301 side, and for that to happen, the CPU must start another cycle.
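The latch behaviour itself is trivial to model - a sketch of the 74LS373-style transparency described above (the data values are arbitrary examples):

```python
class TransparentLatch:
    """Toy model of a 74LS373-style transparent latch: while LE
    (latch enable) is high, data passes straight through; when LE
    goes low, the output freezes at the last value. In the circuit
    described above, LE would be derived from the CPU's /DS."""
    def __init__(self):
        self.q = None
    def step(self, d, le):
        if le:              # transparent: output follows input
            self.q = d
        return self.q       # LE low: output holds the frozen value

latch = TransparentLatch()
latch.step(0x5A, le=True)    # /DS low: CPU data flows through to the 8301
latch.step(0xFF, le=False)   # /DS high: CPU data gone, latch holds
print(hex(latch.q))          # 0x5a - still available to the 8301
```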
But as I said - all of this requires 8301 RAM shadowing and reducing all accesses to 8301 to writes only. At the moment the proposed motherboard does not implement that. Catering for both cases (read and write) is more complex - and also may end up slower for some values of the clock frequency.
Aside: if both read and write to the 8301 had to be supported (although the reason escapes me...) there would have to be a bidirectional latch or register, such as the 74HCT646 as used on Qubide - and (note!) the SGC on the entire data bus, for the very same reason - it needs to cater for issue 5 boards that have the 8302 on the 8301 side of the data bus, so there has to be a provision for reading as well.
(*) - from waaaay up there in the beginning: The very first versions of the 68008 (running at only 4 MHz and in a very expensive and then still not standardized white ceramic 64-pin DIP package) had a 4 clock read and 5 clock write. This was documented in the very first data books, but this was essentially prototype silicon. It was never widely available and only very old textbooks show that timing. All commercially available chips run at 8MHz or more. The first 4MHz white ceramic version is VERY desirable by collectors!
To get back to the poll topic:
My choice would be the 68008FN, IF the timing and clock rates are unchanged. The reason is that it's fairly easy to add the two extra address lines to the existing J1 expansion bus by replacing lines FC2 and E (which were both outputs, so no contention is possible). This introduces ONE incompatibility to my knowledge, and that is using older 6800 (note the absence of a third 0) peripheral chips which require the E signal as a clock. The ONE notable peripheral that used it was the QEP III EPROM programmer (which could probably be modified to work without it). Other than that there may be homebrew hardware that uses it, but to my knowledge nothing else did. FC2 was never used for anything. Adding those two lines is necessary if an external >1M RAM expansion is implemented. It can be avoided, but then there has to be extra logic to let other typical peripherals be used without them aliasing into every 1M boundary of the now expanded 4M address range. Using the old 68008FN also uses less board space and pretty much leaves everything else alone - the added logic to cater for 2 more address lines is less than the same catering for the 4 more of the option below. The 68008 was available up to 12MHz, but these are rare; mostly the 10MHz variant is available, and it could run at 11MHz, which is conveniently available on the IPC.
A 68SEC000 is also a very interesting option: it's cheaper and uses less power, yet can run much faster, and addresses up to a 16M address space. It is possible to add even two extra address lines on top of the two added for a 68008FN by using two of the RGB lines on the J1 connector - again, to my knowledge nothing used those, except some ambitious homebrew project. Or, again, added logic is needed to prevent regular peripherals from aliasing into every 1M boundary inside the 16M now available. Favorite operating frequency could be 3x 7,5MHz, or... 4x 7,5MHz. For this case a clock synchronizing circuit could be built for the 8301, but it's a moot point - better to add logic to completely decouple the CPU clock from the 8301 clock.
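The aliasing issue can be illustrated in a couple of lines of Python - the base address here is just an example, not a real QL peripheral address:

```python
def alias_addresses(base, decoded_bits=20, space_bits=24):
    """Which addresses hit a peripheral that only decodes the low
    decoded_bits (20 bits = the original 1M) of the address bus?
    It aliases at every 1M boundary of the larger space."""
    window = 1 << decoded_bits
    return [base + n * window for n in range(1 << (space_bits - decoded_bits))]

hits = alias_addresses(0xC000)
print(len(hits))       # 16 copies of the peripheral in a 16M space
print(hex(hits[1]))    # 0x10c000 - the first unwanted alias
```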
As Dave has already noted - keep in mind that just like the original 68008 had 1M of address space, you could not use all of it as RAM as EVERYTHING (ROM, ROM port, IO, RAM) has to fit in that 1M. It's the same with 4M or 16M - especially since added address space makes it possible to have extra expandability previously not possible. Even with 4M or 16M RAM implemented, not all of it would be available.