Now that I have thoroughly thrashed the 8302-IPC combo, it is time to briefly return to the 8301. There are two aspects of the 8301 which need to be considered, and they are to an extent dependent on each other.
The 8301 plays two roles in the QL: it is the main decoder that maps the ROM, RAM and I/O into the CPU address space, and it is the video interface.
As the decoder, it only uses the lowest 18 out of 20 address lines to do this, so the entire address space it covers is 256k. The actual way it does this has already been touched upon in the very first posts in this thread. The 8301 has pins for address lines A17 and A16, but not A15, A14 and A6 which it needs to properly decode the I/O area. It gets those through the address multiplexers for the DRAM chips. This is where we get into a different aspect of the 8301, and that is that the two ULA chips were most probably originally conceived as a single larger chip.
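As a small illustration of what "18 out of 20 address lines" means in practice, here is a minimal sketch. The function name is made up for the example, and the only thing it models is that the 8301 by itself cannot tell apart addresses that differ only in the top two address lines.

```python
# Sketch only: the 8301 sees 18 of the CPU's 20 address lines (per the text above).
ADDRESS_LINES_SEEN = 18
WINDOW = 1 << ADDRESS_LINES_SEEN  # size of the area the 8301 can decode

def as_seen_by_8301(cpu_address):
    """Hypothetical helper: drop the two top address lines the 8301 never sees."""
    return cpu_address & (WINDOW - 1)

print(WINDOW // 1024, "k bytes decoded")
# Two CPU addresses that differ only in A18/A19 look identical to the 8301:
print(hex(as_seen_by_8301(0x20000)), hex(as_seen_by_8301(0x60000)))
```

Anything that wants to live above the first 256k therefore has to be decoded by something other than the 8301.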
I will try to briefly note a few oddities about the 8301, but as most of them are not externally correctable, it's more of a reference.
A) At least one control line for the address multiplexers, data buffer, 8302 select and /DTACK may have been omitted. There are strong indications that the 8301 and 8302 ULAs were originally a single chip design, that got split into two chips at the 11th hour. As a result there are a number of signals that could have been executed differently with a possible saving of at least one pin on the 8301 ULA and as mentioned certainly one pin on the 8302 ULA, if the latter was connected directly to the CPU from the start, rather than from iss6 motherboards and the introduction of the HAL chip.
This is something that has already been implicitly addressed in the previous posts, where an 'advanced HAL' is proposed to solve several inherent problems of the motherboard circuits. This can simply be done as a modern device these days is easily fast enough, while the HAL would have been marginal at the given speed grade. The HAL chip is actually a 'hardwired' PAL device, an early programmable logic device (PLD) which is why an appropriately programmed GAL can be used to replace it, given that GAL chips were made to replace many types of PAL devices.
When complex logic is integrated into chips like the ULA, it is always a trade-off between the features you need to integrate to make the production of your computer simple and cheap enough, and the features that you get with a couple of really cheap standard components, or even a smaller ULA type chip. In this case I think there were some serious missed opportunities.
B) Simple logic changes would increase RAM access speed by 15% and provide more robust refresh. The 8301 significantly slows down CPU access to the RAM because a lot of bandwidth is used to display the screen. Even a few tweaks could have made this non-trivially faster. The standard ULA reduces the available bandwidth to the CPU to only 46.66% of theoretical, and that will be slightly less for writing. A very simple tweak would have brought that up to 53.85%, a 15.4% increase in speed - not trivial in benchmarks of the day! There is also a small problem with refresh due to the way addresses are multiplexed into rows and columns, which could also easily have been corrected by a minor change.
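For anyone who wants to check the arithmetic in (B), here is a quick sketch using only the two percentages quoted above. The last part is just an observation of mine and an assumption, not something established here: the two figures happen to sit very close to 7/15 and 7/13, which would make the gain exactly 15/13.

```python
from fractions import Fraction

# CPU share of theoretical RAM bandwidth, in percent, as quoted in the text.
STANDARD_SHARE = 46.66
TWEAKED_SHARE = 53.85

def relative_gain(before, after):
    """Fractional speed-up when the CPU share goes from `before` to `after`."""
    return after / before - 1.0

gain = relative_gain(STANDARD_SHARE, TWEAKED_SHARE)
print(f"gain from the tweak: {gain:.1%}")

# Assumption, not stated in the text: the shares look like 7/15 and 7/13.
guess_standard = Fraction(7, 15)
guess_tweaked = Fraction(7, 13)
print(float(guess_standard) * 100, float(guess_tweaked) * 100)
print(guess_tweaked / guess_standard)
```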
C) The 8301 addresses two banks of 64k RAM but can only display video from the first bank. Oddly, the later revisions of the ULA did include an extra video control bit to select NTSC video timing (bit 6 in the control register), but not a control bit that would simply use CAS1L instead of CAS0L to read video data and display it from the second RAM bank. Since the addresses used by screen 1 are normally used for system data structures, having the option to use a screen 2 and 3 may have resulted in far more use of multiple screen areas - alas, this was never done so it is what it is and here we are. On the other hand, it is actually possible to add this externally with a bit of logic, but that would not be the first thing I'd do if I was extending the design.
D) 16 colors instead of (or as an alternative to) flash. One could argue that adding 16 colors would require adding an intensity bit, which would require an extra pin (see (A)!), but there are ways to do this without adding a pin and still have the output pins be pure TTL logic, no analog circuitry. This could have been done by leveraging the mode 4 circuitry and using pairs of pixels in various ways. There are 4 bits per pixel in mode 8, of which 3 are used directly as the RGB output, which gives us the 8 basic color combinations including black and white. When the 4th bit is set, the pixel can be split in two (as in mode 4) and different combinations of RGB could be displayed in the 'half pixels' to get an illusion of more colors. One could argue the logic for that is simpler than the one used for flashing. Another possibility would have been modifying the flash bit not to flash! That would make it possible to generate filled areas dynamically by setting the bit to start and end a filled part of a horizontal line. That being said, the sub-pixel idea might have been abandoned due to TV mode. Sub pixels in mode 8 are equivalent to mode 4 stipples, and these cause interference problems through the TV or composite output.
What needs to be decided, if an advanced QL is to be re-created with the old chipset, is whether the 8301 is to be used at all, and I would argue that the answer, if possible, is no. The reason has been stated before - it is becoming near impossible to get a compatible display, and this includes TVs, due to the over-scan timing the 8301 generates. While one could implement hardware around the original 8301, when the cost and space calculation is done, it can't be justified compared to re-implementing the video circuits from scratch, even using some fairly expensive parts.
But, let's assume for the moment that the 8301 is used, at which point we have to decide how to circumvent some of its quirks as well as problematic aspects of the actual implementation on the motherboard.
The first and biggest design choice is whether the 8301 is actually used to control the motherboard RAM at all, and also what kind of RAM it is going to be. The next decision, which follows automatically, is how much RAM should be on board. And, if this is more than the original 128k, should (some of) the extra be used as a ROM shadow, so that the OS code can be alterable?
Old style 8301 implementation, but better
Let's first start from an improved QL implementation, like previously with the IPC and 8302. So, the 8301 is used and controls the 128k on-board RAM, but, as we established before, let's assume a 'superHAL' that does all the necessary decoding, so this is out of the hands of the 8301. This means the /PCEN and ROMOE pins are not used, and the /DSMCL pin on the 8301 is not directly connected to the CPU but rather generated by the 'superHAL' decoder. There are a number of advantages doing it this way as changing the superHAL makes it easy to implement more advanced ideas. So, here are a few suggestions for other changes:
6a) The original RAM chips are not recommended. These are now almost impossible to get, and they also use quite a lot of power. There are several other solutions to consider; here are some highlights:
- 61464 64k x 4 chips: these are 4 bits wide rather than single bit like the originals, so only 4 are needed to implement the entire 128k RAM. They are also not that easy to get, but need significantly less power - the power per chip remains near constant, so these need only 1/4 of the power the originals did.
- 64k x 16 or 256k x 16 dual CAS chip. The smaller 64k x 16 variants were often used on old IDE hard disk and CD ROM drives, and the larger 256k x 16 variants were common on old PC ISA, VLB and PCI VGA graphics cards. These can still be found at low prices. The 256k x 16 one is actually 512k bytes total RAM, but for that it has one more multiplexed address line. One might be tempted to get the 8301 to address the full size and implement 512k bytes of total RAM, but since the 8301 does not have an extra address line, nor is there a reliable way to provide proper refresh, it's better not to use the extra line. This reduces the capacity back to 64k x 16, the original 128k. The opportunity here is to drastically reduce the RAM chip count - from 16 to just 1. These are 16 bit wide devices but have separate CAS lines for the upper and lower byte in a 16-bit word, which makes them behave like two 8-bit wide banks with all the address, write and RAS lines in parallel - just like the original RAM. What remains is to connect the high and low byte together into a single 8-bit bus and use the separate /CAS pins just as the 8301 ULA does, to implement two banks of 64k bytes.
- 128k x 8 static RAM chip. The static RAM has a non-multiplexed address bus and slightly different control signals, so driving it from a multiplexed address bus requires demultiplexing. Now one could say, hold on, there are two TTL multiplexer chips (74LS257) on the motherboard that are used to multiplex the non-multiplexed address bus of the CPU, why use them only to demultiplex the addresses back to the SRAM format? Well, the 8301 expects a multiplexed bus, not only for the RAM but also to get to the address lines A15, A14 and A6 (and possibly A5) to decode its internal screen mode control register. Granted, there are several ways the logic can be done, but when calculating the number of chips, PCB real-estate and extra logic, the simplest way to do it is to de-multiplex the row address back into a latch chip. A single gate is needed to generate the chip select for the SRAM. The advantage of this approach is a completely standard and easy to find SRAM chip and the lowest power consumption, with about the same PCB footprint as 4 64k x 4 DRAM chips.
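To make the dual CAS option a bit more concrete, here is a toy model of a 64k x 16 chip wired as two 64k byte banks on one 8-bit bus. It is only a sketch under assumptions: the class and function names are made up, the bank is picked from A16 (so the first 64k goes to one /CAS and the second 64k to the other), and the split of the remaining 16 bits into row and column is arbitrary, since the actual multiplexing order is not what is being shown here.

```python
class DualCasDram:
    """Toy model: a 64k x 16 DRAM seen as two byte lanes, one per /CAS pin."""

    def __init__(self):
        # One 64k x 8 array per byte lane of the 16-bit device.
        self.lanes = [bytearray(0x10000), bytearray(0x10000)]

    def access(self, row, col, cas, data=None):
        """Read, or write when `data` is given, through the selected /CAS (0 or 1)."""
        cell = (row << 8) | col
        if data is not None:
            self.lanes[cas][cell] = data & 0xFF
        return self.lanes[cas][cell]


def split_address(offset):
    """Assumed mapping of a 128k RAM offset to (bank, row, col)."""
    bank = (offset >> 16) & 1   # assumption: A16 decides which /CAS fires
    row = (offset >> 8) & 0xFF  # assumption: arbitrary row/column split
    col = offset & 0xFF
    return bank, row, col


def poke(ram, offset, value):
    bank, row, col = split_address(offset)
    ram.access(row, col, bank, value)


def peek(ram, offset):
    bank, row, col = split_address(offset)
    return ram.access(row, col, bank)


ram = DualCasDram()
poke(ram, 0x00123, 0xAA)  # lands in the first 64k bank
poke(ram, 0x10123, 0x55)  # same row and column, second bank
print(hex(peek(ram, 0x00123)), hex(peek(ram, 0x10123)))
```

The point of the model is that both banks share row and column addressing and differ only in which /CAS is pulsed, which is the same thing the 16 original single-bit chips did between them.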
6b) Unbuffered RGB and sync outputs MUST be buffered. Since 8301s are getting in short supply, and one getting zapped by static through the monitor lines is quite common, having those lines properly buffered is an absolute must. A few strategically positioned passive components will also help to protect the buffered outputs. It is quite shocking to me that Sinclair did not even include low resistance series resistors as the most basic protection method. The protection diodes that came later are really almost useless without the resistors. Besides, providing a cheap buffer chip as a 'sacrificial component' is a much better deal than sacrificing a ULA. The buffer itself could just as well be an 8-wide one, which leaves us 3 unused buffer channels to buffer, say, the speaker output, with a DC blocking capacitor in series.
6c) The 8301 uses the /DS signal on the CPU to detect when the CPU wants to access the RAM. Although this is actually correct in principle, because /DS becomes active one full CPU clock later in an access cycle, writing to RAM is always penalized by at least one wait cycle. It is possible to use the /AS signal instead, which is not used on the motherboard at all. That being said, using /AS on a write cycle does require some care, as the data from the CPU does indeed appear after the address appears, which is why /DS, which signals that stable data is present on the data bus, gets activated later in the cycle. When DRAM chips are used to implement RAM, care has to be taken as there are interactions between the address strobes /RAS and /CAS and the data to be written. In the particular case of the 8301, the timing of the signals implies that the data to be written is expected to be stable when the /RAS signal goes low, because it will not only latch the row address into the DRAM chip but also the data to be written. In DRAM terms this is called 'early write'. Since faster or different chips could be used in a motherboard 'reboot' as mentioned in 6a), a small amount of signal delay added in the right place may make it possible to cut the 1 wait state penalty. However, I mention this only for completeness. This is not the best strategy to get the QL a bit faster without actually breaking anything critical, because in the long run the ratio of data writes to data reads is quite a lot less than 1. Remember, for any data the CPU needs to write, it had to read instructions to do so, and also previously read data and process it to get some sort of result to be written. Given that the 1 cycle penalty occurs only during the 46.666% of the time when the CPU is given full access to the RAM, and write cycles are comparatively rare, the difference in speed is not going to be that big.
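A rough back-of-envelope sketch of why the gain is small. The numbers are assumptions for illustration only: a 4 clock minimum bus cycle for the CPU, the 1 clock penalty on writes described above, and a few hypothetical write fractions. Contention with the video fetches is ignored, so treat the result as an upper bound for the uncontended accesses only.

```python
BASE_CLOCKS = 4    # assumption: minimum bus cycle length with no wait states
WRITE_PENALTY = 1  # the one wait cycle on RAM writes described in the text

def average_clocks(write_fraction, penalty=WRITE_PENALTY):
    """Mean clocks per RAM access for a given share of write cycles."""
    read_fraction = 1.0 - write_fraction
    return read_fraction * BASE_CLOCKS + write_fraction * (BASE_CLOCKS + penalty)

def gain_from_removing_penalty(write_fraction):
    """Fractional speed-up in RAM access if writes lost their wait cycle."""
    with_penalty = average_clocks(write_fraction)
    without_penalty = average_clocks(write_fraction, penalty=0)
    return with_penalty / without_penalty - 1.0

# Hypothetical write fractions; real code mixes will differ.
for write_fraction in (0.05, 0.10, 0.20):
    gain = gain_from_removing_penalty(write_fraction)
    print(f"writes {write_fraction:.0%} of accesses -> gain {gain:.2%}")
```

Even with one access in five being a write, the model gives only a few percent, which is in line with the conclusion above.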