The Sinclair QL Forum

Posted: **Tue Aug 17, 2021 8:51 pm**

Here is a little topic which may prove useful. The idea is to explain how the QL's addressing capability has been expanded from the original 1Mbytes addressable, to more, as done by several expansion cards or alternative hardware and indeed emulators.

To start off, however, first a small re-cap on how the memory map was originally structured:
Traditionally, several well known address ranges were defined in the original literature:
$00000..$0BFFF 48k ROM
$0C000..$0FFFF 16k ROM slot
$10000..$1FFFF 64k on-board IO
$20000..$3FFFF 128k on-board RAM
$40000..$BFFFF 512k expansion RAM
$C0000..$FFFFF 256k expansion cards (ROM plus on-card IO)

Having a look at the hardware and OS code however gives us more details on how the usage of the areas was intended, given how it uses them:

$00000..$0FFFF is actually treated as a single 64k ROM area, which is decoded by a single active high signal from the 8301 ULA, ROMCS. This signal goes high whenever data is read from these addresses, but not when written. Writes here are ignored by the motherboard hardware. External hardware is expected to decode the appropriate internal ROMs or the ROM slot, which is why the original ROM chip set uses multiple chip selects to do this without requiring extra logic chips. The ROM slot hardware however must include some logic to decode address lines A14=1 and A15=1 and ROMCS=1, which completes the decode for the top 16k of the ROM space. This is also the reason why the entire ROM can be located in a ROM cartridge if the internal ROM chips are removed.
There is a special difference in how the OS treats the ROM slot area ($0C000..$0FFFF) and that is by looking for an add-on ROM flag ($4AFB0001) at the beginning of that area. This is required for any expansion ROM to be detected.
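To make the decode and the flag check concrete, here is a minimal sketch in Python. It only models what is described above: ROMCS goes high on reads of the bottom 64k, the ROM slot adds A14=1 and A15=1, and the OS looks for $4AFB0001 at the start of the slot. The function names are made up for illustration, and aliasing on a bare QL is deliberately ignored here.

```python
ROM_FLAG = 0x4AFB0001  # add-on ROM flag quoted in the post


def romcs(addr: int, is_read: bool) -> bool:
    """Hypothetical model of ROMCS: high for reads of $00000..$0FFFF only."""
    return is_read and 0x00000 <= addr <= 0x0FFFF


def rom_slot_selected(addr: int, romcs_high: bool) -> bool:
    """ROM slot decode: ROMCS=1 and A15=1 and A14=1 (top 16k of the ROM space)."""
    a14 = (addr >> 14) & 1
    a15 = (addr >> 15) & 1
    return romcs_high and a14 == 1 and a15 == 1


def has_rom_flag(image: bytes) -> bool:
    """Compare the first long word (big-endian, as on the 68k) with the flag."""
    return len(image) >= 4 and int.from_bytes(image[:4], "big") == ROM_FLAG
```

For example, a read of $0C000 selects the slot, while a read of $08000 or any write does not.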

$10000..$1FFFF seems like a waste of space, as it addresses only the small number of on-board IO locations. There are really only 17 locations in total used in there. While the hardware decodes this differently depending on the version of the 8301 ULA, the software only uses a very small portion of the total address space. All IO locations for all versions are enclosed within the 128 bytes at $18000..$1807F. Even then only the 17 addresses are used.
This resulted in a slightly different usage of the 64k area, like this:

$10000..$17FFF is normally not used in an unexpanded QL but the Minerva OS will also look for up to two 16k expansion ROMs, at address $10000 and $14000.

$18000..$1BFFF is designated as the on-board IO area. Still, only the aforementioned 17 addresses are used, except if QIMI is fitted, in which case the last 128 bytes are also used, as old versions of PTR_GEN will look for QIMI hardware there.

$1C000..$1FFFF is not specifically defined for any use.

The next area is RAM, which is, as far as the OS goes, treated as a single contiguous section of the address map. So, $20000 to $BFFFF, 640k in total, is designated as RAM, of which the initial 128k is expected to be populated on the motherboard.

Finally there is the expansion area from $C0000 to the end of address space, $FFFFF, which rounds out the total 1M addressable by the 68008. This is expected to be structured as up to 16 slots of 16k each, containing ROMs which supply the necessary OS extensions to use the hardware on the expansion card. If any addresses are used to control this hardware, it is expected that they reside within the same 16k, which requires the software to always reference them relative to the start address of the ROM. It should be noted that this is not an absolute requirement - addresses that control expansion hardware could in principle be located anywhere that is not already populated, say in the unused addresses of the on-board IO area or the undefined area at $1C000..$1FFFF.
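The refined 1M map can be written down as a small lookup table. This is just a sketch of the ranges listed above, in Python; the region labels and function names are my own shorthand, not anything defined by the OS.

```python
# (first address, last address, description) for the original 1M map
REGIONS = [
    (0x00000, 0x0BFFF, "OS ROM"),
    (0x0C000, 0x0FFFF, "ROM slot"),
    (0x10000, 0x17FFF, "unused (Minerva: two optional 16k ROMs)"),
    (0x18000, 0x1BFFF, "on-board IO"),
    (0x1C000, 0x1FFFF, "undefined"),
    (0x20000, 0xBFFFF, "RAM"),
    (0xC0000, 0xFFFFF, "expansion slots"),
]


def region(addr: int) -> str:
    """Return the description of the region containing addr."""
    for first, last, name in REGIONS:
        if first <= addr <= last:
            return name
    raise ValueError("address outside the 1M map")


def expansion_slot(addr: int):
    """Return the 16k slot number (0..15) for an expansion address, else None."""
    if 0xC0000 <= addr <= 0xFFFFF:
        return (addr - 0xC0000) // 0x4000
    return None
```

So `expansion_slot(0xC0000)` is slot 0 and `expansion_slot(0xFFFFF)` is slot 15, matching the 16 slots of 16k each.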

One more thing that has to be mentioned is that the QL bus provides a signal that can be used to defeat any on-motherboard decoding and replace motherboard hardware by other hardware on the bus. This can also be used to do other tricks, and in fact has been in the past.

Initial state, the bare QL

The bare QL motherboard completely disregards the top two address lines coming out of the CPU and the above mentioned mechanism is expected to take over in order to implement decoding of extra stuff like expansion RAM.
So, initially, the motherboard implements only the first 256k of the address map, which then repeats an additional 3 times to fill up the entire 1M address space. Since address lines A18 and A19 are 'don't care', there are 4 states they can have (00, 01, 10, 11) and therefore 4 copies of the initial 256k.
Since the OS looks for RAM starting at $20000, once it hits $40000 it will actually address another copy of the ROM and, not getting a proper readback of the data it attempted to write, will stop looking for extra RAM, establishing the total RAM at the on-board 128k.
In a similar manner, the OS looks for expansion cards starting at $C0000, which is again at the start of a 256k boundary, so in a bare QL it will find the ROM there, and depending on ROM version may continue finding other parts of the ROM, IO and RAM. If I remember correctly, JM ROMs had a bug which only looked for the first expansion ROM at $C0000, which Toolkit 2 would fix. The proper sequence was ultimately built into Minerva. This was a happy circumstance because, due to the 4 copies of the first 256k, looking for all possible expansion ROMs would end up looking at 16k boundaries in the first 256k, eventually double linking the ROM slot (which would also appear at $CC000), and also having the obscure consequence that it could falsely detect a ROM if the clock seconds counter register happened to hold a value equal to the ROM flag, $4AFB0001, because looking at $D8000 for it would actually read the clock register.
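The 4-fold aliasing and the resulting RAM sizing can be checked with a few lines of Python. This is a simplified model, not the actual ROM code: the 64k test step and the function names are assumptions, and "is RAM" stands in for the real write and readback test.

```python
def bare_ql_alias(addr: int) -> int:
    """A18 and A19 are ignored, so every address folds into the first 256k."""
    return addr & 0x3FFFF


def is_onboard_ram(addr: int) -> bool:
    """Only $20000..$3FFFF reads back what was written on a bare QL."""
    return 0x20000 <= addr <= 0x3FFFF


def find_ram_top(start=0x20000, step=0x10000, limit=0x100000) -> int:
    """Walk upwards until a block does not behave like RAM (assumed 64k step)."""
    addr = start
    while addr < limit and is_onboard_ram(bare_ql_alias(addr)):
        addr += step
    return addr


ram_top = find_ram_top()
ram_size_k = (ram_top - 0x20000) // 1024
```

The search stops at $40000 because that address folds back onto the ROM at $00000, giving 128k. The same fold explains the two addresses mentioned above: `bare_ql_alias(0xCC000)` is $0C000 (the ROM slot) and `bare_ql_alias(0xD8000)` is $18000 (the clock register).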

The special case: TrumpCard

The first expansion to break the addressing convention was the TrumpCard, which added up to 768k of additional RAM instead of the previous maximum of 512k. It did this by using the 256k of expansion IO addresses for RAM. As it happens, the OS will normally continue to search for RAM until it finds an area that does not read back what is written, or the end of the CPU address range is encountered. If RAM is extended within the expansion IO area, the system will happily use it as such, though there are/were limits to this due to some interesting bugs. However, without any addresses left for expansions, Miracle had to devise a way to implement what was standard at the time within the available pool of addresses, and it cleverly used $10000..$17FFF for the ROM with the extra software needed for TrumpCard features, and $1C000..$1FFFF for control locations for the extra hardware on it. To do this and get around OS limitations, it had to do a clever dance with relocating the ROM in order to get the system to see it at its true address. As an aside, the original GoldCard would do a double initialization to achieve the same, but by different means. In any case, this provides us with the first clue on how the available addressing range was expanded using the original OS addressing framework, and more about that in the next post.

Posted: **Thu Aug 19, 2021 7:02 pm**

Getting things unified: Minerva

While older QDOS versions had provisions for extending the original QL hardware past the 1M addressable space, there were various bugs that prevented full functionality. For instance, some may remember the (somewhat ill fated) MegaRAM internal 2M expansion RAM - it used the PLCC version of the 68008 which has a 4M address space, but could only use half.
Things changed a lot with Minerva, which also had extended capability in order to utilize previously unused areas of the address map.
Minerva has carefully re-hashed the way the OS initializes the system, making it far more flexible regarding what the address map can be.
The basic procedure goes like this:
1) Memory test, which starts at $20000 and proceeds in 64k blocks until it either finds an area which does not test like RAM or loops back to the beginning of the addressable space. It also takes care that it properly detects aliases, in case the RAM is not completely decoded and appears as several copies. In the end Minerva will properly detect at least up to the full 4M address space of a 68008 PLCC.
2) Minerva also makes it possible to (in principle) move the system variables to any address. In reality it only uses this feature in its two-screen mode, because the second screen address starts where normally the system variables and other system memory structures would reside. Unfortunately this is not widely used as there seem to be a number of pieces of software that expect the system variables at $28000.
3) Since there is a two screen option, Minerva console drivers also make it possible to move the base of the screen area and indeed the size and organization of the screen area to any address in RAM. I used this extensively while developing Aurora.
4) Minerva looks for extension ROMs at the original $0C000 as well as $10000 and $14000, which were previously unused. After that it starts looking for more ROMs starting at $C0000 unless there is RAM there, in which case it looks for ROMs after the end of detected RAM, again unless it has discovered that the entire address map to the end has been used by RAM.
It seems that in principle there is no limit to how much RAM or how many extension ROMs can be discovered except the maximum size of addressable space, which is in theory 4Gbytes.
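Step 4) can be sketched as a function that lists the addresses where extension ROMs would be looked for. This is only my reading of the rules above, written in Python, not Minerva's actual code; `ram_top` (first address after detected RAM) and `space_size` are hypothetical parameter names, and `ram_top` is assumed to sit on a 16k boundary.

```python
def rom_scan_addresses(ram_top: int, space_size: int = 0x100000):
    """List candidate extension ROM addresses in search order."""
    fixed = [0x0C000, 0x10000, 0x14000]       # ROM slot plus the two extra 16k ROMs
    start = max(0xC0000, ram_top)             # skip anything that is RAM
    # 16k steps to the end of the address space; empty if RAM fills it all
    return fixed + list(range(start, space_size, 0x4000))


bare = rom_scan_addresses(ram_top=0x40000)          # 128k machine
full = rom_scan_addresses(ram_top=0x100000)         # RAM up to the end of 1M
```

On a 128k machine this gives the 3 fixed locations plus the 16 slots from $C0000. With RAM filling the map to $100000 (128k plus 768k, as on a fully populated TrumpCard) only the 3 fixed locations remain.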

Practical limits: slave blocks and Qliberator

Software imposes some limitations to the above rules.
First, the OS uses free RAM to implement a buffer system called slave blocks. The problem is that the slave block table, which is proportional to the initial free RAM (once the system starts), has a linear search performed on it in order to find a free block, which gets progressively longer as the table grows and RAM is used up. At a certain point the overhead of searching blocks becomes higher than the speed-up gained by buffering files in slave blocks. In other words, the slave block table needs to be limited in size even if there is extra free RAM. Curiously, an early version of Minerva had an interesting bug - initialization of the slave block table was done using a word offset to the table base, so when there was over 2M of RAM, incrementing the offset would cause an overflow into negative numbers, at which point the slave block table would continue from the top of the screen.
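The word offset overflow is easy to reproduce on paper. The sketch below is in Python and assumes 512-byte slave blocks with 8-byte table entries; the entry size is my assumption and is not stated above, but it does put the wrap-around exactly at 2M.

```python
BLOCK_SIZE = 512   # bytes per slave block
ENTRY_SIZE = 8     # bytes per table entry (assumption, not from the post)


def to_signed16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed 68k word."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def table_offset(block_index: int) -> int:
    """Offset of a block's entry from the table base, kept in a signed word."""
    return to_signed16(block_index * ENTRY_SIZE)


blocks_in_2m = (2 * 1024 * 1024) // BLOCK_SIZE   # 4096
last_good = table_offset(blocks_in_2m - 1)       # still positive
first_bad = table_offset(blocks_in_2m)           # wraps negative
```

With these numbers the last entry below 2M sits at offset 32760, and the next one wraps to -32768, i.e. it lands below the table base instead of above it.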

The Qliberator compiler uses the top 3 bits in 32-bit addresses to signal certain things about the address, relying on the fact that the top 12 bits of the addresses that are kept as 32 bit internally in the CPU, are not visible outside the CPU as address bits on the address bus and are thus 'don't care'. Because of this, QL systems as a rule do not use A31, 30 and 29 for address decoding so the actual maximum size of the address map is 512Mbytes. Without this provision, Qliberated programs will not work right.
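In numbers, reserving the top 3 bits leaves 29 address bits. A small Python sketch, with hypothetical names (what the 3 flag bits actually mean to Qliberator is not covered here):

```python
TAG_BITS = 3
ADDRESS_BITS = 32 - TAG_BITS                # 29 usable address bits
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1      # $1FFFFFFF


def split_tagged(value: int):
    """Split a 32-bit value into (top 3 bits, 29-bit address)."""
    value &= 0xFFFFFFFF
    return value >> ADDRESS_BITS, value & ADDRESS_MASK


max_map_mbytes = (ADDRESS_MASK + 1) // (1024 * 1024)
```

`max_map_mbytes` comes out at 512, and `split_tagged(0xA0028000)` gives flag bits 5 with address $28000. Hardware that decodes A29..A31 would see $A0028000 instead of $28000, which is why those lines have to stay 'don't care'.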

So, in conclusion, the situation is this:
$00000..$0BFFF 48k OS ROM
$0C000..$0FFFF 16k Extension ROM
------------------------------------------------------- Both of the above are available through a single ROM select
$10000..$13FFF 16k Optional Extension ROM
$14000..$17FFF 16k Optional Extension ROM
------------------------------------------------------- Both of the above are initialized by Minerva if present
$18000..$1BFFF 16k Motherboard IO
------------------------------------------------------- Only a few addresses are used, more can be used assuming no address conflicts
$1C000..$1FFFF 16k Undefined
-------------------------------------------------------
$20000.. up to 512Mbyte RAM (test every 64k)
-------------------------------------------------------
$C0000.. up to theoretically $1FFFFFFF, extension ROMs (test every 16k)

Per above rules, RAM will be tested for until no more is found or the address has aliased back to $00000.
Extension ROMs will then be detected from $C0000 or after the last RAM address (whichever is greater) until the address has aliased back to $00000.
The original screen area starts at the start of RAM at $20000 and is normally 32k, but in principle it can reside anywhere in the address map as long as there is no conflict with other areas, or it can be a part of system RAM but then has to be reserved at OS start to prevent other stuff overwriting it.
System variables start at $28000 although they could be anywhere, but for compatibility they are normally at that address.

Or, simplified, when the address map is extended beyond the original 1M addressable space, the original map is 'split' somewhere in the RAM area between $40000 and $BFFFF and 'more addresses' are inserted as needed or implemented by a given CPU, keeping in mind the limitations imposed by slave blocks (which can be catered for by the OS) and Qliberator.

Posted: **Fri Aug 20, 2021 11:00 pm**

Very clear and detailed post, but one correction: the standard location for the system variables is $28000, although Minerva allows for $30000 when the second screen is enabled.

The slave block system is indeed a limiting factor in the traditional QDOS memory model. This limits the maximum amount of RAM to 16 MB, since that causes just under 32768 slave blocks to be kept track of ((16*1024*1024 - $28000)/512), and the code which handles this uses 16-bit signed integers. The code (both in Sinclair and early Minerva ROMs) used to set up the slave block table had another bug causing it to fail with over 2MB of RAM, but this is patched by the (S)GC ROM.
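The arithmetic behind the 16 MB figure, as a short Python check (the function name is made up, and the formula is simply the one quoted above):

```python
def slave_block_count(ram_bytes: int, base: int = 0x28000, block: int = 512) -> int:
    """Number of 512-byte slave blocks, using (RAM size - $28000) / 512."""
    return (ram_bytes - base) // block


count_16m = slave_block_count(16 * 1024 * 1024)
fits_signed_word = count_16m < 2 ** 15
```

For 16 MB this gives 32448 blocks, which is just under the 32768 limit of a 16-bit signed integer. Going to 32 MB would give well over 65000 blocks, far outside that range.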

SMSQ/E fixes the slave block problem by not allowing them to extend beyond 1MB and allocating common heap space from the Transient Program area first. But maybe that's something for Part 3?

Posted: **Sat Aug 21, 2021 12:16 am**

Thanks Jan, correction re system variables was a typo, duly noted and corrected. Also, great info on limiting slave blocks.
In fact, I would strongly urge anyone with any pertinent data to feel free and add it into the thread.
There will be more parts, with a sort of small primer on how to build a 'modern QL compatible computer'. To this end, any data on how caches are handled with 020+ CPUs would be highly appreciated.

Posted: **Sat Aug 21, 2021 10:10 pm**

OS changes: TK2, Minerva, SMSQ/E

Since the QL came to the market quite unfinished, with lots of unfinished stuff and features that had not been implemented, Toolkit 2 was and is considered an absolutely essential extension for the QL. It was a big and valiant effort to correct bugs and add features, however, there are limitations to what can be done with extension ROMs - while some things can be replaced given the way the OS is structured, others cannot.
The first limitation is the CPU exception vectors, which are at the start of the address space. The hardware is configured such that interrupts are auto-vectored, which significantly reduces the space needed by the various vectors.
The second limitation is the OS vectored routines, which are accessed through a jump table in ROM - so that's an obvious thing that cannot be changed.
None of these routines could be changed from the ground up so in order to do some corrections to the OS, radical steps had to be taken - and there are two fundamental possibilities:

Change the entire ROM - this is the approach Minerva has taken. In essence, the hardware stipulations stay entirely the same, and the way to change any of the vectored routines, and of course for that matter, anything else, is to change the OS ROM containing the OS code. The main limitation, assuming no extra hardware is added in the process, is that there is a limit to the maximum code size, that being the actual size of the ROM space - any changes one makes have to fit into the original 48k. That being said, the code could establish redirection 'hooks' in RAM, should more changes or additions be needed. With Minerva, this had to be done extremely judiciously, because the point was to keep the same (or perhaps more) free RAM once the OS is initialized. This was a design decision, very much in line with the state of QLs at the time. Of course, keep in mind that Minerva has had some changes done during the years, and is perhaps the most flexible OS to use to make new QL compatibles, at least as a 'first boot' OS.

Emulate ROM with RAM - this approach was taken by several hardware extensions that also implement a way to extend the available address space. Most notably, GoldCard, SuperGoldcard, Q40/60 and of course Q68 all have variants of this mechanism. An important thing of note is that these systems also have extra RAM available compared to the maximum possible with the original QL's 1M address space, so unlike the Minerva case, there is extra RAM.
The original reason why ROM might be replaced with RAM (in a very specific order, as will be explained) was to leverage the wider and faster data bus of the more powerful CPUs used on the aforementioned cards. The magic is done along the following lines:

1) At startup, the mechanism used to disable parts of the original address map is used to replace the original ROM with one supplied on the extension board. This ROM is usually initially mapped 'everywhere', i.e. aliases all over the address map. When the CPU executes the reset exception, the reset vector normally points to the alias of the boot ROM which is going to stay mapped into the system either always or until boot is over. Keep in mind 'boot' now refers to something that executes before the actual OS startup. In this case it often replaces the initial alias at $00000 back to the actual OS ROM, as it would normally appear in an original QL. The point being, control is now within the boot ROM but the original OS code is available to be read.
2) The boot software reads the actual OS ROM and possibly other ROMs (like Toolkit 2), at either the original address or some other where it is available, then writes a copy of it to RAM which is mapped at the normal ROM address. If the ROM address is still at the original place at $00000, it will be read only, while RAM will be write only. For purposes of RAM data integrity, the better solution is to have the area at $00000 be actual RAM, which is also tested before the OS ROM is copied into it.
3) The boot software patches the OS code as needed for extra features, like extra RAM or CPU features that were not implemented in the original 68008.
4) Optionally, the RAM where the patched OS copy now resides is write protected so it cannot be corrupted by a runaway or misbehaving program.
5) A reset is either invoked as an exception or simulated, to get the OS code to execute from the RAM copy, and the system initialization proceeds as normal. It is also customary for parts of the boot ROM to have been copied as extensions to the OS or emulations of extension ROMs, in order to extend the OS with any features and drivers supporting any extra hardware on the expansion board.
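Steps 2) to 4) can be illustrated with a toy model in Python. This is only a sketch of the 'ROM read only, RAM write only, then switch' idea; the class, its attributes and the patch format are invented for the example and do not correspond to any particular card.

```python
class ShadowedRomArea:
    """Toy model of the area at $00000: ROM is read, RAM underneath is written."""

    def __init__(self, rom_image: bytes):
        self.rom = bytes(rom_image)
        self.ram = bytearray(len(rom_image))
        self.ram_mapped = False        # False: reads come from ROM
        self.write_protected = False

    def read(self, offset: int) -> int:
        return (self.ram if self.ram_mapped else self.rom)[offset]

    def write(self, offset: int, value: int) -> None:
        if not self.write_protected:   # writes always land in the RAM
            self.ram[offset] = value & 0xFF


def soft_boot(area: ShadowedRomArea, patches: dict) -> None:
    for offset in range(len(area.rom)):        # step 2: copy ROM into RAM
        area.write(offset, area.read(offset))
    for offset, value in patches.items():      # step 3: patch the copy
        area.write(offset, value)
    area.ram_mapped = True                     # reads now come from the RAM copy
    area.write_protected = True                # step 4: protect it


area = ShadowedRomArea(bytes([1, 2, 3, 4]))
soft_boot(area, {2: 0x99})
after_boot = [area.read(i) for i in range(4)]
```

After `soft_boot` the area reads back as the patched copy, and a later stray write (for example `area.write(0, 0)`) has no effect because of the write protection.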

It should be noted that all of the above 'remapping' is done by dedicated hardware on the expansion board(s), and some of it can also be undone, like the write protection of RAM that now resides at addresses previously used for ROM. While this requires extra steps at boot time, it also provides huge flexibility, not only in how the hardware is constructed (i.e. an 8-bit wide boot ROM or indeed completely different media such as an SD card can be used to hold the boot and other code), but also because the OS code is not limited to the size of the ROM area and can use other addresses that do not collide with previous usage cases to extend the OS. Indeed, since the OS is 'soft loaded', multiple versions of the OS can be provided on the same boot media or chip. The very epitome of this approach is SMSQ/E, which integrates many previously separate OS extensions and a whole new improved OS, which is itself extensible using code modules. This approach also makes it possible to load a new OS on top of another already running one from any supported device on the running OS, which also makes it possible to load SMSQ/E from disc or flash using a SuperBASIC command. Using the correct SMSQ/E version will start code that is aware of the particulars of the hardware it runs on and go through the proper procedure as described above.

If a new QL compatible is to be made (either hardware or emulated), it is well advisable to follow a soft-boot model, while providing a default OS, like Minerva.

Posted: **Wed Sep 01, 2021 5:52 pm**

So how does one go about building QL compatible hardware?
Well... the first consideration is should one use the original chipset or not, the original chipset being the two ULA chips and the IPC.

So, let's first explore the option of using the original chipset, either building hardware from scratch or expanding the original machine. In principle, as long as the address map rules laid out in the previous posts are adhered to, with some attention to certain details, you will have at least basic functionality of a QL. However, this approach also has some limitations, especially if a faster CPU is used, and/or one with a wider bus.
There is also some leeway, for instance this new hardware can implement completely new video hardware, as long as it at least initially is capable of working in MODE4 so that some sort of visible display will appear. In particular, Minerva will be well suited here as its initial display actually does use MODE4. This would then replace the 8301 ULA. In some sense it is the easiest to replace as the system does not care much what is there as long as it uses the same video memory organisation as the 8301, and has the basic video control bits (to select mode, screen, blanking and possibly NTSC display) at the same address as the 8301. The part of the 8301 that handles the decoding of various things like ROM, RAM, 8302 would probably be handled by different hardware anyway if this hypothetical machine is designed to significantly extend RAM or use a faster CPU.

Once the 8301 is replaced, the 8302 is fairly easy to interface just like it is on the regular QL, like a simple 8-bit peripheral. The IPC then hangs off of the 8302 and is not visible in the complete address map.
This is where one might run into the problems mentioned above. First, the original ULA chips are intended to work off an 8-bit bus, and particularly in the case of the 8301, it has to be a fairly good imitation of the original 68008. This is much simpler to do if you use a CPU which is capable of directly driving an 8-bit bus (such as the 68SEC000 in 8-bit bus mode) or a wider bus CPU that implements automatic bus sizing, like the 68020, 030 and certain variants of the 683xx MCU family. CPUs like the 68040, 060 or the ColdFire MCF5102 do not support this function and it has to be implemented with external hardware, which in these 3 cases is not completely trivial. And then there is the original 68000 and all variants that have a 16-bit bus, which also do not implement bus sizing, so again, extra hardware is needed. This is simpler than on the 32-bit bus CPUs.

If the 8301 is replaced by something else, it is likely to address the same width memory as the actual CPU bus width, as the whole point would be to speed up screen RAM access, especially if the hardware also implements higher screen resolutions. The latter automatically requires more data movement to draw and move around more pixels, so faster access is a logical requirement. Replacing the 8301 gets rid of the requirement to rather precisely follow 68008 bus behaviour. The remaining 8302 is far less critical (and also much faster to access) regarding bus timing, which might simplify the design of bus sizing hardware. That hardware is still required since the 8302 is an 8-bit peripheral, unless (substantial) changes are made to the code that accesses it, such as OS interrupt handlers and drivers for serial ports, microdrives, net and IPC comms. Another aspect that may require changes to 8302 related OS code appears when a different speed CPU is used. This is not only a question of the CPU clock speed, but of the actual CPU type - more modern versions of the 68k architecture are quicker on a clock by clock basis. For instance, a CPU32 based chip, like a 68340, will be on average around 1.6x faster at the same clock and bus speed than the original 68k or 68008 if an 8-bit wide bus is used. This is important as some timing intervals for microdrives and especially NET access are generated in software, so a faster CPU will result in shorter times, almost certainly messing things up. This is why the Goldcard and Supergoldcard routinely patch these routines once they are copied from ROM into a RAM shadow copy (which makes it possible to change the contents).
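To get a feel for the size of such a patch, here is a rough Python estimate of how a software delay loop count would have to grow to keep the same real-time interval. This is a back-of-envelope model only: the function name and the idea of a simple proportional loop count are my assumptions, and the actual GoldCard patches are not described here.

```python
import math
from fractions import Fraction

QL_CLOCK_HZ = 7_500_000  # original QL CPU clock


def scaled_loop_count(original_count: int, per_clock_speedup: Fraction,
                      clock_hz: int) -> int:
    """Loop count giving the same delay on a faster CPU (rounded up)."""
    factor = Fraction(per_clock_speedup) * Fraction(clock_hz, QL_CLOCK_HZ)
    return math.ceil(original_count * factor)


same_clock = scaled_loop_count(100, Fraction(8, 5), 7_500_000)    # 1.6x per clock
double_clock = scaled_loop_count(100, Fraction(8, 5), 15_000_000)
```

With the 1.6x figure quoted above, a loop of 100 iterations becomes 160 at the same clock and 320 at twice the clock. `Fraction` is used instead of a float so the rounding up is exact.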

So, bottom line, unless you are using either the 68008FN with 4M of address space or a 68SEC000 in 8-bit mode with 16M of address space at more or less exactly the QL's 7.5MHz clock, plan on doing some software patches. It should be noted that things like speeding up the screen access or optimizing the system so that bus accesses always run at full speed do not require any modifications as long as the above mentioned CPUs run at the same clock speed, because - remember - the affected routines normally run from ROM code, which reads at full bus speed. The relatively tiny portion of the time this code copies a byte from or to RAM to do something microdrive or NET related makes a negligible difference.

Posted: **Sun Sep 05, 2021 12:41 am**

68008FN (if only it was available sooner...)

Let's briefly review a version of the 68008 that came out about a year after the original 68008, in a smaller 52 pin PLCC package, which is also cheaper to make. The small downside is the relatively rarely used 52-pin (rather than 44 or 68 pin) PLCC socket, but then, the original 68008 came in a 48-pin case which was comparatively rare back then (the first 68008s came in a ceramic package) and so were the sockets.

An aside to the 48 pin socket problem: since there are not that many 48 pin DIL chips around, it was quite probably difficult to source a socket, since the selection of manufacturers making it was certainly not large, and I would bet the actual sockets were much more expensive per pin, given a smaller demand. So, we got a crappy 48 pin socket in our QLs. I will never understand why they did not use two 24 pin sockets stacked in a row, common and cheap as dirt with at least the same quality as the ROM and ULA/IPC sockets?!?! Granted, they are a bit more difficult to fit in an automated environment but I can't see why the lower price and increased reliability would not have covered this, possibly still with a saving.

The 68008FN has the merit of having 4 more pins, which have been used for some quite useful signals:

1) Two more address lines (increasing the addressable range from 1M to 4M bytes). This is a VERY nice increase, and if this had been available at the start of QL commercialization, even with all the problems it had, this would have been a BIG advantage over competitors, since no-one offered that kind of expandability, and in a nice linear manner, without any need of changing the software. This is something which I will concentrate on later in the post.

2) All 3 interrupt request lines, instead of 2 on the original 68008. While this would be nice for a general 68k system, this is not too relevant to the QL. There is an, IMHO, essential upgrade to the way the QL treats the interrupt lines, but the total number of interrupt levels is sufficient as it is.

3) The 2-wire bus arbitration mechanism is extended to 3-wire as on the original 68000. In general 68k terms, this might be useful, although I doubt it, the mechanism is actually needlessly complicated and was later dropped on more advanced 68k CPUs. In QL terms, this is very largely useless and has to be reduced back to the original 2-wire scheme. Even then, it is only used in one way - to electrically remove the 68008 CPU from the busses in order to replace it by something else on the expansion bus, by the Miracle GoldCard and SuperGoldCard. It could be used more creatively in a different basic design, though.

To use the 68008FN in a QL context, it needs to be set up pretty much as the standard 48-pin part except for the two extra address lines. So, point 1) above is used to our advantage, while points 2) and 3) are actually defeated, or rather downgraded to the original version, by tying /BGACK to /BG (also /BGACK may be strapped to ground) and connecting /IPL0 directly to /IPL2.

A look at some QL architecture blunders

The QL, as we know, has had a turbulent introduction to the market, burdened with many broken deadlines and attempts to fix the bugs and make working hardware. Unfortunately, once a 'stable' design was introduced (somewhere at the time of the JM ROMs and issue 5 motherboards), some 'details' were left ...shall we say, unfinished, or uncorrected. Let's look at some of those:

1) /RESET and /HALT pins connected together on the motherboard. Power on reset of a 68008 CPU is accomplished by pulling both the /RESET and /HALT pin on the CPU low. However, the CPU also has a RESET command that results in itself pulling the /RESET pin low, so /RESET is actually a bidirectional pin, effectively wired OR (remember, we are using active low logic here). This was intended for the CPU to be able to reset all its external hardware without having to reset the actual CPU. However (and this is going to be a recurring theme) in order to avoid using external hardware (a fancy name for one diode and one resistor), Sinclair simply connected these two lines together. As a result, if a RESET command is used, it will reset the whole machine as if from power up. Possibly the biggest problem is that some interesting options like single stepping through accesses or repeating an access cannot be used, because these require the use of the /HALT pin, which does not appear on the expansion connector, and as mentioned, is connected to the /RESET pin directly anyway. So, it would really be prudent to do this properly if new hardware is made.

2) Potential signal conflict during interrupt acknowledge. The 68k family of CPUs uses a mechanism where any device capable of interrupting the normal program flow via the interrupt lines, can provide a 'vector' to the CPU as a part of the interrupt acknowledge process. The 'vector' is an 8-bit index into a table of addresses of code used to service the interrupt caused by the interrupting device. This requires some extra hardware to decode an interrupt acknowledge and vector fetch from the CPU and some hardware in the interrupting device that stores the vector. The main idea behind the mechanism is to avoid polling all the devices to find out which has caused an interrupt, but rather have a (possibly hardware enhanced including a more complex interrupt priority scheme) mechanism for interrupting devices to 'get in touch directly' and tell the CPU it was them, making it possible for the CPU to almost instantly jump to the right piece of code. Again, extra hardware is not what Sinclair wanted, so an alternative way of interrupt operation was used, that makes it possible for the CPU to generate a vector internally based on the interrupt priority level. This mechanism is called autovectoring, and it is enabled by pulling the /VPA pin on the CPU to zero when the interrupt acknowledge cycle is performed by the CPU, rather than providing a vector and ending the access cycle by pulling /DTACK low as usual. This also requires some extra logic to detect and generate the /VPA signal, but far less than the full vectored version. In fact the QL used the simplest way to do it, which is actually too simple and by some mad luck does not cause unpredictable behavior by the CPU.
An interrupt acknowledge cycle is actually a read from a special address with a special encoding on the function code (FC0..2) lines. These lines describe what kind of access the CPU is performing, and the encoding of all 1s (FC0..2=111) indicates 'CPU address space', i.e. that the CPU is generating a special access and the address lines should be interpreted differently than a regular address. In the particular case of interrupt acknowledge, the address lines hold an address $FFFFx where the bottom 4 bits (x) hold 1 on A0 and A1..A3 encode the interrupt level that is being serviced. In the context of the QL, since auto-vectoring is used and the address line state is not needed for that, the hardware is s simple NAND gate that looks for FC0 and FC1 both being 1, since this combination can only happen for interrupt acknowledge on the 68008 (others are possible on more advanced CPUs in some cases). However, they forgot to prevent the rest of the system from seeing this as a normal read of address $FFFF5 and responding with pulling low of /DTACK. The 68008 manual specifically says /VPA and /DTACK should never appear simultaneously as this may result in erratic behavior by the CPU. This bug is present on issue 5 motherboards, without the HAL chip.
On a bare QL, due to 4-fold aliasing, address $FFFF5 will effectively read from $3FFF5 and will cause a read of one of the last bytes of on-board RAM, which in itself is benign, but it will also make the 8301 ULA pull /DTACK low. By some incredible fortune, the logic to generate /VPA is so simple that it pulls down /VPA before /DTACK and the 68008 detects this properly as an autovectored interrupt acknowledge and all is well... until some unlucky person designs a peripheral that sits up at the address $FFFF5, which may generate /DTACK faster (so problems might occur). What is more insidious is that a typical peripheral will have ROM at the start of a 16k boundary within the expansion area at $C0000 and above, in this case it would be $FC000, and quite probably some bits that control some hardware that is not memory, but say, a floppy controller, in the highest address of its 16k slot. In this case this may end up being $FFFxx, and xx could well be $F5. In other words, the highest addresses are likely to be used for some type of hardware control, and in this case it is not uncommon that even the actual reading of data from such addresses (regardless of the contents) may have effects in the real world, i.e. behavior of the said hardware, which will then possibly malfunction as soon as the very first interrupt is acknowledged by the CPU, while it will function just fine if located in any of the other 16k expansion 'slots' but the last one.
This problem was only addressed with issue 6 and later boards by the HAL chip, which inhibits all other parts of the system responding to the interrupt acknowledge. This obviously points to the need to include the same logic into the address decoder of our prospective new hardware, most importantly, to prevent the decoder decoding anything else but an interrupt acknowledge if FC0 and FC1 are both 1, in other words, FC0 and 1 have to be taken into account when decoding the address for any cycle lest it be an interrupt acknowledge.
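To make the above a bit more concrete, here is a small Python model of the interrupt acknowledge address and of a decoder that checks FC0 and FC1 before anything else. The function names and the three return strings are made up for illustration only; the bit layout (A0=1, A1..A3 = level, everything above that all 1s) and the "FC0 and FC1 both 1" test are taken from the description above. It is a sketch of the idea, not a description of what the HAL actually contains.

```python
def iack_address(level):
    """20-bit address the 68008 puts out while acknowledging `level`."""
    if not 1 <= level <= 7:
        raise ValueError("interrupt level must be 1..7")
    # A4..A19 all high, A1..A3 carry the level, A0 is high.
    return 0xFFFF0 | (level << 1) | 1


def is_iack(fc0, fc1):
    """QL-style detection: FC0 and FC1 both high means interrupt acknowledge."""
    return bool(fc0 and fc1)


def decode(address, fc0, fc1):
    """Toy master decoder: the IACK test comes before any address match."""
    if is_iack(fc0, fc1):
        return "VPA"        # autovector; nothing else is allowed to respond
    if 0xC0000 <= address <= 0xFFFFF:
        return "EXPANSION"  # normal access to the expansion area
    return "OTHER"


print(hex(iack_address(2)))      # level 2 is the only level the QL uses
print(decode(0xFFFF5, 1, 1))     # acknowledge cycle
print(decode(0xFFFF5, 1, 0))     # ordinary read of the same address
```

The point of the ordering in `decode` is that a peripheral sitting at the very top of the expansion area never sees the acknowledge cycle as a read, which is exactly what the issue 5 boards fail to guarantee.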

3) Although 3 interrupt levels are possible, in practice this is reduced to one. The 68008 used in the QL has only two interrupt lines which makes it possible to implement 3 interrupt request levels, and those would be level 2, 5 and 7, which also determines the interrupt priority level. When interrupt requests of several levels are present, servicing of higher priority interrupts is performed first. However, priority encoding into the said 2 interrupt request lines is implemented with hardware external to the CPU, and we all know how popular this would be to Sinclair, so this hardware is missing.
The QL only uses level 2 interrupts, and the multiple sources of those are handled by simple hardware in the 8302 and some software that determines which interrupt is serviced first, if multiple interrupt requests are present. There is only a single interrupt line coming out of the 8302 ULA and it connects to /IPL1 on the CPU. When the 8302 pulls it low, a level 2 interrupt request is generated in the CPU.
While both interrupt request lines are present on the expansion bus, and are also each connected to a pin on the IPC, there is no realistic way to use any of the other interrupt levels properly. This is because there is no way to insert an interrupt level encoder externally or prevent the 8302 from pulling IPL1 low. If anything else attempts to pull IPL02 low in order to generate a level 5 interrupt, it might well happen that something else like the 8302 is pulling low /IPL1 resulting in a level 7, nonmaskable interrupt. This can be seen if the user presses CTRL-ALT-2, 5 or 7, which makes the IPC pull down IPL1 and/or IPL02 through its port pins, which will almost as a rule crash the machine, since there are no interrupt handling routines present, and level 7 is by nature something that needs a lot of special consideration to implement. So, this has to be sorted out. While it is a relatively simple thing to patch the ROM with default 'do nothing' code for interrupt levels, it would be highly desirable to implement a simple bit of logic to encode the interrupt level properly.
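The level 2 + level 5 = level 7 accident is easy to see if the two lines are modelled in a few lines of Python. The function names are invented for this illustration, and the model only assumes what is said above: IPL1 low on its own means level 2, IPL02 low on its own means level 5, both low means level 7. The `encoded` function is a guess at what 'a simple bit of logic' could do, not a description of any existing hardware.

```python
def level_seen_by_cpu(ipl1_low, ipl02_low):
    """Interrupt level the 68008 sees for a given state of its two lines."""
    level = 0
    if ipl1_low:
        level |= 0b010   # IPL1 contributes the middle bit
    if ipl02_low:
        level |= 0b101   # IPL0 and IPL2 share one pin on the 68008
    return level


def wired(requests):
    """QL as built: each source pulls its own line(s), nothing arbitrates."""
    ipl1 = any(r in (2, 7) for r in requests)
    ipl02 = any(r in (5, 7) for r in requests)
    return level_seen_by_cpu(ipl1, ipl02)


def encoded(requests):
    """With a priority encoder: only the highest pending level reaches the pins."""
    top = max(requests, default=0)
    return level_seen_by_cpu(top in (2, 7), top in (5, 7))


print(wired({2, 5}))     # 8302 and an expansion card asking at the same time
print(encoded({2, 5}))   # same requests through an encoder
```

With the encoder in place the level 2 request simply stays pending until the level 5 one has been serviced, instead of the two combining into a spurious level 7.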

4) Incomplete decoding fixes and workarounds on motherboards using the HAL chip. The HAL chip got rid of several TTL logic chips and implemented better decoding of the 8302, and fixed the /VPA+/DTACK problem mentioned above. That being said, the HAL chip also implements some unused logic that points to other things that may have been tried but ended up ultimately not working. Since a machine that expands the addressing capability of the original will use different decoding, the internal decoding of the 8301 and HAL should be replaced by a completely new decoder. One could argue that a LOT would have been fixed if the QL entered the market with a HAL fix to the problems with issue 5 boards and probably earlier ones that were there in the very beginning and were eventually recalled. A HAL chip from the start would also have relaxed the requirements of use of the /DSMCL line which is essential on anything that plugs into the expansion port except perhaps a backplane. Again, in the interest of 'simplifying' actual logic down to a single resistor, extra burden was imposed on the hardware of any expansion. The actual HAL chip contains an attempt to solve it but was never connected in that manner. So, if we are doing it anew, let's do it right. Along with FC0 and 1 being inputs to the master decoder logic, it also needs /DSMCL as an input and decode everything, not letting the 8301 do it. Also, just like on any board that has the HAL chip, the 8302 ULA is connected to the data bus of the CPU, and not the data bus of the 8301 ULA.

5) Serial ports and IPC peculiarities. This is a topic in itself, and is going to be discussed in the next post, along with the first instance of some design decisions that may limit original functionality, but enhance what is left, particularly with respect to serial port operation.

Posted: **Tue Sep 07, 2021 5:20 pm**

The I/O subsystem: 8302 ULA and IPC

These two actually work together from the standpoint of the user to provide all the various types of I/O on a bare QL. The interplay is especially obvious in the case of serial ports, where the 8302 ULA handles serial port transmit, and the IPC handles serial port receive, and then actually transmits the received data to the 8302 ULA where it is serially received and made available to the CPU, using a bit-bang serial protocol. So, as can be seen, there's a whole can of worms just there.

Let's start with the peculiarities of the IPC - but first let me say that I honestly would not bother with the original IPC and strongly suggest using a Hermes chip instead, especially now that it is public domain. There are a bunch of problems with the firmware in the original one which are enumerated and explained in the Hermes manual (I believe available on Dilwyn's page). That being said, there are some problems and oddities to the hardware connection as well:

5a) Serial port input cannot handle chattering ports. A few words here need definition in the context of the QL. First, a chattering port is one which sends data even when instructed not to by the relevant handshake signal(s). Usually this happens because the sending port is incorrectly configured or connected. Now, in the context of the QL: the QL has two serial ports, so, one would expect two serial port inputs - but in reality, there is only one serial input on the IPC and both serial input lines are simply AND-ed together and connected to it. The IPC relies on sending ports to correctly interpret handshake signals, so it can let them transmit in turn, to prevent them from both transmitting at the same time. The problem here is obvious: if we want to use both serial port inputs on the QL, they must have handshake enabled and be correctly connected so that it actually works. If not, sending data to a serial port input while another port is expected to send data will result in corrupted data. A particularly odd consequence of this hardware quirk is that if you set up a serial link say to ser1, you can connect the receive input pin on either port and this will still work. The reason is that there is no way for the IPC to 'mute' the input from the port that is not supposed to send, it only relies on the sending device to react to the handshake signal that tells it not to send.
To make matters worse, there is no way to really solve this problem with external hardware. At first glance, one could think, OK, when the handshake signal is in a state that means 'do not send', simply mute the corresponding serial input signal so it does not interfere. Unfortunately, the handshake signals are handled on a byte basis while the data is bit serial, so it is possible that the handshake goes to the 'do not send' condition while the current byte is still being sent (not all bits have been transmitted by the sender). The idea of the signal is 'do not send THE NEXT byte' so it is possible to say this (and in fact happens almost as a rule) while the current byte in a stream of bytes is still being transmitted. This is because the receiver knows when there is 'space' to receive one more byte, and when the start bit for that byte has been received, the receiver already knows there is no space for the next one. This also gives the sender an early warning.
The only proper solution would be to have actual two inputs for the ports on the IPC and have it internally select the input pin having the knowledge when it wants to receive from which port, OR, the handshake has to be set to 'do not send' just as the last bit of data (this includes parity) is received and before the stop bit(s) from the transmitter have 'cleared the line', or, in other words, as soon as possible during the stop bit, to let the transmitter detect the handshake signal in time and not transmit the next byte's start bit. In the latter case, an external multiplexer would cut out the data from the port we do not want to receive from any more. Unfortunately, I have not checked how Hermes behaves in this case, and I would not be surprised if it actually does this, or, possibly, the code could easily be changed so that it does.
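For anyone who wants to see what the AND-ed input does to the data, here is a very simplified Python model. It works on logic levels as seen at the IPC pin (idle = 1, start bit = 0), uses 8 data bits with no parity, and assumes the two senders start their frames at exactly the same moment, which real devices of course will not do. All names are made up for the example.

```python
IDLE = 1  # logic level of an idle line at the IPC input


def frame(byte):
    """Start bit (0), eight data bits LSB first, stop bit (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def and_lines(a, b):
    """Combine two bit streams the way the gate in front of the IPC does."""
    length = max(len(a), len(b))
    a = a + [IDLE] * (length - len(a))   # a silent port just idles high
    b = b + [IDLE] * (length - len(b))
    return [x & y for x, y in zip(a, b)]


def decode(bits):
    """Recover the data byte from one 10-bit frame."""
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


quiet = and_lines(frame(0x41), [])           # other port obeys the handshake
clash = and_lines(frame(0x55), frame(0x0F))  # other port chatters

print(hex(decode(quiet)))
print(hex(decode(clash)))
```

As long as the other port stays idle the byte gets through untouched, which is also why the receive pin of either port works for a single link. As soon as both talk, every 0 bit from one sender punches a hole into the data from the other.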

5b) Effective receive speed with two ports open can at most be half of that on a single port. This may not be obvious but directly follows from the way the IPC input is hooked up. Assuming Hermes, reception is possible at 19200 baud, but only from one port at a time, so in essence, the TOTAL bit rate for both ports is 19200, so when both are receiving, 9600 would be the absolute maximum input data rate. Unfortunately, it is even less than that because of the handshake protocol. When both ports are open, the speed will drop to less than half even if one port never sends anything. This is because the IPC has to open up handshake for a transmitter to POSSIBLY send something and wait for some time to see if the transmitter is going to send something, i.e. there has to be some time given for the transmitter to react to the handshake and start transmitting. So, the other port that has data to transmit has to be stopped while this is going on.
This is also one of the reasons why split baud rate on the two ports would be a problem. Even though in theory, at least for receive, it is possible, suppose you set one port to 19200 and the other to 1200 baud. Every time the IPC would have to look at the 1200 baud port it would have to give it enough time to send something, scaled to its slower speed - therefore a longer time. While the scaling is not completely linear, it is likely the wait would take so long that the effective receive rate from the 19200 baud port would drop down to speeds on the order of the 1200 baud port...

5c) Effective total transmit speed of all open ports can at most be 19200 baud. Here I will jump a bit to the 8302 ULA, because it handles serial transmit. Lo and behold, it actually uses the very same principle of multiplexing, but in reverse - it has only one instance of transmitting hardware, and selects one of two ports to send data to. Again, this is further restricted by handshake inputs, and again, when both ports are open, assuming the receiving hardware is capable of receiving full speed, the total transmitting speed when both ports are transmitting is that of one port at maximum speed. In other words, while data is coming out of the QL at 19200 baud, it is split between the number of open ports. Also, transmitting at different baud rates is possible IN THEORY (by switching the baud rate as the output is switched from port 1 to 2 and back) but the complication is not worth it, especially as the same slowdown effect is present - the slower speed port would HEAVILY hamper the effective speed of the faster port, as all the data bits on the slower port except the stop bit have to be transmitted before the fast port can start transmitting.

So here comes the first hardware 'cut' - if it is not obvious by now, the only way to get a decently functioning serial port is to implement only one and use only one. Then it can behave like a 'normal' port up to 19200 baud and even support split baud rates - if Hermes is used. This is because Hermes generates its own baud rate, while the original IPC uses an output signal from the 8302 called BAUDx4 as an input to one of its timers to generate the baud rate. This also means that the corresponding pin on the IPC (T1) and on the 8302 (BAUDx4) is not used any more.
There could be an exception to this but it would require a change to the IPC code, and that would be support for a serial mouse. Given how rare they are these days, it might be a moot point, but serial mice were often used on the QL. The problem is, they do not use handshake signals for their intended use, but to provide power to the mouse internal circuits. Once the mouse is initialized (by setting up the handshake signal to enable transmitting) it only transmits data and of course, it will corrupt the input data from the other serial port as per above. The mouse is also transmit only, and will use the constant high level provided by the transmitter on the other end as a power source (remember that serial port signals are inverse logic, active low = logic 1). It is also very slow, 1200 or 2400 baud. What this amounts to, could be an input only serial port on one of the IPC port pins that are left unused if only one 'full' serial port is implemented, since only the receive input is needed. That, however, requires changes to the IPC firmware. In fact, superHermes implemented one such dedicated mouse serial input port.

5d) One pin on the 8302 could have been saved (if Lau Reeves was doing the IPC programming...). As was mentioned above, the original IPC required a baud rate signal input, running at 4x the selected baud rate, and generated by the 8302. There are a number of questions one may have about this as the signal is fed into the input pin of one of the IPC's internal timers. There are two internal timers and one does not use its input. Then, there is the frequency of the IPC's clock, derived from an 11MHz crystal. This is actually a non-standard version, while the actual crystal frequency most often used on these chips was either 12 or 11.0592MHz. The first one is the maximum specified, and the second one, strange as it seems, is used because it is a standard value used for baud rate generation, is available everywhere and cheap. I suppose Sinclair must have found a huge lot of 11MHz crystals for real cheap because getting one through normal channels requires custom orders and of course means the price will be higher. Fortunately 11MHz (exactly) is only 0.54% off and likely not even detected by any serial port hardware. So, it is a double question considering how Sinclair was always about leveraging software to save on hardware and then minimizing parts costs further. Hermes does not use the BAUDx4 pin at all so it could have been used for something else on the 8302 ULA!
A (lengthy) aside: How does one get to 11.0592 being 'a baud rate generator frequency'? Well, it is simple and particularly pertinent to the IPC. The IPC uses 12 clock cycles per one machine cycle, the latter being the time needed to execute the simplest instruction, and more complex ones execute in multiples of this time. On the other hand, baud rates are usually derived by successively dividing a master clock by 2, for instance 19200, 9600, 4800, 2400, 1200, 600, 300, 150, 75. Because an asynchronous serial port as a rule has no clock signal (which is why it is asynchronous) the serial port hardware runs at a multiple of the selected baud rate so that it can check the input signal multiple times during one bit time to check that the baud rate is correct, synchronize its internal operation to the incoming signal, and suppress errors. For instance, the typical scheme waits until a transition is detected on the serial input line, assuming that is the start of a start bit. But then it samples the state of the line at a multiple of the assumed baud rate, say 16x, checks that say 3 successive samples are the same and the level detected is not a result of some interference, then figures that the 'middle' of the bit is going to fall around the 8th or 9th sample. Then it also checks that around the middle samples, say 6, 9 and 12 it sampled the same state to consider that the line really is at that state and that it is not seeing some sort of interference or a spurious or just plain wrong signal. In the end the rate of the reference clock from which the baud rate is derived, is always either some power of two times the baud rate (from which lower baud rates are generated), times two, or times three. This is usually based on the number of samples used to detect a bit, like 6 or 12, or 8 or 16. So, to get to the point, 11.0592 is 19200 x 64 x 9 or, a bit more clearly in this case, 19200 x 2 x 2 x 2 x 2 x 3 x 12.
19200 is the highest baud rate on the QL, and the IPC internally divides its crystal clock by 12 to get the machine clock. So we are left with 2 x 2 x 2 x 2 x 3, and depending how we want to oversample our serial input, we could write this as 16 x 3 for 16 times oversampling, or 12 x 4 for 12 times oversampling, etc... you get the idea. Also, the relatively high 11.0592M frequency is very flexible when it comes to baud rate generation because it breaks down into a rather high baud rate and still offers enough x2 or x3 factors for a good selection of oversampling rates, like 6, 9, 12, 16, 18. As an aside, 11.0592/3=3.686400MHz which is also a popular baud rate crystal frequency, but not as flexible.
Those of us who still remember modems, remember that we used to set up serial ports to 9600 and that was considered fast... but then came 14400, not 19200, though we then configured our ports to 19200 as the modem could do compression so in theory we could go faster than 14400 from the computer to the modem. Then it doubled to 38400 but not to 76800, rather 57600 (14400x4). Then 115200 and even 230400. All of these can be generated from 11.0592, by choosing how many times it is divided by 3 and 2 to get the oversampled maximum baud rate. Then all the lower baud rates would be derived from this by successively dividing by 2.
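The arithmetic in this aside is easy to check with a few lines of Python. The constant and function names are my own; the numbers are the ones quoted above (11.0592MHz, the divide by 12 inside the IPC, and the 11MHz crystal actually fitted to the QL).

```python
CRYSTAL_HZ = 11_059_200   # the 'standard' baud rate crystal
QL_CRYSTAL_HZ = 11_000_000
MACHINE_CYCLE_DIV = 12    # IPC clock cycles per machine cycle


def clocks_per_bit(baud, crystal_hz=CRYSTAL_HZ):
    """Crystal clocks in one bit time, or None if it is not a whole number."""
    quotient, remainder = divmod(crystal_hz, baud)
    return quotient if remainder == 0 else None


def error_percent(actual_hz, nominal_hz=CRYSTAL_HZ):
    """How far a crystal is from the nominal frequency, in percent."""
    return abs(actual_hz - nominal_hz) / nominal_hz * 100


for baud in (19200, 14400, 57600, 115200, 230400):
    print(baud, clocks_per_bit(baud))

# Machine cycles per bit at 19200 baud: the 2 x 2 x 2 x 2 x 3 left over.
print(clocks_per_bit(19200) // MACHINE_CYCLE_DIV)

# The 11MHz crystal does not divide evenly, but it is close enough.
print(clocks_per_bit(19200, QL_CRYSTAL_HZ))
print(round(error_percent(QL_CRYSTAL_HZ), 2))
```

Every rate in the list comes out as a whole number of crystal clocks per bit, while the 11MHz crystal does not divide evenly at 19200 and is roughly half a percent away from the nominal value.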

5e) Using an inverted signal on the keyboard row outputs on the IPC would have reduced parts count by 8 resistors and 8 diodes, and freed the required PCB real estate. This also would improve signal integrity somewhat, especially for joysticks. Given how stingy Sinclair was on parts count, one would not expect to see this in the design. Unfortunately, correcting this would require a minor change in the IPC firmware.
The QL keyboard is an 8x8 matrix, which is scanned by outputting a high level on the row outputs (port P1 on the IPC) and then checking if a key is pressed in that row that connects the row signal to one of the column lines. The column lines are pulled low to give them a low (logical 0) default, so when a key is pressed, the logic one on the row line that the key switch connects to the column line will over-ride the default low and read as high, which is how the IPC detects a key. This is repeated for all rows periodically and an internal map is maintained to translate key presses and releases into key codes, that are then relayed to the OS via serial communication to the 8302 ULA. This is more or less the usual way it is done, but for one oddity. The port pins of the IPC are open collector, i.e. they can only pull the level on the pin low. In order to get a high level, an external pull-up resistor has to be added, that pulls to the +5V power supply. Also, in a keyboard matrix application, diodes have to be put in line with the row line to prevent back-drive of an inactive row line through paths created by simultaneously pressed keys.
Given how the IPC port operates, the most logical setup would be using 'negative logic', rather than a default low, a default HIGH would be established on the column lines (using the exact same resistors but connecting them as pull up rather than down), so that the row line could then be activated by pulling it low. Since it can ONLY be pulled low, there is no problem with back-drive - an inactive row line is high impedance so it would not care if it is back-driven, to any level. In contrast, using a classic TTL level output pin to drive a row would in this case make the row pin source voltage close to the power supply (+5V) so connecting that through the matrix with a row that is active and driven low, i.e. to ground, would be a form of a short circuit, hence diodes have to be used to prevent that. From the standpoint of the IPC software, implementing the keyboard using negative logic is a negligible change, the key scan routine would use low as active for a row, high (which results in a high impedance port pin) for inactive, and the input from the columns is simply inverted using a single instruction to account for that change. As it is now, in order to provide a high level on the row line, there has to be a pull-up resistor on it and to prevent back-drive there also has to be a diode in line with it on the way to the matrix, plus the pull up has to be 'more powerful' than the default pull down on the column lines so that it can over-ride the pull down sufficiently through the diode so that a proper high level is still detected by the IPC when a key is pressed. Taking everything into account, the keyboard then operates with about 2/3 of the available voltage, with limited drive capability due to resistors and diodes effectively being in line with the row signals, and as stated requires 16 extra parts.

5f) Unprotected joystick ports that may also confuse the keyboard scanning routine especially if both are used at the same time. One could say that the extra resistors and diodes mentioned in 5e) above can be viewed as a form of protection from static discharge and EMI when a joystick is used, given that it can be plugged in or unplugged at any time, probably has a rather long and unshielded lead, and basically connects directly to the keyboard membrane. Alas, I doubt that. As mentioned above, the signals from the keyboard matrix are rather sensitive. Also, powered joysticks could also not be used as they do not support reading as a matrix, so QL compatible joysticks are expected to just have pure switches.
There is a more problematic thing, though. With matrix keyboards, one has to be careful to prevent spillover of the signal from one part of the matrix to another through multiple pressed keys. The standard telltale of this problem is that if you press 3 out of 4 keys forming the corners of a rectangle on the 8x8 key grid, the 4th corner key that is not pressed will be detected as if it was pressed. In order to completely prevent this sort of thing, each key switch should have its own series diode to prevent 'back-drive' through the matrix (which explains why some keyboard switch manufacturers offer versions of their keys with a built-in in-line diode!). This problem is rarely tackled that thoroughly as the VAST majority of keys are not intended to be and usually are not pressed at the same time with any other keys other than SHIFT, CTRL, ALT. Excepting those three in all combinations, multiple key presses mostly happen as one key is released and another pressed, so it's extremely rare to encounter the 3 corner key problem. Unless keys can map to a joystick or even worse, to two joysticks with common columns, because a joystick CAN press two keys at once which otherwise make no sense being pressed together (diagonal movement!). This seems to have been forgotten and the joystick pins connect to the matrix at the wrong place, the common row pins should connect to the actual IPC port pins (not after the diodes) and at least the two column pins that are common to both joysticks should have their own diodes, like SHIFT, CTRL and ALT have. Which is where we can use 4 of the 8 diodes saved in the previous point.

Better still, CMOS switches or even optocouplers could have been used to isolate the keyboard matrix completely from the joystick lines, which would have nicely prevented static discharge and enabled the use of active joysticks and possibly other interesting things. Admittedly, that would cost something but if we were building it RIGHT that cost is really not that high - and it could be optional as not everyone even wants a joystick port.
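The '3 corners of a rectangle' effect described in 5f) can be reproduced with a small idealized model in Python. It treats every pressed key as a plain wire between its row and its column, with no resistors and no row diodes, so it is not a model of the actual QL circuit, only of the general matrix problem. The `per_key_diodes` switch and all names are made up for the example.

```python
def scan(pressed, rows=8, per_key_diodes=False):
    """Return the set of (row, column) keys a scan would report as pressed."""
    detected = set()
    for row in range(rows):
        live_rows, live_cols = {row}, set()
        changed = True
        while changed:              # let the signal spread until it settles
            changed = False
            for key_row, key_col in pressed:
                if key_row in live_rows and key_col not in live_cols:
                    live_cols.add(key_col)
                    changed = True
                # Without a diode in each key the column drives the row back.
                if (not per_key_diodes and key_col in live_cols
                        and key_row not in live_rows):
                    live_rows.add(key_row)
                    changed = True
        detected |= {(row, col) for col in live_cols}
    return detected


three_corners = {(0, 0), (0, 1), (1, 0)}
print(sorted(scan(three_corners)))
print(sorted(scan(three_corners, per_key_diodes=True)))
```

Without the diodes the scan also reports key (1, 1), which nobody pressed. With a diode in series with every key only the three real keys come back, which is the same isolation a diagonal on a joystick needs.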

5g) Serial port inputs can be combined by wired and-ing the receiver chip outputs, no extra logic required. Again a strange missed optimization and savings opportunity. Given that the two serial port receive lines are combined as explained in 5a), extra logic was needlessly used to accomplish that. The actual serial port receive chip outputs are pull down devices with an internal pull-up resistor. Just connecting the two outputs of each receive input together will provide the required AND function with no extra parts, without violating any parameters of the chip, in fact there is an official application note about that and I have used it for Aurora, where it works very nicely. Someone was not paying attention? It would save 2 out of 4 gates of a TTL chip on issue 5, or 2 input and 1 output pin on the HAL chip - plus a number of tracks on the PCB.

Posted: **Wed Sep 08, 2021 12:20 pm**

Nasta wrote:The I/O subsystem: 8302 ULA and IPC
...

I remember having trouble using Albin Hessler's serial mouse together with a printer on my QL issue 6.
His suggestion was to cut the connection between pins 6 of Ser1 & Ser2 and add an extra 680 Ohm (R10) from pin 6 of Ser2 to the 12V line. This prevented a moving mouse from messing up the printing process.

BSJR

Posted: **Wed Sep 08, 2021 11:11 pm**

Using the IPC for serial receive was a huge improvement over the Spectrum's Interface 1 (which simply relied on bit-banging by the main processor) and even over traditional UARTs in use at the time, which only could buffer a single byte until the 16550 came along. Unfortunately, the flawed implementation made it almost useless for modems - serial input had lots of single-bit errors (a fault in the sampling technique?) and a receiver overrun caused input to be 'delayed' by several characters, which could only be cured by a hard reset. There were products that could 'rectify' these problems somewhat (the Tandata Qconnect or Miracle's Modaptor) but the definitive solution was indeed replacing the IPC with Hermes. Even then I remember having problems with devices that did not respond immediately to a DTR going low - the 16550 being one of these, it apparently kept sending until its 16-byte buffer was empty.
