Faster/wider CPU...

Dave
SandySuperQDave
Posts: 2765
Joined: Sat Jan 22, 2011 6:52 am
Location: Austin, TX

Re: Faster/wider CPU...

Post by Dave »

Stupid question...

Why can't I provision 16-bit memory for the video memory etc and just have the video take the bottom 8 bits? The top 8 bits would be invisible, no?

Alternatively, the video memory could be read out by a new 16-bit video system that gave VGA-compatible or better output? Flash would be lost, maybe, but who cares?

Sorry for being a smooth-brain! ;)


Nasta
Gold Card
Posts: 443
Joined: Sun Feb 12, 2012 2:02 am
Location: Zapresic, Croatia

Re: Faster/wider CPU...

Post by Nasta »

Dave wrote: Stupid question...

Why can't I provision 16-bit memory for the video memory etc and just have the video take the bottom 8 bits? The top 8 bits would be invisible, no?

Alternatively, the video memory could be read out by a new 16-bit video system that gave VGA-compatible or better output? Flash would be lost, maybe, but who cares?

Sorry for being a smooth-brain! ;)
The ULA expects the contents of the screen to reside contiguously inside the screen area, and assumes byte addresses. The CPU in your case, in turn, assumes word-wide memory. If you connect the ULA to only one half of the data bus (either the top or the bottom 8 bits), it will display only the even or the odd bytes within a 64k (note, NOT 32k) area: the even or odd bytes of SCR0 in the top half of the picture, then the even or odd bytes of SCR1 in the bottom half, depending on which byte lane (upper or lower) you connect it to. So, you COULD connect it that way, but then all your screen drawing / printing routines would have to be modified to take this into account. Not that it could not be done, but a display format where every other byte is unused is extremely cumbersome to work with, never mind the loss of memory space.
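
To make the byte-lane point concrete, here is a small Python sketch. It is only a toy model under my own assumptions (the function name and the "lane" numbering are made up for illustration): the ULA puts a byte offset on the address lines, the 16-bit RAM treats it as a word address, and the ULA sees just one byte of that word.

```python
# Toy model (an assumption, not taken from any ZX8301 documentation):
# the ULA puts byte offset `ula_offset` on the RAM address lines, but the
# RAM is now 16 bits wide, so that offset selects a whole word.

SCREEN_BYTES = 0x8000  # 32k as seen by the ULA


def cpu_offset_seen_by_ula(ula_offset, lane):
    """CPU byte offset (from the start of screen RAM) the ULA really fetches.

    lane 0 = upper data byte (even CPU addresses on a 68000),
    lane 1 = lower data byte (odd CPU addresses).
    """
    return 2 * ula_offset + lane


for lane, name in ((0, "upper"), (1, "lower")):
    first = cpu_offset_seen_by_ula(0, lane)
    half = cpu_offset_seen_by_ula(SCREEN_BYTES // 2, lane)
    last = cpu_offset_seen_by_ula(SCREEN_BYTES - 1, lane)
    print(f"{name} lane: first={first:05X}h, "
          f"mid-picture={half:05X}h, last={last:05X}h")
```

The mid-picture offset lands at 8000h, i.e. the start of SCR1, and the last offset is just short of 64k, which is the "64k, NOT 32k" remark above.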

Let's examine the inner workings of the ZX8301 a bit further, to see if there is a way around this.

It has one RAS line to the RAM and two CAS lines. CAS can also be viewed as a chip select of sorts.
The ZX8301 never uses CAS1 for screen access (which IMHO was really stupid as it would have enabled 4 screen areas to be used with VERY little extra logic). It also has an enable line which is used to switch-off the CPU from the RAM while the screen memory contents are being transferred to the ZX8301 and to the screen.

The RAM itself is an array organized as 256x256 bytes (one 1-bit chip is used for each bit of the byte), adding up to 64k. Two such arrays are implemented with 16 chips. One is connected to CAS0; it normally resides at address 20000h and holds the two screen areas. The other array is connected to CAS1 and resides at address 30000h.

The RAM itself has only 8 address lines so the RAS and CAS lines are used to latch the upper and lower byte of the address.
As with nearly all dynamic RAM, the internal organization of the chips used is 'only' 256 words of 256 bits each. The row address (which is signalled to the RAM by the RAS line), once latched, actually completes the reading of data within the RAM. The column address, which is signalled to the RAM by the CAS line, then only selects one out of 256 bits within this long word, and puts it on the data out pin of the RAM. Because the whole 256-bit word, once read by RAS, remains in a local 'buffer' within the RAM chip, it is possible to get to other bits of it MUCH faster than reading a new 256-bit word. This sort of access is called 'page mode' access, and the 256-bit word is normally called a RAM page.
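
A rough Python sketch of the RAS/CAS idea, for anyone who prefers code to prose. The class and its method names are invented for illustration; it models a whole 256x256 byte array rather than individual 1-bit chips, and it ignores all timing.

```python
class PageModeDram:
    """Toy 64k x 8 array: 256 rows of 256 columns (hypothetical model)."""

    def __init__(self):
        self.cells = [[0] * 256 for _ in range(256)]
        self.open_row = None   # the 'buffer' holding the row selected by RAS
        self.ras_count = 0

    def ras(self, row):
        # Latching the row address makes the whole row (page) available.
        self.open_row = self.cells[row & 0xFF]
        self.ras_count += 1

    def cas(self, col):
        # The column address only picks one entry out of the open page.
        return self.open_row[col & 0xFF]


def split_address(addr):
    """Split a 16-bit address into (row, column) bytes."""
    return (addr >> 8) & 0xFF, addr & 0xFF


ram = PageModeDram()
for a in range(4):
    row, col = split_address(a)
    ram.cells[row][col] = 0x10 + a

ram.ras(0x00)                            # one row access...
burst = [ram.cas(c) for c in range(4)]   # ...then four quick column accesses
print([f"{b:02X}h" for b in burst], "RAS strobes:", ram.ras_count)
```

Four bytes come out of a single row access, which is the whole point of page mode.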

Why all this?
Well, the ZX8301 uses page mode to read 4 consecutive bytes out of the RAM when it reads display data. It buffers the 4 bytes internally and assembles RGB pixels from this buffer, and while doing so, lets the CPU have RAM access, until it needs to fill the buffer again. The reason this is done is to shorten the time needed to read the data: this kind of access takes about half the time that reading each byte the usual way would.

Incidentally, this is one reason why it is difficult to adapt the ZX8301 to a 16-bit wide bus. On a 16-bit bus, it would take only two accesses to get 4 bytes, not 4. Of course, the ZX8301 assumes an 8-bit bus so there is no way to stop it from doing its 4 accesses. But there is a way we could keep them as 4 accesses, two to each of two consecutive word addresses, although it requires external logic.

Normally the ZX8301 starts at the lowest address and it generates its screen accesses something like this:
1st 4 bytes, address 20000h..20003h (note that only the least significant 16 bits are transferred to the RAM, in this example 0000h..0003h; the upper bits are used to decode what we are accessing):
00h -> RAS
00h -> CAS0, 01h -> CAS0, 02h -> CAS0, 03h -> CAS0

2nd 4 bytes, address 20004h..20007h:
00h -> RAS
04h -> CAS0, 05h -> CAS0, 06h -> CAS0, 07h -> CAS0

etc, until the last 4 bytes, address 27FFCh..27FFFh
7Fh -> RAS
FCh -> CAS0, FDh -> CAS0, FEh -> CAS0, FFh -> CAS0
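
The same listing can be generated with a short Python sketch. The function name is my own, and it assumes exactly what the listing shows: the row is the upper byte of the 16-bit offset, the column is the lower byte, and each burst is 4 consecutive bytes.

```python
def ula_burst_8bit(byte_addr):
    """Return (row, [four columns]) for a 4-byte burst on the 8-bit bus."""
    offset = byte_addr & 0xFFFF          # only the low 16 bits reach the RAM
    row = offset >> 8
    cols = [(offset + i) & 0xFF for i in range(4)]
    return row, cols


for start in (0x20000, 0x20004, 0x27FFC):
    row, cols = ula_burst_8bit(start)
    print(f"{start:05X}h: {row:02X}h -> RAS; "
          + ", ".join(f"{c:02X}h -> CAS0" for c in cols))
```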

Now, in a 16-bit system, the RAM addresses address 16-bit words, not bytes. Hence, 64k addresses address 64k words, or 128k bytes. This means that an 8-bit chip such as the ZX8301 needs all its addresses divided by 2, and the least significant bit of the address must be used to select the low or high byte. We need to make the ZX8301 look like this, to the RAM:

1st 4 bytes, address 20000h..20003h (note that bits 15 to 1 of the address are transferred to the RAM, and bit 0 is used as a byte select within a word):
00h -> RAS
00h -> CAS0:H, 00h -> CAS0:L, 01h -> CAS0:H, 01h -> CAS0:L

2nd 4 bytes, address 20004h..20007h:
00h -> RAS
02h -> CAS0:H, 02h -> CAS0:L, 03h -> CAS0:H, 03h -> CAS0:L

etc, until the last 4 bytes, address 27FFCh..27FFFh
3Fh -> RAS
FEh -> CAS0:H, FEh -> CAS0:L, FFh -> CAS0:H, FFh -> CAS0:L

Where CAS0:H means the high byte is passed to the ZX8301 data bus, and CAS0:L means the low byte is passed to the ZX8301 data bus.
Some examination shows that what actually happens is that the address lines of the ZX8301 are shifted one down, so A7 connects to A6 on the RAM, A6 connects to A5 on the RAM etc, down to A1 connecting to A0 on the RAM.
But, what about A0? Well, this is where the problem lies. A0 has to be latched by RAS in order to use it as A7 when the CAS signals appear. Also, A0 during CAS must be used to select which of the bytes out of a word (upper or lower) should be sent to the ZX8301.
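
Here is the shifted wiring as a Python sketch, so the A0 juggling can be checked against the listing above. The function names are hypothetical, and I am assuming the usual 68000 convention that the even byte sits on the upper half of the data bus (the "H" in the listing).

```python
def remap_pins(zx_row, zx_col):
    """Map the row/column bytes the ZX8301 emits onto a 16-bit wide array.

    Returns (ram_row, ram_col, lane); lane is "H" for the upper byte
    (even address) and "L" for the lower byte (odd address).
    """
    ram_row = zx_row >> 1                           # row lines shifted one down
    ram_col = (zx_col >> 1) | ((zx_row & 1) << 7)   # latched row A0 becomes column A7
    lane = "L" if zx_col & 1 else "H"               # column A0 picks the byte
    return ram_row, ram_col, lane


def burst_16bit(byte_addr):
    """The four accesses of one ZX8301 burst, as the 16-bit RAM would see them."""
    offset = byte_addr & 0xFFFF
    return [remap_pins((offset + i) >> 8, (offset + i) & 0xFF) for i in range(4)]


for start in (0x20000, 0x20004, 0x27FFC):
    accesses = burst_16bit(start)
    print(f"{start:05X}h: {accesses[0][0]:02X}h -> RAS; "
          + ", ".join(f"{c:02X}h -> CAS0:{lane}" for _, c, lane in accesses))
```

The row bit that falls off the bottom during RAS is the one that has to be held by an external latch until CAS time; in the sketch that is the `(zx_row & 1) << 7` term.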

All of this pertains ONLY to the ZX8301 accesses to the screen RAM. Everything else one can leave as needed. Still, there are two problems that need to be addressed:
1) When the ZX8301 is not accessing the screen RAM, its internal control register (the one holding the 3 display control bits) is accessed when the proper address appears. The connection to the ZX8301 should be such that the register appears at the proper address expected by the OS.
2) RAM refresh - I am not entirely certain if the ZX8301 generates refresh cycles as such, I've never investigated this. The screen refresh process is theoretically not enough to provide refresh, since this requires all rows of the RAM to be read at least once every 4ms, and the screen refresh takes 20ms and only reads half the rows. In the above modification, the row address is shifted down, so whatever row address sequence the ZX8301 uses, only half as many rows would be touched. I'm not sure if or how this would reflect on proper RAM refresh. The RAM also has a refresh mode where it counts its rows itself; if the ZX8301 uses this method to refresh the RAM, the number of refresh cycles stays the same so refresh is guaranteed. The fact that tweaks to the logic around the ZX8301 were used to produce internal 512k RAM expansions using larger chips (which also have more rows to refresh!) does suggest that it might indeed be so, and that refresh might not be an issue.
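
The row-coverage part of point 2 can be counted with a few lines of Python. This is only a sketch under the assumptions already used in the listings above (a 32k screen scan starting at offset 0000h, in 4-byte bursts, with the row taken from the upper address byte). It says nothing about whether the ZX8301 issues dedicated refresh cycles.

```python
ROWS_IN_ARRAY = 256
SCREEN_BYTES = 0x8000   # one 32k screen scan
BURST = 4               # bytes fetched per row access

# Rows touched by the original 8-bit scan: row = upper byte of the offset.
rows_8bit = {(off >> 8) & 0xFF for off in range(0, SCREEN_BYTES, BURST)}

# Rows touched after the address lines are shifted one down (word addressing).
rows_16bit = {((off >> 1) >> 8) & 0xFF for off in range(0, SCREEN_BYTES, BURST)}

print(f"8-bit scan touches  {len(rows_8bit)} of {ROWS_IN_ARRAY} rows")
print(f"16-bit scan touches {len(rows_16bit)} of {ROWS_IN_ARRAY} rows")
```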

All that being said, it's a big question whether it's even a good idea to use the ZX8301 since they are getting scarcer by the day, and are not the pinnacle of reliability to begin with.

Some additional data on how the GC and SGC do this:
Both only write to the QL's motherboard RAM, specifically to the screen area(s), for the benefit of the ZX8301. At the same time, the same exact data is written to the internal GC/SGC memory at the same address. Data is always read from the internal GC/SGC memory. The reason is of course speed - the GC RAM is at least 4 times as fast as QL motherboard RAM at its fastest (and nearly 3x on top of that on the SGC). This method of access is called shadowing; in general it means two or more copies of the same memory space exist in various physical memory chips, for various purposes.
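
Shadowing in a nutshell, as a Python sketch. The class and attribute names are invented for illustration and this is not how the GC/SGC logic is actually implemented in hardware; it only shows the access pattern described above: writes go to both copies, reads come from the fast copy only.

```python
class ShadowedScreen:
    """Hypothetical model of a shadowed screen area."""

    def __init__(self, size):
        self.fast = bytearray(size)   # RAM on the expansion card
        self.slow = bytearray(size)   # motherboard RAM, read by the ZX8301
        self.slow_accesses = 0        # how often the slow bus was used

    def write(self, offset, value):
        self.fast[offset] = value
        self.slow[offset] = value     # keep the ULA's copy up to date
        self.slow_accesses += 1

    def read(self, offset):
        return self.fast[offset]      # never touches the slow RAM


screen = ShadowedScreen(0x8000)
screen.write(0x0005, 0xAA)
value = screen.read(0x0005)
print(f"read back {value:02X}h, slow bus accesses: {screen.slow_accesses}")
```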

The sadly never realized design of the GoldFire added one more level of trickery, a write buffer, which enabled the CPU to write up to one long word to its IO interface, and go about its business while the actual data was transferred to the external bus. It would only have to wait if it needed to write something else before the current write was completed.
Aurora, on the other hand, used dual-port RAM for the screen (in all modes) in order to speed up access even with a standard 68008. This scheme only prevented the CPU from immediately accessing the RAM 5% of the time worst case, whereas with the ZX8301 it approaches 50%.
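
For the write buffer idea mentioned for the GoldFire, a purely illustrative Python sketch of a one-entry posted write buffer follows. The class name, the abstract time units and the bus time of 10 are all made-up assumptions; since the design was never built, no real timings are implied.

```python
class PostedWriteBuffer:
    """Hypothetical one-entry write buffer in front of a slow bus."""

    def __init__(self, bus_time):
        self.bus_time = bus_time   # time units the slow bus needs per write
        self.busy_until = 0        # when the pending write will have drained

    def write(self, now):
        """Post a write at time `now`; return when the CPU may carry on."""
        start = max(now, self.busy_until)   # wait only if still draining
        self.busy_until = start + self.bus_time
        return start


buf = PostedWriteBuffer(bus_time=10)
for t in (0, 3, 50):
    resume = buf.write(t)
    print(f"write issued at t={t}: CPU continues at t={resume}, "
          f"stalled for {resume - t}")
```

The CPU only stalls on the second write, which arrives before the first one has finished on the slow bus.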


Dave
SandySuperQDave
Posts: 2765
Joined: Sat Jan 22, 2011 6:52 am
Location: Austin, TX

Re: Faster/wider CPU...

Post by Dave »

The nice thing about cool tricks is, well, ummm...

If you're at Nasta's skill level, the tricks are easy, and it's just a matter of making a plan, a design, testing and implementing and testing, going through a few iterations at moderate but limited expense until you have something that works as intended.

At my skill and budget level, it's a different matter. I have not kept current, know little of modern components beyond their basic principles, and am happiest with 74-series logic. I can do PCB design to a higher level due to radio work I have done in the past. I successfully cloned Nasta's QubIDE to make a CF card fast storage device (neat, but lost now) but Aurora-level designs are beyond my sole capabilities.

The point of this thread is to produce a least cost QL clone with moderately better performance at least development cost. This includes dropping anything tricky or needless like microdrive support, etc. The point of this thread is to draw together like-minded people to throw together a "least effort" effort ;)


nichtsnutz
ROM Dongle
Posts: 24
Joined: Wed Apr 13, 2011 6:33 pm

Re: Faster/wider CPU...

Post by nichtsnutz »

Hello Dave,

as you would like to use the 68EC000, a first minimal expansion you could build could use it in 8-bit mode but with the clock doubled, together with a 512KB SRAM for the address range $40000 to $BFFFF. So to speak, you would have a 68008, but running at 15MHz when accessing the expansion memory.
Controlling the speed would be done via DTACK. When accessing the onboard memory you would use the DTACK generated by the ZX8301 ULA. When accessing the expanded memory you would have to disable the ULAs by driving DSMCL high, and would generate a fast DTACK based on the 15MHz clock.
Things you would need:
1) 68EC000 in 8-bit mode via the MODE pin.
2) 512KB SRAM chip.
3) A PLL to double the 7.5MHz clock of the expansion port. This is a bit tricky. Maybe use an ICS501 or an ICS9173B.
4) Some address decoding and control logic. I think this could be made with some standard TTL (gates, counter).
5) New DTACK generation. The CPU would always run at 15MHz; DTACK is doing the magic!

I think this would be the simplest thing possible to do, although the details can of course cause some headaches!
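
As a rough illustration of the address decoding in points 4) and 5), here is a Python sketch of the decision the logic would have to make on every bus cycle. The names are hypothetical and the real thing would be a handful of gates, not software; only the $40000 to $BFFFF range is taken from the description above.

```python
from collections import namedtuple

FAST_BASE = 0x40000   # start of the 512KB expansion SRAM
FAST_TOP = 0xBFFFF    # last byte of the expansion SRAM

BusDecision = namedtuple("BusDecision", "disable_ulas dtack_source")


def decode(addr):
    """Decide who answers a CPU access (hypothetical decode logic)."""
    in_fast = FAST_BASE <= addr <= FAST_TOP
    return BusDecision(
        disable_ulas=in_fast,                      # keep the ULAs off the cycle
        dtack_source="fast" if in_fast else "ula"  # 15MHz DTACK vs. ULA DTACK
    )


for addr in (0x20000, 0x3FFFF, 0x40000, 0xBFFFF, 0xC0000):
    d = decode(addr)
    print(f"${addr:05X}: disable ULAs={d.disable_ulas}, DTACK from {d.dtack_source}")
```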

I also have a 68SEC000 as a spare part and will think about doing this, but at the moment I am doing other measurements and have some other smaller hardware tinkering running to assist me in measuring.

If you need, I can post measurements of the CPU access to the ZX8301 ULA, because at the moment Daniele (the author of Q-emulator) and I are doing some ULA timing research to get Q-emulator as close as possible to the real hardware. If you need some timing support to get your project running I would like to help as I can.

I have attached a timing diagram where you can see a CPU access that is stalled by the ULA and also the page mode access for the 4 bytes, as the user Nasta already explained.


Many Greetings,
Vassilis
[Attachment: cpu - ula timing]


Dave
SandySuperQDave
Posts: 2765
Joined: Sat Jan 22, 2011 6:52 am
Location: Austin, TX

Re: Faster/wider CPU...

Post by Dave »

I think, sadly, this is the comment that broke the camel's back, and I am just going to quietly bow out of the thread in the safe knowledge that no matter what you do, someone will tell you you're wrong.


vanpeebles
Commissario Pebbli
Posts: 2815
Joined: Sat Nov 20, 2010 7:13 pm
Location: North East UK

Re: Faster/wider CPU...

Post by vanpeebles »

Maybe you should start a thread with your ideal setup and leave Dave with his setup which works for him? :)


MemoryLaneComputing
ROM Dongle
Posts: 9
Joined: Sat Mar 19, 2011 10:51 pm

Re: Faster/wider CPU...

Post by MemoryLaneComputing »

Brane2,

No, you're not missing anything, except a little bit of courtesy and restraint.

Why not just cool it down a bit? Everyone can see that you are "The Man". This is an interesting thread but you are really making it uncomfortable to read with the way in which you respond to people. There is absolutely no need to be so argumentative.


Adrian


MemoryLaneComputing
ROM Dongle
Posts: 9
Joined: Sat Mar 19, 2011 10:51 pm

Re: Faster/wider CPU...

Post by MemoryLaneComputing »

I rest my case.


vanpeebles
Commissario Pebbli
Posts: 2815
Joined: Sat Nov 20, 2010 7:13 pm
Location: North East UK

Re: Faster/wider CPU...

Post by vanpeebles »

Brane, I think you really need to take a look at your posts and how you react to other posters. This is a friendly, supportive community with a very easy going nature. Any QL project will be supported regardless of how it gets made or achieves its goals.

Dave has his way of working and uses the tools/methods that are right for him, as we all do.

The forum has been going great lately so I don't want to get heavy handed.


vanpeebles
Commissario Pebbli
Posts: 2815
Joined: Sat Nov 20, 2010 7:13 pm
Location: North East UK

Re: Faster/wider CPU...

Post by vanpeebles »

No problem 8-)

