SGC successor brainstorming

Peter
QL Wafer Drive
Posts: 1953
Joined: Sat Jan 22, 2011 8:47 am

Re: SGC successor brainstorming

Post by Peter »

stephen_usher wrote:So, replacing the video completely should mean that there's no need to directly access the QL memory at all.
Not memory as storage, but memory mapped I/O. The QL's address and data bus would still be there, with access to the internal QL register area, so keyboard, speaker, network and microdrives can still be used (if software timings are re-adjusted).

Peter


stephen_usher
Gold Card
Posts: 429
Joined: Tue Mar 11, 2014 8:00 pm
Location: Oxford, UK.

Re: SGC successor brainstorming

Post by stephen_usher »

Peter wrote:
stephen_usher wrote:So, replacing the video completely should mean that there's no need to directly access the QL memory at all.
Not memory as storage, but memory mapped I/O. The QL's address and data bus would still be there, with access to the internal QL register area, so keyboard, speaker, network and microdrives can still be used (if software timings are re-adjusted).

Peter
Why not put these behind an I/O processor which acts as a bridge between the QL memory/data buses and the new system, fully decoupling them? The QL's 68008 could be used to actually run the real hardware, passing the information to the I/O bridge which the new system accesses.


Nasta
Gold Card
Posts: 443
Joined: Sun Feb 12, 2012 2:02 am
Location: Zapresic, Croatia

Re: SGC successor brainstorming

Post by Nasta »

Peter wrote:A few answers on Nasta's nice long post. Wish I had the time for more.
Nasta wrote:Currently 1a + 2a is covered with the Tetroid SGC clone. The problem here is the Ingot CPLD which is becoming impossible to find and to a lesser extent the floppy controller chip.
Even under the assumption CPLDs can be sourced, I'm not sure it is covered, lacking decent video. This is mainly why I opened the topic. And I would close it, if Tetroid or you added video.
This requires a complete re-design, so no cloning. What I meant is that clone SGCs can still use Aurora, and there are some that can be made available, the non-existence of SGCs being the bottleneck.
Adding anything over Aurora graphics requires a thorough redesign which by necessity of fast access locates the graphics hardware close to the CPU, i.e. on the SGC successor.
Nasta wrote:Also, 1b + 2b is covered by the Q68 which is its own single board computer and has no ties to old hardware except virtually
Maybe the term "virtual" sounds like "emulated" for some readers. Let me clarify: The Q68 implements QL components in hardware. Like SGC+Aurora, just with different features, and CPU inside programmable logic.
Yes, it was a clumsy word, I am sorry for that. What I meant was more along the lines of 'in concept', as you say, different hardware that implements the same function.
Nasta wrote:In fact, given that the 68EC030@40M tends to run just fine at 50M and is actually cheaper than 68020 variants, AND it supports dynamic bus sizing which lets it easily use various bus widths concurrently in the system (extremely useful if you are doing a SGC replacement!), AND it can also be interfaced to burst-capable RAM such as SDRAM, it would be THE chip to use for this purpose, without breaking the bank. Given proper support for external hardware, this option actually runs significantly faster than the current Q68.
Agreed, but only due to DRAM waitstates. If we compare for fast SRAM (thereby looking only at CPU speed) the Q68 executes many instructions in a single clock cycle, including multiplication. Even if a 16-bit databus is kept for the Q68, I doubt the 68030 still beats the Q68 at this core discipline. I know this is hard to believe, but I put much effort in getting the CPU to run that fast inside a lowest-speedgrade decade-old FPGA. If someone had time, 68030 reference hardware, and benchmarks that fit the Q68 SRAM area, it can already be checked on the current Q68.
Oh, I do not find this hard to believe at all - while I don't know how exactly the Q68 is implemented, I know what standards I would attempt to hold myself to when doing such a project and I am sure you would hold yourself to same or better :)
I am also aware that a narrower bus and memory wait states may well be the only thing that would make the 68EC030 implementation faster, and that's also when the memory controller for the EC030 is really well implemented (e.g. low latency SDRAM at x2 CPU clock frequency, with proper burst access).
To be perfectly honest, your implementation of the 68k core is worth further developing, perhaps into a FPGA implementation of some sort of a real 68k CPU 'chip'. Aside from caching, we really don't use the extra features of the 68020/30, except implicitly (e.g. no one would be against single cycle rotations and shifts as well as improved speed short loops). As far as I know, even the Q40/60 basically uses the CPU as a very fast fully 32-bit addressed 68000, under any QL OS.
To reduce Q68 DRAM waitstates, I work on cache. My first implementation has only 2 KB, but there is a tag entry for every single word, I did not split it into lines. This should be very appropriate for QL-style code. (Currently I'm struggling with a bug, and I fear it is inside the synthesis tools.) Hard to predict if a 40 MHz Q68 with cache beats a 50 MHz 68030 with 32-bitwide DRAM, but the chance has become realistic. After all, 2048 bytes cache is more than 2x256 bytes, and 1 clock cycle per basic instruction is less than 2.
...and it takes a hell of a clever memory controller to get the 030 to work at the shortest possible cycle which does get that performance even with cache.
Given that we know that even the 'tiny' 256-byte caches in the 020/030 rather significantly improve the speed of QL program code, 2k should be plenty to get the performance to virtually 0 wait state operation.
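
For readers who want to picture the cache organisation Peter describes (2 KB, one tag entry per 16-bit word, not split into lines, write-through), here is a minimal Python sketch. The direct-mapped lookup, the class name and the field names are assumptions made purely for illustration; nothing in the thread says how the real Q68 cache is indexed.

```python
class WordCache:
    """Toy direct-mapped cache with one tag per 16-bit word (assumed layout)."""

    def __init__(self, size_bytes=2048):
        self.entries = size_bytes // 2          # 1024 word slots for 2 KB
        self.tags = [None] * self.entries       # one tag per word, no lines
        self.data = [0] * self.entries
        self.hits = 0
        self.misses = 0

    def _slot(self, addr):
        word = addr >> 1                        # word index (addr assumed even)
        return word % self.entries, word // self.entries

    def read(self, addr, memory):
        index, tag = self._slot(addr)
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.misses += 1                    # a slow DRAM access would happen here
            self.tags[index] = tag
            self.data[index] = memory.get(addr, 0)
        return self.data[index]

    def write(self, addr, value, memory):
        memory[addr] = value                    # write-through: DRAM is always written
        index, tag = self._slot(addr)
        if self.tags[index] == tag:
            self.data[index] = value            # keep an existing cached copy coherent


def run_loop(cache, memory, start, length_words, iterations):
    """Fetch the same short block of words repeatedly, as a tight loop would."""
    for _ in range(iterations):
        for i in range(length_words):
            cache.read(start + 2 * i, memory)


memory = {}
cache = WordCache()
run_loop(cache, memory, start=0x1000, length_words=16, iterations=10)
print(cache.hits, cache.misses)
```

In this toy model a 16-word loop misses only on its first pass, which is the effect both posters expect for short QL-style loops. Every write still goes to memory, which is the write-through cost that comes up later in the thread.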
As I said, what we really need is a really fast 68k. While we could dream of various instructions that would be nice to have, if I was really pressed into that discussion, just about the only thing I would add is a fast multiplier (which you already did), and some sort of a MAC instruction as many things with a signal processing basis (like audio and picture related compression / decompression) really depend on the performance of this sort of thing.
If I was making a wish list, what I would do is add an on-chip RAM controller (which you already did), which is directly connected with graphics (which you already did), add a cache for the CPU core (which you are doing) and add two things (which would require a larger FPGA with regards at minimum to the number of IO pins):
1) 32-bit wide RAM bus (so, dedicated memory bus)
2) fast IO bus, probably multiplexed address/data in some manner. This is a dedicated bus to interface peripherals ONLY, so a large address space is not really required. The original (and/or new) peripherals such as serial ports, keyboard/mouse, I2C, SDC and similar would then be implemented in a smaller FPGA, including an interface to something like a simplified standard QL bus where various other peripherals and perhaps a boot flash could be added. An existing standard like LPC could be used, for example.
Nasta wrote:On the high end, there is a FPGA implementation of a new 68k+ compatible core called the 68080, which includes a FPU and extension to 64-bits. It is not as fast as the fastest still available coldfires (which I have decidedly skipped for reasons I'll get to below), but still about 3-4 times faster than the fastest 68060, possibly more, depending on how much you are willing to pay for the FPGA to implement it.
And depending on how much you are willing to pay for licensing - the "68080" is a strictly commercial core. They offer a massively clock-reduced and otherwise limited version without royalties, in the hope to get money for upgrades. But once you want that outside their Amiga accelerator boards, there is need for negotiation and probably paid support work, besides large amounts of work for platform and FPGA vendor adaptation. Since my name is known from Q60 and Q68 designs, they seemed open for cooperation, although I doubt they are aware how extremely small the QL scene is, and how little money they could make. I got some questions answered, but postponed a closer look. At the moment I don't feel like another high-end design, and anyway a SGC successor would probably not be the right place for it.
Peter
Agreed, this was really only an example to show that IN THEORY a FPGA implementation can go really far, but the question is, who would take up the test(s) you outlined and for what sort of return, which would mostly be fame in a very small community :)
Someone did mention a software emulation of a CPU core brought to a hardware level. This is another possibility, again requiring a huge amount of work. I am sure that as far as really low level implementations of emulation go, Marcel would be able to tell us a lot about what that involves. It is certainly possible to emulate a different CPU that way, it was often done in days past. For instance, a well known manufacturer used to make a Z80 hardware emulator which actually had no Z80, but rather a 68020 to emulate the hardware exactly, with help from some external logic chips. The same could be done with an ARM SOC - emulation code AND a fair amount of RAM would easily fit inside just the cache on the chip. Also, funny enough, it's become difficult to find one with only one core, which opens up a whole different Pandora's box.


vanpeebles
Commissario Pebbli
Posts: 2815
Joined: Sat Nov 20, 2010 7:13 pm
Location: North East UK

Re: SGC successor brainstorming

Post by vanpeebles »

What a great thread, it reminds me of those old sci-fi films like Earth vs. the Flying Saucers, where a team of expert minds and specialists from all over the world get together to counter an invasion.

I do like the idea of a super expansion card.


Peter
QL Wafer Drive
Posts: 1953
Joined: Sat Jan 22, 2011 8:47 am

Re: SGC successor brainstorming

Post by Peter »

Nasta wrote:This requires a complete re-design, so no cloning.
Tetroid's latest work was more a redesign than a clone already! My thought was that he might need/want to replace the "INGOT" PLD anyway, sooner or later.

At this point, adding simple graphics could make sense for him. I personally would be okay with the QL modes, no extra video RAM needed. If he can at least add 13 lines to the PLD (RGB, HSYNC, VSYNC, 8 data lines) I think it is doable. That's the route I'd go, if I was very familiar with the SGC as Tetroid certainly is, and had PCB/schematics already in my CAD.

Advantages: No new software timings for network, the good retro-feeling of an actual Motorola 68020 chip, and working microdrives!
Nasta wrote:What I meant is that clone SGCs can still use Aurora, and there are some that can be made available, the non-existence of SGCs being the bottleneck.
Aurora availability would be wonderful, and I would certainly buy one. But to be honest, I always found the Aurora very difficult for the original QL case, which is the only target of this discussion.

So much DIY work is needed to build the required cables and adaptors, that even I would not be in good mood... let alone persons that can not solder. Mind you: Not even the QL matrix keyboard can be attached directly to Aurora.

It's almost like an unmodified Q68 was easier than that... well of course the matrix keyboard adaptor would have to be active, not passive then.
Nasta wrote:To be perfectly honest, your implementation of the 68k core is worth further developing, perhaps into a FPGA implementation of some sort of a real 68k CPU 'chip'.
Q68 "as a chip" is indeed an interesting idea, it could be understood as a relatively fast and inexpensive 68K microcontroller. The chip is flash-based, so it could be pre-programmed in a TQFP socket and distributed just like a non-programmable microcontroller. What I don't like about it: There would be so much documentation work for registers, timings, etc. And the QL-style "ROM Loader" inside would also need an overhaul for more general use.
Nasta wrote:As far as I know, even the Q40/60 basically uses the CPU as a very fast fully 32-bit addressed 68000, under any QL OS.
Yes, the 68060 specific parts are mostly initialization, very few other places. On the application side, there is FPSAVE and a few programs with FPU support.
Nasta wrote:Given that we know that even the 'tiny' 256b caches in the 020/030 rather significantly improves the speed of QL program code, 2k should be plenty to get the performance to virtually 0 wait state operation.
Unfortunately not, because my implementation is just writethrough.
Nasta wrote:1) 32-bit wide RAM bus (so, dedicated memory bus)
So many pins lead to a BGA package, a technology where I'm probably out.
Nasta wrote:2) fast IO bus, probably multiplexed address/data in some manner.
Maybe this could be done with the amount of pins already available at the extension bus? Theoretically a multiplexed mode, e.g. 16 bit address + 16 bit data could be added, if one of the existing pins gets a secondary function as "address latch enable" signal.
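
As an illustration of the multiplexed mode suggested here, the following Python sketch models the peripheral side of a bus where the same 16 pins carry first an address and then data, with an "address latch enable" (ALE) signal marking the address phase. The class, function names and two-phase protocol are hypothetical; the actual Q68 extension bus signals and timing are not described in this thread.

```python
class MultiplexedPort:
    """Toy peripheral side of a 16-bit multiplexed address/data bus."""

    def __init__(self):
        self.latched_address = None
        self.registers = {}                     # word address -> 16-bit value

    def cycle(self, ad_lines, ale, write=False):
        """One bus phase; ad_lines is the 16-bit value on the shared pins."""
        if ale:
            # Address phase: the shared pins carry the address, so latch it.
            self.latched_address = ad_lines & 0xFFFF
            return None
        if self.latched_address is None:
            raise RuntimeError("data phase without a preceding address phase")
        if write:
            self.registers[self.latched_address] = ad_lines & 0xFFFF
            return None
        return self.registers.get(self.latched_address, 0)


def bus_write(port, address, value):
    port.cycle(address, ale=True)               # phase 1: address, ALE asserted
    port.cycle(value, ale=False, write=True)    # phase 2: data on the same pins


def bus_read(port, address):
    port.cycle(address, ale=True)
    return port.cycle(0, ale=False)


port = MultiplexedPort()
bus_write(port, 0x0010, 0xBEEF)
print(hex(bus_read(port, 0x0010)))
```

The point of the sketch is only that one extra control signal is enough to reuse the data pins for addressing, at the cost of an extra phase per access.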
Nasta wrote:Someone did mention a software emulation of a CPU core brought to a hardware level. This is another possibility, again requiring a huge amount of work.
I use software emulation purely as a tool. Gives me no fascination whatsoever. Even going from Motorola chips to FPGA almost hurts emotions, especially when the black case is the target. But at least it is still hardware, and the flash-based solution feels more QL-style than those FPGA boards where a microcontroller soft-loads volatile logic at powerup.

All the best
Peter


FrancoisLanciault
Trump Card
Posts: 167
Joined: Mon Aug 08, 2011 11:08 pm

Re: SGC successor brainstorming

Post by FrancoisLanciault »

Peter wrote:
Agreed, but only due to DRAM waitstates. If we compare for fast SRAM (thereby looking only at CPU speed) the Q68 executes many instructions in a single clock cycle, including multiplication. Even if a 16-bit databus is kept for the Q68, I doubt the 68030 still beats the Q68 at this core discipline. I know this is hard to believe, but I put much effort in getting the CPU to run that fast inside a lowest-speedgrade decade-old FPGA. If someone had time, 68030 reference hardware, and benchmarks that fit the Q68 SRAM area, it can already be checked on the current Q68.

To reduce Q68 DRAM waitstates, I work on cache. My first implementation has only 2 KB, but there is a tag entry for every single word, I did not split it into lines. This should be very appropriate for QL-style code. (Currently I'm struggling with a bug, and I fear it is inside the synthesis tools.) Hard to predict if a 40 MHz Q68 with cache beats a 50 MHz 68030 with 32-bitwide DRAM, but the chance has become realistic. After all, 2048 bytes cache is more than 2x256 bytes, and 1 clock cycle per basic instruction is less than 2.
Peter
I have a Q68 and an Amiga 1200 with a 68030 50 MHz accelerator. I also coded a small benchmark program in 68000 assembly that uses a few multiplication and division instructions. I have benchmark results for the Gold Card, Super Gold Card and for a NeXTstation 68040 @ 25 MHz.

Porting the benchmark program to the SRAM area of the Q68 should be easy enough. The program is quite small and should fit in the available space.

I need a few days to learn how to program in assembly on the Amiga... Never done it before.

I will report the results here if there is any interest.

François


Peter
QL Wafer Drive
Posts: 1953
Joined: Sat Jan 22, 2011 8:47 am

Re: SGC successor brainstorming

Post by Peter »

Yes, would be very interesting! Thank you!


Nasta
Gold Card
Posts: 443
Joined: Sun Feb 12, 2012 2:02 am
Location: Zapresic, Croatia

Re: SGC successor brainstorming

Post by Nasta »

Peter wrote:
Nasta wrote:This requires a complete re-design, so no cloning.
Tetroid's latest work was more a redesign than a clone already! My thought was that he might need/want to replace the "INGOT" PLD anyway, sooner or later.
At this point, adding simple graphics could make sense for him. I personally would be okay with the QL modes, no extra video RAM needed. If he can at least add 13 lines to the PLD (RGB, HSYNC, VSYNC, 8 data lines) I think it is doable. That's the route I'd go, if I was very familiar with the SGC as Tetroid certainly is, and had PCB/schematics already in my CAD.
Advantages: No new software timings for network, the good retro-feeling of an actual Motorola 68020 chip, and working microdrives!
Well, the INGOT PLD is a very small and simple one by today's standards but it does have a few unusual features (like full connectivity - it's not block based).
As long as it's on there, it's really a clone. I don't have the EXACT schematic, but I have a fairly good idea of the SGC schematic - some time ago I did look into replacing the INGOT.
Adding graphics means replacing the CPLD with a FPGA. Even with a fairly large CPLD extra video RAM would be needed to implement QL graphics BUT using standard VESA timing, using existing DRAM would cripple the 68020. Once we are speaking FPGA, then there is the availability of internal RAM - if nothing else as a buffer for video data.
Also any change to timing on a SGC does require changes to net and mdv timing, some things are still software based, the advantage being, there is a template that can be modified, i.e. the original SGC.

Regarding CPLDs, today's development tools hardly give the developer any way to leverage the actual architecture of the CPLD and use the available logic in a manner most natural to how it is structured internally in the PLD. Basically, there are many ways to describe what logic has to do, but old ('less clever') compilers would not try to optimize the code you were writing if it was expressed according to the structure of the targeted CPLD. This can be very beneficial to cramming as much logic as possible inside the CPLD, or better said, using as much of the available resources as possible. On Aurora I had usage in the 90+% range on all chips and it was quite easy (though restrictive when it comes to preferred pinout, but those chips were never good at that). The INGOT is one such example and its full connectivity plus some other features make the job of replacing it anything but straightforward.

BTW a curiosity: A large number of pins on GC INGOT are used just as an address buffer for the QL bus, so there is unused logic in that chip. Quite a pity, as re-doing it would have been a great opportunity to support more RAM (and same for the SGC for that matter).

Perhaps the simplest one to replace it with would be one of the Atmel offerings (ATF1500/2500) but since Atmel and Microchip have merged, these are on the way out too :(
Nasta wrote:What I meant is that clone SGCs can still use Aurora, and there are some that can be made available, the non-existence of SGCs being the bottleneck.
Aurora availability would be wonderful, and I would certainly buy one. But to be honest, I always found the Aurora very difficult for the original QL case, which is the only target of this discussion.
So much DIY work is needed to build the required cables and adaptors, that even I would not be in good mood... let alone persons that can not solder. Mind you: Not even the QL matrix keyboard can be attached directly to Aurora.
It's almost like an unmodified Q68 was easier than that... well of course the matrix keyboard adapter would have to be active, not passive then.
Fair enough, but it was never really built for the QL case, just happened to be small enough that it could be built in with a lot of work.
Unless one re-works the whole motherboard, or replaces it, it is not the best basis for 'extreme' expansion. Signal integrity is... well, a lottery.
The original idea was to make a small backplane that would hold 3 or 4 boards, pretty much standard euro size, perhaps with slightly different length constraints.
The ultimate idea was to have a 'largely CPU board' a 'storage management board' and a 'graphics board'. Of course, some related peripherals could be integrated on any of these boards.
So, while we started at SGC + Aurora (with (super)Hermes perhaps) + Qubide, the aim was GoldFire (which put basic peripherals on the CPU board, like keyboard, mouse, serial, parallel, floppy, and optionally sound and ethernet), Qubide II (that could also have flash based media and optionally ethernet), Aurora II (which could also have extra and faster hard drive ports). The possible redundancies were there with a reason, to introduce 16 and 32 bit peripherals, but have 8-bit ones to start with.
Nasta wrote:To be perfectly honest, your implementation of the 68k core is worth further developing, perhaps into a FPGA implementation of some sort of a real 68k CPU 'chip'.
Q68 "as a chip" is indeed an interesting idea, it could be understood as a relatively fast and inexpensive 68K microcontroller. The chip is flash-based, so it could be pre-programmed in a TQFP socket and distributed just like a non-programmable microcontroller. What I don't like about it: There would be so much documentation work for registers, timings, etc. And the QL-style "ROM Loader" inside would also need an overhaul for more general use.
Well, yes, and fewer peripherals (perhaps none?) but things like interrupt pins. Perhaps a serial (SPI/QSPI) EPROM to load boot code?
Nasta wrote:Given that we know that even the 'tiny' 256b caches in the 020/030 rather significantly improves the speed of QL program code, 2k should be plenty to get the performance to virtually 0 wait state operation.
Unfortunately not, because my implementation is just writethrough.
Wellll... yes, but so is the 68030's. On the other hand, if a long word sized write buffer is included (a simple one, i.e. if a write is attempted while another is in progress, it will have to wait for write completion) it will save a lot of cycles on average. We don't do MOVEM that often :P
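
The effect of such a single-entry write buffer can be sketched with a small cycle-counting model in Python. Everything numeric here is an assumption for illustration: a write is taken to occupy the memory bus for 4 cycles, every other step takes 1 cycle and is assumed to hit the cache, so it never competes for the bus. Real figures for the Q68 or a 68030 system are not given in the thread.

```python
def cycles_needed(trace, write_cycles=4, buffered=True):
    """Count cycles for a trace of 'w' (write) and 'x' (other work).

    Assumptions: each 'x' takes 1 cycle and never touches the bus;
    a write-through write keeps the memory bus busy for write_cycles.
    """
    now = 0            # current CPU cycle
    bus_free_at = 0    # cycle at which the pending write has completed
    for op in trace:
        if op == "w":
            if buffered:
                now = max(now, bus_free_at)     # wait only if the buffer is still busy
                bus_free_at = now + write_cycles
                now += 1                        # hand the data to the buffer, move on
            else:
                now += write_cycles             # CPU waits for the whole write
        else:
            now += 1
    # With a buffer, the last write may still be draining after the CPU is done.
    return max(now, bus_free_at) if buffered else now


print(cycles_needed("wxxxwxxx", buffered=False), cycles_needed("wxxxwxxx", buffered=True))
print(cycles_needed("ww", buffered=False), cycles_needed("ww", buffered=True))
```

In this model, writes separated by a few other instructions are hidden almost completely, while back-to-back writes (the MOVEM-style case) gain nothing because the second write has to wait for the first to complete.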
Nasta wrote:1) 32-bit wide RAM bus (so, dedicated memory bus)
So many pins lead to a BGA package, a technology where I'm probably out.
In this case I was thinking no 'on board' peripherals - which would probably save a number of pins, see below...
Nasta wrote:2) fast IO bus, probably multiplexed address/data in some manner.
Maybe this could be done with the amount of pins already available at the extension bus? Theoretically a multiplexed mode, e.g. 16 bit address + 16 bit data could be added, if one of the existing pins gets a secondary function as "address latch enable" signal.
Exactly - but this opens up a comparatively huge address space. Even if you just use a 16-bit multiplexed address/data bus with some control pins, you would have a 128k IO area for expansion. This does sound small, but for IO uses, this is enormous. There is rarely a device that needs more than 256 bytes of directly accessible addresses, most need a small fraction, although they often use more just for ease of decoding. This 'link' can be very fast as it's intended to be local, which in turn also means it could be a bit more complex with regards to communications protocol, but then, use fewer pins - e.g. a simplified LPC style bus.
Now, instead of integrating peripherals inside the 'main' FPGA, you could dedicate a separate small FPGA and put in as many peripherals you like or it fits. Or, do that and also convert the 'link' into something resembling a QL bus. I'm sure I don't have to further explain the possibilities :)
Since peripherals in the Q68 are internally addressed through a reduced bus (i.e. a dedicated portion of the entire addressing space) the aforementioned IO area would just be that same space, extended.
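
The 128k figure above follows from simple arithmetic, shown here as a short Python calculation. It assumes the 16 address bits select 16-bit words (which is how 16 bits yield 128 KB rather than 64 KB) and uses the 256-byte per-device window mentioned in the post as a generous upper bound.

```python
ADDRESS_BITS = 16          # address bits sent during the address phase
BYTES_PER_WORD = 2         # each address selects one 16-bit word (assumption)

io_bytes = (1 << ADDRESS_BITS) * BYTES_PER_WORD
print(io_bytes // 1024, "KiB of I/O space")

DEVICE_WINDOW = 256        # bytes per device, generous according to the post
print(io_bytes // DEVICE_WINDOW, "device windows of that size")
```

Even with every device given a full 256-byte window, several hundred devices would fit, which is why the area counts as enormous for I/O purposes.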
Nasta wrote:Someone did mention a software emulation of a CPU core brought to a hardware level. This is another possibility, again requiring a huge amount of work.
I use software emulation purely as a tool. Gives me no fascination whatsoever. Even going from Motorola chips to FPGA almost hurts emotions, especially when the black case is the target. But at least it is still hardware, and the flash-based solution feels more QL-style than those FPGA boards where a microcontroller soft-loads volatile logic at powerup.

All the best
Peter
Well, while I do agree, a long time ago such 'hardware assisted software emulation' did catch my interest, in the form of the Transmeta Crusoe CPU...
On the other hand, it seems that keeping the 68k CPU alive is destined to be a task for FPGAs...

I do have two questions:

1) Is the MOVEP instruction implemented - if yes, then successive bytes on 8-bit peripherals can be efficiently accessed, if such peripherals are capable of some sort of block mode data transfers. (Un?)fortunately using just half the 16-bit bus means code execution from 8-bit memory is not possible.
2) What is the fast SRAM on Q68 used for? Sounds like a great place to put some emulation code into :P


Peter
QL Wafer Drive
Posts: 1953
Joined: Sat Jan 22, 2011 8:47 am

Re: SGC successor brainstorming

Post by Peter »

Nasta wrote:Adding graphics means replacing the CPLD with a FPGA. Even with a fairly large CPLD extra video RAM would be needed to implement QL graphics BUT using standard VESA timing, using existing DRAM would cripple the 68020.
I was under the first impression that 512x256x2 bit would still be okay with DRAM and PLD (Q40 also does it with DRAM and PLD). But I was probably wrong, it is not dual-ported, which makes the difference. I think moving to FPGA makes little sense for the original SGC, so forget my idea.
Nasta wrote:Fair enough, but it was never really built for the QL case, just happened to be small enough that it could be built in with a lot of work.
Yes, I did not mean to criticize Aurora at all. At design time everyone wanted to leave the original QL case!
Nasta wrote:Well, yes, and fewer peripherals (perhaps none?) but things like interrupt pins. Perhaps a serial (SPI/QSPI) EPROM to load boot code?
There are two interrupt pins. If you want the CPU only, it's probably easier to make your own FPGA - the CPU is under LGPL, I have contributed back my results.
Not sure an SPI EPROM is needed. There are very small, and also embeddable SD/MMC cards, so the loader could keep its principle.
Nasta wrote:Now, instead of integrating peripherals inside the 'main' FPGA, you could dedicate a separate small FPGA and put in as many peripherals you like or it fits.
Yes that's what I'd do if more pins were inevitable.
Nasta wrote:Well, while I do agree, a long time ago such 'hardware assisted software emulation' did catch my interest, in the form of the Transmeta Crusoe CPU...
Same here, I was also dreaming of letting it "morph" 68K code.
Nasta wrote:1) Is the MOVEP instruction implemented - if yes, then successive bytes on 8-bit peripherals can be efficiently accessed, if such peripherals are capable of some sort of block mode data transfers.
Yes.
Nasta wrote:(Un?)fortunately using just half the 16-bit bus means code execution from 8-bit memory is not possible.
A true 8-bit bus could be implemented, it is an FPGA after all. It is just a matter of priorities in how to spend spare time.
Nasta wrote:2) What is the fast SRAM on Q68 used for? Sounds like a great place to put some emulation code into :P
Small part of SMSQ/E is inside (e.g. sound routines, so BEEP does not slow down when the hardware SLUG is used for old games). The rest is free for the user, Wolfgang even made commands for allocation, although it is really not much space.

If I get cache to work, there will be temptation to use it for larger cache.

Peter


Nasta
Gold Card
Posts: 443
Joined: Sun Feb 12, 2012 2:02 am
Location: Zapresic, Croatia

Re: SGC successor brainstorming

Post by Nasta »

Peter wrote:
Nasta wrote: Adding graphics means replacing the CPLD with a FPGA. Even with a fairly large CPLD extra video RAM would be needed to implement QL graphics BUT using standard VESA timing, using existing DRAM would cripple the 68020.
I was under the first impression that 512x256x2 bit would still be okay with DRAM and PLD (Q40 also does it with DRAM and PLD). But I was probably wrong, it is not dual-ported, which makes the difference. I think moving to FPGA makes little sense for the original SGC, so forget my idea.
Yes, well, if it was the old screen format and timing, it would not be too bad but not easy for a CPLD as it would need several counters, and a 32-bit buffer for the video data, and as you know this sort of thing eats up CPLD resources like crazy. On the other hand, easy for a FPGA.
The problem is precisely that it is not dual ported, unlike on the Q40/60. It would have to be managed similar to the page mode access the 8301 uses, just with a wider bus to get enough data for VESA timing, and it would take significant RAM bandwidth. The situation is somewhat better using more modern, or I should say, less old :P DRAM chips with faster access time (60ns), but a back-of-the-envelope calculation says it would slow down DRAM access by about 20-30% which is rather bad just to get the old stuff. That being said, it could, at same cost of bandwidth, perhaps have Aurora functionality. Even then, the logic would have to be split into two CPLDs - one to handle all addressing and DRAM control, the other to handle video data, with video timing probably split between the two. Again, the big problem is that doing a four long word buffer takes up 128 macrocells of a CPLD, which is about 1/4 of even a large CPLD. So we're back to FPGAs.
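
The numbers behind this argument can be laid out in a few lines of Python. The screen format and the four long word buffer come from the posts; the 60 Hz refresh rate and the option of fetching each line twice (for line doubling under VESA timing) are assumptions added for illustration, and the sketch does not attempt to reproduce the 20-30% slowdown estimate, which depends on DRAM timing not given here.

```python
WIDTH, HEIGHT, BITS_PER_PIXEL = 512, 256, 2     # QL screen format from the thread

frame_bytes = WIDTH * HEIGHT * BITS_PER_PIXEL // 8
print(frame_bytes, "bytes per screen")


def video_bytes_per_second(refresh_hz, fetches_per_line=1):
    """Raw fetch rate; fetches_per_line=2 models unbuffered line doubling."""
    return frame_bytes * refresh_hz * fetches_per_line


print(video_bytes_per_second(60), "bytes/s at an assumed 60 Hz")
print(video_bytes_per_second(60, fetches_per_line=2), "bytes/s with line doubling")

LONG_WORDS, BITS_PER_LONG_WORD = 4, 32
buffer_macrocells = LONG_WORDS * BITS_PER_LONG_WORD   # one flip-flop per macrocell
implied_cpld_size = buffer_macrocells * 4             # if the buffer is "about 1/4"
print(buffer_macrocells, "macrocells for the buffer,", implied_cpld_size, "in the implied CPLD")
```

The buffer alone costs one macrocell per stored bit, which is why a function that is trivial in an FPGA with internal RAM becomes expensive in a CPLD.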
Having VRAM ('dual ported') added on as dedicated screen RAM of course makes this much more doable. There is also perhaps a way to make it very flexible and simplify hardware while adding block fill options (to put expanding single bit bitmaps such as fonts into color in hardware), by using some addressing trickery and non-maskable interrupts.
Nasta wrote:...Perhaps a serial (SPI/QSPI) EPROM to load boot code?
...Not sure an SPI EPROM is needed. There are very small, and also embeddable SD/MMC cards, so the loader could keep its principle.
Sorry, I miswrote - (Q)SPI flash. I would not call them small, you can get up to 64Mbytes in an 8-pin package. Of course, no need for that size. I think I used 16Mbyte sizes as far back as 10 years ago.
QSPI (quad SPI) flash has been used to hold PC motherboard BIOS code for at least the last 10 years, now they are used literally everywhere because PCs have driven the cost down. Other technologies such as phase change and ferroelectric RAM also are contenders in the same arena - the latter have a relatively small capacity (but not in QL terms!) but have instant and unlimited number of writes, it is basically a RAM that is non-volatile. Most QSPI flash chips available today work at 33 to 133 MHz clocks when reading, so they are very fast.
What I was aiming at is a fairly small capacity (in today's terms, but probably big in QL terms) flash to boot from - which unlike the SD card (which is really very closely related technology) is soldered on the board (and can be updated from SD!), holds something like any default code needed, even a default OS, and because the actual part is completely under control of the person who assembles the boards, is not prone to problems due to odd manufacturers' SD card quirks etc. Of course this by no means prevents a boot loader from being present, in fact it is very similar to, and possibly even simpler than SD. Most QSPI flash chips go into continuous read mode with a single command cycle (it starts off as simple 1-bit SPI), and then you just continue reading words from the beginning to whatever point you need.
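
To show why booting from such a flash can be simpler than from SD, here is a small Python model of the plain 1-bit SPI read sequence: one opcode, a 24-bit start address, then data streams out for as long as the chip stays selected. The opcode 0x03 is the read command commonly found on SPI NOR flash, but the datasheet of the actual part should be checked; the class and the loader function are hypothetical, and quad-mode reads are not modelled.

```python
class SpiFlashModel:
    """Toy model of an SPI NOR flash answering a single streaming read."""

    READ = 0x03     # common "read data" opcode; verify against the real datasheet

    def __init__(self, contents):
        self.contents = bytes(contents)
        self.selected = False
        self.header = []
        self.pointer = None

    def select(self):
        self.selected = True
        self.header = []
        self.pointer = None

    def deselect(self):
        self.selected = False

    def transfer(self, out_byte=0):
        """Exchange one byte, as a full-duplex SPI transfer would."""
        if not self.selected:
            raise RuntimeError("chip select is not asserted")
        if self.pointer is None:
            self.header.append(out_byte)        # still collecting opcode + address
            if len(self.header) == 4:
                if self.header[0] != self.READ:
                    raise ValueError("only the read opcode is modelled")
                self.pointer = int.from_bytes(bytes(self.header[1:]), "big")
            return 0xFF
        value = self.contents[self.pointer % len(self.contents)]
        self.pointer += 1                       # address auto-increments while selected
        return value


def load_boot_code(flash, start, length):
    """Hypothetical loader: one command, then a continuous stream of bytes."""
    flash.select()
    for byte in (SpiFlashModel.READ, (start >> 16) & 0xFF, (start >> 8) & 0xFF, start & 0xFF):
        flash.transfer(byte)
    data = bytes(flash.transfer() for _ in range(length))
    flash.deselect()
    return data


flash = SpiFlashModel(bytes(range(256)))
print(load_boot_code(flash, 0x10, 4).hex())
```

Compared with an SD card there is no initialisation handshake or block protocol in this model, only a four-byte header followed by data.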
Nasta wrote:1) Is the MOVEP instruction implemented - if yes, then successive bytes on 8-bit peripherals can be efficiently accessed, if such peripherals are capable of some sort of block mode data transfers.
Yes.
That's great because it enables using 8-bit wide peripherals and manipulating decoding to simply read or write successive bytes (in form of words or long words in the software) from devices that read blocks of data through a single byte wide address.
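
What MOVEP does can be modelled in a few lines of Python: a MOVEP.L from memory reads four bytes from every other address (offsets 0, 2, 4 and 6), most significant byte first, and assembles them into one long word. The device class below is hypothetical and stands for a peripheral whose decoding ignores the low address bits, so that all four accesses land on the same byte-wide data port.

```python
class FifoDevice:
    """Hypothetical 8-bit peripheral: each read of its data port pops one byte."""

    def __init__(self, payload):
        self.payload = list(payload)

    def read_byte(self, address):
        # Decoding ignores the low address bits, so +0, +2, +4 and +6
        # all reach the same data port.
        return self.payload.pop(0)


def movep_read_long(device, address):
    """Model of MOVEP.L (d16,An),Dn: four byte reads at alternate addresses."""
    value = 0
    for i in range(4):
        value = (value << 8) | device.read_byte(address + 2 * i)
    return value


device = FifoDevice(bytes([0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0]))
first = movep_read_long(device, 0)
second = movep_read_long(device, 0)
print(hex(first), hex(second))
```

One instruction therefore moves four bytes of a block transfer into a register, which is the efficiency gain referred to above.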
Nasta wrote:(Un?)fortunately using just half the 16-bit bus means code execution from 8-bit memory is not possible.
A true 8-bit bus could be implemented, it is an FPGA after all. It is just a matter of priorities in how to spend spare time.
Given that there is a boot loading mechanism and if there is sufficient addressing space, implementing a true 8-bit bus is not a priority. It would be only if you wanted a FPGA based 'SGC' to access old 8301 RAM to display a screen, as then it has to do it exactly as a regular 68008 does or all graphics drivers would have to be changed.
Since we are (hypothetically) speaking about a SGC replacement WITH on-board graphics (that makes the 8301 unused), it may well be easier to just implement a larger address space to fit various peripherals, and modify the drivers to access the 8302 (unless that too is replaced, as I hope, by more modern and advanced hardware that implements the same functionality plus more). TBH I would leave microdrive access to classic QLs.

So, if I can make a wish, it would be for a multiplexed (reduced pin count) real 16-bit expansion bus - based on the existing pin count for the non-multiplexed version. Even if not all of the 16 bits available during the 'address phase' on the bus are used as an address, it would open up a lot of possibilities for future expansion.
Nasta wrote:2) What is the fast SRAM on Q68 used for? Sounds like a great place to put some emulation code into :P
Small part of SMSQ/E is inside (e.g. sound routines, so BEEP does not slow down when the hardware SLUG is used for old games). The rest is free for the user, Wolfgang even made commands for allocation, although it is really not much space.
If I get cache to work, there will be temptation to use it for larger cache.
Peter
Ah, clever - I suppose things that produce OS overhead such as (a part of) the scheduler might also benefit.
And of course, using it for larger cache is an understandable temptation - I suppose some sort of cache freeze logic would unfortunately complicate things as with it you could have it both ways.
I also suppose that the FPGA being flash based, you essentially hard coded the boot loader code?

