SGC successor brainstorming

Nasta · Post by **Nasta** » Thu Nov 22, 2018 2:16 am

Oh, BTW just recently I found out someone made expansion video cards for the Atari ST line of computers using older VGA chips.
I looked at several and found a promising one from Cirrus Logic. Of course it is long obsolete today, but it can still be found for a fairly decent price.
It is possible to implement it with an 8, 16 and 32-bit bus (the former using a completely different method than the latter, 32-bit access).
That being said, getting this to display old QL modes would be really tedious, as would getting around all the legacy VGA requirements from before they figured out flat bitmaps and lack of paging. It is probably doable but it would be a chore, and although it would come with bit blit and perhaps other perks, it would still be limited to fairly low resolutions, such as 1280x1024, or perhaps some low frame rate 16:9 formats, and possibly less if 16-bit color is required.
Either way you look at it, graphics on the QL is not simple to upgrade.

Peter · Post by **Peter** » Thu Nov 22, 2018 9:23 am

Nasta wrote:The situation is somewhat better using more modern, or I should say, less old DRAM chips with faster access time (60ns), but a back-of-the-envelope calculation says it would slow down DRAM access by about 20-30% which is rather bad just to get the old stuff.

Okay let's do a two-line forum calculation: We need 32 x 32 horizontal data bits for 512 MODE 4 pixels. If fetched into a 64 bit buffer, i.e. with 2 consecutive DRAM reads, that takes 16 x 125 ns = 2 µs per line for old (SGC generation) FPM DRAM. At 60 Hz and VESA 1024x768 signal (not resolution) that is only 9.2% of memory bandwidth, let's say 10% for not exactly matching clock frequencies.

So I was not completely wrong with my idea. The microdrive and net timing loops running from cache would not suffer. (By the way, a 4:3 resolution like 512x384 would come without further RAM bandwidth loss, and even full 1024x768 might be acceptable, costing 20%. There is not enough buffer for a full line anyway. So whether lines are single, doubled or tripled is irrelevant for bandwidth.)
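For anyone who wants to re-run the two-line calculation, here is a minimal Python sketch. The VESA 1024x768 at 60 Hz totals (65 MHz pixel clock, 1344 x 806 including blanking) are the standard published timings and are an assumption added here, as is the helper name `video_share`; the 125 ns per 64-bit fetch and 2 bits per MODE 4 pixel come from the post.

```python
# Rough share of DRAM time taken by video fetches.
# Assumed (not from the post): standard VESA 1024x768@60 timing totals.
PIXEL_CLOCK_HZ = 65_000_000
H_TOTAL = 1344          # pixel clocks per scan line, including blanking
V_TOTAL = 806           # scan lines per frame, including blanking
V_ACTIVE = 768          # visible scan lines, each needing a fresh fetch
FETCH_NS = 125.0        # one 64-bit buffer fill (2 consecutive FPM reads)


def video_share(pixels_per_line, bits_per_pixel=2, buffer_bits=64):
    """Fraction of each frame during which DRAM is busy with video."""
    fetches_per_line = pixels_per_line * bits_per_pixel // buffer_bits
    busy_ns_per_line = fetches_per_line * FETCH_NS
    frame_ns = H_TOTAL * V_TOTAL / PIXEL_CLOCK_HZ * 1e9
    return V_ACTIVE * busy_ns_per_line / frame_ns


print(f"512 pixels per line:  {video_share(512):.1%}")   # about 9.2%
print(f"1024 pixels per line: {video_share(1024):.1%}")  # about 18.4%
```

Because every visible scan line is fetched again, a line-doubled 512x384 costs the same as 512 pixels at 768 lines in this model.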

Looking at typical system usage speed, the massively faster video should compensate the small loss of RAM speed, which is further reduced by the 68020 cache. E.g. if Tetroid grabbed a 5V tolerant PLD like the Lattice LC4256 in 144 pin TQFP, he'd have 256 macrocells instead of 48, and all the pins he needs, with the same board space as the EP1810 socket. In a first step, the new data and RGB/sync lines could stay unused, just a 1:1 replacement of the original pin functions in a new package.

Remember: I wrote this under the assumption that Tetroid works on replacing the EP1810 anyway, i.e. the logic equations were recovered or rewritten.

I'll try to answer more later.

Peter

Nasta · Post by **Nasta** » Thu Nov 22, 2018 3:01 pm

Peter wrote:
Nasta wrote:The situation is somewhat better using more modern, or I should say, less old DRAM chips with faster access time (60ns), but a back-of-the-envelope calculation says it would slow down DRAM access by about 20-30% which is rather bad just to get the old stuff.
Okay let's do a two-line forum calculation: We need 32 x 32 horizontal data bits for MODE 4. If fetched into a 64 bit buffer, i.e. with 2 consecutive reads, that takes 16 x 125 ns = 2 µs per line for old (SGC generation) FPM DRAM. At 60 Hz and VESA 1024x768 that is only 9.2% of memory bandwidth, let's say 10% for not exactly matching clock frequencies.

So I was not completely wrong with my idea. The microdrive and net timing loops running from cache would not suffer.

Looking at typical system usage speed, the massively faster video should compensate the small loss of RAM speed, which is further reduced by the 68020 cache. E.g. if Tetroid grabbed a 5V tolerant PLD like the Lattice LC4256 in 144 pin TQFP, he'd have 256 macrocells instead of 48, and all the pins he needs, with the same board space as the EP1810 socket. Maybe even half that many PLD macrocells would do, especially if faster DRAM allows 32 bit buffers. In a first step, the new data and RGB/sync lines could stay unused, just a 1:1 replacement of the original pin functions in a new package.

Remember: I wrote this under the assumption that Tetroid works on replacing the EP1810 anyway, i.e. the logic equations were recovered or rewritten.

I'll try to answer more later.

Peter

Never said the idea was wrong. The question is whether it is worth doing QL resolution at that bandwidth cost, or adding VRAM and doing a LOT more graphics with nearly no bandwidth cost. When I say 'a lot', we are talking Q40/60/68 type graphics.

Now, my calculation was based on a 32-bit buffer since wide buses and 'registers' inside segmented PLDs tend to gobble up a lot of resources. It was also based on a 60ns DRAM and the availability of a double clock, at 48MHz. 24MHz was chosen on the SGC rather than 25 because it's a small performance hit compared to maximum but relaxes some timing a bit and is also re-used for the floppy controller chip, and I put in a double speed oscillator to be able to better target DRAM timing. Also, 60ns DRAM was used as a template. What you get is you can just about squeeze out 0 wait state operation for the 68020 at 24MHz, but it is so tight that PCB layout would have to be superb. Even so, let's go with this assumption.
Due to the fact that timings have to ultimately be expressed as a whole number of 48MHz clock cycles, fast page mode access of the DRAM shows fairly little improvement until you get to 3 consecutive long words. Surprisingly the situation is better in relative terms with the actual 80ns DRAM as used on the SGC but then the 68020 already takes a 1 wait state hit.
Taking into account what kind of realistic DRAM cycles can be generated from a 48MHz clock (and I did not go into a deep analysis), we can calculate that we can squeeze 133333 access cycles into one 60Hz refresh period. This would be the theoretical maximum bandwidth only for the 68020. Since we need 32 such access cycles per display line and there are 768 display lines within one 60Hz refresh period, this means we need to take 24576 cycles for video generation. A quick division gives us an 18.43% bandwidth taken for video. The need to express all timings in terms of 48M clock cycles imposes quite a penalty compared to a quick theoretical calculation.
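The same figures can be reproduced with a short Python sketch. The 6 clocks per access is inferred from the 133333 figure (6 x 20.8 ns = 125 ns at 48MHz) and is treated here as an assumption; the constant and variable names are only illustrative.

```python
# Reproduces the 133333 / 24576 / 18.43% figures from the paragraph above.
CLOCK_HZ = 48_000_000
REFRESH_HZ = 60
CLOCKS_PER_ACCESS = 6    # assumption: 125 ns per DRAM access at 48 MHz
ACCESSES_PER_LINE = 32   # 32 long words per display line
DISPLAY_LINES = 768

# Whole access cycles that fit into one 60 Hz refresh period.
accesses_per_frame = CLOCK_HZ // (REFRESH_HZ * CLOCKS_PER_ACCESS)
# Access cycles that video generation takes out of that budget.
video_accesses = ACCESSES_PER_LINE * DISPLAY_LINES
video_fraction = video_accesses / accesses_per_frame

print(accesses_per_frame)        # 133333
print(video_accesses)            # 24576
print(f"{video_fraction:.2%}")   # 18.43%
```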

Even this is not a big problem, given that on the SGC, there is a 1 cycle wait state at 24MHz (as far as I can tell, possibly 2?) so that in itself would mean a 33333 cycle penalty out of the 133333 ideal number, so greater than what was calculated for video in the above example - such an improved SGC would actually be somewhat faster than the current one.
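To put numbers on that comparison, here is a small sketch assuming the 68020's minimum 3-clock bus cycle at 24MHz and exactly one wait state on the current SGC (the post itself is not sure whether it is 1 or 2). All names are illustrative.

```python
# Compares the wait-state penalty on the current SGC with the video penalty.
CPU_CLOCK_HZ = 24_000_000
REFRESH_HZ = 60
BASE_BUS_CLOCKS = 3      # 68020 bus cycle with zero wait states


def cycles_per_frame(wait_states):
    """Whole CPU bus cycles that fit into one 60 Hz refresh period."""
    return CPU_CLOCK_HZ // (REFRESH_HZ * (BASE_BUS_CLOCKS + wait_states))


ideal = cycles_per_frame(0)          # 133333, zero wait states
current_sgc = cycles_per_frame(1)    # 100000, assuming one wait state
wait_penalty = ideal - current_sgc   # 33333
video_penalty = 32 * 768             # 24576 cycles taken by video
improved_sgc = ideal - video_penalty # 108757 cycles left for the CPU

print(wait_penalty, video_penalty)
print(f"improved vs current: {improved_sgc / current_sgc - 1:.1%} more cycles")
```

Under these assumptions the zero wait state design with video still leaves the CPU more bus cycles than the current one wait state design without it.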

Now, that being said, the above calculation has some key factors that make it optimistic:
1) 60ns DRAM not the original 80ns, though this would not be a problem since nowadays if you find DRAM at all it's likely to be latest generation and that fast, or sometimes even faster.
2) Availability of double clock. Without this, the basic unit of time used to generate DRAM timing in the logic is too large and the DRAM bandwidth would be used quite poorly.
3) No further penalties were calculated, such as arbitration between screen and CPU accesses, which always goes to the detriment of the CPU, since it is the only one capable of inserting wait states. A few further cycles may also be needed for DRAM refresh, though this can probably be avoided just by properly multiplexing address lines and having the repeating screen area reads double as refresh.
4) Complete and efficient rewrite of the INGOT logic. This is quite important because the EP1810 CPLD, while having only 48 macrocells, has quite extensive routing options for them (much more than most of today's CPLDs), so resources that would remain orphaned in more modern CPLDs (= not able to be used because signal routing options have been exhausted) can be reused in it. While modern CPLDs have a LOT more resources, segmenting and the addition of a 32-bit bus to the CPLD would require a hefty re-hash of the logic, which might require other compromises. Directly emulating the INGOT would not give the optimal logic implementation for a newer CPLD.
Also, one factor may make the calculation pessimistic: a more clever buffering scheme could be implemented in the CPLD, which would result in even less bandwidth penalty. However, this is probably not feasible, as it would require twice as large a device just for the buffering.

Finally, keep in mind that the calculation is from a 'maximum theoretical speed we can get out of a 68020 at 24MHz given 60ns DRAM', not 'actual SGC' baseline.
Hence my original guesstimate. Just re-using SGC assumptions (80ns DRAM, 24MHz clock) gets us to my original figures and likely much worse.
Of course, the point about cache and the 020 being a bit more clever in managing wait states is well made.

It should be noted that using a FPGA instead and being able to set up larger buffers for video data inside it, opens up a whole new world of possibility, even more so if SDRAM is used, like on the Q68 - quite probably one could do Q68 graphics modes without the 'SGC replacement' being any slower than the actual SGC, probably even a bit faster.

BTW I have looked into the source for the INGOT20. Unfortunately the file seems to be corrupt, but the gist of it can be figured out. Some of it is very clever, as one would no doubt expect given its designers. However, without a manual for the EP1810 and the compiler used to produce the programming code, it is very difficult to figure out some important parts - for instance, that CPLD makes it possible to use a pin for combinatorial (and possibly even latched) input or output while its associated macrocell with more advanced functions becomes buried and can be used independently. However, this is not declared by using syntax of the 'programming language' but rather explicitly, in the definition of the pin/macrocell configuration, 'by hand' as it were. The source code uses an abbreviated (single character per option select) form of declaration so it's difficult to be sure for some of the cells how they are configured. It would perhaps be simpler to completely re-create the logic from scratch, knowing what it's supposed to do. I might be wrong but it seems to me that Tetroid is for now limited to just using the JEDEC file to program the device, rather than being able to compile the logic from the source code, at least this is my (un)educated guess and I am glad to be corrected on this point.

Peter · Post by **Peter** » Thu Nov 22, 2018 7:14 pm

Nasta wrote:Taking into account what kind of realistic DRAM cycles can be generated from a 48MHz clock (and I did not go into a deep analysis), we can calculate that we can squeeze 133333 access cycles into one 60Hz refresh period. This would be the theoretical maximum bandwidth only for the 68020. Since we need 32 such access cycles per display line and there are 768 display lines within one 60Hz refresh period, this means we need to take 24576 cycles for video generation. A quick division gives us an 18.43% bandwidth taken for video. The need to express all timings in terms of 48M clock cycles imposes quite a penalty compared to a quick theoretical calculation.

The turning point is "such access cycles" of 125 ns for just one video read, while I'd assume two video reads in 166ns worst case. With the -60 speedgrade you mentioned, I still see two video reads in 125 ns (RASL low, keep, CASL low, CASL high, CASL low, RASL+CASL high, in 20.8ns steps). PLD timings for DRAM are much easier than taking 68020 and PLD into account. I might have overlooked a DRAM detail of course - just did a 3-minute check.

Nasta wrote:It should be noted that using a FPGA instead and being able to set up larger buffers for video data inside it, opens up a whole new world of possibility, even more so if SDRAM is used, like on the Q68 - quite probably one could do Q68 graphics modes without the 'SGC replacement' being any slower than the actual SGC, probably even a bit faster.

Since 5V tolerant FPGAs are not an option, this would break the idea to "just enlarge an EP1810 replacement PLD".

It makes little sense to surround the whole FPGA with level shifters - so once an FPGA is there, I think the old SGC circuitry is almost completely dead.

Nasta wrote:I might be wrong but it seems to me that Tetroid is for now limited to just using the JEDEC file to program the device, rather than being able to compile the logic from the source code, at least this is my (un)educated guess and I am glad to be corrected on this point.

I just guessed Tetroid has a lot of motivation to get rid of the EP1810, so I brought up the question of adding video as a "side effect". Maybe I was too optimistic to even consider it. Maybe not... Tetroid did not join the discussion - which could also mean that he is doing QL work while I just talk

Peter

Giorgio Garabello · Post by **Giorgio Garabello** » Fri Nov 23, 2018 7:28 am

I followed the discussion with great interest (and some difficulty).
First of all, sincere congratulations to everyone: this is the most important discussion of the last few years.
I do not have the skills to give a strictly technical opinion, I limit myself to making some general considerations.

Different people have proposed different things: there are those who would like total backward compatibility in an enhanced Super Gold Card, and those who would like an extremely fast machine. (I really like the last idea.)

Whatever you do is still a good thing, but I think you have to choose: either you produce something "retro" or you do something completely new.
Either you make a "nostalgia operation" or try to produce a modern machine. They are two different tastes and interests. The important thing is not to create a hybrid that would make both of them unhappy.

Just my two cents

Giorgio Garabello

pjw · Post by **pjw** » Fri Nov 23, 2018 9:27 am

Giorgio Garabello wrote:..but I think you have to choose: either you produce something "retro" or you do something completely new.
Either you make a "nostalgia operation" or try to produce a modern machine. They are two different tastes and interests. The important thing is not to create a hybrid that would make both of them unhappy.

Well said, Giorgio! A good compromise IMHO would be a high spec machine: You can drive a Ferrari at 30MPH, but my old banger won't do 250MPH unless you drop it from a helicopter.

Peter · Post by **Peter** » Fri Nov 23, 2018 11:56 am

Giorgio Garabello wrote:Either you make a "nostalgia operation" or try to produce a modern machine.

The problem with only making a "modern" machine is that you are in danger of losing those users who are attracted by nostalgia. Many of them will never look at a "modern" machine if they are not pulled toward the QL scene by something "nostalgic" they relate to, and which works without too much frustration.

Strategically, I found it sometimes necessary to interrupt "modern" projects, just to keep the QL scene alive. Those are difficult decisions, because interrupting work costs so much extra time and is very inefficient. Let me provide one example: My pioneering work to bring SD card technology to the QL. That work took away years from my "modern" Q68 project!

At first I designed a simple parallel port hardware, so I could access something from the QL side at all. Then I wrote the low-level software for SD card initialization, read, write, etc. Then I planned a strategy to make it interoperable with PCs and Q40/Q60 also, which included usage of FAT32 container files, and again I had to write the QL-side software for FAT32 myself. Next I needed a completely working QL driver, which I created from the QL-HD driver by adding the SD card and FAT32 routines. Next step was to make PCs also support the same container files, which was done by writing a format-independent interface for UQXL and Q-emulator to get things going without full emulator support for a specific filesystem (it was not yet clear which of several QL filesystems would eventually be possible). After that, I designed the final QL-SD hardware. My QL-SD invention itself, re-using the microdrive slots while adding an internal ROM adaptor, finally came to life, but only as a last step in a long operation!

Even after my pioneering was done, and others took over with their great work, QL-SD continued to consume much time. But look at it now: SD cards can be used on all machines from QL, GoldCard, Q40, Q60, Q68 to all emulators. That helped keep the QL scene alive. Not possible without a "nostalgia operation" for the QL.

In my initial post I wrote: "Is a SGC successor for the original QL case still important to keep the QL scene alive?"

Yes, it is a "nostalgia operation" that takes away time from "modern" projects, but we may need it. And I opened this topic to find out to what extent we do.

Peter

Giorgio Garabello · Post by **Giorgio Garabello** » Fri Nov 23, 2018 2:29 pm

I have nothing against the "nostalgia operation". My point was to draw attention to not creating a middle way that nobody likes.

It should also be said that this depends a lot on individual perception: for many the Q68 is "modern", for me it is retrocomputing. These are "sensations".

Giorgio

Peter · Post by **Peter** » Fri Nov 23, 2018 6:13 pm

In a strict sense, everything QL related is retrocomputing. Even the 3 x 68060 speed machine Per prefers could never compete with modern computers, simply for lack of developers and time. Same goes for SMSQ/E. All we do will never catch up.

That's why I placed "modern" in quotation marks. But even by QL standards, there is sort of a timeline between QL (total nostalgia) ... GC ... QXL ... SGC ... Q40 ... Q60 ... Q68 ("modern"). That's what I meant.

Nasta · Post by **Nasta** » Sat Nov 24, 2018 11:07 am

Peter wrote:
Nasta wrote: Taking into account what kind of realistic DRAM cycles can be generated from a 48MHz clock (and I did not go into a deep analysis), we can calculate that we can squeeze 133333 access cycles into one 60Hz refresh period. This would be the theoretical maximum bandwidth only for the 68020. Since we need 32 such access cycles per display line and there are 768 display lines within one 60Hz refresh period, this means we need to take 24576 cycles for video generation. A quick division gives us an 18.43% bandwidth taken for video. The need to express all timings in terms of 48M clock cycles imposes quite a penalty compared to a quick theoretical calculation.
The turning point is "such access cycles" of 125 ns for just one video read, while I'd assume two video reads in 166ns worst case. With the -60 speedgrade you mentioned, I still see two video reads in 125 ns (RASL low, keep, CASL low, CASL high, CASL low, RASL+CASL high, in 20.8ns steps). PLD timings for DRAM are much easier than taking 68020 and PLD into account. I might have overlooked a DRAM detail of course - just did a 3-minute check.

60ns DRAM is just at the limit for not being able to use a single cycle CAS low, CAS high, so two cycles have to be used - and in fact taking into account the setup and hold parameters it becomes even more justified.
I was a bit lucky because I did do a more precise calculation for the 020/030. Also, because the CPU uses a whole number of cycles at 1/2 the CPLD clock, all timing has to be based on an even number of clock cycles.
One thing which frustrates me is that some CPU timing specifications are so atrociously specified. The CPU uses both clock edges, but some delays are specified to be anything between almost zero and the entire half clock cycle?! I am sure it's not that bad for real but... oh well.
All of these conspire for extra cycles, so you end up with 6 for the CPU access (and this is VERY close to the limits), and it would be 10 for a double access using fast page mode. I surmised it's not much of an improvement so perhaps one could capitalize on logic simplification if both CPU and screen used the same basic cycle.
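A rough Python comparison of the schemes discussed so far, counted in 48MHz clocks per display line. It assumes 32 long words per line, 768 lines and a 60Hz refresh as earlier in the thread; the scheme labels and the helper name `video_share` are only illustrative.

```python
# Video share of DRAM time for each access scheme, in 48 MHz clock units.
CLOCK_HZ = 48_000_000
REFRESH_HZ = 60
DISPLAY_LINES = 768
LONGWORDS_PER_LINE = 32
CLOCKS_PER_FRAME = CLOCK_HZ // REFRESH_HZ   # 800000

# scheme name -> (long words fetched per access, clocks per access)
SCHEMES = {
    "single access, 6 clocks": (1, 6),
    "FPM double access, 10 clocks": (2, 10),
    "FPM double access, 6 clocks (Peter's sequence)": (2, 6),
}


def video_share(longwords_per_access, clocks_per_access):
    """Fraction of all 48 MHz clocks in a frame spent on video fetches."""
    accesses_per_line = LONGWORDS_PER_LINE // longwords_per_access
    clocks_per_line = accesses_per_line * clocks_per_access
    return DISPLAY_LINES * clocks_per_line / CLOCKS_PER_FRAME


for name, (longwords, clocks) in SCHEMES.items():
    print(f"{name}: {video_share(longwords, clocks):.2%}")
# single access, 6 clocks: 18.43%
# FPM double access, 10 clocks: 15.36%
# FPM double access, 6 clocks (Peter's sequence): 9.22%
```

In this model the 10-clock double access saves only about three percentage points over the plain 6-clock cycle, which is the "not much of an improvement" case.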

Nasta wrote: It should be noted that using a FPGA instead and being able to set up larger buffers for video data inside it, opens up a whole new world of possibility, even more so if SDRAM is used, like on the Q68 - quite probably one could do Q68 graphics modes without the 'SGC replacement' being any slower than the actual SGC, probably even a bit faster.
Since 5V tolerant FPGAs are not an option, this would break the idea to "just enlarge an EP1810 replacement PLD".
It makes little sense to surround the whole FPGA with level shifters - so once an FPGA is there, I think the old SGC circuitry is almost completely dead.

Agreed... though shifters can be optimized, fortunately there are 32-bit wide ones in manageable cases, but PCB layout would need a LOT of attention. That being said, once one does the bill of materials and factors in the work, it is hard to justify unless you really want to put in some decent features to offset it and make it worthwhile for the user. There are some setups where it would likely be fine. On the other hand, if integrating a 68k core into the FPGA is a possibility, it becomes a very different project.
One BIG advantage in that case is that the memory bus can be completely decoupled from the expansion bus and dedicated to accessing memory as fast as possible.

Nasta wrote:I might be wrong but it seems to me that Tetroid is for now limited to just using the JEDEC file to program the device, rather than being able to compile the logic from the source code, at least this is my (un)educated guess and I am glad to be corrected on this point.
I just guessed Tetroid has a lot of motivation to get rid of the EP1810, so I brought up the question of adding video as a "side effect". Maybe I was too optimistic to even consider it. Maybe not... Tetroid did not join the discussion - which could also mean that he is doing QL work while I just talk
Peter

Hehe, me too (or rather write)

The Sinclair QL Forum