Peter wrote:
Nasta wrote:
The situation is somewhat better using more modern, or I should say, less old DRAM chips with faster access time (60ns), but a back-of-the-envelope calculation says it would slow down DRAM access by about 20-30%, which is rather bad just to get the old stuff.
Okay, let's do a two-line forum calculation: we need 32 x 32 horizontal data bits for MODE 4. If fetched into a 64-bit buffer, i.e. with 2 consecutive reads, that takes 16 x 125 ns = 2 µs per line for old (SGC generation) FPM DRAM. At 60 Hz and VESA 1024x768, that is only 9.2% of memory bandwidth; let's say 10% to allow for clock frequencies that do not match exactly.
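For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation above. The constant names are made up for illustration, and it counts only the 768 active lines, as the post does, ignoring blanking, refresh and arbitration.

```python
# Back-of-the-envelope check of the MODE 4 fetch cost quoted above.
# All names are illustrative; the numbers are the ones given in the post.
LONGWORDS_PER_LINE = 32   # 32 x 32 data bits per MODE 4 line
READS_PER_BURST = 2       # 64-bit buffer filled by 2 consecutive reads
BURST_TIME_NS = 125       # time per 2-read burst on old FPM DRAM
LINES_PER_FRAME = 768     # active lines at VESA 1024x768
FRAME_RATE_HZ = 60

bursts_per_line = LONGWORDS_PER_LINE // READS_PER_BURST   # 16
line_time_ns = bursts_per_line * BURST_TIME_NS            # 2000 ns = 2 us

# Total DRAM time spent on video in one second, as a fraction of that second.
video_ns_per_second = line_time_ns * LINES_PER_FRAME * FRAME_RATE_HZ
video_share = video_ns_per_second / 1e9

print(f"{line_time_ns} ns per line, {video_share:.1%} of memory bandwidth")
```

Changing BURST_TIME_NS or READS_PER_BURST shows how faster DRAM or a narrower buffer would shift the figure.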
So I was not completely wrong with my idea. The microdrive and net timing loops running from cache would not suffer.
Looking at typical system usage speed, the massively faster video should compensate for the small loss of RAM speed, which is further reduced by the 68020 cache. E.g. if Tetroid grabbed a 5V tolerant PLD like the Lattice LC4256 in 144 pin TQFP, he'd have 256 macrocells instead of 48, and all the pins he needs, in the same board space as the EP1810 socket. Maybe even half that number of macrocells would do, especially if faster DRAM allows 32-bit buffers. In a first step, the new data and RGB/sync lines could stay unused, making it just a 1:1 replacement of the original pin functions in a new package.
Remember: I wrote this under the assumption that Tetroid works on replacing the EP1810 anyway, i.e. the logic equations were recovered or rewritten.
I'll try to answer more later.
Peter
I never said the idea was wrong. The question is whether it is worth doing QL resolution at that bandwidth cost, or adding VRAM and doing a LOT more graphics at nearly no bandwidth cost. When I say 'a lot', we are talking Q40/60/68 type graphics.
Now, my calculation was based on a 32-bit buffer, since wide buses and 'registers' inside segmented PLDs tend to gobble up a lot of resources. It was also based on the availability of a double clock, at 48MHz, with 60ns DRAM used as the template. 24MHz was chosen on the SGC rather than 25 because it is a small performance hit compared to the maximum but relaxes some timing a bit, and it is also re-used for the floppy controller chip; I put in a double-speed oscillator to be able to target DRAM timing better. What you get is that you can just about squeeze out 0 wait state operation for the 68020 at 24MHz, but it is so tight that the PCB layout would have to be superb. Even so, let's go with this assumption.
Because timings ultimately have to be expressed as a whole number of 48MHz clock cycles, fast page mode access of the DRAM shows fairly little improvement until you get to 3 consecutive long words. Surprisingly, the situation is better in relative terms with the actual 80ns DRAM as used on the SGC, but then the 68020 already takes a 1 wait state hit.
Taking into account what kind of realistic DRAM cycles can be generated from a 48MHz clock (and I did not go into a deep analysis), we can calculate that we can squeeze 133333 access cycles into one 60Hz refresh period. This would be the theoretical maximum bandwidth if it were available to the 68020 alone. Since we need 32 such access cycles per display line and there are 768 display lines within one 60Hz refresh period, we need to take 24576 cycles for video generation. A quick division gives us 18.43% of the bandwidth taken for video. The need to express all timings in terms of 48MHz clock cycles imposes quite a penalty compared to a quick theoretical calculation.
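The division above can be reproduced with a few lines of Python. This is only a sketch: the 6-clock access length is an assumption inferred from the 133333 figure (800000 clocks per refresh period divided by 133333), not something stated explicitly, and the names are made up for illustration.

```python
# Video share of DRAM bandwidth when every access is a whole number
# of 48 MHz clocks. Names are illustrative.
CLOCK_HZ = 48_000_000
REFRESH_HZ = 60
CLOCKS_PER_ACCESS = 6     # ASSUMPTION: inferred from the 133333 figure (6 x 20.83 ns = 125 ns)
ACCESSES_PER_LINE = 32    # 32-bit buffer, 32 long words per line
LINES_PER_FRAME = 768

clocks_per_frame = CLOCK_HZ // REFRESH_HZ                     # 800000
accesses_per_frame = clocks_per_frame // CLOCKS_PER_ACCESS    # 133333
video_accesses = ACCESSES_PER_LINE * LINES_PER_FRAME          # 24576
video_share = video_accesses / accesses_per_frame

print(f"{video_accesses} of {accesses_per_frame} accesses = {video_share:.2%}")
```

Note that each long word costs a full access here, which is why the result is about twice the figure obtained with a 64-bit buffer and paired reads.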
Even this is not a big problem. On the SGC there is a 1 cycle wait state at 24MHz (as far as I can tell, possibly 2?), which in itself means a penalty of 33333 cycles out of the ideal 133333, greater than what was calculated for video in the above example. Such an improved SGC would actually be somewhat faster than the current one.
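To put numbers on that comparison, here is a rough Python sketch. It assumes the usual 68020 bus cycle of 3 clocks with no wait states and 4 clocks with one, takes the 24576 video accesses from the figures above, and ignores arbitration and refresh losses, so treat it as an illustration rather than a prediction.

```python
# Compare CPU accesses per 60 Hz refresh period:
#   current SGC (1 wait state) vs. hypothetical 0-wait design that also feeds video.
CPU_CLOCK_HZ = 24_000_000
REFRESH_HZ = 60
CLOCKS_ZERO_WAIT = 3      # ASSUMPTION: 68020 bus cycle without wait states
CLOCKS_ONE_WAIT = 4       # ASSUMPTION: same cycle with 1 wait state
VIDEO_ACCESSES = 32 * 768  # 24576 accesses taken by video per refresh period

clocks_per_frame = CPU_CLOCK_HZ // REFRESH_HZ                 # 400000
ideal = clocks_per_frame // CLOCKS_ZERO_WAIT                  # 133333
current_sgc = clocks_per_frame // CLOCKS_ONE_WAIT             # 100000
wait_state_penalty = ideal - current_sgc                      # 33333

# What is left for the CPU once video has taken its share.
improved_sgc = ideal - VIDEO_ACCESSES                         # 108757

print(f"wait-state penalty: {wait_state_penalty}, video cost: {VIDEO_ACCESSES}")
print(f"CPU accesses: current {current_sgc}, improved {improved_sgc}")
```

With 2 wait states on the current SGC (5 clocks per access) the gap in favour of the improved design would be larger still.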
Now, that being said, the above calculation has some key factors that make it optimistic:
1) 60ns DRAM rather than the original 80ns, though this would not be a problem, since nowadays if you find DRAM at all it's likely to be latest generation and that fast, or sometimes even faster.
2) Availability of double clock. Without this, the basic unit of time used to generate DRAM timing in the logic is too large and the DRAM bandwidth would be used quite poorly.
3) No further penalties were calculated, such as arbitration between screen and CPU accesses, which always goes to the detriment of the CPU, since it is the only one that can be made to wait. A few further cycles may also be needed for DRAM refresh, though this can probably be avoided just by properly multiplexing the address lines and having the repeating screen area reads serve as refresh.
4) Complete and efficient rewrite of the INGOT logic. This is quite important because the EP1810 CPLD, while having only 48 macrocells, has quite extensive routing options for them (much more than most of today's CPLDs), so resources that would remain orphaned in more modern CPLDs (i.e. unusable because signal routing options have been exhausted) can be reused in it. While modern CPLDs have a LOT more resources, segmenting and the addition of a 32-bit bus to the CPLD would require a hefty re-hash of the logic, which might require other compromises. Directly emulating the INGOT would not give the optimal logic implementation for a newer CPLD.
Also, one factor may make the calculation pessimistic: it assumes a CPLD. Implementing a more clever buffering scheme would result in an even smaller bandwidth penalty; however, this is probably not feasible, as it would require twice as large an FPGA just for the buffering.
Finally, keep in mind that the calculation starts from a 'maximum theoretical speed we can get out of a 68020 at 24MHz given 60ns DRAM' baseline, not an 'actual SGC' baseline.
Hence my original guesstimate. Just re-using SGC assumptions (80ns DRAM, 24MHz clock) gets us to my original figures and likely much worse.
Of course, the point about cache and the 020 being a bit more clever in managing wait states is well made.
It should be noted that using an FPGA instead, and being able to set up larger buffers for video data inside it, opens up a whole new world of possibility, even more so if SDRAM is used, like on the Q68 - quite probably one could do Q68 graphics modes without the 'SGC replacement' being any slower than the actual SGC, probably even a bit faster.
BTW I have looked into the source for the INGOT20. Unfortunately the file seems to be corrupt, but the gist of it can be figured out. Some of it is very clever, as one would no doubt expect given its designers. However, without a manual for the EP1810 and the compiler used to produce the programming code, it is very difficult to figure out some important parts. For instance, that CPLD makes it possible to use a pin for combinatorial (and possibly even latched) input or output while its associated macrocell, with more advanced functions, becomes buried and can be used independently. However, this is not declared using the syntax of the 'programming language' but rather explicitly, in the definition of the pin/macrocell configuration, 'by hand' as it were. The source code uses an abbreviated (single character per option select) form of declaration, so for some of the cells it's difficult to be sure how they are configured. It would perhaps be simpler to completely re-create the logic from scratch, knowing what it's supposed to do. I might be wrong, but it seems to me that Tetroid is for now limited to just using the JEDEC file to program the device, rather than being able to compile the logic from the source code. At least this is my (un)educated guess, and I am glad to be corrected on this point.