Q68 speed vs 680X0

FrancoisLanciault
Trump Card
Posts: 167
Joined: Mon Aug 08, 2011 11:08 pm

Q68 speed vs 680X0

Post by FrancoisLanciault »

Hi all,

This topic is copied from the "SGC successor brainstorming" thread so as not to deviate too much from that thread's initial subject.

The following is the original post, followed by some additional thoughts by myself.

Original post
------------------------------------------------------------------------------------------------------------------

Hi Peter and all,

I have benchmarked the Q68 (SRAM area) against a 68030 @ 56 MHz (not 50 MHz as I first thought). The results are surprising!

The benchmark program is pure 68000 machine code. It is a routine that computes the number of prime numbers smaller than the input value. It is not the fastest algorithm by far, but it uses no memory: everything is computed using the D0 to D7 68000 registers. It is, however, much faster than checking all numbers for primality! The program is 210 bytes long, so it fits without problem in the SRAM area of the Q68. I can supply the code/listing if anyone wants to do more tests.

This is slightly OT so please feel free to start another thread if you wish to comment.

Here are the results. Again, all computers use the same base 68000 assembly code. All QL-based computers were running SMSQ/E. The Q68 and SGC (Aurora) were displaying a normal mode 4, 512x256 display.

For an input value of 2000000

Gold Card
Processor: 68000 @ 16 MHz
Time: 660 secs

SuperGoldCard
Processor: 68020 @ 24 MHz
Time: 189.9 secs

Q68 (normal RAM)
Processor: FPGA @ 40? MHz
Time: 136.4 secs

Q68 (static RAM)
Processor: FPGA @ 40? MHz
Time: 47.8 secs

Amiga 1200 (with accelerator board)
Processor: 68030 @ 56 MHz
Time: 71.0 secs

NeXT Station Turbo
Processor: 68040 @ 33 MHz
Time: 66.7 secs

*************** The Q68 beats them all !!! :-) ******************
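
A quick way to look at these figures per clock rather than in wall time is sketched below in Python. This is not part of the original benchmark: the names RESULTS, speedup_vs_first and giga_cycles are made up for illustration, and the Q68 clock is taken as 40 MHz, which the list above still marks with a question mark.

```python
# Wall-clock results from the list above: (machine, clock in MHz, time in seconds).
# Assumption: the Q68 clock is 40 MHz (marked as uncertain in the post).
RESULTS = [
    ("Gold Card 68000", 16, 660.0),
    ("SuperGoldCard 68020", 24, 189.9),
    ("Q68 normal RAM", 40, 136.4),
    ("Q68 static RAM", 40, 47.8),
    ("Amiga 1200 68030", 56, 71.0),
    ("NeXT Station Turbo 68040", 33, 66.7),
]


def speedup_vs_first(results):
    """Return {machine: speed relative to the first entry (higher is faster)}."""
    base_secs = results[0][2]
    return {name: base_secs / secs for name, _, secs in results}


def giga_cycles(results):
    """Return {machine: clock cycles spent, in billions} = MHz * seconds / 1000."""
    return {name: mhz * secs / 1000.0 for name, mhz, secs in results}


if __name__ == "__main__":
    speed = speedup_vs_first(RESULTS)
    cycles = giga_cycles(RESULTS)
    for name, mhz, secs in RESULTS:
        print(f"{name:26s} {secs:6.1f} s  x{speed[name]:5.2f}  {cycles[name]:5.2f} Gcycles")
```

Multiplying clock by time gives about 1.9 billion cycles for the Q68 in SRAM, against about 2.2 billion for the 68040 and about 4.0 billion for the 68030, so the Q68 needs the fewest clock cycles as well as the least time.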

The 68030 processor in the Amiga is no slouch. Here are the specifications of the accelerator card for those who understand such things as burst timing (source: Big Book of Amiga Hardware).

Individual Computers ACA-1230 introduced in 2010

CPU: 68EC030 @ 28MHz or 68030 @ 42 / 56 MHz, PGA
all processors are slightly overclocked to allow for a synchronous board design, the nominal speeds are 25 / 40 / 50 MHz
no FPU option
very fast burst timings: 2-1-1-1 (28 MHz), 3-1-1-1 (42 / 56 MHz) memory
64 MB SD-RAM, soldered to the board

memory clock: 56 MHz for 28/56 MHz versions, 42 MHz for 42 MHz CPU

the first processor card to feature a -1-1-1 burst synchronous design
no FPU option as this would have caused too much load on the data bus and would have increased burst timing

François

-----------------------------------------------------------------------------------------------------------
End of original post.

The program is so small it must run 100% in the 68020+ caches. Q68 speedup should be even more significant if the program is > 680X0 cache size but < Q68 SRAM size.
The Q68 running speed from SRAM is faster than I would have expected. That brings the Q68 well into the 68040 league.
The same program could be coded to take advantage of the 680X0-specific instructions. As there are a lot of 32-bit multiplications and divisions, the speed gain would be significant for those processors.
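
The cache argument above can be checked with a few lines of Python. This is only a sketch: the cache sizes are the on-chip instruction cache sizes from the Motorola data sheets (256 bytes on the 68020 and 68030, 4 KB on the 68040), the 210 bytes is the program size quoted in the original post, and the names ICACHE_BYTES and fits_in_icache are invented for the example.

```python
# On-chip instruction cache sizes in bytes (Motorola data sheet figures).
ICACHE_BYTES = {"68020": 256, "68030": 256, "68040": 4096}

# Size of the benchmark routine as stated in the original post.
PROGRAM_BYTES = 210


def fits_in_icache(program_bytes, cache_bytes):
    """True if the whole program can sit in the instruction cache at once."""
    return program_bytes <= cache_bytes


if __name__ == "__main__":
    for cpu, size in ICACHE_BYTES.items():
        ok = fits_in_icache(PROGRAM_BYTES, size)
        print(f"{cpu}: fits={ok}, headroom={size - PROGRAM_BYTES} bytes")
```

A size check like this says nothing about the initial cache fill; it only suggests that, with the cache enabled, the loop should not need to be refetched from main memory once it is loaded.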

Well done Peter.

François


Peter
QL Wafer Drive
Posts: 1948
Joined: Sat Jan 22, 2011 8:47 am

Re: Q68 speed vs 680X0

Post by Peter »

Hi François,

thanks for the interesting results. You can remove the question marks at 40 MHz for the Q68 CPU, it is the actual clock rate. I have a question:

Since we came from the SGC successor discussion, where a 50 MHz 68EC030 was considered versus a Q68 with (yet unfinished) cache:
Could you provide the Amiga 68EC030 accelerator board result at 28 MHz? At this clock rate, it supports an ideal 2-1-1-1 cycle burst, which is the fastest possible operation of a 68EC030. If we multiply that time by 0.56 (28 MHz / 50 MHz), we should get the figure for a perfect 50 MHz 030 system.

Remarks:

The relatively simple Q68 cache I'm working on does not support DRAM burst cycles, which makes it less efficient than the 68EC030. On the other hand, it would be at least 4 times larger in size (possibly 32 times larger if I sacrifice the fixed SRAM area and use the resources for cache instead). As an additional advantage, the Q68 cache would be organized in 16-bit short words, not in lines of 4 x 32-bit long words, giving a 16 times higher granularity, which might suit QL-style code very well. If I can complete my work, it will be very interesting to see how this compares to the 030 cache in "real life".

It is possible to provide a 68020-style CPU inside the Q68, which would allow a comparison with 32-bit MUL/DIV. I decided on the 68000 because of its higher compatibility with old "retro" software and its smaller size, which leaves room to add other features in a possible upgrade. Let me add that the CPU inside the Q68 is open source; my contribution is only to debug it and to optimize system integration and timings, so that even the relatively old Q68 FPGA can run it at 40 MHz.

Peter


tofro
Font of All Knowledge
Posts: 2679
Joined: Sun Feb 13, 2011 10:53 pm
Location: SW Germany

Re: Q68 speed vs 680X0

Post by tofro »

François,

could you possibly post your benchmark program here, so we could give it a run on other platforms?

Thanks,
Tobias


ʎɐqǝ ɯoɹɟ ǝq oʇ ƃuᴉoƃ ʇou sᴉ pɹɐoqʎǝʞ ʇxǝu ʎɯ 'ɹɐǝp ɥO
Peter
QL Wafer Drive
Posts: 1948
Joined: Sat Jan 22, 2011 8:47 am

Re: Q68 speed vs 680X0

Post by Peter »

François,

and out of curiosity: Do you own a NeXT Station Turbo (rare machine) or did you just read the figure somewhere?

Peter


Nasta
Gold Card
Posts: 443
Joined: Sun Feb 12, 2012 2:02 am
Location: Zapresic, Croatia

Re: Q68 speed vs 680X0

Post by Nasta »

68000, Q68 FPGA implementation, SRAM @ 40 MHz; Time: 47.8 secs
68030 @ 56 MHz, we can assume execution from cache; Time: 71.0 secs
68040 @ 33 MHz, we can assume execution from cache; Time: 66.7 secs

This is a VERY interesting comparison.

Let's start with a comparison between the 030 and the 040. It has been postulated many times that the 040 is a highly integrated (MMU + FPU), clock-doubled 030 with larger caches. Given that the benchmark is so small that it runs entirely from the cache, we can assume the fastest execution, or close enough. To test the clock-doubled theory, let's assume that the 040 @ 33 MHz is a 030 @ 66 MHz. To compare, let's scale the 030 result to 66 MHz, and lo and behold we get:
68040 @ 33 MHz - 66.7 secs vs 68030 @ 66 MHz - 60.24 secs. This pretty much proves the thesis, except that a 68030 at twice the 040 clock is actually 10% faster!

Now the Q68. The 030 approaches 1 instruction per 2 cycles, since the whole benchmark fits inside the cache. If we scale its result to 80 MHz (to match 2 x 40 MHz of the Q68) and compare it with the Q68 running from SRAM at nearly 1 instruction per cycle, we get 49.7 secs, so the Q68 is actually faster by about 4% when running from full-speed memory. That is an excellent result.

Of course, the 4% and 10% differences may well be due to initial cache loading, but the figures are close enough to make a valid comparison.
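
The scaling used above can be reproduced with a short Python sketch. It assumes, as the post does, that run time is inversely proportional to clock rate when everything runs from cache; the function name scale_to_clock and the constants are made up for illustration, and the three times are the measured figures quoted at the top of this post.

```python
def scale_to_clock(secs, from_mhz, to_mhz):
    """Estimate run time at to_mhz, assuming time is inversely proportional to clock."""
    return secs * from_mhz / to_mhz


T_030_AT_56 = 71.0   # measured: Amiga 1200, 68030 @ 56 MHz
T_040_AT_33 = 66.7   # measured: NeXT Station Turbo, 68040 @ 33 MHz
T_Q68_SRAM = 47.8    # measured: Q68 from SRAM @ 40 MHz

# "Clock-doubled 030" stand-in for the 040: 2 x 33 MHz = 66 MHz.
t_030_at_66 = scale_to_clock(T_030_AT_56, 56, 66)
# 030 scaled to 2 x 40 MHz, to match the Q68 at one instruction per cycle.
t_030_at_80 = scale_to_clock(T_030_AT_56, 56, 80)

if __name__ == "__main__":
    print(f"030 scaled to 66 MHz: {t_030_at_66:.2f} s "
          f"({T_040_AT_33 / t_030_at_66 - 1:.1%} faster than the 040)")
    print(f"030 scaled to 80 MHz: {t_030_at_80:.2f} s "
          f"(Q68 is {t_030_at_80 / T_Q68_SRAM - 1:.1%} faster)")
```

The two scaled times come out at 60.24 and 49.7 secs, matching the figures in the text.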
Last edited by Nasta on Mon Nov 26, 2018 9:54 pm, edited 3 times in total.


FrancoisLanciault
Trump Card
Posts: 167
Joined: Mon Aug 08, 2011 11:08 pm

Re: Q68 speed vs 680X0

Post by FrancoisLanciault »

Peter wrote:François,

and out of curiosity: Do you own a NeXT Station Turbo (rare machine) or did you just read the figure somewhere?

Peter
I own all the machines that were benchmarked in my post. I wrote the benchmark program myself.

I also have 6 other 680X0 based machines that were not benchmarked. Sadly, this is not a joke...

My 680X0 based machines:

Amiga 500
Amiga 1000
Amiga 1200

GC Sinclair QL
SGC Sinclair QL
Q68

Macintosh SE

NeXT Station 25 MHz
NeXT Station Turbo 33 MHz

HP-9816
HP-9817


François


FrancoisLanciault
Trump Card
Posts: 167
Joined: Mon Aug 08, 2011 11:08 pm

Re: Q68 speed vs 680X0

Post by FrancoisLanciault »

tofro wrote:François,

could you possibly post your benchmark program here, so we could give it a run on other platforms?

Thanks,
Tobias
OK, I will do that tonight. QL ready-to-run version or source?

François


Peter
QL Wafer Drive
Posts: 1948
Joined: Sat Jan 22, 2011 8:47 am

Re: Q68 speed vs 680X0

Post by Peter »

QL ready to run for me, please. I'm very curious how the 80 MHz Q60 performs.


FrancoisLanciault
Trump Card
Posts: 167
Joined: Mon Aug 08, 2011 11:08 pm

Re: Q68 speed vs 680X0

Post by FrancoisLanciault »

Peter wrote:Hi François,

Since we came from the SGC successor discussion, where a 50 MHz 68EC030 was considered versus a Q68 with (yet unfinished) cache:
Could you provide the Amiga 68EC030 accelerator board result with 28 MHz? At this clock rate, it supports an ideal 2-1-1-1 cycle burst which is the fastest possible operation of a 68EC030. If we multiply that by 0.56 we should get the figure for a perfect 50 MHz 030 system.

Peter
Peter,

Individual Computers offered three flavors of the 1230 accelerator: one at 28 MHz, one at 42 MHz and one at 56 MHz, depending on the clock rate of the 68030 chip that was installed (and how much you were willing to pay).

I don't know if my 56 MHz board can be operated at 28 MHz. Maybe there is a jumper somewhere to choose the frequency. I will check.

François


Peter
QL Wafer Drive
Posts: 1948
Joined: Sat Jan 22, 2011 8:47 am

Re: Q68 speed vs 680X0

Post by Peter »

Nasta wrote:On the other hand, 040 is kind of like a clock doubled 030. Let's test this by scaling up the 030 results from 56 to 66MHz (33MHz on the 040, x2) We get 65.33secs.
Interesting conclusion and amazingly close to the measured figure...

