Fast plotting (and unplotting) sprites / Q68

Zarchos
Trump Card
Posts: 152
Joined: Mon May 08, 2017 11:49 am

Fast plotting (and unplotting) sprites / Q68

Post by Zarchos »

Hi all.

I am starting this thread, which is in a way a follow-up to this one: http://qlforum.co.uk/viewtopic.php?f=3&t=2190


I propose we think together about the fastest way to plot and unplot sprites on the Q68.
I have developed some fast routines on the Archie, which is in some ways a machine very close to the Q68.
Why?
Well, both share the following:
- Very fast CPU
- 'Chunky' or 'linear' screen modes, where 1 byte, or 2 consecutive bytes, define a pixel on screen, and
the screen memory layout is 'easy' or 'logical' to understand, meaning:
what you see on screen is a direct reading of the screen memory area. The top left pixel comes from the first address of screen memory,
and reading memory consecutively builds the visible screen linearly, from left to right and line by line,
until the last pixel (bottom right), which comes from the last byte, or last 2 consecutive bytes, of the screen memory area.

- The Q68 has the advantage of hi colour screen modes, and also
° A lot of memory
° An extra super fast area of memory .... Very, very, very promising ... it will have to be used, over used, intensively, until Death ;-)

Please read this to understand the method I have used :
http://www.stardot.org.uk/forums/viewto ... 13+sprites
On the Archimedes, mode 13 is 320 x 256, 8 bits per pixel (and linear, or 'chunky', as are all modes on this machine).
Basically:
A sprite is first fully scanned (with a tool written in BASIC) to collect all the information describing it within the rectangle it fits into.
Its visible pixels are extracted for each of its lines.
This information is what your game or demo will use, and it is stored on disk.
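
As an illustration of the scanning step above, here is a minimal sketch in Python (standing in for the BASIC tool, which is not shown in this thread). It assumes colour 0 marks a transparent pixel; the function name `extract_segments` and the `(row, start_x, pixels)` tuple layout are hypothetical choices for this example, not the format of the original tool.

```python
# Hypothetical sprite scanner: split each sprite row into runs of visible pixels.
TRANSPARENT = 0  # assumption: colour 0 marks a transparent pixel


def extract_segments(sprite):
    """Return a list of (row, start_x, pixels) for every run of visible pixels."""
    segments = []
    for y, row in enumerate(sprite):
        x = 0
        while x < len(row):
            if row[x] == TRANSPARENT:
                x += 1
                continue
            start = x
            # Extend the run until a transparent pixel or the end of the row.
            while x < len(row) and row[x] != TRANSPARENT:
                x += 1
            segments.append((y, start, list(row[start:x])))
    return segments


sprite = [
    [0, 5, 5, 0, 7],   # two separate runs on the same scanline
    [0, 0, 0, 0, 0],   # fully transparent line: no segment at all
    [3, 3, 3, 3, 3],   # one run covering the whole line
]
segments = extract_segments(sprite)
for seg in segments:
    print(seg)
```

Only the visible runs survive, so the plotting side never has to test for transparent pixels.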

For each possible segment length in pixels (I did 1 to 384), your game or demo contains the fastest possible ASM code to load that many pixels and plot them on screen.
Yes, we are working with generated code, but my idea is to have the code generator inside your game or demo. When you know that in a few frames you will need to display this or that sprite, you call the code generator, so
that the creation of the shifted data, and of the actual ASM instructions that plot your sprite segment by segment and line by line, is 'sliced' over a series of frames, using the CPU cycles left over in each frame. (This way the action on screen does not suffer from a framerate drop while the opcodes are being generated.)
I have chosen the smallest unit of work for one call to the code generator to be the opcodes for one horizontal segment of a sprite. (A sprite can have several horizontal segments on a given scanline if that scanline has transparent pixels between consecutive runs of visible pixels.)
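
The 'slicing' idea can be modelled in a few lines of Python. This is only a sketch of the scheduling, under stated assumptions: the budget is counted in segments per frame rather than in CPU cycles, and `SlicedGenerator`, `request`, `run_frame` and `is_ready` are hypothetical names invented for this example. The `emit` callable stands in for whatever really produces the opcodes for one segment.

```python
from collections import deque


class SlicedGenerator:
    """Generates code one horizontal segment at a time, within a per-frame budget."""

    def __init__(self, emit):
        self.emit = emit        # callable: segment -> generated code (a stand-in here)
        self.pending = deque()  # (sprite name, segment) pairs still waiting for code
        self.ready = {}         # sprite name -> list of generated chunks

    def request(self, name, segments):
        """Queue every segment of a sprite that will be needed in a few frames."""
        self.ready.setdefault(name, [])
        for seg in segments:
            self.pending.append((name, seg))

    def run_frame(self, budget):
        """Generate code for at most `budget` segments; return how many were done."""
        done = 0
        while self.pending and done < budget:
            name, seg = self.pending.popleft()
            self.ready[name].append(self.emit(seg))
            done += 1
        return done

    def is_ready(self, name):
        """True once no segment of this sprite is still waiting."""
        return name in self.ready and all(n != name for n, _ in self.pending)


# Segments are (row, start_x, length); the emitter just returns a description.
gen = SlicedGenerator(lambda seg: "code for %d pixels" % seg[2])
gen.request("ship", [(0, 1, 2), (0, 4, 1), (2, 0, 5)])
first = gen.run_frame(budget=2)    # frame 1: two segments fit in the spare time
second = gen.run_frame(budget=2)   # frame 2: the last segment
print(first, second, gen.is_ready("ship"))
```

In a real game the budget would come from the time left before the next frame, and the sprite would only be plotted once `is_ready` reports that all of its segments have code.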

The unplotting side (I am unhappy with the algorithm I have so far) is also meant to generate the ASM instructions that partially delete your sprite, restoring the background by reading pixels from the area of memory where the unaltered background is stored. That is: it restores only what is necessary, namely the visible difference in the background between the previously displayed frame and the next frame to display.
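
To make 'only what is necessary' concrete, here is a small Python sketch. It is not the algorithm referred to above (which is not shown in this thread), just a plain set difference under simple assumptions: one sprite, described as `(row, start_x, length)` segments, moving between two positions. The names `covered` and `restore_spans` are hypothetical.

```python
def covered(segments, ox, oy):
    """Set of screen (x, y) pixels covered by the segments when placed at (ox, oy)."""
    return {(ox + sx + i, oy + row)
            for row, sx, length in segments
            for i in range(length)}


def restore_spans(old, new):
    """Horizontal runs (y, x_start, length) that were covered by `old` but not by `new`."""
    stale = sorted(old - new, key=lambda p: (p[1], p[0]))  # sort by line, then column
    spans = []
    for x, y in stale:
        # Extend the current run if this pixel directly follows it on the same line.
        if spans and spans[-1][0] == y and spans[-1][1] + spans[-1][2] == x:
            spans[-1][2] += 1
        else:
            spans.append([y, x, 1])
    return [tuple(s) for s in spans]


segments = [(0, 0, 4), (1, 1, 2)]   # (row, start_x, length)
old = covered(segments, 10, 5)      # position in the previous frame
new = covered(segments, 12, 5)      # sprite moved 2 pixels to the right
spans = restore_spans(old, new)
print(spans)
```

Each resulting span is a run of background pixels to copy back from the stored background; pixels that the sprite will overwrite again in the next frame are left alone.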

I hope you will enjoy sharing ideas with me to decide which 68000 instructions are the fastest to use.
I have spent a lot of time on the Archie finding the best ARM instructions for this.
You can see the results (well, only the plotting algorithms) on my YT channel, if you search for 'WIP' or 'POC' (for Proof Of Concept) ***** in my uploaded videos.

Most tools I have used are in BBC BASIC (I have only ever learnt BASIC on the Speccy). They are very crude and not sophisticated at all, so converting them to SuperBASIC should be very easy.

My knowledge of 68000 is far too poor to do this alone, and I believe it can be a great community project.
Everything we do will be freely usable by everyone, be it for PD software or commercial use, as long as there is a 'thank you' in the software
and / or the accompanying manual / documentation, with the names of all contributors.

Kindest regards,
Xavier.


***** : Much easier: I have also coded on the Archie what I believe are the fastest ever horizontal segment filling routines, for example for 3D (if shapes are big enough, i.e. not suitable for a game like Zarch, but most certainly for a game like Chocks Away or Chopper Force. Perfect also for a Bad Apple demo.)
Adapting them to the Q68 should be very easy, as there is no need for a code generator at all.
https://www.youtube.com/results?search_ ... g+routines

Still incomplete: I also have some routines for '1 load, N stores' of sprites, and 'see through, as through a window' routines.
Great for fast 2D arcade games, or demos.
Once ported to the Q68, they could be used very easily by everybody since, as for all of the above, the routines are called with only a handful of parameters.


Owner of various QLs including accelerated beasts, and also a happy Q68 owner ;)
Now porting SOTB to the Archies, to then port it to the Q68.
https://www.youtube.com/user/Archimedes ... +%28100%25
Derek_Stewart
Font of All Knowledge
Posts: 3929
Joined: Mon Dec 20, 2010 11:40 am
Location: Sunny Runcorn, Cheshire, UK

Re: Fast plotting (and unplotting) sprites / Q68

Post by Derek_Stewart »

Hi Xavier,

What sort of sprites are you talking about?

There are sprite facilities in the extended environment.


Regards,

Derek
Zarchos
Trump Card
Posts: 152
Joined: Mon May 08, 2017 11:49 am

Re: Fast plotting (and unplotting) sprites / Q68

Post by Zarchos »

Derek_Stewart wrote: Hi Xavier,

What sort of sprites are you talking about?

There are sprite facilities in the extended environment.
Sprites: that is, any kind of graphics, of whatever height, width or shape, that you intend to plot and possibly move on screen. For example, in a game like R-Type: your ship, enemy ships, weapons, animated explosions.
It can also be text characters, whatever.
Unless I have missed something in the specifications of the Q68, there is no hardware sprite plotting facility, so everything you want to plot and move on screen (without altering the background 'picture') has to be done by the CPU.
The goal of my routines is to be the fastest possible for the Q68. On the Archie, the technique I have used is at least 4 times faster than the commonly accepted optimised way to plot sprites on a background that has content.
Main reasons are :
- no loading of the background content, except very rare cases
- no loading and usage of sprite mask (to make a 'hole' in the background to later insert your sprite)
- algo avoids dealing with transparent pixels inside or surrounding your sprite

To make a long story short: to plot a sprite, what you want is to plot on screen all the horizontal segments of pixels that make up the sprite, no?
Well, that is exactly what I do: I load them, I plot them, and nothing more ;-)
KISS principle, in a way.
So yes, it is fast.
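
A minimal Python model of 'load them, plot them, and nothing more', for illustration only. It assumes a tiny hypothetical 8-bit-per-pixel linear screen held in a `bytearray`, segments stored as `(row, start_x, pixels)`, and no clipping; none of these names or sizes come from the real routines.

```python
WIDTH, HEIGHT = 16, 4                # hypothetical tiny 8-bit-per-pixel screen
screen = bytearray(WIDTH * HEIGHT)   # linear layout: offset = y * WIDTH + x


def plot(screen, segments, ox, oy):
    """Copy each run of visible pixels straight to the screen.

    No mask is loaded and the background is never read: transparent
    pixels are simply absent from the segment list. No clipping is done.
    """
    for row, sx, pixels in segments:
        offset = (oy + row) * WIDTH + ox + sx
        screen[offset:offset + len(pixels)] = pixels


segments = [
    (0, 1, bytes([5, 5])),
    (0, 4, bytes([7])),
    (2, 0, bytes([3, 3, 3, 3, 3])),
]
plot(screen, segments, ox=2, oy=1)
for y in range(HEIGHT):
    print(list(screen[y * WIDTH:(y + 1) * WIDTH]))
```

The work done is proportional to the number of visible pixels only, which is the point being made above.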


tofro
Font of All Knowledge
Posts: 2685
Joined: Sun Feb 13, 2011 10:53 pm
Location: SW Germany

Re: Fast plotting (and unplotting) sprites / Q68

Post by tofro »

No, there is no hardware sprite support in the Q68 (as well as in any QL compatible I know).

And yes, there is software support for sprites in the extended environment, "software support", however, is a relative term. Naturally, the EE implements sprites for a WIMP, and not for a game, so it is neither optimized for speed nor for specific game purposes. The only fast-moving screen object a WIMP needs is the mouse pointer ;) The EE also needs to be generic, as it needs to support basically all known graphics modes, while a game library could easily set other priorities (like trade memory footprint for speed) that the EE cannot.

A fast-drawing sprite library for the QL and lookalikes would be something useful, in my opinion.

Whether a sprite library that uses self-modifying code would necessarily be faster than a library that does it in a more "traditional" way is something to find out by experiment. Self-modifying code is, however, traditionally a no-no in QDOS and SMSQ/E as it collides with multi-tasking (QDOS considers code a potentially shared resource between jobs). Games could be an exception: they are typically not run in several instances. On a standard QL with its weird bitplane layout, I rather doubt such an approach would help. On a Q68 with its very linear screen memory, I suppose this approach could be faster, because you can dynamically unroll loops in SMC, which can give you a speed boost.

The 68000 is, however, very different from the ARM in the Archie, one being the high point of CISC development, the other being one of the first RISC architectures. So anything that's valid for ARM doesn't necessarily need to be true for m68k (like the ability for byte, word, or longword access, for example). The 68k with its way more complicated (and comfortable) instruction set and addressing modes might make SMC-based sprite routines much more complicated and at the same time might benefit much less from them than the ARM (just an assumption). A wide field for experiments opens up.

Tobias


ʎɐqǝ ɯoɹɟ ǝq oʇ ƃuᴉoƃ ʇou sᴉ pɹɐoqʎǝʞ ʇxǝu ʎɯ 'ɹɐǝp ɥO
Zarchos
Trump Card
Posts: 152
Joined: Mon May 08, 2017 11:49 am

Re: Fast plotting (and unplotting) sprites / Q68

Post by Zarchos »

tofro wrote: No, there is no hardware sprite support in the Q68 (as well as in any QL compatible I know). [...] A wide field for experiments opens up.
Well said Tobias !
That is why I think the study will be interesting.
I propose we start early 2018.
By then I should have read the MC68000 assembly language book I have started.
I am very confident that, even if the inner workings of the ARM code will be different on the Q68, the ideas used in the algorithms are still relevant.
To me it is going to be a win-win 'deal': you'll see the (very easy to understand) ARM instructions, and in return I'll get a better knowledge of the MC68000.
Furthermore it will be free for the community, and I hope a great incentive, or at least an important building block, for demo makers or game makers.

Once done, I believe porting Pacmania to the Q68 could be a nice next step, using the hires colour mode.
The logic in a game like Pacmania isn't complex, and I think the Archie version should be the easiest to disassemble, understand, and convert to the Q68.
You might find that strange, but compared to all the other MC68000 versions, which use specific hardware resources ** the Q68 hasn't got (video DMA, h-sync, hardware sprites), the Archie does everything 100% in software. To me this makes it the perfect candidate for an enhanced port to the Q68.

** Or no hardware resources at all (or next to nothing), as on the Atari ST, but there the gameplay area is too limited: using that code would imply a lot of rewriting, contrary to the ARM2 version.


tofro
Font of All Knowledge
Posts: 2685
Joined: Sun Feb 13, 2011 10:53 pm
Location: SW Germany

Re: Fast plotting (and unplotting) sprites / Q68

Post by tofro »

When faced with the problem of shoving bytes/words/longwords between contiguous and non-contiguous space (which is what sprite routines do) on the 68k, the difficulty actually is not to "find the fastest instructions" for that. Under ideal conditions (that is: all registers free to use, and the horizontal sprite byte count evenly divisible by the number of free registers * 4), this is indisputably something like

    ; a0 points to sprite data, a1 points into screen memory
    movem.l (a0)+,d0-d7/a2-a6   ; load 13 longwords (52 bytes) of sprite data
    movem.l d0-d7/a2-a6,(a1)    ; store them to the screen; a1 is not advanced
    adda.l  #linelen,a1         ; linelen is the constant offset between screen lines
That piece of code moves 52 bytes of sprite data at (a0) into the screen at (a1) and also adjusts the pointers for the next sprite line and screen line, with basically only 3 instructions. That is what we want to be able to use under "ideal conditions" (I'd guess ARM needs some more instructions to do that...).

The point on the 68k is rather: how much of a penalty (i.e. register saving, setting up data in a form suitable for such a subroutine) are we willing to pay, and where is the point at which that penalty becomes more expensive than the gain over a very straightforward (maybe dumb)

    ; a0 points to sprite data, a1 points to screen memory,
    ; d0 holds the pixel count - 1 (dbra runs the loop d0+1 times;
    ; each move.w copies one word, i.e. one pixel in a 16-bit mode)
loop:   move.w  (a0)+,(a1)+
        dbra    d0,loop
The latter piece of code is easily faster than the former for smaller pixel counts. The setup for the former only pays off for vast chunks of memory sized in multiples of 52 bytes. Obviously, there is a lot of in-between space between these two extremes.

The point I'm getting at is rather: with a CPU that has such a high code density as the 68k, it might just not be worth generating code on the fly. You can easily preallocate all the possible combinations of code in your fixed program space (re-write routine A for all possible line lengths, and maybe even sprite heights) and call them on the fly; memory is plentiful on the Q68. And that is probably what you want to find out. To me (but that's just a gut feeling at the moment, I can't prove it) it looks more promising to write assembler macros that expand to the "best code" for your sprite layout at compile time than to generate code at run time.
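
As a rough illustration of preparing one routine per length ahead of time, here is a Python sketch of a build-time script that writes out unrolled 68k source text. It is a stand-in for the assembler macros mentioned above, not a replacement for them. Assumptions: lengths are counted in bytes, plain `move` instructions are used instead of `movem`, a0 and a1 are even whenever a word or longword move executes, and the `copyN` labels and function names are hypothetical.

```python
def unrolled_copy(nbytes):
    """Return 68k source lines that copy nbytes from (a0)+ to (a1)+ without a loop."""
    longs, rest = divmod(nbytes, 4)
    lines = ["    move.l  (a0)+,(a1)+"] * longs
    if rest & 2:
        lines.append("    move.w  (a0)+,(a1)+")
    if rest & 1:
        lines.append("    move.b  (a0)+,(a1)+")
    return lines


def routine(nbytes):
    """One labelled, fully unrolled routine for a given segment length."""
    return ["copy%d:" % nbytes] + unrolled_copy(nbytes) + ["    rts"]


# Pre-build the source text for every segment length from 1 to 16 bytes.
table = {n: routine(n) for n in range(1, 17)}
print("\n".join(table[7]))
```

The printed text would be assembled once into the program, so nothing has to be generated at run time; whether this beats a `movem`-based routine for long segments is exactly the kind of thing that would need measuring.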

Tobias
Last edited by tofro on Fri Dec 08, 2017 11:10 am, edited 2 times in total.


vanpeebles
Commissario Pebbli
Posts: 2816
Joined: Sat Nov 20, 2010 7:13 pm
Location: North East UK

Re: Fast plotting (and unplotting) sprites / Q68

Post by vanpeebles »

The Q68 has a small area of fast ram. Maybe that could be used in some way? :)


tofro
Font of All Knowledge
Posts: 2685
Joined: Sun Feb 13, 2011 10:53 pm
Location: SW Germany

Re: Fast plotting (and unplotting) sprites / Q68

Post by tofro »

vanpeebles wrote:The Q68 has a small area of fast ram. Maybe that could be used in some way? :)
It definitely can and it definitely will help speeding up the code.

But: the faster RAM only helps speed up instruction fetches, so it can definitely speed up code, but sprite routines will always shove more memory around in slow RAM than they fetch instruction memory. The above three lines make 52 word accesses to "slow" memory and only 7 word fetches from "fast" memory when run from that memory area. So only about 12% of the memory accesses (7 of 59) will be sped up by fast memory, by roughly 100% (just a guess, and don't trust my math). The code would definitely be faster, but not very much faster (but every bit counts). Also, the "fast memory" will speed up "bad" sprite code much more than it speeds up optimized sprite code, because it speeds up loops and unnecessary instructions, exactly what you want to avoid in "good" sprite routines.
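
The word counts above can be checked with a few lines of Python. This is only back-of-the-envelope arithmetic under a loud simplification: every word access is assumed to cost the same, and fast RAM is assumed to halve the cost of instruction fetches (the post's own guess, not a measured figure).

```python
regs = 13                        # d0-d7 plus a2-a6
data_words = regs * 2 * 2        # 13 longwords read + 13 written, 2 words each
code_words = 2 + 2 + 3           # movem (opcode + register mask) twice, adda.l #imm32

share = code_words / (data_words + code_words)

before = data_words + code_words       # all accesses at the same cost
after = data_words + code_words / 2    # assumption: instruction fetches twice as fast
speedup = before / after

print("data words:", data_words, "code words:", code_words)
print("share of accesses that are instruction fetches: %.1f%%" % (share * 100))
print("overall speedup under these assumptions: %.3f" % speedup)
```

With these assumptions, instruction fetches are about 12% of all accesses (7 of 59) and the overall gain is in the region of 6%, which matches the conclusion that the code gets faster, but not by much.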

And, to be honest: The Q68 is by far fast enough to produce some juicy games without squeezing out the last bit of speed - We're just talking about making them a bit more juicy ;) The linear video memory in 8-bit and 16-bit modes alone (I'd prefer the former) makes writing fast sprite routines much easier and faster than on the original QL.

Tobias


Zarchos
Trump Card
Posts: 152
Joined: Mon May 08, 2017 11:49 am

Re: Fast plotting (and unplotting) sprites / Q68

Post by Zarchos »

It is true that generating code at run time makes more sense when your target machine hasn't got much RAM (like 2 Mbytes for an Archie).
It also requires a crude memory manager, to discard the generated code of sprites you no longer need to display, and to move what you must keep so as to free contiguous memory in which to build the ASM instructions for the next sprites you want to plot.
On the Q68, everything could already be present in RAM, loaded once and for all from mass storage.
It is less elegant, but of course it'll work a treat and it will save hassle!


Peter
QL Wafer Drive
Posts: 1953
Joined: Sat Jan 22, 2011 8:47 am

Re: Fast plotting (and unplotting) sprites / Q68

Post by Peter »

My interest would be to see a new arcade game at all. As far as I can remember, this has not happened for the QL in about 20 years.

The original QL was already fast enough for arcade games. So a machine which is roughly 20 times faster (Q-Top benchmark) does not technically need highly optimized sprites code for this. But working on such code could be a motivation, a starting point, to actually write a game.

As for the optimization itself: The Q68 "likes" loop unrolling. Branching is relatively slow.

The CPU cycle timer of the Q68 might help with optimization. It is a simple way to measure delays at very high resolution. See the User's Manual for details.

