Floating a thought to understand the issues...

User avatar
Dave
SandySuperQDave
Posts: 2765
Joined: Sat Jan 22, 2011 6:52 am
Location: Austin, TX
Contact:

Floating a thought to understand the issues...

Post by Dave »

So, Qdos and its OS babies all multitask using jobs and tasks. These are handled by a task manager and interrupt system to create something that still comes close to the definition of a real time OS. When I say Qdos from now on, I am referring to everything from JS to SMSQ/E.

One problem we have, in terms of performance, is that the 68K family ended* prematurely. Unless we stretch the limits using FPGA implementations of 680X0 running at 100s of MHz, we're not going to gain much CPU power. For our current tasks that's not really a problem. The memory limitation of a practical 128MB is, though.

After listening to others, the only clear way forward that I can see is to have multiple instances of Qdos running on multiple CPUs, each with their own private memory and a common area so they can communicate.

How would you implement this, and what subtle or obvious implications do you see? Is it even a realistic goal?

This is just a theoretical conversation about the options for running Qdos on more parallel hardware.

(I personally think that if we had ported the OS to ARM eight years ago, when I first suggested it, we'd have a million users by now. I know it's hard, but being a community of under 1000 active users is also hard.)


User avatar
vanpeebles
Commissario Pebbli
Posts: 2816
Joined: Sat Nov 20, 2010 7:13 pm
Location: North East UK

Re: Floating a thought to understand the issues...

Post by vanpeebles »

Dave wrote:(I personally think that if we had ported the OS to ARM eight years ago, when I first suggested it, we'd have a million users by now. I know it's hard, but being a community of under 1000 active users is also hard.)
Not a chance! RISC OS has barely seen an increase in usage, and that's native to ARM with all the gazillions of Pis out there. :lol:


User avatar
Dave
SandySuperQDave
Posts: 2765
Joined: Sat Jan 22, 2011 6:52 am
Location: Austin, TX
Contact:

Re: Floating a thought to understand the issues...

Post by Dave »

RISC OS is crippled by being a cooperative multitasking system. I will be bringing one piece of Acorn tech to the QL though.


User avatar
Pr0f
QL Wafer Drive
Posts: 1298
Joined: Thu Oct 12, 2017 9:54 am

Re: Floating a thought to understand the issues...

Post by Pr0f »

So you are effectively talking about a hypervisor layer - similar to what VMware / Hyper-V bring to the PC world.

Running virtual instances of QLs within a much more capable hardware platform.

The drivers at the lower layer need to be much more capable. It would be interesting to virtualize the QL Net though ;-)


User avatar
Peter
QL Wafer Drive
Posts: 1953
Joined: Sat Jan 22, 2011 8:47 am

Re: Floating a thought to understand the issues...

Post by Peter »

Dave wrote:So, Qdos and its OS babies all multitask using jobs and tasks. These are handled by a task manager and interrupt system to create something that still comes close to the definition of a real time OS.
Definitely not. The concept offers nothing toward realtime applications! It can just be considered a small preemptive multitasking OS.
(Assuming mass storage with background operation - most of the usual drivers/hardware hinder multitasking severely.)
Dave wrote:Unless we stretch the limits using FPGA implementations of 680X0 running at 100s of MHz, we're not going to gain much CPU power. For our current tasks that's not really a problem.
For file transfer by ethernet, it is a problem. Only the two fastest platforms, Q40 and Q60, are fully up to the task. For Q68 there are limitations, let alone (S)GC.
Dave wrote:The memory limitation of a practical 128MB is, though.
I haven't seen practical restrictions even with 16 MB yet.


User avatar
Andrew
Aurora
Posts: 786
Joined: Tue Jul 17, 2018 9:10 pm

Re: Floating a thought to understand the issues...

Post by Andrew »

Dave wrote: Unless we stretch the limits using FPGA implementations of 680X0 running at 100s of MHz, we're not going to gain much CPU power. For our current tasks that's not really a problem. The memory limitation of a practical 128MB is, though.
I'm an analyst, so my first question is always WHY?
Why would you need more than 128 MB? What software do we have - or will have in the foreseeable future - that will need more than that? Or more than the 4 MB that an SGC has?

I do not want to offend anyone, but it seems to me that the QL is a playground for some very knowledgeable hardware developers. We have a lot of new hardware - but we have very little new software. And, as I see it, a computer with no new software has no future. I might be wrong - and I hope I'm wrong, because I love the QL. But I still feel that we do not need a new architecture or a new operating system, but we need more software for what we already have - and especially expanded QLs (including Q68)


Nasta
Gold Card
Posts: 443
Joined: Sun Feb 12, 2012 2:02 am
Location: Zapresic, Croatia

Re: Floating a thought to understand the issues...

Post by Nasta »

Well, that's a rather involved question :)

Under QDOS the CPU is virtualized to each job (each sees it as completely available to itself to a very large extent), but memory is not in the strictest sense (that part is in a sense co-operative, as it relies on the CPU being able to address everything relative to a base address).
The virtualization is provided in a time-sliced manner under the control of a scheduler, which gives every job a certain length of time to run before it's suspended for others to run. From the standpoint of the job, it does not know this except through various system calls which can be time limited (that's one of the RTOS elements - most of those were never used).
Now, in principle, it should not be particularly difficult for jobs to run on multiple CPUs, using a central scheduler (although that would actually be a set of schedulers connected by a central data structure) - this is how most current OSs work on multi-core CPUs (and those are actually a notional equivalent to multiple discrete CPUs).
One thing that would be different is a consequence of non-virtualized memory - and in fact, to an extent, it's a simplification. Under QDOS, multiple CPUs would have a common memory, or at least a part of common memory. Among other reasons, this is because jobs communicate with each other and with the real world through areas in RAM. While jobs can ask the OS for memory to use for whatever purpose, and this could be RAM private to a given CPU, there is no easy way to tell how that RAM will be used, and hence whether it should be allocated from shared RAM (at least as far as I remember).
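
To make the 'set of schedulers connected by a central data structure' idea a little more concrete, here is a minimal C sketch of a job table kept in common RAM, with each CPU asking one central policy routine for its next job. All the names (qlx_job, qlx_pick_job, the field layout) are invented for illustration - this is not QDOS/SMSQ/E code, just one shape it could take:

Code:
#include <stdint.h>

#define MAX_JOBS 32

enum job_state { JOB_FREE, JOB_READY, JOB_RUNNING, JOB_SUSPENDED };

struct qlx_job {
    enum job_state state;
    uint8_t        priority;   /* higher = picked sooner               */
    uint8_t        owner_cpu;  /* which CPU is currently running it    */
    void          *base;       /* job's base address: code + data area */
};

/* The job table lives in common RAM so every CPU's scheduler sees it. */
static struct qlx_job job_table[MAX_JOBS];

/* Central policy, run under a lock on the boot CPU: hand the asking CPU
 * the highest-priority READY job (a job already running on another CPU
 * is marked JOB_RUNNING, so it cannot be picked twice). */
static int qlx_pick_job(int cpu)
{
    int best = -1;
    for (int i = 0; i < MAX_JOBS; i++) {
        if (job_table[i].state != JOB_READY)
            continue;
        if (best < 0 || job_table[i].priority > job_table[best].priority)
            best = i;
    }
    if (best >= 0) {
        job_table[best].state     = JOB_RUNNING;
        job_table[best].owner_cpu = (uint8_t)cpu;
    }
    return best;   /* -1 means: idle until the next scheduler tick */
}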

On the other side of all of this is the part of the OS that has to do with the real world - which is not time-sliced and scheduled according to a priority system, but rather driven by real-world events and the protocols that arise from them. This is not easy to distribute among multiple CPUs; it can be done, but not completely automatically. It is also a completely different way to exploit parallelism, as handling IO has a lot to do with abstracting real hardware into data structures, and can be handled independently by dedicated CPUs which may not even be 68k. One consequence of this is that general multi-processing (or multi-core) solutions tend to have a rather involved interrupt system, mostly dedicating one CPU to doing most IO tasks, with most interrupts routed to it. This is also usually the 'main' or 'boot' core, which starts the whole computer, in turn working as a single-CPU (or single-core) machine until everything is set up for the scheduler.

Finally, there is one aspect of modern CPUs that makes multiprocessing a bit more complex if it's not catered for in some way by the CPU and system design, in addition to handling interrupts (i.e. indirectly real-world events) and that's memory caching.
Under most multitasking OSs, and also under QDOS, it's possible to run multiple copies of a job using the same code (which then has to be re-entrant - in most cases meaning no assumptions about data spaces and absolutely no self-modifying code; most memory-managed systems do not let this happen anyway), so extra steps have to be taken when a job is started and ended to handle leftover copies of code in the caches, as well as leftover copies of code in RAM no longer used by actual jobs.
And of course, all data structures that are accessible by multiple CPUs or cores (or indeed jobs, since in this context there is no direct way to know if a job is executing time-scheduled, core-scheduled or indeed both) must not be cached - or, in the more complex way to do things, must be cache-snooped (not always possible nor feasible; this is when a CPU cache behaves as common RAM, so that when one CPU writes to common memory, the contents of all CPU caches that represent that memory are also updated). Given the way QDOS works, cache snooping would most likely not be used or required.
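
As a rough sketch of the housekeeping this implies, something along these lines would need to run whenever a shared, re-entrant code block is finally released. The two platform hooks are assumptions standing in for whatever the hardware actually provides (on a 68040/060 they would boil down to CPUSH/CINV sequences, triggered on every CPU):

Code:
#include <stddef.h>

/* Hypothetical platform hooks - not real QDOS/SMSQ/E calls.  Each one is
 * assumed to perform the operation on every CPU, e.g. via an
 * inter-processor interrupt. */
extern void all_cpus_flush_dcache(void *addr, size_t len);  /* write back, then discard */
extern void all_cpus_inval_icache(void *addr, size_t len);  /* discard stale code       */

/* Called when the last job using a shared, re-entrant code block exits:
 * every CPU must drop its cached copies before the RAM is reused. */
void release_job_code(void *code, size_t len)
{
    all_cpus_flush_dcache(code, len);   /* data caches may hold the block too  */
    all_cpus_inval_icache(code, len);   /* no CPU may keep executing old code  */
    /* ...now the block can go back onto the common-heap free list. */
}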

So, what would be required, assuming we can somehow tie multiple CPUs together?
On the OS side, the common mutex function the OS already provides to prevent deadlock between jobs handling resources would have to remain, which means that part is centralized, so resource allocation probably always runs on the boot CPU (or core). In other words, some OS calls would have to be 'queued' and then run sequentially, one by one, on a single core in order to manage resources properly. IO becomes more complex as some things can and others can't be distributed, for instance screen handling. The scheduler would become more complex but also faster once there are enough jobs to occupy the available cores, as scheduling overhead happens only on one core (the others are instructed at interrupt which job to run next, without the need to calculate whose turn it is). The OS also needs to be aware of common memory and private memory, though in the interest of simplicity, the best approach would be to have common memory and rely on caching to give the CPUs faster access to commonly accessed data.
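
Purely as a sketch of that 'queue the calls and run them one by one on the boot CPU' idea, one possible shape in C could be the following. The queue layout, the names and the busy-wait are all my own assumptions; a real implementation would suspend the calling job instead of spinning:

Code:
#include <stdatomic.h>
#include <stddef.h>

#define QDEPTH 16   /* assumed small, fixed queue kept in common RAM */

struct os_request {
    int  (*fn)(void *arg);    /* the resource-management call to run  */
    void  *arg;
    int    result;
    atomic_int done;          /* 0 = pending, 1 = completed           */
};

/* Single consumer (boot CPU), multiple producers; lives in common RAM. */
static _Atomic(struct os_request *) slots[QDEPTH];
static atomic_uint head, tail;

/* Any CPU: post a call and wait until the boot CPU has executed it. */
int os_call_serialized(struct os_request *req)
{
    atomic_store(&req->done, 0);
    unsigned s = atomic_fetch_add(&tail, 1) % QDEPTH;  /* assume no overflow */
    atomic_store(&slots[s], req);
    while (!atomic_load(&req->done))
        ;                     /* a real OS would suspend the job here */
    return req->result;
}

/* Boot CPU only, e.g. from its scheduler loop: run pending calls in order. */
void os_drain_requests(void)
{
    while (atomic_load(&head) != atomic_load(&tail)) {
        unsigned s = atomic_load(&head) % QDEPTH;
        struct os_request *req = atomic_load(&slots[s]);
        if (req == NULL)
            continue;         /* slot claimed but not yet published   */
        atomic_store(&slots[s], NULL);
        atomic_fetch_add(&head, 1);
        req->result = req->fn(req->arg);
        atomic_store(&req->done, 1);    /* release the waiting CPU    */
    }
}
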
On the hardware side, private vs common RAM has to be handled, but as I said, it is possible to make all of it look like (or indeed be) common RAM. That being said, in order to access some parts of RAM and be able to cache the contents, while also being able to force non-caching to guarantee the common data is always a real copy, the available RAM would have to have a cache-inhibited alias.

This was something I experimented with while planning the GF: the entire RAM had an alias, so that if one wanted to access it as a non-cached area, a fixed offset was added to the required address and it would access that address directly, with no caching. This would keep the memory allocation mechanism the same, and the job or OS code or driver or whatever could then decide, as needed, to use it cached or not on the fly - making that distinction, in a manner of speaking, also co-operative between whatever needs the memory in question.
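
In code the idea amounts to nothing more than a fixed offset between two views of the same bytes. The offset value and helper names below are made up - on the GF the alias would have been created by the address decoder, not by software:

Code:
#include <stdint.h>

/* Hypothetical: the same physical RAM appears twice in the address map,
 * once cacheable at its normal address and once cache-inhibited at
 * (address + UNCACHED_ALIAS). */
#define UNCACHED_ALIAS 0x40000000u

/* Use the normal address for private/fast data, the alias for data
 * shared between CPUs that must never go stale in a cache. */
static inline void *uncached(void *p)
{
    return (void *)((uintptr_t)p + UNCACHED_ALIAS);
}

static inline void *cached(void *p)
{
    return (void *)((uintptr_t)p - UNCACHED_ALIAS);
}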

======================

Re RAM limit for QDOS, that was only 2M, also nearing the limit for the slave block table :)
As far as I know, there was no direct limit and it could in principle use the entire 4G address space, but I have been warned that some parts of the OS treat address pointers as signed numbers so that would limit the available space to 2G.
That being said, there seems to be one application that lowers this limit further and that is (to my knowledge) the Qliberator Basic compiler, which used the top 3 bits of 32-bit addresses as some sort of debug info, counting on those bits not being available as a real address on 68k CPUs available at the time. In other words, it expects the top 3 address bits, A29, 30, 31 to be 'don't care' which reduces the usable address space to 512M and all 8 possible states of the top 3 bits need to alias to the same 512M.
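
Put as code, the constraint looks like this: only A0-A28 select real memory, the top three bits are treated as tags, and the hardware (or an emulator) has to make all eight tag values alias onto the same 512M. The helpers below are only an illustration of that arithmetic, not anything Qliberator actually contains:

Code:
#include <stdint.h>

#define ADDR_MASK_512M 0x1FFFFFFFu   /* keep A0..A28 */

/* Physical address actually decoded: every 512M alias folds onto one. */
static inline uint32_t strip_tags(uint32_t addr)
{
    return addr & ADDR_MASK_512M;
}

/* The three 'don't care' bits that get reused as debug/tag info. */
static inline unsigned tag_bits(uint32_t addr)
{
    return addr >> 29;
}
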
The OS and applications themselves certainly do not need a lot of RAM UNTIL a high-res and simultaneously high-color display is available and you want to run multiple programs under the PE. When a full screen uses up 32k of RAM, 4M is plenty, but try 1024x768 in 16 bit: 1024 x 768 x 2 bytes is about 1.5M for the screen alone, all of a sudden. Some expansions to the screen drivers such as ProWess also require a decent amount of RAM once the display becomes higher resolution and deeper color.

One more avenue of making things quicker is with ColdFire V3 and V4 CPUs. The problem with those is incomplete 68k compatibility and the need to write an emulator for what is not supported. It has recently been brought to my attention that the MicroAPL 68k emulation has become freely available, so that may actually make it possible to use these as a faster 68k - that being said, ColdFire is also becoming a dead end as ARM is conquering everything.


User avatar
Peter
QL Wafer Drive
Posts: 1953
Joined: Sat Jan 22, 2011 8:47 am

Re: Floating a thought to understand the issues...

Post by Peter »

Andrew wrote:But I still feel that we do not need a new architecture or a new operating system, but we need more software for what we already have - and especially expanded QLs (including Q68)
This is exactly the point. For us, even partly catching up with software features other platforms already had 20 years ago would be a heroic task. We could be glad if a single piece of major QL software were released today. Even that is not very likely.

Dreaming and philosophy should have its place here on the forum. But the reality is: our system has become just a hobby playground... even a much smaller one than for the other Sinclair computers.


User avatar
Dave
SandySuperQDave
Posts: 2765
Joined: Sat Jan 22, 2011 6:52 am
Location: Austin, TX
Contact:

Re: Floating a thought to understand the issues...

Post by Dave »

Andrew wrote:
Dave wrote: Unless we stretch the limits using FPGA implementations of 680X0 running at 100s of MHz, we're not going to gain much CPU power. For our current tasks that's not really a problem. The memory limitation of a practical 128MB is, though.
I'm an analyst, so my first question is always WHY?
Why would you need more than 128 MB? What software do we have - or will have in the foreseeable future - that will need more than that? Or more than the 4 MB that an SGC has?

I do not want to offend anyone, but it seems to me that the QL is a playground for some very knowledgeable hardware developers. We have a lot of new hardware - but we have very little new software. And, as I see it, a computer with no new software has no future. I might be wrong - and I hope I'm wrong, because I love the QL. But I still feel that we do not need a new architecture or a new operating system, but we need more software for what we already have - and especially expanded QLs (including Q68)
1. To allocate "blocking" tasks like large file transfers to their own CPU. This suggests four CPUs - one for blocking tasks, one for non-blocking tasks, one for jobs and one dedicated to handling windows/screen transfers (see the sketch after this list).
2. To increase the proportion of jobs running inside cache memory. This provides a handy performance boost.
3. Large memory doesn't get used because it has never been available. If it were available, we'd start using the QL as a tool to handle large data sets.

4. Getting a faster CPU is very expensive, both in cost and in design effort. Using multiple £3 CPUs is MUCH cheaper from the design point of view, and they are just multiple copies of a simpler thing. Four 68EC030s will outperform a 68060 for 1/4 the cost.
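
To illustrate point 1: the split could be as simple as tagging each task with a class and pinning each class to a CPU. The class names and the fixed mapping below are my own invented illustration, nothing that exists today:

Code:
/* Hypothetical task classes and a fixed class-to-CPU assignment. */
enum task_class { TC_BLOCKING_IO, TC_NONBLOCKING_IO, TC_USER_JOBS, TC_SCREEN };

static const int class_to_cpu[] = {
    [TC_BLOCKING_IO]    = 0,   /* large file transfers, slow devices  */
    [TC_NONBLOCKING_IO] = 1,   /* serial, network, keyboard polling   */
    [TC_USER_JOBS]      = 2,   /* ordinary jobs under the scheduler   */
    [TC_SCREEN]         = 3,   /* window moves, screen block copies   */
};

static inline int cpu_for(enum task_class c)
{
    return class_to_cpu[c];
}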

It strikes me that if you accept that such a machine will use SMSQ/E and not be Qdos/Minerva compatible, the OS was almost written with this jump in mind.


User avatar
Dave
SandySuperQDave
Posts: 2765
Joined: Sat Jan 22, 2011 6:52 am
Location: Austin, TX
Contact:

Re: Floating a thought to understand the issues...

Post by Dave »

Peter wrote:Dreaming and philosophy should have its place here on the forum. But the reality is: our system has become just a hobby playground... even a much smaller one than for the other Sinclair computers.
Indeed, and to be clear, this is a very philosophical question. It came out of two things happening together: your explanation of how fast transfers were problematic, and Nasta's redundant explanation of the difference between jobs and tasks. It just struck me as very natural that there are a lot of User space tasks that have one set of needs, e.g. continuous access to resources and continuous availability of resources, and Supervisor space tasks that have a different set of problems, e.g. their ability to consume those resources.

I realised that having User tasks on one CPU and Supervisor tasks on another would have some utility. And then I thought, "Why stop there?" The CPUs do not need to be symmetric: a 68030 could be running the user space tasks and a 68000 could run the OS tasks. They could have separate memory maps, ROM images, etc. The communication portal between them would simply be the IO area. An area of dual-port RAM of just a few bytes would be more than adequate.
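
A minimal mailbox is all that 'portal' would need. The layout, the address and the polling protocol below are assumptions, just to show how few bytes it takes - one CPU posts a command and polls, the other answers:

Code:
#include <stdint.h>

struct mailbox {                     /* lives in the shared dual-port RAM */
    volatile uint8_t  cmd;           /* 0 = empty, otherwise a request    */
    volatile uint8_t  status;        /* 0 = busy, 1 = reply ready         */
    volatile uint16_t param;
    volatile uint32_t result;
};

#define MBOX ((struct mailbox *)0x000C0000)   /* hypothetical I/O address */

/* User-side CPU (e.g. the 68030): issue a request, poll for the answer. */
static uint32_t mbox_call(uint8_t cmd, uint16_t param)
{
    while (MBOX->cmd != 0)
        ;                            /* wait until the OS CPU is free     */
    MBOX->param  = param;
    MBOX->status = 0;
    MBOX->cmd    = cmd;              /* writing cmd last 'posts' the call */
    while (MBOX->status == 0)
        ;                            /* OS-side CPU sets status when done */
    uint32_t r = MBOX->result;
    MBOX->cmd = 0;                   /* release the mailbox               */
    return r;
}

/* The OS-side 68000 would loop on: cmd != 0 && status == 0 -> execute,
 * write result, set status = 1, then wait for cmd to return to 0. */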

It's not even that exotic.

Of course, I understand that the OS/software obstacles would be quite large. The chances of anyone doing anything major, beyond one or two hobbyists hacking away for their own personal amusement, are very low. I mean, we have the potential for hardware pointers now, which would offer quite sizeable visual and performance benefits to the windowing systems we all know and *cough* love.

I'm just interested in what this hybrid system might look like from a hardware and OS point of view. To software, it would obviously appear to be an unchanged system.

