Copyright 2008 Sony Corporation of America
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".
DISCLAIMER
THIS DOCUMENT IS PROVIDED "AS IS," AND COPYRIGHT HOLDERS MAKE NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, OR TITLE; THAT THE CONTENTS OF THE DOCUMENT ARE SUITABLE FOR ANY PURPOSE; NOR THAT THE IMPLEMENTATION OF SUCH CONTENTS WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS. COPYRIGHT HOLDERS WILL NOT BE LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF ANY USE OF THE DOCUMENT OR THE PERFORMANCE OR IMPLEMENTATION OF THE CONTENTS THEREOF.
MARS assumes a target multicore architecture where there is a single host processor (host) that manages or controls the execution of programs or processes on one or more separate microprocessing units (MPUs).
MARS assumes a target audience of application developers focusing on multicore architectures.
Fig. 1
The host program is responsible for the initialization of all sub programs to be run on the various microprocessing units (MPU) available on the target multicore architecture.
The memory area accessible by the host processor will be referred to as the host storage.
The MPU program is the sub program that is initialized by the host program and executed on the MPU.
The MPU program should be in ELF format. When the host program initializes the MPU program for execution, it needs to know the address of the MPU program ELF image in host storage. The procedure for loading the MPU program ELF image into host storage is platform dependent and outside the scope of MARS.
The memory area accessible by the MPU will be referred to as the MPU storage.
(1) Memory Size of MPU Storage
First, the memory size of the MPU storage is limited. As application processing becomes more complex and the code size of MPU programs grows, the size of the MPU programs offloaded to the MPUs may exceed the physical memory size of the MPU storage.
If the size exceeds the memory size of the MPU storage, the offloaded MPU processing must be partitioned into smaller pieces of code in order to reduce the code size of each MPU program. As a result of this partitioning, some collaborative processing, such as transferring computation results or waiting for processing completion between the various MPU programs, becomes necessary.
(2) Number of Physical MPUs
Second, the number of physical MPUs is limited. Although multi-MPU parallelization is required for many kinds of processing, the number of available MPU processors is limited. If application processing is multi-threaded and many different MPU programs are run simultaneously, the MPUs available for executing the MPU programs will quickly run out.
To run more MPU programs in parallel than there are physical MPUs available, we need a mechanism to switch the currently running MPU programs depending on the situation. Also, if programs interact with each other as described in (1) above, the program execution order must be considered when switching and running MPU programs.
Thus, a complex mechanism to control execution of MPU programs is required for applications where multi-MPU parallelization is needed.
Fig. 1.4
As shown in Fig. 1.4, when using such a host processor centric programming model, the host processor becomes heavily loaded with managing all MPU program execution and control operations. Not only does this tie up the host processor from processing other tasks, but the MPU programs also experience a decrease in performance as they wait for the host processor to finish managing all of the MPU programs.
To have finer control over the execution of MPU programs with this host-centric approach, the MPU control code in host programs (for loading/switching MPU programs or handling interactions between MPU programs) becomes more complex. Furthermore, this results in decreased performance of MPU programs due to waiting for the completion of host programs.
For example, if a host program is running, other host programs in the application may need to wait for their turn to be processed until that host program has completed. This delays the loading/switching of MPU programs and the sending/receiving of data between MPU programs, and consequently an MPU must sit idle even though it is free to run other programs during that wait time.
This is the result of controlling the execution of MPU programs via the host programs. If such host processing flow can be eliminated, and MPUs can directly send/receive data and load/switch MPU programs for execution, the MPUs can be used more efficiently.
Fig. 1.5
As shown in Fig. 1.5, loading/switching MPU programs and sending/receiving data are performed independently of the host program. By making the MPUs self managing, there is no longer a need to wait for the host processor to finish MPU management, thus increasing MPU utilization and performance.
This also frees up the host processor from much of the MPU management. However, the host processor is still responsible for some of the setup necessary for MPU program execution. Other operations which cannot be performed by the MPU, such as file input/output, must also be processed by the host program.
By using the MARS library, multiple MPU programs can be run cooperatively. This means that applications that run a large number of MPU programs one after another can be created without taking into account the physical number of MPUs available, while leaving the responsibility of efficiently switching MPU program execution up to the MARS library.
The kernel is a relatively simple and small piece of code that stays resident on each MPU's storage area. Each kernel has its own scheduler that determines which workload to process. Based on the scheduled workload, the kernel will load the necessary MPU program to MPU storage and execute it.
Fig. 2.1
As shown in Fig. 2.1, the kernel has 3 basic states of operation. Once loaded and started, the only responsibility of the kernel is to search for a workload to schedule, jump execution to the MPU program of the workload, context switch the workload if necessary, and then return to the scheduling state.
The kernel is non-preemptive; therefore workloads executed on each MPU will continue to run and use up the MPU's resources until they finish execution or enter a wait state. When a workload enters a wait state, the kernel must handle the context switch. This involves saving the workload context into host storage so execution can continue when the context is scheduled again at a later stage.
A workload is the term used to refer to a single unit of an MPU program or multiple MPU programs that must be scheduled for execution on the MPUs. The actual design and behavior of how a workload will be processed after the workload is scheduled by the kernel will vary based on the workload model.
MARS aims to provide various workload models rather than being specific to just one. Therefore an abstract MARS workload is necessary to accommodate the various workload models.
One example of a workload model may be a single large process that is executed on a single MPU, while another example of a workload model may define a large number of small processes that are executed on various MPUs.
Fig. 2.2
Each workload model needs its own workload module implementation that handles the model specific processing of workloads. The workload module will make use of the module API provided by the MARS kernel. These kernel system calls are only accessible by the workload module. The interface between user programs and the workload module is left up to the workload model design.
Fig. 2.3
The MARS kernel is responsible for searching for a schedulable workload in this queue and when found it loads the workload into MPU storage for processing. Once the workload is loaded, the MARS kernel passes responsibility of workload processing to the workload module specified in the workload.
When a workload is scheduled by the kernel, the workload's state within the queue is set to a reserved state so no other kernel will attempt to schedule the same workload.
Since this queue is shared by both host and MPU, its access is protected by atomic operations.
Fig. 2.4
Before any of the MARS functionalities can be utilized, an instance of a MARS context must be initialized. When the system is completely done with MARS functionality, the context must be finalized.
When a context is initialized within a system by the host processor, each MPU (depending on how many MPUs are initialized for the context) is loaded with the MARS kernel that stays resident in MPU storage and continues to run until the host processor finalizes the context.
The context also creates the workload queue in host storage. Each kernel, through the use of atomic synchronization primitives, will reserve and schedule workloads from this queue.
When the context is finalized, all kernels running on the MPUs are terminated and all resources are freed.
In a system, multiple MARS contexts may be initialized and the kernels and workloads of each context will be independent of each other. However, one of the main purposes of MARS is to avoid the high cost of process context switches within MPUs initiated by the host processor. If multiple MARS contexts are initialized, there will be an enormous decrease in performance as each MARS context is context switched in and out. In the ideal scenario, there should be a single MARS context initialized for the whole system.
Fig. 2.5
Depending on the target platform, MARS should install the necessary host headers and libraries to the appropriate host paths.
Fig. 3.1
In order to use any of the host processor library API, the user must include the necessary library API headers:
#include <mars/task.h> /* header for task workload library API */
The host program written for the host processor needs to link in the MARS host libraries.
MARS provides both static and dynamic libraries for the host processor.
The following are the libraries for the MARS base and task workload model:
libmars_base.a  /* MARS base static library */
libmars_base.so /* MARS base dynamic library */
libmars_task.a  /* MARS task static library */
libmars_task.so /* MARS task dynamic library */
MARS provides these libraries for both 32-bit and 64-bit runtimes.
The actual procedure to compile a MARS host program and to link the MARS host library may vary depending on the target platform.
/* Example host 32-bit compile on Cell B.E. platform */
HOST_CC = ppu-gcc
HOST_CFLAGS = -m32
$(HOST_CC) $(HOST_CFLAGS) host_prog.c -lspe2 -lmars_task -lmars_base

/* Example host 64-bit compile on Cell B.E. platform */
HOST_CC = ppu-gcc
HOST_CFLAGS = -m64
$(HOST_CC) $(HOST_CFLAGS) host_prog.c -lspe2 -lmars_task -lmars_base
Depending on the target platform, MARS should install the necessary MPU headers and libraries to the appropriate MPU paths.
Fig. 3.2
In order to use any of the MPU library API, the user must include the necessary library API headers:
#include <mars/task.h> /* header for task workload library API */
The MPU program written for the MPU needs to link in the MARS MPU library.
MARS provides only a static library for the MPU.
The following are the libraries for the MARS base and task workload model:
libmars_base.a /* MARS base static library */
libmars_task.a /* MARS task static library */
When compiling the MPU programs, it is also necessary to set the start of the '.init' section to the workload base address specified for the workload model.
For example, a MARS task program should specify the task base address equal to MARS_TASK_BASE_ADDR (currently 0x4000).
The actual procedure to compile a MARS MPU program and to link the MARS MPU library may vary depending on the workload model and target platform.
/* Example MPU compile on Cell B.E. platform for task program */
MPU_CC = spu-gcc
MPU_LD_FLAGS = -Wl,-N -Wl,-gc-sections -Wl,--section-start,.init=0x4000
$(MPU_CC) $(MPU_LD_FLAGS) mpu_prog.c -lmars_task -lmars_base
1. Create a MARS context.
2. Create a MARS workload.
3. Process necessary synchronizations between host and MPU programs.
4. Process other host program tasks asynchronous to MPU processing.
5. Destroy the MARS workload instance (waits until MARS workload completion).
6. Destroy the MARS context.
Fig. 3.3
When all processing is completed, the host program must also be responsible for destroying the created MARS context.
/* sample host processor side host_prog.c */

struct mars_context *mars_ctx; /* mars context pointer */

/* Create a MARS context */
int ret = mars_context_create(&mars_ctx, 0, 0);
if (ret != MARS_SUCCESS)       /* error checking */
    return USER_DEFINED_ERROR; /* create failed */
Context creation parameters:
int mars_context_create(
    struct mars_context **mars,
    uint32_t num_mpus,
    uint8_t shared);
mars
This is the address of the pointer to MARS context. A MARS context will be allocated and its address stored in this pointer.
num_mpus
This is the number of MPUs you want utilized by this MARS context. The number of MPUs specified must be available in the system or an error is returned. You can specify 0 to have MARS utilize all of the available MPUs for the context.
shared
This specifies if you are requesting a shared context. If you request a shared context, a global context is returned which can be shared by any libraries that your application links to that also request a shared context.
/* Destroy the MARS context previously created */
ret = mars_context_destroy(mars_ctx);
if (ret != MARS_SUCCESS)       /* error checking */
    return USER_DEFINED_ERROR; /* destroy failed */
The MARS mutex is independent of the MARS context or MARS workload model. A MARS mutex can be used in a host program without even creating a MARS context. A MARS mutex can also be used in an MPU program independent of any MARS workload model or API. However, an MPU program independent of any MARS workload model means the user is responsible for the loading and execution of that program, which largely defeats the purpose of using MARS.
The MARS mutex does not call into the MARS kernel's scheduler. This means that when some entity attempts to lock a mutex that is already locked, the mutex will block execution of that entity until the lock can be obtained. On the MPU side, this means that the MARS kernel cannot schedule any other workloads while a MARS mutex is waiting to lock.
If you want to make use of synchronization methods that call into the MARS kernel's scheduler and allow for other workloads to be scheduled during the time a synchronization object waits, refer to the synchronization methods provided by the various workload models.
Fig. 5.1
/* sample host processor side host_prog.c */

struct mars_mutex *mutex; /* mars mutex pointer */

/* Create a MARS mutex */
int ret = mars_mutex_create(&mutex);
if (ret != MARS_SUCCESS)       /* error checking */
    return USER_DEFINED_ERROR; /* create failed */
/* Lock the MARS mutex previously created */
ret = mars_mutex_lock(mutex);
if (ret != MARS_SUCCESS)       /* error checking */
    return USER_DEFINED_ERROR; /* lock failed */

/* critical code section */

/* Unlock the MARS mutex previously locked */
ret = mars_mutex_unlock(mutex);
if (ret != MARS_SUCCESS)       /* error checking */
    return USER_DEFINED_ERROR; /* unlock failed */
/* Destroy the MARS mutex previously created */
ret = mars_mutex_destroy(mutex);
if (ret != MARS_SUCCESS)       /* error checking */
    return USER_DEFINED_ERROR; /* destroy failed */
1. Workload Model Host Library
The host-side library of the workload model will provide the user with the interface to create workload contexts and add them to the workload queue so that the workload can be scheduled for execution by the MARS kernel.
It is the responsibility of this host-side library to populate the contents of the workload context structure with all the necessary information specific to the workload model design.
Fig. 6.1a
Fig. 6.1a shows that for the task workload model, the host program depends on the MARS task host library and the MARS base host library.
2. Workload Model MPU Library
The MPU-side library of the workload model will provide the user with the interface to handle any workload model specific functionalities.
It is the responsibility of this MPU-side library to handle any processing of the workload specific to the workload model design. This library will also need to call into the workload module implemented specifically for the workload model. It is left completely up to the design of each workload model as to what interfaces should be provided between the workload module and MPU-side library.
Fig. 6.1b
Fig. 6.1b shows that for the task workload model, the MPU task program depends on the MARS task MPU library and the MARS base MPU library.
3. Workload Model Module
The workload module is the MPU program that is loaded and executed by the MARS kernel when a specific workload context is scheduled and ready to be executed. Each workload context needs to know the corresponding workload module that will be responsible for the execution and management of the workload.
The workload module will remain resident in the MPU storage as long as the workload it is responsible for remains in the running state. Its main function is to load and execute the MPU program specified by the currently scheduled workload context. The workload module also serves as the communication layer between the user's workload specific MPU program and the MARS kernel.
workload module entry
The entry point for the workload module must be mars_module_entry.
workload module base address
The workload module is loaded into MPU storage by the MARS kernel at the address specified by MARS_WORKLOAD_MODULE_BASE_ADDR. The size of the workload module varies for each workload model implementation. Therefore, each workload model will have the workload module load the workload program to a different address in MPU storage.
workload module stack
The stack symbol for the workload module stack must also be specified. The stack address should be immediately below the base address of the workload program that the workload module will load and execute.
Example of how to compile a workload module on a Cell B.E. platform:
/* Example MPU compile on Cell B.E. platform for workload module */
MPU_CC = spu-gcc
MPU_LD_FLAGS = -Wl,-N -Wl,-gc-sections \
-Wl,--entry,mars_module_entry -Wl,-u,mars_module_entry \
-Wl,--section-start,.init=0x3000 \
-Wl,--defsym=__stack=0x3ff0
$(MPU_CC) $(MPU_LD_FLAGS) workload_module.c -lmars_base
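Putting the example addresses above together with the task base address from section 3 (module '.init' at 0x3000, stack symbol at 0x3ff0, MARS_TASK_BASE_ADDR at 0x4000), the MPU storage layout can be pictured roughly as follows. This is an illustration only; the exact kernel placement is an assumption and varies by implementation:

```
0x0000 +------------------------------+
       | MARS kernel (resident;       |  <- assumed region
       |   exact extent varies)       |
0x3000 +------------------------------+  <- MARS_WORKLOAD_MODULE_BASE_ADDR
       | workload module code/data    |
       | module stack (grows down     |
       |   from __stack = 0x3ff0)     |
0x4000 +------------------------------+  <- MARS_TASK_BASE_ADDR
       | workload (task) program      |
       |   loaded by the module       |
       +------------------------------+
```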
The workload queue API provides the basic functions to create, schedule, and remove a workload context within the workload queue. It also provides APIs for signal handling of workloads and for waiting for specific workloads to complete.
Fig. 6.2
Fig. 6.2 above shows a sample sequence of how the workload queue API can be used to implement the task workload model's host library.
The workload module API provides the basic functions to get various workload information, schedule other workloads, handle workload signals, and also functions to transition the workload state and return execution back to the MARS kernel.
Fig. 6.3
Fig. 6.3 above shows a sample sequence of how the workload module API can be used to implement the task workload model's MPU library and task workload module.
Tasks can be used to run a small MPU program many times. However, the primary usage of the task model is for large grained programs that take a long time to process. Since tasks may occupy an MPU for a long time and prevent other workloads from being executed on that MPU, tasks have the ability to yield the MPU to other workloads.
The MARS task synchronization API also provides various methods that allow a task to enter a wait state while waiting for certain events. When tasks have yielded or are waiting, the task state is saved into host storage and the MPU is freed up to process other available workloads.
Fig. 7.1
As shown in Fig. 7.1, the MARS kernel switches which MPU task programs are being executed on the MPUs. The kernel autonomously executes the tasks on the MPUs independently from the host. Whenever an MPU is free, the kernel will load any available task into the MPU storage for execution.
The general flow for using the MARS task is as follows:
1. (host) Prepare the task program ELF image in host storage.
2. (host) Create task instances.
3. (host) Schedule tasks for execution.
4. (task) Schedule sub tasks for execution.
5. (task) Wait for sub task completion.
6. (task) Resume execution when all sub tasks have completed.
7. (task) Process and finish task execution.
8. (host) Wait for all tasks to complete.
9. (host) Destroy all task instances.
The MARS task program must define the mars_task_main function, as that is the main entry point of the program. This function is what gets called when the kernel is ready to run the task.
A task program finishes execution when it calls mars_task_exit or returns from the mars_task_main function.
The arguments (mars_task_args) passed into the mars_task_main function are specified in the host program when calling mars_task_schedule to schedule the task for execution. If no args are specified when calling mars_task_schedule, the args passed into the mars_task_main function are uninitialized and their state is undefined.
/* sample MPU side mpu_prog.c */

#include <stdio.h>
#include <mars/task.h>

int mars_task_main(const struct mars_task_args *task_args)
{
    (void)task_args;
    printf("Hello World!\n");
    return 0;
}
/* sample host processor side host_prog.c */

struct mars_context *mars_ctx; /* MARS context pointer */
struct mars_task_id task_id;   /* MARS task id instance */

...
/* Assume MARS context is created as shown above */
...

/* Create the task instance */
int ret = mars_task_create(mars_ctx, &task_id, "Task", elf_image, 0);
if (ret != MARS_SUCCESS)       /* error checking */
    return USER_DEFINED_ERROR; /* create failed */
MARS task creation will initialize a workload instance in the MARS context's workload queue.
The task id is returned to the user and needs to be saved for management of the task.
Once a task is created, it must be scheduled for execution by calling mars_task_schedule before it will ever be executed.
Any created tasks should be properly cleaned up with a call to mars_task_destroy when the task will no longer be scheduled for execution.
Task creation parameters:
int mars_task_create(
    struct mars_context *mars,
    struct mars_task_id *id,
    const char *name,
    const void *elf_image,
    uint32_t context_save_size);
mars
This is the pointer to a created MARS context.
id
This is the address of a task id instance that will be initialized upon successful task creation.
name
This specifies a string identifier for the task. The string length must be no longer than MARS_TASK_NAME_LEN_MAX.
elf_image
This specifies the address to the MPU program ELF image loaded into host storage. This MPU program needs to be a MARS task program.
context_save_size
The size of context save area to allocate on host storage to be used during a task context switch (See 7.5 Task Switching).
/* sample host processor side host_prog.c */

struct mars_task_args task_args; /* MARS task args */

/* Sets the task to a schedulable state */
ret = mars_task_schedule(&task_id, &task_args, 0);
if (ret != MARS_SUCCESS)       /* error checking */
    return USER_DEFINED_ERROR; /* schedule failed */

/* Host processor can process something while the MPUs
   execute the tasks asynchronously. */
...

/* Blocks until the scheduled task has finished execution */
ret = mars_task_wait(&task_id, NULL);
if (ret != MARS_SUCCESS)       /* error checking */
    return USER_DEFINED_ERROR; /* wait failed */
MARS task execution is done by scheduling a created task to be run by the MARS kernel. The MARS kernels running on the MPUs will automatically schedule it and load the task over to the MPU to begin execution.
While the MARS kernels process various workloads on the MPU side, the host is free to do any other processing asynchronous to any workload processing on the MPUs.
When the user chooses to do so, they can wait for a specific scheduled task to finish execution.
Any number of host threads or tasks can wait for a specific task to complete execution as long as they hold the task's id. However, the task being waited on should not be re-scheduled until all wait calls for the task have returned. Otherwise it is not guaranteed that all wait calls will return after the completion of the initial schedule call.
After a MARS task is created, it may be scheduled for execution any number of times until it is destroyed. However, a task can only be scheduled if it is not currently in the process of execution.
A MARS task that has been created by the host can be scheduled for execution by both the host and MPU-side APIs. The behavior of scheduling a task from host or MPU is identical in nature. If a task schedules a sub task for execution, and waits for the sub task to finish execution (assuming the use of a blocking wait call), it will yield its own execution until the sub task has completed. This allows for other workloads to be processed on the MPU that was executing the waiting task.
Task scheduling parameters:
int mars_task_schedule(
    struct mars_task_id *id,
    struct mars_task_args *args,
    uint8_t priority);
id
This is the pointer to the initialized task id of the task to be scheduled for execution.
args
This specifies the argument structure that will be passed into the task program's mars_task_main function. If NULL is specified for args, the args passed into the mars_task_main function are uninitialized and their state is undefined. You should specify NULL only if you are certain the task program will not access the args passed into the mars_task_main function.
priority
This specifies the priority of the task. Task priorities range from 0 to 255, from lowest to highest priority. Higher priority tasks will be scheduled over lower priority tasks if both are available to be scheduled for execution.
When the task is no longer in a waiting state and is scheduled by the kernel to run again, the saved task context will be restored from the host storage back into MPU storage for resuming of task execution where it left off.
This task switching allows the kernel to schedule other workloads to be executed on the MPU without wasting valuable processing time while some tasks are left in a waiting state.
Fig. 7.5a
Limitations
It is important to note the limitations of a task switch:
1. A task is only capable of doing a task switch if it is created with a context save area (See mars_task_create). If no context save area is specified for the task, yield calls and any blocking calls that may put the task into a waiting state will result in an error.
2. All MPU-side task APIs that may call into the MARS kernel scheduler to enter a waiting state are referred to as Task Switch Calls in 10 API Reference. Before making any MPU-side Task Switch Call, the user is responsible for making sure that all memory transfer operations are completed. If there are incomplete memory transfer operations when a task switch occurs, the effects are undefined.
3. All MPU-side task API calls that internally handle memory transfers (*_begin/*_end) must not call any other MPU-side Task Switch Call in between the pair of *_begin and *_end calls. The reason for this limitation is the same as in (2). The *_begin call, whether it is a Task Switch Call or not, may begin a memory transfer, and that memory transfer is not guaranteed to be completed until the paired *_end call.
Context save size
When creating the task, you must specify the size of the context save area that will be allocated and used during a task switch. By default, the task module will only save and restore the used areas of MPU storage necessary to perform the task switch.
You can specify one of the following for 'context_save_size' when creating the task with mars_task_create :
1. 0 - No context save area will be allocated. Use this to create a run-complete task that never does a task switch.
2. MARS_TASK_CONTEXT_SAVE_SIZE_MAX - maximum necessary area will be allocated for a context save. This option will always allocate the maximum area required for any task to context switch, regardless of whether all of the area will be necessary or not by the particular task being created.
Fig. 7.5b
3. user specified size - the user can specify the size of the context save area necessary to task switch their specific task. For example, if the task's text, data, heap and stack occupy only N bytes of MPU storage, context_save_size = N can be specified to avoid allocating MARS_TASK_CONTEXT_SAVE_SIZE_MAX bytes, which would waste MARS_TASK_CONTEXT_SAVE_SIZE_MAX - N bytes of unused host storage space.
Fig. 7.5c
/* sample host processor side host_prog.c */

/* Destroy the task previously created */
ret = mars_task_destroy(&task_id);
if (ret != MARS_SUCCESS)       /* error checking */
    return USER_DEFINED_ERROR; /* destroy failed */
MARS task destroy will clean up the created task and finalize the workload instance in the workload queue. Once the task is destroyed, the task's resources are freed.
This function should be called when the task will no longer be scheduled for execution by a call to mars_task_schedule. Once a task is destroyed, the task and task id will become obsolete.
As described previously, enabling MARS tasks to send/receive data directly between each other, independently of the host, is an important factor in improving the usability and efficiency of MPUs. MARS provides various synchronization and communication functions that enable efficient interaction between MARS tasks, or between MARS tasks and host programs.
The MARS Task Synchronization API provides the following types of synchronization objects:
(1) MARS Task Barrier
This is used to make multiple MARS tasks wait at a certain point in a program and to resume the task execution when all tasks are ready.
(2) MARS Task Event Flag
This is used to send event notifications between MARS tasks or between MARS tasks and host programs.
(3) MARS Task Queue
This is used to provide a FIFO queue mechanism for data transfer between MARS tasks or between MARS tasks and host programs.
(4) MARS Task Semaphore
This is used to limit the number of concurrent accesses to shared resources among MARS tasks.
(5) MARS Task Signal
This is used to signal a MARS task in the waiting state to change state so that it can be scheduled to continue execution.
Fig. 8.1
As shown in Fig. 8.1, task synchronization instances are created in host storage. Both the host program and MPU program's MARS task access these instances resident on the host storage.
Fig. 8.2
In Fig. 8.2, the semaphore synchronization method is used as an example to show the benefit of using the MARS task synchronization over a simple synchronization method.
When using simple synchronization methods within a MARS task, if the synchronization method blocks, it will force the task to wait until the synchronization method allows for execution to resume. If a task must wait on some synchronization method for a very long time, the MPU executing the task will be forced to block without being able to process anything else during that time.
The MARS task synchronization methods prevent the wasting of valuable MPU processing time while a task blocks on some synchronization method. When a MARS task blocks on some synchronization method, the task itself will enter a waiting state. This allows the MPU executing the task to do a task switch and execute some other task that is not in a waiting state. Once the original task receives the synchronization event it was waiting for, its state is returned to runnable and it will be scheduled for resumed execution when an MPU becomes available.
The general flow for using the MARS task barriers is as follows:
1. (host) Allocate memory for task barrier structure.
2. (host) Create task barrier.
3. (host) Create tasks and schedule for execution.
4. (task) Process until synchronization point.
5. (task) Notify barrier of synchronization point arrival.
6. (task) Wait until all tasks notify barrier and barrier is released.
7. (task) Finish task execution.
8. (host) Wait for task completion and finalize tasks.
9. (host) Destroy task barrier and free allocated memory.
Fig. 8.3
In Fig. 8.3, there is a MARS task barrier created to wait on notifications from 3 separate tasks.
First, Task A reaches the synchronization point and notifies the barrier. Since the barrier has not yet been released, Task A enters a wait state and yields the MPU to execute another task, Task X.
Next, Task C reaches the synchronization point soon after and, after notifying the barrier, yields MPU execution to another task, Task Y. Finally, Task B reaches the synchronization point, notifies the barrier, and the barrier is released.
Once the barrier is released, Task B continues with execution while both Tasks A and C are available to be scheduled for execution as soon as there is an available MPU.
The event flags can be sent from host program to MARS task or vice versa, as well as between multiple MARS tasks. While waiting on certain event flags to be received, the task transitions to the waiting state until the event flag is received.
The general flow for using the MARS task event flag is as follows:
1. (host) Allocate memory for task event flag structure.
2. (host) Create task event flag.
3. (host) Create tasks and schedule for execution.
4. (task) Process until synchronization point.
5. (task) Wait until specified event flag bit is set.
6. (host or task) Set the specified event flag bit.
7. (task) Finish task execution.
8. (host) Wait for task completion and finalize tasks.
9. (host) Destroy task event flag and free allocated memory.
Fig. 8.4
In Fig. 8.4, there are 2 separate MARS event flags created. One event flag is created for host to MPU communication, while the other is created for MPU to MPU communication.
First, Task A reaches the synchronization point and waits for a specific event flag bit to be set. While it waits for the event, it enters the wait state and yields execution of the MPU so that Task X can run.
Next, Task B reaches its synchronization point and allows for Task Y to run while it waits for the event.
Next, the host program sets the event flag bit Task A is waiting on, at which point Task A becomes available for resumed execution.
Finally, as Task A becomes scheduled and resumes execution it then sets the event flag bit Task B is waiting on, at which point Task B becomes available for resumed execution.
Either a host program or a MARS task can push data into the queue, and either a host program or a MARS task can pop data from the queue as soon as it becomes available.
The advantage of the MARS task queue is that when a MARS task requests to do a pop and no data is available yet to be received from the queue, the MARS task will enter a waiting state. As soon as data is available to be popped from the queue, the MARS task can be scheduled for resumed execution with the received data.
The general flow for using the MARS task queue is as follows:
1. (host) Allocate memory for task queue structure.
2. (host) Create task queue.
3. (host) Create tasks and schedule for execution.
4. (task) Process until synchronization point.
5. (task) Pop queue and wait until data is available.
6. (host or task) Push queue with data.
7. (task) Receive data and finish task execution.
8. (host) Wait for task completion and finalize tasks.
9. (host) Destroy task queue and free allocated memory.
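The flow above can be sketched as follows. This is a minimal sketch only: the exact signatures of mars_task_queue_create, mars_task_queue_push, and mars_task_queue_pop (the entry size, queue depth, and direction parameters shown here are assumptions) should be confirmed against your MARS task library headers.

```c
/* (host program excerpt) - hedged sketch, not a complete sample */
uint64_t queue_ea;
uint32_t data[4] = { 1, 2, 3, 4 };	/* one 16-byte queue entry */

/* assumed signature: entry size of 16 bytes, depth of 8 entries,
   host to MPU direction */
mars_task_queue_create(mars_ctx, &queue_ea, 16, 8,
		       MARS_TASK_QUEUE_HOST_TO_MPU);

/* ... create and schedule tasks, passing queue_ea in the task args ... */

mars_task_queue_push(queue_ea, data);

/* (task program excerpt) */
uint32_t recv[4];

/* the task enters the waiting state until an entry is available */
mars_task_queue_pop(queue_ea, recv);
```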
Fig. 8.5
In Fig. 8.5, a MARS task queue instance is created to send and receive data between a host program and MARS tasks.
First, Task A reaches the synchronization point, where it requests to pop data from the queue. At this point in time, nothing has been pushed into the queue and the queue is empty. This causes Task A to enter a wait state and yield MPU execution to Task X.
Next, Task B reaches its synchronization point and requests to pop data. Since the queue is still empty, it also enters the waiting state and yields MPU execution to another Task Y.
Next, the host program pushes some data into the queue, at which point Task A becomes available for resumed execution with the data received from the host.
Finally, as Task A becomes scheduled and resumes execution, it then pushes some other data into the queue, at which point Task B becomes available for resumed execution with the data received from Task A.
Whenever a task wants to access a semaphore-protected shared resource, it must first acquire the semaphore (P operation). When it is done accessing the shared resource, it must release the semaphore (V operation). If a task attempts to acquire a semaphore when other tasks have already acquired the total number of allowed accesses, it will transition to the waiting state until some other task releases the semaphore and access is obtained.
The general flow for using the MARS task semaphore is as follows:
1. (host) Allocate memory for task semaphore structure.
2. (host) Create task semaphore.
3. (host) Create tasks and schedule for execution.
4. (task) Process until synchronization point.
5. (task) Acquire semaphore and wait until semaphore is obtained.
6. (task) Modify shared resource data.
7. (task) Release semaphore and finish execution.
8. (host) Wait for task completion and finalize tasks.
9. (host) Destroy task semaphore and free allocated memory.
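The flow above can be sketched as follows; the semaphore function names follow the naming pattern of the other MARS task synchronization APIs, so verify them and their exact signatures against your MARS task library headers before use.

```c
/* (host program excerpt) - hedged sketch, not a complete sample */
uint64_t semaphore_ea;

/* assumed signature: allow at most 1 concurrent access */
mars_task_semaphore_create(mars_ctx, &semaphore_ea, 1);

/* ... create and schedule tasks, passing semaphore_ea in the task args ... */

/* ... after waiting for and finalizing the tasks ... */
mars_task_semaphore_destroy(semaphore_ea);

/* (task program excerpt) */
mars_task_semaphore_acquire(semaphore_ea);	/* P operation; may enter the waiting state */
/* ... modify the shared resource in host storage ... */
mars_task_semaphore_release(semaphore_ea);	/* V operation */
```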
Fig. 8.6
In Fig. 8.6, a MARS task semaphore is created to be shared between 2 MARS tasks. This semaphore is used to prevent simultaneous access to some shared data in host storage.
First, Task A reaches the synchronization point, where it requests to acquire the semaphore. Since no other task holds the semaphore, Task A successfully acquires it without having to wait. It then continues execution to modify the shared data in host storage.
Next, Task B reaches the synchronization point where it requests to acquire the same semaphore to modify the same shared data in host storage. At the time of the request to acquire the semaphore, Task A still holds the semaphore, causing Task B to enter a waiting state. As Task B is waiting, it yields MPU execution to another Task X.
Next, Task A completes modifying the shared data in host storage and releases the semaphore. This allows Task B to become available for resumed execution.
Finally, as Task B becomes scheduled for resumed execution, it continues to modify the shared data in host storage. Task B then releases the semaphore when access to the shared data is complete.
From either a host program or MARS task you can specify a certain task to signal. When the task waits for a signal to be received it will be transitioned to the waiting state until the signal is received.
The general flow for using the MARS task signal is as follows:
1. (host) Create tasks and schedule for execution.
2. (task) Process until synchronization point.
3. (task) Wait for signal.
4. (host or task) Send signal to the waiting task.
5. (task) Resume and finish execution.
6. (host) Wait for task completion and destroy tasks.
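The flow above can be sketched as follows; the function names mars_task_signal_send and mars_task_signal_wait are assumed from the MARS task API naming convention and should be verified against your MARS task library headers. Note that, unlike the other synchronization methods, no separate structure is allocated in host storage: the signal is addressed by task id.

```c
/* (host program excerpt) - hedged sketch */
/* ... after creating and scheduling the task ... */
mars_task_signal_send(&task_id);	/* wake the waiting task */

/* (task program excerpt) */
mars_task_signal_wait();	/* enters the waiting state until signalled */
/* ... resume and finish execution ... */
```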
Fig. 8.7
In Fig. 8.7, there is a host program using signals to synchronize execution between 2 MARS tasks.
First, Task A reaches the synchronization point, where it waits on a signal. At this point in time, nothing has signalled Task A, causing it to enter a wait state and yield MPU execution to Task X.
Next, Task B reaches its synchronization point and waits on a signal. Since nothing has signalled Task B, it also enters the waiting state and yields MPU execution to another Task Y.
Next, the host program sends a signal to Task A, at which point Task A becomes available for resumed execution.
Finally, as Task A becomes scheduled and resumes execution, it signals Task B, at which point Task B becomes available for resumed execution.
The sample code creates and schedules a task that prints "Hello!" to stdout and exits.
(host program)
1 #include <mars/task.h>
2
3 static void *task_program_elf_image;
4
5 int main(void)
6 {
7 	struct mars_context *mars_ctx;
8 	struct mars_task_id task_id;
9 	int task_exit_code;
10
11 	mars_context_create(&mars_ctx, 0, 0);
12 	mars_task_create(mars_ctx, &task_id, "Task", task_program_elf_image, 0);
13 	mars_task_schedule(&task_id, NULL, 0);
14 	mars_task_wait(&task_id, &task_exit_code);
15 	mars_task_destroy(&task_id);
16 	mars_context_destroy(mars_ctx);
17
18 	return 0;
19 }
Line:1 | Include the header file "mars/task.h" necessary for utilizing the MARS task library.
|
Line:3 | Pointer to the task program's ELF image in host storage. The procedure to load the task program into host storage is platform specific. Therefore, the code to do so is not shown anywhere in this sample code.
|
Line:7 | Declare the MARS context pointer.
|
Line:8 | Declare the structure for storing the MARS task id.
|
Line:9 | Declare the instance to store the task exit code.
|
Line:11 | Create the MARS context instance.

int mars_context_create (
	arg1: This is the address of the pointer to the MARS context declared at Line:7. A MARS context will be created and its address stored in this pointer.
	arg2: This is the number of MPUs you want utilized by this MARS context. The number of MPUs specified must be available on the system or an error is returned. Here 0 is specified to have MARS utilize all the available MPUs for the context.
	arg3: This specifies if you are requesting a shared context. Here 0 is specified since we do not require sharing the MARS context for this sample.
	return: MARS_SUCCESS is returned on success and a negative error value otherwise.
)
|
Line:12 | Create the MARS task instance.

int mars_task_create (
	arg1: Pass in the MARS context pointer.
	arg2: Pass in the pointer to the MARS task id structure declared at Line:8. Upon successful completion, the task id will be initialized as required.
	arg3: Specify the NULL-terminated string name of the task you want to create.
	arg4: Specify the address of the task program's ELF image that is loaded into host storage. The task program specified here is what will be loaded into MPU storage for execution when this task is scheduled to run by the MARS kernel.
	arg5: Specify the context save area size for this task. Since this task will not task switch, we do not need a context save area, so specify 0. Otherwise, if we want to create a task that can task switch, we must specify a context save size or specify MARS_TASK_CONTEXT_SAVE_SIZE_MAX.
	return: MARS_SUCCESS is returned on success and a negative error value otherwise.
)
|
Line:13 | Schedule the task for execution.

int mars_task_schedule (
	arg1: Pass in the pointer to the task id initialized at Line:12.
	arg2: Pass in the pointer to the task arg structure we want passed into the task program's mars_task_main function. For this sample we do not need to pass any args into the task program, so specify NULL.
	arg3: Pass in the scheduling priority of this task. Since we only schedule 1 task for execution, the scheduling priority has no effect in this example.
	return: MARS_SUCCESS is returned on success and a negative error value otherwise.
)
|
Line:14 | Wait for the completion of the task.

int mars_task_wait (
	arg1: Pass in the pointer to the task id we want to wait for.
	arg2: Pass in the address of the variable to store the task exit code declared at Line:9.
	return: MARS_SUCCESS is returned on success and a negative error value otherwise.
)

This call will block until the task previously scheduled at Line:13 completes execution. If we want to process some other tasks in the host program while waiting for the task to complete, we can do so before calling wait. Similarly, a non-blocking wait function mars_task_try_wait is also provided to poll for task completion.
|
Line:15 | Destroy the completed task.

int mars_task_destroy (
	arg1: Pass in the pointer to the task id we want to destroy.
	return: MARS_SUCCESS is returned on success and a negative error value otherwise.
)

We can only call this function when we are sure the task has finished. In this example we are sure of completion because we properly waited for task completion at Line:14. After the task is destroyed, we can no longer schedule this task for execution.
|
Line:16 | Destroy the MARS context.

int mars_context_destroy (
	arg1: Pass in the pointer to the MARS context we want to finalize.
	return: MARS_SUCCESS is returned on success and a negative error value otherwise.
)

This unloads all running MARS kernels from the MPUs and handles any necessary cleanup for the MARS library. No more MARS API calls can be made after this function until a MARS context is created once again.
|
(task program)
1 #include <stdio.h>
2 #include <mars/task.h>
3
4 int mars_task_main(const struct mars_task_args *task_args)
5 {
6 	(void)task_args;
7
8 	printf("MPU(%d): %s - Hello!\n",
9 	       mars_task_get_kernel_id(), mars_task_get_name());
10
11 	return 0;
12 }
Lines:1-2 | Include the header file "stdio.h" for printf and "mars/task.h" necessary for utilizing the MARS task library.
|
Line:6 | Since we specified NULL for the task args in Line:13 of the host program above, the state of task_args is undefined. In this program we do not and should not access the task_args.
|
Lines:8-9 | Print out a message to stdout. The call to mars_task_get_kernel_id returns the id of the kernel that the current task is running on. The call to mars_task_get_name returns the string name of the current running task, specified during task creation at Line:12 of the host program above.
|
Line:11 | Returning from mars_task_main completes execution of the task. This will signal anything waiting for this task's completion to resume execution. In this example, the host program's call to mars_task_wait at Line:14 will return. Instead of returning from mars_task_main, we can equivalently call mars_task_exit. The return value will be returned to the host program in the variable passed into mars_task_wait.
|
The sample code creates 3 separate task instances: 1 instance of the main task 1 program and 2 instances of the sub task 2 program.
The main task is scheduled for execution by the host. The main task then schedules the 2 instances of the sub task for execution, using the sub task ids specified by the arguments passed in by the host during scheduling.
Each instance of the sub task will print out "Hello!" and a unique value specified by the arguments passed in by the main task during scheduling.
(host program)
1 #include <stdio.h>
2 #include <mars/task.h>
3
4 #define NUM_SUB_TASKS 2
5
6 static void *task1_program_elf_image;
7 static void *task2_program_elf_image;
8
9 int main(void)
10 {
11 	struct mars_context *mars_ctx;
12 	struct mars_task_id task1_id;
13 	struct mars_task_id task2_id[NUM_SUB_TASKS];
14 	struct mars_task_args task_args;
15 	int i;
16
17 	mars_context_create(&mars_ctx, 0, 0);
18
19 	mars_task_create(mars_ctx, &task1_id, "Task 1", task1_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX);
20
21 	for (i = 0; i < NUM_SUB_TASKS; i++) {
22 		char name[16];
23 		sprintf(name, "Task 2.%d", i);
24 		mars_task_create(mars_ctx, &task2_id[i], name, task2_program_elf_image, 0);
25 	}
26
27 	task_args.type.u64[0] = mars_ptr_to_ea(&task2_id[0]);
28 	task_args.type.u64[1] = mars_ptr_to_ea(&task2_id[1]);
29
30 	mars_task_schedule(&task1_id, &task_args, 0);
31 	mars_task_wait(&task1_id, NULL);
32 	mars_task_destroy(&task1_id);
33
34 	for (i = 0; i < NUM_SUB_TASKS; i++)
35 		mars_task_destroy(&task2_id[i]);
36
37 	mars_context_destroy(mars_ctx);
38
39 	return 0;
40 }
Line:13 | Declare an instance of the task id structure for each sub task we want to create and schedule.
|
Line:19 | Create the main task instance with the ELF image of task program 1. The main task needs a context save area in order to allow context switching while waiting for sub task completion. Specify MARS_TASK_CONTEXT_SAVE_SIZE_MAX for the context save area size so a context save area is initialized for the main task context.
|
Lines:21-25 | Create the 2 sub task instances with the ELF image of task program 2. The sub tasks do not need to context switch, so no context save area size needs to be specified.
|
Lines:27-28 | The main task needs to know the addresses of the task ids it plans to schedule for execution. Store each sub task id address into the task args passed into the main task's mars_task_main function.
|
Line:30 | Schedule the main task for execution. Pass in the task args we initialized with the sub task id addresses at Lines:27-28. Since we only schedule 1 main task for execution, and the main task is waiting while any one of its sub tasks is being executed, the scheduling priority specified has no effect in this example.
|
Line:31 | Wait for the completion of the main task.
|
Line:32 | Destroy the completed main task.
|
Lines:34-35 | Destroy the completed sub tasks also.
|
(task 1 program)
1 #include <mars/task.h>
2
3 int mars_task_main(const struct mars_task_args *task_args)
4 {
5 	struct mars_task_id task2_0_id;
6 	struct mars_task_id task2_1_id;
7 	struct mars_task_args args;
8
9 	get(&task2_0_id, task_args->type.u64[0], sizeof(task2_0_id));
10 	get(&task2_1_id, task_args->type.u64[1], sizeof(task2_1_id));
11
12 	args.type.u32[0] = 123;
13 	mars_task_schedule(&task2_0_id, &args, 0);
14
15 	args.type.u32[0] = 321;
16 	mars_task_schedule(&task2_1_id, &args, 0);
17
18 	mars_task_wait(&task2_0_id, NULL);
19 	mars_task_wait(&task2_1_id, NULL);
20
21 	return 0;
22 }
Line:3 | Since the task args were passed into mars_task_schedule at Line:30 of the host program, task_args is pointing to an initialized mars_task_args structure.
|
Line:5 | Declare an instance to store the task id of the first sub task to execute.
|
Line:6 | Declare an instance to store the task id of the second sub task to execute.
|
Line:7 | Declare an instance of the task arg structure we want to initialize with unique values to pass into the sub tasks.
|
Lines:9-10 | Transfer the task id structures of the initialized sub tasks from host storage to MPU storage. The host storage addresses of these task id structures were specified at Lines:27-28 of the host program. The function "get" shown here is a generic placeholder for the platform specific function that performs the memory transfer. Please refer to your platform specific API to learn how to do the memory transfer from host storage to MPU storage on your specific platform.
|
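As an illustration only: on the Cell Broadband Engine, where the MPU corresponds to an SPE, the transfer could be implemented with the MFC DMA intrinsics. This is a hedged sketch assuming the Cell SDK's spu_mfcio.h interface; the usual DMA alignment and size constraints still apply, and other platforms will use entirely different mechanisms.

```c
#include <stdint.h>
#include <spu_mfcio.h>

/* a possible SPE implementation of the generic "get" placeholder */
static void get(void *ls, uint64_t ea, uint32_t size)
{
	unsigned int tag = 0;	/* any available tag id (0-31) */

	mfc_get(ls, ea, size, tag, 0, 0);	/* start DMA from host storage */
	mfc_write_tag_mask(1 << tag);		/* select our tag group */
	mfc_read_tag_status_all();		/* block until the DMA completes */
}
```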
Lines:12-13 | Initialize the task args structure with a unique value. Schedule the first sub task instance using the task id obtained at Line:9. Pass in the task args and priority of 0.
|
Lines:15-16 | Initialize the task args structure with a unique value. Schedule the second sub task instance using the task id obtained at Line:10. Pass in the task args and priority of 0.
|
Lines:18-19 | Wait for the completion of both sub tasks. If the first sub task has not finished execution by the time of the call to mars_task_wait at Line:18, this main task will enter a wait state and its context will be switched out. When the first sub task completes execution, this main task will resume execution and continue on to wait for the second sub task to complete. Similarly, at the time of the call to mars_task_wait at Line:19, if the second sub task has not yet completed it will enter a wait state once again until completion of the second sub task.
|
(task 2 program)
1 #include <stdio.h>
2 #include <mars/task.h>
3
4 int mars_task_main(const struct mars_task_args *task_args)
5 {
6 	printf("MPU(%d): %s - Hello! (%d)\n",
7 	       mars_task_get_kernel_id(), mars_task_get_name(),
8 	       task_args->type.u32[0]);
9
10 	return 0;
11 }
Line:4 | Since the task args were passed into mars_task_schedule at Line:13 and Line:16 of the main task 1 program, task_args is pointing to an initialized mars_task_args structure. This structure contains the unique value specified by the main task 1 program.
|
Lines:6-8 | Print out message to stdout. Print out the unique value specified by the main task 1 program. This value should be unique for each sub task program.
|
The sample code creates a task barrier and 10 task instances of a task program. Each task program must do several iterations of some pre-processing work and some post-processing work. For each iteration, all tasks must complete the pre-processing work before any tasks can continue to do the post-processing work. In order to synchronize the tasks to accomplish this, a task barrier will be used. After finishing the pre-processing and before starting the post-processing work, the tasks will notify arrival to the barrier. Once all tasks notify the barrier and the barrier is released, all tasks can proceed to finish the post-processing work.
(host program)
1 #include <stdio.h>
2 #include <mars/task.h>
3
4 #define NUM_TASKS 10
5
6 static void *task_program_elf_image;
7
8 int main(void)
9 {
10 	struct mars_context *mars_ctx;
11 	struct mars_task_id task_id[NUM_TASKS];
12 	struct mars_task_args task_args;
13 	uint64_t barrier_ea;
14 	int i;
15
16 	mars_context_create(&mars_ctx, 0, 0);
17
18 	mars_task_barrier_create(mars_ctx, &barrier_ea, NUM_TASKS);
19
20 	for (i = 0; i < NUM_TASKS; i++) {
21 		char name[16];
22 		sprintf(name, "Task %d", i);
23
24 		mars_task_create(mars_ctx, &task_id[i], name, task_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX);
25
26 		task_args.type.u64[0] = barrier_ea;
27 		mars_task_schedule(&task_id[i], &task_args, 0);
28 	}
29
30 	for (i = 0; i < NUM_TASKS; i++) {
31 		mars_task_wait(&task_id[i], NULL);
32 		mars_task_destroy(&task_id[i]);
33 	}
34
35 	mars_task_barrier_destroy(barrier_ea);
36
37 	mars_context_destroy(mars_ctx);
38
39 	return 0;
40 }
Line:11 | Declare an array of 10 task ids for each instance of the task program we plan to create and schedule.
|
Line:13 | Declare the variable to store the task barrier ea.
|
Line:18 | Create the task barrier instance.

int mars_task_barrier_create (
	arg1: Pass in the MARS context pointer.
	arg2: Pass in the address of the barrier ea we declared at Line:13.
	arg3: Pass in the total number of task notifications to wait for before the barrier is released.
	return: MARS_SUCCESS is returned on success and a negative error value otherwise.
)
|
Line:24 | Create each of the 10 task instances. Specify the task program ELF image for these task instances and a context save area size of MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow these tasks to context switch.
|
Lines:26-27 | Initialize the task args we want passed into the task program's mars_task_main function. Store the barrier ea in the task args. Schedule the task instance for execution, passing in the task args.
|
Lines:30-37 | Wait for completion and destroy all 10 task instances. Finally destroy the barrier instance and the MARS context.
|
(task program)
1 #include <stdio.h>
2 #include <mars/task.h>
3
4 #define ITERATIONS 3
5
6 int mars_task_main(const struct mars_task_args *task_args)
7 {
8 	int i;
9 	uint64_t barrier_ea = task_args->type.u64[0];
10
11 	for (i = 0; i < ITERATIONS; i++) {
12 		pre_barrier_process();
13
14 		mars_task_barrier_notify(barrier_ea);
15 		mars_task_barrier_wait(barrier_ea);
16
17 		post_barrier_process();
18 	}
19
20 	return 0;
21 }
Line:6 | Since the task args were passed into mars_task_schedule at Line:27 of the host program, task_args is pointing to an initialized mars_task_args structure.
|
Line:9 | Grab the ea of the barrier initialized in the host program from the task arg structure.
|
Line:11 | Do several iterations of processing with the task. Each iteration of processing will be synchronized by the barrier.
|
Line:12 | Do some pre barrier processing. For this sample assume it processes some dummy work.
|
Line:14 | Notify the barrier that we have arrived at the synchronization point.

int mars_task_barrier_notify (
	arg1: Pass in the ea of the barrier.
	return: MARS_SUCCESS is returned on success and a negative error value otherwise.
)
|
Line:15 | Wait for the barrier to be released.

int mars_task_barrier_wait (
	arg1: Pass in the ea of the barrier.
	return: MARS_SUCCESS is returned on success and a negative error value otherwise.
)

If the barrier has not been released by the time of this call, the other tasks have not yet finished the pre barrier processing and notified the barrier. If this is the case, this task will enter a wait state and its context will be switched out. When all tasks notify the barrier and the barrier is released, this task will resume execution and continue.
|
Line:17 | Do some post barrier processing. For this sample assume it processes some dummy work.
|
This sample code creates 2 task instances, one each for the task 1 program and task 2 program, and creates 3 event flags.
The first event flag is used to synchronize between the host program and task 1. Task 1 can only begin processing after the host program has waited 1 second and sets the event flag for task 1 to begin.
The second event flag is used to synchronize between the 2 tasks. Task 2 can only begin processing after task 1 has completed its processing and sets the event flag for task 2 to begin.
The third event flag is used to synchronize between task 2 and the host program. The host program waits until task 2 has completed its processing and sets the event flag for the host program to continue and finish execution.
(host program)
1 #include <unistd.h>
2 #include <mars/task.h>
3
4 static void *task1_program_elf_image;
5 static void *task2_program_elf_image;
6
7 int main(void)
8 {
9 	struct mars_context *mars_ctx;
10 	struct mars_task_id task1_id;
11 	struct mars_task_id task2_id;
12 	struct mars_task_args task_args;
13 	uint64_t host_to_mpu_ea;
14 	uint64_t mpu_to_host_ea;
15 	uint64_t mpu_to_mpu_ea;
16
17 	mars_context_create(&mars_ctx, 0, 0);
18
19 	mars_task_event_flag_create(mars_ctx, &host_to_mpu_ea,
20 		MARS_TASK_EVENT_FLAG_HOST_TO_MPU,
21 		MARS_TASK_EVENT_FLAG_CLEAR_AUTO);
22
23 	mars_task_event_flag_create(mars_ctx, &mpu_to_host_ea,
24 		MARS_TASK_EVENT_FLAG_MPU_TO_HOST,
25 		MARS_TASK_EVENT_FLAG_CLEAR_AUTO);
26
27 	mars_task_event_flag_create(mars_ctx, &mpu_to_mpu_ea,
28 		MARS_TASK_EVENT_FLAG_MPU_TO_MPU,
29 		MARS_TASK_EVENT_FLAG_CLEAR_AUTO);
30
31 	mars_task_create(mars_ctx, &task1_id, "Task 1", task1_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX);
32 	mars_task_create(mars_ctx, &task2_id, "Task 2", task2_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX);
33
34 	task_args.type.u64[0] = host_to_mpu_ea;
35 	task_args.type.u64[1] = mpu_to_mpu_ea;
36 	mars_task_schedule(&task1_id, &task_args, 0);
37
38 	task_args.type.u64[0] = mpu_to_mpu_ea;
39 	task_args.type.u64[1] = mpu_to_host_ea;
40 	mars_task_schedule(&task2_id, &task_args, 0);
41
42 	sleep(1);
43
44 	mars_task_event_flag_set(host_to_mpu_ea, 0x1);
45 	mars_task_event_flag_wait(mpu_to_host_ea, 0x1, MARS_TASK_EVENT_FLAG_MASK_AND, NULL);
46
47 	mars_task_wait(&task1_id, NULL);
48 	mars_task_wait(&task2_id, NULL);
49
50 	mars_task_destroy(&task1_id);
51 	mars_task_destroy(&task2_id);
52
53 	mars_task_event_flag_destroy(host_to_mpu_ea);
54 	mars_task_event_flag_destroy(mpu_to_host_ea);
55 	mars_task_event_flag_destroy(mpu_to_mpu_ea);
56
57 	mars_context_destroy(mars_ctx);
58
59 	return 0;
60 }
Lines:13-15 | Declare 3 instances of the task event flag structure we plan to create.
|
Lines:19-29 | Create the 3 task event flag instances.

int mars_task_event_flag_create (
	arg1: Pass in the MARS context pointer.
	arg2: Pass in the address of the event flag ea we declared at Lines:13-15.
	arg3: Pass in the direction of events for each instance. The direction must be MARS_TASK_EVENT_FLAG_HOST_TO_MPU, MARS_TASK_EVENT_FLAG_MPU_TO_HOST, or MARS_TASK_EVENT_FLAG_MPU_TO_MPU.
	arg4: Pass in the clear mode for each instance. Specify MARS_TASK_EVENT_FLAG_CLEAR_AUTO so the event flag bit is automatically cleared when the first task waiting on the event receives the event. To leave the event flag bits set until some task manually clears them, specify MARS_TASK_EVENT_FLAG_CLEAR_MANUAL.
	return: MARS_SUCCESS is returned on success and a negative error value otherwise.
)

The first event flag is created for host program to task program events. The second event flag is created for task program to host program events. The third event flag is created for task program to task program events.
|
Lines:31-32 | Create the task instance for both the task 1 program and task 2 program. Specify a context save area size of MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow these tasks to context switch.
|
Lines:34-36 | Initialize the task args we want passed into task 1 program's mars_task_main function. Store the event flag ea for both host to mpu and mpu to mpu communication. These event flags will be used to receive events from the host program and also to send events to task 2 program. Schedule the task instance for execution, passing in the task args.
|
Lines:38-40 | Initialize the task args we want passed into task 2 program's mars_task_main function. Store the event flag ea for both mpu to mpu and mpu to host communication. These event flags will be used to receive events from the task 1 program and also to send events to the host program. Schedule the task instance for execution, passing in the task args.
|
Line:42 | Sleep for 1 second before continuing. This allows enough time for the tasks to be scheduled and begin execution. This is only to demonstrate the task entering the wait state when waiting for a specific event.
|
Line:44 | Set the event that task 1 is waiting for to allow task 1 to continue execution.

int mars_task_event_flag_set (
	arg1: Pass in the ea of the event flag we created for host to MPU communication.
	arg2: Pass in the value specifying which bits to set in the event flag. These bits are logically OR'ed with the bits already set in the event flag.
	return: MARS_SUCCESS is returned on success and a negative error value otherwise.
)
|
Line:45 | Wait for an event from task 2 before continuing execution.

int mars_task_event_flag_wait (
	arg1: Pass in the ea of the event flag we created for MPU to host communication.
	arg2: Pass in the value specifying which bits to check in the event flag.
	arg3: Pass in the mask mode. Specify MARS_TASK_EVENT_FLAG_MASK_OR to wait for any of the specified bits to be set, or MARS_TASK_EVENT_FLAG_MASK_AND to wait for all of the specified bits to be set.
	arg4: Pass in NULL because it is not necessary to know the bit status upon returning from the event wait in this sample.
	return: MARS_SUCCESS is returned on success and a negative error value otherwise.
)

If the event flag has not been set by the time of this call, this call will block until the specified event flag bit is set.
|
Lines:47-57 | Wait for completion and destroy the 2 task instances and all event flag instances and finally destroy the MARS context.
|
(task 1 program)
1 #include <stdio.h>
2 #include <mars/task.h>
3
4 int mars_task_main(const struct mars_task_args *task_args)
5 {
6 	uint64_t host_to_mpu_ea = task_args->type.u64[0];
7 	uint64_t mpu_to_mpu_ea = task_args->type.u64[1];
8
9 	mars_task_event_flag_wait(host_to_mpu_ea, 0x1,
10 		MARS_TASK_EVENT_FLAG_MASK_AND, NULL);
11
12 	printf("MPU(%d): %s - Hello!\n",
13 	       mars_task_get_kernel_id(), mars_task_get_name());
14
15 	mars_task_event_flag_set(mpu_to_mpu_ea, 0x1);
16
17 	return 0;
18 }
Line:6 | Grab the ea of the event flag initialized in the host program for host to MPU communication from the task arg structure.
|
Line:7 | Grab the ea of the event flag initialized in the host program for MPU to MPU communication from the task arg structure.
|
Lines:9-10 | Wait for an event from the host program before continuing execution. Make sure to check for the proper bit set from the host program. If the event flag has not been set by the time of this call, this task will enter a wait state and its context will be switched out. When the event flag bit this task is checking for is set, this task will resume execution and continue.
|
Line:15 | Set the event that task 2 is waiting for to allow task 2 execution to resume.
|
(task 2 program)
1 #include <stdio.h> 2 #include <mars/task.h> 3 4 int mars_task_main(const struct mars_task_args *task_args) 5 { 6 uint64_t mpu_to_mpu_ea = task_args->type.u64[0]; 7 uint64_t mpu_to_host_ea = task_args->type.u64[1]; 8 9 mars_task_event_flag_wait(mpu_to_mpu_ea, 0x1, 10 MARS_TASK_EVENT_FLAG_MASK_AND, NULL); 11 12 printf("MPU(%d): %s - Hello!\n", 13 mars_task_get_kernel_id(), mars_task_get_name()); 14 15 mars_task_event_flag_set(mpu_to_host_ea, 0x1); 16 17 return 0; 18 }
Line:6 | Grab the ea of the event flag initialized in the host program for MPU to MPU communication from the task arg structure.
|
Line:7 | Grab the ea of the event flag initialized in the host program for MPU to host communication from the task arg structure.
|
Lines:9-10 | Wait for an event from the task 1 program before continuing execution. Make sure to check for the proper bit set from the task 1 program. If the event flag has not been set by the time of this call, this task will enter a wait state and its context will be switched out. When the event flag bit this task is checking for is set, this task will resume execution and continue.
|
Line:15 | Set the event that the host program is waiting for to allow the host program execution to resume.
|
This sample code creates multiple task instances for task 1 program and task 2 program and creates 3 queues.
The first queue is created for host to MPU communication, so the host program can send data to the task 1 program. The second queue is created for MPU to MPU communication, so the task 1 program can send data to the task 2 program. The third queue is created for MPU to host communication, so the task 2 program can send data to the host program.
First the host program creates and schedules all task instances for execution. It then immediately begins pushing data into the host to MPU queue for task 1 program to process.
Each task 1 program instance waits for data to arrive from the host and pops the data as it arrives. After popping the data, it does some processing before pushing the data into the MPU to MPU queue for the task 2 program to process.
Each task 2 program instance waits for data to arrive from the task 1 program and pops the data as it arrives. After popping the data, it does some processing before pushing the data into the MPU to host queue so the host program can receive the resulting data.
The program is completed when the host pops and receives all result data from the task 2 program.
(host program)
1 #include <stdio.h> 2 #include <mars/task.h> 3 4 #define NUM_TASKS 3 5 #define NUM_ENTRIES 10 6 #define QUEUE_DEPTH (NUM_TASKS * NUM_ENTRIES) 7 8 struct queue_entry { 9 char text[64]; 10 }; 11 12 static void *task1_program_elf_image; 13 static void *task2_program_elf_image; 14 15 int main(void) 16 { 17 struct mars_context *mars_ctx; 18 struct mars_task_id task1_id[NUM_TASKS]; 19 struct mars_task_id task2_id[NUM_TASKS]; 20 struct mars_task_args task_args; 21 uint64_t host_to_mpu_ea; 22 uint64_t mpu_to_host_ea; 23 uint64_t mpu_to_mpu_ea; 24 struct queue_entry data; 25 int i; 26 27 mars_context_create(&mars_ctx, 0, 0); 28 29 mars_task_queue_create(mars_ctx, &host_to_mpu_ea, 30 sizeof(struct queue_entry), QUEUE_DEPTH, 31 MARS_TASK_QUEUE_HOST_TO_MPU); 32 33 mars_task_queue_create(mars_ctx, &mpu_to_host_ea, 34 sizeof(struct queue_entry), QUEUE_DEPTH, 35 MARS_TASK_QUEUE_MPU_TO_HOST); 36 37 mars_task_queue_create(mars_ctx, &mpu_to_mpu_ea, 38 sizeof(struct queue_entry), QUEUE_DEPTH, 39 MARS_TASK_QUEUE_MPU_TO_MPU); 40 41 for (i = 0; i < NUM_TASKS; i++) { 42 char name[MARS_TASK_NAME_LEN_MAX]; 43 44 snprintf(name, MARS_TASK_NAME_LEN_MAX, "Task 1.%d", i + 1); 45 mars_task_create(mars_ctx, &task1_id[i], name, task1_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX); 46 47 snprintf(name, MARS_TASK_NAME_LEN_MAX, "Task 2.%d", i + 1); 48 mars_task_create(mars_ctx, &task2_id[i], name, task2_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX); 49 50 task_args.type.u64[0] = host_to_mpu_ea; 51 task_args.type.u64[1] = mpu_to_mpu_ea; 52 task_args.type.u32[4] = NUM_ENTRIES; 53 mars_task_schedule(&task1_id[i], &task_args, 0); 54 55 task_args.type.u64[0] = mpu_to_mpu_ea; 56 task_args.type.u64[1] = mpu_to_host_ea; 57 task_args.type.u32[4] = NUM_ENTRIES; 58 mars_task_schedule(&task2_id[i], &task_args, 0); 59 } 60 61 for (i = 0; i < QUEUE_DEPTH; i++) { 62 sprintf(data.text, "Host Data %d", i + 1); 63 mars_task_queue_push(host_to_mpu_ea, &data); 64 } 65 66 for (i = 0; i < QUEUE_DEPTH; i++) { 
67 mars_task_queue_pop(mpu_to_host_ea, &data); 68 printf("%s\n", data.text); 69 } 70 71 for (i = 0; i < NUM_TASKS; i++) { 72 mars_task_wait(&task1_id[i], NULL); 73 mars_task_wait(&task2_id[i], NULL); 74 75 mars_task_destroy(&task1_id[i]); 76 mars_task_destroy(&task2_id[i]); 77 } 78 79 mars_task_queue_destroy(host_to_mpu_ea); 80 mars_task_queue_destroy(mpu_to_host_ea); 81 mars_task_queue_destroy(mpu_to_mpu_ea); 82 83 mars_context_destroy(mars_ctx); 84 85 return 0; 86 }
Lines:8-10 | Define the data entry structure. For this sample this is a 64-byte char array.
|
Lines:21-23 | Declare 3 instances of the task queue ea.
|
Line:24 | Declare a local instance of the task queue data entry structure.
|
Lines:29-39 | Create the 3 task queue instances. int mars_task_queue_create ( arg1: Pass in the MARS context pointer. arg2: Pass in the address of the queue ea instance. arg3: Pass in the size of each queue data entry. The size must be a multiple of 16 and not greater than MARS_TASK_QUEUE_ENTRY_SIZE_MAX. arg4: Pass in the depth of the queue, which is the maximum number of data entries allowed in the queue at any time. arg5: Pass in the direction of the queue for each instance. The direction must be MARS_TASK_QUEUE_HOST_TO_MPU, MARS_TASK_QUEUE_MPU_TO_HOST, or MARS_TASK_QUEUE_MPU_TO_MPU. return: MARS_SUCCESS is returned on success and a negative error value otherwise. ) The first queue is created for host program to task program data passing. The second queue is created for task program to host program data passing. The third queue is created for task program to task program data passing.
|
Lines:44-48 | Create multiple task instances for both the task 1 program and task 2 program. Specify a context save area size of MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow these tasks to context switch when there is no data available to be popped from the queues.
|
Lines:50-53 | Initialize the task args we want passed into the task 1 program's mars_task_main function. Store the host storage addresses of the queue instances for both host to MPU and MPU to MPU communication. These queues will be used to receive data from the host program and to send data to the task 2 program. Also store the number of data entries the task 1 program should expect to process. Schedule the task instance for execution, passing in the task args.
|
Lines:55-58 | Initialize the task args we want passed into the task 2 program's mars_task_main function. Store the host storage addresses of the queue instances for both MPU to MPU and MPU to host communication. These queues will be used to receive data from the task 1 program and to send data to the host program. Also store the number of data entries the task 2 program should expect to process. Schedule the task instance for execution, passing in the task args.
|
Lines:61-64 | Loop to push data into the queue for the task 1 program to receive. Initialize the data queue entry structure with a string identifying the data id number. int mars_task_queue_push ( arg1: Pass in the ea of the task queue instance that we created for host to MPU communication. arg2: Pass in the pointer to the data queue entry instance we initialized. return: MARS_SUCCESS is returned on success and a negative error value otherwise. )
|
Lines:66-69 | Loop to pop data from the queue that the task 2 program populates with the final result data. int mars_task_queue_pop ( arg1: Pass in the ea of the task queue instance we created for MPU to host communication. arg2: Pass in the pointer to the data queue entry to store the data from the queue. return: MARS_SUCCESS is returned on success and a negative error value otherwise. ) On Line:68 print the resulting data that has been processed by the task 1 program and the task 2 program. The final data should be a string identifying the processing path of the data from the host program, to the task 1 program, to the task 2 program.
|
Lines:71-83 | Wait for task completion and destroy all task instances and queue instances and finally destroy the MARS context.
|
(task 1 program)
1 #include <stdio.h> 2 #include <mars/task.h> 3 4 struct queue_entry { 5 char text[64]; 6 }; 7 8 int mars_task_main(const struct mars_task_args *task_args) 9 { 10 int i; 11 uint64_t host_to_mpu_ea = task_args->type.u64[0]; 12 uint64_t mpu_to_mpu_ea = task_args->type.u64[1]; 13 uint32_t num_entries = task_args->type.u32[4]; 14 struct queue_entry data; 15 16 for (i = 0; i < num_entries; i++) { 17 mars_task_queue_pop(host_to_mpu_ea, &data); 18 19 sprintf(&data.text[strlen(data.text)], " -> %s Data %d", 20 mars_task_get_name(), i + 1); 21 22 mars_task_queue_push(mpu_to_mpu_ea, &data); 23 } 24 25 return 0; 26 }
Lines:4-6 | Define the data entry structure. For this sample this is a 64-byte char array. This is a redefinition of host program Lines:8-10.
|
Line:11 | Grab the ea of the queue created in the host program for host to MPU communication from the task arg structure.
|
Line:12 | Grab the ea of the queue created in the host program for MPU to MPU communication from the task arg structure.
|
Line:13 | Grab the number of entries this task needs to pop from the queue and process.
|
Line:14 | Declare a local instance of the task queue data entry structure.
|
Line:16 | Loop over the number of data entries this task needs to process, as specified by the task_args at Line:13.
|
Line:17 | Pop data from the queue being sent from the host program to be processed. If the queue is empty at the time of this call, this task will enter a wait state and its context will be switched out. When the host program pushes new data into the queue and it is available to be popped by this task, this task will resume execution and continue.
|
Lines:19-20 | Take the data string received from the host program and append a string identifier for this task.
|
Line:22 | Push the processed data into the queue for the task 2 program to receive.
|
(task 2 program)
1 #include <stdio.h> 2 #include <mars/task.h> 3 4 struct queue_entry { 5 char text[64]; 6 }; 7 8 int mars_task_main(const struct mars_task_args *task_args) 9 { 10 int i; 11 uint64_t mpu_to_mpu_ea = task_args->type.u64[0]; 12 uint64_t mpu_to_host_ea = task_args->type.u64[1]; 13 uint32_t num_entries = task_args->type.u32[4]; 14 struct queue_entry data; 15 16 for (i = 0; i < num_entries; i++) { 17 mars_task_queue_pop(mpu_to_mpu_ea, &data); 18 19 sprintf(&data.text[strlen(data.text)], " -> %s Data %d", 20 mars_task_get_name(), i + 1); 21 22 mars_task_queue_push(mpu_to_host_ea, &data); 23 } 24 25 return 0; 26 }
Lines:4-6 | Define the data entry structure. For this sample this is a 64-byte char array. This is a redefinition of host program Lines:8-10.
|
Line:11 | Grab the ea of the queue created in the host program for MPU to MPU communication from the task arg structure.
|
Line:12 | Grab the ea of the queue created in the host program for MPU to host communication from the task arg structure.
|
Line:13 | Grab the number of entries this task needs to pop from the queue and process.
|
Line:14 | Declare a local instance of the task queue data entry structure.
|
Line:16 | Loop over the number of data entries this task needs to process, as specified by the task_args at Line:13.
|
Line:17 | Pop data from the queue being sent from the task 1 program to be processed. If the queue is empty at the time of this call, this task will enter a wait state and its context will be switched out. When the task 1 program pushes new data into the queue and it is available to be popped by this task, this task will resume execution and continue.
|
Lines:19-20 | Take the data string received from the task 1 program and append a string identifier for this task.
|
Line:22 | Push the processed data into the queue for the host program to receive.
|
This sample code creates 10 task instances of the same task program and creates a single semaphore to protect access to a shared resource integer counter located in host storage. As each task runs, it obtains the semaphore and increments the shared resource counter before releasing the semaphore. Since the shared resource is protected from concurrent accesses, the resulting value of the counter should equal the total number of tasks, 10, when the program has completed.
(host program)
1 #include <stdio.h> 2 #include <mars/task.h> 3 4 #define NUM_TASKS 10 5 6 static void *task_program_elf_image; 7 8 int main(void) 9 { 10 struct mars_context *mars_ctx; 11 struct mars_task_id task_id[NUM_TASKS]; 12 struct mars_task_args task_args; 13 uint64_t semaphore_ea; 14 uint32_t shared_resource __attribute__((aligned(16))); 15 int i; 16 17 mars_context_create(&mars_ctx, 0, 0); 18 19 mars_task_semaphore_create(mars_ctx, &semaphore_ea, 1); 20 21 shared_resource = 0; 22 23 printf("HOST : Main() - Shared Resource Counter = %d\n", shared_resource); 24 25 for (i = 0; i < NUM_TASKS; i++) { 26 char name[16]; 27 sprintf(name, "Task %d", i); 28 29 mars_task_create(mars_ctx, &task_id[i], name, task_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX); 30 31 task_args.type.u64[0] = semaphore_ea; 32 task_args.type.u64[1] = mars_ptr_to_ea(&shared_resource); 33 mars_task_schedule(&task_id[i], &task_args, 0); 34 } 35 36 for (i = 0; i < NUM_TASKS; i++) { 37 mars_task_wait(&task_id[i], NULL); 38 mars_task_destroy(&task_id[i]); 39 } 40 41 printf("HOST : Main() - Shared Resource Counter = %d\n", shared_resource); 42 43 mars_task_semaphore_destroy(semaphore_ea); 44 45 mars_context_destroy(mars_ctx); 46 47 return 0; 48 }
Line:11 | Declare an array of 10 task ids, one for each instance of the task program we plan to create and schedule.
|
Line:13 | Declare an instance of the task semaphore ea.
|
Line:14 | Declare an instance of a shared resource counter we plan to modify from various tasks.
|
Line:19 | Create the task semaphore instance. int mars_task_semaphore_create ( arg1: Pass in the MARS context pointer. arg2: Pass in the address of the semaphore ea. arg3: Pass in the total number of simultaneous task accesses allowed. return: MARS_SUCCESS is returned on success and a negative error value otherwise. )
|
Line:21 | Initialize the shared resource counter to 0.
|
Line:23 | Print the current value of the shared resource counter to stdout.
|
Line:29 | Create 10 task instances of the task program. Specify a context save area size of MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow these tasks to context switch.
|
Lines:31-33 | Initialize the task args we want passed into the task program's mars_task_main function. Store the ea of the semaphore instance. Also store the host storage address of the shared resource instance, so that each task can modify it. Schedule the task instance for execution, passing in the task args.
|
Lines:36-39 | Wait for completion and destroy all task instances.
|
Line:41 | Print the current value of the shared resource counter to stdout. Since each of the 10 tasks should have incremented the shared resource counter exactly once with no simultaneous access allowed, the resulting value should equal the number of tasks, 10.
|
Lines:43-45 | Destroy the semaphore and MARS context.
|
(task program)
1 #include <mars/task.h> 2 3 int mars_task_main(const struct mars_task_args *task_args) 4 { 5 uint64_t semaphore_ea = task_args->type.u64[0]; 6 uint64_t shared_resource_ea = task_args->type.u64[1]; 7 uint32_t shared_resource __attribute__((aligned(16))); 8 9 mars_task_semaphore_acquire(semaphore_ea); 10 11 get(&shared_resource, shared_resource_ea, sizeof(uint32_t)); 12 13 shared_resource++; 14 15 put(&shared_resource, shared_resource_ea, sizeof(uint32_t)); 16 17 mars_task_semaphore_release(semaphore_ea); 18 19 return 0; 20 }
Line:5 | Grab the ea of the semaphore initialized in the host program from the task arg structure.
|
Line:6 | Grab the ea of the shared resource counter declared in the host program.
|
Line:7 | Declare a local instance of the shared resource counter.
|
Line:9 | Attempt to acquire the semaphore. int mars_task_semaphore_acquire ( arg1: Pass in the ea of the semaphore instance obtained at Line:5. return: MARS_SUCCESS is returned on success and a negative error value otherwise. ) If the semaphore cannot be acquired at the time of this call, this task will enter a wait state and its context will be switched out. When the semaphore is released by another task and becomes available for this task to acquire, this task will resume execution and continue.
|
Line:11 | Memory transfer from host storage to MPU storage the shared resource counter instance. The function "get" shown here is a generic place holder for the platform specific function to do the memory transfer. Please refer to your platform specific API to learn how to do the memory transfer from host storage to MPU storage on your specific platform.
|
Line:13 | Increment the shared resource counter. Since the shared resource is protected by the semaphore, it is guaranteed that no other tasks have access to the same shared resource during the time this task holds the semaphore.
|
Line:15 | Memory transfer from MPU storage to host storage the modified shared resource counter instance. The function "put" shown here is a generic place holder for the platform specific function to do the memory transfer. Please refer to your platform specific API to learn how to do the memory transfer from MPU storage to host storage on your specific platform.
|
Line:17 | Release the semaphore. int mars_task_semaphore_release ( arg1: Pass in the ea of the semaphore instance. return: MARS_SUCCESS is returned on success and a negative error value otherwise. )
|
This sample code creates 2 separate task instances. Task 1 can only begin processing after the host program has waited 1 second and signals for task 1 to begin. Task 2 can only begin after task 1 has completed its processing and signals for task 2 to begin. Task 1 must also wait to receive a signal back from task 2 notifying that it has finished processing before it itself can finish execution. The host program waits for completion of both tasks before finishing.
(host program)
1 #include <unistd.h> 2 #include <mars/task.h> 3 4 static void *task1_program_elf_image; 5 static void *task2_program_elf_image; 6 7 int main(void) 8 { 9 struct mars_context *mars_ctx; 10 struct mars_task_id task1_id; 11 struct mars_task_id task2_id; 12 struct mars_task_args task_args; 13 14 mars_context_create(&mars_ctx, 0, 0); 15 16 mars_task_create(mars_ctx, &task1_id, "Task 1", task1_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX); 17 mars_task_create(mars_ctx, &task2_id, "Task 2", task2_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX); 18 19 task_args.type.u64[0] = mars_ptr_to_ea(&task2_id); 20 mars_task_schedule(&task1_id, &task_args, 0); 21 22 task_args.type.u64[0] = mars_ptr_to_ea(&task1_id); 23 mars_task_schedule(&task2_id, &task_args, 0); 24 25 sleep(1); 26 27 mars_task_signal_send(&task1_id); 28 29 mars_task_wait(&task1_id, NULL); 30 mars_task_wait(&task2_id, NULL); 31 32 mars_task_destroy(&task1_id); 33 mars_task_destroy(&task2_id); 34 35 mars_context_destroy(mars_ctx); 36 37 return 0; 38 }
Lines:16-17 | Create task instances for task 1 program and task 2 program each with context save areas.
|
Lines:19-20 | Initialize the task args we want passed into task 1 program's mars_task_main function. Store the host storage address of the task id structure of task 2. Schedule task 1 for execution.
|
Lines:22-23 | Initialize the task args we want passed into task 2 program's mars_task_main function. Store the host storage address of the task id structure of task 1. Schedule task 2 for execution.
|
Line:25 | Sleep for 1 second before continuing. This allows enough time for the tasks to be scheduled and begin execution. This is only to demonstrate the task entering the wait state when waiting for a signal.
|
Line:27 | Send a signal to task 1, which is waiting for a signal, to allow it to continue execution. int mars_task_signal_send ( arg1: Pass in the pointer to the task id instance of the created task we want to signal. return: MARS_SUCCESS is returned on success and a negative error value otherwise. )
|
Lines:29-35 | Wait for completion and destroy the 2 task instances and finally destroy the MARS context.
|
(task 1 program)
1 #include <stdio.h> 2 #include <mars/task.h> 3 4 int mars_task_main(const struct mars_task_args *task_args) 5 { 6 struct mars_task_id task2_id; 7 8 get(&task2_id, task_args->type.u64[0], sizeof(struct mars_task_id)); 9 10 mars_task_signal_wait(); 11 12 printf("MPU(%d): %s - Hello!\n", 13 mars_task_get_kernel_id(), mars_task_get_name()); 14 15 mars_task_signal_send(&task2_id); 16 17 mars_task_signal_wait(); 18 19 return 0; 20 }
Line:6 | Declare a local task id instance to store the task 2's id.
|
Line:8 | Memory transfer from host storage to MPU storage the task id instance of task 2. The host storage address of the task id for task 2 is obtained from the task_args passed in from the host program. The function "get" shown here is a generic place holder for the platform specific function to do the memory transfer. Please refer to your platform specific API to learn how to do the memory transfer from host storage to MPU storage on your specific platform.
|
Line:10 | Wait for a signal from the host program before continuing execution. int mars_task_signal_wait ( return: MARS_SUCCESS is returned on success and a negative error value otherwise. ) If a signal has not been set by the time of this call, this task will enter a wait state and its context will be switched out. When the task receives a signal, this task will resume execution and continue.
|
Line:15 | Send a signal to task 2 that is waiting for a signal to allow task 2 execution to resume. int mars_task_signal_send ( arg1: Pass in the address of the local task id instance of task 2. return: MARS_SUCCESS is returned on success and a negative error value otherwise. )
|
Line:17 | Wait for a signal from the task 2 program before continuing execution.
|
(task 2 program)
1 #include <stdio.h> 2 #include <mars/task.h> 3 4 int mars_task_main(const struct mars_task_args *task_args) 5 { 6 struct mars_task_id task1_id; 7 8 get(&task1_id, task_args->type.u64[0], sizeof(struct mars_task_id)); 9 10 mars_task_signal_wait(); 11 12 printf("MPU(%d): %s - Hello!\n", 13 mars_task_get_kernel_id(), mars_task_get_name()); 14 15 mars_task_signal_send(&task1_id); 16 17 return 0; 18 }
Line:6 | Declare a local task id instance to store the task 1's id.
|
Line:8 | Memory transfer from host storage to MPU storage the task id instance of task 1. The host storage address of the task id for task 1 is obtained from the task_args passed in from the host program. The function "get" shown here is a generic place holder for the platform specific function to do the memory transfer. Please refer to your platform specific API to learn how to do the memory transfer from host storage to MPU storage on your specific platform.
|
Line:10 | Wait for a signal from the host program before continuing execution. If a signal has not been set by the time of this call, this task will enter a wait state and its context will be switched out. When the task receives a signal, this task will resume execution and continue.
|
Line:15 | Send a signal to task 1 that is waiting for a signal to allow task 1 execution to resume.
|
In this program, the data partitioning process of the input image is handled by the MARS main task 1 program, and the actual grayscale conversion process is handled by multiple instances of the MARS sub task 2 program. As a result, the major processes of grayscale conversion processing can be executed all on the MPUs. The following describes detailed processing of each program.
The host program is executed as follows:
1. Create a MARS context.
2. Create both the main and sub tasks.
3. Create a task queue and task event flag to be used for communication between the tasks.
4. Schedule the main task for execution.
5. Wait for completion of the main task.
6. Destroy the tasks, synchronization objects and MARS context.
To create the main task, the host program passes the following information to the main task:
1. parameters for grayscale conversion processing (effective addresses and number of pixels of input/output buffers)
2. host addresses of the created sub task ids
3. host addresses of synchronization objects (queue and event flag) to be used for communication between the tasks
The main task 1 program is executed as follows:
1. Retrieves data to be passed from host program to sub tasks.
2. Schedule instances of sub tasks for execution.
3. Partition grayscale conversion processing.
4. Insert parameters for partitioned processing to task queues.
5. Wait for completion of sub tasks using task event flag.
Only the host address of the task queue is passed from the main task to the sub tasks through the task arguments. Other information is passed to the sub tasks through the task queue.
The main task and sub tasks pass the following parameters for the partitioned grayscale conversion processing through the task queue:
1. host address of the task event flag
2. host addresses of partitioned input data
3. host addresses of partitioned output data
4. number of pixels of partitioned input/output data
5. identification numbers to be used for sending completion notification of partitioned data
Finally, the sub task 2 program instances are executed as follows.
1. Get parameters for processing partitioned by main task from the task queue.
2. Execute grayscale conversion processing.
3. Send completion notification to main task using task event flag.
By using MARS, the MPUs can perform all the processing necessary except for the initialization of the MARS execution environment, and efficient applications for MPU-centric program execution and control can be created.
In this tutorial program, the MARS instances are initialized in the function rgb2y() in the host program so that readers can easily understand the program. However, this method is not generally recommended, because if the function is frequently called (such as when multiple images are processed in an application), the MARS instances are initialized every time the function is called, which becomes very inefficient. Ideally, programs should be designed so that MARS instances need to be initialized only once.
1 #include <stdio.h> 2 #include <stdlib.h> 3 #include <string.h> 4 #include <malloc.h> 5 #include <sys/stat.h> 6 #include <libspe2.h> 7 #include <mars/task.h> 8 9 #define IN_FILENAME "in.ppm" 10 #define OUT_FILENAME "out.ppm" 11 #define PPM_MAGIC "P6" 12 13 #define NUM_TASKS 4 14 #define QUEUE_DEPTH 4 15 16 typedef struct _image_t { 17 int width; 18 int height; 19 unsigned char *src; 20 unsigned char *dst; 21 } image_t; 22 23 typedef struct { 24 uint64_t ea_task_id; 25 uint64_t ea_event; 26 uint64_t ea_queue; 27 uint64_t ea_src; 28 uint64_t ea_dst; 29 uint32_t num; 30 uint32_t pad; 31 } grayscale_params_t; 32 33 typedef struct { 34 uint64_t ea_event; 35 uint64_t ea_src; 36 uint64_t ea_dst; 37 uint32_t num; 38 uint32_t id; 39 } grayscale_queue_elem_t; 40 41 extern struct spe_program_handle task1_spe_prog; 42 extern struct spe_program_handle task2_spe_prog; 43 44 static struct mars_context *mars_ctx; 45 static struct mars_task_id task1_id; 46 static struct mars_task_id task2_id[NUM_TASKS]; 47 static struct mars_task_args task_args; 48 static uint64_t ea_event; 49 static uint64_t ea_queue; 50 51 static grayscale_params_t grayscale_params __attribute__((aligned(16))); 52 53 /* initialize MARS execution environment for rgb2y processing */ 54 void rgb2y(unsigned char *src, unsigned char *dst, int num) 55 { 56 int ret, i; 57 58 ret = mars_context_create(&mars_ctx, 0, 0); 59 if (ret) { 60 printf("Could not create MARS context! (%d)\n", ret); 61 exit(1); 62 } 63 64 ret = mars_task_event_flag_create(mars_ctx, &ea_event, 65 MARS_TASK_EVENT_FLAG_MPU_TO_MPU, 66 MARS_TASK_EVENT_FLAG_CLEAR_AUTO); 67 if (ret) { 68 printf("Could not create MARS task event flag! (%d)\n", ret); 69 exit(1); 70 } 71 72 ret = mars_task_queue_create(mars_ctx, &ea_queue, 73 sizeof(grayscale_queue_elem_t), 74 QUEUE_DEPTH, 75 MARS_TASK_QUEUE_MPU_TO_MPU); 76 if (ret) { 77 printf("Could not create MARS task queue! 
                (%d)\n", ret);
 78         exit(1);
 79     }
 80
 81     ret = mars_task_create(mars_ctx, &task1_id,
 82                 "Grayscale Main Task",
 83                 task1_spe_prog.elf_image,
 84                 MARS_TASK_CONTEXT_SAVE_SIZE_MAX);
 85     if (ret) {
 86         printf("Could not create MARS main task! (%d)\n", ret);
 87         exit(1);
 88     }
 89
 90     for (i = 0; i < NUM_TASKS; i++) {
 91         ret = mars_task_create(mars_ctx, &task2_id[i],
 92                 "Grayscale Sub Task",
 93                 task2_spe_prog.elf_image,
 94                 MARS_TASK_CONTEXT_SAVE_SIZE_MAX);
 95         if (ret) {
 96             printf("Could not create MARS sub task! (%d)\n", ret);
 97             exit(1);
 98         }
 99     }
100
101     /* initialize grayscale params */
102     grayscale_params.ea_task_id = mars_ptr_to_ea(&task2_id);
103     grayscale_params.ea_event = ea_event;
104     grayscale_params.ea_queue = ea_queue;
105     grayscale_params.ea_src = mars_ptr_to_ea(src);
106     grayscale_params.ea_dst = mars_ptr_to_ea(dst);
107     grayscale_params.num = num;
108
109     /* initialize task args */
110     task_args.type.u64[0] = mars_ptr_to_ea(&grayscale_params);
111
112     ret = mars_task_schedule(&task1_id, &task_args, 0);
113     if (ret) {
114         printf("Could not schedule MARS main task! (%d)\n", ret);
115         exit(1);
116     }
117
118     ret = mars_task_wait(&task1_id, NULL);
119     if (ret) {
120         printf("Could not wait for MARS main task! (%d)\n", ret);
121         exit(1);
122     }
123
124     ret = mars_task_destroy(&task1_id);
125     if (ret) {
126         printf("Could not destroy MARS main task! (%d)\n", ret);
127         exit(1);
128     }
129
130     for (i = 0; i < NUM_TASKS; i++) {
131         ret = mars_task_destroy(&task2_id[i]);
132         if (ret) {
133             printf("Could not destroy MARS sub task! (%d)\n", ret);
134             exit(1);
135         }
136     }
137
138     ret = mars_context_destroy(mars_ctx);
139     if (ret) {
140         printf("Could not destroy MARS context! (%d)\n", ret);
141         exit(1);
142     }
143 }
144
145 /* read ppm data from input file */
146 void read_ppm(image_t *img, char *fname)
147 {
148     char *token, *pc, *buf, *del = " \t\n";
149     int i, w, h, luma, pixs, filesize;
150     struct stat st;
151     unsigned char *dot;
152     FILE *fp;
153
154     /* read raw data */
155     stat(fname, &st);
156     filesize = (int) st.st_size;
157     buf = (char *) malloc(filesize * sizeof(char));
158
159     if ((fp = fopen(fname, "r")) == NULL) {
160         fprintf(stderr, "error: failed to open file %s\n", fname);
161         exit(1);
162     }
163
164     fseek(fp, 0, SEEK_SET);
165     fread(buf, filesize * sizeof(char), 1, fp);
166     fclose(fp);
167
168     /* validate file format */
169     token = (char *) (unsigned long) strtok(buf, del);
170     if (strncmp(token, PPM_MAGIC, 2) != 0) {
171         fprintf(stderr, "error: invalid file format\n");
172         exit(1);
173     }
174
175     /* skip comments */
176     token = (char *) (unsigned long) strtok(NULL, del);
177     if (token[0] == '#') {
178         token = (char *) (unsigned long) strtok(NULL, "\n");
179         token = (char *) (unsigned long) strtok(NULL, del);
180     }
181
182     /* read picture size (and luma) */
183     w = strtoul(token, &pc, 10);
184     token = (char *) (unsigned long) strtok(NULL, del);
185     h = strtoul(token, &pc, 10);
186     token = (char *) (unsigned long) strtok(NULL, del);
187     luma = strtoul(token, &pc, 10);
188
189     img->width = w;
190     img->height = h;
191
192     /* allocate an aligned memory */
193     pixs = w * h;
194     img->src = (unsigned char *) memalign(16, pixs * 4);
195     img->dst = (unsigned char *) memalign(16, pixs * 4);
196
197     /* read rgb data with 'r,g,b,0' formatted */
198     dot = img->src;
199     pc++;
200     for (i = 0; i < pixs * 4; i++) {
201         if (i % 4 == 3) {
202             *dot++ = 0;
203         } else {
204             *dot++ = *pc++;
205         }
206     }
207
208     return;
209 }
210
211 /* write ppm data to output file */
212 void write_ppm(image_t *img, char *fname)
213 {
214     int i;
215     int w = img->width;
216     int h = img->height;
217     unsigned char *dot = img->dst;
218     FILE *fp;
219
220     if ((fp = fopen(fname, "wb+")) == NULL) {
221         fprintf(stderr, "failed to open file %s\n", fname);
222         exit(1);
223     }
224
225     fprintf(fp, "%s\n", PPM_MAGIC);
226     fprintf(fp, "%d %d\n", w, h);
227     fprintf(fp, "255\n");
228
229     for (i = 0; i < (w * h * 4); i++) {
230         if (i % 4 == 3) {
231             dot++;
232         } else {
233             putc((int) *dot++, fp);
234         }
235     }
236
237     fclose(fp);
238
239     return;
240 }
241
242 void delete_image(image_t *img)
243 {
244     free(img->src);
245     free(img->dst);
246
247     return;
248 }
249
250 int main(int argc, char **argv)
251 {
252     image_t image;
253
254     printf(INFO);
255
256     read_ppm(&image, IN_FILENAME);
257
258     rgb2y(image.src, image.dst, image.width * image.height);
259
260     write_ppm(&image, OUT_FILENAME);
261
262     delete_image(&image);
263
264     return 0;
265 }
Line:9 | Filename of input source image.
|
Line:10 | Filename of the output image.
|
Line:13 | Define the number of sub task 2 instances as a constant NUM_TASKS. In this tutorial, 4 instances of the sub task are created and the grayscale conversion work is divided among them.
|
Line:14 | Define the depth of the task queue as a constant QUEUE_DEPTH. In this tutorial, the depth of the task queue is set to 4 in accordance with the number of the sub task instances.
|
Lines:16-21 | Define the structure to store the image information.
|
Lines:23-31 | Define the structure of the parameter set for storing the information to be passed into the main task 1 program.
|
Lines:33-39 | Define the structure for the task queue data element. Each entry in the task queue will be an instance of this structure. The size of this structure must be a multiple of 16 bytes.
|
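The 16-byte-multiple requirement noted above can be checked at compile time rather than discovered at run time. The following plain-C sketch (not part of the tutorial source) mirrors the queue element layout from the host program and adds a compile-time guard; the negative-array-size trick is a portable stand-in for C11 `_Static_assert`:

```c
#include <stdint.h>

/* Mirror of the tutorial's queue element: three 64-bit host addresses
 * plus two 32-bit fields, 32 bytes in total with natural alignment. */
typedef struct {
    uint64_t ea_event;
    uint64_t ea_src;
    uint64_t ea_dst;
    uint32_t num;
    uint32_t id;
} grayscale_queue_elem_t;

/* Compile-time guard: the array size is -1 (a compile error)
 * if sizeof is not a multiple of 16 bytes. */
typedef char elem_size_is_multiple_of_16
    [(sizeof(grayscale_queue_elem_t) % 16 == 0) ? 1 : -1];
```

If a field were added that broke the size requirement, the build would fail here instead of producing a malformed queue at run time.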
Line:51 | Declare an instance of the structure we defined at Lines:23-31 for passing parameters to the main task.
|
Line:54 | This function handles the grayscale conversion of the input image data buffer and outputs the results to the destination buffer.
|
Lines:58-62 | Create the MARS context.
|
Lines:64-70 | Create the task event flag instance for MPU to MPU communication. This will be used by the sub task instances to notify the main task that their portion of grayscale processing is completed.
|
Lines:72-79 | Create the task queue instance for MPU to MPU communication. This will be used by the main task to send grayscale processing requests to the sub tasks.
|
Lines:81-88 | Create the task for the main task 1 program. Specify a context save area size of MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow the main task to context switch.
|
Lines:90-99 | Create multiple instances for the sub task 2 program. Specify a context save area size of MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow the sub tasks to context switch.
|
Lines:101-107 | Initialize the parameters for grayscale conversion processing in the parameter structure declared at Line:51. This structure stores the host addresses of the created sub tasks' task ids, the task event flag, the task queue, and the input/output image data buffers, as well as the total number of pixels in the image. The host address of this structure is passed to the main task through the main task's task argument.
|
Lines:112-116 | Schedule the main task for execution.
|
Lines:118-122 | Wait for the main task to complete execution.
|
Lines:124-128 | Destroy the main task instance.
|
Lines:130-136 | Destroy the sub task instances.
|
Lines:138-143 | Destroy the MARS context.
|
Lines:146-209 | This function reads the input source image file from Line:9 and stores the image data into the structure defined at Lines:16-21.
|
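The header handling in read_ppm can be illustrated in isolation. The sketch below is not the tutorial's code: it is a hypothetical helper that parses a comment-free PPM header such as "P6\n640 480\n255\n", assuming PPM_MAGIC is the binary-PPM magic "P6" (the tutorial's version additionally skips '#' comment lines using strtok):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract width, height, and maxval from a
 * comment-free PPM header. Returns 0 on success, -1 on bad input. */
static int parse_ppm_header(const char *buf, int *w, int *h, int *maxval)
{
    /* validate the magic number, as read_ppm does with strncmp */
    if (strncmp(buf, "P6", 2) != 0)
        return -1;

    /* width, height, and maxval follow, separated by whitespace */
    if (sscanf(buf + 2, "%d %d %d", w, h, maxval) != 3)
        return -1;

    return 0;
}
```

The tutorial's read_ppm does the same work with strtok over the whole file buffer so that the pixel data pointer ends up just past the header.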
Lines:212-240 | This function writes the output grayscaled image data to the output image file from Line:10.
|
Lines:242-248 | This function cleans up an instance of the image data structure.
|
Lines:250-265 | This is the entry function of the host program that does the following:
1. Read the input image data from the input image file.
2. Call rgb2y() to perform the grayscale conversion.
3. Write the converted image data to the output image file.
4. Free the image data buffers.
|
(task 1 program)
  1 #include <stdio.h>
  2 #include <stdint.h>
  3 #include <spu_intrinsics.h>
  4 #include <spu_mfcio.h>
  5 #include <mars/task.h>
  6
  7 #define NUM_TASKS 4
  8
  9 #define ALIGN4_UP(x) (((x) + 0x3) & ~0x3)
 10
 11 typedef struct {
 12     uint64_t ea_task_id;
 13     uint64_t ea_event;
 14     uint64_t ea_queue;
 15     uint64_t ea_src;
 16     uint64_t ea_dst;
 17     uint32_t num;
 18     uint32_t pad;
 19 } grayscale_params_t;
 20
 21 typedef struct {
 22     uint64_t ea_event;
 23     uint64_t ea_src;
 24     uint64_t ea_dst;
 25     uint32_t num;
 26     uint32_t id;
 27 } grayscale_queue_elem_t;
 28
 29 static struct mars_task_id task2_id[NUM_TASKS];
 30 static struct mars_task_args task2_args;
 31
 32 static grayscale_params_t grayscale_params __attribute__((aligned(16)));
 33 static grayscale_queue_elem_t data __attribute__((aligned(16)));
 34
 35 int mars_task_main(const struct mars_task_args *task_args)
 36 {
 37     int ret, i, tag = 0;
 38     int num, remain, chunk;
 39     uint64_t ea_task_id, ea_event, ea_queue;
 40     uint64_t ea_src, ea_dst;
 41     uint16_t mask = 0;
 42
 43     /* Get application parameters */
 44     mfc_get(&grayscale_params, task_args->type.u64[0], sizeof(grayscale_params_t), tag, 0, 0);
 45     mfc_write_tag_mask(1 << tag);
 46     mfc_read_tag_status_all();
 47
 48     ea_task_id = grayscale_params.ea_task_id;
 49     ea_event = grayscale_params.ea_event;
 50     ea_queue = grayscale_params.ea_queue;
 51     ea_src = grayscale_params.ea_src;
 52     ea_dst = grayscale_params.ea_dst;
 53     num = grayscale_params.num;
 54
 55     /* Get sub task ids */
 56     mfc_get(&task2_id, ea_task_id, sizeof(struct mars_task_id) * NUM_TASKS, tag, 0, 0);
 57     mfc_write_tag_mask(1 << tag);
 58     mfc_read_tag_status_all();
 59
 60     /* Pass queue ea to sub task args */
 61     task2_args.type.u64[0] = ea_queue;
 62
 63     /* Schedule sub tasks for execution */
 64     for (i = 0; i < NUM_TASKS; i++) {
 65         ret = mars_task_schedule(&task2_id[i], &task2_args, 0);
 66         if (ret) {
 67             printf("Could not schedule MARS sub task! (%d)\n", ret);
 68             return 1;
 69         }
 70     }
 71
 72     remain = num;
 73     chunk = num / NUM_TASKS;
 74     for (i = 0; i < NUM_TASKS; i++) {
 75         data.ea_event = ea_event;
 76         data.ea_src = ea_src;
 77         data.ea_dst = ea_dst;
 78         data.id = i;
 79         if (remain > chunk) {
 80             data.num = ALIGN4_UP(chunk);
 81         } else {
 82             data.num = ALIGN4_UP(remain);
 83         }
 84
 85         /* Push data to queue */
 86         ret = mars_task_queue_push_begin(ea_queue, &data, tag);
 87         if (ret) {
 88             printf("Could not push data to MARS task queue! (%d)\n", ret);
 89             return 1;
 90         }
 91         ret = mars_task_queue_push_end(ea_queue, tag);
 92         if (ret) {
 93             printf("Could not complete data push to MARS task queue! (%d)\n", ret);
 94             return 1;
 95         }
 96
 97         remain -= chunk;
 98         ea_src += (chunk * 4);
 99         ea_dst += (chunk * 4);
100
101         /* Create event mask */
102         mask |= 1 << i;
103     }
104
105     /* Wait until specified bits are set to event flag */
106     ret = mars_task_event_flag_wait(ea_event, mask, MARS_TASK_EVENT_FLAG_MASK_AND, NULL);
107     if (ret) {
108         printf("Could not wait for MARS task event flag! (%d)\n", ret);
109         return 1;
110     }
111
112     /* Wait for all scheduled sub tasks to complete */
113     for (i = 0; i < NUM_TASKS; i++) {
114         ret = mars_task_wait(&task2_id[i], NULL);
115         if (ret) {
116             printf("Could not wait for MARS sub task! (%d)\n", ret);
117             return 1;
118         }
119     }
120
121     return 0;
122 }
Line:7 | Define the number of the sub task 2 programs that need to be scheduled for execution. This number should be the same as the one specified in the host program at Line:13.
|
Lines:11-19 | Define the structure for the parameters passed in from the host program. This is a redefinition of the same structure defined in the host program at Lines:23-31.
|
Lines:21-27 | Define the structure for the task queue entry data. This is a redefinition of the same structure defined in the host program at Lines:33-39.
|
Lines:29-30 | Declare an array of task ids and an instance of a task arg structure that will be passed into the sub task.
|
Lines:32-33 | Declare an instance of the parameter structure to be passed in from the host program and an instance of the task queue data entry structure.
|
Lines:44-46 | Memory transfer the grayscale parameter structure from the host storage address specified in the task args sent from the host program.
|
Lines:48-53 | Initialize the local variables with the parameters from the host program.
|
Lines:56-58 | Memory transfer the array of sub task ids from the host storage address provided in the parameter structure received from the host program.
|
Line:61 | Initialize the task args to pass into the sub task and give it the host address of the task queue.
|
Lines:64-70 | Schedule all the instances of the sub tasks for execution.
|
Lines:72-103 | Partition the source image data evenly among the sub task instances and push one entry per sub task into the task queue so that each sub task can pop it and begin processing. Each entry carries the host addresses and pixel count of that sub task's portion of the input/output data, the host address of the task event flag, and the sub task's identification number.
|
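The partitioning arithmetic at Lines:72-103 can be exercised on its own, without the DMA and queue calls. The sketch below is a plain-C rework (the function name partition is mine, not from the tutorial): each portion is num/NUM_TASKS pixels rounded up to a multiple of 4, so the SIMD loop in the sub task always processes whole 4-pixel groups.

```c
#define NUM_TASKS 4
#define ALIGN4_UP(x) (((x) + 0x3) & ~0x3)   /* round up to multiple of 4 */

/* Compute the pixel count handed to each sub task, mirroring the
 * remain/chunk logic of the main task's push loop. */
static void partition(int num, int out_num[NUM_TASKS])
{
    int i, remain = num, chunk = num / NUM_TASKS;

    for (i = 0; i < NUM_TASKS; i++) {
        /* all but the last portion get a full chunk; the last gets
         * whatever remains, rounded up just the same */
        out_num[i] = ALIGN4_UP(remain > chunk ? chunk : remain);
        remain -= chunk;
    }
}
```

Note that when num is not a multiple of 4 * NUM_TASKS the rounded-up portions can overlap slightly or exceed num; this is harmless here because the host allocates the image buffers in whole 4-byte pixels.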
Lines:106-110 | Wait for the task event flag event that notifies when all sub tasks have completed their processing. The main task will enter a wait state until the event is received.
|
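The handshake at Lines:106-110 rests on simple bit arithmetic: the main task accumulates one bit per scheduled sub task into the wait mask, each sub task later sets the bit matching its id, and waiting with MARS_TASK_EVENT_FLAG_MASK_AND means "resume only once all of these bits are set". The helper below (make_wait_mask is my name for it, not a MARS API) isolates just that mask construction:

```c
#include <stdint.h>

/* Build the AND-wait mask for a given number of sub tasks,
 * mirroring the `mask |= 1 << i` accumulation in the push loop. */
static uint16_t make_wait_mask(int num_tasks)
{
    uint16_t mask = 0;
    int i;

    for (i = 0; i < num_tasks; i++)
        mask |= (uint16_t)(1 << i);   /* bit i <-> sub task id i */
    return mask;
}
```

With 4 sub tasks the mask is 0xF, so the event flag wait completes only after all four sub tasks have set their bits.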
Lines:113-119 | Wait for completion of all sub tasks.
|
(task 2 program)
  1 #include <stdio.h>
  2 #include <stdint.h>
  3 #include <spu_intrinsics.h>
  4 #include <spu_mfcio.h>
  5 #include <mars/task.h>
  6
  7 #define MAX_BUFSIZE (16 << 10)
  8
  9 typedef struct {
 10     uint64_t ea_event;
 11     uint64_t ea_src;
 12     uint64_t ea_dst;
 13     uint32_t num;
 14     uint32_t id;
 15 } grayscale_queue_elem_t;
 16
 17 static unsigned char src_spe[MAX_BUFSIZE] __attribute__((aligned(128)));
 18 static unsigned char dst_spe[MAX_BUFSIZE] __attribute__((aligned(128)));
 19
 20 static grayscale_queue_elem_t data __attribute__((aligned(16)));
 21
 22 void rgb2y(unsigned char *src, unsigned char *dst, int num)
 23 {
 24     int i;
 25
 26     __vector unsigned char *vsrc = (__vector unsigned char *) src;
 27     __vector unsigned char *vdst = (__vector unsigned char *) dst;
 28
 29     __vector unsigned int vr, vg, vb, vy, vpat;
 30     __vector float vfr, vfg, vfb, vfy;
 31
 32     __vector float vrconst = spu_splats(0.29891f);
 33     __vector float vgconst = spu_splats(0.58661f);
 34     __vector float vbconst = spu_splats(0.11448f);
 35     __vector float vfzero = spu_splats(0.0f);
 36     __vector unsigned int vmax = spu_splats((unsigned int) 255);
 37
 38     __vector unsigned char vpatr = (__vector unsigned char) { 0x10, 0x10, 0x10, 0x00,
 39                                                               0x10, 0x10, 0x10, 0x04,
 40                                                               0x10, 0x10, 0x10, 0x08,
 41                                                               0x10, 0x10, 0x10, 0x0c };
 42     __vector unsigned char vpatg = (__vector unsigned char) { 0x10, 0x10, 0x10, 0x01,
 43                                                               0x10, 0x10, 0x10, 0x05,
 44                                                               0x10, 0x10, 0x10, 0x09,
 45                                                               0x10, 0x10, 0x10, 0x0d };
 46     __vector unsigned char vpatb = (__vector unsigned char) { 0x10, 0x10, 0x10, 0x02,
 47                                                               0x10, 0x10, 0x10, 0x06,
 48                                                               0x10, 0x10, 0x10, 0x0a,
 49                                                               0x10, 0x10, 0x10, 0x0e };
 50     __vector unsigned char vpaty = (__vector unsigned char) { 0x03, 0x03, 0x03, 0x10,
 51                                                               0x07, 0x07, 0x07, 0x10,
 52                                                               0x0b, 0x0b, 0x0b, 0x10,
 53                                                               0x0f, 0x0f, 0x0f, 0x10 };
 54     __vector unsigned char vzero = spu_splats((unsigned char) 0);
 55
 56     for (i = 0; i < num / 4; i++) {
 57         vr = (__vector unsigned int) spu_shuffle(vsrc[i], vzero, vpatr);
 58         vg = (__vector unsigned int) spu_shuffle(vsrc[i], vzero, vpatg);
 59         vb = (__vector unsigned int) spu_shuffle(vsrc[i], vzero, vpatb);
 60
 61         vfr = spu_convtf(vr, 0);
 62         vfg = spu_convtf(vg, 0);
 63         vfb = spu_convtf(vb, 0);
 64
 65         vfy = spu_madd(vfr, vrconst, vfzero);
 66         vfy = spu_madd(vfg, vgconst, vfy);
 67         vfy = spu_madd(vfb, vbconst, vfy);
 68
 69         vy = spu_convtu(vfy, 0);
 70
 71         vpat = spu_cmpgt(vy, vmax);
 72         vy = spu_sel(vy, vmax, vpat);
 73
 74         vdst[i] = (__vector unsigned char) spu_shuffle(vy, (__vector unsigned int) vzero, vpaty);
 75     }
 76
 77     return;
 78 }
 79
 80 int mars_task_main(const struct mars_task_args *task_args)
 81 {
 82     int ret, tag = 0;
 83     int my_id;
 84     uint64_t ea_event, ea_queue;
 85     uint16_t bits;
 86     uint64_t ea_src, ea_dst;
 87     unsigned int remain, num;
 88
 89     ea_queue = task_args->type.u64[0];
 90
 91     /* Pop data from queue */
 92     ret = mars_task_queue_pop_begin(ea_queue, &data, tag);
 93     if (ret) {
 94         printf("Could not pop data from MARS task queue! (%d)\n", ret);
 95         return 1;
 96     }
 97     ret = mars_task_queue_pop_end(ea_queue, tag);
 98     if (ret) {
 99         printf("Could not complete data pop from MARS task queue! (%d)\n", ret);
100         return 1;
101     }
102
103     my_id = data.id;
104     ea_event = data.ea_event;
105     ea_src = data.ea_src;
106     ea_dst = data.ea_dst;
107     remain = data.num;
108
109     /* main loop */
110     while (remain > 0) {
111         if (remain > MAX_BUFSIZE / 4) {
112             num = MAX_BUFSIZE / 4;
113         } else {
114             num = remain;
115         }
116
117         /* DMA Transfer : GET input data */
118         mfc_get(src_spe, ea_src, num * 4, tag, 0, 0);
119         mfc_write_tag_mask(1 << tag);
120         mfc_read_tag_status_all();
121
122         /* convert to grayscale data */
123         rgb2y(src_spe, dst_spe, num);
124
125         /* DMA Transfer : PUT output data */
126         mfc_put(dst_spe, ea_dst, num * 4, tag, 0, 0);
127         mfc_write_tag_mask(1 << tag);
128         mfc_read_tag_status_all();
129
130         remain -= num;
131         ea_src += num * 4;
132         ea_dst += num * 4;
133     }
134
135     /* Set bit to MARS task event flag */
136     bits = 1 << my_id;
137     ret = mars_task_event_flag_set(ea_event, bits);
138     if (ret) {
139         printf("Could not set MARS task event flag! (%d)\n", ret);
140         return 1;
141     }
142
143     return 0;
144 }
Lines:9-15 | Define the structure for the task queue entry data. This is a redefinition of the same structure defined in the host program at Lines:33-39 as well as in the task 1 program at Lines:21-27.
|
Lines:17-18 | Declare instances of the source and destination buffer to store the processing input/output data.
|
Lines:22-78 | This function handles the grayscale processing of the partitioned input image data in the source buffer and stores the output to the destination buffer.
|
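The SIMD version above shuffles each 4-pixel group into separate R, G, and B words, converts to float, applies the luma weights with multiply-add, truncates back to integer, clamps to 255, and shuffles the result into "y,y,y,0" layout. A scalar reference (my rework for checking results on a host machine, not SPU code) makes the per-pixel arithmetic explicit:

```c
/* Scalar reference for rgb2y: same luma weights, same truncation and
 * clamping, operating on the 4-bytes-per-pixel "r,g,b,0" layout.
 * The output pixel stores "y,y,y,0", matching the vpaty shuffle. */
static void rgb2y_scalar(const unsigned char *src, unsigned char *dst, int num)
{
    int i;

    for (i = 0; i < num; i++) {
        float y = 0.29891f * src[4 * i]        /* R weight */
                + 0.58661f * src[4 * i + 1]    /* G weight */
                + 0.11448f * src[4 * i + 2];   /* B weight */
        unsigned int v = (unsigned int) y;     /* truncate, like spu_convtu */

        if (v > 255)                           /* clamp, like spu_sel vs vmax */
            v = 255;
        dst[4 * i] = dst[4 * i + 1] = dst[4 * i + 2] = (unsigned char) v;
        dst[4 * i + 3] = 0;
    }
}
```

Running both versions over the same buffer and comparing outputs is a convenient way to validate the shuffle patterns when porting or modifying the SPU code.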
Line:89 | Get the host address of the task queue passed in from the main task.
|
Lines:92-101 | Pop the data from the task queue. If the main task has not pushed data into the task queue by the time of this call, this task will enter a wait state and its context will be switched out. When the task is able to pop data from the task queue, this task will resume execution and continue.
|
Lines:103-107 | Initialize the local variables with the parameters from the data popped from the task queue.
|
Line:110 | Loop until all the input data has been processed. The amount of data processed in each loop iteration is limited by the local buffer size specified at Line:7.
|
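The loop's buffering arithmetic can be checked in isolation: with 4 bytes per pixel, a MAX_BUFSIZE-byte local buffer holds MAX_BUFSIZE/4 pixels, so each iteration processes at most that many, and the remain counter guarantees the whole range is covered with no gaps. The helper below (count_iterations is my name, not from the tutorial) reproduces just that control flow:

```c
#define MAX_BUFSIZE (16 << 10)   /* 16 KiB local buffer, as in the tutorial */

/* Count how many GET/convert/PUT iterations the sub task's main loop
 * performs for a workload of `remain` pixels. */
static int count_iterations(unsigned int remain)
{
    int iters = 0;

    while (remain > 0) {
        /* cap each iteration at the buffer capacity in pixels */
        unsigned int num = remain > MAX_BUFSIZE / 4 ? MAX_BUFSIZE / 4
                                                    : remain;
        remain -= num;
        iters++;
    }
    return iters;
}
```

So a portion of exactly 4096 pixels needs one pass, and 4097 pixels need two, the second transferring only the 4-byte remainder.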
Lines:118-120 | Memory transfer the input data from host storage to the source buffer declared at Line:17.
|
Line:123 | Do the actual grayscale processing of image data from source buffer to destination buffer.
|
Lines:126-128 | Memory transfer the output data from the destination buffer declared at Line:18 to host storage.
|
Lines:136-141 | Set the task event flag bit corresponding to this sub task's identification number, notifying the main task that this sub task's portion of the processing is complete.
|