Scheduler Pre-20
Version 2 (Mary Laser, 03/24/2012 04:29 pm)
| 1 | 2 | Mary Laser | h1. Scheduler Pre2.0 |
|---|---|---|---|
| 2 | 1 | Dan Stangel | |
| 3 | 1 | Dan Stangel | This document covers the technical implementation of the scheduler. It is intended for anyone who needs to replace, modify, or debug the scheduler. User oriented scheduler documentation can be found at [[foss-scheduler]]. |
| 4 | 1 | Dan Stangel | |
| 5 | 1 | Dan Stangel | This document is not intended for people who want to create an agent (although it certainly helps to know this information). See [[writing_an_agent |Writing an agent ]] if you want to know more about that. |
| 6 | 1 | Dan Stangel | |
| 7 | 1 | Dan Stangel | h2. About the Scheduler |
| 8 | 1 | Dan Stangel | |
| 9 | 1 | Dan Stangel | The scheduler is a _super-agent_, responsible for spawning and managing all other agents. The scheduler balances the needed tasks (found in the job queue) with the available resources. It tries to ensure that: |
| 10 | 1 | Dan Stangel | |
| 11 | 1 | Dan Stangel | |
| 12 | 1 | Dan Stangel | # One task does not lock out other tasks. |
| 13 | 1 | Dan Stangel | # Unused resources are used by other pending tasks. |
| 14 | 1 | Dan Stangel | # Agents are spawned in an optimal fashion (fastest). |
| 15 | 1 | Dan Stangel | # Agents do not exceed the allotted resources. |
| 16 | 1 | Dan Stangel | |
| 17 | 1 | Dan Stangel | |
| 18 | 1 | Dan Stangel | |
| 19 | 1 | Dan Stangel | |
| 20 | 1 | Dan Stangel | Although the scheduler is single-threaded, it manages child processes that run in parallel. |
| 21 | 1 | Dan Stangel | |
| 22 | 1 | Dan Stangel | |
| 23 | 1 | Dan Stangel | * Scheduler is single-threaded. |
| 24 | 1 | Dan Stangel | * Scheduler spawns agents (children) that run in parallel. |
| 25 | 1 | Dan Stangel | |
| 26 | 1 | Dan Stangel | h2. Front-End Communications |
| 27 | 1 | Dan Stangel | |
| 28 | 1 | Dan Stangel | The scheduler communicates with the front-end UI through the database's jobqueue table. This table lists which agents need to run, the necessary parameters for the agent, and the current operation status. |
| 29 | 1 | Dan Stangel | |
| 30 | 1 | Dan Stangel | |
| 31 | 1 | Dan Stangel | * Agents can either be general -- running on any available server (host) -- or they can be host-specific. The jobqueue specifies a "runonpfile" field if the agent should be locked to a specific host. The parameter denoted by the runonpfile field is used to identify the host: run on the host that matches the pfile. |
| 32 | 1 | Dan Stangel | * Jobs can contain a single parameter that is passed to the agent, or an SQL query can be provided for generating parameters. The latter multi-SQL-query (MSQ) is used in lieu of adding thousands of individual jobs to the jobqueue. The scheduler performs the MSQ request and the results are individually passed to agents. |
| 33 | 1 | Dan Stangel | |
| 34 | 1 | Dan Stangel | |
| 35 | 1 | Dan Stangel | |
| 36 | 1 | Dan Stangel | |
| 37 | 1 | Dan Stangel | The combination of host and query type leads to four combinations, but only two combinations are implemented. |
| 38 | 1 | Dan Stangel | |_. |_. Any host |_. Host-specific | |
| 39 | 1 | Dan Stangel | |_. One parameter | OK | N/A | |
| 40 | 1 | Dan Stangel | |_. MSQ | N/A | OK via runonpfile | |
| 41 | 1 | Dan Stangel | |
| 42 | 1 | Dan Stangel | Some example agents: |
| 43 | 1 | Dan Stangel | |
| 44 | 1 | Dan Stangel | |
| 45 | 1 | Dan Stangel | * *wget:* Any host, one parameter (the URL to get) |
| 46 | 1 | Dan Stangel | * *license:* The bSAM agent uses a host-specific MSQ. This allows bSAM to run on the system containing the license files, rather than resorting to NFS accesses. |
| 47 | 1 | Dan Stangel | * *filter_license:* Similar to the bSAM agent, the host-specific MSQ reduces NFS access. |
| 48 | 1 | Dan Stangel | |
| 49 | 1 | Dan Stangel | |
| 50 | 1 | Dan Stangel | Some agents could fit into other categories, but are limited by the two available choices: |
| 51 | 1 | Dan Stangel | |
| 52 | 1 | Dan Stangel | |
| 53 | 1 | Dan Stangel | * *unpack:* Host-specific, MSQ. The SQL parameter returns ONE record that contains the pfile and ufile to unpack. This should be implemented as a host-specific one-parameter job, but the only host-specific option is an MSQ. |
| 54 | 1 | Dan Stangel | * Any future agent that only performs database accesses would fit better as an any-host MSQ job. However, it will likely be implemented as a host-specific MSQ job. |
| 55 | 1 | Dan Stangel | |
| 56 | 1 | Dan Stangel | h3. Adding to the Queue |
| 57 | 1 | Dan Stangel | |
| 58 | 1 | Dan Stangel | Jobs are added to the jobqueue by the front-end UI. The UI knows the desired agent type, whether it is an any-host or MSQ job, and the proper parameters (or SQL). The scheduler has no control over what is added and does not validate whether the added job is correct. |
| 59 | 1 | Dan Stangel | |
| 60 | 1 | Dan Stangel | h3. Job Tracking |
| 61 | 1 | Dan Stangel | |
| 62 | 1 | Dan Stangel | The scheduler tracks jobs based on the jobqueue start and end times. |
| 63 | 1 | Dan Stangel | |
| 64 | 1 | Dan Stangel | |
| 65 | 1 | Dan Stangel | * No start time? Ok to run. |
| 66 | 1 | Dan Stangel | * Start without end? Job is currently being managed by the scheduler. |
| 67 | 1 | Dan Stangel | * Start with end? Job is completed. |
| 68 | 1 | Dan Stangel | |
| 69 | 1 | Dan Stangel | |
| 70 | 1 | Dan Stangel | Some jobs need to be rescheduled. For example, an MSQ may have a "LIMIT 5000" (allowing the scheduler to manage only a few results at a time and permitting limited timeslice scheduling). This is done by removing the start time when the job completes -- effectively putting the job back into the jobqueue. If the MSQ returns no results, then the end time is set, completing the job. |
| 71 | 1 | Dan Stangel | The jobs run with the following priorities: |
| 72 | 1 | Dan Stangel | |
| 73 | 1 | Dan Stangel | |
| 74 | 1 | Dan Stangel | # Anything currently running is allowed to run. The rationale: a job may take a long time and it is better not to cancel the job and try to restart it later. |
| 75 | 1 | Dan Stangel | # Any jobs held by the scheduler come next. |
| 76 | 1 | Dan Stangel | # Of the jobs held by the scheduler, jobs for available, active agents come first. This reduces kill/spawn times. |
| 77 | 1 | Dan Stangel | # If there are no running agents of the correct type, then one is spawned (possibly after killing an incorrect ready/active agent type first). |
| 78 | 1 | Dan Stangel | # The job queue has a column for urgent tasks. These come next. |
| 79 | 1 | Dan Stangel | # Any available job, oldest first. |
| 80 | 1 | Dan Stangel | # Jobs that do not match the available agents are ignored and remain in the jobqueue. |
| 81 | 1 | Dan Stangel | |
| 82 | 1 | Dan Stangel | |
| 83 | 1 | Dan Stangel | The jobqueue keeps a prioritized table in case one agent depends on the results from another agent. As a result, there are frequently jobs in the jobqueue that cannot run (temporarily blocked due to a dependency). |
| 84 | 1 | Dan Stangel | This tracking method has a few limitations: |
| 85 | 1 | Dan Stangel | |
| 86 | 1 | Dan Stangel | |
| 87 | 1 | Dan Stangel | * If the scheduler dies, someone needs to remove the start times on incomplete tasks. (The scheduler tries to do this with signal handling, but sometimes dies before resetting the values.) |
| 88 | 1 | Dan Stangel | * Since the start time can be reset due to rescheduling, there is no way for the front-end to tell how long a job really took. |
| 89 | 1 | Dan Stangel | * The scheduler holds on to jobs. The front-end cannot distinguish between a "held" job and a "running" job. |
| 90 | 1 | Dan Stangel | * The scheduler can only hold a few jobs at a time: one "any host" per agent, and four MSQ at a time (MAXMSQ in dbq.c). It is very possible for the four pending MSQ commands to be held and waiting for an agent to become available, while other MSQ commands in the queue could run. |
| 91 | 1 | Dan Stangel | |
| 92 | 1 | Dan Stangel | |
| 93 | 1 | Dan Stangel | |
| 94 | 1 | Dan Stangel | |
| 95 | 1 | Dan Stangel | The main function for checking the queue is in dbq.c: DBProcessQueue(). This checks the jobqueue for new tasks and processes the held MSQ records. |
| 96 | 1 | Dan Stangel | |
| 97 | 1 | Dan Stangel | h3. Signals |
| 98 | 1 | Dan Stangel | |
| 99 | 1 | Dan Stangel | The scheduler watches for a few signals. These are mainly used for debugging: |
| 100 | 1 | Dan Stangel | |
| 101 | 1 | Dan Stangel | |
| 102 | 1 | Dan Stangel | * *SIGINT*. Finish all running jobs, but do not start new ones. When all jobs complete, exit. This is denoted in the code by the "SLOWDEATH" flag and is used to provide a clean exit. |
| 103 | 1 | Dan Stangel | * *SIGQUIT*. Kill all running children, try to reset the jobqueue start time, and exit. |
| 104 | 1 | Dan Stangel | * *SIGTERM*. Handled the same as SIGQUIT. |
| 105 | 1 | Dan Stangel | * *SIGUSR1*. Display a quick summary of running processes (how many are running, waiting, or dead.) |
| 106 | 1 | Dan Stangel | * *SIGUSR2*. Display details about every MSQ job held by the scheduler. This can generate a huge amount of output, but allows debugging MSQ jobs. |
| 107 | 1 | Dan Stangel | * *SIGHUP*. This displays the number of running jobs and the summary of each process (same as SIGUSR1). |
| 108 | 1 | Dan Stangel | * *SIGSEGV*. If there is a crash, display all thread info (SIGHUP) before dying. |
| 109 | 1 | Dan Stangel | |
| 110 | 1 | Dan Stangel | h2. Back-end Children |
| 111 | 1 | Dan Stangel | |
| 112 | 1 | Dan Stangel | All children are treated as finite-state machines. The states (defined in spawn.h) are: |
| 113 | 1 | Dan Stangel | |
| 114 | 1 | Dan Stangel | |
| 115 | 1 | Dan Stangel | * *ST_FAIL = 0*. If an agent spawns and dies too rapidly, it is marked as failed. Failed agents are not respawned for a few minutes. (This prevents infinite spawning/death loops.) The threshold is defined in spawn.c by RespawnInterval (5 minutes) and RespawnCount (5 respawns). If the agent respawns faster than 5 times in 5 minutes, then it is marked as a failure. It will remain a failure for RespawnInterval (5 minutes -- the variable is reused). NOTE: only abnormal deaths are counted here. If the scheduler intentionally kills a process (using ST_FREEING), then the number of spawns is reset and it should never reach ST_FAIL (even if it is spawned and killed rapidly). |
| 116 | 1 | Dan Stangel | * *ST_FREE.* The agent is not spawned yet and has no I/O allocated. All agents begin in this state. |
| 117 | 1 | Dan Stangel | * *ST_FREEING.* The agent was spawned, but has been told to die by the scheduler. It is now shutting down and has no I/O allocated. |
| 118 | 1 | Dan Stangel | * *ST_PREP*. The scheduler is preparing a child data structure. The structure has allocated memory but has not yet been spawned. This step prevents a well-timed SIGCHLD from freeing the data structure before the state becomes ST_SPAWNED. |
| 119 | 1 | Dan Stangel | * As an aside: Signals are boolean states and not queued. If three children die at once, then the parent may receive only one SIGCHLD. Thus, the scheduler must scan every child when a SIGCHLD arrives just in case there were multiple deaths. However, a new-dead child will look just like an old-dead child; there is no distinction. The ST_PREP state prevents an old-dead child from appearing as a new-dead child and having its memory freed in the signal interrupt handler while it is being allocated in the normal (non-interrupt handler) code. |
| 120 | 1 | Dan Stangel | * *ST_SPAWNED.* The agent is spawned but not yet ready (I/O allocated). When the agent sends its first "OK", it will be transitioned to ST_READY. |
| 121 | 1 | Dan Stangel | * *ST_READY.* The agent is live and ready for data. |
| 122 | 1 | Dan Stangel | * *ST_RUNNING.* The agent is actively processing data. When the agent sends an "OK", it will be transitioned back to ST_READY. |
| 123 | 1 | Dan Stangel | * *ST_DONE*. This is used by the MSQ table. Each SQL record has a status field and this indicates that the record is completed. When all MSQ records are completed, the MSQ job is done. |
| 124 | 1 | Dan Stangel | * *ST_END*. This is an unused marker. Since states are numeric, this allows the code to loop over all possible states: for(i=0; i<ST_END; i++)... |
| 125 | 1 | Dan Stangel | |
| 126 | 1 | Dan Stangel | |
| 127 | 1 | Dan Stangel | |
| 128 | 1 | Dan Stangel | |
| 129 | 1 | Dan Stangel | Most of the time, the children are in the ST_FREE, ST_READY, or ST_RUNNING states. |
| 130 | 1 | Dan Stangel | |
| 131 | 1 | Dan Stangel | |
| 132 | 1 | Dan Stangel | *NOTE*: When the scheduler runs, it displays the different state transitions for each child. However, not all transitions are shown. Since active children switch rapidly between ST_READY and ST_RUNNING, these transitions (to and from) are not displayed. |
| 133 | 1 | Dan Stangel | |
| 134 | 1 | Dan Stangel | h3. Talking with Children |
| 135 | 1 | Dan Stangel | |
| 136 | 1 | Dan Stangel | Each spawned agent has stdin and stdout redirected to the scheduler. (Stderr is not redirected.) The workflow is as follows: |
| 137 | 1 | Dan Stangel | |
| 138 | 1 | Dan Stangel | |
| 139 | 1 | Dan Stangel | # The agent is spawned by the scheduler. (Scheduler creates child.) This happens in the GetChild() function. The state is changed from ST_FREE to ST_PREP (for populating the data structure) to ST_SPAWNED (indicating a process fork). |
| 140 | 1 | Dan Stangel | # When the child is initialized and ready, it writes "OK\n" to stdout. This tells the scheduler that the child can accept data (ST_SPAWNED becomes ST_READY). |
| 141 | 1 | Dan Stangel | # The child begins reading from stdin. |
| 142 | 1 | Dan Stangel | # If stdin closes, then the child should die as quickly as possible. |
| 143 | 1 | Dan Stangel | # If data appears on stdin, then the child should process it. The data will either come from the jq_args column (for any-host agents), or be the results from the MSQ query (in 'column=value' pairs, all on one line). |
| 144 | 1 | Dan Stangel | # When the scheduler sends data to the child (via stdin), the state is changed from ST_READY to ST_RUNNING. No further data will be sent until the child is ready. |
| 145 | 1 | Dan Stangel | # When the child finishes the task AND is ready for the next task, it writes "OK\n" to stdout. This transitions the child from ST_RUNNING to ST_READY. Note: The scheduler sees no distinction between a child completing the first task and getting ready for the next task. |
| 146 | 1 | Dan Stangel | |
| 147 | 1 | Dan Stangel | |
| 148 | 1 | Dan Stangel | There are a few additional messages that the child can send to stdout: |
| 149 | 1 | Dan Stangel | |
| 150 | 1 | Dan Stangel | |
| 151 | 1 | Dan Stangel | * *"ERROR".* Sending the word "ERROR" on the line can be used to report processing problems. However, this is only logged by the scheduler and not actively used. If a processing error occurs, the agent should DIE. |
| 152 | 1 | Dan Stangel | * *"LOG".* Similar to "ERROR", log lines can be sent to the scheduler. The line contents are recorded in the database "log" table. (In the future, this table will be accessible via the UI.) In general, ERROR messages should be human-readable, while LOG lines should provide details about exactly what failed. |
| 153 | 1 | Dan Stangel | * *"Success".* Similar to "ERROR", this is counted for statistics, but not used by any error handling. It should denote a successfully completed task. |
| 154 | 1 | Dan Stangel | * *"ECHO"*. Anything following this keyword is sent to the scheduler's stderr. |
| 155 | 1 | Dan Stangel | * *"DB: SQL;".* If the child writes a line prefaced by "DB: " (space is needed), then the parameter is treated as a DB command. The scheduler will pass the SQL command to the DB and return any and all results to the agent. This is mainly for debugging and has some serious limitations: |
| 156 | 1 | Dan Stangel | * The scheduler is single-threaded. A very slow DB query could hang the scheduler. |
| 157 | 1 | Dan Stangel | * Many agents are spawned using SSH (to securely run them on remote servers). Although the scheduler can handle an SQL command that is up to 65K (MAXCMD in spawn.h), SSH returns data in blocks of 64 bytes. An SQL query larger than 64 bytes could be split, resulting in an invalid SQL command. |
| 158 | 1 | Dan Stangel | * A SELECT may return multiple records. One record is printed per line, in a 'column="value"' format. After the last row, "OK\n" is printed. Here's the problem: if the agent reads this data slowly, it can slow down the scheduler. |
| 159 | 1 | Dan Stangel | |
| 160 | 1 | Dan Stangel | |
| 161 | 1 | Dan Stangel | |
| 162 | 1 | Dan Stangel | |
| 163 | 1 | Dan Stangel | Due to the "DB:" limitations, agents should communicate directly with the DB rather than use the "DB:" command. (The "DB:" was created as a test and has some uses, but agents should not depend on it.) |
| 164 | 1 | Dan Stangel | |
| 165 | 1 | Dan Stangel | |
| 166 | 1 | Dan Stangel | * Anything else. Any other output from the agent is displayed by the scheduler to stderr. These lines are prefaced with the word "DEBUG" and the thread ID. |
| 167 | 1 | Dan Stangel | |
| 168 | 1 | Dan Stangel | h3. Killing Children |
| 169 | 1 | Dan Stangel | |
| 170 | 1 | Dan Stangel | The scheduler limits the number of spawned processes by host. Thus, if a host has a maximum of 4 spawned processes and a fifth is needed, then an existing child is killed first. |
| 171 | 1 | Dan Stangel | |
| 172 | 1 | Dan Stangel | |
| 173 | 1 | Dan Stangel | Under normal circumstances, only children in the ST_READY state are killed. Killing occurs as follows: |
| 174 | 1 | Dan Stangel | |
| 175 | 1 | Dan Stangel | |
| 176 | 1 | Dan Stangel | # Stdin to the child is closed AND the child is sent a SIGHUP. The child is moved from ST_READY to ST_FREEING. |
| 177 | 1 | Dan Stangel | # The child may choose to catch SIGHUP and clean up any remaining tasks. |
| 178 | 1 | Dan Stangel | # If the child sees that stdin is closed, then it must exit ASAP. Similarly, if the child catches SIGHUP then it must exit quickly. |
| 179 | 1 | Dan Stangel | # Since the child was at ST_READY, the jobqueue is not modified. |
| 180 | 1 | Dan Stangel | # If the child is still in ST_FREEING after 20 seconds (defined in spawn.h as MINKILLTIME), then the child is assassinated using SIGKILL. |
| 181 | 1 | Dan Stangel | # When the child dies, a SIGCHLD is sent to the scheduler. This is caught and used to transition the child from ST_FREEING to ST_FREE. |
| 182 | 1 | Dan Stangel | |
| 183 | 1 | Dan Stangel | |
| 184 | 1 | Dan Stangel | |
| 185 | 1 | Dan Stangel | |
| 186 | 1 | Dan Stangel | There are some abnormal circumstances when the child may die... |
| 187 | 1 | Dan Stangel | |
| 188 | 1 | Dan Stangel | |
| 189 | 1 | Dan Stangel | * If the scheduler aborts, it may send a SIGKILL to every child. Alternately, the operating system may send a SIGKILL, SIGHUP, or SIGINT to the child. When a new instance of the child is created, it should check for any residues from past deaths. |
| 190 | 1 | Dan Stangel | * If a child aborts, the scheduler receives a SIGCHLD. However, since the child was not in ST_FREEING, it is treated as an abnormal death. The thread info is sent to stderr and the death count is increased. The death count is used by RespawnInterval and RespawnCount to detect runaway spawning/freeing loops. |
| 191 | 1 | Dan Stangel | |
| 192 | 1 | Dan Stangel | |
| 193 | 1 | Dan Stangel | Other situations when a child may be ordered to die (normal death) or kept alive too long: |
| 194 | 1 | Dan Stangel | |
| 195 | 1 | Dan Stangel | |
| 196 | 1 | Dan Stangel | * If the child has been in the ST_READY state for more than 15 minutes (spawn.h, MAXKILLTIME), then it will be deemed unnecessary and killed. |
| 197 | 1 | Dan Stangel | * If the child has been in the ST_READY state for less than 20 seconds (clients.c, MINKILLTIME), then it will be kept alive in case some other job needs it. Without MINKILLTIME, an agent could be spawned for one job and immediately killed (since it is ST_READY) by a different job that wants to run. This leads to very fast SPAWNED/READY/FREEING/FREE loops as two jobs battle for dominance. MINKILLTIME breaks the loop since an agent has time to go from SPAWNED to READY to RUNNING before being killed by some other job. |
| 198 | 1 | Dan Stangel | |
| 199 | 1 | Dan Stangel | h2. Configuring the Scheduler |
| 200 | 1 | Dan Stangel | |
| 201 | 1 | Dan Stangel | See the User docs for [[foss-scheduler/#Configuring the Scheduler | Configuring the Scheduler.]] |
| 202 | 1 | Dan Stangel | |
| 203 | 1 | Dan Stangel | h2. Testing |
| 204 | 1 | Dan Stangel | |
| 205 | 1 | Dan Stangel | * You might want to use '-I'. This allows you to enter jobs to run on stdin. This is good for testing new agents. |
| 206 | 1 | Dan Stangel | * -H is useful if you want to use the real configuration file on the local host. |
| 207 | 1 | Dan Stangel | * I used to use -I and -H. Now I just create my own configuration file for the specific test. |
| 208 | 1 | Dan Stangel | |
| 209 | 1 | Dan Stangel | |
| 210 | 1 | Dan Stangel | If the scheduler is killed using "kill -9", then the queue may not be reset to a stable condition. When you start the scheduler, it will monitor the queue. After 10 minutes of inactivity, the abandoned queue entries will be reclaimed for use by the scheduler. For a faster response, you can use "-R" to reset the queue immediately. However, don't use -R if there are multiple schedulers running at the same time. (Running multiple schedulers is not supported, but -R will make a bad situation worse.) |
| 211 | 1 | Dan Stangel | |
| 212 | 1 | Dan Stangel | h2. Commanding the Scheduler |
| 213 | 1 | Dan Stangel | |
| 214 | 1 | Dan Stangel | The scheduler runs as an independent back-end process from the front-end user interface. As a result, the UI cannot communicate directly with the scheduler. Instead, all commands are placed in the database's jobqueue. During normal operations, the jobqueue stores tasks to be run (the tasks should match the scheduler's configuration file). However, there is one special jobqueue task. |
| 215 | 1 | Dan Stangel | |
| 216 | 1 | Dan Stangel | <pre> |
| 217 | 1 | Dan Stangel | jq_type = "command" |
| 218 | 1 | Dan Stangel | jq_args = parameters for command |
| 219 | 1 | Dan Stangel | </pre> |
| 220 | 1 | Dan Stangel | |
| 221 | 1 | Dan Stangel | |
| 222 | 1 | Dan Stangel | When the jobqueue's jq_type is the lowercase string "command", the parameters in "jq_args" are interpreted directly by the scheduler. The following jq_args are supported. |
| 223 | 1 | Dan Stangel | |
| 224 | 1 | Dan Stangel | |
| 225 | 1 | Dan Stangel | * "shutdown". The scheduler will finish all running tasks, but not start anything new. When all tasks complete, the scheduler will exit. |
| 226 | 1 | Dan Stangel | * "shutdown now". The scheduler kills all running processes and exits ASAP. |
| 227 | 1 | Dan Stangel | * "killjob 1234". If the jobqueue item 1234 (jq_pk="1234") is currently being processed by the scheduler, then kill it and mark it as a failure. This is usually used when the user queues a job to be processed, then decides to delete the job while it is running. |
| 228 | 1 | Dan Stangel | |
| 229 | 1 | Dan Stangel | |
| 230 | 1 | Dan Stangel | The front-end UI knows that the job is complete because it will be marked as processed in the jobqueue. |
| 231 | 1 | Dan Stangel | |
| 232 | 1 | Dan Stangel | h2. Building the Scheduler |
| 233 | 1 | Dan Stangel | |
| 234 | 1 | Dan Stangel | The scheduler consists of 6 source files: |
| 235 | 1 | Dan Stangel | |
| 236 | 1 | Dan Stangel | |
| 237 | 1 | Dan Stangel | * clients.c: Handles client communications. |
| 238 | 1 | Dan Stangel | * dbq.c: Contains ALL DB accesses. If the function touches the DB, then it is here. MSQ results are managed here too. |
| 239 | 1 | Dan Stangel | * hosts.c: Functions for managing host-based spawning. This keeps track of the number of spawns per host and whether new spawns are permitted. |
| 240 | 1 | Dan Stangel | * sockets.c: The read() and select() functions for communicating with an agent over stdin/stdout. |
| 241 | 1 | Dan Stangel | * spawn.c: Functions for spawning processes and handling signals. |
| 242 | 1 | Dan Stangel | * scheduler.c: The main file -- handles configuration and the infinite control loop. |
| 243 | 1 | Dan Stangel | |
| 244 | 1 | Dan Stangel | |
| 245 | 1 | Dan Stangel | |
| 246 | 1 | Dan Stangel | |
| 247 | 1 | Dan Stangel | To build the scheduler, use the Makefile. |
| 248 | 1 | Dan Stangel | |
| 249 | 1 | Dan Stangel | <pre> |
| 250 | 1 | Dan Stangel | make clean # remove all compiled files (clean slate for a new build) |
| 251 | 1 | Dan Stangel | make # build the scheduler |
| 252 | 1 | Dan Stangel | sudo make install # install it to /usr/local/fossology/agents/ |
| 253 | 1 | Dan Stangel | </pre> |
| 254 | 1 | Dan Stangel | |
| 255 | 1 | Dan Stangel | |
| 256 | 1 | Dan Stangel | The make command should build without any errors or warnings (a clean make). |
| 257 | 1 | Dan Stangel | |
| 258 | 1 | Dan Stangel | |
| 259 | 1 | Dan Stangel | NOTE: If you make any changes to the state machine labels (the ST_* definitions in spawn.h) then you *must* use 'make clean' before 'make'. (Someday we might introduce a 'make depends' file so code is recompiled whenever any of its dependencies change.) |
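
The child messages and state changes described under "Talking with Children" can be sketched in C as follows. This is a minimal illustration and not the scheduler's actual code: the names @msg_kind@, @classify_line()@, @on_child_ok()@, and @on_data_sent()@ are hypothetical, the numeric order of the states after ST_FAIL = 0 is assumed, and treating each keyword as a line prefix is also an assumption (the text only says the word appears "on the line").

```c
#include <assert.h>
#include <string.h>

/* Child states, named as in "Back-end Children". Only ST_FAIL = 0 is stated
 * in the text; the order of the remaining values is an assumption. */
enum child_state {
    ST_FAIL = 0, ST_FREE, ST_FREEING, ST_PREP, ST_SPAWNED,
    ST_READY, ST_RUNNING, ST_DONE, ST_END
};

/* Hypothetical labels for the kinds of lines a child can write to stdout. */
enum msg_kind {
    MSG_OK, MSG_ERROR, MSG_LOG, MSG_SUCCESS, MSG_ECHO, MSG_DB, MSG_OTHER
};

/* Return nonzero if line begins with prefix. */
static int has_prefix(const char *line, const char *prefix)
{
    return strncmp(line, prefix, strlen(prefix)) == 0;
}

/* Classify one line of child output (with or without a trailing newline
 * on "OK"). Keywords are matched as prefixes, which is an assumption. */
enum msg_kind classify_line(const char *line)
{
    assert(line != NULL);
    if (strcmp(line, "OK") == 0 || strcmp(line, "OK\n") == 0) return MSG_OK;
    if (has_prefix(line, "DB: "))    return MSG_DB;   /* the space is required */
    if (has_prefix(line, "ERROR"))   return MSG_ERROR;
    if (has_prefix(line, "LOG"))     return MSG_LOG;
    if (has_prefix(line, "Success")) return MSG_SUCCESS;
    if (has_prefix(line, "ECHO"))    return MSG_ECHO;
    return MSG_OTHER;                /* shown on stderr as DEBUG output */
}

/* An "OK" moves ST_SPAWNED or ST_RUNNING to ST_READY; other states
 * are left unchanged. */
enum child_state on_child_ok(enum child_state s)
{
    if (s == ST_SPAWNED || s == ST_RUNNING) return ST_READY;
    return s;
}

/* Sending data to a ready child moves it to ST_RUNNING; a child that is
 * not ST_READY is left unchanged because no data may be sent to it. */
enum child_state on_data_sent(enum child_state s)
{
    return (s == ST_READY) ? ST_RUNNING : s;
}
```

In this sketch a line such as "DB:SELECT 1;" (no space after the colon) falls through to @MSG_OTHER@, which mirrors the note that the space is needed.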