Scheduler race condition in signal handler
Assignee: Alex Norton
Currently the scheduler has problems if it receives a signal at exactly the same time as it is pulling something out of the job queue.
There needs to be less code in the signal handler.
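One common way to get less code into a signal handler is to have the handler do nothing except set a flag, and to do the real work later from the main loop at a point where the job queue is not being modified. The sketch below illustrates that general pattern only; it is not the scheduler's actual code. The function names (record_signal, install_signal_handlers, take_pending_signals) and the choice of SIGCHLD and SIGTERM are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flags: written by the handler, read by the main loop.
 * volatile sig_atomic_t is the object type the C standard allows a
 * signal handler to write safely. */
static volatile sig_atomic_t got_sigchld = 0;
static volatile sig_atomic_t got_sigterm = 0;

/* Bits returned by take_pending_signals(). */
enum { PENDING_CHLD = 1, PENDING_TERM = 2 };

/* The handler only records that a signal arrived. It takes no locks,
 * allocates no memory, and never touches the job queue. */
static void record_signal(int signo)
{
    if (signo == SIGCHLD)
        got_sigchld = 1;
    else if (signo == SIGTERM)
        got_sigterm = 1;
}

/* Install the minimal handler (signal choice is an assumption).
 * Returns 0 on success, -1 on failure. */
static int install_signal_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */

    if (sigaction(SIGCHLD, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* Called from the main loop, outside any job queue operation.
 * Each flag is cleared before its bit is reported, so a signal that
 * arrives during this call is either reported now or stays pending
 * for the next call. */
static int take_pending_signals(void)
{
    int pending = 0;

    if (got_sigchld) {
        got_sigchld = 0;
        pending |= PENDING_CHLD;
    }
    if (got_sigterm) {
        got_sigterm = 0;
        pending |= PENDING_TERM;
    }
    return pending;
}
```

In a loop built this way, the main thread would call take_pending_signals() once per iteration and then reap children or begin shutdown in ordinary code, so a signal can no longer interrupt a queue operation halfway through and re-enter it.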
Updated by Mary Laser about 1 year ago
This error has only been observed on fo.usa.
<maryl> norton: do you have a test case to reliably produce the race condition error? http://www.fossology.org/issues/1965
<maryl> norton: we will need to test your fix, once it's checked in.
<norton> maryl: no, it's happening on fo.usa mostly
<maryl> how are u validating your fix?
<norton> I will be checking on that system
<maryl> ah, i c
Updated by Alex Norton about 1 year ago
On a system with a large number of hosts (> 6) the scheduler will deadlock during startup and never kill the agents.
To test this:
Once the scheduler has started, run ps. If a large number of agent processes (> 60) never go away, the scheduler has entered this deadlocked state. Another indication is that the scheduler becomes unresponsive to fo_cli and has to be killed manually with SIGKILL. All of the agents then need to be killed individually.