Bug #1965
Scheduler race condition in signal handler
| Status: | Closed | Start date: | 05/14/2012 |
|---|---|---|---|
| Priority: | High | Due date: | |
| Assignee: | | % Done: | 0% |
| Category: | Scheduler | | |
| Target version: | 2.0.0 | | |
| Rank: | | Tester: | |
Description
Currently the scheduler has problems if it receives a signal at exactly the same time as it is pulling something out of the job queue.
There needs to be less code in the signal handler.
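One common way to get "less code in the signal handler" is to have the handler only record that a signal arrived and let the main loop do the real work at a point where it is not touching the job queue. The sketch below illustrates that pattern in Python. It is not the scheduler's actual code: `pending_signals`, `job_queue`, `on_signal`, `install_handlers`, and `run_once` are hypothetical names, and the choice of SIGTERM and SIGHUP assumes a POSIX system.

```python
import signal
from collections import deque

# Hypothetical names throughout; this is not the FOSSology scheduler's code.
pending_signals = set()  # written by the handler, drained by the main loop
job_queue = deque()      # only ever touched by the main loop


def on_signal(signum, frame):
    # Keep the handler tiny: record the signal number and return.
    # It never touches job_queue, which the main loop may be modifying.
    pending_signals.add(signum)


def install_handlers(signums=(signal.SIGTERM, signal.SIGHUP)):
    # Assumption: a POSIX system where both signals exist.
    for signum in signums:
        signal.signal(signum, on_signal)


def run_once(handle_job, handle_signal):
    """One main-loop iteration: deferred signals first, then at most one job."""
    while pending_signals:
        handle_signal(pending_signals.pop())
    if job_queue:
        handle_job(job_queue.popleft())
```

With this split, the code that reacts to a signal runs at a known point in the loop, so it cannot interrupt a queue operation halfway through.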
History
Updated by Mary Laser about 1 year ago
This error has only been observed on fo.usa.
<maryl> norton: do you have a test case to reliably produce the race condition error? http://www.fossology.org/issues/1965
<maryl> norton: we will need to test your fix, once it's checked in.
<norton> maryl: no, it's happening on fo.usa mostly
<maryl> how are u validating your fix?
<norton> I will be checking on that system
<maryl> ah, i c
Updated by Alex Norton about 1 year ago
On a system with a large number of hosts (> 6), the scheduler will deadlock during startup and never kill the agents.
To test this:
Once the scheduler has started, run ps. If a large number of agent processes (> 60) never go away, the scheduler has entered this deadlocked state. Another indication is that the scheduler becomes unresponsive to fo_cli and has to be killed manually with SIGKILL. All of the agents then need to be killed individually.
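The ps check above can be scripted. The following is a rough sketch only: the helper names (`count_agents`, `sample_agent_count`, `looks_deadlocked`), the sampling interval, and the number of samples are assumptions, and the caller has to supply the agent command names, since this report does not list them. It relies on `ps -eo comm=`, which prints one command name per process with no header. Note that some systems truncate the command name in that column.

```python
import subprocess
import time

AGENT_THRESHOLD = 60  # from the note above: more than 60 lingering agents


def count_agents(ps_output, agent_names):
    """Count lines of `ps -eo comm=` output whose command is a listed agent."""
    names = set(agent_names)
    return sum(1 for line in ps_output.splitlines() if line.strip() in names)


def sample_agent_count(agent_names):
    # One command name per line, no header row.
    result = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    )
    return count_agents(result.stdout, agent_names)


def looks_deadlocked(agent_names, samples=3, interval=30.0):
    """True if every sample shows more than AGENT_THRESHOLD agent processes.

    samples and interval are arbitrary illustrative defaults.
    """
    for i in range(samples):
        if sample_agent_count(agent_names) <= AGENT_THRESHOLD:
            return False
        if i < samples - 1:
            time.sleep(interval)
    return True
```

For example, `looks_deadlocked(["agent_a", "agent_b"])` would sample three times, 30 seconds apart, where `agent_a` and `agent_b` are placeholders for the real agent binaries on the system under test.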
Updated by Alex Norton about 1 year ago
- Status changed from In Progress to Resolved
Updated by Bob Gobeille about 1 year ago
- Status changed from Resolved to Closed
Tested and verified in 5863 on fo.usa. However, the problem happens only occasionally, so I hope it is really fixed.