Bug #1995
fo_scheduler segfault error
| Status: | Closed | Start date: | 05/17/2012 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | | % Done: | 0% |
| Category: | Scheduler | | |
| Target version: | 2.1.0 | | |
| Rank: | | Tester: | |
Description
Tested at revision 5862 on Debian 6.0 (64-bit), single-system installation from source.
Started more than 20 uploads at the same time by running phpunit src/cli/tests/test_cp2foss.php many times.
After a while the scheduler stopped.
I found one message in /var/log/messages:
May 16 23:32:32 bl460c-10 kernel: [42060.420118] fo_scheduler17136: segfault at 1c ip 0000000000409a70 sp 00007fffb6d2ddb8 error 4 in fo_scheduler[400000+15000]
At that time, I tried to restart the scheduler, and these error messages appeared on the command line:
kernel:[ 586.060524] Call Trace:
Message from syslogd@bl460c-10 at May 15 15:29:45 ...
kernel:[ 586.060524] Code: 24 20 48 8b 6f 10 48 89 f7 4c 8b b5 58 02 00 00 f3 ab 8a 45 41 88 44 24 50 49 8d 46 20 48 89 44 24 08 49 8b 46 20 48 8b 54 24 08 <48> 89 44 24 2c 48 8b 42 08 48 89 46 14 49 8b 06 48 89 44 24 3c
I had to kill all the agents listed by ps -ef manually, then drop and recreate the database.
After that, I restarted the scheduler and no error messages appeared.
Process list after the scheduler started:
fossy 3334 1 0 May16 ? 00:00:00 [fo_scheduler] <defunct>
fossy 28922 1 43 16:13 ? 00:33:34 /usr/local/share/fossology/scheduler/agent/fo_scheduler --daemon --reset --verbose=1
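The segfault line from /var/log/messages quoted above packs several fields into one message. The following is a minimal sketch of how it could be decoded, assuming the standard x86 Linux message layout (fault address, instruction pointer, stack pointer, page-fault error code, and mapping base+size, all in hexadecimal). The function name `decode_segfault` and the dictionary keys are illustrative and not part of FOSSology.

```python
import re

# Layout of the x86 Linux "segfault at ..." kernel message, as seen in the report.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Decode one kernel segfault message (hypothetical helper, for illustration)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    err = int(m.group("err"), 16)
    return {
        "object": m.group("obj"),
        "fault_address": int(m.group("addr"), 16),
        # Offset of the faulting instruction from the start of the mapping.
        "offset_in_mapping": ip - base,
        # x86 page-fault error code bits: 0 = page present, 1 = write, 2 = user mode.
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }


if __name__ == "__main__":
    line = (
        "May 16 23:32:32 bl460c-10 kernel: [42060.420118] fo_scheduler17136: "
        "segfault at 1c ip 0000000000409a70 sp 00007fffb6d2ddb8 error 4 "
        "in fo_scheduler[400000+15000]"
    )
    for key, value in decode_segfault(line).items():
        print(key, value)
```

For the line in this report, the faulting instruction sits at offset 0x9a70 into the fo_scheduler mapping, and error code 4 corresponds to a user-mode read of a page that was not present. A fault address as low as 0x1c is consistent with reading a structure field through a NULL pointer, although nothing in this ticket confirms that as the cause.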
History
Updated by Paul Holland 12 months ago
- IterNum set to 2
Updated by Alex Norton 12 months ago
- Status changed from New to Feedback
- Estimate set to 8
I have been unable to replicate this problem.
Updated by larry shi 11 months ago
- Status changed from Feedback to Rejected
Hi Alex,
I found this defect with one test script in the 2.0 branch:
2.0/fossology/src/cli/tests/test_cp2foss.php svn 5857
I ran phpunit test_cp2foss.php several times and hit this defect.
Today I re-ran it on another test machine and could not reproduce it. The two test machines have almost the same environment, so I am rejecting this defect.
Updated by larry shi 11 months ago
- Status changed from Rejected to In Progress
Reproduction method:
Do not start the scheduler, upload one package about 10 times, then start the scheduler. This reproduces the issue.
It seems that scheduling too many agents (e.g. unpack) at the same time leads to this error.
If you have any questions, please contact me.
Updated by Alex Norton 9 months ago
- Assignee changed from Alex Norton to Bob Gobeille
Updated by Mary Laser 9 months ago
- Target version deleted (2.0.1)
Updated by Mary Laser 8 months ago
- Status changed from In Progress to Feedback
- Target version set to 2.1.0
Hi Bob, I'm not convinced this is a scheduler bug. It looks like Larry's test exceeded system resources. If you agree, please close this defect. Otherwise, it needs further investigation.
BTW, it's a GOOD test! I've added it to the Scheduler_Test_Cases as a new stress test. Thanks Larry!
Mary
Updated by Bob Gobeille 8 months ago
- Assignee changed from Bob Gobeille to larry shi
Larry can you still reproduce this? I cannot. I stopped the scheduler, queued 14 jobs, started the scheduler. This is on a single system, not a cluster. Some of the uploads were repeats (uploads of the same file), most were not.
Updated by Mary Laser 8 months ago
- Status changed from Feedback to Closed
- Assignee changed from larry shi to Mary Laser
Bug was last seen 3 months ago when the scheduler was still under development & test for 2.1.
Closing due to inability to reproduce.