fo_scheduler segfault error
Assignee: Mary Laser
Tested in 5862 on Debian 6.0 (64-bit), on a single system installed from source.
Started more than 20 uploads at the same time by running phpunit src/cli/tests/test_cp2foss.php many times.
After a while the scheduler stopped.
I found one message in /var/log/messages:
May 16 23:32:32 bl460c-10 kernel: [42060.420118] fo_scheduler17136: segfault at 1c ip 0000000000409a70 sp 00007fffb6d2ddb8 error 4 in fo_scheduler[400000+15000]
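For triage, the fields of the kernel message above can be decoded mechanically. The following is a minimal sketch, not part of FOSSology: the function name decode_segfault and the returned keys are made up for illustration. It assumes the standard x86 page-fault error-code bits (bit 0: page present, bit 1: write access, bit 2: user mode) and that the numbers in the message are hexadecimal.

```python
import re

# Matches the tail of a Linux "segfault at ..." kernel message.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<module>[^\[\s]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return a dict describing the fault, or None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "fault_address": int(m["addr"], 16),      # address that was accessed
        "module": m["module"],                    # mapped object containing ip
        "ip_offset": ip - base,                   # instruction offset inside the mapping
        "page_present": bool(err & 1),            # bit 0: 0 means no page mapped
        "access": "write" if err & 2 else "read",  # bit 1
        "mode": "user" if err & 4 else "kernel",   # bit 2
    }


LINE = ("May 16 23:32:32 bl460c-10 kernel: [42060.420118] fo_scheduler17136: "
        "segfault at 1c ip 0000000000409a70 sp 00007fffb6d2ddb8 error 4 "
        "in fo_scheduler[400000+15000]")
print(decode_segfault(LINE))
```

For the message above this reports a user-mode read of an unmapped page at address 0x1c, at offset 0x9a70 inside fo_scheduler. Such a low fault address is consistent with reading a structure field through a NULL pointer, though the message alone does not prove that. If a build with debug symbols is available, a tool such as addr2line can map the instruction pointer to a source line.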
At that time I tried to restart the scheduler, and the following error messages appeared on the command line:
kernel:[ 586.060524] Call Trace:
Message from syslogd@bl460c-10 at May 15 15:29:45 ...
kernel:[ 586.060524] Code: 24 20 48 8b 6f 10 48 89 f7 4c 8b b5 58 02 00 00 f3 ab 8a 45 41 88 44 24 50 49 8d 46 20 48 89 44 24 08 49 8b 46 20 48 8b 54 24 08 <48> 89 44 24 2c 48 8b 42 08 48 89 46 14 49 8b 06 48 89 44 24 3c
I had to kill all the agents listed by ps -ef manually, then drop and recreate the database.
After that, restarting the scheduler produced no error message.
Process list after the scheduler started:
fossy 3334 1 0 May16 ? 00:00:00 [fo_scheduler] <defunct>
fossy 28922 1 43 16:13 ? 00:33:34 /usr/local/share/fossology/scheduler/agent/fo_scheduler --daemon --reset --verbose=1
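Leftover defunct entries like the first line above can be picked out of ps -ef output automatically. This is a small sketch under the assumption that the output uses the usual eight ps -ef columns (UID PID PPID C STIME TTY TIME CMD). The helper name find_defunct is hypothetical.

```python
def find_defunct(ps_output):
    """Return (pid, ppid, cmd) for every <defunct> entry in `ps -ef` output."""
    zombies = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the command keeps its spaces.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # blank or truncated line
        pid, ppid, cmd = fields[1], fields[2], fields[7].rstrip()
        if not (pid.isdigit() and ppid.isdigit()):
            continue  # header line such as "UID PID PPID ..."
        if cmd.endswith("<defunct>"):
            zombies.append((int(pid), int(ppid), cmd))
    return zombies


SAMPLE = """\
fossy 3334 1 0 May16 ? 00:00:00 [fo_scheduler] <defunct>
fossy 28922 1 43 16:13 ? 00:33:34 /usr/local/share/fossology/scheduler/agent/fo_scheduler --daemon --reset --verbose=1
"""
for pid, ppid, cmd in find_defunct(SAMPLE):
    print(pid, ppid, cmd)
```

A defunct process has already exited and cannot be killed again. It disappears only once its parent collects its exit status.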
Updated by Alex Norton 12 months ago
- Status changed from New to Feedback
- Estimate set to 8
I have been unable to replicate this problem.
- Status changed from Feedback to Rejected
Found with one test script in the 2.0 branch:
2.0/fossology/src/cli/tests/test_cp2foss.php (svn 5857)
I ran phpunit test_cp2foss.php several times and hit this defect.
Today I re-ran it on another test machine and could not reproduce the defect. The two test machines have almost the same environment, so I am rejecting this defect.
- Status changed from Rejected to In Progress
To reproduce: do not start the scheduler, upload one package about 10 times, then start the scheduler.
It seems that scheduling too many agents (e.g. unpack) at the same time leads to this error.
If you need anything else, please contact me.
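The load used in this report (many runs of the same upload test at once) could be generated with a small driver like the one below. It is only a sketch: run_many is a hypothetical helper, it runs whatever command it is given, and it does not stop or start the scheduler itself.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_many(command, count, max_workers=None):
    """Run `command` (an argv list) `count` times in parallel; return the exit codes."""

    def run_once(_index):
        # Output is discarded; only the exit status is of interest here.
        result = subprocess.run(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        return result.returncode

    with ThreadPoolExecutor(max_workers=max_workers or count) as pool:
        return list(pool.map(run_once, range(count)))


# Example call, using the test invocation quoted earlier in this report:
#   codes = run_many(["phpunit", "src/cli/tests/test_cp2foss.php"], 20)
#   print(codes.count(0), "of", len(codes), "runs exited with status 0")
```

With the scheduler stopped, the uploads only queue jobs. Starting the scheduler afterwards is what schedules many agents at the same time.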
Updated by Mary Laser 8 months ago
- Status changed from In Progress to Feedback
- Target version set to 2.1.0
Hi Bob, I'm not convinced this is a scheduler bug. It looks like Larry's test exceeded system resources. If you agree, please close this defect. Otherwise, it needs further investigation.
BTW, it's a GOOD test! I've added it to the Scheduler_Test_Cases as a new stress test. Thanks Larry!
Updated by Bob Gobeille 8 months ago
- Assignee changed from Bob Gobeille to larry shi
Larry, can you still reproduce this? I cannot. I stopped the scheduler, queued 14 jobs, then started the scheduler. This is on a single system, not a cluster. Some of the uploads were repeats (uploads of the same file), most were not.