Bug #1995

fo_scheduler segfault error

Added by larry shi about 1 year ago. Updated 8 months ago.

Status: Closed
Start date: 05/17/2012
Priority: Normal
Due date:
Assignee: Mary Laser
% Done: 0%
Category: Scheduler
Target version: 2.1.0
Rank:
Tester:

Description

Tested with revision 5862 on a single-system Debian 6.0 (64-bit) installation built from source.
I started more than 20 uploads at the same time by running phpunit src/cli/tests/test_cp2foss.php many times.
After a while the scheduler stopped, and I found this message in /var/log/messages:
May 16 23:32:32 bl460c-10 kernel: [42060.420118] fo_scheduler[17136]: segfault at 1c ip 0000000000409a70 sp 00007fffb6d2ddb8 error 4 in fo_scheduler[400000+15000]
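As a triage aid, the kernel segfault line can be decoded field by field. The sketch below assumes an x86-64 kernel, where the "error" value is the page-fault error code; parse_segfault and decode_error are hypothetical helper names, not part of FOSSology. For this report, error 4 decodes to a user-mode read of a page that was not present, and a fault address of 0x1c is consistent with reading a field at a small offset from a NULL pointer.

```python
import re

# Matches the Linux kernel "segfault at ..." report. Only the fields after
# "segfault" are parsed, so the "comm[pid]:" prefix does not matter.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_error(err: int) -> list[str]:
    """Decode the low bits of an x86 page-fault error code."""
    bits = [
        "protection violation" if err & 0x1 else "page not present",
        "write" if err & 0x2 else "read",
        "user mode" if err & 0x4 else "kernel mode",
    ]
    if err & 0x10:
        bits.append("instruction fetch")
    return bits


def parse_segfault(line: str) -> dict:
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    addr, ip, base, size = (int(m[k], 16) for k in ("addr", "ip", "base", "size"))
    return {
        "fault_addr": addr,                    # address the program tried to access
        "object": m["obj"],                    # binary whose mapping contains ip
        "ip_offset": ip - base,                # faulting instruction, relative to the mapping
        "ip_in_mapping": 0 <= ip - base < size,
        "access": decode_error(int(m["err"], 16)),
    }


line = ("fo_scheduler[17136]: segfault at 1c ip 0000000000409a70 "
        "sp 00007fffb6d2ddb8 error 4 in fo_scheduler[400000+15000]")
info = parse_segfault(line)
print(hex(info["fault_addr"]), hex(info["ip_offset"]), info["access"])
# -> 0x1c 0x9a70 ['page not present', 'read', 'user mode']
```

Because the fo_scheduler mapping starts at 0x400000 (the usual base for a non-PIE x86-64 executable), the ip may be resolvable directly against a build with debug symbols, for example with addr2line -f -e fo_scheduler 0x409a70.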

When I then tried to restart the scheduler, these error messages appeared on the command line:

kernel:[ 586.060524] Call Trace:

Message from syslogd@bl460c-10 at May 15 15:29:45 ...
kernel:[ 586.060524] Code: 24 20 48 8b 6f 10 48 89 f7 4c 8b b5 58 02 00 00 f3 ab 8a 45 41 88 44 24 50 49 8d 46 20 48 89 44 24 08 49 8b 46 20 48 8b 54 24 08 <48> 89 44 24 2c 48 8b 42 08 48 89 46 14 49 8b 06 48 89 44 24 3c

I had to manually kill all the agents shown by ps -ef, then drop and recreate the database.
After that, the scheduler restarted with no error messages.
ps -ef output after the scheduler started:
fossy 3334 1 0 May16 ? 00:00:00 [fo_scheduler] <defunct>
fossy 28922 1 43 16:13 ? 00:33:34 /usr/local/share/fossology/scheduler/agent/fo_scheduler --daemon --reset --verbose=1
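The manual cleanup described above can be helped by filtering the ps -ef listing for the fossy user and separating live processes from zombies. A <defunct> entry such as PID 3334 has already exited, so kill has no effect on it; it disappears only once its parent reaps it. The sketch below is a hypothetical helper (parse_ps_ef is not a FOSSology tool) and assumes the default ps -ef column layout (UID PID PPID C STIME TTY TIME CMD).

```python
from typing import NamedTuple


class Proc(NamedTuple):
    user: str
    pid: int
    ppid: int
    cmd: str

    @property
    def defunct(self) -> bool:
        # ps marks zombie processes with "<defunct>" after the command name.
        return self.cmd.endswith("<defunct>")


def parse_ps_ef(text: str, user: str = "fossy") -> list[Proc]:
    """Parse `ps -ef` output and keep only processes owned by `user`."""
    procs = []
    for line in text.splitlines():
        fields = line.split(None, 7)  # CMD may contain spaces; keep it whole
        if len(fields) < 8 or fields[0] != user or not fields[1].isdigit():
            continue  # header, blank line, or another user's process
        procs.append(Proc(fields[0], int(fields[1]), int(fields[2]), fields[7]))
    return procs


sample = """\
UID        PID  PPID  C STIME TTY          TIME CMD
fossy     3334     1  0 May16 ?        00:00:00 [fo_scheduler] <defunct>
fossy    28922     1 43 16:13 ?        00:33:34 /usr/local/share/fossology/scheduler/agent/fo_scheduler --daemon --reset --verbose=1
"""
procs = parse_ps_ef(sample)
zombies = [p.pid for p in procs if p.defunct]   # cannot be killed; the parent must reap them
live = [p.pid for p in procs if not p.defunct]  # candidates for kill
print(zombies, live)  # -> [3334] [28922]
```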

History

Updated by Paul Holland 12 months ago

  • IterNum set to 2

Updated by Alex Norton 12 months ago

  • Status changed from New to Feedback
  • Estimate set to 8

I have been unable to replicate this problem.

Updated by larry shi 11 months ago

  • Status changed from Feedback to Rejected

Hi Alex,

I found this defect by running one test script from the 2.0 branch,
2.0/fossology/src/cli/tests/test_cp2foss.php (svn 5857), with phpunit several times.
Today I re-ran it on another test machine and could not reproduce the defect. The two test machines have almost the same environment, so I am rejecting this defect.

Updated by larry shi 11 months ago

  • Status changed from Rejected to In Progress

Reproduction method:
Do not start the scheduler, upload one package about 10 times, and then start the scheduler. This reproduces the issue.

It seems that scheduling too many agents (e.g. unpack) at the same time leads to this error.

If you have any questions, please contact me.
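The reproduction steps above can be scripted so the backlog size is easy to vary (Larry used about 10 uploads, Bob later used 14 jobs). This is a minimal sketch under stated assumptions: the start command is copied from the ps -ef output in this report, the stop command (pkill by process name) is an assumption to adapt to your installation, and it assumes each phpunit run of test_cp2foss.php only queues uploads while the scheduler is down. build_plan and run_plan are hypothetical names; by default the script only prints the commands.

```python
import subprocess

# Scheduler command line copied from the ps -ef output in this report.
START_CMD = ["/usr/local/share/fossology/scheduler/agent/fo_scheduler",
             "--daemon", "--reset", "--verbose=1"]
# Assumption: stop the scheduler by exact process name; adjust as needed.
STOP_CMD = ["pkill", "-x", "fo_scheduler"]
# Upload step used in this report (run from the source tree).
UPLOAD_CMD = ["phpunit", "src/cli/tests/test_cp2foss.php"]


def build_plan(upload_cmd, n_uploads=10):
    """Stop the scheduler, queue n_uploads uploads, then start the scheduler,
    so the whole backlog becomes runnable at the same moment."""
    return [STOP_CMD] + [list(upload_cmd) for _ in range(n_uploads)] + [START_CMD]


def run_plan(plan, dry_run=True):
    for cmd in plan:
        print("$ " + " ".join(cmd))
        if not dry_run:
            # check=False: pkill exits with status 1 when nothing matched.
            result = subprocess.run(cmd, check=False)
            if result.returncode != 0:
                print(f"  exit status {result.returncode}")


plan = build_plan(UPLOAD_CMD, n_uploads=10)
run_plan(plan)  # dry run: prints the 12 commands without executing them
```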

Updated by Alex Norton 9 months ago

  • Assignee changed from Alex Norton to Bob Gobeille

Updated by Mary Laser 9 months ago

  • Target version deleted (2.0.1)

Updated by Mary Laser 8 months ago

  • Status changed from In Progress to Feedback
  • Target version set to 2.1.0

Hi Bob, I'm not convinced this is a scheduler bug. It looks like Larry's test exceeded system resources. If you agree, please close this defect. Otherwise, it needs further investigation.

BTW, it's a GOOD test! I've added it to the Scheduler_Test_Cases as a new stress test. Thanks Larry!

Mary

Updated by Bob Gobeille 8 months ago

  • Assignee changed from Bob Gobeille to larry shi

Larry, can you still reproduce this? I cannot. I stopped the scheduler, queued 14 jobs, and started the scheduler. This is on a single system, not a cluster. Some of the uploads were repeats (uploads of the same file); most were not.

Updated by Mary Laser 8 months ago

  • Status changed from Feedback to Closed
  • Assignee changed from larry shi to Mary Laser

The bug was last seen 3 months ago, when the 2.1 scheduler was still under development and test.
Closing due to inability to reproduce.
