Bug #1569
wget shouldn't be using LOCAL special option
| Status: | Closed | Start date: | 03/15/2012 |
|---|---|---|---|
| Priority: | High | Due date: | |
| Assignee: | | % Done: | 0% |
| Category: | Wget agent | | |
| Target version: | 2.1.0 | Estimated time: | 24.00 hours |
| Rank: | 2 | Tester: | |
Description
Currently the wget agent must run on the same system as the scheduler to correctly perform an upload from server. See http://www.fossology.org/issues/1413 for more information. This should not be a requirement for wget.
History
Updated by Paul Holland about 1 year ago
- IterNum set to 2
Updated by Mary Laser about 1 year ago
- Status changed from New to Closed
notification changed to be more informative/less confusing. svn 5927
Updated by Mary Laser about 1 year ago
- Status changed from Closed to New
- Assignee changed from Mary Laser to larry shi
accidentally closed; reopening & reassigning to Larry
Updated by larry shi about 1 year ago
- Status changed from New to Feedback
Hi Alex,
Here you mention 'wget'; I suppose you mean 'wget_agent'.
Why do you think wget_agent should not be required to run on the same system as the scheduler to correctly perform an upload from server?
Also, at the beginning of 2.0, I remember that we defined a fossology core that contains the scheduler, wget_agent, UI, etc.
Any suggestions?
Updated by Bob Gobeille about 1 year ago
The situation is this:
1) Upload from server requires the file path to be on the web server.
2) Upload from server then queues wget_agent which needs to read this path (assuming it is on the same server the agent is running on).
3) So upload from server fails if wget_agent does not run on the same host as the web server.
4) The simple way to resolve this is to force wget_agent to run with special[] = LOCAL, which forces wget_agent to run on the scheduler machine. This works because we always run the web server and scheduler on the same host.
The problem is:
5) We don't want to be forced to always run the web server and scheduler on the same host. This requirement limits our ability to run multiple web servers, should we ever want to set up a configuration like that.
6) #4 above means that upload from URL and any other wget_agent job will be forced to run on the scheduler system. That is undesirable since we want to distribute the load.
Updated by larry shi about 1 year ago
- Assignee changed from larry shi to Bob Gobeille
- Estimated time set to 24.00
Hi Alex, Bob,
If fossology wants to run multiple web servers, here are my thoughts:
1. On the upload from server page, we have to display all the web server names to select from, then let the user enter a path on that server.
2. After a path on a specific server is selected:
2.1 Method 1: tell the scheduler to schedule wget_agent on the specified server. How do we tell the scheduler? Is there an API or some other method? For now, if I want wget_agent to run on the same machine as the scheduler, I can force it with special[] = LOCAL in wget_agent.conf.
2.2 Method 2: change wget_agent so that it can get the file(s) from another machine (not the machine where wget_agent is located).
3. I am not sure if we can fix this defect in this iteration. Do you think this defect has a high priority?
Thoughts?
Updated by Mary Laser about 1 year ago
Hi Bob - I spoke w/Larry on OC tonight. He asked me to remind you to respond to this issue. Thanks.
Updated by Bob Gobeille about 1 year ago
Let's add a new field to JobQueueAdd() called $Host to write the host into the jobqueue record. Then Alex will modify the scheduler to look at Host and, if it is not NULL, run the agent on that host. This way Upload from Server will always run on the correct host, but Upload from URL can run on any host.
Also, the Upload from Server UI will need a new host field that should be a pull down of the available machines from the [HOSTS] group in fossology.conf.
How does this sound?
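For illustration only, the host rule proposed above can be sketched in Python. This is not the scheduler's code: `pick_host`, the `host_loads` map, and the "least loaded" fallback are assumptions made for the example. The only idea taken from the proposal is "a non-NULL host pins the job, NULL means any host".

```python
from typing import Dict, Optional


def pick_host(job_host: Optional[str], host_loads: Dict[str, int]) -> str:
    """Choose the host an agent should run on.

    job_host:   host recorded with the job (None means "any host").
    host_loads: hypothetical map of host name -> number of running agents.
    """
    if job_host is not None:
        # Upload from Server: the file only exists on this host,
        # so the agent must run there.
        if job_host not in host_loads:
            raise ValueError(f"unknown host: {job_host}")
        return job_host
    # Upload from URL and other jobs: any host will do, so this sketch
    # picks the one with the fewest running agents.
    return min(host_loads, key=host_loads.get)


hosts = {"web1": 3, "agent1": 1}
print(pick_host("web1", hosts))  # pinned job stays on web1
print(pick_host(None, hosts))    # unpinned job goes to the least loaded host
```

With a rule like this, only Upload from Server jobs are tied to one machine, and every other wget_agent job can still be distributed.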
Updated by Bob Gobeille about 1 year ago
- Status changed from Feedback to In Progress
- Assignee changed from Bob Gobeille to larry shi
Updated by Bob Gobeille about 1 year ago
The new database field should be called "jq_host" to be consistent.
Updated by larry shi about 1 year ago
Bob, I agree with you.
Paul, we may want to defer this defect to the next iteration. Does that make sense?
Updated by larry shi about 1 year ago
This defect needs more time to fix:
modify the db (add the jq_host field to the jobqueue table) and consider how to upgrade,
modify all the places that call JobQueueAdd,
make the cli scripts work,
tests will take some time, since this fix will impact existing functions.
I have not fixed it yet.
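As a rough sketch of the upgrade concern, the snippet below adds a jq_host column only if it is missing, so it is safe to run on both fresh and already-upgraded databases. It is an assumption-laden illustration: an in-memory SQLite database stands in for the real one, `add_jq_host_column` is a hypothetical helper, and the other table columns are simplified for the example.

```python
import sqlite3


def add_jq_host_column(conn: sqlite3.Connection) -> bool:
    """Add jq_host to jobqueue if it is missing.

    Returns True if the column was added, False if it already existed.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(jobqueue)")]
    if "jq_host" in columns:
        return False
    # Existing rows get NULL, which means "run on any host".
    conn.execute("ALTER TABLE jobqueue ADD COLUMN jq_host TEXT DEFAULT NULL")
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
# Simplified stand-in for the pre-upgrade table (column set is an assumption).
conn.execute(
    "CREATE TABLE jobqueue (jq_pk INTEGER PRIMARY KEY, jq_type TEXT, jq_args TEXT)"
)
conn.execute("INSERT INTO jobqueue (jq_type, jq_args) VALUES ('wget_agent', 'example')")

add_jq_host_column(conn)  # first run adds the column
add_jq_host_column(conn)  # second run is a no-op
```

Because old rows end up with a NULL host, jobs queued before the upgrade keep their current "any host" behaviour.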
Updated by Alex Norton about 1 year ago
The scheduler changes have been made for this. If a job is queued with the jq_host field set to non-NULL, then the scheduler will make sure that it only runs the agent on that host.
As a suggestion for JobQueueAdd, simply add the host as the last argument and declare it with a default value of NULL; this way you won't need to make changes to every location that it is called from. The only location that should need to change is where a wget_agent is scheduled for an upload from server. This seems like the perfect use of the default function parameter feature of PHP.
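The default-parameter idea can be sketched as follows. The example is in Python rather than PHP and is only an illustration: `job_queue_add` and its parameter list are hypothetical and do not reproduce the real JobQueueAdd signature. The record keys mirror jobqueue column names used elsewhere in this issue.

```python
from typing import Optional


def job_queue_add(job_pk: int, agent_type: str, args: str,
                  host: Optional[str] = None) -> dict:
    """Build a jobqueue record; host defaults to None (run on any host)."""
    return {
        "jq_job_fk": job_pk,
        "jq_type": agent_type,
        "jq_args": args,
        "jq_host": host,
    }


# Existing callers keep working unchanged and get jq_host = None.
url_job = job_queue_add(9, "wget_agent", "example arguments")

# Only the upload-from-server caller passes the extra argument.
srv_job = job_queue_add(9, "wget_agent", "9 - /home/laser/monitor.sh",
                        host="pigwidgeon")

print(url_job["jq_host"], srv_job["jq_host"])
```

Putting the new argument last with a default keeps the change local to the one call site that needs it.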
Updated by larry shi about 1 year ago
thank you alex
Updated by larry shi about 1 year ago
- Status changed from In Progress to Feedback
- Assignee changed from larry shi to Alex Norton
After checking in svn 5963/5964/5966, I found that the scheduler cannot start.
I will look into it tomorrow.
Alex, if you have time, please help confirm whether the cause is the scheduler. Thanks.
2012-06-19 16:53:03 scheduler [23333] :: JOB[-2].buckets[23339.localhost]: successfully remove from the system
2012-06-19 16:53:03 scheduler [23333] :: JOB[-2]: job removed from system
2012-06-19 16:53:03 scheduler [23333] :: META_AGENT[localhost.nomos] version is: "unknown"
2012-06-19 16:53:03 scheduler [23333] :: JOB[-6].nomos[23352.localhost]: received: "OK"
2012-06-19 16:53:03 scheduler [23333] :: JOB[-6].nomos[23352.localhost]: agent status change: AGENT_SPAWNED -> AGENT_RUNNING
2012-06-19 16:53:03 scheduler [23333] :: JOB[-6].nomos[23352.localhost]: agent successfully created
2012-06-19 16:53:03 scheduler [23333] :: JOB[-6]: job status changed: JOB_CHECKEDOUT => JOB_STARTED
2012-06-19 16:53:03 scheduler [23333] :: JOB[-6].nomos[23352.localhost]: agent status change: AGENT_RUNNING -> AGENT_PAUSED
2012-06-19 16:53:03 scheduler [23333] :: JOB[-6]: job status changed: JOB_STARTED => JOB_COMPLETE
2012-06-19 16:53:03 scheduler [23333] :: JOB[-6].nomos[23352.localhost]: sent to agent "CLOSE"
2012-06-19 16:53:03 scheduler [23333] :: META_AGENT[localhost.wget_agent] version is: "unknown"
2012-06-19 16:53:03 scheduler [23333] :: JOB[-9].wget_agent[23355.localhost]: received: "OK"
2012-06-19 16:53:03 scheduler [23333] :: JOB[-9].wget_agent[23355.localhost]: agent status change: AGENT_SPAWNED -> AGENT_RUNNING
2012-06-19 16:53:03 scheduler [23333] :: JOB[-9].wget_agent[23355.localhost]: agent successfully created
2012-06-19 16:53:03 scheduler [23333] :: JOB[-9]: job status changed: JOB_CHECKEDOUT => JOB_STARTED
2012-06-19 16:53:03 scheduler [23333] :: JOB[-9].wget_agent[23355.localhost]: agent status change: AGENT_RUNNING -> AGENT_PAUSED
2012-06-19 16:53:03 scheduler [23333] :: JOB[-9]: job status changed: JOB_STARTED => JOB_COMPLETE
2012-06-19 16:53:03 scheduler [23333] :: JOB[-9].wget_agent[23355.localhost]: sent to agent "CLOSE"
2012-06-19 16:53:03 scheduler [23333] :: META_AGENT[localhost.pkgagent] version is: "unknown"
2012-06-19 16:53:03 scheduler [23333] :: JOB[-7].pkgagent[23350.localhost]: received: "OK"
.....
Updated by Alex Norton about 1 year ago
- Assignee changed from Alex Norton to larry shi
The crash was because of the scheduler. The scheduler no longer crashes when checking a new job out of the database.
Updated by larry shi 12 months ago
- Priority changed from Normal to High
- Rank set to 2
- IterNum changed from 2 to 3
Updated by larry shi 11 months ago
- Assignee changed from larry shi to Mary Laser
please help to verify and close it, thanks
If it is complicated to verify on a cluster, I think you can verify when you have a new cluster.
Updated by Paul Holland 11 months ago
- IterNum changed from 3 to 6
Needs to be validated by Mary. She'll be back in iteration 6.
Updated by Mary Laser 11 months ago
- Estimate set to 2
Updated by Mary Laser 11 months ago
- Status changed from Resolved to In Progress
- Assignee changed from Mary Laser to Alex Norton
- Target version changed from 2.0.1 to 2.1.0
After deleting the "LOCAL" special option from both wget_agent conf files in my test cluster, I was only able to upload from the scheduler system (fluffy). Attempts to upload from the agent system (pigwidgeon) resulted in this message:
Upload failed for /home/laser/testfiles.tar: '/home/laser/testfiles.tar' does not exist.
Updated by larry shi 11 months ago
- Assignee changed from Alex Norton to larry shi
I am looking into this issue.
Note: I went to fluffy today and found that the configuration was incorrect: from fluffy, as fossy, I could not log in to pigwidgeon without a password. For now, the cluster works.
Updated by Bob Gobeille 10 months ago
Sorry I couldn't give you a quicker answer on OC. Here is what you wrote:
Shi, Yao-Bin (Larry, Open Source Program Office) says: (9:50:57 AM) for http://www.fossology.org/issues/1569 for now, we want to upload from server ( the server is not just host server) when the server is agent server, when I want to check if the file you want to upload exist if the checking code is in upload-srv-files.php You know user of upload-srv-files.php is not fossy, right? so can not visit the agent server without password can you understand what problems I meeting?
I think you are saying that the problem is not doing an upload from any host in the system (because wget_agent runs on that system), but the problem is that from the UI you want to test that the file exists and this may not be possible because apache might not have access to it. Is that right? If so, the two solutions are:
1) don't do the file exist check if the file is not on the web server
2) restrict the upload to files on the web server that apache has access to.
The interesting thing about 1) is that the file exist check is being done as apache but the upload is being done by the scheduler (fossy). So it's not a perfect check anyway.
I think I prefer 1).
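Option 1) could look roughly like the sketch below. It is an assumption-based illustration, not the code in upload-srv-files.php: `validate_upload_path`, `target_host`, and `web_host` are hypothetical names, and the error text simply mirrors the message quoted earlier in this issue.

```python
import os
from typing import Optional


def validate_upload_path(path: str, target_host: str, web_host: str) -> Optional[str]:
    """Return an error message, or None if the upload may be queued.

    The existence check only runs when the file is supposed to be on the
    web server itself; for any other host the UI cannot see the file, so
    the check is skipped and the agent on that host reports failures.
    """
    if target_host != web_host:
        return None  # cannot check from the UI; defer to the agent
    if not os.path.exists(path):
        return f"'{path}' does not exist."
    return None


# File on another host: no check is attempted from the UI.
print(validate_upload_path("/home/laser/testfiles.tar", "pigwidgeon", "fluffy"))
```

Even when the check does run, it runs as the web server user while the upload itself runs as the scheduler user, so a passing check is still only a hint.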
Updated by larry shi 10 months ago
- Target version changed from 2.1.0 to 2.0.1
I did one test on the fluffy cluster, selecting pigwidgeon as the host,
so this jobqueue record is added:
INSERT INTO jobqueue (jq_job_fk,jq_type,jq_args,jq_runonpfile,jq_starttime,jq_endtime,jq_end_bits,jq_host) VALUES ('9','wget_agent','9 - /home/laser/monitor.sh','no',NULL,NULL,0,'pigwidgeon');
but I found this error message from wget_agent:
2012-08-09 21:59:34 wget_agent [0] :: JOB28.wget_agent[26869.localhost]: "FATAL wget_agent.c.401: path /home/laser/monitor.sh is not http://, https://, or ftp://"
2012-08-09 21:59:34 wget_agent [0] :: JOB28.wget_agent[26869.localhost]: agent failed, code: 26
Why is wget_agent running on localhost and not on pigwidgeon?
If wget_agent were running on pigwidgeon, it would probably work fine.
I will talk with Alex.
Updated by Alex Norton 10 months ago
Upload from server is not currently working, so I have no way of testing whether the problem described by Larry is even occurring on my machine.
I cannot test this and help until upload from server will schedule the job.
Updated by larry shi 10 months ago
- Status changed from In Progress to Feedback
- Assignee changed from larry shi to Alex Norton
Vincent and Larry tested on their test machines with svn 6141; upload from server works.
Updated by Alex Norton 10 months ago
- Assignee changed from Alex Norton to larry shi
Upload from server is not working for me. I get this error in the ui:
Upload failed for /home/norton/Downloads/postgresql-9.1.4.tar.bz2: '/home/norton/Downloads/postgresql-9.1.4.tar.bz2' does not exist.
I'm currently using svn 6141.
Updated by larry shi 10 months ago
- Status changed from Feedback to In Progress
- Estimate changed from 4 to 8
I cannot reproduce the problem Alex mentioned.
I am looking into why the scheduler cannot schedule the agent on the correct machine.
Updated by larry shi 10 months ago
- Status changed from In Progress to Resolved
Fixed in svn 6161.
If your cluster is configured correctly, I think this bug should not reproduce.
If anything comes up, please let me know.
Updated by Mary Laser 10 months ago
- Target version changed from 2.0.1 to 2.1.0
Updated by Paul Holland 10 months ago
- Assignee changed from larry shi to Mary Laser
Resolved task needs to be validated before it is closed.