Monday, October 18, 2004

SCMSWeb as a replacement for MDS in job scheduling

Since MDS2 is very unstable and SQMS/G relies on it, SQMS/G sometimes cannot obtain the available resources because MDS itself times out. As a temporary fix, we decided to extend SCMSWeb to act like an MDS, specifically for SQMS/G.

To activate this function, upgrade SCMSWeb to the newest version in CVS. Note that you have to upgrade every SCMSWeb instance in the grid. Next, specify your scheduler for the job monitoring module in /etc/sce/sce.conf as follows.

[jobmon]
scheduler = sqms

Currently, SCMSWeb supports only SQMS, SGE, MDS, and RSL. If you are using another scheduler, you can use MDS or RSL, provided that Globus supports that scheduler. MDS and RSL are very similar, but MDS may also contain remote information.

To use MDS, you have to specify mds_host, mds_port, and mds_basedn in the mds group.

[mds]
mds_host = mds.thaigrid.net
mds_port = 2135
mds_basedn = mds-vo-name=thai,o=grid
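
Since the whole point of this change is that MDS tends to time out, it can help to check by hand that the MDS endpoint answers before pointing SCMSWeb at it. MDS2 is served over LDAP, so one way is an ordinary ldapsearch query. The sketch below only builds that command line from the mds group shown above. The function name mds_ldapsearch_argv and the 10 second limit are my own choices for illustration, and SCMSWeb itself may query MDS differently.

```python
import configparser

# Same values as the [mds] group in this post.
MDS_CONF = """
[mds]
mds_host = mds.thaigrid.net
mds_port = 2135
mds_basedn = mds-vo-name=thai,o=grid
"""


def mds_ldapsearch_argv(conf_text, time_limit=10):
    """Build an ldapsearch command line from the [mds] group.

    Hypothetical helper for manual checking; not part of SCMSWeb.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    mds = parser["mds"]
    uri = "ldap://{}:{}".format(mds["mds_host"], mds.getint("mds_port"))
    return [
        "ldapsearch",
        "-x",                      # simple (anonymous) bind
        "-H", uri,                 # LDAP URI built from mds_host and mds_port
        "-b", mds["mds_basedn"],   # search base
        "-s", "sub",               # search the whole subtree
        "-l", str(time_limit),     # search time limit in seconds
    ]


if __name__ == "__main__":
    print(" ".join(mds_ldapsearch_argv(MDS_CONF)))
```

Run the printed command on a machine that can reach the MDS host. If it hangs or returns nothing, SCMSWeb will most likely not get anything useful from that MDS either.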


For RSL, you have to specify globus_location and scheduler_type in the jobmon group.

[jobmon]
globus_location = /usr/grid
scheduler_type = pbs
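
The requirements so far are easy to get wrong when editing sce.conf by hand, so here is a small sketch that reports which options are missing for the chosen scheduler. It is not part of SCMSWeb. The helper name missing_options is made up, and I am assuming that the scheduler values for the generic drivers are written in lowercase as "mds" and "rsl", like "sqms" above.

```python
import configparser

# Options each driver needs according to this post, as (group, option) pairs.
# The lowercase keys "mds", "rsl" and "sge" are assumed spellings.
REQUIRED = {
    "sqms": [],
    "sge": [],
    "mds": [("mds", "mds_host"), ("mds", "mds_port"), ("mds", "mds_basedn")],
    "rsl": [("jobmon", "globus_location"), ("jobmon", "scheduler_type")],
}


def missing_options(conf_text):
    """Return the 'group.option' names that the selected scheduler still needs."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    scheduler = parser.get("jobmon", "scheduler", fallback="").strip().lower()
    if scheduler not in REQUIRED:
        raise ValueError("unsupported scheduler: {!r}".format(scheduler))
    return [
        "{}.{}".format(group, option)
        for group, option in REQUIRED[scheduler]
        if not parser.has_option(group, option)
    ]


if __name__ == "__main__":
    sample = """
[jobmon]
scheduler = rsl
globus_location = /usr/grid
"""
    # scheduler_type is absent from the sample, so it is reported.
    print(missing_options(sample))
```

To check a real installation, read /etc/sce/sce.conf into a string and pass it to missing_options.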


Note that neither MDS nor RSL provides information as complete as the native drivers, such as SQMS and SGE.

In SQMS, I introduced a new resource information plug-in that retrieves queue status from a specified URL. To activate this plug-in, change the option allhost_source in the sqms group or in the queue's group to liburlrinfo.so. Then set the option queue_url in the form %scmsweb_cgiurl%/queue_mon.cgi, replacing %scmsweb_cgiurl% with the CGI URL given to SCMSWeb at installation time (contact your admin if you do not know it). The default value is http://localhost/cgi-bin/scmsweb/queue_mon.cgi.

[sqms]
allhost_source = liburlrinfo.so
queue_url = http://observer.thaigrid.net/cgi-bin/scmsweb/queue_mon.cgi
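
If you manage several sites, the queue_url value can be generated from each site's SCMSWeb CGI URL instead of being typed by hand. The following is only a sketch of the substitution described above. The function names queue_url and sqms_section are made up for this example, and the default CGI URL is taken from the default queue_url quoted in this post.

```python
def queue_url(cgi_url="http://localhost/cgi-bin/scmsweb"):
    """Fill in %scmsweb_cgiurl%/queue_mon.cgi for a given SCMSWeb CGI URL."""
    # Drop a trailing slash so the result never contains "//queue_mon.cgi".
    return cgi_url.rstrip("/") + "/queue_mon.cgi"


def sqms_section(cgi_url):
    """Render the sqms group with the URL plug-in enabled."""
    return "\n".join([
        "[sqms]",
        "allhost_source = liburlrinfo.so",
        "queue_url = " + queue_url(cgi_url),
    ])


if __name__ == "__main__":
    print(sqms_section("http://observer.thaigrid.net/cgi-bin/scmsweb/"))
```

For the observer.thaigrid.net URL in the example, the printed group matches the one shown above.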
