Python tools
The complete documentation of the ecalautoctrl python package can be found at this link.
The package provides both the tools to control the automation operation and those for accessing information such as the automation status and outputs.
What follows is a comprehensive guide for users of the automation output; please refer to the full ecalautoctrl documentation for further details.
Installation with pip
The ecalautoctrl python package can be installed directly from the gitlab repository using pip install:
python -m pip install git+https://gitlab.cern.ch/cms-ecal-dpg/ECALELFS/automation-control/
This command works in any python environment (using python -m ensures that the package is installed in the currently active environment). When installing the package within a CMSSW release, make sure to use python3 -m, since the python command is not automatically mapped to python3.
For a clean, standalone installation, using a dedicated conda environment is strongly encouraged. This can be achieved with the following steps:
conda create -n py39-ectrl python==3.9
conda activate py39-ectrl
python -m pip install git+https://gitlab.cern.ch/cms-ecal-dpg/ECALELFS/automation-control/
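After either installation route, it can be useful to confirm that the package ended up in the interpreter that is actually active. The following is a minimal sketch using only the python standard library; the helper name `is_installed` is illustrative and is not part of ecalautoctrl.

```python
import importlib.util
import sys


def is_installed(package: str) -> bool:
    """Return True if `package` can be imported by the running interpreter."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # sys.executable shows which interpreter (conda env, CMSSW, system) is active.
    print("interpreter:", sys.executable)
    print("ecalautoctrl importable:", is_installed("ecalautoctrl"))
```

If the second line prints False, the package was most likely installed with a different interpreter than the one currently running.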
Note
All the examples shown below can be run in any python interactive session or script. Consider installing ipython in the conda environment to have a simple yet powerful interactive session in which to execute them.
Using the automation image
Docker images are maintained with all the necessary software to interact with the automation framework. The images work with singularity and can either be built from the CERN gitlab registry or accessed as an unpacked version from cvmfs. The first method works everywhere (lxplus, laptop, etc.); the second requires having cvmfs mounted (available out of the box on lxplus; for usage on a personal machine see the cvmfs guide).
To build and run an interactive shell using the image on the gitlab registry use:
export SINGULARITY_CACHEDIR="/tmp/$(whoami)/singularity"
singularity shell --cleanenv -B /eos -B /cvmfs docker://gitlab-registry.cern.ch/cms-ecal-dpg/ecalelfs/automation:dev
To access the unpacked image on cvmfs:
singularity shell --cleanenv -B /cvmfs -B /eos /cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/cms-ecal-dpg/ecalelfs/automation:dev
In both cases the automation environment can be activated by executing source /home/ecalgit/setup.sh from within the running image.
Mounting EOS and AFS
Mounting EOS is optional: -B /eos can be omitted. Likewise, -B /afs will attach AFS to the image; this is optional too.
Accessing run and job level information
All the information related to a given CMS run can be accessed (and modified) using the RunCtrl class.
The RunCtrl class init function accepts two arguments that allow one to access information from different processing instances. The dbname parameter selects from which influxdb database the information should be read, while the campaign parameter allows switching between different processing campaigns stored in the same database. The snippet below illustrates how to create an instance of RunCtrl.
from ecalautoctrl import RunCtrl
rctrl = RunCtrl(dbname='ecal_prompt_v1', campaign='prompt')
Note
All the following examples use the database configuration that accesses information for the ECAL prompt processing during 2022.
Likewise, information related to single tasks (groups of jobs executing one workflow for a single run or group of runs) and to the single jobs within a task can be retrieved using the JobCtrl interface.
from ecalautoctrl import JobCtrl
jctrl = JobCtrl(workflow='pulse-shapes-merge', campaign='prompt', tags={'run_number' : '359342', 'fill' : '8181'}, dbname="ecal_prompt_v1")
jctrl.taskCompleted()
jctrl.getFailed()
jctrl.getJob(jid=0, last=True)
Note
The code snippet above provides a short example of how JobCtrl can be used to retrieve useful information. Please refer to the ecalautoctrl docs for more details.
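When the status of several tasks has to be inspected, it may help to gather the task-level calls in one place. The sketch below is only an illustration: `summarize_task` and `_FakeJobCtrl` are hypothetical names, only the `taskCompleted` and `getFailed` methods shown above are used, and the values returned by the stand-in class are assumptions chosen so the example runs without database access.

```python
def summarize_task(jctrl):
    """Collect the task-level status exposed by a JobCtrl-like object.

    Only the two methods shown above are called; their return values are
    passed through untouched, since their exact types are not assumed here.
    """
    return {
        "completed": jctrl.taskCompleted(),
        "failed": jctrl.getFailed(),
    }


class _FakeJobCtrl:
    """Stand-in used only to exercise the helper without a database."""

    def taskCompleted(self):
        return False

    def getFailed(self):
        return [3, 7]


summary = summarize_task(_FakeJobCtrl())
print(summary)
```

With a real JobCtrl instance, the same helper would be called as `summarize_task(jctrl)`.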
Collecting output files from a specific workflow
The output of each workflow can be retrieved using the getOutput method from RunCtrl.
The function takes one mandatory argument, process, which specifies the workflow step whose output is requested. One of the runs, fills and era arguments can also be specified to narrow down the list of outputs to specific runs.
The example below shows how to access the ECALElf ntuples produced on top of the WSkim for the entire Run2022C acquisition era:
from ecalautoctrl import RunCtrl
rctrl = RunCtrl(dbname='ecal_prompt_v1', campaign='prompt')
rctrl.getOutput(era='Run2022C', process='ecalelf-ntuples-wskim')
The above command will yield a list of output files (technically a python set to avoid duplicates) which should look similar to this:
[...]
'/eos/cms/store/group/dpg_ecal/alca_ecalcalib/automation_prompt/ecalelf/wskim/357478/eleIDTree_0.root,/eos/cms/store/group/dpg_ecal/alca_ecalcalib/automation_prompt/ecalelf/wskim/357478/extraCalibTree_0.root,/eos/cms/store/group/dpg_ecal/alca_ecalcalib/automation_prompt/ecalelf/wskim/357478/extraStudyTree_0.root,/eos/cms/store/group/dpg_ecal/alca_ecalcalib/automation_prompt/ecalelf/wskim/357478/ntuple_0.root',
'/eos/cms/store/group/dpg_ecal/alca_ecalcalib/automation_prompt/ecalelf/wskim/357479/eleIDTree_7.root,/eos/cms/store/group/dpg_ecal/alca_ecalcalib/automation_prompt/ecalelf/wskim/357479/extraCalibTree_7.root,/eos/cms/store/group/dpg_ecal/alca_ecalcalib/automation_prompt/ecalelf/wskim/357479/extraStudyTree_7.root,/eos/cms/store/group/dpg_ecal/alca_ecalcalib/automation_prompt/ecalelf/wskim/357479/ntuple_7.root'
[...]
Warning
The automation package provides a single output field that is constrained to be a string in influxdb. Therefore, each workflow might store information in its output field in a different way. The ecalelf-ntuples-wskim workflow produces four output ROOT files for each job, which are stored in the output field as a comma-separated list.
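Since each entry of the returned set is a comma-separated string, the individual file paths have to be split out before they can be opened. The following is a minimal sketch of one way to do this; `split_outputs` and `select_by_prefix` are hypothetical helpers, not part of ecalautoctrl, and the example paths use a shortened placeholder directory modelled on the listing above.

```python
import os


def split_outputs(outputs):
    """Flatten comma-separated output strings into a sorted list of unique paths."""
    paths = set()
    for entry in outputs:
        for path in entry.split(","):
            path = path.strip()
            if path:
                paths.add(path)
    return sorted(paths)


def select_by_prefix(paths, prefix):
    """Keep only the paths whose file name starts with `prefix`."""
    return [p for p in paths if os.path.basename(p).startswith(prefix)]


# Example input modelled on the listing above (placeholder directory).
example = {
    "/some/dir/357478/eleIDTree_0.root,/some/dir/357478/ntuple_0.root",
    "/some/dir/357479/eleIDTree_7.root,/some/dir/357479/ntuple_7.root",
}
ntuples = select_by_prefix(split_outputs(example), "ntuple_")
print(ntuples)
```

In practice the `example` set would be replaced by the value returned by `rctrl.getOutput(...)`. Other workflows may encode their output field differently, so the comma split should be checked against the workflow in use.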
Tips and tricks (useful commands)
The RunCtrl class methods can be used and combined to retrieve useful information. Some examples are given below.
Get the start and end run of an era using getRunsInEra:
# agnostic
runs = rctrl.getRunsInEra(era='Run2022C')
start, end = min(runs), max(runs)
# requiring that the run was injected in a given workflow
# (for instance to avoid accounting for Cosmics runs)
runs = rctrl.getRunsInEra(era='Run2022C', task='pulse-shapes')
start, end = min(runs), max(runs)
Likewise, use getRunsInFill to get all the runs belonging to a fill: