Python tools

The complete documentation of the ecalautoctrl python package can be found at this link.

The package provides both the tools to control the automation operation and those for accessing information such as the automation status and outputs.

What follows is a guide for users of the automation output; please refer to the full ecalautoctrl documentation for further details.

Installation with pip

The ecalautoctrl Python package can be installed directly from the GitLab repository using pip:

python -m pip install git+https://gitlab.cern.ch/cms-ecal-dpg/ECALELFS/automation-control/

This command works in any Python environment (using python -m ensures that the package is installed in the currently active environment). When installing the package within a CMSSW release, make sure to use python3 -m, since the python command is not automatically mapped to python3 there.

For a clean, standalone installation, using a dedicated conda environment is strongly encouraged. This can be achieved with the following steps:

conda create -n py39-ectrl python==3.9
conda activate py39-ectrl
python -m pip install git+https://gitlab.cern.ch/cms-ecal-dpg/ECALELFS/automation-control/
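To confirm that the package ended up in the currently active environment, a small check along the following lines can be run. This is a minimal sketch using only the standard library; it assumes the distribution is registered under the name ecalautoctrl, which may differ from what pip actually reports, and installed_version is a hypothetical helper, not part of the package.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "ecalautoctrl" is an assumed distribution name; adjust it if pip lists the
# package under a different one.
print(installed_version("ecalautoctrl"))
```

If the call prints None, the package is not visible from the interpreter that ran the check, which usually means pip was invoked from a different environment.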

Note

All the examples shown below can be run in any Python interactive session or script. Consider installing ipython in the conda environment to have a simple yet powerful interactive session in which to execute them.

Using the automation image

Docker images with all the software needed to interact with the automation framework are maintained. The images work with singularity and can either be built from the CERN GitLab registry or accessed as an unpacked version from cvmfs. The first method works everywhere (lxplus, laptop, etc.); the second requires having cvmfs mounted (available out of the box on lxplus; for usage on a personal machine see the cvmfs guide).

To build and run an interactive shell using the image on the gitlab registry use:

export SINGULARITY_CACHEDIR="/tmp/$(whoami)/singularity"
singularity shell --cleanenv -B /eos -B /cvmfs docker://gitlab-registry.cern.ch/cms-ecal-dpg/ecalelfs/automation:dev

To access the unpacked image on cvmfs:

singularity shell --cleanenv -B /cvmfs -B /eos /cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/cms-ecal-dpg/ecalelfs/automation:dev

In both cases the automation environment can be activated by executing source /home/ecalgit/setup.sh from within the running image.

Mounting EOS and AFS

Mounting EOS is optional: -B /eos can be omitted. Likewise, adding -B /afs mounts AFS in the image; this is optional too.
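When the image is launched from a script, the optional bind mounts can be assembled programmatically. The sketch below only builds the argument list, using the flags and image path shown above; singularity_shell_cmd is a hypothetical helper name, and the choice of /cvmfs as the default bind is an assumption made for illustration.

```python
import shlex


def singularity_shell_cmd(image, binds=("/cvmfs",)):
    """Build the argument list for an interactive singularity shell.

    Each entry of `binds` becomes one `-B <path>` option.
    """
    cmd = ["singularity", "shell", "--cleanenv"]
    for path in binds:
        cmd += ["-B", path]
    cmd.append(image)
    return cmd


# Unpacked image path as given in this guide.
IMAGE = "/cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/cms-ecal-dpg/ecalelfs/automation:dev"

# Mount cvmfs plus the two optional filesystems, EOS and AFS.
cmd = singularity_shell_cmd(IMAGE, binds=("/cvmfs", "/eos", "/afs"))
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run as is; dropping "/eos" or "/afs" from binds gives the reduced variants described above.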

Accessing run and job level information

All the information related to a given CMS run can be accessed (and modified) using the RunCtrl class.

The RunCtrl class init function accepts two arguments that allow one to access information from different processing instances. The dbname argument selects which influxdb database the information is read from, while the campaign argument switches between different processing campaigns stored in the same database. The snippet below illustrates how to create an instance of RunCtrl.

Note

All the following examples use the database configuration to access information for the ECAL prompt processing during 2022.

from ecalautoctrl import RunCtrl

rctrl = RunCtrl(dbname='ecal_prompt_v1', campaign='prompt')

Likewise, information related to single tasks (groups of jobs executing one workflow for a single run or group of runs) and to the single jobs within a task can be retrieved using the JobCtrl interface.

from ecalautoctrl import JobCtrl

jctrl = JobCtrl(workflow='pulse-shapes-merge', campaign='prompt', tags={'run_number' : '359342', 'fill' : '8181'}, dbname="ecal_prompt_v1")

jctrl.taskCompleted()
jctrl.getFailed()
jctrl.getJob(jid=0, last=True)

Note

The code snippet above provides a short example of how JobCtrl can be used to retrieve useful information. Please refer to the ecalautoctrl docs for more details.

Collecting output files from a specific workflow

The output of each workflow can be retrieved using the getOutput method of RunCtrl. The function takes one mandatory argument, process, which specifies the workflow step whose output is requested. One of the runs, fills and era arguments can also be specified to narrow down the list of outputs to specific runs.

The example below shows how to access the ECALElf ntuples produced on top of the WSkim for the entire Run2022C acquisition era:

from ecalautoctrl import RunCtrl

rctrl = RunCtrl(dbname='ecal_prompt_v1', campaign='prompt')
rctrl.getOutput(era='Run2022C', process='ecalelf-ntuples-wskim')

The above command will yield a list of output files (technically a python set to avoid duplicates) which should look similar to this:

[...]
 '/eos/cms/store/group/dpg_ecal/alca_ecalcalib/automation_prompt/ecalelf/wskim/357478/eleIDTree_0.root,/eos/cms/store/group/dpg_ecal/alca_ecalcalib/automation_prompt/ecalelf/wskim/357478/extraCalibTree_0.root,/eos/cms/store/group/dpg_ecal/alca_ecalcalib/automation_prompt/ecalelf/wskim/357478/extraStudyTree_0.root,/eos/cms/store/group/dpg_ecal/alca_ecalcalib/automation_prompt/ecalelf/wskim/357478/ntuple_0.root',
 '/eos/cms/store/group/dpg_ecal/alca_ecalcalib/automation_prompt/ecalelf/wskim/357479/eleIDTree_7.root,/eos/cms/store/group/dpg_ecal/alca_ecalcalib/automation_prompt/ecalelf/wskim/357479/extraCalibTree_7.root,/eos/cms/store/group/dpg_ecal/alca_ecalcalib/automation_prompt/ecalelf/wskim/357479/extraStudyTree_7.root,/eos/cms/store/group/dpg_ecal/alca_ecalcalib/automation_prompt/ecalelf/wskim/357479/ntuple_7.root'
[...]

Warning

The automation package provides a single output field, which is constrained to be a string in influxdb. Therefore, each workflow may store information in the output field in a different way. The ecalelf-ntuples-wskim workflow produces four output ROOT files for each job, which are stored in the output field as a comma-separated list.
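Since each entry packs several paths into one string, the entries usually need to be split before the files can be opened. The following is a minimal sketch of that step; it runs on shortened, made-up paths that mimic the format shown above, and the helper names split_output_entries and select_by_prefix are hypothetical, not part of ecalautoctrl.

```python
import os


def split_output_entries(entries):
    """Flatten comma-separated output entries into a sorted list of unique paths."""
    files = set()
    for entry in entries:
        files.update(p.strip() for p in entry.split(",") if p.strip())
    return sorted(files)


def select_by_prefix(files, prefix):
    """Keep only the files whose basename starts with `prefix` (e.g. 'ntuple_')."""
    return [f for f in files if os.path.basename(f).startswith(prefix)]


# Illustrative entries only: shortened placeholder paths in the same
# comma-separated layout as the ecalelf-ntuples-wskim output.
sample = {
    "/data/wskim/357478/eleIDTree_0.root,/data/wskim/357478/ntuple_0.root",
    "/data/wskim/357479/eleIDTree_7.root,/data/wskim/357479/ntuple_7.root",
}

files = split_output_entries(sample)
ntuples = select_by_prefix(files, "ntuple_")
print(ntuples)
```

In real use, the set returned by getOutput would take the place of sample. Other workflows may encode their output differently, so the comma split should be checked against the workflow in question.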

Tips and tricks (useful commands)

The RunCtrl class methods can be used and combined to retrieve useful information. Some examples are given below.

Get an era start and end run using getRunsInEra:

# all runs in the era, regardless of workflow
runs = rctrl.getRunsInEra(era='Run2022C')
start, end = min(runs), max(runs)

# requiring that the run was injected in a given workflow (for instance to avoid accounting
# for Cosmics runs)
runs = rctrl.getRunsInEra(era='Run2022C', task='pulse-shapes')
start, end = min(runs), max(runs)
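The min/max pattern above can be wrapped in a small helper that also copes with an empty selection. This is a sketch under the assumption that getRunsInEra returns an iterable of run numbers convertible to int (the exact return type is not stated here); run_range is a hypothetical name, and the run numbers in the example are placeholders.

```python
def run_range(runs):
    """Return (first, last) for an iterable of run numbers, or None if it is empty.

    Values are converted to int so that runs given as strings are compared
    numerically rather than lexicographically.
    """
    numbers = [int(r) for r in runs]
    if not numbers:
        return None
    return min(numbers), max(numbers)


# Placeholder run numbers, given as strings to show the conversion.
print(run_range(["357478", "357479", "99999"]))
```

With a RunCtrl instance at hand, the call would read run_range(rctrl.getRunsInEra(era='Run2022C')). The None return makes the case of an era with no injected runs explicit instead of raising a ValueError from min.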

Likewise, use getRunsInFill to get all the runs belonging to a fill:

# all runs in the fill, regardless of workflow
rctrl.getRunsInFill(fill=8081)

# requiring that the run was injected in a given workflow (for instance to avoid accounting
# for Cosmics runs)
rctrl.getRunsInFill(fill=8081, task='ecalelf-ntuples-wskim')