
Data reprocessing

Note

Different InfluxDB users have different privileges on different databases. The correct InfluxDB username and password must be set as environment variables before executing the steps described below:

export ECAL_INFLUXDB_USER=username
export ECAL_INFLUXDB_PWD=password
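To avoid launching a command with missing credentials, a script can check both variables up front. The following is a minimal POSIX shell sketch; the function name check_influx_credentials is hypothetical and not part of the ECAL tooling.

```shell
# Hypothetical helper (not part of ecalrunctrl.py or ecalautomation.py):
# returns non-zero and reports on stderr if either credential variable is
# unset or empty.
check_influx_credentials() {
    _status=0
    if [ -z "${ECAL_INFLUXDB_USER:-}" ]; then
        echo "error: ECAL_INFLUXDB_USER is not set" >&2
        _status=1
    fi
    if [ -z "${ECAL_INFLUXDB_PWD:-}" ]; then
        echo "error: ECAL_INFLUXDB_PWD is not set" >&2
        _status=1
    fi
    return "$_status"
}

# Usage at the top of a reprocessing script:
# check_influx_credentials || exit 1
```

Both variables are checked before returning, so a single run reports every missing one.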

Reprocessing of entire datasets (e.g. for the preparation of data re-recos or further studies) can be handled by the system in the same way as prompt processing. The main difference is that the runs to be processed are injected all at once rather than being fetched automatically by the system.

The reprocessing infrastructure is also an excellent tool for testing new workflows and for developing the automation system itself.

Different reprocessing campaigns can be stored in the same InfluxDB database. Actions on a specific campaign are performed by passing the campaign name through the -c/--campaign option of the ecalrunctrl.py and ecalautomation.py scripts. In general, the campaign name is used as a tag in InfluxDB for all measurements (run, job, ...).

Creating a new campaign

A new reprocessing campaign can be created using the ecalrunctrl.py script.

ecalrunctrl.py --db 'db_name' create --campaign 'test_rereco'

db_name is the name of the InfluxDB database in use (usually one per subdetector, with versioning and a separation between prompt and rereco). test_rereco is the name of the new campaign. Further options are described in the script's help menu: ecalrunctrl.py create --help.
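Since the same --db and --campaign values have to be repeated in every later command, a small wrapper that defines them once can prevent mismatches. This is a sketch under assumptions: runctrl, DB, CAMPAIGN and DRY_RUN are hypothetical names, and the wrapper assumes each sub-command takes --campaign after the sub-command name, as in the examples on this page.

```shell
# Placeholders: replace with the actual database and campaign names.
DB='db_name'
CAMPAIGN='test_rereco'

# Hypothetical wrapper around ecalrunctrl.py: the first argument is the
# sub-command (create, rtype-update, inject, ...), the rest are passed through.
# When DRY_RUN is non-empty, ${DRY_RUN:+echo} expands to "echo" and the
# command is printed instead of executed.
runctrl() {
    _subcmd=$1
    shift
    ${DRY_RUN:+echo} ecalrunctrl.py --db "$DB" "$_subcmd" --campaign "$CAMPAIGN" "$@"
}

# DRY_RUN=1 runctrl create   # print the command only
# runctrl create             # actually create the campaign
```

The dry-run switch makes it easy to review the exact command line before touching the database.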

Adding workflows to the campaign

By default, the campaign is created with a single run type, all, which filters runs by rejecting non-global runs. The run types can be customized using the rtype-create, rtype-update and rtype-list sub-commands of the ecalrunctrl.py script.

Workflows are added to the newly created campaign with:

ecalrunctrl.py --db 'db_name' rtype-update --campaign 'test_rereco' --add workflow1 workflow2 workflow3 --type=all

The --db and --campaign options must match the ones specified at creation time. The default all run type is updated by adding three workflows (passed as space-separated arguments to the --add option).

All runs subsequently injected into the test_rereco campaign will be processed by the three workflows.

Note

The workflow name must match the one specified with the task argument of the Handler class. Neither the git branch name nor the Jenkins item name associated with the workflow is relevant. For example, to add the pi0-mon workflow, the command above should include --add pi0-mon.
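When the list of workflows grows, it can be kept in a single variable and expanded into separate --add arguments. A minimal sketch, assuming the hypothetical function add_workflows and variable WORKFLOWS; pi0-mon is the workflow name used in the note above, while workflow2 and workflow3 are placeholders.

```shell
# Hypothetical workflow list: each name must match the task argument of its
# Handler (pi0-mon is from this page; the other two are placeholders).
WORKFLOWS='pi0-mon workflow2 workflow3'

# Hypothetical helper: $1 is the database, $2 the campaign.
# $WORKFLOWS is deliberately left unquoted so the shell splits it into
# separate space-separated arguments for --add.
# When DRY_RUN is non-empty the command is printed instead of executed.
add_workflows() {
    # shellcheck disable=SC2086
    ${DRY_RUN:+echo} ecalrunctrl.py --db "$1" rtype-update --campaign "$2" \
        --add $WORKFLOWS --type=all
}

# add_workflows 'db_name' 'test_rereco'
```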

Injecting runs in the new campaign

Once the new campaign has been created and workflows have been added, data can be injected for reprocessing using the command below.

ecalrunctrl.py --db 'db_name' inject --campaign 'test_rereco' --era 'Run2022D' --globaltag='MYGT'

The --db and --campaign options must match the ones specified at creation time. Data can be injected with single-run granularity; options to inject a single run or a group of runs (run range, era) are available:

--runs RUNS           Comma separated list of run(s) to inject in the new campaign.
--range RRANGE        Define a range of runs to be injected (min,max).
--era ERA             Inject all runs belonging to a given CMS data acquisition era.