Setup

Please note that this is still WiP, so not all options are fully implemented yet!

How to structure (the) project(s)

Depending on the workflow chosen, different options are possible or advisable.

CLAs require action on certain events only. You will naturally also want to store approved contributors for future checks, which makes commits to some data store necessary. To avoid cluttering your commit history, those commits should be kept separate from your 'playground'. Since the bot's pipeline should run only on triggering events, you either need rules that limit it to those events and prevent the remainder of your pipeline from running, or you choose a separate 'home' project with a dedicated pipeline.

Overall, this clearly points to option 1 below. Choose option 2 if you are dead set on keeping personal data separate. Options 1 and 2 are also more secure w.r.t. the access tokens used.

Repository structure: All three repositories ('playground', 'home' and 'store') could be identical (possibly with different branches) or separate. The recommended setup keeps storage and bot within one repository, separate from the code. This also makes it possible to have multiple projects checked by a single installation.

1. Bot pipeline/storage in a single separate repository

'playground' != 'home' == 'store'

Caveat: does require access tokens
Option: combine multiple projects' signatures in a single place

2. Bot pipeline in a separate repository, storage repository elsewhere

'playground' != 'home' != 'store'

Caveat: does require access tokens
Option: combine multiple projects' signatures in a single place

3. Bot pipeline within the repository

'playground' == 'home' == 'store'

This is discouraged and not supported by the installer!

Caveat 1: commits can change the pipeline and mess up strict checking
Caveat 2: likely requires branches to separate concerns
Caveat 3: storing signatures will create commits (storage is within a GitLab project and requires an access token)
Option: to separate commit messages and not mess up the main development branch, the signature store could be in a dedicated branch
It may be possible to work without a webhook, integrating the bot directly in the default pipeline (no interaction in this case).

Multiple 'playgrounds'

Of course you can have a single CLA Bot checking multiple 'playgrounds'. This does not require multiple 'homes', since the trigger payload sent by the webhook clearly identifies each triggering 'playground'. If set up with the default private scope, 'store' separates data by the project ID of each 'playground'. You can also opt for a global scope to store your list of contributors. In this case a single 'home' can combine the approved contributors for several projects, be it a GitLab group or a complete GitLab instance. The combined contributors are then identified by their shared 'home'.
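
The difference between private and global scope can be illustrated with a minimal sketch. It is not the bot's actual storage code: the class SignatureStore, its methods and the in-memory dictionary are hypothetical stand-ins for a 'store' that in reality lives in a GitLab project.

```python
class SignatureStore:
    """Hypothetical in-memory stand-in for 'store' (illustration only)."""

    def __init__(self, scope="private"):
        if scope not in ("private", "global"):
            raise ValueError(f"unknown scope: {scope!r}")
        self.scope = scope
        self._signers = {}  # maps a scope key to a set of email addresses

    def _key(self, project_id):
        # private: one contributor list per 'playground' project ID
        # global: one list shared by every 'playground' served by this 'home'
        return project_id if self.scope == "private" else "global"

    def approve(self, project_id, email):
        key = self._key(project_id)
        self._signers.setdefault(key, set()).add(email.strip().lower())

    def is_approved(self, project_id, email):
        key = self._key(project_id)
        return email.strip().lower() in self._signers.get(key, set())


private = SignatureStore("private")
private.approve(101, "dev@example.org")
print(private.is_approved(101, "dev@example.org"))  # True
print(private.is_approved(202, "dev@example.org"))  # False

shared = SignatureStore("global")
shared.approve(101, "dev@example.org")
print(shared.is_approved(202, "dev@example.org"))  # True
```

With private scope an approval given for project 101 says nothing about project 202, whereas with global scope one approval covers every 'playground' behind the same 'home'.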

You can use install-bot to set up CLA Bot for multiple 'playgrounds'. Start with a simple, single installation, then extend this existing installation by adding more 'playgrounds' to it. install-bot should guide you through this process.

Workflow options

1. Agreement required for every push/MR, no storage

This can be a single action that does not require storing agreements. The contributor would have to add an agreement comment with their commits and would be turned down otherwise, with no further interaction between contributor and bot. For MRs, this can be handled in the description or the comments of the MR, and interaction within the MR is possible. The bot would block a push/MR if no agreement is found.
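
A storage-free check of this kind could look roughly like the sketch below. The agreement sentence and the function names are assumptions made for illustration; the real wording and matching rules depend on your CLA setup.

```python
import re

# Hypothetical agreement sentence; the real wording is defined by your CLA.
AGREEMENT = "I have read the CLA and I hereby sign it"


def _normalise(text):
    # Collapse whitespace and ignore case so line breaks do not matter.
    return re.sub(r"\s+", " ", text).strip().casefold()


def has_agreement(description, comments=()):
    """Return True if the MR description or any comment contains AGREEMENT."""
    needle = _normalise(AGREEMENT)
    texts = (description, *comments)
    return any(needle in _normalise(text or "") for text in texts)


print(has_agreement("Fixes a typo.\n\nI have read the CLA and I hereby sign it."))
print(has_agreement("Fixes a typo.", ["Looks good to me"]))
```

The first call reports an agreement found in the description, the second finds none, which is the case in which the bot would block the MR.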

This would be limited to a "gitlab workflow" since it is connected only to events in GitLab.

2. Agreement stored for future pushes/MRs (CLAs?)

2.1. Non-gitlab workflow

A contributor's agreement is stored in a "signature database" that is consulted on every event. This makes "non-gitlab workflows" possible, i.e. agreement is given on paper or via email and the signatures are then added manually to said database via a command line tool (add-cla).
Contributors would have to use a known email address and their identity would be verified via email or on paper.

2.2. gitlab workflow

Alternatively, agreement could be given in commit messages or MR comments and stored in the database by the bot. The identities of contributors would be assured by the IdP underlying the GitLab instance (possibly supported by signed commits).

When/How to run CLA Bot

CLA Bot should act on merge requests and possibly on pushes. The most convenient option seems to be webhooks. "Plain" CI pipelines would be less interactive and would require a careful distinction from "normal" CI pipelines, i.e. more rules would need to be defined for steps and stages.
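
To illustrate the distinction, a job could inspect GitLab's predefined CI_PIPELINE_SOURCE variable before doing any work. The sketch below assumes that webhook-driven runs arrive as pipelines started with a trigger token (source "trigger") and that the "plain" CI variant runs in merge request pipelines (source "merge_request_event"). The function name bot_mode and its return values are hypothetical.

```python
import os


def bot_mode(env=None):
    """Decide how the bot was started, or return None if it should not run."""
    env = os.environ if env is None else env
    source = env.get("CI_PIPELINE_SOURCE")
    if source == "trigger":
        # Assumption: the pipeline was started by the webhook via a trigger token.
        return "webhook"
    if source == "merge_request_event":
        # "Plain" CI variant: a job inside the merge request pipeline.
        return "merge_request"
    # Pushes, schedules, manual runs etc. are "normal" pipelines.
    return None


print(bot_mode({"CI_PIPELINE_SOURCE": "trigger"}))
print(bot_mode({"CI_PIPELINE_SOURCE": "push"}))
```

In a real pipeline the same condition would more commonly be expressed as rules in the CI configuration; the Python form only shows which value separates the cases.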

1. Merge requests

MRs trigger a "merge pipeline" that can be identified inside/from CI. So the bot could be a job/stage only for this case. However, such a pipeline is not created when comments are added to the MR so there would be no proper interaction between CLA Bot and contributors.
MRs create "merge request events" that can call a webhook. This is similar to the above case but can trigger a pipeline in any project, with a distinct pipeline type (trigger pipeline instead of merge pipeline). The trigger pipeline is created in addition to, and under the same conditions as, a "merge pipeline": a new MR will cause the event, a comment on this MR will not.
Comments in MRs create "comment events", as do comments on issues. The payload distinguishes the two, which matters because running the bot only makes sense for comments in MRs.
If the bot finds the MR to be problematic w.r.t. the CLA, the only option on GitLab CE is to create an unresolved discussion. The project should then be set up to block MRs until all discussions are resolved. It may be necessary to prevent anyone from resolving discussions (or to have the bot reopen them).
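
How the webhook payload separates these events can be sketched as follows. The field names (object_kind, object_attributes.noteable_type, merge_request.iid, project.id) follow GitLab's documented webhook payloads as far as we are aware; the function classify_event and its return tuple are hypothetical and not part of CLA Bot.

```python
def classify_event(payload):
    """Return (event, project_id, mr_iid) for relevant events, else None."""
    kind = payload.get("object_kind")
    attrs = payload.get("object_attributes") or {}
    project_id = (payload.get("project") or {}).get("id")

    if kind == "merge_request":
        # Merge request event: the MR itself is described by object_attributes.
        return ("merge_request", project_id, attrs.get("iid"))

    if kind == "note" and attrs.get("noteable_type") == "MergeRequest":
        # Comment event on an MR: the MR is attached under "merge_request".
        mr = payload.get("merge_request") or {}
        return ("mr_comment", project_id, mr.get("iid"))

    # Comments on issues, commits or snippets and all other events are ignored.
    return None


mr_event = {
    "object_kind": "merge_request",
    "project": {"id": 101},
    "object_attributes": {"iid": 7},
}
issue_comment = {
    "object_kind": "note",
    "project": {"id": 101},
    "object_attributes": {"noteable_type": "Issue"},
}
print(classify_event(mr_event))
print(classify_event(issue_comment))
```

The project ID in the result is also what lets a single 'home' tell several 'playgrounds' apart.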

2. Pushes

Not considered for now. Pushes work analogously to MRs, apart from different event types and the absence of comments or discussions. The bot could simply fail a pipeline and thereby block or at least flag the commit. A "gitlab workflow" would only work via comments in commits, so it would be very limited and less interactive. The bot could check for prior agreement, but the committer cannot easily comment on a missing CLA and has no straightforward way of acting on a failed pipeline.

Access tokens

CLA Bot requires several project/pipeline access tokens to work. They are mainly required for API access to your projects, but also to fetch/commit signatures and keep track of contributors. GitLab introduced a limited lifetime for project access tokens, so some of those tokens must be updated whenever their lifetime nears its end. While the tokens can be conveniently "rotated" from GitLab's UI, there is no automated way of updating the CI/CD variables. So familiarise yourself with their purpose and definition: you will have to update them!
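
A small reminder script can help to spot tokens that are about to expire. The sketch below assumes you have the expiry date as an ISO date string (YYYY-MM-DD), which is how GitLab shows it for project access tokens. The function names and the 14-day warning window are arbitrary choices for illustration; the script neither talks to GitLab nor updates any CI/CD variable.

```python
from datetime import date


def days_until_expiry(expires_at, today=None):
    """Number of days from today until the ISO date expires_at."""
    today = today or date.today()
    return (date.fromisoformat(expires_at) - today).days


def needs_rotation(expires_at, warn_days=14, today=None):
    """True if the token expires within warn_days (or has already expired)."""
    return days_until_expiry(expires_at, today) <= warn_days


reference_day = date(2024, 1, 1)
print(days_until_expiry("2024-01-31", today=reference_day))  # 30
print(needs_rotation("2024-01-31", today=reference_day))     # False
print(needs_rotation("2024-01-10", today=reference_day))     # True
```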

For more details, please have a look at Tokens.