Skip to content

Conversation

@cipherboy
Copy link
Contributor

Description

Nicely laid out profiles have a structure matching their corresponding
benchmarks. I'm thinking the CIS and STIG profiles here.

Let's formalize that structure a bit.

Given a profile with format:

    selection:
        # <identifier> description
        - rule_choice

Where one or more comments precede one or more rules, and the closest
rule that matches the given format for a reference identifier wins.

Let's take this as an example:

    ## 5.3 Configure PAM ##
    ### 5.3.1 Ensure password creation requirements are configured (Automated)
    - var_password_pam_minlen=14
    - accounts_password_pam_minlen
    - var_password_pam_minclass=4
    - accounts_password_pam_minclass

Here both accounts_password_pam_minlen and
accounts_password_pam_minclass should get CIS reference value 5.3.1. The
other two entries are vars, and since 5.3 is further away than 5.3.1,
5.3.1 should win.

Some ground rules:

  • We should avoid guessing when possible
  • We should create minimal diffs
  • Some rules lack a references section; we should add them in that
    case.
  • If we're not sure, ignore the rule and print info telling the caller
    about it.
  • Rules that don't belong to a section shouldn't be in the profile!
  • If we're adding a reference, don't clutter other products! Only do
    our current product.

Mostly this is built on top of the ssg.rule_yaml helpers I've added
previously during my internship.

The structure roughly follows the previous utils/refchecker.py. Walk all
rules in a profile, load their YAML, load their actual lines, use a bit
of magic to auto-detect the rule identifier, and write the results out.

Most of the heavy lifting was done in rule_yaml.py to add a new helper
method, add_or_modify_nested_section_key, that adds a new key in a
nested section (and optionally, the section itself) or -- if said key is
already there -- modifies the value of that key.

Signed-off-by: Alexander Scheel <[email protected]>

Rationale

I was talking with Jan and Matej on #openscap about the new controls file. We're working on porting our current Ubuntu CIS benchmark into CaC; this is just one step that will help with it.

I'm opening this in the interest of working together on a common solution to product-specific controls and benchmark-based profiles. Knowing what we're currently using should help show where we're coming from and make a common design possible.

See also: #6906 and #6907

@cipherboy cipherboy added enhancement General enhancements to the project. Infrastructure Our content build system standards Benchmarks related. labels Apr 28, 2021
@cipherboy cipherboy self-assigned this Apr 28, 2021
@pep8speaks
Copy link

pep8speaks commented Apr 28, 2021

Hello @cipherboy! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-04-29 20:18:00 UTC

@openscap-ci
Copy link
Collaborator

openscap-ci commented Apr 28, 2021

Changes identified:
Others:
 Changes in Python files.

Show details

Others:
 Python file utils/autorefer.py is newly added.

Recommended tests to execute:
 (cd build && cmake ../ && ctest -j4)

@cipherboy
Copy link
Contributor Author

Note that, if at the end you want a section of rules with multiple identifiers, you can do it like so:

        ### 5.3.1,5.3.2 Multi-use rules
        - some_other_rule

This wouldn't quite fit in the existing benchmark structure I suppose, but could be inserted in a useful place. I didn't experiment with what happened with one rule that is placed multiple times in a profile; it would probably fail the build too.

Nicely laid out profiles have a structure matching their corresponding
benchmarks. I'm thinking the CIS and STIG profiles here.

Let's formalize that structure a bit.

Given a profile with format:

```yaml
    selection:
        # <identifier> description
        - rule_choice
```

Where one or more comments precede one or more rules, and the closest
comment that matches the given format for a reference identifier wins.

Let's take this as an example:

```yaml
    ## 5.3 Configure PAM ##
    ### 5.3.1 Ensure password creation requirements are configured (Automated)
    - var_password_pam_minlen=14
    - accounts_password_pam_minlen
    - var_password_pam_minclass=4
    - accounts_password_pam_minclass
```

Here both accounts_password_pam_minlen and
accounts_password_pam_minclass should get CIS reference value 5.3.1. The
other two entries are vars, and since 5.3 is further away than 5.3.1,
5.3.1 should win.

Some ground rules:

 - We should avoid guessing when possible.
 - We should create minimal diffs.
 - Some rules lack a references section; we should add them in that
   case.
 - If we're not sure, ignore the rule and print info telling the caller
   about it.
 - Rules that don't belong to a section shouldn't be in the profile!
 - If we're adding a reference, don't clutter other products! Only do
   our current product.

Mostly this is built on top of the ssg.rule_yaml helpers I've added
previously during my internship.

The structure roughly follows the previous utils/refchecker.py. Walk all
rules in a profile, load their YAML, load their actual lines, use a bit
of magic to auto-detect the rule identifier, and write the results out.

Most of the heavy lifting was done in rule_yaml.py to add a new helper
method, add_or_modify_nested_section_key, that adds a new key in a
nested section (and optionally, the section itself) or -- if said key is
already there -- modifies the value of that key.

Signed-off-by: Alexander Scheel <[email protected]>
@matejak
Copy link
Member

matejak commented May 5, 2021

I don't like the approach for following reasons:

  • In the real world, reference is not a rule’s property, but it depends on a relationship of a rule and a regulation that exists independently from the rule. Our project started with references as rule properties, but it is clearly visible that if the situation continues to develop at the this pace, we will end up with rule definition files that will contain a scary reference blob that won't be human-readable.
    But the project has made a step in a direction away from this - we have control files that are supposed to be based on those regulations. Next, we have the concept of compiled rules as an intermediary, but still human-readable format. Therefore, I would much rather see the future in references being added by means of control files (or even profile files) rather than by means of rule keys.
  • The proposal would introduce duplication of information - part of which would be present in comments.

I am sorry that you put quite some energy into this proposal, but I think that its elements could be leveraged when references are added to rule metadata as a step of the build process.

@cipherboy
Copy link
Contributor Author

cipherboy commented May 5, 2021

Hi @matejak :-) Thanks for taking the time to look! I think this fits best in conjunction with my thoughts on #6889 and #6915 (the off-topic ones) if you haven't read them.

Our project started with references as rule properties, but it is clearly visible that if the situation continues to develop at the this pace, we will end up with rule definition files that will contain a scary reference blob that won't be human-readable.

I definitely hear and understand that. I'll start thinking about how to solve that. :-) Do you have ideas yet?

The proposal would introduce duplication of information - part of which would be present in comments.

I'd like to push back here. While the information is duplicated in the strictest since (and in conjunction with #6889 and #6915) the intent is to have the source of truth ultimately be the profile. This utility is a way to make that profile push references back to the rules themselves. The duplication is necessary for the build system but this tooling makes the profile the canonical reference location, allowing automatic update of the rules in question.

Where I'm coming from is that it is really helpful to have content about a rule centralized in a single place. That's why we created the rule directory (remember the old structure? I barely do!) in the first place: everything about the XCCDF rule entry (including references, ...) are in one place. I'd be fine splitting out a rule into a separate references.yml if you wanted to make auto-generated stuff exist in one place. But if you make controls influence profiles and rules's references (which they don't do presently, correct?) --- now you need another build step and it gets really tough to see where everything (in the resulting XCCDF) is coming from.

The other intent with this is to make changes hmm auditable. You mentioned elsewhere about making sure changes don't unintentionally impact the generated content. I'm 100% on board with that. By checking in the resulting profiles and references, I feel it becomes easier to audit changes. You know any changes in content must reflect back into existing rules that are checked in, else they'll be ignored. For large changes (consider an upgrade from CIS benchmark at v1.0.0 to v3.0.x) --- being able to do so incrementally across multiple PRs (but having a large portion being automated such as references & prodtypes) would be a big help.

With things not happening until the actual build stage, you're stuck auditing build artifacts. The resulting XCCDF still isn't completely diff-able IIUC. We had that problem when I was an intern too. By taking some of the more risky moving parts (bulk changes of any sort -- profile generation, prodtypes, references, &c) and putting them back into the existing content system (rather than as another layer on top) --- you restore confidence in the content you're shipping.

You could even go so far as to add some of these steps to the test system if you prefer. The goal of my tooling is to be minimal and reproducible -- this makes review simple: run the same tools locally and compare the generated output or see if they make any new changes. They shouldn't if the PR is valid (is the assertion I'm trying to bake in with these tool changes).

We're just beginning our CaC journey at Canonical, but already this tool has been very helpful for importing our existing CIS tooling into the CaC build system.


The goal with this PR is to provide something that works for us, now. IIUC this is a capability that the controls file currently lacks.

Over time, as we discuss and reach consensus over what changes we all agree to make, we can update our tooling and work towards that goal. This is, IMO, an important step towards making controls file (whether directly or via an intermediate, checked-in profile representation) automatically generate references. I'm in no way saying that this tool is perfect or fully complete.

Yeah, there's still work to be done in the ultimate goal, but having some starting code that we can then change, reuse, extend, whatever to fit the eventual end goal is a lot better than having none. :-)

(Or who knows, maybe it is better to throw it all away, and start over, but at least there's some existing code that solved a very similar problem already).


Perhaps I should combine all my various discussions on what I'm envisioning into a single document for our call later, would that help?

Hope this helps :-)

@vojtapolasek vojtapolasek added this to the 0.1.56 milestone May 6, 2021
@matejak
Copy link
Member

matejak commented May 6, 2021

Hi @matejak :-) Thanks for taking the time to look! I think this fits best in conjunction with my thoughts on #6889 and #6915 (the off-topic ones) if you haven't read them.

Our project started with references as rule properties, but it is clearly visible that if the situation continues to develop at the this pace, we will end up with rule definition files that will contain a scary reference blob that won't be human-readable.

I definitely hear and understand that. I'll start thinking about how to solve that. :-) Do you have ideas yet?

The proposal would introduce duplication of information - part of which would be present in comments.

I'd like to push back here. While the information is duplicated in the strictest since (and in conjunction with #6889 and #6915) the intent is to have the source of truth ultimately be the profile. This utility is a way to make that profile push references back to the rules themselves. The duplication is necessary for the build system but this tooling makes the profile the canonical reference location, allowing automatic update of the rules in question.

I understand the thought that there would the primary source in the profile file, but as long as the build system looks into rules, then there is duplication. Moreover, some references are currently present only in rules, so a human being can't tell just by looking whether a reference in a rule YAML is derived from a profile file comment, or it exists just in the rule yaml. Obviously, one could enforce that every reference in rules has to be derived from a reference in a profile file, but we would be then talking about a backward-incompatible change of input formats in the project, which can't be introduced as a PR, but it needs public discussion in the ML at least.

Where I'm coming from is that it is really helpful to have content about a rule centralized in a single place. That's why we created the rule directory (remember the old structure? I barely do!) in the first place: everything about the XCCDF rule entry (including references, ...) are in one place. I'd be fine splitting out a rule into a separate references.yml if you wanted to make auto-generated stuff exist in one place. But if you make controls influence profiles and rules's references (which they don't do presently, correct?) --- now you need another build step and it gets really tough to see where everything (in the resulting XCCDF) is coming from.

I agree that there would have to be another build step. But I don't see that as something bad and scary - just take a look how a compiled rule differs from a rule in the source tree - there would be one more key s.a. control-based references, and that would be it.
Again, reference is not a rule property. A rule is e.g. sshd can't allow root to log in. It has description, rationale, remediations, and checks. Then there are identifiers, which are supposed to uniquely identify the rule, so although they are derived, they are permanent and unambiguous, so it makes sense to store them close to the rule instead in some kind of rule ID -> other IDs mapping data structure.
References to policies are something entirely different - they are subjective, there is usually some sort of reasoning related to the relationship, and they can also change. I don't like edits to a blob in the rule yaml file, although the rule itself hasn't changed a bit.

The other intent with this is to make changes hmm auditable. You mentioned elsewhere about making sure changes don't unintentionally impact the generated content. I'm 100% on board with that. By checking in the resulting profiles and references, I feel it becomes easier to audit changes. You know any changes in content must reflect back into existing rules that are checked in, else they'll be ignored. For large changes (consider an upgrade from CIS benchmark at v1.0.0 to v3.0.x) --- being able to do so incrementally across multiple PRs (but having a large portion being automated such as references & prodtypes) would be a big help.

With things not happening until the actual build stage, you're stuck auditing build artifacts. The resulting XCCDF still isn't completely diff-able IIUC. We had that problem when I was an intern too. By taking some of the more risky moving parts (bulk changes of any sort -- profile generation, prodtypes, references, &c) and putting them back into the existing content system (rather than as another layer on top) --- you restore confidence in the content you're shipping.

prodtypes are another pain, yes. And I agree with the emphasis on the auditable behavior. Maybe I get it wrong, but I don't see the difference between auditing source rule files that have partial machine-generated content, and auditing compiled rule files that are a build artifact, but they are otherwise the same. We already have some kind of auditing of profile changes: https://complianceascode.github.io/template/2020/03/06/profile_guard.html And we already have a compiler of rule references to an ordered json: https://github.com/ComplianceAsCode/content/blob/master/utils/gen_tables.py that could be used in a similar way.

You could even go so far as to add some of these steps to the test system if you prefer. The goal of my tooling is to be minimal and reproducible -- this makes review simple: run the same tools locally and compare the generated output or see if they make any new changes. They shouldn't if the PR is valid (is the assertion I'm trying to bake in with these tool changes).

We're just beginning our CaC journey at Canonical, but already this tool has been very helpful for importing our existing CIS tooling into the CaC build system.

The goal with this PR is to provide something that works for us, now. IIUC this is a capability that the controls file currently lacks.

Over time, as we discuss and reach consensus over what changes we all agree to make, we can update our tooling and work towards that goal. This is, IMO, an important step towards making controls file (whether directly or via an intermediate, checked-in profile representation) automatically generate references. I'm in no way saying that this tool is perfect or fully complete.

I think that you can benefit from the tooling without having it merged upstream. As I have outlined above, your proposal has big consequences, as basically changes the content input workflow and interpretation of the source yamls, and such proposal is much bigger than a PR. While it may improve workflows, I believe that the steps that the whole project makes should, among other things, play nice with backward compatibility, and they should introduce workflow changes that are likely not to change soon again, which I see as weaknesses of your proposal.

Yeah, there's still work to be done in the ultimate goal, but having some starting code that we can then change, reuse, extend, whatever to fit the eventual end goal is a lot better than having none. :-)

(Or who knows, maybe it is better to throw it all away, and start over, but at least there's some existing code that solved a very similar problem already).

Perhaps I should combine all my various discussions on what I'm envisioning into a single document for our call later, would that help?

As I have already said, I believe that some of the code should be reused in the bigger-picture task to enhance the build system in a way that is not clear yet. We think that something needs to be done wrt reference handling, but parsing comments in profile files is not something that we would like. Regarding your proposal to accumulate ideas into a document - that's a great idea, go for it.

@cipherboy
Copy link
Contributor Author

Just two quick follow up points:

  1. I think part of the question is around what is a rule. In your mind, it sounds like it is purely a state+action that needs to be taken to a system. From my perspective, it is still the full <XCCDF:Rule /> element, in a nicer form.
  2. Which ML? I joined the public open-scap-list <[email protected]> but its never had much traffic, especially on these type of discussions (at least, as far as I recall in the past) --- is there a separate public developer mailing list I've never found? :-)

Happy to organize my thoughts into a document form if that benefits the community. I was hoping that GH Issue/PR with something concrete would be easy enough to see and respond to, but I guess not. :-)

@JAORMX
Copy link
Contributor

JAORMX commented May 7, 2021

/retest

@cipherboy
Copy link
Contributor Author

Closing this and carrying this downstream.

@cipherboy cipherboy closed this May 10, 2021
cipherboy added a commit to cipherboy/scap-security-guide that referenced this pull request Jun 14, 2021
This pull request (ComplianceAsCode#6908) was rejected upstream; remove its
corresponding documentation.

Signed-off-by: Alexander Scheel <[email protected]>
cipherboy added a commit to cipherboy/scap-security-guide that referenced this pull request Jul 16, 2021
This pull request (ComplianceAsCode#6908) was rejected upstream; remove its
corresponding documentation.

Signed-off-by: Alexander Scheel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement General enhancements to the project. Infrastructure Our content build system standards Benchmarks related.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants