Skip to content

Conversation

@geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented Jan 16, 2020

We already modelled strdup as an allocation function. This PR adds ArrayFunction and DataFlowFunction models for it as well (for https://jira.semmle.com/browse/CPP-466).

Notably DataFlowUtil.qll has extremely sparse support for possible data flows through modelled functions. I suspect I will be making many changes to it in the coming days / weeks so please be super critical and/or constructive about that part.

@geoffw0 geoffw0 added the C++ label Jan 16, 2020
@geoffw0 geoffw0 requested a review from a team as a code owner January 16, 2020 11:12
outModel.isReturnValue()
or
inModel.isParameterDeref(iIn) and
outModel.isReturnValueDeref()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that there should not be data flow from x to strdup(x), only taint flow. The distinction between data and taint flow remains a bit blurry, and in the end it's about what people might use the libraries for. I can imagine using data flow to follow a pointer from malloc to free. Then I'd not want to follow it across strdup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved it to taint tracking - and then realized that it's the same as one of the cases in exprToDefinitionByReferenceStep only to a ReturnValueDeref rather than a ParameterDeref ... and in fact all of the cases in exprToDefinitionByReferenceStep should be duplicated for return values in the same way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems right to me, but I'd like to hear the opinion of @rdmarsh2 before merging.

@rdmarsh2, I think you were involved the last time we made the taint-tracking library use the "deref" data-flow models to conflate pointers with their objects. We did it only for reference arguments, and this PR now extends it it to cover return values as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's the same caveats about false positives, and I'd like to see some testing on real code, but I'm for it.

outModel.isReturnValue()
or
inModel.isParameterDeref(iIn) and
outModel.isReturnValueDeref()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems right to me, but I'd like to hear the opinion of @rdmarsh2 before merging.

@rdmarsh2, I think you were involved the last time we made the taint-tracking library use the "deref" data-flow models to conflate pointers with their objects. We did it only for reference arguments, and this PR now extends it it to cover return values as well.

@jbj
Copy link
Contributor

jbj commented Jan 17, 2020

When the change note is fixed, I'm happy to merge this PR. Then we can see in the nightly CPP-Differences whether there are any surprising FPs.

Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge this. @rdmarsh2, if you test and find a problem with this PR, it should be easy enough to revert the offending code.

@jbj jbj merged commit 3632d51 into github:master Jan 17, 2020
@geoffw0
Copy link
Contributor Author

geoffw0 commented Jan 17, 2020

I plan to have a thorough look through the CPP-diffs on Monday (as suggested above).

@jbj
Copy link
Contributor

jbj commented Jan 20, 2020

You'll have to wait for the submodule bump in https://git.semmle.com/Semmle/code/pull/36032 before the results show up.

@geoffw0
Copy link
Contributor Author

geoffw0 commented Jan 20, 2020

You'll have to wait for the submodule bump in https://git.semmle.com/Semmle/code/pull/36032 before the results show up.

Yep, I noticed this change is not reflected in the latest CPP-diffs run (https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/742/).

@geoffw0 geoffw0 mentioned this pull request Jan 23, 2020
@geoffw0
Copy link
Contributor Author

geoffw0 commented Jan 23, 2020

The corresponding diff job is https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/743/ and it contains no changes to query results. I'm not sure how surprising this is as I'm not sure how many queries are (just now) still using the old security library taint anyway.

@geoffw0 geoffw0 deleted the modelstrdup branch February 10, 2023 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants