How does the CodeQL C extractor work #21739
Hi @scheduler-v1, We have a list of publications related to CodeQL at https://codeql.github.com/publications/. That website also contains other relevant information about the QL language, which you may find useful if you haven't seen it before.
Broadly, extractors traverse the AST of the program and store that information in TRAP files. These are then turned into corresponding databases by the CLI. The database classes broadly correspond to different types of nodes in the AST.
The C/C++ extractor is not open source. However, the source code for the Go extractor is available in this repository at https://github.com/github/codeql/tree/main/go/extractor, which should give you a practical example.
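To make the "traverse the AST and store it in TRAP files" step more concrete, here is a minimal sketch in Python. It is not how the C/C++ or Go extractor is implemented: it walks Python's own `ast` tree, and the relation names `exprs` and `expr_parents`, the kind numbers in `EXPR_KINDS`, and the simplified TRAP-like line format are all assumptions made for illustration.

```python
import ast

# Hypothetical mapping from AST node type to a numeric expression kind.
# A real extractor would take these numbers from its dbscheme.
EXPR_KINDS = {ast.BinOp: 1, ast.Name: 2, ast.Constant: 3}


def extract(source):
    """Return simplified, TRAP-like lines for every known expression in `source`."""
    lines = []
    counter = 0

    def visit(node, parent):
        nonlocal counter
        kind = EXPR_KINDS.get(type(node))
        if kind is not None:
            counter += 1
            label = f"#{counter}"
            lines.append(f"{label}=*")               # allocate a fresh entity label
            lines.append(f"exprs({label}, {kind})")  # record the node and its kind
            if parent is not None:
                lines.append(f"expr_parents({label}, {parent})")
            parent = label
        # Recurse into children; nodes without a known kind are skipped,
        # but their children are still visited.
        for child in ast.iter_child_nodes(node):
            visit(child, parent)

    visit(ast.parse(source), None)
    return lines


trap = extract("x + 1")
print("\n".join(trap))
```

The point of the sketch is only the shape of the process: the extractor decides the kind of each node while walking the tree and writes flat tuples, and turning those tuples into a database is left to a separate import step.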
Thanks, that clears things up. So my understanding is that the extractor is basically responsible for walking the C/C++ AST and emitting TRAP facts that match the dbscheme. Then the CodeQL CLI imports those TRAP files and turns them into the actual database. That also explains the
It’s a bit unfortunate for research purposes that the C/C++ extractor itself is not open source, but the Go extractor sounds like a useful reference implementation for understanding the general architecture. For a paper, I think the safest way to describe it would be something like:
Thanks again for confirming this.
I am working on an academic research paper that documents some of the inner workings of CodeQL with the example of analyzing C code. I have gotten so far as to understand how the language itself works, but now I need to go a step further and find out how the extractors create a database.
My findings so far end at the internal database classes that start with @ (e.g. @varaccess). The only thing I could find was that the dbscheme file assigns numbers to these internal database classes.
But how do the extractors know which database class an expression should be assigned to? Is there documentation for the C extractor? Is the source code for the C extractor available in this repository?
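As a rough illustration of the "dbscheme assigns numbers to database classes" observation, the sketch below parses a small dbscheme-style `case` declaration and looks up the class for a kind number. The fragment is written for this example: the numbers are made up, and `@example_literal` and `@example_call` are placeholder names (only `@varaccess` comes from the post above). It assumes that the extractor writes the kind number into a column and that the `case` declaration ties each number to an `@` class.

```python
import re

# Illustrative dbscheme-style fragment. The numbers and the @example_* names
# are invented for this sketch and are not taken from the real C/C++ dbscheme.
DBSCHEME_FRAGMENT = """
case @expr.kind of
  1 = @example_literal
| 2 = @example_call
| 3 = @varaccess
;
"""


def parse_case(fragment):
    """Map each numeric kind in a `case` declaration to its @class name."""
    pairs = re.findall(r"(\d+)\s*=\s*(@\w+)", fragment)
    return {int(number): name for number, name in pairs}


KINDS = parse_case(DBSCHEME_FRAGMENT)


def class_of(kind):
    """Return the database class associated with a kind number."""
    return KINDS[kind]


print(class_of(3))
```

Under that assumption, the extractor never names a database class directly; it emits a row carrying the number, and the class membership follows from the dbscheme.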