Overview
CoRe is a high-quality, human-verified benchmark designed to evaluate LLMs on fundamental static analysis tasks. CoRe includes 12,553 task instances spanning data dependency, control dependency, and information flow across programs written in C/C++, Java, and Python. To ensure semantic diversity and reasoning complexity, we propose a semantics-aware diverse sampling strategy that selects targets and task instances based on structural coverage and dependency depth.
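The overview names the sampling criteria (structural coverage and dependency depth) but not the procedure. As a rough illustration only, the sketch below shows one way such a selection could work: candidates are grouped by a structural kind and a dependency depth, and the buckets are visited in rounds so that no single kind or depth dominates. The field names `kind` and `depth`, the function `diverse_sample`, and the deepest-first ordering are all assumptions made for this example, not CoRe's actual strategy.

```python
from collections import defaultdict


def diverse_sample(candidates, k):
    """Pick up to k candidates, cycling through (kind, depth) buckets.

    Hypothetical sketch: `kind` and `depth` are assumed fields, and the
    round-robin policy is an illustration, not the benchmark's method.
    """
    buckets = defaultdict(list)
    for cand in candidates:
        buckets[(cand["kind"], cand["depth"])].append(cand)

    # Visit deeper buckets first within each round; break ties by kind name.
    order = sorted(buckets, key=lambda key: (-key[1], key[0]))

    picked = []
    while len(picked) < k and any(buckets[key] for key in order):
        for key in order:
            if buckets[key] and len(picked) < k:
                picked.append(buckets[key].pop(0))
    return picked


candidates = [
    {"id": 1, "kind": "loop", "depth": 1},
    {"id": 2, "kind": "loop", "depth": 1},
    {"id": 3, "kind": "branch", "depth": 3},
    {"id": 4, "kind": "call", "depth": 2},
]
sample_ids = [c["id"] for c in diverse_sample(candidates, 3)]
```

With this input, each of the three buckets contributes one candidate before any bucket contributes a second.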
Tasks: Dependency Classification, Trace Generation, Dependency Source Enumeration
F1
# | Model | Data Dependency | Control Dependency | Information Flow | Overall |
---|---|---|---|---|---|
1 | Model A | 95.1 | 92.3 | 93.7 | 93.7 |
2 | Model B | 91.0 | 89.5 | 90.1 | 90.2 |
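For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score can be computed for yes/no dependency predictions. It assumes binary labels and the standard precision/recall definition; the function name `f1_score` is chosen for this example, and the benchmark's own scoring script may differ (for instance in how it aggregates across categories).

```python
def f1_score(predictions, labels):
    """Standard binary F1 over parallel lists of booleans (a sketch)."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# One true positive, one false positive, one false negative.
score = f1_score([True, True, False, False], [True, False, True, False])
```

The function returns a fraction in [0, 1]; multiply by 100 to compare against percentage-style values such as those in the table.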
Correct Trace Rate (%)
# | Model | Data Dependency | Control Dependency | Information Flow | Overall |
---|---|---|---|---|---|
1 | Model A | 83.0 | 80.5 | 78.2 | 80.6 |
2 | Model B | 75.0 | 73.1 | 72.5 | 73.5 |
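The page does not define when a trace counts as correct. One plausible reading, sketched below, is that a trace is correct when it starts at the dependency source, ends at the target, and every consecutive pair of steps is a real dependency edge. The edge-set representation and the names `is_valid_trace` and `correct_trace_rate` are assumptions for illustration.

```python
def is_valid_trace(trace, edges, source, target):
    """Check a trace against a set of (from, to) dependency edges.

    Assumed criterion: correct endpoints and every hop is a known edge.
    """
    if len(trace) < 2 or trace[0] != source or trace[-1] != target:
        return False
    return all((a, b) in edges for a, b in zip(trace, trace[1:]))


def correct_trace_rate(instances):
    """Percentage of instances whose predicted trace is valid."""
    if not instances:
        return 0.0
    correct = sum(1 for inst in instances if is_valid_trace(**inst))
    return 100.0 * correct / len(instances)


edges = {("x", "y"), ("y", "z")}
instances = [
    {"trace": ["x", "y", "z"], "edges": edges, "source": "x", "target": "z"},
    {"trace": ["x", "z"], "edges": edges, "source": "x", "target": "z"},  # skips y
]
rate = correct_trace_rate(instances)
```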
Exact Match (%)
# | Model | Data Dependency | Control Dependency | Information Flow | Overall |
---|---|---|---|---|---|
1 | Model A | 70.0 | 72.3 | 71.4 | 71.2 |
2 | Model B | 60.0 | 62.5 | 61.9 | 61.5 |
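As a hedged illustration of an exact-match metric for enumeration tasks, the sketch below counts a prediction as correct only when the predicted set of dependency sources equals the expected set, ignoring order. Treating answers as unordered sets is an assumption here, as is the function name `exact_match_rate`.

```python
def exact_match_rate(predicted, expected):
    """Percentage of instances whose predicted source set equals the expected set."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(1 for p, e in zip(predicted, expected) if set(p) == set(e))
    return 100.0 * hits / len(expected)


predicted = [["a", "b"], ["c"]]
expected = [["b", "a"], ["c", "d"]]  # second instance is missing "d"
em = exact_match_rate(predicted, expected)
```

Under this definition a partially correct enumeration earns no credit, which would be consistent with exact-match scores sitting below the other two metrics in the tables.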
Examples
Data Dependency
# Example Data Dependency
input_code = """
SELECT name FROM students WHERE grade > 3.5;
"""
expected = {
    "tables": ["students"],
    "columns": ["name", "grade"],
}
Control Dependency
def factorial(n: int) -> int:
    if n <= 1:
        return 1
    return n * factorial(n - 1)
Information Flow
// Example InfoFlow
function readUserToken() {
  const token = localStorage.getItem('token');
  sendToServer(token);
}
Citation
@misc{your2025dataset,
  title = {Your Awesome Dataset},
  author = {Doe, Jane and Doe, John},
  howpublished = {\url{https://yourdataset.github.io}},
  year = {2025}
}