CoRe 🥑

⭐ NeurIPS 2025 D&B Track Spotlight ⭐

Benchmarking LLMs' Code Reasoning Capabilities through Static Analysis Tasks

📄 Paper 🐙 GitHub 🤗 HuggingFace

💡 Overview

CoRe is a high-quality, human-verified benchmark designed to evaluate LLMs on fundamental static analysis tasks. CoRe includes 12,553 task instances spanning data dependency, control dependency, and information flow across programs written in C/C++, Java, and Python. To ensure semantic diversity and reasoning complexity, we propose a semantics-aware diverse sampling strategy that selects targets and task instances based on structural coverage and dependency depth.
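To make the three task types concrete, here is a small illustrative Python snippet annotated with the kinds of dependencies a model is asked to reason about. This example is ours and is not drawn from the benchmark; the variable names are purely illustrative.

```python
def scale(values, threshold):
    total = 0
    for v in values:
        if v > threshold:       # control dependency: the update of `total` below
            total += v          #   executes only when this condition holds
    result = total * 2          # data dependency: `result` reads the value of `total`
    return result               # information flow: `threshold` never flows into `result`
                                #   directly, but influences it through the branch above
```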

🏆 Leaderboard

Dependency Classification (F1 Score)

| Model | Data Dependency | Control Dependency | Information Flow | Overall |
|-------|-----------------|--------------------|------------------|---------|

Trace Generation (Correct Trace Rate, %)

| Model | Data Dependency | Control Dependency | Information Flow | Overall |
|-------|-----------------|--------------------|------------------|---------|

Dependency Source Enumeration (Exact Match, %)

| Model | Data Dependency | Control Dependency | Information Flow | Overall |
|-------|-----------------|--------------------|------------------|---------|

📝 Citation

@article{xie2025core,
  title={CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks},
  author={Xie, Danning and Zheng, Mingwei and Liu, Xuwei and Wang, Jiannan and Wang, Chengpeng and Tan, Lin and Zhang, Xiangyu},
  journal={arXiv preprint arXiv:2507.05269},
  year={2025}
}