💡 Overview
CoRe is a high-quality, human-verified benchmark for evaluating LLMs on fundamental static analysis tasks. It comprises 12,553 task instances spanning data dependency, control dependency, and information flow analysis over programs written in C/C++, Java, and Python. To ensure semantic diversity and reasoning complexity, we propose a semantics-aware diverse sampling strategy that selects analysis targets and task instances based on structural coverage and dependency depth. The toy example below illustrates the three dependency types.
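As a minimal illustrative sketch (our own example, not taken from the benchmark; the function and variable names are hypothetical), the Python snippet below annotates one instance of each dependency type that CoRe targets:

```python
def classify(x):
    threshold = 10          # definition site: source of a data dependency
    flag = x > threshold    # `flag` is data-dependent on `x` and `threshold`
    if flag:                # the value of `flag` decides which branch runs
        label = "high"      # control-dependent on the `if flag` predicate
    else:
        label = "low"       # also control-dependent on the same predicate
    # Information flow: `x` influences `label` only implicitly, through
    # the branch condition, even though `label` is never computed from
    # `x` directly (an implicit flow).
    return label
```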
🏆 Leaderboard
Dependency Classification (F1 Score)

| Model | Data Dependency | Control Dependency | Information Flow | Overall |
|---|---|---|---|---|

Trace Generation (Correct Trace Rate, %)

| Model | Data Dependency | Control Dependency | Information Flow | Overall |
|---|---|---|---|---|

Dependency Source Enumeration (Exact Match, %)

| Model | Data Dependency | Control Dependency | Information Flow | Overall |
|---|---|---|---|---|
📝 Citation
```bibtex
@article{xie2025core,
  title={CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks},
  author={Xie, Danning and Zheng, Mingwei and Liu, Xuwei and Wang, Jiannan and Wang, Chengpeng and Tan, Lin and Zhang, Xiangyu},
  journal={arXiv preprint arXiv:2507.05269},
  year={2025}
}
```