Should we worry about data leakage in materials property prediction?
Short answer: yes. Composition overlap between train and test sets can create a false sense of model robustness, especially when crystal prototypes are near-duplicates. We recently reran a published benchmark with composition-family splits and observed performance drops of 30-50%, depending on the target property.
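For anyone unfamiliar with the idea, here is a minimal sketch of a composition-family split. It assumes one possible definition of "family": the set of elements in a formula (its chemical system), so Fe2O3 and Fe3O4 land in the same family and can never straddle train and test. The names `composition_family` and `family_split` are hypothetical, and the regex parser only handles simple formulas without parentheses or hydrates. A real pipeline would more likely use a proper composition parser and possibly a structure-aware grouping.

```python
import random
import re
from collections import defaultdict

# Simple parser: captures element symbols (and ignores stoichiometric counts).
# Assumption: formulas contain no parentheses, charges, or hydrate dots.
ELEMENT_RE = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")


def composition_family(formula: str) -> str:
    """Return a canonical chemical-system key, e.g. 'Fe-O' for Fe2O3."""
    elements = {symbol for symbol, _count in ELEMENT_RE.findall(formula)}
    return "-".join(sorted(elements))


def family_split(formulas, test_fraction=0.2, seed=0):
    """Split sample indices so that no composition family spans train and test."""
    groups = defaultdict(list)
    for idx, formula in enumerate(formulas):
        groups[composition_family(formula)].append(idx)

    # Shuffle whole families, not individual samples.
    families = sorted(groups)
    random.Random(seed).shuffle(families)

    target = test_fraction * len(formulas)
    train, test = [], []
    for fam in families:
        # Fill the test set family by family until it reaches the target size;
        # every remaining family goes entirely to train.
        bucket = test if len(test) < target else train
        bucket.extend(groups[fam])
    return sorted(train), sorted(test)


formulas = ["Fe2O3", "Fe3O4", "LiFePO4", "NaCl", "KCl",
            "SiO2", "TiO2", "Al2O3", "FeO", "MgO"]
train_idx, test_idx = family_split(formulas, test_fraction=0.3, seed=1)
print("train:", [formulas[i] for i in train_idx])
print("test: ", [formulas[i] for i in test_idx])
```

Because whole families are assigned at once, the test fraction is only approximate; that is the price of guaranteeing zero family overlap.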
Leakage is not always malicious; many datasets were never designed for ML benchmarking. But if we do not define clear split protocols, we cannot compare papers meaningfully. I would love to see a community-maintained suite of leakage-resistant evaluation splits.
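One low-effort step toward comparable protocols would be to report overlap statistics alongside any published split. The sketch below is a hypothetical audit, not an established tool: `leakage_report` measures the fraction of test compounds whose exact formula string, or whose composition family (again assumed to mean the element set), also appears in the training set. Exact matching assumes formulas are already written in a normalized form.

```python
import re

# Assumption: simple formulas without parentheses; only element symbols matter.
ELEMENT_RE = re.compile(r"[A-Z][a-z]?")


def composition_family(formula: str) -> str:
    """Canonical chemical-system key, e.g. 'Fe-O' for Fe3O4."""
    return "-".join(sorted(set(ELEMENT_RE.findall(formula))))


def leakage_report(train_formulas, test_formulas):
    """Fraction of test compounds that overlap with train, by formula and by family."""
    if not test_formulas:
        raise ValueError("test set is empty")
    train_formula_set = set(train_formulas)
    train_families = {composition_family(f) for f in train_formulas}
    n = len(test_formulas)
    exact = sum(f in train_formula_set for f in test_formulas)
    family = sum(composition_family(f) in train_families for f in test_formulas)
    return {"exact_formula_overlap": exact / n, "family_overlap": family / n}


report = leakage_report(["Fe2O3", "NaCl", "SiO2"], ["Fe3O4", "NaCl", "MgO", "KBr"])
print(report)  # {'exact_formula_overlap': 0.25, 'family_overlap': 0.5}
```

A random split can show a low exact-formula overlap while the family overlap stays high, which is exactly the kind of leakage that inflates reported performance.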
Posting as Anonymous Researcher
Comments
Composition-family splits should be mandatory in benchmark papers now. Random splits are no longer defensible for many targets.