CAO Bin 曹斌
I am engaged in AI4CM (AI for Computational Materials) research, focusing on crystallography and spectroscopy (http://www.caobin.asia). My research primarily includes physics-based diffraction pattern simulation, machine learning representations in spectrum-based sequence models, and crystal-based graph structures. My main areas of study are:
- Crystal structure representation for downstream property prediction and generation.
- Spectrum representation for crystal structure and symmetry identification.
In addition, I promote active learning applications in materials science by developing Bgolearn (a Bayesian optimization package). I collaborate with experimental research teams and continue to improve Bgolearn to advance the application of mature machine learning techniques in the materials community.
I am passionate about open science and strongly advocate for the unrestricted dissemination of knowledge. To support this vision, I share all code from my research to ensure transparency and accessibility.
Currently, I am pursuing my studies at HKUST(GZ) under the supervision of Professor Zhang Tong-yi.
📝 Publications

SimXRD-4M: Big Simulated X-ray Diffraction Data and Crystal Symmetry Classification Benchmark
Bin Cao, Yang Liu, Zinan Zheng, Ruifeng Tan, Jia Li, Tong-yi Zhang
- We developed a novel XRD simulation method that incorporates comprehensive physical interactions, resulting in a high-fidelity database.
- SimXRD-4M: Big Simulated X-ray Diffraction Data and Crystal Symmetry Classification Benchmark Bin Cao, Yang Liu, Zinan Zheng, Ruifeng Tan, Jia Li, Tong-yi Zhang, ICLR 2025 (Top-tier AI conference)

Machine Learning-Engineered Nanozyme System for Synergistic Anti-Tumor Ferroptosis/Apoptosis Therapy
Tianliang Li, Bin Cao (co-first author), Tianhao Su, …, Lingyan Feng, Tong-yi Zhang
- A novel ML model, termed the sequential backward Tree-Classifier for Gaussian Process Regression (TCGPR), is proposed to improve data pattern recognition following the divide-and-conquer principle.
- Machine Learning-Engineered Nanozyme System for Synergistic Anti-Tumor Ferroptosis/Apoptosis Therapy Tianliang Li, Bin Cao (co-first author), Tianhao Su, …, Lingyan Feng, Tong-yi Zhang SMALL (JCR Q1)

CGWGAN: crystal generative framework based on Wyckoff generative adversarial network
Tianhao Su, Bin Cao (co-first author), Shunbo Hu, Musen Li, Tong-yi Zhang
- In this work, we present a crystal generative framework based on Wyckoff generative adversarial network (CGWGAN) to efficiently discover novel crystals.
- CGWGAN: crystal generative framework based on Wyckoff generative adversarial network Tianhao Su, Bin Cao (co-first author), Shunbo Hu, Musen Li, Tong-yi Zhang JMI (New journal led by my supervisor, Prof. Zhang Tongyi.)

SimXRD-4M: Big Simulated X-ray Diffraction Data Accelerate the Crystalline Symmetry Classification
Bin Cao, Yang Liu, Zinan Zheng, Ruifeng Tan, Jia Li, Tong-yi Zhang
- In this work, we present a large open-source dataset of powder XRD patterns designed for symmetry identification. We assess 21 existing ML models, summarize the characteristics of XRD sequence data, and offer suggestions for developing ML models better suited to analyzing XRD patterns.
- SimXRD-4M: Big Simulated X-ray Diffraction Data Accelerate the Crystalline Symmetry Classification Bin Cao, Yang Liu, Zinan Zheng, Ruifeng Tan, Jia Li, Tong-yi Zhang arXiv

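Crystallographic Phase Identifier of a Convolutional Self-Attention Neural Network (CPICANN) on Powder Diffraction Patterns
Shouyang Zhang, Bin Cao (co-first), Tianhao Su, Yue Wu, Zhenjie Feng, Jie Xiong, Tong-Yi Zhang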
- In this work, we developed a machine learning phase identifier that achieved excellent performance within a relatively small scope.
- Crystallographic Phase Identifier of a Convolutional Self-Attention Neural Network (CPICANN) on Powder Diffraction Patterns Shouyang Zhang, Bin Cao (co-first), Tianhao Su, Yue Wu, Zhenjie Feng, Jie Xiong, Tong-Yi Zhang IUCrJ (JCR Q1)

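Active Learning Accelerates the Discovery of High Strength and High Ductility Lead-Free Solder Alloys
B Cao, T Su, S Yu, T Li, T Zhang, Z Dong, TY Zhang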
- To facilitate materials informatics development, all active learning algorithms were made open-source in our designed framework, Bgolearn…
- Active Learning Accelerates the Discovery of High Strength and High Ductility Lead-Free Solder Alloys B Cao, T Su, S Yu, T Li, T Zhang, Z Dong, TY Zhang Materials & Design (JCR Q1)

MLMD: a programming-free AI platform to predict and design materials
Jiaxuan Ma, Bin Cao (co-first author), Shuya Dong, Yuan Tian, Menghuan Wang, Jie Xiong, Sheng Sun
- We developed MLMD, an AI platform for materials design. It can discover novel materials with promising properties end-to-end through model inference and surrogate optimization, and it uses active learning to remain effective when data are scarce.
- MLMD: a programming-free AI platform to predict and design materials Ma, J., Cao, B. (co-first), Dong, S. et al. npj Comput Mater 10, 59 (2024) (JCR Q1)

Divide and conquer: Machine learning accelerated design of lead-free solder alloys with high strength and high ductility
Qinghua Wei, Bin Cao (co-first author), Hao Yuan (co-first author), Youyang Chen, Kangdong You, Shuting Yu, Tixin Yang, Ziqiang Dong, Tong-Yi Zhang
- Materials data are, in general, small in size and big in noise, while the design space is huge. A newly developed data preprocessing algorithm, named the Tree-Classifier for Gaussian Process Regression (TCGPR)…
- Divide and conquer: Machine learning accelerated design of lead-free solder alloys with high strength and high ductility. Qinghua Wei, Bin Cao (co-first author), Hao Yuan (co-first author), Youyang Chen, Kangdong You, Shuting Yu, Tixin Yang, Ziqiang Dong, Tong-Yi Zhang npj Comput Mater 201 (2023) (JCR Q1)

(Cover paper) Domain knowledge-guided interpretive machine learning: formula discovery for the oxidation behavior of ferritic-martensitic steels in supercritical water
Bin Cao, Shuang Yang, Ankang Sun, Ziqiang Dong, Tong-Yi Zhang
- In this study, we propose a domain knowledge-guided interpretive machine learning strategy and demonstrate it by studying the oxidation behavior of ferritic-martensitic steels in supercritical water…
- Domain knowledge-guided interpretive machine learning: formula discovery for the oxidation behavior of ferritic-martensitic steels in supercritical water Cao B, Yang S, Sun A, Dong Z, Zhang TY. Journal of Materials Informatics (2022) (New journal led by my supervisor, Prof. Zhang Tongyi.)
…