CAO Bin 曹斌
I am working in AI4CM (Artificial Intelligence for Computational Materials), with a focus on crystallography and spectroscopy. My research spans physics-based diffraction pattern simulation, machine learning representations in spectrum-based sequence models, and graph-based modeling of crystal structures. You can learn more on my personal website: www.caobin.asia.
🔬 Research Focus
- Crystal structure representation for downstream property prediction and material generation.
- Spectral modeling and representation for crystal structure and symmetry identification.
In parallel, I promote active learning in materials science through the development of Bgolearn, a Bayesian global optimization package tailored for materials design. Collaborating with experimental teams, I continuously enhance Bgolearn to bridge mature ML techniques with real-world scientific workflows.
🌍 Open Science Advocacy
I am passionate about open science and firmly support the unrestricted sharing of knowledge. To that end, I openly release the code and datasets from my research to ensure transparency, reproducibility, and community benefit.
🎓 Academic Path
I completed my MPhil under the supervision of Prof. Zhang Tong-yi and, since 2023, have been pursuing a PhD with him at The Hong Kong University of Science and Technology (Guangzhou).
📝 Selected Publications (co-/first author)

Tianliang Li, Lifei Chen(co-first author), Bin Cao(co-first author), Siyuan Liu,…, Tong-Yi Zhang, Lingyan Feng
- This work developed an integrated active learning (AL) software, BgoFace, which satisfies most material property optimization requirements. The application of BgoFace (with default settings) successfully accelerated the discovery of G4-based CPL materials, achieving results within six iterations and synthesizing 24 experimental groups. The final QY nearly doubled the initial best QY in the training dataset.
- Optimize the quantum yield of G‐quartet‐based circularly polarized luminescence materials via active learning strategy‐BgoFace Tianliang Li, Lifei Chen(co-first author), Bin Cao(co-first author), Siyuan Liu,…, Tong-Yi Zhang, Lingyan Feng, MGE Advances

Materials Generation in the Era of Artificial Intelligence: A Comprehensive Survey
Zhixun Li, Bin Cao(co-first author), Rui Jiao(co-first author), Liang Wang(co-first author), Ding Wang, Yang Liu, Dingshuo Chen, Jia Li, Qiang Liu, Yu Rong, Liang Wang, Tong-Yi Zhang, Jeffrey Xu Yu MatGen
- We first organize various types of materials and illustrate multiple representations of crystalline materials. We then provide a detailed summary and taxonomy of current AI-driven materials generation approaches. Furthermore, we discuss the common evaluation metrics and summarize open-source codes and benchmark datasets. Finally, we conclude with potential future directions and challenges in this fast-growing field.
- Materials Generation in the Era of Artificial Intelligence: A Comprehensive Survey Zhixun Li, Bin Cao(co-first author), Rui Jiao(co-first author), Liang Wang(co-first author), Ding Wang, Yang Liu, Dingshuo Chen, Jia Li, Qiang Liu, Yu Rong, Liang Wang, Tong-Yi Zhang, Jeffrey Xu Yu, arXiv

Tianliang Li, Bin Cao(co-first author), Yitong Wang, Lixing Lin, …, Lingyan Feng, Tong-yi Zhang
- We apply an interpretable active learning (AL) strategy to efficiently optimize the photothermal conversion efficiency (PCE) of carbon dots (CDs) in photothermal therapy (PTT). Using this approach, we successfully synthesized iron-doped CDs (Fe-CDs) with PCE exceeding 78.7% after only 16 experimental trials over four iterations.
- Interpretable Active Learning Identifies Iron-Doped Carbon Dots With High Photothermal Conversion Efficiency for Antitumor Synergistic Therapy Tianliang Li, Bin Cao(co-first author), Yitong Wang, Lixing Lin, …, Lingyan Feng, Tong-yi Zhang, Aggregate (JCR Q1)

SimXRD-4M: Big Simulated X-ray Diffraction Data and Crystal Symmetry Classification Benchmark
Bin Cao, Yang Liu, Zinan Zheng, Ruifeng Tan, Jia Li, Tong-yi Zhang
- We developed a novel XRD simulation method that incorporates comprehensive physical interactions, resulting in a high-fidelity database.
- SimXRD-4M: Big Simulated X-ray Diffraction Data and Crystal Symmetry Classification Benchmark Bin Cao, Yang Liu, Zinan Zheng, Ruifeng Tan, Jia Li, Tong-yi Zhang, ICLR 2025 (Top-tier AI conference)

Machine Learning-Engineered Nanozyme System for Synergistic Anti-Tumor Ferroptosis/Apoptosis Therapy
Tianliang Li, Bin Cao(co-first author), Tianhao Su, …, Lingyan Feng, Tong-yi Zhang
- A novel ML model, termed the sequential backward Tree-Classifier for Gaussian Process Regression (TCGPR), is proposed to improve data pattern recognition following the divide-and-conquer principle.
- Machine Learning-Engineered Nanozyme System for Synergistic Anti-Tumor Ferroptosis/Apoptosis Therapy Tianliang Li, Bin Cao(co-first author), Tianhao Su, …, Lingyan Feng, Tong-yi Zhang, SMALL (JCR Q1)

CGWGAN: crystal generative framework based on Wyckoff generative adversarial network
Tianhao Su, Bin Cao(co-first author), Shunbo Hu, Musen Li, Tong-yi Zhang
- In this work, we present a crystal generative framework based on Wyckoff generative adversarial network (CGWGAN) to efficiently discover novel crystals.
- CGWGAN: crystal generative framework based on Wyckoff generative adversarial network Tianhao Su, Bin Cao(co-first author), Shunbo Hu, Musen Li, Tong-yi Zhang, JMI (New journal led by my supervisor, Prof. Zhang Tong-yi.)

SimXRD-4M: Big Simulated X-ray Diffraction Data Accelerate the Crystalline Symmetry Classification
Bin Cao, Yang Liu, Zinan Zheng, Ruifeng Tan, Jia Li, Tong-yi Zhang
- In this work, we present a large open-source dataset of powder XRD patterns designed for symmetry identification. We assess 21 existing ML models, summarize the characteristics of XRD sequence data, and offer suggestions for the further development of ML models best suited to analyzing XRD patterns.
- SimXRD-4M: Big Simulated X-ray Diffraction Data Accelerate the Crystalline Symmetry Classification Bin Cao, Yang Liu, Zinan Zheng, Ruifeng Tan, Jia Li, Tong-yi Zhang, arXiv

Shouyang Zhang, Bin Cao (co-first), Tianhao Su, Yue Wu, Zhenjie Feng, Jie Xiong, Tong-Yi Zhang
- In this work, we developed a machine learning phase identifier that achieved excellent performance within a relatively small scope.
- Crystallographic Phase Identifier of a Convolutional Self-Attention Neural Network (CPICANN) on Powder Diffraction Patterns Shouyang Zhang, Bin Cao (co-first), Tianhao Su, Yue Wu, Zhenjie Feng, Jie Xiong, Tong-Yi Zhang, IUCrJ (JCR Q1)

B Cao, T Su, S Yu, T Li, T Zhang, Z Dong, TY Zhang
- To facilitate materials informatics development, all active learning algorithms were made open-source in our designed framework, Bgolearn…
- Active Learning Accelerates the Discovery of High Strength and High Ductility Lead-Free Solder Alloys B Cao, T Su, S Yu, T Li, T Zhang, Z Dong, TY Zhang, Materials & Design (JCR Q1)

MLMD: a programming-free AI platform to predict and design materials
Jiaxuan Ma, Bin Cao (co-first author), Shuya Dong, Yuan Tian, Menghuan Wang, Jie Xiong, Sheng Sun
- We developed MLMD, an AI platform for materials design. It can discover novel materials with promising properties end-to-end through model inference and surrogate optimization, and it uses active learning to remain effective even when data are scarce.
- MLMD: a programming-free AI platform to predict and design materials Ma, J., Cao, B. (co-first), Dong, S. et al. npj Comput Mater 10, 59 (2024) (JCR Q1)

Divide and conquer: Machine learning accelerated design of lead-free solder alloys with high strength and high ductility Qinghua Wei, Bin Cao (co-first author), Hao Yuan (co-first author), Youyang Chen, Kangdong You, Shuting Yu, Tixin Yang, Ziqiang Dong, Tong-Yi Zhang
- In general, materials data are small in size and big in noise, while the design space is huge. A newly developed data preprocessing algorithm, named the Tree-Classifier for Gaussian Process Regression (TCGPR)…
- Divide and conquer: Machine learning accelerated design of lead-free solder alloys with high strength and high ductility. Qinghua Wei, Bin Cao (co-first author), Hao Yuan (co-first author), Youyang Chen, Kangdong You, Shuting Yu, Tixin Yang, Ziqiang Dong, Tong-Yi Zhang, npj Comput Mater 9, 201 (2023) (JCR Q1)

(Cover paper) Domain knowledge-guided interpretive machine learning: formula discovery for the oxidation behavior of ferritic-martensitic steels in supercritical water Bin Cao, Shuang Yang, Ankang Sun, Ziqiang Dong, Tong-Yi Zhang
- In this study, we propose a domain knowledge-guided interpretive machine learning strategy and demonstrate it by studying the oxidation behavior of ferritic-martensitic steels in supercritical water…
- Domain knowledge-guided interpretive machine learning: formula discovery for the oxidation behavior of ferritic-martensitic steels in supercritical water Cao B, Yang S, Sun A, Dong Z, Zhang TY. Journal of Materials Informatics (2022) (New journal led by my supervisor, Prof. Zhang Tong-yi.)
…