I am a PhD student at HKUST (Guangzhou), working with Prof. ZHANG Tong-Yi. My research focuses on artificial intelligence for materials science, with particular emphasis on algorithm development for crystallography and spectroscopy. During my PhD, I also served as a visiting student at City University of Hong Kong, working with Prof. REN Yang, and an intern at Shanghai AI Lab (AI4S team), working with Dr. HAO Hong Xia.
Before joining HKUST (Guangzhou), I earned an MPhil in Mechanics at Shanghai University, working with Prof. ZHANG Tong-Yi. During this period, I also interned at Zhejiang Laboratory, where I worked on transfer learning for materials science. I received my BEng in Chemical Machinery from Beijing University of Chemical Technology, where my work focused on finite element analysis and chemistry.
In recent years, I have developed a series of machine-learning algorithms for crystal structure determination (XQueryer), crystal property prediction (PRDNet), and novel crystal discovery (SimXRD). My first-author papers have appeared in journals such as National Science Review and Science Bulletin, and at AI conferences such as ICLR.
One of my long-term projects, Bgolearn, for which I serve as the principal developer, received RMB 500,000 from the Shanghai Artificial Intelligence Open Source Award Project Support Plan to support the development of the open-source Bgolearn platform. Outside of research, I enjoy jogging and watching movies.

Cao Bin, Liu Yang#, Zhang Longhan, Wu Yifan, Luo Yuyu, Cheng Hong, Ren Yang#, Zhang Tongyi# (# corresponding author)
arXiv 2026 Under Review
We propose PRDNet, a novel architecture that integrates graph embeddings with a learned pseudoparticle diffraction module. It generates synthetic diffraction patterns that are invariant to crystallographic symmetries. We extensively evaluate PRDNet on multiple large-scale benchmarks, including Materials Project, JARVIS-DFT, and MatBench, where it achieves state-of-the-art performance across a wide range of crystal property prediction tasks.

Cao Bin*, Qin Yin#, Luo Yan*, Ying Zhehan, Yan Zilin, Weng Tu-Tao, Li Kaikai#, Zhang Tongyi# (* equal contribution, # corresponding author)
Science Bulletin 2025
Here, we present a spatially adaptive active-learning framework with closed-loop experimentation for targeted catalyst optimization. Bayesian optimization and a conditional variational autoencoder first identify a low-overpotential stability subspace, followed by active learning to pinpoint the most stable candidate. This strategy leads to the discovery of a Cu–RuO₂ catalyst with outstanding durability (625 h) and a low overpotential of 177 mV at 10 mA cm⁻². Our results highlight an efficient AI-driven pathway for accelerating the design of stable acidic OER catalysts.

Cao Bin, Zheng Zinan, Liu Yang, Zhang Longhan, Wong W-Y Lawrence, Weng Tu-Tao, Li Jia#, Li Haoxiang#, Zhang Tongyi# (# corresponding author)
National Science Review 2025
We developed XQueryer, an intelligent agent for simulating, recognizing, and analyzing powder X-ray diffraction (PXRD) patterns. Trained on over two million high-fidelity simulated spectra, XQueryer achieves significantly higher accuracy, 28.9% better than existing AI models and traditional methods. Integrated with a powder diffractometer, it enables real-time structural analysis of crystal samples.

Li Tianliang*, Chen Lifei*, Cao Bin*, Liu Siyuan, Lin Lixing, Li Zeyu, Chen Yingying, Li Zhenzhen, Zhang Tongyi#, Feng Linyan# (* equal contribution, # corresponding author)
MGE Advances 2025
This work developed integrated active learning (AL) software, BgoFace, which satisfies most material property optimization requirements. Applied with its default settings, BgoFace accelerated the discovery of G4-based CPL materials, achieving results within six iterations and 24 synthesized experimental groups. The final QY nearly doubled the best initial QY in the training dataset.

Li Zhixun*, Cao Bin*, Jiao Rui*, Wang Liang*, Wang Ding, Liu Yang, Chen Dingshuo, Li Jia, Liu Yu, Wang Liang, Zhang Tongyi, Yu Xu Jeffrey (* equal contribution)
arXiv 2025
We first organize various types of materials and illustrate multiple representations of crystalline materials. We then provide a detailed summary and taxonomy of current AI-driven materials generation approaches. Furthermore, we discuss the common evaluation metrics and summarize open-source codes and benchmark datasets. Finally, we conclude with potential future directions and challenges in this fast-growing field.

Li Tianliang*, Cao Bin*, Wang Yitong, Lin Lixing, Chen Lifei, Su Tianhao, Song Haicheng, Ren Yuze, Zhang Longhan, Chen Yingying, Li Zhenzhen, Feng Linyan#, Zhang Tongyi# (* equal contribution, # corresponding author)
Aggregate 2025
We apply an interpretable AL strategy to efficiently optimize the photothermal conversion efficiency (PCE) of carbon dots (CDs) in photothermal therapy (PTT). Using this approach, we successfully synthesized iron-doped CDs (Fe-CDs) with PCE exceeding 78.7% after only 16 experimental trials over four iterations.

Cao Bin*, Liu Yang*, Zheng Zinan*, Tan Ruifeng, Li Jia#, Zhang Tongyi# (* equal contribution, # corresponding author)
International Conference on Learning Representations (ICLR) 2025 Top-tier AI conference
We developed a novel XRD simulation method that incorporates comprehensive physical interactions, resulting in a high-fidelity database. SimXRD comprises 4,065,346 simulated powder XRD patterns, representing 119,569 unique crystal structures under 33 simulated conditions that reflect real-world variations. We benchmark 21 sequence models in both in-library and out-of-library scenarios and analyze the impact of class imbalance in long-tailed crystal label distributions. Remarkably, we find that: (1) current neural networks struggle with classifying low-frequency crystals, particularly in out-of-library situations; (2) models trained on SimXRD can generalize to real experimental data.

Li Tianliang*, Cao Bin*, Su Tianhao*, Lin Lixing, Wang Dong, Liu Xinting, Wan Haoyu, Ji Haiwei, He Zixuan, Chen Yingying, Feng Lingyan#, Zhang Tongyi (* equal contribution, # corresponding author)
Small 2024
A novel ML model, termed the sequential backward Tree-Classifier for Gaussian Process Regression (TCGPR), is proposed to improve data pattern recognition following the divide-and-conquer principle.

Su Tianhao*, Cao Bin*, Hu Shunbo, Li Musen, Zhang Tongyi# (* equal contribution, # corresponding author)
Journal of Materials Informatics 2024
In this work, we present a crystal generative framework based on a Wyckoff generative adversarial network (CGWGAN) to efficiently discover novel crystals.

Zhang Shouyang*, Cao Bin*, Su Tianhao, Wu Yue, Feng Zhenjie, Xiong Jie#, Zhang Tongyi# (* equal contribution, # corresponding author)
IUCrJ 2024
In this work, we developed a machine learning phase identifier that achieved excellent performance for structure identification from powder diffraction patterns.

Cao Bin, Su Tianhao, Yv Shuting, Li Tianyuan, Zhang Taolue, Dong Ziqiang#, Zhang Tongyi# (# corresponding author)
Materials & Design 2024
To facilitate materials informatics development, all active learning algorithms were made open-source in our framework, Bgolearn.

Ma Jiaxuan*, Cao Bin*, Dong Shuya, Tian Yuan, Wang Menghuan, Xiong Jie#, Sun Sheng# (* equal contribution, # corresponding author)
npj Computational Materials 2024
We developed MLMD, an AI platform for materials design. It supports end-to-end discovery of novel materials with promising advanced properties through model inference and surrogate optimization, and it uses active learning to remain effective even when data are scarce.

Wei Qinghua*, Cao Bin*, Yuan Hao*, Chen Youyang, You Kangdong, Yv Shuting, Yang Tixin, Dong Ziqiang#, Zhang Tongyi# (* equal contribution, # corresponding author)
npj Computational Materials 2023
In general, the data are small in size and big in noise, while the design space is huge; … by a newly developed data preprocessing algorithm, named the Tree-Classifier for Gaussian Process Regression (TCGPR)…

Cao Bin, Yang Shuang, Sun Ankang, Dong Ziqiang#, Zhang Tongyi# (# corresponding author)
Journal of Materials Informatics 2022 Cover Paper & 2024 Best Paper Award
In this study, we propose a domain knowledge-guided interpretive machine learning strategy and demonstrate it by studying the oxidation behavior of ferritic-martensitic steels in supercritical water…