Bin Cao (曹斌)
PhD Student
The Hong Kong University of Science and Technology (Guangzhou)
About Me

I am a PhD student at HKUST (Guangzhou), working with Prof. ZHANG Tong-Yi. My research focuses on artificial intelligence for materials science, with particular emphasis on algorithm development for crystallography and spectroscopy. During my PhD, I also served as a visiting student at City University of Hong Kong, working with Prof. REN Yang, and an intern at Shanghai AI Lab (AI4S team), working with Dr. HAO Hong Xia.

Before joining HKUST (Guangzhou), I earned an MPhil in Mechanics at Shanghai University, working with Prof. ZHANG Tong-Yi. During this period, I also interned at Zhejiang Laboratory, where I worked on transfer learning for materials science. I received my BEng in Chemical Machinery from Beijing University of Chemical Technology, where my work focused on finite element analysis and chemistry.

In recent years, I developed a series of machine-learning algorithms for crystal structure determination (XQueryer), crystal property prediction (PRDNet), and novel crystal discovery (SimXRD). My first-author papers have appeared in materials science journals (e.g., National Science Review and Science Bulletin) and AI conferences such as ICLR.

One of my long-term projects, Bgolearn, for which I serve as the principal developer, received support from the Shanghai Artificial Intelligence Open Source Award Project Support Plan. The project was awarded RMB 500,000 to support the development of the open-source Bgolearn platform. Outside of research, I enjoy jogging and watching movies.

Education
  • The Hong Kong University of Science and Technology (Guangzhou)
    AMAT Thrust
    Ph.D. Student
    Sep. 2023 - present (2026 expected)
  • City University of Hong Kong
    Department of Physics
    Visiting Research Student
    Jun. 2025 - Dec. 2025
  • Shanghai University
    MPhil in Mechanics
    Sep. 2016 - Jun. 2023
  • Beijing University of Chemical Technology
    B.S. in Chemical Machinery
    Sep. 2016 - Jun. 2020
Experience
  • Shanghai AI Lab
    Research Intern
    Jan. 2026 - Jul. 2026
  • Zhejiang Lab
    Research Intern
    Mar. 2023 - Sep. 2023
Honors & Awards
  • Invited Academic Talk by Promising Young Talent Award (CMC)
    2025
  • Outstanding Young Academic Presentation Award (CMC)
    2024
  • Outstanding Graduate of Shanghai University
    2023
  • China National Scholarship
    2022
News
2025
I completed a six-month research exchange at the Department of Physics, City University of Hong Kong, under the supervision of Prof. REN Yang.
Dec 30
I was invited to attend the CCF ChinaData 2025 Conference on AI for Science.
Dec 02
I am honored to have received two awards at the 2025 China Materials Conference: the Invited Academic Talk by Promising Young Talent award and the High-Level Academic Poster award.
Jul 06
Received the 2024 Best Paper Award of the Journal of Materials Informatics.
Apr 06
2024
Passed the Ph.D. qualifying examination at HKUST (GZ).
Nov 01
Outstanding Young Academic Presentation Award at the 2024 China Materials Conference.
Jul 01
2023
Embarked on a Ph.D. journey at The Hong Kong University of Science and Technology (Guangzhou).
Sep 01
Named Outstanding Graduate of Shanghai University (master's degree).
Jun 01
Contributed to the community by open-sourcing Bgolearn, a package for materials optimization.
Feb 01
2022
Graduated from BUCT with a bachelor's degree.
May 31
Selected Publications
SimXRD-4M: Big Simulated X-ray Diffraction Data and Crystal Symmetry Classification Benchmark

Cao Bin, Liu Yang#, Zhang Longhan, Wu Yifan, Luo Yuyu, Cheng Hong, Ren Yang#, Zhang Tongyi# (# corresponding author)

arXiv 2026 Under Review

We propose PRDNet, a novel architecture that integrates graph embeddings with a learned pseudoparticle diffraction module. It generates synthetic diffraction patterns that are invariant to crystallographic symmetries. We extensively evaluate PRDNet on multiple large-scale benchmarks, including Materials Project, JARVIS-DFT, and MatBench. Our model achieves state-of-the-art performance across a wide range of crystal property prediction tasks, demonstrating its effectiveness.

Spatial-adaptive active learning identifies ultra-durable and highly active catalysts for acidic oxygen evolution reaction

Cao Bin*, Qin Yin#, Luo Yan*, Ying Zhehan, Yan Zilin, Weng Tu-Tao, Li Kaikai#, Zhang Tongyi# (* equal contribution, # corresponding author)

Science Bulletin 2025

Here, we present a spatially adaptive active-learning framework with closed-loop experimentation for targeted catalyst optimization. Bayesian optimization and a conditional variational autoencoder first identify a low-overpotential stability subspace, followed by active learning to pinpoint the most stable candidate. This strategy leads to the discovery of a Cu–RuO₂ catalyst with outstanding durability (625 h) and a low overpotential of 177 mV at 10 mA cm⁻². Our results highlight an efficient AI-driven pathway for accelerating the design of stable acidic OER catalysts.

XQueryer: an intelligent crystal structure identifier for powder X-ray diffraction

Cao Bin, Zheng Zinan, Liu Yang, Zhang Longhan, Wong W-Y Lawrence, Weng Tu-Tao, Li Jia#, Li Haoxiang#, Zhang Tongyi# (# corresponding author)

National Science Review 2025

We developed XQueryer, an intelligent agent for simulating, recognizing, and analyzing powder X-ray diffraction (PXRD) patterns. Trained on over two million high-fidelity simulated spectra, XQueryer achieves significantly higher accuracy, 28.9% better than existing AI models and traditional methods. Integrated with a powder diffractometer, it enables real-time structural analysis of crystal samples.

Optimize the quantum yield of G‐quartet‐based circularly polarized luminescence materials via active learning strategy‐BgoFace

Li Tianliang*, Chen Lifei*, Cao Bin*, Liu Siyuan, Lin Lixing, Li Zeyu, Chen Yingying, Li Zhenzhen, Zhang Tongyi#, Feng Linyan# (* equal contribution, # corresponding author)

MGE Advances 2025

This work developed an integrated AL software, BgoFace, which satisfies most material property optimization requirements. The application of BgoFace (with default setting) successfully accelerated the discovery of G4-based CPL materials, achieving results within six iterations and synthesizing 24 experimental groups. The final QY nearly doubled the initial best QY in the training dataset.

Materials Generation in the Era of Artificial Intelligence: A Comprehensive Survey

Li Zhixun*, Cao Bin*, Jiao Rui*, Wang Liang*, Wang Ding, Liu Yang, Chen Dingshuo, Li Jia, Liu Yu, Wang Liang, Zhang Tongyi, Yu Xu Jeffrey (* equal contribution)

arXiv 2025

We first organize various types of materials and illustrate multiple representations of crystalline materials. We then provide a detailed summary and taxonomy of current AI-driven materials generation approaches. Furthermore, we discuss the common evaluation metrics and summarize open-source codes and benchmark datasets. Finally, we conclude with potential future directions and challenges in this fast-growing field.

Interpretable Active Learning Identifies Iron-Doped Carbon Dots With High Photothermal Conversion Efficiency for Antitumor Synergistic Therapy

Li Tianliang*, Cao Bin*, Wang Yitong, Lin Lixing, Chen Lifei, Su Tianhao, Song Haicheng, Ren Yuze, Zhang Longhan, Chen Yingying, Li Zhenzhen, Feng Linyan#, Zhang Tongyi# (* equal contribution, # corresponding author)

Aggregate 2025

We apply an interpretable AL strategy to efficiently optimize the photothermal conversion efficiency (PCE) of carbon dots (CDs) in photothermal therapy (PTT). Using this approach, we successfully synthesized iron-doped CDs (Fe-CDs) with PCE exceeding 78.7% after only 16 experimental trials over four iterations.

SimXRD-4M: Big Simulated X-ray Diffraction Data and Crystal Symmetry Classification Benchmark

Cao Bin*, Liu Yang*, Zheng Zinan*, Tan Ruifeng, Li Jia#, Zhang Tongyi# (* equal contribution, # corresponding author)

International Conference on Learning Representations (ICLR) 2025 Top tier AI conference

We developed a novel XRD simulation method that incorporates comprehensive physical interactions, resulting in a high-fidelity database. SimXRD comprises 4,065,346 simulated powder XRD patterns, representing 119,569 unique crystal structures under 33 simulated conditions that reflect real-world variations. We benchmark 21 sequence models in both in-library and out-of-library scenarios and analyze the impact of class imbalance in long-tailed crystal label distributions. Remarkably, we find that: (1) current neural networks struggle with classifying low-frequency crystals, particularly in out-of-library situations; (2) models trained on SimXRD can generalize to real experimental data.

Machine Learning-Engineered Nanozyme System for Synergistic Anti-Tumor Ferroptosis/Apoptosis Therapy

Li Tianliang*, Cao Bin*, Su Tianhao*, Lin Lixing, Wang Dong, Liu Xinting, Wan Haoyu, Ji Haiwei, He Zixuan, Chen Yingying, Feng Lingyan#, Zhang Tongyi (* equal contribution, # corresponding author)

Small 2024

A novel ML model, termed the sequential backward Tree-Classifier for Gaussian Process Regression (TCGPR), is proposed to improve data pattern recognition following the divide-and-conquer principle.

CGWGAN: crystal generative framework based on Wyckoff generative adversarial network

Su Tianhao*, Cao Bin*, Hu Shunbo, Li Musen, Zhang Tongyi# (* equal contribution, # corresponding author)

Journal of Materials Informatics 2024

In this work, we present a crystal generative framework based on Wyckoff generative adversarial network (CGWGAN) to efficiently discover novel crystals.

Crystallographic Phase Identifier of a Convolutional Self-Attention Neural Network (CPICANN) on Powder Diffraction Patterns

Zhang Shouyang*, Cao Bin*, Su Tianhao, Wu Yue, Feng Zhenjie, Xiong Jie#, Zhang Tongyi# (* equal contribution, # corresponding author)

IUCrJ 2024

In this work, we developed a machine learning phase identifier that achieved excellent performance for structure identification from powder diffraction patterns.

Active Learning Accelerates the Discovery of High Strength and High Ductility Lead-Free Solder Alloys

Cao Bin, Su Tianhao, Yv Shuting, Li Tianyuan, Zhang Taolue, Dong Ziqiang#, Zhang Tongyi# (# corresponding author)

Materials & Design 2024

To facilitate materials informatics development, all active learning algorithms were made open-source in our designed framework, Bgolearn.

MLMD: a programming-free AI platform to predict and design materials

Ma Jiaxuan*, Cao Bin*, Dong Shuya, Tian Yuan, Wang Menghuan, Xiong Jie#, Sun Sheng# (* equal contribution, # corresponding author)

npj Computational Materials 2024

We developed MLMD, an AI platform for materials design. It is capable of effectively discovering novel materials with high-potential advanced properties end-to-end, utilizing model inference, surrogate optimization, and even working in situations of data scarcity based on active learning.

Divide and conquer: Machine learning accelerated design of lead-free solder alloys with high strength and high ductility

Wei Qinghua*, Cao Bin*, Yuan Hao*, Chen Youyang, You Kangdong, Yv Shuting, Yang Tixin, Dong Ziqiang#, Zhang Tongyi# (* equal contribution, # corresponding author)

npj Computational Materials 2023

In general, small in size and big in noise, while the design space is huge, by a newly developed data preprocessing algorithm, named the Tree-Classifier for Gaussian Process Regression (TCGPR)….

Domain knowledge-guided interpretive machine learning: formula discovery for the oxidation behavior of ferritic-martensitic steels in supercritical water

Cao Bin, Yang Shuang, Sun Ankang, Dong Ziqing#, Zhang Tongyi# (# corresponding author)

Journal of Materials Informatics 2022 Cover Paper & 2024 Best Paper Award

In this study, we propose a domain knowledge-guided interpretive machine learning strategy and demonstrate it by studying the oxidation behavior of ferritic-martensitic steels in supercritical water…
