CAO Bin 曹斌

I am engaged in AI4CM (AI for Computational Materials) research, focusing on crystallography and spectroscopy (http://www.caobin.asia). My research primarily includes physics-based diffraction pattern simulation, machine learning representations in spectrum-based sequence models, and crystal-based graph structures. My main areas of study are:

  • Crystal structure representation for downstream property prediction and generation.
  • Spectrum representation for crystal structure and symmetry identification.

In addition, I promote active learning applications in materials science by developing Bgolearn, a Bayesian optimization package. I collaborate with experimental research teams and continue to improve Bgolearn to advance the application of mature machine learning techniques in the materials community.

I am passionate about open science and strongly advocate for the unrestricted dissemination of knowledge. To support this vision, I share all code from my research to ensure transparency and accessibility.

Currently, I am pursuing my studies at HKUST(GZ) under the supervision of Professor Zhang Tong-yi.

📝 Publications

ICLR 2025

SimXRD-4M: Big Simulated X-ray Diffraction Data and Crystal Symmetry Classification Benchmark

Bin Cao, Yang Liu, Zinan Zheng, Ruifeng Tan, Jia Li, Tong-yi Zhang

SimXRD-4M

  • We developed a novel XRD simulation method that incorporates comprehensive physical interactions, resulting in a high-fidelity database.
SMALL 2024

Machine Learning-Engineered Nanozyme System for Synergistic Anti-Tumor Ferroptosis/Apoptosis Therapy

Tianliang Li, Bin Cao (co-first author), Tianhao Su, …, Lingyan Feng, Tong-yi Zhang

TCGPR+Bgolearn

  • A novel ML model, termed the sequential backward Tree-Classifier for Gaussian Process Regression (TCGPR), is proposed to improve data pattern recognition following the divide-and-conquer principle.
JMI 2024

CGWGAN: crystal generative framework based on Wyckoff generative adversarial network

Tianhao Su, Bin Cao (co-first author), Shunbo Hu, Musen Li, Tong-yi Zhang

Crystal Generative Framework

  • In this work, we present a crystal generative framework based on Wyckoff generative adversarial network (CGWGAN) to efficiently discover novel crystals.
arXiv 2024

SimXRD-4M: Big Simulated X-ray Diffraction Data Accelerate the Crystalline Symmetry Classification

Bin Cao, Yang Liu, Zinan Zheng, Ruifeng Tan, Jia Li, Tong-yi Zhang

Database & Benchmark

  • In this work, we present a large open-source dataset of powder XRD patterns designed for symmetry identification. We assess 21 existing ML models, summarize the characteristics of XRD sequence data, and offer suggestions for developing ML models better suited to analyzing XRD patterns.
IUCrJ 2024

Crystallographic Phase Identifier of a Convolutional Self-Attention Neural Network (CPICANN) on Powder Diffraction Patterns

Shouyang Zhang, Bin Cao (co-first author), Tianhao Su, Yue Wu, Zhenjie Feng, Jie Xiong, Tong-Yi Zhang

Phase

  • In this work, we developed a machine learning phase identifier that achieved excellent performance within a relatively small scope.
M&D 2024

Active Learning Accelerates the Discovery of High Strength and High Ductility Lead-Free Solder Alloys

B Cao, T Su, S Yu, T Li, T Zhang, Z Dong, TY Zhang

Project

  • To facilitate materials informatics development, all active learning algorithms were made open-source in our designed framework, Bgolearn…
NPJ 2024

MLMD: a programming-free AI platform to predict and design materials

Jiaxuan Ma, Bin Cao (co-first author), Shuya Dong, Yuan Tian, Menghuan Wang, Jie Xiong, Sheng Sun

Project

  • We developed MLMD, an AI platform for materials design. It can discover novel materials with promising properties end-to-end using model inference and surrogate optimization, and it remains usable when data are scarce by relying on active learning.
NPJ 2023

Divide and conquer: Machine learning accelerated design of lead-free solder alloys with high strength and high ductility

Qinghua Wei, Bin Cao (co-first author), Hao Yuan (co-first author), Youyang Chen, Kangdong You, Shuting Yu, Tixin Yang, Ziqiang Dong, Tong-Yi Zhang

Project

  • The data are, in general, small in size and big in noise, while the design space is huge. This challenge is handled by a newly developed data preprocessing algorithm, named the Tree-Classifier for Gaussian Process Regression (TCGPR)…
JMI 2022

(Cover paper) Domain knowledge-guided interpretive machine learning: formula discovery for the oxidation behavior of ferritic-martensitic steels in supercritical water

Bin Cao, Shuang Yang, Ankang Sun, Ziqiang Dong, Tong-Yi Zhang

Project

  • In this study, we propose a domain knowledge-guided interpretive machine learning strategy and demonstrate it by studying the oxidation behavior of ferritic-martensitic steels in supercritical water…