Shyue Ping Ong’s Post

Materials Scientist, Data Scientist, Programmer

Why Nature Portfolio should never have published the Google AI for Materials article --- I guess most people have now heard about the Google AI for Materials paper out in Nature. I waited a full day to decide if I wanted to comment. I generally do not make public criticisms of published work, as I do not relish the role of a critic. However, this topic is close to my heart, and I feel I cannot stay silent. In my opinion, Nature should never have published the Google paper, because the work violates FAIR (Findable, Accessible, Interoperable, Reusable) data principles.

Let me first state outright that, like many others, I think the model performance metrics are impressive. Given my extremely high regard for the authors, I have no reason to doubt the results, and I believe the work itself is sound. However, hype aside (I deliberately linked an Economist article to show how bad the hype is), the main performance improvements were obtained via brute-force data generation, and the model architectures are several years old. Given this, one would place an enormous emphasis on the data.

However, **Google has decided not to share the data used to generate the model, or even the final model itself**. The only data shared are the final stable crystals identified by the model, which does not allow one to reproduce the model. In other words, you can only take Google's word that the energy error is 11 meV/atom, and there is not a single group out there (my own included) that can verify that number. No one can even test or use the model on any other system. I think there have been enough science scandals in recent memory (superconductivity springs to mind) for reproducibility to be a **minimum** requirement for any published work, much less one published in Nature.
In fact, Nature's publishing guidelines state that "a condition of publication in a Nature Portfolio journal is that authors are required to make materials, data, code, and associated protocols promptly available to readers without undue qualifications." Apparently, the interpretation of this rule is flexible to the point of being useless. While I think it is important that companies like Google participate in the scientific process, they must be held to the same standards of rigor. A work that is impossible to verify cannot be considered science by any standards. As an aside, my group has always walked the talk on FAIR data and code - all our published works come with all the data and open-source code necessary to reproduce our results. There are no exceptions. #ai #ml #materialsscience

A Google AI has discovered 2.2m materials unknown to science

economist.com

Ekin Dogus Cubuk

Research Scientist at Google DeepMind

5mo

We understand the concerns and appreciate the feedback. We will do our best to improve on what we have shared already. I do think it is unfair not to acknowledge that the main result of the paper, 381k stable materials, was shared immediately. This is a ~10x increase in the number of stable materials available online. Today. This is the main result of the paper, and it can easily be reproduced by running your own DFT on subsets and constructing a convex hull. We used MP settings, with consistent help from and meetings with the MP team, to make sure our computed discoveries will be comparable when ready. But as I said, we understand, and we will do our best.

Hi Shyue, I was sad to see this post from you, as it seemed to take a different perspective on our work compared to the times you reached out to me asking to collaborate. I wish you had reached out to me first with your questions and concerns; I would have been happy to explain the methodological advances in the paper, since you seem to think there were none. First of all, it is unfair to think a paper can improve the number of stable materials, and the efficiency with which to find them, by an order of magnitude without any methodological advances, but let me describe some anyway.

Did they really not give enough info to reproduce, or is it simply the case that reproducing requires brute force? If, as you say, the methods are not new and only the results are, then this is no different from a synchrotron paper that publishes only the result and not the data; it just requires a lot of money to reproduce. That being said, sharing data and final models when practical advances the pace of science. But that is not strictly a requirement for peer-reviewed science; you are asking for something beyond it. I don't have any objection to what you advocate for, and you raise a good point in bringing up the Nature guidelines; I am just noting that this is not unusual. It is, in my experience, very common for such journal requirements to be flouted by well-respected groups, and then used as an excuse to reject papers from less established groups. I think this dialogue about what data can be made available easily and cheaply, and whether that should actually be required for publication, is a good one. In the past, I have advocated for making the publication of model parameters a requirement for certain types of papers.
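[Editorial note: the "run your own DFT on subsets and construct a convex hull" check described above reduces, for a binary system, to a short computation: an entry is "stable" when its formation energy lies on the lower convex hull of energy versus composition. A minimal sketch with made-up formation energies (none of these numbers, names, or thresholds come from the paper):]

```python
# Illustrative sketch: the "energy above hull" stability test for a
# hypothetical binary A-B system. All formation energies (eV/atom)
# below are invented for illustration only.

def lower_hull(points):
    """Lower convex hull of (x, energy) points via Andrew's monotone chain."""
    pts = sorted(points)
    hull = []
    for p in pts:
        # Pop the last point while it lies on or above the segment from
        # hull[-2] to the new point p (i.e. it is not a lower-hull vertex).
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            cross = (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def energy_above_hull(x, e, hull):
    """Vertical distance of (x, e) above the lower hull, in eV/atom."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            e_hull = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
            return e - e_hull
    raise ValueError("composition outside the hull's range")

# Hypothetical entries: (fraction of B, formation energy in eV/atom).
entries = {"A": (0.0, 0.0), "AB": (0.5, -0.6), "AB3": (0.75, -0.2), "B": (1.0, 0.0)}
hull = lower_hull(entries.values())
for name, (x, e) in entries.items():
    print(f"{name}: {energy_above_hull(x, e, hull):+.3f} eV/atom above hull")
```

[In this toy data, AB sits on the hull (stable), while AB3 lies about 0.1 eV/atom above it (metastable). Real workflows, e.g. pymatgen's `PhaseDiagram`, do the same check in higher-dimensional composition spaces.]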

Taylor Sparks

Associate Prof of Materials Science & Engineering and co-host of Materialism Podcast

5mo

Preach! This was exactly my sentiment when I gave my opinion on it to Wired magazine. https://www.wired.com/story/an-ai-dreamed-up-380000-new-materials-the-next-challenge-is-making-them/

Josiah Roberts

Postdoctoral Fellow at University at Buffalo

5mo

Please note that this is the author who published M3GNet, a direct competitor to the Google paper, last month. Dr. Ong's criticisms are important, but also likely motivated by competition. In any case, we need to remember to be skeptical of all attempts to publish universal computational models, or any other claim to have "solved" all of inorganic chemistry.

Ridwan Sakidja

Matthew & Patricia Harthcock CNAS Faculty Fellow | Materials Science Program Director and MNAS Interdisciplinary Program Chair | Physics & Materials Science Professor

4mo

I think we need to take this with a grain of salt. While the impressive expansion of reportedly new compounds by DeepMind should still be welcomed, we must admit that none of the known databases, including perhaps GNoME itself, have attempted to assess the phase stability of line compounds beyond the ground state. As a metallurgist, I cringe at news of new "stable materials" being discovered when, in fact, a proper assessment of the thermodynamic stability of the line compounds at finite temperatures has not been done. Furthermore, the exploration of the roles of defect chemistry and alloying/solid-solution/interstitial effects on phase stability is often not even part of many, if not all, of the known databases. They are much harder to assess, I readily admit, but those are the experimental observables and probably the most useful information an exploratory campaign can bank on. Until we reach that point, we will continue to be guided by experiments and (semi)empirical thermodynamic models.

Guillaume Godin

Scientific Director Artificial Intelligence

5mo

As a reviewer of several AI articles in Nature's journals: if I cannot see the code and the model, I cannot do my job as a reviewer. So my review is done in one minute. No GitHub, no paper!

Chenru Duan

Research Scientist at MSFT Quantum | Ph.D. MIT Chem | AI4Science community builder & workshop organizer | crduan.com

5mo

Well said! The "AI safety" justification could probably be applied to ANY work involving machine learning, and depends so much on interpretation.

Khagesh Tanwar, PhD

Researcher in Experimental, Theoretical and Computational Condensed Matter Physics

4mo

It just doesn't make any sense. There are so many materials predicted by theorists using DFT or other quantum or classical simulations, which eventually have less than a 0.0001% success rate of being produced in the lab. One thing that is surely true is that simulations are not really good at predicting new materials, although, granted, we can get a lot of insight into existing materials, and sometimes predictions of new materials as well. I believe this research must be judged by completely different criteria, because the only novelty of the work is "a lot of new materials". Now, for instance, take one of those predicted materials, try to make it in the lab, and publish it; it may not even pass peer review at a high-quality (not necessarily Nature) journal. To me, this is not rigorous science in the fields of materials science, condensed matter physics, chemistry, or similar. But it is indeed truly novel research as a demonstration of the capability of AI-based systems.
