IBM has teamed up with NASA to build a suite of transformer-based language models designed to give the scientific and academic community access to vast amounts of scientific knowledge.
The IBM-NASA models were trained on 60 billion tokens from a collection of astrophysics, earth science, planetary science, heliophysics and biological and physical sciences data.
Members of the scientific and academic community can use the open-source version of the models available on Hugging Face to support various use cases, such as data classification, entity extraction, question answering and information retrieval.
The trained encoder model can be fine-tuned to support non-generative language tasks and can produce embeddings for document retrieval in retrieval-augmented generation, or RAG, pipelines.
The IBM-NASA team also developed a retriever model on top of the encoder model to produce embeddings that map the similarity between text pairs.
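The embeddings produced by such a retriever can be compared directly to score how similar two texts are and to rank documents against a query. The sketch below illustrates that pattern with cosine similarity; the placeholder vectors stand in for real encoder output, and the function names are illustrative, not part of the IBM-NASA release.

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors:
    # dot product divided by the product of their magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_emb, doc_embs):
    # Return document indices ordered from most to least
    # similar to the query embedding.
    scores = [cosine_sim(query_emb, d) for d in doc_embs]
    return sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)

# Placeholder embeddings; a real pipeline would obtain these
# from the encoder/retriever model instead.
query = [0.9, 0.1, 0.0]
docs = [
    [0.1, 0.9, 0.0],  # off-topic
    [0.8, 0.2, 0.1],  # close match
    [0.0, 0.0, 1.0],  # unrelated
]
print(rank_documents(query, docs))  # → [1, 0, 2]
```

In a retrieval setting, the top-ranked documents would then be passed to a generative model as context, which is the core of the RAG approach mentioned above.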
IBM’s geospatial foundation model, built with its watsonx platform from NASA’s satellite data, is also available on Hugging Face to expand the use of Earth science data for geospatial intelligence and advance climate-related innovation.