Microsoft working on an LLM to take on Gemini, GPT-4

Codenamed MAI-1, the new LLM reportedly has 500 billion parameters.


Microsoft is reportedly working on a new large language model (LLM) to take on Google’s Gemini and OpenAI’s GPT-4.

Codenamed MAI-1, the new LLM is currently in development under Mustafa Suleyman, co-founder of Google DeepMind and Inflection AI, The Information reported, citing two sources.

Suleyman joined Microsoft in March along with Karén Simonyan, a fellow co-founder of Inflection AI, to lead the company’s Copilot effort, according to a blog post authored by Microsoft Chief Executive Satya Nadella.

Microsoft also paid $650 million to Inflection AI to license its software. Suleyman, Simonyan, and other Inflection AI staff joined Microsoft as part of the same deal.

While the sources cited by The Information didn’t reveal the purpose behind building the 500-billion-parameter LLM, they said the new LLM could be introduced at the company’s Build conference later this month.

Reportedly, the company is dedicating a huge amount of computing resources to training the model, and its training data includes data from the internet and data generated by GPT-4.

To put that in context, OpenAI’s GPT-4 reportedly has 1.76 trillion parameters, and the company spent over $100 million on compute resources to train it.

While Microsoft may be working on the behemoth model, the company last month launched a new family of small language models (SLMs), the Phi-3 family, as part of its plan to make lightweight yet high-performing generative AI technology available across more platforms, including mobile devices.

The Phi-3 family consists of three models: the 3.8-billion-parameter Phi-3 Mini, the 7-billion-parameter Phi-3 Small, and the 14-billion-parameter Phi-3 Medium.

The last few months have seen a flurry of LLM announcements from multiple vendors, including Snowflake, Databricks, Cohere, Mistral, Anthropic, Meta, Google, and AWS.

Snowflake launched its Arctic LLM, and Databricks launched its DBRX model. Separately, Meta released its Llama 3 model, and just days later Cohere launched iterations of its Command family of models.

Copyright © 2024 IDG Communications, Inc.