- All Gemini models are capable of processing and using more than just words. They were pre-trained and fine-tuned on a variety of audio, images and videos, a large code base and text in different languages.
- Gemini’s applications and models are completely separate from Imagen 2, Google’s text-to-image model that’s available in some of the company’s development tools and environments.
- Because Gemini models are multimodal, they can theoretically perform a range of multimodal tasks.
Google is trying to make waves with Gemini, its flagship suite of generative AI models, applications and services. But while Gemini looks promising in some ways, our informal review found that it performed poorly in others. So what is Gemini? How do you use it? And how does it stack up against its competitors?
What is Gemini?
Gemini is Google’s long-promised family of next-generation GenAI models, developed by Google’s AI research lab DeepMind and Google Research. It comes in three models:
- Gemini Ultra, the flagship Gemini model
- Gemini Pro, a “lite” Gemini model
- Gemini Nano, a smaller “stripped-down” model that runs on mobile devices like the Pixel 8 Pro
All Gemini models are trained to be “natively multimodal” – in other words, able to process and use more than just words. They were pre-trained and fine-tuned on a variety of audio, images and videos, a large code base and text in different languages. This sets Gemini apart from models such as Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything other than text (for example, articles, email drafts), but that isn’t the case with Gemini models.
What’s the difference between the Gemini apps and the Gemini models?
Once again demonstrating its lack of branding skills, Google didn’t make it clear from the outset that Gemini the model is separate from the Gemini app (formerly Bard) on the web and mobile. The Gemini app is simply an interface through which certain Gemini models can be accessed; think of it as a client for Google’s GenAI.
As a side note, the Gemini apps and models are also completely separate from Imagen 2, Google’s text-to-image model that’s available in some of the company’s development tools and environments. Don’t worry, you’re not the only one who’s confused.
What can Gemini do?
Because Gemini models are multimodal, they can in theory perform a range of multimodal tasks, from transcribing speech to captioning images and videos to generating artwork. These capabilities have not yet reached the production stage (more on that later), but Google promises all of them, and more, in the near future. Google fell badly short of expectations with its initial Bard launch. More recently, the company released a video purporting to demonstrate Gemini’s capabilities, which turned out to be heavily doctored and more or less aspirational.