Google has unveiled Gemini Live, a voice conversation AI that competes with ChatGPT’s Advanced Voice Mode. It promises more natural interaction, with multimodal capabilities to follow.
Google is introducing Gemini Live, a voice conversation feature for its AI assistant Gemini, to compete with OpenAI’s new Advanced Voice Mode for ChatGPT. Gemini Live, introduced at the 2024 Made by Google event, is available to Gemini Advanced subscribers.
OpenAI vs. Google: Gemini Live to Compete with ChatGPT Voice Mode
Google recently announced the launch of Gemini Live in a thread on X, positioning it against OpenAI’s recently released Advanced Voice Mode for ChatGPT.
The feature was introduced at the 2024 Made by Google event and is now available to Gemini Advanced users. Its goal is to make interaction with the AI more fluid and flexible: users can interrupt, switch to another subject, or resume the conversation at any point, much as they would on a phone call.
Google says the new speech engine produces coherent, emotionally expressive, and natural-sounding dialogue across multiple turns, and presents this as a distinguishing feature. The AI can follow the user’s speech in real time, and it offers ten natural-sounding voices. The hands-free mode keeps running in the background or when the phone is locked, so users can do other things without interrupting the conversation.
Action to Improve AI Interaction
The Gemini 1.5 Pro and Gemini 1.5 Flash models behind the assistant support long, intricate conversations by using a broader context window than other generative AI models. This allows Gemini Live to sustain longer conversations and retain more of what has been said.
In addition to voice commands, Google has confirmed that Gemini Live will gain multimodal input, first demonstrated at Google I/O 2024, by the end of the year. These capabilities will let the AI understand and respond to visual input, such as images and videos, making it more versatile. For now, the update is available only on Android devices and only in English. Support for additional languages and iOS devices is expected soon.
Google is also introducing additional features and integrations with its services as it rolls out Gemini Live. New extensions for Google applications, including Calendar, Keep, Tasks, and YouTube Music, will be added to Gemini in the coming weeks. These extensions will let users manage their schedules, set reminders, and create playlists more easily with voice commands.
Android users will also soon be able to bring up Gemini on top of any app by using the power button or a voice command. This will let them interact with Gemini from within other applications, asking questions or generating content, such as an image, that can be easily integrated into their work.
OpenAI Faces Obstacles in its Advanced Voice Mode
Although it is a novel concept, the Advanced Voice Mode for ChatGPT, OpenAI’s entry in its contest with Google, has encountered some challenges during its limited alpha testing. The mode, which aims to provide more natural conversation, has been criticized because its realistic voice interactions may cause users to become dependent on the AI.
In response, OpenAI recently raised a safety concern of its own, highlighting the potential for users to form social relationships with the AI, which could negatively affect their relationships with other people.
Separately, OpenAI has been working to improve the software engineering capabilities of its AI models. As part of that effort, it recently introduced a human-validated subset of the SWE-bench benchmark that more accurately assesses the ability of AI models to resolve real-world software issues. The step is meant to help ensure that AI advances are safe and useful in practical applications.