Google commenced its annual I/O developer conference today, traditionally a platform for unveiling new software updates and occasional hardware. This year, no new hardware was introduced, as Google had already announced the Pixel 8A phone. Instead, the keynote was a showcase of numerous AI software updates, highlighting Google's strategy to dominate the generative AI landscape.
Table of Contents
☑️ Gemini Steps Up
☑️ New Gemini Models
☑️ AI for Work and Study
☑️ AI for Convenience
☑️ AI for Creativity
☑️ AI for Safety
Gemini Steps Up
Google introduced enhancements to its on-device mobile large language model, now renamed Gemini Nano with Multimodality. According to CEO Sundar Pichai, this model can "turn any input into any output," meaning it can process text, photos, audio, web and social videos, and live video from a phone's camera to synthesize information and answer questions. A demonstration video showed the model scanning book titles on a shelf with a camera and recording them in a database.
Google also said that developers will have access to more computing power with Gemini than with other large language models (LLMs).
New Gemini Models
Google unveiled a new Gemini model and a new Gemini-powered assistant project, each optimized for different tasks:
Gemini 1.5 Flash
Google has added a new model to its lineup: Gemini 1.5 Flash. This multimodal model is lighter-weight than Gemini 1.5 Pro and is optimized for "narrow, high-frequency, low-latency tasks," which makes it better suited to generating fast responses. Google has also improved Gemini 1.5 Pro's translation, reasoning, and coding capabilities. Notably, it has doubled Gemini 1.5 Pro's context window from 1 million to 2 million tokens, so the model can take in twice as much information in a single request.
Project Astra
Project Astra, a visual chatbot, was also introduced. It extends the capabilities of Google Lens, allowing users to interact with their surroundings through their phone cameras. Users can ask questions about anything they point their camera at. A prerecorded demo showcased Astra's ability to understand spatial and contextual information, identifying locations, deciphering code on a computer screen, and even suggesting creative band names for pets. The demo highlighted Astra's voice-powered interactions using a phone's camera and a camera in unidentified smart glasses.
AI for Work and Study
Google showed off several features aimed at office work and at learning, for students as well as parents.
Workspace Suite of Office Tools
Google is integrating Gemini 1.5 Pro, its latest mainstream language model, into Workspace, enhancing tools like Docs, Sheets, Slides, Drive, and Gmail. Within Workspace, Gemini 1.5 Pro will act as a general-purpose assistant that can find and fetch information from any content stored in your Drive, no matter which Workspace app you are using. It will also handle tasks such as drafting emails with information from a document you are viewing, or reminding you to reply to an email you are reading. Some early testers already have access to these features, and Google plans to roll them out to all paid Gemini subscribers next month.
Circle to Search
Google is extending Circle to Search, its feature for Android phones and tablets, to help with math problems. When you circle a math problem on your screen, Google's AI provides a step-by-step breakdown of how to solve it rather than simply giving the answer. Google presents this approach as a way to help students understand how to solve problems on their own, rather than a shortcut for cheating on homework.
AI for Convenience
Google also showed off several features meant to make everyday life more convenient.
Ask Photos
This summer, Google is launching a new feature called Ask Photos that promises to be incredibly useful for anyone with an extensive collection of photos, spanning years or even decades. This feature allows users to pose questions about their Google Photos library, and Gemini, Google's AI, will search and retrieve relevant information and images. The functionality of Ask Photos extends beyond simple image recognition tasks like identifying dogs or cats. For instance, during Google's I/O 2024 keynote, CEO Sundar Pichai demonstrated the feature by asking Gemini for his license plate number. The AI responded with the number and also provided a corresponding image to verify its accuracy.
Gems
Google has announced the launch of Gems, a new feature that allows users to create custom chatbots within Gemini. Similar to OpenAI's GPTs, Gems enables users to tailor Gemini's responses and areas of expertise according to their specific needs. For instance, users can configure Gemini to act as a motivational running coach with daily inspirations and training plans, or as a dedicated calculus tutor. This feature will be available soon for Gemini Advanced subscribers.
Gemini Live
Additionally, Google is enhancing the conversational capabilities of Gemini with the introduction of Gemini Live. This new feature is designed to make voice interactions with Gemini more fluid and natural. The updates include giving the chatbot extra personality, the ability for users to interrupt it mid-sentence, and the capability to use the smartphone camera to view and provide information in real-time. Gemini will also feature new integrations with Google Calendar, Tasks, and Keep, leveraging its multimodal capabilities to, for example, add events from a flyer directly into your personal calendar. These improvements aim to make Gemini a more dynamic and helpful conversation partner.
An Evolution in Search
Google is set to enhance its search function by introducing AI Overviews, previously referred to as the "Search Generative Experience." This update, which rolls out across the US this week, uses a specialized Gemini model to curate and display summarized answers directly on the search results pages. The format is similar to what users might experience with AI search tools like Perplexity or Arc Search, providing concise, synthesized information drawn from across the web.
AI for Creativity
Google Labs, Google's experimental AI division, also showcased a suite of advanced tools that highlight the creative potential of AI.
VideoFX
One standout is VideoFX, a generative video tool built on Veo, Google DeepMind's video generation model. It lets users create 1080p videos from text prompts, giving them more flexibility in how they produce video.
Enhancements to ImageFX
Google has also upgraded ImageFX, its high-resolution image generator. The improved version is better at interpreting user prompts and rendering text within images, and it produces fewer unwanted digital artifacts than its predecessors.
Enhancements to MusicFX
During the presentation, Google also introduced DJ Mode in MusicFX. This feature lets musicians generate song loops and samples from specific prompts. DJ Mode was showcased in a lively performance by musician Marc Rebillet ahead of the I/O keynote.
These tools collectively represent Google's ongoing commitment to expanding the capabilities of AI in creative fields, making it easier for professionals and enthusiasts alike to generate high-quality digital media content.
AI for Safety
Among the keynote's final highlights were new security and safety features.
New Scam Detection Feature
Google unveiled a significant new security feature for Android: scam detection that monitors phone calls for language typical of scammers, such as requests to transfer funds. If it detects likely scam activity, the feature interrupts the call with an onscreen prompt advising the user to hang up. Importantly, the analysis runs directly on the device, so phone calls are not sent to the cloud, which helps preserve user privacy.
SynthID Watermarking Tool
Google has also updated SynthID, its watermarking tool for identifying AI-generated media. SynthID embeds a watermark that is invisible to the human eye but detectable by software that analyzes pixel-level data, which can help flag misinformation, deepfakes, and phishing attempts. The latest updates extend SynthID to content in the Gemini app and on the web, as well as to videos generated by Veo. Google plans to open-source SynthID later this summer, further supporting efforts to combat digital misinformation and fraud.
All content, layout, and frame code in all FoxData blog sections belongs to the original content and technical team. Any reproduction or reference must credit the source and include a link in a prominent position; otherwise, legal responsibility will be pursued.