Google Whisk AI: Overview, Functionality, and Differences from Gemini

Google has launched an AI experiment called Whisk that takes a different approach to generating and visualizing images with generative AI. Instead of requiring long, detailed text prompts, Whisk lets users start from images. According to Google, users simply drag and drop their images to begin generating new visuals. This article explains how Whisk works, where it is available, and how it remixes images.

Google Whisk uses Gemini and Imagen 3 models behind the scenes

Google Whisk is not a new AI model; it is a tool that uses both Google Gemini and Google Imagen 3 to generate images. To use images as prompts, users provide three of them: one representing the subject, one depicting the scene, and one conveying the desired style. Whisk then remixes these images, blending all three into a unique image that users can consider their own creation.

In the background, Gemini analyzes the submitted images and writes a detailed text prompt describing them. That prompt is then fed into Google's Imagen 3 image generator, which produces the final image.
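The flow described above (three images, a text description of each, one combined prompt, one generated image) can be illustrated with a minimal Python sketch. This is not Google's implementation: `describe_image` and `generate_image` are hypothetical stand-ins for the Gemini and Imagen 3 steps, the file names are invented, and the stubs return placeholder strings so the example runs offline.

```python
from dataclasses import dataclass


@dataclass
class WhiskInputs:
    """The three reference images a user supplies (file paths here)."""
    subject: str
    scene: str
    style: str


def describe_image(path: str, role: str) -> str:
    """Hypothetical stand-in for the Gemini captioning step.

    A real system would send the image to a multimodal model; this stub
    only returns a placeholder so the sketch runs without any service.
    """
    return f"[description of {role} from {path}]"


def build_prompt(inputs: WhiskInputs) -> str:
    """Combine the three image descriptions into one text prompt."""
    subject = describe_image(inputs.subject, "subject")
    scene = describe_image(inputs.scene, "scene")
    style = describe_image(inputs.style, "style")
    return f"Subject: {subject}\nScene: {scene}\nStyle: {style}"


def generate_image(prompt: str) -> str:
    """Hypothetical stand-in for the Imagen 3 generation step."""
    return f"<image generated from {len(prompt)}-character prompt>"


# Invented example file names, for illustration only.
inputs = WhiskInputs("toy_robot.png", "beach.png", "watercolor.png")
prompt = build_prompt(inputs)
print(prompt)
print(generate_image(prompt))
```

The point of the sketch is that the image generator never sees the original pictures, only the text written about them.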

Your images could differ slightly from the reference material

Google acknowledges that Whisk captures only the essence of your subject rather than producing an exact replica. It extracts only certain characteristics from your image, which is why the results may differ from what you expect. For instance, Google notes that the generated subject could differ in height, weight, hairstyle, or even skin tone. Because these attributes may matter to your project, Google lets you view and edit the prompts.
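Because the intermediate step is plain text, correcting a detail that drifted amounts to editing that text before the image is regenerated. The sketch below assumes a made-up auto-generated description and a hypothetical `apply_edits` helper; neither comes from Whisk itself.

```python
def apply_edits(description: str, edits: dict[str, str]) -> str:
    """Replace each unwanted phrase in a description with the user's wording."""
    for old, new in edits.items():
        description = description.replace(old, new)
    return description


# Invented example of a description that got the hairstyle wrong.
auto_description = "a woman with short brown hair, wearing a red coat"

# The user corrects the detail that matters to their project.
edited = apply_edits(auto_description, {"short brown hair": "long black hair"})
print(edited)
```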

How is Whisk different from making images using Google Gemini?

To begin with, generating images in Gemini, which uses Imagen 3 for image generation, requires a long, detailed text prompt to get the desired result. Even then, there is no guarantee that the AI will interpret the request accurately or that the prompt will fully convey the intended image.

In contrast, Whisk simplifies image generation by letting users work with existing images. If you have specific references in mind, you can use those images to create a composite or remix. This is more direct than the traditional method of crafting text-based prompts.

Google Whisk: Availability

Google Whisk is currently available only in the United States, so users in India and other countries cannot access it yet. Interested individuals in the US can try it by following this link.
