HodentekHelp: Does Gemini AI create the image from text prompt?

Monday, January 27, 2025

Does Gemini AI create the image from text prompt?

Creating an image with Gemini AI is straightforward. Simply provide a textual description of the image you need. While Gemini AI can create and display the image, it does not store it. You can download the image to the Downloads folder on a Windows computer or share it using the built-in share icon.

Here is an example of a text description used to create an image with Gemini AI in Chrome:

Create an icon with the letter H decorated in yellow, with its insides filled with a web of electronic circuits. The background should be black, 24-bit color with 8-bit transparency, and the format should be PNG.

At first, it did not create a PNG file. Perhaps due to size constraints, it used the JPG format instead. Repeated requests for a PNG file were not successful. Most AI agents have this habit of repeating what they did before instead of refining the result according to the user's wishes. Sometimes they stay fixated on their earlier result across a couple of later sessions.
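One way to confirm which format was actually delivered, regardless of the file extension, is to look at the first bytes of the downloaded file. The following is a minimal sketch and was not part of the original test. The function name sniff_image_format is hypothetical, and it recognizes only the PNG and JPEG signatures.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # fixed 8-byte signature that starts every PNG file
JPEG_PREFIX = b"\xff\xd8\xff"          # JPEG files start with the SOI marker FF D8, then FF


def sniff_image_format(path):
    """Guess the image format from the leading bytes, ignoring the extension."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header == PNG_SIGNATURE:
        return "PNG"
    if header.startswith(JPEG_PREFIX):
        return "JPEG"
    return "unknown"


# Example (hypothetical file name):
# print(sniff_image_format("downloaded_icon.png"))
```

If the function reports JPEG for a file that was requested as PNG, the transparency part of the request cannot have been honored, because JPEG has no alpha channel.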

Other AI agents, such as Meta AI on WhatsApp and Copilot, also have the capability to create images from text. Many other AI agents offer similar functionality.

Here is the image created by Gemini AI.

The image more or less follows the description. Small refinements may sometimes lead to a totally different image rather than an iteration on the previous one.

I wanted to check the other parts of the image description. It is possible to check the image file using PIL and Python:

from PIL import Image

def verify_image_properties(image_path):
  """
  Verifies the color and transparency of an image using PIL.

  Args:
    image_path: Path to the image file.

  Returns:
    A tuple containing:
      - True if the center pixel is yellow (a rough proxy for the dominant color), False otherwise.
      - True if the image has 24-bit color, False otherwise.
      - True if the image has 8-bit transparency, False otherwise.
  """

  try:
    img = Image.open(image_path)


    # Check color (simplified approximation): sample only the center pixel.
    # Converting to 'RGB' first guarantees a 3-value tuple for any image mode.
    dominant_color = img.convert('RGB').getpixel((img.width // 2, img.height // 2))
    is_yellow = (dominant_color[0] > 200) and (dominant_color[1] > 200) and (dominant_color[2] < 50)

    # Check color depth: 'RGB' and 'RGBA' both store 8 bits per color channel (24-bit color)
    is_24_bit_color = img.mode in ('RGB', 'RGBA')

    # Check transparency depth: mode 'RGBA' carries an 8-bit alpha channel
    has_8_bit_transparency = img.mode == 'RGBA'

    return is_yellow, is_24_bit_color, has_8_bit_transparency

  except Exception as e:
    print(f"Error processing image: {e}")
    return False, False, False

# Example usage
image_path = r'C:\Users\hoden\PycharmProjects\exploreImage\Images\GeminiG.jpg'  # Replace with the actual path

is_yellow, is_24_bit_color, has_8_bit_transparency = verify_image_properties(image_path)

if is_yellow:
  print("Image is predominantly yellow.")
else:
  print("Image is not predominantly yellow.")

if is_24_bit_color:
  print("Image has 24-bit color.")
else:
  print("Image does not have 24-bit color.")

if has_8_bit_transparency:
  print("Image has 8-bit transparency.")
else:
  print("Image does not have 8-bit transparency.")

The code returns the following:

C:\Users\hoden\AppData\Local\Programs\Python\Python312\python.exe C:\Users\hoden\PycharmProjects\exploreImage\Images\VerifyImage.py

Image is not predominantly yellow.

Image has 24-bit color.

Image does not have 8-bit transparency.

Process finished with exit code 0

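Did Gemini AI create an image to fit our description?

It is true that the image is not predominantly yellow, and checking only the color at the center of the image is perhaps the wrong approach. The saved file is a JPG, and the JPEG format has no alpha channel, so the missing 8-bit transparency is expected.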

Other methods may yield a better result than the Center Pixel Method used in the code for detecting the visible yellow color:

Center Pixel Method: This method checks the color of the pixel at the center of the image. It's simple but may not always represent the overall dominant color.

Most Common Color: This method counts the occurrences of each color and identifies the most frequent one. It can be effective but might not capture the visually dominant color if the image has a lot of background noise.

K-Means Clustering: This method groups similar colors together and identifies the most visually impactful color. It's more sophisticated and can provide a better representation of the dominant color but requires more computational resources.
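The Most Common Color and K-Means ideas can be sketched in plain Python over a list of (R, G, B) tuples. This is a minimal sketch under assumptions, not the code that produced the result above. The function names, the bucket size step, and the default k are illustrative choices. The toy k-means picks its starting centers from the distinct colors present, so k must not exceed the number of distinct colors.

```python
from collections import Counter
import random


def most_common_color(pixels, step=32):
    """Most frequent color after rounding each channel down to a multiple of `step`.

    Rounding groups near-identical shades (e.g. JPEG noise) into one bucket.
    """
    buckets = Counter(
        (r // step * step, g // step * step, b // step * step)
        for r, g, b in pixels
    )
    return buckets.most_common(1)[0][0]


def _sq_dist(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_colors(pixels, k=3, iterations=10, seed=0):
    """Toy k-means over RGB tuples; returns (center, cluster_size) pairs, largest first."""
    pixels = list(pixels)
    rng = random.Random(seed)
    # Assumption: start from k distinct colors so no two centers coincide.
    centers = rng.sample(sorted(set(pixels)), k)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every pixel to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(range(k), key=lambda i: _sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centers[i] = tuple(sum(ch) / len(members) for ch in zip(*members))
    sizes = [len(c) for c in clusters]
    return sorted(zip(centers, sizes), key=lambda cs: cs[1], reverse=True)


# Toy data standing in for an icon: 60 black background pixels, 40 yellow pixels.
sample = [(0, 0, 0)] * 60 + [(250, 210, 10)] * 40
foreground = [p for p in sample if p != (0, 0, 0)]  # drop the known background
```

On this toy data, most_common_color(sample) picks the black background, while most_common_color(foreground) lands in a yellow bucket, which is why removing a known background before counting can matter. With Pillow, a pixel list could be obtained with something like list(img.convert('RGB').getdata()); shrinking the image first keeps the pure-Python k-means loop tolerable.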
