We have seen that some generative AI tools (Copilot, DALL-E 2, Midjourney, NightCafe Creator, etc.) seem to work with both text and images. However, AI's handling of text is a lot more advanced than its handling of images.
The difference between text and images is analogous to that between structured and unstructured data. Most textual information is highly ordered and structured, whereas image information is unstructured. Text can give a more intimate view of what it represents, and AI can even generate human-like text, but an image gives only a few attributes, such as shape, color, and size, and may not convey the full context, such as emotions. At this point we can say that text gives a higher contextual understanding than an image.
Image analysis using AI, commonly known as computer vision, has also made great advances, but many challenges remain. Research in deep learning and neural networks is narrowing this gap. Presently, AI models are getting better at object detection, image recognition, and generating images from text descriptions.
Here are some examples of images created from text:
Create a detailed picture of Ayodhya temple in India.
Copilot tried to create an image. I am not showing all of its attempts, as they did not remotely resemble a Hindu temple but rather something out of a fairy tale.
Here is what I finally got after goading:
Gemini tried as well, and these are the results. I did not repeatedly refine my text prompt to get them.
I do have an image from NightCafe and I will paste it here when I find it.