Description of the bug:
Hello
I followed the sample code for analyzing multiple images in the official docs.
One odd thing I found is that the client does not correctly identify the number of images uploaded, so the response only covers one of them.
These are the images I tested with (I think any images should reproduce this):
fire
lighthouse
tree
from PIL import Image
from google import genai

# Load the three test images
img1 = Image.open("fire.jpg")
img2 = Image.open("lighthouse.jpg")
img3 = Image.open("tree.jpg")
MODEL_NAME = 'gemini-2.0-flash-001'
client = genai.Client(api_key="My api key")
response = client.models.generate_content(
    model=MODEL_NAME,
    contents=['Describe each image', img1, img2, img3])
print(response.text)
The output only contains the last image:
The image shows a single, large tree standing in a green field against a bright blue sky with scattered white clouds. The tree has a full, round crown of green leaves. The trunk of the tree is thick and gray. The field is a vibrant green, and the sky is a clear blue with wispy white clouds. The image is well-lit and has a clear, crisp focus.
When I then ask how many images it sees:
response = client.models.generate_content(
    model=MODEL_NAME,
    contents=['How many images do you see?', img1, img2, img3])
print(response.text)
I see one image.
One workaround I found is to add a system prompt that forces it to consider all uploaded images before answering, but this feels very unnatural:
from google.genai import types

response = client.models.generate_content(
    model=MODEL_NAME,
    config=types.GenerateContentConfig(
        system_instruction="Consider all images uploaded by users before answering any question"),
    contents=['Describe each image', img1, img2, img3])
print(response.text)
The output:
Here are the descriptions of each image:
- A bonfire that appears to be made out of branches is ablaze, with flames reaching high into the sky.
- A lighthouse stands on a shore by the ocean, its light shining across the water under a purple sky with the setting sun.
- A large green tree stands tall in a field of grass under a bright blue sky with fluffy white clouds.
When asking for the number of images uploaded:
response = client.models.generate_content(
    model=MODEL_NAME,
    config=types.GenerateContentConfig(
        system_instruction="Consider all images uploaded by users before answering any question"),
    contents=['How many images do you see?', img1, img2, img3])
print(response.text)
I see 3 images.
Actual vs expected behavior:
I think it should know how many images to analyze without any further prompting. Having to add that feels awkward.
Any other information you'd like to share?
What I have tried:
- I found that appending the correct number of image tags (<image>) at the end of the question helps, but it is not as stable as the system prompt.
- Sometimes, when asked how many images it sees, it also counts cropped tiles. It answers something like "I see one image and X cropped tiles", or just gives the total of original plus cropped images (in this case 3+15=18). This is confusing from a user's perspective. It is a bit hard to trigger, but you can hit it by trying multiple times.
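For anyone trying to reproduce this, here is a minimal sketch of the two points above: building a question with one <image> tag per image, and repeating a request to see how often each answer shows up. The helper names build_contents and tally_answers are hypothetical and not part of the google-genai SDK, and the ask callable is a stand-in for whatever function wraps client.models.generate_content and returns response.text.

```python
from collections import Counter
from typing import Callable, List, Sequence


def build_contents(question: str, images: Sequence[object],
                   tag: str = "<image>") -> List[object]:
    """Append one tag per image to the question, then list the images.

    Hypothetical helper: the tag text is just the string from the report,
    not a documented SDK feature.
    """
    if not images:
        return [question]
    tagged_question = f"{question} " + " ".join([tag] * len(images))
    return [tagged_question, *images]


def tally_answers(ask: Callable[[], str], runs: int = 10) -> Counter:
    """Call ask() `runs` times and count each distinct (stripped) answer."""
    return Counter(ask().strip() for _ in range(runs))


if __name__ == "__main__":
    # Canned answers stand in for real model responses so this runs offline.
    canned = iter(["I see 3 images.", "I see one image.", "I see 3 images."])
    print(build_contents("How many images do you see?", ["img1", "img2", "img3"]))
    print(tally_answers(lambda: next(canned), runs=3))
```

With a real client, ask could be a small function that sends build_contents(...) as contents and returns response.text; the resulting counts make it easier to report how often the wrong image count appears.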