Gemini multi-image capability seems unstable and fails a very basic question #706

### Description of the bug:


Hello,

I followed the sample code for analyzing multiple images in the [official doc](https://ai.google.dev/gemini-api/docs/vision?lang=python#multiple-images).

One weird thing I found is that the client does not seem to correctly identify the number of images uploaded, so the response only covers one of the images.

These are the images I tested with (I think any images should reproduce the issue):

[fire](https://upload.wikimedia.org/wikipedia/commons/3/36/Large_bonfire.jpg)
[lighthouse](https://media.istockphoto.com/id/139497126/photo/lighthouse.jpg?s=612x612&w=0&k=20&c=ahggOFZzfeQhOuC2x4aHlb8bB1P0rlN5eegjY5aMZyA=)
[tree](https://media.istockphoto.com/id/1225517650/photo/single-big-oak-tree-in-meadow.jpg?s=612x612&w=0&k=20&c=2DOWsqMW1hDOmn7XlF09IAuOToq28EXpsNgZ8wzhURU=)

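```
from google import genai
from google.genai import types  # needed for the system-instruction workaround below
from PIL import Image

# Local copies of the three images linked above
img1 = Image.open("fire.jpg")
img2 = Image.open("lighthouse.jpg")
img3 = Image.open("tree.jpg")

MODEL_NAME = 'gemini-2.0-flash-001'
client = genai.Client(api_key="My api key")
response = client.models.generate_content(model=MODEL_NAME,
                                          contents=['Describe each image', img1, img2, img3])
print(response.text)
```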
The output only describes the last image:
The image shows a single, large tree standing in a green field against a bright blue sky with scattered white clouds. The tree has a full, round crown of green leaves. The trunk of the tree is thick and gray. The field is a vibrant green, and the sky is a clear blue with wispy white clouds. The image is well-lit and has a clear, crisp focus.

When I then ask `How many images do you see?`:
```
response = client.models.generate_content(model=MODEL_NAME,
                                          contents=['How many images do you see?', img1, img2, img3])
print(response.text)
# Output: I see one image.
```

One workaround I found is to add a system prompt that forces it to consider all uploaded images before answering, but this feels very unnatural:
```
response = client.models.generate_content(
    model=MODEL_NAME,
    config=types.GenerateContentConfig(
        system_instruction="Consider all images uploaded by users before answering any question"),
    contents=['Describe each image', img1, img2, img3])
print(response.text)
```
The output:
Here are the descriptions of each image:

1. A bonfire that appears to be made out of branches is ablaze, with flames reaching high into the sky.
2. A lighthouse stands on a shore by the ocean, its light shining across the water under a purple sky with the setting sun.
3. A large green tree stands tall in a field of grass under a bright blue sky with fluffy white clouds.

When asking how many images were uploaded:
```
response = client.models.generate_content(
    model=MODEL_NAME,
    config=types.GenerateContentConfig(
        system_instruction="Consider all images uploaded by users before answering any question"),
    contents=['How many images do you see?', img1, img2, img3])
print(response.text)
# Output: I see 3 images.
```






### Actual vs expected behavior:


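Actual: without extra prompting, it only sees and describes one of the three images. Expected: I think it should know how many images it has to analyze without any further prompting. Needing a system prompt for this feels awkward.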

### Any other information you'd like to share?


What I have tried:
1. Appending the correct number of image tags (`<image>`) to the end of the question helps, but it is not as stable as the system prompt.
2. Sometimes, when asked how many images it sees, it also counts the cropped tiles. It answers something like "I see one image and X cropped tiles", or just gives the total of original plus cropped images (in this case, 3 + 15 = 18), which is weird from a user's perspective. This is a bit hard to trigger, but you can hit it by trying multiple times.
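
Since both observations depend on repeated attempts, here is a minimal sketch of how the instability could be measured. The helper names `with_image_tags` and `tally_answers` are hypothetical, and separating the `<image>` tags with spaces is an assumption (the exact format I typed may have differed). `ask` is any function that sends a prompt and returns the response text, so the sketch runs without an API key.

```python
from collections import Counter
from typing import Callable


def with_image_tags(question: str, num_images: int, tag: str = "<image>") -> str:
    """Append one tag per uploaded image to the end of the question.

    Assumption: tags are separated by single spaces.
    """
    return f"{question} {' '.join([tag] * num_images)}".strip()


def tally_answers(ask: Callable[[str], str], prompt: str, runs: int = 10) -> Counter:
    """Call ask(prompt) `runs` times and count each distinct answer.

    Answers are compared after stripping surrounding whitespace.
    """
    return Counter(ask(prompt).strip() for _ in range(runs))


if __name__ == "__main__":
    # Stand-in for the real API call (hypothetical fixed reply).
    def fake_ask(prompt: str) -> str:
        return "I see 3 images."

    prompt = with_image_tags("How many images do you see?", 3)
    print(prompt)
    print(tally_answers(fake_ask, prompt, runs=5))
```

To run it against the real model, `ask` could wrap the same call used above, for example `lambda p: client.models.generate_content(model=MODEL_NAME, contents=[p, img1, img2, img3]).text`. The tally then shows how often each answer (one image, three images, or the 18 "original + cropped" count) comes back.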
