Overview
This post shows how to use the Vertex AI Gemini API with cURL commands to interact with the Gemini Pro (gemini-pro) model and Gemini Pro Vision (gemini-pro-vision) model.
Gemini
Gemini is a series of multimodal generative AI models developed by Google DeepMind. Gemini models support prompts that include text, image, and video as input and support text responses as output.
Vertex AI Gemini API
The Vertex AI Gemini API provides a unified interface for interacting with Gemini models. There are currently two models available in the Gemini API:
- Gemini Pro model (gemini-pro): Designed to handle natural language tasks, multiturn text and code chat, and code generation.
- Gemini Pro Vision model (gemini-pro-vision): Supports multimodal prompts. You can include text, images, and video in your prompt requests and get text or code responses.
You can interact with the Gemini API using the following methods:
- Use the Vertex AI Studio for quick testing and command generation
- Use cURL commands
- Use the Vertex AI SDK
This notebook focuses on using the cURL commands to call the Vertex AI Gemini API.
For more information, see the Generative AI on Vertex AI documentation.
Objectives
In this tutorial, you learn how to use the Vertex AI Gemini API with cURL commands to interact with the Gemini Pro (gemini-pro) model and the Gemini Pro Vision (gemini-pro-vision) model.
You will complete the following tasks:
- Install the Python SDK.
- Use the Vertex AI Gemini API to interact with each model.
  - Gemini Pro (gemini-pro) model:
    - Generate text from text prompts.
    - Explore various features and configuration options.
  - Gemini Pro Vision (gemini-pro-vision) model:
    - Generate text from image and text prompts.
    - Generate text from video.
Task 1. Open Python Notebook and Install Packages
- In the Google Cloud console, navigate to Vertex AI Workbench. In the top search bar of the Google Cloud console, enter Vertex AI Workbench, and click on the first result.
- Click on User managed notebooks, and then click Open JupyterLab for the generative-ai-jupyterlab notebook.
JupyterLab opens in a new tab.
- On the Launcher, under Notebook, click Python 3 to open a new Python notebook.
- Install the Vertex AI SDK for Python by running the following command in the first cell of the notebook. To execute the cell, either click the play button at the top or press SHIFT+ENTER.
! pip3 install --upgrade --user google-cloud-aiplatform
- To use the newly installed packages in this Jupyter runtime, restart the runtime. Restart the kernel either by running the code snippet below or by clicking the refresh button at the top and then clicking the Restart button.
# Shut down the current kernel and let Jupyter restart it
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)
After the restart is complete, click Ok on the prompt to continue.
Task 2. Use the Gemini Pro Model
The Gemini Pro (gemini-pro) model is tailored for natural language tasks such as classification, summarization, extraction, and writing.
- Set the Google Cloud project and define environment variables for cURL commands.
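import os
os.environ["PROJECT_ID"] = "GCP Project id"
os.environ["LOCATION"] = "GCP Region"  # for example, us-central1
os.environ["API_ENDPOINT"] = os.environ["LOCATION"] + "-aiplatform.googleapis.com"  # read as ${API_ENDPOINT} by the %%bash cells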
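Every cURL command in this post targets a URL of the same shape. As a minimal sketch of how that URL is assembled, the helper below mirrors the path used in the commands. The function name `build_gemini_url`, the project ID `my-project`, and the region `us-central1` are illustrative assumptions, and the regional host `{LOCATION}-aiplatform.googleapis.com` is assumed to be the value of `API_ENDPOINT`.

```python
def build_gemini_url(api_endpoint: str, project_id: str, location: str,
                     model_id: str, method: str = "streamGenerateContent") -> str:
    """Assemble the request URL in the shape used by the cURL commands in this post."""
    return (
        f"https://{api_endpoint}/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/{model_id}:{method}"
    )


# Hypothetical values for illustration only; substitute your own project and region.
location = "us-central1"
url = build_gemini_url(
    api_endpoint=f"{location}-aiplatform.googleapis.com",  # assumed regional host
    project_id="my-project",
    location=location,
    model_id="gemini-pro",
)
print(url)
```

Printing the URL before sending a request is a quick way to spot an unset variable, which otherwise shows up only as an empty segment in the path.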
Send a text prompt to the model. The Gemini Pro (gemini-pro) model provides a streaming response mechanism. With this approach, we don’t need to wait for the complete response; we can start processing fragments as soon as they’re accessible.
- Run the following code snippet to generate text from text.
%%bash
MODEL_ID="gemini-pro"
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": { "text": "Why is the sky blue?" }
    }
  }'
Output:
{ "candidates": [ { "content": { "role": "model", "parts": [ { "text": "The sky is blue due to a phenomenon known as Rayleigh scattering. Here's" } ] }, "safetyRatings": [ { "category": "HARM_CATEGORY_HARASSMENT", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_HATE_SPEECH", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "probability": "NEGLIGIBLE" } ] } ] }
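Because the streaming method returns the answer in fragments, a client usually concatenates the text parts as they arrive. The sketch below shows one way to do that in Python, assuming the chunks have already been parsed from JSON into dictionaries. The helper name `join_stream_text` and the two abbreviated chunks are hypothetical; real chunks also carry fields such as safetyRatings.

```python
def join_stream_text(chunks: list) -> str:
    """Concatenate the text parts of the first candidate in each streamed chunk."""
    pieces = []
    for chunk in chunks:
        # Only the first candidate is read; the examples here request one candidate.
        for candidate in chunk.get("candidates", [])[:1]:
            for part in candidate.get("content", {}).get("parts", []):
                pieces.append(part.get("text", ""))
    return "".join(pieces)


# Hypothetical, abbreviated chunks in the shape of the output above.
chunks = [
    {"candidates": [{"content": {"role": "model",
                                 "parts": [{"text": "The sky is blue due to "}]}}]},
    {"candidates": [{"content": {"role": "model",
                                 "parts": [{"text": "Rayleigh scattering."}]},
                     "finishReason": "STOP"}]},
]
print(join_stream_text(chunks))
```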
Model parameters
Every prompt you send to the model includes parameter values that control how the model generates a response. The model can generate different results for different parameter values. You can experiment with different model parameters to see how the results change.
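To build some intuition for what these parameters do, here is a toy illustration of how temperature, top-k, and top-p are commonly described as narrowing the set of tokens a model may pick from. It is not Gemini's implementation: the function `candidate_pool`, the made-up scores, and the order in which the filters are applied are all assumptions chosen for clarity.

```python
import math


def candidate_pool(logits: dict, temperature: float, top_k: int, top_p: float) -> dict:
    """Return renormalized token probabilities left after temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    # Temperature rescales the scores: lower values sharpen the distribution.
    scaled = {token: value / temperature for token, value in logits.items()}
    peak = max(scaled.values())
    weights = {token: math.exp(value - peak) for token, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((token, weight / total) for token, weight in weights.items()),
                    key=lambda pair: pair[1], reverse=True)
    # top_k keeps only the k most probable tokens.
    ranked = ranked[:top_k]
    # top_p keeps the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    norm = sum(prob for _, prob in kept)
    return {token: prob / norm for token, prob in kept}


toy_logits = {"blue": 2.0, "red": 1.0, "green": 0.0}  # made-up scores
narrow = candidate_pool(toy_logits, temperature=0.2, top_p=0.1, top_k=16)
wide = candidate_pool(toy_logits, temperature=1.0, top_p=0.95, top_k=2)
print(narrow)
print(wide)
```

With a low temperature and a small top_p, only the single most likely token survives in this toy, which is why such settings tend to give more repeatable output. Looser settings leave several tokens in play.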
- Run the following code snippet to generate a response which includes parameter values.
%%bash
MODEL_ID="gemini-pro-vision"
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": [
        {"text": "Describe this image"},
        {"file_data": {
          "mime_type": "image/jpeg",
          "file_uri": "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg"
        }}
      ]
    },
    "generation_config": {
      "temperature": 0.2,
      "top_p": 0.1,
      "top_k": 16,
      "max_output_tokens": 2048,
      "candidate_count": 1,
      "stop_sequences": []
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }'
Output:
{ "candidates": [ { "content": { "role": "model", "parts": [ { "text": " A cat is walking in the snow. The cat is gray and white, and it has a long tail. The cat is looking at the camera. The" } ] }, "safetyRatings": [ { "category": "HARM_CATEGORY_HARASSMENT", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_HATE_SPEECH", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "probability": "NEGLIGIBLE" } ] } ] }
Chat
The Gemini Pro model supports natural multi-turn conversations and is ideal for text tasks that require back-and-forth interactions.
We should specify the role field only if the content represents a turn in a conversation. You can set role to one of the following values: user, model.
- Run the following code snippet to chat.
%%bash
MODEL_ID="gemini-pro"
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [ { "text": "Hello" } ]
      },
      {
        "role": "model",
        "parts": [ { "text": "Hello! I am glad you could both make it." } ]
      },
      {
        "role": "user",
        "parts": [ { "text": "So what is the first order of business?" } ]
      }
    ]
  }'
Output:
{ "candidates": [ { "content": { "role": "model", "parts": [ { "text": " updates, or project management software.\n\n7. **Creating a Project Schedule:**\n\n* Draft a preliminary project schedule that outlines key milestones, deadlines, and dependencies for the selected ideas or tasks.\n* Use project management tools or software to visualize the project timeline and identify potential bottlenecks or resource constraints.\n\n8. **Identifying Risks and Mitigation Strategies:**\n\n* Conduct a risk assessment to identify potential risks that could impact the project's success.\n* Develop mitigation strategies or contingency plans to address identified risks and minimize their impact on the project timeline or budget.\n\n9. **Defining Deliverables and Quality Standards:**\n\n* Specify the deliverables that will be produced during the project, including reports, prototypes, or final products.\n* Establish quality standards and acceptance criteria to ensure that the deliverables meet the required standards and expectations.\n\n10. **Seeking Initial Stakeholder Feedback:**\n\n* Share the initial project plan and key decisions with relevant stakeholders to gather feedback and ensure alignment with their expectations.\n* Address any concerns or suggestions raised by stakeholders and incorporate necessary adjustments into the project plan." } ] }, "finishReason": "STOP", "safetyRatings": [ { "category": "HARM_CATEGORY_HARASSMENT", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_HATE_SPEECH", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "probability": "NEGLIGIBLE" } ] } ], "usageMetadata": { "promptTokenCount": 21, "candidatesTokenCount": 532, "totalTokenCount": 553 } }
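The chat request above carries the whole conversation in the contents array, one entry per turn. The sketch below shows one way to build that array in Python. The helper `add_turn` and the constant `ALLOWED_ROLES` are hypothetical names; the two role values and the turn texts are taken from the example.

```python
import json

# The two role values named in this post.
ALLOWED_ROLES = ("user", "model")


def add_turn(history: list, role: str, text: str) -> list:
    """Return a new history list with one more conversation turn appended."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"role must be one of {ALLOWED_ROLES}, got {role!r}")
    return history + [{"role": role, "parts": [{"text": text}]}]


history = []
history = add_turn(history, "user", "Hello")
history = add_turn(history, "model", "Hello! I am glad you could both make it.")
history = add_turn(history, "user", "So what is the first order of business?")

# The request body is the history wrapped in a "contents" field.
request_body = {"contents": history}
print(json.dumps(request_body, indent=2))
```

To continue the conversation, append the model's reply as a model turn and the next question as a user turn, then send the full history again.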
Task 3. Use the Gemini Pro Vision Model
The Gemini Pro Vision (gemini-pro-vision) model is a multimodal model that supports adding images and video to text or chat prompts for a text response.
Note: Text-only prompts are not supported by the Gemini Pro Vision model. Instead, use the Gemini Pro model for text-only prompts.
- Run the following code snippet to download an image from Google Cloud Storage.
! gsutil cp "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg" ./image.jpg
Output:
Copying gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg...
/ [1 files][ 17.4 KiB/ 17.4 KiB]
Operation completed over 1 objects/17.4 KiB.
Generate text from a local image
Specify the base64 encoding of the image or video to include inline in the prompt, along with the mime_type field. The supported MIME types for images include image/png and image/jpeg.
- Run the following code snippet with a supported mime_type to generate the response.
%%bash
data=$(base64 -w 0 image.jpg)
MODEL_ID="gemini-pro-vision"
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent \
  -d "{
    'contents': {
      'role': 'USER',
      'parts': [
        { 'text': 'Is it a cat?' },
        { 'inline_data': { 'data': '${data}', 'mime_type': 'image/jpeg' } }
      ]
    }
  }"
Output:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 24907    0   813  100 24094    151   4480  0:00:05  0:00:05 --:--:--   195
[{ "candidates": [ { "content": { "role": "model", "parts": [ { "text": " Yes, it is a cat." } ] }, "finishReason": "STOP", "safetyRatings": [ { "category": "HARM_CATEGORY_HARASSMENT", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_HATE_SPEECH", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "probability": "NEGLIGIBLE" } ] } ], "usageMetadata": { "promptTokenCount": 263, "candidatesTokenCount": 7, "totalTokenCount": 270 } } ]
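The same inline image part can be prepared in Python instead of with the base64 command. The sketch below encodes raw bytes and guesses the MIME type from the file name with the standard library. The helper `inline_image_part` is a hypothetical name, and the three placeholder bytes stand in for real image data.

```python
import base64
import mimetypes

# Image MIME types this post lists as supported.
SUPPORTED_IMAGE_TYPES = {"image/png", "image/jpeg"}


def inline_image_part(filename: str, raw: bytes) -> dict:
    """Build an inline_data part with base64 data and a MIME type guessed from the name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"unsupported or unknown image type: {mime_type}")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"inline_data": {"data": encoded, "mime_type": mime_type}}


# Placeholder bytes for illustration; in the notebook you would read image.jpg:
#     with open("image.jpg", "rb") as f:
#         raw = f.read()
part = inline_image_part("image.jpg", b"\xff\xd8\xff")
print(part["inline_data"]["mime_type"])
```

Deriving mime_type from the file name keeps the declared type and the actual file from drifting apart.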
Generate text from an image on Google Cloud Storage
Specify the Cloud Storage URI of the image to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that’s sending the request. You must also specify the mime_type field. The supported image MIME types include image/png and image/jpeg.
- Run the following code snippet to generate text from an image on Google Cloud Storage.
%%bash
MODEL_ID="gemini-pro-vision"
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": [
        { "text": "Describe this image" },
        { "file_data": {
          "mime_type": "image/jpeg",
          "file_uri": "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg"
        } }
      ]
    },
    "generation_config": {
      "temperature": 0.2,
      "top_p": 0.1,
      "top_k": 16,
      "max_output_tokens": 2048,
      "candidate_count": 1,
      "stop_sequences": []
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }'
Output:
{ "candidates": [ { "content": { "role": "model", "parts": [ { "text": " A cat is walking in the snow. The cat is gray and white, and it has a long tail. The cat is looking at the camera. The" } ] }, "safetyRatings": [ { "category": "HARM_CATEGORY_HARASSMENT", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_HATE_SPEECH", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "probability": "NEGLIGIBLE" } ] } ] }
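Since mime_type has to accompany every Cloud Storage URI, one option is to derive it from the object's file extension rather than typing it by hand. The sketch below does that with Python's standard mimetypes module. The helper `file_data_part` and the set `SUPPORTED_TYPES` are hypothetical names; the set contains only the types this post mentions.

```python
import mimetypes

# MIME types this post lists as supported for images and video.
SUPPORTED_TYPES = {"image/png", "image/jpeg", "video/mp4"}


def file_data_part(gcs_uri: str) -> dict:
    """Build a file_data part, guessing mime_type from the object name's extension."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI starting with gs://")
    object_name = gcs_uri.rsplit("/", 1)[-1]
    mime_type, _ = mimetypes.guess_type(object_name)
    if mime_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported or unknown MIME type: {mime_type}")
    return {"file_data": {"mime_type": mime_type, "file_uri": gcs_uri}}


cat_part = file_data_part(
    "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg"
)
print(cat_part["file_data"]["mime_type"])
```

The guess relies only on the extension, so it does not confirm what the object actually contains.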
Generate text from a video file
Specify the Cloud Storage URI of the video to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that’s sending the request. You must also specify the mime_type field. The supported MIME types for video include video/mp4.
- Run the following code snippet to generate text from a video file:
%%bash
MODEL_ID="gemini-pro-vision"
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": [
        { "text": "Answer the following questions using the video only. What is the profession of the main person? What are the main features of the phone highlighted? Which city was this recorded in? Provide the answer JSON." },
        { "file_data": {
          "mime_type": "video/mp4",
          "file_uri": "gs://github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4"
        } }
      ]
    }
  }'
Output:
[{ "candidates": [ { "content": { "role": "model", "parts": [ { "text": " ```json\n{\n \"profession\": \"photographer\",\n \"features\": \"Night Sight, Video Boost\",\n \"city\": \"Tokyo" } ] }, "safetyRatings": [ { "category": "HARM_CATEGORY_HARASSMENT", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_HATE_SPEECH", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "probability": "NEGLIGIBLE" } ] } ], "usageMetadata": { "promptTokenCount": 1072, "candidatesTokenCount": 31, "totalTokenCount": 1103 } } , { "candidates": [ { "content": { "role": "model", "parts": [ { "text": "\"\n}\n```" } ] }, "finishReason": "STOP", "safetyRatings": [ { "category": "HARM_CATEGORY_HARASSMENT", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_HATE_SPEECH", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "NEGLIGIBLE" }, { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "probability": "NEGLIGIBLE" } ] } ], "usageMetadata": { "promptTokenCount": 1072, "candidatesTokenCount": 36, "totalTokenCount": 1108 } } ]
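In the output above, the answer arrives as JSON wrapped in a Markdown code fence and split across two chunks. Once the text parts are concatenated, the fence has to be removed before the JSON can be parsed. The sketch below shows one way to do that. The helper `parse_fenced_json` is a hypothetical name, and `streamed_text` reproduces the two text fragments from the output joined together.

```python
import json

FENCE = "`" * 3  # the three-backtick Markdown fence marker


def parse_fenced_json(text: str) -> dict:
    """Parse JSON that may be wrapped in a Markdown code fence."""
    body = text.strip()
    if body.startswith(FENCE):
        # Drop the opening fence line, including any language tag after the marker.
        _, _, body = body.partition("\n")
    body = body.rstrip()
    if body.endswith(FENCE):
        body = body[: -len(FENCE)]
    return json.loads(body)


# The two text fragments from the output above, concatenated.
streamed_text = (
    " " + FENCE + 'json\n{\n "profession": "photographer",\n'
    ' "features": "Night Sight, Video Boost",\n "city": "Tokyo"\n}\n' + FENCE
)
answer = parse_fenced_json(streamed_text)
print(answer["city"])
```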
Conclusion
You have learned how to use the Vertex AI Gemini API with cURL commands to interact with the Gemini Pro (gemini-pro) and Gemini Pro Vision (gemini-pro-vision) models. Along the way, you generated text from a text prompt, set model parameters, held a multi-turn chat, and generated text from a local image, from an image on Google Cloud Storage, and from a video file.