Getting Started with the Vertex AI Gemini API with cURL


Overview

This post shows how to use the Vertex AI Gemini API with cURL commands to interact with the Gemini Pro (gemini-pro) model and Gemini Pro Vision (gemini-pro-vision) model.

Gemini

Gemini is a series of multimodal generative AI models developed by Google DeepMind. Gemini models support prompts that include text, image, and video as input and support text responses as output.

Vertex AI Gemini API

The Vertex AI Gemini API provides a unified interface for interacting with Gemini models. There are currently two models available in the Gemini API:

  1. Gemini Pro model (gemini-pro): Designed to handle natural language tasks, multi-turn text and code chat, and code generation.
  2. Gemini Pro Vision model (gemini-pro-vision): Supports multimodal prompts. You can include text, images, and video in your prompt requests and get text or code responses.

You can interact with the Gemini API using the following methods:

  • Use the Vertex AI Studio for quick testing and command generation
  • Use cURL commands
  • Use the Vertex AI SDK

This post focuses on using cURL commands to call the Vertex AI Gemini API.

For more information, see the Generative AI on Vertex AI documentation.

Objectives

In this tutorial, you learn how to use the Vertex AI Gemini API with cURL commands to interact with the Gemini Pro (gemini-pro) model and the Gemini Pro Vision (gemini-pro-vision) model.

You will complete the following tasks:

  • Install the Python SDK.
  • Use the Vertex AI Gemini API to interact with each model.
    • Gemini Pro (gemini-pro) model:
      • Generate text from text prompts.
      • Explore various features and configuration options.
    • Gemini Pro Vision (gemini-pro-vision) model:
      • Generate text from image and text prompts.
      • Generate text from video.

Task 1. Open Python Notebook and Install Packages

  1. In the Google Cloud console, navigate to Vertex AI Workbench. In the top search bar of the Google Cloud console, enter Vertex AI Workbench, and click on the first result.
  1. Click User-managed notebooks, and then click Open JupyterLab for the generative-ai-jupyterlab notebook.

The JupyterLab will run in a new tab.

  1. On the Launcher, under Notebook, click Python 3 to open a new Python notebook.
  2. Install the Vertex AI SDK for Python by running the following command in the first cell of the notebook. To execute the cell, either click the play button at the top or press SHIFT+ENTER.
! pip3 install --upgrade --user google-cloud-aiplatform
  1. To use the newly installed packages in this Jupyter runtime, restart the kernel. Either run the following code snippet, or click the restart kernel button at the top and then click Restart.
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

After the restart is complete, click Ok on the prompt to continue.

Task 2. Use the Gemini Pro Model

The Gemini Pro (gemini-pro) model is tailored for natural language tasks such as classification, summarization, extraction, and writing.

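  1. Set the Google Cloud project and region, then export them, together with the regional API endpoint, as environment variables so that the %%bash cells can read them.
import os
os.environ["PROJECT_ID"] = "GCP Project id"
os.environ["LOCATION"] = "GCP Region"
os.environ["API_ENDPOINT"] = os.environ["LOCATION"] + "-aiplatform.googleapis.com"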
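
The cURL commands in this post all call a URL assembled from the same variables. As a rough sketch of how that URL fits together, the Python helper below builds it from a project, region, and model ID. The helper name `endpoint_url` and the sample project ID are hypothetical, and the host is assumed to follow the regional pattern `LOCATION-aiplatform.googleapis.com`.

```python
def endpoint_url(project_id: str, location: str, model_id: str,
                 method: str = "streamGenerateContent") -> str:
    """Assemble the request URL used by the cURL commands in this post."""
    # Assumption: the API host is the regional Vertex AI endpoint.
    api_endpoint = f"{location}-aiplatform.googleapis.com"
    return (
        f"https://{api_endpoint}/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/{model_id}:{method}"
    )


# "my-project" is a placeholder project ID used only for illustration.
print(endpoint_url("my-project", "us-central1", "gemini-pro"))
```

Printing the URL before sending a request is a quick way to spot an unset variable, which otherwise shows up as an empty path segment.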

Send a text prompt to the model. The Gemini Pro (gemini-pro) model provides a streaming response mechanism. With this approach, we don’t need to wait for the complete response; we can start processing fragments as soon as they’re accessible.

  1. Run the following code snippet to generate text from text.
%%bash

MODEL_ID="gemini-pro"

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": { "text": "Why is the sky blue?" }
    }
  }'

Output:

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "The sky is blue due to a phenomenon known as Rayleigh scattering. Here's"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ]
}
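
Because the streaming method returns the answer in fragments, a client usually stitches the text parts back together. The following is a minimal Python sketch of that step, assuming the whole response body has been captured as a JSON array of chunk objects, each shaped like the one above. `join_stream_text` is a hypothetical helper name, and the sample chunks are made up for illustration.

```python
import json


def join_stream_text(raw_body: str) -> str:
    """Concatenate the text parts of every chunk in a streamed response body."""
    chunks = json.loads(raw_body)
    # Tolerate a single JSON object as well as a JSON array of chunks.
    if isinstance(chunks, dict):
        chunks = [chunks]
    pieces = []
    for chunk in chunks:
        for candidate in chunk.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                pieces.append(part.get("text", ""))
    return "".join(pieces)


# Two made-up chunks in the same shape as the output above.
sample_body = json.dumps([
    {"candidates": [{"content": {"role": "model", "parts": [{"text": "The sky is blue "}]}}]},
    {"candidates": [{"content": {"role": "model", "parts": [{"text": "due to Rayleigh scattering."}]}}]},
])
print(join_stream_text(sample_body))
```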

Model parameters

Every prompt you send to the model includes parameter values that control how the model generates a response. The model can generate different results for different parameter values. You can experiment with different model parameters to see how the results change.

  1. Run the following code snippet to send a request that sets parameter values.
%%bash

MODEL_ID="gemini-pro-vision"

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": [
        {"text": "Describe this image"},
        {"file_data": {
          "mime_type": "image/jpeg",
          "file_uri": "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg"
        }}
      ]
    },
    "generation_config": {
      "temperature": 0.2,
      "top_p": 0.1,
      "top_k": 16,
      "max_output_tokens": 2048,
      "candidate_count": 1,
      "stop_sequences": []
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }'

Output:

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " A cat is walking in the snow. The cat is gray and white, and it has a long tail. The cat is looking at the camera. The"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ]
}
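
To build some intuition for how top_k and top_p narrow the choice of the next token, here is a toy Python sketch. It is an illustration only, not the model's actual implementation: the function name `filter_candidates` and the four-token distribution are invented for the example.

```python
from typing import Dict


def filter_candidates(probs: Dict[str, float], top_k: int, top_p: float) -> Dict[str, float]:
    """Keep the top_k most likely tokens, then the shortest prefix whose mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(prob for _, prob in kept)
    # Renormalise so the surviving probabilities sum to 1.
    return {token: prob / total for token, prob in kept}


# Invented next-token probabilities, for illustration only.
toy = {"snow": 0.5, "cat": 0.3, "dog": 0.15, "car": 0.05}
print(filter_candidates(toy, top_k=3, top_p=0.7))   # "snow" and "cat" survive
print(filter_candidates(toy, top_k=16, top_p=0.1))  # only "snow" survives
```

In this toy model, a top_p as low as the 0.1 used above leaves only the single most likely token whenever that token already has a probability of at least 0.1, which makes the output close to deterministic regardless of temperature.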

Chat

The Gemini Pro model supports natural multi-turn conversations and is ideal for text tasks that require back-and-forth interactions.

Specify the role field only if the content represents a turn in a conversation. You can set role to one of the following values: user, model.

  1. Run the following code snippet to chat.
%%bash

MODEL_ID="gemini-pro"

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          { "text": "Hello" }
        ]
      },
      {
        "role": "model",
        "parts": [
          { "text": "Hello! I am glad you could both make it." }
        ]
      },
      {
        "role": "user",
        "parts": [
          { "text": "So what is the first order of business?" }
        ]
      }
    ]
  }'

Output:

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " updates, or project management software.\n\n7. **Creating a Project Schedule:**\n\n* Draft a preliminary project schedule that outlines key milestones, deadlines, and dependencies for the selected ideas or tasks.\n* Use project management tools or software to visualize the project timeline and identify potential bottlenecks or resource constraints.\n\n8. **Identifying Risks and Mitigation Strategies:**\n\n* Conduct a risk assessment to identify potential risks that could impact the project's success.\n* Develop mitigation strategies or contingency plans to address identified risks and minimize their impact on the project timeline or budget.\n\n9. **Defining Deliverables and Quality Standards:**\n\n* Specify the deliverables that will be produced during the project, including reports, prototypes, or final products.\n* Establish quality standards and acceptance criteria to ensure that the deliverables meet the required standards and expectations.\n\n10. **Seeking Initial Stakeholder Feedback:**\n\n* Share the initial project plan and key decisions with relevant stakeholders to gather feedback and ensure alignment with their expectations.\n* Address any concerns or suggestions raised by stakeholders and incorporate necessary adjustments into the project plan."
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 21,
    "candidatesTokenCount": 532,
    "totalTokenCount": 553
  }
}
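
The API call itself carries no memory of earlier turns: the request above includes the whole conversation in contents. A minimal Python sketch of keeping that history on the client side might look like the following. `add_turn` is a hypothetical helper, and the turns are the ones from the request above.

```python
import json


def add_turn(history: list, role: str, text: str) -> list:
    """Return a new history list with one more conversation turn appended."""
    if role not in ("user", "model"):
        raise ValueError(f"unsupported role: {role!r}")
    return history + [{"role": role, "parts": [{"text": text}]}]


history = []
history = add_turn(history, "user", "Hello")
history = add_turn(history, "model", "Hello! I am glad you could both make it.")
history = add_turn(history, "user", "So what is the first order of business?")

# This string can be passed to curl with -d as the request body.
request_body = json.dumps({"contents": history}, indent=2)
print(request_body)
```

After each response, the model's reply would be appended with role model before the next user turn is added.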

Task 3. Use the Gemini Pro Vision Model

The Gemini Pro Vision (gemini-pro-vision) model is a multimodal model that supports adding images and video to text or chat prompts for a text response.

Note: Text-only prompts are not supported by the Gemini Pro Vision model. Instead, use the Gemini Pro model for text-only prompts.

  1. Run the following code snippet to download an image from Google Cloud Storage.
! gsutil cp "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg" ./image.jpg

Output:

Copying gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg...
/ [1 files][ 17.4 KiB/ 17.4 KiB]                                                
Operation completed over 1 objects/17.4 KiB.

Generate text from a local image

Specify the base64 encoding of the image or video to include inline in the prompt and the mime_type field. The supported MIME types for images include image/png and image/jpeg.

  1. Run the following code snippet with a supported mime_type to generate the response.
%%bash

data=$(base64 -w 0 image.jpg)

MODEL_ID="gemini-pro-vision"

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent \
  -d "{
      'contents': {
        'role': 'USER',
        'parts': [
          {
            'text': 'Is it a cat?'
          },
          {
            'inline_data': {
              'data': '${data}',
              'mime_type':'image/jpeg'
            }
          }
        ]
       }
     }"

Output:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 24907    0   813  100 24094    151   4480  0:00:05  0:00:05 --:--:--   195

[{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " Yes, it is a cat."
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 263,
    "candidatesTokenCount": 7,
    "totalTokenCount": 270
  }
}
]
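
The same inline request body can also be assembled in Python rather than with shell string interpolation. The sketch below is one way to do it under stated assumptions: `inline_image_part` is a hypothetical helper, the MIME type is guessed from the file name, and placeholder bytes stand in for the real contents of image.jpg.

```python
import base64
import json
import mimetypes


def inline_image_part(filename: str, image_bytes: bytes) -> dict:
    """Build an inline_data part from raw image bytes."""
    mime_type, _ = mimetypes.guess_type(filename)
    # Only the image MIME types named in this post are accepted here.
    if mime_type not in ("image/png", "image/jpeg"):
        raise ValueError(f"unsupported MIME type for {filename}: {mime_type}")
    # b64encode emits one unwrapped line, like `base64 -w 0` in the shell.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"inline_data": {"mime_type": mime_type, "data": encoded}}


# Placeholder bytes, not a real image; read image.jpg in binary mode instead.
fake_image = b"\xff\xd8\xff\xe0 placeholder bytes"
body = {
    "contents": {
        "role": "USER",
        "parts": [{"text": "Is it a cat?"}, inline_image_part("image.jpg", fake_image)],
    }
}
print(json.dumps(body, indent=2))
```

Serializing with json.dumps also produces double-quoted JSON, which avoids relying on the single-quoted keys used in the shell version.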

Generate text from an image on Google Cloud Storage

Specify the Cloud Storage URI of the image to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that’s sending the request. You must also specify the mime_type field. The supported image MIME types include image/png and image/jpeg.

  1. Run the following code snippet to generate text from an image on Google Cloud Storage.
%%bash

MODEL_ID="gemini-pro-vision"

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": [
        {
          "text": "Describe this image"
        },
        {
          "file_data": {
            "mime_type": "image/jpeg",
            "file_uri": "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg"
          }
        }
      ]
    },
    "generation_config": {
      "temperature": 0.2,
      "top_p": 0.1,
      "top_k": 16,
      "max_output_tokens": 2048,
      "candidate_count": 1,
      "stop_sequences": []
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }'

Output:

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " A cat is walking in the snow. The cat is gray and white, and it has a long tail. The cat is looking at the camera. The"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ]
}
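
Since mime_type has to match the file that file_uri points to, it can help to derive it from the URI instead of typing it by hand. The following Python sketch shows one possible approach. `file_data_part` is a hypothetical helper, and the set of accepted types is limited to the ones named in this post.

```python
import mimetypes
from urllib.parse import urlparse

# Assumption: only the MIME types mentioned in this post are allowed.
SUPPORTED_TYPES = {"image/png", "image/jpeg", "video/mp4"}


def file_data_part(gcs_uri: str) -> dict:
    """Build a file_data part for a Cloud Storage object, guessing its MIME type."""
    parsed = urlparse(gcs_uri)
    if parsed.scheme != "gs" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not a Cloud Storage URI: {gcs_uri}")
    mime_type, _ = mimetypes.guess_type(parsed.path)
    if mime_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported MIME type for {gcs_uri}: {mime_type}")
    return {"file_data": {"mime_type": mime_type, "file_uri": gcs_uri}}


print(file_data_part(
    "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg"
))
```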

Generate text from a video file

Specify the Cloud Storage URI of the video to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that’s sending the request. You must also specify the mime_type field. The supported MIME types for video include video/mp4.

  1. Run the following code snippet to generate text from a video file:
%%bash

MODEL_ID="gemini-pro-vision"

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent \
  -d \
'{
    "contents": {
      "role": "USER",
      "parts": [
        {
          "text": "Answer the following questions using the video only. What is the profession of the main person? What are the main features of the phone highlighted? Which city was this recorded in? Provide the answer JSON."
        },
        {
          "file_data": {
            "mime_type": "video/mp4",
            "file_uri": "gs://github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4"
          }
        }
      ]
    }
  }'

Output:

[{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " ```json\n{\n  \"profession\": \"photographer\",\n  \"features\": \"Night Sight, Video Boost\",\n  \"city\": \"Tokyo"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 1072,
    "candidatesTokenCount": 31,
    "totalTokenCount": 1103
  }
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "\"\n}\n```"
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 1072,
    "candidatesTokenCount": 36,
    "totalTokenCount": 1108
  }
}
]
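
In the output above, the JSON answer is split across two chunks and wrapped in a Markdown code fence, so it needs a little post-processing before it can be used. Here is a minimal Python sketch, assuming the text fragments have already been joined in order. `extract_json_block` is a hypothetical helper name, and the fence marker is built at runtime only to keep this example easy to embed.

```python
import json
import re

FENCE = "`" * 3  # the Markdown code-fence marker (three backticks)


def extract_json_block(text: str) -> dict:
    """Parse the JSON found inside a fenced block, or the whole text if unfenced."""
    pattern = FENCE + r"(?:json)?\s*(.*?)" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)


# The two text fragments from the output above, decoded and joined in order.
fragment_1 = (
    " " + FENCE + 'json\n{\n  "profession": "photographer",\n'
    '  "features": "Night Sight, Video Boost",\n  "city": "Tokyo'
)
fragment_2 = '"\n}\n' + FENCE

answer = extract_json_block(fragment_1 + fragment_2)
print(answer["city"])  # prints: Tokyo
```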

Conclusion

You have learned how to use the Vertex AI Gemini API with cURL commands to interact with the Gemini Pro (gemini-pro) and Gemini Pro Vision (gemini-pro-vision) models: generating text, setting model parameters, chatting, and generating text from a local image, from an image on Google Cloud Storage, and from a video file.

Author

  • Mohamed BEN HASSINE

    Mohamed BEN HASSINE is a hands-on Cloud Solution Architect based in France. He has been working with Java, web, API, and cloud technologies for over 12 years and remains keen to learn new things. He currently works as a Cloud / Application Architect in Paris, designing cloud-native solutions and APIs (REST, gRPC) using cutting-edge technologies (GCP, Kubernetes, APIGEE, Java, Python).
