Get Started with Gemini in Java


Google unveiled Gemini, its most advanced Large Language Model to date. Unlike previous models, Gemini is multimodal, allowing it to process not just text but also images and videos.

I’m excited to share with you some of the capabilities that Gemini offers when used with Java.

To get started, you’ll need a Google Cloud account and a project. Enable the Vertex AI API in that project; it gives access to Google’s Generative AI services, including the Gemini large language model. Follow the Google Cloud setup instructions for your project before continuing.

Preparing your project build

Before writing any code, set up your project’s build. Choose either Gradle or Maven as your build tool, then add the required Google Cloud dependencies.

Specifically, your project will need to incorporate the Google Cloud Libraries Bill of Materials (BOM) and the google-cloud-vertexai library to leverage the capabilities of Google’s Vertex AI and the Gemini model. Here’s how you can set this up using Maven:

In your pom.xml file, you’ll start by including the Google Cloud libraries BOM in the <dependencyManagement> section. This BOM manages the versions of the Google Cloud Java libraries you use, ensuring compatibility and simplifying dependency declarations.

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>26.29.0</version> <!-- Be sure to check for the latest version -->
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

Next, you’ll add the google-cloud-vertexai library to your project’s dependencies. This library provides the Java client for interacting with Vertex AI, enabling your application to communicate with the Gemini model and other AI services offered by Google Cloud.

<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-vertexai</artifactId>
  </dependency>
  <!-- Add other dependencies as needed -->
</dependencies>

With the BOM and the client library in place, your project is ready to call Gemini and other Google Cloud AI services through Vertex AI.

Your first queries

Now let’s have a look at our first multimodal example, mixing text prompts and images:

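// projectId, location, and dataImageBase64 are assumed to be defined by the caller
try (VertexAI vertexAI = new VertexAI(projectId, location)) {
    // Decode the Base64-encoded image into raw bytes
    byte[] imageBytes = Base64.getDecoder().decode(dataImageBase64);

    // Use the multimodal model, which accepts both text and images
    GenerativeModel model = new GenerativeModel("gemini-pro-vision", vertexAI);
    GenerateContentResponse response = model.generateContent(
        ContentMaker.fromMultiModalData(
            "What is this image about?",
            // "image/jpeg" is the registered MIME type for JPEG images
            PartMaker.fromMimeTypeAndData("image/jpeg", imageBytes)
        ));

    // Extract the text of the model's answer
    System.out.println(ResponseHandler.getText(response));
}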

You begin by initializing VertexAI with your Google Cloud project ID and the region you want to use. To submit an image to Gemini, you can either send the image bytes directly or pass a URI pointing to an image in a Cloud Storage bucket, such as gs://my-bucket/my-img.jpg.
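When you send image bytes directly, the MIME type you pass alongside them should match the actual image format. As a minimal sketch using only the JDK, the snippet below guesses the MIME type from the leading signature bytes of the decoded image. The class ImageMimeSniffer and its detectMimeType method are hypothetical names chosen for illustration and are not part of the Vertex AI SDK; only JPEG and PNG are recognized here.

```java
import java.util.Base64;

/** Hypothetical helper (not part of the Vertex AI SDK): guesses an image MIME type from its first bytes. */
final class ImageMimeSniffer {
    // Well-known file signatures for JPEG and PNG
    private static final int[] JPEG_SIGNATURE = {0xFF, 0xD8, 0xFF};
    private static final int[] PNG_SIGNATURE = {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

    /** Returns "image/jpeg" or "image/png", or null when the format is not recognized. */
    static String detectMimeType(byte[] data) {
        if (startsWith(data, JPEG_SIGNATURE)) {
            return "image/jpeg";
        }
        if (startsWith(data, PNG_SIGNATURE)) {
            return "image/png";
        }
        return null;
    }

    // Compares the leading bytes of data (as unsigned values) with the signature
    private static boolean startsWith(byte[] data, int[] signature) {
        if (data == null || data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "iVBORw0KGgo=" is the Base64 encoding of the 8-byte PNG file signature
        byte[] imageBytes = Base64.getDecoder().decode("iVBORw0KGgo=");
        System.out.println(detectMimeType(imageBytes)); // prints "image/png"
    }
}
```

The detected value could then be passed as the MIME type argument when building the image part, instead of a hard-coded string.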

Next, you set up an instance of the model you wish to use; in this case, we start with gemini-pro-vision, though a gemini-ultra-vision model is slated for future release.

To generate content, call the generateContent() method with both a text prompt and an image. The ContentMaker and PartMaker classes are helpers for building prompts that combine several modalities.

For simpler use cases, you can pass a plain string to generateContent(). The ResponseHandler utility then gathers the model’s full text response.

If you would rather receive the output incrementally as the text is generated, use the streaming method:

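model.generateContentStream("Why is the sky blue?")
    .stream()
    // Extract the text of each chunk; printing the response object itself would dump its whole structure
    .map(ResponseHandler::getText)
    .forEach(System.out::print);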

You can also iterate over the stream with a for loop:

ResponseStream<GenerateContentResponse> responseStream =
    model.generateContentStream("Why is the sky blue?");

for (GenerateContentResponse responsePart: responseStream) {
    System.out.print(ResponseHandler.getText(responsePart));
}
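If you also need the complete answer once streaming finishes, one option is to print each chunk as it arrives while appending it to a buffer. The sketch below shows that pattern with plain strings standing in for the text of each streamed response part; StreamCollector and printAndCollect are hypothetical names, and no Vertex AI call is made.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: echoes text chunks as they arrive and returns the concatenated result. */
final class StreamCollector {
    static String printAndCollect(Iterable<String> chunks) {
        StringBuilder fullText = new StringBuilder();
        for (String chunk : chunks) {
            System.out.print(chunk);   // immediate feedback for the user
            fullText.append(chunk);    // keep the chunk for later use
        }
        System.out.println();
        return fullText.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the text extracted from each streamed response part
        List<String> chunks = Arrays.asList("The sky ", "is blue ", "because...");
        String answer = printAndCollect(chunks);
        System.out.println("Total characters: " + answer.length());
    }
}
```

With the real API, each chunk would be the text extracted from one response part inside the for loop shown above.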

Let’s chat!

Gemini is a multimodal model that works both as a text generation model and as a chat model. You can therefore chat with Gemini and ask a series of questions in context. A handy ChatSession utility class simplifies the handling of the conversation:

try (VertexAI vertexAI = new VertexAI(projectId, location)) {
    GenerateContentResponse response;

    GenerativeModel model = new GenerativeModel(modelName, vertexAI);
    ChatSession chatSession = new ChatSession(model);

    response = chatSession.sendMessage("Hello.");
    System.out.println(ResponseHandler.getText(response));

    response = chatSession.sendMessage("What is the capital of Tunisia?");
    System.out.println(ResponseHandler.getText(response));

    response = chatSession.sendMessage("Are many Tunisians talented in IT?");
    System.out.println(ResponseHandler.getText(response));
}

ChatSession is convenient because it keeps track of the user’s past questions and the assistant’s answers.
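To make the idea of history tracking concrete, here is a minimal sketch of a chat wrapper that records every turn and hands the whole conversation to the model on each call. It is an illustration only, not how ChatSession is implemented: HistoryKeepingChat, Turn, and the function standing in for the model are all hypothetical, and the "user" and "model" role labels are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of a chat wrapper that remembers the conversation. */
final class HistoryKeepingChat {
    /** One message in the conversation, tagged with who said it. */
    static final class Turn {
        final String role;
        final String text;

        Turn(String role, String text) {
            this.role = role;
            this.text = text;
        }
    }

    private final List<Turn> history = new ArrayList<>();
    // Stand-in for a model call: receives the whole conversation, returns a reply
    private final Function<List<Turn>, String> model;

    HistoryKeepingChat(Function<List<Turn>, String> model) {
        this.model = model;
    }

    String sendMessage(String text) {
        history.add(new Turn("user", text));
        // The model sees every earlier turn, which is what gives it context
        String reply = model.apply(Collections.unmodifiableList(history));
        history.add(new Turn("model", reply));
        return reply;
    }

    List<Turn> getHistory() {
        return Collections.unmodifiableList(history);
    }

    public static void main(String[] args) {
        // Fake model that only reports how many turns it was given
        HistoryKeepingChat chat =
            new HistoryKeepingChat(turns -> "I have seen " + turns.size() + " turn(s) so far.");
        System.out.println(chat.sendMessage("Hello."));                          // 1 turn
        System.out.println(chat.sendMessage("What is the capital of Tunisia?")); // 3 turns
    }
}
```

The second call reports three turns because the first question and its answer are already in the history when the second question is added.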

Going further

These are just a few examples of Gemini’s capabilities. Be sure to check out the samples available on GitHub, and read more about Gemini and Generative AI in the Google Cloud documentation.

Author

  • Mohamed BEN HASSINE

    Mohamed BEN HASSINE is a hands-on Cloud Solution Architect based in France. He has been working with Java, Web, API, and Cloud technologies for over 12 years and remains keen to learn new things. He currently works as a Cloud / Application Architect in Paris, designing cloud-native solutions and APIs (REST, gRPC) using cutting-edge technologies (GCP / Kubernetes / APIGEE / Java / Python).
