Blockchain Council

Gemini 3.5 Flash API Tutorial: Authentication, Rate Limits, and Example Requests

Suyash Raizada

This tutorial is a practical guide to the Gemini 3.5 Flash API for developers who want fast, cost-efficient text generation and chat experiences through Google's Gemini API. Flash models are designed for low latency and high throughput, making them a strong fit for chatbots, interactive tools, and high-volume content workflows. Gemini 3.5 Flash follows the same REST and SDK mechanics as earlier Gemini Flash generations, so you can reuse proven patterns for authentication, quota management, and request structure.

What is Gemini 3.5 Flash, and How Do You Access It?

Gemini is a family of models that has progressed from 1.0 and 1.5 through 2.0 and into the 3.x generation. Within each generation, Flash variants prioritize speed and throughput, typically at a lower cost than more capable tiers like Pro or Ultra. Google exposes these models through a unified set of endpoints where the primary difference is the model ID you specify, such as gemini-1.5-flash, gemini-2.0-flash, and gemini-3.5-flash.

You can access Gemini 3.5 Flash in two main ways:

  • Gemini API (AI Studio API key): best for individual developers, prototypes, and quick integrations.
  • Vertex AI (Google Cloud credentials): best for enterprise workloads requiring IAM, centralized governance, and quota management.

Both options support client libraries and raw REST calls. The canonical REST pattern uses endpoints like /v1/models/{model}:generateContent, with /v1/models/{model}:streamGenerateContent as the corresponding streaming endpoint.

Authentication for the Gemini 3.5 Flash API

Your authentication choice depends on whether you are building a prototype or running production workloads with governance requirements. A common progression is to start with an AI Studio key, then migrate to Vertex AI as the application matures.

Option 1: AI Studio API Key (Gemini API)

To use the Gemini API, create an API key in Google AI Studio and store it securely. For local development, set it as an environment variable. Google recognizes these variable names:

  • GEMINI_API_KEY
  • GOOGLE_API_KEY (takes precedence if both are set)

Example (macOS or Linux):

export GEMINI_API_KEY="your_api_key_here"

Security note: do not commit API keys to source control. Use a secrets manager in CI and production environments.
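
The precedence rule above can be sketched in plain Python. This is a minimal illustration of the lookup order, not the SDK's own implementation; the helper name resolve_api_key is hypothetical, and it fails fast so a missing key is caught at startup, not on the first request.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, preferring GOOGLE_API_KEY over GEMINI_API_KEY.

    Hypothetical helper that mirrors the precedence described above.
    """
    env = os.environ if env is None else env
    for name in ("GOOGLE_API_KEY", "GEMINI_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError("Set GEMINI_API_KEY or GOOGLE_API_KEY before starting the app.")
```

Accepting the environment as a parameter keeps the lookup easy to test without touching real secrets.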

Using the API Key with the Python SDK

Google's GenAI SDKs can auto-detect your key from environment variables, which simplifies deployment configuration. The Python example below passes the key explicitly to make the dependency visible:

pip install -U google-genai

import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Explain how rate limiting works in the Gemini API."
)

print(response.text)

Using the API Key with the Node.js SDK

Node.js example using @google/genai:

npm install @google/genai

import { GoogleGenAI } from "@google/genai";

// The @google/genai package exports GoogleGenAI.
const client = new GoogleGenAI({
  apiKey: process.env.GEMINI_API_KEY,
});

From here, you call the SDK's generate methods in the same way as Python, passing a model name and your prompt contents.

Option 2: Vertex AI Authentication (Google Cloud Credentials)

For enterprise environments, Vertex AI is generally preferred because it integrates with IAM, service accounts, logging, and quota governance. A standard setup uses Application Default Credentials (ADC):

  1. Create a Google Cloud project and enable the relevant Gemini or Vertex AI APIs.
  2. Create a service account with a role such as Vertex AI User.
  3. Download the JSON key file for local testing, or use workload identity in production.
  4. Set GOOGLE_APPLICATION_CREDENTIALS to the JSON key path.
  5. Set GOOGLE_CLOUD_PROJECT (or the equivalent project ID variable) for your tooling.

This approach reduces reliance on developer-managed API keys and supports centralized access controls.
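
Once ADC is configured, the Python SDK can be pointed at Vertex AI with no API key at all. The sketch below assumes the google-genai package and its vertexai, project, and location client arguments; the helper names and the us-central1 default region are illustrative assumptions, so pick a region where the model is actually offered.

```python
import os
from typing import Mapping, Optional


def vertex_client_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for genai.Client in Vertex AI mode."""
    env = os.environ if env is None else env
    project = env.get("GOOGLE_CLOUD_PROJECT")
    if not project:
        raise RuntimeError("GOOGLE_CLOUD_PROJECT is not set.")
    # The default region is an assumption made for this sketch.
    location = env.get("GOOGLE_CLOUD_LOCATION", "us-central1")
    return {"vertexai": True, "project": project, "location": location}


def make_vertex_client():
    """Create a client that authenticates through Application Default Credentials."""
    from google import genai  # imported lazily; requires `pip install google-genai`

    return genai.Client(**vertex_client_kwargs())
```

Calls on the returned client use the same generate_content pattern as the API key examples; only the credentials and quota source change.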

Gemini 3.5 Flash Rate Limits and Quotas

Rate limits for Gemini 3.5 Flash are quota-based rather than a single fixed threshold applied to all users. Limits vary based on several factors:

  • Billing status (free tier vs. paid)
  • Channel (AI Studio Gemini API vs. Vertex AI)
  • Model tier (Flash vs. Pro vs. Ultra)
  • Region and project configuration

In practice, you will encounter quotas expressed as requests per minute (RPM) and tokens per minute (TPM). Flash models are designed for higher throughput than Pro or Ultra, which is why they are commonly used for chat and high-volume workloads.
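
To stay under both budgets proactively, a client can track its own usage over a sliding one-minute window. The following is a minimal sketch: the class name is hypothetical, the limits you pass in are whatever your project's quota page shows, and the token counts are your own estimates, not values reported by the API.

```python
import time
from collections import deque


class MinuteWindowLimiter:
    """Tracks requests and tokens over a sliding 60-second window."""

    def __init__(self, rpm: int, tpm: int, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) for each admitted request

    def _evict(self, now: float) -> None:
        # Drop events that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60.0:
            self.events.popleft()

    def try_acquire(self, tokens: int) -> bool:
        """Admit the request only if both the RPM and TPM budgets allow it."""
        now = self.clock()
        self._evict(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True
```

A caller that gets False can queue the request or wait briefly before trying again, which avoids spending a round trip on a call that would likely be rejected.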

What Happens When You Exceed Rate Limits?

When you exceed rate limits, the API returns an HTTP 429 Too Many Requests error with a message indicating quota exhaustion. Your client should handle 429 responses by implementing:

  • Exponential backoff with jitter
  • Retry budgets to prevent retry storms
  • Queueing for bursty traffic, especially in multi-tenant backends
  • Request shaping such as batching where appropriate
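
The first two items can be sketched together as exponential backoff with full jitter and a per-call retry cap. RateLimitError is a hypothetical stand-in for whatever exception your HTTP client or SDK raises on a 429, and the default delays are illustrative, not recommended values.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=32.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on RateLimitError with full-jitter exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # the retry cap for this call is spent
            # Full jitter: sleep a random time in [0, min(max_delay, base * 2^attempt)).
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * ceiling)
```

Randomizing the whole delay spreads retries from many clients across the interval, so they do not all hit the API again at the same instant. A process-wide retry budget would add a shared counter on top of this per-call cap.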

How to Check and Manage Your Quotas

  • AI Studio: review the usage dashboard to see current consumption and remaining quota.
  • Google Cloud Console (Vertex AI): review quotas for Vertex AI and Gemini-related services, and submit quota increase requests for production workloads.

If you are building a multi-tenant API, plan your quota strategy early. A common architecture implements per-tenant throttles in your backend so that one tenant cannot exhaust the entire project quota.
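
One way to implement per-tenant throttles is a token bucket per tenant, checked before a request is forwarded to Gemini. This is a sketch under simple assumptions: a single process, in-memory state, and a hypothetical TenantThrottle class. A multi-instance backend would need shared storage for the buckets.

```python
import time


class TenantThrottle:
    """Per-tenant token buckets that sit in front of a shared project quota."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        self.buckets = {}  # tenant_id -> (tokens_remaining, last_refill_time)

    def allow(self, tenant_id: str) -> bool:
        """Return True and spend one token if the tenant has capacity."""
        now = self.clock()
        tokens, last = self.buckets.get(tenant_id, (float(self.burst), now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens < 1.0:
            self.buckets[tenant_id] = (tokens, now)
            return False
        self.buckets[tenant_id] = (tokens - 1.0, now)
        return True
```

Setting each tenant's rate so that the sum stays below the project quota leaves headroom, and a noisy tenant is rejected by your backend before it consumes shared capacity.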

Example REST Requests for Gemini 3.5 Flash

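The core REST call for text generation is a POST to the generateContent method. The example below passes the API key as the key query parameter; the API also accepts it in an x-goog-api-key header, which keeps the key out of URLs and access logs.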

Text Generation with curl

curl \
  -X POST \
  -H "Content-Type: application/json" \
  "https://generativelanguage.googleapis.com/v1/models/gemini-3.5-flash:generateContent?key=${GEMINI_API_KEY}" \
  -d '{
    "contents": [
      {
        "parts": [
          { "text": "Summarize the key differences between Gemini Flash and Pro models." }
        ]
      }
    ],
    "generationConfig": {
      "temperature": 0.7,
      "maxOutputTokens": 512
    }
  }'

The response includes a candidates array containing generated content parts. Most applications read the first candidate and extract the text field for display.
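
Reading the first candidate can be sketched as a small helper over the parsed JSON body. The function name is hypothetical. It returns an empty string when no candidates are present, which may happen when a request is blocked, so the caller can decide how to handle that case.

```python
def first_candidate_text(response: dict) -> str:
    """Join the text parts of the first candidate, or return "" if there are none."""
    candidates = response.get("candidates") or []
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts)


# Illustrative payload in the shape described above, not real model output.
sample = {
    "candidates": [
        {"content": {"role": "model", "parts": [{"text": "Flash is "}, {"text": "faster."}]}}
    ]
}
print(first_candidate_text(sample))
```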

Streaming Responses for Chat UIs

Streaming suits interactive applications where you want to render output as it is generated. In Python, the SDK wraps the streaming endpoint in a convenient iterator:

from google import genai

client = genai.Client()

stream = client.models.generate_content_stream(
    model="gemini-3.5-flash",
    contents="Write a short tutorial outline for Gemini 3.5 Flash."
)

for chunk in stream:
    if chunk.text:
        print(chunk.text, end="", flush=True)

For production UIs, consider buffering and emitting partial tokens at sensible intervals to avoid UI thrashing.
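
A minimal sketch of that buffering idea is a generator that groups small chunks until a size threshold is reached. The function name and the 40-character default are assumptions for illustration; a time-based flush interval is an equally reasonable choice.

```python
from typing import Iterable, Iterator


def coalesce_chunks(chunks: Iterable[str], min_chars: int = 40) -> Iterator[str]:
    """Group small text chunks so the UI re-renders less often."""
    buffer = []
    size = 0
    for text in chunks:
        if not text:
            continue  # skip empty chunks
        buffer.append(text)
        size += len(text)
        if size >= min_chars:
            yield "".join(buffer)
            buffer, size = [], 0
    if buffer:
        yield "".join(buffer)  # flush whatever is left when the stream ends
```

With the streaming iterator shown earlier, this could be used as coalesce_chunks(chunk.text for chunk in stream if chunk.text), with each yielded string sent to the UI as one update.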

Multi-Turn Chat Example

For conversational workflows, use the SDK chat helper to maintain conversation state on the client side and send history with each request:

from google import genai

client = genai.Client()
chat = client.chats.create(model="gemini-3.5-flash")

while True:
    user_input = input("You: ")
    if user_input.lower() in {"exit", "quit"}:
        break
    response = chat.send_message(user_input)
    print("Model:", response.text)

This pattern serves as a baseline for assistants, internal help desks, and customer support bots where Flash's low latency directly affects user experience.
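
If you call the REST endpoint directly, with no chat helper, you keep that history yourself. The sketch below builds the contents list in the role and parts shape used by generateContent; the append_turn helper is hypothetical and only illustrates the bookkeeping.

```python
from typing import Dict, List

Turn = Dict[str, object]


def append_turn(history: List[Turn], role: str, text: str) -> List[Turn]:
    """Return a new history list with one more turn in the REST `contents` shape."""
    if role not in ("user", "model"):
        raise ValueError("role must be 'user' or 'model'")
    return history + [{"role": role, "parts": [{"text": text}]}]


history: List[Turn] = []
history = append_turn(history, "user", "What is a rate limit?")
history = append_turn(history, "model", "A cap on requests per time window.")
history = append_turn(history, "user", "How do I avoid hitting it?")
# The full list is sent as "contents" on every request.
request_body = {"contents": history}
```

Because the whole history is resent each turn, long conversations consume more input tokens per request, which counts against TPM quotas.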

Tools, Grounding, and Multimodal Inputs

Beyond plain text, the Gemini API supports capabilities such as Google Search grounding, code execution, URL context, file search, and computer use. A simple grounding example with Google Search:

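from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="What is the latest guidance on EU AI regulation?",
    # Tools go in the request config, not as a top-level argument.
    config=types.GenerateContentConfig(tools=[types.Tool(google_search=types.GoogleSearch())]),
)
print(response.text)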

When designing production workflows with tools, treat them as external dependencies: log tool calls, track latency, and apply allowlists to reduce risk.
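
For tools that your own backend executes, such as function-calling handlers, the allowlist and latency logging can live in one thin wrapper. This sketch does not apply to built-in tools that run on Google's side; the tool names, the run_tool helper, and the logger name are all hypothetical.

```python
import logging
import time

logger = logging.getLogger("tool_calls")

ALLOWED_TOOLS = {"lookup_order", "search_docs"}  # illustrative allowlist


def run_tool(name, fn, *args, allowed=ALLOWED_TOOLS, **kwargs):
    """Run a tool only if it is allowlisted, and log how long it took."""
    if name not in allowed:
        raise PermissionError(f"tool {name!r} is not on the allowlist")
    start = time.perf_counter()
    try:
        return fn(*args, **kwargs)
    finally:
        # Logged on success and on failure, so slow or failing tools are visible.
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("tool=%s latency_ms=%.1f", name, elapsed_ms)
```

Routing every tool invocation through one function gives a single place to add timeouts or metrics later.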

Practical Deployment Checklist

  • Choose the right authentication path: AI Studio key for prototypes, Vertex AI for enterprise controls.
  • Centralize secrets: use a secrets manager, rotate keys regularly, and limit access scope.
  • Handle rate limits properly: implement retries with exponential backoff for 429 errors, plus queueing and per-tenant throttles.
  • Monitor usage: track RPM, TPM, latency, and error rates in production.
  • Choose Flash vs. Pro deliberately: Flash for throughput and responsiveness, Pro for tasks requiring deeper reasoning.

Conclusion

This Gemini 3.5 Flash API tutorial covered the core elements needed to build reliable integrations: authentication via AI Studio API keys or Vertex AI credentials, quota-driven rate limits with 429 handling strategies, and concrete REST and SDK examples for text generation, streaming, and multi-turn chat. Because Gemini 3.5 Flash shares the same endpoint patterns as other Gemini Flash models, you can begin with the documented request formats immediately and scale using Vertex AI quotas and IAM as your workload grows.

For teams looking to formalize AI development skills and secure implementation practices, Blockchain Council offers structured training programs in AI development, prompt engineering, and cybersecurity that support production-ready standards across engineering organizations.
