Vertex AI

local LLM inference via Ollama, zero cloud bills

Why

Every Vertex AI API call costs money. Every prompt iteration, every integration test, every debug session burns credits. localgcp proxies your Vertex AI calls to local models running via Ollama. The official google.golang.org/genai SDK works unchanged -- just set the BaseURL to localgcp. No API keys, no quotas, no bills.

"But I can already use Ollama directly?"

You can. Ollama is great. But then your local code uses Ollama's API, and your production code uses the Vertex AI SDK. Two different APIs, two different code paths, two different sets of bugs to find in production.

With localgcp, your production code is your test code:

// Production: hits Vertex AI on Google Cloud
client, _ := genai.NewClient(ctx, prodConfig)
resp, _ := client.Models.GenerateContent(ctx, "gemini-2.5-flash", prompt, nil)

// Local dev: hits localgcp -> Ollama -> Gemma running on your laptop
client, _ := genai.NewClient(ctx, localConfig)  // only BaseURL differs
resp, _ := client.Models.GenerateContent(ctx, "gemini-2.5-flash", prompt, nil)

Same SDK. Same model name. Same response parsing. Same error handling. The only difference is one config line: BaseURL: "http://localhost:8090".

This also means your CI/CD pipeline tests the real integration path (SDK call, response parsing, error handling) without API keys or cloud bills. Stub mode returns deterministic responses, so no Ollama is needed in CI.

Prerequisites

Install Ollama and pull a model:

$ brew install ollama     # or download from ollama.com
$ ollama serve            # start the Ollama server
$ ollama pull gemma3      # pull a model

Without Ollama running, localgcp falls back to deterministic stub responses (see Stub Mode below).

Configuration

Flag                 Default                    Description
--port-vertexai      8090                       Vertex AI emulator port
--ollama-host        http://localhost:11434     Ollama API endpoint
--vertex-model-map   (built-in defaults)        Map Vertex model names to Ollama models

$ localgcp up \
    --ollama-host=http://localhost:11434 \
    --vertex-model-map="gemini-2.5-flash=llama3.2,text-embedding-004=nomic-embed-text"

Go SDK example

This is a complete, runnable example using the official google.golang.org/genai SDK:

package main

import (
    "context"
    "fmt"
    "log"

    "google.golang.org/genai"
)

func main() {
    ctx := context.Background()

    client, err := genai.NewClient(ctx, &genai.ClientConfig{
        Project:  "my-project",
        Location: "us-central1",
        Backend:  genai.BackendVertexAI,
        HTTPOptions: genai.HTTPOptions{
            BaseURL: "http://localhost:8090",
        },
    })
    if err != nil {
        log.Fatal(err)
    }

    resp, err := client.Models.GenerateContent(ctx,
        "gemini-2.5-flash",
        genai.Text("Explain quantum computing in one sentence"),
        nil,
    )
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(resp.Candidates[0].Content.Parts[0].Text)
    // Response comes from Ollama (e.g. llama3.2) running locally
}

Key point: The only change from production code is setting BaseURL. Your model name (gemini-2.5-flash) is automatically mapped to the local Ollama model via --vertex-model-map.

Model alias table

Map Vertex AI model names to Ollama models with --vertex-model-map:

Vertex AI Model      Ollama Model        Use Case
gemini-2.5-flash     llama3.2            Fast text generation
gemini-2.5-pro       gemma3              Higher quality generation
gemini-2.0-flash     llama3.2            Legacy model alias
text-embedding-004   nomic-embed-text    Text embeddings

Override defaults with comma-separated pairs:

$ localgcp up --vertex-model-map="gemini-2.5-flash=mistral,gemini-2.5-pro=llama3.2:70b"
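The flag value is plain key=value pairs joined by commas. An illustrative parser (not localgcp's actual implementation) shows how such a value resolves to a lookup table:

```go
package main

import (
	"fmt"
	"strings"
)

// parseModelMap splits "a=b,c=d" into {"a": "b", "c": "d"} -- the same
// pair syntax --vertex-model-map accepts. This is a sketch for
// illustration, not localgcp's own code.
func parseModelMap(s string) map[string]string {
	m := make(map[string]string)
	for _, pair := range strings.Split(s, ",") {
		if k, v, ok := strings.Cut(pair, "="); ok {
			m[k] = v
		}
	}
	return m
}

func main() {
	m := parseModelMap("gemini-2.5-flash=mistral,gemini-2.5-pro=llama3.2:70b")
	fmt.Println(m["gemini-2.5-flash"], m["gemini-2.5-pro"]) // prints mistral llama3.2:70b
}
```

Any Vertex model name not listed in the flag falls back to the built-in defaults shown in the table above.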

Embeddings example

package main

import (
    "context"
    "fmt"
    "log"

    "google.golang.org/genai"
)

func main() {
    ctx := context.Background()

    client, err := genai.NewClient(ctx, &genai.ClientConfig{
        Project:  "my-project",
        Location: "us-central1",
        Backend:  genai.BackendVertexAI,
        HTTPOptions: genai.HTTPOptions{
            BaseURL: "http://localhost:8090",
        },
    })
    if err != nil {
        log.Fatal(err)
    }

    resp, err := client.Models.EmbedContent(ctx,
        "text-embedding-004",
        genai.Text("What is the meaning of life?"),
        nil,
    )
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Embedding dimensions: %d\n", len(resp.Embeddings[0].Values))
    // Proxied to nomic-embed-text via Ollama
}

Stub mode for CI/CD

When Ollama is not running, localgcp automatically returns deterministic stub responses. This is ideal for CI/CD pipelines that need to test Vertex AI integration code without running a model:

# CI/CD: just start localgcp, no Ollama needed
$ localgcp up &
$ go test ./...   # Vertex AI calls return stub responses

Limitations

See the roadmap for upcoming features.