# Vertex AI

## Why
Every Vertex AI API call costs money. Every prompt iteration, every integration test, every debug session burns credits. localgcp proxies your Vertex AI calls to local models running via Ollama. The official google.golang.org/genai SDK works unchanged -- just set the BaseURL to localgcp. No API keys, no quotas, no bills.
### "Can't I just use Ollama directly?"
You can. Ollama is great. But then your local code uses Ollama's API, and your production code uses the Vertex AI SDK. Two different APIs, two different code paths, two different sets of bugs to find in production.
With localgcp, your production code is your test code:
```go
// Production: hits Vertex AI on Google Cloud
client, _ := genai.NewClient(ctx, prodConfig)
resp, _ := client.Models.GenerateContent(ctx, "gemini-2.5-flash", prompt, nil)

// Local dev: hits localgcp -> Ollama -> Gemma running on your laptop
client, _ := genai.NewClient(ctx, localConfig) // only BaseURL differs
resp, _ := client.Models.GenerateContent(ctx, "gemini-2.5-flash", prompt, nil)
```
Same SDK. Same model name. Same response parsing. Same error handling. The only difference is one config line: `BaseURL: "http://localhost:8090"`.
This also means your CI/CD pipeline tests the real integration path (SDK call, response parsing, error handling) without API keys or cloud bills. Stub mode returns deterministic responses, no Ollama needed in CI.
## Prerequisites
Install Ollama and pull a model:
```sh
$ brew install ollama   # or download from ollama.com
$ ollama serve          # start the Ollama server
$ ollama pull gemma3    # pull a model
```
Without Ollama running, localgcp falls back to deterministic stub responses (see Stub Mode below).
## Configuration
| Flag | Default | Description |
|---|---|---|
| `--port-vertexai` | `8090` | Vertex AI emulator port |
| `--ollama-host` | `http://localhost:11434` | Ollama API endpoint |
| `--vertex-model-map` | (built-in defaults) | Map Vertex model names to Ollama models |
```sh
$ localgcp up \
    --ollama-host=http://localhost:11434 \
    --vertex-model-map="gemini-2.5-flash=llama3.2,text-embedding-004=nomic-embed-text"
```
## Go SDK example
This is a complete, runnable example using the official google.golang.org/genai SDK:
```go
package main

import (
	"context"
	"fmt"
	"log"

	"google.golang.org/genai"
)

func main() {
	ctx := context.Background()
	client, err := genai.NewClient(ctx, &genai.ClientConfig{
		Project:  "my-project",
		Location: "us-central1",
		Backend:  genai.BackendVertexAI,
		HTTPOptions: genai.HTTPOptions{
			BaseURL: "http://localhost:8090",
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	resp, err := client.Models.GenerateContent(ctx,
		"gemini-2.5-flash",
		genai.Text("Explain quantum computing in one sentence"),
		nil,
	)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(resp.Candidates[0].Content.Parts[0].Text)
	// Response comes from Ollama (e.g. llama3.2) running locally
}
```
The only localgcp-specific setting is the `BaseURL`. Your model name (`gemini-2.5-flash`) is automatically mapped to the local Ollama model via `--vertex-model-map`.
## Model alias table
Map Vertex AI model names to Ollama models with `--vertex-model-map`:
| Vertex AI Model | Ollama Model | Use Case |
|---|---|---|
| `gemini-2.5-flash` | `llama3.2` | Fast text generation |
| `gemini-2.5-pro` | `gemma3` | Higher quality generation |
| `gemini-2.0-flash` | `llama3.2` | Legacy model alias |
| `text-embedding-004` | `nomic-embed-text` | Text embeddings |
Override defaults with comma-separated pairs:
```sh
$ localgcp up --vertex-model-map="gemini-2.5-flash=mistral,gemini-2.5-pro=llama3.2:70b"
```
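If you generate or validate these mappings in your own scripts, the flag's value is just comma-separated `vertex=ollama` pairs. A stdlib-only Go sketch of a parser for that format (`parseModelMap` is an illustrative helper, not a localgcp API):

```go
package main

import (
	"fmt"
	"strings"
)

// parseModelMap splits a --vertex-model-map value
// ("vertexName=ollamaName,...") into a lookup table.
func parseModelMap(s string) map[string]string {
	m := make(map[string]string)
	for _, pair := range strings.Split(s, ",") {
		if vertex, ollama, ok := strings.Cut(pair, "="); ok {
			m[strings.TrimSpace(vertex)] = strings.TrimSpace(ollama)
		}
	}
	return m
}

func main() {
	m := parseModelMap("gemini-2.5-flash=mistral,gemini-2.5-pro=llama3.2:70b")
	fmt.Println(m["gemini-2.5-flash"]) // mistral
}
```

Note that `strings.Cut` splits on the first `=`, so Ollama tags containing a colon (like `llama3.2:70b`) pass through intact.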
## Embeddings example
```go
package main

import (
	"context"
	"fmt"
	"log"

	"google.golang.org/genai"
)

func main() {
	ctx := context.Background()
	client, err := genai.NewClient(ctx, &genai.ClientConfig{
		Project:  "my-project",
		Location: "us-central1",
		Backend:  genai.BackendVertexAI,
		HTTPOptions: genai.HTTPOptions{
			BaseURL: "http://localhost:8090",
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	resp, err := client.Models.EmbedContent(ctx,
		"text-embedding-004",
		genai.Text("What is the meaning of life?"),
		nil,
	)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("Embedding dimensions: %d\n", len(resp.Embeddings[0].Values))
	// Proxied to nomic-embed-text via Ollama
}
```
## Stub mode for CI/CD
When Ollama is not running, localgcp automatically returns deterministic stub responses. This is ideal for CI/CD pipelines that need to test Vertex AI integration code without running a model:
- `generateContent` returns a fixed text response
- `embedContent` returns a fixed-dimension embedding vector
- No Ollama installation needed
- Responses are deterministic and fast
```sh
# CI/CD: just start localgcp, no Ollama needed
$ localgcp up &
$ go test ./...   # Vertex AI calls return stub responses
```
## Limitations
- No streaming (`streamGenerateContent`) -- responses are returned in full
- No tool/function calling
- No multimodal input (images, audio, video)
- No multi-provider backends (OpenAI, Anthropic) -- Ollama only
- No system instructions or safety settings
See the roadmap for upcoming features.