M.010.20 — Android Google Lens Integration (Execution Spec)
Date: 2026-04-04
Owner: Copilot execution lane
Status: Ready for implementation
Goal
Integrate Google Lens-style visual understanding into HoloScript mobile flow so users can point at real objects, extract semantic labels/relations, and inject those as typed entities into a .holo scene graph.
Product behavior
- User opens camera in mobile authoring mode.
- User points at an object and taps Understand.
- Vision service returns:
  - object label(s)
  - confidence
  - optional text/OCR
  - coarse spatial bounds
- System maps result to HoloScript semantic nodes (object, tag, metadata).
- Scene graph updates in real time and can be edited before commit.
Door-opening impact
- Lowers scene-authoring friction: users start from a semantic bootstrap instead of a blank canvas.
- Converts real world context into structured holographic primitives.
- Creates a practical “AI-to-Holo” pipeline useful for demos, enterprise PoCs, and onboarding.
Scope
In scope (v1)
- Mobile camera capture + analysis trigger
- Vision-to-scene mapping for top-k object labels
- Confidence-aware insertion and user review UI
- .holo export path with attached semantic metadata
Out of scope (v1)
- Full segmentation mesh extraction
- Continuous always-on analysis stream
- Multi-frame SLAM-grade object persistence
Data contracts
```ts
interface LensDetection {
  id: string;
  label: string;
  confidence: number; // 0..1
  boundingBox?: { x: number; y: number; w: number; h: number };
  text?: string;
  attributes?: Record<string, string | number | boolean>;
}

interface SceneSemanticInsertion {
  nodeId: string;
  kind: 'object' | 'tag' | 'annotation';
  sourceDetectionId: string;
  label: string;
  confidence: number;
  metadata: Record<string, unknown>;
}
```
Technical architecture
1) Mobile capture adapter
- Capture frame on user action
- Normalize image dimensions + orientation
- Route to configured analyzer backend (Lens-compatible interface)
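The normalization step above can be sketched as a pure function that computes the upright, downscaled frame size before the image is sent to the analyzer. This is a minimal sketch, not the capture adapter itself: the names FrameInfo, NormalizedFrame, normalizeFrame, and the 1024-pixel default are assumptions for illustration, and the orientation values follow the standard EXIF orientation tag (1 to 8).

```typescript
// Hypothetical types; none of these names come from the HoloScript codebase.
type ExifOrientation = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8;

interface FrameInfo {
  width: number; // stored width in pixels
  height: number; // stored height in pixels
  orientation: ExifOrientation;
}

interface NormalizedFrame {
  width: number; // upright width after downscaling
  height: number; // upright height after downscaling
  scale: number; // factor applied to the upright dimensions
}

// EXIF orientations 5 to 8 involve a 90 degree rotation, so width and height swap.
function normalizeFrame(frame: FrameInfo, maxSide = 1024): NormalizedFrame {
  const rotated = frame.orientation >= 5;
  const uprightW = rotated ? frame.height : frame.width;
  const uprightH = rotated ? frame.width : frame.height;
  // Only downscale; small frames are never upscaled.
  const scale = Math.min(1, maxSide / Math.max(uprightW, uprightH));
  return {
    width: Math.round(uprightW * scale),
    height: Math.round(uprightH * scale),
    scale,
  };
}
```

Keeping this step free of camera APIs makes the "Android capture orientation correctness" check easy to cover with plain unit tests.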
2) Analyzer gateway
- Adapter interface for provider abstraction
- Returns canonical LensDetection[]
- Applies confidence threshold + dedup
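One way the gateway could look is sketched below. The AnalyzerProvider interface, the canonicalize and analyzeFrame functions, and the 0.5 default threshold are all hypothetical. Deduplicating by case-insensitive label is also an assumption; an overlap-based rule on bounding boxes would be an equally valid choice.

```typescript
interface LensDetection {
  id: string;
  label: string;
  confidence: number; // 0..1
  boundingBox?: { x: number; y: number; w: number; h: number };
  text?: string;
  attributes?: Record<string, string | number | boolean>;
}

// Hypothetical provider abstraction; each backend would implement analyze().
interface AnalyzerProvider {
  readonly name: string;
  analyze(image: Uint8Array): Promise<LensDetection[]>;
}

// Drop detections below the threshold, then keep only the
// highest-confidence detection per case-insensitive label.
function canonicalize(detections: LensDetection[], minConfidence = 0.5): LensDetection[] {
  const best = new Map<string, LensDetection>();
  for (const d of detections) {
    if (d.confidence < minConfidence) continue;
    const key = d.label.trim().toLowerCase();
    const current = best.get(key);
    if (current === undefined || d.confidence > current.confidence) {
      best.set(key, d);
    }
  }
  // Highest confidence first, so callers can take the top-k directly.
  return Array.from(best.values()).sort((a, b) => b.confidence - a.confidence);
}

// The gateway entry point: call the provider, then canonicalize its output.
async function analyzeFrame(
  provider: AnalyzerProvider,
  image: Uint8Array,
  minConfidence = 0.5,
): Promise<LensDetection[]> {
  const raw = await provider.analyze(image);
  return canonicalize(raw, minConfidence);
}
```

Because canonicalize is synchronous and provider-independent, the unit tests for "detection canonicalization and threshold filtering" would not need a live backend.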
3) Semantic mapper
- Maps detections into scene entities and tags
- Emits staged insertions in review buffer
- Supports accept/reject per insertion
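The mapper and review buffer could be sketched as follows. This is an illustration under stated assumptions: mapDetection, ReviewBuffer, the nodeId naming scheme, and the rule that OCR text becomes a separate annotation insertion are all hypothetical and not taken from the codebase.

```typescript
interface LensDetection {
  id: string;
  label: string;
  confidence: number; // 0..1
  boundingBox?: { x: number; y: number; w: number; h: number };
  text?: string;
  attributes?: Record<string, string | number | boolean>;
}

interface SceneSemanticInsertion {
  nodeId: string;
  kind: 'object' | 'tag' | 'annotation';
  sourceDetectionId: string;
  label: string;
  confidence: number;
  metadata: Record<string, unknown>;
}

type ReviewState = 'pending' | 'accepted' | 'rejected';

// Assumed mapping: one 'object' per detection, plus an 'annotation' when OCR text exists.
function mapDetection(d: LensDetection): SceneSemanticInsertion[] {
  const out: SceneSemanticInsertion[] = [{
    nodeId: `node-${d.id}`,
    kind: 'object',
    sourceDetectionId: d.id,
    label: d.label,
    confidence: d.confidence,
    metadata: { ...(d.attributes ?? {}), boundingBox: d.boundingBox },
  }];
  if (d.text !== undefined && d.text.length > 0) {
    out.push({
      nodeId: `node-${d.id}-text`,
      kind: 'annotation',
      sourceDetectionId: d.id,
      label: d.label,
      confidence: d.confidence,
      metadata: { text: d.text },
    });
  }
  return out;
}

// Staged insertions stay here until the user accepts or rejects each one.
class ReviewBuffer {
  private readonly entries = new Map<string, { insertion: SceneSemanticInsertion; state: ReviewState }>();

  stage(insertions: SceneSemanticInsertion[]): void {
    for (const insertion of insertions) {
      this.entries.set(insertion.nodeId, { insertion, state: 'pending' });
    }
  }

  // Returns false when the nodeId is not staged.
  decide(nodeId: string, state: 'accepted' | 'rejected'): boolean {
    const entry = this.entries.get(nodeId);
    if (entry === undefined) return false;
    entry.state = state;
    return true;
  }

  accepted(): SceneSemanticInsertion[] {
    return Array.from(this.entries.values())
      .filter((e) => e.state === 'accepted')
      .map((e) => e.insertion);
  }
}
```

Only the output of accepted() would be handed to the commit step, so nothing reaches the scene graph without an explicit user decision.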
4) Scene graph commit
- Accepted insertions merged into active .holo composition
- Adds provenance metadata:
  - analyzer provider
  - detection timestamp
  - confidence score
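A conflict-safe commit might look like the sketch below. The real .holo composition model is not shown in this spec, so SceneNode, CommitResult, and commitInsertions are hypothetical stand-ins, and the scene is modeled as a simple map keyed by node id. The merge works on a copy, so a failed or conflicting commit leaves the existing scene untouched.

```typescript
interface SceneSemanticInsertion {
  nodeId: string;
  kind: 'object' | 'tag' | 'annotation';
  sourceDetectionId: string;
  label: string;
  confidence: number;
  metadata: Record<string, unknown>;
}

// Hypothetical minimal scene node, used only for this illustration.
interface SceneNode {
  id: string;
  kind: SceneSemanticInsertion['kind'];
  label: string;
  metadata: Record<string, unknown>;
}

interface CommitResult {
  scene: Map<string, SceneNode>; // new scene; the input scene is never mutated
  conflicts: string[]; // nodeIds skipped because they already exist
}

function commitInsertions(
  scene: ReadonlyMap<string, SceneNode>,
  accepted: SceneSemanticInsertion[],
  provider: string, // analyzer provider name
  detectedAt: string, // detection timestamp, e.g. an ISO 8601 string
): CommitResult {
  const next = new Map<string, SceneNode>(scene);
  const conflicts: string[] = [];
  for (const ins of accepted) {
    if (next.has(ins.nodeId)) {
      conflicts.push(ins.nodeId); // candidate for SCENE_INSERTION_CONFLICT
      continue;
    }
    next.set(ins.nodeId, {
      id: ins.nodeId,
      kind: ins.kind,
      label: ins.label,
      metadata: {
        ...ins.metadata,
        provenance: { provider, detectedAt, confidence: ins.confidence },
      },
    });
  }
  return { scene: next, conflicts };
}
```

Returning conflicts alongside the new scene lets the UI decide whether to surface a conflict to the user or silently skip the duplicate.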
Proposed code touch points
- packages/studio/:
  - mobile camera authoring panel
  - review/approve insertion UX
- packages/core/:
  - semantic insertion schema + scene graph merge utility
- packages/llm-provider/ or adapter layer:
  - analyzer gateway + provider abstraction
Failure taxonomy
- ANALYZER_UNAVAILABLE
- FRAME_CAPTURE_FAILED
- NO_DETECTIONS
- LOW_CONFIDENCE_ONLY
- SCENE_INSERTION_CONFLICT
Each failure should provide actionable UX fallback (retry, manual label entry, skip).
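The taxonomy could be encoded as a string union with a fallback table, so the compiler flags any failure code that lacks a UX fallback. The specific code-to-fallback assignments below are assumptions chosen for illustration, as are the names LensFailureCode, FallbackAction, and fallbacksFor.

```typescript
type LensFailureCode =
  | 'ANALYZER_UNAVAILABLE'
  | 'FRAME_CAPTURE_FAILED'
  | 'NO_DETECTIONS'
  | 'LOW_CONFIDENCE_ONLY'
  | 'SCENE_INSERTION_CONFLICT';

type FallbackAction = 'retry' | 'manual_label' | 'skip';

// Record<LensFailureCode, ...> makes the table exhaustive at compile time.
// The ordering within each list is an assumed priority, first option shown first.
const FALLBACKS: Record<LensFailureCode, FallbackAction[]> = {
  ANALYZER_UNAVAILABLE: ['retry', 'manual_label', 'skip'],
  FRAME_CAPTURE_FAILED: ['retry', 'skip'],
  NO_DETECTIONS: ['manual_label', 'retry', 'skip'],
  LOW_CONFIDENCE_ONLY: ['manual_label', 'retry', 'skip'],
  SCENE_INSERTION_CONFLICT: ['skip', 'manual_label'],
};

function fallbacksFor(code: LensFailureCode): FallbackAction[] {
  // Return a copy so callers cannot mutate the shared table.
  return FALLBACKS[code].slice();
}
```

Adding a sixth failure code to the union without extending the table would then fail type checking rather than surface as a missing fallback at runtime.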
Acceptance criteria
- Single-frame object analysis returns semantic candidates in under 2s on target device/network.
- User can approve/reject candidates before scene mutation.
- Approved candidates appear in scene graph and persist in exported .holo.
- Insertion metadata includes confidence + source provenance.
- Failure states never corrupt existing scene graph.
Test plan
Unit
- detection canonicalization and threshold filtering
- semantic mapping correctness
- conflict-safe insertion merge
Integration
- camera capture -> analyzer -> review buffer -> commit
- analyzer timeout fallback path
- duplicate detection suppression
Device validation
- Android capture orientation correctness
- low-light and cluttered-background behavior
- latency and success-rate sampling
Shipping slices
- Slice A: analyzer gateway + schema + mapper core
- Slice B: mobile review UX + commit pipeline
- Slice C: export/provenance + reliability hardening
Metrics
- lens_analysis_requests_total
- lens_analysis_success_total
- lens_analysis_latency_ms
- semantic_candidates_generated_total
- semantic_candidates_accepted_total
- scene_insert_conflict_total
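As a rough illustration of how these metrics might be recorded, the sketch below keeps counters and latency samples in memory. LensMetrics and its methods are hypothetical; a real deployment would presumably forward to whatever telemetry backend the project already uses. The acceptance rate shown is one derived value the two candidate counters make possible.

```typescript
type CounterName =
  | 'lens_analysis_requests_total'
  | 'lens_analysis_success_total'
  | 'semantic_candidates_generated_total'
  | 'semantic_candidates_accepted_total'
  | 'scene_insert_conflict_total';

// In-memory recorder, for illustration only.
class LensMetrics {
  private readonly counters = new Map<CounterName, number>();
  private readonly latenciesMs: number[] = []; // samples for lens_analysis_latency_ms

  increment(name: CounterName, by = 1): void {
    this.counters.set(name, this.get(name) + by);
  }

  get(name: CounterName): number {
    return this.counters.get(name) ?? 0;
  }

  observeLatency(ms: number): void {
    this.latenciesMs.push(ms);
  }

  // Largest observed latency, or 0 when no samples exist.
  maxLatencyMs(): number {
    return this.latenciesMs.reduce((max, v) => Math.max(max, v), 0);
  }

  // accepted / generated, or 0 when nothing has been generated yet.
  acceptanceRate(): number {
    const generated = this.get('semantic_candidates_generated_total');
    if (generated === 0) return 0;
    return this.get('semantic_candidates_accepted_total') / generated;
  }
}
```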
Definition of done
- End-to-end point-and-understand flow works on Android mobile authoring path
- acceptance criteria pass
- .holo exports contain semantic insertions with provenance
- operator docs include setup, thresholds, and known limitations