
Metadata Support for ADK Part Types #855

@sudiptamodak1987

Feature Request: Metadata Support for ADK Part Types

Summary

We need a mechanism to attach arbitrary key-value metadata to response content (Part) in Google ADK. Currently, no existing Part type supports attaching custom metadata without semantic constraints or validation errors.

Problem Statement

Use Case

We're building a multimodal voice agent where agent responses need to carry supplementary metadata alongside the primary content:

| Metadata Key | Purpose | Example Value |
| --- | --- | --- |
| textForAudio | TTS-optimized version of response (shorter, conversational) | "Found 3 items for you" |
| citations | Source references for grounding | ["doc1.pdf", "kb:article-123"] |
| responseType | Classification for client handling | "answer", "clarification", "handoff" |
| confidence | Model confidence score | 0.92 |

Current Workaround (FunctionResponse)

We attempted to use FunctionResponse as a metadata carrier:

// In agent code - creating metadata-carrying response
Content responseContent = Content.fromParts(
    Part.fromText(displayText),  // Primary response
    Part.fromFunctionResponse(   // Metadata carrier (WORKAROUND)
        FunctionResponse.builder()
            .name("response_metadata")
            .response(Map.of(
                "textForAudio", "TTS optimized text",
                "responseType", "answer"
            ))
            .build()
    )
);

Extraction logic:

public static String extractTextForAudioFromPart(Part part) {
    // No FunctionResponse on this part, or no payload in it: nothing to extract.
    var functionResponse = part.functionResponse().orElse(null);
    if (functionResponse == null || functionResponse.response().isEmpty()) {
        return null;
    }

    // Only return the value when it is actually a String, avoiding a ClassCastException.
    Object responseData = functionResponse.response().orElse(null);
    if (responseData instanceof Map<?, ?> responseMap
            && responseMap.get("textForAudio") instanceof String textForAudio) {
        return textForAudio;
    }
    return null;
}

Why This Workaround Fails

  1. Validation Exception: ADK enforces that a FunctionResponse must be preceded by a matching FunctionCall in the conversation history. Without a matching call, we get:

    IllegalStateException: A text or function call is mandatory for function response
    
  2. Required Mock Function Call: To satisfy validation, we must inject a fake FunctionCall:

    // HACK: Mock function call to satisfy validation
    Content mockCallContent = Content.builder()
        .role("model")
        .parts(List.of(
            Part.fromFunctionCall(
                FunctionCall.builder()
                    .name("response_metadata")
                    .args(Map.of())
                    .build()
            )
        ))
        .build();

  3. Semantic Violation: FunctionResponse is semantically meant for tool/function execution results, not arbitrary metadata. This breaks the A2A/ADK protocol contract.

  4. Conversation History Pollution: Mock function calls pollute the conversation history, potentially confusing the model in subsequent turns.


Analysis of Existing Part Types

| Part Type | Intended Purpose | Metadata Capability | Why It Doesn't Fit |
| --- | --- | --- | --- |
| text | Plain text content | ❌ None | No extension point for metadata |
| functionCall | Model requests tool execution | Schema-bound (name, args) | Triggers tool execution flow |
| functionResponse | Tool execution result | ✅ Has response: Object | Requires prior functionCall, validation enforced |
| inlineData | Binary blobs (images, audio) | Only mimeType | Binary-focused, no key-value support |
| fileData | File URI references | Only fileUri, mimeType | Reference-focused |
| executableCode | Code for execution | language, code | Execution-specific |
| codeExecutionResult | Execution output | outcome, output | Execution-specific |
| thought | Model reasoning (CoT) | ❌ None | Internal model use |
| thoughtSignature | Thought verification | ❌ None | Internal/security use |
| videoMetadata | Video-specific metadata | Video fields only | Domain-specific |

Gap: No Part type supports arbitrary, user-defined metadata that can be attached to or alongside content.


Proposed Solutions

Option A: Add metadata Field to Part (Recommended)

Extend the Part proto/class with an optional metadata field:

Proto Definition:

message Part {
  oneof data {
    string text = 1;
    Blob inline_data = 2;
    FunctionCall function_call = 3;
    FunctionResponse function_response = 4;
    FileData file_data = 5;
    ExecutableCode executable_code = 6;
    CodeExecutionResult code_execution_result = 7;
    // ... existing fields
  }
  
  // NEW: Optional metadata map for any Part type
  map<string, google.protobuf.Value> metadata = 15;
}

Java Usage:

Part responseWithMetadata = Part.builder()
    .text("Here are 3 items matching your search for wireless headphones.")
    .metadata(Map.of(
        "textForAudio", "Found 3 items for wireless headphones.",
        "responseType", "search_results",
        "confidence", 0.95
    ))
    .build();

// Extraction
part.metadata()
    .map(m -> m.get("textForAudio"))
    .ifPresent(ttsText -> synthesizeSpeech(ttsText.toString()));

Advantages:

  • Non-breaking: Optional field, backward compatible
  • Works with any Part type (text, inlineData, etc.)
  • Clean semantics: metadata is clearly auxiliary to primary content
  • No protocol violations
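
To make the backward-compatibility claim concrete, here is a minimal sketch of how an optional metadata map could behave. It does not use the real Part class: `TextPart`, `metadataValue`, and `textToSpeak` are hypothetical stand-ins invented for illustration, and the actual builder API would be up to the ADK team.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for a text Part carrying the proposed optional metadata map. */
final class MetadataFieldSketch {

    record TextPart(String text, Map<String, Object> metadata) {
        TextPart {
            // A missing map is normalised to empty, so callers that never set metadata are unaffected.
            metadata = metadata == null ? Map.of() : Map.copyOf(metadata);
        }

        /** Mirrors today's metadata-free construction path. */
        static TextPart fromText(String text) {
            return new TextPart(text, Map.of());
        }

        /** Typed lookup: empty when the key is absent or holds a value of another type. */
        <T> Optional<T> metadataValue(String key, Class<T> type) {
            return Optional.ofNullable(metadata.get(key)).filter(type::isInstance).map(type::cast);
        }
    }

    /** A voice client prefers the TTS-optimized text and falls back to the display text. */
    static String textToSpeak(TextPart part) {
        return part.metadataValue("textForAudio", String.class).orElse(part.text());
    }

    public static void main(String[] args) {
        TextPart plain = TextPart.fromText("Here are 3 items matching your search.");
        TextPart annotated = new TextPart(
            "Here are 3 items matching your search.",
            Map.of("textForAudio", "Found 3 items for you", "confidence", 0.92));
        System.out.println(textToSpeak(plain));
        System.out.println(textToSpeak(annotated));
    }
}
```

Consumers that ignore metadata keep reading `text()` unchanged, which is the property that makes this option non-breaking.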

Option B: New AnnotatedPart Wrapper Type

Introduce a wrapper that pairs any Part with metadata:

Proto Definition:

message AnnotatedPart {
  Part part = 1;
  map<string, google.protobuf.Value> annotations = 2;
}

message Part {
  oneof data {
    // ... existing fields
    AnnotatedPart annotated_part = 16;  // Self-referential for nesting
  }
}

Java Usage:

Part annotated = Part.fromAnnotatedPart(
    AnnotatedPart.builder()
        .part(Part.fromText("Full response text here"))
        .annotations(Map.of(
            "textForAudio", "Shorter TTS version",
            "citations", List.of("source1", "source2")
        ))
        .build()
);

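// Extraction with unwrapping
part.annotatedPart().ifPresent(ap -> {
    String displayText = ap.part().text().orElse("");
    // Fall back to the display text when no TTS annotation is present (avoids an NPE).
    Object ttsValue = ap.annotations().get("textForAudio");
    String ttsText = ttsValue != null ? ttsValue.toString() : displayText;
});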

Advantages:

  • Clear separation: annotations distinct from content
  • Composable: Can annotate any Part type
  • Explicit: No ambiguity about what's metadata vs content

Disadvantages:

  • Adds nesting complexity
  • Clients must handle unwrapping
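
The unwrapping cost mentioned above can be illustrated with a small sketch. All types here (`Part`, `TextPart`, `AnnotatedPart`, `Unwrapped`) are hypothetical models written for this example, not the real ADK or genai-types classes, and the "outermost annotation wins" rule is an assumption that the proposal does not specify.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AnnotatedPartSketch {

    /** Hypothetical model: a part is either plain text or an annotated wrapper around another part. */
    sealed interface Part permits TextPart, AnnotatedPart {}
    record TextPart(String text) implements Part {}
    record AnnotatedPart(Part part, Map<String, Object> annotations) implements Part {}

    /** Result of unwrapping: the innermost part plus every annotation collected on the way down. */
    record Unwrapped(Part content, Map<String, Object> annotations) {}

    /** Peels nested wrappers; on key collisions the outermost annotation wins (an assumed policy). */
    static Unwrapped unwrap(Part part) {
        Map<String, Object> merged = new LinkedHashMap<>();
        Part current = part;
        while (current instanceof AnnotatedPart wrapper) {
            wrapper.annotations().forEach(merged::putIfAbsent);
            current = wrapper.part();
        }
        return new Unwrapped(current, Map.copyOf(merged));
    }
}
```

Every client would need a helper along these lines before it can reach the underlying content, which is the nesting complexity listed as a disadvantage.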

Option C: Standalone FunctionResponse (Relaxed Validation)

Allow FunctionResponse to be used without a prior FunctionCall when used for metadata purposes:

Proposed Validation Change:

// Current (strict)
if (hasFunctionResponse && !hasPriorFunctionCall) {
    throw new IllegalStateException("FunctionResponse requires prior FunctionCall");
}

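// Proposed (relaxed for metadata pattern): only "metadata:"-prefixed
// function responses may appear without a prior FunctionCall.
// name() is assumed to return Optional<String>, like other genai-types accessors.
boolean isMetadataResponse = functionResponse.name().orElse("").startsWith("metadata:");
if (hasFunctionResponse && !hasPriorFunctionCall && !isMetadataResponse) {
    throw new IllegalStateException("FunctionResponse requires prior FunctionCall");
}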

Java Usage:

Part metadataPart = Part.fromFunctionResponse(
    FunctionResponse.builder()
        .name("metadata:response_annotations")  // Prefix signals metadata use
        .response(Map.of(
            "textForAudio", "TTS text",
            "responseType", "answer"
        ))
        .build()
);

Advantages:

  • No proto changes required
  • Backward compatible
  • Uses existing infrastructure

Disadvantages:

  • Semantic overloading of FunctionResponse
  • Convention-based (prefix), not type-safe
  • May confuse tooling that expects function call/response pairs
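
As a rough illustration of the relaxed rule, the sketch below validates a simplified conversation history. `Entry`, `Kind`, and `validate` are hypothetical names, and matching calls to responses by name alone is a simplification; ADK's actual validation logic is not shown in this issue and may differ.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RelaxedValidationSketch {

    static final String METADATA_PREFIX = "metadata:";

    enum Kind { FUNCTION_CALL, FUNCTION_RESPONSE }

    /** Hypothetical, simplified history entry: a function call or a function response. */
    record Entry(Kind kind, String name) {}

    /** Throws if a non-metadata response has no earlier call with the same name. */
    static void validate(List<Entry> history) {
        Set<String> calledNames = new HashSet<>();
        for (Entry entry : history) {
            if (entry.kind() == Kind.FUNCTION_CALL) {
                calledNames.add(entry.name());
            } else if (!entry.name().startsWith(METADATA_PREFIX)
                    && !calledNames.contains(entry.name())) {
                throw new IllegalStateException(
                    "FunctionResponse requires prior FunctionCall: " + entry.name());
            }
        }
    }
}
```

Because the exemption hinges on a string prefix, a tool that happens to be named with that prefix would bypass the check as well, which is the type-safety concern noted in the disadvantages.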

Option D: New MetadataPart Type

Add a dedicated Part type for metadata:

Proto Definition:

message MetadataPart {
  string namespace = 1;           // e.g., "tts", "citations", "custom"
  map<string, google.protobuf.Value> data = 2;  // Key-value pairs
}

message Part {
  oneof data {
    // ... existing fields
    MetadataPart metadata_part = 16;
  }
}

Java Usage:

Content response = Content.fromParts(
    Part.fromText("Full display response"),
    Part.fromMetadataPart(
        MetadataPart.builder()
            .namespace("tts")
            .data(Map.of("textForAudio", "Short TTS version"))
            .build()
    ),
    Part.fromMetadataPart(
        MetadataPart.builder()
            .namespace("grounding")
            .data(Map.of("citations", List.of("doc1", "doc2")))
            .build()
    )
);

Advantages:

  • Explicit purpose: clearly for metadata
  • Namespaced: avoids key collisions
  • Multiple metadata parts allowed

Disadvantages:

  • New type = more complexity
  • Metadata separated from content it describes
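
Since metadata would travel as sibling parts under this option, a consumer has to gather it back together. The sketch below shows one way that might look. The `Part`, `TextPart`, and `MetadataPart` types are hypothetical models for illustration only, and the "later parts override earlier keys" merge rule is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MetadataPartSketch {

    /** Hypothetical model: content parts and metadata parts sit side by side in one Content. */
    sealed interface Part permits TextPart, MetadataPart {}
    record TextPart(String text) implements Part {}
    record MetadataPart(String namespace, Map<String, Object> data) implements Part {}

    /** Groups metadata by namespace; later parts override earlier keys within a namespace. */
    static Map<String, Map<String, Object>> metadataByNamespace(List<Part> parts) {
        Map<String, Map<String, Object>> byNamespace = new LinkedHashMap<>();
        for (Part part : parts) {
            if (part instanceof MetadataPart meta) {
                byNamespace
                    .computeIfAbsent(meta.namespace(), ns -> new LinkedHashMap<>())
                    .putAll(meta.data());
            }
        }
        return byNamespace;
    }
}
```

Note that nothing in this shape ties a metadata part to a specific content part, so with several text parts in one response the association would need an extra convention such as ordering or an explicit index.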

Recommendation

Option A (metadata field on Part) provides the cleanest solution:

  • Minimal protocol change
  • Keeps metadata co-located with content
  • Works universally across Part types
  • No semantic overloading

Questions for ADK Team

  1. Is there an existing pattern for metadata attachment we may have overlooked?
  2. Are there plans to support Part-level metadata in the roadmap?
  3. Would any of the proposed options align with ADK's design philosophy?
  4. Is there a recommended workaround using current primitives that doesn't violate protocol semantics?

Environment

| Component | Version |
| --- | --- |
| ADK | 0.4.0 |
| Language | Java 21 |
| genai-types | 1.24.0 |
| Use Case | Multimodal voice agent with TTS synthesis |

Related Code References

  • Workaround implementation: TextForAudioExtractor.java
  • Consumer code: SparkyMultimodalRunner.java
  • Config flag: multimodalConfig.textExtraction().useFunctionResponseExtraction()


Labels

enhancement: New feature or request
