Feature Request: Metadata Support for ADK Part Types
Summary
We need a mechanism to attach arbitrary key-value metadata to response content (Part) in Google ADK. Currently, no existing Part type supports attaching custom metadata without semantic constraints or validation errors.
Problem Statement
Use Case
We're building a multimodal voice agent where agent responses need to carry supplementary metadata alongside the primary content:
| Metadata Key | Purpose | Example Value |
|---|---|---|
| `textForAudio` | TTS-optimized version of response (shorter, conversational) | "Found 3 items for you" |
| `citations` | Source references for grounding | ["doc1.pdf", "kb:article-123"] |
| `responseType` | Classification for client handling | "answer", "clarification", "handoff" |
| `confidence` | Model confidence score | 0.92 |
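Whichever carrier is eventually chosen, the four keys above end up as a plain `Map<String, Object>`. A minimal sketch of a typed wrapper around that map follows. `ResponseMetadata`, `toMap`, and `fromMap` are hypothetical names (not ADK or genai types), and the lenient parsing, where a wrong-typed value is treated as absent, is an assumption about how a client might want to behave.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical typed view of the metadata keys in the table above.
// Not an ADK/genai type: it only converts to and from the plain Map<String, Object>
// that any carrier (FunctionResponse.response today, a future metadata field) would hold.
record ResponseMetadata(String textForAudio, List<String> citations,
                        String responseType, Double confidence) {

    // Serialize, omitting absent (null) fields so the map only carries what was set.
    Map<String, Object> toMap() {
        Map<String, Object> m = new LinkedHashMap<>();
        if (textForAudio != null) m.put("textForAudio", textForAudio);
        if (citations != null) m.put("citations", List.copyOf(citations));
        if (responseType != null) m.put("responseType", responseType);
        if (confidence != null) m.put("confidence", confidence);
        return m;
    }

    // Parse leniently (assumption): a key with an unexpected type is treated as absent
    // instead of throwing ClassCastException.
    static ResponseMetadata fromMap(Map<String, ?> m) {
        String tts = m.get("textForAudio") instanceof String s ? s : null;
        List<String> cites = null;
        if (m.get("citations") instanceof List<?> raw) {
            // Keep only string entries from the raw list
            cites = raw.stream().filter(String.class::isInstance).map(String.class::cast).toList();
        }
        String type = m.get("responseType") instanceof String s ? s : null;
        Double conf = m.get("confidence") instanceof Number n ? n.doubleValue() : null;
        return new ResponseMetadata(tts, cites, type, conf);
    }
}
```

Keeping the typed record on the client side means the wire format stays a generic map, so it would work unchanged with any of the options proposed below.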
Current Workaround (FunctionResponse)
We attempted to use FunctionResponse as a metadata carrier:
```java
// In agent code - creating metadata-carrying response
Content responseContent = Content.fromParts(
    Part.fromText(displayText),            // Primary response
    Part.fromFunctionResponse(             // Metadata carrier (WORKAROUND)
        FunctionResponse.builder()
            .name("response_metadata")
            .response(Map.of(
                "textForAudio", "TTS optimized text",
                "responseType", "answer"
            ))
            .build()
    )
);
```

Extraction logic:
```java
public static String extractTextForAudioFromPart(Part part) {
    // Only parts carrying a FunctionResponse can hold the metadata payload
    var functionResponse = part.functionResponse().orElse(null);
    if (functionResponse == null || functionResponse.response().isEmpty()) {
        return null;
    }
    Object responseData = functionResponse.response().orElse(null);
    // Check the value type instead of casting blindly, to avoid a ClassCastException
    if (responseData instanceof Map<?, ?> responseMap
            && responseMap.get("textForAudio") instanceof String textForAudio) {
        return textForAudio;
    }
    // Missing key or non-string value: nothing to extract
    return null;
}
```

Why This Workaround Fails
- Validation Exception: ADK enforces that a `FunctionResponse` must accompany a prior `FunctionCall` in the conversation history. Without a matching call, we get:
  `IllegalStateException: A text or function call is mandatory for function response`
- Required Mock Function Call: To satisfy validation, we must inject a fake `FunctionCall`:

  ```java
  // HACK: Mock function call to satisfy validation
  Content mockCallContent = Content.builder()
      .role("model")
      .parts(List.of(
          Part.fromFunctionCall(
              FunctionCall.builder()
                  .name("response_metadata")
                  .args(Map.of())
                  .build()
          )
      ))
      .build();
  ```
- Semantic Violation: `FunctionResponse` is semantically meant for tool/function execution results, not arbitrary metadata. This breaks the A2A/ADK protocol contract.
- Conversation History Pollution: Mock function calls pollute the conversation history, potentially confusing the model in subsequent turns.
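The pairing constraint behind the first two points can be modeled in a few lines. This is a simplified, hypothetical model (`PairingModel`, `Event`, and `isValid` are illustrative names), not ADK's actual validator, whose exact rule and error message may differ. It only shows that a standalone metadata response fails the check and that prepending a mock call makes it pass, at the cost of an extra fake model turn in the history.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the call/response pairing constraint.
// NOT ADK's validation code; it only illustrates the failure mode described above.
final class PairingModel {
    enum Kind { TEXT, FUNCTION_CALL, FUNCTION_RESPONSE }

    record Event(Kind kind, String name) {}

    // True if every FUNCTION_RESPONSE is preceded by a not-yet-answered
    // FUNCTION_CALL with the same name.
    static boolean isValid(List<Event> history) {
        Map<String, Integer> pendingCalls = new HashMap<>();
        for (Event e : history) {
            switch (e.kind()) {
                case FUNCTION_CALL -> pendingCalls.merge(e.name(), 1, Integer::sum);
                case FUNCTION_RESPONSE -> {
                    int pending = pendingCalls.getOrDefault(e.name(), 0);
                    if (pending == 0) {
                        return false; // the "no matching call" failure
                    }
                    pendingCalls.put(e.name(), pending - 1);
                }
                case TEXT -> { } // text never needs a partner
            }
        }
        return true;
    }
}
```

With `text` and `metadataResponse` events, `isValid(List.of(text, metadataResponse))` returns `false`, while `isValid(List.of(mockCall, text, metadataResponse))` returns `true`; the mock call is exactly the history pollution described in the last point.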
Analysis of Existing Part Types
| Part Type | Intended Purpose | Metadata Capability | Why It Doesn't Fit |
|---|---|---|---|
| `text` | Plain text content | ❌ None | No extension point for metadata |
| `functionCall` | Model requests tool execution | Schema-bound (`name`, `args`) | Triggers tool execution flow |
| `functionResponse` | Tool execution result | ✅ Has `response: Object` | Requires prior `functionCall`; validation enforced |
| `inlineData` | Binary blobs (images, audio) | Only `mimeType` | Binary-focused, no key-value support |
| `fileData` | File URI references | Only `fileUri`, `mimeType` | Reference-focused |
| `executableCode` | Code for execution | `language`, `code` | Execution-specific |
| `codeExecutionResult` | Execution output | `outcome`, `output` | Execution-specific |
| `thought` | Model reasoning (CoT) | ❌ None | Internal model use |
| `thoughtSignature` | Thought verification | ❌ None | Internal/security use |
| `videoMetadata` | Video-specific metadata | Video fields only | Domain-specific |
Gap: No Part type supports arbitrary, user-defined metadata that can be attached to or alongside content.
Proposed Solutions
Option A: Add metadata Field to Part (Recommended)
Extend the Part proto/class with an optional metadata field:
Proto Definition:
```proto
message Part {
  oneof data {
    string text = 1;
    Blob inline_data = 2;
    FunctionCall function_call = 3;
    FunctionResponse function_response = 4;
    FileData file_data = 5;
    ExecutableCode executable_code = 6;
    CodeExecutionResult code_execution_result = 7;
    // ... existing fields
  }
  // NEW: Optional metadata map for any Part type
  map<string, Value> metadata = 15;
}
```

Java Usage:
```java
Part responseWithMetadata = Part.builder()
    .text("Here are 3 items matching your search for wireless headphones.")
    .metadata(Map.of(
        "textForAudio", "Found 3 items for wireless headphones.",
        "responseType", "search_results",
        "confidence", 0.95
    ))
    .build();

// Extraction
part.metadata()
    .map(m -> m.get("textForAudio"))
    .ifPresent(ttsText -> synthesizeSpeech(ttsText.toString()));
```

Advantages:
- Non-breaking: Optional field, backward compatible
- Works with any Part type (text, inlineData, etc.)
- Clean semantic: metadata is clearly auxiliary to primary content
- No protocol violations
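With Option A, the client logic for a voice channel reduces to "speak `textForAudio` if present, otherwise speak the display text." A sketch of that fallback follows. `PartWithMetadata` is a hypothetical record standing in for a `Part` that has the proposed `metadata` field (genai's `Part` has no such accessor today), and treating blank or non-string values as absent is an assumption.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a Part extended with the proposed Option A metadata field.
record PartWithMetadata(String text, Map<String, Object> metadata) {
    PartWithMetadata {
        // Treat a missing map as empty so callers never null-check metadata.
        metadata = metadata == null ? Map.of() : Map.copyOf(metadata);
    }
}

final class SpeechSelector {
    // Prefer the TTS-optimized variant; fall back to the display text when the
    // metadata value is absent, blank, or not a string.
    // Returns Optional.empty() when there is nothing to speak.
    static Optional<String> textForSpeech(PartWithMetadata part) {
        if (part.metadata().get("textForAudio") instanceof String tts && !tts.isBlank()) {
            return Optional.of(tts);
        }
        return Optional.ofNullable(part.text()).filter(t -> !t.isBlank());
    }
}
```

Because the metadata sits on the same part as the text it describes, no cross-part correlation is needed to pick the fallback.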
Option B: New AnnotatedPart Wrapper Type
Introduce a wrapper that pairs any Part with metadata:
Proto Definition:
```proto
message AnnotatedPart {
  Part part = 1;
  map<string, Value> annotations = 2;
}

message Part {
  oneof data {
    // ... existing fields
    AnnotatedPart annotated_part = 16;  // Self-referential for nesting
  }
}
```

Java Usage:
```java
Part annotated = Part.fromAnnotatedPart(
    AnnotatedPart.builder()
        .part(Part.fromText("Full response text here"))
        .annotations(Map.of(
            "textForAudio", "Shorter TTS version",
            "citations", List.of("source1", "source2")
        ))
        .build()
);

// Extraction with unwrapping
part.annotatedPart().ifPresent(ap -> {
    String displayText = ap.part().text().orElse("");
    // Objects.toString avoids a NullPointerException when the annotation is absent
    String ttsText = Objects.toString(ap.annotations().get("textForAudio"), null);
});
```

Advantages:
- Clear separation: annotations distinct from content
- Composable: Can annotate any Part type
- Explicit: No ambiguity about what's metadata vs content
Disadvantages:
- Adds nesting complexity
- Clients must handle unwrapping
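Because `AnnotatedPart` can wrap another `AnnotatedPart`, clients need an unwrapping step. A minimal sketch follows, using hypothetical stand-in types (`SketchPart`, `TextPart`, `Annotated`, `Unwrapper`) rather than real ADK classes. It walks the wrapper chain and merges annotations; letting the innermost value win on key conflicts is an assumed policy, and a spec would need to choose one explicitly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of Option B's self-referential nesting: a part is either plain
// content or an annotated wrapper around another part. Not ADK/genai types.
sealed interface SketchPart permits TextPart, Annotated {}

record TextPart(String text) implements SketchPart {}

record Annotated(SketchPart part, Map<String, Object> annotations) implements SketchPart {}

// Result of unwrapping: the innermost content plus the merged annotations.
record Unwrapped(TextPart content, Map<String, Object> annotations) {}

final class Unwrapper {
    // Walks the wrapper chain from the outside in. Assumption: when a key appears
    // at several levels, the innermost (closest to the content) value wins.
    static Unwrapped unwrap(SketchPart part) {
        Map<String, Object> merged = new HashMap<>();
        SketchPart current = part;
        while (current instanceof Annotated a) {
            merged.putAll(a.annotations()); // deeper levels overwrite shallower ones
            current = a.part();
        }
        // The interface is sealed, so anything that is not Annotated is a TextPart.
        return new Unwrapped((TextPart) current, Map.copyOf(merged));
    }
}
```

An iterative loop avoids recursion depth concerns for deeply nested wrappers and makes the merge order explicit.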
Option C: Standalone FunctionResponse (Relaxed Validation)
Allow FunctionResponse to be used without a prior FunctionCall when used for metadata purposes:
Proposed Validation Change:
```java
// Current (strict)
if (hasFunctionResponse && !hasPriorFunctionCall) {
    throw new IllegalStateException("FunctionResponse requires prior FunctionCall");
}

// Proposed (relaxed for metadata pattern)
if (hasFunctionResponse && !hasPriorFunctionCall) {
    if (!functionResponse.name().startsWith("metadata:")) {
        throw new IllegalStateException("FunctionResponse requires prior FunctionCall");
    }
    // Allow metadata: prefixed function responses as standalone
}
```

Java Usage:
```java
Part metadataPart = Part.fromFunctionResponse(
    FunctionResponse.builder()
        .name("metadata:response_annotations")  // Prefix signals metadata use
        .response(Map.of(
            "textForAudio", "TTS text",
            "responseType", "answer"
        ))
        .build()
);
```

Advantages:
- No proto changes required
- Backward compatible
- Uses existing infrastructure
Disadvantages:
- Semantic overloading of `FunctionResponse`
- Convention-based (prefix), not type-safe
- May confuse tooling that expects function call/response pairs
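The Option C pseudocode can be made runnable as a standalone check. In this sketch, `RelaxedValidation` and the `priorCallNames` set are hypothetical stand-ins for whatever history the real validator inspects; the `metadata:` prefix is the convention proposed here, not an existing ADK rule.

```java
import java.util.Set;

// Runnable version of the Option C pseudocode (hypothetical, not ADK code).
final class RelaxedValidation {
    static final String METADATA_PREFIX = "metadata:";

    // Throws if a FunctionResponse has neither a matching prior call
    // nor the metadata prefix.
    static void checkFunctionResponse(String responseName, Set<String> priorCallNames) {
        if (priorCallNames.contains(responseName)) {
            return; // normal tool-call flow: a matching call exists
        }
        if (responseName.startsWith(METADATA_PREFIX)) {
            return; // standalone metadata carrier, allowed only by naming convention
        }
        throw new IllegalStateException(
            "FunctionResponse requires prior FunctionCall: " + responseName);
    }
}
```

Any response whose name happens to start with `metadata:` passes, including a real tool that chose that name; this is the "convention-based, not type-safe" weakness listed above.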
Option D: New MetadataPart Type
Add a dedicated Part type for metadata:
Proto Definition:
```proto
message MetadataPart {
  string namespace = 1;          // e.g., "tts", "citations", "custom"
  map<string, Value> data = 2;   // Key-value pairs
}

message Part {
  oneof data {
    // ... existing fields
    MetadataPart metadata_part = 16;
  }
}
```

Java Usage:
```java
Content response = Content.fromParts(
    Part.fromText("Full display response"),
    Part.fromMetadataPart(
        MetadataPart.builder()
            .namespace("tts")
            .data(Map.of("textForAudio", "Short TTS version"))
            .build()
    ),
    Part.fromMetadataPart(
        MetadataPart.builder()
            .namespace("grounding")
            .data(Map.of("citations", List.of("doc1", "doc2")))
            .build()
    )
);
```

Advantages:
- Explicit purpose: clearly for metadata
- Namespaced: avoids key collisions
- Multiple metadata parts allowed
Disadvantages:
- New type = more complexity
- Metadata separated from content it describes
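With Option D, a client typically wants a `namespace -> key -> value` view of all metadata parts in a `Content`. A sketch follows with a hypothetical `MetadataEntry` record standing in for `MetadataPart`; merging parts that share a namespace and rejecting conflicting values for the same key are assumed policies, not part of the proposal.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the proposed MetadataPart; not an existing ADK/genai type.
record MetadataEntry(String namespace, Map<String, Object> data) {}

final class MetadataIndex {
    // Groups metadata parts by namespace, merging parts that share one.
    // Assumption: the same key with two different values in one namespace is an
    // error rather than "last write wins".
    static Map<String, Map<String, Object>> byNamespace(List<MetadataEntry> parts) {
        Map<String, Map<String, Object>> index = new LinkedHashMap<>();
        for (MetadataEntry p : parts) {
            Map<String, Object> bucket =
                index.computeIfAbsent(p.namespace(), ns -> new LinkedHashMap<>());
            for (Map.Entry<String, Object> e : p.data().entrySet()) {
                Object previous = bucket.putIfAbsent(e.getKey(), e.getValue());
                if (previous != null && !Objects.equals(previous, e.getValue())) {
                    throw new IllegalArgumentException(
                        "Conflicting values for " + p.namespace() + "/" + e.getKey());
                }
            }
        }
        return index;
    }
}
```

The same key (for example `textForAudio`) can appear under both `tts` and `custom` without colliding, which is the namespacing advantage listed above.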
Recommendation
Option A (metadata field on Part) provides the cleanest solution:
- Minimal protocol change
- Keeps metadata co-located with content
- Works universally across Part types
- No semantic overloading
Questions for ADK Team
- Is there an existing pattern for metadata attachment we may have overlooked?
- Are there plans to support Part-level metadata in the roadmap?
- Would any of the proposed options align with ADK's design philosophy?
- Is there a recommended workaround using current primitives that doesn't violate protocol semantics?
Environment
| Component | Version |
|---|---|
| ADK | 0.4.0 |
| Language | Java 21 |
| genai-types | 1.24.0 |
| Use Case | Multimodal voice agent with TTS synthesis |
Related Code References
- Workaround implementation: `TextForAudioExtractor.java`
- Consumer code: `SparkyMultimodalRunner.java`
- Config flag: `multimodalConfig.textExtraction().useFunctionResponseExtraction()`