Android GenAI Prompt API Enables Natural Language Requests with Gemini Nano

Android GenAI Prompt API: On-Device AI Flexibility for Android Developers

The ML Kit GenAI Prompt API, now in alpha, allows Android developers to send natural language and multimodal (image + text) requests to Gemini Nano, a lightweight on-device large language model (LLM). This API extends Google’s earlier GenAI capabilities, enabling tasks like text summarization, image description, and custom data transformation while prioritizing user privacy through local processing. Unlike pre-built APIs that limit output formats (e.g., fixed bullet points for summarization), the Prompt API provides greater flexibility by enabling developers to craft custom prompts for complex use cases.

Key Features and Capabilities

Multimodal Input Support: Accepts text, images, or a combination of both.
On-Device Processing: Uses Gemini Nano (Nano-v3 on Pixel 10, Nano-v2 on other devices) to ensure offline functionality and data privacy.
Custom Prompt Engineering: Developers can define their own prompts instead of relying on pre-optimized ones.
Use Cases:
- Image classification (e.g., categorizing objects in photos).
- Content generation with tailored instructions.
- Enhanced user experiences (e.g., Kakao Mobility’s bike-parking detection and address entry improvements).

Comparison with Existing APIs

Pre-Built APIs:
- Summarization: Limited to 1–3 bullet points.
- Image Description: Produces generic captions.
- Rewriting: Supports fixed styles (e.g., elaborate, friendly).
Prompt API:
- Flexibility: Allows custom prompts for unique tasks.
- Tradeoff: Requires more integration effort and technical expertise.

Technical Implementation Example

The following code demonstrates how to use the Prompt API to classify an image:

Generation.getClient().generateContent(
    generateContentRequest(
        ImagePart(bitmapImage),
        TextPart("Categorize this image as one of the following: car, motorcycle, bike, scooter, other. Return only the category as the response.")
    )
) {
    // Optional parameters for fine-tuning
    temperature = 0.2f
    topK = 10
    candidateCount = 1
    maxOutputTokens = 10
}

Explanation:

ImagePart(bitmapImage): Sends an image captured via the app (e.g., a photo of a bike).
TextPart(...): Provides a custom prompt instructing the model to classify the image into predefined categories.
Parameters:
- temperature: Controls randomness (lower values = more deterministic outputs).
- maxOutputTokens: Limits response length to ensure efficiency.

Device Compatibility and Limitations

Supported Devices:
- Pixel 10 series: Uses Gemini Nano-v3 (best performance).
- Other devices: Pixel 9, Samsung Galaxy Z Fold7, Xiaomi 15, etc., use Gemini Nano-v2 (less capable).
Limitations:
- Battery Constraints: Enforced quotas may limit continuous use.
- No Background Execution: The API cannot run in the background, requiring foreground interaction.

Expert Insights and Recommendations

Kakao Mobility’s Use Case: Demonstrates real-world utility (e.g., detecting improper bike parking via image analysis).
JobNimbus’s Perspective: Highlights potential for privacy-critical environments but notes current limitations (e.g., lack of background support).
Best Practices:
- Test on Supported Devices: Prioritize Pixel 10 for optimal performance.
- Optimize Prompts: Use clear, concise instructions to avoid ambiguous outputs.
- Monitor Battery Usage: Avoid overloading the device with frequent on-device AI tasks.

Potential Pitfalls

Over-Reliance on Device Capabilities: Nano-v2 may struggle with complex tasks on older hardware.
Prompt Engineering Challenges: Poorly designed prompts can lead to inaccurate or irrelevant responses.
Integration Complexity: Requires familiarity with ML Kit and Gemini Nano’s constraints.

For more details, visit the InfoQ article.

On This Page

Android GenAI Prompt API: On-Device AI Flexibility for Android Developers

Key Features and Capabilities

Comparison with Existing APIs

Technical Implementation Example

Device Compatibility and Limitations

Expert Insights and Recommendations

Potential Pitfalls

Continue reading

Related Content

Mastering Edge AI Performance and Power on Android: Stop Guessing, Start Profiling

From PyTorch to Shipping Local AI on Android

MiniMax Releases M2.1: An Enhanced M2 Version with Features like Multi-Coding Language Support, API Integration, and Improved Tools for Structured Coding