AI API
Provides an API for interacting with LLMs.
ai.agent
Signatures
agent(goal, opts \ [], state)
Description
AI agent that iteratively navigates and interacts with devices using vision and control.
Parameters
goal (string) - Natural language description of what to accomplish
opts (table) - Optional configuration
  model - AI model to use (default: from Sidecar.LLM)
  maxAttempts - Maximum iterations (default: 20)
  screenshotDelayMs - Milliseconds to wait before each screenshot between iterations (default: 0)
  tools - List of tool names (default: ["remote_control", "screenshot"])
Returns
Returns two values: success (boolean) and result (table)
success: true if the goal was completed, false otherwise
result.actions: Comma-separated string of actions taken (e.g., "up,right,tap,complete")
result.attempts: Number of iterations performed
result.finalScreen: Screenshot taken after completion (userdata)
result.reasoning: The AI's explanation of what it did
Example
local success, result = ai.agent("Navigate to Settings", {maxAttempts = 10})
if success then
  print("Completed in " .. result.attempts .. " attempts")
  print("Actions: " .. result.actions)
end
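The documented options can be combined in a single call. A sketch using the options listed above (the goal text, model choice, and printed messages are illustrative, not defaults guaranteed by the API):
```lua
-- Illustrative: combine the documented ai.agent options.
local success, result = ai.agent("Open the settings menu and enable subtitles", {
  maxAttempts = 15,                         -- give up after 15 iterations
  screenshotDelayMs = 500,                  -- let the UI settle before each screenshot
  tools = {"remote_control", "screenshot"}, -- the documented default tool set
})
if not success then
  print("Gave up after " .. result.attempts .. " attempts")
  print("Reasoning: " .. result.reasoning)
end
```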
ai.ask
Signatures
ask(question, opts \ [])
ask(arg, question, opts)
Description
Ask a yes/no question about the current screen or a specified region.
This function captures the current screen and optionally crops it to a specific region before asking the AI a yes/no question about it.
Parameters
question (string) - A yes/no question about the screen
opts (table) - Optional parameters
Options
region- Optional region name to focus the AI analysis on a specific area of the screen
Example
local is_playing = ai.ask("Is the video currently playing?")
if is_playing then
  print("Video is playing")
else
  print("Video is not playing")
end
-- With region
local has_error = ai.ask("Is there an error message?", {region = "error_dialog"})
ai.ask
Signatures
ask(question, opts \ [])
ask(arg, question, opts)
Description
Asks a yes/no question about a provided image and returns a boolean answer.
Uses OpenAI's structured output feature to guarantee a reliable true/false response.
Parameters
image (userdata) - Image object to analyze
question (string) - A yes/no question about the image
opts (table) - Optional parameters
Example
local screenshot = screen.capture()
local is_visible = ai.ask(screenshot, "Is the Netflix app visible on the screen?")
if is_visible then
  print("Netflix is visible!")
end
ai.codegen
Signatures
codegen(prompt)
Description
Generates Lua code from a natural language prompt.
Parameters
prompt(string) - Natural language description of actions to convert to Lua code
Example
local codegen = ai.codegen("Navigate home, wait 5 seconds, then press up and enter")
print(codegen)
-- control.home() wait(5000) control.up() control.ok()
ai.navigate
Signatures
navigate(target, opts \ [])
navigate(arg, target, opts)
Description
Navigate menus using AI with the current screen or a specified region.
This function captures the current screen and optionally crops it to a specific region before attempting AI-guided navigation to the target.
Parameters
target (string) - Navigation target description
opts (table) - Optional parameters
Options
region- Optional region name to focus the navigation on a specific area of the screen
Example
local sequence, prompt, result, viewpoint = ai.navigate("Settings menu")
-- With region
local sequence, prompt, result, viewpoint = ai.navigate("Volume control", {region = "audio_panel"})
ai.navigate
Signatures
navigate(target, opts \ [])
navigate(arg, target, opts)
Description
Navigate menus using AI with a provided image.
Parameters
image (userdata) - Image object to analyze for navigation
target (string) - Navigation target description
opts (table) - Optional parameters
Options
model - AI model to use for navigation
temperature - AI temperature setting
imageScale - Scale factor for image processing
promptType - Type of navigation prompt
additionalInstructions - Extra instructions for the AI
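A sketch of this image overload, assuming `screen.capture()` (as used in the ai.ask example) and the same return values as the screen-based variant; the option values shown are illustrative, not documented defaults:
```lua
-- Illustrative: navigate from a previously captured image.
local img = screen.capture()
local sequence, prompt, result, viewpoint = ai.navigate(img, "Settings menu", {
  temperature = 0.2, -- illustrative: lower temperature for more deterministic navigation
  imageScale = 0.5,  -- illustrative: downscale the image before sending it to the model
})
```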
ai.prompt
Signatures
prompt(user_prompt, opts \ [])
prompt(arg, prompt, opts)
Description
Runs a prompt with the current screen or a specified region and returns the response and viewpoint.
This function captures the current screen and optionally crops it to a specific region before sending the prompt to the AI.
Parameters
user_prompt (string) - The prompt to send to the AI
opts (table) - Optional parameters
Options
region- Optional region name to focus the AI analysis on a specific area of the screen
Example
local response, viewpoint = ai.prompt("What is the text on the screen?")
print(response)
-- "The screen shows a login form with username and password fields."
-- With region
local response, viewpoint = ai.prompt("What color is this button?", {region = "submit_button"})
ai.prompt
Signatures
prompt(user_prompt, opts \ [])
prompt(arg, prompt, opts)
Description
Runs a prompt against a provided image (or the current screen if none is given) and returns the response and the AI's viewpoint.
Parameters
image (userdata) - Image object to analyze
prompt (string) - The prompt to send to the AI
opts (table) - Optional parameters
Example
local response, viewpoint = ai.prompt("What is the color of the screen?")
print(response)
-- "The general color of the screen is blue."
ai.run
Signatures
run(prompt, state)
Description
The same as ai.codegen, but executes the generated code immediately.
Parameters
prompt(string) - Natural language description of actions to execute
Example
ai.run("Navigate home, wait 5 seconds, then press up and enter")