
Vision Scripting Guide

TV Labs' scripting environment uses powerful vision-based methods for screen monitoring and automation. Instead of relying on DOM elements or coordinates, you interact with TV apps using visual screen definitions and wait for specific screens to appear.

Why Vision-Based Testing?

Unlike DOM-based automation, our platform uses computer vision to test what users actually see:

-- Wait for a specific screen to load
wait.forScreen("Home", { timeout = 10000 })

-- Navigate and wait for next screen
control.ok()
wait.forScreen("Search", { timeout = 10000 })

Key Benefits

  1. Platform Independence: Tests work across different TV operating systems
  2. Layout Resilience: Continue working when UI layouts change
  3. User-Centric: Test what users actually see, not internal state
  4. Natural Debugging: Screenshots make failures easy to understand
  5. Performance Insight: Built-in timing for all visual operations

This approach works across all TV platforms because it tests the actual user experience, not internal code structure.

Waiting for Screens

The core vision operation is waiting for specific screens using screen definitions:

-- Wait for a specific screen (matches <Screen name="Home"> in XML)
wait.forScreen("Home", { timeout = 10000 })

-- Wait for multiple possible screens (first match wins)
local time, frame = wait.forScreen({"Episode", "Episode_Alt"}, { timeout = 10000 })

-- Get timing and screen capture
local time, frame = wait.forScreen("Profiles", { timeout = 30000 })
print("Screen appeared after: " .. tostring(time) .. "ms")

The screen names like "Home", "Profiles", and "Search" used in your Lua code correspond directly to the <Screen name="..."> definitions in your XML files.

Screen Definition Files

Screens are defined in XML configuration files that specify visual elements. Each screen can contain multiple elements that must be detected for the screen to be considered "active":

<App name="Netflix">
  <Screen name="Home">
    <Text height="35" left="162" match="Trending Now|Today|New Releases|Only on Netflix"
          name="TrendingLabel" required="true" top="859" width="220">
    </Text>
  </Screen>

  <Screen name="Profiles">
    <Text height="32" left="192" match="Choose a Profile"
          name="ChooseProfileLabel" required="true" top="123" width="220">
    </Text>
  </Screen>

  <Screen name="Loading">
    <Brightness below="0.1" height="538" left="1254" name="BrightnessMarker"
                required="true" top="100" width="338">
    </Brightness>
  </Screen>
</App>

Element Types

Screen definitions support a variety of element types for detection. Some of the most common are:

  • <Text> - Detects text content using OCR
  • <Brightness> - Detects brightness/darkness in regions
  • <Color> - Detects specific colors or color ranges
  • <Button> - Defines interactive button elements
  • <Image> - Matches images using template matching
  • <Region> - Defines clickable areas and boundaries
  • <Component> - Groups multiple elements together
  • <Form> - Defines form layouts with inputs and submit buttons
  • <Input> - Text input fields
  • <Asset> - References to uploaded image assets
  • <AssetList> - Collections of image assets

Each element has position, size, and detection criteria that you configure visually in the Designer.

These XML files are generated using the Designer - a graphical tool that lets you visually define screen regions by capturing screenshots and drawing detection areas. You don't need to write XML by hand!

Learn more: Automation Documentation - Visual workflow design and screen definitions

Motion Detection

Wait for motion to start or stop on screen:

-- Wait for motion to start (video playback beginning)
local time, frame = wait.forMotion("start", {
  timeout = 10000,
  includeAudio = true,
  stableFrames = 3,
  threshold = 0.30
})

-- Wait for motion to end (loading completion)
local time, frame = wait.forMotion("end", {
  timeout = 10000,
  stableFrames = 15
})

-- Ignore specific regions during motion detection
local time, frame = wait.forMotion("start", {
  ignore = "Max.Loading.MotionVector",
  stableFrames = 3
})

Screen Capture and Analysis

Capture and analyze screen content:

-- Basic screen capture
local frame = screen.capture()
asset.upload(frame, "Test Screenshot")

-- Get current screen type
local currentScreen = screen.type()
if currentScreen == "Profiles" then
  print("Currently on profiles screen")
end
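
When a wait fails, capturing what was actually on screen makes the failure far easier to diagnose. A minimal sketch combining these capture calls with `wait.forScreen` (the "Home" screen name comes from the XML example above; `pcall` is standard Lua error handling):

```lua
-- Try to reach the Home screen; on failure, upload evidence before erroring
local ok, err = pcall(function()
  wait.forScreen("Home", { timeout = 10000 })
end)

if not ok then
  local frame = screen.capture()
  asset.upload(frame, "Unexpected Screen")
  error("Expected Home but saw '" .. tostring(screen.type()) .. "': " .. tostring(err))
end
```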

Region-Based Operations

Work with specific screen regions for text detection and assertions:

-- Check if a region is active
local is_active, _, reason = region.is_active({"region_id"}, { mode = "All" })
if not is_active then
  error("Region not active: " .. reason)
end

-- Extract text from a region
local matched, matched_text, detected_text = region.extract_text("region_id", {
  pattern = ".{1,}",
  threshold = 80
})

AI-Powered Vision

Use AI to interact with screens when traditional vision methods aren't sufficient:

-- Describe what's on screen
local response, viewpoint = ai.prompt("Describe what you see on the screen")

-- Navigate to UI elements using AI
local sequence = ai.navigate("Search button", {
  additionalInstructions = "The selected element is highlighted in yellow"
})

-- Detect specific elements
local result, viewpoint = ai.prompt("Is there a Login button visible?", {
  temperature = 0,
  systemPrompt = "You are an expert spotter of buttons. Give only true or false."
})

Advanced Vision Features

Perceptual Hashing

Wait for specific frames using perceptual hash:

-- Wait for a specific frame
local time, frame = wait.forPhash(13337681215611895963, {
  timeout = 50000,
  threshold = 4
})

Video Playback Quality

Monitor video playback quality:

-- Test playback with network constraints
network.enableRateLimit(0.1, 1000000) -- 0.1Mbps down, 1000Mbps up
local playback_failed = wait.forPlaybackFailure({
  timeout = 120000,
  threshold = 4.2
})

if playback_failed then
  print("Playback failed due to quality issues")
else
  print("Playback completed successfully")
end

Fully Rendered Screens

Wait for screens to be completely loaded (no motion):

local time, frame = wait.forScreen("Home", {
  timeout = 10000,
  waitForMotion = true,
  motionTimeout = 5000,
  stableFrames = 3
})

Vision Best Practices

  • Define clear screen markers: Use unique, stable text or visual elements in XML definitions
  • Set appropriate timeouts: Balance between test speed and reliability
  • Use stable frames: Ensure motion has truly stopped before proceeding
  • Handle device differences: Some devices may render screens differently
  • Combine vision methods: Use screen detection + motion detection for robust automation
  • Leverage AI wisely: Use AI navigation for dynamic UIs that are hard to define statically
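
These practices can be combined in a single flow. A minimal sketch using the fully rendered screen options and region checks described above; note that passing the element name "TrendingLabel" (from the XML example) as a region id is an assumption about your setup, and your region ids may differ:

```lua
-- Navigate, then wait for a fully rendered screen (detection + motion settle)
control.ok()
local time, frame = wait.forScreen("Home", {
  timeout = 10000,       -- generous enough for slower devices
  waitForMotion = true,  -- also require on-screen motion to stop
  motionTimeout = 5000,
  stableFrames = 3
})
print("Home fully rendered after " .. tostring(time) .. "ms")

-- Assert a stable marker region before continuing
local is_active, _, reason = region.is_active({"TrendingLabel"}, { mode = "All" })
if not is_active then
  asset.upload(frame, "Home Marker Missing")
  error("Home marker not active: " .. reason)
end
```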

Ready for Testing?

Now that you understand vision-based automation, learn how to structure these techniques into comprehensive tests:

Next: Writing Your First Test - Learn test organization and structure