TV Labs Performance Score

Motivation

The TV Labs Performance Score (TVPS) evaluates the performance of smart TV apps through the lens of a customer. To determine this score, we compare relevant metrics with industry benchmarks and adjust for devices that naturally perform slower. The result is a single numerical value that quickly tells us how well an app is performing.

How the Performance Score is Calculated

The Performance Score provides a quick look at how well smart TV apps are functioning. In a nutshell, this is how we calculate the TV Labs Performance Score:

Market Share: Some TV brands and models are more common in homes than others. We begin by determining how widespread each TV model is relative to the rest. Scores from more common models are weighted more heavily because they represent a larger share of the market.

Aggregated Scores: For every TV, we measure common metrics, such as how quickly it starts an app, and compare each result with all measurements taken within the last 30 days for the same make and model. Based on this comparison, we assign points. More critical metrics can carry a higher point value. We then total these points.

Diminishing Returns and The Law of Little Gains: Imagine an already fast app becoming a tad quicker; the change may be barely noticeable. A slower app making a significant improvement, however, is easy to notice. We adjust the scores to account for this difference.

Final Score: After all the evaluations and tweaks, we convert the total points into a percentage score out of 100. This final number offers a clear indication of the TV app's overall performance.

In-depth description

If you want to learn more, the following is a more detailed description of these calculations:

Normalized Market Share:

For each device, the market share is calculated as a fraction of the total market share of all devices.
$\text{Normalized Market Share} = \frac{\text{Market Share of Device}}{\text{Total Market Share of All Devices}}$
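
The normalization above can be sketched in a few lines of Python. This is only an illustration: the function name, the device names, and the raw share values are hypothetical and not taken from TV Labs data.

```python
def normalize_market_share(shares):
    """Scale raw market shares so that they sum to 1.0.

    `shares` maps a device name to its raw market share. The raw values
    do not need to sum to 1 or to 100.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("total market share must be positive")
    return {device: share / total for device, share in shares.items()}


# Hypothetical raw shares for three made-up devices.
raw_shares = {"device_a": 30.0, "device_b": 15.0, "device_c": 5.0}
normalized = normalize_market_share(raw_shares)
```

Dividing by the total means the weights always sum to 1, even when the tested devices cover only part of the real market.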
Individual Measurements:

To determine a metric's value, we repeatedly run the same test between 10 and 50 times in one session. The measurement is the median of all runs in that session. This reduces the influence of both outliers and small variations between runs. Tests usually run in the early morning.
$\text{Measurement} = \text{median}\left( \text{all measurements taken in one session} \right)$
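
As a small illustration of why the median is used, the following sketch takes the median of one session of made-up app start times. The function name and the sample values (in milliseconds) are assumptions for this example only.

```python
from statistics import median


def session_measurement(samples):
    """Return the median of all runs recorded in one test session."""
    if not samples:
        raise ValueError("a session needs at least one run")
    return median(samples)


# Ten hypothetical app start times in milliseconds; one run (9500) is an outlier.
app_start_ms = [2100, 2050, 2200, 2080, 9500, 2120, 2090, 2110, 2070, 2130]
measurement = session_measurement(app_start_ms)
```

The single slow run of 9500 ms does not move the median, whereas it would shift the arithmetic mean considerably.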
Aggregated Performance Score:

Each metric's value is first normalized with respect to a baseline.
$\text{Normalized Metric} = \frac{\text{Baseline (90th percentile)}}{\text{Metric Value (median)}}$
After normalization, the metrics are weighted.¹
$\text{Weighted Metric Score} = \text{Weight of Metric} \times \text{Normalized Metric}$
The aggregated score for a device is a sum of the weighted metric scores, which is then multiplied by the device's normalized market share.
$\text{Aggregated Score} = \text{Normalized Market Share} \times \text{Sum of Weighted Metric Scores}$
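
The three formulas above can be combined into one short function. This is a minimal sketch, assuming that baselines and metric values are already available per metric; the function name and the numbers are hypothetical, while the two metric names and their weights of 0.5 follow the footnote.

```python
def aggregated_score(market_share, metrics, weights):
    """Compute the aggregated score for one device.

    `metrics` maps a metric name to a (baseline, value) pair, where the
    baseline is the 90th percentile and the value is the session median.
    """
    weighted_sum = 0.0
    for name, (baseline, value) in metrics.items():
        normalized = baseline / value            # Normalized Metric
        weighted_sum += weights[name] * normalized  # Weighted Metric Score
    return market_share * weighted_sum


weights = {"app_start_time": 0.5, "video_start_time": 0.5}

# Hypothetical (baseline, value) pairs in seconds for one device.
metrics = {
    "app_start_time": (4.0, 5.0),    # normalized: 0.8
    "video_start_time": (3.0, 3.0),  # normalized: 1.0
}
score = aggregated_score(0.5, metrics, weights)
```

Because the metrics are times, dividing the baseline by the measured value makes a faster (smaller) measurement produce a larger normalized metric.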
Diminishing Returns:

The aggregated score is adjusted to account for diminishing returns: as performance metrics improve, the reward for further improvement shrinks. This is done with the following formula, also known as a gamma curve:
$\text{Adjusted Score} = \text{Aggregated Score}^{\left(\frac{1}{\text{Gamma Value}}\right)}$
By default, the Gamma Value is set to 2.0.
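
To make the effect of the gamma curve concrete, the sketch below applies it to two made-up pairs of aggregated scores. The function name and the input values are assumptions chosen for illustration.

```python
def adjust_for_diminishing_returns(aggregated, gamma=2.0):
    """Apply the gamma curve: aggregated ** (1 / gamma)."""
    if aggregated < 0:
        raise ValueError("aggregated score must be non-negative")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return aggregated ** (1.0 / gamma)


# The same raw improvement of 0.11, once from a low and once from a high start.
low_gain = (adjust_for_diminishing_returns(0.36)
            - adjust_for_diminishing_returns(0.25))
high_gain = (adjust_for_diminishing_returns(0.92)
             - adjust_for_diminishing_returns(0.81))
```

With a gamma of 2.0 the curve is a square root, so the improvement from 0.25 to 0.36 adds about 0.10 to the adjusted score, while the same raw improvement from 0.81 to 0.92 adds only about 0.06.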
Final Performance Score:

The adjusted score is then converted into a percentage for a clearer representation.
$\text{Performance Score (\%)} = \text{Adjusted Score} \times 100$
Putting it all together into one comprehensive equation, the TVPS (TV Labs Performance Score) is calculated as:
$\text{TVPS} = 100\% \times \left( \sum_{i} \left( m_{i} \times \sum_{k} \frac{w_{k} \cdot p_{k,i}}{v_{k,i}} \right) \right) ^{\frac{1}{\gamma}}$
where $i$ refers to devices and $k$ to metrics:
- $m_{i}$ Normalized market share for device $i$.
- $w_{k}$ Weight of metric $k$.
- $p_{k,i}$ Baseline for device $i$ and metric $k$. The baseline is defined as the 90th percentile of measurements taken during the last 30 days.
- $v_{k,i}$ Metric value for device $i$ and metric $k$, calculated as the median of all the performance measurements taken on a specific day.
- $\text{TVPS}$ Performance Score in %.
- $\gamma$ Correction factor for diminishing returns, which is set to 2.0 by default.
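
The whole calculation can be sketched end to end in Python. This is an illustration under stated assumptions, not the TV Labs implementation: each metric is normalized as baseline divided by value (as in the Normalized Metric step), market shares are assumed to be normalized already, and the function name, data layout, and all numbers are hypothetical.

```python
def tvps(devices, weights, gamma=2.0):
    """Compute the overall score in percent across all devices.

    Each entry in `devices` is a dict with a normalized "market_share",
    plus "baselines" and "values" keyed by metric name.
    """
    total = 0.0
    for device in devices:
        weighted_sum = sum(
            weights[name] * device["baselines"][name] / device["values"][name]
            for name in weights
        )
        total += device["market_share"] * weighted_sum
    return 100.0 * total ** (1.0 / gamma)


weights = {"app_start_time": 0.5, "video_start_time": 0.5}

# Two made-up devices; all times are in seconds.
devices = [
    {
        "market_share": 0.6,
        "baselines": {"app_start_time": 4.0, "video_start_time": 2.0},
        "values": {"app_start_time": 5.0, "video_start_time": 2.5},
    },
    {
        "market_share": 0.4,
        "baselines": {"app_start_time": 6.0, "video_start_time": 3.0},
        "values": {"app_start_time": 12.0, "video_start_time": 6.0},
    },
]
score = tvps(devices, weights)
```

In this example the first device contributes 0.6 × 0.8 and the second 0.4 × 0.5, giving a sum of 0.68 before the gamma curve and a final score of roughly 82.5.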

Why this Approach?

Customizability: By using predefined weights and baselines for each device, we can prioritize specific metrics and adjust for unique device characteristics.
Market Share Importance: Metrics from devices with a larger market share will have a more significant impact on the overall score, ensuring market relevance.
Diminishing Returns: This emphasizes significant improvements in poorly performing metrics and avoids excessive rewards for minor enhancements in already optimal metrics.
Relative Performance: Metrics are interpreted relative to typical device performance (via the baseline), giving users a contextual view rather than just absolute values.

¹ For now, only the following metrics are used, each with a weight of 0.5 (indicating that both are equally important): video_start_time and app_start_time. Future versions of the TVPS might include other metrics and weights.
