How to score a video ad before you spend
Video ads don't fail at the algorithm. They fail at the creative. The problem is that most advertisers only find out after the budget is spent and the platform reports arrive — which, between attribution windows and learning-phase requirements, means discovering the failure seven to fourteen days after launch. By then the money is gone.
Scoring a video ad before launch means evaluating it across six specific dimensions — visual diversity, hook variety, angle coverage, format mix, audio diversity, and text and CTA variety — against known patterns of what delivers. Each dimension predicts a different kind of failure mode. Doing this well isn't expensive. Doing it poorly, or not at all, is the norm and is the reason most creative sets under-deliver.
This guide walks through the six dimensions, how to weight them, and the similarity check that catches the clustering risks that platform delivery engines will punish you for.
Why platform reports are late
Three structural delays mean that by the time performance data arrives, your budget decision is already made.
Attribution windows. Meta defaults to 7-day click, 1-day view. Google Ads defaults to 30-day click. TikTok defaults to 28-day click. A creative reported as "performing" three days after launch is still hiding most of its real performance picture inside the attribution window.
Learning phase requirements. On Meta, each creative needs roughly $50–$150 of spend and up to seven days to exit the learning phase. Until then, delivery is exploratory and cost per action is inflated. Meta's own guidance is roughly 50 optimisation events per ad set within a seven-day window to get out of learning and stay out of it.
Platform optimisation during launch. Advantage+, Performance Max, and Smart+ all optimise against their own internal signals during launch — clustering, similarity, early engagement. Your first week of a creative set isn't really a test; it's the platform sorting your creatives into buckets that will persist for the life of the campaign.
The implication is simple: by the time you have enough data to judge a creative empirically, you've already committed to it. Pre-launch scoring closes that gap.
The six dimensions that predict creative performance
1. Visual Diversity
The variance in visual style, framing, colour, and subject across your creative set. Low variance means Meta's Andromeda algorithm — or Google Performance Max's creative matching, or TikTok Smart+'s delivery engine — groups your creatives as visually similar and throttles delivery on all but the top-clustered member.
How to measure: perceptual hashing plus 512-dimensional CLIP embeddings. Each creative produces an embedding vector; the set's diversity is the average pairwise cosine distance between those vectors.
Healthy range: an average pairwise cosine distance of ≥0.6 across the set. Below 0.4 means delivery will suppress similar creatives.
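A minimal sketch of the computation, assuming the embeddings are already extracted (for example with an open-source CLIP model; the function name is illustrative, and the 0.4 threshold is the floor given above):

```python
import numpy as np

def avg_pairwise_cosine_distance(embeddings: np.ndarray) -> float:
    """Mean cosine distance over all unique creative pairs.
    embeddings: (n_creatives, dim) array, e.g. 512-dim CLIP vectors."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T                          # cosine similarity matrix
    iu = np.triu_indices(len(embeddings), k=1)   # upper triangle = unique pairs
    return float(np.mean(1.0 - sim[iu]))

vectors = np.random.rand(10, 512)  # stand-in for real CLIP embeddings
score = avg_pairwise_cosine_distance(vectors)
if score < 0.4:
    print(f"visual diversity {score:.2f}: expect delivery suppression")
```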
2. Hook Variety
The first three seconds. Hook fingerprints capture facial composition, motion direction, opening-word phonetics, colour palette, and music-entry timing. Research across large ad-creative corpora consistently shows that 60–70% of CTR variance on video ads comes from the first three seconds.
Common failure mode: eight creatives, three distinct hooks. Delivery collapses to the top two hooks and never serves the other six at scale. The set looks big; the delivery footprint is tiny.
Healthy range: at least as many distinct hook fingerprints as creatives you actually want delivered. In practice, a 10-creative set should carry 8+ distinct hooks.
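One way to operationalise the count, assuming each hook is summarised as a fingerprint vector over the first three seconds; the 0.25 merge threshold is an illustrative assumption, not a platform constant:

```python
import numpy as np

def count_distinct_hooks(hook_vecs: np.ndarray, threshold: float = 0.25) -> int:
    """Greedy clustering: hooks closer than `threshold` in cosine distance
    collapse into one; returns the number of distinct hook clusters."""
    unit = hook_vecs / np.linalg.norm(hook_vecs, axis=1, keepdims=True)
    dist = 1.0 - unit @ unit.T
    reps: list[int] = []                  # one index per distinct hook
    for i in range(len(hook_vecs)):
        if all(dist[i, j] >= threshold for j in reps):
            reps.append(i)                # far from every cluster: new hook
    return len(reps)

hooks = np.random.randn(8, 64)  # centred noise is near-orthogonal: 8 distinct
print(count_distinct_hooks(hooks))
```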
3. Angle Coverage
The narrative angle each creative takes. Problem-solution, social-proof, testimonial, feature demo, before-and-after, urgency, category-education, comparison. Most accounts over-cover two angles and under-cover five or more. Delivery will pick up on the dominant angle and drown the others.
Healthy range: at least five angles represented across an 8–10 creative set. No single angle should claim more than 40% of the creatives.
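The healthy-range rule translates directly into a check. The angle labels are the eight from above; how each creative gets labelled, manually or by a classifier, is up to you:

```python
from collections import Counter

def angle_coverage_ok(labels: list[str]) -> bool:
    """At least five distinct angles, and no angle above 40% of the set."""
    counts = Counter(labels)
    return len(counts) >= 5 and max(counts.values()) / len(labels) <= 0.40

print(angle_coverage_ok(
    ["problem-solution"] * 4
    + ["testimonial", "testimonial", "urgency", "comparison",
       "feature-demo", "social-proof"]
))  # True: six angles, dominant one at exactly 40%
```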
4. Format Mix
Aspect ratios, durations, platform-native vs adapted. A creative set built only at 16:9 will lose 30% or more of its placement footprint on Meta (which needs 9:16 for Stories and Reels, 4:5 for Feed) and the entirety of TikTok. Adapted creatives (horizontal video letterboxed or blurred into 9:16) systematically underperform native-format creatives.
Healthy range: native format for every placement you intend to run. 9:16 and 4:5 should be present if you're running Meta; 9:16 only, no adaptation, if you're running TikTok.
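A simple membership check against the placements you intend to run; the required-ratio table below just restates the rule above:

```python
REQUIRED_RATIOS = {
    "meta":   {"9:16", "4:5"},  # Stories/Reels and Feed
    "tiktok": {"9:16"},         # native vertical only
}

def format_gaps(platforms: list[str], ratios_in_set: set[str]) -> dict[str, set[str]]:
    """Aspect ratios each target platform needs that the set is missing."""
    return {p: missing for p in platforms
            if (missing := REQUIRED_RATIOS[p] - ratios_in_set)}

print(format_gaps(["meta", "tiktok"], {"16:9", "4:5"}))
# -> {'meta': {'9:16'}, 'tiktok': {'9:16'}}
```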
5. Audio Diversity
Voiceover, music, sound effects, silence. Audio is the most under-optimised dimension because advertisers think in visuals. But audio drives completion rate, especially on TikTok, where sound-off viewing is rare (on Meta it is the default). A creative set with four voiceover tracks of similar cadence and tone will fatigue on audio before it fatigues on visuals.
Healthy range: at least three distinct audio treatments across a 10-creative set. If all your creatives use the same royalty-free background track, expect fast audio fatigue.
6. Text and CTA Variety
On-screen text, captions, CTA verbs. This dimension covers both accessibility (sound-off viewing, the default on Meta and roughly 15% of views on TikTok) and intent laddering (a soft CTA like "learn more" versus a hard CTA like "shop now"). Sets that repeat the same CTA verb across every creative miss the intent ladder: they signal the same commercial temperature to every viewer regardless of funnel position.
Healthy range: at least three distinct CTA verbs across a 10-creative set. Captions present on all creatives.
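The audio and text checks both reduce to distinct-count thresholds; a combined sketch, with hypothetical label inputs (one entry per creative):

```python
def variety_ok(audio_treatments: list[str], cta_verbs: list[str],
               captions_present: list[bool]) -> dict[str, bool]:
    """Thresholds from the healthy ranges above, per 10-creative set."""
    return {
        "audio":    len(set(audio_treatments)) >= 3,  # 3+ distinct treatments
        "cta":      len(set(cta_verbs)) >= 3,         # 3+ distinct CTA verbs
        "captions": all(captions_present),            # captions on everything
    }
```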
How to weight the dimensions
Default weights for a generic consumer-brand account: 25% visual, 20% hook, 20% angle, 15% format, 10% audio, 10% text. These come from cross-account analysis on hundreds of linked campaigns.
Those defaults are the starting point, not the answer. Once you have 30+ linked campaign-outcome pairs on your own account, those weights should be re-learned from your data. Every advertiser's weights end up different. DTC beauty runs visual-heavy; financial services runs text-and-CTA-heavy; performance-driven subscription runs hook-heavy.
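Assuming each dimension is scored 0 to 10, the total is a plain weighted sum; the defaults below are the starting weights from above:

```python
DEFAULT_WEIGHTS = {"visual": 0.25, "hook": 0.20, "angle": 0.20,
                   "format": 0.15, "audio": 0.10, "text": 0.10}

def weighted_score(dim_scores: dict[str, float],
                   weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Each dimension scored 0-10; returns the 0-10 weighted total."""
    return sum(weights[d] * dim_scores[d] for d in weights)

print(weighted_score({"visual": 8, "hook": 6, "angle": 7,
                      "format": 9, "audio": 4, "text": 5}))  # ≈ 6.85
```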
The similarity check that catches clustering
Run every creative pair through a similarity matrix. The standard computation is 45% visual similarity (CLIP embedding cosine), 30% audio similarity (MFCC or embedding distance), 25% hook similarity (first-three-second fingerprint match). Each pair scores 0 (completely different) to 1 (identical).
Interpretation: pairs scoring above 0.7 will likely be clustered by the delivery engines, whether Meta's Andromeda, Google's Performance Max, or TikTok's Smart+. A clustered pair competes for the same audience segment, so you pay multiple times to reach the same people.
Any pair above the 0.7 threshold (a red cell in a heat-mapped matrix) is wasted budget waiting to happen. A well-built 10-creative set has no pairs above 0.7 and a median pairwise similarity around 0.3–0.4.
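A sketch of the blend and the flagging step, assuming you have already computed the three per-channel similarity matrices (each n×n, values 0 to 1):

```python
import numpy as np

def similarity_matrix(visual: np.ndarray, audio: np.ndarray,
                      hook: np.ndarray) -> np.ndarray:
    """45/30/25 blend of the per-channel pairwise similarities."""
    return 0.45 * visual + 0.30 * audio + 0.25 * hook

def clustered_pairs(sim: np.ndarray, threshold: float = 0.7) -> list[tuple[int, int]]:
    """Creative pairs the delivery engines will likely cluster together."""
    i, j = np.where(np.triu(sim, k=1) > threshold)
    return list(zip(i.tolist(), j.tolist()))

def median_similarity(sim: np.ndarray) -> float:
    """Median over unique pairs; a well-built set sits around 0.3-0.4."""
    return float(np.median(sim[np.triu_indices_from(sim, k=1)]))
```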
Score ranges
Putting it together, here's roughly what the total weighted score means:
- 8.0+: publication-ready. Your set covers the dimensions and has no clustering issues. Launch with full budget.
- 6.5–8.0: strong, launch-eligible. One or two dimensions might be slightly weak but the set will perform.
- 5.0–6.5: marginal. Usually one dimension is materially under-covered, often angle coverage or audio diversity. Fix before launch.
- Below 5.0: significant clustering risk. The set will suffer delivery suppression on at least one platform. Rework a few creatives or drop the weakest.
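The bands translate into a simple gate, treating each boundary as inclusive of the band above it:

```python
def verdict(score: float) -> str:
    """Map a 0-10 weighted total to the launch decision above."""
    if score >= 8.0:
        return "publication-ready: launch with full budget"
    if score >= 6.5:
        return "strong: launch-eligible"
    if score >= 5.0:
        return "marginal: fix the weak dimension before launch"
    return "clustering risk: rework or drop the weakest creatives"
```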
Where Omniscia fits
Omniscia runs this framework automatically via Omniscia Lens. Upload a creative set, get a 0–10 score across the six dimensions along with the pairwise similarity matrix, and receive ranked recommendations on what to fix before launch. On accounts with linked campaign data, the default dimension weights are replaced with weights learned from your account's historical correlations, so the scoring sharpens as your account matures.