

Building the Audience Intelligence Engine That Powered India's No. 2 YouTube MCN in Nine Months
Culture Machine was scaling a YouTube Multi-Channel Network without a reliable way to know who was watching, what they wanted next, or whether a piece of content would perform before it went live. Ideas2IT built the predictive analytics platform that changed that: a distributed AWS architecture crawling metadata at billion-video scale, statistical models predicting audience demographics and content performance, and an automated pipeline generating long-tail video assets from stock libraries.

Client
Culture Machine

Industry
Entertainment

Service
Artificial Intelligence
BI & Analytics

Engagement
9 months · Time & Material

Team
7 engineers
01 Challenge
Culture Machine was producing content for a YouTube MCN without a systematic way to predict what an audience wanted before publishing. Instinct and basic YouTube Analytics were the only inputs. The result was a content strategy that moved at the speed of trial and error, with no mechanism to target demographics or forecast performance at scale.
02 Solution
Ideas2IT built a distributed metadata platform on AWS with three parts: a multi-node Hadoop cluster crawling video metadata at billion-video scale, a predictive modeling layer in R mapping title, category, description, and social signals to audience demographics, and a content recipe engine that took a target demographic as input and returned a content brief. Long-tail assets were generated automatically from stock libraries.
03 Outcome
Within nine months, the platform was processing metadata at billion-video scale, the MCN had reached the number two position in India, and content teams had a repeatable model for demographic targeting where they previously had none.
Phase 01
Crawling metadata at billion-video scale: the distributed AWS architecture that made prediction possible
The first engineering constraint was the scale of the input: over a billion YouTube videos, with metadata arriving from sources that rate-limited and metered external requests.
The team built:
- A distributed AWS architecture on a four-node Hadoop cluster running on Amazon EC2, with NameNode and DataNode infrastructure configured for the volume.
- A crawling layer that used IP rotation and request morphing to work around metering by social media platforms, aggregating metadata at the scale the statistical models required.
- A data layer that used MySQL for structured records and Kyoto Cabinet for high-throughput key-value lookup. Ubuntu Server 12.04.3 was the OS baseline, with cluster storage configured at under 30 GB per node.
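One way to picture how crawl work could be spread across a four-node cluster is stable hash partitioning of video IDs. The sketch below is an illustration under stated assumptions, not the documented design: the source does not say how the crawl was sharded, and `node_for_video`, `partition`, and the sample IDs are hypothetical names, written here in Python.

```python
import hashlib

# The source describes a four-node cluster; the sharding scheme is an assumption.
NUM_NODES = 4


def node_for_video(video_id: str, num_nodes: int = NUM_NODES) -> int:
    """Map a video ID to a node index using a stable hash of the ID."""
    digest = hashlib.md5(video_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def partition(video_ids, num_nodes: int = NUM_NODES):
    """Group video IDs by the node that would be assigned to crawl them."""
    shards = {node: [] for node in range(num_nodes)}
    for video_id in video_ids:
        shards[node_for_video(video_id, num_nodes)].append(video_id)
    return shards


if __name__ == "__main__":
    # Hypothetical IDs, invented for the example.
    sample_ids = [f"video{i:06d}" for i in range(10_000)]
    shards = partition(sample_ids)
    print({node: len(ids) for node, ids in shards.items()})
```

Because the hash depends only on the ID, the same video always maps to the same node, so a re-crawl would not duplicate work across nodes.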
This Phase Produced
- Multi-node Hadoop cluster on AWS EC2
- Distributed metadata crawling architecture
- IP rotation and request morphing crawler
- MySQL + Kyoto Cabinet data layer
- Billion-scale metadata aggregation pipeline
- GitHub-based version control and workflow
Phase 02
Audience prediction and content recipe models: turning metadata signals into demographic forecasts
With metadata at scale available, the modeling layer used R to run predictive analytics across video signals: title, category, description, comments, and social engagement data. The audience prediction model took a video as input and returned a demographic forecast.
The content recipe model inverted that relationship: given a target audience demographic, it produced the content parameters most likely to reach that audience. Both models ran against the YouTube Data API and the YouTube Analytics API.
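The relationship between the two models can be sketched with a small text classifier and its inverse. The source does not name the statistical technique used, and the original models were written in R. The Python example below swaps in a naive Bayes classifier with Laplace smoothing over title tokens, purely as an illustration. The training rows, the demographic labels, and the functions `train`, `predict`, and `recipe` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical (title, demographic) rows, invented for illustration only.
TRAINING = [
    ("easy paneer recipe for beginners", "women_25_34"),
    ("quick dinner recipe ideas", "women_25_34"),
    ("cricket highlights final over", "men_18_24"),
    ("best cricket sixes compilation", "men_18_24"),
]


def train(rows):
    """Count title tokens per demographic label."""
    token_counts = defaultdict(Counter)
    label_counts = Counter()
    for title, label in rows:
        label_counts[label] += 1
        token_counts[label].update(title.lower().split())
    return token_counts, label_counts


def predict(title, token_counts, label_counts):
    """Forward model: return the most likely demographic for a title."""
    vocab = {tok for counts in token_counts.values() for tok in counts}
    total_rows = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_rows in label_counts.items():
        counts = token_counts[label]
        denom = sum(counts.values()) + len(vocab)  # Laplace smoothing
        score = math.log(n_rows / total_rows)
        for tok in title.lower().split():
            score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def recipe(label, token_counts, top_n=3):
    """Inverse model: tokens most distinctive of the target demographic."""
    others = Counter()
    for other_label, counts in token_counts.items():
        if other_label != label:
            others.update(counts)
    scored = {t: c / (1 + others[t]) for t, c in token_counts[label].items()}
    return sorted(scored, key=lambda t: (-scored[t], t))[:top_n]


if __name__ == "__main__":
    token_counts, label_counts = train(TRAINING)
    print(predict("new cricket compilation", token_counts, label_counts))
    print(recipe("women_25_34", token_counts))
```

The forward function answers "who is likely to watch this video", and the inverse reuses the same counts to answer "what should a video for this audience contain". That is the inversion the recipe model performs, here reduced to title tokens only.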
An automated long-tail production pipeline assembled videos from stock photo and music libraries, generating content programmatically without manual production effort. The full system was built and delivered inside a nine-month engagement with a seven-person team.
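As a rough picture of what programmatic assembly can involve, the sketch below lays stock images out evenly across the length of a music track and returns a slideshow timeline. The source does not describe the pipeline's internals, so the even spacing, the `Slide` type, the `build_timeline` function, and the file names are all assumptions, written here in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slide:
    image: str    # path or ID of a stock image (hypothetical)
    start: float  # seconds from the start of the track
    end: float    # seconds from the start of the track


def build_timeline(images, track_seconds):
    """Give each image an equal share of the music track's duration."""
    if not images:
        raise ValueError("at least one image is required")
    if track_seconds <= 0:
        raise ValueError("track_seconds must be positive")
    per_slide = track_seconds / len(images)
    return [
        Slide(image, index * per_slide, (index + 1) * per_slide)
        for index, image in enumerate(images)
    ]


if __name__ == "__main__":
    # Hypothetical stock assets, invented for the example.
    for slide in build_timeline(["a.jpg", "b.jpg", "c.jpg"], 90.0):
        print(slide)
```

A timeline like this could then be handed to a separate rendering step. Keeping layout separate from rendering is a design choice of this sketch, not something the source describes.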
This Phase Produced
- Audience demographic prediction model (R)
- Content recipe engine (demographic-to-brief)
- YouTube Data + Analytics API integration
- Automated long-tail video assembly pipeline
- Stock image and music library integration
- Predictive analytics layer (Hadoop + R)
The Outcome
What the platform made possible for a content team operating at MCN scale
Culture Machine's content team moved from instinct-driven publishing to a platform that could forecast who would watch a video before it went live, and reverse that model to generate content briefs for specific demographics. The MCN ranking followed from the capability, not from the platform alone. A seven-person team delivered that architecture in nine months because the problem was scoped correctly from the start: build the data layer first, at the scale the models required, then build the models on top of it.