Exploring Netflix’s Content and Viewership: A Data-Driven Perspective
Introduction
I stumbled across last month’s TidyTuesday analytical challenge featuring Netflix data, and curiosity got the better of me: what exactly are we watching, and what patterns lie beneath the surface? My first instinct was to dive in with Python, since that’s the language I use daily for production systems. But then I thought: where’s the fun in that? I decided to dust off my Julia skills instead, and finally give Makie, Julia’s visualization library, a proper try.
What started as a small side experiment quickly turned into a deeper exploration. And when I finished, I figured, why not turn this into a write-up?
Netflix sits at the heart of how stories are told in the modern era. It is not just a distributor of films and shows; it has become a mirror of global cultural trends. Looking at the data provides more than a snapshot of viewing habits; it uncovers patterns of engagement, longevity, and strategy that define today’s streaming era.
In this article, I explore Netflix’s catalog and viewing data through three key questions:
How do films and series differ in terms of engagement?
Does the age of a title influence its performance?
Are there seasonal dynamics in Netflix viewership?
But before we dive in, a word on data preparation.
Data Cleaning and Preparation: Why It Matters
The raw streaming data released by Netflix had the usual shortcomings of any data source: missing release dates, duplicate entries, and inconsistent labeling all required careful cleaning. Dates were standardized, null values handled, and calculated fields such as views per hour were introduced to move beyond raw counts.
This step is not just technical housekeeping; it is foundational to the integrity of the analysis. Without consistent data, the patterns we draw would risk being misleading.
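To make the cleaning steps concrete, here is a minimal sketch in Python using only the standard library. It is not the code behind this analysis (which was done in Julia). The column names (`title`, `release_date`, `hours_viewed`, `views`), the sample rows, and the rule that titles differing only by case or whitespace are duplicates are all assumptions for illustration.

```python
from datetime import datetime

# Hypothetical raw rows; the column names are assumptions, not the real schema.
raw_rows = [
    {"title": "Show A ", "release_date": "2023-03-01", "hours_viewed": "1000", "views": "250"},
    {"title": "show a", "release_date": "2023-03-01", "hours_viewed": "1000", "views": "250"},  # duplicate label
    {"title": "Film B", "release_date": "", "hours_viewed": "400", "views": "200"},  # missing date
]

def parse_date(text):
    """Return a date for 'YYYY-MM-DD' strings, or None when missing or invalid."""
    try:
        return datetime.strptime(text.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None

def clean(rows):
    """Deduplicate by normalized title, standardize dates, add views per hour."""
    seen = set()
    cleaned = []
    for row in rows:
        title = row["title"].strip()
        key = title.casefold()  # assumption: case/whitespace variants are the same title
        if key in seen:
            continue
        seen.add(key)
        hours = float(row["hours_viewed"])
        views = float(row["views"])
        cleaned.append({
            "title": title,
            "release_date": parse_date(row["release_date"]),  # None marks a missing date
            "hours_viewed": hours,
            "views": views,
            # Derived field: views per hour viewed (None when hours is zero).
            "views_per_hour": views / hours if hours else None,
        })
    return cleaned

cleaned_rows = clean(raw_rows)
```

Keeping missing dates as `None` rather than dropping the row lets later steps decide whether a title can still contribute to totals that do not depend on its release date.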
Films vs. Series: Different Engagement Profiles
One of the first questions was whether films and series perform differently on the platform. Aggregating total hours and engagement (views per hour) shows that series often hold attention over longer stretches, while films can spike with high volumes in shorter bursts.
This aligns with intuitive viewing behavior: series encourage binge-watching, stretching engagement over multiple episodes, whereas films deliver concentrated attention. From a strategic perspective, this helps explain Netflix’s pivot in recent years toward serialized content.
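The aggregation behind this comparison can be sketched roughly as follows. This is an illustrative Python version, not the original Julia code; the `kind` field and the sample numbers are made up and do not reproduce any result from the dataset.

```python
from collections import defaultdict

# Made-up rows; "kind" (film or series) is an assumed field name.
rows = [
    {"title": "Series A", "kind": "series", "hours_viewed": 900.0, "views": 150.0},
    {"title": "Series B", "kind": "series", "hours_viewed": 600.0, "views": 100.0},
    {"title": "Film C", "kind": "film", "hours_viewed": 300.0, "views": 150.0},
]

def summarize_by_kind(rows):
    """Total hours, mean hours per title, and views per hour for each kind."""
    totals = defaultdict(lambda: {"hours_viewed": 0.0, "views": 0.0, "titles": 0})
    for row in rows:
        bucket = totals[row["kind"]]
        bucket["hours_viewed"] += row["hours_viewed"]
        bucket["views"] += row["views"]
        bucket["titles"] += 1
    return {
        kind: {
            "total_hours": t["hours_viewed"],
            "mean_hours_per_title": t["hours_viewed"] / t["titles"],
            "views_per_hour": t["views"] / t["hours_viewed"],
        }
        for kind, t in totals.items()
    }

summary = summarize_by_kind(rows)
```

Computing views per hour from the group totals, rather than averaging per-title ratios, weights each title by its watch time; either choice is defensible, and which one the charts use is not stated here.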
In summary:
Both total hours and average hours viewed on Netflix show an upward trend. This graph tells us which years produced content that really stuck with audiences. Notice how certain years stand out, driven by breakout hits or multiple strong titles.
The second graph depicts quality vs. quantity. Netflix sometimes floods the market, but not every title gets equal traction. In hit years, fewer shows dominate, while in saturated years, attention fragments.
Engagement
Most engaging titles
Here we look not just at raw popularity but at efficiency as well, that is, how much audience attention a title captures relative to time watched. Some titles may not dominate total hours, but they generate very high engagement. This suggests strong audience pull in shorter windows.
Engagement tells us which titles people lean in to watch actively, not just what’s playing in the background.
Most viewed titles
Another view is absolute watch time, the metric of scale and broad reach. Here we see the shows and films that most people watched. These are the heavyweights, Netflix’s traffic drivers. But note: big doesn’t always mean engaging.
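The two rankings in this section, most engaging and most viewed, differ only in the sort key. A small Python sketch, assuming engagement means views divided by hours viewed and using invented titles and numbers:

```python
# Invented titles and numbers, for illustration only.
titles = [
    {"title": "Blockbuster", "hours_viewed": 1000.0, "views": 200.0},
    {"title": "Buzzy Show", "hours_viewed": 100.0, "views": 80.0},
    {"title": "Steady Drama", "hours_viewed": 500.0, "views": 150.0},
]

def top_n(rows, key, n=2):
    """Return the names of the n rows with the largest key value."""
    return [row["title"] for row in sorted(rows, key=key, reverse=True)[:n]]

# Scale: rank by absolute watch time.
most_viewed = top_n(titles, key=lambda r: r["hours_viewed"])

# Efficiency: rank by views per hour viewed (assumed engagement definition).
most_engaging = top_n(titles, key=lambda r: r["views"] / r["hours_viewed"])
```

With this toy data the two lists lead with different titles, which is the point of looking at both metrics side by side.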
Engagement vs popularity
This shows the relationship between titles with high engagement and those with high total views.
From this we can see two clear categories:
Blockbusters with massive hours but average engagement.
Buzzy shows with very high engagement but smaller total hours.
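One simple way to separate these two groups programmatically is to split titles at the median of each metric. This is only a sketch: the median cut-offs, the labels, and the sample data are assumptions, and the categories above were read off the chart rather than computed this way.

```python
from statistics import median

# Invented data for illustration.
titles = [
    {"title": "Mega Hit", "hours_viewed": 1000.0, "views": 200.0},
    {"title": "Big Film", "hours_viewed": 800.0, "views": 240.0},
    {"title": "Cult Show", "hours_viewed": 100.0, "views": 80.0},
    {"title": "Niche Doc", "hours_viewed": 200.0, "views": 100.0},
]

def label_titles(rows):
    """Label each title by comparing hours and engagement to the medians."""
    hours_cut = median(r["hours_viewed"] for r in rows)
    engagement_cut = median(r["views"] / r["hours_viewed"] for r in rows)
    labels = {}
    for r in rows:
        engagement = r["views"] / r["hours_viewed"]
        if r["hours_viewed"] > hours_cut and engagement <= engagement_cut:
            labels[r["title"]] = "blockbuster"  # large scale, ordinary engagement
        elif engagement > engagement_cut and r["hours_viewed"] <= hours_cut:
            labels[r["title"]] = "buzzy"  # high engagement, smaller scale
        else:
            labels[r["title"]] = "other"
    return labels

labels = label_titles(titles)
```

Median splits are crude; a different threshold such as a top percentile would move titles between groups, so the labels should be read as a rough aid rather than a finding.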
Longevity: Do Older Titles Still Matter?
A striking finding emerges when comparing newer releases against older catalog titles. Newly released content dominates in total hours viewed, but certain older titles retain remarkable staying power, pulling in consistent audiences years after release.
When categorized into age buckets (0–2 years, 3–5 years, 6–10 years, etc.), the data reveals a decay curve: most titles peak early and then taper off, but the long tail is not negligible. This long-tail effect underscores the importance of catalog depth — Netflix doesn’t just thrive on the new; it also benefits from evergreen content.
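The bucketing step can be sketched like this in Python. The reference date, the field names, the sample rows, and the open-ended "11+ years" bucket are assumptions; the text only names the first three ranges.

```python
from collections import defaultdict
from datetime import date

# Invented catalog rows; field names are assumptions.
catalog = [
    {"title": "New Series", "release_date": date(2023, 1, 15), "hours_viewed": 500.0},
    {"title": "Recent Film", "release_date": date(2022, 11, 1), "hours_viewed": 300.0},
    {"title": "Mid Drama", "release_date": date(2019, 6, 1), "hours_viewed": 200.0},
    {"title": "Older Comedy", "release_date": date(2015, 3, 10), "hours_viewed": 120.0},
    {"title": "Evergreen", "release_date": date(2008, 9, 20), "hours_viewed": 60.0},
]

def age_in_years(release_date, as_of):
    """Whole years elapsed between release_date and as_of."""
    had_anniversary = (as_of.month, as_of.day) >= (release_date.month, release_date.day)
    return as_of.year - release_date.year - (0 if had_anniversary else 1)

def age_bucket(years):
    """Map an age in whole years to a bucket label."""
    if years <= 2:
        return "0-2 years"
    if years <= 5:
        return "3-5 years"
    if years <= 10:
        return "6-10 years"
    return "11+ years"  # assumed catch-all for the "etc." in the text

def hours_by_age_bucket(rows, as_of):
    """Sum hours viewed per age bucket, measured at the as_of date."""
    totals = defaultdict(float)
    for row in rows:
        totals[age_bucket(age_in_years(row["release_date"], as_of))] += row["hours_viewed"]
    return dict(totals)

hours_by_bucket = hours_by_age_bucket(catalog, as_of=date(2023, 12, 31))
```

Titles with a missing release date would need to be excluded or given their own bucket before this step.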
Seasonality: Peaks and Lulls in Viewing
Quarterly breakdowns highlight that viewership is not evenly distributed across the calendar year. Peaks occur during holiday periods (Q4), when audiences have more leisure time, while viewing in the summer months tends to dip slightly.
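A quarterly breakdown of this kind can be sketched as follows. It assumes each record carries a date for the viewing period (a hypothetical `period_start` field); the numbers are invented to show the mechanics and are not the dataset’s actual quarterly figures.

```python
from collections import defaultdict
from datetime import date

# Invented viewing records; "period_start" is an assumed field name.
records = [
    {"period_start": date(2023, 1, 10), "hours_viewed": 200.0},
    {"period_start": date(2023, 5, 3), "hours_viewed": 180.0},
    {"period_start": date(2023, 8, 20), "hours_viewed": 170.0},
    {"period_start": date(2023, 11, 25), "hours_viewed": 250.0},
    {"period_start": date(2023, 12, 24), "hours_viewed": 200.0},
]

def quarter(d):
    """Calendar quarter (1 to 4) for a date."""
    return (d.month - 1) // 3 + 1

def quarterly_share(rows):
    """Return total hours per quarter and each quarter's share of all hours."""
    totals = defaultdict(float)
    for row in rows:
        totals[f"Q{quarter(row['period_start'])}"] += row["hours_viewed"]
    grand_total = sum(totals.values())
    shares = {q: hours / grand_total for q, hours in totals.items()}
    return dict(totals), shares

hours_by_quarter, share_by_quarter = quarterly_share(records)
```

Comparing shares rather than raw totals makes quarters comparable across years with different overall volumes.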
For Netflix, this has implications for release strategies. Positioning blockbuster titles in high-demand periods can amplify impact, while quieter quarters may serve as testing grounds for niche or experimental content.
Limitations of the Data
It’s important to acknowledge what the data cannot tell us. For instance, the dataset lacks detailed information on genres, languages, or viewer demographics. This prevents a richer exploration of cultural diversity in the catalog. Moreover, the numbers reflect availability and hours but not audience sentiment — whether viewers liked what they watched remains beyond the dataset’s scope.
The link is here.