Finding the durations of MP4 files without downloading the entire file

I wanted to find the durations of a bunch of MP4 files located out on the net – durations for the introduction videos for the top Kickstarter projects.

But I wanted to do this quickly, and downloading all those MP4 files would take too long. A little bit of research revealed that MP4 files set up for streaming have their metadata (the moov atom) at the beginning of the file.
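To make the "moov at the beginning" idea concrete, here is a minimal Python sketch that walks the top-level boxes (atoms) in a chunk of bytes and reports whether moov shows up before mdat (the media data). The function names are my own for illustration, and it only looks at whatever bytes you hand it.

```python
import struct

def top_level_boxes(data):
    """Yield (box_type, offset, size) for each top-level box whose header fits in data."""
    offset = 0
    while offset + 8 <= len(data):
        # Every box starts with a 4-byte big-endian size and a 4-byte type.
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = len(data) - offset
        if size < header:
            break  # malformed box, stop rather than loop forever
        yield box_type.decode("latin-1"), offset, size
        offset += size

def moov_before_mdat(data):
    """Return True if a moov box is seen before any mdat box in data."""
    for box_type, _, _ in top_level_boxes(data):
        if box_type == "moov":
            return True
        if box_type == "mdat":
            return False
    return False
```

If this returns False for the first few hundred bytes of a file, the metadata is probably at the end, and the partial-download trick will not work for that file.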

Now I need a way to read just the metadata, without getting the entire file.

More research reveals that I can use curl and dd to get the first bytes of a file. For some reason ‘curl -r’ doesn’t work.
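For comparison, the same idea can be sketched in Python with only the standard library. This is not the curl and dd pipeline itself, just an illustration under a couple of assumptions: the helper names are hypothetical, and the server may or may not honor the Range header, so the code also stops reading after the requested number of bytes.

```python
import urllib.request

def build_prefix_request(url, num_bytes=512):
    """Build a request that asks the server for only the first num_bytes bytes."""
    return urllib.request.Request(
        url, headers={"Range": f"bytes=0-{num_bytes - 1}"}
    )

def fetch_prefix(url, num_bytes=512, timeout=10):
    """Return at most num_bytes bytes from the start of the resource at url."""
    req = build_prefix_request(url, num_bytes)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # read(num_bytes) caps the download even if the Range header is ignored;
        # leaving the with-block closes the connection.
        return resp.read(num_bytes)
```

Capping the read is the part that mirrors dd: whatever the server decides to send, only the first block is actually consumed.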

So now we’re ready to go.

I made a file that had one Kickstarter project URL per line. Here are a couple of them:

This script will load the Kickstarter project page, and get the URL-encoded download link for the project’s introductory video, if there is one:

Now we need to URL-decode the URLs:
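One way to do the decoding, as a minimal Python sketch: the standard library's urllib.parse.unquote turns percent-escapes such as %3A and %2F back into the original characters. The function name and the example.com URL in the usage note are placeholders, not real project data.

```python
from urllib.parse import unquote

def decode_video_urls(lines):
    """URL-decode each non-empty line, stripping surrounding whitespace."""
    return [unquote(line.strip()) for line in lines if line.strip()]
```

For example, decode_video_urls(["https%3A%2F%2Fexample.com%2Fintro.mp4"]) gives back a list containing "https://example.com/intro.mp4".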

Now we get the durations from the video URLs. You’ll need Python, pip, and virtualenvwrapper installed. We make a Python virtual environment and install the hsaudiotag module to decode the MP4 metadata:

This code uses curl and dd to download only the first 512-byte block of the MP4 file.
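If you are curious what the metadata decoding boils down to, here is a rough sketch that pulls the duration out of those first bytes by hand. It is not how hsaudiotag works internally; it is a simplification that searches for the movie header box (mvhd, which lives inside moov) and reads its timescale and duration fields. The function name is hypothetical, and the naive byte search assumes the string "mvhd" does not appear earlier in the data by coincidence.

```python
import struct

def mvhd_duration_seconds(data):
    """Return the movie duration in seconds from an mvhd box in data, or None."""
    idx = data.find(b"mvhd")
    if idx == -1:
        return None
    pos = idx + 4  # first byte after the box type is the version
    if pos >= len(data):
        return None
    version = data[pos]
    pos += 4  # skip 1 version byte + 3 flag bytes
    # Version 1 uses 64-bit times and duration; version 0 uses 32-bit fields.
    # Field order: creation time, modification time, timescale, duration.
    fmt = ">QQIQ" if version == 1 else ">IIII"
    need = struct.calcsize(fmt)
    if pos + need > len(data):
        return None  # header was cut off by the partial download
    _, _, timescale, duration = struct.unpack(fmt, data[pos:pos + need])
    if timescale == 0:
        return None
    return duration / timescale
```

The duration is stored in timescale units, so dividing by the timescale gives seconds. A None result means the header was not in the bytes that were fetched, which usually points to a file that is not set up for streaming.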

Now we analyze the durations using a simple R script. I am on a Mac, so I need to use Homebrew to install R:

Output for the top 100 Kickstarter technology projects (by amount raised) – all numbers are in seconds:

The average duration of the top 100 Kickstarter videos is 203.3 seconds, or about 3.39 minutes (roughly 3 minutes 23 seconds).
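If you would rather stay in Python for the analysis, a similar summary can be sketched with the statistics module. This is an illustrative stand-in, not the R script; the function names are my own, and any list you pass in would be your own measured durations.

```python
import statistics

def summarize(durations):
    """Return basic summary statistics for a list of durations in seconds."""
    return {
        "count": len(durations),
        "min": min(durations),
        "median": statistics.median(durations),
        "mean": statistics.mean(durations),
        "max": max(durations),
    }

def seconds_to_minutes(seconds):
    """Convert a duration in seconds to minutes."""
    return seconds / 60
```

The minutes figure is just the mean divided by 60, so seconds_to_minutes(203.3) comes out at a little under 3.39.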

Thanks to: