Video Streaming, Simplified!

Rajesh Dangi / December 20th, 2019

Beyond the Cloud

No discussion of video streaming can sensibly begin without mentioning YouTube. With over 2 billion monthly active users, over one billion hours of content viewed per day and 500 hours of content uploaded every minute, it is a testimony in itself to the reach of video streaming and content delivery on the Internet, with revenue estimated at between $9.5 billion and $14 billion in 2018.

According to market estimates, the live video streaming market will be a $70.5 billion industry by 2021. Video content delivery is the largest consumer of data traffic today and takes a major share of eyeballs in the world of mobile devices. Netflix CEO Reed Hastings recently said that the next 100 million subscribers will come from India, but the Indian market is unusually competitive.

In addition to the usual suspects such as YouTube, Netflix, Amazon and Hotstar, new video streaming services (video on demand, or OTT as they are lately called) such as Sony Liv, Voot, Zee5 and Wynk, to mention a few, have emerged and grown substantially in India. After Asia and Western Europe, the geography attracting the most OTT traction is Latin America. Despite limited broadband penetration and high levels of piracy, Latin America shows great potential to become an OTT-dominated region in the coming 5 years, and changing consumer habits globally make this space all the more interesting to watch out for!


Video on Demand (VOD), a term that has been popular for over a decade, has actually existed since 1990, when it was limited to television broadcasting based on a selection of a few channels, and to tapes for offline viewing of the content. Wax or the Discovery of Television Among the Bees, originally released in 1991, was the first film to be streamed on the Internet. Due to bandwidth limitations it was broadcast at 2 frames per second rather than the standard 24 frames per second.

The landscape of video transmission over IP began to change in the early 2000s, when researchers realized that TCP, rather than UDP, could also be used to transmit video. The initial implementations of video over HTTP/TCP used a technique called Progressive Download (PD), which basically meant that the entire video file was downloaded into the receiver's buffer as fast as TCP would allow. The client video player would start to play the video before the download was complete. This technique was used by YouTube, and even today YouTube uses an improved version of the PD algorithm called Progressive Download with Byte Ranges (PD-BR). This algorithm allows the receiver to request specific byte ranges of the video file from the server, as opposed to the entire file.
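The byte-range idea can be sketched in a few lines. This is a minimal illustration and not YouTube's actual implementation: it only builds the values a player could put in a standard HTTP `Range` request header to fetch a file piece by piece. The function name and the chunk size are assumptions made for the example.

```python
def byte_ranges(total_size: int, chunk_size: int):
    """Yield HTTP Range header values that together cover a file.

    total_size and chunk_size are in bytes. HTTP byte ranges are
    inclusive on both ends, so a 4-byte chunk starting at 0 is "0-3".
    """
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    start = 0
    while start < total_size:
        # Last byte of this chunk, clamped to the end of the file.
        end = min(start + chunk_size, total_size) - 1
        yield f"bytes={start}-{end}"
        start = end + 1


if __name__ == "__main__":
    # Hypothetical 10-byte file fetched in 4-byte pieces.
    for header_value in byte_ranges(10, 4):
        print("Range:", header_value)
```

A player that wants to seek simply asks for the range containing the target position instead of waiting for everything before it to download.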

A fundamental improvement in HTTP/TCP delivery of video was made in the mid-2000s with the invention of an algorithm called HTTP Adaptive Streaming (HAS), which is also sometimes known by the acronym DASH (Dynamic Adaptive Streaming over HTTP). Using HAS, the video receiver is able to adaptively change the video rate so that it matches the bandwidth that the network can currently support. From this description, HAS can be considered a flow control rather than a congestion control algorithm, because its objective is to keep the video receive buffer from getting depleted rather than to keep network queues from getting congested. In this sense, HAS is similar to TCP receive flow control, except that the objective of the latter is to keep the receive buffer from overflowing. HAS operates on top of TCP congestion control, albeit over longer time scales, and the interaction between the two is a rich source of research problems.
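To make the adaptation loop concrete, here is a minimal sketch of one possible rate-selection rule. It is a simple throughput-and-buffer heuristic chosen for illustration, not the algorithm of any particular player or standard. The bitrate ladder reuses the H.264 targets listed later in this article, and the safety factor and low-buffer threshold are assumed values.

```python
# Available renditions in kbps (taken from the H.264 targets in this article).
LADDER_KBPS = [350, 700, 1200, 2500, 5000]


def choose_bitrate(throughput_kbps, buffer_seconds,
                   ladder=LADDER_KBPS, safety=0.8, low_buffer_s=5.0):
    """Pick the bitrate for the next video segment.

    throughput_kbps: recently measured download throughput.
    buffer_seconds:  seconds of video currently buffered in the player.
    safety:          fraction of the throughput we dare to use (assumed).
    low_buffer_s:    below this, protect the buffer at any cost (assumed).
    """
    renditions = sorted(ladder)
    # Buffer nearly empty: drop to the lowest rate to avoid a stall.
    if buffer_seconds < low_buffer_s:
        return renditions[0]
    # Otherwise take the highest rendition that fits the throughput budget.
    budget = throughput_kbps * safety
    affordable = [rate for rate in renditions if rate <= budget]
    return affordable[-1] if affordable else renditions[0]


if __name__ == "__main__":
    print(choose_bitrate(4000, 20))  # healthy buffer, 4 Mbps link
    print(choose_bitrate(4000, 2))   # same link, buffer almost drained
```

Note how the decision is driven by the state of the receive buffer as well as by measured bandwidth, which is what makes HAS behave like flow control.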

By the mid-2000s the vast majority of the Internet traffic was HTTP-based and content delivery networks (CDNs) were increasingly being used to ensure delivery of popular content to large audiences. Streaming media, with its hodgepodge of proprietary protocols – all mostly based on the far less popular UDP – suddenly found itself struggling to keep up with demand.

Microsoft launched its Smooth Streaming technology in 2008, the same year Netflix developed its own technology to power its pioneering Watch Instantly streaming service. Apple followed suit in 2009 with HTTP Live Streaming (HLS) designed for delivery to iOS devices, and Adobe joined the party in 2010 with HTTP Dynamic Streaming (HDS). HTTP-based adaptive streaming quickly became the weapon of choice for high-profile live streaming events.

It was a time of adolescence for streaming media: bursting with potential, but also at risk that a clash of proprietary streaming technologies would do more damage than good to an industry that was on the verge of maturing into the mainstream. So in 2009 efforts began in 3GPP to establish an industry standard for adaptive streaming. Early 3GPP standardisation work shifted to ISO/IEC MPEG working groups in 2010, where it moved quickly from proposals to draft status to ratification in less than two years. More than 50 companies were involved (Microsoft, Netflix and Apple included) and the effort was co-ordinated with other industry organisations such as 3GPP, DECE, OIPF and W3C. By April 2012 a new standard was born: Dynamic Adaptive Streaming over HTTP, colloquially known as MPEG-DASH. Many companies were quick to announce MPEG-DASH support in their products as early as 2011, but as often happens with standards, adoption didn't begin immediately at ratification. MPEG-DASH in its original specification tried to be everything to everyone and consequently suffered from excessive ambiguity (a story surely familiar to anyone acquainted with HTML5 Video). The bulk of the companies involved in MPEG-DASH quickly formed a DASH Industry Forum with the goal of promoting DASH adoption and establishing a well-defined set of interoperability constraints.

In 2013, the DASH-IF published a draft (version 0.9) of its DASH264 Implementation Guidelines to provide important interoperability requirements such as support for the H.264 video codec which has been an industry standard for the better part of the past decade. DASH264 defines other essential interoperability requirements such as support for HE-AAC v2 audio codec, ISO base media file format, SMPTE-TT subtitle format, and MPEG Common Encryption for content protection (DRM).

Today, video streaming is the set of technologies behind YouTube video playback and real-time video chats in the Skypes, Snapchats and TikToks of the world, and it even drives the online broadcast of your favorite team's match to your smart TV via connected Fire TV sticks. With some limitations, even a TV program from, say, 50 years ago could have been referred to as video streaming, even though it was not digital and, of course, non-interactive.

A Few Fundamentals – Bitrate, Frame Rate, Resolution and Codec

Bitrate is the rate at which bits are transferred from one location to another. In other words, it measures how much data is transmitted in a given amount of time. Bitrate is commonly measured in bits per second (bps), kilobits per second (Kbps), or megabits per second (Mbps).

A video bitrate is the number of bits that are processed in a unit of time, so the data rate of a video file is its bitrate. A data rate specification for video content that runs at 1 megabyte per second would therefore be given as a bitrate of 8 megabits per second (8 Mbps). The bitrate for an HD Blu-ray video is typically in the range of 20 Mbps, standard-definition DVD is usually 6 Mbps, high-quality web video often runs at about 2 Mbps, and video for phones is typically given in kilobits per second (kbps). For example, these are the targets we usually see for H.264 streaming:

  • LD 240p 3G Mobile @ H.264 baseline profile 350 kbps (3 MB/minute)
  • LD 360p 4G Mobile @ H.264 main profile 700 kbps (6 MB/minute)
  • SD 480p WiFi @ H.264 main profile 1200 kbps (10 MB/minute)
  • HD 720p @ H.264 high profile 2500 kbps (20 MB/minute)
  • HD 1080p @ H.264 high profile 5000 kbps (35 MB/minute)

Bitrate generally determines the size and quality of video and audio files: the higher the bitrate, the better the quality, and the larger the file size and the bandwidth required to carry the streams from source to destination.
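The link between bitrate and data volume is plain arithmetic, sketched below. The helper name is made up for this example, and it assumes decimal units (1 kbps = 1,000 bits per second, 1 MB = 1,000,000 bytes). The per-minute figures in the H.264 list above appear to be rounded approximations, so the computed values will not match them exactly.

```python
def megabytes_per_minute(bitrate_kbps):
    """Convert a stream bitrate in kbps to data volume in MB per minute."""
    bits_per_minute = bitrate_kbps * 1000 * 60   # kbps -> bits in 60 s
    return bits_per_minute / 8 / 1_000_000       # bits -> bytes -> MB


if __name__ == "__main__":
    for kbps in (350, 700, 1200, 2500, 5000):
        print(f"{kbps} kbps is about {megabytes_per_minute(kbps):.1f} MB/minute")
```

The same conversion explains the earlier statement that 1 megabyte per second equals 8 Mbps: 8,000 kbps works out to 60 MB per minute, or 1 MB per second.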

Frame rate is the speed at which video images are shown, or how fast your player is “flipping” through the frames and it’s usually expressed as “frames per second,” or FPS. Each image represents a frame, so if a video is captured and played back at 24fps, that means each second of video shows 24 distinct still images. Frame rate greatly impacts the style and viewing experience of a video. Different frame rates yield different viewing experiences and choosing a frame rate often means choosing between things such as how realistic you want your video to look, or whether or not you plan to use techniques such as slow motion or motion blur effects.

The human eye starts to notice jerkiness when the frame rate drops below 24 fps. Around 50 fps is the flicker-fusion rate, the frame rate at which the flashing of interrupted frames disappears and the image looks solid and continuous. This is why 50 Hz TV monitors looked pretty good to our eyes. Remember, however, that our retina is analog and the brain does not process vision as "frames", so it is possible that even higher frame rates could change visual perception in certain circumstances.

Pixels and resolution are another interesting element in video streaming: the quality of content is directly proportional to the resolution of the video frames. Resolution is the number of distinct pixels in each dimension that can be displayed. The word pixel is derived from "pix" (for pictures) and "el" (for element). Each pixel is a sample of an original image; more samples typically provide more accurate representations of the original. The intensity of each pixel is variable. In color imaging systems, a color is typically represented by three or four component intensities such as red, green, and blue, or cyan, magenta, yellow, and black. A pixel is generally thought of as the smallest single component of a digital image.

For device displays such as phones, tablets, monitors and televisions, the use of the word resolution as defined above is a common misnomer. The term display resolution actually means pixel dimensions, the number of pixels in each dimension (e.g. 1920 × 1080), which tells us nothing about the pixel density of the display on which the image is actually formed. Resolution properly refers to pixel density, the number of pixels per unit distance or area, not the total number of pixels, and in digital measurement it would be given in pixels per inch (PPI). Display resolution is nevertheless usually quoted as width × height, with the units in pixels: for example, 1024 × 768 means the width is 1024 pixels and the height is 768 pixels. As an example, an 8K Ultra HD display with pixel dimensions of 7680 × 4320 has about 33 million pixels.
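The difference between pixel dimensions and pixel density is easy to see with a small calculation. This is an illustrative sketch: the function names are made up here, and the 24-inch diagonal is a hypothetical monitor size chosen for the example.

```python
import math


def total_pixels(width_px, height_px):
    """Total number of pixels for the given pixel dimensions."""
    return width_px * height_px


def pixels_per_inch(width_px, height_px, diagonal_inches):
    """Pixel density (PPI): pixels along the diagonal per inch of diagonal."""
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_px / diagonal_inches


if __name__ == "__main__":
    # 8K Ultra HD pixel dimensions, as quoted above.
    print(total_pixels(7680, 4320))  # 33177600, i.e. about 33 million
    # The same 1920 x 1080 pixel dimensions on a hypothetical 24-inch monitor.
    print(round(pixels_per_inch(1920, 1080, 24), 1))
```

Two displays with identical pixel dimensions can therefore have very different densities: the larger the panel, the lower the PPI.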

Wiki Image : Common Display Resolutions

A video codec compresses or decompresses digital video. It converts uncompressed video to a compressed format or vice versa. In the context of video compression, "codec" is a concatenation of "encoder" and "decoder"—a device that only compresses is typically called an encoder, and one that only decompresses is a decoder. The purpose of an encoder is to take a video signal and compress it into the correct format to stream across your internet connection and send it to Media servers. In many streaming setups, the encoder is integrated with the broadcasting or switching software and decoding might happen at the player running on the end device.

The most popular video coding standards used for codecs have been the MPEG standards. MPEG-1 was developed by the Moving Picture Experts Group (MPEG) in 1991, and it was designed to compress VHS-quality video. The quality a codec can achieve depends heavily on the compression format it uses. A codec is not a format, and there may be multiple codecs that implement the same compression specification. For example, MPEG-1 codecs typically do not achieve a quality/size ratio comparable to codecs that implement the more modern H.264 specification.

Further, video transcoding is the process that converts a video file from one digital format to another, to help make videos viewable across different platforms and devices. Transcoding is a two-step process in which the original data is decoded to an intermediate uncompressed format (e.g., PCM for audio; YUV for video), which is then encoded into the target format. Simply put, video transcoding refers to the process of creating multiple versions of the same video, each optimized for a different set of users. In other words, video transcoding helps deliver high-quality videos to viewers with fast internet speeds and, at the same time, lower-resolution videos to viewers with slower internet connections. The end result, ideally, is decreased buffering and latency for all your viewers, enriching their Quality of Experience (QoE).
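As a rough illustration of "multiple versions of the same video", the sketch below builds one ffmpeg command line per rendition. It assumes an ffmpeg build with the libx264 encoder is available. The ladder reuses the H.264 targets listed earlier in this article, and the input and output file names are hypothetical. The commands are only constructed here, not executed.

```python
# (height in pixels, video bitrate in kbps, H.264 profile)
LADDER = [
    (240, 350, "baseline"),
    (360, 700, "main"),
    (480, 1200, "main"),
    (720, 2500, "high"),
    (1080, 5000, "high"),
]


def ffmpeg_commands(source, ladder=LADDER):
    """Return one ffmpeg argument list per rendition of `source`."""
    commands = []
    for height, kbps, profile in ladder:
        commands.append([
            "ffmpeg", "-i", source,
            # Scale to the target height; -2 keeps the aspect ratio
            # and rounds the width to an even number.
            "-vf", f"scale=-2:{height}",
            "-c:v", "libx264", "-profile:v", profile,
            "-b:v", f"{kbps}k",
            "-c:a", "aac",
            f"output_{height}p.mp4",  # hypothetical output name
        ])
    return commands


if __name__ == "__main__":
    for cmd in ffmpeg_commands("master.mp4"):  # hypothetical source file
        print(" ".join(cmd))
```

Each command decodes the source and re-encodes it at a different size and bitrate, which is exactly the decode-then-encode process described above. Passing one of these lists to `subprocess.run` would perform the actual transcode.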

Methods of Video Streaming

A media stream can be streamed either "live" or "on demand". Live streams are generally provided by a means called "true streaming", which sends the information straight to the computer or device without saving the file to a local disk. Historically, however, most "streamed" audio and video was delivered to a viewer over HTTP much like other website content such as images (this method is often called HTTP Streaming). The video would be downloaded to a computer and saved temporarily on a hard drive. Playback started once enough of the file had been downloaded to begin playing, and the rest of the video was downloaded in the background during playback, which is why this method is called "Progressive Download". The upside of progressive downloading is that it doesn't take any special software or servers: since it uses standard HTTP over TCP, it's just like most other website content. There have been many adaptations to make progressive downloads work better for those who want to skip around the video, and to restrict how much of the file is downloaded while still using HTTP. Many of those changes (including throttling) do offer something close to a true streaming experience.

Alternatively, there are protocols designed specifically for streaming, called "Streaming Protocols", that use a different mechanism than progressive download. These include the Real Time Messaging Protocol (RTMP) and the Real Time Streaming Protocol (RTSP). Media delivered using RTMP is streamed in chunks, whereas media using progressive download is sent (downloaded) via HTTP. It may seem like a small difference, but there are some large differences when it comes to creating the content. For one, this type of streaming requires the use of a streaming server, which does the work of sending the video file over the Internet to the end user. The advantage of this method is that the video is never stored on the viewer's computer. It's transferred in chunks as the video player requests them, and once a chunk is played, the player discards it. This prevents copying of the information off the hard drive (there are other ways to copy streaming content, but this form of streaming is more secure than progressive download). Another advantage is that the viewer can skip around the video: the player simply sends a request to the streaming server, and the server sends the appropriate chunks for the timestamp the viewer requested. Streaming protocols that carry media over UDP rather than TCP, as RTSP deployments typically do via RTP (RTMP itself runs over TCP), can also achieve faster transfer rates and thus faster start times (and there are arguments on both sides as to why that is a good thing and why it isn't).

Among the protocols, Unicast protocols send a separate copy of the media stream from the server to each recipient. Unicast is the norm for most Internet connections, but does not scale well when many users want to view the same television program concurrently. Multicast protocols were developed to reduce the server/network loads resulting from duplicate data streams that occur when many recipients receive unicast content streams independently. These protocols send a single stream from the source to a group of recipients. Depending on the network infrastructure and type, multicast transmission may or may not be feasible.
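A back-of-the-envelope calculation shows why unicast struggles at scale. This sketch deliberately ignores protocol overhead and CDN caching, and the audience size and bitrate are assumed numbers for illustration only.

```python
def server_egress_mbps(viewers, bitrate_kbps, multicast=False):
    """Approximate outbound bandwidth at the source, in Mbps.

    Unicast sends one copy of the stream per viewer; multicast sends a
    single copy that the network replicates towards the group.
    """
    streams = 1 if (multicast and viewers > 0) else viewers
    return streams * bitrate_kbps / 1000


if __name__ == "__main__":
    # Hypothetical event: 10,000 concurrent viewers of a 2500 kbps stream.
    print(server_egress_mbps(10_000, 2500))                  # unicast
    print(server_egress_mbps(10_000, 2500, multicast=True))  # multicast
```

Under these assumptions the unicast source must push 25,000 Mbps (25 Gbps), while the multicast source sends a single 2.5 Mbps stream.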

Streaming video, though, is usually provided at a single bitrate, although many players offer the viewer the chance to change the bitrate, assuming the content owner provides a variety of bitrates to the server, which enhances QoE for viewers using different devices and connecting from various bandwidths and locations. The need to ensure this QoE demanded a different streaming methodology, one that takes the next logical step by combining the best of both methods discussed above: Adaptive Bitrate (ABR) streaming. The same video can now be delivered over HTTP, but it can also be streamed over UDP, and both can use various bitrates. The basic concept is that the bitrate of the streamed content changes depending on current local network conditions and computing resources. It is thus becoming the standard for most major streaming platforms and technology owners such as Apple (HTTP Live Streaming), Adobe (HTTP Dynamic Streaming) and Microsoft (Smooth Streaming). The MPEG-DASH standard from the MPEG group is also already supported by a number of players and streaming providers.
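In HLS, the "variety of bitrates" is advertised to the player through a master playlist that lists every rendition. The sketch below generates a minimal one. The `#EXTM3U` and `#EXT-X-STREAM-INF` tags with their `BANDWIDTH` and `RESOLUTION` attributes come from the HLS format, while the rendition list and the playlist file names are assumptions made for this example. A real playlist would also declare codecs, and its bandwidth values would include audio.

```python
# (video bitrate in kbps, resolution, hypothetical per-rendition playlist)
RENDITIONS = [
    (350, "426x240", "240p.m3u8"),
    (700, "640x360", "360p.m3u8"),
    (1200, "854x480", "480p.m3u8"),
    (2500, "1280x720", "720p.m3u8"),
    (5000, "1920x1080", "1080p.m3u8"),
]


def master_playlist(renditions=RENDITIONS):
    """Build the text of a minimal HLS master playlist."""
    lines = ["#EXTM3U"]
    for kbps, resolution, uri in renditions:
        # BANDWIDTH is expressed in bits per second.
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={kbps * 1000},RESOLUTION={resolution}"
        )
        lines.append(uri)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(master_playlist(), end="")
```

The player reads this list once, then switches between the per-rendition playlists as network conditions change.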

Copyrighted Streaming, i.e. DRM

Live streaming video is an incredibly effective and low-cost way to distribute content to a wide audience. Top streaming players have spent years and millions of dollars building out their streaming infrastructure to beam on-demand content across the web and to support live streams of sports and other events. They've also had to figure out how to distribute that video to an increasingly connected landscape of varying devices and screens.

The emergence of multiple OTT players and the fight for content, monetization and market leadership created the demand for Digital Rights Management (DRM), a set of technologies aimed at protecting digital content from unauthorized reproduction after sale. Implementations range from simple content scrambling, through advanced encryption of video streams, to more sophisticated algorithms that use strong math to verify the digital signatures of video content.

Basically, DRM enforces a business model of selling a digital copy per user or per device. Obviously, monetization of Video-on-Demand service subscriptions implies the use of a DRM or custom encryption solution. This is used to protect content by making it exclusively available to subscribers. Almost every modern delivery protocol supports video stream encryption, including HLS, MPEG-DASH and RTMP. Some delivery protocols (HLS, MPEG-DASH) are natively supported by video playback devices (Smart TV, Apple iPhone, etc.), other protocols (RTMP) sometimes require custom video players. In any case most of the playback solutions support DRM.

In summary, video streaming today has risen above all other traffic types on the internet, and providing this service at high quality is the most challenging task. In new multimedia-based networks, the challenges move from technology-oriented services to user-oriented services, which shows the importance of Quality of Experience (QoE), a measure that is all about the user experience. Qualitative perception differs from one user to another, and service providers face difficulties in transforming qualitative values into quantitative ones, since video has some fundamental differences in transmission requirements compared with traditional data traffic. For example, there are real-time constraints in the delivery of video traffic to the client player, and delivery depends on a lot of external factors such as the shortest network/internet path, the available bandwidth and the device on which the user 'plays' the stream, besides the codecs and bitrates, which also influence QoE.

Streaming services started as an add-on to DVD and digital download offerings with a trickle of second-run movies and TV shows. They were alternatives for the programs and episodes we missed watching on TV. But speedier internet connections and an abundance of video streaming devices have accelerated the decline of traditional cable. More and more viewers are cutting the cord entirely in favor of dedicated streaming alternatives and OTA players; video content dominates, and will continue to dominate, the share of eyeballs.

The vision for the future of social network integration expressed by many researchers condenses down to this: next-generation web products should allow users to build their own playlists, to form user context-based playlists semi-automatically, and to aggregate different video resources. In addition, video content must be available on mobiles, desktops or smart devices, giving users choice and ease of access from anywhere, anytime and any-ware. What Say?


Dec 2019. Compilation from various publicly available internet sources; the author's views are personal.