Media stream transcoding with Wowza Transcoder Addon

2015-11-29 20:34:00

Why an IP stream needs transcoding (parameter conversion).

The source IP stream produced by an encoder or signal receiver is usually of high quality, but it is rarely suitable for delivery to end-user devices over an IP network as is. Subscribers' receivers differ in available network bandwidth, supported compression types, and screen resolution. To make sure every subscriber can view the delivered content, the stream has to be transcoded.

Wowza Streaming Engine media server supports transcoding and transrating of IP streams in all of the most commonly used formats. Transcoding converts the compression type (for example, MPEG2 video to H.264 and MPEG1 audio to AAC). Transrating changes the bitrate and resolution without changing the compression type.
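
The difference between the two operations can be written down as a simple decision rule. The sketch below is purely illustrative: `StreamProfile` and `classify_conversion` are hypothetical names, not part of any Wowza API, and the 360p bitrate is an assumed value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamProfile:
    """Hypothetical description of a video stream (not a Wowza type)."""
    codec: str          # e.g. "MPEG2" or "H.264"
    width: int
    height: int
    bitrate_kbps: int


def classify_conversion(source: StreamProfile, target: StreamProfile) -> str:
    """Name the kind of conversion that turns `source` into `target`."""
    if source.codec != target.codec:
        # The compression type changes: this is transcoding.
        return "transcoding"
    if (source.width, source.height, source.bitrate_kbps) != (
        target.width, target.height, target.bitrate_kbps
    ):
        # Same codec, different bitrate and/or resolution: transrating.
        return "transrating"
    # Nothing changes, so no conversion is needed.
    return "passthrough"


mpeg2_720p = StreamProfile("MPEG2", 1280, 720, 3000)
h264_720p = StreamProfile("H.264", 1280, 720, 3000)
h264_360p = StreamProfile("H.264", 640, 360, 800)   # assumed rendition

print(classify_conversion(mpeg2_720p, h264_720p))   # transcoding
print(classify_conversion(h264_720p, h264_360p))    # transrating
```

The codec check comes first because a change of compression type counts as transcoding even when the resolution or bitrate changes as well.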

How the stream is processed.

There are many software and hardware encoders on the market today. Wowza transcoding is based on software compression, but it can also process streams with additional devices and technologies that significantly offload the central processor and free it for other tasks.

A special module, Wowza Transcoder Addon, which is an important part of Wowza Streaming Engine, is responsible for signal conversion. Video streams can be processed in one of three ways:

  • On the CPU alone. This is the most resource-intensive way to generate streams.
  • On a graphics adapter built into the CPU that supports Quick Sync technology.
  • On a discrete NVIDIA graphics adapter that supports NVENC technology.

Obviously, the more powerful the platform, the more streams it can produce. At first glance the same holds for cost: the more expensive the components, the more efficiently they work. This article examines whether that is really so. The comparison relies on the official data published by Wowza Media Systems on its forum ( https://www.wowza.com/forums/content.php?332-Wowza-Transcoder-AddOn-Performance-Benchmark ).

Testing took place as follows:

Wowza Transcoder receives from 1 to 8 streams as a source. Each source must be transcoded into several streams, which may differ in resolution and output bitrate. The hardware platforms considered were servers based on the Intel Xeon E3 series, dual-processor systems based on the Intel Xeon X5650, and rented Amazon EC2 instances.

Three types of tests were run:

  • Transrating a 720p stream;
  • Transrating a 1080p stream;
  • Transcoding an MPEG2 720p stream.

Specifications of server platforms are presented in the table below.


 

No. | Server platform | Operating system | GPU acceleration
--- | --- | --- | ---
1 | Single Intel Xeon CPU E3-1285 V3 @ 3.60GHz, 4 cores; Supermicro X10SAE motherboard; 32 GB RAM | MS Windows 7 64bit Ultimate | No
2 | Same as 1 | MS Windows 7 64bit Ultimate | Intel HD 4600 (Intel Quick Sync)
3 | Same as 1 | MS Windows 7 64bit Ultimate | NVIDIA QUADRO K5000 (NVENC)
4 | Single Intel Xeon CPU E3-1285 V3 @ 3.60GHz, 4 cores; Supermicro X10SAE motherboard; 32 GB RAM | Ubuntu 12.04 (64-bit) | No
5 | Same as 4 | Ubuntu 12.04 (64-bit) | Intel HD 4600 (Intel Quick Sync)
6 | Same as 4 | Ubuntu 12.04 (64-bit) | NVIDIA QUADRO K5000 (NVENC)
7 | Dual Intel Xeon CPU X5650 @ 2.66GHz, 12 cores total; SuperServer 7046GT-TRF 4U Xeon DP 4xGPU Ready; 32 GB RAM | MS Windows 8.1 64bit | No
8 | Same as 7 | MS Windows 8.1 64bit | NVIDIA QUADRO K5000 (NVENC)
9 | Same as 7 | Ubuntu 12.04 (64-bit) | No
10 | Same as 7 | Ubuntu 12.04 (64-bit) | NVIDIA QUADRO K5000 (NVENC)
11 | EC2 Instance: Extra Large Instance - m1.xlarge; 15 GB RAM | Amazon Linux | No
12 | High-CPU Extra Large Instance - c3.8xlarge (32 virtual cores with 3.37 EC2 Compute Units each); 60 GB RAM | Amazon Linux | No

The Wowza Media Systems developers carried out 3 tests with the following source signals: 720p H.264, 1080p H.264 and 720p MPEG2.

For the 720p H.264 test, the following input stream parameters were selected:

  • Video compression: H.264
  • Frame size: 1280x720 pixels
  • Frame rate: 24 frames/sec
  • Video bitrate: 5.588 Mbit/sec
  • Audio compression: AAC
  • Audio sample rate: 48 kHz
  • Audio channels: Stereo
  • Audio bitrate: 97 kbit/sec.

At the output, each of the 1 to 8 input streams has to be delivered in 720p, 360p, 240p and 160p. The graphs below should be read as follows: the horizontal axis shows the number of input streams, each of which is encoded in 720p, 360p, 240p and 160p. That is, 8 different input streams yield 8x4 = 32 output streams with different parameters.
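
The arithmetic behind the test matrix can be sketched in a few lines. The rendition list and the bitrates come from the test description above; the function names are hypothetical and chosen only for this illustration.

```python
# Output variants produced for every input stream in the 720p test.
RENDITIONS = ["720p", "360p", "240p", "160p"]

VIDEO_MBPS = 5.588   # source video bitrate from the test description
AUDIO_MBPS = 0.097   # 97 kbit/sec audio, expressed in Mbit/sec


def output_stream_count(input_streams: int) -> int:
    """Each input stream is encoded once per rendition."""
    return input_streams * len(RENDITIONS)


def total_input_mbps(input_streams: int) -> float:
    """Aggregate bitrate (video plus audio) arriving at the transcoder."""
    return input_streams * (VIDEO_MBPS + AUDIO_MBPS)


for n in (1, 4, 8):
    print(f"{n} input(s) -> {output_stream_count(n)} outputs, "
          f"{total_input_mbps(n):.2f} Mbit/sec incoming")
```

For 8 inputs this gives the 32 output streams mentioned above. The incoming bitrate is included only to show the scale of traffic the server receives.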

Analyzing the resulting graphs (starting with Figure 1) leads to some very interesting conclusions. For example, CPU-only processing is the least efficient: under Windows 7 the Intel Xeon E3-1285V3 processor cannot handle more than 4 streams. At that point its load exceeds 75%, which is unacceptable because it sharply raises the risk of a critical situation and a service failure. This is a natural result: processors of this series have only 4 cores, so they excel at single-threaded work but cope much worse with many parallel operations.




Figure 1. Transrating results for platforms 1-6 for a 720p source stream.

The most interesting results come from the Intel Quick Sync technology built into the processor. The graph is a straight line: the load grows smoothly and evenly, which indicates highly stable operation, and it always stays within normal limits. The only serious drawback is the lack of room to scale. Once the resource is exhausted, the only options are to replace the processor with a more modern one (although its effectiveness still needs to be checked) or to buy an additional server. There are no platforms that support multiple CPUs with QuickSync. In many cases, however, this option is very attractive in terms of price, efficiency, reliability, and energy consumption.

GPU encoding on the NVIDIA QUADRO K5000 video card shows a very large performance margin. It handles up to 8 input streams without problems. The load grows fairly steadily and stays within normal limits, so a critical situation is unlikely. Unfortunately, installing a second video card will not double the number of streams processed, but it may well raise performance by another 15-20%. Keep in mind that the K5000 is no longer the most advanced board: an updated K5200 is on the market, as is the latest model of the M5000 line. It is very likely that 2 latest-generation video cards will give a corresponding performance boost.

Figure 2. Transrating results for platforms 7-8 for a 720p source stream.

The graph in Figure 2 leads to similar conclusions. CPU processing is still the least efficient, while GPUs give excellent results. It is logical to assume that a more powerful graphics adapter would give even better results, and that several cards could be installed in one platform.



Now let's move on to transrating the 1080p signal with the following input stream parameters:
  • Video compression: H.264
  • Frame size: 1920x1080 pixels
  • Frame rate: 24 frames/sec
  • Video bitrate: 9.7208 Mbit/sec
  • Audio compression: AAC
  • Audio sample rate: 48 kHz
  • Audio channels: Stereo
  • Audio bitrate: 97 kbit/sec.

Let's go in the same order as before. Since both the resolution and the bitrate of the stream have increased, the results promise to be interesting.

The outcome can be seen in the graphs in Figures 3-4. The choice of solution and the distribution of load turned out the same as before. Note, however, that as the bitrate and resolution of the source stream increase, so does the load, and as a result none of the configurations can re-encode 8 streams as in the 720p tests. The best performance came from the Amazon cloud service and, among the server platforms, from the 2-processor configuration with the QUADRO K5000.



Figure 3. Transrating results for platforms 1-6 for a 1080p source stream.

Figure 4. Transrating results for platforms 7-8 for a 1080p source stream.

The growth in load, and with it the processing power that transrating requires, is clear from these graphs.

Now let's move on to the results of MPEG2 to H.264 transcoding. The performance of the configurations relative to one another is unlikely to be fundamentally different, but it is important to verify this.



At the input we have streams with the following parameters:
  • Video compression: MPEG-2
  • Frame size: 1280x720 pixels
  • Frame rate: 23.98 frames/sec
  • Video bitrate: 3.0 Mbit/sec
  • Audio compression: MPEG-1 Layer 2
  • Audio sample rate: 48 kHz
  • Audio channels: Stereo
  • Audio bitrate: 128 kbit/sec.

So, the graphs below fully confirm our assumption.



Figure 5. Transcoding results for platforms 1-6 for an MPEG2 720p source stream.

Figure 6. Transcoding results for platforms 7-8 for an MPEG2 720p source stream.

Figure 7 shows the frame sizes for the basic resolutions. The picture gives a very clear idea of how much more processing power is spent as the frame size grows. This should be taken into account when choosing a platform and calculating the required capacity. At first glance, the end consumer will be satisfied with either 720p or 1080p. However, 1080p requires roughly twice the computing power: a 1920x1080 frame holds 2.25 times as many pixels as a 1280x720 frame. It is worth seriously considering whether working at maximum quality is rational and whether it fits the budget.

Figure 7. Visual representation of frame size comparison for generally accepted image standards.
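
The frame size comparison can be reproduced numerically. In the sketch below, the 720p and 1080p sizes come from the test descriptions, while the three smaller sizes are assumed 16:9 values that the source does not state. Pixel count is used only as a rough proxy for encoding cost, which is itself an assumption.

```python
# Frame sizes in pixels (width, height).
FRAME_SIZES = {
    "160p": (284, 160),     # assumed 16:9 size, not given in the source
    "240p": (426, 240),     # assumed 16:9 size, not given in the source
    "360p": (640, 360),     # assumed 16:9 size, not given in the source
    "720p": (1280, 720),    # from the 720p test description
    "1080p": (1920, 1080),  # from the 1080p test description
}


def pixels(name: str) -> int:
    """Number of pixels in one frame of the named resolution."""
    width, height = FRAME_SIZES[name]
    return width * height


def pixel_ratio(a: str, b: str) -> float:
    """How many times more pixels resolution `a` has than `b`."""
    return pixels(a) / pixels(b)


for name in FRAME_SIZES:
    print(f"{name:>5}: {pixels(name):>9,} pixels, "
          f"{pixel_ratio(name, '720p'):.2f}x of 720p")
```

By this measure a 1080p frame carries 2.25 times as many pixels as a 720p frame, which is consistent with the roughly twofold increase in required computing power.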

Summarizing the above and taking a close look at the data, we arrive at the following conclusions:

  1. When you need to transcode multiple streams, CPU processing is the least efficient method. The number of streams the system can transcode depends strongly on the total number of cores and the clock frequency of the processors, which in turn may lead to unjustified financial costs.
  2. Processors with integrated graphics that support QuickSync technology can be used to build a scalable transcoding system. Compared to using NVIDIA cards, this saves equipment wear, power consumption, and money. In fact, this is the most cost-effective option. Suitable platforms include workstations, rack servers, or even blade server systems based on the C226, C236 and C246 chipsets.
  3. When the number of source streams is large and the space for equipment is limited, video cards with NVENC encoding support give the best result. GPU encoding yields a system that handles dozens of streams without problems while keeping resources in reserve. Fairly compact transcoders can be built, for example on Supermicro platforms designed specifically to hold the maximum number of GPUs per unit of volume. Unfortunately, the cost of the final solution will be very high.
  4. Of course, many more conclusions can be drawn about where Linux does better and where Windows does, how to win an extra 5-7% of performance somewhere, and so on. These will all be fairly rough calculations.
  5. If you need to boost performance temporarily, you can rent cloud services (Amazon). This is advisable only if the need arises rarely rather than constantly, or if the urgency is very high.