2015-11-29 20:34:00
Why transcode (convert the parameters of) an IP stream?
The source IP stream produced by an encoder or signal receiver is usually of high quality, but it is usually not suitable for delivery to end-user devices over an IP network. The reason is that subscribers' receivers differ in available network bandwidth, supported compression types, and screen resolution. To guarantee that every subscriber can view the delivered content, the stream has to be transcoded.
The Wowza Streaming Engine media server supports transcoding (and transrating) of IP streams in all the most commonly used formats. Transcoding converts the compression type (for example, MPEG-2 video to H.264 and MPEG-1 audio to AAC); transrating converts the bitrate and resolution without changing the compression type.
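To make the distinction concrete, here is a minimal Python sketch that labels a conversion as transcoding or transrating by comparing source and target stream parameters, following the definitions above. The `StreamParams` class and `classify_conversion` function are hypothetical illustrations, not part of any Wowza API, and the bitrates in the example are arbitrary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StreamParams:
    """Hypothetical description of a stream; not a Wowza data structure."""
    video_codec: str   # e.g. "MPEG2", "H.264"
    audio_codec: str   # e.g. "MPEG1", "AAC"
    width: int
    height: int
    bitrate_kbps: int


def classify_conversion(src: StreamParams, dst: StreamParams) -> str:
    """Label a conversion using the transcoding/transrating definitions."""
    if (src.video_codec, src.audio_codec) != (dst.video_codec, dst.audio_codec):
        # The compression type changes: transcoding.
        return "transcoding"
    if (src.width, src.height, src.bitrate_kbps) != (dst.width, dst.height, dst.bitrate_kbps):
        # Same compression type, different bitrate and/or resolution: transrating.
        return "transrating"
    # Nothing changes, so the stream could be passed through as-is.
    return "passthrough"


if __name__ == "__main__":
    source = StreamParams("MPEG2", "MPEG1", 1920, 1080, 8000)
    h264_1080 = StreamParams("H.264", "AAC", 1920, 1080, 5000)
    h264_360 = StreamParams("H.264", "AAC", 640, 360, 800)
    print(classify_conversion(source, h264_1080))    # transcoding
    print(classify_conversion(h264_1080, h264_360))  # transrating
```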
How the stream is processed.
Many software and hardware encoders are available on the market today. Wowza transcoding is based on software compression, but it also supports processing streams with additional hardware and technologies that can significantly offload the CPU, freeing it for other tasks.
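The Wowza Transcoder AddOn module, an important part of Wowza Streaming Engine, is responsible for signal conversion. Video stream processing can take place on the CPU (software encoding), on Intel Quick Sync hardware built into the processor, or on an NVIDIA GPU using NVENC.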
It seems obvious that the more powerful the platform, the more streams it can process. At first glance, the same seems to apply to cost: the more expensive the components, the more efficiently they should work. In this article we check whether this is really so. For the comparison, we rely on the official data published by Wowza Media Systems on its forum (https://www.wowza.com/forums/content.php?332-Wowza-Transcoder-AddOn-Performance-Benchmark).
Testing took place as follows:
The Wowza Transcoder receives from 1 to 8 streams as its source. Each source must be transcoded into several streams that may differ in resolution and output bitrate. The hardware platforms tested were servers based on the Intel Xeon E3 series, dual-processor systems based on the Intel Xeon X5650, and rented Amazon EC2 instances.
Three types of tests were run, one for each source signal: 720p H.264, 1080p H.264, and 1080p MPEG-2.
Specifications of server platforms are presented in the table below.
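| # | Server platform | Operating system | GPU acceleration |
|---|---|---|---|
| 1 | Intel Xeon E3 series | MS Windows 7 64-bit Ultimate | No |
| 2 | Intel Xeon E3 series | MS Windows 7 64-bit Ultimate | Intel HD 4600 (Intel Quick Sync) |
| 3 | Intel Xeon E3 series | MS Windows 7 64-bit Ultimate | NVIDIA QUADRO K5000 (NVENC) |
| 4 | Intel Xeon E3 series | Ubuntu 12.04 (64-bit) | No |
| 5 | Intel Xeon E3 series | Ubuntu 12.04 (64-bit) | Intel HD 4600 (Intel Quick Sync) |
| 6 | Intel Xeon E3 series | Ubuntu 12.04 (64-bit) | NVIDIA QUADRO K5000 (NVENC) |
| 7 | 2x Intel Xeon X5650 | MS Windows 8.1 64-bit | No |
| 8 | 2x Intel Xeon X5650 | MS Windows 8.1 64-bit | NVIDIA QUADRO K5000 (NVENC) |
| 9 | 2x Intel Xeon X5650 | Ubuntu 12.04 (64-bit) | No |
| 10 | 2x Intel Xeon X5650 | Ubuntu 12.04 (64-bit) | NVIDIA QUADRO K5000 (NVENC) |
| 11 | Amazon EC2 instance | Amazon Linux | No |
| 12 | Amazon EC2 instance | Amazon Linux | No |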
The Wowza Media Systems developers ran 3 tests with the following source signals: 720p H.264, 1080p H.264, and 1080p MPEG-2.
For the 720p H.264 test, the following input stream parameters were selected:
The required outputs are 720p, 360p, 240p, and 160p renditions of each of the 1 to 8 input streams. The graphs below should be read as follows: the horizontal axis shows the number of input streams, each of which is encoded into 720p, 360p, 240p, and 160p. So with 8 different input streams, the output is 8 x 4 = 32 streams with different parameters.
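The number of output streams grows linearly with the number of inputs. A short Python sketch of that arithmetic, using the four renditions of the 720p test; the names `RENDITIONS` and `output_stream_count` are illustrative only:

```python
from __future__ import annotations

# Output renditions produced from every input stream in the 720p H.264 test.
RENDITIONS = ("720p", "360p", "240p", "160p")


def output_stream_count(input_streams: int, renditions: tuple[str, ...] = RENDITIONS) -> int:
    """Each input is encoded once per rendition, so outputs = inputs x renditions."""
    if input_streams < 0:
        raise ValueError("input_streams must be non-negative")
    return input_streams * len(renditions)


if __name__ == "__main__":
    for n in range(1, 9):  # the benchmark used 1 to 8 input streams
        print(f"{n} input(s) -> {output_stream_count(n)} output streams")
```

At the right-hand end of each graph (8 inputs) the transcoder is therefore producing 32 encodes at once.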
Analyzing the resulting graphs (starting with Figure 1) leads to some very interesting conclusions. For example, CPU processing is the least efficient: under Windows 7 the Intel Xeon E3-1285V3 cannot handle more than 4 input streams, and at that point its load exceeds 75%. That is unacceptable, because the probability of a critical situation and of a failure rises sharply. This result is quite natural: processors in this series have 4 cores and do an excellent job with a single computational thread, but perform much worse when running many parallel operations.
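The 75% load limit can be applied mechanically to any measured load curve to find the usable stream count. A minimal sketch, assuming the load values have been read off a graph as fractions of full load; the function name and the sample readings are hypothetical, not Wowza's published data:

```python
from __future__ import annotations


def max_streams_under_threshold(load_by_streams: dict[int, float],
                                threshold: float = 0.75) -> int:
    """Largest tested stream count whose load stays at or below threshold.

    load_by_streams maps the number of input streams to the measured load as a
    fraction of full load. Returns 0 if even the smallest count is overloaded.
    """
    best = 0
    for streams in sorted(load_by_streams):
        if load_by_streams[streams] <= threshold:
            best = streams
        else:
            # Assuming load grows with the stream count, stop at the first
            # overloaded point: higher counts would be overloaded too.
            break
    return best


if __name__ == "__main__":
    # Hypothetical readings for a CPU-only server.
    sample = {1: 0.20, 2: 0.38, 3: 0.55, 4: 0.72, 5: 0.90}
    print(max_streams_under_threshold(sample))  # 4
```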
The most interesting results are those for Intel Quick Sync, the technology built into the processor. The graph is a straight line: the load grows smoothly and evenly, which indicates highly stable operation, and it always stays within normal limits. The only serious drawback is that capacity cannot be scaled up. Once the resource is exhausted, the only options are to replace the processor with a newer one (whose effectiveness would still need to be verified) or to buy an additional server, since there are no multi-CPU platforms that support Quick Sync. Still, in many cases this option is very attractive in terms of price, efficiency, reliability, and power consumption.
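Because the Quick Sync load curve is close to a straight line, the remaining headroom can be estimated with an ordinary least-squares line fit. A sketch, assuming load samples expressed as fractions of full load and the same 75% limit; the helper names and the sample data are hypothetical:

```python
from __future__ import annotations

import math


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def projected_capacity(streams: list[int], load: list[float], limit: float = 0.75) -> int:
    """Largest whole stream count whose predicted load stays at or below limit."""
    intercept, slope = fit_line([float(s) for s in streams], load)
    if slope <= 0:
        raise ValueError("load must grow with the number of streams")
    return max(0, math.floor((limit - intercept) / slope))


if __name__ == "__main__":
    # Hypothetical, evenly rising load readings for 1-4 input streams.
    print(projected_capacity([1, 2, 3, 4], [0.10, 0.18, 0.26, 0.34]))  # 9
```

Extrapolating past the measured points assumes the line stays straight; the projected figure should be confirmed with a real test before sizing a deployment.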
GPU encoding on the NVIDIA QUADRO K5000 graphics card shows a very large performance margin. It handles up to 8 input streams without problems, and the load grows fairly steadily and stays within normal limits, so a critical situation is unlikely. Unfortunately, installing a second graphics card will not double the number of processed streams, but it can plausibly raise performance by another 15-20%. Keep in mind that the K5000 is no longer the most advanced board: the updated K5200 is on the market, as is the newer M5000. It is very likely that two current-generation cards would give a correspondingly larger performance boost.
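The quoted 15-20% gain from a second card translates directly into a capacity estimate. A trivial sketch using integer arithmetic (to avoid floating-point rounding at whole-stream boundaries); the function name and the single-card figure in the example are hypothetical:

```python
from __future__ import annotations


def dual_card_capacity(single_card_streams: int,
                       gain_low_pct: int = 15,
                       gain_high_pct: int = 20) -> tuple[int, int]:
    """Estimated input-stream range with a second card, rounded down to whole streams.

    Assumes the second card adds gain_low_pct..gain_high_pct percent on top of
    what a single card handles.
    """
    if single_card_streams < 0:
        raise ValueError("single_card_streams must be non-negative")
    low = single_card_streams * (100 + gain_low_pct) // 100
    high = single_card_streams * (100 + gain_high_pct) // 100
    return low, high


if __name__ == "__main__":
    print(dual_card_capacity(10))  # (11, 12)
```

With a hypothetical single-card limit of 10 input streams, the estimate is 11-12 streams, far from the 20 that perfect doubling would suggest.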
Figure 2. Transrating results for platforms 7-8 with a 720p source stream.
The graph in Figure 2 supports similar conclusions. CPU processing is still the least efficient, while GPUs give excellent results. It is logical to assume that a more powerful graphics adapter would give even better results, and that several cards could be installed on one platform.
Next comes the 1080p H.264 test, presented in the same order as before. Since both the resolution and the bitrate of the source stream have increased, the results promise to be interesting.
The results can be seen in Figures 3-4. The ranking of the solutions and the distribution of the load were the same as before. It is worth noting, however, that as the bitrate and resolution of the source stream increase, the load increases too, and as a result none of the configurations can process 8 streams, unlike in the 720p tests. The best result overall was shown by the Amazon cloud service, and among the server platforms by the dual-processor configuration with the QUADRO K5000.
The increase in load, and the processing power that transrating therefore requires, is evident.
Now let's move on to the results of MPEG-2 to H.264 transcoding. One should hardly expect fundamentally different results in terms of how the configurations perform relative to each other, but it is important to verify this.
The graphs below fully confirm this assumption.
Figure 7 shows the frame sizes of the basic resolutions. It gives a very clear idea of how processing power is consumed as the frame size grows, which should be taken into account when choosing a platform and calculating the required capacity. At first glance, the end consumer will be satisfied with either 720p or 1080p, yet a 1080p frame contains 2.25 times as many pixels as a 720p frame (1920x1080 versus 1280x720), so it requires roughly twice as much computing power, if not more. It is worth thinking carefully about whether working at maximum quality is rational and whether it fits the budget.
Figure 7. Visual representation of frame size comparison for generally accepted image standards.
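The frame-size comparison can be reproduced numerically. A small sketch computing pixels per frame for a few standard 16:9 resolutions; pixel count is used here only as a rough proxy for encoding work, since the real cost also depends on codec, bitrate, and encoder settings:

```python
from __future__ import annotations

# Frame dimensions (width, height) of common 16:9 resolutions.
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "720p": (1280, 720),
    "360p": (640, 360),
}


def pixels(name: str) -> int:
    """Number of pixels in one frame of the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def pixel_ratio(a: str, b: str) -> float:
    """How many times more pixels per frame resolution a has than resolution b."""
    return pixels(a) / pixels(b)


if __name__ == "__main__":
    for name in RESOLUTIONS:
        print(f"{name}: {pixels(name):,} pixels per frame")
    print(f"1080p / 720p = {pixel_ratio('1080p', '720p'):.2f}")  # 2.25
```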
Summarizing all of the above and looking closely at the data obtained, we arrive at the following conclusions: