

August 18

[Repost] Compiling ffmpeg under WinXP

    Reposted from the column of firebolt2002 && 小土人
 
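  Today I started compiling ffmpeg, a project I both love and hate. Because VC++ is not a standard-conforming C compiler, the C99 constructs and AT&T assembly syntax that ffmpeg uses will not compile under VC. I once extracted the H.263++ encoder from ffmpeg; with the MMX optimization options turned off and a few small changes to the C code, it compiled and ran correctly, but code without MMX optimization is not much fun to use.
 
  Building ffmpeg on Windows with MinGW is a good choice. Of course, if you are interested you can port it completely to the VC environment. A former boss of mine did exactly that (it was his pride and joy, and I never even got a chance to look at it, heh); supposedly he ported a million lines of code, though I never found anywhere near that many. That said, ffmpeg built under VC is much smaller than the MinGW build, apparently a little over 500 KB.
 
  Here is how to compile ffmpeg with MinGW.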
 
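  Step 1: install and configure the build environment. My approach was fairly simple: I first downloaded the DEV-CPP compiler, which bundles the latest MinGW. If you don't have DEV-CPP, you can download and install MinGW directly from http://www.mingw.org/. The version I installed was MinGW-3.1.0-1.exe. While you are at it, also download MSYS-1.0.10.exe; it is a virtual Linux shell environment for Windows, and we need it to run commands such as configure and make. First install MinGW into a directory of your choice, say c:\MinGW. Then set the environment variable by adding c:\MinGW\BIN to PATH. Then install MSYS. During installation it prompts you for some configuration, including the MinGW installation directory. Take care here and enter C:/MinGW (I don't understand why, but when I entered C:\MinGW it told me it could not find it, heh; maybe the developers are just used to Linux). If you installed DEV-CPP, enter C:/dev-cpp instead. Oh, I almost forgot: when you install MinGW, don't forget to download the WINAPI source package; with DEV-CPP this is not needed. Honestly, I suggest you just install DEV-CPP; it is a nice little tool that has nearly everything you need. That's it, the build environment is configured.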
 
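  Step 2: download the latest ffmpeg source package. I got it through CVS; see the ffmpeg home page http://ffmpeg.sourceforge.net/index.php for details. Downloading via CVS at least guarantees that you get the latest code, and ffmpeg's h264 seems to be updated quite quickly, heh. Unpack the ffmpeg source into a directory, or check it out into one; let's assume D:\FFMpeg. Run the MSYS shortcut on the desktop to enter the MSYS shell (if the installer did not create a shortcut, I won't belabor it: look for MSYS.bat in the installation directory). Run the following commands:
    1. $ cd /d/ffmpeg    (enter the ffmpeg project directory)
    2. $ ./configure --enable-shared --enable-memalign-hack --disable-debug --enable-small
    3. $ make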
 
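  That's it, the build is finished, and you can find the compiled files in the corresponding directories. Without --disable-debug --enable-small the output files can be very large; mine were over 7 MB. The latest ffmpeg version I downloaded contained an error at line 3460 of snow.c in the libavcodec directory: pict->quality= ff_rate_estimate_qscale(&s->m); The compiler complained about too few arguments, so I changed it to pict->quality= ff_rate_estimate_qscale(&s->m,0); and the build went through cleanly.
 
  Question 1: Why are the output files so large? (It does not really matter: after make install they shrink further, with avcodec.dll at roughly 1.6 MB.) That may still be a bit big for a CAB package meant for network download. No matter, I will look for another approach tomorrow.
 
  Question 2: Why are so many functions exported? Perhaps I have not read the MinGW documentation carefully enough. If anyone knows how to specify which functions MinGW exports, please let me know. Thanks.
 
That's all for today. See you tomorrow.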
May 19

H.264 Group Information

★★★ Notice to Group Members ★★★
千古一心
  10.11 23:09
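1. This group is built on mutual help. Please answer other people's questions generously and don't hold back. The group does not look down on newcomers, but it does look down on the lazy;

2. Members are required to speak up actively or to post FAQs on the BBS. If you are new to H.264 you may not be able to join the technical discussion yet, but you will certainly have many questions, and as you learn more you will find the answers. You are therefore asked to post your questions together with their answers as FAQ posts, so that you too can contribute what you can to the group. To post a FAQ, reply to the thread titled “H264_FAQ集锦” (H264 FAQ collection) on the group BBS and write your question in the reply. Once you know the answer, edit your own FAQ post and add the answer to it (that is, keep the question and its answer in the same post rather than starting a new one). Even very basic questions are fine for FAQ posts, for example: the relationship between X.264 and H.264. But note: you must answer your own question!

3. When the group reaches its member limit, members who do not actively speak or post will be removed. Newly joined members may be removed by mistake; please don't be upset, just apply again. Note: a FAQ post must not repeat or closely resemble something already posted by someone else. Members who post a FAQ but never answer it will also be removed (even if someone else has already answered, the member must still write the answer into their own FAQ post in the manner laid out in rule 2).

4. Members may only use a font no larger than size 10 (that is, 五号) when speaking, so that messages remain easy for everyone to read. If your font is too large, your message may fill the whole screen. To make a message stand out, change the font color instead;

5. Discussion in this group should always stay related to H.264 technology. Outside the break periods (11:30~13:30, 18:00~19:00, and after 22:00), when people may relax, anyone who chats idly, rambles, deliberately disrupts, or makes trouble will be firmly removed;

6. Discussion of non-264 technical topics would ordinarily be encouraged. But because group membership is limited, we have reluctantly decided that people who do not discuss 264 will also be subject to removal. We ask these friends to start their own group or to join one relevant to their interests. Our sincere apologies!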

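7. Other exchange platforms of this group; please take part actively:
   http://www.3g918.com.cn/forumdisplay.php?fid=44
   http://bbs.edw.com.cn/index.asp?boardid=47
   http://www.avs264.com
   QQ group: 10851606
8. The group's FTP addresses:

   ftp://202.120.37.193  (education network 1)
   ftp://59.77.17.218  (education network 2)
   ftp://61.233.40.219  (public network)

   [levi: look up the username and password in the group forum.]

9. The group's WEB shared space:
   http://download.3g918.com.cn/h.26x

10. No one may publish the group's FTP account in the group chat. If a member asks about it, simply point them to the Notice to Group Members. That way they will take the initiative to learn the rules in it;

11. Please do not delete files uploaded to the group share. Our administrators move them to the FTP from time to time. Members start learning at different times, so material uploaded earlier may still be useful to later members. Thank you for your support and cooperation!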

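   Note:
    (1) If material downloaded from the FTP cannot be opened or opens with errors, please contact an administrator promptly so that it can be corrected.
    (2) The people doing the clean-up may occasionally remove someone by mistake. If that happens to you, please don't be upset; just apply again. Thank you for understanding!

【Thanks to Striving (435?????) for providing the group's education network 2 FTP space】
【Thanks to 白水 (290??????) for providing the group's education network 1 FTP space】
【Thanks to 视高天空 (4?????) for providing the group's public network FTP space】
【Thanks to Wayne (165?????) for providing the group's WEB shared space】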

January 24

Rate Control and H.264

Concepts

A rate control algorithm dynamically adjusts encoder parameters to achieve a target bitrate. It allocates a budget of bits to each group of pictures, individual picture and/or sub-picture in a video sequence. Rate control is not a part of the H.264 standard, but the standards group has issued non-normative guidance to aid in implementation. The purpose of this white paper is to offer 1) a basic understanding of what rate control is and why it is essential and 2) a common framework and terminology so that schemes originating from H.264 and other standards groups can be more easily understood and compared.

Block-based hybrid video encoding schemes such as the MPEG [1,2] and H.26* [3] families are inherently lossy processes. They achieve compression not only by removing truly redundant information from the bitstream, but also by making small quality compromises in ways that are intended to be minimally perceptible. In particular, the quantization parameter QP regulates how much spatial detail is saved. When QP is very small, almost all that detail is retained. As QP is increased, some of that detail is aggregated so that the bit rate drops, but at the price of some increase in distortion and some loss of quality. Figure 1 suggests that relationship for a particular input picture: if you want to lower the bit rate, you can do so by raising QP, at the cost of increased distortion. Figure 2 suggests that as source complexity varies during a sequence, you move from one such curve to another.

Figure 1. For a particular source frame

Figure 2. But when source complexity varies….


Figure 3 illustrates open loop (or VBR) operation of a video encoder. The user supplies two key inputs – the uncompressed video source and a value for QP. As the source sequence progresses, you will get compressed video of fairly constant quality, but the bitrate may vary dramatically. Because the complexity of pictures is continually changing in a real video sequence, it is not so obvious what value of QP to pick. If you fix QP for an "easy" part of the sequence having slow motion and uniform areas, then the bit rate will go up dramatically when you reach the "hard" (i.e., more complex) parts.

In reality, constraints imposed by decoder buffer size and network bandwidth force us to encode video at a more nearly constant bitrate. To do this, Figure 4 suggests that we must dynamically vary QP based upon estimates of the source complexity, so that each picture (or group of pictures) gets an appropriate allocation of bits to work with. Rather than specifying QP as input, the user specifies demanded bitrate instead.

Figure 3. Open Loop Encoding (VBR)

 

 

Figure 4. Closed Loop Rate Control (CBR)

Elements of H.264 Rate Control

With a focus on the recommended approach [4, 5, 6] for H.264, Figure 5 identifies important elements within the rate controller. Most of these elements are common to other rate control schemes. Note that Figure 5 is conceptual and is not a literal representation of any software implementation. Many details are glossed over – for example, that B and P pictures are treated differently, and that some estimates are averages of sampled data over multiple pictures.

Figure 5. Elements of H.264 Rate Controller

Rate-Quantization Model

The heart of the algorithm is a quantitative model describing the relationship suggested by Figure 2 between QP, actual bitrate and a surrogate for encoding complexity. However, the bits and complexity terms should be associated only with the residuals. Why? Because the quantization parameter QP can only influence the detail of information carried in the transformed residuals. QP has no direct effect on the bitrates associated with overhead, prediction data, or motion vectors. The Mean Absolute Difference (or MAD) of the prediction error is used as the complexity surrogate.

The model takes an algebraic form such as

ResidualBits = C1 * MAD / QP + C2 * MAD / QP^2        (2)

but it may take a simpler form (with C2 = 0) or a more complicated form involving exponentials or other basis curves for fitting. This equation corresponds to equation 2-84 of [6] and to equation 1 of [2]; note that our term ResidualBits is synonymous with the term Texture Bits used by other authors [2]. The free coefficients C1 and C2 may be estimated empirically, by providing hooks in the encoder for extracting the residual coefficients, as well as the number of residual bits needed to transmit them.
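As a rough illustration of estimating the coefficients empirically, the sketch below fits C1 and C2 by ordinary least squares, assuming the quadratic form ResidualBits = C1 * MAD / QP + C2 * MAD / QP^2. The function name `fit_rq_coefficients` and the sample layout are hypothetical and are not taken from any encoder; the references may prescribe a different update procedure.

```python
def fit_rq_coefficients(samples):
    """Least-squares fit of C1, C2 in bits = C1*MAD/QP + C2*MAD/QP**2.

    samples: iterable of (mad, qp, residual_bits) tuples, e.g. logged
    by the encoder for previously coded pictures (hypothetical layout).
    """
    suu = suv = svv = sub = svb = 0.0
    for mad, qp, bits in samples:
        u = mad / qp            # regressor multiplying C1
        v = mad / (qp * qp)     # regressor multiplying C2
        suu += u * u
        suv += u * v
        svv += v * v
        sub += u * bits
        svb += v * bits
    if suu == 0.0:
        return 0.0, 0.0         # no usable data
    det = suu * svv - suv * suv
    if abs(det) <= 1e-12 * suu * svv:
        # Regressors are (nearly) collinear, e.g. every sample used the
        # same QP: fall back to the simpler model with C2 = 0.
        return sub / suu, 0.0
    # Solve the 2x2 normal equations by Cramer's rule.
    c1 = (sub * svv - svb * suv) / det
    c2 = (svb * suu - sub * suv) / det
    return c1, c2
```

The fallback branch mirrors the remark that the model may take the simpler form with C2 = 0 when the data cannot separate the two terms.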

Having established the model in (2), we can solve for the demanded QP when the target value of ResidualBits is supplied by the Bit Allocation modules in Figure 5.
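A minimal sketch of that inversion, assuming the quadratic form ResidualBits = C1 * MAD / QP + C2 * MAD / QP^2: multiplying through by QP^2 gives a quadratic in QP whose positive root is the demanded value. The function name `solve_qp` is hypothetical, and the clamp to 0..51 is an assumption made here for illustration.

```python
import math

def solve_qp(target_bits, mad, c1, c2, qp_min=0.0, qp_max=51.0):
    """Return the QP that yields target_bits under the quadratic model.

    Rearranged: target_bits*QP**2 - c1*mad*QP - c2*mad = 0.
    """
    if target_bits <= 0 or mad <= 0:
        return qp_max                      # no budget: coarsest quantizer
    a = target_bits
    b = -c1 * mad
    c = -c2 * mad
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return qp_max                      # model has no real solution
    qp = (-b + math.sqrt(disc)) / (2.0 * a)  # positive root
    return min(max(qp, qp_min), qp_max)
```

With C2 = 0 the root reduces to QP = C1 * MAD / ResidualBits, which is the simpler linear form of the model.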

Complexity Estimation

As indicated above, we need a simple metric that reflects the encoding complexity associated with the residuals. The MAD of the prediction error is a convenient surrogate for this purpose. For N pixels with source values $x_i$ and predicted values $\hat{x}_i$:

$\mathrm{MAD} = \frac{1}{N}\sum_{i=1}^{N} |x_i - \hat{x}_i|$

This MAD is an inverse measure of the predictor's accuracy and (in the case of inter prediction) of the temporal similarity of adjacent pictures.

Ideally, the MAD would be estimated after encoding the current picture, but that would require us to encode the picture again after the QP is selected – quite a burden for a computationally intensive standard like H.264! Instead, we can usually assume that this complexity surrogate varies gradually from picture to picture, and estimate it based upon data extracted from the encoder for previous pictures. Note that this assumption fails at a scene change.
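The sketch below shows one simple way this could look: a helper that computes the MAD of the prediction error (the mean of the absolute differences between source and predicted pixel values), and a linear predictor that carries the previous picture's MAD forward. The names `mean_abs_diff` and `predict_mad` and the coefficients `a1`, `a2` are assumptions for illustration; the text above only says that the estimate is based on data from previous pictures.

```python
def mean_abs_diff(source, prediction):
    """MAD of the prediction error over equally sized pixel sequences."""
    if len(source) != len(prediction) or not source:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(s - p) for s, p in zip(source, prediction)) / len(source)

def predict_mad(previous_mad, a1=1.0, a2=0.0):
    """Estimate the current picture's MAD from the previous picture's MAD.

    With the defaults a1=1, a2=0 this simply assumes the complexity is
    unchanged; a real controller would refit a1 and a2 as it goes.
    """
    return a1 * previous_mad + a2
```

A predictor of this kind is exactly what breaks at a scene change, where the previous picture's MAD says little about the current one.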

QP-Limiter

Figures 4 and 5 represent a closed loop control system which must be appropriately damped to guarantee stability and to minimize perceptible variations in quality. For difficult sequences having rapid changes in complexity, QP-demand may oscillate noticeably, so a rate limiter is applied which typically limits changes in QP to no more than ± 2 units between pictures.
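As a minimal sketch of such a limiter, the function below clamps the demanded QP to within ± 2 of the previous picture's QP. The name `limit_qp` and the overall 0..51 bounds are assumptions made for illustration.

```python
def limit_qp(demanded_qp, previous_qp, max_delta=2, qp_min=0, qp_max=51):
    """Clamp demanded_qp to [previous_qp - max_delta, previous_qp + max_delta],
    then to the overall allowed QP range."""
    lo = max(qp_min, previous_qp - max_delta)
    hi = min(qp_max, previous_qp + max_delta)
    return min(max(demanded_qp, lo), hi)
```

For example, with a previous QP of 30, a demand of 35 is cut back to 32 and a demand of 20 is raised to 28, while a demand of 31 passes through unchanged.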

Virtual Buffer Model

Any compliant decoder is equipped with a buffer to smooth out variations in the rate and arrival time of incoming data. The corresponding encoder must produce a bitstream that satisfies constraints of the decoder, so a virtual buffer model is used to simulate the fullness of the real decoder buffer.

The change in fullness of the virtual buffer is the total bits encoded into the stream, less a constant removal rate assumed to equal the bandwidth (or demanded bitrate). The buffer fullness is bounded by zero from below and by the buffer capacity from above. The user must specify appropriate values for buffer capacity and initial buffer fullness, consistent with the decoder levels supported.
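The update described above can be sketched per picture as follows, assuming the constant removal rate is spread evenly over pictures (bitrate divided by frame rate). The function name `update_buffer` and its parameters are hypothetical.

```python
def update_buffer(fullness, encoded_bits, bitrate, frame_rate, capacity):
    """Advance the virtual buffer by one picture.

    fullness     : current buffer fullness in bits
    encoded_bits : bits produced for the picture just encoded
    bitrate      : demanded bitrate in bits per second
    frame_rate   : pictures per second
    capacity     : buffer capacity in bits
    """
    drained = bitrate / frame_rate            # bits removed per picture interval
    new_fullness = fullness + encoded_bits - drained
    # Fullness is bounded by zero from below and by capacity from above.
    return min(max(new_fullness, 0.0), capacity)
```

In a real controller, hitting either bound would signal an underflow or overflow condition to be avoided rather than silently clamped; the clamp here only keeps the simulated value in range.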

QP Initializer

QP must be initialized at the start of the video sequence. An initial value may be input manually, but a better approach is to estimate it from the demanded bits per pixel, i.e.,
DemandedBitsPerPixel = DemandedBitrate / (FrameRate * height * width)
Equation 2-67 of [6] provides a recommended table relating initial QP to DemandedBitsPerPixel.
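A small worked example of the bits-per-pixel calculation, using illustrative numbers chosen here (a 352 x 288 picture at 30 frames per second and 384 kbit/s). The function name is hypothetical, and the lookup from this value to an initial QP is deliberately left out because it comes from the table in [6].

```python
def demanded_bits_per_pixel(bitrate, frame_rate, width, height):
    """DemandedBitsPerPixel = DemandedBitrate / (FrameRate * height * width)."""
    return bitrate / (frame_rate * height * width)

# 384000 / (30 * 288 * 352) is roughly 0.126 bits per pixel.
bpp = demanded_bits_per_pixel(384000, 30, 352, 288)
```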

GOP Bit Allocation

Based upon the demanded bit rate and the current fullness of the virtual buffer, a target bit rate for the entire group of pictures (GOP) is determined, and QP for the GOP's I-picture and first P-picture is also determined.

The GOP Target is fed into the next block for detailed bit allocation to pictures or to smaller basic units.

Basic Unit Bit Allocation

The "Basic Unit" is useful terminology introduced in [4], which is the basis for H.264 rate control recommendations [6]. With this approach, scalable rate control may be pursued to different levels of granularity – such as picture, slice, macroblock row or any contiguous set of macroblocks. That level is referred to as a "basic unit" at which rate control is resolved, and for which distinct values of QP are calculated.

If the basic unit is smaller than a picture, then this block in Figure 5 actually breaks out into two layers – one for the picture itself and another for the basic unit. Figure 5 and our discussion are limited to the case where the picture itself is the basic unit. For details on how to treat smaller basic units, please see [5] or [6].

For H.264, the emphasis is on computing QP for each stored picture (usually a P-picture). [Strictly speaking, the H.264 standard allows B pictures to be used as reference pictures. However, this is not expected to be common usage.] The QPs for non-stored pictures (ordinarily B-pictures) are then interpolated (and offset) from the QP values of their neighboring P pictures. For a stored picture, the MAD of the picture is first used to determine a target level for the buffer fullness; the target bits for the picture then follow from that buffer target level.

Comparison with MPEG-2 (Test Model 5) Rate Control

Because of the influence and familiarity of MPEG's Test Model 5 rate control [7], it is useful to compare its similarities and differences with the H.264 approach. To do so, we transmogrify Figure 5 into Figure 6, which corresponds conceptually to the MPEG2/TM5 approach.

Figure 6. Comparison to MPEG2 Test Model 5


Similarities include the use of the virtual buffer model, the calculation of layered bit targets for the GOP and picture, and the overall goal of generating a quantization parameter (in this case, called Mquant) for a basic unit. The Mquant for the basic unit (always a single macroblock) is adjusted in proportion to its estimated complexity.
Differences include:

• The Basic Unit is always the macroblock in this scheme. It is possible to get significant variations of quantization parameter across different macroblocks in the same picture.
• Differences between I, P and B picture types arise in the allocation of target bits. Otherwise, they are treated similarly.
• MPEG-2 does not have the same multiplicity of prediction modes. In the absence of advanced intra prediction, it need not be so rigorous in relating quantization parameter (which controls residual quality) to measured properties of the residual itself.
• Macroblock-level spatial complexity is estimated from the source activity, regardless of whether the complexity is handled by transmitting motion vectors (inter-prediction) or residual coefficients.
• Allocation of bits to a picture considers the picture type, GOP structure and demanded bitrate, but not the picture's measured complexity. However, within the picture, the buffer fullness and relative spatial activity of each macroblock is used to allocate the picture bits among the macroblocks.

It is easy to recognize this Test Model 5 approach as an ancestor of the H.264 approach, which accommodates the more general prediction methods of H.264 and provides more flexibility to scale the granularity of control.

H.264 Rate-Distortion Optimization and Global Rate Control

H.264 provides 7 modes for inter (temporal) prediction, 9 modes for intra (spatial) prediction of 4x4 blocks, 4 modes for intra prediction of 16 x 16 macroblocks, and one skip mode. Each 16 x 16 macroblock can be broken down in numerous ways. Thus, mode selection for each macroblock is a critical and time-consuming step that enables much of the dramatic bitrate reduction.

Selection of the optimal mode is done by an algorithm called rate-distortion optimization (RDO) [8], which essentially involves 1) an exhaustive pre-calculation of all feasible modes to determine the bits and distortion of each; 2) evaluation of a metric that considers both bitrate and distortion; and 3) selection of the mode that minimizes the metric.
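The three steps can be sketched as below. The metric is assumed here to be the common Lagrangian cost J = D + lambda * R, which is one way of weighing distortion against bits; the mode names, the numbers and the function name `select_mode` are purely illustrative.

```python
def select_mode(candidates, lam):
    """Pick the mode with the smallest cost J = distortion + lam * bits.

    candidates: dict mapping mode name -> (distortion, bits), i.e. the
    result of trial-encoding the macroblock in every feasible mode.
    """
    def cost(mode):
        distortion, bits = candidates[mode]
        return distortion + lam * bits
    return min(candidates, key=cost)

# Hypothetical trial results for one macroblock.
trial = {
    "skip": (900, 1),
    "inter16x16": (400, 40),
    "intra4x4": (250, 120),
}
best = select_mode(trial, lam=5.0)   # costs: 905, 600, 850
```

A larger multiplier pushes the choice toward cheap modes such as skip, while a smaller one favors low-distortion modes.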

QP is input to the RDO process, which does not regulate QP or modify the quality of the residual coefficients. RDO is complementary to rate control; these two aspects of the problem are decoupled because a fully coupled optimization would require a more expensive iterative solution.

The interplay with RDO, described in [4] as a "chicken and egg" dilemma, influences implementation of a rate control algorithm. The MAD is needed by the rate control algorithm, but it is available only after the RDO has used a QP value to generate it. Thus, the rate control algorithm must use an estimate for MAD based upon complexity of prior pictures in the sequence.

ExpertH264 Implementation of Rate Control

PixelTools has implemented the H.264 rate control recommendations in a recent release of ExpertH264. For this release, we have provided picture level control without frame skip. Especially for offline applications for encoding to stored media, this algorithm provides excellent tracking of bitrates for GOPs of a wide variety of sizes.

Typical results track GOP bitrate within 1% without B pictures or 2-3% with B pictures, with good stabilization of QP to prevent noticeable swings in quality. You can try this for yourself by requesting a free demo of ExpertH264 from PixelTools Corporation.

In subsequent releases, we plan to allow flexibility for smaller basic units, which will allow closer bitrate tracking on the individual picture level, as well as for smaller virtual buffer capacities. We will also support both frame skip and stuffing bits in a subsequent release – depending upon the end requirements, use of one or both of these techniques will reduce variations in bitrate.

The algorithm is a separate module having several interfaces that can be called by the encoder, and with callbacks to the encoder for retrieving key information such as residual bits and residual coefficients. Construction of the complexity metric (i.e., prediction error MAD) is part of the rate control algorithm. C Interfaces and utility functions include:

• init_rateControl
• frameRateControl
• updateBFrameState
• initRateControlParams
• getQB
• getMbMAD
• gopRateControl
• updateModel
• initialQP


Thus, developers of hardware and software encoders can consider integrating this algorithm into their own environments. For example, after the encoding step, a call to updateModel refreshes the empirical coefficients such as C1 and C2 in equation (2). Similarly frameRateControl is called prior to encoding each picture and supplies the quantization parameter.

Terminology

The following glossary is intended to help with a common understanding of rate control issues.

Prediction. Both H.264 and MPEG-* may predict a macroblock by traditional inter (temporal) prediction, i.e., a motion estimation from previous reference pictures followed by transmission of the motion vector. Additionally, H.264 supports advanced intra (spatial) prediction of a macroblock from encoded values for neighboring pixels that have already been encoded (e.g., in raster-scan order).

Residual. The difference between the source and prediction signals is called the residual, or the prediction error. A spatial transform is then applied to the residual to produce transformed coefficients that carry any spatial detail that is not captured in the prediction itself or its reference pictures.

Distortion. Distortion refers to the difference between the original source image x and the reconstructed image y after it has been decoded. In H.264, the sum of squared differences is used to quantify distortion, normalized as $\frac{1}{N}\sum_{i=1}^{N} |y_i - x_i|^2$ for any set of N pixels.

Complexity. As the saying goes, I can't define complexity, but I know it when I see it! A single source picture is complex if it is "busy" and has lots of spatial detail. The term spatial activity is synonymous with source complexity for this case. However, for a video sequence, the meaning of complexity is, well, more complex! For example, if a video sequence consists of one busy object that translates slowly across the field of view, it may not require very many bits because the temporal prediction can easily capture the motion using a single reference picture and a series of motion vectors. It is difficult to define an inclusive video complexity metric that is also easy to calculate. See MAD.

MAD: Mean Absolute Difference of Prediction Error. For rate control, what is more important is the encoding complexity of the residuals that are left over after the inter or intra prediction process is finished. The Mean Absolute Difference of Prediction Error is usually closely related to encoding complexity. Suppose $x_i$ is the source value for the ith pixel and $\hat{x}_i$ is its predicted value; then, for N pixels, $\mathrm{MAD} = \frac{1}{N}\sum_{i=1}^{N} |x_i - \hat{x}_i|$.

Spatial Activity. This term is used to quantify the amount of spatial variation within a part of the picture, normally a block of N pixels. Suppose the N pixel values are $x_i$, $i = 1, \ldots, N$. Then the activity for those N pixels is $\frac{1}{N}\sum_{i=1}^{N} (x_i - \langle x \rangle)^2$, where $\langle x \rangle = \frac{1}{N}\sum_{i=1}^{N} x_i$. In other words, the spatial activity is the sample variance of a block's values. It is the measure for local complexity used in MPEG-2.
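A minimal sketch of that definition: the activity of a block is the mean squared deviation of its pixel values from the block mean. The function name `spatial_activity` and the flat-list block layout are assumptions for illustration.

```python
def spatial_activity(block):
    """Variance (1/N normalization) of the pixel values in a block."""
    n = len(block)
    if n == 0:
        raise ValueError("block must contain at least one pixel")
    mean = sum(block) / n
    return sum((x - mean) ** 2 for x in block) / n
```

A flat block such as [10, 10, 10, 10] has zero activity, while an alternating block such as [0, 10, 0, 10] has mean 5 and activity 25.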

Bitrate. Bitrate refers to the bits per second consumed by a sequence of pictures, i.e., bitrate = (average bits per picture) * (frames per second). In practice, it is equated to the reliable network bandwidth that is provisioned or available for the stream.

Quantization Parameter (QP). Residuals are transformed into the spatial frequency domain by an integer transform that approximates the familiar Discrete Cosine Transform (DCT). The Quantization Parameter determines the step size for associating the transformed coefficients with a finite set of steps. Large values of QP represent big steps that crudely approximate the spatial transform, so that most of the signal can be captured by only a few coefficients. Small values of QP more accurately approximate the block's spatial frequency spectrum, but at the cost of more bits. In H.264, each unit increase of QP lengthens the step size by 12% and reduces the bitrate by roughly 12%.
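The 12% figure is consistent with the H.264 quantizer design, in which the step size doubles for every increase of 6 in QP, so one unit of QP scales the step by 2^(1/6), about 1.12. The short sketch below just evaluates that ratio; the function name `step_ratio` is hypothetical.

```python
def step_ratio(delta_qp):
    """Factor by which the quantizer step size grows when QP rises by delta_qp,
    given that the step size doubles every 6 QP units."""
    return 2.0 ** (delta_qp / 6.0)

one_step = step_ratio(1)   # about 1.1225, i.e. roughly a 12% larger step
six_steps = step_ratio(6)  # exactly 2.0
```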

Group of Pictures (GOP). The Group of Pictures concept is inherited from MPEG and refers to an I-picture, followed by all the P and B pictures until the next I picture. A typical MPEG GOP structure might be IBBPBBPBBI. Although H.264 does not strictly require more than one I picture per video sequence, the recommended rate control approach does require a repeating GOP structure to be effective. Thus, H.264 rate control will not work properly if the IntraPeriod parameter is set to 0.

Basic unit. The authors of references [4] and [5] introduced this useful term that expresses the granularity on which QP is adjusted in the feedback control loop. If the basic unit is a picture, then the rate controller's adjustments to QP are uniform across the picture. In MPEG-2, the basic unit is a macroblock. Initially, most H.264 applications will probably use the picture as basic unit, but ultimately a full or partial row of macroblocks is expected to yield the best compromise between uniform bitrate and uniform quality.

Summary

This white paper presents the basics of rate control for H.264 and compares them to the Test Model 5 approach of MPEG-2. Implementers needing a detailed description of the algorithm should see [5] or [6]. The structure shown in our Figure 5, the discussion of its modules, and the terminology glossary should provide a useful companion to help in understanding the densely packed equations found in these references.

References

1. C. Poynton, Digital Video and HDTV, Elsevier Science 2003, pp. 491-2
2. A. Vetro, "MPEG-4 Rate Control for Multiple Video Objects," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, No. 1, February 1999
3. G. Sullivan, T. Wiegand and K.P. Lim, "Joint Model Reference Encoding Methods and Decoding Concealment Methods; Section 2.6: Rate Control" JVT-I049, San Diego, September 2003
4. Z. Li et al., "Adaptive Basic Unit Layer Rate Control for JVT," JVT-G012, 7th Meeting: Pattaya, Thailand, March 2003
5. Z. Li et al., "Proposed Draft of Adaptive Rate Control," JVT-H017, 8th Meeting: Geneva, May 2003
6. G. Sullivan, T. Wiegand and K.P. Lim, "Joint Model Reference Encoding Methods and Decoding Concealment Methods; Section 2.6: Rate Control" JVT-I049, San Diego, September 2003
7. MPEG 2 Test Model 5, Rev. 2, Section 10: Rate Control and Quantization Optimization, ISO/IEC/JTC1SC29WG11, April 1993
8. T. Wiegand, H. Schwarz, A. Joch, F. Kossentini and G. Sullivan, "Rate-Constrained Coder Control and Comparison of Video Coding Standards," IEEE Transactions on Circuits & Systems for Video Technology, 13, #7, July 2003

© 2003 PixelTools

Please contact PixelTools Corporation to discuss your specific needs for video compression and application delivery.