Transcoding a Video and Adding Audio

Author: Shirou
Published: October 22, 2023
Category: tech

Series repository: https://github.com/xuanhao44/AnimeGANv2

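As mentioned in sections 3.3 and 4 of the previous post (https://www.sheniao.top/tech/191.html), OpenCV's VideoWriter does not support H.264 encoding out of the box because of patent issues, and H.264 is exactly the codec that plays in browsers. So the exported video has to be converted to that codec somehow.

I first tried the approach suggested in an issue, which is to compile OpenCV myself together with the openh264 library, but unfortunately that failed.

So this time I try other ways of transcoding the video.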
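Before comparing the approaches, it helps to confirm which codec a file actually uses. The following is a minimal sketch, assuming `ffprobe` (shipped with ffmpeg) is on the PATH; the helper names `ffprobe_codec_command`, `is_h264` and `probe_video_codec` are hypothetical and not part of the repository.

```python
import subprocess


def ffprobe_codec_command(path):
    """Build an ffprobe command that prints only the first video stream's codec name."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def is_h264(ffprobe_output):
    """Return True if the text printed by ffprobe is 'h264'."""
    return ffprobe_output.strip() == "h264"


def probe_video_codec(path):
    """Run ffprobe and return the codec name of the first video stream."""
    result = subprocess.run(
        ffprobe_codec_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Calling `is_h264(probe_video_codec("tmp.mp4"))` on a file written by VideoWriter and again on the converted output should show whether the conversion took effect.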

0 Server

A server with a GPU: RTX A4000.

1 Calling FFmpeg directly

This requires ffmpeg to be installed:

sudo apt update
sudo apt install ffmpeg -y

Part of the code:

    video_out = cv2.VideoWriter("tmp.mp4", fourcc, fps, (width, height))

    ... (omitted)

    # When your video is ready, just run the following command
    # You can actually just write the command below in your terminal

    # https://snipit.io/public/snippets/43806
    # os.system("ffmpeg -i Video.mp4 -vcodec libx264 Video2.mp4")
    # os.system("ffmpeg -i tmp.mp4 -vcodec libx264 " + video_out_path + " -y")

    # https://stackoverflow.com/questions/12938581/ffmpeg-mux-video-and-audio-from-another-video-mapping-issue
    # ffmpeg -an -i tmp.mp4 -vn -i video_path -c:a copy -c:v copy video_out_path
    # os.system("ffmpeg -an -i tmp.mp4 -vn -i " + video_path + " -c:a copy -c:v copy " + video_out_path + " -y")

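    # Combine the two: transcode to H.264 and mux in the original audio.
    # The redundant "-c:v copy" is dropped; "-c:v libx264" is the setting that takes effect.
    os.system(
        "ffmpeg -an -i tmp.mp4 -vn -i " + video_path + " -c:a copy -c:v libx264 " + video_out_path + " -y")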

Inspiration: https://snipit.io/public/snippets/43806
- os.system("ffmpeg -i tmp.mp4 -vcodec libx264 " + video_out_path + " -y")
- First video_in is processed into tmp.mp4, then FFmpeg transcodes it into video_out.
Adding the original audio track: https://stackoverflow.com/questions/12938581/ffmpeg-mux-video-and-audio-from-another-video-mapping-issue
- os.system("ffmpeg -an -i tmp.mp4 -vn -i " + video_path + " -c:a copy -c:v copy " + video_out_path + " -y")
- Take the video part of video1 (tmp.mp4; no audio, hence -an) and the audio part of video2 (video_in; no video, hence -vn), and mux them into video_out.
Finally, merge the two commands above into one.
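Building the command by string concatenation breaks as soon as a path contains spaces or shell metacharacters. As a hedged alternative, the merged command can be passed to `subprocess.run` as an argument list. This is a sketch, not the code used in the repository; `build_mux_command` and `transcode_and_mux` are hypothetical names.

```python
import subprocess


def build_mux_command(tmp_path, source_path, out_path):
    """Return the ffmpeg arguments: H.264 video from tmp_path, audio copied from source_path."""
    return [
        "ffmpeg",
        "-y",                       # overwrite the output file if it exists
        "-an", "-i", tmp_path,      # first input: ignore any audio
        "-vn", "-i", source_path,   # second input: ignore the video
        "-c:v", "libx264",          # re-encode the video as H.264
        "-c:a", "copy",             # copy the audio stream unchanged
        out_path,
    ]


def transcode_and_mux(tmp_path, source_path, out_path):
    """Run ffmpeg without a shell; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_mux_command(tmp_path, source_path, out_path), check=True)
```

Because no shell is involved, each path reaches ffmpeg as a single argument, and `check=True` surfaces a failed conversion instead of silently continuing.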

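I use this approach in onnx_video2anime.py and onnx_app.py:
- https://github.com/xuanhao44/AnimeGANv2/blob/main/onnx_video2anime.py
- https://github.com/xuanhao44/AnimeGANv2/blob/main/onnx_app.py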

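2 Using PyAV

2.1 Notes

PyAV is a Pythonic binding for FFmpeg.

Documentation (stable): https://pyav.org/docs/stable/

Install: pip install --user av

There are few useful tutorials online, in Chinese or in English; after looking around I found nothing better than the official documentation.
But the official documentation has pitifully few examples as well. The only readable part is the two short Cookbook pages (the Reference did not help me at all).
- https://pyav.org/docs/stable/cookbook/basics.html
- https://pyav.org/docs/stable/cookbook/numpy.html
The genuinely useful references came from the project's issues and discussions, listed below:
- Reference for the code in this post: https://github.com/PyAV-Org/PyAV/discussions/866#discussion-3773956
- An issue I found valuable (although it is not quite correct): https://github.com/PyAV-Org/PyAV/issues/302
  - The code in the opening post is slightly wrong, but still worth consulting.
  - Further down there is a nice transcoding-only example, which I have not actually tested yet: https://github.com/PyAV-Org/PyAV/issues/302#issuecomment-415829779
The repository's issues also show plenty of users who struggled just like I did:
- A complaint that the documentation has no sample code for handling audio: https://github.com/PyAV-Org/PyAV/issues/1144
- Three people with almost the same goal as mine; the third took basically the same path as I did and was only one small step short.
  - https://github.com/PyAV-Org/PyAV/discussions/853
  - https://github.com/PyAV-Org/PyAV/issues/1093
  - https://github.com/PyAV-Org/PyAV/discussions/1081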

2.2 Code

After a long time of searching and testing, it finally worked.

Only the two modified functions are shown below.

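import os

import av
import cv2
import numpy as np
import onnxruntime as ort
from tqdm import tqdm


def process_image_alter(img, x32=True):
    h, w = img.shape[:2]
    if x32:  # resize image to multiple of 32s
        def to_32s(x):
            return 256 if x < 256 else x - x % 32

        img = cv2.resize(img, (to_32s(w), to_32s(h)))
    img = img.astype(np.float32) / 127.5 - 1.0  # note the change: no cvtColor here (see 2.3)
    return img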

def cvt2anime_video(video_path, output, model, onnx='model.onnx'):
    # check onnx model
    exists = os.path.isfile(onnx)
    if not exists:
        print('Model file not found:', onnx)
        return

    # Load the model; run inference on the GPU if one is available
    # Reference: https://zhuanlan.zhihu.com/p/645720587
    # Enter with caution! https://zhuanlan.zhihu.com/p/492040015
    if ort.get_device() == 'GPU':
        print('use gpu')
        providers = ['CUDAExecutionProvider', 'CPUExecutionProvider', ]
        session = ort.InferenceSession(onnx, providers=providers)
        session.set_providers(['CUDAExecutionProvider'], [{'device_id': 0}])  # gpu 0
    else:
        print('use cpu')
        providers = ['CPUExecutionProvider', ]
        session = ort.InferenceSession(onnx, providers=providers)

    video_in_name = os.path.basename(video_path)  # file name only
    # name and path of the output video
    video_out_name = video_in_name.rsplit('.', 1)[0] + '_' + model + '.mp4'
    video_out_path = os.path.join(output, video_out_name)

    # Open the input video
    in_container = av.open(video_path, 'r')
    in_video_stream = next(s for s in in_container.streams if s.type == 'video')
    in_audio_stream = next(s for s in in_container.streams if s.type == 'audio')

    fps = in_video_stream.base_rate  # frame rate
    width = in_video_stream.width  # frame width
    height = in_video_stream.height  # frame height
    total_time_in_second = in_video_stream.duration * 1.0 * in_video_stream.time_base  # total length in seconds
    total_frame = int(total_time_in_second * fps)  # estimated total number of frames

    out_container = av.open(video_out_path, 'w')
    out_video_stream = out_container.add_stream("h264", rate=fps)
    out_audio_stream = out_container.add_stream(template=in_audio_stream)

    out_video_stream.width = width
    out_video_stream.height = height

    pbar = tqdm(total=total_frame, ncols=80)
    pbar.set_description(f"Making: {video_out_name}")

    for packet in in_container.demux(in_video_stream, in_audio_stream):
        # Read the type before packet.stream is reassigned below
        _type = packet.stream.type

        if _type == 'video':
            for frame in packet.decode():
                frame = frame.to_ndarray(format="rgb24")  # request RGB directly
                # https://www.zhihu.com/question/452884533 VideoCapture returns BGR images by default, so they had to be converted.
                # Here the format can be chosen, so no cvtColor is needed afterwards.

                frame = np.asarray(
                    np.expand_dims(process_image_alter(frame), 0))  # modified process_image, without cvtColor
                fake_img = session.run(None, {session.get_inputs()[0].name: frame})
                fake_img = post_precess(fake_img[0], (width, height))

                frame = av.VideoFrame.from_ndarray(fake_img, format="rgb24")  # expects RGB
                out_container.mux(out_video_stream.encode(frame))

                pbar.update(1)  # the bar follows the video frames

        elif _type == 'audio':
            # Audio is remuxed packet by packet; there is no need to decode it.
            # We need to skip the "flushing" packets that `demux` generates.
            if packet.dts is None:
                continue
            # We need to assign the packet to the new stream.
            packet.stream = out_audio_stream
            out_container.mux(packet)

    pbar.close()

    # Flush the frames still buffered in the video encoder
    out_container.mux(out_video_stream.encode())

    # Close the files
    out_container.close()
    in_container.close()

    return video_out_path

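The input video is opened as in_container, from which the video and audio streams are taken, followed by the various parameters.
The output container is out_container; specifying h264 as the codec of its video stream is all it takes to solve the transcoding problem.
The loop handles both the video and the audio stream: video is processed the same way as before, audio is passed through untouched.
- Video handling reference: https://pyav.org/docs/stable/cookbook/numpy.html, first to_ndarray, then from_ndarray after processing.
- Audio handling reference: https://pyav.org/docs/stable/cookbook/basics.html?highlight=remuxing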
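Among those parameters, the total frame count for the progress bar is only an estimate derived from the stream duration. The arithmetic can be sketched with exact fractions, since PyAV reports time_base as a `fractions.Fraction`. The function name `estimate_total_frames` and the sample numbers are hypothetical.

```python
from fractions import Fraction


def estimate_total_frames(duration_ticks, time_base, fps):
    """Estimate the frame count from a duration given in time_base ticks."""
    seconds = duration_ticks * time_base  # exact rational arithmetic
    return int(seconds * fps)             # truncate to a whole number of frames


# Hypothetical stream: 153600 ticks at a time base of 1/15360 (10 s), 30 fps
print(estimate_total_frames(153600, Fraction(1, 15360), Fraction(30, 1)))  # 300
```

Because this is an estimate, the bar may finish a few frames early or late on files whose frame rate is not constant.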

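I use this approach in onnx_video2anime_pyav.py and onnx_app_pyav.py:
- https://github.com/xuanhao44/AnimeGANv2/blob/main/onnx_video2anime_pyav.py
- https://github.com/xuanhao44/AnimeGANv2/blob/main/onnx_app_pyav.py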
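One detail of process_image_alter that is easy to overlook is how to_32s picks the size fed to the model: anything below 256 is raised to 256, and everything else is rounded down to a multiple of 32. The function below is copied from the code above; the sample sizes are only illustrative.

```python
def to_32s(x):
    """Clamp to at least 256, otherwise round down to a multiple of 32."""
    return 256 if x < 256 else x - x % 32


for size in (200, 256, 720, 1080, 1920):
    print(size, "->", to_32s(size))
# 200 -> 256
# 256 -> 256
# 720 -> 704
# 1080 -> 1056
# 1920 -> 1920
```

So a 1920x1080 frame is processed at 1920x1056, which is why post_precess is given (width, height) to resize the result back to the original size.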

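2.3 Problems encountered and how I solved them

Symptom: at first I handled only the video and simply reused the earlier code, and the video came out tinted blue.

Cause: the RGB and BGR channel orders were mixed up.

Reference: https://www.zhihu.com/question/452884533

Explanation: in the original code, OpenCV's VideoCapture.read returns images in BGR order, so they were converted to RGB during processing.

Now frame.to_ndarray lets you specify the pixel format of the returned image, so you can get RGB directly: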

frame = frame.to_ndarray(format="rgb24")  # the frame is now in RGB order

So the process_image function no longer needs cvtColor for the conversion.
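To see why a mixed-up channel order shows up as a blue tint, here is a dependency-free sketch that reverses the channel order of a tiny image stored as nested lists. The helper `swap_red_blue` and the pixel values are hypothetical; the real code works on NumPy arrays.

```python
def swap_red_blue(image):
    """Reverse the channel order of every pixel (RGB -> BGR or BGR -> RGB)."""
    return [[pixel[::-1] for pixel in row] for row in image]


# A 1x2 image with an orange and a pure red pixel, in RGB order
rgb = [[(255, 128, 0), (255, 0, 0)]]
misread = swap_red_blue(rgb)
print(misread)  # [[(0, 128, 255), (0, 0, 255)]]
```

Warm colours such as orange and red end up with their weight in the blue channel, which matches the blue cast seen when an unnecessary extra conversion is applied to frames that are already RGB.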

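Symptom: the output has only audio, or only video.

I initially followed: https://github.com/PyAV-Org/PyAV/issues/302#issue-310229729;

but the one I followed later turned out to be the correct one (even though the two look very similar): https://github.com/PyAV-Org/PyAV/discussions/866#discussion-3773956.

Cause: packet.stream.type was read at the wrong place in the loop.

Symptom: audio and video are out of sync.

Cause: the output video stream was not given the same fps as the source video.

Fix by adding rate=fps: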

out_video_stream = out_container.add_stream("h264", rate=fps)
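The drift is plain arithmetic: the same number of frames played at a different rate lasts a different amount of time, while the copied audio keeps its original length. A small sketch with hypothetical numbers (the 24 fps value merely stands in for any mismatched rate):

```python
from fractions import Fraction


def playback_seconds(frame_count, fps):
    """Duration of frame_count frames played back at fps frames per second."""
    return Fraction(frame_count) / Fraction(fps)


frames = 300  # hypothetical: a 10 s clip recorded at 30 fps
audio_seconds = playback_seconds(frames, 30)        # audio keeps the source length
video_seconds_wrong = playback_seconds(frames, 24)  # video written at a mismatched rate
print(float(audio_seconds), float(video_seconds_wrong))  # 10.0 12.5
```

With the source rate passed as rate=fps, both durations agree and the tracks stay aligned.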

Handling the progress bar pbar: update it as each video frame is processed, and close it after all loops have finished.

Last modified: October 26, 2023


