Introduction

MinerU 2.0 uses sglang for acceleration and differs considerably from previous versions, so it is recommended to start it via the official Docker image.

Docker Image

Dockerfile

This is the official Dockerfile:

# Use the official sglang image
FROM lmsysorg/sglang:v0.4.7-cu124

# install mineru latest
RUN python3 -m pip install -U 'mineru[core]' -i https://mirrors.aliyun.com/pypi/simple --break-system-packages

# Download models and update the configuration file
RUN /bin/bash -c "mineru-models-download -s modelscope -m all"

# Set the entry point to activate the virtual environment and run the command line tool
ENTRYPOINT ["/bin/bash", "-c", "export MINERU_MODEL_SOURCE=local && exec \"$@\"", "--"]

It is recommended to use the Dockerfile below instead. Compared with the official one, it adds a build cache (which speeds up subsequent builds) and downloads only the vlm models (the official one also downloads the pipeline models). Note that the RUN --mount=type=cache instruction requires Docker BuildKit, which is enabled by default in recent Docker versions.

# Use the official sglang image
FROM lmsysorg/sglang:v0.4.7-cu124

# install mineru latest
RUN --mount=type=cache,id=mineru_cache,target=/root/.cache,sharing=locked \
    python3 -m pip install -U 'mineru[core]' -i https://mirrors.aliyun.com/pypi/simple --break-system-packages

# Download only the vlm models and update the configuration file.
# The cache mount is not persisted into the final image, so the downloaded
# models are copied out of it and then moved back into a regular image layer.
RUN --mount=type=cache,id=mineru_cache,target=/root/.cache,sharing=locked \
    mineru-models-download -s modelscope -m vlm && \
    cp -r /root/.cache/modelscope /tmp/modelscope
RUN mkdir -p /root/.cache && \
    mv /tmp/modelscope /root/.cache/modelscope


# Set the entry point to activate the virtual environment and run the command line tool
ENTRYPOINT ["/bin/bash", "-c", "export MINERU_MODEL_SOURCE=local && exec \"$@\"", "--"]

Build the Docker Image

docker build -t mineru-sglang:latest -f Dockerfile .

Startup

Docker

# Use --gpus all to expose all GPUs; here devices 0 and 1 are selected to match --tp 2
docker run -e MINERU_MODEL_SOURCE=local --gpus '"device=0,1"' \
  --shm-size 100g \
  -p 80:80 \
  --ipc=host \
  mineru-sglang:latest \
  mineru-sglang-server --host 0.0.0.0 --port 80 --enable-torch-compile --tp 2

Docker compose

services:
  mineru-sglang:
    image: mineru-sglang:latest
    container_name: mineru-sglang
    restart: always
    ports:
      - 30000:30000
    environment:
      MINERU_MODEL_SOURCE: local
    entrypoint: mineru-sglang-server
    command:
      --host 0.0.0.0
      --port 30000
      # --enable-torch-compile  # You can also enable torch.compile to accelerate inference speed by approximately 15%
      # --dp 2  # If you have more than two GPUs with 24GB VRAM or above, you can use sglang's multi-GPU parallel mode to increase throughput  
      # --tp 2  # If you have two GPUs with 12GB or 16GB VRAM, you can use the Tensor Parallel (TP) mode
      # --mem-fraction-static 0.7  # If you have two GPUs with 11GB VRAM, in addition to Tensor Parallel mode, you need to reduce the KV cache size
    ulimits:
      memlock: -1
      stack: 67108864
    ipc: host
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:30000/health || exit 1"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]
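
After the container starts, the model can take a while to load. Below is a minimal readiness-check sketch in Python that polls the same /health endpoint the Compose healthcheck uses; the host and port 30000 are assumptions matching the Compose example above (adjust base_url if you used the docker run command, which listens on port 80):

import time

import requests


def wait_for_server(base_url: str = "http://127.0.0.1:30000", timeout: int = 300) -> bool:
    """Poll /health until mineru-sglang-server is ready or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(f"{base_url}/health", timeout=5).status_code == 200:
                print("mineru-sglang-server is ready")
                return True
        except requests.RequestException:
            pass  # server still starting, keep polling
        time.sleep(5)
    print("server did not become ready in time")
    return False


if __name__ == "__main__":
    wait_for_server()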

Testing

"""
pip install -U mineru -i https://mirrors.aliyun.com/pypi/simple
"""
import json
import os
import time

from mineru.backend.vlm.vlm_analyze import doc_analyze as vlm_doc_analyze
from mineru.backend.vlm.vlm_middle_json_mkcontent import union_make as vlm_union_make
from mineru.cli.common import convert_pdf_bytes_to_bytes_by_pypdfium2, prepare_env
from mineru.data.data_reader_writer import FileBasedDataWriter
from mineru.utils.enum_class import MakeMode


def process_pdf(
    file_path:str,
):
    output_dir = 'output'
    server_url = 'http://<mineru_sglang_ip>:<port>'
    f_make_md_mode = MakeMode.MM_MD
    f_dump_md = True
    f_dump_content_list = True
    f_dump_middle_json = True
    f_dump_model_output = True

    start = time.time()
    parts = os.path.splitext(os.path.basename(file_path))
    pdf_file_name = parts[0]
    with open(file_path, 'rb') as f:
        pdf_bytes = f.read()

    pdf_bytes = convert_pdf_bytes_to_bytes_by_pypdfium2(pdf_bytes, 0, None)
    local_image_dir, local_md_dir = prepare_env(output_dir, pdf_file_name, 'auto')
    image_writer, md_writer = FileBasedDataWriter(local_image_dir), FileBasedDataWriter(local_md_dir)
    end1 = time.time()
    print(f'preprocessing done, cost {end1 - start:.2f}s, calling sglang')
    middle_json, infer_result = vlm_doc_analyze(pdf_bytes, image_writer=image_writer, backend='sglang-client',
                                                server_url=server_url)
    end2 = time.time()
    print(f'sglang call finished, cost {end2 - end1:.2f}s')

    pdf_info = middle_json["pdf_info"]

    # draw_layout_bbox(pdf_info, pdf_bytes, local_md_dir, f"{pdf_file_name}_layout.pdf")
    # draw_span_bbox(pdf_info, pdf_bytes, local_md_dir, f"{pdf_file_name}_span.pdf")

    if f_dump_md:
        image_dir = str(os.path.basename(local_image_dir))
        md_content_str = vlm_union_make(pdf_info, f_make_md_mode, image_dir)
        md_writer.write_string(
            f"{pdf_file_name}.md",
            md_content_str,
        )
        end3 = time.time()
        print(f'markdown generated, cost {end3 - end2:.2f}s')

    if f_dump_content_list:
        image_dir = str(os.path.basename(local_image_dir))
        content_list = vlm_union_make(pdf_info, MakeMode.CONTENT_LIST, image_dir)
        md_writer.write_string(
            f"{pdf_file_name}_content_list.json",
            json.dumps(content_list, ensure_ascii=False, indent=4),
        )

    if f_dump_middle_json:
        md_writer.write_string(
            f"{pdf_file_name}_middle.json",
            json.dumps(middle_json, ensure_ascii=False, indent=4),
        )

    if f_dump_model_output:
        model_output = ("\n" + "-" * 50 + "\n").join(infer_result)
        md_writer.write_string(
            f"{pdf_file_name}_model_output.txt",
            model_output,
        )

    print(f"local output dir is {local_md_dir}")


if __name__ == '__main__':
    file = 'demo.pdf'
    process_pdf(file)
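
Before running the script, replace server_url with the actual address of your mineru-sglang-server (for example http://127.0.0.1:30000 for the Compose setup above). The Markdown, content list, middle JSON, and raw model output are written to the directories that prepare_env creates under output/.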
