opendatalab/MinerU-HTML
opendatalab/MinerU-HTML is an HTML main-content extraction tool developed by OpenDataLab. It uses Large Language Models (LLMs) to identify the primary content of a web page and state machine-guided generation to produce structured JSON output. It provides a complete extraction pipeline with a fallback mechanism and built-in evaluation capabilities.
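To make the pipeline shape concrete, here is a minimal sketch of "label blocks, emit JSON, fall back if the output is unusable". It is not MinerU-HTML's API: every name (`split_blocks`, `parse_labels`, `heuristic_labels`, `extract_main`, the `label_fn` callback, the `"main"`/`"other"` labels) is hypothetical, and the word-count fallback is an assumed stand-in. Note also that state machine-guided generation constrains the model while it decodes; this sketch only validates the JSON after the fact, which is a simpler substitute.

```python
import json
from html.parser import HTMLParser
from typing import Callable, Dict, List


class BlockSplitter(HTMLParser):
    """Collects the text of each <p>, <h1>-<h3>, and <li> element as one block."""

    BLOCK_TAGS = {"p", "h1", "h2", "h3", "li"}

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._buf = None  # list of text pieces while inside a block, else None

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS and self._buf is not None:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if text:
                self.blocks.append(text)
            self._buf = None


def split_blocks(html: str) -> List[str]:
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    return parser.blocks


def parse_labels(raw: str, n_blocks: int) -> Dict[int, str]:
    """Accept only a JSON object that labels every block index 'main' or 'other'."""
    labels = json.loads(raw)  # raises ValueError (JSONDecodeError) on bad JSON
    if not isinstance(labels, dict):
        raise ValueError("expected a JSON object")
    result = {}
    for i in range(n_blocks):
        label = labels.get(str(i))
        if label not in ("main", "other"):
            raise ValueError(f"missing or invalid label for block {i}")
        result[i] = label
    return result


def heuristic_labels(blocks: List[str]) -> Dict[int, str]:
    """Assumed fallback: treat blocks with at least 8 words as main content."""
    return {i: "main" if len(b.split()) >= 8 else "other"
            for i, b in enumerate(blocks)}


def extract_main(html: str, label_fn: Callable[[List[str]], str]) -> List[str]:
    """label_fn stands in for the LLM call and returns a JSON string of labels."""
    blocks = split_blocks(html)
    try:
        labels = parse_labels(label_fn(blocks), len(blocks))
    except (ValueError, TypeError):
        labels = heuristic_labels(blocks)  # model output unusable: fall back
    return [b for i, b in enumerate(blocks) if labels[i] == "main"]
```

In use, `label_fn` would wrap whatever model call is available; passing a function that returns malformed text exercises the fallback path instead of failing the extraction.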