Webpage Readable Content Extraction
Intelligently extracts key elements of articles
Method: POST
Path: /v1/websitetools/readability?appkey={{appkey}}
Demo: https://api.gugudata.io/v1/websitetools/readability/demo
Request Parameters:
appkey (string, required): Obtained after payment
html (string, optional): The HTML content of the webpage to extract. Provide either this parameter or url.
url (string, optional): The URL of the webpage to extract. Provide either this parameter or html. (Failures caused by the source site's anti-crawling measures, which prevent the page from being fetched for processing, are not handled.)
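The parameters above could be assembled into a request roughly as follows. This is a minimal sketch, not the provider's client: the host is assumed to match the demo URL, the form-encoded body is an assumption (the listing does not state the body encoding), and build_readability_request is a hypothetical helper name. The request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.gugudata.io"  # assumption: same host as the demo URL
PATH = "/v1/websitetools/readability"


def build_readability_request(appkey, url=None, html=None):
    """Build (but do not send) a POST request for the readability endpoint."""
    # The listing says to choose either html or url, so require exactly one.
    if (url is None) == (html is None):
        raise ValueError("pass exactly one of url or html")

    # appkey travels in the query string, as shown in the documented path.
    query = urllib.parse.urlencode({"appkey": appkey})

    # Assumption: the remaining parameter is sent as a form-encoded body.
    field = {"url": url} if url is not None else {"html": html}
    body = urllib.parse.urlencode(field).encode("utf-8")

    return urllib.request.Request(
        f"{BASE_URL}{PATH}?{query}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the returned object to urllib.request.urlopen would send it; replace the placeholder appkey with a real key before doing so.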
Response Fields Count: 15
Response Field Examples:
DataStatus.RequestParameter: API request parameter
DataStatus.StatusCode: API return status code
DataStatus.StatusDescription: API return status description
DataStatus.ResponseDateTime: API data return time
DataStatus.DataTotalCount: Total number of records matching the request, generally used for pagination
Data.Title: Article title
Data.Byline: Article author
Data.Dir: Article text direction
... total 15 fields
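Reading these fields from a response might look like the sketch below. It assumes the response body is JSON and that the dotted names map to nested objects (DataStatus and Data); neither is confirmed by the listing. The sample payload is invented for illustration, and summarize_response is a hypothetical helper.

```python
import json


def summarize_response(raw):
    """Pull a few of the documented fields out of a JSON response body."""
    payload = json.loads(raw)
    # Assumption: "DataStatus.X" and "Data.X" denote keys of nested objects.
    status = payload.get("DataStatus") or {}
    data = payload.get("Data") or {}
    return {
        "status_code": status.get("StatusCode"),
        "status_description": status.get("StatusDescription"),
        "title": data.get("Title"),
        "byline": data.get("Byline"),
        "dir": data.get("Dir"),
    }


# Made-up payload containing only field names listed above; values are illustrative.
sample = json.dumps({
    "DataStatus": {"StatusCode": 100, "StatusDescription": "ok"},
    "Data": {"Title": "Example article", "Byline": "Jane Doe", "Dir": "ltr"},
})

summary = summarize_response(sample)
print(summary["title"], "by", summary["byline"])
```

Using .get throughout means a missing field yields None rather than raising, which suits a response whose full field list is not shown here.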
Key Features:
Intelligently extracts readable content from webpages
Provides HTML code of the webpage's readable content
Supports passing either webpage HTML or webpage URL parameters
Extracts article elements including title, author, text direction, language, content, plain-text content (HTML tags removed, split by paragraph), article length, excerpt, website name, and publication time
Parses pages within seconds and supports high concurrency