[DEMO] Webpage Readable Content Extraction

Intelligently extracts key elements of articles

Method: POST
Path: /v1/websitetools/readability?appkey={{appkey}}
Demo: https://api.gugudata.io/v1/websitetools/readability/demo

Request Parameters:

appkey (string, required): Obtained after payment

html (string, optional): HTML content of the webpage to extract. Provide either this parameter or url.

url (string, optional): URL of the webpage to extract. Provide either this parameter or html. Failures caused by the source site's anti-crawling measures, which prevent the page from being fetched for processing, are not handled.
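The parameters above can be sketched as a small request builder. This is a minimal illustration, not the provider's official client: the host api.gugudata.io is inferred from the demo URL, the form-encoded body is an assumption (the encoding is not stated here), and build_request is a hypothetical helper name.

```python
import urllib.parse
import urllib.request

# Assumption: the production host matches the host of the demo URL.
BASE_URL = "https://api.gugudata.io/v1/websitetools/readability"


def build_request(appkey, url=None, html=None):
    """Build (but do not send) a POST request carrying exactly one of url/html."""
    if (url is None) == (html is None):
        raise ValueError("pass exactly one of url or html")
    field = {"url": url} if url is not None else {"html": html}
    # Assumption: the body is form-encoded; the encoding is not documented here.
    body = urllib.parse.urlencode(field).encode("utf-8")
    # appkey travels in the query string, as shown in the Path line.
    endpoint = BASE_URL + "?" + urllib.parse.urlencode({"appkey": appkey})
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

To send the request, pass the result to urllib.request.urlopen and read the response body. Rejecting calls that supply both or neither of url and html keeps the "either this parameter or the other" rule on the client side.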

Response Fields Count: 15
Response Field Examples:

DataStatus.RequestParameter: API request parameter

DataStatus.StatusCode: API return status code

DataStatus.StatusDescription: API return status description

DataStatus.ResponseDateTime: API data return time

DataStatus.DataTotalCount: Total number of records matching the request, generally used for pagination

Data.Title: Article title

Data.Byline: Article author

Data.Dir: Article text direction

... total 15 fields

Key Features:

Intelligently extracts readable content from webpages

Provides HTML code of the webpage's readable content

Accepts either the webpage HTML or the webpage URL as input

Extracts the article title, author, text direction, language, content, plain-text content (HTML tags removed, divided by paragraphs), article length, excerpt, website name, and publication time

Parses pages within seconds and supports high concurrency

Details:
https://gugudata.io/details/readability

Responses

Content type: application/json

Example body:

{
  "DataStatus": {
    "RequestParameter": "string",
    "StatusCode": 0,
    "StatusDescription": "string",
    "ResponseDateTime": "string",
    "DataTotalCount": 0
  },
  "Data": {
    "Title": "string",
    "Byline": "string",
    "Dir": "string",
    "Lang": "string",
    "Content": "string",
    "TextContent": "string",
    "Length": 0,
    "Excerpt": "string",
    "SiteName": "string",
    "PublishedTime": [
      "string"
    ]
  }
}
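A response shaped like the example above could be read as follows. This is a sketch under stated assumptions: Article and parse_response are hypothetical names, only a subset of the fields is kept, and no StatusCode value is treated as success because the success codes are not listed here.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Article:
    status_code: Optional[int]
    status_description: Optional[str]
    title: Optional[str]
    byline: Optional[str]
    lang: Optional[str]
    text: str
    published: List[str]


def parse_response(payload: str) -> Article:
    """Map the JSON body onto a small record, tolerating missing fields."""
    doc = json.loads(payload)
    status = doc.get("DataStatus") or {}
    data = doc.get("Data") or {}  # Data may be absent or null on failure (assumption)
    return Article(
        status_code=status.get("StatusCode"),
        status_description=status.get("StatusDescription"),
        title=data.get("Title"),
        byline=data.get("Byline"),
        lang=data.get("Lang"),
        text=data.get("TextContent") or "",
        # PublishedTime is an array of strings in the example body.
        published=list(data.get("PublishedTime") or []),
    )
```

Callers would still need to compare status_code against the values given in the provider's status code reference before trusting the extracted fields.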
