
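Data Flow and Data Format

The format

A block editor post is the proper block-aware representation of a post: a collection of semantically consistent descriptions of what each block is and what its essential data is. This representation only ever exists in memory. It is the chase in the typesetter's workshop, ever-shifting as the sorts are set and repositioned.

A block editor post is not the artifact it produces, namely the post_content. The latter is the printed page, optimized for the reader but retaining invisible markings for later editing.

The input and output of the block editor is a tree of block objects in the following format: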

const value = [ block1, block2, block3 ];

The block object

Each block object has an id, a set of attributes and potentially a list of child blocks.

const block = {
    clientId, // unique string identifier.
    type, // The block type (paragraph, image...)
    attributes, // (key, value) set of attributes representing the direct properties/content of the current block.
    innerBlocks, // An array of child blocks or inner blocks.
};

Note that the attribute keys and types, as well as the allowed inner blocks, are defined by the block type. For example, the core quote block has a cite string attribute representing the cite content, while a heading block has a numeric level attribute representing the level of the heading (1 to 6).

During the lifecycle of the block in the editor, the block object can receive extra metadata:

isValid: A boolean representing whether the block is valid or not;
originalContent: The original HTML serialization of the block.

Examples

// A simple paragraph block.
const paragraphBlock1 = {
    clientId: '51828be1-5f0d-4a6b-8099-f4c6f897e0a3',
    type: 'core/paragraph',
    attributes: {
        content: 'This is the <strong>content</strong> of the paragraph block',
        dropCap: true,
    },
};

// A second paragraph block, defined here because the columns example below references it.
// Its clientId and content are illustrative.
const paragraphBlock2 = {
    clientId: '51828be1-5f0d-4a6b-8099-f4c6f897e0a8',
    type: 'core/paragraph',
    attributes: {
        content: 'This is the content of a second paragraph block',
    },
};

// A separator block.
const separatorBlock = {
    clientId: '51828be1-5f0d-4a6b-8099-f4c6f897e0a4',
    type: 'core/separator',
    attributes: {},
};

// A columns block with a paragraph block on each column.
const columnsBlock = {
    clientId: '51828be1-5f0d-4a6b-8099-f4c6f897e0a7',
    type: 'core/columns',
    attributes: {},
    innerBlocks: [
        {
            clientId: '51828be1-5f0d-4a6b-8099-f4c6f897e0a5',
            type: 'core/column',
            attributes: {},
            innerBlocks: [ paragraphBlock1 ],
        },
        {
            clientId: '51828be1-5f0d-4a6b-8099-f4c6f897e0a6',
            type: 'core/column',
            attributes: {},
            innerBlocks: [ paragraphBlock2 ],
        },
    ],
};
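
Because every block carries its children in innerBlocks, a tree of this shape can be walked with a short recursive helper. The sketch below is illustrative only: collectBlockTypes is a hypothetical function, not part of any WordPress package, and the single-letter clientId values are placeholders.

```javascript
// Hypothetical helper: returns the block types of a tree in depth-first order.
function collectBlockTypes( blocks ) {
    return blocks.flatMap( ( block ) => [
        block.type,
        // innerBlocks may be absent on leaf blocks, so default to an empty list.
        ...collectBlockTypes( block.innerBlocks ?? [] ),
    ] );
}

// A small tree using the block object shape described above.
const tree = [
    {
        clientId: 'a',
        type: 'core/columns',
        attributes: {},
        innerBlocks: [
            {
                clientId: 'b',
                type: 'core/column',
                attributes: {},
                innerBlocks: [
                    { clientId: 'c', type: 'core/paragraph', attributes: { content: 'Hello' } },
                ],
            },
        ],
    },
    { clientId: 'd', type: 'core/separator', attributes: {} },
];

const types = collectBlockTypes( tree );
console.log( types );
// → [ 'core/columns', 'core/column', 'core/paragraph', 'core/separator' ]
```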

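Serialization and parsing

This data model, however, only lives in memory while a post is being edited. It is not visible to the page viewer once rendered, just as a printed page carries no trace of the arrangement of type that produced it in the press.

Since the whole WordPress ecosystem expects to receive HTML when rendering or editing a post, the block editor transforms its data into something that can be saved in post_content through serialization. This ensures that there is a single source of truth for the content, and that this source remains readable and compatible with all the tools that currently interact with WordPress content. Were we to store the object tree separately, we would face the risk of post_content and the tree getting out of sync, as well as the problem of duplicating data in both places.

Thus, the serialization process converts the block tree into HTML, using HTML comments as explicit block delimiters. These comments can contain the attributes in non-HTML form. This is the act of printing invisible marks on the printed page that leave a trace of the original structured intention.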
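
A minimal sketch of that conversion follows. It is not the editor's serializer: the blockTypes registry, its commentKeys and save fields, and the generated markup are all assumptions made for illustration, and inner blocks and escaping of the JSON are left out for brevity.

```javascript
// Hypothetical block type registry. `commentKeys` lists the attributes kept in
// the comment delimiter; `save` returns the HTML stored between the delimiters.
const blockTypes = {
    'core/paragraph': {
        commentKeys: [ 'dropCap' ],
        save: ( { content, dropCap } ) =>
            `<p${ dropCap ? ' class="has-drop-cap"' : '' }>${ content }</p>`,
    },
    'core/separator': {
        commentKeys: [],
        save: () => '<hr class="wp-block-separator"/>',
    },
};

// Serialize one block: an opening comment (with optional JSON attributes),
// the saved HTML, and a closing comment.
function serializeBlock( block ) {
    const { commentKeys, save } = blockTypes[ block.type ];
    // Core blocks are written without their namespace, e.g. wp:paragraph.
    const name = block.type.replace( /^core\//, '' );
    const entries = commentKeys
        .filter( ( key ) => key in block.attributes )
        .map( ( key ) => [ key, block.attributes[ key ] ] );
    const json = entries.length
        ? JSON.stringify( Object.fromEntries( entries ) ) + ' '
        : '';
    return `<!-- wp:${ name } ${ json }-->\n${ save( block.attributes ) }\n<!-- /wp:${ name } -->`;
}

// Serialize a list of top-level blocks, separated by blank lines.
function serialize( blocks ) {
    return blocks.map( serializeBlock ).join( '\n\n' );
}

const html = serialize( [
    { clientId: '1', type: 'core/paragraph', attributes: { content: 'Hello', dropCap: true } },
    { clientId: '2', type: 'core/separator', attributes: {} },
] );
console.log( html );
```

In this sketch the paragraph's content lives in the HTML itself, while dropCap travels in the comment, which mirrors the split between markup and comment attributes described above.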

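This is one end of the process. The other is how to recreate the collection of blocks whenever a post is to be edited again. A formal grammar defines how the serialized representation of a block editor post should be loaded, just as some basic rules define how to turn the tree into an HTML-like string. Block editor posts are not designed to be edited by hand; they are not HTML documents in essence, so they are not designed to be edited as HTML documents either.

They just happen to be stored inside post_content in a way that requires no transformation in order to be viewable by any legacy system. It is true that loading the stored HTML into a browser without the corresponding machinery might degrade the experience: if the post includes dynamic blocks, the dynamic elements may not load, server-generated content may not appear, and interactive content may remain static. However, this at least guarantees that block editor posts can still be viewed on themes and installations that are not block-aware, and it provides the most accessible route to the content. In other words, the post remains mostly intact even if the saved HTML is rendered as is.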

Delimiters and parsing expression grammar

We chose to look for a way to keep the formality, explicitness, and unambiguity of the existing HTML syntax. Within HTML there were a number of options.

Of these options, a novel approach was suggested: by storing data in HTML comments, we would know that we wouldn't break the rest of the HTML in the document, that browsers should ignore it, and that we could simplify our approach to parsing the document.

Unique to HTML comments is the fact that they cannot legitimately exist in ambiguous places, such as inside of HTML attributes like <img alt='data-id="14"'>. Comments are also quite permissive. Whereas HTML attributes are complicated to parse properly, comments are quite easily described by a leading `<!--` followed by anything except `--` until the first `-->`. This simplicity and permissiveness means that the parser can be implemented in several ways without needing to understand HTML properly, and we have the liberty to use more convenient syntax inside of the comment, since we only need to escape double-hyphen sequences. We take advantage of this in how we store block attributes: as JSON literals inside the comment.
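
To illustrate the double-hyphen escaping, the sketch below replaces each pair of hyphens in the JSON with the equivalent JSON unicode escapes, which JSON.parse restores on the way back in. The function names encodeCommentAttributes and decodeCommentAttributes are hypothetical, and a real serializer may escape further characters.

```javascript
// Hypothetical helper: JSON-encode attributes so that the result never
// contains "--", which is not allowed inside an HTML comment.
function encodeCommentAttributes( attributes ) {
    // '\\u002d' is the six-character JSON escape sequence for a hyphen.
    return JSON.stringify( attributes ).replace( /--/g, '\\u002d\\u002d' );
}

// Hypothetical helper: JSON.parse already understands the unicode escapes,
// so decoding needs no extra unescaping step.
function decodeCommentAttributes( json ) {
    return JSON.parse( json );
}

const attributes = { caption: 'before -- after' };
const encoded = encodeCommentAttributes( attributes );
const delimiter = `<!-- wp:image ${ encoded } -->`;

console.log( delimiter );
console.log( decodeCommentAttributes( encoded ).caption );
```

Escaping at the JSON level keeps the round trip invisible to block code: attributes go in as plain values and come out as the same values.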

After running this through the parser, we're left with a simple object we can manipulate idiomatically, and we don't have to worry about escaping or unescaping the data. It's handled for us through the serialization process. Because the comments are so different from other HTML tags and because we can perform a first-pass to extract the top-level blocks, we don't actually depend on having fully valid HTML!

This has dramatic implications for how simple and performant we can make our parser. These explicit boundaries also prevent damage in a single block from bleeding into other blocks or tarnishing the entire document. They also allow the system to identify unrecognized blocks before rendering them.
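
The following sketch shows how a delimiter-only pass can recover the block structure without parsing the HTML in between. It is a simplified stand-in for the formal grammar, not the parser WordPress ships: the DELIMITER pattern and parseDelimiters function are assumptions made here, and the HTML between delimiters is deliberately ignored.

```javascript
// Matches opening, closing, and self-closing block delimiters.
// Groups: 1 = "/" on a closer, 2 = optional namespace, 3 = block name,
// 4 = optional JSON attributes, 5 = "/" on a self-closing block.
const DELIMITER =
    /<!--\s+(\/)?wp:([a-z][a-z0-9_-]*\/)?([a-z][a-z0-9_-]*)\s+(\{.*?\}\s+)?(\/)?-->/gs;

// Build a block tree from the delimiters alone, using a stack of open blocks.
function parseDelimiters( html ) {
    const root = { innerBlocks: [] };
    const stack = [ root ];
    for ( const match of html.matchAll( DELIMITER ) ) {
        const [ , closer, namespace = 'core/', name, json, isVoid ] = match;
        if ( closer ) {
            // A closing delimiter ends the innermost open block.
            if ( stack.length > 1 ) {
                stack.pop();
            }
            continue;
        }
        const block = {
            type: namespace + name,
            attributes: json ? JSON.parse( json ) : {},
            innerBlocks: [],
        };
        stack[ stack.length - 1 ].innerBlocks.push( block );
        // Self-closing blocks have no content, so they are never left open.
        if ( ! isVoid ) {
            stack.push( block );
        }
    }
    return root.innerBlocks;
}

// The paragraph's HTML is intentionally malformed (missing </p>).
const saved = [
    '<!-- wp:columns -->',
    '<div class="wp-block-columns"><!-- wp:column -->',
    '<div class="wp-block-column"><!-- wp:paragraph {"dropCap":true} -->',
    '<p>Unclosed paragraph',
    '<!-- /wp:paragraph --></div>',
    '<!-- /wp:column --></div>',
    '<!-- /wp:columns -->',
    '<!-- wp:latest-posts {"postsToShow":4,"displayPostDate":true} /-->',
].join( '\n' );

const parsed = parseDelimiters( saved );
console.log( JSON.stringify( parsed, null, 2 ) );
```

Even though the paragraph markup is invalid, the block boundaries and attributes are still recovered, because only the comment delimiters are inspected.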

N.B.: The defining aspects of blocks are their semantics and the isolation mechanism they provide: in other words, their identity. On the other hand, where their data is stored is a more liberal aspect. Blocks support more than just static local data (via JSON literals inside the HTML comment or within the block's HTML), and more mechanisms (e.g., global blocks or otherwise resorting to storage in complementary WP_Post objects) are expected. See attributes for details.

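The anatomy of a serialized block

When blocks are saved to the content after an editing session, their attributes, depending on the nature of the block, are serialized to these explicit comment delimiters.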

<!-- wp:image -->
<figure class="wp-block-image"><img src="source.jpg" alt="" /></figure>
<!-- /wp:image -->

A purely dynamic block, which has to be rendered by the server before display, could look like this:

<!-- wp:latest-posts {"postsToShow":4,"displayPostDate":true} /-->

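The data lifecycle

In summary, the block editor workflow parses the saved document into an in-memory tree of blocks, with the help of the token delimiters. During editing, all manipulations happen within the block tree. The process ends by serializing the blocks back to post_content.

This workflow relies on a serializer/parser pair to persist posts. Hypothetically, the post data structure could instead be stored using a plugin, or retrieved from a remote JSON file and converted to the block tree.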
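
As a sketch of that last hypothetical, the snippet below turns already-fetched JSON into a block tree. Everything here is an assumption made for illustration: the JSON shape ({ type, attributes, innerBlocks }), the toBlockTree function, and the counter-based createClientId, which stands in for real unique identifiers.

```javascript
// Hypothetical id generator; a counter stands in for real unique identifiers.
let nextId = 0;
const createClientId = () => `block-${ ++nextId }`;

// Convert plain JSON nodes into block objects by filling in defaults and
// assigning a clientId to every node, recursing into innerBlocks.
function toBlockTree( nodes ) {
    return nodes.map( ( { type, attributes = {}, innerBlocks = [] } ) => ( {
        clientId: createClientId(),
        type,
        attributes,
        innerBlocks: toBlockTree( innerBlocks ),
    } ) );
}

// Stand-in for the body of a remote JSON file.
const remoteJson = JSON.stringify( [
    {
        type: 'core/columns',
        innerBlocks: [
            {
                type: 'core/column',
                innerBlocks: [
                    { type: 'core/paragraph', attributes: { content: 'Hello' } },
                ],
            },
        ],
    },
] );

const blocks = toBlockTree( JSON.parse( remoteJson ) );
console.log( JSON.stringify( blocks, null, 2 ) );
```

Once the data is in this shape, the rest of the lifecycle is unchanged: editing operates on the tree, whatever its original source was.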