CMS (PKCS#7) signed-data Constructed Octets

In a CMS (PKCS#7) signed-data structure, the actual content (data) being signed is encoded using ASN.1, and that content can be represented in two forms:


1. Primitive Octet String

  • A single ASN.1 OCTET STRING that contains all the data in one block.
  • Most common for small or medium-sized payloads.
  • Encoded as:
    OCTET STRING ::= [UNIVERSAL 4] <length> <data>
  • Efficient and simple for parsing.

2. Constructed Octet String

  • The OCTET STRING is marked as constructed (tag = 0x24) and contains a series of nested OCTET STRINGs.
  • Used when:
    • The content is large (e.g., for streaming)
    • The encoder chooses to segment the data
  • Each nested OCTET STRING contains a fragment of the full content.
  • Useful for streaming implementations where data is written incrementally.
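
The two forms can be compared byte for byte with a small sketch. The helper names below (encode_length, primitive_octet_string, constructed_octet_string) are hypothetical and not part of any library. The constructed encoder assumes the indefinite-length variant (0x24 0x80 ... 0x00 0x00), which is what a streaming encoder would typically emit; a definite-length constructed form is also possible.

```python
def encode_length(n: int) -> bytes:
    """Definite-length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def primitive_octet_string(data: bytes) -> bytes:
    """Tag 0x04, then the length, then all of the data in one block."""
    return b"\x04" + encode_length(len(data)) + data


def constructed_octet_string(data: bytes, chunk_size: int) -> bytes:
    """Tag 0x24 with indefinite length (0x80), one primitive OCTET STRING
    per chunk, terminated by the end-of-contents octets 0x00 0x00.
    chunk_size must be positive."""
    out = bytearray(b"\x24\x80")
    for i in range(0, len(data), chunk_size):
        out += primitive_octet_string(data[i:i + chunk_size])
    out += b"\x00\x00"
    return bytes(out)


print(primitive_octet_string(b"hello").hex(" "))
print(constructed_octet_string(b"hello", 2).hex(" "))
```

Both outputs carry the same five content bytes; only the framing around them differs.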

Why Does This Happen?

The CMS EncapsulatedContentInfo type is defined as:

EncapsulatedContentInfo ::= SEQUENCE {
  eContentType ContentType,
  eContent [0] EXPLICIT OCTET STRING OPTIONAL
}
    

That OCTET STRING (the eContent) can be encoded in either form:

Format       Use Case                          ASN.1 Tag
Primitive    Default for small/complete data   0x04
Constructed  Chunked or streamed content       0x24

Both are valid under BER, so the choice depends on the encoder and context. DER permits only the primitive form.
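
The two tag values differ only in bit 6 (0x20), the constructed flag of the identifier octet. A minimal sketch of how a parser might tell them apart, using a hypothetical helper name:

```python
def octet_string_form(first_byte: int) -> str:
    """Classify the identifier octet of an OCTET STRING.

    Bits 8-7 give the class (00 = universal), bit 6 is the constructed
    flag, and bits 5-1 give the tag number (4 = OCTET STRING).
    """
    is_universal = (first_byte & 0xC0) == 0
    tag_number = first_byte & 0x1F
    if not is_universal or tag_number != 4:
        raise ValueError(f"not an OCTET STRING tag: 0x{first_byte:02x}")
    return "constructed" if first_byte & 0x20 else "primitive"


print(octet_string_form(0x04))  # primitive
print(octet_string_form(0x24))  # constructed
```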


Handling in Software

When parsing CMS signed-data:

• Libraries must accept either form. (Chilkat's library accepts either form.)
• To reconstruct the full content from a constructed OCTET STRING, concatenate the content bytes of each nested OCTET STRING in order.

When creating signed-data, the Chilkat library uses the primitive OCTET STRING by default. It can produce the constructed form instead when the ConstructedOctets JSON member is set to true in the CmsOptions property.
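
As a rough illustration, the JSON text for CmsOptions could be built as shown here. Only the ConstructedOctets member and the CmsOptions property name come from the description above; the object that exposes the property (called signer in the comment) is a placeholder, so check the Chilkat reference for the class you are using.

```python
import json

# Build the JSON text described above. Only "ConstructedOctets" is taken
# from the documentation; no other members are assumed.
cms_options = json.dumps({"ConstructedOctets": True})
print(cms_options)  # {"ConstructedOctets": true}

# Hypothetical assignment: "signer" stands for whichever Chilkat object
# exposes the CmsOptions property.
# signer.CmsOptions = cms_options
```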

Example (pseudo-structure of constructed form):

[0] EXPLICIT OCTET STRING (constructed)
├─ OCTET STRING (chunk 1)
├─ OCTET STRING (chunk 2)
└─ ...
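
Flattening such a structure back into the original content can be sketched as a small recursive reader. The function names are hypothetical. The sketch assumes the input starts at the OCTET STRING tag (inside the [0] wrapper), handles both definite and indefinite lengths, and does no bounds checking, so truncated input raises IndexError rather than a descriptive error.

```python
def read_length(buf: bytes, pos: int):
    """Return (length, next_pos); length is None for indefinite (0x80)."""
    first = buf[pos]
    pos += 1
    if first < 0x80:
        return first, pos
    if first == 0x80:
        return None, pos
    n = first & 0x7F
    return int.from_bytes(buf[pos:pos + n], "big"), pos + n


def read_octet_string(buf: bytes, pos: int = 0):
    """Return (content, next_pos) for a primitive or constructed OCTET STRING."""
    tag = buf[pos]
    if tag not in (0x04, 0x24):
        raise ValueError(f"expected OCTET STRING, got tag 0x{tag:02x}")
    length, pos = read_length(buf, pos + 1)
    if tag == 0x04:
        if length is None:
            raise ValueError("primitive form cannot use indefinite length")
        return bytes(buf[pos:pos + length]), pos + length
    out = bytearray()
    if length is None:
        # Indefinite length: read chunks until the 0x00 0x00 terminator.
        while buf[pos:pos + 2] != b"\x00\x00":
            chunk, pos = read_octet_string(buf, pos)
            out += chunk
        return bytes(out), pos + 2
    end = pos + length
    while pos < end:
        chunk, pos = read_octet_string(buf, pos)
        out += chunk
    return bytes(out), pos


sample = bytes.fromhex("24 80 04 02 68 65 04 02 6c 6c 04 01 6f 00 00")
content, _ = read_octet_string(sample)
print(content)  # b'hello'
```

Because the reader calls itself for each chunk, it also accepts constructed strings nested inside other constructed strings, which BER allows.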
      

Summary

In CMS signed-data, the encapsulated content can be a single primitive OCTET STRING or a constructed OCTET STRING composed of multiple nested OCTET STRINGs. Both are valid BER encodings, and compliant parsers must support both, especially for streamed or large content.