Presentation Graphic Stream (SUP files) Blu-ray Subtitle Format


The Presentation Graphic Stream (PGS) specification is defined in the US patent US 20090185789 A1. This graphic stream definition is used to show subtitles in Blu-ray movies. When a subtitle stream in PGS format is ripped from a Blu-ray disc, it is usually saved in a file with the SUP extension (Subtitle Presentation).

A Presentation Graphic Stream (PGS) is made of several functional segments, one after another. Each segment starts with the same header: a Magic Number (always “PG”), the Presentation Timestamp (PTS), the Decoding Timestamp (DTS), the Segment Type, and the Segment Size.

The DTS should indicate the time when the decoding of the sub picture starts, and the PTS the time when the decoding ends and the sub picture is shown on the screen. In practice, DTS is always zero (at least in everything I have found so far), so you can freely ignore this value. These timestamps have a resolution of 90 kHz. This means that if, for example, you have a PTS value of 0x0004C11C and you want to know how many milliseconds from the start of the movie the sub picture is shown, you divide the decimal value (311,580) by 90: the result is 3,462 milliseconds (or 3.462 seconds).
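
If you want to parse these headers yourself, here is a minimal Python sketch. The byte widths in the comments are not spelled out in this post; they follow the commonly documented layout and agree with the worked example at the end of this page.

    import struct

    def parse_segment_header(buf, pos):
        """Parse the 13-byte header that precedes every PGS segment.

        Magic Number   2 bytes   always "PG" (0x5047)
        PTS            4 bytes   presentation timestamp, 90 kHz clock
        DTS            4 bytes   decoding timestamp, zero in practice
        Segment Type   1 byte    0x14 PDS, 0x15 ODS, 0x16 PCS, 0x17 WDS, 0x80 END
        Segment Size   2 bytes   size of the segment body that follows
        """
        magic, pts, dts, seg_type, size = struct.unpack_from(">2sIIBH", buf, pos)
        if magic != b"PG":
            raise ValueError("no PGS segment at offset %d" % pos)
        return pts, dts, seg_type, size, pos + 13

    def pts_to_ms(pts):
        # 90 kHz clock: 0x0004C11C -> 311,580 / 90 = 3,462 ms
        return pts // 90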

There are five types of segments used in PGS:

  • Presentation Composition Segment (PCS)
  • Window Definition Segment (WDS)
  • Palette Definition Segment (PDS)
  • Object Definition Segment (ODS)
  • End of Display Set Segment (END)

The Presentation Composition Segment (PCS) is also called the Control Segment because it marks the start of a new Display Set (DS) definition, composed of the definition segments (WDS, PDS, ODS) that follow it until an END segment is found.

A Display Set (DS) is a sub picture definition. A typical DS is a sequence of segments like PCS, WDS, PDS, ODS, END.

In a DS there can be several window, palette, and object definitions, and the composition objects define what is going to be shown on the screen.

Presentation Composition Segment

The Presentation Composition Segment is used for composing a sub picture. It is made of the following fields: Width, Height, Frame Rate, Composition Number, Composition State, Palette Update Flag, Palette ID, and Number of Composition Objects, followed by one entry per composition object.

The composition state can be one of three values:

  • Epoch Start : This defines a new display. The Epoch Start contains all functional segments needed to display a new composition on the screen.
  • Acquisition Point : This defines a  display refresh . This is used to compose in the middle of the Epoch. It includes functional segments with new objects to be used in a new composition, replacing old objects with the same Object ID.
  • Normal : This defines a  display update , and contains only functional segments with elements that are different from the preceding composition. It’s mostly used to stop displaying objects on the screen by defining a composition with no composition objects (a value of zero in the Number of Composition Objects field), but it can also define a new composition that reuses objects and windows defined since the Epoch Start.

The composition objects, also known as window information objects, define the position on the screen of every image that will be shown. Each one has the following structure: Object ID, Window ID, Object Cropped Flag, Object Horizontal Position, and Object Vertical Position, plus a cropping rectangle (position and size) when the cropped flag is set.

When the Object Cropped Flag is set to true (or actually 0x40), the sub picture is cropped to show only a portion of it. This is used, for example, when you don’t want to show the whole subtitle at once, but just a few words first, and then the rest.
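
Continuing the sketch above, this is how a PCS body could be parsed in Python. The field order comes from the worked example at the end of this page; the byte widths are assumptions based on the commonly documented layout.

    def parse_pcs(body):
        """Parse a Presentation Composition Segment body."""
        (width, height, frame_rate, comp_number, comp_state,
         palette_update, palette_id, n_objects) = struct.unpack_from(">HHBHBBBB", body, 0)
        pos, objects = 11, []
        for _ in range(n_objects):
            obj_id, win_id, cropped, x, y = struct.unpack_from(">HBBHH", body, pos)
            pos += 8
            crop = None
            if cropped == 0x40:                                # Object Cropped Flag set
                crop = struct.unpack_from(">HHHH", body, pos)  # crop x, y, width, height
                pos += 8
            objects.append((obj_id, win_id, x, y, crop))
        return {"width": width, "height": height, "frame_rate": frame_rate,
                "composition_number": comp_number, "composition_state": comp_state,
                "palette_update": palette_update, "palette_id": palette_id,
                "objects": objects}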

Window Definition Segment

This segment is used to define the rectangular area on the screen where the sub picture will be shown. This rectangular area is called a Window. This segment can define several windows: it starts with a Number of Windows count, and the fields from Window ID up to Window Height repeat for each window. You can see it more clearly in the example at the end of this page.
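
A minimal Python sketch of a WDS parser, assuming one count byte followed by nine bytes per window (which is exactly what makes the 0x13-byte WDS in the example work out: 1 + 2 × 9 = 19 bytes):

    def parse_wds(body):
        """Parse a Window Definition Segment body."""
        n_windows = body[0]
        windows, pos = [], 1
        for _ in range(n_windows):
            win_id, x, y, w, h = struct.unpack_from(">BHHHH", body, pos)
            pos += 9
            windows.append((win_id, x, y, w, h))
        return windows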

Palette Definition Segment

This segment is used to define a palette for color conversion. It’s composed of a Palette ID and a Palette Version, followed by one entry per color: Palette Entry ID, Luminance (Y), Color Difference Red (Cr), Color Difference Blue (Cb), and Transparency (Alpha).

There can be several palette entries, each with a different Palette Entry ID, so the last five fields can repeat.
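
A Python sketch of a PDS parser. Note that the colors are stored as luminance and color difference values (YCrCb) plus alpha, not RGB, so a color-space conversion is still needed before display:

    def parse_pds(body):
        """Parse a Palette Definition Segment body: Palette ID and
        Palette Version, then one five-byte entry per color."""
        palette_id, version = body[0], body[1]
        entries = {}
        for pos in range(2, len(body), 5):
            entry_id, y, cr, cb, alpha = struct.unpack_from(">BBBBB", body, pos)
            entries[entry_id] = (y, cr, cb, alpha)
        return palette_id, version, entries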

Object Definition Segment

This segment defines the graphics object. These are images with rendered text on a transparent background. Its fields are: Object ID, Object Version Number, Last in Sequence Flag, Object Data Length, Width, Height, and the run-length-encoded Object Data.

The run-length encoding method is defined in the US patent US 7912305 B1.
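
Here’s a quick and dirty decoder sketch in Python. The code words in the docstring follow the scheme as commonly documented in open-source reimplementations; they are not quoted from the patent itself.

    def decode_rle(data):
        """Decode run-length-encoded object data into rows of palette
        indices (color numbers). Code words:

          CC            one pixel of color CC (CC > 0)
          00 00         end of line
          00 LL         LL pixels of color 0             (LL < 0x40)
          00 4L LL      (L << 8 | LL) pixels of color 0
          00 8L CC      L pixels of color CC             (L < 0x40)
          00 CL LL CC   (L << 8 | LL) pixels of color CC
        """
        rows, row, i = [], [], 0
        while i < len(data):
            b = data[i]; i += 1
            if b != 0:                       # single pixel of color b
                row.append(b)
                continue
            flag = data[i]; i += 1
            if flag == 0:                    # end of line
                rows.append(row); row = []
                continue
            length = flag & 0x3F
            if flag & 0x40:                  # extended (14-bit) run length
                length = (length << 8) | data[i]; i += 1
            color = 0
            if flag & 0x80:                  # explicit color follows
                color = data[i]; i += 1
            row.extend([color] * length)
        return rows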

End Segment

The end segment always has a segment size of zero and indicates the end of a Display Set (DS) definition. It appears immediately after the last ODS in one DS.

Let’s see a real-world example taken from a section of a SUP file.

This is a complete Display Set. These are the segments:

  • Magic Number: “PG” (0x5047)
  • Presentation Time: 17:11.822 (92,863,980 / 90)
  • Decoding Time: 0
  • Segment Type:  PCS (0x16)
  • Segment Size: 0x13 bytes
  • Width: 1920 (0x780)
  • Height: 1080 (0x438)
  • Frame rate: 0x10
  • Composition Number: 430 (0x1ae)
  • Composition State: Epoch Start (0x80)
  • Palette Update Flag: false
  • Palette ID: 0
  • Number of Composition Objects: 1
  • Object ID: 0
  • Window ID: 0
  • Object Cropped Flag: false
  • Object Horizontal Position: 773 (0x305)
  • Object Vertical Position: 108 (0x06c)
  • Segment Type: WDS (0x17)
  • Segment Size: 0x13 bytes
  • Number of Windows: 2
  • Window ID: 0
  • Window Horizontal Position: 773 (0x305)
  • Window Vertical Position: 108 (0x06c)
  • Window Width: 377 (0x179)
  • Window Height: 43 (0x02b)
  • Window ID: 1
  • Window Horizontal Position: 739 (0x2e3)
  • Window Vertical Position: 928 (0x3a0)
  • Window Width: 472 (0x1d8)
  • Segment Type:  PDS  (0x14)
  • Segment Size: 0x9d bytes
  • Palette Version: 0
  • 31 palette entries
  • Segment Type: ODS (0x15)
  • Segment Size: 0x21c2 bytes
  • Object Version Number: 0
  • Last in sequence flag: First and last sequence (0xC0)
  • Object Data Length: 0x0021bb bytes
  • Width: 377 (0x179)
  • Height: 43 (0x02b)
  • Segment Type:  END  (0x80)
  • Segment Size: 0 bytes

This Display Set will show an image of 377×43 in size, starting at 17 minutes, 11.822 seconds, positioned on the screen at offset (773, 108).
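
To tie this together, here is a rough Python loop that walks a whole .sup file with the helpers sketched earlier (the helper names come from those sketches, not from any official API):

    SEGMENT_NAMES = {0x14: "PDS", 0x15: "ODS", 0x16: "PCS", 0x17: "WDS", 0x80: "END"}

    def dump_sup(path):
        with open(path, "rb") as f:
            buf = f.read()
        pos = 0
        while pos < len(buf):
            pts, dts, seg_type, size, pos = parse_segment_header(buf, pos)
            name = SEGMENT_NAMES.get(seg_type, "???")
            print("%10d ms  %s  %5d bytes" % (pts_to_ms(pts), name, size))
            pos += size                      # skip (or parse) the segment body

    # dump_sup("movie.eng.sup")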


4 thoughts to “Presentation Graphic Stream (SUP files) Blu-ray Subtitle Format”

I am currently working on a script for changing the size of subtitles after cropping a video in HandBrake. This article has been very useful; however, I think you forgot to include the “Number of Windows” byte of the Window Definition Segment, and that the window fields repeat that many times. I had to look through the source code of BDSup2Sub before I realized this, and only then did I understand how your WDS could have 0x13 in size, which didn’t add up with the 9 bytes specified in the WDS segment.

You’re absolutely right! Even in the example you can see that there are two windows. I’ll update the post with that information. The problem is that the patent document doesn’t specify it clearly.

Useful note: PGS supports up to 64 presentation objects in one epoch, but only 2 objects can be shown simultaneously, i.e. only 2 objects can appear in one Presentation Composition Segment.

It should be noted that for the ODS packets, the value of “Object Data Length” also includes the four bytes of the “Width” and “Height” fields that immediately follow it. Therefore, the correct length of “Object Data” is actually the value of “Object Data Length” minus four bytes.


Subtitle File Formats: A Comprehensive Overview

Subtitle file formats are essential for enhancing the accessibility and understanding of video content. Different formats offer unique features and compatibility options. In this article, we will provide a comprehensive overview of popular subtitle file types and discuss their characteristics.

Frame from the film Lara - the gift of languages

In the dynamic world of video and media, subtitle file formats are essential for ensuring accessibility and improving viewer experiences. With a wide range of formats to choose from, each designed for particular needs, this article explores some of the most popular subtitle formats. From the straightforward SRT to the more complex capabilities of TTML, these formats meet various needs, including compatibility with different players, support for detailed formatting, and synchronization accuracy. Understanding these formats is crucial for anyone aiming to master the complexities of contemporary media consumption and creation.

There are two main types of formats:

Text-Based Formats : These formats, like SRT or SSA, store subtitles as plain text that you can read and even edit easily with a regular text editor. They’re commonly used in online videos and offline players, making them super accessible.

Binary Formats : Binary formats, such as IDX/SUB and STL, are a bit trickier. They store subtitles in a way that’s not human-readable without special software. You’ll find these in DVDs and Blu-ray discs, making them perfect for your movie nights.

Understanding these two categories is your ticket to mastering subtitles in today’s digital age, whether you’re a content creator or just a movie enthusiast.

Let’s explore some of the most widely used text-based subtitle formats.

Text-based subtitle formats

These text-based subtitle formats cater to various needs, from basic captioning to advanced styling and platform compatibility. Understanding the options available empowers content creators and viewers alike to make the most of subtitles in the digital age.

SubRipper (.srt)

The SubRipper format, commonly known as SRT, is widely supported by subtitle converters and players. It features a concise and easily understandable structure. When opened with a text editor, an SRT file displays the time when the text appears and the corresponding subtitles. This format is widely compatible and can be edited without difficulty.

The SRT Structure

Let’s take a closer look at how SRT works. Imagine a subtitle as a tiny snippet of text that appears on your screen when you’re watching a movie or video. SRT organizes these snippets neatly.

Each snippet in an SRT file has three main parts:

The Sequence Number: Think of it as a tag that says which subtitle comes first, second, third, and so on.

Timing Information: This part tells your video player when to display the subtitle and when to remove it. It’s like a stopwatch for your subtitles.

The Text: This is the actual subtitle, the words you see on the screen. It can be anything from spoken dialogue to captions.

An Example of SRT in Action:
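
Here is a small, made-up SRT file with two snippets:

    1
    00:00:01,000 --> 00:00:04,000
    Hello! Welcome to our video.

    2
    00:00:04,500 --> 00:00:07,000
    Today we're learning about subtitles.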

In this SRT example, each snippet has a sequence number, timing information, and text. When you play the video, your player reads the SRT file and displays each snippet at the right time.

The Pros of SRT Subtitles

Now that we’ve introduced you to SRT (SubRip Text) subtitles, let’s dive deeper into what makes them truly remarkable. SRT subtitles have captured the hearts of content creators and viewers alike for several compelling reasons:

Universal Compatibility : SRT subtitles are like the friendly neighborhood superhero of the subtitle world. They work seamlessly with a wide range of video players and platforms. Whether you’re watching a video on your favorite streaming service or a personal project on your computer, chances are, SRT has got you covered.

Human-Friendly Editing : Have you ever wanted to make a quick change to a subtitle? With SRT, you don’t need to be a tech wizard. SRT files are plain text, which means you can edit them with a regular text editor. It’s as simple as updating a document on your computer. No need for complicated software or special skills.

Flexibility at Your Fingertips : SRT subtitles offer a level of flexibility that’s hard to beat. You have precise control over when each subtitle appears and disappears in your video. This precision ensures that your subtitles sync perfectly with the spoken dialogue or action on screen. It’s like having a personal conductor for your subtitles, making sure they harmonize with your video’s rhythm.

Accessible to All : One of the most beautiful things about SRT subtitles is their accessibility. They bridge language barriers, making it possible for people from different parts of the world to enjoy your content. Whether you’re sharing a heartwarming story, a tutorial, or a funny moment, SRT subtitles open the doors to a global audience.

User-Friendly for Everyone : Whether you’re a seasoned content creator or just someone who loves watching videos, SRT subtitles make the experience more enjoyable. They’re there to enhance your understanding, add context, and make sure you never miss a moment.

In a world where video content knows no boundaries, SRT subtitles are your trusted companions. They ensure that your message reaches far and wide, transcending languages and cultures. So, the next time you see those neat lines of text at the bottom of your screen, know that it’s not just text—it’s the magic of SRT subtitles, making your video experience exceptional.

That’s why SRT is one of the formats fully supported by Matesub for export. So, the next time you use Matesub to create or edit subtitles, rest assured that you can export them in the user-friendly and widely compatible SRT format, ready to enhance your videos and reach a global audience.

MicroDVD (.sub)

SUB subtitles are like the reliable Swiss Army knife of subtitles. They are named after MicroDVD, the software that popularized this format. What sets SUB apart is its straightforwardness and effectiveness.

The SUB Structure

To truly appreciate SUB subtitles, it’s essential to understand their structure. Imagine each subtitle as a small puzzle piece that fits perfectly into your video. SUB organizes these pieces neatly.

Each subtitle in a SUB file consists of three key elements:

The Start Frame: This tells your video player the exact video frame on which the subtitle should appear.

The End Frame: Together with the start frame, this dictates when each subtitle begins and ends, like a conductor directing its entrance and exit. Because timing is expressed in frames rather than timestamps, the actual display times depend on the video’s frame rate.

The Text: This is the heart of the subtitle—the actual words that appear on the screen. It can be dialogues, captions, or translations.

An Example of SUB in Action:
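
Here is a small, made-up SUB file. Assuming a 25 fps video, frame 25 is one second in; a pipe character starts a second line:

    {25}{100}Hello! Welcome to our video.
    {113}{175}Today we're learning about subtitles.|They can even span two lines.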

In this example, each subtitle consists of three parts: the start frame, the end frame, and the text, with the frame numbers in curly braces. When you play the video, your player reads the SUB file and showcases each subtitle at the precise moment.

The Pros of SUB Subtitles

Having familiarized ourselves with the SUB format, let’s now explore its key advantages and why it could be a valuable choice for subtitles:

Precise Timing: SUB subtitles shine in the realm of synchronization. Their timing information ensures that subtitles align precisely with spoken dialogue and on-screen actions. It’s like having a conductor orchestrating each subtitle’s entrance and exit, delivering a seamless viewing experience.

Efficiency and Compactness: SUB files are renowned for their efficiency. They are compact, making them an ideal choice when storage space is a concern. Despite their small size, SUB subtitles pack a punch in delivering clear and effective communication.

Universal Appeal: SUB subtitles enjoy widespread support across various video players and platforms. This universal compatibility makes them versatile and accessible to a broad and diverse audience, transcending language and geographical boundaries.

WebVTT (.vtt)

The WebVTT format, often referred to as VTT, holds a prominent place in the world of subtitles. It’s recognized as a W3C standard, making it a dependable choice for web-based content. VTT subtitles are prized for their simplicity and compatibility, ensuring accessibility across various platforms.

Here are the key features of WebVTT:

Simple Text-Based Format : WebVTT is a plain text format, making it easy to create and edit using basic text editors. It uses a straightforward structure, making it human-readable and editable without specialized software.

Timestamps : WebVTT supports precise timing of captions and subtitles. Each caption or subtitle line is associated with a specific timestamp in hours, minutes, seconds, and milliseconds (HH:MM:SS.sss).

Cue Settings : You can specify various settings for individual cues (captions or subtitles) using the “::cue” selector. These settings include text styling (font, color, background), positioning, and voice differentiation.

Cue Styles : WebVTT allows you to define global styles or styles specific to certain cues. This enables you to customize the appearance of captions and subtitles to match your design or to distinguish between different speakers or voices.

Support for Multiple Languages : WebVTT supports multiple language tracks within a single file. You can include captions or subtitles in different languages, and viewers can select their preferred language if the video player supports it.

Line Breaks and Positioning : You can control line breaks in subtitles, ensuring that text is displayed in a readable manner. Additionally, you can specify the positioning of captions on the video screen.

Comments : You can include comments within a WebVTT file by using lines that begin with “NOTE.” These comments are ignored by the video player and can be used for documentation or annotations.

Compatibility : WebVTT is widely supported by HTML5 video players and web browsers, making it a reliable choice for adding captions and subtitles to web-based video content.

Accessibility : WebVTT supports accessibility features, allowing you to provide text descriptions for audio content (audio descriptions) and textual representations of non-speech sounds (sound descriptions) to ensure accessibility for users with disabilities.

Error Handling : WebVTT includes error-checking mechanisms to help identify and resolve formatting issues in the file, making it more robust.

Extensibility : While WebVTT provides a standardized format, it also allows for extensions and custom cues, which can be helpful in specific use cases.

Overall, WebVTT is a versatile and user-friendly format for adding captions, subtitles, and other text tracks to web videos, making them more inclusive and accessible to a wide range of viewers.

The WebVTT Structure

The WebVTT structure is relatively straightforward and consists of key components:

Header : The header section of a WebVTT file contains metadata and settings for the entire subtitle track. It begins with the keyword “WEBVTT” on the first line, which indicates that the file is using the WebVTT format. The header may also include settings for styling, positioning, and language preferences.

Style Block : The style block is an optional section within the header where you can define styles for captions and subtitles. These styles include font properties, colors, background, text shadow, and more. You can use the “::cue” selector to apply these styles globally to all cues or use specific selectors to target cues with specific attributes (e.g., speakers).

Cues : Cues are the individual subtitle or caption segments in a WebVTT file. Each cue starts with a timestamp indicating the cue’s start and end times, followed by the text content of the cue. Cues are separated by empty lines.

Comments : You can include comments within a WebVTT file to provide annotations or additional information. Comments start with the keyword “NOTE” followed by the comment text. These comments are typically ignored by video players.

Whitespace : WebVTT allows for some flexibility in terms of whitespace. You can have multiple spaces or tabs between elements, but line breaks are significant as they separate different components (e.g., header, cues).

Encoding : WebVTT files are typically encoded in UTF-8 to support various character sets and languages.

Language Support : WebVTT supports the inclusion of subtitles or captions in multiple languages within a single file, allowing viewers to select their preferred language if the video player supports it.

Accessibility Features : WebVTT also supports features for accessibility, including text descriptions for audio content (audio descriptions) and textual representations of non-speech sounds (sound descriptions).

In summary, a WebVTT file has a simple structure that includes a header with optional styling information, followed by individual cues that specify the timing and content of subtitles or captions. It is designed to be human-readable and easily editable with basic text editors, making it a popular choice for web-based video content.

Let’s consider a simple example that illustrates the use of WebVTT to style and structure subtitles or captions for a video, including the differentiation of speakers (Speaker1 and Speaker2) and specific styles applied to their text:
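
Here is a small, made-up WebVTT file consistent with the styling rules discussed below:

    WEBVTT

    STYLE
    ::cue {
      background-color: rgba(0, 0, 0, 0.56);
      color: #fff;
      font-family: Arial;
      font-weight: bold;
      text-shadow: 1px 1px 2px #000;
    }
    ::cue(v[voice="Speaker1"]) { color: #00FF00; }
    ::cue(v[voice="Speaker2"]) { color: #0000FF; }

    00:00:01.000 --> 00:00:04.000
    <v Speaker1>Hello! How are you today?</v>

    00:00:04.500 --> 00:00:07.000
    <v Speaker2>I'm doing great, thanks for asking!</v>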

Let’s have a look at the STYLE section (where you define the styling rules for subtitles or captions):

  • ::cue applies the specified styles to all cues (subtitles) by default.
  • In this example, all cues have a semi-transparent black background ( rgba(0, 0, 0, 0.56) ), white text color ( #fff ), a bold Arial font ( font-family: Arial; font-weight: bold ), and a text shadow for readability.
  • Additionally, cues with the attribute voice="Speaker1" have green text color ( color: #00FF00; ), and cues with voice="Speaker2" have blue text color ( color: #0000FF; ).

Advanced SubStation Alpha (.ass) and SubStation Alpha (.ssa)

The Advanced SubStation Alpha (.ass) and SubStation Alpha (.ssa) subtitle formats are powerful and feature-rich options for adding subtitles and captions to video content. These formats are favored by experienced subtitlers and offer a wide range of capabilities. Both .ass and .ssa formats share similarities, and .ass is considered an enhanced version of .ssa. Here, we’ll explore the structure and advantages of these formats.

The .ass and .ssa Structure

Both .ass and .ssa formats follow a similar structure that includes key components:

Script Info : This section contains metadata and settings for the subtitle file, including the title, author, and various configuration options. It may specify the video resolution, default font, and more.

V4 Styles : .ass and .ssa formats offer extensive styling options. The V4 Styles section defines how subtitles are displayed, including font properties (typeface, size, color, bold, italic, underline), positioning, alignment, and more.

Events : The Events section contains the main body of the subtitles. Each subtitle event includes timing information (start and end times), layering information (important for complex formatting), and the actual text content.

An Example of .ass/.ssa in Action:
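
Here is a small, made-up .ass file. The Format lines are trimmed to a few fields for readability; real files usually list more (margins, effects, and so on):

    [Script Info]
    Title: Example subtitles
    ScriptType: v4.00+
    PlayResX: 1280
    PlayResY: 720

    [V4+ Styles]
    Format: Name, Fontname, Fontsize, PrimaryColour, Bold, Italic, Alignment
    Style: Default,Arial,48,&H00FFFFFF,0,0,2

    [Events]
    Format: Layer, Start, End, Style, Text
    Dialogue: 0,0:00:01.00,0:00:04.00,Default,Hello! Welcome to our video.
    Dialogue: 0,0:00:04.50,0:00:07.00,Default,{\i1}Today we're learning about subtitles.{\i0}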

In this example, the .ass/.ssa format includes metadata in the Script Info section, styling information in the V4 Styles section, and the actual subtitle events in the Events section. Each event specifies timing, styling, and the text content to be displayed.

The Pros of .ass/.ssa Subtitles

1. Advanced Styling : .ass/.ssa formats provide extensive control over subtitle styling. You can specify fonts, colors, sizes, bold, italic, underline, and more for precise subtitle appearance.

2. Complex Formatting : These formats support complex formatting options, such as multiple text layers, rotation, and advanced positioning. This flexibility is particularly useful for typesetting and stylized subtitles.

3. Rich Metadata : The Script Info section allows you to include detailed metadata about the subtitle file, enhancing its documentation and organization.

4. Script Type : .ass/.ssa formats support advanced script types, making them suitable for various applications, including karaoke, complex animations, and typesetting.

5. Versatility : .ass/.ssa subtitles work well with multimedia players that support them, making them suitable for a wide range of video content.

6. Multilingual Support : These formats can handle subtitles in multiple languages within a single file, making them versatile for international audiences.

7. Precise Timing : .ass/.ssa formats enable precise control over subtitle timing, ensuring synchronization with video dialogues and actions.

8. Editability : While they are more complex than some other formats, .ass/.ssa files are still human-readable and can be edited using text editors, providing flexibility for subtitlers and content creators.

9. Community and Tool Support : There is an active community of subtitlers and tools like Aegisub that facilitate the creation and editing of .ass/.ssa subtitles.

10. Compatibility : .ass/.ssa formats are supported by various media players and multimedia applications, making them suitable for both professional and amateur subtitlers.

In conclusion, the .ass and .ssa subtitle formats offer powerful styling, formatting, and timing capabilities, making them a preferred choice for subtitlers who require advanced features and precise control over subtitle appearance. These formats are widely compatible and versatile for a range of multimedia content.

Timed Text Markup Language (TTML)

Timed Text Markup Language (TTML) is a comprehensive and standardized format for representing subtitles, captions, and other timed text in multimedia content. TTML is widely used in broadcasting, streaming, and web-based video services. It offers a rich set of features for creating accessible and styled text tracks. Here, we’ll explore the structure and advantages of TTML for subtitles and captions.

The TTML Structure

TTML documents consist of XML markup that represents timed text content. The structure of a TTML document typically includes the following components:

Head : The head section contains metadata and styling information for the entire TTML document. It may include details such as the language of the subtitles, styling preferences, and metadata about the content.

Body : The body section contains the main content of the subtitles or captions. It includes individual text elements that are associated with specific timing cues.

Styling : TTML allows for detailed styling of subtitles, including font family, size, color, background, positioning, and more. Styling is typically defined within the head section and can be applied to individual text elements within the body.

Timing Information : Each text element within the body of the TTML document is associated with timing information, specifying when the text should be displayed and when it should disappear.

An Example of TTML in Action:
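
Here is a small, made-up TTML document:

    <tt xmlns="http://www.w3.org/ns/ttml"
        xmlns:tts="http://www.w3.org/ns/ttml#styling" xml:lang="en">
      <head>
        <styling>
          <style xml:id="s1" tts:fontFamily="Arial" tts:fontSize="100%"
                 tts:color="white" tts:backgroundColor="black"/>
        </styling>
      </head>
      <body>
        <div begin="00:00:01.000" end="00:00:04.000" style="s1">
          <p>Hello! Welcome to our video.</p>
        </div>
        <div begin="00:00:04.500" end="00:00:07.000" style="s1">
          <p>Today we're learning about subtitles.</p>
        </div>
      </body>
    </tt>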

In this TTML example:

  • The <tt> element defines the TTML document.
  • The <head> section contains metadata and styling information.
  • The <body> section contains the main content, with each subtitle enclosed in a <div> element.
  • Timing information (begin and end attributes) specifies when each subtitle is displayed.
  • Styling information is defined in the <styling> section and applied to each <div> element.

The Pros of TTML Subtitles

1. Standardization : TTML is an industry-standard format with well-defined specifications, making it suitable for professional and broadcast applications.

2. Rich Styling : TTML supports extensive styling options, allowing for precise control over the appearance of subtitles, including font properties, colors, backgrounds, and positioning.

3. Internationalization : TTML provides excellent support for multilingual content and ensures accurate text rendering for various languages and scripts.

4. Accessibility : TTML supports accessibility features, such as text descriptions, ensuring that content is accessible to individuals with disabilities.

5. Timing Precision : TTML allows for precise timing control, ensuring synchronization with video dialogues and actions.

6. Versatility : TTML can be used in various scenarios, including broadcasting, streaming services, and web-based multimedia content.

7. Compatibility : TTML is supported by a wide range of multimedia players and platforms, making it a reliable choice for content distribution.

8. Community and Tool Support : There are numerous authoring tools and software applications available for creating and editing TTML subtitles, facilitating content production.

9. XML Format : Being based on XML, TTML documents are structured and machine-readable, enabling automated processing and integration with other systems.

10. Global Adoption : TTML is widely adopted by broadcasters and streaming platforms worldwide, making it a global standard for timed text representation.

In summary, TTML is a robust and versatile format for creating subtitles and captions in multimedia content. Its standardization, rich styling options, internationalization support, and accessibility features make it a preferred choice for professional and accessible video content distribution.

SAMI Format (.smi)

The SAMI (Synchronized Accessible Media Interchange) format, often referred to as SMI, is a versatile format for creating subtitles and captions in multimedia content. It is widely supported and offers a structured and easy-to-understand framework for displaying text alongside video or audio content. When viewed in a text editor, an SMI file displays the time when text appears and corresponding subtitles, similar to the SRT format.

The SMI Structure

Let’s take a closer look at how SMI works. Think of a subtitle as a brief snippet of text that appears on your screen while you’re watching a video or listening to audio. SMI organizes these snippets in a clear and structured manner.

Each snippet in an SMI file typically consists of three main parts:

Sync Blocks : Instead of SRT’s sequence numbers, each snippet is wrapped in a <SYNC> element, and the order of these blocks determines the order in which the subtitles appear.

Timing Information : This section specifies when the subtitle should be displayed and when it should disappear. It serves as a time reference for your media player, ensuring precise synchronization.

Text Content : The text content is the actual subtitle or caption. It represents the spoken dialogue, captions, or any relevant text that accompanies the media.

Here’s an example of SMI in action:
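
This is a small, made-up SAMI file with two captions and a final sync point that clears the screen:

    <SAMI>
    <HEAD>
      <TITLE>Example subtitles</TITLE>
      <STYLE TYPE="text/css">
      <!--
        P { font-family: Arial; text-align: center; }
        .ENUSCC { Name: English; lang: en-US; }
      -->
      </STYLE>
    </HEAD>
    <BODY>
      <SYNC Start=1000>
        <P Class=ENUSCC><FONT Color="#FFFF00">Hello! Welcome to our video.</FONT></P>
      </SYNC>
      <SYNC Start=4000>
        <P Class=ENUSCC><FONT Color="#FFFF00">Today we're learning about subtitles.</FONT></P>
      </SYNC>
      <SYNC Start=7000>
        <P Class=ENUSCC>&nbsp;</P>
      </SYNC>
    </BODY>
    </SAMI>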

In this SMI example, each snippet is enclosed within <SYNC> tags, specifying the timing for when the text should be displayed. The text content is styled with font color attributes to enhance readability.

The Pros of SMI Subtitles

1. Versatile Format : SMI is a versatile format suitable for both video and audio content, making it a valuable tool for multimedia creators.

2. Standard Structure : Similar to SRT, SMI uses a standardized structure that is easy to understand and edit, even with basic text editors.

3. Precise Timing : SMI provides precise timing control, ensuring that subtitles are displayed and removed at the right moments, enhancing the viewer’s experience.

4. Accessibility : SMI supports accessibility features, making it suitable for creating content that is inclusive and accessible to individuals with disabilities.

5. Styling Options : Like the example, you can style SMI subtitles using HTML and CSS attributes, allowing for customization and improved visual appeal.

6. Global Compatibility : SMI is supported by various media players and platforms, ensuring compatibility with a wide range of devices and applications.

7. Multilingual Support : SMI can accommodate subtitles and captions in multiple languages within a single file, making it ideal for reaching a diverse audience.

8. User-Friendly Editing : SMI files are human-readable and can be edited with standard text editors, simplifying the editing process for content creators.

9. Educational Applications : SMI is commonly used in educational contexts to provide transcripts, translations, and additional context for audio and video materials.

In summary, the SAMI format (SMI) offers a structured and accessible way to display subtitles and captions alongside multimedia content. Its versatility, standardized structure, and compatibility make it a valuable choice for multimedia creators aiming to enhance the accessibility and comprehension of their videos and audio clips.

Binary subtitles formats

While text-based formats like SRT and WebVTT are popular for their simplicity and universality, there exists an intriguing world of binary subtitle formats. These formats offer a distinct approach. In this chapter, we’ll explore the world of binary subtitle formats, examining their unique features and applications in the realm of video content.

STL (Spruce Subtitle File)

STL (Spruce Subtitle File) is a widely used binary subtitle format primarily employed in the broadcasting and video production industries. STL files contain graphical representations of subtitles in the form of bitmaps, allowing for precise styling, positioning, and timing. Here’s more information about STL:

1. Bitmap-Based Subtitles: STL subtitles are image-based, which means that each subtitle is represented as a bitmap image. These bitmap images can include text, symbols, and other graphical elements that make up the subtitles.

2. Precise Timing: STL files include timing information that specifies when each subtitle should appear and disappear during video playback. This timing information is crucial for synchronizing subtitles with the corresponding video or audio content.

3. Positioning and Styling: STL subtitles offer flexibility in terms of positioning and styling. Subtitle images can be placed at specific locations on the screen, and various font styles, colors, and sizes can be applied to enhance readability and visual appeal.

4. Compatibility: STL is a widely recognized and supported format in the broadcasting industry. It is compatible with professional broadcast equipment and video editing software used by broadcasters and post-production professionals.

5. Standardization: STL adheres to specific standards, making it suitable for television broadcast and professional video production. The format ensures consistent rendering of subtitles across different broadcasting systems.

6. Multilingual Support: STL supports multiple languages and character sets, making it suitable for international broadcasting and providing subtitles in various languages.

7. Editing: While STL files are primarily used for broadcast purposes, they can be edited using specialized software to adjust timing, positioning, and styling of subtitles when necessary.

8. Subtitle Placement: STL subtitles can be positioned at different locations on the screen, such as the top, bottom, or sides, depending on the requirements of the content and broadcasting standards.

9. Legacy Format: STL has been in use for many years and is considered a legacy format in the broadcasting industry. However, it continues to be a reliable choice for adding subtitles to television programs, documentaries, and other broadcasted content.

10. European Variant (EBU STL): The European Broadcasting Union (EBU) has its variant of STL, known as EBU STL. EBU STL adheres to European broadcasting standards and includes support for international character sets and symbols commonly used in European languages.

STL subtitles are essential for professional video production and broadcasting, ensuring that subtitles are accurately timed, visually appealing, and compliant with industry standards. While text-based subtitle formats are more common for consumer video content, STL remains a trusted format for delivering high-quality subtitles in the broadcast industry, where precision and consistency are paramount.

PAC (Presentation Audio/Video Coding)

PAC (Presentation Audio/Video Coding) is a binary subtitle format primarily used for storing and displaying subtitles in the context of DVDs (Digital Versatile Discs) and DVD-Video. PAC files contain graphical representations of subtitles in the form of bitmap images and are used to provide high-quality, visually appealing subtitles for DVD content. Here’s more information about PAC:

1. Bitmap-Based Subtitles: PAC subtitles are image-based, meaning that each subtitle is represented as a bitmap image. These bitmap images can include text, symbols, and other graphical elements that make up the subtitles.

2. High Quality: One of the key features of PAC is its ability to provide high-quality and visually appealing subtitles. The use of bitmap images allows for detailed and well-styled subtitles, including various font styles, sizes, and colors.

3. Precise Timing: PAC files include timing information that specifies when each subtitle should appear and disappear during DVD playback. This timing information ensures that subtitles are synchronized with the corresponding video or audio content.

4. Versatility: PAC subtitles are versatile and can support subtitles in multiple languages, making them suitable for international DVDs. Different language tracks can be included, allowing viewers to select their preferred language.

5. DVD Compatibility: PAC is commonly used in the context of DVDs, including DVD movies, TV series, and other video content. It is recognized by DVD players and is part of the DVD-Video standard.

6. Editing: While PAC files are primarily used for DVD production, they can be edited using specialized software to adjust timing, positioning, and styling of subtitles when necessary. This is particularly important for DVD authoring and post-production.

7. Overlaying Capability: PAC subtitles can be overlaid on the video content, ensuring that they appear seamlessly during DVD playback. This overlaying capability contributes to the overall viewing experience.

8. Advanced Styling: PAC allows for advanced styling options, including the use of different fonts, font sizes, colors, and special effects. This flexibility enables DVD producers to create subtitles that match the visual style of the content.

9. Legacy Format: PAC is considered a legacy format, primarily used for DVDs and DVD-Video. While newer formats may offer more features and flexibility, PAC continues to be used in the DVD industry.

PAC subtitles play a crucial role in enhancing the viewer’s experience when watching DVDs, ensuring that subtitles are not only informative but also visually engaging. While this format is less common in modern streaming platforms and digital media, it remains essential for DVD production, where high-quality subtitles are a key component of the content.

Other binary formats

Aside from STL and PAC, there are several other binary subtitle formats that you may consider for various purposes. These formats may vary in terms of importance or adoption depending on your specific requirements and industry. Here is a list of some binary subtitle formats in approximate order of their importance and adoption:

CineCanvas : Specifically designed for digital cinema, CineCanvas provides high-quality, image-based subtitles for cinematic presentations. It’s crucial in the film industry.

DVB-Subtitles : Used in digital television broadcasting, including both image-based and text-based subtitle options. DVB subtitles are prevalent in Europe and other regions.

IMX Subtitles : Supported in the IMX professional video format, used in broadcast and post-production environments. It allows for image-based subtitles.

OP-47 and OP-42 : Formats for encoding and decoding bitmap subtitles within MPEG-2 (OP-47) and MPEG-4 (OP-42) video streams, commonly used in digital broadcasting.

HDMV PGS (Presentation Graphic Stream) : A format used in Blu-ray Discs, providing graphic-based subtitles. Important for authoring Blu-ray content.

Teletext : A binary subtitle format used for analog and digital television broadcasting, commonly found in some regions.

ATSC A/53 Part 4 Captions : Used for closed captions in ATSC digital television broadcasts in North America.

CPCM : Used for bitmap-based captions in digital television broadcasting, with various regional implementations.

XDS (Extended Data Services) : Provides a way to deliver additional data, including captions and subtitles, in the context of television broadcasting.

The importance and adoption of these formats can vary widely based on geographic location, industry, and the specific use case. For example, STL and PAC are crucial in broadcasting and DVD production, while formats like CineCanvas are vital in digital cinema. It’s essential to consider your specific needs and the standards prevalent in your region or industry when choosing a binary subtitle format.




Presentation Graphic Stream Subtitle Format

The Blu-ray format (m2ts, often remuxed into MKV) allows for two types of on-screen overlays that can be used for subtitles. One is text-based, but so far I’ve seen no Blu-ray using it for subs. The other is PGS (Presentation Graphic Stream), which consists of bitmaps (and the timeframes during which they have to be displayed). That second stream is by far the most commonly used on Blu-ray discs, so it’s not surprising that you see a lot of MakeMKV Blu-ray rips with PGS subtitles. As we’ll see below, tools exist to extract that stream to .sup files. It’s not the same format as the .sup files that some tools extract from DVDs.

Note that HD DVD also has a .sup format, which is slightly different from the Blu-ray one. As long as the PCH is not (yet) able to display PGS, the only way to get subs for MKV, TS, or M2TS material is to use a side text file (.srt) containing the subs. In the following sections, you will learn what a PGS subtitle is, how it differs from SRT, and what to do when PGS subtitles are not showing.

Table of Contents

  • Part 1. What Is a PGS Subtitle? What’s the Difference between SRT and PGS?
  • Part 2. Can I Extract PGS Subtitles from MKV for Editing?
  • Part 3. What to Do When HDMV PGS Subtitles Are Not Showing?
  • Part 4. FAQ about PGS Subtitles

Part 1. What Is a PGS Subtitle? What’s the Difference between SRT and PGS?

From time to time, you may notice that most Blu-ray movies carry PGS subtitles. So what’s the difference between PGS subtitles and SRT subtitles? Frankly, SRT is a plain-text format with the timing and the lines of text inside, which you can modify even with WordPad. Compared with SRT, PGS subtitles usually have a lot of colors, styles, etc., especially for karaoke in Disney movies, and they cannot be modified easily. The size of PGS subtitles is therefore much bigger than that of SRT.

PGS subtitles are graphics-based, which makes them extremely difficult to remove or edit once they are embedded in the video stream. Extracting or changing them requires an OCR program.

Part 2. Can I Extract PGS Subtitles from MKV for Editing?

Have you ever met with problems in Plex when you try to stream 4K MakeMKV rips with PGS subtitles? Typically, the original source video is transcoded down to 1080p instead of being direct played. But if the PGS subtitles are turned off, playback reverts to the original 4K quality.

This is because the Plex transcoding issue happens whenever you are using an image-based subtitle format like PGS. So if you want to play the original 4K on Plex without the resolution being downscaled, you’d better make sure the 4K HDR files have SRT subtitles rather than PGS subtitles, which Plex cannot play directly. To fix the error, you can use Subtitle Edit to convert PGS to SRT. It uses OCR to convert the PGS subtitles to SRT, so Plex won’t transcode. Actually, the PGS-subtitles-not-showing error also occurs when you are playing the video on Samsung or other TVs. You need to convert PGS subtitles to SRT with tools like Subtitle Edit. You can follow the steps below:

  • Go to https://www.nikse.dk/subtitleedit, download the app, then install and launch it.
  • Click File and import the subtitle from the MKV file. Or you can drag and drop the file into the program.
  • Select the video file with PGS subtitles and open it by clicking Open, after which the OCR window pops up.
  • Press Start OCR and let Subtitle Edit transcribe it. Fix any incorrect transcriptions if necessary, in the list at the left side of the OCR button.
  • Click OK. Then click Format, select SubRip (.srt), and set Encoding to Unicode (UTF-8). Click File > Save as, customize the filename, and click Save. It will begin converting the PGS subtitles to SRT.

Note: OCR through Subtitle Edit is the DIY method, which is time-consuming and somewhat unreliable. Here is an alternative: you can download SRT subtitles from a site like subscene.com. If you don’t know where to go, here are some free sites to download subtitles for movies and TV shows. Then use MKVToolNix to remove the PGS subtitles and add the SRT files to the MKV file. It will not re-encode the files and runs fast.

Part 3. HDMV PGS Subtitles Not Showing? Hardcode Subtitles to MKV, MP4, or Other Files

Often, subtitles won’t show or appear when playing a video with PGS subtitles on a PC, TV, mobile device, or media player. This is also true when the names of the video file and the subtitle file are not the same. One way to fix the problem is to hardcode the subtitles into the video. Winxvideo AI is a free converter and editor that can help you hardcode subtitles into any video, be it 4K, MP4, or MKV, and therefore make it playable on your device.

How to Hardcode Subtitles to Video with Winxvideo AI?

  • Load the video into the program. Click the +Video button to import the MKV rip with subtitles.
  • Choose an output format if needed. For the best compatibility, you can choose MP4 (H.264).
  • Hardcode subtitles to the video. Click the Edit button on the main interface to activate the basic editing features. Go to Subtitles > Add subtitle files and hardcode or softcode a subtitle file to the video. If you don’t have a subtitle file, you can also click Search subtitle file to download your preferred subtitles beforehand.
  • Click the Browse button to choose where to save the subtitled video, and press the RUN button to begin hardcoding subtitles into the MKV or MP4.

Related: learn how to add subtitles to video in detail >>


Part 4. FAQ about PGS Subtitles

1. How to display PGS subtitles in Plex?

If your media contains embedded PGS subtitles, you have local subtitles. You can configure Local Media Assets by following the steps below: launch the Plex Web App, choose Settings from the top right of the home screen, select Plex Media Server from the horizontal list, choose Agents, select the library type and agent to change, then check Local Media Assets and move it to the top of the list. Plex supports some PGS subtitles, but it will transcode the video into burned-in subtitles for streaming.

2. Why does HandBrake burn in subtitles from a Blu-ray source?

Some people have noticed that HandBrake always burns in subtitles when creating an MP4 from a Blu-ray source. That is indeed the case. HandBrake offers two methods of subtitle output: hard burn and soft subtitles. When it comes to soft subtitles, HandBrake burns only one PGS subtitle track into MP4 video, as it doesn’t support PGS pass-through to MP4, while it passes through multiple PGS subtitle tracks with MKV. So if you are exporting video to MP4, the PGS subtitles are burned into the video automatically.




Portable Network Graphics (PNG) Specification (Third Edition)

W3C Candidate Recommendation Snapshot 21 September 2023

Copyright © 1996-2023 World Wide Web Consortium. W3C® liability, trademark and permissive document license rules apply.

This document describes PNG (Portable Network Graphics), an extensible file format for the lossless, portable, well-compressed storage of static and animated raster images. PNG provides a patent-free replacement for GIF and can also replace many common uses of TIFF. Indexed-colour, greyscale, and truecolour images are supported, plus an optional alpha channel. Sample depths range from 1 to 16 bits.

PNG is designed to work well in online viewing applications, such as the World Wide Web, so it is fully streamable with a progressive display option. PNG is robust, providing both full file integrity checking and simple detection of common transmission errors. Also, PNG can store colour space data for improved colour matching on heterogeneous platforms.

This specification defines two Internet Media Types, image/png and image/apng.


1. Introduction

The design goals for this specification were:

  • Portability: encoding, decoding, and transmission should be software and hardware platform independent.
  • Completeness: it should be possible to represent truecolour , indexed-colour , and greyscale images, in each case with the option of transparency, colour space information, and ancillary information such as textual comments.
  • Serial encode and decode: it should be possible for datastreams to be generated serially and read serially, allowing the datastream format to be used for on-the-fly generation and display of images across a serial communication channel.
  • Progressive presentation: it should be possible to transmit datastreams so that an approximation of the whole image can be presented initially, and progressively enhanced as the datastream is received.
  • Robustness to transmission errors: it should be possible to detect datastream transmission errors reliably.
  • Losslessness: filtering and compression should preserve all information.
  • Performance: any filtering, compression, and progressive image presentation should be aimed at efficient decoding and presentation. Fast encoding is a less important goal than fast decoding. Decoding speed may be achieved at the expense of encoding speed.
  • Compression: images should be compressed effectively, consistent with the other design goals.
  • Simplicity: developers should be able to implement the standard easily.
  • Interchangeability: any standard-conforming PNG decoder shall be capable of reading all conforming PNG datastreams.
  • Flexibility: future extensions and private additions should be allowed for without compromising the interchangeability of standard PNG datastreams.
  • Freedom from legal restrictions: no algorithms should be used that are not freely available.

This specification specifies a datastream and an associated file format, Portable Network Graphics (PNG, pronounced "ping"), for a lossless , portable, compressed individual computer graphics image or frame-based animation, transmitted across the Internet.

3. Terms, definitions, and abbreviated terms

For the purposes of this specification the following definitions apply.

chromaticity : a measure of the quality of a colour regardless of its luminance.

composite : the foreground image is said to be composited against the background.

deflate : the compression method used in PNG datastreams. SOURCE: [RFC1951]

frame buffer : software causes an image to appear on screen by loading the image into the frame buffer.

luminance : luminance and chromaticity together fully define a perceived colour. A formal definition of luminance is found at [COLORIMETRY].

Only RGB may be used in PNG; ICtCp is not supported.

PNG four-byte unsigned integer : a four-byte unsigned integer limited to the range 0 to 2^31 - 1. The restriction is imposed in order to accommodate languages that have difficulty with unsigned four-byte values.

Standard dynamic range is independent of the primaries and hence, gamut. Wide color gamut SDR formats are supported by PNG.

zlib : a deflate-style compression method. SOURCE: [RFC1950]. The name also refers to a library containing a sample implementation of this method.

CRC (Cyclic Redundancy Code) : a type of check value designed to detect most transmission errors. A decoder calculates the CRC for the received data and checks it by comparing it to the CRC calculated by the encoder and appended to the data. A mismatch indicates that the data or the CRC were corrupted in transit.
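
In PNG, the check value is a 32-bit CRC computed over each chunk’s type and data fields and stored immediately after the data. A minimal check in Python, using the standard-library zlib module (function and variable names here are illustrative):

    import zlib

    def chunk_crc_ok(chunk_type, chunk_data, stored_crc):
        # Recompute the CRC-32 over the chunk type and data fields and
        # compare it with the value stored after the data, as a decoder does.
        return zlib.crc32(chunk_type + chunk_data) & 0xFFFFFFFF == stored_crc

    # e.g. chunk_crc_ok(b"IHDR", ihdr_data, crc_read_from_file)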

4. Concepts

4.1 Static and animated images

All PNG images contain a single static image .

Some PNG images — called Animated PNG (APNG) — also contain a frame-based animation sequence, the animated image. The first frame of this may be — but need not be — the static image. Non-animation-capable displays (such as printers) will display the static image rather than the animation sequence.

The static image , and each individual frame of an animated image , corresponds to a reference image and is stored as a PNG image .

4.2 Images

This specification specifies the PNG datastream and places some requirements on PNG encoders, which generate PNG datastreams; PNG decoders, which interpret PNG datastreams; and PNG editors , which transform one PNG datastream into another. It does not specify the interface between an application and either a PNG encoder, decoder, or editor. The precise form in which an image is presented to an encoder or delivered by a decoder is not specified. Four kinds of image are distinguished: the reference image, the PNG image, the delivered image, and the displayed image.

The relationships between the four kinds of image are illustrated in Figure 1 .

The relationships between samples, channels, pixels, and sample depth are illustrated in Figure 2 .

4.3 Colour spaces

The RGB colour space in which colour samples are situated may be specified in one of four ways:

  • by CICP image format signaling metadata;
  • by an ICC profile;
  • by specifying explicitly that the colour space is sRGB when the samples conform to this colour space;
  • by specifying a gamma value and the 1931 CIE x,y chromaticities of the red, green, and blue primaries used in the image and the reference white point .

For high-end applications the first two methods provide the most flexibility and control. The third method enables one particular, but extremely common, colour space to be indicated. The fourth method, which was standardized before ICC profiles were widely adopted, enables the exact chromaticities of the RGB data to be specified, along with the gamma correction to be applied (see C. Gamma and chromaticity ). However, colour-aware applications will prefer one of the first three methods, while colour-unaware applications will typically ignore all four methods.

Gamma correction is not applied to the alpha channel, if present. Alpha samples always represent a linear fraction of full opacity.

Mastering metadata may also be provided.

4.4 Reference image to PNG image transformation

A number of transformations are applied to the reference image to create the PNG image to be encoded (see Figure 3 ). The transformations are applied in the following sequence, where square brackets mean the transformation is optional: [alpha separation], [indexing], [RGB merging], [alpha compaction], sample depth scaling (see 4.4.1 to 4.4.5).

When every pixel is either fully transparent or fully opaque, the alpha separation, alpha compaction, and indexing transformations can cause the recovered reference image to have an alpha sample depth different from the original reference image, or to have no alpha channel. This has no effect on the degree of opacity of any pixel. The two reference images are considered equivalent, and the transformations are considered lossless. Encoders that nevertheless wish to preserve the alpha sample depth may elect not to perform transformations that would alter the alpha sample depth.

4.4.1 Alpha separation

If all alpha samples in a reference image have the maximum value, then the alpha channel may be omitted, resulting in an equivalent image that can be encoded more compactly.

4.4.2 Indexing

If the number of distinct pixel values is 256 or less, and the RGB sample depths are not greater than 8, and the alpha channel is absent or exactly 8 bits deep or every pixel is either fully transparent or fully opaque, then the alternative indexed-colour representation, achieved through an indexing transformation, may be more efficient for encoding. In the indexed-colour representation, each pixel is replaced by an index into a palette. The palette is a list of entries each containing three 8-bit samples (red, green, blue). If an alpha channel is present, there is also a parallel table of 8-bit alpha samples, called the alpha table .

A suggested palette or palettes may be constructed even when the PNG image is not indexed-colour in order to assist viewers that are capable of displaying only a limited number of colours.

For indexed-colour images, encoders can rearrange the palette so that the table entries with the maximum alpha value are grouped at the end. In this case the table can be encoded in a shortened form that does not include these entries.

Encoders creating indexed-colour PNG must not insert index values greater than the actual length of the palette table; to do so is an error, and decoders will vary in their handling of this error.
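
The indexing transformation can be pictured with a short, non-normative sketch (the function name is invented for this example): build a palette and alpha table from the distinct pixel values and replace each pixel with its index.

    # Minimal sketch of the indexing transformation; not the normative algorithm.
    def index_image(pixels):
        """pixels: iterable of (r, g, b, a) tuples with 8-bit samples."""
        palette = []       # (r, g, b) entries
        alpha_table = []   # parallel 8-bit alpha samples
        lookup = {}        # pixel value -> palette index
        indices = []
        for px in pixels:
            if px not in lookup:
                if len(palette) == 256:
                    raise ValueError("more than 256 distinct pixel values")
                lookup[px] = len(palette)
                palette.append(px[:3])
                alpha_table.append(px[3])
            indices.append(lookup[px])
        return indices, palette, alpha_table

Sorting the palette so that fully opaque entries come last allows the alpha table to be written in the shortened form described above.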

4.4.3 RGB merging

If the red, green, and blue channels have the same sample depth, and, for each pixel, the values of the red, green, and blue samples are equal, then these three channels may be merged into a single greyscale channel.

4.4.4 Alpha compaction

For non-indexed images, if there exists an RGB (or greyscale) value such that all pixels with that value are fully transparent while all other pixels are fully opaque, then the alpha channel can be represented more compactly by merely identifying the RGB (or greyscale) value that is transparent.

4.4.5 Sample depth scaling

In the PNG image, not all sample depths are supported (see 6.1 Colour types and values ), and all channels shall have the same sample depth. All channels of the PNG image use the smallest allowable sample depth that is not less than any sample depth in the reference image, and the possible sample values in the reference image are linearly mapped into the next allowable range for the PNG image. Figure 5 shows how samples of depth 3 might be mapped into samples of depth 4.

Allowing only a few sample depths reduces the number of cases that decoders have to cope with. Sample depth scaling is reversible with no loss of data, because the reference image sample depths can be recorded in the PNG datastream. In the absence of recorded sample depths, the reference image sample depth equals the PNG image sample depth. See 12.4 Sample depth scaling and 13.12 Sample depth rescaling .
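
A sketch of the linear mapping, assuming simple rounding (the function name is illustrative):

    # Linearly map a sample from one depth to another, e.g. depth 3 to depth 4:
    # the input range 0..7 is spread over the output range 0..15.
    def scale_sample(value, in_depth, out_depth):
        in_max = (1 << in_depth) - 1
        out_max = (1 << out_depth) - 1
        return round(value * out_max / in_max)

    # scale_sample(7, 3, 4) == 15; scale_sample(3, 3, 4) == 6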

4.5 PNG image

The transformation of the reference image results in one of five types of PNG image (see Figure 6 ):

The format of each pixel depends on the PNG image type and the bit depth. For PNG image types other than indexed-colour, the bit depth specifies the number of bits per sample, not the total number of bits per pixel. For indexed-colour images, the bit depth specifies the number of bits in each palette index, not the sample depth of the colours in the palette or alpha table. Within the pixel the samples appear in the following order, depending on the PNG image type.

  • Truecolour with alpha : red, green, blue, alpha.
  • Greyscale with alpha : grey, alpha.
  • Truecolour : red, green, blue.
  • Greyscale : grey.
  • Indexed-colour : palette index.

4.6 Encoding the PNG image

A conceptual model of the process of encoding a PNG image is given in Figure 7 . The steps refer to the operations on the array of pixels or indices in the PNG image. The palette and alpha table are not encoded in this way.

  • Pass extraction: to allow for progressive display, the PNG image pixels can be rearranged to form several smaller images called reduced images or passes.
  • Scanline serialization: the image is serialized a scanline at a time. Pixels are ordered left to right in a scanline and scanlines are ordered top to bottom.
  • Filtering: each scanline is transformed into a filtered scanline using one of the defined filter types to prepare the scanline for image compression.
  • Compression: the sequence of filtered scanlines is compressed into a single datastream.
  • Chunking: the compressed image is divided into conveniently sized chunks. An error detection code is added to each chunk.
  • Datastream construction: the chunks are inserted into the datastream.

4.6.1 Pass extraction

Pass extraction (see Figure 7 ) splits a PNG image into a sequence of reduced images where the first image defines a coarse view and subsequent images enhance this coarse view until the last image completes the PNG image. The set of reduced images is also called an interlaced PNG image. Two interlace methods are defined in this specification. The first method is a null method; pixels are stored sequentially from left to right and scanlines from top to bottom. The second method makes multiple scans over the image to produce a sequence of seven reduced images. The seven passes for a sample image are illustrated in Figure 7 . See 8. Interlacing and pass extraction .

4.6.2 Scanline serialization

Each row of pixels, called a scanline, is represented as a sequence of bytes.

4.6.3 Filtering

PNG allows image data to be filtered before it is compressed. Filtering can improve the compressibility of the data. The filter operation is deterministic, reversible, and lossless. This allows the decompressed data to be reverse-filtered in order to obtain the original data. See 7.3 Filtering .

4.6.4 Compression

The sequence of filtered scanlines in the pass or passes of the PNG image is compressed (see Figure 9 ) by one of the defined compression methods. The concatenated filtered scanlines form the input to the compression stage. The output from the compression stage is a single compressed datastream. See 10. Compression .

4.6.5 Chunking

Chunking provides a convenient breakdown of the compressed datastream into manageable chunks (see Figure 9 ). Each chunk has its own redundancy check. See 11. Chunk specifications .

4.7 Additional information

Ancillary information may be associated with an image. Decoders may ignore all or some of the ancillary information. The types of ancillary information provided are described in Table 1 .

4.8 PNG datastream

4.8.1 Chunks

The PNG datastream consists of a PNG signature (see 5.2 PNG signature ) followed by a sequence of chunks (see 11. Chunk specifications ). Each chunk has a chunk type which specifies its function.

4.8.2 Chunk types

Chunk types are four-byte sequences chosen so that they correspond to readable labels when interpreted in the ISO 646.IRV:1991 [ ISO646 ] character set. Four chunk types are termed critical chunks, which shall be understood and correctly interpreted according to the provisions of this specification. These are:

  • IHDR : image header, which is the first chunk in a PNG datastream.
  • PLTE : palette table associated with indexed PNG images.
  • IDAT : image data chunks.
  • IEND : image trailer, which is the last chunk in a PNG datastream.

The remaining chunk types are termed ancillary chunk types, which encoders may generate and decoders may interpret.

  • Transparency information: tRNS (see 11.3.1 Transparency information ).
  • Colour space information: cHRM , gAMA , iCCP , sBIT , sRGB , cICP , mDCv (see 11.3.2 Colour space information ).
  • Textual information: iTXt , tEXt , zTXt (see 11.3.3 Textual information ).
  • Miscellaneous information: bKGD , hIST , pHYs , sPLT , eXIf (see 11.3.4 Miscellaneous information ).
  • Time information: tIME (see 11.3.5 Time stamp information ).
  • Animation information: acTL , fcTL , fdAT (see 11.3.6 Animation information ).

4.9 APNG : frame-based animation

Animated PNG ( APNG ) extends the original, static-only PNG format, adding support for frame -based animated images. It is intended to be a replacement for simple animated images that have traditionally used the GIF format [ GIF ], while adding support for 24-bit images and 8-bit transparency, which GIF lacks.

APNG is backwards-compatible with earlier versions of PNG; a non-animated PNG decoder will ignore the ancillary APNG -specific chunks and display the static image .

4.9.1 Structure

An APNG stream is a normal PNG stream as defined in previous versions of the PNG Specification, with three additional chunk types describing the animation and providing additional frame data.

To be recognized as an APNG , an acTL chunk must appear in the stream before any IDAT chunks. The acTL structure is described below .

Conceptually, at the beginning of each play the output buffer shall be completely initialized to a fully transparent black rectangle, with width and height dimensions from the IHDR chunk.

The static image may be included as the first frame of the animation by the presence of a single fcTL chunk before IDAT . Otherwise, the static image is not part of the animation.

Subsequent frames are encoded in fdAT chunks, which have the same structure as IDAT chunks, except preceded by a sequence number . Information for each frame about placement and rendering is stored in fcTL chunks. The full layout of fdAT and fcTL chunks is described below .

The boundaries of the entire animation are specified by the width and height parameters of the IHDR chunk, regardless of whether the default image is part of the animation. The default image should be appropriately padded with fully transparent black pixels if extra space will be needed for later frames.

Each frame is identical for each play; it is therefore safe for applications to cache the frames.

4.9.2 Sequence numbers

The fcTL and fdAT chunks have a zero-based, four-byte sequence number. Both chunk types share the sequence. The purpose of this number is to detect (and optionally correct) sequence errors in an Animated PNG, since this specification does not impose ordering restrictions on ancillary chunks.

The first fcTL chunk shall contain sequence number 0, and the sequence numbers in the remaining fcTL and fdAT chunks shall be in ascending order, with no gaps or duplicates.

The tables below illustrate the use of sequence numbers for images with more than one frame, and more than one fdAT chunk for the second frame. ( IHDR and IEND chunks omitted in these tables, for clarity).
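
As an illustration of the scheme (a reconstruction; the original tables are not reproduced here), an animation whose first frame is the static image and whose second frame is split across two fdAT chunks would be numbered as follows:

    Sequence number   Chunk
    (none)            acTL
    0                 fcTL  (first frame)
    (none)            IDAT  (first frame data)
    1                 fcTL  (second frame)
    2                 fdAT  (second frame, first part)
    3                 fdAT  (second frame, second part)

Note that IDAT chunks carry no sequence number.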

4.9.3 Output buffer

The output buffer is a pixel array with dimensions specified by the width and height parameters of the PNG IHDR chunk. Conceptually, each frame is constructed in the output buffer before being composited onto the canvas . The contents of the output buffer are available to the decoder. The corners of the output buffer are mapped to the corners of the canvas .

4.9.4 Canvas

The canvas is the area on the output device on which the frames are to be displayed. The contents of the canvas are not necessarily available to the decoder. If a bKGD chunk exists, it may be used to fill the canvas if there is no preferable background.

4.10 Error handling

Errors in a PNG datastream fall into two general classes:

  • transmission errors or damage to a computer file system, which tend to corrupt much or all of the datastream;
  • syntax errors, which appear as invalid values in chunks, or as missing or misplaced chunks. Syntax errors can be caused not only by encoding mistakes, but also by the use of registered or private values, if those values are unknown to the decoder.

PNG decoders should detect errors as early as possible, recover from errors whenever possible, and fail gracefully otherwise. The error handling philosophy is described in detail in 13.1 Error handling .

4.11 Extensions

This section is non-normative.

The PNG format exposes several extension points:

  • chunk type ;
  • text keyword ; and
  • private field values .

Some of these extension points are reserved by W3C , while others are available for private use.

5. Datastream structure

5.1 PNG datastream

The PNG datastream consists of a PNG signature followed by a sequence of chunks. It is the result of encoding a PNG image .

The term datastream is used rather than "file" to describe a byte sequence that may be only a portion of a file. It is also used to emphasize that the sequence of bytes might be generated and consumed "on the fly", never appearing in a stored file at all.

5.2 PNG signature

The first eight bytes of a PNG datastream always contain the following hexadecimal values:
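
    89 50 4E 47 0D 0A 1A 0A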

This signature indicates that the remainder of the datastream contains a single PNG image, consisting of a series of chunks beginning with an IHDR chunk and ending with an IEND chunk.

This signature differentiates a PNG datastream from other types of datastream and allows early detection of some transmission errors.
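
A decoder's signature check is a simple byte-for-byte comparison; a minimal sketch:

    # Verify the 8-byte PNG signature at the start of a datastream.
    PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

    def is_png(data: bytes) -> bool:
        return data[:8] == PNG_SIGNATURE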

5.3 Chunk layout

Each chunk consists of three or four fields (see Figure 10 ). The meaning of the fields is described in Table 4 . The chunk data field may be empty.

The chunk data length may be any number of bytes up to the maximum; therefore, implementors cannot assume that chunks are aligned on any boundaries larger than bytes.

5.4 Chunk naming conventions

Chunk types are chosen to be meaningful names when the bytes of the chunk type are interpreted as ISO 646 letters [ ISO646 ]. Chunk types are assigned so that a decoder can determine some properties of a chunk even when the type is not recognized. These rules allow safe, flexible extension of the PNG format, by allowing a PNG decoder to decide what to do when it encounters an unknown chunk.

The naming rules are normally of interest only when the decoder does not recognize the chunk's type, as specified at 13. PNG decoders and viewers .

Four bits of the chunk type, the property bits, namely bit 5 (value 32) of each byte, are used to convey chunk properties. This choice means that a human can read off the assigned properties according to whether the letter corresponding to each byte of the chunk type is uppercase (bit 5 is 0) or lowercase (bit 5 is 1).

The property bits are an inherent part of the chunk type, and hence are fixed for any chunk type. Thus, CHNK and cHNk would be unrelated chunk types, not the same chunk with different properties.

The semantics of the property bits are defined in Table 5 .

The hypothetical chunk type " cHNk " has the property bits: c (lowercase: ancillary), H (uppercase: public), N (uppercase: the reserved bit is valid), k (lowercase: safe to copy).

Therefore, this name represents an ancillary, public, safe-to-copy chunk.
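
Since the property bits are simply bit 5 of each byte, they can be read directly; a minimal sketch (the function name is illustrative):

    # Read the four property bits of a chunk type: bit 5 (value 32) of each
    # byte is 1 for lowercase (property set) and 0 for uppercase.
    def chunk_properties(chunk_type: bytes) -> dict:
        assert len(chunk_type) == 4
        return {
            "ancillary":    bool(chunk_type[0] & 32),  # else critical
            "private":      bool(chunk_type[1] & 32),  # else public
            "reserved":     bool(chunk_type[2] & 32),  # shall be 0 (uppercase)
            "safe_to_copy": bool(chunk_type[3] & 32),  # else unsafe to copy
        }

    # chunk_properties(b"cHNk") -> ancillary, public, safe to copy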

5.5 CRC algorithm

CRC fields are calculated using standardized CRC methods with pre and post conditioning, as defined by [ ISO-3309 ] and [ ITU-T-V.42 ]. The CRC polynomial employed— which is identical to that used in the GZIP file format specification [ RFC1952 ]— is

x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1

In PNG, the 32-bit CRC is initialized to all 1's, and then the data from each byte is processed from the least significant bit (1) to the most significant bit (128). After all the data bytes are processed, the CRC is inverted (its one's complement is taken). This value is transmitted (stored in the datastream) MSB first. For the purpose of separating into bytes and ordering, the least significant bit of the 32-bit CRC is defined to be the coefficient of the x^31 term.

Practical calculation of the CRC often employs a precalculated table to accelerate the computation. See D. Sample CRC implementation .
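
The following is a small table-driven sketch matching the parameters above (reflected polynomial 0xEDB88320, initial value of all 1's, final one's complement); it computes the same CRC as Python's zlib.crc32:

    # Table-driven CRC-32 with the PNG/gzip parameters.
    def _make_crc_table():
        table = []
        for n in range(256):
            c = n
            for _ in range(8):
                c = (0xEDB88320 ^ (c >> 1)) if (c & 1) else (c >> 1)
            table.append(c)
        return table

    _CRC_TABLE = _make_crc_table()

    def png_crc(data: bytes) -> int:
        c = 0xFFFFFFFF                        # initialize to all 1's
        for byte in data:
            c = _CRC_TABLE[(c ^ byte) & 0xFF] ^ (c >> 8)
        return c ^ 0xFFFFFFFF                 # one's complement

In a chunk, the CRC is calculated over the chunk type and chunk data fields, not the length field.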

5.6 Chunk ordering

The constraints on the positioning of the individual chunks are listed in Table 6 and illustrated diagrammatically for static images in Figure 11 and Figure 12 , for animated images where the static image forms the first frame in Figure 13 and Figure 14 , and for animated images where the static image is not part of the animation in Figure 15 and Figure 16 . These lattice diagrams represent the constraints on positioning imposed by this specification. The lines in the diagrams define partial ordering relationships. Chunks higher up shall appear before chunks lower down. Chunks which are horizontally aligned and appear between two other chunk types (higher and lower than the horizontally aligned chunks) may appear in any order between the two higher and lower chunk types to which they are connected. The superscript associated with the chunk type is defined in Table 7 . It indicates whether the chunk is mandatory, optional, or may appear more than once. A vertical bar between two chunk types indicates alternatives.

5.7 Defining chunks

5.7.1 General

All chunks, private and public, SHOULD be listed at [ PNG-EXTENSIONS ].

5.7.2 Defining public chunks

Public chunks are reserved for definition by the W3C .

Public chunks are intended for broad use consistent with the philosophy of PNG.

Organizations and applications are encouraged to submit any chunk that meets the criteria above for definition as a public chunk by the PNG Working Group .

The definition as a public chunk is neither automatic nor immediate. A proposed public chunk type SHALL NOT be used in publicly available software or datastreams until defined as such.

The definition of new critical chunk types is discouraged unless necessary.

5.7.3 Defining private chunks

Organizations and applications MAY define private chunks for private and experimental use.

A private chunk SHOULD NOT be defined merely to carry textual information of interest to a human user. Instead, an iTXt chunk SHOULD be used and a suitable keyword defined.

Listing private chunks at [ PNG-EXTENSIONS ] reduces, but does not eliminate, the chance that the same private chunk is used for incompatible purposes by different applications. If a private chunk type is used, additional identifying information SHOULD be stored at the beginning of the chunk data to further reduce the risk of conflicts.

An ancillary chunk type, not a critical chunk type, SHOULD be used for all private chunks that store information that is not absolutely essential to view the image.

Private critical chunks SHOULD NOT be defined because PNG datastreams containing such chunks are not portable, and SHOULD NOT be used in publicly available software or datastreams. If a private critical chunk is essential for an application, it SHOULD appear near the start of the datastream, so that a standard decoder need not read very far before discovering that it cannot handle the datastream.

See B. Guidelines for private chunk types for additional guidelines on defining private chunks.

5.8 Private field values

Values greater than or equal to 128 in the following fields are private field values :

  • compression method
  • interlace method
  • filter method

These private field values are neither defined nor reserved by this specification.

Private field values MAY be used for experimental or private semantics.

Private field values SHOULD NOT appear in publicly available software or datastreams since they can result in datastreams that are unreadable by PNG decoders as detailed at 13. PNG decoders and viewers .

6. Reference image to PNG image transformation

6.1 Colour types and values

As explained in 4.5 PNG image there are five types of PNG image. Corresponding to each type is a colour type , which is the sum of the following values: 1 (palette used), 2 ( truecolour used) and 4 (alpha used). Greyscale and truecolour images may have an explicit alpha channel. The PNG image types and corresponding colour types are listed in Table 8 .

The allowed bit depths and sample depths for each PNG image type are listed in Image header .

Greyscale samples represent luminance if the transfer curve is indicated (by gAMA , sRGB , or iCCP ) or device-dependent greyscale if not. RGB samples represent calibrated colour information if the colour space is indicated (by gAMA and cHRM , or sRGB , or iCCP ), or uncalibrated device-dependent colour if not.

Sample values are not necessarily proportional to light intensity; the gAMA chunk specifies the relationship between sample values and display output intensity. Viewers are strongly encouraged to compensate properly. See 4.3 Colour spaces , 13.13 Decoder gamma handling and C. Gamma and chromaticity .

6.2 Alpha representation

In a PNG datastream transparency may be represented in one of four ways, depending on the PNG image type (see alpha separation and alpha compaction ).

  • Truecolour with alpha , greyscale with alpha : an alpha channel is part of the image array.
  • Truecolour , greyscale : a tRNS chunk contains a single pixel value distinguishing the fully transparent pixels from the fully opaque pixels.
  • Indexed-colour : a tRNS chunk contains the alpha table that associates an alpha sample with each palette entry.
  • Truecolour , greyscale , indexed-colour : there is no tRNS chunk present and all pixels are fully opaque.

An alpha channel included in the image array has 8-bit or 16-bit samples, the same size as the other samples. The alpha sample for each pixel is stored immediately following the greyscale or RGB samples of the pixel. An alpha value of zero represents full transparency, and a value of 2^sampledepth - 1 represents full opacity. Intermediate values indicate partially transparent pixels that can be composited against a background image to yield the delivered image.

The colour values in a pixel are not premultiplied by the alpha value assigned to the pixel. This rule is sometimes called "unassociated" or "non-premultiplied" alpha. (Another common technique is to store sample values premultiplied by the alpha value; in effect, such an image is already composited against a black background. PNG does not use premultiplied alpha. In consequence an image editor can take a PNG image and easily change its transparency.) See 12.3 Alpha channel creation and 13.16 Alpha channel processing .

7. Encoding the PNG image as a PNG datastream

7.1 Integers and byte order

All integers that require more than one byte shall be in network byte order (as illustrated in Figure 17 ): the most significant byte comes first, then the less significant bytes in descending order of significance ( MSB LSB for two-byte integers, MSB B2 B1 LSB for four-byte integers). The highest bit (value 128) of a byte is numbered bit 7; the lowest bit (value 1) is numbered bit 0. Values are unsigned unless otherwise noted. Values explicitly noted as signed are represented in two's complement notation.

PNG four-byte unsigned integers are limited to the range 0 to 2^31 - 1 to accommodate languages that have difficulty with unsigned four-byte values.
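
A sketch of reading such an integer with Python's struct module (">I" is big-endian, i.e. network byte order):

    import struct

    # Read a PNG four-byte unsigned integer at the given offset.
    def read_png_uint32(buf: bytes, offset: int = 0) -> int:
        (value,) = struct.unpack(">I", buf[offset:offset + 4])
        if value > 0x7FFFFFFF:                # 2^31 - 1
            raise ValueError("out of range for a PNG four-byte unsigned integer")
        return value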

7.2 Scanlines

A PNG image (or pass, see 8. Interlacing and pass extraction ) is a rectangular pixel array, with pixels appearing left-to-right within each scanline, and scanlines appearing top-to-bottom. The size of each pixel is determined by the number of bits per pixel.

Pixels within a scanline are always packed into a sequence of bytes with no wasted bits between pixels. Scanlines always begin on byte boundaries. Permitted bit depths and colour types are restricted so that in all cases the packing is simple and efficient.

In PNG images of colour type 0 (greyscale) each pixel is a single sample, which may have precision less than a byte (1, 2, or 4 bits). These samples are packed into bytes with the leftmost sample in the high-order bits of a byte followed by the other samples for the scanline.

In PNG images of colour type 3 (indexed-colour) each pixel is a single palette index. These indices are packed into bytes in the same way as the samples for colour type 0.

When there are multiple pixels per byte, some low-order bits of the last byte of a scanline may go unused. The contents of these unused bits are not specified.

PNG images that are not indexed-colour images may have sample values with a bit depth of 16. Such sample values are in network byte order ( MSB first, LSB second). PNG permits multi-sample pixels only with 8 and 16-bit samples, so multiple samples of a single pixel are never packed into one byte.
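
A sketch of the packing rule for sub-byte samples, with the leftmost sample in the high-order bits (the helper name is illustrative):

    # Pack 1-, 2-, or 4-bit samples into scanline bytes.
    def pack_scanline(samples, depth):
        assert depth in (1, 2, 4)
        per_byte = 8 // depth
        out = bytearray()
        for i in range(0, len(samples), per_byte):
            byte = 0
            for j, s in enumerate(samples[i:i + per_byte]):
                byte |= s << (8 - depth * (j + 1))
            out.append(byte)    # unused low-order bits of the last byte stay 0
        return bytes(out)

    # pack_scanline([3, 0, 1, 2], 2) == b"\xc6"  (0b11000110)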

7.3 Filtering

A filter method is a transformation applied to an array of scanlines with the aim of improving their compressibility.

PNG standardizes one filter method and several filter types that may be used to prepare image data for compression. It transforms the byte sequence into an equal length sequence of bytes preceded by a filter type byte (see Figure 18 for an example).

The encoder shall use only a single filter method for an interlaced PNG image, but may use different filter types for each scanline in a reduced image. An intelligent encoder can switch filters from one scanline to the next. The method for choosing which filter to employ is left to the encoder.

The filter type byte is not considered part of the image data , but it is included in the datastream sent to the compression step. See 9. Filtering .

8. Interlacing and pass extraction

Pass extraction (see figure 4.8 ) splits a PNG image into a sequence of reduced images (the interlaced PNG image) where the first image defines a coarse view and subsequent images enhance this coarse view until the last image completes the PNG image. This allows progressive display of the interlaced PNG image by the decoder and allows images to "fade in" when they are being displayed on-the-fly. On average, interlacing slightly expands the datastream size, but it can give the user a meaningful display much more rapidly.

8.1 Interlace methods

Two interlace methods are defined in this International Standard, methods 0 and 1. Other values of interlace method are reserved for future standardization.

With interlace method 0, the null method, pixels are extracted sequentially from left to right, and scanlines sequentially from top to bottom. The interlaced PNG image is a single reduced image.

Interlace method 1, known as Adam7, defines seven distinct passes over the image. Each pass transmits a subset of the pixels in the reference image. The pass in which each pixel is transmitted (numbered from 1 to 7) is defined by replicating the following 8-by-8 pattern over the entire image, starting at the upper left corner:
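
    1 6 4 6 2 6 4 6
    7 7 7 7 7 7 7 7
    5 6 5 6 5 6 5 6
    7 7 7 7 7 7 7 7
    3 6 4 6 3 6 4 6
    7 7 7 7 7 7 7 7
    5 6 5 6 5 6 5 6
    7 7 7 7 7 7 7 7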

Figure 4.8 shows the seven passes of interlace method 1. Within each pass, the selected pixels are transmitted left to right within a scanline, and selected scanlines sequentially from top to bottom. For example, pass 2 contains pixels 4, 12, 20, etc. of scanlines 0, 8, 16, etc. (where scanline 0, pixel 0 is the upper left corner). The last pass contains all of scanlines 1, 3, 5, etc. The transmission order is defined so that all the scanlines transmitted in a pass will have the same number of pixels; this is necessary for proper application of some of the filters. The interlaced PNG image consists of a sequence of seven reduced images. For example, if the PNG image is 16 by 16 pixels, then the third pass will be a reduced image of two scanlines, each containing four pixels (see figure 4.8 ).

Scanlines that do not completely fill an integral number of bytes are padded as defined in 7.2 Scanlines .

NOTE If the reference image contains fewer than five columns or fewer than five rows, some passes will be empty.
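
The pass assignment can be computed directly from the pattern; a minimal sketch:

    # Adam7: pass number (1-7) for the pixel at (x, y), from the 8x8 pattern.
    ADAM7_PATTERN = [
        [1, 6, 4, 6, 2, 6, 4, 6],
        [7, 7, 7, 7, 7, 7, 7, 7],
        [5, 6, 5, 6, 5, 6, 5, 6],
        [7, 7, 7, 7, 7, 7, 7, 7],
        [3, 6, 4, 6, 3, 6, 4, 6],
        [7, 7, 7, 7, 7, 7, 7, 7],
        [5, 6, 5, 6, 5, 6, 5, 6],
        [7, 7, 7, 7, 7, 7, 7, 7],
    ]

    def adam7_pass(x: int, y: int) -> int:
        return ADAM7_PATTERN[y % 8][x % 8]

    # adam7_pass(4, 0) == 2: pass 2 holds pixels 4, 12, 20, ... of
    # scanlines 0, 8, 16, ...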

9. Filtering

9.1 Filter methods and filter types

Filtering transforms the PNG image with the goal of improving compression. The overall process is depicted in Figure 7 while the specifics of serializing and filtering a scanline are shown in Figure 18 .

PNG allows for a number of filter methods . All the reduced images in an interlaced image shall use a single filter method . Only filter method 0 is defined by this specification. Other filter methods are reserved for future standardization. Filter method 0 provides a set of five filter types, and individual scanlines in each reduced image may use different filter types.

PNG imposes no additional restriction on which filter types can be applied to an interlaced PNG image. However, the filter types are not equally effective on all types of data. See 12.7 Filter selection .

Filtering transforms the byte sequence in a scanline to an equal length sequence of bytes preceded by the filter type. Filter type bytes are associated only with non-empty scanlines. No filter type bytes are present in an empty pass. See 13.10 Interlacing and progressive display .

9.2 Filter types for filter method 0

Filters are applied to bytes , not to pixels, regardless of the bit depth or colour type of the image. The filters operate on the byte sequence formed by a scanline that has been represented as described in 7.2 Scanlines . If the image includes an alpha channel, the alpha data is filtered in the same way as the image data .

Filters may use the original values of the following bytes to generate the new byte value:

  • x : the byte being filtered;
  • a : the byte corresponding to x in the pixel immediately before the pixel containing x (or the byte immediately before x , when the bit depth is less than 8);
  • b : the byte corresponding to x in the previous scanline;
  • c : the byte corresponding to b in the pixel immediately before the pixel containing b (or the byte immediately before b , when the bit depth is less than 8).

Figure 19 shows the relative positions of the bytes x , a , b , and c .

Filter method 0 defines five basic filter types as listed in Table 10 . Orig(y) denotes the original (unfiltered) value of byte y . Filt(y) denotes the value after a filter type has been applied. Recon(y) denotes the value after the corresponding reconstruction function has been applied. The Paeth filter type PaethPredictor [ Paeth ] is defined below.

Filter method 0 specifies exactly this set of five filter types and this shall not be extended. This ensures that decoders need not decompress the data to determine whether it contains unsupported filter types: it is sufficient to check the filter method in 11.2.1 IHDR Image header .

For all filters, the bytes "to the left of" the first pixel in a scanline shall be treated as being zero. For filters that refer to the prior scanline, the entire prior scanline and bytes "to the left of" the first pixel in the prior scanline shall be treated as being zeroes for the first scanline of a reduced image.

To reverse the effect of a filter requires the decoded values of the prior pixel on the same scanline, the pixel immediately above the current pixel on the prior scanline, and the pixel just to the left of the pixel above.

Unsigned arithmetic modulo 256 is used, so that both the inputs and outputs fit into bytes. Filters are applied to each byte regardless of bit depth. The sequence of Filt values is transmitted as the filtered scanline.
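
As a sketch of how a filter type and its reconstruction function relate, here are the Sub and Up filters of filter method 0 (bpp is the number of bytes per complete pixel, used to locate byte a; the names are illustrative):

    # Apply the Sub filter: Filt(x) = (Orig(x) - Orig(a)) mod 256.
    def filter_sub(line: bytes, bpp: int) -> bytes:
        return bytes((line[i] - (line[i - bpp] if i >= bpp else 0)) & 0xFF
                     for i in range(len(line)))

    # Reverse the Up filter: Recon(x) = (Filt(x) + Recon(b)) mod 256, where
    # prev is the reconstructed prior scanline (all zeroes for the first
    # scanline of a reduced image).
    def unfilter_up(filt: bytes, prev: bytes) -> bytes:
        return bytes((f + prev[i]) & 0xFF for i, f in enumerate(filt))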

9.3 Filter type 3: Average

The sum Orig(a) + Orig(b) shall be performed without overflow (using at least nine-bit arithmetic). floor() indicates that the result of the division is rounded to the next lower integer if fractional; in other words, it is an integer division or right shift operation.

9.4 Filter type 4: Paeth

The Paeth filter type computes a simple linear function of the three neighbouring pixels (left, above, upper left), then chooses as predictor the neighbouring pixel closest to the computed value. The algorithm used in this specification is an adaptation of the technique due to Alan W. Paeth [ Paeth ].

The PaethPredictor function is defined in the code below. The logic of the function and the locations of the bytes a , b , c , and x are shown in Figure 20 . Pr is the predictor for byte x .
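
The specification gives the function in pseudocode; the following is an equivalent rendering in Python:

    # Predict byte x from a (left), b (above), and c (upper left).
    def paeth_predictor(a: int, b: int, c: int) -> int:
        p = a + b - c              # initial estimate
        pa = abs(p - a)            # distances to a, b, c
        pb = abs(p - b)
        pc = abs(p - c)
        # return the nearest of a, b, c, breaking ties in the order a, b, c
        if pa <= pb and pa <= pc:
            return a
        if pb <= pc:
            return b
        return c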

The calculations within the PaethPredictor function shall be performed exactly, without overflow.

The order in which the comparisons are performed is critical and shall not be altered. The function tries to establish in which of the three directions (vertical, horizontal, or diagonal) the gradient of the image is smallest.

Exactly the same PaethPredictor function is used by both encoder and decoder.

10. Compression

10.1 Compression method 0

Only PNG compression method 0 is defined by this International Standard. Other values of compression method are reserved for future standardization. PNG compression method 0 is deflate compression with a sliding window (which is an upper bound on the distances appearing in the deflate stream) of at most 32768 bytes. Deflate compression is derived from LZ77 .

Deflate -compressed datastreams within PNG are stored in the zlib format, which has the structure:

  • zlib compression method/flags code: 1 byte
  • Additional flags/check bits: 1 byte
  • Compressed data blocks: n bytes
  • Check value: 4 bytes

zlib is specified at [ rfc1950 ].

For PNG compression method 0, the zlib compression method/flags code shall specify method code 8 ( deflate compression) and an LZ77 window size of not more than 32768 bytes. The zlib compression method number is not the same as the PNG compression method number in the IHDR chunk. The additional flags shall not specify a preset dictionary.

If the data to be compressed contain 16384 bytes or fewer, the PNG encoder may set the window size by rounding up to a power of 2 (256 minimum). This decreases the memory required for both encoding and decoding, without adversely affecting the compression ratio.
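
A sketch of this window-size choice using the zlib library (note that zlib's deflate implementation does not support a 256-byte window, so 512 bytes, wbits = 9, is the practical minimum here):

    import zlib

    # Compress image data, shrinking the deflate window for small inputs.
    def compress_image_data(data: bytes) -> bytes:
        wbits = 15                                      # 32768-byte window
        if len(data) <= 16384:
            # smallest power of two >= len(data), subject to zlib's minimum
            wbits = max(9, (len(data) - 1).bit_length())
        co = zlib.compressobj(9, zlib.DEFLATED, wbits)
        return co.compress(data) + co.flush()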

The compressed data within the zlib datastream are stored as a series of blocks, each of which can represent raw (uncompressed) data, LZ77 -compressed data encoded with fixed Huffman codes, or LZ77 -compressed data encoded with custom Huffman codes. A marker bit in the final block identifies it as the last block, allowing the decoder to recognize the end of the compressed datastream. Further details on the compression algorithm and the encoding are given in the deflate specification [ rfc1951 ].

The check value stored at the end of the zlib datastream is calculated on the uncompressed data represented by the datastream. The algorithm used to calculate this is not the same as the CRC calculation used for PNG chunk CRC field values. The zlib check value is useful mainly as a cross-check that the deflate algorithms are implemented correctly. Verifying the individual PNG chunk CRCs provides confidence that the PNG datastream has been transmitted undamaged.

10.2 Compression of the sequence of filtered scanlines

The sequence of filtered scanlines is compressed and the resulting data stream is split into IDAT chunks. The concatenation of the contents of all the IDAT chunks makes up a zlib datastream. This datastream decompresses to filtered image data .

It is important to emphasize that the boundaries between IDAT chunks are arbitrary and can fall anywhere in the zlib datastream. There is not necessarily any correlation between IDAT chunk boundaries and deflate block boundaries or any other feature of the zlib data. For example, it is entirely possible for the terminating zlib check value to be split across IDAT chunks.

Similarly, there is no required correlation between the structure of the image data (i.e., scanline boundaries) and deflate block boundaries or IDAT chunk boundaries. The complete filtered PNG image is represented by a single zlib datastream that is stored in a number of IDAT chunks.
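
A decoder therefore simply feeds the IDAT payloads, in order, to a single decompressor; a minimal sketch:

    import zlib

    # idat_payloads: the data fields of the IDAT chunks, in datastream order.
    def decompress_image_data(idat_payloads) -> bytes:
        d = zlib.decompressobj()
        out = b"".join(d.decompress(part) for part in idat_payloads)
        return out + d.flush()   # the filtered scanlines of the PNG image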

10.3 Other uses of compression

PNG also uses compression method 0 in iTXt , iCCP , and zTXt chunks. Unlike the image data , such datastreams are not split across chunks; each such chunk contains an independent zlib datastream (see 10.1 Compression method 0 ).

11. Chunk specifications

11.1 General

This clause defines the chunks used in this specification.

11.2 Critical chunks

A critical chunk is a chunk that is absolutely required in order to successfully decode a PNG image from a PNG datastream. Extension chunks may be defined as critical chunks (see 14. Editors ), though this practice is strongly discouraged.

A valid PNG datastream shall begin with a PNG signature, immediately followed by an IHDR chunk, then one or more IDAT chunks, and shall end with an IEND chunk. Only one IHDR chunk and one IEND chunk are allowed in a PNG datastream.

11.2.1 IHDR Image header

The four-byte chunk type field contains the hexadecimal values 49 48 44 52 (the ASCII string "IHDR").

The IHDR chunk shall be the first chunk in the PNG datastream. It contains:

  • Width: 4 bytes
  • Height: 4 bytes
  • Bit depth: 1 byte
  • Colour type: 1 byte
  • Compression method: 1 byte
  • Filter method: 1 byte
  • Interlace method: 1 byte

Width and height give the image dimensions in pixels. They are PNG four-byte unsigned integers . Zero is an invalid value.

Bit depth is a single-byte integer giving the number of bits per sample or per palette index (not per pixel). Valid values are 1, 2, 4, 8, and 16, although not all values are allowed for all colour types . See 6.1 Colour types and values .

Colour type is a single-byte integer.

Bit depth restrictions for each colour type are imposed to simplify implementations and to prohibit combinations that do not compress well. The allowed combinations are defined in Table 11 .

The sample depth is the same as the bit depth except in the case of indexed-colour PNG images ( colour type 3), in which the sample depth is always 8 bits (see 4.5 PNG image ).

Compression method is a single-byte integer that indicates the method used to compress the image data . Only compression method 0 ( deflate compression with a sliding window of at most 32768 bytes) is defined in this specification. All conforming PNG images shall be compressed with this scheme.

Filter method is a single-byte integer that indicates the preprocessing method applied to the image data before compression. Only filter method 0 (adaptive filtering with five basic filter types) is defined in this specification. See 9. Filtering for details.

Interlace method is a single-byte integer that indicates the transmission order of the image data . Two values are defined in this specification: 0 (no interlace) or 1 (Adam7 interlace). See 8. Interlacing and pass extraction for details.
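
Given the layout above, the 13-byte IHDR data field can be parsed in one step; a minimal sketch:

    import struct

    # Parse the 13-byte data field of an IHDR chunk.
    def parse_ihdr(data: bytes) -> dict:
        (width, height, bit_depth, colour_type,
         compression, filter_method, interlace) = struct.unpack(">IIBBBBB", data)
        if width == 0 or height == 0:
            raise ValueError("zero is an invalid value for width or height")
        return {"width": width, "height": height, "bit_depth": bit_depth,
                "colour_type": colour_type, "compression_method": compression,
                "filter_method": filter_method, "interlace_method": interlace}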

11.2.2 PLTE Palette

The PLTE chunk contains from 1 to 256 palette entries, each a three-byte series of the form:

  • Red: 1 byte
  • Green: 1 byte
  • Blue: 1 byte

The number of entries is determined from the chunk length. A chunk length not divisible by 3 is an error.

This chunk shall appear for colour type 3, and may appear for colour types 2 and 6; it shall not appear for colour types 0 and 4. There shall not be more than one PLTE chunk.

For colour type 3 (indexed-colour), the PLTE chunk is required. The first entry in PLTE is referenced by pixel value 0, the second by pixel value 1, etc. The number of palette entries shall not exceed the range that can be represented in the image bit depth (for example, 2^4 = 16 for a bit depth of 4). It is permissible to have fewer entries than the bit depth would allow. In that case, any out-of-range pixel value found in the image data is an error.

For colour types 2 and 6 ( truecolour and truecolour with alpha ), the PLTE chunk is optional. If present, it provides a suggested set of colours (from 1 to 256) to which the truecolour image can be quantized if it cannot be displayed directly. It is, however, recommended that the sPLT chunk be used for this purpose, rather than the PLTE chunk. If neither PLTE nor sPLT chunks are present and the image cannot be displayed directly, quantization has to be done by the viewing system. However, it is often preferable for the selection of colours to be done once by the PNG encoder. (See 12.5 Suggested palettes .)

Note that the palette uses 8 bits (1 byte) per sample regardless of the image bit depth. In particular, the palette is 8 bits deep even when it is a suggested quantization of a 16-bit truecolour image.

There is no requirement that the palette entries all be used by the image, nor that they all be different.

11.2.3 IDAT Image data

The IDAT chunk contains the actual image data which is the output stream of the compression algorithm. See 9. Filtering and 10. Compression for details.

There may be multiple IDAT chunks; if so, they shall appear consecutively with no other intervening chunks. The compressed datastream is then the concatenation of the contents of the data fields of all the IDAT chunks.

Some images have unused trailing bytes at the end of the final IDAT chunk. This could happen when an entire buffer is stored rather than just the portion of the buffer which is used. This is undesirable. Preferably, an encoder would not include these unused bytes. If it must, setting the bytes to zero will prevent accidental data sharing. A decoder should ignore these trailing bytes.

11.2.4 IEND Image trailer

The IEND chunk marks the end of the PNG datastream. The chunk's data field is empty.

11.3 Ancillary chunks

The ancillary chunks defined in this specification are listed in the order given in 4.8.2 Chunk types . This is not the order in which they appear in a PNG datastream. Ancillary chunks may be ignored by a decoder. For each ancillary chunk, the actions described are under the assumption that the decoder is not ignoring the chunk.

11.3.1 Transparency information

11.3.1.1 tRNS Transparency

The tRNS chunk specifies either alpha values that are associated with palette entries (for indexed-colour images) or a single transparent colour (for greyscale and truecolour images). The tRNS chunk contains:

For colour type 3 (indexed-colour), the tRNS chunk contains a series of one-byte alpha values, corresponding to entries in the PLTE chunk. Each entry indicates that pixels of the corresponding palette index shall be treated as having the specified alpha value. Alpha values have the same interpretation as in an 8-bit full alpha channel: 0 is fully transparent, 255 is fully opaque, regardless of image bit depth. The tRNS chunk shall not contain more alpha values than there are palette entries, but a tRNS chunk may contain fewer values than there are palette entries. In this case, the alpha value for all remaining palette entries is assumed to be 255. In the common case in which only palette index 0 need be made transparent, only a one-byte tRNS chunk is needed, and when all palette indices are opaque, the tRNS chunk may be omitted.

For colour types 0 or 2, two bytes per sample are used regardless of the image bit depth (see 7.1 Integers and byte order ). Pixels of the specified grey sample value or RGB sample values are treated as transparent (equivalent to alpha value 0); all other pixels are to be treated as fully opaque (alpha value 2^bitdepth - 1). If the image bit depth is less than 16, the least significant bits are used. Encoders should set the other bits to 0, and decoders must mask the other bits to 0 before the value is used.
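
A sketch of the comparison for a greyscale image with bit depth less than 16, masking as required above (names are illustrative):

    # Compare a decoded grey sample against the two-byte tRNS value.
    def is_transparent_grey(sample: int, trns_value: int, bit_depth: int) -> bool:
        mask = (1 << bit_depth) - 1
        return sample == (trns_value & mask)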

A tRNS chunk shall not appear for colour types 4 and 6, since a full alpha channel is already present in those cases.

NOTE For 16-bit greyscale or truecolour data, only pixels matching the entire 16-bit values in tRNS chunks are transparent. Decoders have to postpone any sample depth rescaling until after the pixels have been tested for transparency.

11.3.2 Colour space information

11.3.2.1 cHRM Primary chromaticities and white point

The cHRM chunk may be used to specify the 1931 CIE x,y chromaticities of the red, green, and blue display primaries used in the image, and the reference white point . See C. Gamma and chromaticity for more information. The iCCP and sRGB chunks provide more sophisticated support for colour management and control.

The cHRM chunk contains:

  • White point x: 4 bytes
  • White point y: 4 bytes
  • Red x: 4 bytes
  • Red y: 4 bytes
  • Green x: 4 bytes
  • Green y: 4 bytes
  • Blue x: 4 bytes
  • Blue y: 4 bytes

Each value is encoded as a PNG four-byte unsigned integer , representing the x or y value times 100000.

A value of 0.3127 would be stored as the integer 31270.

The cHRM chunk is allowed in all PNG datastreams, although it is of little value for greyscale images.

An sRGB or iCCP chunk, when present and recognized, overrides the cHRM chunk.

11.3.2.2 gAMA Image gamma

The gAMA chunk specifies a gamma value .

In fact, specifying the desired display output intensity is insufficient. It is also necessary to specify the viewing conditions under which the output is desired. For gAMA these are the reference viewing conditions of the sRGB specification [ SRGB ]. Adjustment for different viewing conditions is normally handled by a Colour Management System. If the adjustment is not performed, the error is usually small. Applications desiring high colour fidelity may wish to use an sRGB or iCCP chunk.

The gAMA chunk contains:

  • Image gamma: 4 bytes

The value is encoded as a PNG four-byte unsigned integer , representing the gamma value times 100000.

A gamma value of 1/2.2 would be stored as the integer 45455.

See 12.1 Encoder gamma handling and 13.13 Decoder gamma handling for more information.

An sRGB or iCCP chunk, when present and recognized, overrides the gAMA chunk.

11.3.2.3 iCCP Embedded ICC profile

The iCCP chunk contains:

  • Profile name: 1-79 bytes (character string)
  • Null separator: 1 byte
  • Compression method: 1 byte
  • Compressed profile: n bytes

The profile name may be any convenient name for referring to the profile. It is case-sensitive. Profile names shall contain only printable Latin-1 characters and spaces (only code points 0x20-7E and 0xA1-FF are allowed). Leading, trailing, and consecutive spaces are not permitted. The only compression method defined in this specification is method 0 ( zlib datastream with deflate compression, see 10.3 Other uses of compression ). The compression method entry is followed by a compressed profile that makes up the remainder of the chunk. Decompression of this datastream yields the embedded ICC profile.

If the iCCP chunk is present, the image samples conform to the colour space represented by the embedded ICC profile as defined by the International Color Consortium [ ICC ][ ISO_15076-1 ]. The colour space of the ICC profile shall be an RGB colour space for colour images ( colour types 2, 3, and 6), or a greyscale colour space for greyscale images ( colour types 0 and 4). A PNG encoder that writes the iCCP chunk is encouraged to also write gAMA and cHRM chunks that approximate the ICC profile, to provide compatibility with applications that do not use the iCCP chunk. When the iCCP chunk is present, PNG decoders that recognize it and are capable of colour management shall ignore the gAMA and cHRM chunks and use the iCCP chunk instead and interpret it according to [ ICC ]. PNG decoders that are used in an environment that is incapable of full-fledged colour management should use the gAMA and cHRM chunks if present.

Unless a cICP chunk exists, a PNG datastream should contain at most one embedded profile, whether specified explicitly with an iCCP or implicitly with an sRGB chunk.

11.3.2.4 sBIT Significant bits

To simplify decoders, PNG specifies that only certain sample depths may be used, and further specifies that sample values should be scaled to the full range of possible values at the sample depth. The sBIT chunk defines the original number of significant bits (which can be less than or equal to the sample depth). This allows PNG decoders to recover the original data losslessly even if the data had a sample depth not directly supported by PNG.

The sBIT chunk contains:

Each depth specified in sBIT shall be greater than zero and less than or equal to the sample depth (which is 8 for indexed-colour images, and the bit depth given in IHDR for other colour types ). Note that sBIT does not provide a sample depth for the alpha channel that is implied by a tRNS chunk; in that case, all of the sample bits of the alpha channel are to be treated as significant. If the sBIT chunk is not present, then all of the sample bits of all channels are to be treated as significant.

11.3.2.5 sRGB Standard RGB colour space

If the sRGB chunk is present, the image samples conform to the sRGB colour space [ SRGB ] and should be displayed using the specified rendering intent defined by the International Color Consortium [ ICC ] or [ ICC-2 ].

The sRGB chunk contains:

  • Rendering intent: 1 byte

The following values are defined for rendering intent:

  • 0: Perceptual
  • 1: Relative colorimetric
  • 2: Saturation
  • 3: Absolute colorimetric

It is recommended that a PNG encoder that writes the sRGB chunk also write a gAMA chunk (and optionally a cHRM chunk) for compatibility with decoders that do not use the sRGB chunk. In that case, only the values corresponding to sRGB shall be used: 45455 for gAMA ; and for cHRM , white point x 31270, white point y 32900, red 64000, 33000, green 30000, 60000, blue 15000, 6000.

When the sRGB chunk is present, it is recommended that decoders that recognize it and are capable of colour management ignore the gAMA and cHRM chunks and use the sRGB chunk instead. Decoders that recognize the sRGB chunk but are not capable of colour management are recommended to ignore the gAMA and cHRM chunks, and use the values given above as if they had appeared in gAMA and cHRM chunks.

It is recommended that the sRGB and iCCP chunks do not appear simultaneously in a PNG datastream.

11.3.2.6 cICP Coding-independent code points for video signal type identification

If present, the cICP chunk specifies the colour space, transfer function, and matrix coefficients of the image using the code points specified in [ ITU-T-H.273 ]. This video format signaling SHOULD be used when processing the image, for example when decoding or rendering it.

The following specifies the syntax of the cICP chunk:

  • Colour Primaries: 1 byte
  • Transfer Function: 1 byte
  • Matrix Coefficients: 1 byte
  • Video Full Range Flag: 1 byte

Each of the fields of the cICP chunk corresponds to the parameter of the same name in [ ITU-T-H.273 ].

Currently RGB is the only colour model supported in PNG, and as such Matrix Coefficients shall be set to 0 .

The Video Full Range Flag value MUST be either 0 or 1 .

The cICP chunk MUST come before the PLTE and IDAT chunks.

When the cICP chunk is present, decoders that recognize it SHALL ignore the following chunks: cHRM , gAMA , iCCP , and sRGB .

11.3.2.7 mDCv Mastering Display Color Volume

If present, the mDCv chunk characterizes the Mastering Display Color Volume (mDCv) used at the point of content creation, as specified in [ SMPTE-ST-2086 ]. The mDCv chunk provides informative static metadata which allows a target (consumer) display to potentially optimize its tone mapping decisions based on a comparison of its inherent capabilities versus the original mastering display's capabilities.

mDCv is typically used with the PQ [ ITU-R-BT.2100 ] transfer function, in which case the combination is commonly called HDR10 (PQ with ST 2086). The mDCv chunk may be included with PQ [ ITU-R-BT.2100 ], SDR (for example [ ITU-R-BT.709 ]), HLG [ ITU-R-BT.2100 ], and other image formats (even though mDCv use in some of them may be less common). Colour Primaries and White Point characteristics can be derived from cICP chunk formats. Specific examples of its most common use-cases for images using both HDR [ ITU-R-BT.2100 ] and SDR [ ITU-R-BT.709 ] are available in [ ITU-T-Series-H-Supplement-19 ].

For SDR (for example [ ITU-R-BT.709 ]) images, if the mDCv display minimum/maximum luminance values are unknown, the default characteristics can be derived from the values in [ ITU-T-Series-H-Supplement-19 ] Table 11.

The following specifies the syntax of the mDCv chunk:

  • Mastering display colour primaries (red x, y; green x, y; blue x, y): 2 bytes each, divisor 0.00002
  • Mastering display white point (x, y): 2 bytes each, divisor 0.00002
  • Mastering display maximum luminance: 4 bytes, divisor 0.0001 cd/m²
  • Mastering display minimum luminance: 4 bytes, divisor 0.0001 cd/m²

The divisor maps from actual value to stored value. For example, the unitless divisor of 0.00002 for the primaries and white point would store the chromaticity (0.6800, 0.3200) as {34000, 16000}.

The mDCv chunk MUST come before the PLTE and IDAT chunks.

Below are mDCv examples for [ SMPTE-ST-2086 ] with SDR (for example [ ITU-R-BT.709 ]) content that may be native or containerized in HDR [ ITU-R-BT.2100 ] content (as described in [ MovieLabs-Recommended-Best-Practice-for-SDR-to-HDR-Conversion ]).

11.3.2.8 cLLi Content Light Level Information

If present, the cLLi chunk identifies two characteristics of HDR content: MaxCLL (Maximum Content Light Level) and MaxFALL (Maximum Frame Average Light Level).

The cLLi chunk adds static metadata which provides an opportunity to optimize tone mapping of the associated content to a specific target display. This is accomplished by tailoring the tone mapping of the content itself to the specific peak brightness capabilities of the target display to prevent clipping. The method of tone-mapping optimization is currently subjective.

MaxFALL (Maximum Frame Average Light Level) uses a static metadata value to indicate the maximum value of the frame average light level (in cd/m², also known as nits) of the entire playback sequence. MaxFALL is calculated by first averaging the decoded luminance values of all the pixels in each frame, and then using the value for the frame with the highest value.

MaxCLL (Maximum Content Light Level) uses a static metadata value to indicate the maximum light level of any single pixel (in cd/m², also known as nits) of the entire playback sequence. There is often an algorithmic filter to eliminate false values occurring from processing or noise that could adversely affect intended downstream tone mapping.
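
As a sketch of the two calculations (with no outlier filtering, which practical encoders may add), given decoded per-pixel luminance values in cd/m2 for each frame:

    # frames: iterable of sequences of per-pixel luminance values (cd/m2).
    def max_fall(frames) -> float:
        return max(sum(frame) / len(frame) for frame in frames)

    def max_cll(frames) -> float:
        return max(max(frame) for frame in frames)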

[ CTA-861.3-A ] describes the method of calculation for generating the cLLi values, but does not specify any filtering. [ HDR-Static-Meta ] describes an improved method which rejects extreme values from statistical outliers, noise or ringing from resampling filters, and is recommended for practical implementations.

[ SMPTE-ST-2067-21 ] Section 7.5 provides additional information for the case where the cLLi values are unknown and have not been calculated.

For animated images, each frame of the animation is analyzed when calculating these values.

A value of zero for either MaxCLL or MaxFALL means that the value is unknown or not currently calculable.

An example where this is not calculable is a live animated PNG stream, where not all frames are available until the stream ends. The encoder may wish to use the value zero initially and replace it with the calculated value when the stream ends.

The following specifies the syntax of the cLLi chunk:

  • Maximum Content Light Level (MaxCLL): 4 bytes, divisor 0.0001 cd/m²
  • Maximum Frame Average Light Level (MaxFALL): 4 bytes, divisor 0.0001 cd/m²

11.3.3 Textual information

PNG provides the tEXt , iTXt , and zTXt chunks for storing text strings associated with the image, such as an image description or copyright notice. Keywords are used to indicate what each text string represents. Any number of such text chunks may appear, and more than one with the same keyword is permitted.

11.3.3.1 Keywords and text strings

The following keywords are predefined and should be used where appropriate.

  • Title: short (one line) title or caption for image
  • Author: name of image's creator
  • Description: description of image (possibly long)
  • Copyright: copyright notice
  • Creation Time: time of original image creation
  • Software: software used to create the image
  • Disclaimer: legal disclaimer
  • Warning: warning of nature of content
  • Source: device used to create the image
  • Comment: miscellaneous comment

Other keywords MAY be defined by any application for private or general interest.

Keywords SHOULD be:

  • reasonably self-explanatory, since the aim is to let other human users understand what the chunk contains; and
  • chosen to minimize the chance that the same keyword is used for incompatible purposes by different applications.

Keywords of general interest SHOULD be listed in [ PNG-EXTENSIONS ].

Keywords shall contain only printable Latin-1 [ ISO_8859-1 ] characters and spaces; that is, only code points 0x20-7E and 0xA1-FF are allowed. To reduce the chances for human misreading of a keyword, leading spaces, trailing spaces, and consecutive spaces are not permitted in keywords, nor is U+00A0 NON-BREAKING SPACE since it is visually indistinguishable from an ordinary space.

Keywords shall be spelled exactly as registered, so that decoders can use simple literal comparisons when looking for particular keywords. In particular, keywords are considered case-sensitive. Keywords are restricted to 1 to 79 bytes in length.

For the Creation Time keyword, the date format SHOULD be the RFC 3339 [ rfc3339 ] date-time format or the date format defined in section 5.2.14 of RFC 1123 [ rfc1123 ], with the RFC 3339 date-time format preferred; beyond these recommendations, the actual format of this field is undefined.

The iTXt chunk uses the UTF-8 encoding [ rfc3629 ] and can be used to convey characters in any language. There is an option to compress text strings in the iTXt chunk. iTXt is recommended for all text strings, as it supports Unicode. There are also tEXt and zTXt chunks, whose content is restricted to the printable Latin-1 character set plus U+000A LINE FEED (LF). Text strings in zTXt are compressed into zlib datastreams using deflate compression (see 10.3 Other uses of compression ).

11.3.3.2 tEXt Textual data

Each tEXt chunk contains a keyword and a text string, in the format:

  • Keyword: 1-79 bytes (character string)
  • Null separator: 1 byte (null character)
  • Text string: 0 or more bytes (character string)

The keyword and text string are separated by a zero byte (null character). Neither the keyword nor the text string may contain a null character. The text string is not null-terminated (the length of the chunk defines the ending). The text string may be of any length from zero bytes up to the maximum permissible chunk size less the length of the keyword and null character separator.

The keyword indicates the type of information represented by the text string as described in 11.3.3.1 Keywords and text strings .

Text is interpreted according to the Latin-1 character set [ ISO_8859-1 ]. The text string may contain any Latin-1 character. Newlines in the text string should be represented by a single linefeed character (decimal 10). Characters other than those defined in Latin-1 plus the linefeed character have no defined meaning in tEXt chunks. Text containing characters outside the repertoire of ISO/IEC 8859-1 should be encoded using the iTXt chunk.
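
A minimal sketch of building a tEXt chunk in Python, reusing the hypothetical png_chunk helper from the cLLi sketch above:

  def text_chunk(keyword: str, text: str) -> bytes:
      # Keyword and text are Latin-1 strings joined by a single null
      # byte; the text itself is not null-terminated.
      return png_chunk(b"tEXt",
                       keyword.encode("latin-1") + b"\x00"
                       + text.encode("latin-1"))

  # Example: text_chunk("Comment", "Scanned at 300 dpi")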

11.3.3.3 zTXt Compressed textual data

The zTXt and tEXt chunks are semantically equivalent, but the zTXt chunk is recommended for storing large blocks of text.

A zTXt chunk contains:

  • Keyword: 1-79 bytes (character string)
  • Null separator: 1 byte (null character)
  • Compression method: 1 byte
  • Compressed text datastream: n bytes

The keyword and null character are the same as in the tEXt chunk (see 11.3.3.2 tEXt Textual data ). The keyword is not compressed. The compression method entry defines the compression method used. The only value defined in this International Standard is 0 ( deflate compression). Other values are reserved for future standardization. The compression method entry is followed by the compressed text datastream that makes up the remainder of the chunk. For compression method 0, this datastream is a zlib datastream with deflate compression (see 10.3 Other uses of compression ). Decompression of this datastream yields Latin-1 text that is identical to the text that would be stored in an equivalent tEXt chunk.
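
The corresponding sketch for zTXt, again reusing the hypothetical png_chunk helper; note that the keyword and the compression method byte are stored uncompressed:

  import zlib

  def ztxt_chunk(keyword: str, text: str) -> bytes:
      # Keyword (uncompressed), null separator, compression method 0,
      # then the Latin-1 text as a zlib/deflate datastream.
      return png_chunk(b"zTXt",
                       keyword.encode("latin-1") + b"\x00" + b"\x00"
                       + zlib.compress(text.encode("latin-1")))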

11.3.3.4 iTXt International textual data

An iTXt chunk contains:

  • Keyword: 1-79 bytes (character string)
  • Null separator: 1 byte (null character)
  • Compression flag: 1 byte
  • Compression method: 1 byte
  • Language tag: 0 or more bytes (character string)
  • Null separator: 1 byte (null character)
  • Translated keyword: 0 or more bytes
  • Null separator: 1 byte (null character)
  • Text: 0 or more bytes

The keyword is described in 11.3.3.1 Keywords and text strings .

The compression flag is 0 for uncompressed text, 1 for compressed text. Only the text field may be compressed. The compression method entry defines the compression method used. The only compression method defined in this specification is 0 ( zlib datastream with deflate compression, see 10.3 Other uses of compression ). For uncompressed text, encoders shall set the compression method to 0, and decoders shall ignore it.

The language tag is a well-formed language tag defined by [ BCP47 ]. Unlike the keyword, the language tag is case-insensitive. Subtags must appear in the IANA language subtag registry. If the language tag is empty, the language is unspecified. Examples of language tags include: en , en-GB , es-419 , zh-Hans , zh-Hans-CN , tlh-Cyrl-AQ , ar-AE-u-nu-latn , and x-private .

The translated keyword and text both use the UTF-8 encoding [ rfc3629 ], and neither shall contain a zero byte (null character). The text, unlike other textual data in this chunk, is not null-terminated; its length is derived from the chunk length.

Line breaks should not appear in the translated keyword. In the text, a newline should be represented by a single linefeed character (hexadecimal 0A). The remaining control characters (01-09, 0B-1F, 7F-9F) are discouraged in both the translated keyword and text. In UTF-8 there is a difference between the characters 80-9F (which are discouraged) and the bytes 80-9F (which are often necessary).

The translated keyword, if not empty, should contain a translation of the keyword into the language indicated by the language tag, and applications displaying the keyword should display the translated keyword in addition.
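
Putting the fields above together, here is a Python sketch of an iTXt builder (png_chunk is the same hypothetical helper as before):

  import zlib

  def itxt_chunk(keyword: str, text: str, language_tag: str = "",
                 translated_keyword: str = "", compress: bool = False) -> bytes:
      # Field order: keyword, null, compression flag, compression
      # method, language tag, null, translated keyword, null, text.
      payload = text.encode("utf-8")
      body = (keyword.encode("latin-1") + b"\x00"
              + (b"\x01" if compress else b"\x00")   # compression flag
              + b"\x00"                              # compression method 0
              + language_tag.encode("ascii") + b"\x00"
              + translated_keyword.encode("utf-8") + b"\x00"
              + (zlib.compress(payload) if compress else payload))
      return png_chunk(b"iTXt", body)

  # Example: itxt_chunk("Title", "Jahreszeiten", "de", "Titel")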

11.3.4 Miscellaneous information

11.3.4.1 bKGD Background colour

The bKGD chunk specifies a default background colour to present the image against. If there is any other preferred background, either user-specified or part of a larger page (as in a browser), the bKGD chunk should be ignored. The bKGD chunk contains:

For colour type 3 ( indexed-colour ), the value is the palette index of the colour to be used as background.

For colour types 0 and 4 ( greyscale , greyscale with alpha ), the value is the grey level to be used as background in the range 0 to (2^bitdepth)-1. For colour types 2 and 6 ( truecolour , truecolour with alpha ), the values are the colour to be used as background, given as RGB samples in the range 0 to (2^bitdepth)-1. In each case, for consistency, two bytes per sample are used regardless of the image bit depth. If the image bit depth is less than 16, the least significant bits are used. Encoders should set the other bits to 0, and decoders must mask the other bits to 0 before the value is used.

11.3.4.2 hIST Image histogram

The hIST chunk contains a series of two-byte unsigned integers:

The hIST chunk gives the approximate usage frequency of each colour in the palette. A histogram chunk can appear only when a PLTE chunk appears. If a viewer is unable to provide all the colours listed in the palette, the histogram may help it decide how to choose a subset of the colours for display.

There shall be exactly one entry for each entry in the PLTE chunk. Each entry is proportional to the fraction of pixels in the image that have that palette index; the exact scale factor is chosen by the encoder.

Histogram entries are approximate, with the exception that a zero entry specifies that the corresponding palette entry is not used at all in the image. A histogram entry shall be nonzero if there are any pixels of that colour.

NOTE When the palette is a suggested quantization of a truecolour image, the histogram is necessarily approximate, since a decoder may map pixels to palette entries differently than the encoder did. In this situation, zero entries should not normally appear, because any entry might be used.

11.3.4.3 pHYs Physical pixel dimensions

The pHYs chunk specifies the intended pixel size or aspect ratio for display of the image. It contains:

  • Pixels per unit, X axis: 4 bytes (PNG four-byte unsigned integer)
  • Pixels per unit, Y axis: 4 bytes (PNG four-byte unsigned integer)
  • Unit specifier: 1 byte

The following values are defined for the unit specifier:

  • 0: unit is unknown
  • 1: unit is the metre

When the unit specifier is 0, the pHYs chunk defines pixel aspect ratio only; the actual size of the pixels remains unspecified.

If the pHYs chunk is not present, pixels are assumed to be square, and the physical size of each pixel is unspecified.

11.3.4.4 sPLT Suggested palette

The sPLT chunk contains:

  • Palette name: 1-79 bytes (character string)
  • Null separator: 1 byte (null character)
  • Sample depth: 1 byte
  • Palette entries: n entries of 6 or 10 bytes each

Each palette entry is six bytes or ten bytes containing five unsigned integers (red, green, blue, alpha, and frequency).

There may be any number of entries. A PNG decoder determines the number of entries from the length of the chunk remaining after the sample depth byte. This shall be divisible by 6 if the sPLT sample depth is 8, or by 10 if the sPLT sample depth is 16. Entries shall appear in decreasing order of frequency. There is no requirement that the entries all be used by the image, nor that they all be different.

The palette name can be any convenient name for referring to the palette (for example "256 colour including Macintosh default", "256 colour including Windows-3.1 default", "Optimal 512"). The palette name may aid the choice of the appropriate suggested palette when more than one appears in a PNG datastream.

The palette name is case-sensitive, and subject to the same restrictions as the keyword parameter for the tEXt chunk. Palette names shall contain only printable Latin-1 characters and spaces (only code points 0x20-7E and 0xA1-FF are allowed). Leading, trailing, and consecutive spaces are not permitted.

The sPLT sample depth shall be 8 or 16.

The red, green, blue, and alpha samples are either one or two bytes each, depending on the sPLT sample depth, regardless of the image bit depth. The colour samples are not premultiplied by alpha, nor are they precomposited against any background. An alpha value of 0 means fully transparent. An alpha value of 255 (when the sPLT sample depth is 8) or 65535 (when the sPLT sample depth is 16) means fully opaque. The sPLT chunk may appear for any colour type . Entries in sPLT use the same gamma value and chromaticity values as the PNG image, but may fall outside the range of values used in the colour space of the PNG image; for example, in a greyscale PNG image, each sPLT entry would typically have equal red, green, and blue values, but this is not required. Similarly, sPLT entries can have non-opaque alpha values even when the PNG image does not use transparency.

Each frequency value is proportional to the fraction of the pixels in the image for which that palette entry is the closest match in RGBA space, before the image has been composited against any background. The exact scale factor is chosen by the PNG encoder; it is recommended that the resulting range of individual values reasonably fills the range 0 to 65535. A PNG encoder may artificially inflate the frequencies for colours considered to be "important", for example the colours used in a logo or the facial features of a portrait. Zero is a valid frequency meaning that the colour is "least important" or that it is rarely, if ever, used. When all the frequencies are zero, they are meaningless, that is to say, nothing may be inferred about the actual frequencies with which the colours appear in the PNG image.

Multiple sPLT chunks are permitted, but each shall have a different palette name.

11.3.4.5 eXIf Exchangeable Image File (Exif) Profile

The data segment of the eXIf chunk contains an Exif profile in the format specified in "4.7.2 Interoperability Structure of APP1 in Compressed Data" of [ CIPA-DC-008 ] except that the JPEG APP1 marker, length, and the "Exif ID code" described in 4.7.2(C), i.e., "Exif", NULL, and padding byte, are not included.

The eXIf chunk size is constrained only by the maximum of 2^31-1 bytes imposed by the PNG specification. Only one eXIf chunk is allowed in a PNG datastream.

The eXIf chunk contains metadata concerning the original image data . If the image has been edited subsequent to creation of the Exif profile, this data might no longer apply to the PNG image data . It is recommended that unless a decoder has independent knowledge of the validity of the Exif data, the data should be considered to be of historical value only. It is beyond the scope of this specification to resolve potential conflicts between data in the eXIf chunk and in other PNG chunks.

11.3.4.5.1 eXIf General Recommendations

While the PNG specification allows the chunk size to be as large as 2^31-1 bytes, application authors should be aware that, if the Exif profile is going to be written to a JPEG [ JPEG ] datastream, the total length of the eXIf chunk data may need to be adjusted to not exceed 2^16-9 bytes, so it can fit into a JPEG APP1 marker (Exif) segment.

11.3.4.5.2 eXIf Recommendations for Decoders

The first two bytes of data are either "II" for little-endian (Intel) or "MM" for big-endian (Motorola) byte order. Decoders should check the first four bytes to ensure that they have the following hexadecimal values:

  • 49 49 2A 00 ("II" followed by the 16-bit value 42, little-endian)
  • 4D 4D 00 2A ("MM" followed by the 16-bit value 42, big-endian)

All other values are reserved for possible future definition.
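
A decoder might express this check along the following lines (a sketch; exif_byte_order is a hypothetical helper name):

  def exif_byte_order(exif: bytes) -> str:
      # The eXIf payload begins with a TIFF header: a byte-order mark
      # followed by the 16-bit value 42 in that byte order.
      if exif[:4] == b"\x49\x49\x2a\x00":   # "II", little-endian
          return "little-endian"
      if exif[:4] == b"\x4d\x4d\x00\x2a":   # "MM", big-endian
          return "big-endian"
      raise ValueError("eXIf data does not start with a valid TIFF header")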

11.3.4.5.3 eXIf Recommendations for Encoders

Image editing applications should consider Paragraph E.3 of the Exif Specification [ CIPA-DC-008 ], which discusses requirements for updating Exif data when the image is changed. Encoders should follow those requirements, but decoders should not assume that it has been accomplished.

While encoders may choose to update them, there is no expectation that any thumbnails present in the Exif profile have (or have not) been updated if the main image was changed.

11.3.5 Time stamp information

11.3.5.1 tIME Image last-modification time

The tIME chunk gives the time of the last image modification ( not the time of initial image creation). It contains:

  • Year: 2 bytes (complete; for example, 1995, not 95)
  • Month: 1 byte (1-12)
  • Day: 1 byte (1-31)
  • Hour: 1 byte (0-23)
  • Minute: 1 byte (0-59)
  • Second: 1 byte (0-60, to allow for leap seconds)

Universal Time (UTC) should be specified rather than local time.

The tIME chunk is intended for use as an automatically-applied time stamp that is updated whenever the image data are changed.

11.3.6 Animation information

11.3.6.1 acTL Animation Control Chunk

The acTL chunk declares that this is an animated PNG image, gives the number of frames, and the number of times to loop. It contains:

  • num_frames: 4 bytes (number of frames)
  • num_plays: 4 bytes (number of times to loop the animation; 0 means play indefinitely)

Each value is encoded as a PNG four-byte unsigned integer .

num_frames indicates the total number of frames in the animation. This must equal the number of fcTL chunks. 0 is not a valid value. 1 is a valid value, for a single-frame PNG. If this value does not equal the actual number of frames it should be treated as an error.

num_plays indicates the number of times that this animation should play; if it is 0, the animation should play indefinitely. If nonzero, the animation should come to rest on the final frame at the end of the last play.

The acTL chunk must appear before the first IDAT chunk within a valid PNG stream.

For Web compatibility, due to the long time between the development and deployment of this chunk and its incorporation into the PNG specification, this chunk name is exceptionally defined as if it were a private chunk.

11.3.6.2 fcTL Frame Control Chunk

The fcTL chunk defines the dimensions, position, delay and disposal of an individual frame. Exactly one fcTL chunk is required for each frame. It contains:

  • sequence_number: 4 bytes (sequence number of the animation chunk)
  • width: 4 bytes (width of the following frame)
  • height: 4 bytes (height of the following frame)
  • x_offset: 4 bytes (x position at which to render the following frame)
  • y_offset: 4 bytes (y position at which to render the following frame)
  • delay_num: 2 bytes (frame delay fraction numerator)
  • delay_den: 2 bytes (frame delay fraction denominator)
  • dispose_op: 1 byte (type of frame area disposal after rendering this frame)
  • blend_op: 1 byte (type of frame area rendering for this frame)

sequence_number defines the sequence number of the animation chunk, starting from 0. It is encoded as a PNG four-byte unsigned integer .

width and height define the width and height of the following frame. They are encoded as PNG four-byte unsigned integers . They must be greater than zero.

x_offset and y_offset define the x and y position of the following frame. They are encoded as PNG four-byte unsigned integers . They must be greater than or equal to zero.

The frame must be rendered within the region defined by x_offset , y_offset , width , and height . This region may not fall outside of the default image; thus x_offset plus width must not be greater than the IHDR width; similarly y_offset plus height must not be greater than the IHDR height.

delay_num and delay_den define the numerator and denominator of the delay fraction, indicating the time to display the current frame, in seconds. If the denominator is 0, it is to be treated as if it were 100 (that is, delay_num then specifies 1/100ths of a second). If the value of the numerator is 0 the decoder should render the next frame as quickly as possible, though viewers may impose a reasonable lower bound. Both are encoded as two-byte unsigned integers.
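
For example, the effective delay can be computed as in this sketch (frame_delay_seconds is a hypothetical helper):

  def frame_delay_seconds(delay_num: int, delay_den: int) -> float:
      # A zero denominator is treated as 100, so delay_num is then in
      # hundredths of a second; a zero numerator means "as fast as
      # possible" (viewers may impose a lower bound).
      return delay_num / (delay_den if delay_den != 0 else 100)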

Frame timings should be independent of the time required for decoding and display of each frame, so that animations will run at the same rate regardless of the performance of the decoder implementation.

dispose_op defines the type of frame area disposal to be done after rendering this frame; in other words, it specifies how the output buffer should be changed at the end of the delay (before rendering the next frame). It is encoded as a one-byte unsigned integer.

Valid values for dispose_op are:

  • 0 APNG_DISPOSE_OP_NONE: no disposal is done on this frame before rendering the next; the contents of the output buffer are left as is.
  • 1 APNG_DISPOSE_OP_BACKGROUND: the frame's region of the output buffer is to be cleared to fully transparent black before rendering the next frame.
  • 2 APNG_DISPOSE_OP_PREVIOUS: the frame's region of the output buffer is to be reverted to the previous contents before rendering the next frame.

If the first fcTL chunk uses a dispose_op of APNG_DISPOSE_OP_PREVIOUS it should be treated as APNG_DISPOSE_OP_BACKGROUND .

blend_op specifies whether the frame is to be alpha blended into the current output buffer content, or whether it should completely replace its region in the output buffer. It is encoded as a one-byte unsigned integer.

Valid values for blend_op are:

  • 0 APNG_BLEND_OP_SOURCE
  • 1 APNG_BLEND_OP_OVER

If blend_op is APNG_BLEND_OP_SOURCE all color components of the frame, including alpha, overwrite the current contents of the frame's output buffer region. If blend_op is APNG_BLEND_OP_OVER the frame should be composited onto the output buffer based on its alpha, using a simple OVER operation as described in Alpha Channel Processing . Note that the second variation of the sample code is applicable.
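
A minimal per-pixel sketch of the OVER operation, for illustration only; it assumes straight (non-premultiplied) alpha with components as floats in 0.0-1.0, whereas real decoders usually operate on integer samples:

  def blend_over(src, dst):
      # Composite one RGBA source pixel over a destination pixel.
      sr, sg, sb, sa = src
      dr, dg, db, da = dst
      out_a = sa + da * (1.0 - sa)
      if out_a == 0.0:
          return (0.0, 0.0, 0.0, 0.0)
      mix = lambda s, d: (s * sa + d * da * (1.0 - sa)) / out_a
      return (mix(sr, dr), mix(sg, dg), mix(sb, db), out_a)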

Note that for the first frame, the two blend modes are functionally equivalent due to the clearing of the output buffer at the beginning of each play.

The fcTL chunk corresponding to the default image, if it exists, has these restrictions:

  • The x_offset and y_offset fields must be 0.
  • The width and height fields must equal the corresponding fields from the IHDR chunk.

As noted earlier, the output buffer must be completely initialized to fully transparent black at the beginning of each play. This is to ensure that each play of the animation will be identical. Decoders are free to avoid an explicit clear step as long as the result is guaranteed to be identical. For example, if the default image is included in the animation, and uses a blend_op of APNG_BLEND_OP_SOURCE , clearing is not necessary because the entire output buffer will be overwritten.

11.3.6.3 fdAT Frame Data Chunk

The fdAT chunk serves the same purpose for animations as the IDAT chunk does for static images; it contains the image data for all frames (or, for animations which include the static image as first frame, for all frames after the first one). It contains:

  • sequence_number: 4 bytes (sequence number of the animation chunk)
  • frame data: n bytes (frame data for this frame)

At least one fdAT chunk is required for each frame, except for the first frame, if that frame is represented by an IDAT chunk.

The compressed datastream is then the concatenation of the contents of the data fields of all the fdAT chunks within a frame. When decompressed, the datastream is the complete pixel data of a PNG image, including the filter byte at the beginning of each scanline, similar to the uncompressed data of all the IDAT chunks. It utilizes the same bit depth, colour type , compression method, filter method , interlace method, and palette (if any) as the static image .
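
For illustration, a decoder might reassemble one frame's pixel data along these lines (frame_pixel_data is a hypothetical helper):

  import struct
  import zlib

  def frame_pixel_data(fdat_bodies):
      # Each fdAT body is a four-byte sequence number followed by
      # frame data; concatenating the data parts in sequence order
      # yields one zlib datastream of filtered scanlines.
      ordered = sorted(fdat_bodies,
                       key=lambda body: struct.unpack(">I", body[:4])[0])
      return zlib.decompress(b"".join(body[4:] for body in ordered))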

Each frame inherits every property specified by any critical or ancillary chunks before the first IDAT chunk in the file, except the width and height, which come from the fcTL chunk.

If the PNG pHYs chunk is present, the APNG images and their x_offset and y_offset values must be scaled in the same way as the main image. Conceptually, such scaling occurs while mapping the output buffer onto the canvas .

12. PNG Encoders

This clause gives requirements and recommendations for encoder behaviour. A PNG encoder shall produce a PNG datastream from a PNG image that conforms to the format specified in the preceding clauses. Best results will usually be achieved by following the additional recommendations given here.

12.1 Encoder gamma handling

See C. Gamma and chromaticity for a brief introduction to gamma issues.

PNG encoders capable of full colour management will perform more sophisticated calculations than those described here and may choose to use the iCCP chunk. If it is known that the image samples conform to the sRGB specification [ SRGB ], encoders are strongly encouraged to write the sRGB chunk without performing additional gamma handling. In both cases it is recommended that an appropriate gAMA chunk be generated for use by PNG decoders that do not recognize the iCCP or sRGB chunks.

A PNG encoder has to determine:

  • what value to write in the gAMA chunk;
  • how to transform the provided image samples into the values to be written in the PNG datastream.

The value to write in the gAMA chunk is that value which causes a PNG decoder to behave in the desired way. See 13.13 Decoder gamma handling .

The transform to be applied depends on the nature of the image samples and their precision. If the samples represent light intensity in floating-point or high precision integer form (perhaps from a computer graphics renderer), the encoder may perform gamma encoding (applying a power function with exponent less than 1) before quantizing the data to integer values for inclusion in the PNG datastream. This results in fewer banding artifacts at a given sample depth, or allows smaller samples while retaining the same visual quality. An intensity level expressed as a floating-point value in the range 0 to 1 can be converted to a datastream image sample by:

integer_sample = floor((2^sampledepth - 1) * intensity^encoding_exponent + 0.5)

If the intensity in the equation is the desired output intensity, the encoding exponent is the gamma value to be used in the gAMA chunk.
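
A direct Python transcription of this equation, for illustration:

  def encode_sample(intensity: float, encoding_exponent: float,
                    sample_depth: int) -> int:
      # integer_sample = floor((2^sampledepth - 1)
      #                        * intensity^encoding_exponent + 0.5)
      return int(((1 << sample_depth) - 1)
                 * intensity ** encoding_exponent + 0.5)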

If the intensity available to the PNG encoder is the original scene intensity, another transformation may be needed. There is sometimes a requirement for the displayed image to have higher contrast than the original source image. This corresponds to an end-to-end transfer function from original scene to display output with an exponent greater than 1. In this case:

gamma = encoding_exponent / end_to_end_exponent

If it is not known whether the conditions under which the original image was captured or calculated warrant such a contrast change, it may be assumed that the display intensities are proportional to original scene intensities, i.e. the end-to-end exponent is 1 and hence:

gamma = encoding_exponent

If the image is being written to a datastream only, the encoder is free to choose the encoding exponent. Choosing a value that causes the gamma value in the gAMA chunk to be 1/2.2 is often a reasonable choice because it minimizes the work for a PNG decoder displaying on a typical video monitor.

Some image renderers may simultaneously write the image to a PNG datastream and display it on-screen. The displayed pixels should be gamma corrected for the display system and viewing conditions in use, so that the user sees a proper representation of the intended scene.

If the renderer wants to write the displayed sample values to the PNG datastream, avoiding a separate gamma encoding step for the datastream, the renderer should approximate the transfer function of the display system by a power function, and write the reciprocal of the exponent into the gAMA chunk. This will allow a PNG decoder to reproduce what was displayed on screen for the originator during rendering.

However, it is equally reasonable for a renderer to compute displayed pixels appropriate for the display device, and to perform separate gamma encoding for data storage and transmission, arranging to have a value in the gAMA chunk more appropriate to the future use of the image.

Computer graphics renderers often do not perform gamma encoding , instead making sample values directly proportional to scene light intensity. If the PNG encoder receives sample values that have already been quantized into integer values, there is no point in doing gamma encoding on them; that would just result in further loss of information. The encoder should just write the sample values to the PNG datastream. This does not imply that the gAMA chunk should contain a gamma value of 1.0 because the desired end-to-end transfer function from scene intensity to display output intensity is not necessarily linear. However, the desired gamma value is probably not far from 1.0. It may depend on whether the scene being rendered is a daylight scene or an indoor scene, etc.

When the sample values come directly from a piece of hardware, the correct gAMA value can, in principle, be inferred from the transfer function of the hardware and lighting conditions of the scene. In the case of video digitizers ("frame grabbers"), the samples are probably in the sRGB colour space, because the sRGB specification was designed to be compatible with modern video standards. Image scanners are less predictable. Their output samples may be proportional to the input light intensity since CCD sensors themselves are linear, or the scanner hardware may have already applied a power function designed to compensate for dot gain in subsequent printing (an exponent of about 0.57), or the scanner may have corrected the samples for display on a monitor. It may be necessary to refer to the scanner's manual or to scan a calibrated target in order to determine the characteristics of a particular scanner. It should be remembered that gamma relates samples to desired display output, not to scanner input.

Datastream format converters generally should not attempt to convert supplied images to a different gamma . The data should be stored in the PNG datastream without conversion, and the gamma value should be deduced from information in the source datastream if possible. Gamma alteration at datastream conversion time causes re-quantization of the set of intensity levels that are represented, introducing further roundoff error with little benefit. It is almost always better to just copy the sample values intact from the input to the output file.

If the source datastream describes the gamma characteristics of the image, a datastream converter is strongly encouraged to write a gAMA chunk. Some datastream formats specify the display exponent (the exponent of the function which maps image samples to display output rather than the other direction). If the source file's gamma value is greater than 1.0, it is probably a display exponent, and the reciprocal of this value should be used for the PNG gamma value . If the source file format records the relationship between image samples and a quantity other than display output, it will be more complex than this to deduce the PNG gamma value .

If a PNG encoder or datastream converter knows that the image has been displayed satisfactorily using a display system whose transfer function can be approximated by a power function with exponent display_exponent , the image can be marked as having the gamma value :

gamma = 1 / display_exponent

It is better to write a gAMA chunk with a value that is approximately correct than to omit the chunk and force PNG decoders to guess an approximate gamma value . If a PNG encoder is unable to infer the gamma value , it is preferable to omit the gAMA chunk. If a guess has to be made this should be left to the PNG decoder.

Gamma does not apply to alpha samples; alpha is always represented linearly.

See also 13.13 Decoder gamma handling .

12.2 Encoder colour handling

See C. Gamma and chromaticity for references to colour issues.

PNG encoders capable of full colour management will perform more sophisticated calculations than those described here and may choose to use the iCCP chunk. If it is known that the image samples conform to the sRGB specification [ SRGB ], PNG encoders are strongly encouraged to use the sRGB chunk.

If it is possible for the encoder to determine the chromaticities of the source display primaries, or to make a strong guess based on the origin of the image, or the hardware running it, the encoder is strongly encouraged to output the cHRM chunk. If this is done, the gAMA chunk should also be written; decoders can do little with a cHRM chunk if the gAMA chunk is missing.

There are a number of recommendations and standards for primaries and white points , some of which are linked to particular technologies, for example the CCIR 709 standard [ ITU-R-BT.709 ] and the SMPTE-C standard [ SMPTE-170M ].

There are three cases that need to be considered:

  • the encoder is part of the generation system;
  • the source image is captured by a camera or scanner;
  • the PNG datastream was generated by translation from some other format.

In the case of hand-drawn or digitally edited images, it is necessary to determine what monitor they were viewed on when being produced. Many image editing programs allow the type of monitor being used to be specified. This is often because they are working in some device-independent space internally. Such programs have enough information to write valid cHRM and gAMA chunks, and are strongly encouraged to do so automatically.

If the encoder is compiled as a portion of a computer image renderer that performs full-spectral rendering, the monitor values that were used to convert from the internal device-independent colour space to RGB should be written into the cHRM chunk. Any colours that are outside the gamut of the chosen RGB device should be mapped to be within the gamut; PNG does not store out-of-gamut colours.

If the computer image renderer performs calculations directly in device-dependent RGB space, a cHRM chunk should not be written unless the scene description and rendering parameters have been adjusted for a particular monitor. In that case, the data for that monitor should be used to construct a cHRM chunk.

A few image formats store calibration information, which can be used to fill in the cHRM chunk. For example, TIFF 6.0 files [ TIFF-6.0 ] can optionally store calibration information, which if present should be used to construct the cHRM chunk.

Video created with recent video equipment probably uses the CCIR 709 primaries and D65 white point [ ITU-R-BT.709 ], which are given in Table 28 .

An older but still very popular video standard is SMPTE-C [ SMPTE-170M ] given in Table 29 .

It is not recommended that datastream format converters attempt to convert supplied images to a different RGB colour space. The data should be stored in the PNG datastream without conversion, and the source primary chromaticities should be recorded if they are known. Colour space transformation at datastream conversion time is a bad idea because of gamut mismatches and rounding errors. As with gamma conversions, it is better to store the data losslessly and incur at most one conversion when the image is finally displayed.

See 13.14 Decoder colour handling .

12.3 Alpha channel creation

The alpha channel can be regarded either as a mask that temporarily hides transparent parts of the image, or as a means for constructing a non-rectangular image. In the first case, the colour values of fully transparent pixels should be preserved for future use. In the second case, the transparent pixels carry no useful data and are simply there to fill out the rectangular image area required by PNG. In this case, fully transparent pixels should all be assigned the same colour value for best compression.

Image authors should keep in mind the possibility that a decoder will not support transparency control in full (see 13.16 Alpha channel processing ). Hence, the colours assigned to transparent pixels should be reasonable background colours whenever feasible.

For applications that do not require a full alpha channel, or cannot afford the price in compression efficiency, the tRNS transparency chunk is also available.

If the image has a known background colour, this colour should be written in the bKGD chunk. Even decoders that ignore transparency may use the bKGD colour to fill unused screen area.

If the original image has premultiplied (also called "associated") alpha data, it can be converted to PNG's non-premultiplied format by dividing each sample value by the corresponding alpha value, then multiplying by the maximum value for the image bit depth, and rounding to the nearest integer. In valid premultiplied data, the sample values never exceed their corresponding alpha values, so the result of the division should always be in the range 0 to 1. If the alpha value is zero, output black (zeroes).

12.4 Sample depth scaling

When encoding input samples that have a sample depth that cannot be directly represented in PNG, the encoder shall scale the samples up to a sample depth that is allowed by PNG. The most accurate scaling method is the linear equation:

output = floor((input * MAXOUTSAMPLE / MAXINSAMPLE) + 0.5)

where the input samples range from 0 to MAXINSAMPLE and the outputs range from 0 to MAXOUTSAMPLE (which is 2^sampledepth - 1).

A close approximation to the linear scaling method is achieved by "left bit replication", which is shifting the valid bits to begin in the most significant bit and repeating the most significant bits into the open bits. This method is often faster to compute than linear scaling.

Assume that 5-bit samples are being scaled up to 8 bits. If the source sample value is 27 (in the range from 0-31), then the original bits are:

11011

Left bit replication gives a value of 222:

11011110

which matches the value computed by the linear equation. Left bit replication usually gives the same value as linear scaling, and is never off by more than one.
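
A generic sketch of left bit replication for arbitrary depths (left_bit_replicate is a hypothetical helper):

  def left_bit_replicate(value: int, in_depth: int, out_depth: int) -> int:
      # Shift the valid bits into the most significant positions, then
      # repeat them downward until every output bit is filled.
      result = value << (out_depth - in_depth)
      shift = in_depth
      while shift < out_depth:
          result |= result >> shift
          shift <<= 1
      return result & ((1 << out_depth) - 1)

  # left_bit_replicate(0b11011, 5, 8) == 0b11011110 == 222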

A distinctly less accurate approximation is obtained by simply left-shifting the input value and filling the low order bits with zeroes. This scheme cannot reproduce white exactly, since it does not generate an all-ones maximum value; the net effect is to darken the image slightly. This method is not recommended in general, but it does have the effect of improving compression, particularly when dealing with greater-than-8-bit sample depths. Since the relative error introduced by zero-fill scaling is small at high sample depths, some encoders may choose to use it. Zero-fill shall not be used for alpha channel data, however, since many decoders will treat alpha values of all zeroes and all ones as special cases. It is important to represent both those values exactly in the scaled data.

When the encoder writes an sBIT chunk, it is required to do the scaling in such a way that the high-order bits of the stored samples match the original data. That is, if the sBIT chunk specifies a sample depth of S, the high-order S bits of the stored data shall agree with the original S-bit data values. This allows decoders to recover the original data by shifting right. The added low-order bits are not constrained. All the above scaling methods meet this restriction.

When scaling up source image data , it is recommended that the low-order bits be filled consistently for all samples; that is, the same source value should generate the same sample value at any pixel position. This improves compression by reducing the number of distinct sample values. This is not a mandatory requirement, and some encoders may choose not to follow it. For example, an encoder might instead dither the low-order bits, improving displayed image quality at the price of increasing file size.

In some applications the original source data may have a range that is not a power of 2. The linear scaling equation still works for this case, although the shifting methods do not. It is recommended that an sBIT chunk not be written for such images, since sBIT suggests that the original data range was exactly 0..2^S-1.

12.5 Suggested palettes

Suggested palettes may appear as sPLT chunks in any PNG datastream, or as a PLTE chunk in truecolour PNG datastreams. In either case, the suggested palette is not an essential part of the image data , but it may be used to present the image on indexed-colour display hardware. Suggested palettes are of no interest to viewers running on truecolour hardware.

When an sPLT chunk is used to provide a suggested palette, it is recommended that the encoder use the frequency fields to indicate the relative importance of the palette entries, rather than leave them all zero (meaning undefined). The frequency values are most easily computed as "nearest neighbour" counts, that is, the approximate usage of each RGBA palette entry if no dithering is applied. (These counts will often be available "for free" as a consequence of developing the suggested palette.) Because the suggested palette includes transparency information, it should be computed for the un- composited image.

Even for indexed-colour images, sPLT can be used to define alternative reduced palettes for viewers that are unable to display all the colours present in the PLTE chunk. If the PLTE chunk appears without the bKGD chunk in an image of colour type 6, the circumstances under which the palette was computed are unspecified.

An older method for including a suggested palette in a truecolour PNG datastream uses the PLTE chunk. If this method is used, the histogram (frequencies) should appear in a separate hIST chunk. The PLTE chunk does not include transparency information. Hence for images of colour type 6 ( truecolour with alpha ), it is recommended that a bKGD chunk appear and that the palette and histogram be computed with reference to the image as it would appear after compositing against the specified background colour. This definition is necessary to ensure that useful palette entries are generated for pixels having fractional alpha values. The resulting palette will probably be useful only to viewers that present the image against the same background colour. It is recommended that PNG editors delete or recompute the palette if they alter or remove the bKGD chunk in an image of colour type 6.

For images of colour type 2 ( truecolour ), it is recommended that the PLTE and hIST chunks be computed with reference to the RGB data only, ignoring any transparent-colour specification. If the datastream uses transparency (has a tRNS chunk), viewers can easily adapt the resulting palette for use with their intended background colour (see 13.17 Histogram and suggested palette usage ).

For providing suggested palettes, the sPLT chunk is more flexible than the PLTE chunk in the following ways:

  • With sPLT multiple suggested palettes may be provided. A PNG decoder may choose an appropriate palette based on name or number of entries.
  • In a PNG datastream of colour type 6 ( truecolour with alpha channel), the PLTE chunk represents a palette already composited against the bKGD colour, so it is useful only for display against that background colour. The sPLT chunk provides an un- composited palette, which is useful for display against backgrounds chosen by the PNG decoder.
  • Since the sPLT chunk is an ancillary chunk, a PNG editor may add or modify suggested palettes without being forced to discard unknown unsafe-to-copy chunks.
  • Whereas the sPLT chunk is allowed in PNG datastreams for colour types 0, 3, and 4 ( greyscale and indexed-colour ), the PLTE chunk cannot be used to provide reduced palettes in these cases.
  • More than 256 entries may appear in the sPLT chunk.

A PNG encoder that uses the sPLT chunk may choose to write a suggested palette represented by PLTE and hIST chunks as well, for compatibility with decoders that do not recognize the sPLT chunk.

12.6 Interlacing

This specification defines two interlace methods, one of which is no interlacing. Interlacing provides a convenient basis from which decoders can progressively display an image, as described in 13.10 Interlacing and progressive display .

12.7 Filter selection

For images of colour type 3 (indexed-colour), filter type 0 (None) is usually the most effective. Colour images with 256 or fewer colours should almost always be stored in indexed-colour format; truecolour format is likely to be much larger.

Filter type 0 is also recommended for images of bit depths less than 8. For low-bit-depth greyscale images, in rare cases, better compression may be obtained by first expanding the image to 8-bit representation and then applying filtering.

For truecolour and greyscale images, any of the five filters may prove the most effective. If an encoder uses a fixed filter, the Paeth filter type is most likely to be the best.

For best compression of truecolour and greyscale images, and if compression efficiency is valued over speed of compression, the recommended approach is adaptive filtering in which a filter type is chosen for each scanline. Each unique image will have a different set of filters which perform best for it. An encoder could try every combination of filters to find what compresses best for a given image. However, when an exhaustive search is unacceptable, here are some general heuristics which may perform well enough: compute the output scanline using all five filters, and select the filter that gives the smallest sum of absolute values of outputs. (Consider the output bytes as signed differences for this test.) This method usually outperforms any single fixed filter type choice.
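
The following sketch illustrates that heuristic: it filters one scanline with all five filter types and keeps the cheapest result (choose_filter and filter_scanline are hypothetical helpers, and the loop is written for clarity rather than speed):

  def paeth_predictor(a: int, b: int, c: int) -> int:
      p = a + b - c
      pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
      if pa <= pb and pa <= pc:
          return a
      return b if pb <= pc else c

  def filter_scanline(ftype: int, cur: bytes, prev: bytes, bpp: int) -> bytes:
      # Apply filter `ftype` to scanline `cur`; `prev` is the prior
      # scanline (all zeroes for the first one) and `bpp` is the
      # number of bytes per complete pixel.
      out = bytearray(len(cur))
      for i in range(len(cur)):
          a = cur[i - bpp] if i >= bpp else 0    # left
          b = prev[i]                            # up
          c = prev[i - bpp] if i >= bpp else 0   # upper left
          x = cur[i]
          if ftype == 0:   out[i] = x                              # None
          elif ftype == 1: out[i] = (x - a) & 0xFF                 # Sub
          elif ftype == 2: out[i] = (x - b) & 0xFF                 # Up
          elif ftype == 3: out[i] = (x - (a + b) // 2) & 0xFF      # Average
          else:            out[i] = (x - paeth_predictor(a, b, c)) & 0xFF
      return bytes(out)

  def choose_filter(cur: bytes, prev: bytes, bpp: int):
      # Minimum sum of absolute values, with output bytes read as
      # signed differences; the chosen filter type byte precedes the
      # filtered bytes in the datastream.
      cost = lambda data: sum(v if v < 128 else 256 - v for v in data)
      options = [(f, filter_scanline(f, cur, prev, bpp)) for f in range(5)]
      return min(options, key=lambda fc: cost(fc[1]))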

Filtering according to these recommendations is effective in conjunction with either of the two interlace methods defined in this specification.

12.8 Compression

The encoder may divide the compressed datastream into IDAT chunks however it wishes. (Multiple IDAT chunks are allowed so that encoders may work in a fixed amount of memory; typically the chunk size will correspond to the encoder's buffer size.) A PNG datastream in which each IDAT chunk contains only one data byte is valid, though remarkably wasteful of space. (Zero-length IDAT chunks are also valid, though even more wasteful.)

12.9 Text chunk processing

A nonempty keyword shall be provided for each text chunk. The generic keyword "Comment" can be used if no better description of the text is available. If a user-supplied keyword is used, encoders should check that it meets the restrictions on keywords.

The iTXt chunk uses the UTF-8 encoding of Unicode and thus can store text in any language. The tEXt and zTXt chunks use the Latin-1 (ISO 8859-1) character encoding, which limits the range of characters that can be used in these chunks. Encoders should prefer iTXt to tEXt and zTXt chunks, in order to allow a wide range of characters without data loss. Encoders must convert characters that use local legacy character encodings to the appropriate encoding when storing text.

When creating iTXt chunks, encoders should follow UTF-8 encode in Encoding Standard .

Encoders should avoid creating single lines of text longer than 79 Unicode code points , in order to facilitate easy reading. It is recommended that text items less than 1024 bytes in size should be output using uncompressed text chunks. It is recommended that the basic title and author keywords be output using uncompressed text chunks. Placing large text chunks after the image data (after the IDAT chunks) can speed up image display in some situations, as the decoder will decode the image data first. It is recommended that small text chunks, such as the image title, appear before the IDAT chunks.

12.10 Chunking

12.10.1 Use of private chunks

Encoders MAY use private chunks to carry information that need not be understood by other applications.

12.10.2 Use of non-reserved field values

Encoders MAY use non-reserved field values for experimental or private use.

12.10.3 Ancillary chunks

All ancillary chunks are optional; encoders need not write them. However, encoders are encouraged to write the standard ancillary chunks when the information is available.

13. PNG decoders and viewers

This clause gives some requirements and recommendations for PNG decoder behaviour and viewer behaviour. A viewer presents the decoded PNG image to the user. Since viewer and decoder behaviour are closely connected, decoders and viewers are treated together here. The only absolute requirement on a PNG decoder is that it successfully reads any datastream conforming to the format specified in the preceding chapters. However, best results will usually be achieved by following these additional recommendations.

PNG decoders shall support all valid combinations of bit depth, colour type , compression method, filter method , and interlace method that are explicitly defined in this International Standard.

13.1 Error handling

Errors in a PNG datastream will fall into two general classes, transmission errors and syntax errors (see 4.10 Error handling ).

Examples of transmission errors are transmission in "text" or "ascii" mode, in which byte codes 13 and/or 10 may be added, removed, or converted throughout the datastream; unexpected termination, in which the datastream is truncated; or a physical error on a storage device, in which one or more blocks (typically 512 bytes each) will have garbled or random values. Some examples of syntax errors are an invalid value for a row filter, an invalid compression method, an invalid chunk length, the absence of a PLTE chunk before the first IDAT chunk in an indexed image, or the presence of multiple gAMA chunks. A PNG decoder should handle errors as follows:

  • Detect errors as early as possible using the PNG signature bytes and CRCs on each chunk. Decoders should verify that all eight bytes of the PNG signature are correct. A decoder can have additional confidence in the datastream's integrity if the next eight bytes begin an IHDR chunk with the correct chunk length. A CRC should be checked before processing the chunk data. Sometimes this is impractical, for example when a streaming PNG decoder is processing a large IDAT chunk. In this case the CRC should be checked when the end of the chunk is reached.
  • Recover from an error, if possible; otherwise fail gracefully. Errors that have little or no effect on the processing of the image may be ignored, while those that affect critical data shall be dealt with in a manner appropriate to the application.
  • Provide helpful messages describing errors, including recoverable errors.

Three classes of PNG chunks are relevant to this philosophy. For the purposes of this classification, an "unknown chunk" is either one whose type was genuinely unknown to the decoder's author, or one that the author chose to treat as unknown, because default handling of that chunk type would be sufficient for the program's purposes. Other chunks are called "known chunks". Given this definition, the three classes are as follows:

  • known chunks, which necessarily includes all of the critical chunks defined in this specification ( IHDR , PLTE , IDAT , IEND )
  • unknown critical chunks (bit 5 of the first byte of the chunk type is 0)
  • unknown ancillary chunks (bit 5 of the first byte of the chunk type is 1)

See 5.4 Chunk naming conventions for a description of chunk naming conventions.

PNG chunk types are marked "critical" or "ancillary" according to whether the chunks are critical for the purpose of extracting a viewable image (as with IHDR , PLTE , and IDAT ) or critical to understanding the datastream structure (as with IEND ). This is a specific kind of criticality and one that is not necessarily relevant to every conceivable decoder. For example, a program whose sole purpose is to extract text annotations (for example, copyright information) does not require a viewable image but should decode UTF-8 correctly. Another decoder might consider the tRNS and gAMA chunks essential to its proper execution.

Syntax errors always involve known chunks because syntax errors in unknown chunks cannot be detected. The PNG decoder has to determine whether a syntax error is fatal (unrecoverable) or not, depending on its requirements and the situation. For example, most decoders can ignore an invalid IEND chunk; a text-extraction program can ignore the absence of IDAT ; an image viewer cannot recover from an empty PLTE chunk in an indexed image but it can ignore an invalid PLTE chunk in a truecolour image; and a program that extracts the alpha channel can ignore an invalid gAMA chunk, but may consider the presence of two tRNS chunks to be a fatal error. Anomalous situations other than syntax errors shall be treated as follows:

  • Encountering an unknown ancillary chunk is never an error. The chunk can simply be ignored.
  • Encountering an unknown critical chunk is a fatal condition for any decoder trying to extract the image from the datastream. A decoder that ignored a critical chunk could not know whether the image it extracted was the one intended by the encoder.
  • A PNG signature mismatch, a CRC mismatch, or an unexpected end-of-stream indicates a corrupted datastream, and may be regarded as a fatal error. A decoder could try to salvage something from the datastream, but the extent of the damage will not be known.

When a fatal condition occurs, the decoder should fail immediately, signal an error to the user if appropriate, and optionally continue displaying any image data already visible to the user (i.e. "fail gracefully"). The application as a whole need not terminate.

When a non-fatal error occurs, the decoder should signal a warning to the user if appropriate, recover from the error, and continue processing normally.

When decoding an indexed-colour PNG, if out-of-range palette indexes are encountered, decoders have historically varied in their handling of this error. Displaying the pixel as opaque black is one common error-recovery tactic, and is now required by this specification. Older implementations will vary, and so the behaviour must not be relied on by encoders.

Decoders that do not compute CRCs should interpret apparent syntax errors as indications of corruption (see also 13.2 Error checking ).

Errors in compressed chunks ( IDAT , zTXt , iTXt , iCCP ) could lead to buffer overruns. Implementors of deflate decompressors should guard against this possibility.

APNG is designed to allow incremental display of frames before the entire datastream has been read. This implies that some errors may not be detected until partway through the animation. It is strongly recommended that when any error is encountered decoders should discard all subsequent frames, stop the animation, and revert to displaying the static image. A decoder which detects an error before the animation has started should display the static image. An error message may be displayed to the user if appropriate.

Decoders shall treat out-of-order APNG chunks as an error. APNG -aware PNG editors should restore them to correct order, using the sequence numbers.

13.2 Error checking

The PNG error handling philosophy is described in 13.1 Error handling .

An unknown chunk type is not to be treated as an error unless it is a critical chunk.

The chunk type can be checked for plausibility by seeing whether all four bytes are in the range codes 41-5A and 61-7A (hexadecimal); note that this need be done only for unrecognized chunk types. If the total datastream size is known (from file system information, HTTP protocol, etc), the chunk length can be checked for plausibility as well. If CRCs are not checked, dropped/added data bytes or an erroneous chunk length can cause the decoder to get out of step and misinterpret subsequent data as a chunk header.

For known-length chunks, such as IHDR , decoders should treat an unexpected chunk length as an error. Future extensions to this specification will not add new fields to existing chunks; instead, new chunk types will be added to carry new information.

Unexpected values in fields of known chunks (for example, an unexpected compression method in the IHDR chunk) shall be checked for and treated as errors. However, it is recommended that unexpected field values be treated as fatal errors only in critical chunks. An unexpected value in an ancillary chunk can be handled by ignoring the whole chunk as though it were an unknown chunk type. (This recommendation assumes that the chunk's CRC has been verified. In decoders that do not check CRCs, it is safer to treat any unexpected value as indicating a corrupted datastream.)

Standard PNG images shall be compressed with compression method 0. The compression method field of the IHDR chunk is provided for possible future standardization or proprietary variants. Decoders shall check this byte and report an error if it holds an unrecognized code. See 10. Compression for details.

13.3 Security considerations

A PNG datastream is composed of a collection of explicitly typed chunks. Chunks whose contents are defined by the specification could actually contain anything, including malicious code. Similarly there could be data after the IEND chunk which could contain anything, including malicious code. There is no known risk that such malicious code could be executed on the recipient's computer as a result of decoding the PNG image . However, a malicious application might hide such code inside an innocent-looking image file and then execute it.

The possible security risks associated with future chunk types cannot be specified at this time. Security issues will be considered when defining future public chunks. There is no additional security risk associated with unknown or unimplemented chunk types, because such chunks will be ignored, or at most be copied into another PNG datastream.

The iTXt , tEXt , and zTXt chunks contain keywords and data that are meant to be displayed as plain text. The iCCP and sPLT chunks contain keywords that are meant to be displayed as plain text. It is possible that if the decoder displays such text without filtering out control characters, especially the ESC (escape) character, certain systems or terminals could behave in undesirable and insecure ways. It is recommended that decoders filter out control characters to avoid this risk; see 13.7 Text chunk processing .

Every chunk begins with a length field, which makes it easier to write decoders that are invulnerable to fraudulent chunks that attempt to overflow buffers. The CRC at the end of every chunk provides a robust defence against accidentally corrupted data. The PNG signature bytes provide early detection of common file transmission errors.

A decoder that fails to check CRCs could be subject to data corruption. The only likely consequence of such corruption is incorrectly displayed pixels within the image. Worse things might happen if the CRC of the IHDR chunk is not checked and the width or height fields are corrupted. See 13.2 Error checking .

A poorly written decoder might be subject to buffer overflow, because chunks can be extremely large, up to 2^31-1 bytes long. But properly written decoders will handle large chunks without difficulty.

13.4 Privacy considerations

Some image editing tools have historically performed redaction by merely setting the alpha channel of the redacted area to zero, without also removing the actual image data. Users who rely solely on the visual appearance of such images run a privacy risk because the actual image data can be easily recovered.

Similarly, some image editing tools have historically performed clipping by rewriting the width and height in IHDR without re-encoding the image data, which thus extends beyond the new width and height and may be recovered.

Images with eXIf chunks may contain automatically-included data, such as photographic GPS coordinates, which could be a privacy risk if the user is unaware that the PNG image contains this data. (Other image formats that contain EXIF, such as JPEG/JFIF, have the same privacy risk).

13.5 Chunking

Decoders shall recognize chunk types by a simple four-byte literal comparison; it is incorrect to perform case conversion on chunk types. A decoder encountering an unknown chunk in which the ancillary bit is 1 may safely ignore the chunk and proceed to display the image. A decoder trying to extract the image, upon encountering an unknown chunk in which the ancillary bit is 0, indicating a critical chunk, shall indicate to the user that the image contains information it cannot safely interpret.

Decoders should test the properties of an unknown chunk type by numerically testing the specified bits. Testing whether a character is uppercase or lowercase is inefficient, and even incorrect if a locale-specific case definition is used.

Decoders should not flag an error if the reserved bit is set to 1, however, as some future version of the PNG specification could define a meaning for this bit. It is sufficient to treat a chunk with this bit set in the same way as any other unknown chunk type.

Decoders do not need to test the chunk type private bit, since it has no functional significance and is used to avoid conflicts between chunks defined by W3C and those defined privately.

All ancillary chunks are optional; decoders may ignore them. However, decoders are encouraged to interpret these chunks when appropriate and feasible.

13.6 Pixel dimensions

Non-square pixels can be represented (see 11.3.4.3 pHYs Physical pixel dimensions ), but viewers are not required to account for them; a viewer can present any PNG datastream as though its pixels are square.

Where the pixel aspect ratio of the display differs from the aspect ratio of the physical pixel dimensions defined in the PNG datastream, viewers are strongly encouraged to rescale images for proper display.

When the pHYs chunk has a unit specifier of 0 (unit is unknown), the behaviour of a decoder may depend on the ratio of the two pixels-per-unit values, but should not depend on their magnitudes. For example, a pHYs chunk containing (ppuX, ppuY, unit) = (2, 1, 0) is equivalent to one containing (1000, 500, 0) ; both are equally valid indications that the image pixels are twice as tall as they are wide.

One reasonable way for viewers to handle a difference between the pixel aspect ratios of the image and the display is to expand the image either horizontally or vertically, but not both. The scale factors could be obtained using floating-point calculations along the lines of the following sketch.
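
This Python sketch assumes the viewer always expands (never shrinks) along exactly one axis; aspect_scale_factors is a hypothetical helper:

  def aspect_scale_factors(ppu_x: int, ppu_y: int):
      # Returns (scale_x, scale_y), expanding exactly one axis.
      if ppu_x > ppu_y:
          # Pixels are taller than they are wide: stretch vertically.
          return 1.0, ppu_x / ppu_y
      # Pixels are wider than they are tall (or square): stretch
      # horizontally.
      return ppu_y / ppu_x, 1.0

  # For the (2, 1, 0) example above: (1.0, 2.0), i.e. double the height.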

Because other methods such as maintaining the image area are also reasonable, and because ignoring the pHYs chunk is permissible, authors should not assume that all viewing applications will use this scaling method.

As well as making corrections for pixel aspect ratio, a viewer may have reasons to perform additional scaling both horizontally and vertically. For example, a viewer might want to shrink an image that is too large to fit on the display, or to expand images sent to a high-resolution printer so that they appear the same size as they did on the display.

13.7 Text chunk processing

If practical, PNG decoders should have a way to display to the user all the iTXt , tEXt , and zTXt chunks found in the datastream. Even if the decoder does not recognize a particular text keyword, the user might be able to understand it.

When processing tEXt and zTXt chunks, decoders could encounter characters other than those permitted. Some can be safely displayed (e.g., TAB, FF, and CR, hexadecimal 09, 0C, and 0D, respectively), but others, especially the ESC character (hexadecimal 1B), could pose a security hazard (because unexpected actions may be taken by display hardware or software). Decoders should not attempt to directly display any non-Latin-1 characters (except for newline and perhaps TAB, FF, CR) encountered in a tEXt or zTXt chunk. Instead, they should be ignored or displayed in a visible notation such as " \nnn ". See 13.3 Security considerations .
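
For instance, a decoder might filter each byte of a tEXt or zTXt chunk before display; a minimal sketch:

#include <stdio.h>

/* Pass printable Latin-1 characters and a few harmless controls through;
   show anything else (including ESC) in the visible \nnn notation. */
void put_text_byte(unsigned char c)
{
    if ((c >= 0x20 && c <= 0x7E) || c >= 0xA1 ||
        c == '\n' || c == '\t' || c == '\f' || c == '\r')
        putchar(c);
    else
        printf("\\%03o", c);   /* e.g. ESC (0x1B) is shown as \033 */
}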

When processing iTXt chunks, decoders should follow the UTF-8 decode algorithm in the Encoding Standard.

Even though encoders are recommended to represent newlines as linefeed (hexadecimal 0A), it is recommended that decoders not rely on this; it is best to recognize all the common newline combinations (CR, LF, and CR-LF) and display each as a single newline. TAB can be expanded to the proper number of spaces needed to arrive at a column multiple of 8.

Decoders running on systems with a non-Latin-1 legacy character encoding should remap character codes so that Latin-1 characters are displayed correctly. Unsupported characters should be replaced with a system-appropriate replacement character (such as U+FFFD REPLACEMENT CHARACTER, U+003F QUESTION MARK, or U+001A SUB) or mapped to a visible notation such as " \nnn ". Characters should be displayed only if they are printable characters on the decoding system. Some byte values may be interpreted by the decoding system as control characters; for security, decoders running on such systems should not display these control characters.

Decoders should be prepared to display text chunks that contain any number of printing characters between newline characters, even though it is recommended that encoders avoid creating lines in excess of 79 characters.

13.8 Decompression

The compression technique used in this specification does not require the entire compressed datastream to be available before decompression can start. Display can therefore commence before the entire decompressed datastream is available. It is extremely unlikely that any general-purpose compression method in a future version of this specification will lack this property.

It is important to emphasize that IDAT chunk boundaries have no semantic significance and can occur at any point in the compressed datastream. There is no required correlation between the structure of the image data (for example, scanline boundaries) and deflate block boundaries or IDAT chunk boundaries. The complete image data is represented by a single zlib datastream that is stored in some number of IDAT chunks; a decoder that assumes any more than this is incorrect. Some encoder implementations may emit datastreams in which some of these structures are indeed related, but decoders cannot rely on this.
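
As an illustration, a decoder built on zlib (an assumed implementation choice; next_idat() is a hypothetical helper that yields each IDAT payload in order) feeds every chunk into a single z_stream:

#include <stddef.h>
#include <zlib.h>

int next_idat(unsigned char **data, size_t *len);  /* assumed helper */

/* Inflate the concatenation of all IDAT chunk data as one zlib stream. */
int inflate_image_data(unsigned char *out, size_t outlen)
{
    z_stream zs = {0};
    int status = inflateInit(&zs);
    if (status != Z_OK)
        return status;

    zs.next_out  = out;
    zs.avail_out = (uInt)outlen;

    unsigned char *chunk;
    size_t chunklen;
    while (status != Z_STREAM_END && next_idat(&chunk, &chunklen)) {
        zs.next_in  = chunk;            /* chunk boundaries are arbitrary */
        zs.avail_in = (uInt)chunklen;
        status = inflate(&zs, Z_NO_FLUSH);
        if (status != Z_OK && status != Z_STREAM_END)
            break;
    }
    inflateEnd(&zs);
    return status == Z_STREAM_END ? Z_OK : status;
}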

13.9 Filtering

To reverse the effect of a filter, the decoder may need to use the decoded values of the prior pixel on the same line, the pixel immediately above the current pixel on the prior line, and the pixel just to the left of the pixel above. This implies that at least one scanline's worth of image data needs to be stored by the decoder at all times. Even though some filter types do not refer to the prior scanline, the decoder will always need to store each scanline as it is decoded, since the next scanline might use a filter type that refers to it. See 7.3 Filtering .

13.10 Interlacing and progressive display

Decoders are required to be able to read interlaced images. If the reference image contains fewer than five columns or fewer than five rows, some passes will be empty. Encoders and decoders shall handle this case correctly. In particular, filter type bytes are associated only with nonempty scanlines; no filter type bytes are present in an empty reduced image.

When receiving images over slow transmission links, viewers can improve perceived performance by displaying interlaced images progressively. This means that as each reduced image is received, an approximation to the complete image is displayed based on the data received so far. One simple yet pleasing effect can be obtained by expanding each received pixel to fill a rectangle covering the yet-to-be-transmitted pixel positions below and to the right of the received pixel. This process can be described by the following ISO C code [ ISO_9899 ]:
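
/* Adam7 interlace pattern; height and width are the image dimensions,
   and min(a,b) is assumed to return the smaller of its arguments. */
int starting_row[7]  = { 0, 0, 4, 0, 2, 0, 1 };
int starting_col[7]  = { 0, 4, 0, 2, 0, 1, 0 };
int row_increment[7] = { 8, 8, 8, 4, 4, 2, 2 };
int col_increment[7] = { 8, 8, 4, 4, 2, 2, 1 };
int block_height[7]  = { 8, 8, 4, 4, 2, 2, 1 };
int block_width[7]   = { 8, 4, 4, 2, 2, 1, 1 };

int pass;
long row, col;

for (pass = 0; pass < 7; pass++) {
    for (row = starting_row[pass]; row < height; row += row_increment[pass]) {
        for (col = starting_col[pass]; col < width; col += col_increment[pass]) {
            visit(row, col,
                  min(block_height[pass], height - row),
                  min(block_width[pass], width - col));
        }
    }
}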

The function visit(row,column,height,width) obtains the next transmitted pixel and paints a rectangle of the specified height and width, whose upper-left corner is at the specified row and column, using the colour indicated by the pixel. Note that row and column are measured from 0,0 at the upper left corner.

If the viewer is merging the received image with a background image, it may be more convenient just to paint the received pixel positions (the visit() function sets only the pixel at the specified row and column, not the whole rectangle). This produces a "fade-in" effect as the new image gradually replaces the old. An advantage of this approach is that proper alpha or transparency processing can be done as each pixel is replaced. Painting a rectangle as described above will overwrite background-image pixels that may be needed later, if the pixels eventually received for those positions turn out to be wholly or partially transparent. This is a problem only if the background image is not stored anywhere offscreen.

13.11 Truecolour image handling

To achieve PNG's goal of universal interchangeability, decoders shall accept all types of PNG image: indexed-colour , truecolour , and greyscale . Viewers running on indexed-colour display hardware need to be able to reduce truecolour images to indexed-colour for viewing. This process is called "colour quantization".

A simple, fast method for colour quantization is to reduce the image to a fixed palette. Palettes with uniform colour spacing ("colour cubes") are usually used to minimize the per-pixel computation. For photograph-like images, dithering is recommended to avoid ugly contours in what should be smooth gradients; however, dithering introduces graininess that can be objectionable.

The quality of rendering can be improved substantially by using a palette chosen specifically for the image, since a colour cube usually has numerous entries that are unused in any particular image. This approach requires more work, first in choosing the palette, and second in mapping individual pixels to the closest available colour. PNG allows the encoder to supply suggested palettes, but not all encoders will do so, and the suggested palettes may be unsuitable in any case (they may have too many or too few colours). Therefore, high-quality viewers will need to have a palette selection routine at hand. A large lookup table is usually the most feasible way of mapping individual pixels to palette entries with adequate speed.

Numerous implementations of colour quantization are available. The PNG sample implementation, libpng ( http://www.libpng.org/pub/png/libpng.html ), includes code for the purpose.

13.12 Sample depth rescaling

Decoders may wish to scale PNG data to a lesser sample depth (data precision) for display. For example, 16-bit data will need to be reduced to 8-bit depth for use on most present-day display hardware. Reduction of 8-bit data to 5-bit depth is also common.

The most accurate scaling is achieved by the linear equation

output = floor((input * MAXOUTSAMPLE / MAXINSAMPLE) + 0.5)

where

MAXINSAMPLE = (2^sampledepth) - 1
MAXOUTSAMPLE = (2^desired_sampledepth) - 1

A slightly less accurate conversion is achieved by simply shifting right by (sampledepth - desired_sampledepth) places. For example, to reduce 16-bit samples to 8-bit, the low-order byte can be discarded. In many situations the shift method is sufficiently accurate for display purposes, and it is certainly much faster. (But if gamma correction is being done, sample rescaling can be merged into the gamma correction lookup table, as is illustrated in 13.13 Decoder gamma handling .)
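
For instance, reducing 16-bit samples to 8 bits could use either method (a sketch; names are illustrative):

/* Linear rescaling with rounding: floor(v * 255 / 65535 + 0.5). */
unsigned char scale_linear(unsigned int v)
{
    return (unsigned char)((v * 255UL + 32768UL) / 65535UL);
}

/* Shift method: discard the low-order byte. */
unsigned char scale_shift(unsigned int v)
{
    return (unsigned char)(v >> 8);
}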

If the decoder needs to scale samples up (for example, if the frame buffer has a greater sample depth than the PNG image), it should use linear scaling or left-bit-replication as described in 12.4 Sample depth scaling .

When an sBIT chunk is present, the reference image data can be recovered by shifting right to the sample depth specified by sBIT . Note that linear scaling will not necessarily reproduce the original data, because the encoder is not required to have used linear scaling to scale the data up. However, the encoder is required to have used a method that preserves the high-order bits, so shifting always works. This is the only case in which shifting might be said to be more accurate than linear scaling. A decoder need not pay attention to the sBIT chunk; the stored image is a valid PNG datastream of the sample depth indicated by the IHDR chunk; however, using sBIT to recover the original samples before scaling them to suit the display often yields a more accurate display than ignoring sBIT .

When comparing pixel values to tRNS chunk values to detect transparent pixels, the comparison shall be done exactly. Therefore, transparent pixel detection shall be done before reducing sample precision.

13.13 Decoder gamma handling

Viewers capable of full colour management will perform more sophisticated calculations than those described here.

For an image display program to produce correct tone reproduction, it is necessary to take into account the relationship between samples and display output, and the transfer function of the display system. This can be done by calculating:

sample = integer_sample / (2^sampledepth - 1.0)
display_output = sample^(1.0/gamma)
display_input = inverse_display_transfer(display_output)
framebuf_sample = floor((display_input * MAX_FRAMEBUF_SAMPLE) + 0.5)

where integer_sample is the sample value from the datastream, framebuf_sample is the value to write into the frame buffer , and MAX_FRAMEBUF_SAMPLE is the maximum value of a frame buffer sample (255 for 8-bit, 31 for 5-bit, etc.). The first line converts an integer sample into a normalized floating point value (in the range 0.0 to 1.0), the second converts to a value proportional to the desired display output intensity, the third accounts for the display system's transfer function , and the fourth converts to an integer frame buffer sample. Zero raised to any positive power is zero.

A step could be inserted between the second and third to adjust display_output to account for the difference between the actual viewing conditions and the reference viewing conditions. However, this adjustment requires accounting for veiling glare, black mapping, and colour appearance models, none of which can be well approximated by power functions. Such calculations are not described here. If viewing conditions are ignored, the error will usually be small.

The display transfer function can typically be approximated by a power function with exponent display_exponent , in which case the second and third lines can be merged into:

display_input = sample^(1.0/(gamma * display_exponent)) = sample^decoding_exponent

so as to perform only one power calculation. For colour images, the entire calculation is performed separately for R, G, and B values.

The gamma value can be taken directly from the gAMA chunk. Alternatively, an application may wish to allow the user to adjust the appearance of the displayed image by influencing the gamma value . For example, the user could manually set a parameter user_exponent which defaults to 1.0, and the application could set:

decoding_exponent = user_exponent / (gamma * display_exponent)

The user would set user_exponent greater than 1 to darken the mid-level tones, or less than 1 to lighten them.

A gAMA chunk containing zero is meaningless but could appear by mistake. Decoders should ignore it, and editors may discard it and issue a warning to the user.

It is not necessary to perform a transcendental mathematical computation for every pixel. Instead, a lookup table can be computed that gives the correct output value for every possible sample value. This requires only 256 calculations per image (for 8-bit accuracy), not one or three calculations per pixel. For an indexed-colour image, a one-time correction of the palette is sufficient, unless the image uses transparency and is being displayed against a nonuniform background.
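
A sketch of such a table for 8-bit samples, using the merged decoding_exponent from above (max_framebuf_sample is the largest frame buffer sample value):

#include <math.h>

unsigned char gamma_table[256];

/* Precompute the output value for each of the 256 possible input samples. */
void build_gamma_table(double gamma, double display_exponent,
                       int max_framebuf_sample)
{
    double decoding_exponent = 1.0 / (gamma * display_exponent);
    int i;

    for (i = 0; i < 256; i++) {
        double sample = i / 255.0;
        gamma_table[i] = (unsigned char)
            floor(pow(sample, decoding_exponent) * max_framebuf_sample + 0.5);
    }
}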

If floating-point calculations are not possible, gamma correction tables can be computed using integer arithmetic and a precomputed table of logarithms. Example code appears in [ PNG-EXTENSIONS ].

When the incoming image has unknown gamma value ( gAMA , sRGB , and iCCP all absent), standalone image viewers should choose a likely default gamma value , but allow the user to select a new one if the result proves too dark or too light. The default gamma value may depend on other knowledge about the image, for example whether it came from the Internet or from the local system. For consistency, viewers for document formats such as HTML, or vector graphics such as SVG, should treat embedded or linked PNG images with unknown gamma value in the same way that they treat other untagged images.

In practice, it is often difficult to determine what value of display exponent should be used. In systems with no built-in gamma correction, the display exponent is determined entirely by the CRT . A display exponent of 2.2 should be used unless detailed calibration measurements are available for the particular CRT used.

Many modern frame buffers have lookup tables that are used to perform gamma correction, and on these systems the display exponent value should be the exponent of the lookup table and CRT combined. It may not be possible to find out what the lookup table contains from within the viewer application, in which case it may be necessary to ask the user to supply the display system's exponent value. Unfortunately, different manufacturers use different ways of specifying what should go into the lookup table, so interpretation of the system gamma value is system-dependent.

The response of real displays is actually more complex than can be described by a single number (the display exponent). If actual measurements of the monitor's light output as a function of voltage input are available, the third and fourth lines of the computation above can be replaced by a lookup in these measurements, to find the actual frame buffer value that most nearly gives the desired brightness.

13.14 Decoder colour handling

In many cases, the image data in PNG datastreams will be treated as device-dependent RGB values and displayed without modification (except for appropriate gamma correction). This provides the fastest display of PNG images. But unless the viewer uses exactly the same display hardware as that used by the author of the original image, the colours will not be exactly the same as those seen by the original author, particularly for darker or near-neutral colours. The cHRM chunk provides information that allows closer colour matching than that provided by gamma correction alone.

The cHRM data can be used to transform the image data from RGB to XYZ and thence into a perceptually linear colour space such as CIE LAB. The colours can be partitioned to generate an optimal palette, because the geometric distance between two colours in CIE LAB is strongly related to how different those colours appear (unlike, for example, RGB or XYZ spaces). The resulting palette of colours, once transformed back into RGB colour space, could be used for display or written into a PLTE chunk.

Decoders that are part of image processing applications might also transform image data into CIE LAB space for analysis.

In applications where colour fidelity is critical, such as product design, scientific visualization, medicine, architecture, or advertising, PNG decoders can transform the image data from source RGB to the display RGB space of the monitor used to view the image. This involves calculating the matrix to go from source RGB to XYZ and the matrix to go from XYZ to display RGB, then combining them to produce the overall transformation. The PNG decoder is responsible for implementing gamut mapping.

Decoders running on platforms that have a Colour Management System (CMS) can pass the image data , gAMA , and cHRM values to the CMS for display or further processing.

PNG decoders that provide colour printing facilities can use the facilities in Level 2 PostScript to specify image data in calibrated RGB space or in a device-independent colour space such as XYZ. This will provide better colour fidelity than a simple RGB to CMYK conversion. The PostScript Language Reference manual [ PostScript ] gives examples. Such decoders are responsible for implementing gamut mapping between source RGB (specified in the cHRM chunk) and the target printer. The PostScript interpreter is then responsible for producing the required colours.

PNG decoders can use the cHRM data to calculate an accurate greyscale representation of a colour image. Conversion from RGB to grey is simply a case of calculating the Y (luminance) component of XYZ, which is a weighted sum of R, G, and B values. The weights depend upon the monitor type, i.e. the values in the cHRM chunk. PNG decoders may wish to do this for PNG datastreams with no cHRM chunk. In this case, a reasonable default would be the CCIR 709 primaries [ ITU-R-BT.709 ]. The original NTSC primaries should not be used unless the PNG image really was colour-balanced for such a monitor.
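
With the [ ITU-R-BT.709 ] primaries, for example, the weighted sum applied to gamma-decoded (linear-light) RGB samples is approximately:

/* Luminance (Y) for linear-light RGB with BT.709 primaries. */
double luminance(double r, double g, double b)
{
    return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}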

13.15 Background colour

The background colour given by the bKGD chunk will typically be used to fill unused screen space around the image, as well as any transparent pixels within the image. (Thus, bKGD is valid and useful even when the image does not use transparency.) If no bKGD chunk is present, the viewer will need to decide upon a suitable background colour. When no other information is available, a medium grey such as 153 in the 8-bit sRGB colour space would be a reasonable choice. Transparent black or white text and dark drop shadows, which are common, would all be legible against this background.

Viewers that have a specific background against which to present the image (such as web browsers) should ignore the bKGD chunk, in effect overriding bKGD with their preferred background colour or background image.

The background colour given by the bKGD chunk is not to be considered transparent, even if it happens to match the colour given by the tRNS chunk (or, in the case of an indexed-colour image, refers to a palette index that is marked as transparent by the tRNS chunk). Otherwise one would have to imagine something "behind the background" to composite against. The background colour is either used as background or ignored; it is not an intermediate layer between the PNG image and some other background.

Indeed, it will be common that the bKGD and tRNS chunks specify the same colour, since then a decoder that does not implement transparency processing will give the intended display, at least when no partially-transparent pixels are present.

13.16 Alpha channel processing

The alpha channel can be used to composite a foreground image against a background image. The PNG datastream defines the foreground image and the transparency mask, but not the background image. PNG decoders are not required to support this most general case. It is expected that most will be able to support compositing against a single background colour.

The equation for computing a composited sample value is:

output = alpha * foreground + (1 - alpha) * background

where alpha and the input and output sample values are expressed as fractions in the range 0 to 1. This computation should be performed with intensity samples (not gamma -encoded samples). For colour images, the computation is done separately for R, G, and B samples.
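
In code form, a minimal per-sample sketch (operating on gamma-decoded values in the range 0 to 1) is:

/* Composite one intensity sample; all inputs are fractions in [0, 1]. */
float composite(float foreground, float background, float alpha)
{
    return alpha * foreground + (1.0f - alpha) * background;
}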

The following code illustrates the general case of compositing a foreground image against a background image. It assumes that the original pixel data are available for the background image, and that output is to a frame buffer for display. Other variants are possible; see the comments below the code. The code allows the sample depths and gamma values of foreground image and background image all to be different and not necessarily suited to the display system. In practice no assumptions about equality should be made without first checking.

This code is ISO C [ ISO_9899 ], with line numbers added for reference in the comments below.

Variations:

  • If output is to another PNG datastream instead of a frame buffer , lines 21, 22, 33, and 34 should be changed along the following lines:

    /*
     * Gamma encode for storage in output datastream.
     * Convert to integer sample value.
     */
    gamout = pow(comppix, outfile_gamma);
    outpix[i] = (int)(gamout * out_maxsample + 0.5);

    Also, it becomes necessary to process background pixels when alpha is zero, rather than just skipping pixels. Thus, line 15 will need to be replaced by copies of lines 17-23, but processing background instead of foreground pixel values.
  • If the sample depths of the output file, foreground file, and background file are all the same, and the three gamma values also match, then the no-compositing code in lines 14-23 reduces to copying pixel values from the input file to the output file if alpha is one, or copying pixel values from background to output file if alpha is zero. Since alpha is typically either zero or one for the vast majority of pixels in an image, this is a significant saving. No gamma computations are needed for most pixels.
  • When the sample depths and gamma values all match, it may appear attractive to skip the gamma decoding and encoding (lines 28-31, 33-34) and just perform line 32 using gamma -encoded sample values. Although this does not have too bad an effect on image quality, the time savings are small if alpha values of zero and one are treated as special cases as recommended here.
  • If the original pixel values of the background image are no longer available, only processed frame buffer pixels left by display of the background image, then lines 30 and 31 need to extract intensity from the frame buffer pixel values using code such as:

    /*
     * Convert frame buffer value into intensity sample.
     */
    gcvideo = (float)fbpix[i] / fb_maxsample;
    linbg = pow(gcvideo, display_exponent);

    However, some roundoff error can result, so it is better to have the original background pixels available if at all possible.
  • Note that lines 18-22 are performing exactly the same gamma computation that is done when no alpha channel is present. If the no-alpha case is handled with a lookup table, the same lookup table can be used here. Lines 28-31 and 33-34 can also be done with (different) lookup tables.
  • Integer arithmetic can be used instead of floating point, providing care is taken to maintain sufficient precision throughout.

NOTE In floating point, no overflow or underflow checks are needed, because the input sample values are guaranteed to be between 0 and 1, and compositing always yields a result that is in between the input values (inclusive). With integer arithmetic, some roundoff-error analysis might be needed to guarantee no overflow or underflow.

When displaying a PNG image with full alpha channel, it is important to be able to composite the image against some background, even if it is only black. Ignoring the alpha channel will cause PNG images that have been converted from an associated-alpha representation to look wrong. (Of course, if the alpha channel is a separate transparency mask, then ignoring alpha is a useful option: it allows the hidden parts of the image to be recovered.)

Even if the decoder does not implement true compositing logic, it is simple to deal with images that contain only zero and one alpha values. (This is implicitly true for greyscale and truecolour PNG datastreams that use a tRNS chunk; for indexed-colour PNG datastreams it is easy to check whether the tRNS chunk contains any values other than 0 and 255.) In this simple case, transparent pixels are replaced by the background colour, while others are unchanged.
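
For indexed-colour datastreams, that check is a simple scan of the tRNS data (a sketch):

/* Return 1 if every tRNS palette-alpha entry is fully transparent (0) or
   fully opaque (255), so no true compositing logic is needed. */
int has_binary_alpha_only(const unsigned char *trns, int count)
{
    int i;

    for (i = 0; i < count; i++)
        if (trns[i] != 0 && trns[i] != 255)
            return 0;
    return 1;
}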

If a decoder contains only this much transparency capability, it should deal with a full alpha channel by treating all nonzero alpha values as fully opaque or by dithering. Neither approach will yield very good results for images converted from associated-alpha formats, but this is preferable to doing nothing. Dithering full alpha to binary alpha is very much like dithering greyscale to black-and-white, except that all fully transparent and fully opaque pixels should be left unchanged by the dither.

13.17 Histogram and suggested palette usage

For viewers running on indexed-colour hardware attempting to display a truecolour image, or an indexed-colour image whose palette is too large for the frame buffer , the encoder may have provided one or more suggested palettes in sPLT chunks. If one of these is found to be suitable, based on size and perhaps name, the PNG decoder can use that palette. Suggested palettes with a sample depth different from what the decoder needs can be converted using sample depth rescaling (see 13.12 Sample depth rescaling ).

When the background is a solid colour, the viewer should composite the image and the suggested palette against that colour, then quantize the resulting image to the resulting RGB palette. When the image uses transparency and the background is not a solid colour, no suggested palette is likely to be useful.

For truecolour images, a suggested palette might also be provided in a PLTE chunk. If the image has a tRNS chunk and the background is a solid colour, the viewer will need to adapt the suggested palette for use with its desired background colour. To do this, the palette entry closest to the tRNS colour should be replaced with the desired background colour; or alternatively a palette entry for the background colour can be added, if the viewer can handle more colours than there are PLTE entries.

For images of colour type 6 ( truecolour with alpha ), any PLTE chunk should have been designed for display of the image against a uniform background of the colour specified by the bKGD chunk. Viewers should probably ignore the palette if they intend to use a different background, or if the bKGD chunk is missing. Viewers can use a suggested palette for display against a different background than it was intended for, but the results may not be very good.

If the viewer presents a transparent truecolour image against a background that is more complex than a uniform colour, it is unlikely that the suggested palette will be optimal for the composite image. In this case it is best to perform a truecolour compositing step on the truecolour PNG image and background image, then colour-quantize the resulting image.

In truecolour PNG datastreams, if both PLTE and sPLT chunks appear, the PNG decoder may choose from among the palettes suggested by both, bearing in mind the different transparency semantics described above.

The frequencies in the sPLT and hIST chunks are useful when the viewer cannot provide as many colours as are used in the palette in the PNG datastream. If the viewer has a shortfall of only a few colours, it is usually adequate to drop the least-used colours from the palette. To reduce the number of colours substantially, it is best to choose entirely new representative colours, rather than trying to use a subset of the existing palette. This amounts to performing a new colour quantization step; however, the existing palette and histogram can be used as the input data, thus avoiding a scan of the image data in the IDAT chunks.

If no suggested palette is provided, a decoder can develop its own, at the cost of an extra pass over the image data in the IDAT chunks. Alternatively, a default palette (probably a colour cube) can be used.

See also 12.5 Suggested palettes .

14. Editors

14.1 Additional chunk types

Authors are encouraged to look at existing chunk types, in both this specification and [ PNG-EXTENSIONS ], before introducing a new chunk type. The chunk types at [ PNG-EXTENSIONS ] are expected to be less widely supported than those defined in this specification.

14.2 Behaviour of PNG editors

Two examples of PNG editors are a program that adds or modifies text chunks, and a program that adds a suggested palette to a truecolour PNG datastream. Ordinary image editors are not PNG editors because they usually discard all unrecognized information while reading in an image.

To allow new chunk types to be added to PNG, it is necessary to establish rules about the ordering requirements for all chunk types. Otherwise a PNG editor does not know what to do when it encounters an unknown chunk.

EXAMPLE Consider a hypothetical new ancillary chunk type that is safe-to-copy and is required to appear after PLTE if PLTE is present. If a program attempts to add a PLTE chunk and does not recognize the new chunk, it may insert the PLTE chunk in the wrong place, namely after the new chunk. Such problems could be prevented by requiring PNG editors to discard all unknown chunks, but that is a very unattractive solution. Instead, PNG requires ancillary chunks not to have ordering restrictions like this.

To prevent this type of problem while allowing for future extension, constraints are placed on both the behaviour of PNG editors and the allowed ordering requirements for chunks. The safe-to-copy bit defines the proper handling of unrecognized chunks in a datastream that is being modified.

  • If a chunk's safe-to-copy bit is 1, the chunk may be copied to a modified PNG datastream whether or not the PNG editor recognizes the chunk type, and regardless of the extent of the datastream modifications.
  • If a chunk's safe-to-copy bit is 0, it indicates that the chunk depends on the image data . If the program has made any changes to critical chunks, including addition, modification, deletion, or reordering of critical chunks, then unrecognized unsafe chunks shall not be copied to the output PNG datastream. (Of course, if the program does recognize the chunk, it can choose to output an appropriately modified version.)
  • A PNG editor is always allowed to copy all unrecognized ancillary chunks if it has only added, deleted, modified, or reordered ancillary chunks. This implies that it is not permissible for ancillary chunks to depend on other ancillary chunks.
  • PNG editors shall terminate on encountering an unrecognized critical chunk type, because there is no way to be certain that a valid datastream will result from modifying a datastream containing such a chunk. (Simply discarding the chunk is not good enough, because it might have unknown implications for the interpretation of other chunks.) The safe/unsafe mechanism is intended for use with ancillary chunks. The safe-to-copy bit will always be 0 for critical chunks.
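
In code form, the copy decision for an unrecognized chunk might look like this sketch (critical_changed is an assumed flag recording whether the editor has added, modified, deleted, or reordered critical chunks):

/* Returns 1 if the chunk may be copied, 0 if it must be dropped,
   and -1 if the editor must terminate (unknown critical chunk). */
int may_copy_unknown(const unsigned char type[4], int critical_changed)
{
    if (!(type[0] & 0x20))           /* critical chunk */
        return -1;
    if (type[3] & 0x20)              /* safe-to-copy ancillary chunk */
        return 1;
    return critical_changed ? 0 : 1; /* unsafe-to-copy ancillary chunk */
}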

The rules governing ordering of chunks are as follows.

  • When copying an unknown unsafe-to-copy ancillary chunk, a PNG editor shall not move the chunk relative to any critical chunk. It may relocate the chunk freely relative to other ancillary chunks that occur between the same pair of critical chunks. (This is well defined since the editor shall not add, delete, modify, or reorder critical chunks if it is preserving unknown unsafe-to-copy chunks.)
  • When copying an unknown safe-to-copy ancillary chunk, a PNG editor shall not move the chunk from before IDAT to after IDAT or vice versa. (This is well defined because IDAT is always present.) Any other reordering is permitted.
  • When copying a known ancillary chunk type, an editor need only honour the specific chunk ordering rules that exist for that chunk type. However, it may always choose to apply the above general rules instead.

These rules are expressed in terms of copying chunks from an input datastream to an output datastream, but they apply in the obvious way if a PNG datastream is modified in place.

See also 5.4 Chunk naming conventions .

PNG editors that do not change the image data should not change the tIME chunk. The Creation Time keyword in the tEXt , zTXt , and iTXt chunks may be used for a user-supplied time.

14.3 Ordering of chunks

14.3.1 Ordering of critical chunks

Critical chunks may have arbitrary ordering requirements, because PNG editors are required to terminate if they encounter unknown critical chunks. For example IHDR has the specific ordering rule that it shall always appear first. A PNG editor, or indeed any PNG-writing program, shall know and follow the ordering rules for any critical chunk type that it can generate.

14.3.2 Ordering of ancillary chunks

The strictest ordering rules for an ancillary chunk type are:

  • Unsafe-to-copy chunks may have ordering requirements relative to critical chunks.
  • Safe-to-copy chunks may have ordering requirements relative to IDAT .

The actual ordering rules for any particular ancillary chunk type may be weaker. See for example the ordering rules for the standard ancillary chunk types in 5.6 Chunk ordering .

Decoders shall not assume more about the positioning of any ancillary chunk than is specified by the chunk ordering rules. In particular, it is never valid to assume that a specific ancillary chunk type occurs with any particular positioning relative to other ancillary chunks.

EXAMPLE It is unsafe to assume that a particular private ancillary chunk occurs immediately before IEND . Even if it is always written in that position by a particular application, a PNG editor might have inserted some other ancillary chunk after it. But it is safe to assume that the chunk will remain somewhere between IDAT and IEND .

15. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MAY , MUST , SHALL , SHOULD , and SHOULD NOT in this document are to be interpreted as described in BCP 14 [ RFC2119 ] [ RFC8174 ] when, and only when, they appear in all capitals, as shown here.

15.1 Conformance

15.2 Introduction

15.2.1 Objectives

This clause addresses conformance of PNG datastreams, PNG encoders, PNG decoders, and PNG editors .

The primary objectives of the specifications in this clause are:

  • to promote interoperability by eliminating arbitrary subsets of, or extensions to, this specification;
  • to promote uniformity in the development of conformance tests;
  • to promote consistent results across PNG encoders, decoders, and editors;
  • to facilitate automated test generation.

15.2.2 Scope

Conformance is defined for PNG datastreams and for PNG encoders, decoders, and editors.

This clause addresses the PNG datastream and implementation requirements including the range of allowable differences for PNG encoders, PNG decoders, and PNG editors . This clause does not directly address the environmental, performance, or resource requirements of the encoder, decoder, or editor.

The scope of this clause is limited to rules for the open interchange of PNG datastreams.

15.3 Conformance conditions

15.3.1 Conformance of PNG datastreams

A PNG datastream conforms to this specification if the following conditions are met.

  • The PNG datastream contains a PNG signature as the first content (see 5.2 PNG signature ).
  • The PNG datastream contains as its first chunk an IHDR chunk, immediately following the PNG signature.
  • The PNG datastream contains as its last chunk an IEND chunk.
  • No chunks or other content follow the IEND chunk.
  • All chunks contained therein match the specification of the corresponding chunk types of this specification. The PNG datastream obeys the relationships among chunk types defined in this specification.
  • The sequence of chunks in the PNG datastream obeys the ordering relationships specified in this specification.
  • All field values in the PNG datastream obey the relationships specified in this specification, producing the structure specified in this specification.
  • No chunks appear in the PNG datastream other than those specified in this specification or those defined according to the rules for creating new chunk types as defined in this specification.
  • The PNG datastream is encoded according to the rules of this specification.

15.3.2 Conformance of PNG encoders

A PNG encoder conforms to this specification if it satisfies the following conditions.

  • All PNG datastreams that are generated by the PNG encoder are conforming PNG datastreams.
  • When encoding input samples that have a sample depth that cannot be directly represented in PNG, the encoder scales the samples up to the next higher sample depth that is allowed by PNG. The data are scaled in such a way that the high-order bits match the original data.
  • Private field values are used when encoding experimental or private definitions of values for any of the method or type fields.

15.3.3 Conformance of PNG decoders

A PNG decoder conforms to this specification if it satisfies the following conditions.

  • It is able to read any PNG datastream that conforms to this International Standard, including both public and private chunks whose types may not be recognized.
  • It supports all the standardized critical chunks, and all the standardized compression, filter, and interlace methods and types in any PNG datastream that conforms to this International Standard.
  • Unknown chunk types are handled as described in 5.4 Chunk naming conventions . An unknown chunk type is not treated as an error unless it is a critical chunk.
  • Unexpected values in fields of known chunks (for example, an unexpected compression method in the IHDR chunk) are treated as errors.
  • All types of PNG images (indexed-colour, truecolour , greyscale , truecolour with alpha , and greyscale with alpha ) are processed. For example, decoders which are part of viewers running on indexed-colour display hardware shall reduce truecolour images to indexed format for viewing.
  • Encountering an unknown chunk in which the ancillary bit is 0 generates an error if the decoder is attempting to extract the image.
  • A chunk type in which the reserved bit is set is treated as an unknown chunk type.
  • All valid combinations of bit depth and colour type as defined in 11.2.1 IHDR Image header are supported.
  • An error is reported if an unrecognized value is encountered in the bit depth, colour type , compression method, filter method , or interlace method bytes of the IHDR chunk.
  • When processing 16-bit greyscale or truecolour data in the tRNS chunk, both bytes of the sample values are evaluated to determine whether a pixel is transparent.
  • When processing an image compressed by compression method 0, the decoder assumes no more than that the complete image data is represented by a single compressed datastream that is stored in some number of IDAT chunks.
  • No assumptions are made concerning the positioning of any ancillary chunk other than those that are specified by the chunk ordering rules.

15.3.4 Conformance of PNG editors

A PNG editor conforms to this specification if it satisfies the following conditions.

  • It conforms to the requirements for PNG encoders.
  • It conforms to the requirements for PNG decoders.
  • It is able to encode all chunks that it decodes.
  • It preserves the ordering of the chunks presented within the rules in 5.6 Chunk ordering .
  • It properly processes the safe-to-copy bit information and preserves unknown chunks when the safe-to-copy rules permit it.
  • Unless the user specifically permits lossy operations or the editor issues a warning, it preserves all information required to reconstruct the reference image exactly, except that the sample depth of the alpha channel need not be preserved if it contains only zero and maximum values. Operations such as changing the colour type or rearranging the palette in an indexed-colour datastream are permitted provided that the new datastream losslessly represents the same reference image.

A. Internet Media Types

A.1 image/png

This updates the existing image/png Internet Media type, under the image top level type. This appendix is in conformance with BCP 13 and W3CRegMedia .

A PNG document is composed of a collection of explicitly typed "chunks". For each of the chunk types defined in the PNG specification (except for gIFx ), the only effect associated with those chunks is to cause an image to be rendered on the recipient's display or printer.

The gIFx chunk type is used to encapsulate Application Extension data, and some use of that data might present security risks, though no risks are known. Likewise, the security risks associated with future chunk types cannot be evaluated, particularly unregistered chunks. However, it is the intention of the PNG Working Group to disallow chunks containing "executable" data to become registered chunks.

The text chunks, tEXt , iTXt and zTXt , contain data that can be displayed in the form of comments, etc. Some operating systems or terminals might allow the display of textual data with embedded control characters to perform operations such as re-mapping of keys, creation of files, etc. For this reason, the specification recommends that the text chunks be filtered for control characters before direct display.

The PNG format is specifically designed to facilitate early detection of file transmission errors, and makes use of cyclic redundancy checks to ensure the integrity of the data contained in its chunks.

This registration updates the earlier one:

  • The old one points to an expired Internet Draft. This updated registration points to a W3C Recommendation.
  • The old contact person is sadly deceased. The new contact email is a publicly archived W3C mailing list for the PNG Working Group.
  • Change controller is W3C.

A.2 image/apng

This appendix is in conformance with BCP 13 and W3CRegMedia .

An APNG document is composed of a collection of explicitly typed "chunks". For each of the chunk types defined in the PNG specification (except for gIFx ), the only effect associated with those chunks is to cause an animated image to be rendered on the recipient's display.

The text chunks, tEXt , iTXt and zTXt , contain data that can be displayed in the form of comments, etc. Some operating systems or terminals might allow the display of textual data with embedded control characters to perform operations such as re-mapping of keys, creation of files, etc. For this reason, the specification recommends that the text chunks be filtered for control characters before direct display.

If one creates an APNG file with unrelated static image and animated image chunks, somebody using a tool not supporting the APNG format would only see the static image and be unaware of the additional content. This could be used e.g. to bypass moderation.

image/apng has been in widespread, unregistered use since 2015. Animated PNG was not part of the official PNG specification until 2022. This registration, plus the PNG specification (3rd Edition) brings official documentation into alignment with already widely-deployed reality.

B. Guidelines for private chunk types

The following specifies guidelines for the definition of private chunks:

  • Do not define new chunks that redefine the meaning of existing chunks or change the interpretation of an existing standardized chunk, e.g., do not add a new chunk to say that RGB and alpha values actually mean CMYK.
  • Minimize the use of private chunks to aid portability.
  • Avoid defining chunks that depend on total datastream contents. If such chunks have to be defined, make them critical chunks.
  • For textual information that is representable in Latin-1, avoid defining a new chunk type. Use a tEXt or zTXt chunk with a suitable keyword to identify the type of information. For textual information that is not representable in Latin-1 but which can be represented in UTF-8, use an iTXt chunk with a suitable keyword.
  • Group mutually dependent ancillary information into a single chunk. This avoids the need to introduce chunk ordering relationships.
  • Avoid defining private critical chunks.

C. Gamma and chromaticity

A gamma value is a numerical parameter used to describe approximations to certain non-linear transfer functions encountered in image capture and reproduction. The gamma value is the exponent in a power law function. For example, the function:

intensity = (voltage + constant)^exponent

is often used to model the non-linearity of CRT displays. It is often assumed, as in this International Standard, that the constant is zero.

For the purposes of this specification, it is convenient to consider five places in a general image pipeline at which non-linear transfer functions may occur and which may be modelled by power laws. The characteristic exponent associated with each is given a specific name.

It is convenient to define some additional entities that describe some composite transfer functions , or combinations of stages.

The PNG gAMA chunk is used to record the gamma value . This information may be used by decoders together with additional information about the display environment in order to achieve, or approximate, the desired display output.

Additional information about this subject may be found in [ GAMMA-FAQ ].

Additional information on the impact of color space on image encoding may be found in [ Kasson ] and [ Hill ].

Background information about chromaticity and colour spaces may be found in [ Luminance-Chromaticity ] and [ COLOR-FAQ ].

D. Sample CRC implementation

The following sample code — which is informative — represents a practical implementation of the CRC (Cyclic Redundancy Check) employed in PNG chunks. (See also ISO 3309 [ ISO-3309 ] or ITU-T V.42 [ ITU-T-V.42 ] for a formal specification.)

The sample code is in the ISO C [ ISO_9899 ] programming language. The hints in Table 30 may help non-C users to read the code more easily.
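
/* Table of CRCs of all 8-bit messages. */
unsigned long crc_table[256];

/* Flag: has the table been computed? Initially false. */
int crc_table_computed = 0;

/* Make the table for a fast CRC. */
void make_crc_table(void)
{
    unsigned long c;
    int n, k;

    for (n = 0; n < 256; n++) {
        c = (unsigned long) n;
        for (k = 0; k < 8; k++) {
            if (c & 1)
                c = 0xedb88320L ^ (c >> 1);
            else
                c = c >> 1;
        }
        crc_table[n] = c;
    }
    crc_table_computed = 1;
}

/* Update a running CRC with the bytes buf[0..len-1]; the CRC should be
   initialized to all 1's, and the transmitted value is the 1's complement
   of the final running CRC (see the crc() routine below). */
unsigned long update_crc(unsigned long crc, unsigned char *buf, int len)
{
    unsigned long c = crc;
    int n;

    if (!crc_table_computed)
        make_crc_table();
    for (n = 0; n < len; n++)
        c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
    return c;
}

/* Return the CRC of the bytes buf[0..len-1]. */
unsigned long crc(unsigned char *buf, int len)
{
    return update_crc(0xffffffffL, buf, len) ^ 0xffffffffL;
}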

E. Online resources

This annex gives the locations of some Internet resources for PNG software developers. By the nature of the Internet, the list is incomplete and subject to change.

E.1 ICC profile specifications

ICC profile specifications are available at: https://www.color.org/

E.2 PNG web site

There is a World Wide Web site for PNG at http://www.libpng.org/pub/png/ . This page is a central location for current information about PNG and PNG-related tools.

Additional documentation and portable C code for deflate , and an optimized implementation of the CRC algorithm are available from the zlib web site, https://www.zlib.net/ .

E.3 Sample implementation and test images

A sample implementation in portable C, libpng , is available at http://www.libpng.org/pub/png/libpng.html . Sample viewer and encoder applications of libpng are available at http://www.libpng.org/pub/png/book/sources.html and are described in detail in PNG: The Definitive Guide [ ROELOFS ]. Test images can also be accessed from the PNG web site.

F.1 Changes since the Working Draft of 20 July 2023 (Third Edition)

  • Added guidance on calculating MaxCLL and MaxFALL values
  • Added example (live streaming) where cLLi could not be pre-calculated
  • Added definitions for stop, SDR, HDR, HLG and PQ
  • Clarified definition of narrow-range
  • Updated ITU-T H Suppl. 19 reference to latest version
  • Added Simon Thompson as an author
  • Mandated current browser handling of out-of-range palette indices
  • Updated "Additional Information" table to add mDCv and cLLi .

F.2 Changes since the First Public Working Draft of 25 October 2022 (Third Edition)

  • Explained preferable handling of trailing bytes in the final IDAT chunk for encoders and decoders.
  • Linked to open issue on tone-mapping HDR [ ITU-R-BT.2100 ] images in the presence of mDCv .
  • Follow the Encoding Standard on UTF-8 encode and decode.
  • Added definition of a frame.
  • Required the Matrix Coefficients in cICP to be zero (RGB data).
  • Added known Privacy issue with recoverable data that only appears to have been redacted.
  • Improved advice on choosing filters.
  • Added links to colour image type definitions.
  • Clarified that MaxFALL uses the values of the frame with highest mean luminance.
  • Clarified luminance units.
  • Prefer RFC 3339 format for Creation Time.
  • Improved the definition of mDCv , with better descriptions, default values, and reference to SMPTE standards.
  • Refactored the terms and definitions, for clarity.
  • Improved definitions of source, reference, and PNG images.
  • Moved concepts from the terms and definitions section to the main prose.
  • Corrected error in eXIf chunk, which conflicted with the chunk ordering section.
  • Simplified Scope section to remove redundant detail described elsewhere.
  • Redrew chunk-ordering lattice diagrams to be clearer and more consistent.
  • Added a new chunk, cLLi , to describe the Maximum Single-Pixel and Frame-Average Luminance Levels for both static and animated HDR [ ITU-R-BT.2100 ] PNG images.
  • Updated external links to latest versions, preferring https over http.
  • Specified interoperable handling of extra sample bits, beyond the specified bit depth, in tRNS and bKGD chunks.
  • Added a new chunk, mDCv to describe the color volume of the mastering display used to grade HDR [ ITU-R-BT.2100 ] content.
  • Used correct Unicode character names.
  • Changed chunk type codes to use hexadecimal, rather than decimal.
  • Described textual chunk processing more clearly.
  • Recommended iTXt for new content.
  • Clarifications on the language tag field of the iTXt chunk, corrected examples to conform to BCP47.
  • Updated image/apng registration appendix. APNG MIME type registered with IANA.
  • Converted ASCII-art figures to more accessible diagrams.

F.3 Changes since the W3C Recommendation of 10 November 2003 (PNG Second Edition)

  • The three previously defined, but unofficial, chunks for APNG have been added. This brings the PNG specification into alignment with widely deployed industry practice.
  • Added the cICP chunk, Coding-independent code points for video signal type identification, to contain image format metadata defined in [ ITU-T-H.273 ] which enables PNG to contain [ ITU-R-BT.2100 ] High Dynamic Range ( HDR ) and Wide Colour Gamut (WCG) images.
  • The previously defined eXIf chunk has been moved from the PNG-Extensions document [ PNG-EXTENSIONS ] into the main body of this specification, to reflect its widespread use.
  • Added the mDCv chunk, which contains metadata about the display used in mastering. This enables more accurate colour matching on heterogeneous platforms.
  • Incorporation of all PNG Second Edition Errata
  • Various editorial clarifications in response to community feedback
  • References updated to latest versions
  • Markup corrections and link fixes
  • Document source reformatted to use ReSpec

F.4 Changes between First and Second Editions

For the list of changes between W3C Recommendation PNG Specification Version 1.0 and PNG Second Edition , see PNG Second Edition changelist

G. References

G.1 Normative references

G.2 Informative references


Subtitles in Matroska

Because Matroska is a general container format, we try to avoid specifying the formats to store in it. This type of work is really outside of the scope of a container-only format. However, because the use of subtitles in A/V containers has been so limited (with the exception of DVD) we are taking the time to specify how to store some of the more common subtitle formats in Matroska. This is being done to help facilitate their growth. Otherwise, incompatibilities could prevent the standardization and use of subtitle storage.

This page is not meant to be a complete listing of all subtitle formats that will be used in Matroska; it is only meant to be a guide for the more common, current formats. It is possible that we will add future formats to this page as they are created, but it is not likely, as the designers of any new subtitle format would likely publish their own specifications. Any specification listed here SHOULD be strictly adhered to, or it SHOULD NOT use the corresponding Codec ID.

Here is a list of pointers for storing subtitles in Matroska:

  • Any Matroska file containing only subtitles SHOULD use the extension “.mks”.
  • As a general rule of thumb for all codecs, information that is global to an entire stream SHOULD be stored in the CodecPrivate element.
  • Start and stop timestamps that are part of a format’s native storage SHOULD be removed when the subtitles are placed in Matroska, as they could interfere if the file is edited afterwards. Instead, the Block’s timestamp and Duration SHOULD be used to indicate when, and for how long, the subtitle is displayed.
  • Because a “subtitle” stream is actually just an overlay stream, anything with a transparency layer could be used, including video.

Images Subtitles

The first image subtitle format targeted for import into Matroska is VobSub. This subtitle type is generated by exporting the subtitles from a DVD [@?DVD-Video].

The requirement for muxing VobSub into Matroska is v7 subtitles (see the first line of the .IDX file). If the version is lower, you must remux them into v7 format using the SubResync utility from VobSub 2.23 (or MPC). Generally, any newly created subtitles will already be in v7 format.

The .IFO file will not be used at all.

If there is more than one subtitle stream in the VobSub set, each stream will need to be separated into its own track for storage in Matroska. For example, if the VobSub file contains streams for both English and German subtitles, the resulting Matroska file SHOULD contain two tracks. That way the language information can be dropped from the VobSub data and mapped to Matroska’s language tags.

The .IDX file is reformatted (see below) and placed in the CodecPrivate.

Each .BMP will be stored in its own Block. The timestamp will be stored in the Block’s Timestamp, and the duration will be stored in the Default Duration.

Here is an example .IDX file:

First, lines beginning with “#” are removed. These are comments to make text file editing easier, and as this is not a text file, they aren’t needed.

Next remove the “langidx” and “id” lines. These are used to differentiate the subtitle streams and define the language. As the streams will be stored separately anyway, there is no need to differentiate them here. Also, the language setting will be stored in the Matroska tags, so there is no need to store it here.

Finally, the “timestamp” will be used to set the Block’s timestamp. Once it is set there, there is no need for it to be stored here. Also, as it may interfere if the file is edited, it SHOULD NOT be stored here.

Once all of these items are removed, the data to store in the CodecPrivate SHOULD look like this:

There SHOULD also be two Blocks containing one image each with the timestamps “00:00:01:101” and “00:00:08:708”.
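The reformatting steps above can be summarised in a short Python sketch. This is illustrative only; the three filtering rules come from this section, while the function name and file handling are made up:

    # Sketch: reformat a VobSub .IDX file for storage in a Matroska
    # CodecPrivate element, per the rules described above. Assumes the
    # input is already v7.
    def idx_to_codec_private(idx_text: str) -> str:
        kept = []
        for line in idx_text.splitlines():
            stripped = line.strip()
            if stripped.startswith("#"):
                continue  # comments: only useful when editing the text file
            if stripped.startswith(("langidx:", "id:")):
                continue  # stream/language selection moves to Matroska track metadata
            if stripped.startswith("timestamp:"):
                continue  # timing moves to the Block's timestamp instead
            kept.append(line)
        return "\n".join(kept)

    with open("subtitles.idx", encoding="utf-8") as f:  # hypothetical file name
        codec_private = idx_to_codec_private(f.read())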

SRT Subtitles

SRT is perhaps the most basic of all subtitle formats.

It consists of four parts, all in text:

  1. A number indicating which subtitle it is in the sequence.
  2. The time that the subtitle appears on the screen, and then disappears.
  3. The subtitle itself.
  4. A blank line indicating the start of a new subtitle.

When placing SRT in Matroska, part 3 is converted to UTF-8 (S_TEXT/UTF8) and placed in the data portion of the Block. Part 2 is used to set the timestamp of the Block and the BlockDuration element. Nothing else is used.

Here is an example SRT file:

In this example, the text “Senator, we’re making our final approach into Coruscant.” would be converted into UTF-8 and placed in the Block. The timestamp of the block would be set to “00:02:17,440”. And the BlockDuration element would be set to “00:00:02,935”.

The same is repeated for the next subtitle.

Because there are no general settings for SRT, the CodecPrivate is left blank.
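To make the timing mapping concrete, here is a minimal Python sketch (not part of any spec; the helper names are illustrative) that converts one SRT timing line into a Block timestamp and BlockDuration, both in milliseconds:

    # Sketch: derive Matroska Block timing from an SRT timing line
    # such as "00:02:17,440 --> 00:02:20,375".
    def srt_time_to_ms(t: str) -> int:
        h, m, rest = t.split(":")
        s, ms = rest.split(",")
        return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

    def timing_to_block(line: str) -> tuple[int, int]:
        start, end = (p.strip() for p in line.split("-->"))
        start_ms = srt_time_to_ms(start)
        return start_ms, srt_time_to_ms(end) - start_ms  # (timestamp, duration)

    # e.g. timing_to_block("00:02:17,440 --> 00:02:20,375") == (137440, 2935),
    # matching the example above: timestamp 00:02:17,440, duration 00:00:02,935.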

SSA/ASS Subtitles

SSA stands for Sub Station Alpha. It’s the file format used by the popular subtitle editor Sub Station Alpha. This format is widely used by fansubbers.

It supports advanced display features, like positioning, karaoke, and style management.

For detailed information on SSA/ASS, see the SSA specs. They describe SSA as well as the advanced features added by the ASS format (which stands for Advanced SSA). Because SSA and ASS are so similar, they are treated the same here.

Like SRT, this format is text based with a particular syntax.

A file consists of 4 or 5 parts, declared in an INI-like fashion (but it is not an INI file!).

The first, “[Script Info]”, contains information about the subtitle file, such as its title, who created it, the type of script, and a very important one: “PlayResY”. Be careful with this value: everything in the script (font size, positioning) is scaled by it. Sub Station Alpha writes your desktop’s Y resolution into this value, so if a friend with a large, high-resolution monitor gives you an edited script, you can mess everything up by re-saving it in Sub Station Alpha on a lower-resolution monitor.

The second, “[V4 Styles]”, is a list of style definitions. A style describes how text will look on the screen. It defines the font, font size, primary/…/outline colours, position, alignment, etc.

For example this:

The third, “[Events]”, is the list of the text you want displayed at the right time. You can specify several attributes here, such as the style to use for the event (which MUST be defined in the style list), the position of the text (left, right, and vertical margins), and an effect. The Name field is mostly used by translators to know who says the sentence. Timing is in h:mm:ss.cc (centiseconds).

A “[Pictures]” or “[Fonts]” section can be found in some SSA files. These contain UUE-encoded pictures or fonts, but those features are only used by Sub Station Alpha itself; no filter (VobSub, Avery Lee’s subtitler filter) uses them.

Now, how are they stored in Matroska?

  • All text is converted to UTF-8
  • All the headers are stored in CodecPrivate (Script Info and the Styles list)
  • The Start and End fields are used to set the Block’s timestamp and the BlockDuration element.
  • Events are stored in the Block in this order: ReadOrder, Layer, Style, Name, MarginL, MarginR, MarginV, Effect, Text (Layer comes from the ASS specs; it is empty for SSA). The ReadOrder field is needed for the decoder to be able to reorder the streamed samples as they were originally placed in the file. A sketch of this mapping follows this list.
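Below is a hedged Python sketch of the mapping just listed. The function and parameter names are made up, and the handling of SSA’s leading “Marked” field is an assumption based on the note that Layer is empty for SSA:

    # Sketch: convert one SSA/ASS "Dialogue:" line into the data stored
    # in a Matroska Block, per the field order listed above.
    def dialogue_to_block(line: str, read_order: int, is_ass: bool) -> str:
        # File field order: Marked (SSA) or Layer (ASS), Start, End, Style,
        # Name, MarginL, MarginR, MarginV, Effect, Text
        fields = line.split(":", 1)[1].strip().split(",", 9)
        first, start, end, style, name, ml, mr, mv, effect, text = fields
        layer = first if is_ass else ""  # Layer is empty for SSA (assumption)
        # Start/End become the Block's timestamp and BlockDuration,
        # so they are not repeated in the stored data.
        return ",".join([str(read_order), layer, style, name,
                         ml, mr, mv, effect, text])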

Here is an example of an SSA file.

Here is what would be placed into the CodecPrivate element.

And here are the two blocks that would be generated.

Block’s timestamp: 00:02:40.650 BlockDuration: 00:00:01.140

Block’s timestamp: 00:02:42.420 BlockDuration: 00:00:01.730

WebVTT Subtitles

The “Web Video Text Tracks Format” (short: WebVTT) is developed by the World Wide Web Consortium (W3C). Its specification is freely available.

The guiding principles for the storage of WebVTT in Matroska are:

  • Consistency: store data in a similar way to other subtitle codecs
  • Simplicity: making decoding and remuxing as easy as possible for existing infrastructures
  • Completeness: keeping as much data as possible from the original WebVTT file

Storage of WebVTT in Matroska

CodecID: codec identification

The CodecID to use is S_TEXT/WEBVTT.

CodecPrivate: storage of global WebVTT blocks

This element contains all global blocks before the first subtitle entry. This starts at the “WEBVTT” file identification marker but excludes the optional byte order mark.
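As an illustration (not from the spec), here is a minimal Python sketch that extracts this CodecPrivate portion from a WebVTT file. Detecting the first cue by looking for “-->” in a block is this sketch’s own heuristic:

    # Sketch: collect everything from the "WEBVTT" marker up to the first
    # cue for storage in CodecPrivate, dropping an optional byte order mark.
    def webvtt_codec_private(text: str) -> str:
        text = text.lstrip("\ufeff")      # exclude the optional BOM
        header_blocks = []
        for block in text.split("\n\n"):  # WebVTT blocks are blank-line separated
            if "-->" in block:            # first cue (or its identifier) reached
                break
            header_blocks.append(block)
        return "\n\n".join(header_blocks)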

Storage of non-global WebVTT blocks

Non-global WebVTT blocks (e.g., “NOTE”) before a WebVTT Cue Text are stored in Matroska’s BlockAddition element together with the Matroska Block containing the WebVTT Cue Text these blocks precede (see below for the actual format).

Storage of Cues in Matroska blocks

Each WebVTT Cue Text is stored directly in the Matroska Block.

A muxer MUST change all WebVTT Cue Timestamps present within the Cue Text to be relative to the Matroska Block’s timestamp.

The Cue’s start timestamp is used as the Matroska Block’s timestamp.

The difference between the Cue’s end timestamp and its start timestamp is used as the Matroska Block’s duration.
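The three rules above can be sketched as follows in Python. The regular expression, the assumption that inline timestamps always carry an hour component, and all helper names are this sketch’s own:

    # Sketch: map a WebVTT cue to Matroska Block timing and rebase inline
    # cue timestamps (<HH:MM:SS.mmm>) relative to the Block's timestamp.
    import re

    TS = re.compile(r"<(\d+):(\d{2}):(\d{2})\.(\d{3})>")

    def vtt_to_ms(h: str, m: str, s: str, ms: str) -> int:
        return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

    def ms_to_vtt(total: int) -> str:
        h, rem = divmod(total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1_000)
        return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

    def cue_to_block(start_ms: int, end_ms: int, cue_text: str):
        # Inline cue timestamps become relative to the Block's timestamp.
        rebased = TS.sub(
            lambda m: "<" + ms_to_vtt(vtt_to_ms(*m.groups()) - start_ms) + ">",
            cue_text)
        return start_ms, end_ms - start_ms, rebased  # timestamp, Duration, payload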

BlockAdditions: storing non-global WebVTT blocks, Cue Settings Lists and Cue identifiers

Each Matroska Block may be accompanied by one BlockAdditions element. Its format is as follows:

  • The first line contains the WebVTT Cue Text’s optional Cue Settings List followed by one line feed character (U+000A). The Cue Settings List may be empty, in which case the line consists of the line feed character only.
  • The second line contains the WebVTT Cue Text’s optional Cue Identifier followed by one line feed character (U+000A). The line may be empty, indicating that there was no Cue Identifier in the source file, in which case the line consists of the line feed character only.
  • The third and all following lines contain all WebVTT Comment Blocks that precede the current WebVTT Cue Block. These may be absent.

If there is no Matroska BlockAddition element stored together with the Matroska Block, then all three components (Cue Settings List, Cue Identifier, Cue Comments) MUST be assumed to be absent.
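A sketch of assembling this payload (the helper name is illustrative; an omitted BlockAddition is represented as None):

    # Sketch: build the BlockAddition payload for one cue from its optional
    # Cue Settings List, optional Cue Identifier, and preceding comment
    # blocks, per the three-line layout described above.
    def build_block_addition(settings: str, identifier: str, comments: str):
        # The first two lines each end with U+000A even when empty; any
        # preceding comment blocks follow on the remaining lines.
        if not (settings or identifier or comments):
            return None  # all three absent: no BlockAddition is stored
        return settings + "\n" + identifier + "\n" + comments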

Examples of transformation

Here’s an example of how a WebVTT file is transformed.

Example WebVTT file

Let’s take the following example file:

Example of CodecPrivate

The resulting CodecPrivate element will look like this:

Storage of Cue 1

Example Cue 1: timestamp 00:00:00.000, duration 00:00:10.000, Block’s content:

BlockAddition’s content starts with one empty line as there’s no Cue Settings List:

Storage of Cue 2

Example Cue 2: timestamp 00:00:25.000, duration 00:00:10.000, Block’s content:

BlockAddition’s content starts with two empty lines as there’s neither a Cue Settings List nor a Cue Identifier:

Storage of Cue 3

Example Cue 3: timestamp 00:01:03.000, duration 00:00:03.500, Block’s content:

BlockAddition’s content ends with an empty line as there’s no Cue Identifier and there were no WebVTT Comment blocks:

Storage of Cue 4

Example Cue 4: timestamp 00:03:10.000, duration 00:00:10.000, Block’s content:

Example entry 4: Entries can even include timestamps. For example:<00:00:05.000>This becomes visible five seconds after the first part.

This Block does not need a BlockAddition as the Cue did not contain an Identifier, nor a Settings List, and it wasn’t preceded by Comment blocks.

Storage of WebVTT in Matroska vs. WebM

Note: the storage of WebVTT in Matroska is not the same as the design document for storage of WebVTT in WebM. There are several reasons for this, including but not limited to: the WebM document is old (from February 2012), was based on an earlier draft of WebVTT, and ignores several parts that were added to WebVTT later; WebM still does not support subtitles at all; and the proposal suggests splitting the information across multiple tracks, making demuxers’ and remuxers’ lives very difficult.

HDMV presentation graphics subtitles

The specifications for the HDMV presentation graphics subtitle format (short: HDMV PGS) can be found in the document “Blu-ray Disc Read-Only Format; Part 3 — Audio Visual Basic Specifications” in section 9.14 “HDMV graphics streams”.

Storage of HDMV presentation graphics subtitles

The CodecID to use is S_HDMV/PGS. A CodecPrivate element is not used.

Storage of HDMV PGS Segments in Matroska Blocks

Each HDMV PGS Segment (short: Segment) will be stored in a Matroska Block. A Segment is the data structure described in section 9.14.2.1 “Segment coding structure and parameters” of the Blu-ray specifications.

Each Segment contains a presentation timestamp. This timestamp will be used as the timestamp for the Matroska Block.

A Segment is normally shown until a subsequent Segment is encountered. Therefore the Matroska Block MAY have no Duration. In that case, a player MUST display a Segment within a Matroska Block until the next Segment is encountered.

A muxer MAY use a Duration, e.g., by calculating the distance between two subsequent Segments. If a Matroska Block has a Duration, a player MUST display that Segment only for the duration of the Block’s Duration.
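As an illustration, here is a sketch that reads one Segment from a standalone .sup stream and derives the Block timestamp from its presentation timestamp. Note that the 13-byte “PG” header layout comes from the .sup container framing rather than from the text above, so treat it as an assumption:

    # Sketch: read one PGS Segment from a .sup stream and derive the
    # Matroska Block timestamp from its presentation timestamp.
    import struct

    def read_segment(f):
        header = f.read(13)
        if len(header) < 13:
            return None  # end of stream
        magic, pts, dts, seg_type, size = struct.unpack(">2sIIBH", header)
        assert magic == b"PG"            # .sup framing magic (assumption)
        payload = f.read(size)
        block_timestamp_ms = pts // 90   # PTS is in 90 kHz ticks
        return block_timestamp_ms, seg_type, payload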

HDMV text subtitles

The specifications for the HDMV text subtitle format (short: HDMV TextST) can be found in the document “Blu-ray Disc Read-Only Format; Part 3 — Audio Visual Basic Specifications” in section 9.15 “HDMV text subtitle streams”.

Storage of HDMV text subtitles

The CodecID to use is S_HDMV/TEXTST.

A CodecPrivate element is required. It MUST contain the stream’s Dialog Style Segment as described in section 9.15.4.2 “Dialog Style Segment” of the Blu-ray specifications.

Storage of HDMV TextST Dialog Presentation Segments in Matroska Blocks

Each HDMV Dialog Presentation Segment (short: Segment) will be stored in a Matroska Block. A Segment is the data structure described in section 9.15.4.3 “Dialog presentation segment” of the Blu-ray specifications.

Each Segment contains a start and an end presentation timestamp (short: start PTS & end PTS). The start PTS will be used as the timestamp for the Matroska Block. The Matroska Block MUST have a Duration, and that Duration is the difference between the end PTS and the start PTS.

A player MUST use the Matroska Block’s timestamp and Duration instead of the Segment’s start and end PTS for determining when and how long to show the Segment.
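A minimal sketch of this timing rule, assuming the PTS values are 90 kHz ticks as elsewhere on Blu-ray (an assumption; the unit is not stated in this section):

    # Sketch: Block timing for a TextST Dialog Presentation Segment,
    # assuming 90 kHz start/end PTS values.
    def textst_block_timing(start_pts: int, end_pts: int) -> tuple[int, int]:
        timestamp_ms = start_pts // 90               # Block timestamp
        duration_ms = (end_pts - start_pts) // 90    # Block Duration
        return timestamp_ms, duration_ms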

Character set

When TextST subtitles are stored inside Matroska, the only allowed character set is UTF-8.

Each HDMV text subtitle stream in a Blu-ray can use one of a handful of character sets. This information is not stored in the MPEG2 Transport Stream itself but in the accompanying Clip Information file.

Therefore a muxer MUST parse the accompanying Clip Information file. If the information indicates a character set other than UTF-8, it MUST re-encode all text Dialog Presentation Segments from the indicated character set to UTF-8 prior to storing them in Matroska.

Digital Video Broadcasting (DVB) subtitles

The specifications for the Digital Video Broadcasting subtitle bitstream format (short: DVB subtitles) can be found in the document “ETSI EN 300 743 - Digital Video Broadcasting (DVB); Subtitling systems”. The storage of DVB subtitles in MPEG transport streams is specified in the document “ETSI EN 300 468 - Digital Video Broadcasting (DVB); Specification for Service Information (SI) in DVB systems”.

Storage of DVB subtitles

The CodecID to use is S_DVBSUB.

CodecPrivate

The CodecPrivate element is five bytes long and has the following structure:

  • 2 bytes: composition page ID (bit string, left bit first)
  • 2 bytes: ancillary page ID (bit string, left bit first)
  • 1 byte: subtitling type (bit string, left bit first)

The semantics of these bytes are the same as the ones described in section 6.2.41 “Subtitling descriptor” of ETSI EN 300 468.
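A minimal sketch of building this five-byte element in Python (the function name is illustrative; the field layout is the one listed above):

    # Sketch: build the 5-byte CodecPrivate for S_DVBSUB from the three
    # fields listed above (big-endian, left bit first).
    import struct

    def dvb_codec_private(composition_page_id: int,
                          ancillary_page_id: int,
                          subtitling_type: int) -> bytes:
        return struct.pack(">HHB", composition_page_id,
                           ancillary_page_id, subtitling_type)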

Storage of DVB subtitles in Matroska Blocks

Each Matroska Block consists of one or more DVB Subtitle Segments as described in section 7.2 “Syntax and semantics of the subtitling segment” of ETSI EN 300 743.

Each Matroska Block SHOULD have a Duration indicating how long the DVB Subtitle Segments in that Block SHOULD be displayed.

ARIB (ISDB) subtitles

The specifications for the ARIB B-24 subtitle bitstream format (short: ARIB subtitles) and its storage in MPEG transport streams can be found in the documents [@!ARIB.STD-B24], [@!ARIB.STD-B10], and [@!ARIB.TR-B14].

Storage of ARIB subtitles

The CodecID to use is S_ARIBSUB.

The CodecPrivate element is three bytes long and has the following structure:

  • 1 byte: component tag (bit string, left bit first)
  • 2 bytes: data component ID (bit string, left bit first)

The semantics of the component tag are the same as those described in [@!ARIB.STD-B10], part 2, Annex J. The semantics of the data component ID are the same as those described in [@!ARIB.TR-B14], fascicle 2, Vol. 3, Section 2, 4.2.8.1.
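Analogous to the DVB sketch above, a minimal illustration of packing these three bytes:

    # Sketch: build the 3-byte CodecPrivate for S_ARIBSUB (component tag,
    # then data component ID, big-endian).
    import struct

    def arib_codec_private(component_tag: int, data_component_id: int) -> bytes:
        return struct.pack(">BH", component_tag, data_component_id)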

Storage of ARIB subtitles in Matroska Blocks

Each Matroska Block consists of a single synchronized PES data structure as described in chapter 5 “Independent PES transmission protocol” of [@!ARIB.STD-B24], volume 3, with a Synchronized_PES_data_byte block containing one or more ISDB Caption Data Groups as described in chapter 9 “Transmission of caption and superimpose” of [@!ARIB.STD-B24], volume 1, part 3. All of the Caption Statement Data Groups in a given Matroska Track MUST use the same language index.

A Data Group is normally shown until a subsequent Group provides instructions to clear it. Therefore the Matroska Block SHOULD NOT have a Duration. A player SHOULD display a Data Group within a Matroska Block until its internal duration elapses, or until a subsequent Data Group removes it.



PGSEncoder

drouarb/PGSEncoder is a Presentation Graphic Stream encoder (BDSup) that creates a .sup file from PNG images. The software is intended for educational purposes and is released under the BSD 3-Clause License.

https://github.com/drouarb/PGSEncoder/blob/master/LICENSE

http://blog.thescorpius.com/index.php/2017/07/15/presentation-graphic-stream-sup-files-bluray-subtitle-format/
