- https://github.com/apache/parquet-format
- https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift
- https://github.com/apache/parquet-testing
- https://github.com/apache/parquet-java
- block :: same as an HDFS block; the term keeps its HDFS meaning
- file :: must include the file metadata; it need not actually contain the data
- row-group :: a logical horizontal partitioning of the data into rows; no physical structure is guaranteed for a row group
- column-chunk :: a chunk of the data for a particular column; it lives in a particular row group and is contiguous in the file
- page :: column chunks are divided into pages; a page is conceptually an indivisible unit in terms of compression and encoding, and multiple page types can be interleaved in a column chunk
A file consists of one or more row groups. A row group contains exactly one
column chunk per column, and each column chunk contains one or more pages.
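The hierarchy above can be sketched as plain data types. This is a hypothetical in-memory model, not the API of parquet-java or any other library: the class names, `build_file`, `rows_per_group`, and `values_per_page` are made up for illustration, and real writers apply their own, typically byte-based, size limits rather than simple counts. The sketch partitions rows horizontally into row groups, then stores each column of a row group as exactly one column chunk split into pages.

```python
from dataclasses import dataclass, field
from typing import Any, List, Sequence

# Hypothetical model of the Parquet hierarchy for illustration only.

@dataclass
class Page:
    values: List[Any]  # values held by one page (the unit of encoding/compression in real files)

@dataclass
class ColumnChunk:
    column: str
    pages: List[Page] = field(default_factory=list)

@dataclass
class RowGroup:
    num_rows: int
    chunks: List[ColumnChunk] = field(default_factory=list)

@dataclass
class File:
    columns: List[str]
    row_groups: List[RowGroup] = field(default_factory=list)


def build_file(columns: Sequence[str], rows: Sequence[Sequence[Any]],
               rows_per_group: int, values_per_page: int) -> File:
    """Split rows into row groups, then each column of a group into a chunk of pages."""
    f = File(columns=list(columns))
    for start in range(0, len(rows), rows_per_group):
        group_rows = rows[start:start + rows_per_group]  # horizontal partition
        rg = RowGroup(num_rows=len(group_rows))
        # Exactly one column chunk per column in every row group.
        for i, name in enumerate(columns):
            values = [row[i] for row in group_rows]
            pages = [Page(values[p:p + values_per_page])
                     for p in range(0, len(values), values_per_page)]
            rg.chunks.append(ColumnChunk(column=name, pages=pages))
        f.row_groups.append(rg)
    return f


if __name__ == "__main__":
    rows = [(i, f"name{i}") for i in range(5)]
    f = build_file(["id", "name"], rows, rows_per_group=3, values_per_page=2)
    for rg in f.row_groups:
        print(rg.num_rows, [(c.column, [p.values for p in c.pages]) for c in rg.chunks])
```

With 5 rows and `rows_per_group=3`, this yields two row groups of 3 and 2 rows, each holding one chunk for `id` and one for `name`; the first `id` chunk is split into pages `[0, 1]` and `[2]`.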
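#+begin_example
4-byte magic number "PAR1"
<Column 1 Chunk 1>
<Column 2 Chunk 1>
...
<Column N Chunk 1>
<Column 1 Chunk 2>
<Column 2 Chunk 2>
...
<Column N Chunk 2>
...
<Column 1 Chunk M>
<Column 2 Chunk M>
...
<Column N Chunk M>
File Metadata
4-byte length in bytes of file metadata (little endian)
4-byte magic number "PAR1"
#+end_example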
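Because the metadata length and magic number sit at the very end, a reader can locate the file metadata by working backwards from the end of the file. Below is a minimal standard-library sketch, assuming the whole file is already in memory. The helper name `read_footer` is hypothetical, and the returned bytes are left undecoded: in real files they are a Thrift-serialized `FileMetaData` structure defined in parquet.thrift.

```python
import struct

MAGIC = b"PAR1"

def read_footer(data: bytes) -> bytes:
    """Return the raw, still Thrift-encoded, file metadata bytes (hypothetical helper)."""
    # Minimum framing: leading magic + 4-byte length + trailing magic.
    if len(data) < 12:
        raise ValueError("file too small to be Parquet")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic number")
    # The 4 bytes before the trailing magic hold the metadata length, little endian.
    (meta_len,) = struct.unpack("<I", data[-8:-4])
    meta_end = len(data) - 8
    meta_start = meta_end - meta_len
    # The metadata must not overlap the leading magic number.
    if meta_start < 4:
        raise ValueError("metadata length exceeds file size")
    return data[meta_start:meta_end]


if __name__ == "__main__":
    # Fake file: magic, placeholder column data, placeholder metadata, length, magic.
    metadata = b"fake-metadata"
    body = b"column-chunk-bytes"
    fake = MAGIC + body + metadata + struct.pack("<I", len(metadata)) + MAGIC
    print(read_footer(fake))
```

This placement is what lets a reader find the metadata with a small read at the end of the file, without scanning the column data first.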