https://github.com/apache/parquet-format
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift
https://github.com/apache/parquet-testing
https://github.com/apache/parquet-java
- block :: same as an HDFS block
- file :: file metadata is required; data is not
- row-group :: a logical horizontal partitioning of the data into
  rows. No physical representation is guaranteed for a row-group.
- column-chunk :: a chunk of the data for a particular column
- page :: column chunks are divided into pages. A page is conceptually
  indivisible in terms of compression and encoding. Multiple page types
  can be interleaved in a column chunk.
A file consists of one or more row-groups. A row-group contains exactly
one column chunk per column. Column chunks contain one or more pages.
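
The hierarchy above can be sketched as plain Python data classes. This is only an illustration of the containment rules, not how any Parquet library models them: the class names, the `kind` labels, and the `layout_problems` helper are all made up for this note.

```python
from dataclasses import dataclass


@dataclass
class Page:
    kind: str       # hypothetical label, e.g. "data" or "dictionary"
    payload: bytes  # compressed/encoded as one indivisible unit


@dataclass
class ColumnChunk:
    column: str
    pages: list[Page]


@dataclass
class RowGroup:
    chunks: list[ColumnChunk]


def layout_problems(columns: list[str], row_groups: list[RowGroup]) -> list[str]:
    """Return human-readable violations of the containment rules."""
    problems = []
    for i, group in enumerate(row_groups):
        # A row-group must hold exactly one column chunk per column.
        seen = sorted(chunk.column for chunk in group.chunks)
        if seen != sorted(columns):
            problems.append(f"row group {i}: expected one chunk per column")
        # Every column chunk must hold at least one page.
        for chunk in group.chunks:
            if not chunk.pages:
                problems.append(
                    f"row group {i}: column {chunk.column!r} has no pages"
                )
    return problems
```

An empty list means every row-group has one chunk per column and every chunk has at least one page.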
4-byte magic number "PAR1" (start of file)
4-byte length in bytes of file metadata (little endian)
4-byte magic number "PAR1" (end of file)
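
A minimal sketch of how a reader might use these fields to find the file metadata, assuming the whole file is already in memory as `bytes`. The function name `locate_file_metadata` is hypothetical. It only locates the serialized metadata and does not decode the Thrift structure.

```python
import struct

MAGIC = b"PAR1"


def locate_file_metadata(data: bytes) -> tuple[int, int]:
    """Return (offset, length) of the serialized file metadata in data."""
    # Smallest possible file: leading magic + length field + trailing magic.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError('missing "PAR1" magic number')
    # The 4 bytes before the trailing magic hold the metadata length,
    # stored as a little-endian unsigned integer.
    (meta_len,) = struct.unpack("<I", data[-8:-4])
    meta_start = len(data) - 8 - meta_len
    if meta_start < 4:
        raise ValueError("metadata length exceeds file size")
    return meta_start, meta_len
```

For a file on disk, the same idea works by seeking to 8 bytes before the end, reading the length and magic, then seeking back by the metadata length.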