Mercurial > core / lisp/lib/dat/parquet/pkg.lisp

changeset 698: 96958d3eb5b0
parent: ec1d4d544c36
author: Richard Westhaver <ellis@rwest.io>
date: Fri, 04 Oct 2024 22:04:59 -0400
permissions: -rw-r--r--
description: fixes
;;; pkg.lisp --- Apache Parquet Packages

;; Common Lisp Parquet Implementation

;;; Commentary:

#|
https://github.com/apache/parquet-format
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift
https://github.com/apache/parquet-testing
https://github.com/apache/parquet-java
https://github.com/apache/arrow-rs
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36632.pdf
https://thrift.apache.org/docs/types
|#

#|
 4-byte magic number "PAR1"
 <Column 1 Chunk 1>
 <Column 2 Chunk 1>
 ...
 <Column N Chunk 1>
 <Column 1 Chunk 2>
 <Column 2 Chunk 2>
 ...
 <Column N Chunk 2>
 ...
 <Column 1 Chunk M>
 <Column 2 Chunk M>
 ...
 <Column N Chunk M>
 File Metadata
 4-byte length in bytes of file metadata (little endian)
 4-byte magic number "PAR1"
|#

;; In this file we're being as lazy as possible. To generate our base objects
;; we depend on the file parquet.thrift in the parquet-format repo. The core
;; skelfile includes a script to download it and convert it to parquet.json
;; (requires the thrift CLI tool). We then decode it with DAT/JSON and
;; generate Lisp classes and types.

;; NOTE: there is actually a Common Lisp code generator for Thrift. It seems
;; to work, but it requires an ASDF system named thrift which I couldn't find
;; anywhere. Granted, I didn't look that hard, but I don't think it matters
;; because we ultimately don't want to depend on the Thrift CLI tool for
;; codegen.

;;; Code:
(in-package :dat/parquet)

(define-constant +parquet-magic-number+ "PAR1" :test 'equal)

(defconstant +default-parquet-page-size+ (* 8 1024)) ;; 8 KiB
(defconstant +default-parquet-row-group-size+ (expt 1024 3)) ;; 1 GiB

(defvar *parquet-creator* "dat/parquet version 0.1.0")
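
;; A minimal sketch, not part of the original file, of how the layout in the
;; Commentary could be checked: a file starts and ends with the 4-byte magic
;; number, and the 4 bytes before the trailing magic hold the length of the
;; file metadata in little-endian order. The names MAGIC-AT-P and
;; PARQUET-METADATA-LENGTH are hypothetical. OCTETS is assumed to be a vector
;; of (unsigned-byte 8) holding the entire file, and the magic string is
;; assumed to be ASCII so that CHAR-CODE yields the on-disk byte values.
(defun magic-at-p (octets start)
  "Return true if the 4 octets of OCTETS beginning at START spell the magic number."
  (and (<= 0 start)
       (<= (+ start 4) (length octets))
       (loop for i from 0 below 4
             always (= (aref octets (+ start i))
                       (char-code (char +parquet-magic-number+ i))))))

(defun parquet-metadata-length (octets)
  "Return the file metadata length in bytes, or NIL if OCTETS is not
framed by the magic number at both ends."
  (let ((n (length octets)))
    ;; Smallest framed file: magic (4) + length (4) + magic (4) = 12 octets.
    (when (and (>= n 12)
               (magic-at-p octets 0)
               (magic-at-p octets (- n 4)))
      ;; Little endian: the least significant byte comes first.
      (loop for i from 0 below 4
            sum (ash (aref octets (+ (- n 8) i)) (* 8 i))))))

;; Usage, with an empty body and the length field set to the bytes 2 1 0 0:
;; (parquet-metadata-length #(80 65 82 49 2 1 0 0 80 65 82 49)) ; => 258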