changeset 8: 6ac37a61456a
parent 7: d543f73892d3
child 9: 4839b0675118
author: Richard Westhaver <ellis@rwest.io>
date: Sat, 27 Jul 2024 02:45:34 -0400
files: parquet-parsing.org q-notes.org
description: bump
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/parquet-parsing.org Sat Jul 27 02:45:34 2024 -0400
@@ -0,0 +1,38 @@
+* DAT/PARQUET
+https://github.com/apache/parquet-format
+https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift
+https://github.com/apache/parquet-testing
+https://github.com/apache/parquet-java
+** glossary
+- block :: same as an HDFS block
+- file :: file metadata is required, data is not
+- row-group :: a logical horizontal partitioning of the data into
+ rows. no physical structure is guaranteed for a row-group
+- column-chunk :: a chunk of the data for a particular column
+- page :: column chunks are divided into pages. a page is conceptually
+ indivisible in terms of compression/encoding. multiple page types
+ can be interleaved in a column chunk.
+
+Files consist of one or more row-groups. A row-group contains exactly one
+column chunk per column. Column chunks contain one or more pages.
+
+** format summary
+#+begin_example
+ 4-byte magic number "PAR1"
+ <Column 1 Chunk 1>
+ <Column 2 Chunk 1>
+ ...
+ <Column N Chunk 1>
+ <Column 1 Chunk 2>
+ <Column 2 Chunk 2>
+ ...
+ <Column N Chunk 2>
+ ...
+ <Column 1 Chunk M>
+ <Column 2 Chunk M>
+ ...
+ <Column N Chunk M>
+ File Metadata
+ 4-byte length in bytes of file metadata (little endian)
+ 4-byte magic number "PAR1"
+#+end_example
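The layout in the format summary of parquet-parsing.org suggests how a reader could find the file metadata: check the "PAR1" magic at both ends, then read the 4-byte little-endian length that sits just before the trailing magic. The Python sketch below is only an illustration of that arithmetic. The function name `locate_file_metadata` and the sample bytes are hypothetical, and the sketch does not decode the Thrift-encoded metadata itself.

```python
import struct

MAGIC = b"PAR1"


def locate_file_metadata(buf: bytes) -> tuple[int, int]:
    """Return (offset, length) of the serialized file metadata in buf."""
    # Smallest possible layout: leading magic + 4-byte length + trailing magic.
    if len(buf) < 12:
        raise ValueError("too short to be a Parquet file")
    if buf[:4] != MAGIC or buf[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic number")
    # The 4 bytes before the trailing magic hold the metadata length (little endian).
    (length,) = struct.unpack("<I", buf[-8:-4])
    offset = len(buf) - 8 - length
    if offset < 4:
        raise ValueError("metadata length overruns the file")
    return offset, length


# Hypothetical stand-in bytes, not real column chunks or Thrift metadata.
chunks = b"\x00" * 10
metadata = b"fake-metadata"
sample = MAGIC + chunks + metadata + struct.pack("<I", len(metadata)) + MAGIC
offset, length = locate_file_metadata(sample)
```

Here `sample[offset:offset + length]` yields the stand-in metadata bytes; a real reader would pass that slice to a Thrift decoder built from parquet.thrift.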
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/q-notes.org Sat Jul 27 02:45:34 2024 -0400
@@ -0,0 +1,33 @@
+* Queries
+Q --- Query languages
+
+EQL = Event Query Language
+SQL = Structured Query Language
+LQL = Logical Query Language
+GQL = Graph Query Language
+
+refs:
+https://tdop.github.io/
+https://howqueryengineswork.com/01-what-is-a-query-engine.html
+https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf
+https://en.wikipedia.org/wiki/Graph_Query_Language
+https://code.kx.com/q
+https://docs.xtdb.com/reference/main/xtql/queries.html
+https://neo4j.com/docs/cypher-manual
+https://eql.readthedocs.io/en/latest/
+https://clojure.github.io/clojure-contrib/doc/datalog.html
+https://www.researchgate.net/publication/2850953_Soft_Stratification_for_Magic_Set_Based_Query_Evaluation_in_Deductive_Databases
+http://sieve.info/
+https://www.researchgate.net/publication/221542994_SARI-SQL_Event_query_language_for_event_analysis
+https://www.elastic.co/guide/en/elasticsearch/reference/current/eql.html
+https://www.youtube.com/watch?v=8XUutFBbUrg
+https://www.scryer.pl/
+https://github.com/inconvergent/cl-grph
+https://jakewheat.github.io/sql-overview/
+https://github.com/ronsavage/SQL
+https://web.csulb.edu/colleges/coe/cecs/dbdesign/dbdesign.php
+https://github.com/nikodemus/screamer
+https://github.com/defunkydrummer/cl-gambol
+https://www.lispworks.com/products/knowledgeworks.html
+https://namin.seas.harvard.edu/files/namin/files/sql2c_jfp.pdf
+https://scala-lms.github.io/tutorials/query.html