summaryrefslogtreecommitdiff
path: root/README.md
blob: 9b456e5e31cf8125bc78116cba6a035271c52b0a (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
# One Trillion Row Challenge

Inspired by Gunnar Morling's [one billion row challenge](https://github.com/gunnarmorling/1brc), we thought we'd take things one step further and start the one trillion row challenge (1TRC).

## Data Generation

You can generate the dataset yourself using the [data generation script](generate_data.py). We've also hosted the dataset in a requester pays S3 bucket s3://coiled-datasets-rp/1trc in `us-east-1`. 

It draws a random sample of weather stations and normally distributed temperatures drawn from the mean for each station based on the values in [lookup.csv](lookup.csv).
## The Challenge

The main task, like the 1BRC, is to calculate the min, mean, and max values per weather station, sorted alphabetically.