Add attribution to Jacob's script

author: scharlottej13 <sarah@coiled.io> 2024-02-02 15:35:13 -0800
committer: scharlottej13 <sarah@coiled.io> 2024-02-02 15:35:13 -0800
commit: bb23f9245dcd631c33000657bd21c1fe532abfcc (patch)
tree: 75e1898242594e7f14c238fafb78f4c7067aae98
parent: 7ec23a39198016eb285cee324c0f967ffda8b084 (diff)
2 files changed, 5 insertions, 1 deletions
diff --git a/README.md b/README.md
index 075e993..603dc34 100644
--- a/README.md
+++ b/README.md
@@ -4,9 +4,10 @@ Inspired by Gunnar Morling's [one billion row challenge](https://github.com/gunn
 
 ## Data Generation
 
-You can generate the dataset yourself using the [data generation script](generate_data.py). We've also hosted the dataset in a requester pays S3 bucket `s3://coiled-datasets-rp/1trc` in `us-east-1`. 
+You can generate the dataset yourself using the [data generation script](generate_data.py), adapted from [Jacob Tomlinson's data generation script](https://github.com/gunnarmorling/1brc/discussions/487). We've also hosted the dataset in a requester pays S3 bucket `s3://coiled-datasets-rp/1trc` in `us-east-1`. 
 
 It draws a random sample of weather stations and normally distributed temperatures drawn from the mean for each station based on the values in [lookup.csv](lookup.csv).
+
 ## The Challenge
 
 The main task, like the 1BRC, is to calculate the min, mean, and max values per weather station, sorted alphabetically.
diff --git a/generate_data.py b/generate_data.py
index fa30785..f5de1a8 100644
--- a/generate_data.py
+++ b/generate_data.py
@@ -1,3 +1,6 @@
+# This script was adapted from Jacob Tomlinson's 1BRC submission
+# https://github.com/gunnarmorling/1brc/discussions/487
+
 import os
 import tempfile
 import coiled