📚 Personal bits of knowledge

🚚 Move data documentation files from Data/ to root directory

+4 -4
Data/Analytics Engineering.md Analytics Engineering.md
···
5 5 - [More analysis might not be the best solution if you can't validate or take actions on previous analysis](https://twitter.com/ejames_c/status/1753692862548697464)!
6 6 - Accept that [analytics is a mess](https://benn.substack.com/p/analytics-is-a-mess).
7 7 - Explicitly create separate workspaces for curated (production) and messy (experimental) work.
8 - - Reports are rarely read, and often forgotten. [[Data Culture|Decision-making involves getting data, summarizing and predicting ad then taking action]].
9 - - One of the best ways to communicate data is telling stories. Stories are more captative and present a coherent view around a topic.
8 + - Reports are rarely read, and often forgotten. [[Data Culture|Decision-making involves getting data, summarizing and predicting and then taking action]].
9 + - One of the best ways to communicate data is telling stories. Stories are more captivating and present a coherent view around a topic.
10 10 - [The analytics engineer workload is a lot like being a data librarian](https://www.youtube.com/watch?v=T0Z_ibd3Hx0).
11 11 - If you are running a library you have these books coming in and you have people who are looking for books on specific topics and you've got to figure out a way to organize all those books so that all those people can find what they need. There are many different ways to organize books, not just one perfect solution. A librarian is interested in helping people find the books that they're looking for but also discovering new books that they didn't realize that they were looking for.
12 12 - [Analytics code should be version controlled, tested, modular and maintainable](https://www.getdbt.com/analytics-engineering/why/).
···
59 59 - [Skrimmage](https://github.com/Skrimmage/Data-Platform)
60 60 - [Ibis](https://github.com/ibis-project/ibis-analytics)
61 61 - [OSO](https://github.com/opensource-observer/oso)
62 - - [Tuba](https://github.com/tuva-health/tuva)
62 + - [Tuva](https://github.com/tuva-health/tuva)
63 63 - [Department of Education for New South Wales](https://github.com/wisemuffin/nsw-doe-data-stack-in-a-box)
64 - - [OP Analytitcs](https://github.com/ethereum-optimism/op-analytics)
64 + - [OP Analytics](https://github.com/ethereum-optimism/op-analytics)
65 65 - [Transfermarkt Datasets](https://github.com/dcaribou/transfermarkt-datasets)
66 66 - [OpenTimes](https://github.com/dfsnow/opentimes)
67 67
Data/Dashboards.md Dashboards.md
Data/Data Culture.md Data Culture.md
+1 -1
Data/Data Engineering.md Data Engineering.md
···
22 22
23 23 - **Simplicity**: Each steps is easy to understand and modify. Rely on immutable data. Write only. No deletes. No updates. Avoid having too much "state". Hosting static files on S3 is much less friction and maintenance than a server somewhere serving an API.
24 24 - **Reliability**: Errors in the pipelines can be recovered. Pipelines are monitored and tested. Data is saved in each step (storage is cheap) so it can be used later if needed. For example, adding a new column to a table can be done extracting the column from the intermediary data without having to query the data source. It is better to support 1 feature that works reliably and has a great UX than 2 that are unreliable or hard to use. One solid step is better than 2 finicky ones.
25 - - **[[Modularity]]**: Steps are independent, declarative, and [[Idempotence|itempotent]]. This makes pipelines composable.
25 + - **[[Modularity]]**: Steps are independent, declarative, and [[Idempotence|idempotent]]. This makes pipelines composable.
26 26 - **Consistency**: Same conventions and design patterns across pipelines. If a failure is actionable by the user, clearly let them know what they can do. Schema on write as there is always a schema.
27 27 - **Efficiency**: Low event latency when needed. Easy to scale up and down. A user should not be able to configure something that will not work. Don't mix heterogeneous workloads under the same tooling (e.g: big data warehouses doing simple queries 95% of their time and 1 big batch once a day).
28 28 - **Flexibility**: Steps change to conform data points. Changes don't stop the pipeline or losses data. Fail fast and upstream.
Data/Data IDE.md Data IDE.md
Data/Data Package Manager.md Data Package Manager.md
+1 -1
Data/Data Practices.md Data Practices.md
···
12 12 - People need to prepare and that means better kickoff conversation.
13 13 - Easier to triage and connect dots across requests.
14 14 - Creates easily referenced records of requests.
15 - - Friction cuts down on lazy asks. Too much friction will disuade legit ask.
15 + - Friction cuts down on lazy asks. Too much friction will dissuade legit ask.
16 16
17 17 ### Data Request Form
18 18
Data/Data Quality.md Data Quality.md
Data/Experimentation.md Experimentation.md
Data/Machine Learning.md Machine Learning.md
Data/Metrics.md Metrics.md
Data/Open Source Data Projects.md Open Source Data Projects.md
Data/Product Analytics.md Product Analytics.md
Data/Reverse ETL.md Reverse ETL.md
Data/Unified Schema Design.md Unified Schema Design.md