DuckDB
- @duckdb@mastodon.social
- DuckDB Book
- https://www.quary.dev/
- Rill - Preset - Metabase - ClickHouse
What is data engineering
- https://towardsdatascience.com/data-engineering-in-julia-3fd37eaa618a
At the highest level, Data Engineering is the movement of data from a source to a destination. Let’s say for example that you have a website running in production with multiple servers, each storing data for specific product sales. One way to check the overall sales is to run queries on the production databases and aggregate the results to get the total sales. But what if you want to do a more complex analysis?
In this case, a Data Engineer might connect all of the production databases to a different database which could be used specifically for analysis purposes. So far, we have covered the Extract and Load functions of Data Engineering. In many cases, however, we might want the data in a different format than it was originally in. Perhaps you were logging total transactions but now want the line-by-line item cost. This is where a Data Engineer would perform data transformations, the 3rd main function of Data Engineering.
So in Data Engineering terms, what have we actually done in this example? Well, we defined a pipeline! To make the example work, we had a data source (the original production servers) where we did an extraction, then augmented / transformed the data, followed by loading it into the data warehouse (storage) where the data science team can now do further analysis.
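The pipeline described above can be sketched in Go as three small functions. This is only an illustration under stated assumptions: the `Order`, `Item`, `LineItem`, and `Warehouse` types are hypothetical, the "production servers" are in-memory slices, and the "warehouse" is a struct rather than a real database.

```go
package main

import "fmt"

// Order is a hypothetical record as logged on a production server:
// one row per transaction, with its items nested inside.
type Order struct {
	ID    int
	Items []Item
}

// Item is one product line within an order.
type Item struct {
	SKU      string
	Quantity int
	UnitCost float64
}

// LineItem is the flattened shape wanted for analysis.
type LineItem struct {
	OrderID int
	SKU     string
	Cost    float64
}

// Warehouse stands in for the analytics database (an assumption for this sketch).
type Warehouse struct {
	Rows []LineItem
}

// extract gathers orders from every source into one slice.
func extract(sources ...[]Order) []Order {
	var all []Order
	for _, s := range sources {
		all = append(all, s...)
	}
	return all
}

// transform flattens each order into one row per line item.
func transform(orders []Order) []LineItem {
	var rows []LineItem
	for _, o := range orders {
		for _, it := range o.Items {
			cost := float64(it.Quantity) * it.UnitCost
			rows = append(rows, LineItem{OrderID: o.ID, SKU: it.SKU, Cost: cost})
		}
	}
	return rows
}

// load appends the transformed rows to the warehouse.
func (w *Warehouse) load(rows []LineItem) {
	w.Rows = append(w.Rows, rows...)
}

func main() {
	serverA := []Order{{ID: 1, Items: []Item{{"apple", 2, 0.5}, {"pear", 1, 0.75}}}}
	serverB := []Order{{ID: 2, Items: []Item{{"apple", 4, 0.5}}}}

	var wh Warehouse
	wh.load(transform(extract(serverA, serverB)))

	var total float64
	for _, r := range wh.Rows {
		total += r.Cost
	}
	fmt.Printf("%d line items, total sales %.2f\n", len(wh.Rows), total)
}
```

Keeping extract, transform, and load as separate functions means each stage can be tested or swapped on its own, for example replacing the in-memory warehouse with a real database client.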
Why Golang
- Performance: Go is known for its excellent performance, making it suitable for handling large-scale data processing tasks efficiently.
- Concurrency: Goroutines and channels in Go make concurrent programming easier, which can be advantageous for parallel data processing tasks.
- Robust standard library: Go has a rich standard library that includes packages for handling data formats (JSON, CSV, etc.) and working with databases, which can accelerate development.
- Maturity: Go has been around longer and has a more mature ecosystem compared to Deno, which means you'll find more libraries, frameworks, and resources available for data engineering tasks.
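To make the concurrency and standard-library points concrete, here is a minimal sketch that parses several CSV inputs in parallel and prints the totals as JSON. The region names, the two-column `product,amount` layout, and the helper names `sumCSV` and `totalsByRegion` are assumptions made for the example; only standard-library packages are used.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// sumCSV parses "product,amount" rows and returns the sum of the amount column.
func sumCSV(data string) (float64, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return 0, err
	}
	var total float64
	for _, rec := range records {
		if len(rec) < 2 {
			return 0, fmt.Errorf("expected 2 fields, got %d", len(rec))
		}
		amount, err := strconv.ParseFloat(rec[1], 64)
		if err != nil {
			return 0, err
		}
		total += amount
	}
	return total, nil
}

// totalsByRegion sums each region's CSV in its own goroutine
// and collects the results over a channel.
func totalsByRegion(inputs map[string]string) (map[string]float64, error) {
	type result struct {
		region string
		total  float64
		err    error
	}
	results := make(chan result)
	var wg sync.WaitGroup
	for region, data := range inputs {
		wg.Add(1)
		go func(region, data string) {
			defer wg.Done()
			total, err := sumCSV(data)
			results <- result{region, total, err}
		}(region, data)
	}
	// Close the channel once every worker has sent its result.
	go func() {
		wg.Wait()
		close(results)
	}()

	totals := make(map[string]float64)
	var firstErr error
	for r := range results { // drain fully so no goroutine blocks on send
		if r.err != nil {
			if firstErr == nil {
				firstErr = r.err
			}
			continue
		}
		totals[r.region] = r.total
	}
	if firstErr != nil {
		return nil, firstErr
	}
	return totals, nil
}

func main() {
	inputs := map[string]string{
		"eu": "widget,10.5\ngadget,4.5\n",
		"us": "widget,20\n",
	}
	totals, err := totalsByRegion(inputs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, err := json.Marshal(totals)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

For inputs this small the goroutines add nothing in speed; the pattern pays off when each input is a large file or a slow network source.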
from: https://github.com/cyclone-github/writeups/blob/main/Why Develop With Go.pdf
- Creators: Go was written in 2007 at Google by Robert Griesemer, Rob Pike, and Ken Thompson, who are also well-known for their contributions to the C programming language and Unix operating system. Go was published as open source in 2009 as a statically typed, compiled language designed to address common criticisms of other languages such as C++ and Java, and to make it easier to develop efficient, reliable software at scale.
- Write-Once, Run-Anywhere: Go supports a wide range of platforms, including x86 and ARM architectures on Windows, Linux, Unix, BSD, macOS, Raspberry Pi, and others such as WebAssembly, Android and iOS.
- Compiled Language: Like C, Rust and Zig, Go is a statically typed, compiled language that does not require an interpreter like Python, Ruby, or JavaScript, or a JIT (Just-In-Time compiler) like Java or C#. Compiled languages result in faster execution and more efficient use of system resources vs interpreted languages.
- Concurrency: Go is designed for concurrent programming with lightweight goroutines and channels, making it ideal for high-performance, multi-threaded applications.
- Performance Compared to C: Go typically performs within 10-20% of optimized C code while offering a much simpler syntax and development process.
- Memory Safety: Go ensures memory safety through its strong type system, nil safety, efficient garbage collection, bounds checking, and concurrency safety. These features help prevent common issues like null pointer dereferencing, buffer overflows, race conditions and memory leaks.
- Compile-Time Error Checking: Like Rust, Go enforces compile-time error checking with strict syntax rules and comprehensive checks to find errors during compilation rather than at runtime.
- Code Quality: Go promotes clean and efficient code with its minimalistic design, built-in formatting tool, gofmt, and comprehensive standard library. The language enforces code formatting standards leading to a consistent and readable codebase. This makes Go code both easy to write and read.
- Major Companies and Programs Use Go: Adobe, AT&T, BBC, Canonical, Cloudflare, CockroachDB, Crowdstrike, Dell, DigitalOcean, Disney, Docker, Dropbox, eBay, Eted, Expedia, Facebook, GitHub, GitLab, Google, Grafana, InfluxDB, Kubernetes, Medium, Netflix, PayPal, Prometheus, SendGrid, Slack, SoundCloud, Tailscale, Terraform, Traefik, Twitch, Uber, YouTube, and many more.
- Developer Friendly: With an easy-to-learn syntax, built-in concurrency, performance, memory safety, and cross-platform compatibility, Go is an excellent choice for modern software development. Whether writing a simple tool or a large-scale distributed system, Go provides the features, safety, and efficiency needed to succeed.
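The bounds-checking point can be seen in a few lines. The sketch below is an illustration, not a recommended error-handling style: the helper `at` is hypothetical, and it relies on the fact that an out-of-range slice index causes a runtime panic (which `recover` can intercept) instead of reading adjacent memory.

```go
package main

import "fmt"

// at returns s[i]. If i is out of range, the runtime's bounds check panics;
// the deferred function turns that panic into an ordinary error.
func at(s []int, i int) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return s[i], nil
}

func main() {
	nums := []int{10, 20, 30}
	fmt.Println(at(nums, 1)) // in range: value and a nil error
	fmt.Println(at(nums, 7)) // out of range: zero value and a non-nil error
}
```

In real code an explicit `i < len(s)` check is clearer than recovering from a panic; the point here is only that the out-of-range access is caught at runtime.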
Code from a few years back still works. This is a huge benefit when you are working on a large codebase in a long-running project.