DuckDB

What is data engineering

In this case, a Data Engineer might connect all of the production databases to a separate database used specifically for analysis. So far, we have covered the Extract and Load functions of Data Engineering. In many cases, however, we want the data in a different format than the original. Perhaps you were logging total transactions but now want the cost of each line item. This is where a Data Engineer performs data transformations, the third main function of Data Engineering.

So in Data Engineering terms, what have we actually done in this example? Well, we defined a pipeline! To make the example work, we had a data source (the original production servers) where we did an extraction, then augmented / transformed the data, followed by loading it into the data warehouse (storage) where the data science team can now do further analysis.

Why Golang

  1. Performance: Go is known for its excellent performance, making it suitable for handling large-scale data processing tasks efficiently.
  2. Concurrency: Goroutines and channels in Go make concurrent programming easier, which can be advantageous for parallel data processing tasks.
  3. Robust standard library: Go has a rich standard library that includes packages for handling data formats (JSON, CSV, etc.) and working with databases, which can accelerate development.
  4. Maturity: Go has been around longer and has a more mature ecosystem compared to Deno, which means you'll find more libraries, frameworks, and resources available for data engineering tasks.
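As a rough illustration of the concurrency point above, the following sketch fans records out to a small pool of goroutines over a channel and collects the results on a second channel. The `process` function and `parallelSum` helper are hypothetical names invented for this example; a real transformation would replace the squaring step.

```go
package main

import (
	"fmt"
	"sync"
)

// process is a hypothetical per-record transformation; here it squares the input.
func process(n int) int { return n * n }

// parallelSum distributes records across a fixed number of worker goroutines
// and returns the sum of the processed values.
func parallelSum(records []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				results <- process(n)
			}
		}()
	}

	// Feed the jobs channel, then close it so the workers' range loops end.
	go func() {
		for _, n := range records {
			jobs <- n
		}
		close(jobs)
	}()

	// Close results once every worker has finished sending.
	go func() {
		wg.Wait()
		close(results)
	}()

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(parallelSum([]int{1, 2, 3, 4, 5}, 3))
}
```

Results arrive in whatever order the workers finish, which is fine for a sum; a pipeline that needs ordered output would have to carry an index alongside each record.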

from: https://github.com/cyclone-github/writeups/blob/main/Why Develop With Go.pdf

  1. Creators: Go was designed at Google starting in 2007 by Robert Griesemer, Rob Pike, and Ken
    Thompson. Thompson co-created the Unix operating system and the B language (the predecessor
    of C), and Pike worked on Unix and Plan 9 at Bell Labs. Go was published as open source in
    2009 as a statically typed, compiled language designed to address common criticisms of other
    languages such as C++ and Java, and to make it easier to develop efficient, reliable software
    at scale.
  2. Write-Once, Run-Anywhere: Go supports a wide range of platforms, including x86 and
    ARM architectures on Windows, Linux, Unix, BSD, macOS, Raspberry Pi, and others such
    as WebAssembly, Android and iOS.
  3. Compiled Language: Like C, Rust and Zig, Go is a statically typed, compiled language that
    does not require an interpreter like Python, Ruby, or JavaScript, or a JIT (Just-In-Time
    compiler) like Java or C#. Compiled languages result in faster execution and more efficient
    use of system resources vs interpreted languages.
  4. Concurrency: Go is designed for concurrent programming with lightweight goroutines and
    channels, making it ideal for high-performance, multi-threaded applications.
  5. Performance Compared to C: Go typically performs within 10-20% of optimized C code
    while offering a much simpler syntax and development process.
  6. Memory Safety: Go provides memory safety through its strong type system, garbage
    collection, and bounds checking. These features help prevent common issues like buffer
    overflows and use-after-free bugs. Nil pointer dereferences still panic at runtime, and data
    races are still possible, although the built-in race detector (the -race flag) helps find them.
  7. Compile-Time Error Checking: Like Rust, Go enforces compile-time error checking with
    strict syntax rules and comprehensive checks to find errors during compilation rather than at
    runtime.
  8. Code Quality: Go promotes clean and efficient code with its minimalistic design, built-in
    formatting tool, gofmt, and comprehensive standard library. The language enforces code
    formatting standards leading to a consistent and readable codebase. This makes Go code
    both easy to write and read.
  9. Major Companies and Programs Use Go: Adobe, AT&T, BBC, Canonical, Cloudflare,
    CockroachDB, CrowdStrike, Dell, DigitalOcean, Disney, Docker, Dropbox, eBay, Etsy,
    Expedia, Facebook, GitHub, GitLab, Google, Grafana, InfluxDB, Kubernetes, Medium,
    Netflix, PayPal, Prometheus, SendGrid, Slack, SoundCloud, Tailscale, Terraform, Traefik,
    Twitch, Uber, YouTube, and many more.
  10. Developer Friendly: With an easy-to-learn syntax, built-in concurrency, performance,
    memory safety, and cross-platform compatibility, Go is an excellent choice for modern
    software development. Whether writing a simple tool or a large-scale distributed system, Go
    provides the features, safety, and efficiency needed to succeed.
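To make the bounds-checking part of item 6 concrete, here is a small sketch showing that an out-of-range slice access in Go triggers a runtime panic instead of silently reading adjacent memory. The `safeGet` helper is a hypothetical name for this example; it uses `recover` to turn that panic into an ordinary error, which is one possible way to handle it, not a recommended pattern for every case.

```go
package main

import "fmt"

// safeGet returns values[i]. If i is out of range, the runtime bounds check
// panics; the deferred function recovers and reports it as an error instead.
func safeGet(values []int, i int) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return values[i], nil
}

func main() {
	data := []int{10, 20, 30}

	if v, err := safeGet(data, 1); err == nil {
		fmt.Println("value:", v)
	}

	// Index 5 does not exist; the access is caught rather than overflowing.
	if _, err := safeGet(data, 5); err != nil {
		fmt.Println("error:", err)
	}
}
```

In everyday code an explicit `if i < len(values)` check is usually clearer; the point here is only that the runtime refuses the invalid access.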

Code written a few years ago still compiles and runs, thanks to the Go 1 compatibility promise. This is a huge benefit when you are working on a large codebase in a long-running project.

Tasks