Today is my first day as a Data Engineer
@supabase
! I'm thrilled to be joining an incredible team with a great mission. I can't wait to start contributing.
There is something interesting going on in the data industry right now. Malloy, dbt, SQLGlot, Ibis, Substrait, Modin, Fugue, and Coral are all basically SQL compilers.
Perfect travel reading material. I’m halfway through and really liking it. I love the focus on data engineering as a discipline as opposed to a tech stack.
I have been using anti-joins a lot lately to understand entity relationships
Left Anti-Join answers the question "what records are in tbl_a but not tbl_b?"
Full Outer Anti-Join answers the question "what records are in tbl_a or tbl_b but are not in both?"
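Here is a minimal sketch of both anti-joins, using Python's built-in sqlite3 module. The table contents and the single `id` column are made up for illustration. The full outer anti-join is written as the union of the two one-sided anti-joins, so it does not depend on FULL OUTER JOIN support in the SQLite build.

```python
import sqlite3

# Toy data (assumed for illustration): tbl_a holds 1, 2, 3 and tbl_b holds 2, 3, 4.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_a (id INTEGER);
    CREATE TABLE tbl_b (id INTEGER);
    INSERT INTO tbl_a VALUES (1), (2), (3);
    INSERT INTO tbl_b VALUES (2), (3), (4);
    """
)

# Left anti-join: rows in tbl_a with no match in tbl_b.
LEFT_ANTI = """
    SELECT a.id
    FROM tbl_a AS a
    LEFT JOIN tbl_b AS b ON a.id = b.id
    WHERE b.id IS NULL
"""

# The mirror image: rows in tbl_b with no match in tbl_a.
RIGHT_ANTI = """
    SELECT b.id
    FROM tbl_b AS b
    LEFT JOIN tbl_a AS a ON a.id = b.id
    WHERE a.id IS NULL
"""

# Full outer anti-join: rows in either table that are not in both.
FULL_OUTER_ANTI = f"{LEFT_ANTI} UNION ALL {RIGHT_ANTI}"

in_a_not_b = sorted(row[0] for row in conn.execute(LEFT_ANTI))
in_one_not_both = sorted(row[0] for row in conn.execute(FULL_OUTER_ANTI))

print(in_a_not_b)
print(in_one_not_both)
```

On engines that support it, the same result can come from a FULL OUTER JOIN filtered to rows where either side's key is NULL.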
The ceiling is being raised. Cursor's copilot helped us write "superhuman code" for a critical feature. I can read this code, but VERY few engineers out there could write it from scratch.
Yesterday marked my last day at Nasdaq. It was a bittersweet moment but I had an exciting opportunity that I couldn't pass up. Looking forward to sharing what's next for me soon!
Highly recommend listening to this talk on metric trees by
@_abhisivasailam
. I have been implementing some of these ideas ever since I heard of it and it has been a game changer in understanding my team's data needs.
Connecting to Snowflake with the newly released Harlequin ADBC adapter.
Still blows my mind how responsive Harlequin is even when displaying millions of rows.
🌟 Seven days until
#AdventOfCode
2022! 🌟
Please buckle your IDE and keep your arms and legs inside your programming language(s) of choice until the puzzles come to a complete stop.
I'm happy with the progress I have been making on a personal project of mine. Building a data pipeline using modern data stack tooling. Shout-out to
@dagsterio
&
@AirbyteHQ
for the awesome docs!
@europython
The recording of Pedro Holanda's talk “DuckDB: Bringing analytical SQL directly to your Python shell” at
@EuroPython
2023 is now available.
3 years ago I was in the worst shape of my life. I gained 30lbs in 6 months after my marathon race was cancelled because of COVID-19.
Now I am in the best shape of my life & have completely changed my lifestyle.
Has anyone checked in on
@holanda_pe
? The man has been grinding at making DuckDB's CSV reader best in class.
I mean look at what people are trying to read with it.
I just want to make sure he is okay.
Had the pleasure today of debugging an inherited mess of dynamic SQL, wrapped in VBA, inside Microsoft Access.
Thank goodness there were code comments - from February 2001.
I really like this article by
@s_ryz
about partitions. Having the ability to run & especially rerun your pipelines on specific partitions is extremely useful.
This is MASSIVE. The Windows Subsystem for Linux in the Microsoft Store is now generally available on Windows 10 and 11! Windows 10 users can now run Linux GUI apps natively!
The Apache Arrow ecosystem can be confusing. This is another great article explaining how the various components fit together (Arrow, Arrow Flight, Arrow Flight SQL, DataFusion).
Announcing uv: an extremely fast Python package installer and resolver, written in Rust.
uv is designed as a drop-in alternative to pip, pip-tools, and virtualenv.
With a warm cache, uv installs are near-instant. Here, it's > 75x faster than pip and pip-tools.
It would be cool if dbt jobs would automatically skip over models that are materialized as views or MVs unless they were modified since last run. There is no need to recreate a view if the underlying SQL hasn't changed.
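A rough sketch of the idea, not dbt's actual API: fingerprint each model's SQL and skip views or MVs whose fingerprint matches the one stored from the last run. The model dictionaries, the `materialized` values, and the `models_to_run` helper are all hypothetical names made up for this illustration.

```python
import hashlib


def sql_fingerprint(sql: str) -> str:
    """Hash the SQL text after collapsing whitespace, so reformatting alone is not a change."""
    normalized = " ".join(sql.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def models_to_run(models: dict, previous_fingerprints: dict) -> list:
    """Return model names to build, skipping unchanged views and materialized views."""
    selected = []
    for name, model in models.items():
        is_view_like = model["materialized"] in {"view", "materialized_view"}
        unchanged = previous_fingerprints.get(name) == sql_fingerprint(model["sql"])
        if is_view_like and unchanged:
            continue  # the view definition is identical, so recreating it is wasted work
        selected.append(name)
    return selected


# Hypothetical project state: fingerprints saved after the previous run.
previous = {
    "orders_view": sql_fingerprint("select * from raw.orders"),
    "customers_view": sql_fingerprint("select * from raw.customers"),
}

# Current project: one view unchanged, one view edited, one table.
current = {
    "orders_view": {"materialized": "view", "sql": "select *\n  from raw.orders"},
    "customers_view": {"materialized": "view", "sql": "select id, name from raw.customers"},
    "daily_sales": {"materialized": "table", "sql": "select 1 as placeholder"},
}

to_run = models_to_run(current, previous)
print(to_run)
```

Tables always run because their data can change even when the SQL does not. A real version would also need to rerun a view when something upstream of it changed shape.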
I’m watching all the
@normconf
talks this weekend. If I were to use one word to describe the conference it would be REAL 🔥
Real people
Real work
Real problems
Real solutions
All the talks are on their YouTube channel!
🎉 We’re excited to share our intent to acquire
@Immerokcom
!
Together, we’ll build a cloud-native service for
@apacheflink
that delivers the same simplicity, security, & scalability that you expect from Confluent for Kafka.
Learn more →
@kellabyte
I feel like
@tembo_io
is trying to solve this. They help deploy Postgres "stacks" customized for specific use cases e.g. OLTP, OLAP, Vector, Message Queue