Data Engineers are becoming Software Engineers: Practice 1

20 mins

In this video, Sung discusses the evolution of data engineering and why it matters. He shares his deep emotional attachment to making people's lives better and highlights his experience mentoring and helping people land data jobs. Sung emphasizes the importance of data engineers becoming software engineers and the need for better tooling in the field. He explores the current state of tools and the challenges faced by data teams. Sung invites viewers to be part of the story and encourages them to reach out and collaborate.

Transcript

Hey, this is Sung here, and I'm about to give a practice demo of my MDS Fest talk. So, future Sung, when you're watching this: this is really just to see how much you can rely on your raw wit and charisma to deliver a compelling talk that's worth listening to and engaging with, and then figure out what areas I need to get back to the lab on and just grind away at refining and evolving. And so I'm gonna role-play in three, two, one.

Hey folks, this is Sung here. Really glad to see such a great turnout for this talk.

I'm honestly really humbled. I think it's really beautiful that you folks wanna, well, one, figure out if I'm full of it with this claim that data engineers are becoming software engineers, or, if you're really resonating with this, you're just trying to validate that some of the more zoomed-in investigative work that I've been doing is attuned with just how you're observing your reality, and maybe we can evolve this story together.

All right, but first and foremost, for those that don't know me that much, it's important to understand, like, who the heck is talking with you in the first place. Or maybe better put: who the heck is Sung, and why do you care?

Just some context. I have been in data engineering for the past 10-plus years of my career. If you see me around in your feed, I'm just some guy that makes Loom videos here and there. I've been deep in data engineering since 2014. All the way back in August 2022, I was a driving reason why dbt Mesh is coming to a dbt multi-project experience near you. Feel free to talk with me after if you want to hear more about that origin story.

But, you know, ultimately, why you should care about what I have to say is just, like, I'm just so deeply emotionally attached to making people's lives better. As a quick example of that, there was a time where I was mentoring seven people to evolve their data careers or get new jobs, and I've helped two people get data jobs in the past year or so.

And it's just been so incredibly moving for me. And I can't wait to see more of that over the coming years, whether I'm a causal factor or not. And thankfully, I've gotten to a really luxurious place in my career, and maybe some of the folks here have too, where it's less about "look how cool my code is" and more "wow, look how cool the problems I'm solving are, and look at all these beautiful people I get to work with every day."

And I'm just so excited that you folks in this chat and in this talk get to be part of that story.

But just to level set here, I want to understand some of you folks' origin stories, because I always get fun and wild answers here.

What did you do before PRs, or pull request code reviews? For me, mine was color-coded email replies, but you know what?

Actually, I won't even say that. I'm not even gonna bias you. I'm gonna post this in the chat, pick your numbers, we'll go from there.

Boom, boom, boom. Okay, nice, nice, I'm seeing a lot of nothings, okay, okay. Spot checking, great, great. DMs. DMs are always reliable when tech fails, I totally get that.

Okay, cool. And so what this reveals to me is that we've all understood the natural evolution story of data engineering, from this scrappy, hey, just do whatever mechanic makes sense at the time, to, like, wait, aren't pull requests just normal, and shouldn't they always be normal?

And hopefully you get attuned to thinking, hey, data engineers are becoming software engineers, oh yeah, isn't that totally normal already?

And maybe you're already there, but if you're not, interrupt me, and let's see if we can evolve the content. But overall, I think what's really important is to understand the evolution of our tools in the first place and just how they've paralleled the journey of what we need to deliver over time.

When we talk about the current set of tools, I remember back during my accounting days where I would literally use this thing called Audit Command Language, think a proprietary version of SQL for accountants, and yes, it's as awful as it sounds, to essentially make debits equal credits. And I was running these on, like, supercharged virtual machines somewhere out there in a data center to literally make debits equal credits.

And it was very labor-intensive. Like, hey, I add a word, I have to change a filter, click run, and wait literally, I kid you not, 16 hours to see something, and if it errors out, I have to restart the process all over again.

And that felt like the stone ages to me. This is all the way back in 2014. And then as we evolved, we saw things like Redshift and BigQuery, I mean, you already know the story from, you know, a couple years ago with the modern data stack.

But we really saw that when we got to a place where it's like, hey, things are like fast enough now to actually do something productive with it and not have all our time spent on, okay, and I just wait for this machine to run.

We've been able to have a lot more headspace to include all this unbundling of the data. Which is now the modern data stack.

And what's really powerful is that now we have this kind of normal psychology of, like, okay, speed is relatively good, iteration speed is relatively good, tooling is relatively good. But there's still something missing here, and that's because the tooling still needs to evolve for what's to come.

And that takes me to the evolution of what we've had to deliver over time. All the way back in the early days, and this is probably true for a lot of you, and maybe even true for some business analysts or actuaries out there today, you have to build this Excel report or what have you to send to the executive to make some decisions. It's very error-prone. It feels like every conversation starts with, hey, this formula looks funky.

Like, hey, do I use a VLOOKUP here? Or I'll just use a pivot table, right? Then it moves to, hey, we're delivering a bunch of dashboards that are automated with things like Fivetran, Snowflake, Looker, what have you.

And, you know, that meets the status quo for a while.

But during my time, even at dbt Labs and seeing this at other customer organizations I worked with, it feels like the asks have evolved from, hey, give me this historical report, to update this automated sales dashboard for, like, deal length, or how do we decrease the sales cycles, to like, hey, let's automate revenue recognition. And you notice, over time, these things go from static, discrete deliverables to, hey, evolve the way I operate. Evolve the way I, as a finance person, I, as a marketing person, can do my job. And part of that is why we've seen really popular tools such as Hightouch and Census take data from Snowflake into Salesforce so that people could have product usage data embedded directly in their deal cycles.

And it's like coming to this like natural kind of conclusion where it's like people just want the data to flow through how they operate.

And yeah, that sounds kind of obvious, but now that's becoming much less of an ideal and much more of a normal ask of, like, evolve the way I operate. Like, that is the deliverable: make sure I am supercharged as a business person to do my job.

And I think we saw a huge lift in that expectation and perception because things like ChatGPT opened everyone's eyes of, like, yeah, this should be normal now. Yeah, like, now we have headspace to care about automating this data delivery to evolve the way we operate. And we're even seeing this, for the folks of you that read my blog.

We're seeing new players sell data. Like, there's some new startups called Cybersyn and Patch whose whole value prop is making sure that they can deliver data. That's it. To either empower teams or just, you know, flow into the systems directly. Or even seeing the popularity of reverse ETL like Hightouch and Census.

Like there's something compelling here that's kind of lifting the zeitgeist to go like, this is just normal now. This should be normal now.

And you know what? I'm all for it. You know, I don't think the goal should be, hey, look how many cool dashboards I create.

It's like, how did my work and data elevate the way you operate in either increasing revenue and/or decreasing costs?

That's it. No, like, gibberish optics, just plain measurable results. That's what we should care about. And I think because these incentives have roared out of the generative AI gates, our data tools and our craft really have to catch up here.

Heck, I've literally seen a company sell data back to the customer in the form of embedded Tableau dashboards using, like, Snowflake, dbt, and Fivetran. Done. And it's kind of both beautiful and terrifying, because I think all of us can kind of have this internal cringe factor, understanding, like, actually, I don't know how scalable that is to, like, hundreds of thousands of customers without causing fire drills every day.

And I think we're at a place now where it's a lot less about, hey, how do we optimize for firefighting tools, and more about how do we optimize for fire prevention tools. Which makes sense, right? Because when we think about the fundamental principles of our craft, it's understand the problem, plan around it with some basic human questions of, like, hey, how do I decrease sales cycles? And like, oh, what data do I need? Oh, maybe some product usage, maybe marketing campaigns, and marry that data together.

But if fire drills are always happening at the data and analysis part after something's already been delivered, like, we'll never get to a conclusion that we're really proud of.

It'll always be this thing, and I'm sure all of you have these battles. It's like, ooh, I built this beautiful dashboard or that meme where it's like, but why did you still ask for an export to, to CSV or Excel, right?

And how do we live in a world where people don't need to do that? Because the primary reason is because they don't trust you, right?

They don't trust it. And I think it's time to evolve that. But, you know, before I go further, you know, I really want to get a pulse check.

Like, what are you typically delivering today? Boom, boom, boom, boom, boom, boom. Okay, great. Nice. I'm seeing some historical reports, ad hoc requests.

Okay. Okay. Ooh, there's a lot more automating operations than I thought. That's really, really cool to see. Like, that energizes me, that you're serving that day to day.

Ooh, I'm gonna DM some of you folks that answered that way, because that's actually really compelling, and I want to know what your battle experiences are so far. But overall, just to ground this in something when it comes to that firefighting analogy I was giving: this is like the quintessential question of our craft. Hey, data looks off.

What's the deal? And a lot of it is because there are a lot of little fires that add up into this big fire. And I think what's kind of tragic about that is all of us only think the fire happens here, when there were very preventable embers earlier on in the cycle that we could have caught if certain components of our job were just less toil and manual, and just kind of, like, yeah, this should just be a normal, elegant part of my craft. Similar to how dbt tests are the reason why, I know even for me, I even tested my data projects at scale, just because it did provide that elegant mechanism.

And so I'm curious here, when it relates to the fires that happen in your life: what is the hardest part of your job today that's causing some of the fires in how you do your job? And so I'm gonna pause here and, yeah, let's see what happens. Let's see this poll.

Okay, I'm seeing data testing, observability, sharing. I'm seeing questions in the chat around what's the difference between data testing and diffing. I'll get to that in a little bit. And boom, boom, boom. Okay, yeah, I mean, schema evolution, totally. Yeah, what's tough is there's no intuitive, easy, like, oh yeah, this is how you always handle it. It's a very opinionated process. Backfills. Yep. Money. I think that really ties into option 9, which is money, obviously, right? Backfills can be very expensive, especially if you change the schema for, like, 10-plus years of transactional data. So I get that.

But really, the trick answer is probably all of the above, right? To a certain degree. I have yet to meet a team or person that has experienced a 10-out-of-10 beautiful way of life with all nine of these parts of their job. And part of it is because incentives have roared way past the traditional evolution arc of our data tools, so it requires us to have that more diligent view of, like, how do we make these easy? Because we're already acting, slash being treated, like software engineers across all nine of those categories.

There's someone, Nina, this is a public post from her, and I think this really captured the moment for me. It was my personal aha moment, where it's like, she's using all these tools to do something that software engineers with websites that are down do all the time. Yeah, the mechanics are different,

but the spirit is the same. It's because when data is down and it is mission critical to how your team makes money or saves money, you lose money, you lose customers. And we're even seeing this with the analytics engineer title, where Cybersyn, one of the startups I mentioned (to be clear, I'm not sponsored or anything by them, I think it's just cool what they're doing), they just name it front and center: you're helping build products that generate revenue.

And as a result, what's really powerful about this is that, you know, dbt is a reason a lot of data analysts can make six figures in the first place. Now we're basically making the same as software engineers, and I think it's a really beautiful thing. And it's not just because we have these fancy tools in our belt. It's because it's so tangible now, with these incentives of evolving the way we operate and deliverables just delivering the data into systems, that we deserve this. And a lot of hiring managers can nod their heads, like, yep, this number is fair and equitable because this hire is gonna make money for our company. And I think that's a really, really beautiful thing.

But what's really funny is what I'm talking about really isn't that novel, because our worlds are already converging. Big tech has adopted this practice in some ways. You'll notice where people have the title "software engineer, comma, data." Like, I even just typed that in very generically in there, and you already see a bunch of people already doing that. And maybe it's because there's a stigma in big tech, or maybe they're trying to game the algorithm where you get paid more when you have software engineering in your title versus data engineer.

What's really kind of powerful about this is I think we have something more charming up our sleeves. For those that have gone through the arc of data analyst to analytics engineer, data engineer, what have you, to software engineer, it's because we've known what it's like to work with very little, virtually nothing, right? Starting with Excel and, like, a notepad. And we've come to this understanding of, like, data teams of one. We've come to this notion of, like, we don't need an army of engineers to deliver something. Most of us know that this craft is less about, hey, look how cool my tech is, but using the tooling and mechanisms in a way to compel apathetic people into advocates, to convince them: hey, use this data to evolve the way you work. Use this Excel report, dashboard, Hightouch to see what's going on.

Sync into Salesforce, or, you know, just even the dataset, because I'm selling the data to, like, evolve the way you work.

That's what we do. We help evolve the way people work. And I think that's kind of the charm of the modern data stack and why it's hit us so hard.

Because it's, it's met us where we're at and it's been honest and practical to the incentives that we care about.

Simple is typically better. And I think it's okay to see us make a huge blast radius of impact without all of us having to learn Rust and C++ and assembly to, like, feel like we're a 10x engineer to get results. What matters is, are you getting results? Are you increasing revenue? Are you decreasing costs?

And understanding that an analytics engineer is more than just a data plumbing exercise. We're convincing people that their story is worth evolving. And we have evidence to prove it. And it's worth sharing with others.

So honestly, I'd love to understand, even for you folks, let's presume you want to be a software engineer, comma, data, or, you know, title semantics aside, I'm curious, more tactically: what do you want your tools to look and feel like? Answer in the chat, folks. You're like, oh, when I look at this, this, this, this, this.

Cool. Now, honestly, this is the ultimate question: do you believe data engineers are becoming software engineers? Like, answer honestly.

I don't care if I'm right or wrong. I care if this feels like a true statement as a whole. Boom, boom, boom, boom.

A lot of yeses and nos. Ooh, for the nos, please put it in the chat. Like, what? Why do you disagree?

You know? I'm, I'm open. Like, ooh, bla, bla, bla, bla, bla, bla, bla, bla, bla, bla, bla, bla, bla. The five is fundamentally interferable.

I guess would that change if that's an external stakeholder? Then, ok. Cool, cool, cool. Alright, cool, cool, cool, cool. Nice.

How? How can I help you? And this is very literal, not rhetorical. There's a reason why I even joined you.

There's a lot of people asking me that question. It's because there's certain mechanisms and experiences in the data engineering craft that I'm just so wildly surprised they're not normal.

Like, we've all grown accustomed to git diff, we've grown accustomed to data testing, observability, catalogs, all these things, but it's a very unloved problem to get a before-and-after story of how exactly is my data changing.

Because the code can only show so much. Because by default, I think all of us recognize data is a very stateful exercise that it's not a matter of if surprises will come up and data changes, it's when.

And I'd rather have that in a Perfect. And so that could look like a lot of things. I know some of you folks know, I've been building like flow state tool, like I guess maybe what would be the best way to do that? What would help is, you know, what do you wish would be absurdly easy today?

And I think we can evolve the conversation from there. So it's like, okay, you guys don't really care about data diffing.

That's okay. That's just something that I'm emotionally attached to. Doesn't mean that you have to be too. Just give me evolution, cool, cool, cool, cool, cool, cool.

Nice, nice, okay. So I'm hearing some signals here. Bring some ideas in here. You know what, like, that's actually really compelling and cool.

I'd love to investigate that further. For those of you that did that, feel free to DM me on LinkedIn, what have you.

Like, I actually care. I want to see this happen. Like, I think this is very akin to, like, when I was growing up in the 90s and early 2000s.

Like, you know, learning how to customize Xanga pages. Like, figuring out the ins and outs of, like, being invisible on Days so that people can't chat you but you can chat them, right?

Like, silly little things like that. And I think we're on the cusp of, like, really seeing an exciting, exponential evolution in our craft, and I don't wanna be a bystander to it.

And I don't think that people here wanna be bystanders, either. And so, like, let's- let's build something, like, really beautiful together.

Alright? That's it, folks. Any questions in the chat? Maybe one ad hoc question: what resonated with you folks most? If anything.

Alright, thanks, y'all. Here's my contact info, feel free to reach out, my DMs are open, like, let's make something fun happen.

Alright? Peace.
