dbt Python Demo: Basic data profiling
3 mins
When do I use this? - When it's less code than SQL AND it's easy to reason through the mechanics!
Hey folks, this is Sung speaking here, and I'm here to give a demo on dbt Python models and a simple use case on why you should care and when to use them.
And so you're probably wondering, where do I start to get some basic context on this? That's where you have a full docs website, just to give you some copy-and-paste code snippets and, honestly, a really thorough rundown of what works and what doesn't work.
Overall, the main problem that this solves is that things that are really hard but doable in SQL should probably be replaced with something that's readily battle-tested and easy to do in Python.
And a simple example of that is even just this file right here. There's a couple of extra things here, but let's really focus on lines eight and nine, where overall I'm defining a function to wrap this around.
But the problem I wanna solve here is: hey, I have this really giant table, and I could probably write a bunch of SQL that gets me information like count, mean, median, min, max, things like that, over revenue data or transactions, as a simple example. I could probably spend a lot of time modeling a couple dozen lines, maybe a hundred lines, of SQL to do that.
And it's relatively doable to reason about, but instead of doing all that, I literally write a single function call, describe, which is built into the Python pandas DataFrame kind of syntax.
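To make the describe call concrete, here is a minimal sketch in plain pandas, outside of dbt. The tiny orders table and its column names are made up for illustration and are not from the demo.

```python
import pandas as pd

# Hypothetical stand-in for an orders table; column names and values are assumptions.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4],
        "revenue": [10.0, 20.0, 30.0, 40.0],
    }
)

# A single call computes count, mean, std, min, quartiles, and max
# for the selected numeric column.
profile = orders[["revenue"]].describe()
print(profile)
```

The row labeled 50% in the result is the median, so the statistics mentioned above come back from that one call.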
And it essentially takes this upstream fact orders SQL table over here, massages that data into a data frame, and makes it Snowflake compliant, in this case using the Snowpark APIs.
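The file on screen is not reproduced in this transcript, so the following is only a sketch of what a dbt Python model along these lines might look like. The model name `fact_orders` is an assumed spelling of the "fact orders" table mentioned above, and the `table` materialization is also an assumption. On Snowflake, `dbt.ref` returns a Snowpark DataFrame, which offers a `describe()` method much like the pandas one.

```python
def model(dbt, session):
    # Assumption: materialize the result as a table so it can be queried afterwards.
    dbt.config(materialized="table")

    # Assumption: "fact_orders" is the name of the upstream SQL model.
    orders = dbt.ref("fact_orders")

    # The single profiling call: summary statistics for the upstream table.
    return orders.describe()
```

dbt invokes `model` itself and supplies both arguments, so nothing else in the file needs to call it.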
And so let's just go and build this directly over here. I'm gonna move my icon around a bit just so you can see. I'm gonna go here.
I'm gonna watch this run and wait a bit for it to finish. Now, mind you, it's gonna take a bit more time, because Snowpark is relatively new and more processing compute time needs to take place.
I'm sure this will evolve over time, but it is what it is today. Okay. All right. Took a while, but that's okay.
I'm gonna click on details. Looking at this, I notice here it's a create or replace procedure. That's the Snowpark-specific syntax; you probably don't have to care about that too deeply.
The thing you should care about is just making sure that you can select from this, because at the end it creates a table within Snowflake for you to query from.
So here are the full logs for that, but essentially I would take something like this and then run it over here.
I'm gonna run this again. And then I get the exact statistical details and profiling I cared about in the first place, and all that literally with a single line.
See? Yeah, that's about it. See ya.