dbt Python Models: Fake Data Demo

3 mins

Snowpark Python Packages Allowed: https://repo.anaconda.com/pkgs/snowflake/

Transcript

Hey folks, this is Sun speaking here, and I'm gonna give a demo of creating fake data using dbt Python models.

And so let's figure out what problem this is solving in the first place. Sometimes I want to simulate unit testing in dbt, or play around with different scenarios. Or, instead of maintaining different CSV seeds and hard-coding that information, what if I wanna make that programmatic at the Python level and let Python do the heavy lifting to generate a hundred fake rows, versus doing that manually in Excel or a CSV and then importing it here?

And so overall, I'm just gonna click this build button and I'll explain what's happening while this is running. Step one, I'm importing the Faker package, and you're probably wondering: how do I know that this works with Snowpark in general?

That's where I double-checked the canonical list of Snowpark libraries on Anaconda, and it tells me that the package exists there, that I can use it, and that it generates fake data.

Okay, I'm gonna close out of that. Go back here and then I'll show you the logs for what's going on over here.

It completed successfully: it created this fake data example, it created the respective table, and it created a stored procedure in order to make this come to life.

And so, step by step: I create this helper function that generates fake data, and this is where determining the number of rows comes into play.

In addition to that, I import Faker at the top level and at the dbt config level, so that dbt can recognize it as it's performing its Python compilations.

And then from there I make sure to throw the result into a data frame. I call this function to create a hundred rows.

You could easily adjust this to a thousand or whatever number you want, and then it returns that data frame. And then from here, you'll notice on the right-hand side, I already ran this once, but because I just ran it again, all these names and all this fake data are refreshed to something new: instead of that Jacob name, you see Kyle.
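The steps described so far (helper function, config-level package declaration, data frame, return) might look roughly like the sketch below. It is not the code shown on screen: the column names are illustrative assumptions, and the imports sit inside `model()` only so the helper can be tried without the packages installed, whereas the demo imports Faker at the top of the file.

```python
# models/fake_data_example.py
# A sketch only; the model name is inferred from how the demo refers to it.


def generate_fake_rows(fake, num_rows):
    """Return a list of dicts, one per fake row.

    `fake` is expected to be a faker.Faker instance. The three columns
    below are assumptions chosen for illustration.
    """
    return [
        {
            "name": fake.name(),
            "email": fake.email(),
            "address": fake.address(),
        }
        for _ in range(num_rows)
    ]


def model(dbt, session):
    # Declare the packages at the dbt config level so the warehouse
    # makes them available when the model runs.
    dbt.config(materialized="table", packages=["faker", "pandas"])

    # Imported here (rather than at the top of the file, as in the demo)
    # so that generate_fake_rows can be exercised on its own.
    import pandas as pd
    from faker import Faker

    rows = generate_fake_rows(Faker(), 100)

    # dbt materializes the returned data frame as a table.
    return pd.DataFrame(rows)
```

Changing the `100` passed to `generate_fake_rows` is the adjustment to a thousand rows mentioned above. Because no seed is set on the Faker instance, each run produces different values, which matches the names changing between runs.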

And now you're probably wondering, well, how exactly is this useful? It's because instead of referencing a seed, like you see here with this generic JSON example, going forward I can do something that looks like this.

I'm gonna make a new file, example.sql, and create that. And all I have to do is select star from, and I'm gonna ref that fake data example.

Nice, it was auto-completed for me. I'm gonna click save, and it should update my lineage to reference that fake data example.
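The demo writes this downstream model in SQL. For comparison, here is a minimal sketch of the same reference made from another Python model, assuming the upstream model is named `fake_data_example` as the narration suggests. This Python variant is an illustration and is not shown in the video.

```python
def model(dbt, session):
    dbt.config(materialized="table")

    # dbt.ref() returns the upstream model as a data frame and is what
    # lets dbt draw the lineage edge to fake_data_example.
    fake_data = dbt.ref("fake_data_example")

    # Returning it unchanged is the Python counterpart of selecting
    # every column from the referenced model.
    return fake_data
```

Filters for a particular scenario would go between the `dbt.ref` call and the `return`.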

And now I can filter however I want or test different scenarios. And imagine, it could be really cool once you start adding environment variables to things like this, where instead of the number 100, it's an environment variable.

So it's a thousand rows, a million rows, what have you, or even the parameters for generating that fake data in the first place.
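One way the environment-variable idea could be wired up, as a sketch under assumptions: as far as the dbt documentation describes, Python models cannot call `env_var` directly, so the value is set in the model's YAML config (for example `num_rows: "{{ env_var('FAKE_DATA_NUM_ROWS', '100') }}"`) and read back with `dbt.config.get`. The config key `num_rows`, the variable name `FAKE_DATA_NUM_ROWS`, and the helper `resolve_num_rows` are all hypothetical.

```python
DEFAULT_NUM_ROWS = 100  # the row count used in the demo


def resolve_num_rows(dbt, default=DEFAULT_NUM_ROWS):
    """Read the row count from the model config, falling back to a default.

    Assumes a hypothetical `num_rows` config key, populated in YAML from
    an environment variable.
    """
    raw = dbt.config.get("num_rows")
    if raw is None or raw == "":
        return default

    # Values rendered from YAML/Jinja may arrive as strings.
    num_rows = int(raw)
    if num_rows < 0:
        raise ValueError("num_rows must not be negative")
    return num_rows
```

Inside `model(dbt, session)`, the result of `resolve_num_rows(dbt)` would replace the hard-coded 100.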

So in summary, if you want to generate fake data and you don't want to use CSVs and seeds to make that happen, you can do that directly with Python. Hopefully it opens your imagination for what unit testing could look like and for what different scenario pipelines you can run in the future.

All right, have fun, folks.
