Generating Test Data with Faker

Whenever I had to set up a project environment (e.g. populating the database) or prepare data for a specific test case, I often felt real pain. Most of the time I either generated the data by hand or copied it from a teammate's database. Both approaches sometimes led to bad or unforeseen results, because you cannot rely on that data.

So I started looking for a comfortable way to create accurate and reliable test data, and finally stumbled upon Faker.

What is Faker?

It’s a library for generating test data for a set of predefined subjects (e.g. Address, Person, Internet), as well as for any other reasonable subject you can think of, since it is extensible via custom providers. I made a presentation for my teammates which contains a brief overview.

I presented the following examples to the team. They show how easy Faker is to use and what it is capable of (e.g. localization and custom providers).

Example with default locale, provider and formatter:
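A minimal sketch of this case, assuming Faker is installed through Composer and can be loaded from vendor/autoload.php (the path is an assumption about your project layout). It uses the default locale and the built-in providers:

```php
<?php
// Assumption: Faker was installed via Composer in the current directory.
require_once 'vendor/autoload.php';

// Without an argument the factory falls back to the default locale (en_US).
$faker = Faker\Factory::create();

// Each call returns a new random value from the built-in providers.
$name    = $faker->name();
$address = $faker->address();
$email   = $faker->email();

echo $name, "\n", $address, "\n", $email, "\n";
```

The values differ on every run, which is exactly what you want for throwaway test data.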

Example with specific locale:
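A sketch under the same assumptions (Composer autoloader in vendor/autoload.php). The locale de_DE is only my pick for illustration; any locale shipped with Faker should work the same way:

```php
<?php
// Assumption: Faker was installed via Composer in the current directory.
require_once 'vendor/autoload.php';

// Passing a locale makes Faker prefer the localized providers for that locale.
$faker = Faker\Factory::create('de_DE');

$name     = $faker->name();
$city     = $faker->city();
$postcode = $faker->postcode();

echo $name, "\n", $postcode, ' ', $city, "\n";
```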

Example with custom provider JobOffer:
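A sketch of what such a provider could look like. Only the class name JobOffer comes from my presentation; the method jobOfferTitle() and the list of positions are hypothetical and made up for this illustration. The Composer autoloader path is again an assumption:

```php
<?php
// Assumption: Faker was installed via Composer in the current directory.
require_once 'vendor/autoload.php';

// Hypothetical custom provider: method name and data are invented for this sketch.
class JobOffer extends \Faker\Provider\Base
{
    protected static $positions = [
        'PHP Developer',
        'QA Engineer',
        'Database Administrator',
    ];

    // Combines a random position with a company name from the built-in providers.
    public function jobOfferTitle()
    {
        return static::randomElement(static::$positions)
            . ' at ' . $this->generator->company();
    }
}

$faker = Faker\Factory::create();

// After registration the provider's public methods are callable on the generator.
$faker->addProvider(new JobOffer($faker));

$title = $faker->jobOfferTitle();
echo $title, "\n";
```

Extending Faker\Provider\Base gives the provider access to helpers like randomElement() and to the generator itself, so a custom provider can reuse the built-in ones.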

Example with seeding the randomizer:
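A sketch of seeding, again assuming the Composer autoloader in vendor/autoload.php. The seed value 1234 is arbitrary. Resetting the same seed should reproduce the same sequence of values, which makes test data repeatable:

```php
<?php
// Assumption: Faker was installed via Composer in the current directory.
require_once 'vendor/autoload.php';

$faker = Faker\Factory::create();

// Seed the random number generator, then draw a value.
$faker->seed(1234);
$first = $faker->name();

// Re-seeding with the same value restarts the same random sequence.
$faker->seed(1234);
$second = $faker->name();

// Both draws are expected to yield the same name.
echo $first, "\n", $second, "\n";
```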

Everything else is documented on the GitHub page. Actually, there is not much more to it. It’s simple and powerful.

Drawbacks and further steps

Faker helps you generate proper test data (e.g. a realistic username), but it lacks a way to describe scenarios for large amounts of data, where you want to control how the data is generated. For instance, you might need 10,000 blocked users in the database, or 4,000 users who have never logged in.
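With plain Faker, such a scenario has to be hand-rolled. A rough sketch of the blocked-users case, assuming the Composer autoloader in vendor/autoload.php; the array keys (username, email, blocked, last_login) are hypothetical and stand in for whatever your user table looks like:

```php
<?php
// Assumption: Faker was installed via Composer in the current directory.
require_once 'vendor/autoload.php';

$faker = Faker\Factory::create();

$blockedUsers = [];
for ($i = 0; $i < 10000; $i++) {
    $blockedUsers[] = [
        // Random, realistic-looking values come from Faker ...
        'username'   => $faker->userName(),
        'email'      => $faker->email(),
        // ... while the scenario itself is hard-coded by hand.
        'blocked'    => true,
        'last_login' => null,
    ];
}

echo count($blockedUsers), " blocked users generated\n";
```

The loop works, but the scenario (how many rows, which fields are fixed) lives in ad-hoc code for every case, and that is the part I find missing.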

In the end I started my own project on GitHub that is based on Faker and should close that gap. At least for me. 😉 I will write a little more about Phpteda (PHP Test Data) in a couple of days, because it is currently under heavy development.