BDgen: A Universal Big Data Generator

Published in IDEAS, 2017

This paper introduces BDgen, a generator of Big Data targeting various types of users, implemented as a general and easily extensible framework. It is divided into a scalable backend, designed to generate Big Data on clusters, and a frontend that lets users define the structure of the required data in a user-friendly way or infer it automatically from a sample data set. In the first release, we have implemented generators for two commonly used formats (JSON and CSV) and support for general grammars. We have also performed preliminary experimental comparisons confirming the advantages and competitiveness of the solution.
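
To make the idea of inferring a structure from a sample and then generating data in JSON or CSV more concrete, here is a minimal single-process sketch in Python. It is not BDgen's API or implementation: every name in it (`infer_schema`, `generate`, `to_json_lines`, `to_csv`) is hypothetical, it assumes flat records with only scalar fields, and it makes no attempt at the cluster-scale generation the paper targets.

```python
import csv
import io
import json
import random
import string

# Hypothetical illustration only; these names are not part of BDgen.
SUPPORTED_TYPES = ("int", "float", "bool", "str")


def infer_schema(sample):
    """Map each field seen in the sample records to a simple type name."""
    schema = {}
    for record in sample:
        for field, value in record.items():
            type_name = type(value).__name__
            # Assumption: anything other than a supported scalar becomes a string.
            if type_name not in SUPPORTED_TYPES:
                type_name = "str"
            schema.setdefault(field, type_name)
    return schema


def generate_record(schema, rng):
    """Produce one random record that matches the inferred schema."""
    makers = {
        "int": lambda: rng.randint(0, 1000),
        "float": lambda: round(rng.uniform(0, 1000), 2),
        "bool": lambda: rng.random() < 0.5,
        "str": lambda: "".join(rng.choice(string.ascii_lowercase) for _ in range(8)),
    }
    return {field: makers[type_name]() for field, type_name in schema.items()}


def generate(schema, count, seed=0):
    """Generate `count` records; a fixed seed keeps the output reproducible."""
    rng = random.Random(seed)
    return [generate_record(schema, rng) for _ in range(count)]


def to_json_lines(records):
    """Serialize the records as one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


def to_csv(records, schema):
    """Serialize the records as CSV with a header row taken from the schema."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(schema))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{"id": 1, "name": "alice", "score": 3.5, "active": True}]
    schema = infer_schema(sample)
    records = generate(schema, count=3)
    print(to_json_lines(records))
    print(to_csv(records, schema))
```

The sketch keeps structure inference separate from output serialization, so the same inferred schema can feed either output format. That split is a design choice made for the illustration and is not taken from the paper.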

Recommended citation: Tomáš Faltín, Michal Hanzeli, Vojtěch Šípek, Jan Škvařil, Dušan Variš, and Irena Holubová Mlýnková. 2017. BDgen: A Universal Big Data Generator. In Proceedings of the 21st International Database Engineering & Applications Symposium (IDEAS '17). Association for Computing Machinery, New York, NY, USA, 200–208. https://doi.org/10.1145/3105831.3105847