
About me, and why big data matters

I am a software engineer by education and an IT architect and management consultant by profession. I have worked on IT scalability for several large accounts, authored one book, given a number of talks, and I happen to run GridwiseTech. Here you can find my short profile.

If you have ended up here, you may be wondering why this blog was started. Here is the answer. I have been working on scalable systems since the late 1990s. In 2003 I founded GridwiseTech, a knowledge company that addresses scalability challenges for corporations and governments worldwide. I was soon joined by a team of excellent engineers. Together, we have built up collective knowledge on designing and building distributed, robust, and scalable IT systems capable of handling large volumes of data and heavy processing loads.

A system is scalable when it can grow together with the business. One challenge every successful corporation faces is the growing amount of data. In this article (2007) I explained why this growth is natural and inevitable. Corporations building scalable systems often get stuck at the level of the database, data provisioning, and data processing. In our experience with building scalable systems, designing scalability at the data level is the least trivial task. There are numerous ways to address the problem, and the devil is in the details.

Over the years, we developed our own methodology for data scalability. As the technology progressed, we learned to keep up with it and to evaluate new solutions systematically with each new project. This ongoing research often goes beyond what a project needs, simply to satisfy our own curiosity. Some of the findings and thoughts never see the light of day. I thought we could share some of them, using BigDataMatters as the channel.

That's really it. I hope all of this will be useful to people. The coming months will tell. Feedback and discussion are highly encouraged.

Pawel Plaszczak

About contributing authors

Most people contributing to BigDataMatters either work with me at GridwiseTech or have worked with us on certain projects. However, if you would like to contribute, please do contact me, as long as you have something meaningful to say on one of the following subjects:

Data scalability, database scalability, extreme transaction processing, dealing with large volumes of data, and technologies and best practices for designing and implementing robust databases and scalable systems in general.

Contributing authors are:

* Wolfgang Gentzsch is a Senior Strategist with GridwiseTech. He is also the dissemination advisor for DEISA and a member of the Board of the Open Grid Forum. From 2004 to 2008 he was a member of the US President's Council of Advisors on Science and Technology (PCAST). He also directed the German D-Grid Initiative and was managing director of MCNC Grid Computing & Networking Services in North Carolina, which GridwiseTech supported in several projects. Before that, Wolfgang was senior director of grid computing at Sun Microsystems in California, at a time when GridwiseTech was a Sun partner.

* Mateusz Kyc is a Software Engineering Consultant with experience in developing and integrating puzzle-like, enterprise-scale systems and in managing challenging projects. He is experienced in designing distributed cache and grid applications and data provisioning systems. His broad knowledge and experience also include performance optimization of various technologies, including relational databases and scalable architectures.

* Sebastian Czechowski is a Software Engineering Consultant. His interests in computer science include database-aware application development, performance testing, and database optimization. His recent projects include developing web applications and web services, and database scalability testing.

* Kevin Glenny is a Consultant Software Engineer with experience in developing distributed, high-performance systems. His current interests lie in distributed cache and grid applications and in fine-tuning database performance.

* Chris Wilk is a Software Engineering Consultant with broad experience and practical expertise in scalable, heterogeneous distributed systems, including grid computing and HPC. He has years of experience with multiple operating systems. His interests range from general-purpose programming languages and scripting languages to software testing and solving performance issues.

* Dominik Szwiec is a Software Engineering Consultant. He specializes in increasing database performance and scalability using a wide variety of existing solutions. He has worked on GridwiseTech's database-related projects and is closely involved with Momentum (tm). Beyond databases, he is a Linux and scripting-language enthusiast.