Data-orientation vs. Document-orientation 1: Introduction

Posted by jimcarls on September 5, 2011

This series of posts helps you move from being a "document oriented" user of word processors and spreadsheets to a "data oriented" user of a database like FF&EZ. Databases can make the process of creating documents much easier to manage. You can take advantage of that by understanding how data is organized in a database and by letting go of your document-oriented thinking habits.

Being "data oriented" is really just thinking about the pieces of information you deal with daily and finding the most efficient and best-organized way to use them. It means that when you need to produce something with your data, you think about patterns, planning for reusability and looking for economies of scale instead of focusing only on the immediate result (producing a document).  It means backing away from the simple "git 'er done" approach to see larger relationships and patterns of use.  It's why people bake twelve muffins at a time instead of one, use lawn mowers instead of grass shears — and buy filing cabinets and labels instead of piling everything in a corner. 

One key concept: Computers process all data as lists. At their most basic, computers process lists of things and lists of things to do to those things. Further, computers can process lists of relationships between things. An object is "related" to a room by the fact that it is placed in it and, conversely, a room is related to an object by containing it. If a computer is processing a list of objects that also point to specific rooms, it becomes possible to process data from the rooms as part of the logic. If those objects point into a product spec list and the specs point into a vendor list, then all that information is available, too. Even further (and this may surprise you), what computers produce are also lists. Setting aside the obvious example of reports and screen tables, even the most dramatic computer-graphics animation is a sequential list of screen pixels to light (so quickly that it appears to be a continuous, moving image).
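To make the "lists pointing into other lists" idea concrete, here is a minimal sketch in Python. It is not how FF&EZ stores anything internally; the room, spec and vendor names, the IDs and the `describe` helper are all invented for illustration.

```python
# Hypothetical sample data: every ID, name and number below is made up.
rooms = {"R101": {"name": "King Guestroom"}, "R205": {"name": "Lobby Lounge"}}
vendors = {"V01": {"name": "Acme Seating Co."}}
specs = {"CH-1": {"description": "Lounge chair", "vendor_id": "V01"}}

# The list of objects: each entry points to a room and to a product spec.
objects = [
    {"room_id": "R101", "tag": "CH-1", "quantity": 2},
    {"room_id": "R205", "tag": "CH-1", "quantity": 6},
]


def describe(obj):
    """Follow the pointers from one object to its room, its spec and the spec's vendor."""
    room = rooms[obj["room_id"]]
    spec = specs[obj["tag"]]
    vendor = vendors[spec["vendor_id"]]
    return f'{obj["quantity"]} x {spec["description"]} ({vendor["name"]}) in {room["name"]}'


# Processing the object list automatically brings in room, spec and vendor data.
lines = [describe(obj) for obj in objects]
print("\n".join(lines))
```

Notice that the vendor name is typed exactly once, yet it shows up for every object that uses the chair.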

And so here is one way to distinguish between data-orientation and document-orientation: A database system can accurately produce many different documents, even after revisions are made, because the data only exists in a specific source list, not in the resulting documents. Document-oriented thinking tends to see the data as "existing" on each document (where it must be maintained), whereas data-oriented thinking expects that the data only exists on the source list (or data "table") and need only be changed there. Data is displayed in all other places by simply looking it up on the source. For software to process data efficiently, we only need to enter that data one time, but in an organized way that makes it easy to find and to process according to established rules.
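Here is a small Python sketch of that difference, under invented assumptions: the `specs` table, the `placements` list and the two "documents" (`room_list` and `budget_report`) are hypothetical stand-ins, not real FF&EZ reports. The point is only that a revision made once on the source shows up in every document produced from it.

```python
# Hypothetical source "table": the one and only place a spec is stored.
specs = {"CH-1": {"description": "Lounge chair", "unit_cost": 350.0}}

# Hypothetical placements: (room ID, object tag, quantity per room).
placements = [("R101", "CH-1", 2), ("R205", "CH-1", 1)]


def room_list():
    """One 'document': what goes in each room, looked up from the source."""
    return [f"{room}: {qty} x {specs[tag]['description']}" for room, tag, qty in placements]


def budget_report():
    """Another 'document' built from the same source data."""
    tags = sorted({tag for _, tag, _ in placements})
    return [f"{tag}: {specs[tag]['description']} @ {specs[tag]['unit_cost']:.2f}" for tag in tags]


before = room_list() + budget_report()

# A revision is made once, on the source list only.
specs["CH-1"]["description"] = "Swivel lounge chair"

# Both documents pick up the change the next time they are produced.
after = room_list() + budget_report()
print("\n".join(after))
```

In a document-oriented workflow, the same revision would mean finding and retyping the description in every file where it had been copied.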

To see the problems with thinking of data only in terms of final documents, let's start with one extreme example of a document: Imagine one containing all the information about the FF&E that goes into a hotel. It's a long list and because our first step in organizing it is to figure out what objects (finished products) we want in each room, we set it up as a spreadsheet organized (sorted) by project area and room. As products are selected for the objects in the rooms, we add their detail to each row as needed, like...this:

project_spreadsheet.png

Whoa! Not exactly easy to read, is it? And this doesn't even include shipping and ordering addresses! This is called a "flat" file format: one "table" of rows and columns in which each row contains everything about one item (here, one FF&E object occurring in a specific room or room type).  This arrangement is easy to sort by location or product or vendor because every column contains the information needed to do so. However, it also becomes very hard to print and worse, to maintain — because there is a lot of repeated information, including the Area IDs, Room IDs and names and the room count. If one were to sort this spreadsheet by object Tag, one would also find that all the product information is being repeated for objects used in more than one room. As a matter of fact, if we color code the columns that can contain repeated (copied) information in them by the type of data they contain, we get something like this:

project_spreadsheet-foreign_data.png

The yellow columns often repeat information about the room and the project area in which the room is located.  The green columns contain information about the product (which is copied if the product is used in more than one room), the light blue contains the full name of the vendor whose Vendor ID appears in the green product column next to it, and the grey column on the right is a calculated field dependent on the room count and the object quantity per room.  In fact, there are only three columns that are not colored: "Room ID," "Object Tag" and "Quantity" (take note, we'll get back to those in the following posts).  It should not be hard to imagine that this information can be a pain to maintain when changes are made, because the changes must be repeated in each cell where original information was copied.
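As a rough sketch of where this is heading, the Python below rebuilds a "flat" row from just the three uncolored columns by looking up everything else. All of the sample data is invented, and the total is assumed to be room count times quantity per room, which is one plausible reading of the calculated column rather than a confirmed formula.

```python
# Hypothetical lookup lists; none of these values come from the real spreadsheet.
areas = {"A1": "Guest Floors"}
rooms = {
    "R101": {"name": "King Guestroom", "area_id": "A1", "room_count": 40},
    "R102": {"name": "Double Guestroom", "area_id": "A1", "room_count": 25},
}
vendors = {"V01": "Acme Seating Co."}
products = {"CH-1": {"description": "Lounge chair", "vendor_id": "V01"}}

# The only data that is unique to each row: Room ID, Object Tag, Quantity.
core = [("R101", "CH-1", 2), ("R102", "CH-1", 1)]


def flat_row(room_id, tag, quantity):
    """Rebuild one wide spreadsheet row by lookup instead of by copying."""
    room = rooms[room_id]
    product = products[tag]
    return {
        "area": areas[room["area_id"]],                    # "yellow" columns
        "room_id": room_id,
        "room_name": room["name"],
        "room_count": room["room_count"],
        "tag": tag,
        "description": product["description"],             # "green" columns
        "vendor_id": product["vendor_id"],
        "vendor_name": vendors[product["vendor_id"]],      # "light blue" column
        "quantity": quantity,
        "total_quantity": room["room_count"] * quantity,   # assumed formula for the "grey" column
    }


flat = [flat_row(*row) for row in core]
for row in flat:
    print(row)
```

The wide table can be regenerated at any time, so nothing in the colored columns ever needs to be typed or corrected more than once.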

Now, there are advanced spreadsheet techniques that allow you to move the room and product detail to separate sheets in the workbook and then use lookup formulas (typically built with a type of formula element called an "absolute reference") in each room- or product-related cell to automatically display related information in the main spreadsheet instead of actually typing (or copying) it in. However, this takes quite a bit of effort to set up, and it only works until you give your complicated spreadsheet to someone who doesn't know how it works. Then disaster follows.

Another common approach is to add columns for each type of room and enter the quantity needed for that type of room instead of having each room's contents listed sequentially:

project_spreadsheet-room_columns.png

This approach eliminates the need to duplicate the specification detail for each group of rooms. However, it often leaves some of the items in a room separated from the rest of that room's items, and some columns contain many blank cells. This format still makes it very hard to print a usable report in which everything fits on a single page, much less a simple list of what goes in a particular room.

Interestingly enough, many designers solve some of the problems described above by taking an approach that is actually a start along the path to good data organization — and "data orientation".  I will discuss this method in the next segment.
