Texts about BPMS tend to focus first on business process notations and diagrams. User interfaces for process tasks and web portal functionality are usually the next topics, followed by business rules and integration. Topics such as data modeling at the design phase and data manipulation during process execution are often left out of sight.
Yet the process model and the data model are the two key aspects of any process application architecture. Therefore process modeling and data modeling are equally important core competencies of a process engineer.
Any process is executed not in a vacuum but in the context of process attributes. For any serious process application this is not a flat list of strings and numbers but a complex data structure containing reference and master data, 1:M relationships, etc. Hence a process engineer must not only know a business process modeling notation (BPMN is the most common choice today) but also be a professional database developer. This requirement is rarely articulated - not because it's unimportant but because it's implied.
Still, this is a blind spot worth investigating. Different BPM Suites take significantly different approaches to data modeling, and these affect the resulting architecture of a process application, so it's better to know in advance what to expect from a particular BPMS.
Now what are these approaches?
1. A flat list of attributes
A list of attributes - integers, strings, dates/times, lists of values… - can be specified for a process template.
This approach is too primitive to be used in modern BPMS, but it can still be found in legacy workflow systems. Such a system may nevertheless be positioned as a BPMS by its vendor.
2. XML document
Custom data types and complex data structures can be defined within a process application. For example, a client's request can be modeled as an object referring to a directory of clients and including a multi-line section listing goods and quantities. At the physical level all process attributes are rolled up into an XML document stored in a TEXT field of a relational DBMS.
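The mechanics can be sketched as follows. This is a minimal illustration of the idea, not the code of any real BPMS; the table and element names are invented for the example.

```python
# Sketch of the "XML document" storage approach: all process attributes are
# serialized into one XML document kept in a single TEXT column.
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE process_instance (id INTEGER PRIMARY KEY, attributes TEXT)")

# Build the attribute document: a client's request with a multi-line goods section
root = ET.Element("request")
ET.SubElement(root, "client").text = "ACME Corp"
ET.SubElement(root, "invoice").text = "INV-1042"
goods = ET.SubElement(root, "goods")
for name, qty in [("bolt", 100), ("nut", 200)]:
    item = ET.SubElement(goods, "item", name=name)
    item.text = str(qty)

# The whole structure collapses into one opaque string as far as the DBMS is concerned
doc = ET.tostring(root, encoding="unicode")
conn.execute("INSERT INTO process_instance (attributes) VALUES (?)", (doc,))

# Reading any single attribute back requires extracting and parsing the whole document
row = conn.execute("SELECT attributes FROM process_instance WHERE id = 1").fetchone()
invoice = ET.fromstring(row[0]).findtext("invoice")
print(invoice)  # INV-1042
```

Note that the DBMS sees only a string: no column types, no indexes, no constraints apply to anything inside the document.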
This approach is implemented in BPMS from IBM, Oracle and TIBCO. It looks very attractive at first sight: a comfortable graphical modeling environment, complex data structures, custom data types. One can model process data rapidly for a demonstration or a pilot project and present a working prototype to the customer.
However this approach may lead to serious problems in production:
1) Performance issues
Let's take bank statement processing as an example. When the bank notifies us of an incoming payment, the current process should identify the matching instance of the sales process and send it a "payment received" message. The right instance of the sales process is found by the invoice number stored as a process attribute - it must match the invoice number of the incoming payment.
How would this search work at the physical level? The XML document containing the process attributes must be extracted from the database and parsed, and then the "invoice" element compared to the target. Now repeat that for every active process instance. That's fine if there are ten of them, but how would it work with tens of thousands?
Compare this with the standard way a relational DBMS does the job: an index on the "invoice" field and almost instant results - no sequential search, and the elapsed time depends only slightly on the number of records.
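The contrast can be made concrete with a small sketch. The schema and invoice numbers here are invented for the example; the point is the shape of the two lookups, not absolute timings.

```python
# Two ways to find the process instance for invoice "INV-7042":
# parse every XML blob sequentially vs. one indexed relational lookup.
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xml_store (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute("CREATE TABLE rel_store (id INTEGER PRIMARY KEY, invoice TEXT)")
conn.execute("CREATE INDEX idx_invoice ON rel_store (invoice)")

for i in range(10_000):
    inv = f"INV-{i}"
    conn.execute("INSERT INTO xml_store (doc) VALUES (?)",
                 (f"<process><invoice>{inv}</invoice></process>",))
    conn.execute("INSERT INTO rel_store (invoice) VALUES (?)", (inv,))

target = "INV-7042"

# XML storage: fetch and parse every active instance - work grows linearly
# with the number of instances, and every row costs an XML parse.
match_xml = [pid for pid, doc in conn.execute("SELECT id, doc FROM xml_store")
             if ET.fromstring(doc).findtext("invoice") == target]

# Relational storage: a single indexed lookup, no sequential search.
match_rel = [pid for (pid,) in conn.execute(
    "SELECT id FROM rel_store WHERE invoice = ?", (target,))]

print(match_xml == match_rel)  # True - same answer, very different cost
```

Both queries return the same instance; only the indexed one stays nearly constant-time as the instance count grows.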
2) Referential integrity
A uniform and consistent database is the "holy grail" of business application developers. Data access may be restricted by administrative rights, but technically all information is properly interlinked. A unified database guarantees that if, say, a sales process refers to a customer's record, the latter won't be removed.
When data is stored as XML documents, referential integrity is not guaranteed.
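A minimal sketch of the gap, with invented table names: a foreign key lets the database reject the deletion of a referenced customer, while a reference buried inside an XML blob is invisible to the DBMS.

```python
# Referential integrity: declared foreign key vs. reference hidden in XML.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE sales_process (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id))""")
conn.execute("CREATE TABLE xml_process (id INTEGER PRIMARY KEY, doc TEXT)")

conn.execute("INSERT INTO customer VALUES (1, 'ACME Corp')")
conn.execute("INSERT INTO customer VALUES (2, 'Globex')")
conn.execute("INSERT INTO sales_process VALUES (1, 1)")  # declared FK reference
conn.execute("INSERT INTO xml_process VALUES (1, '<process><customer>2</customer></process>')")

# Deleting the XML-referenced customer silently succeeds: the DBMS cannot see
# a reference hidden inside a text blob, so the process is left dangling.
conn.execute("DELETE FROM customer WHERE id = 2")

# Deleting the FK-referenced customer is rejected by the database itself.
blocked = False
try:
    conn.execute("DELETE FROM customer WHERE id = 1")
except sqlite3.IntegrityError:
    blocked = True
print("FK delete blocked:", blocked)  # FK delete blocked: True
```

The dangling reference in the XML case surfaces only later, as a runtime error in some process task, which is exactly the failure mode described above.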
3) Proprietary Interfaces
A relational database stores data in a standard way, in rows and columns, and is fully open for access and manipulation (technically, at least - administrative rights may restrict it). So you can be sure that you will be able to grant access to the process data, e.g. from the corporate BI system, if necessary - a connector will be there for sure.
You may be told that a BPMS provides an API for data access. True, but using an API means writing program code - compare that with configuring a connector by simply filling in a few fields: server type and address, database name, login and password.
Secondly, a third-party system may be a "black box": such a system will most likely support a standard data access mechanism such as ODBC, but it won't let you program access to an arbitrary data source.
3. Entity-attribute-value (EAV)
It is known that arbitrarily complex data can be represented as a three-column table:
- object identifier
- attribute identifier
- value of the specified attribute belonging to the specified object
The benefit is the same as with XML storage: one can expand and change the data model without touching the database schema.
Once again, production use of this approach may lead to serious problems. For example, it's relatively easy to write an SQL join linking a dozen tables, but it's much harder to write a similar query against EAV storage, and its performance degrades quickly as the number of entities involved grows.
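The query-complexity problem can be shown on a toy EAV table (names invented for the example): fetching just two attributes of one object already requires the table to be joined to itself once per attribute.

```python
# EAV storage and the self-joins it forces on every non-trivial query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (object_id INTEGER, attr TEXT, value TEXT)")
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    (1, "invoice", "INV-1042"), (1, "amount", "500"),
    (2, "invoice", "INV-1043"), (2, "amount", "750"),
])

# Equivalent of: SELECT invoice, amount FROM orders WHERE id = 1
# One self-join per attribute - a dozen attributes means a dozen joins,
# and every value comes back as untyped text.
query = """
    SELECT a.value AS invoice, b.value AS amount
    FROM eav a JOIN eav b ON a.object_id = b.object_id
    WHERE a.attr = 'invoice' AND b.attr = 'amount' AND a.object_id = 1
"""
result = conn.execute(query).fetchone()
print(result)  # ('INV-1042', '500')
```

Note that `amount` is a string here: EAV also flattens column types away, so type checking and indexing by value are lost along with query simplicity.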
4. Relational database linked by a reference attribute
Faced with the challenges outlined above, developers finally come to the conclusion that there is nothing better than native relational storage and voluntarily abandon the rich data modeling capabilities provided by the BPMS.
The most consistent form of this refusal looks as follows: all attributes are moved into relational database tables, and the process keeps a single attribute referencing the root record in a relational table dedicated to the process.
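The resulting layout can be sketched like this (the schema is illustrative, not tied to any particular BPMS): the process instance carries only a pointer, and every task form or query must join back to the business tables.

```python
# "Single reference attribute" layout: the BPMS side keeps only a pointer,
# all business attributes live in ordinary relational tables.
import sqlite3

conn = sqlite3.connect(":memory:")
# Application data: a normal relational table dedicated to the process
conn.execute("""CREATE TABLE sales_order (
    id INTEGER PRIMARY KEY, invoice TEXT, amount REAL)""")
# BPMS side: the process instance holds a single reference attribute
conn.execute("""CREATE TABLE process_instance (
    id INTEGER PRIMARY KEY, state TEXT,
    order_id INTEGER REFERENCES sales_order(id))""")

conn.execute("INSERT INTO sales_order VALUES (1, 'INV-1042', 500.0)")
conn.execute("INSERT INTO process_instance VALUES (1, 'active', 1)")

# Any task form or report now has to join back to the business table itself -
# the BPMS no longer knows anything about invoices or amounts.
row = conn.execute("""
    SELECT p.state, o.invoice, o.amount
    FROM process_instance p JOIN sales_order o ON p.order_id = o.id
    WHERE p.id = 1""").fetchone()
print(row)  # ('active', 'INV-1042', 500.0)
```

The storage problems are gone, but the join in the last query is exactly the work that the BPMS's built-in form tools can no longer do for you, which leads to the UI problems below.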
Such a migration resolves the issues mentioned above but unfortunately creates others - this time related to user interfaces.
The root cause is that the user interface development tools built into a BPMS are designed primarily to interact with process attributes, not with arbitrary data stored in a DBMS. Moving the data out therefore slows UI development at the very least. In the worst case one has to abandon the built-in UI development tool and switch to a third-party web development framework.
And it's not just about UI development complexity: instead of one project seamlessly integrating processes, data and user interfaces, you've got three projects that need to be synchronized. For example, when the process schema changes, corresponding changes must be made simultaneously in the database and in the user forms. Given that different projects are usually managed by different people, it's hard to fully avoid inconsistencies.
Transferring a project from development to production also becomes "fun". As long as the built-in BPMS functionality is used, it's a one-button operation. One appreciates this functionality best when it's no longer available.
All this is also bad from the standpoint of methodology: instead of a BPM project, we slide into process-oriented software application development.
One may ask: what's the difference? The difference is business involvement and pace.
BPMS software is specifically designed to support both: it's a logical, closed-loop environment that links processes with data and user interfaces in a natural way. This doesn't mean that the business will do all the work (a naive yet not uncommon belief) - it's enough for them to understand what analysts and developers are doing and to be able to criticize, make corrections and generate ideas about how the business really should be run. That's the essence of BPM, its strength.
We lose these benefits when built-in data modeling and forms development are abandoned.
If you delegate process work to IT, then it's IT who manages your business processes. Not management, not the process owner - programmers! With all due respect, one can't expect more from them than employee productivity improvements based on more comfortable and ergonomic business applications.
Does that affect the bottom line? Hardly. The business target can be reached from another direction - through a systematic view of how business units interact with each other, and how the company as a whole interacts with the customer, in the context of an end-to-end business process. Only business people - management and key specialists - can and should tell how the company must do business to meet customers' expectations and win the competition. Don't expect it from programmers.
Besides, business involvement is impossible without maintaining the pace of process development. One may count on business ownership of a BPM project only if the time from a process improvement idea suggested by the business to its implementation takes days, or weeks at most. Months is the same as never, because the enthusiasm of the business will inevitably fade away.
Simplicity is one aspect of business involvement; pace is another. We lose both by replacing the rapid development tools built into a BPMS with more labor-intensive ones.
5. Native relational database
Storing attributes in XML or EAV is a kind of "database over the database" approach. It raises the obvious question: why not use a relational database natively?
Bizagi BPM Suite is an example of this approach. Like any BPMS, it provides built-in data modeling and UI development tools. Business objects are designed in the usual way, by defining attributes and relations; then attributes are placed on a canvas to compose a screen form.
Everything is quite straightforward at the physical level too: entities correspond to relational tables, attributes to columns. Each process template is linked to a database table, and a process instance corresponds to a row in this table. Apart from these "process" tables there are master data, reference data… in short, everything we normally see in a corporate application database.
The advantages of this approach multiply as one goes from the first process to the second, the third, and so on. With data stored as XML, access to another process's data has to be programmed, and the amount of programming keeps growing because the more processes there are, the more interactions between them. With Bizagi, on the contrary, each subsequent process becomes cheaper to add because it reuses existing master data and reference tables.
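The reuse effect can be sketched with two process tables sharing one master-data table (the schema is an invented illustration, not Bizagi's actual layout):

```python
# Native relational storage: a second process reuses the same master data
# with no integration code - cross-process reporting is a plain join.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
# First process application: sales
conn.execute("""CREATE TABLE sales_process (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id), invoice TEXT)""")
# Second process, added later: complaints - points at the same customers
conn.execute("""CREATE TABLE complaint_process (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id), subject TEXT)""")

conn.execute("INSERT INTO customer VALUES (1, 'ACME Corp')")
conn.execute("INSERT INTO sales_process VALUES (1, 1, 'INV-1042')")
conn.execute("INSERT INTO complaint_process VALUES (1, 1, 'late delivery')")

# One customer record, visible to both processes and to any BI tool
row = conn.execute("""
    SELECT c.name, s.invoice, k.subject
    FROM customer c
    JOIN sales_process s ON s.customer_id = c.id
    JOIN complaint_process k ON k.customer_id = c.id""").fetchone()
print(row)  # ('ACME Corp', 'INV-1042', 'late delivery')
```

With XML storage the same report would require extracting and parsing documents from both processes; here it is one ordinary query over shared tables.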
With this approach, developers of a process application don't run into any of the issues described above.
I guess Bizagi's developers did face some challenges of their own: they had to build a database design tool into the product (an ER diagramming tool, to be precise), implement the mapping from the logical design to the physical database layout, and automate the transfer of schema changes from development to production while preserving operational data. In any case they did it, and everything works stably enough.
To me personally this solution looks so straightforward and beneficial that I don't understand why not every BPMS vendor chose this way.
This post expresses the author's view of the approaches to process and data interaction, based on experience with nearly a dozen BPMS products. The experience with some products was not very extensive, so the author doesn't claim expertise in them; besides, this sample may not represent the full spectrum of BPMS. The author would therefore be extremely grateful for corrections and alternative views from experts in various BPMS, or for links to such opinions.
In any case, the way processes interact with data is one of the most important aspects of BPMS architecture. Evaluate the pros and cons of the chosen solution in advance - there is always a way to compensate for a deficiency. The worst case is when architectural problems hit you in the middle of a project, forcing the development team to reengineer the application and putting the project at grave risk of failure.