Building the data pipeline - A Guide for Project Managers

How do you leverage engineering metrics to track business performance KPIs (do they have to be linked or can they be totally different things)? Anyone go through such an experience? What's your story there?

This question came up on the discussion board of the Ryerson University - Innovation Boost Zone (IBZ). Here is how I answered it:

My current process is to use JIRA for both the development tasks (code assignments) and the user stories. I also use JIRA for testing against benchmarks for GO / NO GO decisions. For a recent task, I used a Google Form to generate user test cases that evaluate the experience of onboarding a new customer and walking through the key product features. These test cases were used to facilitate three testing sessions of six people each, where we presented the software, issued activation keys, and responded in real time to interactions with the trial participants. The issues from these testing sessions were entered into a JIRA workflow for resolution.

JIRA is a great tool, but several important things are not available in JIRA when it comes to measuring team performance. JIRA provides a great snapshot in time and allows for visualized reports on the current state. Where it becomes tricky is measuring trends over time. In order to do this, you need to extract the data on a regular basis...

I extract this data, along with data from other testing trials, on a weekly basis. The data is extracted from JIRA using the XML format. This format includes built-in relationships that are difficult to extract from JIRA using other formats. One of the most important is the issue comments with their timestamps, which allow metrics such as "mean time to respond" (MTTR). Using a simple set of XSLT instructions, I transform the data and load it into my database (more on that later). The choice of database really depends on the nature of the individual project. I frequently use PostgreSQL, MySQL, or Access.
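As a rough illustration of the "mean time to respond" idea, here is a minimal Python sketch that stands in for the XSLT step (it is not the XSLT I use). The element names (item, key, created, comments/comment with a created attribute), the timestamp format, and the DEMO issue keys are all assumptions about the export layout, so check them against your own export before relying on it.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime
from statistics import mean

# Assumed layout of an issue export; a real export may differ.
SAMPLE = """<rss><channel>
  <item>
    <key>DEMO-1</key>
    <created>Mon, 6 Jan 2020 09:00:00 +0000</created>
    <comments>
      <comment created="Mon, 6 Jan 2020 11:00:00 +0000">Looking into it</comment>
      <comment created="Tue, 7 Jan 2020 09:00:00 +0000">Fixed</comment>
    </comments>
  </item>
  <item>
    <key>DEMO-2</key>
    <created>Mon, 6 Jan 2020 10:00:00 +0000</created>
    <comments>
      <comment created="Mon, 6 Jan 2020 14:00:00 +0000">Acknowledged</comment>
    </comments>
  </item>
  <item>
    <key>DEMO-3</key>
    <created>Mon, 6 Jan 2020 12:00:00 +0000</created>
  </item>
</channel></rss>"""


def hours_to_first_response(xml_text):
    """Return {issue key: hours from issue creation to its earliest comment}."""
    result = {}
    for item in ET.fromstring(xml_text).iter("item"):
        created = parsedate_to_datetime(item.findtext("created"))
        stamps = [parsedate_to_datetime(c.get("created")) for c in item.iter("comment")]
        if stamps:  # issues with no comments yet are skipped
            delta = min(stamps) - created
            result[item.findtext("key")] = delta.total_seconds() / 3600
    return result


responses = hours_to_first_response(SAMPLE)
mean_hours = mean(responses.values())
print(responses, mean_hours)
```

Issues with no comments are left out of the average here; whether to count them as "still waiting" is a reporting decision you would need to make for your own team.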

We're not quite done yet... Databases are great tools for building statistical models. Databases, however, are not very user friendly... So to allow people to understand the data and explore the data to generate more meaning and understanding, we need to visualize this data and allow people to navigate through the data structure...

From this point, I build out database views of the data and perform data transformations so that drill-down analytics become easier to create. If you're using a network setup running PostgreSQL, you can simply generate permissioned views and make direct database queries available over the network, so that visualization tools can download the required data. I sometimes refer to this transformation step as cubing my data or building data cubes. Once the data is transformed, I use these data cubes to enable features in the Tableau or PowerBI reporting tools.
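To make the idea of a reporting view concrete, here is a small sketch. It uses Python's built-in SQLite module as a stand-in for PostgreSQL so that it runs anywhere; the table name, column names, and sample rows are hypothetical, and SQLite has no equivalent of the permissioned network access described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
conn.executescript("""
-- Hypothetical table: one row per issue per weekly extract
CREATE TABLE issue_snapshot (
    extract_week TEXT,
    issue_key    TEXT,
    status       TEXT
);
INSERT INTO issue_snapshot VALUES
    ('2020-W01', 'DEMO-1', 'Open'),
    ('2020-W01', 'DEMO-2', 'Open'),
    ('2020-W02', 'DEMO-1', 'Done'),
    ('2020-W02', 'DEMO-2', 'Open'),
    ('2020-W02', 'DEMO-3', 'Open');

-- The view pre-aggregates the snapshots so a reporting tool can
-- chart the trend without knowing the underlying table layout.
CREATE VIEW weekly_status_counts AS
SELECT extract_week, status, COUNT(*) AS issue_count
FROM issue_snapshot
GROUP BY extract_week, status;
""")

rows = conn.execute(
    "SELECT extract_week, status, issue_count "
    "FROM weekly_status_counts ORDER BY extract_week, status"
).fetchall()
for row in rows:
    print(row)
```

Because each weekly extract is kept as its own set of rows, the view gives the trend over time that a single JIRA snapshot cannot.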

For software engineering teams, you can track story points for estimation. Once you track your development time effort on each task, you can compare your estimation with your actuals.
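One simple way to make that comparison, sketched in Python: derive an average hours-per-point rate from completed tasks, then see how far each task landed from what its estimate implied. The task keys, point values, and hours below are made-up sample data, and using a single team-wide rate is an assumption, not the only way to do it.

```python
from dataclasses import dataclass


@dataclass
class Task:
    key: str
    story_points: float   # the estimate
    actual_hours: float   # the tracked effort


def estimation_variance(tasks):
    """Return (hours per point, {task key: actual hours minus expected hours})."""
    rate = sum(t.actual_hours for t in tasks) / sum(t.story_points for t in tasks)
    variance = {t.key: t.actual_hours - t.story_points * rate for t in tasks}
    return rate, variance


# Made-up sample data for illustration only
tasks = [Task("DEMO-1", 3, 10), Task("DEMO-2", 5, 25), Task("DEMO-3", 2, 5)]
rate, variance = estimation_variance(tasks)
print(rate, variance)
```

A positive variance means the task took longer than its story points suggested, which is a useful starting point for a conversation about why.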

As a quick note, I follow largely this same process when working with project data from MS Project - extracting weekly (either CSV or XML) and generating DB driven weekly reporting.

Because this is such a repetitive task, I sometimes automate this Extract - Transform - Load step (depending on how "big" the data is). Because I generally work on Linux command lines, I usually use Ansible for this task. This allows you to program the collection and processing of data, even when the other machines have no control software installed. Ansible uses a simple configuration language (YAML) and connects over secure shell (SSH) to invoke Linux shell commands across the network. It can easily be configured with network certificates, and it is also extensible using Python scripting.

Another way to measure performance is to build this measurement into your Test Driven Development (TDD) strategy. This strategy has you write test cases alongside your development code. You can then determine how often a scenario arises that causes your code to fail and trace that failure to a specific piece of code. This helps to identify areas of your software platform that are overly complex and generating downstream issues, which in turn helps to prioritize code refactoring efforts.

Additionally, you may want to implement automated testing scripts using a tool such as Selenium WebDriver. This software provides a simple coding framework (a Python library is available) for navigating a web page's document object model (DOM) and invoking user actions such as button presses, text entry and form submissions. You can then track the number of failed test cases and map them back to code deployments.
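The mapping step can be sketched without any browser automation at all. Assuming your test runs record a timestamp for each failure and you keep a list of deployment times (both are hypothetical sample data below), each failure is attributed to the most recent deployment before it:

```python
from bisect import bisect_right
from collections import Counter
from datetime import datetime


def failures_per_deployment(deployments, failures):
    """Count failures against the latest deployment at or before each failure.

    deployments: list of (deployed_at, version), sorted by deployed_at.
    failures: list of datetimes at which a test case failed.
    """
    times = [deployed_at for deployed_at, _ in deployments]
    counts = Counter()
    for failed_at in failures:
        idx = bisect_right(times, failed_at) - 1
        if idx >= 0:  # failures before the first deployment are ignored
            counts[deployments[idx][1]] += 1
    return counts


# Hypothetical sample data
deployments = [
    (datetime(2020, 1, 6, 9, 0), "v1.0"),
    (datetime(2020, 1, 13, 9, 0), "v1.1"),
    (datetime(2020, 1, 20, 9, 0), "v1.2"),
]
failures = [
    datetime(2020, 1, 5, 12, 0),   # before any deployment
    datetime(2020, 1, 7, 10, 0),
    datetime(2020, 1, 14, 8, 0),
    datetime(2020, 1, 14, 16, 0),
    datetime(2020, 1, 15, 11, 0),
]
counts = failures_per_deployment(deployments, failures)
print(counts)
```

Attributing by timestamp is a simplification: a failure may be caused by an older change that only surfaces later, so treat the counts as a pointer for investigation rather than proof.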

I personally don't do this, but others also parse the Git repository to extract and analyze defects against code submissions. This is an emerging field of research.

I would also look at companies like Snyk.IO, which are growing quickly in secure development operations (SecDevOps) and may have some worthwhile tools for a new startup to evaluate when building out a software team.

Of course, in an agile startup, you will probably want to pick and choose how and what you use from the above, but I hope this experience helps you.

