Friday, May 30, 2014

Release and Code Branching Process

1.     Terminology

Main Dev Branch:  this is the main development branch. All feature development for use cases is carried out on this branch. In TFS, its version control path will look like $/TeamProject/Main.  This branch is maintained and worked on throughout the project life cycle.  All other branches are created from the main branch.

Release Branch:  this is the branch associated with a specific iteration (e.g. Iteration 1).  At any point in time there could be zero, one, or several release branches.  It is in our interest to keep the number to a minimum (no more than 2). The life span of a release branch is limited. Normally it is created at the official deployment of the iteration, either at the end of that iteration or at the start of the following one, and it ends when the subsequent iteration is released.  The decision point on whether we need a branch or can simply create a label on the main code branch is: do we need to fix bugs or defects in the release without including the new development effort of the subsequent iteration?  From an overall manpower cost point of view, you want to touch the release branch as little as possible. From a software development life cycle point of view, however, it is sometimes necessary to provide bug fixes on the release branch, especially when the system is in production. So the common practice is to create the branch at the time of release and then touch it as little as possible.

Replication: when work has been done on the release branch (e.g. a bug fix), it is commonly expected that the fix will be included in the subsequent release from the main development branch.  At the same time, because the main development branch is also an active branch, more often than not a simple merge from the release branch to the main development branch will not work.  A file touched in the release branch may have been moved or updated in the main development branch, or the change made in the release branch may no longer be needed there.   The solution to the same bug may also differ between the release branch and the main development branch because of differences in priority and timing. It is common practice to provide a quick fix in the release branch so that testing or production operation can continue, and to provide a more involved design overhaul in the main development branch.  Either way, the bug fix needs to be included in the main development branch. The process of including the bug fix (or any other activity) in the main development branch is called Replication. There are also scenarios where a simple merge works just fine.

Work item and state of the work item: a work item, as the name suggests, is an item that needs some work. It is similar to a WBS item in MS Project.   Each work item can have various states, depending on the methodology the project team practices. In the MS agile process a work item has Active / Open and Closed / Completed states. In other processes more states are included. The work item template can also be customized to provide more sophisticated state transitions.

An example of such a customized state transition could be: New -> Active -> Resolved -> Reviewed -> Deployed -> Tested -> Closed. Another example could be <New> -> <In Progress> -> <Completed> -> <Reviewed> -> <Deployed> -> <Tested> -> <Closed>.
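
The first example chain above can be sketched as a tiny state machine. This is only an illustration in Python: the state names come from the example, while the rule that an item may only move forward one step at a time is an assumption (a real process template may also allow backward or skipping transitions), and `advance` is a hypothetical helper, not a TFS API.

```python
# Hypothetical work item workflow; the state names come from the example above.
STATES = ["New", "Active", "Resolved", "Reviewed", "Deployed", "Tested", "Closed"]

# Assumption: each state may only advance to the next one in the list.
ALLOWED = {current: nxt for current, nxt in zip(STATES, STATES[1:])}


def advance(state: str) -> str:
    """Return the next state, or raise if the state is final or unknown."""
    if state not in ALLOWED:
        raise ValueError(f"no transition out of {state!r}")
    return ALLOWED[state]


print(advance("Resolved"))  # Reviewed
```

Keeping the allowed transitions in one table makes it easy to see, and to change, what the team has agreed a work item may do next.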

Build Master: the person or group in charge of making the deployment build and publishing the build to the target region. Some projects call this role the configuration manager; others call it the build master. Basically, this is the person who pushes the code from version control to the running environment. In some organizations, for security reasons, this role is isolated from the development team.  On the .NET platform, strong naming and assembly signing are provided to support this separation.

2.     Iteration Releasing


At the end of each iteration, a build / deployment process is carried out to deploy the code base with the newly developed features to a target region (IT/QA/UAT/CIT).   The following principles are observed by many development projects:

1.       The pre-condition of any release is that the current state of the code base is stable enough to be built and published. The project manager, development team, and build master need to coordinate and communicate well to ensure the success and the integrity of the release.

2.       It is advisable to schedule the deployment / release activities during a less active period, e.g. after office hours or on the weekend, when no or few check-ins are expected.

3.       When that is not possible, or when the build process does not take much time, the release can be scheduled during normal business hours while the development team members are working on the code branch. In this situation, the build master should make an announcement 10 to 15 minutes before the build starts, asking all development team members to check in whatever they think needs to be checked in and to hold off whatever they think should be held off.

4.       When it is time to start the build, the first action of the build process is “Get Latest”. From that point until the build / release process is completed, no check-ins unrelated to the release should be made.

5.       All deployment activities take the source code from the version control repository (it does not matter which version control tool the project is using) and from no other source. (Not from server to server, not from database to database.)

6.       All components of the system must be in sync in version control. An example of violating this principle would be taking one component from one version and another component from a different version.

7.       Specifically, it is important to treat the database schema the same way other source code is treated.  The database schema needs to be under the same label and in the same branch as the other source code.

Upon completion of the release / build / deployment, a label is to be created on the branch the release was made from (for example, the main development branch). Create it after the build is made, not before, and create only one label. The label should cover all source code involved in making the build, including the database schema.

After creating the label, the build master shall send another announcement to the team to notify them that the release is complete and that they can resume check-in activities. This is to ensure the integrity of the build.

If the release was made from the main development branch, or if subsequent minor releases are expected on it, a code branch is to be created from that branch (e.g. the main development branch) based on the label just made.

At this point, it is advisable to query the TFS work items by iteration path and state to generate the release note, and also to update the state of the involved work items from Resolved to Deployed. (This can be done manually, or automatically by writing a small program against the TFS API.)  To automate the process further, a customized report could be developed with SQL Server Reporting Services against the data store behind the TFS server.
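
The release-note step can be sketched as follows. This is a minimal illustration in Python over an in-memory list; it does not call the TFS API. The `WorkItem` class, the `release` function, the sample titles, and the iteration path strings are all hypothetical names made up for the example.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    # Hypothetical in-memory stand-in for a TFS work item.
    id: int
    title: str
    iteration_path: str
    state: str


def release(items: list[WorkItem], iteration_path: str) -> list[str]:
    """Mark Resolved items in the iteration as Deployed and return note lines."""
    notes = []
    for item in items:
        if item.iteration_path == iteration_path and item.state == "Resolved":
            item.state = "Deployed"
            notes.append(f"#{item.id} {item.title}")
    return notes


items = [
    WorkItem(1, "Fix login timeout", r"TeamProject\Iteration 1", "Resolved"),
    WorkItem(2, "Add export button", r"TeamProject\Iteration 1", "Active"),
    WorkItem(3, "Tune report query", r"TeamProject\Iteration 2", "Resolved"),
]
release_notes = release(items, r"TeamProject\Iteration 1")
print(release_notes)  # ['#1 Fix login timeout']
```

Filtering on both iteration path and state is the important part: item 2 is in the right iteration but not yet resolved, and item 3 is resolved but belongs to the next iteration, so neither appears in the note.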

3.     Minor Release (Intermediate release)


The process for handling an intermediate release is very similar to the way an iteration release is handled. The only difference is that an intermediate release usually does not need a new branch.  Other than that, labeling and the respective principles are the same.

4.     Labeling vs. Branching 


The purpose of labeling is to enable us to make and deploy the identical build into different regions at different times.  (For example, you first make a build and deploy it to the IT region; when all tests pass, the same code base is approved for deployment to QA, then UAT, then production.) As a version control best practice, you do not want to version control generated files such as .exe or .dll files, but you also want to be able to deliver the same code base at different times. The solution is labeling.  At any point in time, you can pull the code base out of version control by a specific label and make a build that produces the same binary files for deployment.

The purpose of branching is to enable code changes on a released branch for a minor release after the iteration is concluded.  If the project does not foresee such a need, the branch can be omitted.

5.     Development process in release branch


When you are given a task to work on, you want to know for sure which branch the work is to be done on (if there is more than one branch). If the project manages development activities with TFS work item tracking, make sure the work item you are given, or are about to create, is in the correct iteration path. Then, based on the iteration path, make the code changes in the respective code branch.  The iteration path of the work item is very important, and one should check it carefully before making code changes. Even if you use some other tool instead of TFS, or manage your project in an Excel file, make sure that iteration is an attribute of every WBS item.

When a bug fix in the release branch (iteration 1) is completed, tested, and accepted by the user or client, the developer creates a related work item (bug) in the main development iteration (iteration 2) and assigns it to himself or herself.  The original work item is then closed.

6.     Replication process

 

The project manager or project scheduler may then adjust the resource allocation. It is considered more effective for the related work item to be handled by the same person who provided the fix in the release branch, but it can be someone else for all kinds of reasons.

With this arrangement, the replication process is managed in the same way as any other development activity. The only difference is that the developer can refer back to the original work item for descriptions, test cases, and code changes.

The time frame for replicating a specific work item runs from the time the work item is accepted in the release branch to the time the subsequent iteration is released.  The project manager / scheduler or the individual developer can make a judgment call on a case by case basis.

7.     QA process of the replicated bug fixes


At the time of the release, the Release Manager or Build Master can select the work items by iteration path and state, update their state from Resolved to Deployed, and generate the release note from them.
At any point in time, the QA team can simply query the work items in TFS by state and iteration path to manage its testing activities. For example, before a release is made, the QA manager can query the work items in a specific iteration to get the scope of the projected release.  He or she can also query the resolved work items in the projected iteration to plan the testing activities.  After the deployment, the work items will be changed from Resolved to Deployed.

8.     Branch retiring


Unlike the main development branch, release branches are active only for a short period of time.  At most, that is the duration of the subsequent iteration, or the duration of 2 iterations.
However, by the time the subsequent iteration is about to be released, there might still be some open bugs filed on the release branch that are yet to be resolved. In this situation, all open work items shall be migrated to a future iteration and handled then. (Simply change the iteration path; this can also be done programmatically with the TFS API.)

When the work item migration is done, the code branch can be retired. One should publish some kind of notification to inform the team. The general recommendation is to keep the retired code branch in version control for traceability. However, it could be purged after a cool-down period defined by the team.
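
The migration step could look roughly like this. It is a sketch in Python over plain dictionaries, not a call into the TFS API; the `migrate_open_items` function, the field names, and the choice of which states count as "open" are assumptions made for the example.

```python
# Assumption: these states count as "open"; adjust to the team's workflow.
OPEN_STATES = {"New", "Active"}


def migrate_open_items(items, from_iteration, to_iteration):
    """Re-path open work items and return the ids that were moved."""
    moved = []
    for item in items:
        if item["iteration"] == from_iteration and item["state"] in OPEN_STATES:
            item["iteration"] = to_iteration
            moved.append(item["id"])
    return moved


backlog = [
    {"id": 11, "iteration": "Iteration 1", "state": "Active"},
    {"id": 12, "iteration": "Iteration 1", "state": "Closed"},
    {"id": 13, "iteration": "Iteration 2", "state": "New"},
]
moved_ids = migrate_open_items(backlog, "Iteration 1", "Iteration 3")
print(moved_ids)  # [11]
```

Closed items keep their original iteration path, so the history of what was actually fixed in the retired branch stays intact.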

9.     General guidelines related to Release / Branching process


a.       Fewer branches lead to fewer replication activities. More branches lead to a better ability to provide fixes against a specific release. The project team needs to balance these 2 conflicting goals.   The general practice is that for any release that could be deployed for production operation, you want to create a branch; this prepares the code base for production support. In other cases, you want to negotiate with your clients as hard as you can not to provide fixes in release branches. However, there are situations where the exit criteria of the iteration are tied to a certain number of open defects in each category. In these situations, the project manager is motivated to let the team touch the release branch and get those bugs / defects resolved in order to meet the exit criteria and secure payment from the client.

b.      For each major release, branch the code base from a label. Always label first, then branch from the label.

c.       Avoid working on bug fixes in the release branch as much as you can. The decision point is: do we need to provide another build for the bug fix on the original code base?  An example: you completed iteration one and proceeded to iteration two, and then a bug is identified in the iteration one deployment.  There are two questions. First: does the fix have to be implemented and deployed before iteration two is completed? Second: are we willing to make a build from the iteration two code with the bug fix in it and deploy that to a region before iteration two is completed?  Most of the time your own answer to the second question is No, and the answer you get from project management or the client to the first question could be Yes.   If the answers to these 2 questions are Yes and No respectively, you will end up touching the code in the release branch; in any other situation you will not.

d.      Pay attention to the difference in priority between working on a work item in the release branch and in the main development branch. In the release (support) branch, you need to get it resolved, and resolved fast, and you want to leave the smallest footprint possible. In the main development branch, you can take your time to do some design and leave a larger footprint.

e.      In the main development branch, it is highly advisable to conduct a certain level of regression testing before releasing for client-facing tests or production operation.
f.     Because replication is done individually per work item, the developer can unit test against the original bug, and QA can test against the original test case. The quality of the replication improves compared to batch replication based on change sets. On the other hand, it takes a bit more effort than batch replication based on change sets.
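
The two-question test in guideline c can be written down as a small truth table. This is only a sketch in Python; the function and parameter names are made up for the illustration.

```python
def fix_on_release_branch(needed_before_next_iteration: bool,
                          willing_to_ship_main_early: bool) -> bool:
    """Guideline c: touch the release branch only for the Yes / No combination."""
    return needed_before_next_iteration and not willing_to_ship_main_early


# Enumerate all four combinations of answers to the two questions.
for q1 in (True, False):
    for q2 in (True, False):
        print(q1, q2, fix_on_release_branch(q1, q2))
```

Only one of the four combinations leads to a change in the release branch, which matches the advice to touch that branch as little as possible.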

Monday, May 19, 2014

Anti-pattern for January - Shiny Toy

The anti-pattern for January is Shiny Toy. http://deviq.com/shiny-toy
 
 
The elaboration for the anti-pattern is: Much better than last weeks’ model, So what if it doesn’t work with anything else?
 
The description for the anti-pattern is: The latest, coolest or technology available. With its boundless potential, time is taken to apply the technology, but none is taken to ensure it is tested and proven to be reliable and compatible.
 
The further elaboration from the web site reads as follows:
 
The Shiny Toy anti-pattern refers to the practice of always thinking today's problems can all be solved by the latest bleeding-edge tool, technique, or library. While it's true, the software industry is constantly trying to improve and evolve, new approaches almost always bring new headaches along with any advances they make in functionality or productivity. Depending on your application and organization's tolerance for risk, it may make sense to wait to integrate new libraries into your application until they've been proven in production.
 
The quote on the calendar is:
 
The future is already here, it’s just not evenly distributed
 
 --WILLIAM GIBSON
 
 
In my decades of writing software, I have been bitten many times by this kind of “Shiny Toy”. From afar, they look so tempting. But once you are in it, you may find yourself in a situation you do not want to be in. This kind of mistake is especially easy to make in the age of Google search we live in now.  About a month ago, when I was writing my Windows Store app, I needed a way to make the Enter key default to a click of a button. XAML does not provide this without event handling. While working on it, I found a class on the web. I was so excited that, without spending much time on it, I put the class into my project. Guess what: it works in most cases, but when the button is disabled, the class still invokes the logic behind the button when I press the Enter key. Fortunately, I had not published the code to the store yet. Fixing that problem took me a week or so…
From time to time, when I took code from the Microsoft SDK samples and put it in my project, I found to my surprise that there were bugs in the code… these bugs delayed the publishing of my app by a month or so…
 
The lesson learned here is: verify, verify, and verify when you find something that gets you excited.

Logical delete and data integrity in RDBMS


Something came to my mind on this topic, I gave it some thought, and I would like to share it with you. Please jump in with your take on it; I do not think I have the golden solution.  Okay, let me get started.

In the database world, data integrity is second to none in terms of importance. More specifically, entity integrity and referential integrity are the two most vital kinds of integrity in any database.

Entity integrity ensures there are no duplicate rows in a table. In a relational database this is enforced by defining a primary key and unique constraints.

Referential integrity ensures that a referencing value is a valid value in the master table.  For example, you want to ensure that the state code in the address table is a valid state code in the State table.  In a relational database this is enforced by defining a foreign key.

Those work well.  Wait: they work well before logical delete is implemented. They stop working as soon as logical delete gets into the picture.

Consider this case: in our database we have a table called Course, which lists all courses offered by the college.  From a business point of view, the course code uniquely identifies a row of the table. There is a table called Student, which stores all students of the college. There is a table called [Course Election], which stores all courses elected by all students.

Before logical delete is implemented, the primary key for the Course table is [Course Code], the primary key for Student is [Student Id], and the primary key for [Course Election] is [Student Id] and [Course Code] combined.   Then we define a foreign key in [Course Election] on [Course Code] referencing the [Course] table, and a foreign key in [Course Election] on [Student Id] referencing the [Student] table.

With this design, no row can be added to [Course Election] without a valid [Student Id] and [Course Code], and for each [Student Id] and [Course Code] combination only one row is allowed in the [Course Election] table. These are all good. But let’s add a logical delete flag to the Student table and the Course table.

This is how we design it to work: when a user updates the information for a student, the original row is marked as deleted and a new row is inserted. Similar logic is implemented for the Course table.  On the surface, it looks neat: one table stores the current data as well as the history data.  But if you take a closer look, you will find you have created more problems than you planned to solve.  First of all, what would be the primary key for the Course and Student tables? Just [Student Id] and [Course Code] would not work, because the same student may have many rows, with many deleted rows and one active row.  How about adding the active flag as part of the primary key? That would not work either, because there could be many deleted rows for the same student.  How about adding a creation timestamp column to the table and making it part of the primary key? Well, the table now accepts many deleted rows for the same person as long as the rows are not created at the same time, but it also accepts multiple active rows for the same student as long as they are not created at the same time. This is not what you want to see.
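
The timestamp variant of the primary key can be tried out quickly. The sketch below uses Python's built-in sqlite3 module as a stand-in for the RDBMS; the column names (IsDeleted, CreatedAt) and the sample data are assumptions made for the illustration, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Student (
        StudentId  INTEGER NOT NULL,
        Name       TEXT    NOT NULL,
        IsDeleted  INTEGER NOT NULL DEFAULT 0,
        CreatedAt  TEXT    NOT NULL,
        PRIMARY KEY (StudentId, CreatedAt)
    )
    """
)
# Two rows for the same student, both active, created at different times.
conn.execute("INSERT INTO Student VALUES (1, 'Ann Lee', 0, '2014-05-01T09:00:00')")
conn.execute("INSERT INTO Student VALUES (1, 'Ann Li',  0, '2014-05-02T09:00:00')")

active_rows = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE StudentId = 1 AND IsDeleted = 0"
).fetchone()[0]
print(active_rows)  # 2: the key did not prevent a duplicate active student
```

Both inserts succeed, so with this key the database no longer guarantees one active row per student; that guarantee has moved into application code.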

Enough about the problem with the primary key. The situation with the foreign key is not any better. Let’s say a student gets deleted because he or she has transferred to another college. The row for the student is still in the table; it is just that the deleted flag is true.  From the foreign key's point of view, there is no problem adding an election of a specific course for the transferred student.  But from a business point of view, that is against the business rules.

The conclusion is that as soon as logical delete is implemented, our hope of using primary keys and foreign keys to enforce entity integrity and referential integrity is gone for good.

The question remains: the business requirement is to store the history of these important tables, so how do we do it?  It is a valid question for any data model designer.  Well, this is my solution to the business problem.

I define the [Course] table and [Student] table as they are, with no logical delete flag, and I design the [Course Election] table as per normal, so that the primary key and foreign keys work as expected. Then I design [Course Audit] and [Student Audit] tables. These audit tables have 2 sets of columns, one for the current values and one for the previous values, plus a set of columns recording the who, when, and what kind of information. In these tables, the primary key is a GUID or an identity column. No foreign keys, no unique constraints.
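
The foreign key behaviour described above can be reproduced in a few lines. This sketch again uses Python's sqlite3 module as a stand-in for the RDBMS; the table layout is simplified and the IsDeleted column name and the course code are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE Student (
        StudentId INTEGER PRIMARY KEY,
        IsDeleted INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE CourseElection (
        StudentId  INTEGER NOT NULL REFERENCES Student (StudentId),
        CourseCode TEXT    NOT NULL,
        PRIMARY KEY (StudentId, CourseCode)
    );
    INSERT INTO Student VALUES (1, 1);  -- student 1 is logically deleted
    """
)

# The foreign key is satisfied because the row still physically exists.
conn.execute("INSERT INTO CourseElection VALUES (1, 'MATH101')")

# A truly missing student is still rejected, so the constraint itself works.
try:
    conn.execute("INSERT INTO CourseElection VALUES (2, 'MATH101')")
    missing_rejected = False
except sqlite3.IntegrityError:
    missing_rejected = True

elections_for_deleted = conn.execute(
    "SELECT COUNT(*) FROM CourseElection WHERE StudentId = 1"
).fetchone()[0]
print(elections_for_deleted, missing_rejected)  # 1 True
```

The constraint does its job for a student who is really absent, but it cannot see the deleted flag, so the election for the transferred student goes through.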

Then I create a trigger on each table that fires AFTER DELETE, INSERT, UPDATE and inserts into the audit table the result of the following query:


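-- Pair each new row (inserted) with its previous version (deleted);
-- one side is NULL for a pure INSERT or a pure DELETE.
SELECT *
FROM inserted I
FULL OUTER JOIN deleted D ON I.Id = D.Id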

With this design, the data history is preserved, data integrity is enforced by the database schema, and the coding (whether .NET code or stored procedure code) is simplified.

Historically speaking, logical delete is a legacy carried over from mainframe systems, from the time when there were no relational databases yet. If we carry this legacy into our modern database design without fully understanding its impact, we might get ourselves into trouble… be careful. You want your modern database designed as a modern database, in which data integrity is enforced by the database engine.
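
Here is a small end-to-end sketch of the audit-table idea, using Python's sqlite3 module rather than SQL Server. SQLite has no inserted / deleted pseudo-tables and no single trigger for several events, so this version uses one trigger per event with OLD and NEW row references; that substitution, the StudentAudit column names, and the sample data are assumptions made for the illustration, and the "who" column is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Student (
        StudentId INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL
    );
    -- Audit table: surrogate key, no foreign keys, no unique constraints.
    CREATE TABLE StudentAudit (
        AuditId   INTEGER PRIMARY KEY AUTOINCREMENT,
        Action    TEXT NOT NULL,
        StudentId INTEGER NOT NULL,
        OldName   TEXT,
        NewName   TEXT,
        ChangedAt TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TRIGGER Student_ins AFTER INSERT ON Student BEGIN
        INSERT INTO StudentAudit (Action, StudentId, OldName, NewName)
        VALUES ('INSERT', NEW.StudentId, NULL, NEW.Name);
    END;
    CREATE TRIGGER Student_upd AFTER UPDATE ON Student BEGIN
        INSERT INTO StudentAudit (Action, StudentId, OldName, NewName)
        VALUES ('UPDATE', NEW.StudentId, OLD.Name, NEW.Name);
    END;
    CREATE TRIGGER Student_del AFTER DELETE ON Student BEGIN
        INSERT INTO StudentAudit (Action, StudentId, OldName, NewName)
        VALUES ('DELETE', OLD.StudentId, OLD.Name, NULL);
    END;
    """
)
conn.execute("INSERT INTO Student VALUES (1, 'Ann Lee')")
conn.execute("UPDATE Student SET Name = 'Ann Li' WHERE StudentId = 1")
conn.execute("DELETE FROM Student WHERE StudentId = 1")

# The Student table is empty again, but the full history survives in the audit table.
history = conn.execute(
    "SELECT Action, OldName, NewName FROM StudentAudit ORDER BY AuditId"
).fetchall()
for row in history:
    print(row)
```

The Student table keeps its simple primary key throughout, and the application code never has to filter on a deleted flag; the history is written by the database itself.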

Tuesday, May 13, 2014

Anti-pattern for May-- Death by Planning

I will use this week's writing for the anti-pattern of May.

This is how it goes: the title of the anti-pattern is Death by Planning, and the subtitle is "The meeting will continue until morale improves".
This is the link to the calendar page: http://deviq.com/death-by-planning

The elaboration for the anti-pattern is: The act of over-planning for software development projects, which by nature are very chaotic and often run into problems that cannot be planned for.

The further elaboration from the web site reads as follows:

Although planning is an essential part of building quality software, keep in mind that shipping is a feature<http://deviq.com/shipping-is-a-feature>. Keep short term plans as close to the actual implementation of your software as possible, while keeping long term plans as vague and flexible as possible.
Also, keep in mind the expense associated with meetings. If six people who make an average of $60,000 in salary meet for an hour, and their benefits, etc. costs another 30% above that, the meeting costs the company about $225 (assuming 40-hour work weeks, etc.). You can install a meeting cost app<https://www.google.com/search?q=meeting+cost+app&oq=meeting+cost+app> if you're interested in tracking just how expensive your meetings are.
It's a good idea to minimize how much time is spent on meetings. You can't eliminate them entirely, and the larger your organization, the more time is required to keep everybody on the same page. However, you can keep meetings shorter by having specific agendas, using standing meetings where possible, and minimizing who needs to attend meetings. Just these three simple techniques can make a significant difference. Starting on time and ending on time are also a big help in limiting how much time is wasted by meetings.
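
The meeting-cost figure quoted above is easy to check. The sketch below is in Python; the `meeting_cost` function and its parameter names are made up for the illustration, and the 52-week, 40-hour year is the assumption the quote itself hints at.

```python
def meeting_cost(attendees: int, annual_salary: float,
                 overhead_rate: float, hours: float,
                 work_hours_per_year: float = 52 * 40) -> float:
    """Cost of a meeting: attendees x loaded hourly rate x duration in hours."""
    loaded_hourly = annual_salary * (1 + overhead_rate) / work_hours_per_year
    return attendees * loaded_hourly * hours


# Six people at $60,000 with 30% benefits on top, meeting for one hour.
cost = meeting_cost(6, 60_000, 0.30, 1)
print(round(cost))  # 225
```

Each attendee costs $37.50 per hour once benefits are included, so a daily one-hour meeting of the same six people would run to well over a thousand dollars a week.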

Quote for the anti-pattern is "Everybody has a plan until they get punched in the face"   -- Mike Tyson

In my 15 years working in the US IT industry, I have had 10 jobs, ranging from in-house IT departments to IT consulting firms, and from small shops with a total headcount of 10 people to large MNCs. Throughout my time with these companies, I observed one thing: the more meetings there are, the less chance the project has of succeeding. Some individuals I have seen are very special. For one, they are very busy; for two, they are busy only with attending meetings.  It is as if their job is attending meetings: no deliverables before the meeting, no action items after the meeting. From 9:00 to 5:00 they simply jump from one meeting to another.
I have also seen individuals who try to avoid meetings because they believe they are too busy to attend them. They have more important deliverables to produce.  Often I see that these 2 groups of people do not work well together.
It is a given that meetings are necessary in any software development project. For attendees to be motivated to attend, they need to feel they benefit. The same goes for documentation: those who produce it need to believe the documentation will help them in their work.
The following are some possible benefits of attending meetings:
1.      I have some ideas I want to explain to others so that they accept my ideas and work in the way I prefer them to work
2.      Someone has some ideas to explain to me so that I can produce a result that works with everyone else's
3.      There is something to teach or something to learn
4.      There is some disagreement, and we want to establish a common understanding through the meeting so that the results we produce work well together.

From the list you can see that meetings are neither the starting point nor the end point. You either bring your value to the meeting, or take some value away from the meeting to do your work, or both. As everyone knows, the most effective communication is face-to-face conversation. For the meeting organizer, getting a group of people into one room for a face-to-face conversation is the most effective way of communicating, but it is also the most expensive way of burning man-hours. Use it with the cost in mind. For meeting participants, bring your value to the meeting and make the meeting valuable to your work.  For those who are habitual meeting goers, I suggest you realize your value beyond meetings.