Saturday, November 10, 2007

Advice - SDLC content search

I wrote a freeware tool (with VB.NET source) that can upload the artifacts discussed below; it is available in another post. For example, it can extract e-mails from PST files, Public Folders and Exchange mailboxes and place them on a file system in MSG format (using Redemption). It can also extract Visual Source Safe (VSS) documents, SharePoint Lists and Wikis (extracted to HTML) and Quality Centre Test Cases (extracted to HTML). The extracted files are compatible with SharePoint naming conventions.

I've found that a wide variety of tools get used for software development, and they are often not searchable as a whole. Examples of tools in a single Microsoft development enterprise are :-

  • Visual Source Safe and/or Team Foundation Server - Code and document repository and versioning.
  • SharePoint (WSS 3.0) - User issues and new requirements log, Wiki for tips and tricks, and document repository for sharing with users.
  • HP Quality Centre - Test Cases and Test Plans
  • E-mails - for Support communication, project management etc
  • Development Bug Tracking system - Often implemented as a custom built database solution, or using TFS.
  • Online Help - Where your users are told about everything in the software (sigh...which they demand, but never use!)

I've seen a lot of teams struggle to find anything historical as there are just too many places to look. The dream of bringing all these items together in a single tool isn't that realistic, since a single tool which does everything above (and does it well) will cost a fortune, and documents, e-mail etc. are always going to sit outside most SDLC tools.

Example - An enterprise development team is still developing several years after the first release. A lot of the original team have moved on and unfortunately taken the reasoning behind previous decisions with them. Being a professional team, everything is well documented; however, they have left behind 50,000 artifacts across e-mails, documents, bug tasks, test cases, user issues etc. The team decides that a field needs to be altered, but has no idea when it was introduced or, more importantly, why. Team members start searching across all repositories, but each tool has its own search syntax, and some systems like e-mail, online help and even source control have either no, or extremely poor, quick content searching capabilities out of the box.

Solution
Wouldn't it be great, and much simpler, if all content could be searched from one place, with a search facility quick and powerful enough to let you choose which areas to search across? Obviously you can go and purchase an enterprise search solution with plug-ins for the many systems, but this creates an administrative mess in having to divide up the content for different purposes.



I believe there is a much simpler solution using the native features of Windows SharePoint Services (WSS 3.0), which is available at no extra cost for Windows Server 2003. You can create an SDLC content searching facility by exporting the content from the above systems in HTML, DOC or MSG format and using SharePoint's indexing facilities to search across the content. Obviously data will be duplicated, but the SharePoint content can be treated as read-only and thus synchronised daily/weekly from the source systems. Disk is relatively cheap and most of the content here is small, though numerous.

There are 2 parts to this solution: 1) the searching portal, which is a SharePoint site with a SharePoint Web Part, and 2) a tool to extract and load the content into SharePoint. Point 2 will be explained in a future post.

How the SDLC Content searching portal works
WSS 3.0 provides a search page to conduct searching over content. It is in the format of http://<siteurl>/_layouts/searchresults.aspx?k=<KEYWORDS>&u=<SEARCHSUBSITE>
The query string allows for standard search text and additionally the entry of search properties, e.g. instead of just 'k=Functional Specification' you can use 'k=FileExtension:"DOC" Functional Specification'. Some useful search properties I've found in my travels for WSS 3.0 :-
  • Title - This is the document title, extracted from the document meta data in Office documents, or from the <title> tag in HTML documents.
  • FileName - This is the document filename.
  • Size - This is the document size.
  • Write - This is the date the file was last written to.
  • Author - This is the document author.
  • FileExtension - This is the extension of the document, e.g. FileExtension:xls
  • VPath - The virtual path to the item.

The last property is the one that allows us to restrict searches to certain content areas. These properties are used by entering +<property>:<value>.
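
To make the URL format concrete, here is a minimal Python sketch that assembles a search URL from keywords and property restrictions. It is only an illustration of the query string described above, not part of the portal itself: the function name build_search_url and the server address are made up for the example, and quoting only the values that contain spaces is my own simplification.

```python
from urllib.parse import urlencode, quote


def build_search_url(site_url, keywords, properties=None, subsite=None):
    """Assemble a searchresults.aspx URL with 'k' (keywords) and optional 'u' (subsite)."""
    terms = []
    for name, value in (properties or {}).items():
        # Wrap values containing spaces in quotes, e.g. Title:"Load Balancing"
        if " " in value:
            value = f'"{value}"'
        terms.append(f"{name}:{value}")
    if keywords:
        terms.append(keywords)

    params = {"k": " ".join(terms)}
    if subsite:
        params["u"] = subsite

    # quote_via=quote encodes spaces as %20 rather than '+'
    query = urlencode(params, quote_via=quote)
    return f"{site_url.rstrip('/')}/_layouts/searchresults.aspx?{query}"


if __name__ == "__main__":
    # Hypothetical site URL, used purely for illustration
    print(build_search_url(
        "http://server/sites/sdlc",
        "Functional Specification",
        {"FileExtension": "DOC"},
    ))
```

The property terms are placed in front of the free text, mirroring the 'k=FileExtension:"DOC" Functional Specification' example above.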

Steps
1. IFilters need to be installed on the SharePoint server to scan the content uploaded from the various systems, e.g. if you have PDFs you'll need to install a PDF IFilter. For a long time a Microsoft MSG IFilter was missing, but one is now available (http://blog.gavin-adams.com/2007/10/09/enabling-the-inbuilt-msg-ifilter-on-sharepoint-even-64bit/)
2. On a SharePoint site, create a Document Library for each type of content you plan to extract. These provide a container for the content and, through the 'VPath' property, a way to differentiate it.
3. Add to the SharePoint site a 'Form Web Part' and select Source Editor.
4. Download the code and insert into the Source Editor.

This code uses check boxes to create VPath entries that are fed into the SharePoint search. Sections have been reproduced in the picture below - you need to replace the red code with your own SharePoint URL and the green code with your own document libraries, descriptions and code.

[Picture: sections of the Form Web Part code, with the SharePoint URL in red and the document library entries in green]
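
For readers who just want the idea behind the check boxes, the following Python sketch shows the logic in isolation. It is not the downloadable Form Web Part code: the function names, the site URL and the library names are hypothetical, and writing each area as +VPath:"<site>/<library>" is an assumption based on the +<property>:<value> syntax mentioned earlier. How WSS combines several VPath terms is not something this sketch verifies.

```python
def vpath_terms(site_url, selected_libraries):
    """Turn each ticked document library into a +VPath:"..." restriction."""
    base = site_url.rstrip("/")
    return [f'+VPath:"{base}/{library}"' for library in selected_libraries]


def build_query(search_text, site_url, selected_libraries):
    """Combine the user's search text with one VPath term per ticked area.

    With no areas ticked, only the search text is returned, so the
    search runs across all areas.
    """
    terms = vpath_terms(site_url, selected_libraries)
    return " ".join([search_text.strip()] + terms).strip()


if __name__ == "__main__":
    # Hypothetical site and library names, for illustration only
    print(build_query(
        "load balancing",
        "http://server/sites/sdlc",
        ["Emails", "TestCases"],
    ))
```

The resulting string is what would be passed as the k= value of the search page.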


5. Optionally I suggest adding a Content Editor Web Part with the following Content to help users :-

Use the Search facility above to search for documents and/or e-mails and/or SDLC artifacts. Enter search text and restrict by specific areas. Selecting no areas will search all areas. Additionally search via document properties. Syntax : e.g. Title:"Load Balancing"

Properties available :

Title - This is the document title, extracted from the document meta data in Office documents, or from the <title> tag in markup documents.
FileName - This is the document filename.
Size - This is the document size.
Write - This is the date the file was last written to.
Author - This is the document author.
FileExtension - This is the extension of the document, e.g. FileExtension:xls

That's it!
