How Does Simple Text Markup Differ Across The Office 2007 Suite

February 4, 2010 by

 

Our recent theme has been the things that need to be made more consistent across the Office products so that document generation is more efficient for developers.  This time we will focus on the differences between the way text is marked up in Word and PowerPoint.

I have found that there are a number of subtle but important differences in the way text is written to the Open XML standard.  This is then reflected in the Open XML SDK’s API.

Examples of these differences show up in features such as text color, bolding, underlining and bulleted lists.  The main difference seems to be that the Word team took a more object-oriented approach, giving each setting its own child object, while the PowerPoint team favored attributes.  The result is that the PowerPoint definition ends up with more moving parts.  To illustrate this, let’s take a look at setting the text color and bolding a run of text.

Text color is handled in Word simply by applying a Color object to the run properties.  PowerPoint requires a SolidFill object and child SchemeColor and LuminanceModulation objects to get the same effect.

The difference in the way that text is bolded is very similar.  In Word a Bold object is added to the run properties, but in PowerPoint bold is a boolean attribute of the run properties.
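To make the two shapes concrete, here is a minimal sketch that builds the underlying XML with LINQ to XML instead of the SDK classes, so it runs without the SDK installed.  The class and method names are made up for illustration, and the color values (FF0000, accent1, a 75000 luminance modulation) are arbitrary examples rather than values from our project.

```csharp
using System;
using System.Xml.Linq;

// Print both shapes so they can be compared side by side.
Console.WriteLine(RunMarkupSketch.WordRunProperties("FF0000"));
Console.WriteLine(RunMarkupSketch.PowerPointRunProperties("accent1", 75000));

// Hypothetical helper, not part of the Open XML SDK.
static class RunMarkupSketch
{
    public static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    public static readonly XNamespace A = "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Word: bold and color are both child elements of the run properties.
    public static XElement WordRunProperties(string hexColor)
    {
        return new XElement(W + "rPr",
            new XAttribute(XNamespace.Xmlns + "w", W),
            new XElement(W + "b"),
            new XElement(W + "color", new XAttribute(W + "val", hexColor)));
    }

    // PowerPoint (DrawingML): bold is an attribute, color is a nested fill.
    public static XElement PowerPointRunProperties(string schemeColor, int lumMod)
    {
        return new XElement(A + "rPr",
            new XAttribute(XNamespace.Xmlns + "a", A),
            new XAttribute("b", "1"),
            new XElement(A + "solidFill",
                new XElement(A + "schemeClr",
                    new XAttribute("val", schemeColor),
                    new XElement(A + "lumMod", new XAttribute("val", lumMod)))));
    }
}
```

The Word version is two sibling elements under the run properties, while the PowerPoint version mixes an attribute with a three-level element tree, which is why code written for one cannot simply be pointed at the other.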

So what is the impact on the developer?  For those of us who have to generate both documents and presentations, code reuse is next to nothing.  On top of that the learning curve is practically doubled.

I realize that the two products evolved along separate paths in isolated teams, but the time for that type of development is long past.  The Open XML standard should be unified wherever possible across the Office applications and allow for greater interaction between the products.  Ultimately, synchronizing these tools will lead to greater adoption.

What Makes CustomXml More Appealing Than Content Controls

February 3, 2010 by

Word 2007 has two built-in methods for tagging content.  If you go to the Developer tab you will find that the ribbon has a section for Controls and a section for XML.  The Controls are also referred to as Content Controls.  The XML section allows you to define schemas that can be applied to your document and is sometimes called Custom XML.

Both of these constructs can be used when you are coding an application that needs to identify a part of a document and take some action on it.  Content Controls are represented by SdtBlocks and SdtRuns in the Open XML SDK 2, whereas Custom XML is represented by CustomXmlBlocks and CustomXmlRuns.
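For comparison, here is a rough sketch of what each construct looks like in the document XML when it wraps the same paragraph.  It uses LINQ to XML rather than the SDK classes, the helper names are made up, and the schema URI and the CompanyName identifier are invented examples.

```csharp
using System;
using System.Xml.Linq;

var paragraph = new XElement(TaggingSketch.W + "p");
Console.WriteLine(TaggingSketch.ContentControl("CompanyName", paragraph));
Console.WriteLine(TaggingSketch.CustomXml("urn:example:template-tags", "CompanyName", paragraph));

// Hypothetical helper, not part of the Open XML SDK.
static class TaggingSketch
{
    public static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Content Control: the identifier is a free-text tag value inside sdtPr.
    public static XElement ContentControl(string tag, XElement content)
    {
        return new XElement(W + "sdt",
            new XAttribute(XNamespace.Xmlns + "w", W),
            new XElement(W + "sdtPr",
                new XElement(W + "tag", new XAttribute(W + "val", tag))),
            new XElement(W + "sdtContent", new XElement(content)));
    }

    // Custom XML: the identifier is an element name tied to a schema URI.
    public static XElement CustomXml(string schemaUri, string element, XElement content)
    {
        return new XElement(W + "customXml",
            new XAttribute(XNamespace.Xmlns + "w", W),
            new XAttribute(W + "uri", schemaUri),
            new XAttribute(W + "element", element),
            new XElement(content));
    }
}
```

The practical difference for template authors is where the name comes from: the tag value is typed by hand for each control, while the custom XML element is picked from the schema.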

The Content Controls made for a lot of confusion when I first investigated using them since there are multiple types.  For my purposes only one really makes sense and that is the Rich Text.  It can be applied to any area of a document regardless of content.  Even with this capability I found that Content Controls didn’t fit my needs.

I realize that Microsoft views Content Controls as more reliable because they are not tied to a schema that defines a strict document, but that isn’t how we use the schema.  Our schemas are a list of custom identifiers which the users can reliably apply to a document.

As an example, the name of a company may need to be inserted 100 times in random locations throughout a document.  This doesn’t fit a tightly structured schema, but it is also risky to have the person editing the template enter an identifier by hand for each Content Control.

Accurate naming is just as important as, if not more important than, identifying the type of objects contained in a tag when processing a template.  Add to this the fact that you may need to identify sections by business usage, and that those sections can contain multiple child objects such as charts, tables, images and paragraphs.

Between lawsuits and a change in approach by Microsoft it seems that Custom XML is going away.  Microsoft needs to supply more guidance on how templates should be marked up in a consistent fashion.  They also need to come up with a solution for what I see as a major weakness of Content Controls for this type of usage.

Lastly, both features are only available in Word.  We do more template work with PowerPoint than we do with Word, which puts our development at a disadvantage.  People want this data to “present” and therefore need to be able to generate “presentations”.  Getting similar functionality in PowerPoint is essential.  Hopefully we will see these improvements going forward.

The Challenges of Inconsistent Implementation and Office Document Generation

January 29, 2010 by

I have spent the last several months developing solutions with Office 2007 and the Office Open XML SDK 2.  Our client has requirements that cross the suite from PowerPoint presentations to Word documents.  The Open XML standard, which defines the structure of these documents, is very powerful.  My biggest frustration is the lack of consistent capabilities between the products.

Since we are doing document generation based on templates, it is very important that the code can consistently identify any part of a document, whether that is a section of text, a chart, a table or an image.  While Word 2007 has Content Controls and Custom XML (2007 only) which can be used for marking up a document, similar features are not available in PowerPoint.  This is a major issue for us since the majority of our templated work is in PowerPoint.

A key to a successful solution for me is that a markup needs to be consistent in the way it is implemented in all of the Office applications.  It should also have a way that an end user can add a tag to a document without the risk of it being mislabeled because of human error.  This is one of the drawbacks of Content Controls.  Another thing that makes CustomXml more attractive is that you can use just one type of control to encapsulate content (more on Content Controls versus CustomXml in the next few days).  There are a variety of content controls that are tightly typed.  In other situations this may be a plus, but if anything the developer should be able to define the type of objects used for tagging.

Further, something as simple as a Text object lives in a different namespace depending on where it appears, even within the same document type, which means that we have to write duplicate code for dealing with text in charts, document paragraphs and embedded spreadsheets.  If I were to design it, this shared functionality would be abstracted into its own namespace.  I want to be able to write clean, reusable code.
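One workaround, sketched below with LINQ to XML rather than the SDK's typed classes, is to match text elements by local name across the namespaces you care about.  The helper name is made up, and limiting the search to the WordprocessingML, DrawingML and SpreadsheetML namespaces is an assumption for this example, not a complete list of where text can live.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

var fragment = new XElement("root",
    new XElement(TextSketch.W + "t", "from a Word paragraph"),
    new XElement(TextSketch.A + "t", "from a slide or chart label"),
    new XElement(TextSketch.X + "t", "from a spreadsheet string"));

foreach (string text in TextSketch.AllText(fragment))
{
    Console.WriteLine(text);
}

// Hypothetical helper, not part of the Open XML SDK.
static class TextSketch
{
    public static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    public static readonly XNamespace A = "http://schemas.openxmlformats.org/drawingml/2006/main";
    public static readonly XNamespace X = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Namespaces in which an element named "t" is treated as text here.
    private static readonly HashSet<XNamespace> TextNamespaces =
        new HashSet<XNamespace> { W, A, X };

    // Returns the value of every "t" element in one of the known namespaces,
    // in document order.
    public static IEnumerable<string> AllText(XElement root)
    {
        return root.DescendantsAndSelf()
                   .Where(e => e.Name.LocalName == "t"
                            && TextNamespaces.Contains(e.Name.Namespace))
                   .Select(e => e.Value);
    }
}
```

This trades the type safety of the SDK classes for a single code path, which is roughly what a shared text namespace would have given us for free.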

Ultimately the teams within the Office suite need to start working together the way that the language teams have begun to do within Visual Studio.  The same tagging tools should be available in Word, PowerPoint, Excel and OneNote and they should be represented the same in the XML that is rendered.

Retrieving A List of SdtBlocks for A Tag Value Using LINQ

December 24, 2009 by

If you are using a template document and replacing text programmatically using the Office Open XML SDK 2 API you will need a way to identify the target to be replaced.  One option is to use Content Controls and set the same tag value on all of the controls that need to be substituted with a single value.  After some trial and error and a lot of digging through the DocumentReflector I came up with the following LINQ query to get a list of all blocks with the same tag name.

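// Requires: using System.Linq; using DocumentFormat.OpenXml.Wordprocessing;
// The let clauses guard against controls that have no properties or no tag.
var blocks = from s in part.MainDocumentPart.Document.Descendants<SdtBlock>()
             let props = s.GetFirstChild<SdtProperties>()
             let tag = props == null ? null : props.GetFirstChild<Tag>()
             where tag != null
                && tag.Val != null
                && tag.Val.Value.Equals(placeholder)
             select s;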

 

 

The nice thing is that all Word document content controls have the tag property as an option so this works whether you are searching for text, rich text or quick parts.  If you are searching for images and some other content controls you may have to search for SdtRuns instead of SdtBlocks, but the query is essentially the same.
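As far as I can tell, block-level and run-level content controls are both written to the document as w:sdt elements, so if you are willing to drop down to LINQ to XML a single query can cover both cases.  The sketch below is an alternative to the SDK query, not a replacement for it; the helper name and the sample document fragment are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

var body = XElement.Parse(@"<w:body xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
  <w:sdt><w:sdtPr><w:tag w:val=""CompanyName""/></w:sdtPr><w:sdtContent><w:p/></w:sdtContent></w:sdt>
  <w:p><w:sdt><w:sdtPr><w:tag w:val=""CompanyName""/></w:sdtPr><w:sdtContent><w:r/></w:sdtContent></w:sdt></w:p>
  <w:sdt><w:sdtPr><w:tag w:val=""Other""/></w:sdtPr><w:sdtContent><w:p/></w:sdtContent></w:sdt>
</w:body>");

Console.WriteLine(SdtQuerySketch.ContentControlsWithTag(body, "CompanyName").Count());

// Hypothetical helper, not part of the Open XML SDK.
static class SdtQuerySketch
{
    public static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Finds every content control, block-level or inline, whose tag matches.
    // Controls without properties or without a tag are simply skipped.
    public static IEnumerable<XElement> ContentControlsWithTag(XElement root, string tag)
    {
        return from sdt in root.Descendants(W + "sdt")
               let tagElement = sdt.Element(W + "sdtPr")?.Element(W + "tag")
               where (string)tagElement?.Attribute(W + "val") == tag
               select sdt;
    }
}
```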

Dealing With Shared Strings In OOXML

December 23, 2009 by

Shared strings are the way that Excel reduces redundant data in a worksheet.  They are also important if you are working with charts in Word documents or PowerPoint slide decks.  Instead of inserting a string constant into a cell, you give the cell the index of the string in the SharedStringTable and mark the cell as holding a shared string reference.
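If the idea is new to you, the following toy model may help.  It is not the SDK's SharedStringTable, just an in-memory sketch of the same bookkeeping: each distinct string is stored once, cells hold an index, and the table tracks a total reference count alongside a unique count.  The class and member names are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;

var table = new SharedStringsSketch();
int first = table.GetOrAdd("North");
int second = table.GetOrAdd("South");
int third = table.GetOrAdd("North");   // reuses the index handed out for "North"

Console.WriteLine($"{first} {second} {third} count={table.Count} unique={table.UniqueCount}");

// Hypothetical in-memory model of a shared string table.
class SharedStringsSketch
{
    private readonly List<string> _strings = new List<string>();
    private readonly Dictionary<string, int> _indexByText = new Dictionary<string, int>();

    // Total number of references handed out, including repeats.
    public int Count { get; private set; }

    // Number of distinct strings stored.
    public int UniqueCount
    {
        get { return _strings.Count; }
    }

    // Returns the index a cell should store for this text, adding it if new.
    public int GetOrAdd(string text)
    {
        Count++;
        int index;
        if (_indexByText.TryGetValue(text, out index))
        {
            return index;
        }

        index = _strings.Count;
        _strings.Add(text);
        _indexByText[text] = index;
        return index;
    }

    // Resolves an index stored in a cell back to its text.
    public string this[int index]
    {
        get { return _strings[index]; }
    }
}
```

The dictionary lookup is a design choice for the sketch; it avoids scanning the whole list every time a string is added.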

So what does it take to work with the SharedStringTable?

The first thing you need to do is retrieve the existing shared string table.  This is a fairly simple operation as is demonstrated below.

SharedStringTablePart sharedStrings = tempSpreadsheet.WorkbookPart.GetPartsOfType<SharedStringTablePart>().First();

 

Another operation that you will want to do is finding strings within the table since the entire purpose of using it is to eliminate redundancy.  I handled this by creating a simple method after converting the SharedStringItems of the table into a List<T>.

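private static int GetIndexOfSharedString(List<spreadsheet.SharedStringItem> items, string text)
{
    // Returns the index of the first item whose text matches, or -1 if none does.
    for (int index = 0; index < items.Count; index++)
    {
        // Items without a text element are skipped instead of throwing.
        var itemText = items[index].Descendants<spreadsheet.Text>().FirstOrDefault();
        if (itemText != null && itemText.Text.Trim() == text.Trim())
        {
            return index;
        }
    }

    return -1;
}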

 

Lastly you will want to add new strings to the table.  This is fairly straightforward, but note the last three lines.  If you don’t update the counts you will find your new items never save.

sharedItem = new spreadsheet.SharedStringItem();
sharedStrings.SharedStringTable.AppendChild<spreadsheet.SharedStringItem>(sharedItem);
// Keep both counts in step with the new item, then persist the table.
sharedStrings.SharedStringTable.Count++;
sharedStrings.SharedStringTable.UniqueCount++;
sharedStrings.SharedStringTable.Save();

 

That is really all there is to it.

(Note: for those who have complained about the code formatting I hope this is better)

Welcome to PSC’s Coding the Document Blog

December 23, 2009 by

So the adventure begins.  We will use this blog to discuss development with the prevailing document standards: Office Open XML (OOXML) and the Open Document Format (ODF).  The perspective will be mainly from a Microsoft .NET development environment, but where possible we will inject as much information as we can about alternative approaches and platforms.

As a starting point we would like to introduce your hosts and then in the coming days we will begin posting more instructional topics.

Tim Murphy is a Technical Specialist at PSC Group and has been an IT consultant since 1999 specializing in Microsoft technologies.  He is a leader of the Chicago Architects Group and a speaker on software architecture and Microsoft technologies.  He was also a contributing author on “The Definitive Guide to the Microsoft Enterprise Library”.

Andy Schwantes is a Technical Consultant at PSC Group specializing in custom .NET development since 2001.  His current focus is Office Open XML and other related document generation technologies.

Find out more about PSC Group, LLC.
