Author Archive

Alternate OOXML Document Generation Approach

March 31, 2011

Eric White has put out a document generation example which uses XPath and Word Content Controls.  I applaud Eric for the amount of work he has done exploring different ways to perform template-based generation.  This is a challenging subject and we need as many ideas as we can get.  There are a couple of areas where I see room for improvement in this XPath design that I would like to bring up.

The first is that Eric has chosen to put his document generation in the document itself.  I see this as a maintenance and reusability issue.  Architecturally I would prefer to have my code external to the document so that I can write and maintain it centrally in a generic fashion and tie it to a rules engine.

Another place where I see this approach fall down is that it is good for simple text replacement, but it doesn’t handle formatting, replacing images or working with charts.  This doesn’t mean that it can’t handle them, but I think it would lose the simplicity which looks to be its appeal.

Lastly, Content Controls are currently a Word-only feature.  It would be great if we could come up with a markup technique that was universal to all Office document types.  Hey, we can dream, right?

Personally, I prefer a more metadata-driven approach, based on my experience with solutions whose output had to be of marketing-material quality.  That being said, this approach is an interesting idea to add to the design arsenal.  Thanks for the thoughts, Eric.


Update Since Microsoft/PSC Office Open XML Case Study

December 16, 2010

In 2009 Microsoft released a case study about a project that we had done using the OOXML SDK 1.0 for Research Directors Inc.  Since that time Microsoft has released version 2.0 of the SDK and PSC has done significant development with it.  Below are some of the milestones we have reached since the original case study.

At the time of the original case study two report types had been automated to output as PowerPoint presentations.  Now that all the main products have been delivered we have added three reports with Word document outputs and five more reports with PowerPoint outputs.

One improvement we made over the original application was to create a PowerPoint Add-In which allows users to tag a slide.  These tags, along with the strongly typed SDK 2.0, allow the code to use LINQ to easily search for slides in the template files.  This allows for a more flexible architecture based on assembling a presentation from copies of slides extracted from the template.

The new library we created also enabled us to create two new Word-based reports in two weeks.  The library abstracts the generation of the documents from the business logic and the data retrieval.  The key to this is the markup.  Content Controls are a good method for identifying sections of a template to be modified or replaced.  Combine this with the concept that all data is generically either scalar or two-dimensional and the code becomes more generic.
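
One way to picture the "scalar or two-dimensional" idea is a small data contract keyed by the tag on each Content Control.  The sketch below is only an illustration under that assumption; the names TemplateData, SetScalar, SetTable, TryGetScalar and TryGetTable are hypothetical and are not taken from the library described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container: every value a template can ask for is either
// a single string (scalar) or a grid of strings (two-dimensional).
public sealed class TemplateData
{
    private readonly Dictionary<string, string> scalars =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    private readonly Dictionary<string, string[][]> tables =
        new Dictionary<string, string[][]>(StringComparer.OrdinalIgnoreCase);

    // Register a single value under a tag name.
    public void SetScalar(string tag, string value) { scalars[tag] = value; }

    // Register rows and columns under a tag name.
    public void SetTable(string tag, string[][] rows) { tables[tag] = rows; }

    // The document generator only needs these two lookups; it never
    // sees the business logic or data access code that filled them.
    public bool TryGetScalar(string tag, out string value)
    {
        return scalars.TryGetValue(tag, out value);
    }

    public bool TryGetTable(string tag, out string[][] rows)
    {
        return tables.TryGetValue(tag, out rows);
    }
}
```

With a contract like this, the generator can walk the tagged sections of a template and ask for each tag in turn, replacing text for scalars and filling table rows for two-dimensional data.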

In the end we found the OOXML SDK 2.0 to be a great tool for accelerating document generation development and creating happy clients. 

Experience OOXML In Person

April 21, 2010

The Chicago Code Camp is coming up on May 1st.  I will be presenting on the essentials of document generation with OOXML.  The code I will be showing will leverage the 2.0 SDK.  Join us and come with your questions about Open XML.

Dealing With Table Borders In OOXML

April 5, 2010

Formatting tables in a document programmatically can be a very complex task.  This is the major reason why we start our document generation projects with templates instead of building components in a document by hand.

Borders are one aspect of a table that you may want to format.  Borders are used to make certain content in a table stand out.  If you need to conditionally set and remove borders there is something that you need to be aware of.  Even in OOXML you have the concepts of styles, inheriting styles and overriding styles.

When Word defines a table it will reference a global style such as “TableGrid”.  This style will include the borders for the table.  Specifically the InsideHorizontalBorder and InsideVerticalBorder define the borders for the cells.  These can be overridden by the TableCellBorders collection of a particular cell.  Adding a double right border on a cell is as easy as the couple of lines of code below.

wordprocessing.TableCellBorders borders = new wordprocessing.TableCellBorders();
borders.RightBorder = new RightBorder()
{
    Val = BorderValues.Double,
    Color = "000000",
    ThemeColor = ThemeColorValues.Text1,
    Size = (UInt32Value)4U,
    Space = (UInt32Value)0U
};


If I want to revert back to the table’s style for cell borders I simply need to remove all children from the TableCellBorders collection.  It is like removing a class identifier from a TD tag in HTML.  The style in the parent object takes back over.
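
Reverting to the table style can be sketched as follows.  This is a minimal illustration, assuming the cell's overrides live in its TableCellProperties; the helper name RevertToTableStyle is hypothetical.

```csharp
using DocumentFormat.OpenXml.Wordprocessing;

public static class CellBorderHelper
{
    // Removes every override under the cell's TableCellBorders so that the
    // borders defined by the table style apply to the cell again.
    public static void RevertToTableStyle(TableCell cell)
    {
        TableCellProperties properties = cell.TableCellProperties;
        if (properties == null || properties.TableCellBorders == null)
        {
            return; // nothing was overridden on this cell
        }

        properties.TableCellBorders.RemoveAllChildren();
    }
}
```

The empty TableCellBorders element is left in place, which keeps the cell ready for a new override later on.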

With the knowledge of how the borders work you can take the concept and apply it to other effects of styles.

Open XML SDK 2 Released

March 24, 2010

This post is a little late since the SDK was released about a week ago.  At PSC we have been using the Open XML SDK 2 since its earliest beta.  It is a very powerful tool for generating documents without using the Office DLLs.  It is also the main technology that I have been working with for the last six months.  I would suggest giving it a try. 

Stay tuned here.  In the near future I will be presenting at different locations on this and other document generation technologies.

Download the Open XML SDK here.

Copying A Slide From One Presentation To Another

March 20, 2010

There are many ways to generate a PowerPoint presentation using Open XML.  The first way is to build it by hand strictly using the SDK.  Alternately you can modify a copy of a base presentation in place.  The third approach to generate a presentation is to build a new presentation from the parts of an existing presentation by copying slides as needed.  This post will focus on the third option.

In order to make this solution a little more elegant I am going to create a VSTO add-in as I did in my previous post.  This one is going to insert Tags to identify slides instead of NonVisualDrawingProperties which I used to identify charts, tables and images.  The code itself is fairly short.

SlideNameForm dialog = new SlideNameForm();
Selection selection = Globals.ThisAddIn.Application.ActiveWindow.Selection;

if (dialog.ShowDialog() == DialogResult.OK)
{
    ...
}

Zeyad Rajabi has a good post here on combining slides from two presentations.  The example he gives is great if you are doing a straight merge.  But what if you want to use your source file almost as a supermarket where you pick and choose slides and may even insert them repeatedly?  The following code uses the tags we created in the previous step to pick a particular slide and copy it to a destination file.

using (PresentationDocument newDocument = PresentationDocument.Open(OutputFileText.Text, true))
using (PresentationDocument templateDocument = PresentationDocument.Open(FileNameText.Text, false))
{
    uniqueId = GetMaxIdFromChild(newDocument.PresentationPart.Presentation.SlideMasterIdList);
    uint maxId = GetMaxIdFromChild(newDocument.PresentationPart.Presentation.SlideIdList);

    // find the tagged slide in the template and copy it, along with its master
    SlidePart oldPart = GetSlidePartByTagName(templateDocument, SlideToCopyText.Text);
    SlidePart newPart = newDocument.PresentationPart.AddPart<SlidePart>(oldPart, "sourceId1");
    SlideMasterPart newMasterPart = newDocument.PresentationPart.AddPart(newPart.SlideLayoutPart.SlideMasterPart);

    SlideIdList idList = newDocument.PresentationPart.Presentation.SlideIdList;

    // create new slide ID
    SlideId newId = new SlideId();
    newId.Id = maxId;
    newId.RelationshipId = "sourceId1";
    idList.Append(newId);

    // Create new master slide ID
    SlideMasterId newMasterId = new SlideMasterId();
    newMasterId.Id = uniqueId;
    newMasterId.RelationshipId = newDocument.PresentationPart.GetIdOfPart(newMasterPart);
    newDocument.PresentationPart.Presentation.SlideMasterIdList.Append(newMasterId);

    // change slide layout ID
    ...
}

The GetMaxIdFromChild and FixSlideLayoutID methods are borrowed from Zeyad’s article.  The GetSlidePartByTagName method is listed below.  It is really one LINQ query that finds SlideParts with child Tags that have the requested Name.

private SlidePart GetSlidePartByTagName(PresentationDocument templateDocument, string tagName)
{
    return (from p in templateDocument.PresentationPart.SlideParts
            where ...
                    <DocumentFormat.OpenXml.Presentation.Tag>().First().Name ==
                    tagName
            select p).First();
}


This is what really makes the difference from what Zeyad posted.  The most powerful thing you can have when generating documents from templates is a consistent way of naming items to be manipulated.  I will show more approaches like this in upcoming posts.
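
For readers who want a complete version of the tag lookup, here is a minimal sketch.  It assumes that tags added to a slide through the PowerPoint object model are written to a UserDefinedTagsPart related to the slide part, and that PowerPoint may store tag names in upper case, which is why the comparison ignores case.  The class and method names are hypothetical, and unlike the query above it returns null instead of throwing when no slide matches.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;

public static class SlideTagLookup
{
    // Returns the first slide part carrying a tag with the given name,
    // or null when no slide in the template has that tag.
    public static SlidePart FindSlideByTagName(PresentationDocument template, string tagName)
    {
        return template.PresentationPart.SlideParts.FirstOrDefault(slidePart =>
            slidePart.UserDefinedTagsParts
                .Where(tagsPart => tagsPart.TagList != null)
                .SelectMany(tagsPart => tagsPart.TagList.Elements<Tag>())
                .Any(tag => tag.Name != null &&
                            string.Equals(tag.Name.Value, tagName,
                                          StringComparison.OrdinalIgnoreCase)));
    }
}
```

Returning null lets the caller decide whether a missing slide is an error in the template or simply a slide that should be skipped.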

Naming PowerPoint Components With A VSTO Add-In

March 11, 2010

Sometimes in order to work with Open XML we need a little help from other tools.  In this post I am going to describe a fairly simple solution for marking up PowerPoint presentations so that they can be used as templates and processed using the Open XML SDK.

It can be hard to find information on add-ins.  I am going to up the obscurity by adding a Ribbon button.  For my example I am using Visual Studio 2008 and creating a PowerPoint 2007 Add-in project.  To that, add a Ribbon (Visual Designer) item.  The new ribbon will show up on the Add-Ins tab by default.

Add a button to the ribbon.  Also add a WinForm to collect a new name for the object selected.  Make sure to set the OK button’s DialogResult to OK. In the ribbon button click event add the following code.

ObjectNameForm dialog = new ObjectNameForm();
Selection selection = Globals.ThisAddIn.Application.ActiveWindow.Selection;

dialog.objectName = selection.ShapeRange.Name;

if (dialog.ShowDialog() == DialogResult.OK)
{
    selection.ShapeRange.Name = dialog.objectName;
}


This code will first read the current Name attribute of the Shape object.  If the user clicks OK on the dialog it saves the string value back to the same place.

Once that is done you can identify the control through Open XML via the NonVisualDrawingProperties objects.  The only problem is that this object is a child of several different classes.  This means that there isn’t just one way to retrieve the value.  Below are a couple of pieces of code to identify the container that you have named.

The first example is if you are naming placeholders in a layout slide.

foreach (var slideMasterPart in slideMasterParts)
{
    var layoutParts = slideMasterPart.SlideLayoutParts;
    foreach (SlideLayoutPart slideLayoutPart in layoutParts)
    {
        foreach (assmPresentation.Shape shape in slideLayoutPart.SlideLayout.CommonSlideData.ShapeTree.Descendants<assmPresentation.Shape>())
        {
            var slideMasterProperties =
                from p in shape.Descendants<assmPresentation.NonVisualDrawingProperties>()
                where p.Name == TokenText.Text
                select p;

            if (slideMasterProperties.Any())
                tokenFound = true;
        }
    }
}


The second example allows you to find charts that you have named with the add-in.

foreach (var slidePart in slideParts)
{
    // charts live in GraphicFrame elements, so query those directly;
    // there is no need to loop over the slide's Shape elements first
    var slideProperties = from g in slidePart.Slide.Descendants<GraphicFrame>()
                          where g.NonVisualGraphicFrameProperties.NonVisualDrawingProperties.Name == TokenText.Text
                          select g;

    if (slideProperties.Any())
        tokenFound = true;
}


Together, Open XML and VSTO add-ins make a powerful combination for maintaining a template and generating documents from it.

Bolding and Underlining Text In Word Documents

February 16, 2010

In the templates that I have processed with Open XML there are usually a number of tables.  Sometimes we have to add an extra paragraph to a cell and we want to keep the formatting of the text already in the cell.  In this post I will go over how to apply bold and underline formatting to text as well as how to steal it from existing text and apply it to a new paragraph or run.

In order to apply an underline format to a paragraph by hand you have to start with the ParagraphProperties.  To that you append a ParagraphMarkRunProperties object to which you have appended an Underline object.  It isn’t that complicated.  There are just a lot of objects that have to be instantiated to format a paragraph.  Below is an example of what I have just described.

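Paragraph newParagraph = new Paragraph();
ParagraphProperties newProperties = new ParagraphProperties();

ParagraphMarkRunProperties markRunProperties = new ParagraphMarkRunProperties();
Underline newUnderline = new Underline { Val = UnderlineValues.Single };

// build the chain described above: underline -> mark run properties -> paragraph properties
markRunProperties.Append(newUnderline);
newProperties.Append(markRunProperties);
newParagraph.Append(newProperties);

wordprocessing.Run newRun = new wordprocessing.Run();
wordprocessing.Text newText = new wordprocessing.Text("Text for the new paragraph");
newRun.Append(newText);
newParagraph.Append(newRun);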


In order to make a paragraph or run bold you append a Bold object instead of or as well as the Underline object.
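
Formatting at the run level can be sketched the same way.  This is a minimal illustration rather than the only way to do it: as I read the schema, ParagraphMarkRunProperties describes the paragraph mark, while the text inside a run takes its bold and underline settings from that run's own RunProperties.  The helper name CreateBoldUnderlinedParagraph is hypothetical.

```csharp
using DocumentFormat.OpenXml.Wordprocessing;

public static class RunFormatting
{
    // Builds a paragraph whose single run is both bold and underlined.
    public static Paragraph CreateBoldUnderlinedParagraph(string text)
    {
        RunProperties runProperties = new RunProperties();
        runProperties.Append(new Bold());
        runProperties.Append(new Underline { Val = UnderlineValues.Single });

        Run run = new Run();
        run.Append(runProperties);   // run properties come before the text
        run.Append(new Text(text));

        Paragraph paragraph = new Paragraph();
        paragraph.Append(run);
        return paragraph;
    }
}
```

Dropping either Append call on runProperties gives a run that is only bold or only underlined.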

If you have an existing paragraph that you can steal from, the code gets a lot easier.  The code below does just that by cloning the ParagraphProperties from the existing paragraph and appending it to the new paragraph.

ParagraphProperties properties = (ParagraphProperties)oldParagraph.ParagraphProperties.Clone();
newParagraph.Append(properties);


Of course, which approach you take will depend on what your situation is.  If you are allowing a user to define formatting through an interface other than a template you will have to use the first example.  This is one more reason to use a base file as a template.  Either way you should be able to accomplish the goal of formatting your text.

How Does Simple Text Markup Differ Across The Office 2007 Suite

February 4, 2010


Our recent theme is things that need to be made more consistent across the Office products in order to make document generation development more efficient for developers.  This time around we will focus on the differences between the way text is marked up in Word and PowerPoint.

I have found that there are a number of subtle but important differences in the way text is written to the Open XML standard.  This is then reflected in the Open XML SDK’s API.

Examples of these differences are apparent in features such as text color, bolding, underlining and bulleted lists.  The main difference seems to be that the Word team took a more object-based approach, while the PowerPoint team seems to favor attributes.  The result is that the PowerPoint definition ends up with more moving parts.  To illustrate this let’s take a look at setting the text color and underlining a group of text.

Text color is handled in Word simply by applying a Color object to the run properties.  PowerPoint requires a SolidFill object and child SchemeColor and LuminanceModulation objects to get the same effect.

The difference in the way that text is bolded is very similar.  In Word the run is assigned a Bold object, but in PowerPoint it is a boolean attribute of the run properties.
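
The contrast can be sketched side by side with the SDK classes.  This is only an illustration; the color values and the 75% luminance modulation are arbitrary choices of mine, and the class and method names are hypothetical.

```csharp
using W = DocumentFormat.OpenXml.Wordprocessing;
using A = DocumentFormat.OpenXml.Drawing;

public static class RunPropertyComparison
{
    // Word: bold, color and underline are all child elements of the run properties.
    public static W.RunProperties CreateWordRunProperties()
    {
        return new W.RunProperties(
            new W.Bold(),
            new W.Color { Val = "1F497D" },
            new W.Underline { Val = W.UnderlineValues.Single });
    }

    // PowerPoint (DrawingML): bold and underline are attributes of the run
    // properties, while color needs a SolidFill holding a SchemeColor that
    // can in turn hold a LuminanceModulation.
    public static A.RunProperties CreatePowerPointRunProperties()
    {
        A.SchemeColor schemeColor = new A.SchemeColor { Val = A.SchemeColorValues.Text2 };
        schemeColor.Append(new A.LuminanceModulation { Val = 75000 });

        A.RunProperties runProperties = new A.RunProperties
        {
            Bold = true,
            Underline = A.TextUnderlineValues.Single
        };
        runProperties.Append(new A.SolidFill(schemeColor));
        return runProperties;
    }
}
```

Both methods describe bold, underlined, colored text, yet almost none of the code can be shared between them.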

So what is the impact on the developer?  Code reuse for those of us who have to generate both documents and presentations is next to nothing.  On top of that the learning curve is practically doubled.

I realize that the two products have evolved along separate paths with isolated teams, but the time for that type of development is long past.  The Open XML standard should be unified wherever possible across the Office applications and allow for greater interaction between the products.  Ultimately the synchronizing of these tools will lead to greater adoption.

What Makes CustomXml More appealing Than Content Controls

February 3, 2010

Word 2007 has two built-in methods for tagging content.  If you go to the Developer tab you will find the ribbon has a section for Controls and a section for XML.  The Controls are also referred to as Content Controls.  The XML section allows you to define schemas that can be applied to your document and is sometimes called Custom XML.

Both of these constructs can be used when you are coding an application which needs to identify a part of a document and take some action on it.  The Content Controls are represented by SdtBlocks and SdtRuns in the Open XML SDK 2, whereas the Custom XML is represented by CustomXmlBlocks and CustomXmlRuns.
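
Finding Content Controls by their tag can be sketched as below.  This is a minimal illustration, assuming the identifier was entered in the Tag field of the control's properties; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentControlLookup
{
    // Finds block-level and run-level content controls under the given
    // element whose Tag value matches the requested identifier.
    public static IEnumerable<SdtElement> FindByTag(OpenXmlElement root, string tagValue)
    {
        return root.Descendants<SdtElement>()
            .Where(sdt => sdt is SdtBlock || sdt is SdtRun)
            .Where(sdt =>
            {
                Tag tag = sdt.SdtProperties == null
                    ? null
                    : sdt.SdtProperties.GetFirstChild<Tag>();
                return tag != null && tag.Val != null && tag.Val.Value == tagValue;
            });
    }
}
```

Passing the Body of the main document part as the root searches the whole document, and each match can then have its content replaced.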

The Content Controls made for a lot of confusion when I first investigated using them since there are multiple types.  For my purposes only one really makes sense and that is the Rich Text.  It can be applied to any area of a document regardless of content.  Even with this capability I found that Content Controls didn’t fit my needs.

I realize that Microsoft views Content Controls as more reliable because they are not tied to a schema that defines a strict document, but that isn’t how we use the schema.  Our schemas are a list of custom identifiers which the users can reliably apply to a document.

As an example, the name of a company may need to be inserted 100 times in random locations throughout a document.  This doesn’t fit a tightly structured schema, but it is also risky to have the person editing the template enter an identifier by hand for each Content Control.

Accurate naming is just as important as, if not more important than, identifying the type of objects contained in a tag when processing a template.  Add to this the fact that you may need to identify sections by business usage, and that those sections can contain multiple child objects such as charts, tables, images and paragraphs.

Between lawsuits and a change in approach by Microsoft it seems that Custom XML is going away.  Microsoft needs to supply more guidance as to what approaches should be taken for marking up templates in a consistent fashion.  They also need to come up with a solution for what I see as a major weakness of Content Controls for this type of usage.

Lastly, both features are only available in Word.  We do more template work with PowerPoint than we do with Word, which puts our development at a disadvantage.  People want this data to “present” and therefore need to be able to generate “presentations”.  Getting similar functionality in PowerPoint is essential.  Hopefully we will see these improvements going forward.