Author Archives: Eric Garlic

Python Module & Notebook to Flatten MS Project and Create Excel Output

In this quick post I am using the function demonstrated in this post.  I have wrapped it into a module and used it in a notebook to read a MS Project file, then output the results of the flattened MS Project to excel workbooks which I then formatted using a quick and dirty self made excel formatting module (code can be found here).

The input MS Project looks like this:


And produces out like this:


Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.


Flatten Microsoft Project to Pandas DataFrame

This is an example of a function I use a lot.  The scenario is that many team members that need to see their tasks in MS Project either do not have MS Project and/or don’t want to use MS Project but are quite happy to use excel spreadsheets.  I use this function, alongside a couple of others that I will publish later, to “Flatten” an MS Project file, place the contents in a Python Pandas DataFrame, manipulate the Pandas DataFrame to get subsets of tasks I want to publish and output these to excel (typically) or to word or PDF.  By “Flatten” I mean to collapse the summary tasks above a single to one line as follows:


MS Project File

…to Pandas Table….


The following Jupyter Notebook performed this transformation:


Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

It looks like a big file but only because I have a number of comments added.  Let’s break it down.

I won’t go over the first three notebook cells, I would be repeating earlier blogs on using win32com and generation of constants.

Let’s get to the function “create_project_data_frame” and the points to note:

  • Function Arguments
    • I have explained these in the notebook above
  • Initialise some variables to used by the function
    • projectDataFrame = pd.DataFrame(columns=headers)
      • Creates an empty Pandas DataFrame with column headers (from the KeyWord “header”).  There are some mandatory headers:  I see these as outputs required from any Project file.
        • UniqueID
          • Note I have used UniqueID rather than task ID.  That’s because they never change
        • SummaryTask
          • This is the header where the summary level tasks to the task being output are added e.g. “Level 1 > Level 2 > Level 3”
        • Name
          • Name of the task
        • Start“,
        • Finish“,
        • % Complete
      • Other headers can be added but you must use the field names.  IN the example above I added
        "Resource Names", "Notes", "Predecessors", "Text1"
    • summary_tasks_to_task = []
      • This list stores the current summary level e.g. [“Level 1”, “Level 2”, “Level 3]
    • task_collection= project.Tasks
      • Tasks is a collection object from the project application.  We use it to hold all the tasks in the projects that we then iterate over.
  • We then iterate over the collection of tasks with “for t in task_collection:
  • We only want to capture information on tasks, not summary tasks. “if (not t.Summary) & ~(t.UniqueID in UniqueIDs_to_Ignore). On occassions I need to ignore some tasks.  The function is provide a list of UniqueIDs to ignore, hence “~(t.UniqueID in UniqueIDs_to_Ignore)“.
  • Collecting Dependencies:


    • This steps through the TaskDependencies collection and appends them to a list called dep
  • Similarly, a task can have many resources associated with it and this routine iterates through the object “Assignments” and appends the resources to a list called “res


  • Sometimes tasks mid way through a project have no summary tasks.  Not good, but it happens.  This routine clears the summary_task_to_task list to empty


  • This code then starts the creation of a temporary list of strings that will create a task row in the pandas dataframe:


  • This code then iterates over the list of headers requested and appends the values to the temp list above.  The last bit of code appends the string to the pandas dataframe


  • The next code handles what to do with a task that is a summary task (t.Summary).  If the summary task is at a level greater than the previous is appended to the list summary_tasks_to_task, if it is smaller the last summary task is popped from the list.


  • The last bit of code in this notebook cell sets the index of the dataframe to the UniqueID and then converts the Finish and Start columns to datatime types (rather than text).  The dataframe is then returned.


  • The last cell create the headers list, creates is of UniqueIDs to be ignored and then calls the routine above and assigns the returned Pandas DataFrame to the variable “frame

Calling the Function

Once you have a Pandas data it is easy to manipulate the data and create alternative dataframes with subsets of data that can be output to excel spreadsheets.  For example I produce filtered dataframe by resource name then create a workbook for each resource that contains only their tasks.  It is possible to create separate worksheets for each day or week or month.

Reading and Interpreting MSDN Object Model Reference

My initial months working with win32com saw me searching and reading MSDN Object Model References, reading other peoples win32com python code, and trawling VBA snippets whilst bashing away in a python IDE trying to get the function I was after. So here are my tips in reading the MSDN object references with the intent of replicating the functionality using win32com in Python.

The structure is roughly:



Let’s start simple and working from where this article left off but with constants.

Step 1: Import modules and constants

import win32com.client
from MSOconstants.MSO import constants as MSO
from MSOconstants.MSPPT import constants as MSPPT

Step 2: Create and set-up references to a powerpoint application, presentation, slide, add a diamond and save the presentation.

pptApp= win32com.client.gencache.EnsureDispatch (“Powerpoint.Application”)
presentation = pptApp.Presentations.Add()
slide = presentation.Slides.Add(1, MSPPT.ppLayoutBlank)
myDiamond = slide.Shapes.AddShape(MSO.msoShapeDiamond, Top=100,Left=100, Width=20, Height=20)
presentation.SaveAs(“_myPowerPoint”, MSPPT.ppSaveAsPresentation)

So far so good. I have a powerpoint slide with diamond thus:


Let’s say I want to add a bar to the slide that represents timeline between two dates such that I can place myDiamond on the timeline at a place that represents that point in time along the timeline; say 1/4 along the timeline.

I need to:

  • Determine Slide Dimension
  • Draw a box the width of the slide using the slide dimension
  • place myDiamond 1/4 away along the box

Step 3: Determine Slide Dimension

A slide is dimensioned in pixels. I want to get the dimensions of my slide. I google “msdn powerpoint slide width” and get this link as the first entry.


So we need an object that represents the presentation “Application.ActivePresentation” so that we can call the PageSetup objects SlideWidth property. In our case we have a reference to the presentation in the variable “presentation“. So we call the presentation.PageSetup.SlideWidth property


for me it is 960 pixels.

Note: we can’t use “with” statements in Python.

Step 4: Draw a box the width of the slide using the slide dimension

I google “msdn powerpoint add shape” and get this link first


And hyperlink off to MsoAutoShapeType ctrl+f to find msoShapeRoundedRectangle


create a rectangle from left=0, 50 pixels down from the top, 960 pixels wide and 10 pixels in height:

RRect = slide.Shapes.AddShape(MSO.msoShapeRoundedRectangle, 0, 50, 960, 10)


Step 5: Place myDiamond 1/4 away along the box

Google “msdn powerpoint shape position” and get this first link

scroll down the page and you’ll see that it has a property called “left” and “top”

I want myDiamond 1/4 along 960 pixels and in the centre of 10 pixels high box that is 50 pixels down.

myDiamond.Top = 50-10/2




Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

How to Generate Python Modules of MS Office Constants

I spent ages trying to work out how to create a .py module that contained the MS Office enumeration constants as referenced in this article s-anands-anand-1

I solved it with reference to this entry in stackoverflow

The snippet below creates four python module for each of Microsoft Office, Excel, Word and Powerpoint in the “MSOconstants” directory of my Jupyter Lab notebook location (I’m using Jupyter Lab notebooks.  You can reference any folder location of your choice).


Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

I hope this save you some time and frustration.

Note:  if you have Microsoft Project then you can create the equivalent by referencing “Microsoft Office 14.0 Object Library” (14.0 is the version I happen to have, you may have a different version).  This is important as in a later blog I will outlining how to use win32com with MS Project.

Using win32com To Work with Microsoft Office Applications

Requisites: Installing win32com

References:  Comprehensive Documentation on win32Com  This is a great reference but takes some reading

The following code will open a new PowerPoint document, add a diamond and save it.

import win32com.client
pptApp= win32com.client.gencache.EnsureDispatch (“Powerpoint.Application”)
presentation = pptApp.Presentations.Add()
slide = presentation.Slides.Add(1, 12)
myDiamond = slide.Shapes.AddShape(4, Top=100,Left=100, Width=20, Height=20)

So let’s break this down:

import win32com.client

Here we are importing the key module that will allow us to access Microsoft Office applications through the COM object. 

But what is a COM object?  It stands for Component Object Model, which, to me at least, means it gives , through an interface (COM object), to the Microsoft Office applications and their associated methods and attributes.

pptApp = win32com.client.gencache.EnsureDispatch (“Powerpoint.Application”)

In this line we are creating a reference (“pptApp“, you can call it whatever you want) to the powerpoint application

presentation = pptApp.Presentations.Add()

This line adds a presentation to the Presentations collection.

Slide = presentation.Slides.Add(1, 12)

This adds a slide to the Slides collection of the presentation.

The numbers  1 & 12 refer the slide index and layout respectively; in this example index=1 and layout = “blank”. 

The “12” may appear a bit cryptic, and it is.  12 is the number which if you look in MSDN enumeration represents a blank layout: ppslidelayout-enumeration-powerpoint

It is possible to use numbers and “google” the MSDN enumeration however, using constants makes for much readable code.  See this blog for creating a list of constants How to Generate Python Modules of MS Office Constants:

E.G: Slide = presentation.Slides.Add(1, MSOconstants.MSPPT.ppLayoutBlank).  The constant is ppLayoutBlank (the MSOconstants.MSPPT is the folder and python module where I have the constants held)

myDiamond = slide.Shapes.AddShape(4, Top=100,Left=100, Width=20, Height=20)

This lines creates a diamond (enumeration is the number 4; MSOconstants.MSO.msoShapeDiamond), 100 pixles from the top, 100 pixels from the left and 20×20 pixels in size.


This line saves the presentation as “myPowerPoint” and the 1 is enumeration of “powerpoint” format.



There are a number of variations but this works (you can read more here pywin32-docs-win32com-QuickStartClientCom)

See Reference Section for source material I found useful.


The Journey Begins

Thanks for joining me!

This blogging business is hard work, at least harder than I thought is was going to be.  It is only once I started writing my first blog page that I realised how much I knew, took for granted and/or I felt I needed to explain.  I find it helps to have a picture of the reader in mind.  My reader is someone like myself; knows a little something about the Python language, hunts around for quick fixes (and gets frustrated that they can’t remember them from one week to the next) and believes Python and general script automation should be able to address some of the long winded or repetitive tasks encountered work and, specifically in my case, a project manager.

I have no special talents. I am only passionately curious. — Einstein



Let the journey begin…Posts Page