Flatten Microsoft Project to Pandas DataFrame

This is an example of a function I use a lot.  The scenario is that many team members that need to see their tasks in MS Project either do not have MS Project and/or don’t want to use MS Project but are quite happy to use excel spreadsheets.  I use this function, alongside a couple of others that I will publish later, to “Flatten” an MS Project file, place the contents in a Python Pandas DataFrame, manipulate the Pandas DataFrame to get subsets of tasks I want to publish and output these to excel (typically) or to word or PDF.  By “Flatten” I mean to collapse the summary tasks above a single to one line as follows:

MSP_snag_1

MS Project File

…to Pandas Table….

Pandas_table_1

The following Jupyter Notebook performed this transformation:



Loading

Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

It looks like a big file but only because I have a number of comments added.  Let’s break it down.

I won’t go over the first three notebook cells, I would be repeating earlier blogs on using win32com and generation of constants.

Let’s get to the function “create_project_data_frame” and the points to note:

  • Function Arguments
    • I have explained these in the notebook above
  • Initialise some variables to used by the function
    • projectDataFrame = pd.DataFrame(columns=headers)
      • Creates an empty Pandas DataFrame with column headers (from the KeyWord “header”).  There are some mandatory headers:  I see these as outputs required from any Project file.
        • UniqueID
          • Note I have used UniqueID rather than task ID.  That’s because they never change
        • SummaryTask
          • This is the header where the summary level tasks to the task being output are added e.g. “Level 1 > Level 2 > Level 3”
        • Name
          • Name of the task
        • Start“,
        • Finish“,
        • % Complete
      • Other headers can be added but you must use the field names.  IN the example above I added
        "Resource Names", "Notes", "Predecessors", "Text1"
    • summary_tasks_to_task = []
      • This list stores the current summary level e.g. [“Level 1”, “Level 2”, “Level 3]
    • task_collection= project.Tasks
      • Tasks is a collection object from the project application.  We use it to hold all the tasks in the projects that we then iterate over.
  • We then iterate over the collection of tasks with “for t in task_collection:
  • We only want to capture information on tasks, not summary tasks. “if (not t.Summary) & ~(t.UniqueID in UniqueIDs_to_Ignore). On occassions I need to ignore some tasks.  The function is provide a list of UniqueIDs to ignore, hence “~(t.UniqueID in UniqueIDs_to_Ignore)“.
  • Collecting Dependencies:

dependencies

    • This steps through the TaskDependencies collection and appends them to a list called dep
  • Similarly, a task can have many resources associated with it and this routine iterates through the object “Assignments” and appends the resources to a list called “res

Resources

  • Sometimes tasks mid way through a project have no summary tasks.  Not good, but it happens.  This routine clears the summary_task_to_task list to empty

No_Summary

  • This code then starts the creation of a temporary list of strings that will create a task row in the pandas dataframe:

OutputTask_1

  • This code then iterates over the list of headers requested and appends the values to the temp list above.  The last bit of code appends the string to the pandas dataframe

OutputTask_2.PNG

  • The next code handles what to do with a task that is a summary task (t.Summary).  If the summary task is at a level greater than the previous is appended to the list summary_tasks_to_task, if it is smaller the last summary task is popped from the list.

OutlineSummary

  • The last bit of code in this notebook cell sets the index of the dataframe to the UniqueID and then converts the Finish and Start columns to datatime types (rather than text).  The dataframe is then returned.

ClosingRoutine

  • The last cell create the headers list, creates is of UniqueIDs to be ignored and then calls the routine above and assigns the returned Pandas DataFrame to the variable “frame

Calling the Function

Once you have a Pandas data it is easy to manipulate the data and create alternative dataframes with subsets of data that can be output to excel spreadsheets.  For example I produce filtered dataframe by resource name then create a workbook for each resource that contains only their tasks.  It is possible to create separate worksheets for each day or week or month.

12 thoughts on “Flatten Microsoft Project to Pandas DataFrame

  1. Pingback: Python Module & Notebook to Flatten MS Project and Create Excel Output | Project Managers Toolkit

  2. taw

    I am getting this error. after running the code
    # Generate Constants
    MSP= “Microsoft Project 14.0 Object Library”
    where = “./MSPRJ.py”
    from win32com.client import makepy
    import sys
    sys.argv=[”,”-o”,where, MSP]
    makepy.main()
    import MSPRJ
    g = globals()
    for c in dir(MSPRJ.constants):
    g[c] = getattr(MSPRJ.constants,c)
    Could not locate a type library matching ‘Microsoft Project 14.0 Object Library’
    —————————————————————————
    AttributeError Traceback (most recent call last)
    in
    10 import MSPRJ
    11 g = globals()
    —> 12 for c in dir(MSPRJ.constants):
    13 g[c] = getattr(MSPRJ.constants,c)
    AttributeError: module ‘MSPRJ’ has no attribute ‘constants’

    Like

    Reply
      1. taw

        that is the file path for my start up of ms project
        C:\ProgramData\Microsoft\Windows\Start Menu\Programs
        it was installed as part of office 365 suite.Version 2016.

        Like

      2. Eric Garlic Post author

        hi, I’m not at my pc but i hope this helps.

        1. you need to own a copy of ms project and have it installed.

        2. if you do, then you need to check the object library reference. the image looks like this:

        https://goo.gl/images/FhKyST

        i usually do this via the developer tab. there maybe other ways. trying googling.

        Like

  3. Tim Newman

    This is ace thanks!

    I think in 2016 there is now just one common office reference. I.e:
    MSP= “Microsoft Project 14.0 Object Library” -> MSP= “Microsoft Office 16.0 Object Library”

    Still getting an error on the constants bit though. There is certainly a class constants in the MSPRJ.py file. Any thoughts?

    Like

    Reply
  4. Tim Newman

    AHHHHH Found it. Turns out my project install is corrupted and I have “Microsoft Project . Object Library” which, as you might imagine, it doesn’t like 🙂

    Like

    Reply
      1. Tim Newman

        Ha thanks. I solved *that* issue. They changed a bunch of stuff in 2019.

        Turns out that that is the correct library for 2019 but pywin32 doesn’t seem to be able to reference it the same way.

        I managed to get the makepy to work, I think, by referencing the file direct but now the Dispatcher won’t work. Because it can’t run makepy. I can’t work out how to get it to reference the py I’ve created. If you have any tips, great, if not I’ll let you know if/when I get it working.

        Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s