Category Archives: Pandas

Python Module & Notebook to Flatten MS Project and Create Excel Output

In this quick post I am using the function demonstrated in this post.  I have wrapped it into a module and used it in a notebook to read a MS Project file, then output the results of the flattened MS Project to excel workbooks which I then formatted using a quick and dirty self made excel formatting module (code can be found here).

The input MS Project looks like this:

MSProject

And produces out like this:



Loading

Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

 

Flatten Microsoft Project to Pandas DataFrame

This is an example of a function I use a lot.  The scenario is that many team members that need to see their tasks in MS Project either do not have MS Project and/or don’t want to use MS Project but are quite happy to use excel spreadsheets.  I use this function, alongside a couple of others that I will publish later, to “Flatten” an MS Project file, place the contents in a Python Pandas DataFrame, manipulate the Pandas DataFrame to get subsets of tasks I want to publish and output these to excel (typically) or to word or PDF.  By “Flatten” I mean to collapse the summary tasks above a single to one line as follows:

MSP_snag_1

MS Project File

…to Pandas Table….

Pandas_table_1

The following Jupyter Notebook performed this transformation:



Loading

Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

It looks like a big file but only because I have a number of comments added.  Let’s break it down.

I won’t go over the first three notebook cells, I would be repeating earlier blogs on using win32com and generation of constants.

Let’s get to the function “create_project_data_frame” and the points to note:

  • Function Arguments
    • I have explained these in the notebook above
  • Initialise some variables to used by the function
    • projectDataFrame = pd.DataFrame(columns=headers)
      • Creates an empty Pandas DataFrame with column headers (from the KeyWord “header”).  There are some mandatory headers:  I see these as outputs required from any Project file.
        • UniqueID
          • Note I have used UniqueID rather than task ID.  That’s because they never change
        • SummaryTask
          • This is the header where the summary level tasks to the task being output are added e.g. “Level 1 > Level 2 > Level 3”
        • Name
          • Name of the task
        • Start“,
        • Finish“,
        • % Complete
      • Other headers can be added but you must use the field names.  IN the example above I added
        "Resource Names", "Notes", "Predecessors", "Text1"
    • summary_tasks_to_task = []
      • This list stores the current summary level e.g. [“Level 1”, “Level 2”, “Level 3]
    • task_collection= project.Tasks
      • Tasks is a collection object from the project application.  We use it to hold all the tasks in the projects that we then iterate over.
  • We then iterate over the collection of tasks with “for t in task_collection:
  • We only want to capture information on tasks, not summary tasks. “if (not t.Summary) & ~(t.UniqueID in UniqueIDs_to_Ignore). On occassions I need to ignore some tasks.  The function is provide a list of UniqueIDs to ignore, hence “~(t.UniqueID in UniqueIDs_to_Ignore)“.
  • Collecting Dependencies:

dependencies

    • This steps through the TaskDependencies collection and appends them to a list called dep
  • Similarly, a task can have many resources associated with it and this routine iterates through the object “Assignments” and appends the resources to a list called “res

Resources

  • Sometimes tasks mid way through a project have no summary tasks.  Not good, but it happens.  This routine clears the summary_task_to_task list to empty

No_Summary

  • This code then starts the creation of a temporary list of strings that will create a task row in the pandas dataframe:

OutputTask_1

  • This code then iterates over the list of headers requested and appends the values to the temp list above.  The last bit of code appends the string to the pandas dataframe

OutputTask_2.PNG

  • The next code handles what to do with a task that is a summary task (t.Summary).  If the summary task is at a level greater than the previous is appended to the list summary_tasks_to_task, if it is smaller the last summary task is popped from the list.

OutlineSummary

  • The last bit of code in this notebook cell sets the index of the dataframe to the UniqueID and then converts the Finish and Start columns to datatime types (rather than text).  The dataframe is then returned.

ClosingRoutine

  • The last cell create the headers list, creates is of UniqueIDs to be ignored and then calls the routine above and assigns the returned Pandas DataFrame to the variable “frame

Calling the Function

Once you have a Pandas data it is easy to manipulate the data and create alternative dataframes with subsets of data that can be output to excel spreadsheets.  For example I produce filtered dataframe by resource name then create a workbook for each resource that contains only their tasks.  It is possible to create separate worksheets for each day or week or month.