PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`

PerformanceWarning:
DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance.
Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`.

Let’s understand the warning in detail with an example.

Imagine you have a dataframe called frame with the following columns:

     column_A | column_B | column_C
    -------------------------------
        1     |    2     |   3
        4     |    5     |   6
        7     |    8     |   9
  

If you need to insert a new column column_D in between column_B and column_C,
you might be tempted to use the frame.insert method like this:

    frame.insert(2, "column_D", [10, 11, 12])
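
A single call like the one above does not trigger the warning by itself. It only appears once columns have been added one at a time, many times over. A minimal sketch that should reproduce it, assuming a pandas version that emits this warning; the empty starting frame, the `col_i` names, and the count of 150 are made up for illustration:

```python
import warnings

import pandas as pd

# Hypothetical frame for illustration: three rows, no columns yet.
frame = pd.DataFrame(index=range(3))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of deduplicating
    for i in range(150):
        # Each call appends one column at the end of the frame.
        frame.insert(len(frame.columns), f"col_{i}", [i, i + 1, i + 2])

# Keep only the fragmentation-related warnings.
fragmented = [w for w in caught if issubclass(w.category, pd.errors.PerformanceWarning)]
print(f"PerformanceWarning raised: {len(fragmented) > 0}")
```

The exact number of inserts needed before pandas complains is an internal detail, so 150 is simply chosen to be comfortably large.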
  

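A single call works, but each insert adds a separate internal block to the frame. After many calls (for example, adding columns one at a time in a loop, or repeating `frame[name] = values`), the data is spread across many small blocks, which slows later operations and triggers the warning.
Instead, build the new columns first and join them in one step using pd.concat(axis=1). For a single column, that looks like this: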

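    newframe = pd.concat([frame, pd.Series([10, 11, 12], name="column_D", index=frame.index)], axis=1)
    newframe = newframe[["column_A", "column_B", "column_D", "column_C"]]  # place column_D between column_B and column_C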
  

Because the columns are joined in a single operation rather than inserted one by one, the repeated inserts that cause fragmentation never happen.
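
The benefit shows up when there are many columns to add, not just one. The sketch below collects a batch of hypothetical columns in a dict (the `col_i` names and the count of 150 are assumptions for illustration), turns them into one frame, and joins it to the example frame with a single pd.concat call:

```python
import warnings

import pandas as pd

frame = pd.DataFrame(
    {"column_A": [1, 4, 7], "column_B": [2, 5, 8], "column_C": [3, 6, 9]}
)

# Hypothetical extra columns, collected first instead of inserted one by one.
extra = {f"col_{i}": [i, i + 1, i + 2] for i in range(150)}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Build the new columns as one frame on the same index, then join once.
    newframe = pd.concat([frame, pd.DataFrame(extra, index=frame.index)], axis=1)

fragmented = [w for w in caught if issubclass(w.category, pd.errors.PerformanceWarning)]
print(newframe.shape)
print(f"PerformanceWarning raised: {len(fragmented) > 0}")
```

Passing `index=frame.index` matters because pd.concat aligns rows by index label. If the new columns carried a different index, the result would contain extra rows filled with NaN.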

In summary, the warning tells you to avoid calling frame.insert repeatedly, because each call fragments the frame a little more.
Instead, use pd.concat(axis=1) to join all new columns at once.
If you already have a fragmented frame, newframe = frame.copy() gives you a de-fragmented copy to keep working with.
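
The copy() advice is for a frame that is already fragmented. A rough way to see the effect without touching pandas internals is to check whether one more insert still warns. This is a sketch under the same assumptions as before; `warns_on_insert` is a hypothetical helper written for this post, not a pandas function:

```python
import warnings

import pandas as pd


def warns_on_insert(df: pd.DataFrame, name: str) -> bool:
    """Insert one column into df and report whether a PerformanceWarning was raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        df.insert(len(df.columns), name, [0] * len(df))
    return any(issubclass(w.category, pd.errors.PerformanceWarning) for w in caught)


# Fragment a hypothetical frame on purpose, silencing the warning while doing so.
frame = pd.DataFrame(index=range(3))
with warnings.catch_warnings():
    warnings.simplefilter("ignore", pd.errors.PerformanceWarning)
    for i in range(150):
        frame.insert(len(frame.columns), f"col_{i}", [i, i + 1, i + 2])

newframe = frame.copy()  # deep copy by default

still_warns = warns_on_insert(frame, "extra")       # original, fragmented frame
copy_warns = warns_on_insert(newframe, "extra")     # de-fragmented copy
print(f"original warns: {still_warns}, copy warns: {copy_warns}")
print(f"same data: {newframe.equals(frame)}")
```

If pandas behaves as the warning text describes, the original frame should keep warning while the copy should not, even though both hold the same data.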
