PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use newframe = frame.copy()
Let’s understand the warning in detail with an example.
Imagine you have a dataframe called frame with the following columns:

column_A | column_B | column_C
-------------------------------
1        | 2        | 3
4        | 5        | 6
7        | 8        | 9
If you need to insert a new column column_D between column_B and column_C, you might be tempted to use the frame.insert method like this:
frame.insert(2, "column_D", [10, 11, 12])
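To try the single insert yourself, here is a minimal, runnable version of the example, assuming pandas is installed and imported as pd. It rebuilds the frame from the table above and checks where column_D ends up:

```python
import pandas as pd

# The example frame from the table above
frame = pd.DataFrame({
    "column_A": [1, 4, 7],
    "column_B": [2, 5, 8],
    "column_C": [3, 6, 9],
})

# Insert column_D at position 2, i.e. between column_B and column_C.
# DataFrame.insert modifies the frame in place and returns None.
frame.insert(2, "column_D", [10, 11, 12])

print(list(frame.columns))
# ['column_A', 'column_B', 'column_D', 'column_C']
```

Note that insert works in place, so there is no need to assign its result back to frame.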
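A single call like this is fine. The problem appears when insert is called repeatedly, for example inside a loop that adds one column per iteration: pandas stores each newly inserted column in its own internal block instead of merging it with the existing ones, so the dataframe becomes more fragmented with every call and many operations on it get slower. Once the number of blocks grows large, pandas emits the PerformanceWarning shown above.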
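The fragmentation only shows up after many inserts. The sketch below is a minimal reproduction, assuming a pandas version that includes this warning. The column names extra_0 to extra_149 are hypothetical, and 150 is simply a number chosen to be comfortably large; the exact point at which pandas starts warning is an internal detail that may vary between versions. Warnings are recorded rather than printed so they can be counted.

```python
import warnings

import pandas as pd

frame = pd.DataFrame({
    "column_A": [1, 4, 7],
    "column_B": [2, 5, 8],
    "column_C": [3, 6, 9],
})

# Record every warning raised inside the block instead of printing it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Each insert adds one hypothetical column (extra_0 ... extra_149) at the end.
    # Every new column lands in its own internal block, so the frame
    # grows more fragmented on each iteration.
    for i in range(150):
        frame.insert(frame.shape[1], f"extra_{i}", [i, i, i])

# Keep only the fragmentation warnings.
fragmented = [
    w for w in caught if issubclass(w.category, pd.errors.PerformanceWarning)
]
print(len(fragmented))
```

On versions that include this check, fragmented should be non-empty, and each recorded message is the text quoted at the top of this post.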
Instead, a better approach is to join all columns at once using pd.concat(axis=1). Here’s an example:
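newframe = pd.concat([frame, pd.Series([10, 11, 12], name="column_D")], axis=1)
newframe = newframe[["column_A", "column_B", "column_D", "column_C"]]  # concat appends column_D at the end; move it between column_B and column_C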
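Because concat builds the new dataframe in a single operation, it avoids the block-by-block growth that repeated insert calls cause. The benefit is largest when you have many new columns: collect them first (for example in a dict, a list of Series, or a second dataframe) and pass them all to one pd.concat call.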
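Here is that idea applied to many columns, as a sketch using hypothetical columns extra_0 to extra_149: all new columns are collected into one dataframe first and then joined with a single pd.concat(axis=1) call. Passing index=frame.index is deliberate, since concat aligns rows by index and a mismatched index would produce NaN values.

```python
import warnings

import pandas as pd

frame = pd.DataFrame({
    "column_A": [1, 4, 7],
    "column_B": [2, 5, 8],
    "column_C": [3, 6, 9],
})

# Build all new (hypothetical) columns up front, reusing frame.index
# so the rows line up during concatenation.
new_columns = pd.DataFrame(
    {f"extra_{i}": [i, i, i] for i in range(150)},
    index=frame.index,
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # One concat call instead of 150 separate inserts.
    newframe = pd.concat([frame, new_columns], axis=1)

fragmented = [
    w for w in caught if issubclass(w.category, pd.errors.PerformanceWarning)
]
print(newframe.shape)  # (3, 153)
```

The result has the same 153 columns as the loop of inserts, but it is produced without triggering the fragmentation warning.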
In summary, the warning tells you to avoid calling frame.insert many times if you want to keep good performance. Instead, use pd.concat(axis=1) to join all new columns at once, which produces a de-fragmented dataframe.
If a dataframe has already become fragmented, the warning's own suggestion applies: newframe = frame.copy() returns a de-fragmented copy whose columns are consolidated internally.
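A sketch of the de-fragmenting step, again with hypothetical extra_0 to extra_149 columns: the frame is fragmented on purpose (with the warning silenced), copied, and then one more column is inserted into the copy to check that no PerformanceWarning appears. This relies on copy() consolidating the internal blocks, as the warning message describes; the internals behind it may differ across pandas versions.

```python
import warnings

import pandas as pd

frame = pd.DataFrame({"column_A": [1, 4, 7]})

# Fragment the frame on purpose and silence the warnings this produces.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", pd.errors.PerformanceWarning)
    for i in range(150):
        frame.insert(frame.shape[1], f"extra_{i}", [i, i, i])

# copy() returns a new frame whose columns are consolidated internally;
# the fragmented original is left unchanged.
newframe = frame.copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # A single insert into the consolidated copy should not warn.
    newframe.insert(newframe.shape[1], "column_D", [10, 11, 12])

fragmented = [
    w for w in caught if issubclass(w.category, pd.errors.PerformanceWarning)
]
print(len(fragmented))
```

Copying is a cheap fix after the fact, but building the columns with one concat call avoids the fragmentation in the first place.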