5👍
✅
First we use Series.shift
with Series.cumsum
to make a group indicator for each consecutive label
value.
Then we use groupby.agg
with min
and max
.
label_groups = bins_df['label'].ne(bins_df['label'].shift()).cumsum()
df = (
bins_df.groupby(label_groups).agg({'start':'min', 'end':'max', 'label':'first'})
.reset_index(drop=True)
)
start end label
0 2020-01-12 00:00:00 2020-01-13 00:00:00 t3
1 2020-01-13 00:00:00 2020-01-13 07:00:00 t2
2 2020-01-13 07:00:00 2020-01-13 15:30:00 t1
3 2020-01-13 15:30:00 2020-01-14 07:00:00 t2
4 2020-01-14 07:00:00 2020-01-14 15:30:00 t1
5 2020-01-14 15:30:00 2020-01-15 07:00:00 t2
6 2020-01-15 07:00:00 2020-01-15 15:30:00 t1
7 2020-01-15 15:30:00 2020-01-16 07:00:00 t2
8 2020-01-16 07:00:00 2020-01-16 15:30:00 t1
9 2020-01-16 15:30:00 2020-01-17 07:00:00 t2
10 2020-01-17 07:00:00 2020-01-17 15:30:00 t1
11 2020-01-17 15:30:00 2020-01-19 00:00:00 t2
Source:stackexchange.com