Getting valid time spans from timestamped data using Python Pandas.

December 17, 2017

Assume we have a pandas DataFrame indexed by time.

2017-12-15 00:00:00 .....
2017-12-15 00:00:15 .....
2017-12-15 00:00:29 .....
2017-12-15 00:00:46 .....
2017-12-15 03:31:01 .....
2017-12-15 03:31:17 .....
2017-12-15 03:31:29 .....
2017-12-15 03:32:16 .....

Suppose we want to get time spans from this data, under the condition that consecutive rows whose interval is no longer than a threshold are merged into one span.

start                end
2017-12-15 00:00:00  2017-12-15 00:00:46
2017-12-15 03:31:01  2017-12-15 03:32:16

You can do it like this:

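import numpy as np
import pandas as pd
from datetime import timedelta

# Longest gap allowed inside one span. Note: the sample rows above have a
# 47-second gap before the last row, so reproducing the table above needs a
# larger value (e.g. 60 seconds).
threshold = timedelta(seconds=30)

# Assumes the first CSV column holds the timestamps.
df = pd.read_csv("original_data.csv", index_col=0, parse_dates=True)

# True where the gap between a row and the previous row exceeds the threshold.
intervals = df.index[1:] - df.index[:-1]
long_gap = intervals > threshold

# HEAD: the row follows a long gap (the first row always counts).
# TAIL: the row precedes a long gap (the last row always counts).
df["HEAD"] = np.concatenate(([True], long_gap))
df["TAIL"] = np.concatenate((long_gap, [True]))

# Keep only rows that are exactly one of HEAD or TAIL.
_df = df[df["HEAD"] ^ df["TAIL"]]

result = pd.DataFrame({"start": _df[_df["HEAD"]].index, "end": _df[_df["TAIL"]].index}, columns=["start", "end"])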

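The point is to mark each row that follows a long interval as "HEAD" and each row that precedes a long interval as "TAIL", then remove the rows where both flags are false (rows inside a span) or both are true (isolated rows). What remains alternates between the start and the end of each span.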
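To try the idea without a CSV file, here is a minimal sketch that applies the same HEAD/TAIL marking to the sample timestamps from the top of this post. The helper name `find_spans` is made up for this example and is not part of pandas. It uses a 60-second threshold, because the last gap in the sample data (03:31:29 to 03:32:16) is 47 seconds.

```python
from datetime import timedelta

import numpy as np
import pandas as pd


def find_spans(index, threshold):
    """Return start/end pairs for runs of timestamps with gaps <= threshold.

    `find_spans` is a hypothetical helper for illustration only.
    Isolated timestamps (a long gap on both sides) are dropped.
    """
    index = pd.DatetimeIndex(index)
    # True where the gap to the previous timestamp exceeds the threshold.
    long_gap = np.asarray(index[1:] - index[:-1] > threshold)
    head = np.concatenate(([True], long_gap))  # follows a long gap
    tail = np.concatenate((long_gap, [True]))  # precedes a long gap
    keep = head ^ tail  # exactly one flag set: a span boundary
    return pd.DataFrame({"start": index[keep & head], "end": index[keep & tail]})


# Sample timestamps from the top of this post.
sample = pd.to_datetime([
    "2017-12-15 00:00:00",
    "2017-12-15 00:00:15",
    "2017-12-15 00:00:29",
    "2017-12-15 00:00:46",
    "2017-12-15 03:31:01",
    "2017-12-15 03:31:17",
    "2017-12-15 03:31:29",
    "2017-12-15 03:32:16",
])

spans = find_spans(sample, timedelta(seconds=60))
print(spans)
```

With 60 seconds this gives the two spans shown in the table above. With 30 seconds the second span ends at 03:31:29 instead, and the last row is dropped as isolated data.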
