Getting valid time spans from timestamped data using Python Pandas.

Assume we have a pandas DataFrame indexed by timestamps:
2017-12-15 00:00:00.....
2017-12-15 00:00:15.....
2017-12-15 00:00:29.....
2017-12-15 00:00:46.....
2017-12-15 03:31:01.....
2017-12-15 03:31:17.....
2017-12-15 03:31:29.....
2017-12-15 03:32:16.....
Suppose we want to extract time spans from this data, treating consecutive rows as part of the same span whenever the interval between them is below a threshold:
2017-12-15 00:00:00    2017-12-15 00:00:46
2017-12-15 03:31:01    2017-12-15 03:32:16
You can do it like this:
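import numpy as np
import pandas as pd
from datetime import timedelta

threshold = timedelta(seconds=30)

# Assumes the timestamps are in the first CSV column; parse them and use them as the index.
df = pd.read_csv("original_data.csv", index_col=0, parse_dates=True)

# Gap between each row and the one before it (one element shorter than df).
intervals = df.index[1:] - df.index[:-1]
long_gap = intervals > threshold

# HEAD: the row starts a span (first row, or the row right after a long gap).
# TAIL: the row ends a span (last row, or the row right before a long gap).
df["HEAD"] = np.append(True, long_gap)
df["TAIL"] = np.append(long_gap, True)

# Keep only rows that are exactly one of HEAD or TAIL.
_df = df[df["HEAD"] ^ df["TAIL"]]

result = pd.DataFrame({"start": _df[_df["HEAD"]].index, "end": _df[_df["TAIL"]].index}, columns=["start", "end"])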
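The point is to mark each row that follows a long interval as "HEAD" and each row that precedes a long interval as "TAIL", then drop the rows where both flags are false (rows in the middle of a span) or both are true (isolated rows). The HEAD rows that remain are the span starts, and the TAIL rows that remain are the span ends.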
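
To see the marking in action, here is a small sketch that runs the same idea on the eight sample timestamps above, without reading a CSV. The "value" column is just a placeholder I made up, and the threshold is set to 60 seconds as an assumption: the last gap in the sample is 47 seconds, so 60 seconds is what reproduces the two spans shown earlier.

```python
import numpy as np
import pandas as pd

# The eight sample timestamps from the post, used as the index.
index = pd.to_datetime([
    "2017-12-15 00:00:00",
    "2017-12-15 00:00:15",
    "2017-12-15 00:00:29",
    "2017-12-15 00:00:46",
    "2017-12-15 03:31:01",
    "2017-12-15 03:31:17",
    "2017-12-15 03:31:29",
    "2017-12-15 03:32:16",
])
# "value" is a hypothetical placeholder column standing in for the real data.
df = pd.DataFrame({"value": range(len(index))}, index=index)

# Assumption: 60 s, so the 47 s gap before the last row does not split the span.
threshold = pd.Timedelta(seconds=60)

# True where the gap to the previous row is longer than the threshold.
long_gap = (df.index[1:] - df.index[:-1]) > threshold

df["HEAD"] = np.append(True, long_gap)   # first row, or row after a long gap
df["TAIL"] = np.append(long_gap, True)   # last row, or row before a long gap

# Drop rows that are neither (inside a span) or both (isolated).
marked = df[df["HEAD"] ^ df["TAIL"]]

result = pd.DataFrame({
    "start": marked[marked["HEAD"]].index,
    "end": marked[marked["TAIL"]].index,
})

print(df[["HEAD", "TAIL"]])
print(result)
```

With a 30 second threshold instead, the last row (03:32:16) would be both HEAD and TAIL, so it would be dropped as isolated data and the second span would end at 03:31:29.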

