Getting valid time spans from timestamped data using Python Pandas.

Assume we have a pandas DataFrame indexed by time:
2017-12-15 00:00:00.....
2017-12-15 00:00:15.....
2017-12-15 00:00:29.....
2017-12-15 00:00:46.....
2017-12-15 03:31:01.....
2017-12-15 03:31:17.....
2017-12-15 03:31:29.....
2017-12-15 03:32:16.....
Suppose we want to extract time spans from this data, with the condition that consecutive rows whose interval is below a threshold are merged into one span:
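start                end
2017-12-15 00:00:00  2017-12-15 00:00:46
2017-12-15 03:31:01  2017-12-15 03:31:29
(With a 30-second threshold, the last row, 03:32:16, comes 47 seconds after the previous one, so it is isolated and does not belong to any span.)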
You can do it like this:
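import numpy as np
import pandas as pd
from datetime import timedelta

threshold = timedelta(seconds=30)

# Assumes the timestamps are in the first CSV column; parse them into a DatetimeIndex.
df = pd.read_csv("original_data.csv", index_col=0, parse_dates=True)

# Interval between each row and the previous one (one element shorter than df).
intervals = df.index[1:] - df.index[:-1]
is_long = np.asarray(intervals > threshold)

# HEAD: row right after a long interval (the first row always qualifies).
df["HEAD"] = np.r_[True, is_long]
# TAIL: row right before a long interval (the last row always qualifies).
df["TAIL"] = np.r_[is_long, True]

# Keep rows that are HEAD or TAIL, but not both.
_df = df[df["HEAD"] ^ df["TAIL"]]

result = pd.DataFrame({"start": _df[_df["HEAD"]].index, "end": _df[_df["TAIL"]].index}, columns=["start", "end"])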
The point is to mark each row that follows a long interval as "HEAD" and each row that precedes a long interval as "TAIL", then drop the rows where both flags are false (rows in the interior of a span) or both are true (isolated rows).
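To try the idea on the sample timestamps without a CSV file, here is a minimal self-contained sketch. The helper name find_spans and the in-memory sample index are my own illustrative assumptions, not part of the original code; the marking logic follows the HEAD/TAIL approach described above.

```python
import numpy as np
import pandas as pd


def find_spans(index, threshold):
    """Hypothetical helper: merge a sorted DatetimeIndex into (start, end) spans.

    Neighbouring timestamps whose interval is not longer than `threshold`
    end up in the same span; isolated timestamps are dropped.
    """
    if len(index) == 0:
        return pd.DataFrame(columns=["start", "end"])
    # True where the interval to the previous timestamp is a long one.
    is_long = np.asarray((index[1:] - index[:-1]) > threshold)
    head = np.r_[True, is_long]   # first row, or row right after a long interval
    tail = np.r_[is_long, True]   # last row, or row right before a long interval
    # A span starts at a head-only row and ends at a tail-only row.
    return pd.DataFrame({"start": index[head & ~tail], "end": index[~head & tail]})


sample = pd.DatetimeIndex([
    "2017-12-15 00:00:00", "2017-12-15 00:00:15",
    "2017-12-15 00:00:29", "2017-12-15 00:00:46",
    "2017-12-15 03:31:01", "2017-12-15 03:31:17",
    "2017-12-15 03:31:29", "2017-12-15 03:32:16",
])

spans_30 = find_spans(sample, pd.Timedelta(seconds=30))
spans_60 = find_spans(sample, pd.Timedelta(seconds=60))
print(spans_30)
print(spans_60)
```

On this sample, the last row is 47 seconds after its predecessor. With a 30-second threshold it is therefore isolated and the second span ends at 03:31:29; with a 60-second threshold it joins the second span, which then ends at 03:32:16.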
