Dec 17, 2017

Getting valid time spans from timestamped data using Python Pandas.

Assume we have a pandas DataFrame indexed by time:
2017-12-15 00:00:00.....
2017-12-15 00:00:15.....
2017-12-15 00:00:29.....
2017-12-15 00:00:46.....
2017-12-15 03:31:01.....
2017-12-15 03:31:17.....
2017-12-15 03:31:29.....
2017-12-15 03:32:16.....
Suppose we want to extract time spans from this data, merging consecutive rows into a single span whenever the interval between them does not exceed a threshold:
start                end
2017-12-15 00:00:00  2017-12-15 00:00:46
2017-12-15 03:31:01  2017-12-15 03:32:16
You can do it like this:

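import numpy as np
import pandas as pd
from datetime import timedelta

threshold = timedelta(seconds=30)

# Assumes the timestamps are in the first CSV column; parse them into a DatetimeIndex.
df = pd.read_csv("original_data.csv", index_col=0, parse_dates=True)
intervals = df.index[1:] - df.index[:-1]
long_gap = np.asarray(intervals > threshold)

# HEAD: the row follows a long interval (the first row always starts a span).
df["HEAD"] = np.concatenate(([True], long_gap))
# TAIL: the row precedes a long interval (the last row always ends a span).
df["TAIL"] = np.concatenate((long_gap, [True]))

# Keep only rows that are exactly one of HEAD or TAIL.
_df = df[df["HEAD"] ^ df["TAIL"]]

result = pd.DataFrame({"start": _df[_df["HEAD"]].index, "end": _df[_df["TAIL"]].index}, columns=["start", "end"])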
The point is to mark each row that follows a long interval as "HEAD" and each row that precedes a long interval as "TAIL", then remove the rows where both flags are false (rows in the middle of a span) or both are true (isolated rows).
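
To try the HEAD/TAIL idea without a CSV file, here is a minimal self-contained sketch built on the eight timestamps from the example. The function name `time_spans` and the `value` column are hypothetical, introduced only for illustration. Note that the last gap in the sample (03:31:29 to 03:32:16) is 47 seconds, so this sketch assumes a 60-second threshold to reproduce the two spans shown earlier; with a 30-second threshold the second span would end at 03:31:29 and the lone 03:32:16 row would be dropped as isolated.

```python
import numpy as np
import pandas as pd

# Timestamps copied from the example; the "value" column is a placeholder.
index = pd.to_datetime([
    "2017-12-15 00:00:00", "2017-12-15 00:00:15",
    "2017-12-15 00:00:29", "2017-12-15 00:00:46",
    "2017-12-15 03:31:01", "2017-12-15 03:31:17",
    "2017-12-15 03:31:29", "2017-12-15 03:32:16",
])
df = pd.DataFrame({"value": range(len(index))}, index=index)


def time_spans(frame, threshold):
    """Return start/end pairs for runs of rows whose gaps are <= threshold."""
    gaps = frame.index[1:] - frame.index[:-1]
    long_gap = np.asarray(gaps > threshold)
    head = np.concatenate(([True], long_gap))  # row follows a long gap
    tail = np.concatenate((long_gap, [True]))  # row precedes a long gap
    keep = head ^ tail  # drops interior rows and isolated single-row spans
    return pd.DataFrame({
        "start": frame.index[keep & head],
        "end": frame.index[keep & tail],
    })


# 60 seconds is an assumption chosen to match the two spans in the example.
spans = time_spans(df, pd.Timedelta(seconds=60))
print(spans)
```

Because isolated rows are both HEAD and TAIL, the XOR removes them entirely; if single-row spans should be kept, selecting on `head` and `tail` directly instead of `keep & head` and `keep & tail` would be one way to do it.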
