Getting valid time spans from timestamped data using Python Pandas.
Assume we have a pandas DataFrame indexed by time, and we want to extract time spans from it: whenever the interval between consecutive rows is shorter than a threshold, those rows belong to the same span. For example, given this data:
    2017-12-15 00:00:00 .....
    2017-12-15 00:00:15 .....
    2017-12-15 00:00:29 .....
    2017-12-15 00:00:46 .....
    2017-12-15 03:31:01 .....
    2017-12-15 03:31:17 .....
    2017-12-15 03:31:29 .....
    2017-12-15 03:32:16 .....
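To see where the spans break, it helps to look at the gap between consecutive rows. The sketch below rebuilds the example timestamps by hand (the `value` column is a hypothetical stand-in for the elided `.....` data) and computes each row's gap to the previous row with `Series.diff`:

```python
import pandas as pd

# Timestamps from the example; "value" is a hypothetical placeholder column
times = pd.to_datetime([
    "2017-12-15 00:00:00", "2017-12-15 00:00:15",
    "2017-12-15 00:00:29", "2017-12-15 00:00:46",
    "2017-12-15 03:31:01", "2017-12-15 03:31:17",
    "2017-12-15 03:31:29", "2017-12-15 03:32:16",
])
df = pd.DataFrame({"value": range(len(times))}, index=times)

# Gap from the previous row; the first row has no predecessor, so it gets NaT
gaps = df.index.to_series().diff()

threshold = pd.Timedelta(seconds=30)
print(gaps.iloc[1:].dt.total_seconds().tolist())
# [15.0, 14.0, 17.0, 12615.0, 16.0, 12.0, 47.0]
print((gaps > threshold).tolist())  # NaT compares as False
# [False, False, False, False, True, False, False, True]
```

With a 30-second threshold, the 47-second gap before 03:32:16 also counts as a break, so keeping 03:31:01 to 03:32:16 as one span needs a threshold of at least 47 seconds.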
The goal is to turn it into these spans:

    start                 end
    2017-12-15 00:00:00   2017-12-15 00:00:46
    2017-12-15 03:31:01   2017-12-15 03:32:16
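A different technique from the one in this post (swapped in here for comparison, not the author's method) builds the same kind of table by giving each row a span ID with `diff` and `cumsum` and then grouping on it. Like the code in this post, it starts a new span only when a gap is strictly greater than the threshold. The function name `spans_by_group` is hypothetical, and the sketch uses a 60-second threshold so that the 47-second gap before 03:32:16 stays inside the second span:

```python
import pandas as pd


def spans_by_group(index: pd.DatetimeIndex, threshold: pd.Timedelta) -> pd.DataFrame:
    """Split timestamps into spans wherever a gap exceeds threshold (hypothetical helper)."""
    s = index.sort_values().to_series()
    # True where a row starts a new span (gap to previous row > threshold);
    # the first row's gap is NaT, which compares as False
    starts_new = s.diff() > threshold
    # Running count of breaks gives every row its span ID: 0, 0, ..., 1, 1, ...
    span_id = starts_new.cumsum().to_numpy()
    out = s.groupby(span_id).agg(["min", "max"])
    return out.rename(columns={"min": "start", "max": "end"}).reset_index(drop=True)


times = pd.to_datetime([
    "2017-12-15 00:00:00", "2017-12-15 00:00:15",
    "2017-12-15 00:00:29", "2017-12-15 00:00:46",
    "2017-12-15 03:31:01", "2017-12-15 03:31:17",
    "2017-12-15 03:31:29", "2017-12-15 03:32:16",
])
print(spans_by_group(times, pd.Timedelta(seconds=60)))
```

Unlike the HEAD/TAIL marking, this keeps an isolated row as a zero-length span (start equal to end) instead of dropping it; with a 30-second threshold, for example, 03:32:16 becomes a third span on its own.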
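It can be done by marking the boundaries of each span:

    import pandas as pd
    from datetime import timedelta

    threshold = timedelta(seconds=30)
    # Assumes the first CSV column holds the timestamps; use it as a DatetimeIndex
    df = pd.read_csv("original_data.csv", index_col=0, parse_dates=True)

    # Gap between each row and the next one
    intervals = df.index[1:] - df.index[:-1]

    # HEAD: the row follows a long gap (the first row always does)
    # TAIL: the row precedes a long gap (the last row always does)
    df["HEAD"] = True
    df["TAIL"] = True
    df.iloc[1:, df.columns.get_loc("HEAD")] = intervals > threshold
    df.iloc[:-1, df.columns.get_loc("TAIL")] = intervals > threshold

    # Keep only rows that are exactly one of HEAD/TAIL
    _df = df[df["HEAD"] ^ df["TAIL"]]
    result = pd.DataFrame({"start": _df[_df["HEAD"]].index,
                           "end": _df[_df["TAIL"]].index},
                          columns=["start", "end"])

The point is to mark the rows after a long interval as "HEAD" and the rows before a long interval as "TAIL", then remove the rows where both are False (rows inside a span) or both are True (isolated rows).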
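For a quick check without a CSV file, here is the same HEAD/TAIL marking wrapped in a function and run on the example timestamps. The function name `spans_head_tail` and the placeholder `value` column are hypothetical, the marks are kept in a separate frame so the input is not modified, and the index is assumed to be sorted:

```python
import pandas as pd


def spans_head_tail(df: pd.DataFrame, threshold: pd.Timedelta) -> pd.DataFrame:
    """Return start/end of each span; isolated rows are dropped."""
    gap = (df.index[1:] - df.index[:-1]) > threshold  # bool array, length n-1
    marks = pd.DataFrame(index=df.index)
    marks["HEAD"] = True  # the first row always starts a span
    marks["TAIL"] = True  # the last row always ends a span
    marks.iloc[1:, 0] = gap   # HEAD: row follows a long gap
    marks.iloc[:-1, 1] = gap  # TAIL: row precedes a long gap
    # Rows that are exactly one of HEAD/TAIL are span boundaries
    edges = marks[marks["HEAD"] ^ marks["TAIL"]]
    return pd.DataFrame({
        "start": edges.index[edges["HEAD"].to_numpy()],
        "end": edges.index[edges["TAIL"].to_numpy()],
    })


times = pd.to_datetime([
    "2017-12-15 00:00:00", "2017-12-15 00:00:15",
    "2017-12-15 00:00:29", "2017-12-15 00:00:46",
    "2017-12-15 03:31:01", "2017-12-15 03:31:17",
    "2017-12-15 03:31:29", "2017-12-15 03:32:16",
])
df = pd.DataFrame({"value": range(len(times))}, index=times)
print(spans_head_tail(df, pd.Timedelta(seconds=60)))
print(spans_head_tail(df, pd.Timedelta(seconds=30)))
```

With the 30-second threshold from the code above, 03:32:16 is both a HEAD and a TAIL, so it is dropped as an isolated row and the second span ends at 03:31:29 instead.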