I am using Python to calculate the time intervals between two events. Each event has a 'beginning time' and an 'ending time'. I have found the difference between the two in a new column, 'interval', but I get negative values when the beginning and ending times fall on different days (for instance, a beginning of 23:46:00 and an ending of 00:21:00 gives -23:25:00). I would like to create an if-statement that runs through the 'interval' column and adds 24 hours to the negative values. However, I have had problems adding 24 hours to the 'interval' values. The 'interval' column has dtype=timedelta64[ns].
Here is a small part of my table to clarify the problem:
```
    calldate   beginningtime        endingtime            interval
75  1/8/2009   1900-01-01 07:49:00  1900-01-01 08:19:00   00:30:00
76  1/11/2009  1900-01-01 14:37:00  1900-01-01 14:59:00   00:22:00
77  1/9/2009   1900-01-01 09:29:00  1900-01-01 09:56:00   00:27:00
78  1/11/2009  1900-01-01 09:20:00  1900-01-01 10:13:00   00:53:00
79  1/16/2009  1900-01-01 15:11:00  1900-01-01 15:50:00   00:39:00
80  1/17/2009  1900-01-01 22:52:00  1900-01-01 23:26:00   00:34:00
81  1/19/2009  1900-01-01 05:48:00  1900-01-01 06:32:00   00:44:00
82  1/20/2009  1900-01-01 23:46:00  1900-01-01 00:21:00  -23:25:00
83  1/20/2009  1900-01-01 21:29:00  1900-01-01 22:08:00   00:39:00
84  1/23/2009  1900-01-01 07:33:00  1900-01-01 07:55:00   00:22:00
85  1/30/2009  1900-01-01 19:33:00  1900-01-01 20:01:00   00:28:00
```
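Update: here is the code that led me to this point:

```python
import pandas as pd

# Parse 'HH:MM' strings; with no date given, the date part defaults to 1900-01-01
df['beginningtime'] = pd.to_datetime(df['beginningtime'], format='%H:%M')
df['endingtime'] = pd.to_datetime(df['endingtime'], format='%H:%M')
df['interval'] = df['endingtime'] - df['beginningtime']
df[['calldate', 'beginningtime', 'endingtime', 'interval']]
```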
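If you want to add one day to the timedelta whenever it is negative:

```python
from datetime import timedelta

# Add one day to every negative interval and leave the others unchanged;
# compare against a zero timedelta rather than the integer 0
df['interval'] = df['interval'].apply(
    lambda x: x + timedelta(days=1) if x < timedelta(0) else x)
```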
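The same fix can also be written without a per-element `apply`, by selecting the negative rows with a boolean mask and adding one day to them in place. This is a minimal sketch, assuming the 'interval' column is timedelta64[ns] as in the question; the three sample rows are taken from the question's table (times only, so the date part defaults to 1900-01-01).

```python
import pandas as pd

# Sample rows from the question's table (times only)
df = pd.DataFrame({
    'beginningtime': ['07:49', '23:46', '21:29'],
    'endingtime': ['08:19', '00:21', '22:08'],
})
df['beginningtime'] = pd.to_datetime(df['beginningtime'], format='%H:%M')
df['endingtime'] = pd.to_datetime(df['endingtime'], format='%H:%M')
df['interval'] = df['endingtime'] - df['beginningtime']

# Boolean mask selects only the negative intervals; add one day to those rows
negative = df['interval'] < pd.Timedelta(0)
df.loc[negative, 'interval'] += pd.Timedelta(days=1)

print(df['interval'].tolist())
```

On large frames the mask avoids calling a Python function once per row, and the column keeps its timedelta dtype.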
If it is safe to assume that the end time is always within 24 hours of the start time, you can check whether the end time is earlier than the start time and, if so, use timedelta to add a day to the end time rather than to the interval.

```python
from datetime import datetime, timedelta

d1 = datetime.strptime("1900-01-01 23:46:00", "%Y-%m-%d %H:%M:%S")
d2 = datetime.strptime("1900-01-01 00:21:00", "%Y-%m-%d %H:%M:%S")

# An end time earlier than the start time means the event ran past midnight
if d2 < d1:
    d2 += timedelta(days=1)

print(d2 - d1)  # 0:35:00
```
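Under the same 24-hour assumption, the if-statement can be replaced by a modulo: in Python 3.2+ a `timedelta` can be taken modulo another `timedelta`, and the result has the sign of the divisor, so reducing the raw difference modulo one day turns -23:25:00 into 0:35:00. This is a sketch; `wrapped_interval` is a hypothetical helper name, and it assumes time-only strings in `HH:MM:SS` form.

```python
from datetime import datetime, timedelta

ONE_DAY = timedelta(days=1)

def wrapped_interval(start: str, end: str, fmt: str = "%H:%M:%S") -> timedelta:
    """Elapsed time from start to end, assuming end is less than 24 h after start.

    Hypothetical helper: timedelta % timedelta takes the sign of the divisor,
    so a negative difference is shifted up by one whole day.
    """
    d1 = datetime.strptime(start, fmt)
    d2 = datetime.strptime(end, fmt)
    return (d2 - d1) % ONE_DAY

print(wrapped_interval("23:46:00", "00:21:00"))  # 0:35:00
print(wrapped_interval("07:49:00", "08:19:00"))  # 0:30:00
```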
With pandas you can do this:

```python
import pandas as pd
from pandas import Timedelta

d = {
    "calldate": ["1/8/2009", "1/11/2009", "1/9/2009", "1/11/2009", "1/16/2009",
                 "1/17/2009", "1/19/2009", "1/20/2009", "1/20/2009", "1/23/2009",
                 "1/30/2009"],
    "beginningtime": ["1900-01-01 07:49:00", "1900-01-01 14:37:00", "1900-01-01 09:29:00",
                      "1900-01-01 09:20:00", "1900-01-01 15:11:00", "1900-01-01 22:52:00",
                      "1900-01-01 05:48:00", "1900-01-01 23:46:00", "1900-01-01 21:29:00",
                      "1900-01-01 07:33:00", "1900-01-01 19:33:00"],
    "endingtime": ["1900-01-01 08:19:00", "1900-01-01 14:59:00", "1900-01-01 09:56:00",
                   "1900-01-01 10:13:00", "1900-01-01 15:50:00", "1900-01-01 23:26:00",
                   "1900-01-01 06:32:00", "1900-01-01 00:21:00", "1900-01-01 22:08:00",
                   "1900-01-01 07:55:00", "1900-01-01 20:01:00"],
}
df = pd.DataFrame(data=d)
df['beginningtime'] = pd.to_datetime(df['beginningtime'], format="%Y-%m-%d %H:%M:%S")
df['endingtime'] = pd.to_datetime(df['endingtime'], format="%Y-%m-%d %H:%M:%S")

def interval(row):
    end = row['endingtime']
    # An end time earlier than the start time means the call crossed midnight
    if end < row['beginningtime']:
        end += Timedelta(days=1)
    return end - row['beginningtime']

df['interval'] = df.apply(interval, axis=1)
```
```
In [2]: df
Out[2]:
          beginningtime   calldate           endingtime  interval
0   1900-01-01 07:49:00   1/8/2009  1900-01-01 08:19:00  00:30:00
1   1900-01-01 14:37:00  1/11/2009  1900-01-01 14:59:00  00:22:00
2   1900-01-01 09:29:00   1/9/2009  1900-01-01 09:56:00  00:27:00
3   1900-01-01 09:20:00  1/11/2009  1900-01-01 10:13:00  00:53:00
4   1900-01-01 15:11:00  1/16/2009  1900-01-01 15:50:00  00:39:00
5   1900-01-01 22:52:00  1/17/2009  1900-01-01 23:26:00  00:34:00
6   1900-01-01 05:48:00  1/19/2009  1900-01-01 06:32:00  00:44:00
7   1900-01-01 23:46:00  1/20/2009  1900-01-01 00:21:00  00:35:00
8   1900-01-01 21:29:00  1/20/2009  1900-01-01 22:08:00  00:39:00
9   1900-01-01 07:33:00  1/23/2009  1900-01-01 07:55:00  00:22:00
10  1900-01-01 19:33:00  1/30/2009  1900-01-01 20:01:00  00:28:00
```
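The row-wise `apply` can also be expressed on whole columns: shift every end time that falls before its start time onto the next day, then subtract. This is a sketch under the same assumption that no call lasts 24 hours or more; the three rows are copied from the question's table, and `Series.where` keeps values where the condition is True and substitutes the alternative elsewhere.

```python
import pandas as pd

# Three rows copied from the question's table
df = pd.DataFrame({
    'calldate': ['1/17/2009', '1/20/2009', '1/20/2009'],
    'beginningtime': ['1900-01-01 22:52:00', '1900-01-01 23:46:00', '1900-01-01 21:29:00'],
    'endingtime': ['1900-01-01 23:26:00', '1900-01-01 00:21:00', '1900-01-01 22:08:00'],
})
fmt = '%Y-%m-%d %H:%M:%S'
begin = pd.to_datetime(df['beginningtime'], format=fmt)
end = pd.to_datetime(df['endingtime'], format=fmt)

# Keep end times that are not before the start; move the others to the next day
end = end.where(end >= begin, end + pd.Timedelta(days=1))
df['interval'] = end - begin

print(df['interval'].tolist())
```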