Find most recent observation

I have two sets of (sorted) POSIXct time series like this:

set.seed(123)
ll = sort(strptime("16/07/2015", format="%d/%m/%Y") + 10*3600 + 1:3600 + round(rnorm(3600), digits=3))
tt = sort(strptime("16/07/2015", format="%d/%m/%Y") + 10.2*3600 + 1:180*10 + round(rnorm(180), digits=3))
tplus = 0:180

where ll in reality has 10^5 observations, tt has 10^3 to 10^4, and tplus has length 10^3. From tt I construct a matrix of timestamps tt1 by adding tplus to each observation in tt:

tt1 = t(sapply(tt, function(x) x+tplus))

For each of these timestamps I want to know the most recent observation of ll (as an index of ll). I can calculate this as:

tt2 = apply(tt1, c(1,2), function(x) max(which(ll <= x)))

But this is slow, and I have to do this kind of calculation 10^3 times. How can I speed it up? Given that ll is sorted, and tt1 is sorted along both columns and rows, I was hoping a faster approach might exist.

Here is the data:

> head(ll)
[1] "2015-07-16 10:00:00.440 CEST" "2015-07-16 10:00:01.769 CEST" "2015-07-16 10:00:04.071 CEST" "2015-07-16 10:00:04.559 CEST"
[5] "2015-07-16 10:00:05.128 CEST" "2015-07-16 10:00:06.734 CEST"
> head(tt1[,1:4])
           [,1]       [,2]       [,3]       [,4] ...
[1,] 1437034330 1437034331 1437034332 1437034333
[2,] 1437034341 1437034342 1437034343 1437034344
[3,] 1437034350 1437034351 1437034352 1437034353
[4,] 1437034359 1437034360 1437034361 1437034362
[5,] 1437034371 1437034372 1437034373 1437034374
[6,] 1437034381 1437034382 1437034383 1437034384

And the expected output:

> head(tt2)
     [,1] [,2] [,3] [,4] ...
[1,]  729  729  731  732
[2,]  740  741  742  743
[3,]  748  749  751  752
[4,]  759  760  760  762
[5,]  770  772  773  774
[6,]  780  781  783  785

Just use findInterval:

array(findInterval(tt1,ll),dim(tt1))
#head(array(findInterval(tt1,ll),dim(tt1))[,1:4])
#     [,1] [,2] [,3] [,4]
#[1,]  729  729  731  732
#[2,]  740  741  742  743
#[3,]  748  749  751  752
#[4,]  759  760  760  762
#[5,]  770  772  773  774
#[6,]  780  781  783  785
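To see why this gives the same result as the max(which(...)) version, here is a minimal sketch on made-up data. The names ll_small and tt1_small are hypothetical stand-ins for ll and tt1, not part of the question. findInterval requires its second argument to be sorted in non-decreasing order and, for each query value, returns the index of the last element that is less than or equal to it, which is exactly the "most recent observation" index. It drops the matrix dimensions, so array() is used to restore them.

```r
# Hypothetical, made-up sorted vector standing in for ll (plain numeric seconds).
ll_small <- c(1.2, 2.5, 2.9, 4.0, 7.3)

# Hypothetical query matrix standing in for tt1 (filled column by column).
tt1_small <- matrix(c(2.6, 4.0, 3.0, 8.0), nrow = 2)

# Slow reference from the question: scan all of ll_small for every element.
slow <- apply(tt1_small, c(1, 2), function(x) max(which(ll_small <= x)))

# Fast version: one vectorised call; array() puts the result back into
# the shape of the query matrix.
fast <- array(findInterval(tt1_small, ll_small), dim(tt1_small))

# Both approaches agree on this small example.
stopifnot(all(slow == fast))
```

One difference to be aware of: for a timestamp earlier than the first element of ll, findInterval returns 0, whereas max(which(...)) returns -Inf with a warning, so the two only agree when every query is at or after ll[1].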
