UP | HOME |

Emparejamiento de tablas mediante el método FIFO con R

Emparejamiento de tablas mediante el método FIFO con R

El método FIFO empareja dos tablas de datos a partir de una variable cuantitativa (real o entera). A medida que se cumple el cupo en cada registro, pasa al siguiente registro de la tabla correspondiente de tal forma que se va equilibrando dicha variable cuantitativa. En este ejemplo se presenta una aproximación didáctica para un caso sencillo, pero poco realista para una situación general.

Aquí desarrollamos una función de emparejamiento (merge_fifo) que enlaza dos tablas de datos mediante el método FIFO, válida para casos generales (como por ejemplo, cantidades grandes o no enteras).

suppressMessages(library(data.table))
source(file.path("graphics","merge_fifo.r"))

datos <- fread(text="
time,quantity,value
2021-12-31T15:40:00Z,-59,3099.57
2021-12-29T15:10:00Z,-667,9749.57
2021-12-29T14:51:00Z,-93,3560.98
2021-12-20T16:36:00Z,215,-3997.45
2021-12-20T16:33:00Z,100,-2006.34
2021-12-17T15:14:00Z,-252,8291.5
2021-12-16T19:36:00Z,-12,1006.8
2021-12-15T20:59:00Z,49,-1089.47
2021-12-15T19:40:00Z,70,-2978.58
2021-12-14T19:51:00Z,54,-987.53
2021-12-14T19:05:00Z,354,-1991.66
2021-12-14T19:01:00Z,60,-999.73
2021-12-13T16:48:00Z,160,-2986.36
2021-12-13T16:38:00Z,307,-3009.12
2021-12-13T16:36:00Z,629,-3001.6
2021-12-09T20:35:00Z,412,-3042.66
2021-12-06T14:42:00Z,-189,8725.79
2021-12-06T14:38:00Z,-5,465.18
2021-12-03T15:37:00Z,59,-2994.76
2021-12-03T15:34:00Z,375,-3031.54
2021-11-30T16:35:00Z,384,-1000.94
2021-11-26T17:23:00Z,33,-998.62
2021-11-17T20:04:00Z,93,-3032.84
2021-11-15T16:23:00Z,46,-1974.04
2021-11-15T14:48:00Z,123,-2000.93
2021-11-12T14:56:00Z,5,-447.2
")

buy <- datos[quantity > 0]
sell <- datos[quantity < 0]
result <- merge_fifo(buy,sell,fifo="quantity",orderby="time")
head(result)

Este resultado muestra en primer lugar la columna fifo que indica cuánta cantidad se equilibró en ese registro. El resto de las columnas corresponden a los datos originales de las dos tablas que dieron lugar a ese registro en concreto.

A partir de esta tabla de resultados podemos obtener numerosa información en función de nuestros intereses. Si por ejemplo, queremos calcular el saldo por cada día, creamos unas variables que recojan el día y el beneficio realizado en cada registro y sumamos los beneficios de cada día.

result$date.y <- as.Date(result$time.y)
result$profit <- result$fifo*(result$value.x/result$quantity.x - result$value.y/result$quantity.y)
result[,.(profit=sum(profit)), by=date.y][!is.na(date.y)]

Para los curiosos que no deseen descargar la función desde aquí, les dejo el código de merge_fifo para que lo visualicen.

## +-*- ispell-local-dictionary:"british" -*-
## #+TITLE:
## #+AUTHOR: Emilio Torres Manzanera
## #+DATE: Time-stamp: <2022-01-21 17:10 emilio on emilio-XPS-15-9570>
## #+TAGS:
## #+PROPERTY: header-args :results output :exports both :session

##' Merge two data frames by using a FIFO method.
##'
##' Merge two data frames using the FIFO method.
##' First, the data frames are ordered by a specific column (coltime).
##' Then they are matched according to the quantities
##' of a column (colquantity). If the quantity of register of
##' the left data frame or of the right data frame is
##' used, then create a new match with the next register.
##' Additionally, get a value of that register.
##' @title Merge two data frames using a FIFO method.
##' @param x A data frame
##' @param y A data frame
##' @param fifo A column name which the fifo method will be applied.
##' @param fifo.x A column name in 'x' to apply the fifo method.
##' @param fifo.y A column name in 'y' to apply the fifo method.
##' @param orderby Vector of columns names to order the data.
##' @param orderby.x Vector of columns names of 'x' to order x.
##' @param orderby.y Vector of columns names of 'y' to order y.
##' @param suffixes  A ‘character(2)’ specifying the suffixes to be used for making column names.
##' ##' the values of this column.
##' @return A new object of the same class as 'x' with the first column showing the fifo result.
##' The rest of columns are the same as the original data input.
##' @author emilio
##' @export
##' @examples
##' suppressMessages(library(data.table))
##' datos <- fread(text="
##'  time,quantity,value
##'  2021-12-31T15:40:00Z,-59,3099.57
##'  2021-12-29T15:10:00Z,-667,9749.57
##'  2021-12-29T14:51:00Z,-93,3560.98
##'  2021-12-20T16:36:00Z,215,-3997.45
##'  2021-12-20T16:33:00Z,100,-2006.34
##'  2021-12-17T15:14:00Z,-252,8291.5
##'  2021-12-16T19:36:00Z,-12,1006.8
##'  2021-12-15T20:59:00Z,49,-1089.47
##'  2021-12-15T19:40:00Z,70,-2978.58
##'  2021-12-14T19:51:00Z,54,-987.53
##'  2021-12-14T19:05:00Z,354,-1991.66
##'  2021-12-14T19:01:00Z,60,-999.73
##'  2021-12-13T16:48:00Z,160,-2986.36
##'  2021-12-13T16:38:00Z,307,-3009.12
##'  2021-12-13T16:36:00Z,629,-3001.6
##'  2021-12-09T20:35:00Z,412,-3042.66
##'  2021-12-06T14:42:00Z,-189,8725.79
##'  2021-12-06T14:38:00Z,-5,465.18
##'  2021-12-03T15:37:00Z,59,-2994.76
##'  2021-12-03T15:34:00Z,375,-3031.54
##'  2021-11-30T16:35:00Z,384,-1000.94
##'  2021-11-26T17:23:00Z,33,-998.62
##'  2021-11-17T20:04:00Z,93,-3032.84
##'  2021-11-15T16:23:00Z,46,-1974.04
##'  2021-11-15T14:48:00Z,123,-2000.93
##'  2021-11-12T14:56:00Z,5,-447.2
##'  ")
##'
##' buy <- datos[quantity > 0]
##' sell <- datos[quantity < 0]
##' merge_fifo(buy,sell,fifo="quantity",orderby="time")
merge_fifo <- function(x,y,fifo=NULL,fifo.x=fifo, fifo.y=fifo,orderby=NULL,orderby.x= orderby, orderby.y=orderby,suffixes = c(".x",".y")){
    require(data.table)

    ## ============================================================
    ## We use data.table.
    ## We save the original class of 'x'
    class_x <- class(x)
    if(!is.data.frame(x)){
        m <- paste0("Argument 'x' must be a data frame.")
        stop(m)
    } else if(!is.data.table(x)){
        x <- as.data.table(x)
    }
    if(!is.data.frame(y)){
        m <- paste0("Argument 'y' must be a data frame.")
        stop(m)
    } else if(!is.data.table(y)){
        y <- as.data.table(y)
    }

    if ((x0 <- length(x) == 0L) | (y0 <- length(y) == 0L))
        warning("You are trying to join data frames where ",
                if (x0 & y0)
                    "'x' and 'y' arguments are"
                else if (x0 & !y0)
                    "'x' argument is"
                else if (!x0 & y0)
                    "'y' argument is", " 0 columns data frame.")
    if ((x0 <- nrow(x) == 0L) | (y0 <- nrow(y) == 0L))
        warning("You are trying to join data frames where ",
                if (x0 & y0)
                    "'x' and 'y' arguments are"
                else if (x0 & !y0)
                    "'x' argument is"
                else if (!x0 & y0)
                    "'y' argument is", " 0 rows data frame.")

    if (any(duplicated(names(x))))
        stop("Argument 'x' has some duplicated column name(s): ",
             paste(names(x)[duplicated(names(x))],collapse = ","),
             ". Please remove or rename the duplicate(s) and try again.")
    if (any(duplicated(names(y))))
        stop("Argument 'y' has some duplicated column name(s): ",
             paste(names(y)[duplicated(names(y))], collapse = ","),
             ". Please remove or rename the duplicate(s) and try again.")

    ## fifo column
    if ((!is.null(fifo.x) || !is.null(fifo.y)) && length(fifo.x) !=
        length(fifo.y))
        stop("`fifo.x` and `fifo.y` must be of same length.")
    if (!missing(fifo) && !missing(fifo.x))
        warning("Supplied both `fifo` and `fifo.x/fifo.y`. `fifo` argument will be ignored.")
    if (!is.null(fifo.x)) {
        if (length(fifo.x) != 1L || !is.character(fifo.x) || !is.character(fifo.y))
            stop("A column name is required for `fifo.x` and `fifo.y`.")
        if (!all(fifo.x %chin% names(x)))
            stop("Elements listed in `fifo.x` must be valid column names in x.")
        if (!all(fifo.y %chin% names(y)))
            stop("Elements listed in `fifo.y` must be valid column names in y.")
        fifo  <-  fifo.x
    }
    else {
        if (is.null(fifo))
            fifo = intersect(names(x), names(y))
        if (length(fifo) != 1L || !is.character(fifo))
            stop("A column name for `fifo` is required.")
        if (!all(fifo %chin% intersect(colnames(x), colnames(y))))
            stop("Elements listed in `fifo` must be valid column names in x and y")
        fifo  <-  unname(fifo)
        fifo.x  <- fifo.y  <-  fifo

    }
    if(anyNA(x[[fifo.x]])){
        stop(paste0("Column '",fifo.x,"' of '",deparse(substitute(x)),
                    "' contains missing values (Arguments 'fifo.x' and 'x', respectively)."))
    }
    if(anyNA(y[[fifo.y]])){
        stop(paste0("Column '",fifo.y,"' of '",deparse(substitute(y)),
                    "' contains missing values (Arguments 'fifo.y' and 'y', respectively)."))
    }
    if(!is.numeric(x[[fifo.x]])){
        stop(paste0("Column '",fifo.x,"' of '",deparse(substitute(x)),
                    "' must be numeric (Arguments 'fifo.x' and 'x', respectively)."))
    }
    if(!is.numeric(y[[fifo.y]])){
        stop(paste0("Column '",fifo.y,"' of '",deparse(substitute(y)),
                    "' must be numeric (Arguments 'fifo.y' and 'y', respectively)."))
    }
    if(any(x[[fifo.x]] < 0) && any(x[[fifo.x]]>0)){
        stop(paste0("Column '",fifo.x,"' of '",deparse(substitute(x)),
                    "' contains positive and negative values (Arguments 'fifo.x' and 'x', respectively).",
                                        " You can only use non-positive values or non negative values."))
    }
    if(any(y[[fifo.y]] < 0) && any(y[[fifo.y]]>0)){
        stop(paste0("Column '",fifo.y,"' of '",deparse(substitute(y)),
                    "' contains positive and negative values (Arguments 'fifo.y' and 'y', respectively).",
                    " You can only use non-positive values or non negative values."))
    }

    ## ============================================================
    ## Order data.

    if (!missing(orderby) && !missing(orderby.x))
        warning("Supplied both `orderby` and `orderby.x`. `orderby` argument will be ignored.")
    if (!is.null(orderby.x)) {
        if (length(orderby.x) == 0L || !is.character(orderby.x))
            stop("A non-empty vector of column names is required for `orderby.x`.")
        if (!all(orderby.x %chin% names(x)))
            stop("Elements listed in `orderby.x` must be valid column names in x.")
        oox <- data.table:::forderv(x, orderby.x)
        if(length(oox) == 0L) { # It is ordered.
            oox <- 1L:nrow(x)
        }
    } else {
        oox <- 1L:nrow(x)
    }

    if (!missing(orderby) && !missing(orderby.y))
        warning("Supplied both `orderby` and `orderby.y`. `orderby` argument will be ignored.")
    if (!is.null(orderby.y)) {
        if (length(orderby.y) == 0L || !is.character(orderby.y))
            stop("A non-empty vector of column names is required for `orderby.y`.")
        if (!all(orderby.y %chin% names(y)))
            stop("Elements listed in `orderby.y` must be valid column names in y.")
        ooy <- data.table:::forderv(y, orderby.y)
        if(length(ooy) == 0L) { 
            ooy <- 1L:nrow(y)
        }
    } else {
        ooy <- 1L:nrow(y)
    }

    ## ============================================================
    ## El número máximo de registros posibles es dos veces
    ## los registros de `x' más los de `y'.
    ## fifoglobal will be the column with the final result.
    n <- 2L*(nrow(x) + nrow(y))
    if(is.integer(x[[fifo.x]]) && is.integer(y[[fifo.y]])){
        fifoglobal <- integer(n)
        zero <- 0L
    } else {
        fifoglobal <- numeric(n)
        zero <- 0.0
    }

    registerx <- rep(NA_integer_,n)
    registery <- rep(NA_integer_,n)
    quantityx <- abs(x[[fifo.x]][oox])
    quantityy <- abs(y[[fifo.y]][ooy])
    remainx <- remainy <- zero

    ## Merge.
    pos <- 0L 
    i <- 1L
    j <- 1L 
    if(i <= nrow(x)) remainx <- quantityx[i]
    if(j <= nrow(y)) remainy <- quantityy[j]
    ## Recorremos los registros de x and y 
    while(i <= nrow(x) && j <= nrow(y) && pos < n){
        pos <- pos + 1L
        registerx[pos] <- oox[i]
        registery[pos] <- ooy[j]
        if(remainx > remainy){
            fifoglobal[pos] <- remainy
            remainx<- remainx - remainy
            remainy <- zero
            ## Como la venta j ya está finiquitada,
            ## pasamos a la siguiente.
            j <- j + 1L
            if(j <= nrow(y)){
                remainy <- quantityy[j]
            }
        } else if(remainx < remainy){
            ## Lo mismo que en el caso anterior, pero para las ventas
            fifoglobal[pos] <- remainx
            remainy <- remainy - remainx
            remainx <- zero
            i <- i + 1L
            if(i <= nrow(x)){
                remainx <- quantityx[i]
            }
        } else {
            ## Hemos liquidado tanto las ventas como las compras.
            fifoglobal[pos] <- remainx
            remainx <- zero
            remainy <- zero
            j <- j + 1L
            if(j <= nrow(y)){
                remainy <- quantityy[j]
            }
            i <- i + 1L
            if(i <= nrow(x)){
                remainx <- quantityx[i]
            }
        }
    }
    while(j <= nrow(y) && pos < n){
        pos <- pos + 1L
        registery[pos] <- ooy[j]
        fifoglobal[pos] <- remainy
        j <- j + 1L
        if(j <= nrow(y)){
            remainy <- quantityy[j]
        }
    }
    while(i <= nrow(x) && pos < n){
        pos <- pos + 1L
        registerx[pos] <- oox[i]
        fifoglobal[pos] <- remainx
        i <- i + 1L
        if(i <= nrow(x)){
            remainx <- quantityx[i]
        }
    }
    ## Eliminamos las filas que no hemos usado
    ## Join the x and y data sets.
    result <- cbind(data.table(fifo=fifoglobal[min(1L,pos):pos]),
                    x[registerx[min(1L,pos):pos]],
                    y[registery[min(1L,pos):pos]])
    ## ============================================================
    ## Column names.
    start <- names(x)
    end <- names(y)
    dupnames  <-  intersect(start, end)
    if (length(dupnames)) {
        start[chmatch(dupnames, start, 0L)] = paste0(dupnames,
                                                     suffixes[1L])
        end[chmatch(dupnames, end, 0L)] = paste0(dupnames, suffixes[2L])
    }
    setattr(result, "names", c("fifo",start,end))
    ## ============================================================
    if(n == 0L){ # Both 'x' and 'y' are empty tables.
        result <- result[FALSE] # Return a empty table.
    }
    ## ============================================================
    ## Original class
    setattr(result, "class", class_x)
    resultdupnames  <-  names(result)[duplicated(names(result))]
    if (length(resultdupnames)) {
        warning("Column names ", paste0("'", resultdupnames,
                                        "'", collapse = ", "), " are duplicated in the result.")
    }
    ## ============================================================
    result
}