R equivalent of Stata's `table, contents()` command for summary statistics

I am trying to mimic Stata's table command in R, which produces tables of summary statistics. The command lets you build cross-tabulations with various statistics inside the resulting cells. In my example below, I cross three variables (category1, category2, and category3) and want each cell to hold, as a column vector, the mean and standard deviation of metric1 followed by the mean and standard deviation of metric2.

In Stata, this behavior is obtained with the following one-liner.

table category1 category2 category3 ,c(mean metric1 sd metric1 mean metric2 sd metric2) 

Desired output: explanation of the table.

Each cell of the resulting cross-tabulation holds a column vector, say X = [mean(metric1), sd(metric1), mean(metric2), sd(metric2)]'

----------------------------------------------------------------------------
          |                     category3 and category2                     
          | ------------ First -----------    ----------- Second -----------
category1 |      A        B       C   Total         A       B       C   Total
----------+-----------------------------------------------------------------
        1 |  mean(metric1)  
          |  sd(metric1)  
          |  mean(metric2)  
          |  sd(metric2)   
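
To make the cell contents concrete, here is a minimal R sketch that computes that column vector for a single cell. The data frame `toy` and the helper `cell_vector` are hypothetical names used only for illustration; they are not part of the data in this question.

```r
# Toy data (hypothetical): two observations fall in cell (1, "A", "First")
toy <- data.frame(
  category1 = c(1, 1, 2),
  category2 = c("A", "A", "B"),
  category3 = c("First", "First", "Second"),
  metric1   = c(2, 6, 4),
  metric2   = c(-2, -4, 0)
)

# Hypothetical helper: the 4-element vector that Stata stacks inside one cell
cell_vector <- function(d, c1, c2, c3) {
  rows <- d$category1 == c1 & d$category2 == c2 & d$category3 == c3
  cell <- d[rows, ]
  c(mean1 = mean(cell$metric1), sd1 = sd(cell$metric1),
    mean2 = mean(cell$metric2), sd2 = sd(cell$metric2))
}

x <- cell_vector(toy, 1, "A", "First")
print(x)
```

The full table repeats this computation for every combination of the three categories, plus the Total margins.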

Desired output (!): table of results from Stata.


----------------------------------------------------------------------------
          |                     category3 and category2                     
          | ------------ First -----------    ----------- Second -----------
category1 |      A       B       C   Total         A       B       C   Total
----------+-----------------------------------------------------------------
        1 |  5.778   7.200   2.571   5.048     6.667   3.000   3.000   4.222
          |  2.906   3.347   2.507   3.324     2.309   1.414   1.155   2.333
          | -1.556  -2.000  -1.143  -1.524    -2.000  -2.000  -3.000  -2.444
          |  1.667   0.000   1.069   1.250     0.000   2.828   1.155   1.333
          | 
        2 |  3.200   6.333   4.200   4.571     4.889   5.000   5.000   4.947
          |  2.280   3.445   2.741   2.976     3.180   3.464   2.449   2.857
          | -0.800  -2.000  -2.000  -1.714    -2.222  -1.500  -1.000  -1.684
          |  1.095   1.265   1.333   1.309     1.563   1.000   1.673   1.529
          | 
        3 |  8.667   4.667   5.167   5.667     5.667   6.667   6.000   6.000
          |  2.309   2.309   2.758   2.849     3.445   4.163   3.464   3.303
          | -3.333  -2.667  -2.000  -2.333    -2.333  -2.000  -1.333  -2.000
          |  1.155   1.155   1.477   1.414     0.816   2.000   1.155   1.206
          | 
    Total |  5.529   6.286   4.207   5.067     5.444   5.111   4.615   5.100
          |  3.125   3.124   2.795   3.047     3.053   3.333   2.501   2.898
          | -1.647  -2.143  -1.793  -1.833    -2.222  -1.778  -1.692  -1.950
          |  1.618   0.949   1.346   1.342     1.166   1.563   1.601   1.395
----------------------------------------------------------------------------

Stata code that generates the output above.

clear all
set obs 100

set seed 777
gen category1 = runiformint(1,3)
gen category2_num = runiformint(1,3)

gen category2 = "A" if category2_num ==1
replace category2 = "B" if category2_num ==2
replace category2 = "C" if category2_num ==3

drop category2_num

gen category3_num = runiformint(1,2)
gen category3 = "First" if category3_num ==1
replace category3 = "Second" if category3_num ==2

drop category3_num

gen metric1 = round(runiform()*10,2)
gen metric2 = round(runiform()*-4,2)

table category1 category2 category3 /// List of the variables that will create the crosstab
         ,c(mean metric1 sd metric1 /// Mean and std.dev of metric1 as 1st and 2nd rows
         mean metric2 sd metric2)   /// Mean and std.dev of metric2 as 3rd and 4th rows
         row col                    /// Add the over all statistics total rows and cols
         format(%9.3f)              // Decimal style setting.
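
For readers without Stata, the data-generating step can be sketched in R roughly as follows. This is an approximation under stated assumptions: R's random number generator differs from Stata's, so the same seed will not reproduce the same draws, and the name `sim` is hypothetical. The point worth noting is that Stata's round(x, 2) rounds to the nearest multiple of 2, which is why the metrics only take even values.

```r
set.seed(777)  # R's generator differs from Stata's, so the draws will not match
n <- 100

sim <- data.frame(
  # counterpart of runiformint(1, 3)
  category1 = sample(1:3, n, replace = TRUE),
  category2 = sample(c("A", "B", "C"), n, replace = TRUE),
  category3 = sample(c("First", "Second"), n, replace = TRUE),
  # counterpart of round(runiform() * 10, 2): nearest multiple of 2
  metric1   = 2 * round(runif(n) * 10 / 2),
  # counterpart of round(runiform() * -4, 2)
  metric2   = 2 * round(runif(n) * -4 / 2)
)

str(sim)
```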

Attempt.

This is how I have approached the problem. However, I am still far from my desired result. Although the same information is shown on screen, the way I present it in R is hard to read. In addition, I have not computed the mean and standard deviation for the row and column totals, which the Stata output includes.

Finally, this procedure seems like overkill for such a simple problem. Packages are allowed in my context, so dplyr or data.table suggestions are welcome.

Data generated in Stata, plus a reproducible routine.

df <- as.data.frame(structure(list(category1 = structure(c(1, 3, 1, 2, 3, 1, 3, 1,1, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 1, 3, 1, 3, 3, 1, 3, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 3, 3, 2, 2, 2, 3, 1, 2, 3, 2, 3, 2, 2, 1,3, 3, 3, 2, 2, 1, 1, 1, 3, 2, 3, 1, 2, 2, 1, 3, 1, 3, 1, 1, 3,1, 1, 2, 1, 3, 2, 2, 3, 3, 3, 1, 2, 3, 2, 3, 2, 1, 1, 1, 2, 2,2, 1, 3, 2, 2, 2, 3, 3), format.stata = "%9.0g"), 
                    category2 = structure(c("C", "A", "A", "A", "C", "C", "A", "A", "A", "A", "B", "A", "A", "A","A", "B", "A", "C", "C", "B", "C", "A", "A", "C", "A", "B", "C", "B", "C", "C", "A", "C", "B", "B", "A", "B", "C", "A", "B", "B","C", "A", "A", "C", "C", "B", "C", "A", "A", "C", "C", "B", "C", "C", "A", "C", "A", "A", "C", "B", "A", "C", "C", "C", "B", "B","C", "C", "A", "A", "C", "C", "A", "C", "B", "B", "C", "C", "C", "C", "A", "C", "C", "C", "C", "B", "B", "B", "B", "C", "A", "A","C", "C", "A", "A", "A", "B", "B", "C"), format.stata = "%9s"), 
                    category3 = structure(c("First", "Second", "First", "First", "First", "First", "Second", "Second", "First", "Second", "First", "First", "Second", "Second", "First", "Second", "Second", "First", "Second", "First", "First", "First", "First","Second", "First", "First", "Second", "First", "First", "First","First", "First", "First", "Second", "First", "First", "First", "Second", "First", "First", "First", "Second", "First", "First","Second", "Second", "First", "Second", "Second", "Second","First", "First", "First", "Second", "Second", "First", "First","Second", "First", "First", "First", "First", "Second", "First","Second", "Second", "First", "Second", "First", "Second", "First", "Second", "First", "First", "First", "First", "Second","First", "First", "First", "Second", "Second", "First", "First","First", "Second", "First", "Second", "First", "Second","Second", "First", "Second", "First", "First", "Second","Second", "Second", "Second", "First"), format.stata = "%9s"),
                    metric1 = structure(c(0, 10, 0, 0, 8, 4, 4, 8, 8, 2, 4, 4, 6, 2, 6, 8, 6, 4, 4, 10, 10, 4, 6, 8, 6, 2, 4, 4, 6, 0, 6,0, 10, 8, 2, 2, 2, 0, 2, 10, 2, 8, 4, 6, 8, 2, 2, 6, 0, 2,4, 6, 2, 2, 8, 6, 8, 8, 2, 8, 10, 4, 4, 4, 4, 10, 4, 2, 6,4, 6, 4, 10, 2, 8, 6, 8, 2, 6, 6, 6, 4, 8, 6, 8, 2, 10, 2, 6, 2, 10, 4, 8, 0, 10, 6, 4, 2, 8, 8), format.stata = "%9.0g"),
                    metric2 = structure(c(0, -4, 0, 0, -2, -2, -2, -2, -4, -2, -2, -2, -2, -4, 0, 0, -2, -2, -4, -2, 0, -2, -4, -2, -2, -2, -2, -2, -4, 0, -4, -4, -2, -2, -2, -2, -2, -2, -4, -2, -2, -2, -2, -2, 0, -2, -4, -4, -2, -2, 0, -4, -2, 0, -2,-2, 0, -2, -4, 0, -2, -2, 0, 0, -4, -4, 0, -2, 0, -2, -2, -4, 0, -2, -2, -2, 0, -2, -2, -2, -2, -2, -2, 0, 0, 0, -2, 0, -2, -4, 0, 0, 0, -2, -4, -4, 0, -2, -2, -4), format.stata = "%9.0g")), 
               row.names = c(NA,-100L), class = c("tbl_df", "tbl", "data.frame")))

# expand grid for every possible value
prs <- expand.grid(cat1 = unique(df$category1)   ,
                   cat2 = unique(df$category2) ,
                   cat3 = unique(df$category3))

#Number of total combinations 
N <-   nrow(prs)
# Loop over the combinations to get the desired statistics
A <- lapply(1:N, FUN = function(i){
      mean1 <- mean(df[(df$category1 == prs$cat1[i] &  df$category2 == prs$cat2[i] & df$category3 == prs$cat3[i] ), "metric1"])
      sd1   <- sd(df[(df$category1 == prs$cat1[i] &  df$category2 == prs$cat2[i] & df$category3 == prs$cat3[i] ), "metric1"])
        
      mean2 <- mean(df[(df$category1 == prs$cat1[i] &  df$category2 == prs$cat2[i] & df$category3 == prs$cat3[i] ), "metric2"])
      sd2   <- sd(df[(df$category1 == prs$cat1[i] &  df$category2 == prs$cat2[i] & df$category3 == prs$cat3[i] ), "metric2"])
        
      r_list<- list(cat1 = prs$cat1[i],cat2 = prs$cat2[i], cat3 = prs$cat3[i],
                    mean1 = mean1,  sd1 = sd1 , mean2 = mean2, sd2 = sd2)
  return(r_list)
})

#List to data.frame
df_stats <- do.call(rbind.data.frame, A)

Output obtained (but not my desired output (!))

# cat1 cat2   cat3    mean1      sd1     mean2       sd2
# 2     1    C  First 2.571429 2.507133 -1.142857 1.0690450
# 21    3    C  First 5.166667 2.757909 -2.000000 1.4770979
# 3     2    C  First 4.200000 2.740641 -2.000000 1.3333333
# 4     1    A  First 5.777778 2.905933 -1.555556 1.6666667
# 5     3    A  First 8.666667 2.309401 -3.333333 1.1547005
# 6     2    A  First 3.200000 2.280351 -0.800000 1.0954451
# 7     1    B  First 7.200000 3.346640 -2.000000 0.0000000
# 8     3    B  First 4.666667 2.309401 -2.666667 1.1547005
# 9     2    B  First 6.333333 3.444803 -2.000000 1.2649111
# 10    1    C Second 3.000000 1.154701 -3.000000 1.1547005
# 11    3    C Second 6.000000 3.464102 -1.333333 1.1547005
# 12    2    C Second 5.000000 2.449490 -1.000000 1.6733201
# 13    1    A Second 6.666667 2.309401 -2.000000 0.0000000
# 14    3    A Second 5.666667 3.444803 -2.333333 0.8164966
# 15    2    A Second 4.888889 3.179797 -2.222222 1.5634719
# 16    1    B Second 3.000000 1.414214 -2.000000 2.8284271
# 17    3    B Second 6.666667 4.163332 -2.000000 2.0000000
# 18    2    B Second 5.000000 3.464102 -1.500000 1.0000000

Asked 3 years, 4 months, 27 days ago - by markupmaestro


4 Answers:

  • You could use the data.table and magrittr packages as follows:

    library(magrittr)
    library(data.table)
    
    # function to compute the mean and sd
    fun <- function(x, y) list(metric1_meam=mean(x), metric1_sd=sd(x), metric2_meam=mean(y), metric2_sd=sd(y))
    
    # compute the Total column, and A,B,C columns of the desired output as follows and bind them 
    
    setDT(df)[, 'category1' := as.character(category1)]
    
    Y <- rbind(
      df[, fun(metric1, metric2), by=.(category1, category2, category3)],
      df[, fun(metric1, metric2), by=.(category1, category3)][, category2 := 'Total'],
      df[, fun(metric1, metric2), by=.(category2, category3)][, category1 := 'Total'],
      df[, fun(metric1, metric2), by=.(category3)][, c('category1', 'category2') := 'Total']
    )
    
    # generate the desired output
    melt(Y, measure=patterns('metric')) %>% 
      xtabs(formula = value ~ .) %>% 
      ftable(col.vars = c('category3', 'category2'))
    
    
    
    
    
                           category3      First                                      Second                                 
                           category2          A          B          C      Total          A          B          C      Total
    category1 variable                                                                                                      
    1         metric1_meam            5.7777778  7.2000000  2.5714286  5.0476190  6.6666667  3.0000000  3.0000000  4.2222222
              metric1_sd              2.9059326  3.3466401  2.5071327  3.3237959  2.3094011  1.4142136  1.1547005  2.3333333
              metric2_meam           -1.5555556 -2.0000000 -1.1428571 -1.5238095 -2.0000000 -2.0000000 -3.0000000 -2.4444444
              metric2_sd              1.6666667  0.0000000  1.0690450  1.2497619  0.0000000  2.8284271  1.1547005  1.3333333
    2         metric1_meam            3.2000000  6.3333333  4.2000000  4.5714286  4.8888889  5.0000000  5.0000000  4.9473684
              metric1_sd              2.2803509  3.4448028  2.7406406  2.9760952  3.1797973  3.4641016  2.4494897  2.8572264
              metric2_meam           -0.8000000 -2.0000000 -2.0000000 -1.7142857 -2.2222222 -1.5000000 -1.0000000 -1.6842105
              metric2_sd              1.0954451  1.2649111  1.3333333  1.3093073  1.5634719  1.0000000  1.6733201  1.5294382
    3         metric1_meam            8.6666667  4.6666667  5.1666667  5.6666667  5.6666667  6.6666667  6.0000000  6.0000000
              metric1_sd              2.3094011  2.3094011  2.7579087  2.8491485  3.4448028  4.1633320  3.4641016  3.3028913
              metric2_meam           -3.3333333 -2.6666667 -2.0000000 -2.3333333 -2.3333333 -2.0000000 -1.3333333 -2.0000000
              metric2_sd              1.1547005  1.1547005  1.4770979  1.4142136  0.8164966  2.0000000  1.1547005  1.2060454
    Total     metric1_meam            5.5294118  6.2857143  4.2068966  5.0666667  5.4444444  5.1111111  4.6153846  5.1000000
              metric1_sd              3.1248529  3.1238185  2.7951400  3.0469027  3.0529103  3.3333333  2.5012817  2.8982753
              metric2_meam           -1.6470588 -2.1428571 -1.7931034 -1.8333333 -2.2222222 -1.7777778 -1.6923077 -1.9500000
              metric2_sd              1.6179144  0.9492623  1.3464055  1.3424827  1.1659662  1.5634719  1.6012815  1.3950462
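
If you want the three-decimal look of Stata's format(%9.3f), one option is to round the flat table before printing. The sketch below is a base R illustration on a small hypothetical long-format data frame (`long`), filled with a few values copied from the output above; it is not a replacement for the pipeline in this answer.

```r
# Small long-format table of statistics (a few values copied from the output above)
long <- data.frame(
  category1 = rep(c("1", "2"), each = 4),
  category2 = rep(c("A", "B"), times = 4),
  variable  = rep(rep(c("metric1_mean", "metric1_sd"), each = 2), times = 2),
  value     = c(5.7777778, 7.2, 2.9059326, 3.3466401,
                3.2, 6.3333333, 2.2803509, 3.4448028)
)

# Each factor combination occurs once, so xtabs() just places each value in its cell
ft <- ftable(xtabs(value ~ category1 + variable + category2, data = long),
             col.vars = "category2")

# Rounding keeps the ftable class, so the flat layout survives when printing
ft3 <- round(ft, 3)
print(ft3)
```

The same `round()` call can be appended to the `ftable()` step of the pipeline above.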
    

    Answered on Dec 18, 2020 at 13:45 - by algorithmadeptc6cb

    Upvotes: 0 | Downvotes: 0

  • You can harness the power of aggregate.

    FUN <- function(x) c(mean=mean(x), sd=sd(x))
    aggregate(cbind(metric1, metric2) ~ ., df, FUN)
    #    category1 category2 category3 metric1.mean metric1.sd metric2.mean metric2.sd
    # 1          1         A     First     5.777778   2.905933   -1.5555556  1.6666667
    # 2          2         A     First     3.200000   2.280351   -0.8000000  1.0954451
    # 3          3         A     First     8.666667   2.309401   -3.3333333  1.1547005
    # 4          1         B     First     7.200000   3.346640   -2.0000000  0.0000000
    # 5          2         B     First     6.333333   3.444803   -2.0000000  1.2649111
    # 6          3         B     First     4.666667   2.309401   -2.6666667  1.1547005
    # 7          1         C     First     2.571429   2.507133   -1.1428571  1.0690450
    # 8          2         C     First     4.200000   2.740641   -2.0000000  1.3333333
    # 9          3         C     First     5.166667   2.757909   -2.0000000  1.4770979
    # 10         1         A    Second     6.666667   2.309401   -2.0000000  0.0000000
    # 11         2         A    Second     4.888889   3.179797   -2.2222222  1.5634719
    # 12         3         A    Second     5.666667   3.444803   -2.3333333  0.8164966
    # 13         1         B    Second     3.000000   1.414214   -2.0000000  2.8284271
    # 14         2         B    Second     5.000000   3.464102   -1.5000000  1.0000000
    # 15         3         B    Second     6.666667   4.163332   -2.0000000  2.0000000
    # 16         1         C    Second     3.000000   1.154701   -3.0000000  1.1547005
    # 17         2         C    Second     5.000000   2.449490   -1.0000000  1.6733201
    # 18         3         C    Second     6.000000   3.464102   -1.3333333  1.1547005         
    

    For the cross-tabulation, use xtabs.

    aggregate stores its results as matrix columns when the function returns several values (see this answer for why), so first we get rid of them.

    r <- do.call(data.frame, aggregate(cbind(metric1, metric2) ~ ., df, FUN))
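
The matrix-column behavior is easy to miss, so here is a small self-contained sketch of it. The data frame `toy` and the names `stats_fun`, `agg` and `flat` are hypothetical and only serve the illustration.

```r
# Toy data (hypothetical)
toy <- data.frame(
  g  = c("a", "a", "b", "b"),
  m1 = c(1, 3, 2, 6),
  m2 = c(0, -2, -4, -4)
)
stats_fun <- function(x) c(mean = mean(x), sd = sd(x))

agg <- aggregate(cbind(m1, m2) ~ g, toy, stats_fun)
ncol(agg)          # counts g plus one matrix column per metric
is.matrix(agg$m1)  # each metric holds its mean and sd in a single matrix column

# Flatten the matrix columns into ordinary columns
flat <- do.call(data.frame, agg)
names(flat)
```

After flattening, each statistic is an ordinary column that formulas such as those passed to xtabs can refer to by name.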
    

    Now we can apply xtabs, for example once for each level of category3.

    xtabs(cbind(metric1.mean, metric1.sd) ~ ., r[r$category3 == "First", 1:5])
    # , , category3 = First,  = metric1.mean
    # 
    #          category2
    # category1        A        B        C
    #         1 5.777778 7.200000 2.571429
    #         2 3.200000 6.333333 4.200000
    #         3 8.666667 4.666667 5.166667
    # 
    # , , category3 = First,  = metric1.sd
    # 
    #           category2
    # category1        A        B        C
    #         1 2.905933 3.346640 2.507133
    #         2 2.280351 3.444803 2.740641
    #         3 2.309401 2.309401 2.757909
    
    xtabs(cbind(metric1.mean, metric1.sd) ~ ., r[r$category3 == "Second", 1:5])
    # , , category3 = Second,  = metric1.mean
    # 
    #          category2
    # category1        A        B        C
    #         1 6.666667 3.000000 3.000000
    #         2 4.888889 5.000000 5.000000
    #         3 5.666667 6.666667 6.000000
    # 
    # , , category3 = Second,  = metric1.sd
    # 
    #          category2
    # category1        A        B        C
    #         1 2.309401 1.414214 1.154701
    #         2 3.179797 3.464102 2.449490
    #         3 3.444803 4.163332 3.464102
    

    Or use sapply to do this in one step.

    sapply(c("First", "Second"), function(c3) 
      xtabs(cbind(metric1.mean, metric1.sd) ~ ., r[r$category3 == c3, 1:5]),
      simplify="array")
    # , , category3 = First,  = metric1.mean,  = First
    # 
    #          category2
    # category1        A        B        C
    #         1 5.777778 7.200000 2.571429
    #         2 3.200000 6.333333 4.200000
    #         3 8.666667 4.666667 5.166667
    # 
    # , , category3 = First,  = metric1.sd,  = First
    # 
    #          category2
    # category1        A        B        C
    #         1 2.905933 3.346640 2.507133
    #         2 2.280351 3.444803 2.740641
    #         3 2.309401 2.309401 2.757909
    # 
    # , , category3 = First,  = metric1.mean,  = Second
    # 
    #          category2
    # category1        A        B C
    #         1 6.666667 3.000000 3
    #         2 4.888889 5.000000 5
    #         3 5.666667 6.666667 6
    # 
    # , , category3 = First,  = metric1.sd,  = Second
    # 
    #          category2
    # category1        A        B        C
    #         1 2.309401 1.414214 1.154701
    #         2 3.179797 3.464102 2.449490
    #         3 3.444803 4.163332 3.464102
    

    Answered on Dec 18, 2020 at 13:52 - by debugdynamo05ee

    Upvotes: 0 | Downvotes: 0

  • To replicate the behavior of Stata's table command, which cross-tabulates variables and reports summary statistics, you can use the dplyr package with group_by() and summarize() to compute the statistics. Here is some code that may help:

    library(dplyr)

    # Compute the desired statistics, grouping by the categorical variables
    df_stats <- df %>%
      group_by(category1, category2, category3) %>%
      summarize(mean_metric1 = mean(metric1), sd_metric1 = sd(metric1),
                mean_metric2 = mean(metric2), sd_metric2 = sd(metric2),
                .groups = "drop_last") %>%   # stay grouped by category1, category2
      mutate(total_mean_metric1 = mean(mean_metric1), total_sd_metric1 = sd(mean_metric1),
             total_mean_metric2 = mean(mean_metric2), total_sd_metric2 = sd(mean_metric2)) %>%
      arrange(category1, category2, category3)

    print(df_stats)

    This code first groups the data frame by category1, category2 and category3 and computes the mean and standard deviation of metric1 and metric2 for each cell. The total_* columns then summarize those cell means within each category1 and category2 pair, that is, across category3; they are not the pooled margins that Stata's row and col options report.

    The result is a long table with one row per cell rather than Stata's cross-tabulated layout. Make sure the dplyr library is loaded. I hope this helps!
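
One thing to watch when adding totals: Stata's Total margins are computed from the raw observations (the data.table answer above reproduces them that way), so averaging per-cell means generally gives different numbers whenever the cells have unequal sizes. A minimal base R sketch with hypothetical toy data shows the gap:

```r
# Toy data (hypothetical): two groups of unequal size
toy <- data.frame(
  g = c("a", "a", "a", "b"),
  x = c(1, 2, 3, 10)
)

# Per-group (cell) means
cells <- aggregate(x ~ g, toy, mean)

# Margin computed from the raw rows, as a Total row or column does
pooled_mean <- mean(toy$x)

# Mean of the cell means: weights every group equally, so it differs
mean_of_means <- mean(cells$x)

print(c(pooled = pooled_mean, mean_of_means = mean_of_means))
```

The same reasoning applies to standard deviations, so margins are best computed by regrouping the original data rather than by summarizing the cell statistics.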

    Answered on Dec 19, 2020 at 13:42 - by Gpt

    Upvotes: 0 | Downvotes: 0

  • Here is one way to approximate the Stata output in R. Load the tidyverse package and use pivot_wider() and add_count() to build the cross-tabulation.

    
    library(tidyverse)
    
    

    # Create a data frame with the data

    df <- data.frame(category1 = c(1, 3, 1, 2, 3, 1, 3, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 1, 3, 1, 3, 3, 1, 3, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 3, 3, 2, 2, 2, 3, 1, 2, 3, 2, 3, 2, 2, 1, 3, 3, 3, 2, 2, 1, 1, 1, 3, 2, 3, 1, 2, 2, 1, 3, 1, 3, 1, 1, 3, 1, 1, 2, 1, 3, 2, 2, 3, 3, 3, 1, 2, 3, 2, 3, 2, 1, 1, 1, 2, 2, 2, 1, 3, 2, 2, 2, 3, 3), category2 = c("C", "A", "A", "A", "C", "C", "A", "A", "A", "A", "B", "A", "A", "A", "A", "B", "A", "C", "C", "B", "C", "A", "A", "C", "A", "B", "C", "B", "C", "C", "A", "C", "B", "B", "A", "B", "C", "A", "B", "B", "C", "A", "A", "C", "C", "B", "C", "A", "A", "C", "C", "B", "C", "C", "A", "C", "A", "A", "C", "B", "A", "C", "C", "C", "B", "B", "C", "C", "A", "A", "C", "C", "A", "C", "B", "B", "C", "C", "C", "C", "A", "C", "C", "C", "C", "B", "B", "B", "B", "C", "A", "A", "C", "C", "A", "A", "A", "B", "B", "C"), category3 = c("First", "Second", "First", "First", "First", "First", "Second", "Second", "First", "Second", "First", "First", "Second", "Second", "First", "Second", "Second", "First", "Second", "First", "First", "First", "First", "Second", "First", "First", "Second", "First", "First", "First", "First", "First", "First", "Second", "First", "First", "First", "Second", "First", "First", "First", "Second", "First", "First", "Second", "Second", "First", "Second", "Second", "Second", "First", "First", "First", "Second", "Second", "First", "First", "Second", "First", "First", "First", "First", "Second", "First", "Second", "Second", "First", "Second", "First", "Second", "First", "Second", "First", "First", "First", "First", "Second", "First", "First", "First", "Second", "Second", "First", "First", "First", "Second", "First", "Second", "First", "Second", "Second", "First", "Second", "First", "First", "Second", "Second", "Second", "Second", "First"), metric1 = c(0, 10, 0, 0, 8, 4, 4, 8, 8, 2, 4, 4, 6, 2, 6, 8, 6, 4, 4, 10, 10, 4, 6, 8, 6, 2, 4, 4, 6, 0, 6, 0, 10, 8, 2, 2, 2, 0, 2, 10, 2, 8, 4, 6, 8, 2, 2, 6, 0, 2, 4, 6, 2, 2, 8, 6, 8, 8, 2, 
8, 10, 4, 4, 4, 4, 10, 4, 2, 6, 4, 6, 4, 10, 2, 8, 6, 8, 2, 6, 6, 6, 4, 8, 6, 8, 2, 10, 2, 6, 2, 10, 4, 8, 0, 10, 6, 4, 2, 8, 8), metric2 = c(0, -4, 0, 0, -2, -2, -2, -2, -4, -2, -2, -2, -2, -4, 0, 0, -2, -2, -4, -2, 0, -2, -4, -2, -2, -2, -2, -2, -4, 0, -4, -4, -2, -2, -2, -2, -2, -2, -4, -2, -2, -2, -2, -2, 0, -2, -4, -4, -2, -2, 0, -4, -2, 0, -2, -2, 0, -2, -4, 0, -2, -2, 0, 0, -4, -4, 0, -2, 0, -2, -2, -4, 0, -2, -2, -2, 0, -2, -2, -2, -2, -2, -2, 0, 0, 0, -2, 0, -2, -4, 0, 0, 0, -2, -4, -4, 0, -2, -2, -4)), stringsAsFactors = F)

    # Compute the summary statistics
    summary_df <- df %>%
      group_by(category1, category2, category3) %>%
      summarize(mean_metric1 = mean(metric1), sd_metric1 = sd(metric1),
                mean_metric2 = mean(metric2), sd_metric2 = sd(metric2),
                .groups = "drop")

    # Build the cross-tabulation: one set of columns per level of category3
    summary_df %>%
      pivot_wider(names_from = category3,
                  values_from = c(mean_metric1, sd_metric1, mean_metric2, sd_metric2)) %>%
      add_count(category1) %>%        # number of table rows per category1, not of observations
      arrange(category1, category2)   # sort the table; category3 is now spread across columns

    Output as posted (the listing is cut off in the source):

    # A tibble: 18 x 6
      category1 category2 category3 mean_metric1 sd_metric1 mean_metric2 sd_metric2 count
    1         1 A         First            5.778      2.906       -1.556      1.667     9
    2         1 A         Second           6.667      2.309       -2.000      0.000     3
    3         1 B         First            7.200      3.347       -2.000      0.000    12
    4         1 B         Second           3.000      1.414       -2.000      2.828     3
    5         1 C         First            2.571      2.507       -1.143      1.0

    Answered on Dec 19, 2020 at 13:56 - by Gemini

    Upvotes: 0 | Downvotes: 0