Introduction to R

for Environmental Sciences

Ismail SEZEN

Istanbul Technical University

Eurasia Institute of Earth Sciences

December 12, 2023

Resources

How to get help from R?

help.start()
help(lm)
?lm
example(lm)
?help

Homeworks & Data

Notes:

  • Please, email your answers privately to sezenismail at gmail com.
  • Answers and code should be in a text file named hwX_your_full_name.R.
  • For each answer, write a comment line like # A1 and write your answer below.
  • Write your conclusions as R comments (lines starting with #) in the text file.
  • Do not use for-loops unless stated otherwise.

Preliminary

R commands:

  • are case sensitive,
  • can be separated either by a semicolon (‘;’) or by a newline.
  • The hash mark (‘#’) starts a comment. (Are comments required? Absolutely YES.)
# this is a comment
# Define a variable
A = 5; a = 10 # R is case sensitive
print(paste("A is", A))
[1] "A is 5"
print(paste("a is", a))
[1] "a is 10"
cat("A and a are equal? = ", A == a)
A and a are equal? =  FALSE
cat("My name is", Sys.info()["user"])
My name is isezen

SEE ALSO

?print | ?paste | ?cat

Data permanency and removing objects

The entities that R creates and manipulates are known as objects.

Object: variables, arrays of numbers, character strings, functions or more general structures built from such components

ls() # list objects in the current session
[1] "a"               "A"               "has_annotations"
rm(a) # remove object named a
ls() # list again to see what we have
[1] "A"               "has_annotations"
print(A) # print A to the console
[1] 5


When you quit R, you are asked whether to save the workspace. If you indicate that you want to do this, the objects are written to a file called .RData in the current directory, and the command lines used in the session are saved to a file called .Rhistory.

SEE ALSO

?ls | ?rm

Simple Manipulations

  • Assignment
  • Vector arithmetic
  • Generate regular sequences
  • Repeat an object
  • Logical Vectors
  • Missing Values
  • Character vectors
  • Indexing, selecting and modifying

Assignment

Always use the <- (or ->) operator when you intend an assignment.

# Assignment
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))
c(10.4, 5.6, 3.1, 6.4, 21.7) -> x
y <- c(x, 0, x) # c is abbreviation for combine
print(y)
 [1] 10.4  5.6  3.1  6.4 21.7  0.0 10.4  5.6  3.1  6.4 21.7
1/x
[1] 0.09615385 0.17857143 0.32258065 0.15625000 0.04608295

SEE ALSO

?c | ?assign

Vector arithmetic

The elementary arithmetic operators: +, -, *, / and ^

x ^ 2 # take the square
sqrt(y) # square root
x/y
v <- 2*x + y + 1
length(v) # what is the length of v? why?

Shorter vectors in the expression are recycled as often as need be (perhaps fractionally) until they match the length of the longest vector.
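
The recycling rule can be seen on a smaller example. This is a minimal sketch; the vectors long and short are illustrative names, not part of the course data.

```r
# Recycling: the shorter vector is repeated to match the longer one.
long  <- c(1, 2, 3, 4)
short <- c(10, 20)

# short is used twice (10 20 10 20), so no warning is raised
exact <- long + short
print(exact)

# Fractional recycling still works, but R warns because 3 is not a
# multiple of 2; suppressWarnings() only hides that warning here.
fractional <- suppressWarnings(c(1, 2, 3) + short)
print(fractional)
```

This is also why length(v) equals the length of y in the slide above: x is recycled to the length of the longest vector, and R warns because 11 is not a multiple of 5.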

SEE ALSO

??"arithmetic operations" | ?log | ?exp | ?sin | ?cos | ?tan | ?sqrt

Vector arithmetic

Anything else? (max, min, sum, mean, var, sd)

sum(x) # sum of values in x vector
sum(x)/length(x) # calculate mean
mean(x) # easier mean calculation
min(x); max(x)

SEE ALSO

?var | ?sd | ?range | ?sort | ?order | ?mean | ?sum | ?summary | ?abs

Generate regular sequences

5:17
 [1]  5  6  7  8  9 10 11 12 13 14 15 16 17
seq(-5, 5, by = 0.2)
 [1] -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0
[12] -2.8 -2.6 -2.4 -2.2 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8
[23] -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8  1.0  1.2  1.4
[34]  1.6  1.8  2.0  2.2  2.4  2.6  2.8  3.0  3.2  3.4  3.6
[45]  3.8  4.0  4.2  4.4  4.6  4.8  5.0
seq(length = 51, from = -5, by = 0.2)
 [1] -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0
[12] -2.8 -2.6 -2.4 -2.2 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8
[23] -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8  1.0  1.2  1.4
[34]  1.6  1.8  2.0  2.2  2.4  2.6  2.8  3.0  3.2  3.4  3.6
[45]  3.8  4.0  4.2  4.4  4.6  4.8  5.0

SEE ALSO

?seq
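
Two other ways of building sequences are often useful. The sketch below uses only base R; the variable names are illustrative.

```r
# Five equally spaced points between 0 and 1
grid <- seq(0, 1, length.out = 5)
print(grid)

# seq_len(n) is safe when n may be zero; 1:n is not
n <- 0
empty <- seq_len(n)   # integer(0): a loop over it runs zero times
backwards <- 1:n      # counts down from 1 to 0, rarely what is intended
print(empty)
print(backwards)
```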

Repeat an object

rep(x, times = 5)
 [1] 10.4  5.6  3.1  6.4 21.7 10.4  5.6  3.1  6.4 21.7 10.4
[12]  5.6  3.1  6.4 21.7 10.4  5.6  3.1  6.4 21.7 10.4  5.6
[23]  3.1  6.4 21.7
rep(x, each = 5)
 [1] 10.4 10.4 10.4 10.4 10.4  5.6  5.6  5.6  5.6  5.6  3.1
[12]  3.1  3.1  3.1  3.1  6.4  6.4  6.4  6.4  6.4 21.7 21.7
[23] 21.7 21.7 21.7

SEE ALSO

?rep

Logical Vectors

  • The elements of a logical vector can have the values TRUE, FALSE, and NA (for “not available”).
  • Always use TRUE and FALSE, not T and F.
5 > 10
[1] FALSE
x > 13
[1] FALSE FALSE FALSE FALSE  TRUE
as.numeric(x > 13)
[1] 0 0 0 0 1

SEE ALSO

?"Comparison"

Missing Values

When an element or value is “not available” or a “missing value” in the statistical sense, a place within a vector may be reserved for it by assigning it the special value NA.

z <- c(1:3, NA) # a vector contains an NA
is.na(z) # which element(s) of z are NA?
[1] FALSE FALSE FALSE  TRUE
z == NA # wrong way!
[1] NA NA NA NA
0/0 # meaningless
[1] NaN

is.na(xx) is TRUE both for NA and NaN.
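
To tell NA and NaN apart, is.nan can be used next to is.na. A minimal sketch with an illustrative vector:

```r
vals <- c(1, NA, 0/0)

na_flags  <- is.na(vals)    # TRUE for both NA and NaN
nan_flags <- is.nan(vals)   # TRUE only for NaN
print(na_flags)
print(nan_flags)

# Most summary functions return a missing value unless na.rm = TRUE
mean_all <- mean(vals)
mean_ok  <- mean(vals, na.rm = TRUE)  # NA and NaN are both dropped
print(mean_ok)
```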

SEE ALSO

?is.na | ?is.finite

Character vectors

Character vectors are denoted by a sequence of characters delimited by the double quote character.

labs <- paste(c("X","Y"), 1:10, sep = "")
print(labs)
 [1] "X1"  "Y2"  "X3"  "Y4"  "X5"  "Y6"  "X7"  "Y8"  "X9" 
[10] "Y10"

SEE ALSO

?paste | ?paste0

Indexing, selecting and modifying

print(x) # What was x?
[1] 10.4  5.6  3.1  6.4 21.7
x[3] <- NA # set 3rd element of x to NA
print(x)
[1] 10.4  5.6   NA  6.4 21.7
!is.na(x) # The ones that are not NA
[1]  TRUE  TRUE FALSE  TRUE  TRUE
(non_na_x <- x[!is.na(x)]) #non-NA values of x
[1] 10.4  5.6  6.4 21.7

SEE ALSO

?`[[`

Indexing, selecting and modifying

# create a random integer array to
# represent month
set.seed(2) # set seed for reproducibility
month <- round(runif(30, 1, 12))
# get month names
(char_month <- month.abb[month])
 [1] "Mar" "Sep" "Jul" "Mar" "Nov" "Nov" "Feb" "Oct" "Jun"
[10] "Jul" "Jul" "Apr" "Sep" "Mar" "May" "Oct" "Dec" "Mar"
[19] "Jun" "Feb" "Aug" "May" "Oct" "Mar" "May" "Jun" "Mar"
[28] "May" "Dec" "Feb"
which(char_month == "Jun") # Which are June?
[1]  9 19 26
which(month == 6) # same as above
[1]  9 19 26

SEE ALSO

?set.seed | ?runif | ?round | ?which | ?which.max | ?which.min

Indexing, selecting and modifying

(char_month[1:12]) # select first 12 elements
 [1] "Mar" "Sep" "Jul" "Mar" "Nov" "Nov" "Feb" "Oct" "Jun"
[10] "Jul" "Jul" "Apr"
(char_month[-(1:20)]) # exclude first 20
 [1] "Aug" "May" "Oct" "Mar" "May" "Jun" "Mar" "May" "Dec"
[10] "Feb"
# exclude June from vector
(char_month[-which(month == 6)])
 [1] "Mar" "Sep" "Jul" "Mar" "Nov" "Nov" "Feb" "Oct" "Jul"
[10] "Jul" "Apr" "Sep" "Mar" "May" "Oct" "Dec" "Mar" "Feb"
[19] "Aug" "May" "Oct" "Mar" "May" "Mar" "May" "Dec" "Feb"

Indexing, selecting and modifying

# use months and month names together
# set names of month
names(month) <- char_month
print(month)
Mar Sep Jul Mar Nov Nov Feb Oct Jun Jul Jul Apr Sep Mar May 
  3   9   7   3  11  11   2  10   6   7   7   4   9   3   5 
Oct Dec Mar Jun Feb Aug May Oct Mar May Jun Mar May Dec Feb 
 10  12   3   6   2   8   5  10   3   5   6   3   5  12   2 
(month[(month == 6)])
Jun Jun Jun 
  6   6   6 

SEE ALSO

?names | ?colnames | ?rownames
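
Names can also be used directly as an index, with one caveat when a name is repeated. The vector v below is a hypothetical example, not the month vector from the slides.

```r
# Hypothetical named vector with a repeated name
v <- c(a = 1, b = 2, a = 3)

# Indexing by name returns only the first match
first_a <- v["a"]
print(first_a)

# A logical index on names() returns every match
all_a <- v[names(v) == "a"]
print(all_a)
```

For the same reason, a logical condition such as month == 6 is the safer way to pick all June values.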

Complex Data Types (Objects)

  • Atomic Objects
  • Other types of objects
    • Factors
    • Matrices/Arrays
    • Lists
    • Data frames (A combination of matrix and List, but columns can be of different types)

Atomic Objects

  • Atomic objects are all of the same type. (numeric, complex, logical, character …)
(z <- 0:9)
 [1] 0 1 2 3 4 5 6 7 8 9
(digits <- as.character(z))
 [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"

SEE ALSO

?as.numeric | ?as.character | ?as.logical | ?as.matrix

Other types of objects

  • Factors
  • Matrices/Arrays
  • Lists
  • Data frames (A combination of matrix and List, but columns can be of different types)

SEE ALSO

?matrix | ?factor | ?list | ?data.frame

Factors

A factor is a vector object used to specify a discrete classification (grouping) of the components of other vectors of the same length. R provides both ordered and un-ordered factors.

options(digits = 3) # print only 3 digits
set.seed(1) # set seed for reproducibility
# simulate pm10 distribution
pm10 <- 10 ^ rnorm(100, 1.6, 0.27)
# keep only values higher than 40 ug/m^3
pm10 <- pm10[pm10 > 40]
regions <- c("mar", "ege", "kdz", "ica",
             "akd", "dga", "gda")
reg <- factor(
  sample(1:length(regions), length(pm10),
         replace = TRUE))
levels(reg) <- regions
head(data.frame(reg, pm10), 11)
   reg  pm10
1  ica  44.6
2  ege 107.3
3  akd  48.9
4  ege  53.9
5  ege  63.0
6  ege  56.9
7  kdz 101.9
8  mar  50.7
9  ege  80.1
10 kdz  71.6
11 gda  66.3
# means by region
tapply(pm10, reg, mean)
 mar  ege  kdz  ica  akd  dga  gda 
59.5 65.6 65.9 69.6 71.6 69.6 80.2 
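
The example above uses an unordered factor. An ordered factor could be sketched as follows; the category names are hypothetical.

```r
# Hypothetical ordered categories: low < mid < high
level_names <- c("low", "mid", "high")
pollution <- factor(c("low", "high", "mid", "low"),
                    levels = level_names, ordered = TRUE)
print(pollution)

# Ordered factors support comparisons between levels
below_high <- pollution < "high"
print(below_high)

# Counts per level, reported in level order
print(table(pollution))
```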

Arrays

  • Vectors are one dimensional (Single type)
  • Matrices are two dimensional (Single type); data frames are also two dimensional, but their columns can be of different types
  • Arrays can hold more dimensions. (Single type)
arr3d <- array(1:24, dim = c(4, 3, 2),
                dimnames = list(
                  c("one", "two", "three", "four"),
                  c("ein", "zwei", "drei"),
                  c("un", "deux")))
mat <- matrix(1:12, nrow = 4, byrow = TRUE,
              dimnames = list(
                c("one", "two", "three", "four"),
                c("ein", "zwei", "drei")))
class(arr3d); class(mat) # class of object
length(arr3d); length(mat) # length of object
dim(arr3d); dim(mat) # dimensions
nrow(arr3d); nrow(mat) # number of rows
ncol(arr3d); ncol(mat) # number of columns
rownames(arr3d); rownames(mat)
colnames(arr3d); colnames(mat)
dimnames(arr3d); dimnames(mat)

Indexing

Indexing: array[row, column, …]

print(mat)
      ein zwei drei
one     1    2    3
two     4    5    6
three   7    8    9
four   10   11   12
mat[1:2, 2:3]
    zwei drei
one    2    3
two    5    6
mat[,2]
  one   two three  four 
    2     5     8    11 
mat[,"zwei"]
  one   two three  four 
    2     5     8    11 
class(mat[,2])
[1] "integer"
class(mat[,2, drop = FALSE])
[1] "matrix" "array" 
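
The same indexing pattern extends to arrays with more dimensions. A minimal sketch on an unnamed 4 x 3 x 2 array (arr is an illustrative name):

```r
arr <- array(1:24, dim = c(4, 3, 2))

# One element: row 2, column 2, layer 2
one_value <- arr[2, 2, 2]
print(one_value)  # 18

# An empty index keeps the whole dimension: first layer as a 4 x 3 matrix
first_slice <- arr[, , 1]
print(dim(first_slice))

# drop = FALSE keeps dimensions of length one
kept <- arr[, 2, 1, drop = FALSE]
print(dim(kept))
```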

Combining

# Example matrix
mat2 <- matrix(seq.int(2, 24, 2), nrow = 4,
               dimnames = list(
                 c("five", "six", "seven", "eight"),
                 c("vier", "funf", "sechs")))
rbind(mat, mat2)
      ein zwei drei
one     1    2    3
two     4    5    6
three   7    8    9
four   10   11   12
five    2   10   18
six     4   12   20
seven   6   14   22
eight   8   16   24
cbind(mat, mat2)
      ein zwei drei vier funf sechs
one     1    2    3    2   10    18
two     4    5    6    4   12    20
three   7    8    9    6   14    22
four   10   11   12    8   16    24

SEE ALSO

?rbind | ?cbind | ?c

Arithmetic

mat
      ein zwei drei
one     1    2    3
two     4    5    6
three   7    8    9
four   10   11   12
mat2
      vier funf sechs
five     2   10    18
six      4   12    20
seven    6   14    22
eight    8   16    24
mat + mat2
      ein zwei drei
one     3   12   21
two     8   17   26
three  13   22   31
four   18   27   36
mat %*% c(1,2,3) # matrix multiplication
      [,1]
one     14
two     32
three   50
four    68
mat / mat2
       ein  zwei  drei
one   0.50 0.200 0.167
two   1.00 0.417 0.300
three 1.17 0.571 0.409
four  1.25 0.688 0.500
diag(mat)
[1] 1 5 9
diag(mat) <- 0 # set diagonal to zero
mat
      ein zwei drei
one     0    2    3
two     4    0    6
three   7    8    0
four   10   11   12

SEE ALSO

?t | ?aperm | ?diag | ?`%*%` | ?`%o%`
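
Note that +, / and * work element by element, while %*% is the matrix product. A small sketch with an illustrative 2 x 2 matrix:

```r
m <- matrix(1:4, nrow = 2)  # filled column by column: rows are (1, 3) and (2, 4)

elementwise <- m * m    # each element squared
matprod     <- m %*% m  # true matrix product
transposed  <- t(m)     # rows and columns swapped

print(elementwise)
print(matprod)
print(transposed)
```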

Lists

  • An object consisting of an ordered collection of objects
  • There is no particular need for the components to be of the same mode or type
# Define/create a list object
Lst <- list(name = "John",
            wife = "Mary",
            no.children = 3,
            child.ages = c(4,7,9))
str(Lst)
List of 4
 $ name       : chr "John"
 $ wife       : chr "Mary"
 $ no.children: num 3
 $ child.ages : num [1:3] 4 7 9
Lst$name # equal to Lst[[1]]
[1] "John"
Lst[[4]] # equal to Lst$child.ages
[1] 4 7 9
Lst[["wife"]] # same as Lst$wife
[1] "Mary"
names(Lst)
[1] "name"        "wife"        "no.children" "child.ages" 
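
Single and double brackets behave differently on lists, and components can be added, changed or removed in place. The list person and the city component below are hypothetical.

```r
person <- list(name = "John", child.ages = c(4, 7, 9))

sub_list <- person["name"]    # single bracket: a list of length 1
element  <- person[["name"]]  # double bracket: the component itself
print(class(sub_list))
print(class(element))

person$city <- "Istanbul"     # add a component (hypothetical value)
person$child.ages[2] <- 8     # modify part of a component
person$city <- NULL           # remove a component
str(person)
```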

Constructing and modifying lists

Lst <- list(pm10 = pm10, region = reg)
str(Lst)
List of 2
 $ pm10  : num [1:53] 44.6 107.3 48.9 53.9 63 ...
 $ region: Factor w/ 7 levels "mar","ege","kdz",..: 4 2 5 2 2 2 3 1 2 3 ...
head(as.data.frame(Lst))
   pm10 region
1  44.6    ica
2 107.3    ege
3  48.9    akd
4  53.9    ege
5  63.0    ege
6  56.9    ege

SEE ALSO

?list | ?str | ?as.data.frame

Concatenating lists

list.A <- list(name = "John", married = TRUE,
               child.count = 3)
list.B <- list(name = "Jenny", married = FALSE)
# Combine 2 lists
Lst <- c(list.A, list.B)
str(Lst)
List of 5
 $ name       : chr "John"
 $ married    : logi TRUE
 $ child.count: num 3
 $ name       : chr "Jenny"
 $ married    : logi FALSE
head(as.data.frame(Lst))
  name married child.count name.1 married.1
1 John    TRUE           3  Jenny     FALSE
Lst$name # which one? John or Jenny
[1] "John"

Data frames

# Create a data.frame object
df <- data.frame(pm10 = pm10, region = reg)
str(df)
'data.frame':   53 obs. of  2 variables:
 $ pm10  : num  44.6 107.3 48.9 53.9 63 ...
 $ region: Factor w/ 7 levels "mar","ege","kdz",..: 4 2 5 2 2 2 3 1 2 3 ...
head(df)
   pm10 region
1  44.6    ica
2 107.3    ege
3  48.9    akd
4  53.9    ege
5  63.0    ege
6  56.9    ege

SEE ALSO

?list | ?str | ?as.data.frame
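
Typical first steps with a data frame are selecting rows, adding a column and summarising by group. The sketch below uses a small hypothetical data frame, obs, with the same column names as df.

```r
# Hypothetical observations with the same columns as df
obs <- data.frame(pm10 = c(44.6, 107.3, 48.9, 53.9),
                  region = c("ica", "ege", "akd", "ege"))

# Rows where pm10 exceeds 50
high <- obs[obs$pm10 > 50, ]
print(high)

# Add a logical column
obs$exceeds <- obs$pm10 > 50

# Mean pm10 per region, returned as a data frame
means <- aggregate(pm10 ~ region, data = obs, FUN = mean)
print(means)
```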

Reading data from files

  • read.table
  • read.csv
  • Best Practices

read.table/I

# Read data from text file
dt <- read.table("data.txt")
class(dt)
[1] "data.frame"

str(dt)
'data.frame':   9 obs. of  5 variables:
 $ V1: int  100 200 300 400 500 600 700 800 900
 $ V2: chr  "a1" "a2" "a3" "a4" ...
 $ V3: chr  "b1" "b2" "b3" "b4" ...
 $ V4: logi  TRUE TRUE FALSE FALSE FALSE TRUE ...
 $ V5: chr  "x" "x" "x" "y" ...
head(dt)
   V1 V2 V3    V4 V5
1 100 a1 b1  TRUE  x
2 200 a2 b2  TRUE  x
3 300 a3 b3 FALSE  x
4 400 a4 b4 FALSE  y
5 500 a5 b5 FALSE  y
6 600 a6 b6  TRUE  y

SEE ALSO

?read.table

read.table/II

dt <- read.table("data.txt")
dt2 <- read.table("data.txt",
                  stringsAsFactors = TRUE)
str(dt)
'data.frame':   9 obs. of  5 variables:
 $ V1: int  100 200 300 400 500 600 700 800 900
 $ V2: chr  "a1" "a2" "a3" "a4" ...
 $ V3: chr  "b1" "b2" "b3" "b4" ...
 $ V4: logi  TRUE TRUE FALSE FALSE FALSE TRUE ...
 $ V5: chr  "x" "x" "x" "y" ...
head(dt)
   V1 V2 V3    V4 V5
1 100 a1 b1  TRUE  x
2 200 a2 b2  TRUE  x
3 300 a3 b3 FALSE  x
4 400 a4 b4 FALSE  y
5 500 a5 b5 FALSE  y
6 600 a6 b6  TRUE  y
str(dt2)
'data.frame':   9 obs. of  5 variables:
 $ V1: int  100 200 300 400 500 600 700 800 900
 $ V2: Factor w/ 9 levels "a1","a2","a3",..: 1 2 3 4 5 6 7 8 9
 $ V3: Factor w/ 9 levels "b1","b2","b3",..: 1 2 3 4 5 6 7 8 9
 $ V4: logi  TRUE TRUE FALSE FALSE FALSE TRUE ...
 $ V5: Factor w/ 3 levels "x","y","z": 1 1 1 2 2 2 1 3 3
head(dt2)
   V1 V2 V3    V4 V5
1 100 a1 b1  TRUE  x
2 200 a2 b2  TRUE  x
3 300 a3 b3 FALSE  x
4 400 a4 b4 FALSE  y
5 500 a5 b5 FALSE  y
6 600 a6 b6  TRUE  y

SEE ALSO

?read.table | ?factor
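
When a file has a header line, header = TRUE gives meaningful column names instead of V1, V2, and so on. The sketch below writes a small temporary file first; its content and column names are made up for illustration.

```r
# Write a small hypothetical file with a header line
tmp <- tempfile(fileext = ".txt")
writeLines(c("id code flag",
             "100 a1 TRUE",
             "200 a2 FALSE"), tmp)

# header = TRUE uses the first line as column names;
# colClasses fixes the type of each column up front
dt_named <- read.table(tmp, header = TRUE,
                       colClasses = c("integer", "character", "logical"))
str(dt_named)

unlink(tmp)  # remove the temporary file
```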

read.csv/I

# read csv file as a data.frame
dt.pm10 <- read.csv("pm10.csv", sep = ";")
# or use read.csv2
class(dt.pm10) # "data.frame"
head(dt.pm10)
                 Date sta1 sta2 sta3 sta4 sta5
1 2008-01-01 00:00:00   NA 36.6 56.9   NA 51.6
2 2008-01-01 01:00:00   NA 30.5 45.8   NA 40.4
3 2008-01-01 02:00:00   NA 33.3 25.3   NA 78.9
4 2008-01-01 03:00:00   NA   NA 20.4   NA 39.4
5 2008-01-01 04:00:00   NA 35.0 35.1   NA 54.6
6 2008-01-01 05:00:00 18.1 29.5 23.7   NA 24.3
str(dt.pm10)
'data.frame':   43848 obs. of  6 variables:
 $ Date: chr  "2008-01-01 00:00:00" "2008-01-01 01:00:00" "2008-01-01 02:00:00" "2008-01-01 03:00:00" ...
 $ sta1: num  NA NA NA NA NA 18.1 NA 13.1 NA 28.2 ...
 $ sta2: num  36.6 30.5 33.3 NA 35 29.5 17 39.8 43.5 66.5 ...
 $ sta3: num  56.9 45.8 25.3 20.4 35.1 23.7 44 47.2 NA 38.4 ...
 $ sta4: num  NA NA NA NA NA NA NA NA NA NA ...
 $ sta5: num  51.6 40.4 78.9 39.4 54.6 24.3 16.8 NA 49.7 20.3 ...

SEE ALSO

?read.csv | ?read.csv2

read.csv/II

Set the column classes first. But note that we lost the time information. Why?

# Sys.setenv(TZ='GMT')
dt.pm10 <- read.csv("pm10.csv", sep = ";",
               colClasses = c("POSIXct", "numeric", "numeric",
                              "numeric", "numeric", "numeric"))
str(dt.pm10)
'data.frame':   43848 obs. of  6 variables:
 $ Date: POSIXct, format: "2008-01-01" ...
 $ sta1: num  NA NA NA NA NA 18.1 NA 13.1 NA 28.2 ...
 $ sta2: num  36.6 30.5 33.3 NA 35 29.5 17 39.8 43.5 66.5 ...
 $ sta3: num  56.9 45.8 25.3 20.4 35.1 23.7 44 47.2 NA 38.4 ...
 $ sta4: num  NA NA NA NA NA NA NA NA NA NA ...
 $ sta5: num  51.6 40.4 78.9 39.4 54.6 24.3 16.8 NA 49.7 20.3 ...
head(dt.pm10)
        Date sta1 sta2 sta3 sta4 sta5
1 2008-01-01   NA 36.6 56.9   NA 51.6
2 2008-01-01   NA 30.5 45.8   NA 40.4
3 2008-01-01   NA 33.3 25.3   NA 78.9
4 2008-01-01   NA   NA 20.4   NA 39.4
5 2008-01-01   NA 35.0 35.1   NA 54.6
6 2008-01-01 18.1 29.5 23.7   NA 24.3

read.csv/III

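Because the dates are parsed in the local time zone, not in UTC, daylight saving is a problem: a clock time skipped at the spring transition does not exist locally, so the full date-time format can fail and only the date is kept. Setting the time zone to GMT before reading avoids this.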

Sys.setenv(TZ='GMT')
dt.pm10 <- read.csv("pm10.csv", sep = ";",
               colClasses = c("POSIXct", "numeric", "numeric",
                              "numeric", "numeric", "numeric"))
# Sys.setenv(TZ='EET')
head(dt.pm10)
                 Date sta1 sta2 sta3 sta4 sta5
1 2008-01-01 00:00:00   NA 36.6 56.9   NA 51.6
2 2008-01-01 01:00:00   NA 30.5 45.8   NA 40.4
3 2008-01-01 02:00:00   NA 33.3 25.3   NA 78.9
4 2008-01-01 03:00:00   NA   NA 20.4   NA 39.4
5 2008-01-01 04:00:00   NA 35.0 35.1   NA 54.6
6 2008-01-01 05:00:00 18.1 29.5 23.7   NA 24.3
dt.pm10$Date[2135:2140]
[1] "2008-03-29 22:00:00 GMT" "2008-03-29 23:00:00 GMT"
[3] "2008-03-30 00:00:00 GMT" "2008-03-30 01:00:00 GMT"
[5] "2008-03-30 02:00:00 GMT" "2008-03-30 03:00:00 GMT"
Sys.setenv(TZ = 'EET')
dt.pm10$Date[2135:2140]
[1] "2008-03-30 00:00:00 EET"  "2008-03-30 01:00:00 EET" 
[3] "2008-03-30 02:00:00 EET"  "2008-03-30 04:00:00 EEST"
[5] "2008-03-30 05:00:00 EEST" "2008-03-30 06:00:00 EEST"

read.csv/IV

Another approach to date time objects

dt.pm10 <- read.csv("pm10.csv", sep = ";")
# Date was read as character; convert it to a date-time object
dt.pm10$Date <- strptime(dt.pm10$Date, "%Y-%m-%d %H:%M:%S")
head(dt.pm10)
                 Date sta1 sta2 sta3 sta4 sta5
1 2008-01-01 00:00:00   NA 36.6 56.9   NA 51.6
2 2008-01-01 01:00:00   NA 30.5 45.8   NA 40.4
3 2008-01-01 02:00:00   NA 33.3 25.3   NA 78.9
4 2008-01-01 03:00:00   NA   NA 20.4   NA 39.4
5 2008-01-01 04:00:00   NA 35.0 35.1   NA 54.6
6 2008-01-01 05:00:00 18.1 29.5 23.7   NA 24.3
dt.pm10$Date[2135:2141]
[1] "2008-03-29 22:00:00 EET"  "2008-03-29 23:00:00 EET" 
[3] "2008-03-30 00:00:00 EET"  "2008-03-30 01:00:00 EET" 
[5] "2008-03-30 02:00:00 EET"  "2008-03-30 03:00:00"     
[7] "2008-03-30 04:00:00 EEST"

This time you will lose timezone information at daylight saving transitions.
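
A third option, sketched here under the assumption that the timestamps are meant to be UTC, is to pass the time zone explicitly to as.POSIXct instead of changing TZ for the whole session. The three timestamps are illustrative.

```r
# Timestamps around the 2008 spring transition, as character
stamps <- c("2008-03-30 02:00:00",
            "2008-03-30 03:00:00",
            "2008-03-30 04:00:00")

# Parse with an explicit format and time zone
utc <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
print(utc)

# In UTC there is no daylight saving, so every step is one hour
steps <- as.numeric(diff(utc), units = "hours")
print(steps)
```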

Best practices

  • Do not struggle with Excel files. Save them as .csv, then read them.
  • Organize your csv file(s) before reading them.
  • Try to fix all possible errors.
  • Convert your date-time information to "%Y-%m-%d %H:%M:%S" format.
  • Save data as an .rds file and load it with the readRDS function.
  • If you need to read multiple similar files, it is best to write your own function to read them.
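
The .rds round trip mentioned above could look like this. It is a minimal sketch with a hypothetical two-row data frame and a temporary file.

```r
# Hypothetical data with a date-time column
obs <- data.frame(
  Date = as.POSIXct(c("2008-01-01 00:00:00", "2008-01-01 01:00:00"),
                    tz = "UTC"),
  sta1 = c(NA, 18.1))

rds_file <- tempfile(fileext = ".rds")
saveRDS(obs, rds_file)          # write one R object to disk
restored <- readRDS(rds_file)   # read it back, classes included

same <- isTRUE(all.equal(obs, restored))
print(same)
str(restored)

unlink(rds_file)  # remove the temporary file
```

Unlike a csv file, the .rds file keeps column classes such as POSIXct, so no conversion is needed after loading.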

R Packages

  1. Overview
  2. Installing and loading new R packages
  3. Some popular R packages
  4. Writing your own R package

Overview

An R package is essentially a collection of R functions, compiled code, and sample data. It adds extra functionality to R and allows users to perform a wide range of tasks. Packages are stored in repositories like CRAN (Comprehensive R Archive Network), where users can download and install them as needed.

  1. Functions and Datasets: Packages usually contain a set of functions to perform specific tasks and sometimes include datasets for examples and testing.

  2. Documentation: Each package comes with documentation explaining how to use the functions and data it contains.

  3. Compiled Code: Some packages include compiled code written in languages like C, C++, or FORTRAN for more efficient computation.

  4. Vignettes: Many packages include vignettes, which are detailed guides and tutorials on how to use the package.

  5. Dependencies: Packages can depend on other packages, meaning they require certain other packages to be installed to function properly.

Installing new R packages

R has plenty of packages.

install.packages("rpart")
install.packages(c("ggplot2", "partykit")) # several packages: pass a character vector
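
An installed package still has to be loaded before use. The sketch below checks for a package first; is_installed is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: TRUE if the package can be loaded
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)

if (is_installed("rpart")) {
  library(rpart)  # attach the package for this session
} else {
  message("rpart is not installed; run install.packages(\"rpart\") first")
}
```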

Loading data from other R packages

Almost all packages come with their own sample data.

data(package="rpart")
data(Puromycin, package="datasets")
?airquality
edit(airquality) # edit data if you need

If a package has been attached by library, its datasets are automatically included in the search.
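
As a short illustration, the airquality data set from the datasets package can be loaded and inspected like this:

```r
# Load the data set explicitly (datasets is attached by default,
# so airquality is usually available even without this call)
data("airquality", package = "datasets")

dims <- dim(airquality)   # rows and columns
print(dims)
head(airquality)

# Ozone contains missing values, so na.rm = TRUE is needed
ozone_mean <- mean(airquality$Ozone, na.rm = TRUE)
print(ozone_mean)
```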

SEE ALSO

?data | ?save | ?dput | ?saveRDS | ?edit

Data Manipulation and Analysis

  • dplyr - A grammar of data manipulation, focused on tools for working with data frames.
  • plyr - Tools for splitting, applying, and combining data.
  • tidyr - Tools for tidying data: turning messy datasets into structured ones.
  • reshape2 - Flexibly reshaping and pivoting data.
  • data.table - Extension of data.frame for fast aggregation and manipulation.
  • stringr - Consistent tools for working with strings (i.e., character vectors).
  • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%.
  • zoo - Functions for time-indexed data.
  • Matrix - Sparse and dense matrix classes and methods.

Data Manipulation and Analysis

  • survival - Contains the core survival analysis routines.
  • e1071 - Functions for latent class analysis, short time Fourier transform, fuzzy clustering, etc.
  • quantmod - Quantitative Financial Modelling Framework.
  • tm - Text Mining package.
  • rvest - Easily scrape (or harvest) web data.
  • jsonlite - A robust and quick way to parse JSON files and APIs.
  • Hmisc - Harrell miscellaneous, many functions for data analysis.
  • haven - Import and export ‘SPSS’, ‘Stata’ and ‘SAS’ files.
  • lubridate - Functions to work with dates and times.
  • sqldf - Perform SQL selects on R data frames.

Data Manipulation and Analysis

  • readxl - Read Excel files (.xls and .xlsx).
  • DBI - Defines a common interface between the R and database management systems.
  • broom - Converts statistical objects into tidy data frames.
  • forcats - Tools for working with categorical variables (factors).
  • modelr - Functions for modeling that work well with the pipe.
  • tidytext - Text mining using dplyr, ggplot2, and other tidy tools.

Graphics and Visualization

  • ggplot2 - A system for declaratively creating graphics, based on The Grammar of Graphics.
  • leaflet - Create interactive web maps with the JavaScript ‘Leaflet’ library.
  • plotly - Create interactive web graphics via ‘plotly.js’.
  • scales - Graphical scales map data to aesthetics.
  • gridExtra - Provides functions in addition to the grid package.
  • lattice - High-level data visualization system inspired by Trellis graphics.
  • ggvis - Interactive, web-based graphics built with the grammar of graphics.

Graphics and Visualization

  • highcharter - A wrapper for the ‘Highcharts’ library.
  • dygraphs - Interface to ‘Dygraphs’ interactive time series charting library.
  • rgl - 3D visualization using OpenGL.
  • ggmap - Spatial visualization with Google Maps and OpenStreetMap.
  • ggraph - Creates graphs based on the grammar of graphics.
  • plotrix - Various plotting functions.
  • GGally - Extension of ggplot2 to facilitate plot creation.

Statistical and Machine Learning

  • caret - Classification And REgression Training: tools for data splitting, pre-processing, feature selection, etc.
  • randomForest - Classification and regression based on a forest of trees using random inputs.
  • glmnet - Lasso and elastic-net regularized generalized linear models.
  • nnet - Feed-forward neural networks and multinomial log-linear models.
  • MASS - Functions and datasets to support the book Modern Applied Statistics with S.
  • xgboost - Extreme Gradient Boosting, which is an efficient implementation of gradient boosting framework.

Statistical and Machine Learning

  • lme4 - Linear and nonlinear mixed effects models.
  • survminer - Drawing survival curves using ‘ggplot2’.
  • party - Recursive PARTYtioning for classification and regression trees.
  • mboost - Model-Based Boosting.
  • brms - Bayesian regression models using ‘Stan’.
  • easystats - Collection of tools for statistical analysis.

Spatial and Time Series Data

  • sp - Classes and methods for spatial data.
  • rgdal - Bindings for the ‘Geospatial’ Data Abstraction Library.
  • raster - Geographic data analysis and modeling.
  • xts - eXtensible Time Series.
  • forecast - Forecasting functions for time series and linear models.
  • sf - Simple features for handling spatial objects.
  • tmap - Thematic maps.
  • geosphere - Spherical trigonometry for geographic applications.
  • stargazer - Well-formatted regression and summary statistics tables.

Web Technologies and APIs

  • shiny - Web Application Framework for R.
  • httr - Tools for working with URLs and HTTP.
  • curl - A modern and flexible web client for R.
  • XML - Tools for parsing and generating XML within R.
  • jsonlite - JSON parser/generator.
  • RCurl - General network (HTTP/FTP/…) client interface for R.
  • plumber - Enables you to create a web API by merely decorating your existing R source code.
  • googleVis - Interface between R and the Google Chart Tools.

Data Import/Export

  • readr - Read rectangular data.
  • Rcpp - Seamless R and C++ Integration.
  • rJava - Low-level R to Java interface.
  • RODBC - ODBC Database Access.
  • xlsx - Read, write, format Excel 2007 and Excel 97/2000/XP/2003 files.
  • openxlsx - Simplifies the creation of .xlsx files.

Programming Tools

  • devtools - Tools to make developing R packages easier.
  • roxygen2 - In-line documentation for R.
  • testthat - Unit testing for R.
  • purrr - A functional programming toolkit for R.
  • tibble - A modern reimagining of the data frame.
  • stringi - Character string processing facilities.
  • usethis - Automates repetitive tasks that arise during project setup.
  • rlang - Functions for base types and core R and ‘Tidyverse’ features.
  • pkgdown - Build static html documentation for an R package.
  • covr - Test coverage reports for R.
  • profvis - Interactive visualizations to understand how R spends its time.

Reporting

  • rmarkdown - Dynamic documents for R; converts R Markdown documents into a variety of formats.
  • knitr - General-purpose tool for dynamic report generation in R.

Loops and control flow

  • Conditional execution: if statements
  • for loops, repeat and while

Conditional execution: if statements

if (expr_1) expr_2 else expr_3

The “short-circuit” operators && and || are often used as part of the condition in an if statement. Whereas & and | apply element-wise to vectors, && and || apply to vectors of length one, and only evaluate their second argument if necessary.
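
The difference between the two families of operators can be sketched as follows; all variable names are illustrative.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

# & and | compare element by element and return a vector
elementwise_and <- a & b
elementwise_or  <- a | b
print(elementwise_and)
print(elementwise_or)

# && stops as soon as the result is known: v[1] > 0 is never
# evaluated here because the first condition is already FALSE
v <- NULL
first_is_positive <- !is.null(v) && v[1] > 0
print(first_is_positive)
```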

age <- 12
if (age < 13) {
  print("Watch this with your Mom")
} else {
  print("Enjoy the movie!")
}
[1] "Watch this with your Mom"
x <- 0
if (x > 0) {
  print("Positive Number")
} else if (x < 0) {
  print("Negative Number")
} else {
  print("Zero")
}
[1] "Zero"
age <- 21
print(ifelse(age < 13, "Watch this with your Mom", "Enjoy the movie!"))
[1] "Enjoy the movie!"

SEE ALSO

?`if`; ?ifelse

More if-else examples/I

marks <- 75
if (marks >= 90) {
  grade <- "A"
} else {
  if (marks >= 80) {
    grade <- "B"
  } else {
    if (marks >= 70) {
      grade <- "C"
    } else {
      grade <- "F"
    }
  }
}
print(paste("Grade:", grade))
[1] "Grade: C"
student_marks <- c(50, 76, 90, 40, 85)
pass_fail <- ifelse(student_marks >= 60, "Pass", "Fail")
print(pass_fail)
[1] "Fail" "Pass" "Pass" "Fail" "Pass"
year <- 2020
if ((year %% 4 == 0 && year %% 100 != 0)
    || year %% 400 == 0) {
  print(paste(year, "is a leap year"))
} else {
  print(paste(year, "is not a leap year"))
}
[1] "2020 is a leap year"

More if-else examples/II

# Define a function to categorize temperature
cat_temp <- function(temp) {
  if (temp < 0) {
    return("Freezing")
  } else if (temp >= 0 && temp < 10) {
    return("Cold")
  } else if (temp >= 10 && temp < 20) {
    return("Mild")
  } else if (temp >= 20 && temp < 30) {
    return("Warm")
  } else {
    return("Hot")
  }
}

temperature <- 22
weather_condition <- cat_temp(temperature)
print(paste("Weather condition:", weather_condition))
[1] "Weather condition: Warm"
# Define a function to classify age
classify_age <- function(age) {
  result <- if (age < 18) {
    "Minor"
  } else if (age >= 18 && age <= 65) {
    if (age < 30) {
      "Young Adult"
    } else if (age <= 50) {
      "Adult"
    } else {
      "Senior Adult"
    }
  } else {
    "Elderly"
  }
  return(result)
}

age_groups <- sapply(c(15, 25, 45, 70),
                     classify_age)
print(age_groups)
[1] "Minor"       "Young Adult" "Adult"       "Elderly"    

More if-else examples/III

# Define a function to classify age
age_class <- function(age) {
  if (age < 18) {
    "Minor"
  } else if (age >= 18 && age <= 65) {
    if (age < 30) {
      "Young Adult"
    } else if (age <= 50) {
      "Adult"
    } else {
      "Senior Adult"
    }
  } else {
    "Elderly"
  }
}

classify_age1 <- Vectorize(age_class)
ages <- c(15, 25, 45, 70)
age_groups <- classify_age1(ages)
print(age_groups)
[1] "Minor"       "Young Adult" "Adult"       "Elderly"    
# Define a function to classify age
classify_age2 <- function(age_vector) {
  age_breaks <- c(-Inf, 18, 30, 50, 65, Inf)
  age_labels <- c("Minor", "Young Adult",
                  "Adult", "Senior Adult",
                  "Elderly")
  
  age_categories <- cut(
    age_vector,
    age_breaks,
    age_labels, right = FALSE)
  return(age_categories)
}

ages <- c(15, 25, 45, 70)
age_groups <- classify_age2(ages)
print(age_groups)
[1] Minor       Young Adult Adult       Elderly    
Levels: Minor Young Adult Adult Senior Adult Elderly
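
The right argument of cut decides which end of each interval is closed. A small sketch with illustrative breaks and values:

```r
breaks <- c(0, 10, 20)
values <- c(0, 10, 19.9)

# right = FALSE: intervals are closed on the left, [0,10) and [10,20)
left_closed <- cut(values, breaks, right = FALSE)
print(left_closed)

# default (right = TRUE): intervals are (0,10] and (10,20],
# so the value 0 falls outside and becomes NA
right_closed <- cut(values, breaks)
print(right_closed)
```

With right = FALSE, classify_age2 places an age of exactly 50 in "Senior Adult" and 65 in "Elderly", while the if-else versions use age <= 50 and age <= 65, so the two approaches can disagree at those boundaries.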

More if-else examples/IV

ages <- c(15, 25, 45, 70)
microbenchmark::microbenchmark(
  classify_age1(ages),
  classify_age2(ages),
  times = 10000
)
Unit: microseconds
                expr  min   lq mean median   uq  max neval
 classify_age1(ages) 13.3 13.9 15.8   14.2 14.6 1245 10000
 classify_age2(ages) 20.6 21.4 22.7   21.6 22.0 1252 10000
set.seed(123) # Set seed for reproducibility
ages <- runif(1000, 0, 100)
microbenchmark::microbenchmark(
  classify_age1(ages),
  classify_age2(ages),
  times = 1000
)
Unit: microseconds
                expr   min    lq  mean median    uq    max
 classify_age1(ages) 375.3 390.7 423.7  394.7 399.4 2681.8
 classify_age2(ages)  42.8  44.5  46.9   45.4  47.7   89.1
 neval
  1000
  1000

for loops, repeat and while

for (name in expr_1) expr_2
repeat expr_2
while (condition) expr_2
  • expr_1 is a vector expression (often a sequence like 1:20), and expr_2 is often a grouped expression with its sub-expressions written in terms of the dummy name. expr_2 is repeatedly evaluated as name ranges through the values in the vector result of expr_1.
  • The only way to terminate a repeat loop is break.
  • The next statement can be used to discontinue one particular cycle and skip to the “next”.

WARNING: AVOID FOR-LOOP STATEMENTS AS MUCH AS POSSIBLE.

SEE ALSO

?`for`; ?`repeat`; ?`while`

for loops, repeat and while

Calculate factorial for n = 5.

n <- 5
# Factorial calculation
# with for-loop
result <- 1
for (i in 2:n) {
  result <- result * i
}
print(result)
[1] 120
# Factorial calculation
# with while
result <- 1
i <- 2
while (i <= n) {
  result <- result * i
  i <- i + 1
}
print(result)
[1] 120
# Factorial calculation
# with repeat
result <- 1
i <- 2
repeat {
  if (i > n) {
    break
  }
  result <- result * i
  i <- i + 1
}
print(result)
[1] 120

for-loop

n <- 5
sum_of_squares <- 0
for (i in 1:n) {
  sum_of_squares <- sum_of_squares + i^2
}
print(sum_of_squares)
[1] 55
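
In line with the warning about for-loops, the same results can usually be obtained with vectorized functions. A minimal sketch for the sum of squares and the factorial:

```r
n <- 5

# Sum of squares without a loop
sum_of_squares <- sum((1:n)^2)
print(sum_of_squares)  # 55

# Factorial without a loop
fact <- prod(1:n)
print(fact)            # 120

# Base R also provides factorial() directly
fact_builtin <- factorial(n)
print(fact_builtin)    # 120
```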

while

set.seed(123) # Set seed for reproducibility
count <- 0
roll <- 0
while (roll != 6) {
  roll <- sample(1:6, 1)  # Simulate rolling a die
  count <- count + 1
}
cat("I found", roll, "at roll", count, "\n")
I found 6 at roll 2 

repeat

divisible_by <- 7
num <- 8
repeat {
  if (num %% divisible_by == 0) {
    print(num)
    break
  }
  num <- num + 1
}
[1] 14

for loops, repeat and while

for (i in 2:3) {
  plot(dt.pm10[,1], dt.pm10[,i], type = "l", main = paste0("plot", i))
}
i <- 2
repeat {
  plot(dt.pm10[,1], dt.pm10[,i], type = "l", main = paste0("plot", i))
  i <- i + 1
  if (i > 3) break # stop after column 3, like the for loop above
}
i <- 2
while (i < 4) {
  plot(dt.pm10[,1], dt.pm10[,i], type = "l", main = paste0("plot", i))
  i <- i + 1
}
for (i in 2:4) {
  if (i == 3) next
  plot(dt.pm10[,1], dt.pm10[,i], type = "l", main = paste0("plot", i))
}

Writing your own functions

  • What is a function?
  • Scope
  • Calling functions in other functions
  • Fibonacci function
  • Prime numbers function

What is a Function?

> name <- function(arg_1, arg_2, ...) expression
> return(value)
  • We have already seen functions.
  • mean, sd, and summary are all functions that ship with R, and they are no different from the functions that you will write.
  • The expression is an R expression (usually a grouped expression) that uses the arguments, arg_i, to calculate a value. The value of the expression is the value returned by the function.
  • A call to the function then usually takes the form name(expr_1, expr_2, ...) and may occur anywhere a function call is legitimate.
# Create a function to calculate the volume of a cake
make_cake <- function(height, radius) pi * (radius ^ 2) * height
cake1 <- make_cake(0.3, 0.5)
cake2 <- make_cake(1, 2)
cat("Volume of cake1 is", cake1, "m^3 and Volume of cake2 is", cake2, "m^3\n")
Volume of cake1 is 0.236 m^3 and Volume of cake2 is 12.6 m^3

Functions

# Create a function to calculate the volume of a cake
make_cake <- function(height = 0.1, radius = 0.5) {
  cake <- pi * (radius ^ 2) * height
  return(cake)
}
cake1 <- make_cake()
cat("Volume of cake1 is", cake1, "m^3\n")
Volume of cake1 is 0.0785 m^3
cake2 <- make_cake(0.2)
cat("Volume of cake2 is", cake2, "m^3\n")
Volume of cake2 is 0.157 m^3
cake3 <- make_cake(radius = 2)
cat("Volume of cake3 is", cake3, "m^3\n")
Volume of cake3 is 1.26 m^3

Scope

What happens if I define the same variable name inside and outside a function?

myfunc <- function() {
  x <- 20
  print(x)
}
x <- 10
print(x)
[1] 10
myfunc()
[1] 20
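
A small sketch of the consequence: an ordinary assignment inside a function creates a local variable and leaves the outer one alone, while the superassignment operator <<- changes the variable in the enclosing environment. The function names local_assign and global_assign are hypothetical.

```r
x <- 10

local_assign <- function() {
  x <- 20   # new local x; the x defined outside is untouched
  x
}

global_assign <- function() {
  x <<- 30  # superassignment: modifies the x found outside the function
  x
}

local_assign()
x_after_local <- x    # still 10
global_assign()
x_after_global <- x   # now 30
cat("after local_assign:", x_after_local,
    "| after global_assign:", x_after_global, "\n")
```

Using <<- makes functions harder to reason about, so it is usually better to return a value and assign it at the call site.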

Calling functions in other functions/I

area_of_rectangle <- function(height = 1, width = 1) {
  area <- height * width
  return(area)
}

area_of_square <- function(height = 1) {
  return(area_of_rectangle(height, height))
}

area_of_triangle <- function(height = 1, width = 1) {
  return(area_of_rectangle(height, width)/2)
}

Calling functions in other functions/II

vol_of_cube <- function(height = 1, width = 1, depth = 1) {
  height * width * depth
}

area_of_rectangle <- function(height = 1, width = 1) {
  vol_of_cube(height, width)
}

area_of_square <- function(height = 1) {
  area_of_rectangle(height, height)
}

area_of_triangle <- function(height = 1, width = 1) {
  area_of_rectangle(height, width)/2
}
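
A short usage sketch of such a call chain. The definitions are repeated in compact form so the block runs on its own; the input values are arbitrary.

```r
vol_of_cube <- function(height = 1, width = 1, depth = 1) height * width * depth
area_of_rectangle <- function(height = 1, width = 1) vol_of_cube(height, width)
area_of_triangle <- function(height = 1, width = 1) area_of_rectangle(height, width) / 2

# area_of_triangle -> area_of_rectangle -> vol_of_cube (depth defaults to 1)
tri <- area_of_triangle(4, 3)
print(tri)
```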

Fibonacci function

Two functions for the Fibonacci sequence: fib builds the sequence with a loop, fib2 uses a closed-form formula.

fib <- function(n, last = TRUE) {
  x <- numeric(n)
  x[1:2] <- c(1, 1)
  # Note: this loop assumes n >= 3; for n < 3, 3:n counts downward
  for (i in 3:n) x[i] <- x[i-1] + x[i-2]
  if (last) x <- x[n]
  x
}

fib2 <- function(n, last = TRUE) {
  x <- if (last) n else 1:n
  round(((5 + sqrt(5)) / 10) * (( 1 + sqrt(5)) / 2) ^ (x - 1))
}

library(microbenchmark)
microbenchmark(fib(30, F), fib2(30, F))
Unit: microseconds
        expr  min   lq  mean median   uq  max neval
  fib(30, F) 2.05 2.13 23.62   2.17 2.25 2137   100
 fib2(30, F) 1.31 1.39  9.44   1.44 1.52  793   100
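
The loop in fib runs over 3:n, which counts downward (3, 2 or 3, 2, 1) when n is smaller than 3. A hedged sketch of a variant that also handles n = 1 and n = 2 is shown below; fib_safe is a hypothetical name, not part of the course code.

```r
fib_safe <- function(n, last = TRUE) {
  stopifnot(length(n) == 1, n >= 1)
  x <- rep(1, n)                    # F(1) = F(2) = 1
  # seq_len(n)[-(1:2)] is empty for n < 3, so the loop body is skipped
  for (i in seq_len(n)[-(1:2)]) {
    x[i] <- x[i - 1] + x[i - 2]
  }
  if (last) x[n] else x
}

fib_safe(1, FALSE)
fib_safe(2, FALSE)
fib_safe(10)
```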

Prime Numbers function/I

A function that determines if a number is prime or not.

is.prime <- function(x) {
  x <- x[1] # make sure length of x is 1
  it.is.prime <- FALSE
  if(x > 1) {
    it.is.prime <- TRUE
    for (i in 2:(x - 1)) {
      if (x %% i == 0) {
        it.is.prime <- FALSE
        break
      }
    }
  }
  if (x == 2) it.is.prime <- TRUE
  return(it.is.prime)
}
is.prime(13)
is.prime(21)
is.prime(19999999)

Prime Numbers function/II

Another function for prime number determination. Which one is faster? is.prime or is.prime2?

is.prime2 <- function(x) {
  x <- x[1] # make sure length of x is 1
  it.is.prime <- FALSE
  if (x > 1) {
    it.is.prime <- TRUE
    i <- 2:(x - 1)
    if (any(x %% i == 0)) {
      it.is.prime <- FALSE
    }
  }
  if (x == 2) it.is.prime <- TRUE
  return(it.is.prime)
}

Prime Numbers function/III

Which one is faster, is.prime or is.prime2? Let's benchmark them.

library(microbenchmark)
x <- 31
microbenchmark(is.prime(x), is.prime2(x))
Unit: nanoseconds
         expr  min   lq  mean median   uq     max neval
  is.prime(x) 2091 2132 29092   2173 2276 2675947   100
 is.prime2(x)  615  656 22658    656  738 2183701   100
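
Both versions test every candidate divisor up to x - 1. A common refinement, which is not part of the slides, is to stop at sqrt(x), since any factor larger than the square root is paired with one smaller than it. The sketch below assumes a single positive whole number as input; is_prime_sqrt is a hypothetical name.

```r
is_prime_sqrt <- function(x) {
  x <- x[1]                          # make sure length of x is 1
  if (x < 2) return(FALSE)
  if (x < 4) return(TRUE)            # 2 and 3 are prime
  if (x %% 2 == 0) return(FALSE)     # even numbers above 2 are not prime
  limit <- floor(sqrt(x))
  if (limit < 3) return(TRUE)        # 5 and 7: no odd divisor left to test
  i <- seq(3, limit, by = 2)         # odd candidate divisors only
  !any(x %% i == 0)
}

primes_to_20 <- which(sapply(1:20, is_prime_sqrt))
print(primes_to_20)
```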

Prime Numbers function/IV

A more interesting function for prime number determination. This time we can test a whole vector of numbers at once.

is.prime3 <- function(x) {
  sapply(x, is.prime2)
}

Number of Primes up to n

Now write another function that counts the primes up to and including n.

nprime <- function(n) {
  sapply(n, function(x) sum(sapply(1:x, is.prime2)))
}
nprime(10)
[1] 4
nprime(11:20)
 [1] 5 5 6 6 6 6 7 7 8 8

Apply Function family

  • Part I
    • apply
    • lapply
    • sapply
    • vapply
    • tapply
    • mapply
  • Part II
    • rapply
    • Map
    • Reduce
    • Filter
    • Find
    • Position
    • sweep
    • Negate

Part I

  1. apply: Used to apply a function to the rows or columns of a matrix or an array.

  2. lapply: Applies a function to each element of a list and returns a list. It is useful when you want to perform an operation on each element of a list and keep the results in a list.

  3. sapply: A user-friendly version of lapply. It applies a function to each element of a list, but tries to simplify the result to a vector or matrix if possible.

  4. vapply: Similar to sapply, but you can specify the type and structure of the output in advance, which makes it safer and can prevent unexpected results or errors.

  5. tapply: Applies a function to subsets of a vector broken down by factors and is particularly useful for data analysis.

  6. mapply: A multivariate version of sapply. It applies a function in parallel over sets of arguments (i.e., it can take multiple vectors/lists as input and apply a function to the corresponding elements of each).
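
A minimal sketch contrasting lapply, sapply and vapply on the same input. The list temps and its values are made-up illustration data.

```r
temps <- list(
  morning = c(12, 14, 13),
  noon    = c(20, 22),
  night   = c(9, 8, 10, 7)
)

as_list   <- lapply(temps, mean)              # always a list
as_vector <- sapply(temps, mean)              # simplified to a named vector
checked   <- vapply(temps, mean, numeric(1))  # same, but output type is enforced

str(as_list)
print(as_vector)
print(checked)
```

All three call mean once per list element; they differ only in how the results are collected.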

apply function/I

apply(X, MARGIN, FUN, ..., simplify = TRUE)

This function is used to apply a function to the rows or columns of a matrix or array.

(m <- matrix(1:12, nrow = 3))
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
# 1 indicates rows, 2 would indicate columns
apply(m, 1, mean)
[1] 5.5 6.5 7.5
apply(m, 2, function(x) sum(x^3))
[1]   36  405 1584 4059

apply function/II

apply(X, MARGIN, FUN, ..., simplify = TRUE)

Apply a complex function (is.prime3) to each column of a matrix.

# or let's employ is.prime3 function
apply(m, 2, is.prime3)
      [,1]  [,2]  [,3]  [,4]
[1,] FALSE FALSE  TRUE FALSE
[2,]  TRUE  TRUE FALSE  TRUE
[3,]  TRUE FALSE FALSE FALSE
# or let's normalize each column
apply(m, 2, function(x) {
  return((x - min(x)) / (max(x) - min(x)))
})
     [,1] [,2] [,3] [,4]
[1,]  0.0  0.0  0.0  0.0
[2,]  0.5  0.5  0.5  0.5
[3,]  1.0  1.0  1.0  1.0

tapply function/I

tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)

This is used to apply a function over subsets of a vector and is particularly useful for data analysis.

numbers <- c(1, 2, 3, 4, 5, 6)
groups <- factor(c('A', 'B', 'A', 'B', 'A', 'B'))
tapply(numbers, groups, mean)
A B 
3 4 
# or let's normalize each group
tapply(numbers, groups, function(x) {
  return((x - min(x)) / (max(x) - min(x)))
})
$A
[1] 0.0 0.5 1.0

$B
[1] 0.0 0.5 1.0

tapply function/II

tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)

A more complex example.

# Create random example pollution data
pollutant <- runif(30)
city <- factor(
  rep(c("Istanbul", "Ankara", "Izmir"), 10))
year <- factor(
  rep(c("2020", "2021", "2022"), each = 10))

air_data <- data.frame(pollutant, city, year)
head(air_data)
  pollutant     city year
1     0.490 Istanbul 2020
2     0.879   Ankara 2020
3     0.813    Izmir 2020
4     0.854 Istanbul 2020
5     0.368   Ankara 2020
6     0.874    Izmir 2020
tapply(air_data$pollutant, 
       list(air_data$city, air_data$year), mean)
          2020  2021  2022
Ankara   0.509 0.832 0.561
Istanbul 0.618 0.162 0.234
Izmir    0.785 0.385 0.348
tapply(pollutant, list(city, year), mean)
          2020  2021  2022
Ankara   0.509 0.832 0.561
Istanbul 0.618 0.162 0.234
Izmir    0.785 0.385 0.348

sapply function/I

sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)

It applies a function over a list or vector and simplifies the result into a vector or matrix. It is a user-friendly version of lapply: by default it returns a vector or matrix when possible, or an array if simplify = "array", by applying simplify2array().

v <- c(1, 4, 9, 16)
sapply(v, sqrt)
[1] 1 2 3 4
# or let's calculate the cube of the square root of each value
sapply(v, function(x) sqrt(x)^3)
[1]  1  8 27 64

sapply function/II

sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
# Create random example pollution data
set.seed(123)  # For reproducibility
city_data <- list(
    Istanbul = data.frame(PM25 = rnorm(10), NO2 = rnorm(10)),
    Ankara = data.frame(PM25 = rnorm(10), NO2 = rnorm(10)),
    Izmir = data.frame(PM25 = rnorm(10), NO2 = rnorm(10))
)
# Let's calculate mean of PM25 for each city
sapply(city_data,
       function(x) mean(x$PM25, na.rm = TRUE))
Istanbul   Ankara    Izmir 
 0.07463 -0.42456 -0.00872 
# or let's calculate mean of each column
sapply(city_data, function(x) colMeans(x))
     Istanbul Ankara    Izmir
PM25   0.0746 -0.425 -0.00872
NO2    0.2086  0.322  0.22169
# or a shorter and tidier way
sapply(city_data, colMeans)
     Istanbul Ankara    Izmir
PM25   0.0746 -0.425 -0.00872
NO2    0.2086  0.322  0.22169

vapply function/I

vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)

This is a safer version of sapply. You specify the type and length of each result in advance (FUN.VALUE), which avoids unexpected results.

v <- c(1, 4, 9, 16)
vapply(v, sqrt, numeric(1))
[1] 1 2 3 4
v <- 1:10
# what are the results?
vapply(v, is.prime2, numeric(1))
 [1] 0 1 1 0 1 0 1 0 0 0
# what are the results?
vapply(v, is.prime2, logical(1))
 [1] FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE
[10] FALSE
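
A hedged sketch of what "safe" means in practice: vapply stops with an error when a result does not match FUN.VALUE, and it keeps a predictable type even for empty input, where sapply falls back to an empty list. The message string and variable names are illustrative.

```r
v <- c(1, 4, 9, 16)

# sqrt() returns numbers, so asking for character(1) is rejected
result <- tryCatch(
  vapply(v, sqrt, character(1)),
  error = function(e) "vapply refused the mismatched type"
)
print(result)

# Empty input: sapply gives list(), vapply gives numeric(0)
empty_s <- sapply(integer(0), sqrt)
empty_v <- vapply(integer(0), sqrt, numeric(1))
str(empty_s)
str(empty_v)
```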

vapply function/II

vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)
# Create random example pollution data
set.seed(123)  # For reproducibility
measurements <- list(
    morning = rnorm(50, mean = 100, sd = 10),
    afternoon = rnorm(60, mean = 120, sd = 15),
    evening = rnorm(40, mean = 90, sd = 20)
)
vapply(measurements,
       function(x) c(min(x),
                     mean(x),
                     median(x),
                     max(x)),
       numeric(4))
     morning afternoon evening
[1,]    80.3      85.4    48.9
[2,]   100.3     120.8    85.6
[3,]    99.3     119.8    84.8
[4,]   121.7     152.8   132.0
vapply(measurements,
       function(x) c(min = min(x),
                     mean = mean(x),
                     median = median(x),
                     max = max(x)),
       numeric(4))
       morning afternoon evening
min       80.3      85.4    48.9
mean     100.3     120.8    85.6
median    99.3     119.8    84.8
max      121.7     152.8   132.0

mapply function/I

mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)

mapply in R is a multivariate version of sapply. It applies a function in parallel over a set of arguments, where "in parallel" means element by element, not concurrently. This is particularly useful when you have several lists or vectors and you want to apply a function to the 1st elements of each, then the 2nd elements, and so on.

# Create random example pollution data
set.seed(123)  # For reproducibility
pm25 <- rnorm(7, mean = 35, sd = 5)  # PM2.5 readings for a week
no2 <- rnorm(7, mean = 50, sd = 10)  # NO2 readings for the same week
so2 <- rnorm(7, mean = 20, sd = 3)   # SO2 readings for the same week
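# let's calculate the daily mean across the three pollutants.
# Note: mapply(mean, pm25, no2, so2) would pass no2 and so2 to the
# trim and na.rm arguments of mean() and simply return pm25,
# so combine the three values with c() first.
mapply(function(p, n, s) mean(c(p, n, s)), pm25, no2, so2)
[1] 29.3 34.1 36.6 37.2 37.1 38.7 35.1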
# let's assume a hypothetical formula for an 
# air quality index
pollution_index <- function(pm25, no2, so2) {
  x <- (pm25 * 0.4 + no2 * 0.3 + so2 * 0.2)
  sqrt(x) / 3
}
mapply(pollution_index, pm25, no2, so2)
[1] 1.76 1.87 1.97 1.99 1.97 2.04 1.93
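
A smaller sketch of the element-by-element pairing, plus the MoreArgs argument for values that stay the same in every call. The clipping function and its limits are made-up illustrations.

```r
# Pairs the i-th elements: rep(1, 3), rep(2, 2), rep(3, 1)
reps <- mapply(rep, 1:3, 3:1)
str(reps)

# MoreArgs holds arguments shared by all calls (here: lower and upper)
clipped <- mapply(
  function(x, lower, upper) min(max(x, lower), upper),
  c(-5, 50, 150),
  MoreArgs = list(lower = 0, upper = 100)
)
print(clipped)
```

Because the three results of rep have different lengths, mapply cannot simplify them and returns a list; the clipped values all have length one and come back as a plain vector.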

mapply function/II

set.seed(123)  # For reproducibility
actuals <- list(
  pm25 = rnorm(10), no2 = rnorm(10), so2 = rnorm(10))
predicted <- list(
  pm25 = rnorm(10), no2 = rnorm(10), so2 = rnorm(10))
# let's calculate RMSE for each pollutant
mapply(function(a, p) {
  sqrt(mean((a - p)^2))
}, actuals, predicted)
 pm25   no2   so2 
0.722 1.607 1.616 

or is there a tidier way?

sqrt(mean((actuals$pm25 - predicted$pm25)^2))
sapply(1:3, function(i) {
  sqrt(mean((actuals[[i]] - predicted[[i]])^2))
})
for (i in 1:3) {
  print(sqrt(mean((actuals[[i]] - predicted[[i]])^2)))
}

Part II

  1. rapply (Recursive Apply): This function is used to apply a function recursively to elements of a deeply nested list or expression at each level or at specific levels.

  2. Map: Similar to mapply (wrapper around mapply with SIMPLIFY = FALSE), but it always returns a list regardless of the output of the function being applied.

  3. Reduce: This function applies a function successively over elements of a vector/list.

  4. Filter: This function is used to filter elements of a list/vector that satisfy a certain condition given by a function.

  5. Find: Similar to Filter, but it returns only the first element of a list/vector that satisfies the condition.

  6. Position: This function is used to find the position of the first or last element of a vector or list that satisfies a certain condition given by a function.

  7. sweep: It “sweeps out” a summary statistic from the rows or columns of an array/matrix (for example, subtracting each column's mean), which makes it useful for data standardization.

  8. Negate: Used to create the negation of a given predicate function.
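
A compact sketch of several of these helpers on one small vector. The data and the predicate is_large are illustrative assumptions.

```r
x <- c(3, 8, 12, 5, 20)
is_large <- function(v) v > 7                    # predicate function

big         <- Filter(is_large, x)               # elements that pass
first_large <- Find(is_large, x)                 # first element that passes
pos_large   <- Position(is_large, x)             # index of that element
small       <- Filter(Negate(is_large), x)       # elements that fail
total       <- Reduce(`+`, x)                    # ((((3 + 8) + 12) + 5) + 20)
running     <- Reduce(`+`, x, accumulate = TRUE) # running totals
products    <- Map(function(a, b) a * b, 1:3, 4:6)  # always a list

m <- matrix(1:6, nrow = 2)
centered <- sweep(m, 2, colMeans(m))             # subtract each column mean

print(big); print(first_large); print(pos_large); print(small)
print(total); print(running); str(products); print(centered)
```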

rapply function (Recursive version of lapply)

rapply(object, f, classes = "ANY", deflt = NULL,
       how = c("unlist", "replace", "list"), ...)

This function is used to apply a function recursively to elements of a list or expression. It’s particularly useful when dealing with deeply nested lists where you need to perform an operation at each level or at specific levels.
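
Before the larger example, a small sketch of the how and classes arguments on a tiny nested list. The list nested and its contents are made up for illustration.

```r
nested <- list(
  a = 1:3,
  b = list(c = c(2.5, 3.5), d = "station-1")
)
double_it <- function(v) v * 2

# how = "unlist": apply to numeric leaves, drop the rest, flatten the result
doubled <- rapply(nested, double_it,
                  classes = c("integer", "numeric"), how = "unlist")
print(doubled)

# how = "replace": keep the nested structure, leave other leaves unchanged
replaced <- rapply(nested, double_it,
                   classes = c("integer", "numeric"), how = "replace")
str(replaced)
```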

Example:

set.seed(123)  # For reproducibility
# Create a complex example data
measurements  <- sapply(
  c("Ankara", "Istanbul", "Izmir"),
  function(city) {
    sapply(paste0("day", 1:30), function(day) {
      sapply(c("PM25", "NO2", "SO2"),
             function(pol)
               rnorm(24, mean = 50, sd = 10),
             simplify = FALSE)
    }, simplify = FALSE)
  }, simplify = FALSE)

norm <- function(x) { # normalize function
    (x - min(x)) / (max(x) - min(x))
}
(normalized_measurements <- rapply(
  measurements, norm, how = "replace"))
$Ankara
$Ankara$day1
$Ankara$day1$PM25
 [1] 0.375 0.463 0.939 0.543 0.558 0.981 0.647 0.187 0.341
[10] 0.405 0.850 0.620 0.631 0.553 0.376 1.000 0.657 0.000
[19] 0.711 0.398 0.239 0.466 0.251 0.330

$Ankara$day1$NO2
 [1] 0.275 0.000 0.655 0.477 0.142 0.763 0.548 0.361 0.670
[10] 0.665 0.651 0.616 0.581 0.421 0.358 0.339 0.257 0.384
[19] 0.109 1.000 0.751 0.146 0.333 0.316

$Ankara$day1$SO2
 [1] 0.709 0.511 0.588 0.523 0.520 0.844 0.478 0.878 0.174
[10] 0.664 0.558 0.579 0.617 0.414 0.453 0.296 0.284 0.599
[19] 0.633 0.542 0.741 1.000 0.417 0.000


$Ankara$day2
$Ankara$day2$PM25
 [1] 0.863 0.198 0.206 0.870 0.363 0.000 0.543 0.419 0.475
[10] 0.622 0.329 0.723 0.387 0.601 0.898 0.641 0.347 0.918
[19] 0.858 0.685 0.565 0.230 1.000 0.240

$Ankara$day2$NO2
 [1] 1.000 0.830 0.372 0.166 0.248 0.499 0.369 0.342 0.186
[10] 0.421 0.229 0.000 0.334 0.671 0.283 0.590 0.013 0.418
[19] 0.567 0.511 0.460 0.266 0.212 0.167

$Ankara$day2$SO2
 [1] 0.548 0.279 0.394 0.454 0.984 0.354 0.578 0.538 0.275
[10] 0.500 0.883 0.632 0.529 0.412 0.000 0.804 0.150 0.705
[19] 1.000 0.154 0.695 0.452 0.121 0.136


$Ankara$day3
$Ankara$day3$PM25
 [1] 0.0000 0.2211 0.0289 0.4728 0.7644 0.0649 0.4934 0.4895
 [9] 0.3993 0.1225 0.3061 0.2728 0.4470 0.2538 0.5325 0.2534
[17] 0.5481 0.1141 0.0705 1.0000 0.2446 0.3923 0.4622 0.2308

$Ankara$day3$NO2
 [1] 0.5002 0.4543 0.2731 0.3602 0.3293 1.0000 0.1100 0.0000
 [9] 0.3516 0.4362 0.4753 0.1977 0.0101 0.7317 0.2315 0.0715
[17] 0.2666 0.2788 0.6841 0.3662 0.5738 0.1851 0.4064 0.2392

$Ankara$day3$SO2
 [1] 0.4004 0.1184 0.0000 0.9426 0.5447 0.0170 0.1993 0.0357
 [9] 1.0000 0.7474 0.2979 0.5283 0.2554 0.2378 0.1488 0.2041
[17] 0.8439 0.3581 0.4075 0.4429 0.7247 0.2264 0.0907 0.8509


$Ankara$day4
$Ankara$day4$PM25
 [1] 0.26766 0.18151 0.02467 0.00987 0.22707 0.59133 0.74164
 [8] 0.61871 0.29134 0.42073 0.18715 0.18329 0.67282 0.09211
[15] 1.00000 0.37487 0.46803 0.17678 0.22694 0.00000 0.34657
[22] 0.53051 0.50158 0.16364

$Ankara$day4$NO2
 [1] 0.2167 0.3000 0.8817 0.1152 0.3941 1.0000 0.4168 0.0504
 [9] 0.2527 0.5875 0.3369 0.2827 0.3461 0.4726 0.9115 0.4204
[17] 0.7608 0.6298 0.4131 0.0000 0.2945 0.3036 0.4600 0.8247

$Ankara$day4$SO2
 [1] 1.0000 0.8159 0.4009 0.0000 0.3377 0.4558 0.6424 0.6714
 [9] 0.6027 0.0892 0.6436 0.3235 0.4769 0.4522 0.5395 0.4398
[17] 0.0220 0.6156 0.5291 0.3682 0.4629 0.4669 0.4883 0.8389


$Ankara$day5
$Ankara$day5$PM25
 [1] 0.443 0.539 0.785 0.757 0.780 0.355 0.991 0.514 0.958
[10] 0.164 0.502 0.806 0.321 0.311 0.265 0.237 0.389 0.579
[19] 0.000 0.549 0.802 1.000 0.818 0.684

$Ankara$day5$NO2
 [1] 0.0871 0.3882 0.4550 0.7375 0.5209 0.2124 1.0000 0.7931
 [9] 0.6128 0.8752 0.1909 0.7261 0.4093 0.7322 0.5329 0.7186
[17] 0.9066 0.5512 0.8215 0.2312 0.3561 0.9558 0.6502 0.0000

$Ankara$day5$SO2
 [1] 0.2187 0.4497 0.6614 0.4693 0.6134 0.6799 0.8213 0.5006
 [9] 0.4792 0.1415 0.5092 0.3760 0.2962 0.4538 0.6910 0.0939
[17] 0.4047 0.5127 0.3122 0.5558 0.5712 0.4830 0.0000 1.0000


$Ankara$day6
$Ankara$day6$PM25
 [1] 0.3767 0.5803 0.4906 0.6691 0.6199 0.3757 0.5154 0.2008
 [9] 0.6292 0.3160 1.0000 0.0331 0.3153 0.6217 0.5468 0.2854
[17] 0.1886 0.4599 0.4221 0.0000 0.4338 0.4708 0.4671 0.1748

$Ankara$day6$NO2
 [1] 0.719 0.966 0.714 0.279 0.470 0.684 0.413 0.000 0.831
[10] 0.363 0.433 1.000 0.378 0.820 0.245 0.492 0.569 0.270
[19] 0.416 0.581 0.772 0.138 0.494 0.796

$Ankara$day6$SO2
 [1] 0.417 0.569 0.622 0.578 0.654 0.500 0.442 0.000 0.506
[10] 0.610 0.631 0.414 0.877 0.581 0.549 0.777 0.382 0.435
[19] 1.000 0.527 0.848 0.239 0.487 0.599


$Ankara$day7
$Ankara$day7$PM25
 [1] 0.724 0.405 0.655 0.387 0.825 0.916 0.107 0.952 0.347
[10] 0.762 0.812 0.602 0.493 0.691 0.758 0.866 0.149 0.250
[19] 1.000 0.906 0.757 0.825 0.875 0.000

$Ankara$day7$NO2
 [1] 0.7377 0.2541 0.4710 0.3116 0.6655 0.2954 0.5583 0.2826
 [9] 0.0698 0.7679 0.6257 0.9239 0.4208 0.7422 1.0000 0.3158
[17] 0.0000 0.3285 0.3362 0.4471 0.9202 0.3022 0.5142 0.3321

$Ankara$day7$SO2
 [1] 0.4618 0.5602 0.8999 0.4956 0.6938 0.7159 0.7614 0.2625
 [9] 1.0000 0.3273 0.4195 0.9253 0.8885 0.0898 0.1625 0.0000
[17] 0.5159 0.5102 0.5770 0.6400 0.2533 0.1221 0.7990 0.7066


$Ankara$day8
$Ankara$day8$PM25
 [1] 0.170 0.597 0.355 0.000 0.671 0.602 0.597 0.691 0.893
[10] 0.688 0.440 0.404 0.586 0.720 0.312 0.570 0.919 0.593
[19] 0.415 0.543 0.963 0.792 1.000 0.668

$Ankara$day8$NO2
 [1] 0.4754 0.2239 0.5261 0.2374 0.0000 0.4021 0.8742 0.5769
 [9] 0.6715 0.4621 0.2109 0.3164 0.2979 0.2472 0.5518 0.0580
[17] 0.5939 0.5933 0.0576 0.5350 1.0000 0.2242 0.5883 0.1658

$Ankara$day8$SO2
 [1] 0.826029 0.549295 1.000000 0.000000 0.452499 0.299608
 [7] 0.405578 0.266610 0.200948 0.655020 0.119384 0.946033
[13] 0.087681 0.501481 0.640319 0.657596 0.371992 0.494545
[19] 0.777989 0.000805 0.217697 0.571983 0.326013 0.909379


$Ankara$day9
$Ankara$day9$PM25
 [1] 0.746 0.617 0.279 0.607 0.534 0.580 0.349 0.708 0.379
[10] 0.553 0.683 0.434 0.727 0.320 0.000 0.701 0.782 0.540
[19] 0.710 0.354 0.574 0.186 0.855 1.000

$Ankara$day9$NO2
 [1] 0.857 0.600 0.599 0.253 0.791 0.557 0.453 0.277 0.537
[10] 0.409 0.514 0.323 1.000 0.603 0.857 0.000 0.501 0.449
[19] 0.321 0.967 0.277 0.681 0.804 0.648

$Ankara$day9$SO2
 [1] 0.2650 0.7157 0.5245 0.2182 0.1377 1.0000 0.3138 0.0217
 [9] 0.6145 0.5591 0.1462 0.0000 0.4533 0.7649 0.6400 0.3598
[17] 0.2752 0.5494 0.5212 0.6384 0.3839 0.7053 0.2760 0.4001


$Ankara$day10
$Ankara$day10$PM25
 [1] 0.7093 0.7684 0.0737 0.5590 0.6780 0.1894 0.6476 0.3371
 [9] 0.7756 0.6613 0.7160 0.5622 0.7384 0.8610 1.0000 0.5268
[17] 0.0000 0.5410 0.5823 0.4967 0.6684 0.7733 0.4107 0.4638

$Ankara$day10$NO2
 [1] 0.5459 0.3369 0.4783 0.0257 0.2113 0.1655 0.2700 0.3053
 [9] 0.5424 0.6747 0.2978 0.2384 0.2285 0.3668 0.4055 0.5648
[17] 0.3873 0.0000 0.4380 0.7199 0.7208 0.2843 0.1170 1.0000

$Ankara$day10$SO2
 [1] 0.4751 0.4963 0.6659 0.0000 0.2899 0.0241 0.3014 0.5671
 [9] 0.0816 0.7213 0.6229 0.2323 0.6023 0.7731 0.8755 0.4586
[17] 0.4976 0.4999 1.0000 0.8965 0.8013 0.4342 0.6504 0.6602


$Ankara$day11
$Ankara$day11$PM25
 [1] 0.2085 0.0498 0.5976 0.0958 0.4496 0.5071 0.5796 0.4930
 [9] 0.2799 0.6772 0.8511 0.6529 0.0000 0.7727 0.4363 0.5706
[17] 0.5930 0.4269 0.4318 0.6848 0.6137 0.6781 1.0000 0.6028

$Ankara$day11$NO2
 [1] 0.4932 0.0000 1.0000 0.3525 0.9353 0.5274 0.7647 0.4286
 [9] 0.5553 0.4946 0.4130 0.4264 0.6575 0.4098 0.0353 0.4110
[17] 0.5610 0.5767 0.5767 0.1058 0.5262 0.6484 0.7113 0.4050

$Ankara$day11$SO2
 [1] 0.4266 0.8983 0.0757 0.3966 0.6297 0.0887 0.5834 1.0000
 [9] 0.5094 0.4379 0.6343 0.4377 0.7225 0.7462 0.5112 0.2037
[17] 0.1668 0.3559 0.7183 0.0000 0.0187 0.3400 0.6133 0.2723


$Ankara$day12
$Ankara$day12$PM25
 [1] 0.727 0.608 0.501 0.712 0.620 0.442 0.587 0.676 0.613
[10] 0.396 0.720 0.784 0.596 0.568 0.521 1.000 0.596 0.503
[19] 0.000 0.202 0.520 0.581 0.596 0.713

$Ankara$day12$NO2
 [1] 0.4923 0.9104 0.2764 0.1912 0.1851 0.7702 0.5253 0.7843
 [9] 0.4704 0.3742 0.4406 0.0000 0.2027 0.1219 0.8628 0.3726
[17] 0.7842 1.0000 0.4930 0.6341 0.0301 0.3069 0.5666 0.6387

$Ankara$day12$SO2
 [1] 0.1688 1.0000 0.2414 0.3676 0.5089 0.3458 0.1968 0.6043
 [9] 0.7419 0.3449 0.4878 0.3286 0.5974 0.0730 0.9122 0.2064
[17] 0.0000 0.2674 0.3875 0.7439 0.5720 0.3864 0.0167 0.1431


$Ankara$day13
$Ankara$day13$PM25
 [1] 0.385 1.000 0.000 0.678 0.302 0.799 0.163 0.454 0.172
[10] 0.626 0.758 0.864 0.602 0.839 0.704 0.392 0.530 0.201
[19] 0.380 0.698 0.544 0.525 0.186 0.285

$Ankara$day13$NO2
 [1] 0.1119 0.4203 0.0518 0.4646 0.0183 0.8014 0.4376 0.1288
 [9] 0.3793 0.2215 0.4139 0.3799 0.0455 0.1048 0.3948 0.7510
[17] 0.6036 0.1491 0.5348 0.3023 0.3953 0.3844 1.0000 0.0000

$Ankara$day13$SO2
 [1] 0.4591 0.0329 0.5699 0.6847 0.7638 0.0000 0.5298 0.7973
 [9] 0.3424 0.3882 0.4280 0.5189 0.5870 0.7348 0.4146 0.4807
[17] 1.0000 0.5339 0.4790 0.1321 0.4369 0.3861 0.4337 0.3058


$Ankara$day14
$Ankara$day14$PM25
 [1] 0.2706 0.2614 0.3985 0.8271 0.7407 0.3591 0.0000 0.2278
 [9] 0.3326 0.4846 0.2771 1.0000 0.3828 0.1278 0.4183 0.1843
[17] 0.4989 0.5198 0.5529 0.0173 0.5163 0.5406 0.0717 0.7153

$Ankara$day14$NO2
 [1] 0.6556 0.9988 0.5135 0.4886 0.4999 0.4938 0.0595 0.0996
 [9] 0.4605 0.3465 0.4904 0.0000 0.1387 0.2286 0.7013 0.2293
[17] 0.2970 0.4961 0.4050 0.5112 1.0000 0.2234 0.4089 0.3610

$Ankara$day14$SO2
 [1] 0.818 0.358 0.497 0.672 0.300 0.847 0.649 0.297 0.605
[10] 0.470 0.576 0.495 0.729 0.241 0.408 0.463 0.313 0.304
[19] 0.510 0.487 0.000 0.723 0.564 1.000


$Ankara$day15
$Ankara$day15$PM25
 [1] 0.5208 0.2682 0.9922 1.0000 0.0959 0.4726 0.3208 0.4097
 [9] 0.4187 0.0863 0.4316 0.7583 0.3313 0.3315 0.6800 0.5684
[17] 0.0000 0.4189 0.7369 0.6838 0.4616 0.5289 0.1008 0.4349

$Ankara$day15$NO2
 [1] 0.1145 0.2728 0.7173 0.0563 0.5040 0.5555 0.4364 1.0000
 [9] 0.0928 0.6122 0.3905 0.3357 0.5669 0.0935 0.0000 0.2612
[17] 0.9775 0.4367 0.5246 0.3583 0.5689 0.1154 0.7882 0.6053

$Ankara$day15$SO2
 [1] 0.0986 0.3987 0.6295 0.5742 0.8964 0.1660 0.7689 0.7411
 [9] 0.3231 0.0395 0.0000 0.5393 0.3549 1.0000 0.5663 0.2500
[17] 0.0222 0.1767 0.4993 0.2225 0.5590 0.2632 0.3429 0.1250


$Ankara$day16
$Ankara$day16$PM25
 [1] 1.000 0.633 0.609 0.160 0.709 0.316 0.347 0.770 0.449
[10] 0.147 0.968 0.000 0.689 0.793 0.408 0.659 0.658 0.798
[19] 0.668 0.498 0.769 0.742 0.336 0.228

$Ankara$day16$NO2
 [1] 0.417 0.334 0.197 0.403 0.132 0.418 0.266 0.571 0.854
[10] 0.210 0.590 0.545 0.373 0.299 0.690 0.000 0.263 0.110
[19] 0.655 0.350 0.285 1.000 0.377 0.301

$Ankara$day16$SO2
 [1] 0.594 0.000 0.654 0.468 0.647 0.186 0.173 0.278 0.254
[10] 0.911 1.000 0.448 0.384 0.394 0.520 0.529 0.590 0.261
[19] 0.229 0.879 0.324 0.595 0.480 0.194


$Ankara$day17
$Ankara$day17$PM25
 [1] 0.745 0.626 0.976 0.610 0.482 0.656 0.238 0.338 0.722
[10] 0.481 0.558 0.544 0.391 0.000 0.676 0.794 0.208 1.000
[19] 0.687 0.960 0.303 0.624 0.548 0.796

$Ankara$day17$NO2
 [1] 0.3695 0.4983 0.0957 0.3236 0.5291 0.3398 0.4907 0.3941
 [9] 0.6388 0.7666 0.3134 0.4307 0.8284 0.0000 0.4278 0.9707
[17] 1.0000 0.5363 0.3040 0.5383 0.3520 0.8314 0.3170 0.5855

$Ankara$day17$SO2
 [1] 0.85127 0.43162 0.92186 0.43453 0.85444 0.99655 0.36125
 [8] 1.00000 0.51326 0.75210 0.72391 0.53226 0.74410 0.28753
[15] 0.62094 0.91402 0.00000 0.20907 0.30839 0.17825 0.79025
[22] 0.35989 0.00292 0.46816


$Ankara$day18
$Ankara$day18$PM25
 [1] 0.3971 0.7179 0.2136 0.7962 0.6000 0.5907 0.2783 1.0000
 [9] 0.4553 0.0282 0.4451 0.0000 0.4177 0.3889 0.6729 0.7572
[17] 0.3108 0.7049 0.8639 0.2959 0.3297 0.6649 0.5446 0.6387

$Ankara$day18$NO2
 [1] 0.9888 0.5879 0.8498 0.6808 0.6063 0.5875 0.9442 0.0405
 [9] 0.0000 0.6686 0.3883 0.4646 0.7275 0.7322 0.3319 0.3123
[17] 0.4644 0.8542 0.6033 1.0000 0.4324 0.8789 0.7833 0.6766

$Ankara$day18$SO2
 [1] 0.6676 0.4976 0.5394 0.6223 0.5781 0.7969 0.5394 0.5489
 [9] 0.4476 0.5323 0.5493 0.0000 0.2322 0.5484 0.5012 0.2723
[17] 0.0788 0.4794 0.7260 1.0000 0.5567 0.9315 0.5864 0.5944


$Ankara$day19
$Ankara$day19$PM25
 [1] 0.666 0.273 0.542 0.857 0.313 0.454 0.182 0.676 0.699
[10] 0.772 0.159 0.594 0.249 0.559 1.000 0.394 0.794 0.313
[19] 0.402 0.703 0.336 0.000 0.975 0.677

$Ankara$day19$NO2
 [1] 0.2619 0.0000 0.7236 1.0000 0.4089 0.5282 0.5978 0.2400
 [9] 0.3051 0.0496 0.2335 0.4859 0.3163 0.3306 0.3494 0.0242
[17] 0.5259 0.1023 0.3316 0.2889 0.3297 0.6415 0.2622 0.2249

$Ankara$day19$SO2
 [1] 0.00000 0.20253 0.10027 0.54680 0.18627 0.51733 0.38954
 [8] 0.02442 0.20681 0.68216 0.68726 1.00000 0.43006 0.00676
[15] 0.21310 0.54696 0.97009 0.46229 0.75457 0.64896 0.09973
[22] 0.38254 0.80134 0.67960


$Ankara$day20
$Ankara$day20$PM25
 [1] 0.745 0.749 0.453 0.199 1.000 0.168 0.315 0.796 0.216
[10] 0.434 0.241 0.793 0.740 0.732 0.739 0.774 0.799 0.000
[19] 0.558 0.638 0.825 0.239 0.450 0.274

$Ankara$day20$NO2
 [1] 0.900 0.972 0.812 0.156 0.702 0.930 0.610 0.370 0.201
[10] 0.338 0.788 0.414 0.629 0.977 0.744 1.000 0.000 0.463
[19] 0.576 0.744 0.174 0.818 0.655 0.641

$Ankara$day20$SO2
 [1] 0.749 0.603 0.847 0.782 0.595 0.874 0.699 0.787 0.491
[10] 1.000 0.629 0.829 0.607 0.565 0.814 0.260 0.742 0.722
[19] 0.000 0.717 0.123 0.785 0.524 0.468


$Ankara$day21
$Ankara$day21$PM25
 [1] 0.5167 0.8757 0.3532 0.6637 0.6072 0.2116 0.0655 0.0000
 [9] 0.6619 0.5094 1.0000 0.6677 0.7396 0.8657 0.3882 0.6512
[17] 0.7445 0.8748 0.7169 0.6281 0.2926 0.7083 0.6486 0.6835

$Ankara$day21$NO2
 [1] 0.7171 0.5200 0.5594 0.8003 0.6377 0.3220 0.4922 0.1679
 [9] 0.2788 0.5380 0.0169 0.0514 0.9311 0.4019 0.0000 0.5480
[17] 1.0000 0.7924 0.7227 0.9567 0.3405 0.8136 0.3986 0.5839

$Ankara$day21$SO2
 [1] 0.982 0.658 0.605 0.117 0.201 1.000 0.121 0.461 0.638
[10] 0.671 0.689 0.363 0.192 0.326 0.171 0.569 0.697 0.959
[19] 0.501 0.178 0.672 0.641 0.343 0.000


$Ankara$day22
$Ankara$day22$PM25
 [1] 0.679 0.797 0.341 0.607 0.347 0.654 0.398 0.325 0.471
[10] 0.344 0.246 0.334 0.433 0.502 0.926 0.000 0.371 0.638
[19] 1.000 0.572 0.431 0.337 0.568 0.752

$Ankara$day22$NO2
 [1] 0.4361 0.5473 0.5644 0.7284 0.8507 0.5843 0.4705 0.9414
 [9] 0.5476 1.0000 0.4751 0.6914 0.0000 0.4589 0.4879 0.2522
[17] 0.1633 0.7984 0.7875 0.0792 0.6308 0.1388 0.1585 0.6631

$Ankara$day22$SO2
 [1] 0.3088 0.8858 0.5577 0.7287 0.3646 0.1490 0.8588 0.6263
 [9] 0.4624 0.5683 1.0000 0.3047 0.2871 0.3316 0.0000 0.8261
[17] 0.6106 0.6157 0.5296 0.3692 0.2616 0.8371 0.0546 0.6958


$Ankara$day23
$Ankara$day23$PM25
 [1] 0.308 0.535 0.319 0.365 0.850 0.617 0.022 0.502 0.380
[10] 0.235 0.520 0.503 0.406 0.893 0.000 0.653 0.415 0.615
[19] 0.380 0.350 0.114 0.431 1.000 0.685

$Ankara$day23$NO2
 [1] 0.7001 0.5589 0.6015 0.5566 0.4773 0.2808 0.2900 0.0917
 [9] 0.3879 0.5560 0.5806 0.4067 0.2873 1.0000 0.5258 0.6528
[17] 0.3263 0.3819 0.4723 0.7003 0.3311 0.5475 0.0000 0.2948

$Ankara$day23$SO2
 [1] 0.2304 0.8668 0.5287 0.4689 0.9345 0.8676 0.3346 0.6855
 [9] 0.2472 0.4464 1.0000 0.9560 0.8657 0.4212 0.0744 0.2611
[17] 0.8243 0.4954 0.0000 0.5246 0.9454 0.7425 0.3549 0.1632


$Ankara$day24
$Ankara$day24$PM25
 [1] 0.3141 0.5571 0.7979 0.0000 0.8361 0.5489 0.7081 0.0809
 [9] 0.3593 0.4209 0.8419 0.8375 0.6701 1.0000 0.4019 0.8331
[17] 0.5372 0.3591 0.8079 0.9370 0.7952 0.4036 0.4148 0.9361

$Ankara$day24$NO2
 [1] 0.692 0.789 0.551 0.876 0.455 0.747 0.454 0.119 0.347
[10] 0.648 0.800 0.264 0.365 0.979 0.532 0.723 0.513 0.559
[19] 0.638 0.622 0.581 0.524 0.000 1.000

$Ankara$day24$SO2
 [1] 0.832 0.212 0.412 0.743 0.662 0.722 0.219 1.000 0.206
[10] 0.610 0.478 0.362 0.367 0.236 0.796 0.467 0.504 0.493
[19] 0.368 0.205 0.503 0.000 0.145 0.218


$Ankara$day25
$Ankara$day25$PM25
 [1] 0.1285 0.4775 0.9163 0.0000 0.5735 0.7106 0.1547 0.4304
 [9] 0.6883 0.3707 0.3810 0.5365 0.4283 0.2824 0.4051 1.0000
[17] 0.7030 0.1768 0.0796 0.3400 0.8370 0.5321 0.2395 0.7055

$Ankara$day25$NO2
 [1] 0.4334 0.2095 0.5682 0.8093 0.5622 0.5072 0.3148 0.5926
 [9] 0.2209 0.0887 0.4538 0.4558 1.0000 0.1878 0.5024 0.6462
[17] 0.7019 0.4612 0.2951 0.4487 0.9035 0.5731 0.0000 0.7865

$Ankara$day25$SO2
 [1] 0.61406 0.39798 0.40389 0.35423 0.50718 0.22095 0.51488
 [8] 0.06004 0.48595 0.82724 0.60228 0.79973 0.00000 0.43124
[15] 0.64295 0.00861 1.00000 0.02110 0.47254 0.46776 0.45641
[22] 0.33141 0.57997 0.02247


$Ankara$day26
$Ankara$day26$PM25
 [1] 0.0385 0.2507 0.4503 0.8895 1.0000 0.5762 0.5970 0.3880
 [9] 0.4855 0.0000 0.5663 0.2904 0.5813 0.3538 0.2483 0.3962
[17] 0.3468 0.8353 0.5394 0.6140 0.0547 0.9697 0.5227 0.1832

$Ankara$day26$NO2
 [1] 0.6133 0.7705 0.1394 1.0000 0.4826 0.8407 0.5470 0.4337
 [9] 0.0131 0.2188 0.5152 0.7586 0.0000 0.9119 0.9185 0.2067
[17] 0.6590 0.2374 0.4918 0.7868 0.7699 0.2891 0.4186 0.2535

$Ankara$day26$SO2
 [1] 0.606 0.218 0.000 0.836 1.000 0.443 0.692 0.535 0.837
[10] 0.649 0.578 0.169 0.757 0.465 0.250 0.495 0.510 0.752
[19] 0.298 0.423 0.840 0.813 0.774 0.875


$Ankara$day27
$Ankara$day27$PM25
 [1] 0.495 0.316 0.576 0.691 1.000 0.000 0.511 0.598 0.436
[10] 0.093 0.618 0.271 0.564 0.656 0.742 0.093 0.696 0.276
[19] 0.154 0.749 0.711 0.393 0.622 0.252

$Ankara$day27$NO2
 [1] 0.486 0.497 0.460 0.661 0.820 0.425 0.509 0.839 0.450
[10] 0.000 0.617 0.340 0.318 0.797 1.000 0.310 0.720 0.497
[19] 0.544 0.757 0.558 0.347 0.422 0.651

$Ankara$day27$SO2
 [1] 0.0000 0.9433 0.2114 0.6482 0.5899 0.9977 0.6941 0.6702
 [9] 0.8342 0.4148 0.5884 0.9660 0.8301 0.2915 0.8176 0.2635
[17] 0.4777 0.9751 0.0233 0.1531 1.0000 0.4182 0.9841 0.5960


$Ankara$day28
$Ankara$day28$PM25
 [1] 0.504 0.584 0.280 0.352 0.329 1.000 0.441 0.513 0.344
[10] 0.365 0.702 0.345 0.419 0.493 0.326 0.822 0.119 0.833
[19] 0.342 0.697 0.000 0.614 0.597 0.151

$Ankara$day28$NO2
 [1] 0.276 0.556 0.832 0.276 0.434 0.957 0.171 0.775 0.630
[10] 0.177 0.305 0.727 0.572 0.914 0.324 0.000 0.805 0.322
[19] 0.534 1.000 0.426 0.951 0.180 0.769

$Ankara$day28$SO2
 [1] 0.533 0.278 0.137 0.489 0.535 0.541 0.608 0.349 0.336
[10] 0.530 0.328 0.785 0.514 0.309 0.000 0.302 1.000 0.323
[19] 0.252 0.638 0.584 0.664 0.397 0.760


$Ankara$day29
$Ankara$day29$PM25
 [1] 0.0000 0.3478 0.1450 0.5736 0.4031 0.7064 1.0000 0.4064
 [9] 0.1975 0.1080 0.2815 0.3414 0.9794 0.0135 0.0246 0.2367
[17] 0.5448 0.6170 0.0819 0.4022 0.3496 0.7328 0.7000 0.2663

$Ankara$day29$NO2
 [1] 0.534 0.724 0.000 0.422 0.700 0.355 0.501 0.513 0.292
[10] 0.207 0.457 0.410 0.781 1.000 0.516 0.645 0.497 0.708
[19] 0.588 0.594 0.869 0.209 0.974 0.629

$Ankara$day29$SO2
 [1] 0.7837 0.5515 0.6757 0.8441 0.0805 0.5386 0.9106 0.6348
 [9] 0.4763 0.5279 0.8009 0.2149 1.0000 0.0476 0.5694 0.4804
[17] 0.3359 0.3894 0.8054 0.6895 0.6328 0.0000 0.3122 0.6859


$Ankara$day30
$Ankara$day30$PM25
 [1] 0.1219 0.1589 1.0000 0.6019 0.0463 0.3084 0.3921 0.2120
 [9] 0.0827 0.4983 0.8595 0.6252 0.4307 0.6985 0.8835 0.4579
[17] 0.7295 0.0000 0.4081 0.3304 0.6110 0.6847 0.4656 0.1728

$Ankara$day30$NO2
 [1] 0.396 0.698 0.969 0.431 0.532 0.711 0.490 0.659 0.617
[10] 0.434 0.837 0.841 0.885 0.946 0.502 0.512 0.694 0.786
[19] 0.543 0.522 0.474 1.000 0.000 0.587

$Ankara$day30$SO2
 [1] 0.5778 0.3014 0.6653 0.0729 0.8087 0.1494 0.0135 0.3759
 [9] 0.6090 0.4041 0.4274 0.6521 0.3016 0.5466 0.6936 0.9797
[17] 0.6889 1.0000 0.6823 0.5674 0.0000 0.2598 0.4879 0.7476



$Istanbul
$Istanbul$day1
$Istanbul$day1$PM25
 [1] 0.535 0.820 0.291 0.551 0.712 0.523 0.678 0.420 0.764
[10] 0.614 0.626 0.000 0.419 0.151 0.113 0.479 0.588 0.734
[19] 1.000 0.787 0.974 0.468 0.540 0.998

$Istanbul$day1$NO2
 [1] 0.424 0.000 0.210 0.678 0.270 0.508 0.428 0.299 0.336
[10] 0.523 0.477 0.736 0.374 0.191 0.466 0.602 1.000 0.307
[19] 0.342 0.647 0.747 0.258 0.592 0.281

$Istanbul$day1$SO2
 [1] 0.326 0.714 0.587 0.657 0.420 1.000 0.724 0.470 0.733
[10] 0.265 0.528 0.904 0.515 0.784 0.507 0.000 0.346 0.616
[19] 0.915 0.416 0.629 0.326 0.313 0.372


$Istanbul$day2
$Istanbul$day2$PM25
 [1] 1.0000 0.8761 0.7759 0.9267 0.4816 0.8238 0.6649 0.3011
 [9] 0.3641 0.6188 0.6637 0.5114 0.3163 0.0000 0.7272 0.5190
[17] 0.7630 0.1575 0.2421 0.5721 0.1960 0.0312 0.8308 0.8647

$Istanbul$day2$NO2
 [1] 0.258 0.517 0.223 0.718 0.599 0.313 0.345 0.247 0.758
[10] 0.804 0.738 0.322 0.000 0.791 0.549 0.596 0.962 0.305
[19] 1.000 0.469 0.366 0.539 0.743 0.539

$Istanbul$day2$SO2
 [1] 0.393 0.542 0.000 0.487 0.719 0.569 0.774 0.726 0.764
[10] 0.767 0.193 0.610 1.000 0.494 0.718 0.644 0.743 0.544
[19] 0.454 0.627 0.659 0.565 0.617 0.708


$Istanbul$day3
$Istanbul$day3$PM25
 [1] 0.293 0.748 1.000 0.828 0.370 0.045 0.491 0.687 0.662
[10] 0.186 0.213 0.509 0.626 0.685 0.671 0.248 0.166 0.000
[19] 0.598 0.674 0.557 0.226 0.850 0.390

$Istanbul$day3$NO2
 [1] 1.000 0.668 0.841 0.608 0.501 0.245 0.368 0.506 0.595
[10] 0.510 0.471 0.569 0.377 0.374 0.125 0.748 0.532 0.496
[19] 0.319 0.452 0.225 0.235 0.000 0.784

$Istanbul$day3$SO2
 [1] 0.3003 0.0000 0.5510 0.4207 0.4492 0.2153 0.1677 0.2461
 [9] 0.3676 0.4631 0.6539 0.1468 0.0245 0.2942 1.0000 0.1663
[17] 0.5466 0.2111 0.1542 0.0982 0.2188 0.6665 0.3428 0.3876


$Istanbul$day4
$Istanbul$day4$PM25
 [1] 0.749 1.000 0.737 0.289 0.369 0.256 0.000 0.841 0.479
[10] 0.659 0.819 0.475 0.552 0.650 0.625 0.422 0.727 0.690
[19] 0.520 0.331 0.708 0.869 0.860 0.756

$Istanbul$day4$NO2
 [1] 0.4840 0.4712 0.4563 0.5030 0.8003 0.6802 0.1197 0.1186
 [9] 0.6061 0.6520 0.0000 0.3956 0.6195 1.0000 0.0782 0.2791
[17] 0.3256 0.6587 0.7000 0.5271 0.1844 0.4449 0.4635 0.5584

$Istanbul$day4$SO2
 [1] 0.7202 0.1417 0.3312 0.1345 0.0000 0.4719 0.0477 1.0000
 [9] 0.3353 0.3271 0.2952 0.7757 0.4790 0.2488 0.0186 0.0507
[17] 0.1145 0.6176 0.2644 0.1215 0.5112 0.6503 0.1373 0.2001


$Istanbul$day5
$Istanbul$day5$PM25
 [1] 0.5824 0.4756 0.6010 1.0000 0.4586 0.3914 0.6251 0.4620
 [9] 0.6686 0.4087 0.6124 0.0000 0.4813 0.3196 0.1139 0.3510
[17] 0.5905 0.5052 0.6886 0.7641 0.7298 0.3781 0.0719 0.3915

$Istanbul$day5$NO2
 [1] 0.7459 0.4028 0.0000 0.5656 0.5099 0.3286 0.4811 0.7989
 [9] 0.7803 0.3138 0.4379 0.0487 0.2417 0.9687 0.8331 0.3741
[17] 0.2511 0.0719 1.0000 0.4655 0.2772 0.7215 0.1429 0.6281

$Istanbul$day5$SO2
 [1] 0.815 0.307 0.818 0.569 0.162 0.548 0.154 0.207 0.610
[10] 0.225 0.364 0.685 0.000 0.862 1.000 0.560 0.501 0.218
[19] 0.522 0.217 0.460 0.160 0.614 0.584


$Istanbul$day6
$Istanbul$day6$PM25
 [1] 0.8303 1.0000 0.3965 0.7667 0.2173 0.4399 0.7620 0.3754
 [9] 0.6243 0.3382 0.4599 0.5356 0.4044 0.0982 0.6007 0.6540
[17] 0.5763 0.4327 0.7516 0.7923 0.0000 0.8031 0.1364 0.4658

$Istanbul$day6$NO2
 [1] 0.254 0.379 0.889 0.684 0.674 0.000 0.596 0.709 0.770
[10] 0.289 0.790 0.179 1.000 0.335 0.745 0.628 0.266 0.844
[19] 0.553 0.759 0.602 0.772 0.550 0.837

$Istanbul$day6$SO2
 [1] 0.389 0.647 0.603 0.740 0.311 0.545 0.323 0.759 0.306
[10] 1.000 0.724 0.489 0.846 0.468 0.658 0.647 0.617 0.318
[19] 0.126 0.000 0.359 0.693 0.423 0.140


$Istanbul$day7
$Istanbul$day7$PM25
 [1] 1.000 0.686 0.955 0.796 0.402 0.445 0.222 0.694 0.885
[10] 0.335 0.274 0.725 0.206 0.949 0.657 0.163 0.610 0.556
[19] 0.000 0.525 0.407 0.342 0.585 0.718

$Istanbul$day7$NO2
 [1] 1.000 0.000 0.774 0.385 0.869 0.683 0.935 0.264 0.717
[10] 0.672 0.844 0.551 0.866 0.502 0.505 0.610 0.359 0.499
[19] 0.858 0.983 0.859 0.506 0.677 0.626

$Istanbul$day7$SO2
 [1] 0.0000 0.5510 0.1580 0.0721 0.7211 0.2003 0.4052 0.1037
 [9] 0.4778 0.6364 0.6401 0.0668 0.9146 1.0000 0.5930 0.5147
[17] 0.4856 0.5511 0.5943 0.5072 0.6951 0.3355 0.8679 0.4438


$Istanbul$day8
$Istanbul$day8$PM25
 [1] 0.1249 0.6103 0.7705 1.0000 0.6571 0.0709 0.6409 0.2880
 [9] 0.3534 0.3389 0.4053 0.9728 0.3958 0.5108 0.6935 0.9005
[17] 0.4441 0.3973 0.4875 0.2961 0.9091 0.0000 0.4848 0.2171

$Istanbul$day8$NO2
 [1] 0.144 0.617 0.977 1.000 0.476 0.444 0.482 0.780 0.000
[10] 0.669 0.514 0.433 0.132 0.561 0.217 0.607 0.397 0.621
[19] 0.720 0.490 0.367 0.465 0.451 0.255

$Istanbul$day8$SO2
 [1] 0.1555 0.3552 0.3459 0.5331 0.8604 0.5345 0.9230 0.2006
 [9] 0.8804 0.6229 0.0156 0.5578 0.0000 0.7432 0.2374 0.3407
[17] 0.1899 0.1311 1.0000 0.7584 0.6507 0.4461 0.6715 0.6461


$Istanbul$day9
$Istanbul$day9$PM25
 [1] 0.936762 0.757288 0.505701 0.503627 0.443322 0.468173
 [7] 0.498437 0.846096 0.000213 0.387638 0.523162 0.249149
[13] 0.940473 0.499089 0.288771 0.342437 0.762613 1.000000
[19] 0.525214 0.580307 0.328879 0.868430 0.726946 0.000000

$Istanbul$day9$NO2
 [1] 0.402 0.512 0.279 0.502 0.272 0.670 0.664 0.791 0.686
[10] 1.000 0.472 0.000 0.581 0.748 0.468 0.948 0.867 0.718
[19] 0.593 0.427 0.527 0.700 0.833 0.496

$Istanbul$day9$SO2
 [1] 0.1799 1.0000 0.4990 0.1350 0.6046 0.3821 0.5779 0.3796
 [9] 0.3513 0.3895 0.4720 0.2101 0.1985 0.3190 0.4101 0.2851
[17] 0.4474 0.0843 0.0000 0.8506 0.4681 0.2647 0.2946 0.5673


$Istanbul$day10
$Istanbul$day10$PM25
 [1] 0.421 0.500 0.410 0.363 0.582 0.490 0.721 0.239 0.435
[10] 0.421 0.249 0.660 0.441 0.187 0.345 0.243 0.669 0.967
[19] 0.329 0.444 0.518 1.000 0.000 0.360

$Istanbul$day10$NO2
 [1] 0.02349 0.28828 0.27517 0.15192 0.84810 0.41988 1.00000
 [8] 0.57371 0.00652 0.14549 0.42422 0.19317 0.44628 0.54899
[15] 0.46443 0.31353 0.17535 0.00000 0.44264 0.60052 0.32845
[22] 0.55471 0.95491 0.87516

$Istanbul$day10$SO2
 [1] 0.3461 0.5767 0.3002 0.6083 0.4427 0.5171 0.2655 0.7225
 [9] 0.4370 0.3569 0.4387 0.4725 0.8180 0.2097 1.0000 0.2844
[17] 0.4943 0.0000 0.3107 0.6176 0.5675 0.5465 0.3523 0.0475


$Istanbul$day11
$Istanbul$day11$PM25
 [1] 0.251 0.493 0.737 0.799 0.472 0.515 0.512 0.575 1.000
[10] 0.350 0.500 0.581 0.174 0.294 0.616 0.681 0.886 0.425
[19] 0.976 0.310 0.100 0.000 0.876 0.646

$Istanbul$day11$NO2
 [1] 0.524 0.479 1.000 0.893 0.282 0.175 0.861 0.716 0.419
[10] 0.670 0.000 0.376 0.681 0.540 0.205 0.842 0.883 0.300
[19] 0.509 0.763 0.800 0.414 0.317 0.656

$Istanbul$day11$SO2
 [1] 0.0000 0.2486 0.6922 0.0546 0.1841 0.4311 0.4154 0.0417
 [9] 0.6242 0.2302 0.4524 0.4425 0.6724 0.4761 0.4585 0.9190
[17] 0.5888 0.6536 0.6153 0.6402 0.8369 0.4784 0.5556 1.0000


$Istanbul$day12
$Istanbul$day12$PM25
 [1] 0.25880 0.78099 0.23670 0.90118 0.21974 0.17726 0.70798
 [8] 0.68749 0.56899 0.48912 0.48014 1.00000 0.00000 0.12795
[15] 0.04399 0.68629 0.34035 0.37351 0.92216 0.24475 0.99083
[22] 0.00662 0.78868 0.50344

$Istanbul$day12$NO2
 [1] 0.3227 0.3611 0.6298 0.6947 0.0934 1.0000 0.8501 0.2984
 [9] 0.3804 0.2738 0.3753 0.5233 0.4102 0.4259 0.6547 0.0000
[17] 0.3044 0.4175 0.2526 0.5804 0.3126 0.3155 0.6032 0.5401

$Istanbul$day12$SO2
 [1] 0.3619 0.3207 0.0606 0.2349 1.0000 0.3881 0.6088 0.3539
 [9] 0.5383 0.3845 0.7134 0.2798 0.0000 0.6054 0.3416 0.7365
[17] 0.4256 0.7342 0.4781 0.3157 0.5652 0.2901 0.7879 0.4308


$Istanbul$day13
$Istanbul$day13$PM25
 [1] 0.0439 0.5041 0.1613 0.0000 0.6611 0.8350 0.4186 0.2510
 [9] 0.4657 0.1855 0.8513 0.8133 0.1723 0.4592 0.9530 0.5940
[17] 0.6464 0.7669 0.8288 1.0000 0.8719 0.2366 0.5365 0.7329

$Istanbul$day13$NO2
 [1] 0.545 0.143 0.419 0.763 0.597 0.280 0.608 0.535 0.697
[10] 0.975 0.272 0.468 0.539 0.152 0.560 0.451 1.000 0.000
[19] 0.855 0.762 0.709 0.595 0.428 0.781

$Istanbul$day13$SO2
 [1] 0.2784 0.7831 0.6898 0.9049 0.3707 0.3757 0.6577 0.6588
 [9] 1.0000 0.7162 0.6075 0.8303 0.3876 0.9103 0.2347 0.6692
[17] 0.0162 0.6399 0.0605 0.4873 0.0000 0.3570 0.3134 0.7384


$Istanbul$day14
$Istanbul$day14$PM25
 [1] 0.3370 0.2821 0.8682 1.0000 0.8305 0.0692 0.8844 0.1466
 [9] 0.4872 0.7247 0.4454 0.4641 0.5453 0.6344 0.6296 0.0000
[17] 0.4567 0.5929 0.6824 0.4150 0.3015 0.1057 0.2938 0.4726

$Istanbul$day14$NO2
 [1] 0.4345 0.9834 0.8820 0.0000 0.8596 0.9598 0.6014 0.7062
 [9] 0.5251 0.5871 0.2685 0.9078 0.1713 0.4545 0.3102 0.3293
[17] 0.5535 0.5014 0.4056 0.6049 0.0106 0.7676 1.0000 0.1453

$Istanbul$day14$SO2
 [1] 0.0000 1.0000 0.4534 0.4023 0.3332 0.5228 0.3979 0.2379
 [9] 0.2690 0.2932 0.3000 0.1814 0.6823 0.4134 0.1756 0.5499
[17] 0.4838 0.3499 0.4976 0.1390 0.3331 0.1136 0.0629 0.2968


$Istanbul$day15
$Istanbul$day15$PM25
 [1] 0.44629 0.20548 0.36986 0.00111 0.30934 0.66963 0.15237
 [8] 0.14312 0.35988 0.52796 0.68421 1.00000 0.58726 0.55843
[15] 0.86059 0.34182 0.36133 0.46528 0.09252 0.00000 0.81133
[22] 0.28077 0.30686 0.18767

$Istanbul$day15$NO2
 [1] 0.587 0.686 0.504 0.558 0.767 0.552 0.429 0.541 0.367
[10] 0.577 0.354 0.865 0.000 0.644 0.446 0.998 0.502 0.662
[19] 0.641 0.610 0.670 1.000 0.761 0.598

$Istanbul$day15$SO2
 [1] 0.919 0.517 0.539 0.737 0.376 0.564 0.231 0.364 0.336
[10] 0.110 1.000 0.137 0.204 0.764 0.316 0.712 0.120 0.318
[19] 0.239 0.000 0.222 0.104 0.608 0.226


$Istanbul$day16
$Istanbul$day16$PM25
 [1] 0.586 0.433 0.647 0.273 0.000 0.987 0.757 0.732 0.146
[10] 0.741 0.709 0.587 0.771 0.832 0.578 1.000 0.805 0.853
[19] 0.538 0.576 0.414 0.686 0.657 0.621

$Istanbul$day16$NO2
 [1] 0.2897 0.3879 0.8732 0.2353 0.3498 0.1315 0.1671 0.3759
 [9] 0.3531 0.3037 0.2693 0.3957 0.6869 0.3604 0.1578 0.2047
[17] 0.0000 0.0643 0.2751 0.2250 0.1872 0.8741 0.8220 1.0000

$Istanbul$day16$SO2
 [1] 0.6556 0.4262 0.2104 0.1964 0.3441 0.7789 0.6287 0.6833
 [9] 0.4199 0.0856 0.5455 0.4132 0.3913 1.0000 0.6784 0.4098
[17] 0.0000 0.3955 0.7080 0.5569 0.5322 0.7213 0.4243 0.3811


$Istanbul$day17
$Istanbul$day17$PM25
 [1] 0.558 0.771 0.554 0.776 0.348 0.338 0.640 0.776 0.525
[10] 0.488 0.000 0.693 0.418 0.699 0.920 0.505 0.763 0.580
[19] 0.692 0.363 0.703 0.894 1.000 0.356

$Istanbul$day17$NO2
 [1] 0.7390 0.2647 0.7340 0.4148 0.3831 0.8066 0.8357 0.6231
 [9] 0.7073 0.6683 0.6228 0.5376 0.9129 0.8192 0.8575 1.0000
[17] 0.7461 0.4264 0.0000 0.2738 0.0999 0.3601 0.2569 0.4065

$Istanbul$day17$SO2
 [1] 0.192 0.317 0.622 0.578 0.385 0.827 0.000 0.661 1.000
[10] 0.384 0.632 0.337 0.504 0.765 0.226 0.338 0.677 0.641
[19] 0.762 0.901 0.751 0.175 0.559 0.387


$Istanbul$day18
$Istanbul$day18$PM25
 [1] 0.4298 0.7542 0.5289 0.4864 0.3896 0.5714 0.0175 0.2836
 [9] 1.0000 0.6484 0.0938 0.0000 0.0946 0.6357 0.5841 0.2320
[17] 0.3022 0.7756 0.3239 0.4113 0.5557 0.3677 0.1735 0.2867

$Istanbul$day18$NO2
 [1] 0.433 0.761 0.943 0.713 0.833 0.524 0.632 0.633 0.359
[10] 0.680 0.763 0.806 0.940 0.350 0.792 0.813 0.617 0.789
[19] 0.362 0.707 0.000 0.727 1.000 0.987

$Istanbul$day18$SO2
 [1] 0.000 0.751 0.543 0.619 0.104 0.716 0.501 0.713 0.580
[10] 0.384 0.703 0.765 0.638 0.758 0.516 0.927 0.541 0.489
[19] 0.870 0.887 0.785 0.677 1.000 0.388


$Istanbul$day19
$Istanbul$day19$PM25
 [1] 0.7569 0.9062 0.4766 0.6140 0.6203 0.7714 0.2402 0.6943
 [9] 0.0201 1.0000 0.3146 0.3806 0.5295 0.1388 0.3944 0.4784
[17] 0.3745 0.9869 0.0000 0.4352 0.7676 0.2460 0.4265 0.9654

$Istanbul$day19$NO2
 [1] 0.0193 0.2992 0.1953 0.6762 0.2827 0.6337 0.3407 0.4705
 [9] 0.7359 0.5466 0.0000 0.3696 0.6983 1.0000 0.2173 0.4504
[17] 0.4891 0.3179 0.1468 0.9944 0.8995 0.1405 0.0149 0.4376

$Istanbul$day19$SO2
 [1] 0.7162 0.5705 0.8982 0.5398 0.3004 1.0000 0.5431 0.5301
 [9] 0.5355 0.6186 0.0000 0.7821 0.7919 0.7680 0.4969 0.6299
[17] 0.2302 0.6978 0.3912 0.9471 0.3629 0.0676 0.6490 0.1793


$Istanbul$day20
$Istanbul$day20$PM25
 [1] 0.257 0.713 0.994 0.432 0.757 0.138 0.957 0.500 0.296
[10] 0.606 0.288 0.514 0.528 0.809 0.352 0.402 0.534 0.737
[19] 1.000 0.917 0.754 0.000 0.750 0.699

$Istanbul$day20$NO2
 [1] 0.2459 0.2768 0.0895 0.4055 0.4295 0.0000 0.0401 0.4490
 [9] 0.1692 0.3387 0.6431 0.3092 0.4380 0.5518 0.1465 0.6978
[17] 0.6219 0.7048 0.3091 0.0674 0.5274 0.5524 1.0000 0.8088

$Istanbul$day20$SO2
 [1] 0.306 0.163 0.594 0.957 0.116 0.223 0.892 0.916 0.481
[10] 0.000 0.286 0.739 1.000 0.425 0.262 0.510 0.468 0.188
[19] 0.586 0.798 0.433 0.595 0.496 0.552


$Istanbul$day21
$Istanbul$day21$PM25
 [1] 0.571 0.898 0.674 0.504 0.767 0.843 0.603 0.000 1.000
[10] 0.447 0.540 0.996 0.181 0.364 0.447 0.316 0.876 0.914
[19] 0.737 0.519 0.523 0.489 0.499 0.733

$Istanbul$day21$NO2
 [1] 0.8796 0.5382 0.7320 0.0559 0.6712 0.4327 0.9269 0.9487
 [9] 0.1032 0.8828 0.5109 0.6113 0.8769 0.8245 0.4476 0.7323
[17] 0.5355 0.9652 0.2595 0.8572 1.0000 0.0000 0.8142 0.5615

$Istanbul$day21$SO2
 [1] 0.635 0.484 0.867 0.961 0.828 0.549 0.422 0.536 0.501
[10] 0.673 0.263 0.824 0.194 0.000 0.694 0.757 1.000 0.881
[19] 0.679 0.867 0.201 0.137 0.107 0.609


$Istanbul$day22
$Istanbul$day22$PM25
 [1] 0.567 0.629 0.775 0.483 0.367 0.232 0.464 0.383 0.317
[10] 0.228 0.145 0.151 0.771 0.463 0.291 0.000 0.448 0.274
[19] 0.378 0.347 0.596 0.382 0.355 1.000

$Istanbul$day22$NO2
 [1] 0.9175 0.5734 0.2639 0.9506 0.1914 0.5094 0.8332 0.3788
 [9] 0.3947 0.6850 0.5768 0.4371 0.4514 0.6169 0.6945 0.9919
[17] 0.3760 1.0000 0.0000 0.2948 0.0404 0.6379 0.6780 0.0357

$Istanbul$day22$SO2
 [1] 0.5474 0.2189 0.6997 0.0000 0.8264 0.5159 0.1772 0.5027
 [9] 0.4403 0.4066 0.0197 0.3312 0.4718 0.0245 0.6343 1.0000
[17] 0.7981 0.2583 0.3067 0.3720 0.7380 0.5175 0.2083 0.0147


$Istanbul$day23
$Istanbul$day23$PM25
 [1] 0.746 0.293 0.728 0.609 0.792 0.897 0.186 0.877 0.782
[10] 0.501 0.582 0.726 0.774 0.104 1.000 0.190 0.000 0.857
[19] 0.161 0.525 0.208 0.081 0.463 0.771

$Istanbul$day23$NO2
 [1] 0.872 0.556 0.763 0.735 0.760 0.621 0.887 0.411 0.892
[10] 0.656 0.657 1.000 0.627 0.444 0.372 0.988 0.000 0.631
[19] 0.827 0.315 0.770 0.988 0.167 0.476

$Istanbul$day23$SO2
 [1] 1.0000 0.1554 0.0466 0.0845 0.2324 0.4491 0.1383 0.4547
 [9] 0.6295 0.5017 0.6462 0.1043 0.3413 0.6163 0.4034 0.4585
[17] 0.4020 0.4481 0.3756 0.0000 0.4450 0.2251 0.8859 0.1991


$Istanbul$day24
$Istanbul$day24$PM25
 [1] 0.539 0.000 0.333 0.290 0.812 0.462 0.537 0.576 0.365
[10] 0.576 0.882 0.656 0.681 0.289 0.545 0.574 0.334 0.530
[19] 1.000 0.253 0.640 0.228 0.589 0.236

$Istanbul$day24$NO2
 [1] 0.573 0.153 0.444 0.742 0.389 0.416 0.757 0.899 1.000
[10] 0.351 0.202 0.587 0.538 0.730 0.474 0.476 0.781 0.476
[19] 0.594 0.549 0.430 0.319 0.364 0.000

$Istanbul$day24$SO2
 [1] 0.8922 0.1037 0.3793 0.4398 0.8722 0.5573 0.8005 0.4607
 [9] 1.0000 0.6750 0.6916 0.7822 0.1347 0.5385 0.6002 0.3310
[17] 0.5447 0.3435 0.6017 0.0517 0.6194 0.3668 0.7991 0.0000


$Istanbul$day25
$Istanbul$day25$PM25
 [1] 0.6708 0.5757 0.9952 0.0848 0.4573 0.9126 0.8769 0.7842
 [9] 0.7597 0.0528 0.9629 0.7334 0.0000 0.5928 1.0000 0.9097
[17] 0.4275 0.9833 0.5121 0.7935 0.9909 0.7966 0.4197 0.9517

$Istanbul$day25$NO2
 [1] 0.956 0.236 0.460 0.814 0.259 0.000 0.366 0.833 0.107
[10] 0.440 0.802 0.528 0.684 1.000 0.784 0.691 0.661 0.168
[19] 0.765 0.676 0.635 0.531 0.134 0.560

$Istanbul$day25$SO2
 [1] 0.772 0.300 0.000 0.920 0.312 0.743 0.613 0.874 0.444
[10] 0.379 0.893 1.000 0.536 0.498 0.643 0.285 0.581 0.205
[19] 0.380 0.656 0.262 0.670 0.582 0.817


$Istanbul$day26
$Istanbul$day26$PM25
 [1] 0.837 0.222 0.439 0.701 0.396 0.205 0.128 0.665 0.934
[10] 0.117 0.376 0.000 0.367 0.283 0.349 1.000 0.038 0.448
[19] 0.762 0.214 0.168 0.362 0.643 0.274

$Istanbul$day26$NO2
 [1] 0.2836 0.5267 0.0636 0.1839 0.7519 0.6128 0.0290 0.6480
 [9] 0.0000 0.4119 0.5774 0.1943 0.7921 0.7803 0.3614 0.7596
[17] 0.5546 0.6552 0.6598 0.2263 0.0618 1.0000 0.5623 0.5943

$Istanbul$day26$SO2
 [1] 0.20424 0.77602 0.94984 0.87322 0.00000 0.24172 0.52160
 [8] 0.66889 0.36383 0.28646 0.55405 0.26579 1.00000 0.04872
[15] 0.62265 0.00671 0.37602 0.86252 0.77337 0.15337 0.13462
[22] 0.51184 0.26324 0.59732


$Istanbul$day27
$Istanbul$day27$PM25
 [1] 0.384 0.248 0.504 0.332 0.833 0.525 0.589 0.897 1.000
[10] 0.586 0.669 0.881 0.559 0.371 0.813 0.267 0.544 0.510
[19] 0.484 0.000 0.470 0.872 0.495 0.463

$Istanbul$day27$NO2
 [1] 0.409 0.350 0.636 0.761 1.000 0.157 0.705 0.552 0.675
[10] 0.306 0.608 0.272 0.779 0.330 0.613 0.000 0.686 0.905
[19] 0.809 0.890 0.216 0.548 0.575 0.796

$Istanbul$day27$SO2
 [1] 0.3360 0.3285 0.7264 0.6022 0.0000 0.2901 0.1617 1.0000
 [9] 0.2669 0.6143 0.3919 0.3945 0.4762 0.3413 0.5477 0.4899
[17] 0.0151 0.2540 0.2608 0.2645 0.5968 0.6860 0.1765 0.4860


$Istanbul$day28
$Istanbul$day28$PM25
 [1] 0.202 0.434 0.759 0.574 0.387 0.168 0.527 0.665 0.859
[10] 0.313 0.868 0.204 0.438 0.816 0.383 0.350 0.627 0.659
[19] 1.000 0.557 0.000 0.397 0.406 0.292

$Istanbul$day28$NO2
 [1] 0.3557 1.0000 0.4953 0.2147 0.4094 0.6870 0.1691 0.4569
 [9] 0.4266 0.2243 0.2772 0.4532 0.2528 0.1602 0.0849 0.3465
[17] 0.2443 0.1506 0.3127 0.0320 0.0233 0.6665 0.0000 0.3539

$Istanbul$day28$SO2
 [1] 0.2479 0.5170 0.7435 0.2329 0.8181 0.6810 0.4856 0.6906
 [9] 0.6285 0.2949 0.0916 0.6398 0.2347 0.5883 0.0000 0.2623
[17] 0.7004 0.9824 0.6629 0.2384 0.4496 1.0000 0.7903 0.8882


$Istanbul$day29
$Istanbul$day29$PM25
 [1] 0.82221 0.56606 0.86882 0.00734 0.53387 0.26949 0.49337
 [8] 0.95818 0.61832 0.14947 0.26508 0.01639 0.37277 0.20326
[15] 0.73736 0.20610 0.70375 0.70806 0.22984 1.00000 0.42106
[22] 0.13988 0.44070 0.00000

$Istanbul$day29$NO2
 [1] 0.7866 0.3639 0.9621 0.0416 0.3045 0.5161 0.4171 0.9768
 [9] 0.5921 0.7056 0.8020 0.6504 0.8713 0.7288 0.7995 0.2024
[17] 0.3007 0.0000 0.5394 1.0000 0.3412 0.6947 0.7292 0.6738

$Istanbul$day29$SO2
 [1] 0.564 0.472 0.467 0.647 1.000 0.423 0.608 0.669 0.860
[10] 0.578 0.818 0.230 0.840 0.669 0.340 0.000 0.541 0.826
[19] 0.730 0.542 0.656 0.685 0.435 0.657


$Istanbul$day30
$Istanbul$day30$PM25
 [1] 0.574 0.414 0.591 0.612 0.653 0.689 0.631 0.610 0.534
[10] 0.639 0.310 0.678 0.450 1.000 0.439 0.000 0.815 0.438
[19] 0.666 0.596 0.762 0.380 0.325 0.756

$Istanbul$day30$NO2
 [1] 0.1934 0.0520 0.6202 0.4130 0.9890 0.5135 0.3422 0.2749
 [9] 0.8431 0.1304 0.5806 0.5671 0.2882 0.8065 0.2452 0.4242
[17] 0.5243 1.0000 0.5802 0.8546 0.2499 0.0000 0.6476 0.0288

$Istanbul$day30$SO2
 [1] 0.17685 0.70383 0.52397 0.76757 0.92574 0.78531 0.64219
 [8] 0.68797 0.57810 1.00000 0.55316 0.75681 0.28535 0.54961
[15] 0.00000 0.57646 0.13744 0.50317 0.47423 0.80130 0.36573
[22] 0.35981 0.00664 0.70642



$Izmir
$Izmir$day1
$Izmir$day1$PM25
 [1] 0.898 0.554 1.000 0.506 0.351 0.673 0.792 0.222 0.185
[10] 0.625 0.739 0.000 0.167 0.693 0.845 0.513 0.927 0.453
[19] 0.569 0.966 0.592 0.620 0.378 0.631

$Izmir$day1$NO2
 [1] 0.294 0.028 0.219 0.388 0.481 0.689 0.452 0.414 0.698
[10] 1.000 0.423 0.715 0.555 0.425 0.000 0.220 0.315 0.187
[19] 0.175 0.348 0.663 0.231 0.470 0.396

$Izmir$day1$SO2
 [1] 0.4881 0.3323 0.2896 0.2037 0.5085 0.5596 0.3284 1.0000
 [9] 0.3237 0.9281 0.2967 0.0000 0.0301 0.1444 0.6297 0.6761
[17] 0.4065 0.3881 0.2135 0.1846 0.1761 0.1446 0.2129 0.1958


$Izmir$day2
$Izmir$day2$PM25
 [1] 0.505 0.546 0.687 0.000 0.499 0.361 0.370 0.137 0.410
[10] 0.284 0.476 0.227 1.000 0.283 0.368 0.207 0.360 0.777
[19] 0.350 0.519 0.256 0.676 0.495 0.294

$Izmir$day2$NO2
 [1] 0.5073 0.2825 0.6087 0.1286 0.5921 1.0000 0.8833 0.5345
 [9] 0.5449 0.5769 0.7506 0.5639 0.6441 0.6557 0.5063 0.6858
[17] 0.4444 0.7917 0.2674 0.4940 0.4352 0.0322 0.8600 0.0000

$Izmir$day2$SO2
 [1] 0.8704 0.3258 0.3456 0.7684 0.5522 0.1476 0.9159 0.0000
 [9] 0.4121 0.2221 0.4406 0.5913 0.4944 0.4381 0.5334 0.6327
[17] 0.3746 0.0949 0.3585 0.1340 0.7167 1.0000 0.3081 0.3678


$Izmir$day3
$Izmir$day3$PM25
 [1] 0.416 0.939 0.701 0.845 0.934 0.769 0.832 0.326 0.672
[10] 0.498 0.281 0.324 0.986 0.283 0.564 1.000 0.795 0.896
[19] 0.795 0.644 0.000 0.500 0.258 0.572

$Izmir$day3$NO2
 [1] 1.000 0.543 0.338 0.722 0.627 0.491 0.456 0.475 0.493
[10] 0.658 0.540 0.360 0.709 0.343 0.822 0.276 0.688 0.810
[19] 0.802 0.483 0.675 0.000 0.616 0.484

$Izmir$day3$SO2
 [1] 0.519 0.421 0.659 0.365 0.546 0.378 0.531 0.151 0.521
[10] 0.321 0.478 0.223 0.508 1.000 0.708 0.596 0.275 0.464
[19] 0.367 0.000 0.351 0.335 0.381 0.292


$Izmir$day4
$Izmir$day4$PM25
 [1] 0.6508 1.0000 0.7897 0.6531 0.4862 0.4919 0.7902 0.8244
 [9] 0.3079 0.7329 0.1449 0.1423 0.9110 0.4812 0.4804 0.7410
[17] 0.4168 0.2376 0.5092 0.0000 0.3408 0.4278 0.0734 0.5220

$Izmir$day4$NO2
 [1] 0.3847 0.8663 0.5025 0.2913 0.6382 0.5619 0.5863 0.6320
 [9] 0.4730 0.4906 0.7448 1.0000 0.0736 0.5404 0.8313 0.2682
[17] 0.8801 0.4769 0.7603 0.9601 0.8101 0.8924 0.0000 0.2437

$Izmir$day4$SO2
 [1] 0.7541 0.2970 0.4070 1.0000 0.6117 0.5899 0.5008 0.0866
 [9] 0.0000 0.5093 0.6847 0.1720 0.0223 0.9667 0.7313 0.7237
[17] 0.8569 0.7459 0.2527 0.9589 0.1848 0.4294 0.6782 0.6621


$Izmir$day5
$Izmir$day5$PM25
 [1] 0.4926 0.5680 0.1254 0.0000 0.0314 0.4887 0.2165 0.4463
 [9] 0.6484 0.1616 0.5806 0.1687 0.4998 0.3469 0.1147 0.5867
[17] 0.5276 0.1099 0.3478 1.0000 0.1406 0.2942 0.7062 0.0456

$Izmir$day5$NO2
 [1] 0.847 0.302 0.329 0.415 0.614 1.000 0.508 0.532 0.176
[10] 0.447 0.000 0.376 0.683 0.386 0.411 0.429 0.400 0.440
[19] 0.152 0.336 0.361 0.524 0.459 0.286

$Izmir$day5$SO2
 [1] 0.7870 0.9172 0.0529 0.5576 0.2358 0.2844 0.6389 0.7845
 [9] 0.6252 0.3097 0.8838 0.8500 0.1529 0.6021 0.0942 1.0000
[17] 0.5873 0.8622 0.6280 0.4568 0.7530 0.1136 0.0000 0.2764


$Izmir$day6
$Izmir$day6$PM25
 [1] 0.5623 0.3622 0.3470 0.7564 0.3361 0.7722 0.2561 0.4087
 [9] 0.2404 0.3585 0.1413 0.6164 0.8336 0.5681 0.9901 0.0000
[17] 0.2624 0.6394 0.6715 0.5810 1.0000 0.9792 0.0534 0.3611

$Izmir$day6$NO2
 [1] 0.0000 0.1107 0.4617 0.4900 0.1794 0.4763 0.1938 0.4620
 [9] 0.4246 0.3156 0.1301 0.2659 0.5283 0.2579 0.3042 0.1000
[17] 0.2744 0.3867 0.1935 0.2885 0.0915 0.5992 1.0000 0.2769

$Izmir$day6$SO2
 [1] 0.660 0.883 0.490 0.490 0.696 0.377 0.533 0.457 0.374
[10] 0.651 0.539 1.000 0.000 0.696 0.403 0.510 0.350 0.683
[19] 0.838 0.665 0.385 0.889 0.535 0.603


$Izmir$day7
$Izmir$day7$PM25
 [1] 0.484 0.430 0.526 0.168 0.871 0.708 0.525 0.804 0.106
[10] 0.849 0.568 0.662 0.353 0.248 0.324 0.833 0.851 1.000
[19] 0.480 0.256 0.973 0.000 0.448 0.694

$Izmir$day7$NO2
 [1] 0.403 0.318 0.757 1.000 0.499 0.505 0.451 0.971 0.501
[10] 0.473 0.480 0.526 0.524 0.782 0.717 0.754 0.517 0.808
[19] 0.000 0.894 0.695 0.566 0.754 0.390

$Izmir$day7$SO2
 [1] 0.2988 0.5372 0.0000 0.4892 0.4905 0.4125 0.0866 0.8771
 [9] 0.0248 0.0275 0.3149 1.0000 0.5365 0.5855 0.4327 0.5892
[17] 0.7056 0.7295 0.9426 0.2521 0.4513 0.6350 0.7673 0.3951


$Izmir$day8
$Izmir$day8$PM25
 [1] 0.872 0.459 0.530 0.996 0.000 0.724 0.966 0.529 0.863
[10] 0.405 0.707 0.651 0.638 0.209 0.592 0.368 0.524 1.000
[19] 0.453 0.636 0.683 0.582 0.925 0.633

$Izmir$day8$NO2
 [1] 0.573 0.320 0.443 0.379 0.650 0.814 0.424 0.000 0.331
[10] 0.310 0.821 0.570 0.914 0.632 0.161 0.655 0.355 0.536
[19] 0.511 0.501 0.455 1.000 0.901 0.321

$Izmir$day8$SO2
 [1] 0.6166 0.8721 0.8590 0.3232 0.9414 0.3743 0.5973 0.6085
 [9] 0.0484 0.5275 0.0000 0.0359 0.6492 0.6548 0.8246 0.6651
[17] 0.3826 0.3485 0.7791 0.9524 0.7775 0.6215 0.5448 1.0000


$Izmir$day9
$Izmir$day9$PM25
 [1] 0.000 0.828 0.552 0.324 0.791 0.359 0.674 0.636 0.626
[10] 0.152 0.631 0.475 0.425 0.311 0.322 0.391 0.888 0.757
[19] 0.657 1.000 0.975 0.807 0.311 0.933

$Izmir$day9$NO2
 [1] 0.4000 0.3900 0.5084 0.4654 0.3281 0.6187 0.3310 0.3106
 [9] 0.3872 0.2626 0.4877 0.4259 0.3769 0.5642 0.0000 0.5121
[17] 0.6720 0.6669 1.0000 0.2921 0.2082 0.5302 0.0344 0.7483

$Izmir$day9$SO2
 [1] 0.3289 0.6768 0.4950 0.0169 0.6348 0.3158 0.7901 0.0000
 [9] 0.5434 0.3274 0.8381 0.4037 0.7931 0.4644 0.6045 0.9166
[17] 0.4144 1.0000 0.8050 0.8749 0.0480 0.5346 0.2306 0.7519


$Izmir$day10
$Izmir$day10$PM25
 [1] 0.842 0.486 0.507 0.293 0.758 0.888 0.277 0.256 0.826
[10] 0.866 0.677 0.652 0.486 0.671 0.518 1.000 0.519 0.000
[19] 0.702 0.363 0.764 0.307 0.556 0.681

$Izmir$day10$NO2
 [1] 0.490 0.000 0.145 0.307 1.000 0.510 0.902 0.595 0.407
[10] 0.861 0.224 0.960 0.802 0.639 0.706 0.602 0.420 0.449
[19] 0.376 0.892 0.713 0.515 0.226 0.635

$Izmir$day10$SO2
 [1] 0.11581 0.16080 1.00000 0.00274 0.15168 0.23942 0.23508
 [8] 0.08127 0.35355 0.10667 0.79783 0.44392 0.65721 0.00000
[15] 0.40386 0.56646 0.43654 0.41316 0.10776 0.22551 0.29049
[22] 0.17509 0.62982 0.91894


$Izmir$day11
$Izmir$day11$PM25
 [1] 0.8714 0.6741 0.5545 0.7080 0.4062 0.6392 0.6174 0.0621
 [9] 1.0000 0.2862 0.0000 0.5805 0.6044 0.5789 0.6117 0.3186
[17] 0.7116 0.8859 0.3952 0.6483 0.5207 0.5672 0.7964 0.6420

$Izmir$day11$NO2
 [1] 1.000 0.166 0.834 0.120 0.203 0.168 0.394 0.460 0.342
[10] 0.509 0.000 0.200 0.528 0.671 0.413 0.550 0.598 0.513
[19] 0.206 0.228 0.517 0.120 0.239 0.254

$Izmir$day11$SO2
 [1] 0.000 0.568 0.638 0.451 1.000 0.417 0.641 0.627 0.701
[10] 0.593 0.413 0.636 0.544 0.541 0.682 0.698 0.467 0.354
[19] 0.862 0.485 0.358 0.189 0.549 0.584


$Izmir$day12
$Izmir$day12$PM25
 [1] 0.257 0.195 0.652 0.353 1.000 0.267 0.135 0.576 0.737
[10] 0.644 0.561 0.710 0.368 0.665 0.348 0.357 0.629 0.720
[19] 0.338 0.134 0.291 0.000 0.636 0.772

$Izmir$day12$NO2
 [1] 0.965 0.206 0.325 0.428 0.525 0.234 0.719 0.351 0.735
[10] 0.000 0.412 0.555 0.320 0.323 0.279 0.487 0.813 0.170
[19] 0.430 0.322 0.855 1.000 0.599 0.477

$Izmir$day12$SO2
 [1] 0.842 0.268 0.395 0.743 0.446 0.264 0.471 0.273 1.000
[10] 0.293 0.708 0.413 0.831 0.705 0.209 0.502 0.367 0.571
[19] 0.398 0.561 0.376 0.713 0.511 0.000


$Izmir$day13
$Izmir$day13$PM25
 [1] 0.6584 0.5798 0.9412 0.3771 0.1549 0.7799 0.8978 0.7762
 [9] 1.0000 0.8879 0.4322 0.8950 0.5171 0.9384 0.9105 0.6512
[17] 0.9292 0.5969 0.9150 0.0899 0.4021 0.7002 0.5365 0.0000

$Izmir$day13$NO2
 [1] 0.321 0.908 0.232 0.561 0.403 0.526 0.479 0.489 0.492
[10] 0.000 0.522 0.552 0.620 0.849 0.530 0.586 0.179 0.545
[19] 0.149 1.000 0.436 0.606 0.627 0.208

$Izmir$day13$SO2
 [1] 0.5161 0.1031 0.5139 0.2249 0.2085 0.5020 0.4892 0.3241
 [9] 0.1934 0.4429 0.3778 0.6483 0.2213 0.3579 1.0000 0.0465
[17] 0.4760 0.5010 0.2705 0.2005 0.0000 0.0106 0.3201 0.7269


$Izmir$day14
$Izmir$day14$PM25
 [1] 0.4655 0.9392 0.2573 0.6322 0.4880 0.5636 0.5137 0.6954
 [9] 0.4460 0.7560 0.6041 0.4454 0.4127 0.0431 0.2425 0.3649
[17] 0.7455 0.6997 0.1925 0.2972 1.0000 0.0000 0.5054 0.5366

$Izmir$day14$NO2
 [1] 0.2662 0.0659 0.2910 1.0000 0.3563 0.3302 0.4228 0.1127
 [9] 0.5544 0.5470 0.3015 0.3952 0.2654 0.5403 0.0000 0.3304
[17] 0.5250 0.7697 0.1726 0.1243 0.3567 0.3683 0.5559 0.3553

$Izmir$day14$SO2
 [1] 0.2853 1.0000 0.3563 0.7073 0.6677 0.3242 0.9340 0.1165
 [9] 0.1798 0.7903 0.0000 0.6367 0.5091 0.6663 0.6785 0.2587
[17] 0.2532 0.6880 0.7095 0.1287 0.0944 0.1400 0.4475 0.4457


$Izmir$day15
$Izmir$day15$PM25
 [1] 0.912 0.616 0.502 0.440 0.439 0.425 0.346 0.556 0.000
[10] 0.334 0.691 0.552 0.619 0.430 0.760 0.478 0.624 0.529
[19] 0.606 0.454 0.733 0.442 1.000 0.500

$Izmir$day15$NO2
 [1] 0.631 0.416 0.806 0.649 0.777 0.800 0.373 0.832 0.251
[10] 0.639 0.857 0.136 0.467 0.478 0.000 0.536 1.000 0.189
[19] 0.277 0.481 0.577 0.625 0.649 0.514

$Izmir$day15$SO2
 [1] 1.0000 0.2942 0.0710 0.0133 0.3566 0.2711 0.6707 0.3597
 [9] 0.4267 0.0000 0.3464 0.3384 0.0540 0.1772 0.1872 0.6148
[17] 0.5471 0.5814 0.7603 0.3260 0.7043 0.3399 0.6975 0.0917


$Izmir$day16
$Izmir$day16$PM25
 [1] 0.352 0.698 0.868 0.481 0.414 0.633 0.462 0.493 0.547
[10] 0.739 0.246 0.389 0.627 0.000 0.786 0.342 0.251 0.528
[19] 1.000 0.336 0.148 0.459 0.757 0.426

$Izmir$day16$NO2
 [1] 0.5752 0.7136 0.0000 0.5752 0.2975 0.4113 0.9842 0.4888
 [9] 0.6276 0.1800 0.5330 0.3953 0.3684 0.5648 0.0521 0.7718
[17] 0.9935 0.3164 0.5550 0.9367 0.2068 0.4378 1.0000 0.1960

$Izmir$day16$SO2
 [1] 0.6762 0.5681 0.3804 0.5430 0.3873 0.6324 0.3475 0.0676
 [9] 0.6443 0.7159 0.4868 0.2756 0.3536 0.1663 0.1506 0.7159
[17] 0.0000 0.5872 0.3756 0.1644 0.4097 0.4194 0.7316 1.0000


$Izmir$day17
$Izmir$day17$PM25
 [1] 1.000 0.949 0.946 0.766 0.405 0.568 0.124 0.320 0.295
[10] 0.239 0.893 0.654 0.593 0.521 0.781 0.755 0.287 0.000
[19] 0.216 0.394 0.523 0.917 0.690 0.703

$Izmir$day17$NO2
 [1] 0.6026 0.3590 0.1147 0.1604 0.0414 0.0581 0.0000 0.6152
 [9] 0.0783 0.5531 1.0000 0.3415 0.3452 0.6099 0.3890 0.1950
[17] 0.0253 0.4170 0.2822 0.4944 0.4658 0.7288 0.5482 0.6870

$Izmir$day17$SO2
 [1] 0.4736 0.6930 0.3804 0.4079 0.4751 0.1267 0.0986 0.3677
 [9] 0.0410 0.4813 0.6879 0.1936 0.4684 0.0000 0.7280 0.3481
[17] 0.9032 0.4654 1.0000 0.5516 0.3973 0.7507 0.6089 0.2898


$Izmir$day18
$Izmir$day18$PM25
 [1] 0.5429 0.5269 0.4693 0.3064 0.3748 0.5823 0.6372 0.4525
 [9] 0.2058 0.5544 0.4195 0.6401 0.0283 0.2198 0.5050 0.0838
[17] 1.0000 0.1957 0.4859 0.0000 0.8786 0.5481 0.2712 0.4443

$Izmir$day18$NO2
 [1] 0.415 0.135 0.310 0.359 0.928 0.741 0.509 0.541 0.909
[10] 0.333 0.778 0.353 0.811 0.960 0.490 0.739 0.000 0.702
[19] 0.699 0.730 0.949 0.347 1.000 0.345

$Izmir$day18$SO2
 [1] 0.156 1.000 0.695 0.683 0.654 0.469 0.737 0.831 0.172
[10] 0.264 0.356 0.631 0.000 0.481 0.196 0.439 0.162 0.772
[19] 0.455 0.472 0.843 0.325 0.175 0.379


$Izmir$day19
$Izmir$day19$PM25
 [1] 0.2698 0.6100 0.3841 0.0126 0.4288 0.6935 0.0356 0.8306
 [9] 0.7311 1.0000 0.3800 0.6345 0.1950 0.4041 0.1337 0.8559
[17] 0.0000 0.6996 0.2593 0.1358 0.9866 0.5274 0.6042 0.5571

$Izmir$day19$NO2
 [1] 0.697 0.889 0.560 0.830 0.000 0.692 0.644 0.588 0.575
[10] 0.439 0.401 0.800 0.502 0.820 0.456 0.310 0.882 0.523
[19] 0.829 0.259 0.629 0.472 1.000 0.850

$Izmir$day19$SO2
 [1] 0.686 0.494 0.762 0.715 0.326 0.798 0.739 0.552 1.000
[10] 0.756 0.493 0.752 0.659 0.554 0.000 0.508 0.590 0.715
[19] 0.648 0.626 0.857 0.576 0.396 0.495


$Izmir$day20
$Izmir$day20$PM25
 [1] 0.517 0.397 0.000 0.757 0.818 0.347 0.708 0.477 0.483
[10] 0.760 0.703 0.394 0.349 0.741 0.267 0.357 1.000 0.623
[19] 0.437 0.211 0.795 0.404 0.604 0.941

$Izmir$day20$NO2
 [1] 0.4665 0.8445 0.1864 0.0000 0.6318 0.8927 0.3738 0.4557
 [9] 0.8128 0.3954 0.1943 0.2833 0.4806 0.0647 0.7773 0.7063
[17] 0.7875 1.0000 0.8313 0.9705 0.4270 0.8836 0.1248 0.7183

$Izmir$day20$SO2
 [1] 0.902 1.000 0.565 0.682 0.228 0.116 0.491 0.566 0.461
[10] 0.296 0.193 0.725 0.741 0.455 0.907 0.497 0.734 0.328
[19] 0.643 0.656 0.565 0.762 0.461 0.000


$Izmir$day21
$Izmir$day21$PM25
 [1] 0.589 0.879 0.183 0.225 0.684 0.710 1.000 0.662 0.812
[10] 0.750 0.875 0.920 0.326 0.498 0.921 0.106 0.000 0.714
[19] 0.687 0.586 0.902 0.993 0.940 0.632

$Izmir$day21$NO2
 [1] 0.542 0.628 0.000 0.290 0.755 0.613 0.213 0.882 0.676
[10] 0.394 0.552 0.566 0.472 0.502 0.511 0.720 0.807 0.692
[19] 1.000 0.709 0.632 0.766 0.514 0.738

$Izmir$day21$SO2
 [1] 1.0000 0.4576 0.7621 0.5147 0.3253 0.7864 0.6211 0.4310
 [9] 0.9403 0.0536 0.4932 0.4575 0.4766 0.5383 0.9648 0.0000
[17] 0.7628 0.3223 0.7395 0.3990 0.7198 0.4542 0.3590 0.2076


$Izmir$day22
$Izmir$day22$PM25
 [1] 0.289 0.448 0.529 0.668 0.520 0.848 0.228 0.678 0.769
[10] 0.667 0.507 0.413 0.186 0.152 0.714 0.459 0.160 0.668
[19] 0.737 0.000 1.000 0.849 0.222 0.338

$Izmir$day22$NO2
 [1] 0.3517 0.3869 0.0363 0.1610 1.0000 0.5161 0.6182 0.1258
 [9] 0.2353 0.0000 0.3864 0.2934 0.8260 0.2330 0.3071 0.0925
[17] 0.3566 0.1280 0.0706 0.2812 0.1010 0.1102 0.4196 0.5502

$Izmir$day22$SO2
 [1] 0.7628 0.4908 0.5208 0.2567 0.5843 0.8032 0.1901 0.8033
 [9] 0.2903 0.6263 0.0605 0.5457 0.9002 0.4681 0.0000 1.0000
[17] 0.2367 0.1874 0.3009 0.1827 0.5538 0.5961 0.3801 0.7483


$Izmir$day23
$Izmir$day23$PM25
 [1] 0.46953 0.00534 0.52707 0.00000 0.32702 0.27597 0.13459
 [8] 0.65262 0.35651 0.48911 0.33740 0.11159 0.50596 0.13513
[15] 0.57818 1.00000 0.44191 0.53114 0.46722 0.68619 0.01341
[22] 0.09214 0.36945 0.33349

$Izmir$day23$NO2
 [1] 0.15258 0.00208 0.32212 0.85049 1.00000 0.64831 0.03540
 [8] 0.62695 0.30501 0.47179 0.60895 0.85417 0.47338 0.26426
[15] 0.63078 0.64446 0.66545 0.72534 0.51192 0.20440 0.47309
[22] 0.00000 0.72709 0.56606

$Izmir$day23$SO2
 [1] 0.754 0.622 0.000 0.434 0.812 0.516 0.786 0.354 0.645
[10] 0.449 0.424 0.537 1.000 0.624 0.637 0.970 0.512 0.708
[19] 0.406 0.789 0.254 0.569 0.857 0.908


$Izmir$day24
$Izmir$day24$PM25
 [1] 0.37725 0.00000 0.66088 0.28945 0.50904 0.11352 0.31834
 [8] 0.65856 0.44758 0.44768 0.53849 0.16788 0.39662 0.46919
[15] 0.57696 0.25884 0.42978 0.50053 1.00000 0.65181 0.00314
[22] 0.39423 0.42271 0.52542

$Izmir$day24$NO2
 [1] 0.146 0.685 0.148 0.335 0.560 0.175 0.395 0.869 0.221
[10] 0.000 0.598 0.119 0.283 0.529 0.515 0.743 0.152 0.267
[19] 0.159 0.316 1.000 0.349 0.359 0.531

$Izmir$day24$SO2
 [1] 0.000 0.732 0.391 0.173 0.537 0.349 0.359 0.372 0.932
[10] 0.224 0.732 0.466 0.317 0.193 0.287 0.317 0.778 0.147
[19] 0.965 0.353 0.190 0.172 0.474 1.000


$Izmir$day25
$Izmir$day25$PM25
 [1] 0.3698 0.4142 0.5178 0.4690 0.1915 0.7878 0.0230 0.1093
 [9] 0.3039 0.3512 0.8105 0.2324 0.3857 0.3326 1.0000 0.9466
[17] 0.0823 0.5976 0.0000 0.1614 0.2467 0.1551 0.5259 0.3392

$Izmir$day25$NO2
 [1] 0.6081 0.8176 0.8043 0.0689 0.4810 0.6467 0.8029 0.4093
 [9] 0.5677 0.9089 0.5838 0.9642 0.0359 1.0000 0.0000 0.7957
[17] 0.7798 0.4525 0.4636 0.6193 0.5772 0.5929 0.8367 0.5907

$Izmir$day25$SO2
 [1] 0.170 0.790 0.605 0.897 0.467 0.623 0.301 0.315 0.249
[10] 0.692 0.000 1.000 0.523 0.690 0.986 0.761 0.792 0.527
[19] 0.362 0.847 0.635 0.686 0.480 0.547


$Izmir$day26
$Izmir$day26$PM25
 [1] 0.127 0.254 0.271 0.549 0.372 0.663 0.592 0.485 0.264
[10] 0.451 0.763 0.680 0.523 0.136 0.488 0.000 0.940 0.592
[19] 0.837 0.550 1.000 0.405 0.752 0.583

$Izmir$day26$NO2
 [1] 0.727 0.355 1.000 0.650 0.505 0.380 0.319 0.266 0.284
[10] 0.602 0.798 0.659 0.246 0.611 0.866 0.779 0.929 0.671
[19] 0.000 0.711 0.634 0.284 0.571 0.474

$Izmir$day26$SO2
 [1] 0.478 0.706 0.441 0.426 0.226 0.631 0.726 0.731 0.884
[10] 0.000 0.597 0.257 0.740 0.608 0.323 0.495 0.937 0.454
[19] 0.159 0.859 0.879 0.437 1.000 0.547


$Izmir$day27
$Izmir$day27$PM25
 [1] 0.0216 0.3736 0.5774 0.6894 0.6249 0.9344 0.5688 0.6728
 [9] 0.8578 0.4244 0.1970 0.3823 0.0000 0.5029 0.6681 0.1041
[17] 0.1687 0.0818 0.4702 1.0000 0.2548 0.2563 0.4757 0.1301

$Izmir$day27$NO2
 [1] 0.8149 0.4168 0.8500 0.7014 0.7484 0.0609 0.0000 0.5259
 [9] 0.3997 0.4055 0.4772 0.9919 0.3685 0.3214 1.0000 0.5757
[17] 0.5813 0.1166 0.7346 0.8613 0.7240 0.1606 0.9465 0.4180

$Izmir$day27$SO2
 [1] 0.9532 0.5774 0.5642 1.0000 0.7811 0.5763 0.2349 0.6971
 [9] 0.6607 0.3096 0.3901 0.3748 0.0000 0.6359 0.0425 0.5448
[17] 0.0266 0.7121 0.6754 0.9630 0.1254 0.7117 0.3912 0.5093


$Izmir$day28
$Izmir$day28$PM25
 [1] 0.516 0.689 0.370 0.414 0.774 0.000 0.546 0.510 0.758
[10] 0.692 0.320 0.912 0.823 0.903 0.399 1.000 0.854 0.844
[19] 0.350 0.588 0.642 0.317 0.560 0.765

$Izmir$day28$NO2
 [1] 0.4764 0.4612 0.7103 0.2554 0.5752 0.6320 0.1412 0.0000
 [9] 0.6098 0.3595 0.9063 0.4772 0.4057 0.0537 0.2441 0.1265
[17] 1.0000 0.1469 0.5397 0.6885 0.3444 0.4412 0.1878 0.4101

$Izmir$day28$SO2
 [1] 0.492 0.402 0.714 0.383 0.218 0.766 0.623 0.200 0.413
[10] 0.773 0.345 0.755 0.500 0.248 0.685 0.000 0.426 0.556
[19] 0.634 0.636 0.603 0.897 0.973 1.000


$Izmir$day29
$Izmir$day29$PM25
 [1] 0.0249 0.1758 0.0406 0.2917 0.5081 0.4727 0.3477 0.3704
 [9] 0.2919 0.1596 0.1975 0.1690 0.2652 0.0000 0.7853 1.0000
[17] 0.0256 0.7921 0.4507 0.4248 0.4343 0.5535 0.4475 0.1595

$Izmir$day29$NO2
 [1] 0.691 0.852 0.609 0.394 0.300 0.797 0.647 0.531 0.710
[10] 0.509 0.643 0.723 0.216 0.446 0.372 0.391 1.000 0.919
[19] 0.828 0.580 0.920 0.394 0.225 0.000

$Izmir$day29$SO2
 [1] 0.707 0.660 0.460 0.770 0.662 0.578 0.960 1.000 0.904
[10] 0.796 0.409 0.591 0.641 0.688 0.922 0.308 0.324 0.637
[19] 0.296 0.922 0.611 0.000 0.594 0.703


$Izmir$day30
$Izmir$day30$PM25
 [1] 0.2241 0.0722 0.4649 0.8368 0.3137 0.8019 0.8913 0.0727
 [9] 0.7268 0.4278 0.5364 1.0000 0.7621 0.3128 0.0000 0.5725
[17] 0.6230 0.8337 0.8999 0.5271 0.5124 0.5963 0.7937 0.5124

$Izmir$day30$NO2
 [1] 0.4276 0.2321 0.5459 0.5375 0.5421 0.4261 0.3682 0.8141
 [9] 0.4998 1.0000 0.3118 0.0000 0.4949 0.8100 0.4968 0.0779
[17] 0.3707 0.5305 0.9253 0.8721 0.7803 0.6805 0.3028 0.7803

$Izmir$day30$SO2
 [1] 0.8914 0.7194 0.8763 0.5612 0.8192 0.9583 0.1018 1.0000
 [9] 0.7276 0.7061 0.4780 0.1066 0.0000 0.5607 0.3993 0.8360
[17] 0.4455 0.3174 0.9972 0.3556 0.2079 0.6939 0.9943 0.0987

Map function/I

Map(f, ...)

Similar to mapply, but always returns a list.

Useful when you want to ensure that the output format is consistent, especially with complex data structures.

# Create complex random example data
set.seed(123)  # For reproducibility
PM25 <- list(Ankara = rnorm(10, mean = 35, sd = 5),
             Istanbul = rnorm(10, mean = 30, sd = 4),
             Izmir = rnorm(10, mean = 40, sd = 6))

NO2 <- list(Ankara = rnorm(10, mean = 50, sd = 10),
            Istanbul = rnorm(10, mean = 45, sd = 7),
            Izmir = rnorm(10, mean = 55, sd = 8))

Map function/II

pollution_stats <- Map(
  function(pm25, no2)
    list(mean_pm25 = mean(pm25),
         sd_pm25 = sd(pm25),
         mean_no2 = mean(no2),
         sd_no2 = sd(no2)),
  PM25, NO2)
class(pollution_stats)
[1] "list"
pollution_stats2 <- mapply(
  function(pm25, no2) {
    list(mean_pm25 = mean(pm25),
         sd_pm25 = sd(pm25),
         mean_no2 = mean(no2),
         sd_no2 = sd(no2))
    },
  PM25, NO2)
class(pollution_stats2)
[1] "matrix" "array" 
print(pollution_stats2)
          Ankara Istanbul Izmir
mean_pm25 35.4   30.8     37.5 
sd_pm25   4.77   4.15     5.58 
mean_no2  53.2   44.9     56.8 
sd_no2    5.27   7.58     6.85 
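
The difference between the two results comes from simplification: mapply tries to simplify its output by default, while Map never does. A minimal sketch with hypothetical toy inputs (pm25, no2 and stats are illustrative names, not the data above):

```r
# Toy inputs (hypothetical values)
pm25 <- list(A = c(30, 40), B = c(20, 25))
no2  <- list(A = c(50, 60), B = c(45, 55))

stats <- function(p, n) list(mean_pm25 = mean(p), mean_no2 = mean(n))

res_map    <- Map(stats, pm25, no2)
# Switching simplification off makes mapply behave like Map
res_mapply <- mapply(stats, pm25, no2, SIMPLIFY = FALSE)

identical(res_map, res_mapply)        # TRUE
res_map$A$mean_pm25                   # 35
is.matrix(mapply(stats, pm25, no2))   # TRUE: simplified to a matrix
```

Choosing Map makes the return type predictable, which matters when the result is passed on to lapply or Reduce.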

Reduce function

Reduce(f, x, init, right = FALSE, accumulate = FALSE)
  • Used to successively apply a function to elements of a vector or list.
  • Particularly useful when you want to progressively reduce a list or vector to a single value by applying a function in a pairwise manner.
weekly_AQI <- list(
    week1 = c(120, 110, 115, 130, 125, 140, 135),
    week2 = c(128, 122, 118, 135, 140, 145, 130),
    week3 = c(130, 125, 120, 140, 135, 150, 145)
)
(cumulative_product <- Reduce(function(x, y) x * y, weekly_AQI))
[1] 1996800 1677500 1628400 2457000 2362500 3045000 2544750

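In this code, function(x, y) x * y is an anonymous function that takes two arguments and returns their product. Reduce applies it successively: first to week1 and week2, then to that result and week3. Because * is vectorized, the output is the day-by-day product across the three weeks.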
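
The same pattern works with any two-argument function. The sketch below sums the weeks day by day instead of multiplying them, and uses accumulate = TRUE to keep the intermediate results. The names total, mean_by_day and running are illustrative:

```r
weekly_AQI <- list(
  week1 = c(120, 110, 115, 130, 125, 140, 135),
  week2 = c(128, 122, 118, 135, 140, 145, 130),
  week3 = c(130, 125, 120, 140, 135, 150, 145)
)

# Day-by-day sum over the three weeks, then the mean for each day position
total       <- Reduce(`+`, weekly_AQI)
mean_by_day <- total / length(weekly_AQI)
mean_by_day[1]                     # 126

# accumulate = TRUE returns every intermediate result as a list
running <- Reduce(`+`, weekly_AQI, accumulate = TRUE)
length(running)                    # 3

# On a plain vector this gives a running sum
Reduce(`+`, 1:5, accumulate = TRUE)  # 1 3 6 10 15
```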

Filter function/I

Filter(f, x)
  • Great for extracting elements from a vector or list based on a specified condition.
  • It’s very useful when you want to subset data according to certain criteria.
daily_PM25 <- c(35, 40, 25, 20, 50, 45, 55, 30, 25, 40,
                60, 20, 30, 35, 40, 45, 25, 50, 55, 60,
                30, 25, 20, 35, 40, 45, 30, 50, 55, 60)
(safe_days <- Filter(function(x) x < 40,
                     daily_PM25))
 [1] 35 25 20 30 25 20 30 35 25 30 25 20 35 30

Is there another way?

# indices of the values lower than 40
i <- which(daily_PM25 < 40)
daily_PM25[i]
 [1] 35 25 20 30 25 20 30 35 25 30 25 20 35 30

or

daily_PM25[which(daily_PM25 < 40)]
 [1] 35 25 20 30 25 20 30 35 25 30 25 20 35 30

Then why do we need the Filter function?

Filter function/II

# Create complex example data
set.seed(123)  # For reproducibility
# 3 cities, 10 days, 2 pollutants
air_quality_data <- list(
    Ankara = data.frame(day = 1:10, PM25 = rnorm(10, mean = 35, sd = 5),
                        NO2 = rnorm(10, mean = 50, sd = 10)),
    Istanbul = data.frame(day = 1:10, PM25 = rnorm(10, mean = 40, sd = 6),
                          NO2 = rnorm(10, mean = 60, sd = 15)),
    Izmir = data.frame(day = 1:10, PM25 = rnorm(10, mean = 30, sd = 4),
                       NO2 = rnorm(10, mean = 45, sd = 7))
)
# PM2.5 mean values of all cities
sapply(air_quality_data,
       function(df) mean(df$PM25))
  Ankara Istanbul    Izmir 
    35.4     37.5     30.0 
# We want to extract the cities with
# mean PM2.5 lower than 37
safe_cities <- Filter(
  function(df) mean(df$PM25) < 37,
  air_quality_data)

names(safe_cities)
[1] "Ankara" "Izmir" 
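
Filter on a list does the same job as building a logical vector and indexing with it, but in one step. A minimal sketch with hypothetical toy data (cities and is_safe are illustrative names):

```r
# Hypothetical toy data: one data frame per city
cities <- list(
  A = data.frame(PM25 = c(30, 32, 34)),
  B = data.frame(PM25 = c(40, 42, 44)),
  C = data.frame(PM25 = c(20, 25, 30))
)
is_safe <- function(df) mean(df$PM25) < 37

# One step with Filter
by_filter <- Filter(is_safe, cities)

# Two steps with logical indexing
keep     <- vapply(cities, is_safe, logical(1))
by_index <- cities[keep]

identical(by_filter, by_index)  # TRUE
names(by_filter)                # "A" "C"
```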

Find function/I

Find(f, x, right = FALSE, nomatch = NULL)
  • Used to locate the first element of a list or vector that satisfies a given condition.
  • It’s quite useful when you want to quickly identify an element that meets certain criteria without having to process the entire dataset.
# Toy data
daily_PM25 <- list(
    day1 = 30, day2 = 35, day3 = 40, day4 = 45,
    day5 = 25, day6 = 50, day7 = 55, day8 = 20
)
# Assume the threshold for concern is a 
# PM2.5 level of 50.
(first_high_day <- Find(function(x) x > 50,
                        daily_PM25))
[1] 55
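
The right and nomatch arguments control the search direction and the value returned when nothing matches. A short sketch on the same toy list, with an illustrative predicate at_least_40:

```r
daily_PM25 <- list(
  day1 = 30, day2 = 35, day3 = 40, day4 = 45,
  day5 = 25, day6 = 50, day7 = 55, day8 = 20
)
at_least_40 <- function(x) x >= 40

Find(at_least_40, daily_PM25)                # 40, first match from the left
Find(at_least_40, daily_PM25, right = TRUE)  # 55, first match from the right
Find(function(x) x > 100, daily_PM25)        # NULL, no element matches
```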

Find function/II

A more complex example.

# Create complex example data
set.seed(123)  # Ensuring reproducibility
air_quality_data <- list(
    Ankara = data.frame(
      day = 1:7,
      PM25 = rnorm(7, mean = 35, sd = 5)),
    Istanbul = data.frame(
      day = 1:7,
      PM25 = rnorm(7, mean = 40, sd = 6)),
    Izmir = data.frame(
      day = 1:7,
      PM25 = rnorm(7, mean = 30, sd = 4))
)
print(air_quality_data)
$Ankara
  day PM25
1   1 32.2
2   2 33.8
3   3 42.8
4   4 35.4
5   5 35.6
6   6 43.6
7   7 37.3

$Istanbul
  day PM25
1   1 32.4
2   2 35.9
3   3 37.3
4   4 47.3
5   5 42.2
6   6 42.4
7   7 40.7

$Izmir
  day PM25
1   1 27.8
2   2 37.1
3   3 32.0
4   4 22.1
5   5 32.8
6   6 28.1
7   7 25.7
(first_exceeding_city <- Find(
  function(df) any(df$PM25 > 45),
  air_quality_data))
  day PM25
1   1 32.4
2   2 35.9
3   3 37.3
4   4 47.3
5   5 42.2
6   6 42.4
7   7 40.7

Position function

Position(f, x, right = FALSE, nomatch = NA_integer_)
  • Used to find the position or index of the first element in a vector or list that satisfies a specified condition.
  • This is particularly useful when you want to know where in your data a certain criterion is first met, rather than just retrieving the value itself.
# Create complex example data
set.seed(123)  # For reproducibility
environmental_data <- list(
    Ankara = data.frame(day = 1:10,
                        PM25 = rnorm(10, mean = 35, sd = 5),
                        Temp = rnorm(10, 20),
                        Humidity = rnorm(10, 60)),
    Istanbul = data.frame(day = 1:10,
                          PM25 = rnorm(10, mean = 40, sd = 6),
                          Temp = rnorm(10, 22),
                          Humidity = rnorm(10, 65)),
    Izmir = data.frame(day = 1:10,
                       PM25 = rnorm(10, mean = 30, sd = 4),
                       Temp = rnorm(10, 25),
                       Humidity = rnorm(10, 70)))
(first_exceeding_city_index <- Position(
  function(df) mean(df$PM25) > 37, environmental_data))
[1] 2
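
The returned index is usually combined with names() or [[ ]] to get at the matching element. A minimal sketch, assuming a hypothetical list of city means (city_means and idx are illustrative names):

```r
# Hypothetical city means
city_means <- list(Ankara = 35.4, Istanbul = 37.5, Izmir = 30.0)

idx <- Position(function(m) m > 37, city_means)
idx                       # 2
names(city_means)[idx]    # "Istanbul"
city_means[[idx]]         # 37.5

# When nothing matches, the nomatch value (NA by default) is returned
Position(function(m) m > 50, city_means)  # NA
```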

sweep function/I

sweep(x, MARGIN, STATS, FUN = "-", check.margin = TRUE, ...)
  • Useful for performing operations on arrays or matrices, such as standardizing data by subtracting the mean and dividing by the standard deviation.
  • This function is often used in data analysis for normalizing or scaling data.
# Create complex example data
set.seed(123)  # For reproducibility
pollution_data <- matrix(
  rnorm(30), nrow = 10, ncol = 3)
mycolnames <- c("PM2.5", "NO2", "SO2")
colnames(pollution_data) <- mycolnames
head(pollution_data)
       PM2.5    NO2    SO2
[1,] -0.5605  1.224 -1.068
[2,] -0.2302  0.360 -0.218
[3,]  1.5587  0.401 -1.026
[4,]  0.0705  0.111 -0.729
[5,]  0.1293 -0.556 -0.625
[6,]  1.7151  1.787 -1.687

These values are required for standardization.

means <- colMeans(pollution_data)
sds <- apply(pollution_data, 2, sd)

sweep function/II

# Subtract the mean
(centered_data <- sweep(pollution_data, 2, means, FUN = "-"))
         PM2.5     NO2    SO2
 [1,] -0.63510  1.0155 -0.643
 [2,] -0.30480  0.1512  0.207
 [3,]  1.48408  0.1921 -0.601
 [4,] -0.00412 -0.0979 -0.304
 [5,]  0.05466 -0.7645 -0.200
 [6,]  1.64044  1.5783 -1.262
 [7,]  0.38629  0.2892  1.262
 [8,] -1.33969 -2.1752  0.578
 [9,] -0.76148  0.4927 -0.714
[10,] -0.52029 -0.6814  1.678
# Divide by the standard deviation
(standardized_data <- sweep(centered_data, 2, sds, FUN = "/"))
         PM2.5     NO2    SO2
 [1,] -0.66588  0.9782 -0.691
 [2,] -0.31957  0.1456  0.222
 [3,]  1.55599  0.1851 -0.646
 [4,] -0.00432 -0.0943 -0.327
 [5,]  0.05731 -0.7364 -0.215
 [6,]  1.71993  1.5204 -1.356
 [7,]  0.40501  0.2786  1.356
 [8,] -1.40460 -2.0955  0.621
 [9,] -0.79838  0.4747 -0.767
[10,] -0.54550 -0.6564  1.803

sweep function/III

Can’t we do it without the sweep function?

(standardized_data2 <- apply(
  pollution_data, 2, function(x) (x - mean(x)) / sd(x)))
         PM2.5     NO2    SO2
 [1,] -0.66588  0.9782 -0.691
 [2,] -0.31957  0.1456  0.222
 [3,]  1.55599  0.1851 -0.646
 [4,] -0.00432 -0.0943 -0.327
 [5,]  0.05731 -0.7364 -0.215
 [6,]  1.71993  1.5204 -1.356
 [7,]  0.40501  0.2786  1.356
 [8,] -1.40460 -2.0955  0.621
 [9,] -0.79838  0.4747 -0.767
[10,] -0.54550 -0.6564  1.803
# Are they equal?
all.equal(standardized_data, standardized_data2)
[1] TRUE

Then why do we need the sweep function?

sweep function/IV

  • With apply, the whole calculation happens in a single step, so intermediate results such as the centered data are not kept.

  • sweep, in contrast, performs one operation per call, so every intermediate result stays available. Suppose you need centered_data itself for another operation.

  • You can also keep the mean and standard deviation values to de-standardize your data later.

# Get centered data from standardized data
centered_data2 <- sweep(standardized_data, 2, sds, FUN = "*")
# Get original data from centered data
original_data <- sweep(centered_data2, 2, means, FUN = "+")
# Are they equal?
all.equal(original_data, pollution_data)
[1] TRUE
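
For this particular task, base R also offers scale(), which centers and scales in one call and stores the statistics it used as attributes. The sketch below compares it with the two sweep calls, with x, by_sweep and by_scale as illustrative names:

```r
set.seed(123)  # For reproducibility
x <- matrix(rnorm(30), nrow = 10, ncol = 3,
            dimnames = list(NULL, c("PM2.5", "NO2", "SO2")))

# Two explicit steps with sweep
by_sweep <- sweep(sweep(x, 2, colMeans(x), "-"), 2, apply(x, 2, sd), "/")

# One call with scale
by_scale <- scale(x)

# Same numbers; scale() only adds two attributes
all.equal(by_sweep, by_scale, check.attributes = FALSE)  # TRUE
attr(by_scale, "scaled:center")  # the column means that were subtracted
attr(by_scale, "scaled:scale")   # the column standard deviations used
```

sweep remains the more general tool, because STATS and FUN can be anything, not only means and standard deviations.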

Negate function/I

Negate(f)
  • It takes a function that tests for a condition and returns a new function that tests for the opposite condition.
# We have a function that identifies days with poor
# air quality based on certain criteria.
is_poor_air_quality <- function(pm25, no2) {
    pm25 > 35 && no2 > 50
}
# Use Negate to create the OPPOSITE function
is_good_air_quality <- Negate(is_poor_air_quality)

Let’s test the function with some example data.

# Example data for a week
air_quality_data <- data.frame(
    day = 1:7,
    PM25 = c(30, 40, 36, 38, 50, 33, 45),
    NO2 = c(45, 55, 60, 48, 53, 49, 52)
)
head(air_quality_data)
  day PM25 NO2
1   1   30  45
2   2   40  55
3   3   36  60
4   4   38  48
5   5   50  53
6   6   33  49

Negate function/II

# Test the original function
(poor_quality_days <- apply(
  air_quality_data, 1,
  function(x) is_poor_air_quality(x["PM25"], x["NO2"])))
[1] FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE
# Test the negated function
(good_quality_days <- apply(
  air_quality_data, 1,
  function(x) is_good_air_quality(x["PM25"], x["NO2"])))
[1]  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE
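
A negated predicate can be passed anywhere a function is expected, for example to Filter. A minimal sketch with hypothetical values, where is_missing and is_present are illustrative names:

```r
# Hypothetical PM2.5 readings with gaps
pm25 <- c(30, NA, 36, 38, NA, 33)

is_missing <- function(x) is.na(x)
is_present <- Negate(is_missing)

Filter(is_missing, pm25)   # NA NA
Filter(is_present, pm25)   # 30 36 38 33
```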

Do we really need such a function?

print(is_poor_air_quality) # Original function
function(pm25, no2) {
    pm25 > 35 && no2 > 50
}
<bytecode: 0x1158b7e10>
print(is_good_air_quality) # Negated function
function (...) 
!f(...)
<bytecode: 0x105a34a40>
<environment: 0x115c35070>

xts package

  • What is xts?
  • Why do we need it?
  • Matrix vs. xts
  • Metadata
  • Subsetting
  • Time alignment and merging

What is xts?

  • xts stands for eXtensible Time Series.
  • It’s an R package specifically designed for handling and analyzing time-series data.
  • Built on top of the zoo package, it inherits all zoo functionality and adds time-based indexing and subsetting, plus some performance optimizations.
install.packages("xts")
library(xts)
Loading required package: zoo

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric

Why do we need it?

Key Features

  • Time-based Indexing/Sub-setting/Aggregation/Plotting/Calculations etc…
  • Efficient time-based operations (e.g. extracting data for a specific time period)
  • Dealing with Large datasets
  • Handling Irregular Time Series
  • Metadata handling
  • Automatic Time Zone Handling
  • Periodicity Recognition

Limitations

  • Handling of Non-numeric Data: xts primarily handles numeric time series data. Working with non-numeric data types (like strings or factors) can be cumbersome and might require additional data manipulation.
  • Handling of Missing Data: xts does not provide any special handling of missing data. It is up to the user to decide how to handle missing data.
  • Time Zone Management: Although xts handles time zones, managing time zone conversions and daylight saving time can be complex and may require extra caution and work.

Creating xts Objects/I

Using vector

library(xts)

# Sample data vectors
dates <- seq(as.Date("2023-01-01"),
             by = "days", length.out = 5)
temperature <- c(22, 23, 21, 20, 19)

# Use just the temperature vector to create an xts object
xts_object <- xts(
  temperature,
  order.by = dates)
colnames(xts_object) <- "Temperature"
class(xts_object)
[1] "xts" "zoo"
print(xts_object)
           Temperature
2023-01-01          22
2023-01-02          23
2023-01-03          21
2023-01-04          20
2023-01-05          19

Converting matrix

humidity <- c(60, 65, 58, 55, 57)

# Create a sample matrix
sample_matrix <- cbind(
  temperature, humidity)
rownames(sample_matrix) <- as.character(dates)
# Convert the matrix to an xts object
xts_from_matrix <- as.xts(sample_matrix)
print(xts_from_matrix)
           temperature humidity
2023-01-01          22       60
2023-01-02          23       65
2023-01-03          21       58
2023-01-04          20       55
2023-01-05          19       57

Creating xts Objects/II

Directly from matrix

sample_matrix <- matrix(
  c(temperature, humidity),
  ncol = 2,
  # Use character row names; dimnames does not keep the Date class
  dimnames = list(
    as.character(dates), c("Temperature", "Humidity")))
# Create an xts object directly from the matrix
xts_from_matrix <- xts(sample_matrix,
                       order.by = dates)
print(xts_from_matrix)
           Temperature Humidity
2023-01-01          22       60
2023-01-02          23       65
2023-01-03          21       58
2023-01-04          20       55
2023-01-05          19       57

Using data.frame

# Create a sample data.frame
sample_df <- data.frame(
  date = dates,
  temperature = temperature,
  humidity = humidity
)
print(sample_df)
        date temperature humidity
1 2023-01-01          22       60
2 2023-01-02          23       65
3 2023-01-03          21       58
4 2023-01-04          20       55
5 2023-01-05          19       57
# Create xts object from data.frame
xts_from_df <- xts(
  sample_df[, -1], order.by = sample_df$date)
print(xts_from_df)
           temperature humidity
2023-01-01          22       60
2023-01-02          23       65
2023-01-03          21       58
2023-01-04          20       55
2023-01-05          19       57
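
Going the other way is just as common: coredata() returns the values without the time index and index() returns the time stamps. A minimal sketch, assuming the xts package is installed (values, times and back_to_df are illustrative names):

```r
library(xts)

dates <- seq(as.Date("2023-01-01"), by = "days", length.out = 5)
x <- xts(cbind(temperature = c(22, 23, 21, 20, 19),
               humidity    = c(60, 65, 58, 55, 57)),
         order.by = dates)

values <- coredata(x)  # plain numeric matrix, no time index
times  <- index(x)     # the Date vector used as the index

# Rebuild an ordinary data.frame from the two parts
back_to_df <- data.frame(date = times, values)
names(back_to_df)      # "date" "temperature" "humidity"
```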

Matrix vs. xts/I

Matrix

data(sample_matrix)
class(sample_matrix)
[1] "matrix" "array" 
head(sample_matrix)
           Open High  Low Close
2007-01-02 50.0 50.1 50.0  50.1
2007-01-03 50.2 50.4 50.2  50.4
2007-01-04 50.4 50.4 50.3  50.3
2007-01-05 50.4 50.4 50.2  50.3
2007-01-06 50.2 50.2 50.1  50.2
2007-01-07 50.1 50.2 50.0  50.0
colnames(sample_matrix)
[1] "Open"  "High"  "Low"   "Close"
rownames(sample_matrix)
2007-01-02 2007-01-03 2007-01-04 2007-01-05 ...
index(sample_matrix)
1 2 3 4 5 6 ... 175 176 177 178 179 180

xts

sample.xts <- as.xts(sample_matrix)
class(sample.xts)
[1] "xts" "zoo"
head(sample.xts)
           Open High  Low Close
2007-01-02 50.0 50.1 50.0  50.1
2007-01-03 50.2 50.4 50.2  50.4
2007-01-04 50.4 50.4 50.3  50.3
2007-01-05 50.4 50.4 50.2  50.3
2007-01-06 50.2 50.2 50.1  50.2
2007-01-07 50.1 50.2 50.0  50.0
colnames(sample.xts)
[1] "Open"  "High"  "Low"   "Close"
rownames(sample.xts)
NULL
index(sample.xts)
2007-01-02 2007-01-03 ... 2007-06-29 2007-06-30

Matrix vs. xts/II

Matrix

sample_matrix[31:58,]
             Open   High    Low  Close
2007-02-01 50.224 50.414 50.191 50.358
2007-02-02 50.445 50.535 50.361 50.369
2007-02-03 50.372 50.469 50.299 50.431
2007-02-04 50.482 50.555 50.402 50.555
              ...    ...    ...    ...
2007-02-25  50.79 50.932  50.79 50.848
2007-02-26 50.882 50.882 50.755 50.755
2007-02-27 50.743 50.789 50.619 50.692
2007-02-28 50.694 50.771 50.599 50.771
sample_matrix[31:89,]
             Open   High    Low  Close
2007-02-01 50.224 50.414 50.191 50.358
2007-02-02 50.445 50.535 50.361 50.369
2007-02-03 50.372 50.469 50.299 50.431
2007-02-04 50.482 50.555 50.402 50.555
              ...    ...    ...    ...
2007-03-28 48.331 48.536 48.331 48.536
2007-03-29 48.592   48.7 48.574   48.7
2007-03-30 48.746 49.002 48.746 48.935
2007-03-31 48.956 49.097 48.956 48.975

xts

sample.xts['2007-02']
             Open   High    Low  Close
2007-02-01 50.224 50.414 50.191 50.358
2007-02-02 50.445 50.535 50.361 50.369
2007-02-03 50.372 50.469 50.299 50.431
2007-02-04 50.482 50.555 50.402 50.555
              ...    ...    ...    ...
2007-02-25  50.79 50.932  50.79 50.848
2007-02-26 50.882 50.882 50.755 50.755
2007-02-27 50.743 50.789 50.619 50.692
2007-02-28 50.694 50.771 50.599 50.771
sample.xts['2007-02/2007-03']
             Open   High    Low  Close
2007-02-01 50.224 50.414 50.191 50.358
2007-02-02 50.445 50.535 50.361 50.369
2007-02-03 50.372 50.469 50.299 50.431
2007-02-04 50.482 50.555 50.402 50.555
              ...    ...    ...    ...
2007-03-28 48.331 48.536 48.331 48.536
2007-03-29 48.592   48.7 48.574   48.7
2007-03-30 48.746 49.002 48.746 48.935
2007-03-31 48.956 49.097 48.956 48.975
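
The date string inside the brackets can also be a single day, a range between two days, or a range that is open on one side. A short sketch on the same sample data, assuming the xts package is installed:

```r
library(xts)
data(sample_matrix)          # example data shipped with xts
x <- as.xts(sample_matrix)

nrow(x["2007-02"])                # 28, all of February
nrow(x["2007-02-10/2007-02-15"])  # 6, a range between two days
nrow(x["/2007-01-05"])            # 4, from the start up to January 5
nrow(x["2007-06-25/"])            # 6, from June 25 to the end
```

With a plain matrix, each of these selections would need the row numbers to be worked out by hand.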

Metadata

# Adding and viewing meta-data
attr(sample.xts, "description") <- "Sample xts data"
attributes(sample.xts)
$dim
[1] 180   4

$dimnames
$dimnames[[1]]
NULL

$dimnames[[2]]
[1] "Open"  "High"  "Low"   "Close"


$index
  [1] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
  [7] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [13] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [19] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [25] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [31] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [37] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [43] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [49] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [55] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [61] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [67] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [73] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [79] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [85] 1.17e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
 [91] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
 [97] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[103] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[109] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[115] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[121] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[127] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[133] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[139] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[145] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[151] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[157] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[163] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[169] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[175] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
attr(,"tzone")
[1] ""
attr(,"tclass")
[1] "POSIXct" "POSIXt" 

$class
[1] "xts" "zoo"

$description
[1] "Sample xts data"
# do some math
sample.xts <- sample.xts * sample_matrix
attributes(sample.xts)
$index
  [1] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
  [7] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [13] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [19] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [25] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [31] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [37] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [43] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [49] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [55] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [61] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [67] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [73] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [79] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09
 [85] 1.17e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
 [91] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
 [97] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[103] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[109] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[115] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[121] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[127] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[133] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[139] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[145] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[151] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[157] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[163] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[169] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
[175] 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09 1.18e+09
attr(,"tzone")
[1] ""
attr(,"tclass")
[1] "POSIXct" "POSIXt" 

$class
[1] "xts" "zoo"

$description
[1] "Sample xts data"

$dim
[1] 180   4

$dimnames
$dimnames[[1]]
NULL

$dimnames[[2]]
[1] "Open"  "High"  "Low"   "Close"
  • The custom description attribute is still there after the arithmetic operation.

Periodicity/I

Estimate the periodicity of a time-series-like object by calculating the median time between observations in days.

p <- periodicity(sample.xts)
print(p)
Daily periodicity from 2007-01-02 to 2007-06-30 
# Build the same series with the index time zone set to GMT
sample.xts.tz <- as.xts(sample_matrix, tzone = "GMT")
p.tz <- periodicity(sample.xts.tz)
print(p)
Daily periodicity from 2007-01-02 to 2007-06-30 
unclass(p)
$difftime
Time difference of 86400 days

$frequency
[1] 86400

$start
[1] "2007-01-02 EET"

$end
[1] "2007-06-30 EEST"

$units
[1] "days"

$scale
[1] "daily"

$label
[1] "day"
unclass(p.tz)
$difftime
Time difference of 86400 days

$frequency
[1] 86400

$start
[1] "2007-01-01 22:00:00 GMT"

$end
[1] "2007-06-29 21:00:00 GMT"

$units
[1] "days"

$scale
[1] "daily"

$label
[1] "day"

Periodicity/II

# Change system time zone information
Sys.setenv(TZ = 'GMT')
unclass(p)
$difftime
Time difference of 86400 days

$frequency
[1] 86400

$start
[1] "2007-01-01 22:00:00 GMT"

$end
[1] "2007-06-29 21:00:00 GMT"

$units
[1] "days"

$scale
[1] "daily"

$label
[1] "day"
unclass(p.tz)
$difftime
Time difference of 86400 days

$frequency
[1] 86400

$start
[1] "2007-01-01 22:00:00 GMT"

$end
[1] "2007-06-29 21:00:00 GMT"

$units
[1] "days"

$scale
[1] "daily"

$label
[1] "day"
  • Both xts objects actually hold the same data; only the time-zone information differs.
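
The shifted start and end times can be reproduced in base R: a POSIXct value is a single instant, and the time zone only changes how it is printed. The sketch assumes the local time zone of the session above was Europe/Istanbul, which is a guess based on the EET/EEST labels in the output:

```r
# Midnight local time (assumed zone: Europe/Istanbul)
t_local <- as.POSIXct("2007-01-02 00:00:00", tz = "Europe/Istanbul")

fmt <- "%Y-%m-%d %H:%M:%S"
format(t_local, fmt, tz = "Europe/Istanbul")  # "2007-01-02 00:00:00"
format(t_local, fmt, tz = "GMT")              # "2007-01-01 22:00:00"

# The same instant, created directly in GMT
t_gmt <- as.POSIXct("2007-01-01 22:00:00", tz = "GMT")
t_local == t_gmt                              # TRUE
```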

Time-based Indexing

A more complex xts object.

library(xts)

# Generate sample data
set.seed(123)  # For reproducibility
dates <- seq(as.Date("2020-01-01"),
             by = "day", length.out = 730)
data <- matrix(rnorm(730*3), ncol = 3)

# Create xts object
xts_data <- xts(data, order.by = dates)
cn <- c("Temperature", "Humidity", "WindSpeed")
colnames(xts_data) <- cn
           Temperature Humidity WindSpeed
2020-01-01       -0.56    1.739    -0.868
2020-01-02       -0.23    0.881      0.73
2020-01-03       1.559   -1.944       0.5
2020-01-04       0.071      1.4     0.634
                   ...      ...       ...
2021-12-27       0.564    0.869    -0.847
2021-12-28       0.189    1.369     1.008
2021-12-29      -0.733    0.763    -0.611
2021-12-30       0.986    0.421     0.333

Data for the year 2021

data_2021 <- xts_data["2021"]

Data for every January

data_january <- xts_data[.indexmon(xts_data) == 0]

Data from March to April 2021

data_mar_apr_2021 <- xts_data["2021-03/2021-04"]

Data for all Mondays

data_mondays <- xts_data[.indexwday(xts_data) == 1]

Last day of each month

eom_dates <- endpoints(xts_data, on = "months")
data_eom <- xts_data[eom_dates]

Days when the temperature was more than 1 standard deviation above the mean

mean_plus_1std <- mean(xts_data$Temperature) + sd(xts_data$Temperature)
idx <- xts_data$Temperature > mean_plus_1std
high_temp_days <- xts_data[idx]
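
The 0 for January and the 1 for Monday in the examples above follow the same numbering as the POSIXlt class in base R: months run from 0 to 11 and weekdays from 0 (Sunday) to 6. A quick base R check on two known dates (d and lt are illustrative names):

```r
# 2020-01-06 was a Monday, 2020-02-09 was a Sunday
d  <- as.Date(c("2020-01-06", "2020-02-09"))
lt <- as.POSIXlt(d)

lt$mon    # 0 1  -> January is 0, February is 1
lt$wday   # 1 0  -> Monday is 1, Sunday is 0
```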

Time alignment and merging

# First time series
set.seed(123) # For reproducibility
dates1 <- seq(as.Date("2023-01-01"),
              by = "days", length.out = 6)
xts1 <- xts(rnorm(length(dates1)), order.by = dates1)
print(xts1)
              [,1]
2023-01-01 -0.5605
2023-01-02 -0.2302
2023-01-03  1.5587
2023-01-04  0.0705
2023-01-05  0.1293
2023-01-06  1.7151
# Second time series, offset by 2 days
dates2 <- seq(as.Date("2023-01-03"),
              by = "days", length.out = 6)
xts2 <- xts(rnorm(length(dates2)), order.by=dates2)
print(xts2)
             [,1]
2023-01-03  0.461
2023-01-04 -1.265
2023-01-05 -0.687
2023-01-06 -0.446
2023-01-07  1.224
2023-01-08  0.360
# Merging the two series
merged_xts <- merge(xts1, xts2)
              xts1   xts2
2023-01-01 -0.5605     NA
2023-01-02 -0.2302     NA
2023-01-03  1.5587  0.461
2023-01-04  0.0705 -1.265
2023-01-05  0.1293 -0.687
2023-01-06  1.7151 -0.446
2023-01-07      NA  1.224
2023-01-08      NA  0.360
  • The first two days will only have data from xts1.
  • The next four days will have data from both xts1 and xts2.
  • The last two days will only have data from xts2.
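
By default merge keeps every date from both series (an outer join). The join argument of merge for xts objects changes this. A minimal sketch that rebuilds the two series and keeps only the shared dates, assuming the xts package is installed:

```r
library(xts)

set.seed(123)  # For reproducibility
xts1 <- xts(rnorm(6), order.by = as.Date("2023-01-01") + 0:5)
xts2 <- xts(rnorm(6), order.by = as.Date("2023-01-03") + 0:5)

outer_join <- merge(xts1, xts2)                  # default: all dates, NA where missing
inner_join <- merge(xts1, xts2, join = "inner")  # only dates present in both

nrow(outer_join)   # 8
nrow(inner_join)   # 4
anyNA(inner_join)  # FALSE
```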

Data Manipulation

  • What is data manipulation?
  • Why do we need it?
  • Overview

What is data manipulation?

Data manipulation/transformation is a combination of the following operations:

  • Creating data: Creating/transforming data in/to specific formats.
  • Subsetting/Filtering: Selecting a subset of the data based on certain criteria.
  • Sorting: Arranging data in a specific order (ascending or descending).
  • Merging and joining: Combining data from different sources based on common identifiers or keys.
  • Reshaping: Changing the shape of the data from wide to long format (or vice versa).
  • Transforming: Creating new variables from existing ones, such as computing new columns as linear combinations of other columns or normalizing data etc.
  • Aggregating: Summarizing data, which could involve computing sums, averages, counts, maxima, minima, etc., often grouped by certain categories.
  • Cleaning: Improving data quality by handling missing values, removing duplicates, correcting errors, or standardizing formats.
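
Several of these operations can be sketched in a few lines of base R. The data frames obs and info below are hypothetical toy data, and the other names are illustrative:

```r
# Hypothetical toy data
obs <- data.frame(
  city = c("Ankara", "Izmir", "Ankara", "Izmir", "Ankara"),
  day  = c(1, 1, 2, 2, 3),
  PM25 = c(35, 28, NA, 31, 41)
)
info <- data.frame(city   = c("Ankara", "Izmir"),
                   region = c("Central Anatolia", "Aegean"))

clean  <- obs[!is.na(obs$PM25), ]               # Cleaning: drop missing values
high   <- clean[clean$PM25 > 30, ]              # Subsetting: keep PM2.5 above 30
sorted <- clean[order(-clean$PM25), ]           # Sorting: descending PM2.5
clean$anomaly <- clean$PM25 - mean(clean$PM25)  # Transforming: new column
joined <- merge(clean, info, by = "city")       # Merging: add region by city
agg    <- aggregate(PM25 ~ city, data = clean,
                    FUN = mean)                 # Aggregating: mean per city
agg$PM25                                        # 38.0 29.5
```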

Why do we need it?

  • Easier exploration: Data manipulation is a crucial step in the data analysis process, as it helps in preparing the raw data into a format that is more suitable for exploration, analysis, and visualization.
  • Supporting Decision Making: Effective data manipulation leads to easier analysis and more accurate results, thereby forming a foundation for data-driven decision-making.
  • Improving Efficiency: Automating repetitive tasks and streamlining data-processing workflows improves the performance and speed of data analysis.
  • Data Reduction: Removing unnecessary information reduces the size of the data, which speeds up analysis, lowers storage requirements, and keeps the focus on relevant information.

Overview

  • Data manipulation with time series data (xts)
    • Irregular to Regular time-series
    • Imputation of missing values
    • Aggregation in time
    • Averages for the same months/days/hours
      • Using sapply
      • Using aggregate
      • Using Split-Apply-Combine
  • Reshaping data
    • Understanding Long and Wide Data Formats
    • Using reshape function
    • Using tidyr package

Data Manipulation with xts

  • Irregular to Regular time-series
  • Imputation of missing values
  • Aggregation in time
  • Averages for the same months/days/hours
    • Using sapply
    • Using aggregate
    • Using Split-Apply-Combine

Irregular to Regular time-series

library(xts)  # provides xts(), periodicity(), endpoints(); also loads zoo
# Sample irregular time series
set.seed(123)  # For reproducibility
endp <- c(0, 1, 3, 7, 10)
ir_dates <- as.Date("2023-01-01") + endp
ir_data <- matrix(rnorm(length(ir_dates) * 3),
                  ncol = 3)
ir_xts <- xts(ir_data, order.by = ir_dates)
print(ir_xts)
              [,1]   [,2]   [,3]
2023-01-01 -0.5605  1.715  1.224
2023-01-02 -0.2302  0.461  0.360
2023-01-04  1.5587 -1.265  0.401
2023-01-08  0.0705 -0.687  0.111
2023-01-11  0.1293 -0.446 -0.556
(tdif <- unclass(periodicity(ir_xts))$difftime)
Time difference of 216000 days
(tdif/60/60/24) # How?
Time difference of 2.5 days
tdif2 <- diff(endp)
mean(tdif2)
[1] 2.5
# Define the range for the regular series
reg_dates <- seq(start(ir_xts),
                 end(ir_xts), by = "day")

# Convert to a regular time series by merging
merged_xts <- merge(ir_xts, xts(, reg_dates))
print(merged_xts)
            ir_xts ir_xts.1 ir_xts.2
2023-01-01 -0.5605    1.715    1.224
2023-01-02 -0.2302    0.461    0.360
2023-01-03      NA       NA       NA
2023-01-04  1.5587   -1.265    0.401
2023-01-05      NA       NA       NA
2023-01-06      NA       NA       NA
2023-01-07      NA       NA       NA
2023-01-08  0.0705   -0.687    0.111
2023-01-09      NA       NA       NA
2023-01-10      NA       NA       NA
2023-01-11  0.1293   -0.446   -0.556
(tdif <- unclass(periodicity(merged_xts))$difftime)
Time difference of 86400 days
(tdif/60/60/24)
Time difference of 1 days
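
The value 216000 printed above appears to be a number of seconds carrying a "days" label, since 216000 / 60 / 60 / 24 = 2.5. A way to inspect the spacing that avoids this ambiguity is to take the differences of the index directly. The sketch below rebuilds the same dates with placeholder values (seq_along instead of random numbers); the names gaps, is_regular and reg_gaps are chosen for this example only.

```r
library(xts)

# Same irregular dates as in the example above, with placeholder values
endp <- c(0, 1, 3, 7, 10)
ir_dates <- as.Date("2023-01-01") + endp
ir_xts <- xts(seq_along(ir_dates), order.by = ir_dates)

# Gaps between consecutive time stamps, explicitly in days
gaps <- as.numeric(diff(index(ir_xts)), units = "days")
print(gaps)
print(mean(gaps))

# A series is regular when all gaps are equal
is_regular <- length(unique(gaps)) == 1

# Merging with an empty xts that carries the full daily index
# inserts NA rows for the missing days
reg_dates <- seq(start(ir_xts), end(ir_xts), by = "day")
reg_xts <- merge(ir_xts, xts(, reg_dates))
reg_gaps <- as.numeric(diff(index(reg_xts)), units = "days")
print(all(reg_gaps == 1))
```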

Imputation of missing values

How can we fill the missing values? Which method do you think is best?

print(merged_xts)
            ir_xts ir_xts.1 ir_xts.2
2023-01-01 -0.5605    1.715    1.224
2023-01-02 -0.2302    0.461    0.360
2023-01-03      NA       NA       NA
2023-01-04  1.5587   -1.265    0.401
2023-01-05      NA       NA       NA
2023-01-06      NA       NA       NA
2023-01-07      NA       NA       NA
2023-01-08  0.0705   -0.687    0.111
2023-01-09      NA       NA       NA
2023-01-10      NA       NA       NA
2023-01-11  0.1293   -0.446   -0.556
(reg_xts1 <- na.approx(merged_xts))
            ir_xts ir_xts.1 ir_xts.2
2023-01-01 -0.5605    1.715    1.224
2023-01-02 -0.2302    0.461    0.360
2023-01-03  0.6643   -0.402    0.380
2023-01-04  1.5587   -1.265    0.401
2023-01-05  1.1867   -1.121    0.328
2023-01-06  0.8146   -0.976    0.256
2023-01-07  0.4426   -0.831    0.183
2023-01-08  0.0705   -0.687    0.111
2023-01-09  0.0901   -0.606   -0.111
2023-01-10  0.1097   -0.526   -0.334
2023-01-11  0.1293   -0.446   -0.556
  • na.trim: Removes leading and trailing NAs.
  • na.fill: Replaces NA values with a constant value.
  • na.approx: Linear interpolation
  • na.locf: Last observation carried forward
  • na.spline: Spline interpolation
  • na.aggregate: Replaces NA values with an aggregate (the mean by default) of the non-missing values, optionally computed by group.
  • na.kalman: Kalman smoothing and state estimation (imputeTS package; named na_kalman in recent versions)
  • na.interp: Linear interpolation (forecast package)
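
To see how some of these methods differ, the following sketch applies three of the zoo functions listed above to a tiny series with a two-value gap. The series x and the column labels are made up for this comparison; which method is "best" depends on the variable being imputed.

```r
library(xts)  # also loads zoo, which provides the na.* functions

# Small series with a gap of two missing values
dates <- as.Date("2023-01-01") + 0:4
x <- xts(c(1, NA, NA, 4, 5), order.by = dates)

# Apply three imputation methods and put the results side by side
filled <- merge(x, na.locf(x), na.approx(x), na.fill(x, 0))
colnames(filled) <- c("original", "locf", "approx", "fill0")
print(filled)
```

na.locf repeats the last observed value, na.approx draws a straight line across the gap, and na.fill inserts the given constant.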

Aggregation in time/I

# Sample xts object
set.seed(123)  # For reproducibility
# A week of hourly data
dates <- seq(as.POSIXct("2023-01-01"),
             by = "hour", length.out = 168)
data <- matrix(rnorm(168*2), ncol = 2)
colnames(data) <- c("Temperature", "Humidity")
(xts_data <- xts(data, order.by = dates))
                    Temperature Humidity
2023-01-01                -0.56    0.517
2023-01-01 01:00:00       -0.23    0.369
2023-01-01 02:00:00       1.559   -0.215
2023-01-01 03:00:00       0.071    0.065
                            ...      ...
2023-01-07 20:00:00      -0.417   -0.722
2023-01-07 21:00:00       0.298    1.519
2023-01-07 22:00:00       0.637    0.377
2023-01-07 23:00:00      -0.484   -2.052
# Daily aggregation - mean
(daily_mean <- apply.daily(xts_data, mean))
                    Temperature Humidity
2023-01-01 23:00:00    -0.00868  0.05040
2023-01-02 23:00:00     0.05132  0.09639
2023-01-03 23:00:00     0.05600 -0.18550
2023-01-04 23:00:00     0.17563 -0.00535
2023-01-05 23:00:00    -0.19707  0.21486
2023-01-06 23:00:00    -0.14226  0.27335
2023-01-07 23:00:00     0.04996  0.04417
# Monthly aggregation - sum
(monthly_sum <- apply.monthly(xts_data, colSums))
                    Temperature Humidity
2023-01-07 23:00:00      -0.362     11.7
# Yearly aggregation - maximum
(yearly_max <- apply.yearly(xts_data, max))
                    [,1]
2023-01-07 23:00:00 3.24
  • Why is the number of columns 1 in the above example?
# Custom period aggregation - median
# Define a custom endpoint (e.g., every 3 days)
ep_3day <- endpoints(
  xts_data, on = "days", k = 3)
(three_day_median <- period.apply(
  xts_data, INDEX = ep_3day,
  FUN = function(x) apply(x, 2, median)))
                    Temperature Humidity
2023-01-02 23:00:00     -0.1349  -0.0440
2023-01-05 23:00:00     -0.0357  -0.0894
2023-01-07 23:00:00     -0.1878   0.1900
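
The sketch below looks at what endpoints() returns and how the choice of FUN affects the number of columns. It rebuilds a week of hourly data like the one above; tz = "UTC" is an assumption added here so that day boundaries do not depend on the local time zone, and the names ep_day, one_col and two_col are chosen for this example.

```r
library(xts)

set.seed(123)
dates <- seq(as.POSIXct("2023-01-01", tz = "UTC"),
             by = "hour", length.out = 168)
data <- matrix(rnorm(168 * 2), ncol = 2,
               dimnames = list(NULL, c("Temperature", "Humidity")))
xts_data <- xts(data, order.by = dates)

# endpoints() returns the row number of the last observation
# in each period, preceded by 0
ep_day <- endpoints(xts_data, on = "days")
print(ep_day)

# FUN receives the whole block of rows for a period.
# max() collapses the block to a single number ...
one_col <- period.apply(xts_data, INDEX = ep_day, FUN = max)

# ... while apply(x, 2, max) keeps one value per column
two_col <- period.apply(xts_data, INDEX = ep_day,
                        FUN = function(x) apply(x, 2, max))
print(dim(one_col))
print(dim(two_col))
```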

Aggregation in time/II

# Sample daily data for 2 years
set.seed(123)  # For reproducibility
dates <- seq(as.Date("2023-01-01"), by = "day", length.out = 365 * 2)
data <- matrix(rnorm(length(dates)*2), ncol = 2)
colnames(data) <- c("V1", "V2")
(daily_xts <- xts(data, order.by = dates))
               V1     V2
2023-01-01  -0.56  1.739
2023-01-02  -0.23  0.881
2023-01-03  1.559 -1.944
2023-01-04  0.071    1.4
              ...    ...
2024-12-27  0.564  0.869
2024-12-28  0.189  1.369
2024-12-29 -0.733  0.763
2024-12-30  0.986  0.421
(yearly_max <- apply.yearly(
  daily_xts, function(x) apply(x, 2, max)))
             V1   V2
2023-12-31 3.24 2.83
2024-12-30 2.46 3.39
# Aggregate to monthly data
(monthly_xts <- aggregate(
  daily_xts, as.yearmon,
  function(x) sum(x, na.rm = TRUE)))
              V1     V2
Jan 2023 -0.9866 12.683
Feb 2023  4.7077 -2.514
Mar 2023  0.9486  1.891
Apr 2023 -2.8167 -0.689
May 2023 -4.7197 10.076
Jun 2023  2.7604  4.105
Jul 2023  1.8674  3.531
Aug 2023 -2.8472 -9.264
Sep 2023  3.8096 -2.747
Oct 2023  4.1499  9.570
Nov 2023  6.1588  5.995
Dec 2023 -1.3990 -3.681
Jan 2024 -3.3796  3.922
Feb 2024 -3.0369 -1.370
Mar 2024  0.0967 -4.181
Apr 2024  9.8611  2.994
May 2024 -2.4736 -9.670
Jun 2024  3.4262 11.397
Jul 2024  3.3914 -5.004
Aug 2024 -8.7452 -2.128
Sep 2024 -6.6526  0.583
Oct 2024  3.8253 -2.573
Nov 2024 -9.8455  6.135
Dec 2024 -1.0988  0.361
class(monthly_xts)
[1] "zoo"
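
Because aggregate returns a zoo object here, xts-specific functions may need the result converted back. A minimal sketch, assuming a shorter 90-day series than the one above to keep the output small; the names monthly_zoo, monthly_xts and monthly_xts2 are chosen for this example.

```r
library(xts)

set.seed(123)
# 90 days: January, February and March 2023
dates <- seq(as.Date("2023-01-01"), by = "day", length.out = 90)
data <- matrix(rnorm(90 * 2), ncol = 2,
               dimnames = list(NULL, c("V1", "V2")))
daily_xts <- xts(data, order.by = dates)

# aggregate() dispatches to the zoo method and returns a zoo object
monthly_zoo <- aggregate(daily_xts, as.yearmon,
                         function(x) sum(x, na.rm = TRUE))
print(class(monthly_zoo))

# Convert back to xts; the index stays of class yearmon
monthly_xts <- as.xts(monthly_zoo)
print(class(monthly_xts))

# Alternative that keeps the xts class throughout;
# the index is the last date of each month instead
monthly_xts2 <- apply.monthly(daily_xts, colSums)
```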

Aggregation in time/III

Rolling aggregation

Rolling mean with a 24-hour window; FUN is calculated at every by-th time point.

Align right

(rolling_mean1 <- rollapply(
  xts_data, width = 24, by = 1,
  align = 'right', FUN = mean))
                    Temperature  Humidity
2023-01-01 00:00:00          NA        NA
2023-01-01 01:00:00          NA        NA
2023-01-01 02:00:00          NA        NA
2023-01-01 03:00:00          NA        NA
2023-01-01 04:00:00          NA        NA
2023-01-01 05:00:00          NA        NA
2023-01-01 06:00:00          NA        NA
2023-01-01 07:00:00          NA        NA
2023-01-01 08:00:00          NA        NA
2023-01-01 09:00:00          NA        NA
2023-01-01 10:00:00          NA        NA
2023-01-01 11:00:00          NA        NA
2023-01-01 12:00:00          NA        NA
2023-01-01 13:00:00          NA        NA
2023-01-01 14:00:00          NA        NA
2023-01-01 15:00:00          NA        NA
2023-01-01 16:00:00          NA        NA
2023-01-01 17:00:00          NA        NA
2023-01-01 18:00:00          NA        NA
2023-01-01 19:00:00          NA        NA
2023-01-01 20:00:00          NA        NA
2023-01-01 21:00:00          NA        NA
2023-01-01 22:00:00          NA        NA
2023-01-01 23:00:00   -0.008676  0.050402
2023-01-02 00:00:00   -0.011366  0.032807
2023-01-02 01:00:00   -0.072054 -0.019874
2023-01-02 02:00:00   -0.102092 -0.065516
2023-01-02 03:00:00   -0.098640  0.014980
2023-01-02 04:00:00   -0.151449  0.041429
2023-01-02 05:00:00   -0.170668 -0.099392
2023-01-02 06:00:00   -0.172103 -0.093969
2023-01-02 07:00:00   -0.131687 -0.097697
2023-01-02 08:00:00   -0.065771 -0.007655
2023-01-02 09:00:00   -0.010613  0.034093
2023-01-02 10:00:00   -0.027384  0.004856
2023-01-02 11:00:00   -0.013683  0.046588
2023-01-02 12:00:00   -0.007302  0.073629
2023-01-02 13:00:00   -0.014493  0.001153
2023-01-02 14:00:00   -0.004082 -0.017137
2023-01-02 15:00:00   -0.094389 -0.005849
2023-01-02 16:00:00   -0.144079  0.072783
2023-01-02 17:00:00   -0.070800  0.078748
2023-01-02 18:00:00   -0.152748  0.037470
2023-01-02 19:00:00   -0.042675  0.044093
2023-01-02 20:00:00    0.052149  0.064027
2023-01-02 21:00:00    0.014435  0.063328
2023-01-02 22:00:00    0.040398  0.013038
2023-01-02 23:00:00    0.051325  0.096388
2023-01-03 00:00:00    0.109867  0.074065
2023-01-03 01:00:00    0.176672  0.081244
2023-01-03 02:00:00    0.152319  0.084349
2023-01-03 03:00:00    0.144739 -0.052398
2023-01-03 04:00:00    0.190375 -0.101343
2023-01-03 05:00:00    0.195158 -0.023457
2023-01-03 06:00:00    0.167982  0.048252
2023-01-03 07:00:00    0.243463  0.127130
2023-01-03 08:00:00    0.141634  0.020360
2023-01-03 09:00:00    0.129404 -0.031834
2023-01-03 10:00:00    0.100332 -0.050145
2023-01-03 11:00:00    0.080637 -0.102662
2023-01-03 12:00:00    0.073375 -0.048537
2023-01-03 13:00:00    0.055024 -0.071010
2023-01-03 14:00:00    0.053889  0.043319
2023-01-03 15:00:00    0.027302  0.064331
2023-01-03 16:00:00    0.011590  0.004483
2023-01-03 17:00:00    0.032900 -0.024038
2023-01-03 18:00:00    0.104300 -0.052939
2023-01-03 19:00:00    0.016136 -0.117969
2023-01-03 20:00:00    0.004232 -0.176944
2023-01-03 21:00:00    0.136448 -0.137984
2023-01-03 22:00:00    0.132775 -0.083116
2023-01-03 23:00:00    0.056004 -0.185501
2023-01-04 00:00:00    0.065411 -0.199979
2023-01-04 01:00:00    0.039335 -0.190776
2023-01-04 02:00:00    0.000113 -0.076929
2023-01-04 03:00:00    0.044034 -0.070786
2023-01-04 04:00:00    0.033955 -0.054331
2023-01-04 05:00:00   -0.073933 -0.000816
2023-01-04 06:00:00   -0.056972 -0.051267
2023-01-04 07:00:00   -0.125945 -0.137409
2023-01-04 08:00:00   -0.061174 -0.149956
2023-01-04 09:00:00   -0.069479 -0.132218
2023-01-04 10:00:00   -0.090084 -0.118510
2023-01-04 11:00:00   -0.072233 -0.112037
2023-01-04 12:00:00   -0.097238 -0.163228
2023-01-04 13:00:00   -0.062483 -0.117141
2023-01-04 14:00:00   -0.002898 -0.132007
2023-01-04 15:00:00    0.057675 -0.131933
2023-01-04 16:00:00    0.088753 -0.095839
2023-01-04 17:00:00    0.123973 -0.038786
2023-01-04 18:00:00    0.146693 -0.019588
2023-01-04 19:00:00    0.167334 -0.028583
2023-01-04 20:00:00    0.138854 -0.042675
2023-01-04 21:00:00    0.027271 -0.080543
2023-01-04 22:00:00    0.104424 -0.092091
2023-01-04 23:00:00    0.175629 -0.005352
2023-01-05 00:00:00    0.224862  0.123052
2023-01-05 01:00:00    0.318271  0.208459
2023-01-05 02:00:00    0.337117  0.140575
2023-01-05 03:00:00    0.251617  0.114774
2023-01-05 04:00:00    0.233883  0.106036
2023-01-05 05:00:00    0.295449  0.030488
2023-01-05 06:00:00    0.277616  0.069904
2023-01-05 07:00:00    0.268922  0.166669
2023-01-05 08:00:00    0.229031  0.222881
2023-01-05 09:00:00    0.211102  0.144517
2023-01-05 10:00:00    0.193842  0.195569
2023-01-05 11:00:00    0.097495  0.200374
2023-01-05 12:00:00    0.090839  0.221987
2023-01-05 13:00:00    0.115307  0.221323
2023-01-05 14:00:00    0.045632  0.172558
2023-01-05 15:00:00    0.052831  0.177277
2023-01-05 16:00:00   -0.001000  0.062765
2023-01-05 17:00:00   -0.051182  0.067171
2023-01-05 18:00:00   -0.070936  0.087991
2023-01-05 19:00:00   -0.081238  0.140793
2023-01-05 20:00:00   -0.086782  0.167429
2023-01-05 21:00:00   -0.087315  0.193425
2023-01-05 22:00:00   -0.179413  0.200669
2023-01-05 23:00:00   -0.197074  0.214863
2023-01-06 00:00:00   -0.283311  0.110191
2023-01-06 01:00:00   -0.386648  0.052711
2023-01-06 02:00:00   -0.397267  0.106942
2023-01-06 03:00:00   -0.365170  0.224055
2023-01-06 04:00:00   -0.258742  0.287973
2023-01-06 05:00:00   -0.296611  0.260195
2023-01-06 06:00:00   -0.276524  0.308423
2023-01-06 07:00:00   -0.258795  0.271097
2023-01-06 08:00:00   -0.259221  0.320369
2023-01-06 09:00:00   -0.260316  0.322218
2023-01-06 10:00:00   -0.167422  0.287691
2023-01-06 11:00:00   -0.079112  0.358377
2023-01-06 12:00:00   -0.061551  0.321292
2023-01-06 13:00:00   -0.117447  0.286823
2023-01-06 14:00:00   -0.179026  0.229877
2023-01-06 15:00:00   -0.157219  0.184994
2023-01-06 16:00:00   -0.150667  0.236258
2023-01-06 17:00:00   -0.117521  0.219369
2023-01-06 18:00:00   -0.059617  0.119360
2023-01-06 19:00:00   -0.132327  0.139261
2023-01-06 20:00:00   -0.107489  0.185866
2023-01-06 21:00:00   -0.091718  0.265180
2023-01-06 22:00:00   -0.121820  0.310187
2023-01-06 23:00:00   -0.142259  0.273351
2023-01-07 00:00:00   -0.213891  0.210531
2023-01-07 01:00:00   -0.196534  0.178465
2023-01-07 02:00:00   -0.237001  0.115114
2023-01-07 03:00:00   -0.197667  0.100503
2023-01-07 04:00:00   -0.186990  0.048381
2023-01-07 05:00:00   -0.213452  0.019998
2023-01-07 06:00:00   -0.190437  0.006746
2023-01-07 07:00:00   -0.161642  0.041942
2023-01-07 08:00:00   -0.107723 -0.025951
2023-01-07 09:00:00   -0.146768  0.081091
2023-01-07 10:00:00   -0.211934  0.024435
2023-01-07 11:00:00   -0.242430 -0.000111
2023-01-07 12:00:00   -0.220690  0.007903
2023-01-07 13:00:00   -0.218605  0.067754
2023-01-07 14:00:00   -0.092345  0.104326
2023-01-07 15:00:00   -0.155092  0.174554
2023-01-07 16:00:00   -0.050369  0.248415
2023-01-07 17:00:00   -0.124916  0.234920
2023-01-07 18:00:00   -0.256968  0.361244
2023-01-07 19:00:00   -0.061763  0.302893
2023-01-07 20:00:00   -0.108373  0.221298
2023-01-07 21:00:00   -0.085022  0.199700
2023-01-07 22:00:00    0.007008  0.161209
2023-01-07 23:00:00    0.049962  0.044167

Align left

(rolling_mean1 <- rollapply(
  xts_data, width = 24, by = 1,
  align = 'left', FUN = mean))
                    Temperature  Humidity
2023-01-01 00:00:00   -0.008676  0.050402
2023-01-01 01:00:00   -0.011366  0.032807
2023-01-01 02:00:00   -0.072054 -0.019874
2023-01-01 03:00:00   -0.102092 -0.065516
2023-01-01 04:00:00   -0.098640  0.014980
2023-01-01 05:00:00   -0.151449  0.041429
2023-01-01 06:00:00   -0.170668 -0.099392
2023-01-01 07:00:00   -0.172103 -0.093969
2023-01-01 08:00:00   -0.131687 -0.097697
2023-01-01 09:00:00   -0.065771 -0.007655
2023-01-01 10:00:00   -0.010613  0.034093
2023-01-01 11:00:00   -0.027384  0.004856
2023-01-01 12:00:00   -0.013683  0.046588
2023-01-01 13:00:00   -0.007302  0.073629
2023-01-01 14:00:00   -0.014493  0.001153
2023-01-01 15:00:00   -0.004082 -0.017137
2023-01-01 16:00:00   -0.094389 -0.005849
2023-01-01 17:00:00   -0.144079  0.072783
2023-01-01 18:00:00   -0.070800  0.078748
2023-01-01 19:00:00   -0.152748  0.037470
2023-01-01 20:00:00   -0.042675  0.044093
2023-01-01 21:00:00    0.052149  0.064027
2023-01-01 22:00:00    0.014435  0.063328
2023-01-01 23:00:00    0.040398  0.013038
2023-01-02 00:00:00    0.051325  0.096388
2023-01-02 01:00:00    0.109867  0.074065
2023-01-02 02:00:00    0.176672  0.081244
2023-01-02 03:00:00    0.152319  0.084349
2023-01-02 04:00:00    0.144739 -0.052398
2023-01-02 05:00:00    0.190375 -0.101343
2023-01-02 06:00:00    0.195158 -0.023457
2023-01-02 07:00:00    0.167982  0.048252
2023-01-02 08:00:00    0.243463  0.127130
2023-01-02 09:00:00    0.141634  0.020360
2023-01-02 10:00:00    0.129404 -0.031834
2023-01-02 11:00:00    0.100332 -0.050145
2023-01-02 12:00:00    0.080637 -0.102662
2023-01-02 13:00:00    0.073375 -0.048537
2023-01-02 14:00:00    0.055024 -0.071010
2023-01-02 15:00:00    0.053889  0.043319
2023-01-02 16:00:00    0.027302  0.064331
2023-01-02 17:00:00    0.011590  0.004483
2023-01-02 18:00:00    0.032900 -0.024038
2023-01-02 19:00:00    0.104300 -0.052939
2023-01-02 20:00:00    0.016136 -0.117969
2023-01-02 21:00:00    0.004232 -0.176944
2023-01-02 22:00:00    0.136448 -0.137984
2023-01-02 23:00:00    0.132775 -0.083116
2023-01-03 00:00:00    0.056004 -0.185501
2023-01-03 01:00:00    0.065411 -0.199979
2023-01-03 02:00:00    0.039335 -0.190776
2023-01-03 03:00:00    0.000113 -0.076929
2023-01-03 04:00:00    0.044034 -0.070786
2023-01-03 05:00:00    0.033955 -0.054331
2023-01-03 06:00:00   -0.073933 -0.000816
2023-01-03 07:00:00   -0.056972 -0.051267
2023-01-03 08:00:00   -0.125945 -0.137409
2023-01-03 09:00:00   -0.061174 -0.149956
2023-01-03 10:00:00   -0.069479 -0.132218
2023-01-03 11:00:00   -0.090084 -0.118510
2023-01-03 12:00:00   -0.072233 -0.112037
2023-01-03 13:00:00   -0.097238 -0.163228
2023-01-03 14:00:00   -0.062483 -0.117141
2023-01-03 15:00:00   -0.002898 -0.132007
2023-01-03 16:00:00    0.057675 -0.131933
2023-01-03 17:00:00    0.088753 -0.095839
2023-01-03 18:00:00    0.123973 -0.038786
2023-01-03 19:00:00    0.146693 -0.019588
2023-01-03 20:00:00    0.167334 -0.028583
2023-01-03 21:00:00    0.138854 -0.042675
2023-01-03 22:00:00    0.027271 -0.080543
2023-01-03 23:00:00    0.104424 -0.092091
2023-01-04 00:00:00    0.175629 -0.005352
2023-01-04 01:00:00    0.224862  0.123052
2023-01-04 02:00:00    0.318271  0.208459
2023-01-04 03:00:00    0.337117  0.140575
2023-01-04 04:00:00    0.251617  0.114774
2023-01-04 05:00:00    0.233883  0.106036
2023-01-04 06:00:00    0.295449  0.030488
2023-01-04 07:00:00    0.277616  0.069904
2023-01-04 08:00:00    0.268922  0.166669
2023-01-04 09:00:00    0.229031  0.222881
2023-01-04 10:00:00    0.211102  0.144517
2023-01-04 11:00:00    0.193842  0.195569
2023-01-04 12:00:00    0.097495  0.200374
2023-01-04 13:00:00    0.090839  0.221987
2023-01-04 14:00:00    0.115307  0.221323
2023-01-04 15:00:00    0.045632  0.172558
2023-01-04 16:00:00    0.052831  0.177277
2023-01-04 17:00:00   -0.001000  0.062765
2023-01-04 18:00:00   -0.051182  0.067171
2023-01-04 19:00:00   -0.070936  0.087991
2023-01-04 20:00:00   -0.081238  0.140793
2023-01-04 21:00:00   -0.086782  0.167429
2023-01-04 22:00:00   -0.087315  0.193425
2023-01-04 23:00:00   -0.179413  0.200669
2023-01-05 00:00:00   -0.197074  0.214863
2023-01-05 01:00:00   -0.283311  0.110191
2023-01-05 02:00:00   -0.386648  0.052711
2023-01-05 03:00:00   -0.397267  0.106942
2023-01-05 04:00:00   -0.365170  0.224055
2023-01-05 05:00:00   -0.258742  0.287973
2023-01-05 06:00:00   -0.296611  0.260195
2023-01-05 07:00:00   -0.276524  0.308423
2023-01-05 08:00:00   -0.258795  0.271097
2023-01-05 09:00:00   -0.259221  0.320369
2023-01-05 10:00:00   -0.260316  0.322218
2023-01-05 11:00:00   -0.167422  0.287691
2023-01-05 12:00:00   -0.079112  0.358377
2023-01-05 13:00:00   -0.061551  0.321292
2023-01-05 14:00:00   -0.117447  0.286823
2023-01-05 15:00:00   -0.179026  0.229877
2023-01-05 16:00:00   -0.157219  0.184994
2023-01-05 17:00:00   -0.150667  0.236258
2023-01-05 18:00:00   -0.117521  0.219369
2023-01-05 19:00:00   -0.059617  0.119360
2023-01-05 20:00:00   -0.132327  0.139261
2023-01-05 21:00:00   -0.107489  0.185866
2023-01-05 22:00:00   -0.091718  0.265180
2023-01-05 23:00:00   -0.121820  0.310187
2023-01-06 00:00:00   -0.142259  0.273351
2023-01-06 01:00:00   -0.213891  0.210531
2023-01-06 02:00:00   -0.196534  0.178465
2023-01-06 03:00:00   -0.237001  0.115114
2023-01-06 04:00:00   -0.197667  0.100503
2023-01-06 05:00:00   -0.186990  0.048381
2023-01-06 06:00:00   -0.213452  0.019998
2023-01-06 07:00:00   -0.190437  0.006746
2023-01-06 08:00:00   -0.161642  0.041942
2023-01-06 09:00:00   -0.107723 -0.025951
2023-01-06 10:00:00   -0.146768  0.081091
2023-01-06 11:00:00   -0.211934  0.024435
2023-01-06 12:00:00   -0.242430 -0.000111
2023-01-06 13:00:00   -0.220690  0.007903
2023-01-06 14:00:00   -0.218605  0.067754
2023-01-06 15:00:00   -0.092345  0.104326
2023-01-06 16:00:00   -0.155092  0.174554
2023-01-06 17:00:00   -0.050369  0.248415
2023-01-06 18:00:00   -0.124916  0.234920
2023-01-06 19:00:00   -0.256968  0.361244
2023-01-06 20:00:00   -0.061763  0.302893
2023-01-06 21:00:00   -0.108373  0.221298
2023-01-06 22:00:00   -0.085022  0.199700
2023-01-06 23:00:00    0.007008  0.161209
2023-01-07 00:00:00    0.049962  0.044167
2023-01-07 01:00:00          NA        NA
2023-01-07 02:00:00          NA        NA
2023-01-07 03:00:00          NA        NA
2023-01-07 04:00:00          NA        NA
2023-01-07 05:00:00          NA        NA
2023-01-07 06:00:00          NA        NA
2023-01-07 07:00:00          NA        NA
2023-01-07 08:00:00          NA        NA
2023-01-07 09:00:00          NA        NA
2023-01-07 10:00:00          NA        NA
2023-01-07 11:00:00          NA        NA
2023-01-07 12:00:00          NA        NA
2023-01-07 13:00:00          NA        NA
2023-01-07 14:00:00          NA        NA
2023-01-07 15:00:00          NA        NA
2023-01-07 16:00:00          NA        NA
2023-01-07 17:00:00          NA        NA
2023-01-07 18:00:00          NA        NA
2023-01-07 19:00:00          NA        NA
2023-01-07 20:00:00          NA        NA
2023-01-07 21:00:00          NA        NA
2023-01-07 22:00:00          NA        NA
2023-01-07 23:00:00          NA        NA

SEE ALSO

?rollmean | ?rollmax | ?rollmedian | ?rollsum
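
The effect of align is easier to see on a very short series. The following sketch uses six made-up values and a window of 3; the object names are chosen for this example only.

```r
library(xts)

# Six daily values: 1, 2, ..., 6
x <- xts(1:6, order.by = as.Date("2023-01-01") + 0:5)

# The same 3-point means, stamped at different positions of the window
right  <- rollapply(x, width = 3, FUN = mean, align = "right")
left   <- rollapply(x, width = 3, FUN = mean, align = "left")
center <- rollapply(x, width = 3, FUN = mean, align = "center")

aligned <- merge(x, right, left, center)
colnames(aligned) <- c("x", "right", "left", "center")
print(aligned)
```

With align = "right" the mean is stamped at the last time point of the window, so the NAs appear at the start; with align = "left" it is stamped at the first time point, so the NAs appear at the end.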

Aggregation in time/IV

Rolling aggregation

Rolling mean with a 24-hour window; FUN is calculated at every 3rd time point.

width = 24, by = 3

(rolling_mean1 <- rollapply(
  xts_data, width = 24, by = 3,
  align = 'center', FUN = mean))
                    Temperature  Humidity
2023-01-01 00:00:00          NA        NA
2023-01-01 01:00:00          NA        NA
2023-01-01 02:00:00          NA        NA
2023-01-01 03:00:00          NA        NA
2023-01-01 04:00:00          NA        NA
2023-01-01 05:00:00          NA        NA
2023-01-01 06:00:00          NA        NA
2023-01-01 07:00:00          NA        NA
2023-01-01 08:00:00          NA        NA
2023-01-01 09:00:00          NA        NA
2023-01-01 10:00:00          NA        NA
2023-01-01 11:00:00   -0.008676  0.050402
2023-01-01 12:00:00          NA        NA
2023-01-01 13:00:00          NA        NA
2023-01-01 14:00:00   -0.102092 -0.065516
2023-01-01 15:00:00          NA        NA
2023-01-01 16:00:00          NA        NA
2023-01-01 17:00:00   -0.170668 -0.099392
2023-01-01 18:00:00          NA        NA
2023-01-01 19:00:00          NA        NA
2023-01-01 20:00:00   -0.065771 -0.007655
2023-01-01 21:00:00          NA        NA
2023-01-01 22:00:00          NA        NA
2023-01-01 23:00:00   -0.013683  0.046588
2023-01-02 00:00:00          NA        NA
2023-01-02 01:00:00          NA        NA
2023-01-02 02:00:00   -0.004082 -0.017137
2023-01-02 03:00:00          NA        NA
2023-01-02 04:00:00          NA        NA
2023-01-02 05:00:00   -0.070800  0.078748
2023-01-02 06:00:00          NA        NA
2023-01-02 07:00:00          NA        NA
2023-01-02 08:00:00    0.052149  0.064027
2023-01-02 09:00:00          NA        NA
2023-01-02 10:00:00          NA        NA
2023-01-02 11:00:00    0.051325  0.096388
2023-01-02 12:00:00          NA        NA
2023-01-02 13:00:00          NA        NA
2023-01-02 14:00:00    0.152319  0.084349
2023-01-02 15:00:00          NA        NA
2023-01-02 16:00:00          NA        NA
2023-01-02 17:00:00    0.195158 -0.023457
2023-01-02 18:00:00          NA        NA
2023-01-02 19:00:00          NA        NA
2023-01-02 20:00:00    0.141634  0.020360
2023-01-02 21:00:00          NA        NA
2023-01-02 22:00:00          NA        NA
2023-01-02 23:00:00    0.080637 -0.102662
2023-01-03 00:00:00          NA        NA
2023-01-03 01:00:00          NA        NA
2023-01-03 02:00:00    0.053889  0.043319
2023-01-03 03:00:00          NA        NA
2023-01-03 04:00:00          NA        NA
2023-01-03 05:00:00    0.032900 -0.024038
2023-01-03 06:00:00          NA        NA
2023-01-03 07:00:00          NA        NA
2023-01-03 08:00:00    0.004232 -0.176944
2023-01-03 09:00:00          NA        NA
2023-01-03 10:00:00          NA        NA
2023-01-03 11:00:00    0.056004 -0.185501
2023-01-03 12:00:00          NA        NA
2023-01-03 13:00:00          NA        NA
2023-01-03 14:00:00    0.000113 -0.076929
2023-01-03 15:00:00          NA        NA
2023-01-03 16:00:00          NA        NA
2023-01-03 17:00:00   -0.073933 -0.000816
2023-01-03 18:00:00          NA        NA
2023-01-03 19:00:00          NA        NA
2023-01-03 20:00:00   -0.061174 -0.149956
2023-01-03 21:00:00          NA        NA
2023-01-03 22:00:00          NA        NA
2023-01-03 23:00:00   -0.072233 -0.112037
2023-01-04 00:00:00          NA        NA
2023-01-04 01:00:00          NA        NA
2023-01-04 02:00:00   -0.002898 -0.132007
2023-01-04 03:00:00          NA        NA
2023-01-04 04:00:00          NA        NA
2023-01-04 05:00:00    0.123973 -0.038786
2023-01-04 06:00:00          NA        NA
2023-01-04 07:00:00          NA        NA
2023-01-04 08:00:00    0.138854 -0.042675
2023-01-04 09:00:00          NA        NA
2023-01-04 10:00:00          NA        NA
2023-01-04 11:00:00    0.175629 -0.005352
2023-01-04 12:00:00          NA        NA
2023-01-04 13:00:00          NA        NA
2023-01-04 14:00:00    0.337117  0.140575
2023-01-04 15:00:00          NA        NA
2023-01-04 16:00:00          NA        NA
2023-01-04 17:00:00    0.295449  0.030488
2023-01-04 18:00:00          NA        NA
2023-01-04 19:00:00          NA        NA
2023-01-04 20:00:00    0.229031  0.222881
2023-01-04 21:00:00          NA        NA
2023-01-04 22:00:00          NA        NA
2023-01-04 23:00:00    0.097495  0.200374
2023-01-05 00:00:00          NA        NA
2023-01-05 01:00:00          NA        NA
2023-01-05 02:00:00    0.045632  0.172558
2023-01-05 03:00:00          NA        NA
2023-01-05 04:00:00          NA        NA
2023-01-05 05:00:00   -0.051182  0.067171
2023-01-05 06:00:00          NA        NA
2023-01-05 07:00:00          NA        NA
2023-01-05 08:00:00   -0.086782  0.167429
2023-01-05 09:00:00          NA        NA
2023-01-05 10:00:00          NA        NA
2023-01-05 11:00:00   -0.197074  0.214863
2023-01-05 12:00:00          NA        NA
2023-01-05 13:00:00          NA        NA
2023-01-05 14:00:00   -0.397267  0.106942
2023-01-05 15:00:00          NA        NA
2023-01-05 16:00:00          NA        NA
2023-01-05 17:00:00   -0.296611  0.260195
2023-01-05 18:00:00          NA        NA
2023-01-05 19:00:00          NA        NA
2023-01-05 20:00:00   -0.259221  0.320369
2023-01-05 21:00:00          NA        NA
2023-01-05 22:00:00          NA        NA
2023-01-05 23:00:00   -0.079112  0.358377
2023-01-06 00:00:00          NA        NA
2023-01-06 01:00:00          NA        NA
2023-01-06 02:00:00   -0.179026  0.229877
2023-01-06 03:00:00          NA        NA
2023-01-06 04:00:00          NA        NA
2023-01-06 05:00:00   -0.117521  0.219369
2023-01-06 06:00:00          NA        NA
2023-01-06 07:00:00          NA        NA
2023-01-06 08:00:00   -0.107489  0.185866
2023-01-06 09:00:00          NA        NA
2023-01-06 10:00:00          NA        NA
2023-01-06 11:00:00   -0.142259  0.273351
2023-01-06 12:00:00          NA        NA
2023-01-06 13:00:00          NA        NA
2023-01-06 14:00:00   -0.237001  0.115114
2023-01-06 15:00:00          NA        NA
2023-01-06 16:00:00          NA        NA
2023-01-06 17:00:00   -0.213452  0.019998
2023-01-06 18:00:00          NA        NA
2023-01-06 19:00:00          NA        NA
2023-01-06 20:00:00   -0.107723 -0.025951
2023-01-06 21:00:00          NA        NA
2023-01-06 22:00:00          NA        NA
2023-01-06 23:00:00   -0.242430 -0.000111
2023-01-07 00:00:00          NA        NA
2023-01-07 01:00:00          NA        NA
2023-01-07 02:00:00   -0.092345  0.104326
2023-01-07 03:00:00          NA        NA
2023-01-07 04:00:00          NA        NA
2023-01-07 05:00:00   -0.124916  0.234920
2023-01-07 06:00:00          NA        NA
2023-01-07 07:00:00          NA        NA
2023-01-07 08:00:00   -0.108373  0.221298
2023-01-07 09:00:00          NA        NA
2023-01-07 10:00:00          NA        NA
2023-01-07 11:00:00    0.049962  0.044167
2023-01-07 12:00:00          NA        NA
2023-01-07 13:00:00          NA        NA
2023-01-07 14:00:00          NA        NA
2023-01-07 15:00:00          NA        NA
2023-01-07 16:00:00          NA        NA
2023-01-07 17:00:00          NA        NA
2023-01-07 18:00:00          NA        NA
2023-01-07 19:00:00          NA        NA
2023-01-07 20:00:00          NA        NA
2023-01-07 21:00:00          NA        NA
2023-01-07 22:00:00          NA        NA
2023-01-07 23:00:00          NA        NA
mean(xts_data[1:24, 1]) # -0.00867576
mean(xts_data[4:27, 1]) # -0.1020925
mean(xts_data[7:30, 1]) # -0.1706679
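
When by is larger than 1, most rows of the result are NA padding. A minimal sketch of dropping them with na.omit, using ten made-up values and right alignment for simplicity; the names r and compact are chosen for this example.

```r
library(xts)

# Ten daily values: 1, 2, ..., 10
x <- xts(1:10, order.by = as.Date("2023-01-01") + 0:9)

# Window of 4 points, evaluated at every 3rd time point
r <- rollapply(x, width = 4, by = 3, FUN = mean, align = "right")

# Keep only the time points where FUN was actually evaluated
compact <- na.omit(r)
print(compact)

# Check the first window by hand, as in the lines above
print(mean(x[1:4]))
```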

width = 5, by = 3

(rolling_mean2 <- rollapply(
  xts_data, width = 5, by = 3,
  align = 'center', FUN = mean))
                    Temperature Humidity
2023-01-01 00:00:00          NA       NA
2023-01-01 01:00:00          NA       NA
2023-01-01 02:00:00      0.1936  0.14033
2023-01-01 03:00:00          NA       NA
2023-01-01 04:00:00          NA       NA
2023-01-01 05:00:00      0.2221  0.06447
2023-01-01 06:00:00          NA       NA
2023-01-01 07:00:00          NA       NA
2023-01-01 08:00:00     -0.1425 -0.21051
2023-01-01 09:00:00          NA       NA
2023-01-01 10:00:00          NA       NA
2023-01-01 11:00:00      0.3299  0.09770
2023-01-01 12:00:00          NA       NA
2023-01-01 13:00:00          NA       NA
2023-01-01 14:00:00      0.4481 -0.25032
2023-01-01 15:00:00          NA       NA
2023-01-01 16:00:00          NA       NA
2023-01-01 17:00:00      0.1093 -0.02086
2023-01-01 18:00:00          NA       NA
2023-01-01 19:00:00          NA       NA
2023-01-01 20:00:00     -0.4166  0.33277
2023-01-01 21:00:00          NA       NA
2023-01-01 22:00:00          NA       NA
2023-01-01 23:00:00     -0.8569 -0.28206
2023-01-02 00:00:00          NA       NA
2023-01-02 01:00:00          NA       NA
2023-01-02 02:00:00     -0.4917  0.09727
2023-01-02 03:00:00          NA       NA
2023-01-02 04:00:00          NA       NA
2023-01-02 05:00:00      0.0801 -0.09000
2023-01-02 06:00:00          NA       NA
2023-01-02 07:00:00          NA       NA
2023-01-02 08:00:00      0.5452  0.28989
2023-01-02 09:00:00          NA       NA
2023-01-02 10:00:00          NA       NA
2023-01-02 11:00:00      0.5761  0.13998
2023-01-02 12:00:00          NA       NA
2023-01-02 13:00:00          NA       NA
2023-01-02 14:00:00     -0.1778 -0.12458
2023-01-02 15:00:00          NA       NA
2023-01-02 16:00:00          NA       NA
2023-01-02 17:00:00     -0.0759  0.27304
2023-01-02 18:00:00          NA       NA
2023-01-02 19:00:00          NA       NA
2023-01-02 20:00:00      0.1171  0.01737
2023-01-02 21:00:00          NA       NA
2023-01-02 22:00:00          NA       NA
2023-01-02 23:00:00     -0.2592 -0.19942
2023-01-03 00:00:00          NA       NA
2023-01-03 01:00:00          NA       NA
2023-01-03 02:00:00      0.1757 -0.85184
2023-01-03 03:00:00          NA       NA
2023-01-03 04:00:00          NA       NA
2023-01-03 05:00:00      0.5176  0.11535
2023-01-03 06:00:00          NA       NA
2023-01-03 07:00:00          NA       NA
2023-01-03 08:00:00      0.0901  0.16179
2023-01-03 09:00:00          NA       NA
2023-01-03 10:00:00          NA       NA
2023-01-03 11:00:00      0.1603 -0.29860
2023-01-03 12:00:00          NA       NA
2023-01-03 13:00:00          NA       NA
2023-01-03 14:00:00     -0.5093  0.38971
2023-01-03 15:00:00          NA       NA
2023-01-03 16:00:00          NA       NA
2023-01-03 17:00:00     -0.2571 -0.50114
2023-01-03 18:00:00          NA       NA
2023-01-03 19:00:00          NA       NA
2023-01-03 20:00:00      0.5965 -0.26621
2023-01-03 21:00:00          NA       NA
2023-01-03 22:00:00          NA       NA
2023-01-03 23:00:00     -0.0907 -0.26581
2023-01-04 00:00:00          NA       NA
2023-01-04 01:00:00          NA       NA
2023-01-04 02:00:00      0.0699 -0.22222
2023-01-04 03:00:00          NA       NA
2023-01-04 04:00:00          NA       NA
2023-01-04 05:00:00     -0.0875 -0.17496
2023-01-04 06:00:00          NA       NA
2023-01-04 07:00:00          NA       NA
2023-01-04 08:00:00      0.0126 -0.40315
2023-01-04 09:00:00          NA       NA
2023-01-04 10:00:00          NA       NA
2023-01-04 11:00:00      0.1541 -0.14109
2023-01-04 12:00:00          NA       NA
2023-01-04 13:00:00          NA       NA
2023-01-04 14:00:00      0.2635  0.46746
2023-01-04 15:00:00          NA       NA
2023-01-04 16:00:00          NA       NA
2023-01-04 17:00:00      0.5600 -0.00471
2023-01-04 18:00:00          NA       NA
2023-01-04 19:00:00          NA       NA
2023-01-04 20:00:00      0.5027 -0.52208
2023-01-04 21:00:00          NA       NA
2023-01-04 22:00:00          NA       NA
2023-01-04 23:00:00      0.7705  0.93963
2023-01-05 00:00:00          NA       NA
2023-01-05 01:00:00          NA       NA
2023-01-05 02:00:00      0.3495  0.31244
2023-01-05 03:00:00          NA       NA
2023-01-05 04:00:00          NA       NA
2023-01-05 05:00:00     -0.4148 -0.04971
2023-01-05 06:00:00          NA       NA
2023-01-05 07:00:00          NA       NA
2023-01-05 08:00:00     -0.4752  0.38924
2023-01-05 09:00:00          NA       NA
2023-01-05 10:00:00          NA       NA
2023-01-05 11:00:00     -0.3918 -0.14857
2023-01-05 12:00:00          NA       NA
2023-01-05 13:00:00          NA       NA
2023-01-05 14:00:00     -0.2093 -0.19306
2023-01-05 15:00:00          NA       NA
2023-01-05 16:00:00          NA       NA
2023-01-05 17:00:00     -0.0490 -0.15719
2023-01-05 18:00:00          NA       NA
2023-01-05 19:00:00          NA       NA
2023-01-05 20:00:00     -0.1128  0.11872
2023-01-05 21:00:00          NA       NA
2023-01-05 22:00:00          NA       NA
2023-01-05 23:00:00     -0.6689  0.38898
2023-01-06 00:00:00          NA       NA
2023-01-06 01:00:00          NA       NA
2023-01-06 02:00:00      0.0535  0.66337
2023-01-06 03:00:00          NA       NA
2023-01-06 04:00:00          NA       NA
2023-01-06 05:00:00      0.2498  0.73823
2023-01-06 06:00:00          NA       NA
2023-01-06 07:00:00          NA       NA
2023-01-06 08:00:00      0.1449  0.52122
2023-01-06 09:00:00          NA       NA
2023-01-06 10:00:00          NA       NA
2023-01-06 11:00:00      0.2887 -0.30959
2023-01-06 12:00:00          NA       NA
2023-01-06 13:00:00          NA       NA
2023-01-06 14:00:00     -0.5528 -0.77923
2023-01-06 15:00:00          NA       NA
2023-01-06 16:00:00          NA       NA
2023-01-06 17:00:00      0.1752 -0.59214
2023-01-06 18:00:00          NA       NA
2023-01-06 19:00:00          NA       NA
2023-01-06 20:00:00     -0.1335  0.55464
2023-01-06 21:00:00          NA       NA
2023-01-06 22:00:00          NA       NA
2023-01-06 23:00:00     -1.0963  0.35346
2023-01-07 00:00:00          NA       NA
2023-01-07 01:00:00          NA       NA
2023-01-07 02:00:00     -0.1612 -0.41649
2023-01-07 03:00:00          NA       NA
2023-01-07 04:00:00          NA       NA
2023-01-07 05:00:00      0.6116  0.38701
2023-01-07 06:00:00          NA       NA
2023-01-07 07:00:00          NA       NA
2023-01-07 08:00:00      0.1522  0.54252
2023-01-07 09:00:00          NA       NA
2023-01-07 10:00:00          NA       NA
2023-01-07 11:00:00     -0.2435  0.14020
2023-01-07 12:00:00          NA       NA
2023-01-07 13:00:00          NA       NA
2023-01-07 14:00:00      0.3691  0.41370
2023-01-07 15:00:00          NA       NA
2023-01-07 16:00:00          NA       NA
2023-01-07 17:00:00      0.3220  0.36098
2023-01-07 18:00:00          NA       NA
2023-01-07 19:00:00          NA       NA
2023-01-07 20:00:00      0.4998  0.20083
2023-01-07 21:00:00          NA       NA
2023-01-07 22:00:00          NA       NA
2023-01-07 23:00:00          NA       NA
mean(xts_data[1:5, 1])  # 0.1935703
mean(xts_data[4:8, 1])  # 0.2221432
mean(xts_data[7:11, 1]) # -0.1425156

Averages for the same months/days/hours.

What if I want averages over the same months, days, or hours of a time series (for example, the mean of all Januaries)?

  • sapply
  • aggregate
  • split-apply-combine

Generate a toy dataset to play with

set.seed(123)  # For reproducibility
# 5 years of monthly data
dates <- seq(as.Date("2020-01-01"),
             by = "month", length.out = 60)
data <- matrix(rnorm(length(dates)*2), ncol = 2)
colnames(data) <- c("Temperature", "Humidity")
(xts_data <- xts(data, order.by = dates))
           Temperature Humidity
2020-01-01     -0.5605  0.37964
2020-02-01     -0.2302 -0.50232
2020-03-01      1.5587 -0.33321
2020-04-01      0.0705 -1.01858
2020-05-01      0.1293 -1.07179
2020-06-01      1.7151  0.30353
2020-07-01      0.4609  0.44821
2020-08-01     -1.2651  0.05300
2020-09-01     -0.6869  0.92227
2020-10-01     -0.4457  2.05008
2020-11-01      1.2241 -0.49103
2020-12-01      0.3598 -2.30917
2021-01-01      0.4008  1.00574
2021-02-01      0.1107 -0.70920
2021-03-01     -0.5558 -0.68801
2021-04-01      1.7869  1.02557
2021-05-01      0.4979 -0.28477
2021-06-01     -1.9666 -1.22072
2021-07-01      0.7014  0.18130
2021-08-01     -0.4728 -0.13889
2021-09-01     -1.0678  0.00576
2021-10-01     -0.2180  0.38528
2021-11-01     -1.0260 -0.37066
2021-12-01     -0.7289  0.64438
2022-01-01     -0.6250 -0.22049
2022-02-01     -1.6867  0.33178
2022-03-01      0.8378  1.09684
2022-04-01      0.1534  0.43518
2022-05-01     -1.1381 -0.32593
2022-06-01      1.2538  1.14881
2022-07-01      0.4265  0.99350
2022-08-01     -0.2951  0.54840
2022-09-01      0.8951  0.23873
2022-10-01      0.8781 -0.62791
2022-11-01      0.8216  1.36065
2022-12-01      0.6886 -0.60026
2023-01-01      0.5539  2.18733
2023-02-01     -0.0619  1.53261
2023-03-01     -0.3060 -0.23570
2023-04-01     -0.3805 -1.02642
2023-05-01     -0.6947 -0.71041
2023-06-01     -0.2079  0.25688
2023-07-01     -1.2654 -0.24669
2023-08-01      2.1690 -0.34754
2023-09-01      1.2080 -0.95162
2023-10-01     -1.1231 -0.04503
2023-11-01     -0.4029 -0.78490
2023-12-01     -0.4667 -1.66794
2024-01-01      0.7800 -0.38023
2024-02-01     -0.0834  0.91900
2024-03-01      0.2533 -0.57535
2024-04-01     -0.0285  0.60796
2024-05-01     -0.0429 -1.61788
2024-06-01      1.3686 -0.05556
2024-07-01     -0.2258  0.51941
2024-08-01      1.5165  0.30115
2024-09-01     -1.5488  0.10568
2024-10-01      0.5846 -0.64071
2024-11-01      0.1239 -0.84970
2024-12-01      0.2159 -1.02413

Using sapply

(monthly_means <- sapply(1:12, function(m) {
  month_data <- xts_data[format(index(xts_data), "%m") == sprintf("%02d", m)]
  colMeans(month_data, na.rm = TRUE)
}))
             [,1]   [,2]   [,3]    [,4]   [,5]   [,6]
Temperature 0.110 -0.390  0.358 0.32036 -0.250 0.4326
Humidity    0.594  0.314 -0.147 0.00474 -0.802 0.0866
              [,7]   [,8]    [,9]   [,10]  [,11]   [,12]
Temperature 0.0195 0.3305 -0.2401 -0.0648  0.148  0.0138
Humidity    0.3791 0.0832  0.0642  0.2243 -0.227 -0.9914
# Transpose matrix
(monthly_means <- t(monthly_means))
      Temperature Humidity
 [1,]      0.1098  0.59440
 [2,]     -0.3903  0.31437
 [3,]      0.3576 -0.14708
 [4,]      0.3204  0.00474
 [5,]     -0.2497 -0.80216
 [6,]      0.4326  0.08659
 [7,]      0.0195  0.37915
 [8,]      0.3305  0.08322
 [9,]     -0.2401  0.06416
[10,]     -0.0648  0.22435
[11,]      0.1481 -0.22713
[12,]      0.0138 -0.99142
class(monthly_means)
[1] "matrix" "array" 
dates <- as.Date(paste0("2023-", 1:12, "-01"))
(monthly_means_xts <- xts(
  monthly_means, order.by = dates))
           Temperature Humidity
2023-01-01      0.1098  0.59440
2023-02-01     -0.3903  0.31437
2023-03-01      0.3576 -0.14708
2023-04-01      0.3204  0.00474
2023-05-01     -0.2497 -0.80216
2023-06-01      0.4326  0.08659
2023-07-01      0.0195  0.37915
2023-08-01      0.3305  0.08322
2023-09-01     -0.2401  0.06416
2023-10-01     -0.0648  0.22435
2023-11-01      0.1481 -0.22713
2023-12-01      0.0138 -0.99142
  • Are the dates in monthly_means_xts correct?
  • Do we really need to convert to xts?
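
One possible answer to the two questions above, offered as a sketch rather than the only approach: the means pool all five years, so the year 2023 in the index is an arbitrary placeholder. Keeping the result as a plain matrix labelled with month.abb avoids implying a specific year. The example rebuilds the toy data in base R (no xts) so that it runs on its own; month_id is an illustrative helper name, not part of the original slides.

```r
# Rebuild the toy data without xts (assumption: same seed and layout as above)
set.seed(123)
dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 60)
data <- matrix(rnorm(length(dates) * 2), ncol = 2,
               dimnames = list(NULL, c("Temperature", "Humidity")))

# Month number (1..12) of every observation
month_id <- as.integer(format(dates, "%m"))

# One row of column means per calendar month
monthly_means <- t(sapply(1:12, function(m) {
  colMeans(data[month_id == m, , drop = FALSE], na.rm = TRUE)
}))

# Label rows by month name instead of attaching an artificial date
rownames(monthly_means) <- month.abb
print(round(monthly_means, 4))
```

Converting to xts is only worthwhile if a later step needs a time index; for a 12-row climatology a labelled matrix or data frame is usually enough.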

Using aggregate

Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.

library(lubridate)
# Define the aggregation function
monthly_aggregate <- function(x) {
  mean(x, na.rm = TRUE)
}

# Aggregate to monthly data
(monthly_mean <- aggregate(
  xts_data, lubridate::month(xts_data),
  monthly_aggregate))
   Temperature Humidity
1       0.1098  0.59440
2      -0.3903  0.31437
3       0.3576 -0.14708
4       0.3204  0.00474
5      -0.2497 -0.80216
6       0.4326  0.08659
7       0.0195  0.37915
8       0.3305  0.08322
9      -0.2401  0.06416
10     -0.0648  0.22435
11      0.1481 -0.22713
12      0.0138 -0.99142
# No wrapper function is needed here: pass quantile directly;
# extra arguments (probs, na.rm) are forwarded to it.
# Aggregate to monthly quantiles
monthly_quantiles <- aggregate(
  xts_data, lubridate::month(xts_data),
  quantile, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
class(monthly_quantiles)
[1] "zoo"
print(monthly_quantiles)
   Temperature.25% Temperature.50% Temperature.75% Humidity.25% Humidity.50% Humidity.75%
1          -0.5605          0.4008          0.5539     -0.22049        0.380        1.006
2          -0.2302         -0.0834         -0.0619     -0.50232        0.332        0.919
3          -0.3060          0.2533          0.8378     -0.57535       -0.333       -0.236
4          -0.0285          0.0705          0.1534     -1.01858        0.435        0.608
5          -0.6947         -0.0429          0.1293     -1.07179       -0.710       -0.326
6          -0.2079          1.2538          1.3686     -0.05556        0.257        0.304
7          -0.2258          0.4265          0.4609      0.18130        0.448        0.519
8          -0.4728         -0.2951          1.5165     -0.13889        0.053        0.301
9          -1.0678         -0.6869          0.8951      0.00576        0.106        0.239
10         -0.4457         -0.2180          0.5846     -0.62791       -0.045        0.385
11         -0.4029          0.1239          0.8216     -0.78490       -0.491       -0.371
12         -0.4667          0.2159          0.3598     -1.66794       -1.024       -0.600
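
The same grouping idea also works on an ordinary data frame through the formula interface of aggregate. The following is a minimal sketch, assuming the data are held in a data frame named df with an explicit month column (both names are illustrative and do not appear in the slides).

```r
# Toy data frame with an explicit grouping column (hypothetical layout)
set.seed(123)
dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 60)
df <- data.frame(
  month       = as.integer(format(dates, "%m")),
  Temperature = rnorm(60),
  Humidity    = rnorm(60)
)

# cbind(...) on the left-hand side aggregates several columns at once;
# the right-hand side names the grouping variable
monthly_mean_df <- aggregate(cbind(Temperature, Humidity) ~ month,
                             data = df, FUN = mean)
print(monthly_mean_df)
```

The result is a data frame with one row per month, which is convenient when the next step is plotting or merging with other tables.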

Using Split-Apply-Combine/I

  • Split: Segmenting the time series data into groups based on a certain criterion, such as months, years, or quarters. This is often done using functions like split.

  • Apply: Performing a specific operation or function on each group independently. This could be a statistical summary (like mean, median, sum), a transformation, or any custom function.

  • Combine: Merging the results from each group back into a single data structure. This can be an aggregated time series or a different format depending on the desired output.
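
The three steps above can be sketched on a tiny, made-up data frame before applying them to the xts object. The object obs and its values are purely illustrative.

```r
# Hypothetical observations: two values for each of three months
obs <- data.frame(
  month = c("Jan", "Jan", "Feb", "Feb", "Mar", "Mar"),
  value = c(1, 3, 10, 20, 5, 7)
)

# Split: one numeric vector per month
# (groups are ordered alphabetically: Feb, Jan, Mar)
groups <- split(obs$value, obs$month)

# Apply: compute the mean of each group independently
means <- lapply(groups, mean)

# Combine: stack the per-group results into a single matrix
result <- do.call(rbind, means)
print(result)
```

Note that split orders character groups alphabetically; use a factor with explicit levels if calendar order matters.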

Using Split-Apply-Combine/II

Split

(split_by_month <- split(
  xts_data,
  format(index(xts_data), "%m")))
$`01`
           Temperature Humidity
2020-01-01      -0.560     0.38
2021-01-01       0.401     1.01
2022-01-01      -0.625    -0.22
2023-01-01       0.554     2.19
2024-01-01       0.780    -0.38

$`02`
           Temperature Humidity
2020-02-01     -0.2302   -0.502
2021-02-01      0.1107   -0.709
2022-02-01     -1.6867    0.332
2023-02-01     -0.0619    1.533
2024-02-01     -0.0834    0.919

$`03`
           Temperature Humidity
2020-03-01       1.559   -0.333
2021-03-01      -0.556   -0.688
2022-03-01       0.838    1.097
2023-03-01      -0.306   -0.236
2024-03-01       0.253   -0.575

$`04`
           Temperature Humidity
2020-04-01      0.0705   -1.019
2021-04-01      1.7869    1.026
2022-04-01      0.1534    0.435
2023-04-01     -0.3805   -1.026
2024-04-01     -0.0285    0.608

$`05`
           Temperature Humidity
2020-05-01      0.1293   -1.072
2021-05-01      0.4979   -0.285
2022-05-01     -1.1381   -0.326
2023-05-01     -0.6947   -0.710
2024-05-01     -0.0429   -1.618

$`06`
           Temperature Humidity
2020-06-01       1.715   0.3035
2021-06-01      -1.967  -1.2207
2022-06-01       1.254   1.1488
2023-06-01      -0.208   0.2569
2024-06-01       1.369  -0.0556

$`07`
           Temperature Humidity
2020-07-01       0.461    0.448
2021-07-01       0.701    0.181
2022-07-01       0.426    0.994
2023-07-01      -1.265   -0.247
2024-07-01      -0.226    0.519

$`08`
           Temperature Humidity
2020-08-01      -1.265    0.053
2021-08-01      -0.473   -0.139
2022-08-01      -0.295    0.548
2023-08-01       2.169   -0.348
2024-08-01       1.516    0.301

$`09`
           Temperature Humidity
2020-09-01      -0.687  0.92227
2021-09-01      -1.068  0.00576
2022-09-01       0.895  0.23873
2023-09-01       1.208 -0.95162
2024-09-01      -1.549  0.10568

$`10`
           Temperature Humidity
2020-10-01      -0.446    2.050
2021-10-01      -0.218    0.385
2022-10-01       0.878   -0.628
2023-10-01      -1.123   -0.045
2024-10-01       0.585   -0.641

$`11`
           Temperature Humidity
2020-11-01       1.224   -0.491
2021-11-01      -1.026   -0.371
2022-11-01       0.822    1.361
2023-11-01      -0.403   -0.785
2024-11-01       0.124   -0.850

$`12`
           Temperature Humidity
2020-12-01       0.360   -2.309
2021-12-01      -0.729    0.644
2022-12-01       0.689   -0.600
2023-12-01      -0.467   -1.668
2024-12-01       0.216   -1.024

Apply

(monthly_stats <- lapply(
  split_by_month,
  function(x) {
    apply(x, 2, mean, na.rm = TRUE)
  }))
$`01`
Temperature    Humidity 
      0.110       0.594 

$`02`
Temperature    Humidity 
     -0.390       0.314 

$`03`
Temperature    Humidity 
      0.358      -0.147 

$`04`
Temperature    Humidity 
    0.32036     0.00474 

$`05`
Temperature    Humidity 
     -0.250      -0.802 

$`06`
Temperature    Humidity 
     0.4326      0.0866 

$`07`
Temperature    Humidity 
     0.0195      0.3791 

$`08`
Temperature    Humidity 
     0.3305      0.0832 

$`09`
Temperature    Humidity 
    -0.2401      0.0642 

$`10`
Temperature    Humidity 
    -0.0648      0.2243 

$`11`
Temperature    Humidity 
      0.148      -0.227 

$`12`
Temperature    Humidity 
     0.0138     -0.9914 

Combine

(combined_results <- do.call(
  rbind, monthly_stats))
   Temperature Humidity
01      0.1098  0.59440
02     -0.3903  0.31437
03      0.3576 -0.14708
04      0.3204  0.00474
05     -0.2497 -0.80216
06      0.4326  0.08659
07      0.0195  0.37915
08      0.3305  0.08322
09     -0.2401  0.06416
10     -0.0648  0.22435
11      0.1481 -0.22713
12      0.0138 -0.99142

Using Split-Apply-Combine/III

Split

library(lubridate)
(split_by_month <- split(
  xts_data, lubridate::month(xts_data)))
$`1`
           Temperature Humidity
2020-01-01      -0.560     0.38
2021-01-01       0.401     1.01
2022-01-01      -0.625    -0.22
2023-01-01       0.554     2.19
2024-01-01       0.780    -0.38

$`2`
           Temperature Humidity
2020-02-01     -0.2302   -0.502
2021-02-01      0.1107   -0.709
2022-02-01     -1.6867    0.332
2023-02-01     -0.0619    1.533
2024-02-01     -0.0834    0.919

$`3`
           Temperature Humidity
2020-03-01       1.559   -0.333
2021-03-01      -0.556   -0.688
2022-03-01       0.838    1.097
2023-03-01      -0.306   -0.236
2024-03-01       0.253   -0.575

$`4`
           Temperature Humidity
2020-04-01      0.0705   -1.019
2021-04-01      1.7869    1.026
2022-04-01      0.1534    0.435
2023-04-01     -0.3805   -1.026
2024-04-01     -0.0285    0.608

$`5`
           Temperature Humidity
2020-05-01      0.1293   -1.072
2021-05-01      0.4979   -0.285
2022-05-01     -1.1381   -0.326
2023-05-01     -0.6947   -0.710
2024-05-01     -0.0429   -1.618

$`6`
           Temperature Humidity
2020-06-01       1.715   0.3035
2021-06-01      -1.967  -1.2207
2022-06-01       1.254   1.1488
2023-06-01      -0.208   0.2569
2024-06-01       1.369  -0.0556

$`7`
           Temperature Humidity
2020-07-01       0.461    0.448
2021-07-01       0.701    0.181
2022-07-01       0.426    0.994
2023-07-01      -1.265   -0.247
2024-07-01      -0.226    0.519

$`8`
           Temperature Humidity
2020-08-01      -1.265    0.053
2021-08-01      -0.473   -0.139
2022-08-01      -0.295    0.548
2023-08-01       2.169   -0.348
2024-08-01       1.516    0.301

$`9`
           Temperature Humidity
2020-09-01      -0.687  0.92227
2021-09-01      -1.068  0.00576
2022-09-01       0.895  0.23873
2023-09-01       1.208 -0.95162
2024-09-01      -1.549  0.10568

$`10`
           Temperature Humidity
2020-10-01      -0.446    2.050
2021-10-01      -0.218    0.385
2022-10-01       0.878   -0.628
2023-10-01      -1.123   -0.045
2024-10-01       0.585   -0.641

$`11`
           Temperature Humidity
2020-11-01       1.224   -0.491
2021-11-01      -1.026   -0.371
2022-11-01       0.822    1.361
2023-11-01      -0.403   -0.785
2024-11-01       0.124   -0.850

$`12`
           Temperature Humidity
2020-12-01       0.360   -2.309
2021-12-01      -0.729    0.644
2022-12-01       0.689   -0.600
2023-12-01      -0.467   -1.668
2024-12-01       0.216   -1.024

Apply

(monthly_stats <- lapply(
  split_by_month,
  apply, 2, mean, na.rm = TRUE))
$`1`
Temperature    Humidity 
      0.110       0.594 

$`2`
Temperature    Humidity 
     -0.390       0.314 

$`3`
Temperature    Humidity 
      0.358      -0.147 

$`4`
Temperature    Humidity 
    0.32036     0.00474 

$`5`
Temperature    Humidity 
     -0.250      -0.802 

$`6`
Temperature    Humidity 
     0.4326      0.0866 

$`7`
Temperature    Humidity 
     0.0195      0.3791 

$`8`
Temperature    Humidity 
     0.3305      0.0832 

$`9`
Temperature    Humidity 
    -0.2401      0.0642 

$`10`
Temperature    Humidity 
    -0.0648      0.2243 

$`11`
Temperature    Humidity 
      0.148      -0.227 

$`12`
Temperature    Humidity 
     0.0138     -0.9914 

Combine

(combined_results <- do.call(
  rbind, monthly_stats))
   Temperature Humidity
1       0.1098  0.59440
2      -0.3903  0.31437
3       0.3576 -0.14708
4       0.3204  0.00474
5      -0.2497 -0.80216
6       0.4326  0.08659
7       0.0195  0.37915
8       0.3305  0.08322
9      -0.2401  0.06416
10     -0.0648  0.22435
11      0.1481 -0.22713
12      0.0138 -0.99142

Using Split-Apply-Combine/IV

Split

library(lubridate)
(split_by_month <- split(
  xts_data, lubridate::month(xts_data)))

Apply

(monthly_stats <- lapply(
  split_by_month,
  function(x) {
    apply(x, 2, quantile,
          probs = c(0.25, 0.5, 0.75),
          na.rm = TRUE)
  }))
$`1`
    Temperature Humidity
25%      -0.560    -0.22
50%       0.401     0.38
75%       0.554     1.01

$`2`
    Temperature Humidity
25%     -0.2302   -0.502
50%     -0.0834    0.332
75%     -0.0619    0.919

$`3`
    Temperature Humidity
25%      -0.306   -0.575
50%       0.253   -0.333
75%       0.838   -0.236

$`4`
    Temperature Humidity
25%     -0.0285   -1.019
50%      0.0705    0.435
75%      0.1534    0.608

$`5`
    Temperature Humidity
25%     -0.6947   -1.072
50%     -0.0429   -0.710
75%      0.1293   -0.326

$`6`
    Temperature Humidity
25%      -0.208  -0.0556
50%       1.254   0.2569
75%       1.369   0.3035

$`7`
    Temperature Humidity
25%      -0.226    0.181
50%       0.426    0.448
75%       0.461    0.519

$`8`
    Temperature Humidity
25%      -0.473   -0.139
50%      -0.295    0.053
75%       1.516    0.301

$`9`
    Temperature Humidity
25%      -1.068  0.00576
50%      -0.687  0.10568
75%       0.895  0.23873

$`10`
    Temperature Humidity
25%      -0.446   -0.628
50%      -0.218   -0.045
75%       0.585    0.385

$`11`
    Temperature Humidity
25%      -0.403   -0.785
50%       0.124   -0.491
75%       0.822   -0.371

$`12`
    Temperature Humidity
25%      -0.467    -1.67
50%       0.216    -1.02
75%       0.360    -0.60

Combine

combined_results <- do.call(
  rbind, monthly_stats)
class(combined_results)
[1] "matrix" "array" 
print(combined_results)
    Temperature Humidity
25%     -0.5605 -0.22049
50%      0.4008  0.37964
75%      0.5539  1.00574
25%     -0.2302 -0.50232
50%     -0.0834  0.33178
75%     -0.0619  0.91900
25%     -0.3060 -0.57535
50%      0.2533 -0.33321
75%      0.8378 -0.23570
25%     -0.0285 -1.01858
50%      0.0705  0.43518
75%      0.1534  0.60796
25%     -0.6947 -1.07179
50%     -0.0429 -0.71041
75%      0.1293 -0.32593
25%     -0.2079 -0.05556
50%      1.2538  0.25688
75%      1.3686  0.30353
25%     -0.2258  0.18130
50%      0.4265  0.44821
75%      0.4609  0.51941
25%     -0.4728 -0.13889
50%     -0.2951  0.05300
75%      1.5165  0.30115
25%     -1.0678  0.00576
50%     -0.6869  0.10568
75%      0.8951  0.23873
25%     -0.4457 -0.62791
50%     -0.2180 -0.04503
75%      0.5846  0.38528
25%     -0.4029 -0.78490
50%      0.1239 -0.49103
75%      0.8216 -0.37066
25%     -0.4667 -1.66794
50%      0.2159 -1.02413
75%      0.3598 -0.60026

Organize

combined_results <- as.data.frame(combined_results)
rownames(combined_results) <- NULL
combined_results <- cbind(
  percentile = rep(c(25, 50, 75), nrow(combined_results) / 3),
  # levels = month.abb keeps calendar order instead of alphabetical order
  month = factor(rep(month.abb, each = 3), levels = month.abb),
  combined_results)
print(combined_results)
   percentile month Temperature Humidity
1          25   Jan     -0.5605 -0.22049
2          50   Jan      0.4008  0.37964
3          75   Jan      0.5539  1.00574
4          25   Feb     -0.2302 -0.50232
5          50   Feb     -0.0834  0.33178
6          75   Feb     -0.0619  0.91900
7          25   Mar     -0.3060 -0.57535
8          50   Mar      0.2533 -0.33321
9          75   Mar      0.8378 -0.23570
10         25   Apr     -0.0285 -1.01858
11         50   Apr      0.0705  0.43518
12         75   Apr      0.1534  0.60796
13         25   May     -0.6947 -1.07179
14         50   May     -0.0429 -0.71041
15         75   May      0.1293 -0.32593
16         25   Jun     -0.2079 -0.05556
17         50   Jun      1.2538  0.25688
18         75   Jun      1.3686  0.30353
19         25   Jul     -0.2258  0.18130
20         50   Jul      0.4265  0.44821
21         75   Jul      0.4609  0.51941
22         25   Aug     -0.4728 -0.13889
23         50   Aug     -0.2951  0.05300
24         75   Aug      1.5165  0.30115
25         25   Sep     -1.0678  0.00576
26         50   Sep     -0.6869  0.10568
27         75   Sep      0.8951  0.23873
28         25   Oct     -0.4457 -0.62791
29         50   Oct     -0.2180 -0.04503
30         75   Oct      0.5846  0.38528
31         25   Nov     -0.4029 -0.78490
32         50   Nov      0.1239 -0.49103
33         75   Nov      0.8216 -0.37066
34         25   Dec     -0.4667 -1.66794
35         50   Dec      0.2159 -1.02413
36         75   Dec      0.3598 -0.60026

Reshaping data

  • Understanding Long and Wide Data Formats
  • Using reshape function
  • Using tidyr package

Understanding Long and Wide Data Formats

Long format

# Example long format data
data_long <- data.frame(
  City = c("Ankara", "Ankara", "Ankara", "Ankara",
           "Istanbul", "Istanbul", "Istanbul",
           "Istanbul"),
  Year = c(2020, 2021, 2022, 2023,
           2020, 2021, 2022, 2023),
  Pollution = c(40, 35, 47, 12, 50, 45, 61, 25)
)
print(data_long)
      City Year Pollution
1   Ankara 2020        40
2   Ankara 2021        35
3   Ankara 2022        47
4   Ankara 2023        12
5 Istanbul 2020        50
6 Istanbul 2021        45
7 Istanbul 2022        61
8 Istanbul 2023        25
str(data_long)
'data.frame':   8 obs. of  3 variables:
 $ City     : chr  "Ankara" "Ankara" "Ankara" "Ankara" ...
 $ Year     : num  2020 2021 2022 2023 2020 ...
 $ Pollution: num  40 35 47 12 50 45 61 25

Wide format

# Example wide format data
data_wide <- data.frame(
  City = c("Ankara", "Istanbul"),
  Y2020 = c(40, 50),
  Y2021 = c(35, 45),
  Y2022 = c(47, 61),
  Y2023 = c(12, 25)
)
print(data_wide)
      City Y2020 Y2021 Y2022 Y2023
1   Ankara    40    35    47    12
2 Istanbul    50    45    61    25
str(data_wide)
'data.frame':   2 obs. of  5 variables:
 $ City : chr  "Ankara" "Istanbul"
 $ Y2020: num  40 50
 $ Y2021: num  35 45
 $ Y2022: num  47 61
 $ Y2023: num  12 25

Using reshape Function/I

Long format

# from long to wide
long_to_wide <- reshape(data_long,
                        timevar = "Year",
                        idvar = "City",
                        direction = "wide")
# new_colnames <- sapply(
#   colnames(long_to_wide)[-1],
#   function(x) strsplit(x, ".", fixed = TRUE)[[1]][[2]])
# colnames(long_to_wide)[-1] <- new_colnames
print(long_to_wide)
      City Pollution.2020 Pollution.2021 Pollution.2022
1   Ankara             40             35             47
5 Istanbul             50             45             61
  Pollution.2023
1             12
5             25
str(long_to_wide)
'data.frame':   2 obs. of  5 variables:
 $ City          : chr  "Ankara" "Istanbul"
 $ Pollution.2020: num  40 50
 $ Pollution.2021: num  35 45
 $ Pollution.2022: num  47 61
 $ Pollution.2023: num  12 25
 - attr(*, "reshapeWide")=List of 5
  ..$ v.names: NULL
  ..$ timevar: chr "Year"
  ..$ idvar  : chr "City"
  ..$ times  : num [1:4] 2020 2021 2022 2023
  ..$ varying: chr [1, 1:4] "Pollution.2020" "Pollution.2021" "Pollution.2022" "Pollution.2023"
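
The column names produced by reshape carry a "Pollution." prefix. A minimal sketch of one way to shorten them, assuming a "Y" prefix is wanted so that the result matches data_wide; it uses sub() rather than the strsplit() approach in the commented-out lines.

```r
# Long-format input, same values as data_long above
data_long <- data.frame(
  City      = rep(c("Ankara", "Istanbul"), each = 4),
  Year      = rep(2020:2023, times = 2),
  Pollution = c(40, 35, 47, 12, 50, 45, 61, 25)
)

long_to_wide <- reshape(data_long,
                        timevar = "Year",
                        idvar = "City",
                        direction = "wide")

# Replace the "Pollution." prefix with "Y"; the City column is untouched
colnames(long_to_wide) <- sub("^Pollution\\.", "Y", colnames(long_to_wide))

# Row names 1 and 5 are left over from the long table; reset them
rownames(long_to_wide) <- NULL
print(long_to_wide)
```

After renaming, the stored reshapeWide attribute still refers to the old column names, so a later one-argument reshape(long_to_wide) call may no longer revert the table as expected.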

Wide format

# from wide to long
wide_to_long <- reshape(
  data_wide,
  direction = "long",
  varying = list(colnames(data_wide)[-1]),
  v.names = "Value",
  idvar = "City",
  timevar = "Year",
  times = 2020:2023,
  new.row.names = 1:1000)
print(wide_to_long)
      City Year Value
1   Ankara 2020    40
2 Istanbul 2020    50
3   Ankara 2021    35
4 Istanbul 2021    45
5   Ankara 2022    47
6 Istanbul 2022    61
7   Ankara 2023    12
8 Istanbul 2023    25
str(wide_to_long)
'data.frame':   8 obs. of  3 variables:
 $ City : chr  "Ankara" "Istanbul" "Ankara" "Istanbul" ...
 $ Year : int  2020 2020 2021 2021 2022 2022 2023 2023
 $ Value: num  40 50 35 45 47 61 12 25
 - attr(*, "reshapeLong")=List of 4
  ..$ varying:List of 1
  .. ..$ : chr [1:4] "Y2020" "Y2021" "Y2022" "Y2023"
  ..$ v.names: chr "Value"
  ..$ idvar  : chr "City"
  ..$ timevar: chr "Year"

Using reshape Function/II

Long format

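# Revert to long format; with no other arguments, reshape() uses the
# reshapeWide attribute stored on long_to_wide.
# Note: the value column keeps the name of the first wide column.
long_again <- reshape(long_to_wide)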
print(long_again)
                  City Year Pollution.2020
Ankara.2020     Ankara 2020             40
Istanbul.2020 Istanbul 2020             50
Ankara.2021     Ankara 2021             35
Istanbul.2021 Istanbul 2021             45
Ankara.2022     Ankara 2022             47
Istanbul.2022 Istanbul 2022             61
Ankara.2023     Ankara 2023             12
Istanbul.2023 Istanbul 2023             25
str(long_again)
'data.frame':   8 obs. of  3 variables:
 $ City          : chr  "Ankara" "Istanbul" "Ankara" "Istanbul" ...
 $ Year          : num  2020 2020 2021 2021 2022 ...
 $ Pollution.2020: num  40 50 35 45 47 61 12 25
 - attr(*, "reshapeLong")=List of 4
  ..$ varying:List of 1
  .. ..$ 1: chr [1:4] "Pollution.2020" "Pollution.2021" "Pollution.2022" "Pollution.2023"
  ..$ v.names: NULL
  ..$ idvar  : chr "City"
  ..$ timevar: chr "Year"

Wide format

# Revert to wide format; reshape() uses the reshapeLong attribute
# stored on wide_to_long
wide_again <- reshape(wide_to_long)
print(wide_again)
      City Y2020 Y2021 Y2022 Y2023
1   Ankara    40    35    47    12
2 Istanbul    50    45    61    25
str(wide_again)
'data.frame':   2 obs. of  5 variables:
 $ City : chr  "Ankara" "Istanbul"
 $ Y2020: num  40 50
 $ Y2021: num  35 45
 $ Y2022: num  47 61
 $ Y2023: num  12 25
 - attr(*, "reshapeWide")=List of 5
  ..$ v.names: chr "Value"
  ..$ timevar: chr "Year"
  ..$ idvar  : chr "City"
  ..$ times  : int [1:4] 2020 2021 2022 2023
  ..$ varying: chr [1, 1:4] "Y2020" "Y2021" "Y2022" "Y2023"

Using tidyr pivoting functions

Long format

long_to_wide <- tidyr::pivot_wider(
  data_long, names_from = Year, values_from = Pollution)
print(long_to_wide)
# A tibble: 2 × 5
  City     `2020` `2021` `2022` `2023`
  <chr>     <dbl>  <dbl>  <dbl>  <dbl>
1 Ankara       40     35     47     12
2 Istanbul     50     45     61     25
str(long_to_wide)
tibble [2 × 5] (S3: tbl_df/tbl/data.frame)
 $ City: chr [1:2] "Ankara" "Istanbul"
 $ 2020: num [1:2] 40 50
 $ 2021: num [1:2] 35 45
 $ 2022: num [1:2] 47 61
 $ 2023: num [1:2] 12 25

Wide format

wide_to_long <- tidyr::pivot_longer(
  data_wide, cols = colnames(data_wide)[-1],
  names_to = "Year", values_to = "Pollution")
print(wide_to_long)
# A tibble: 8 × 3
  City     Year  Pollution
  <chr>    <chr>     <dbl>
1 Ankara   Y2020        40
2 Ankara   Y2021        35
3 Ankara   Y2022        47
4 Ankara   Y2023        12
5 Istanbul Y2020        50
6 Istanbul Y2021        45
7 Istanbul Y2022        61
8 Istanbul Y2023        25
str(wide_to_long)
tibble [8 × 3] (S3: tbl_df/tbl/data.frame)
 $ City     : chr [1:8] "Ankara" "Ankara" "Ankara" "Ankara" ...
 $ Year     : chr [1:8] "Y2020" "Y2021" "Y2022" "Y2023" ...
 $ Pollution: num [1:8] 40 35 47 12 50 45 61 25
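
In the output above, Year is a character column with values such as "Y2020". A minimal sketch of how pivot_longer can strip the prefix and convert the years to integers in the same call, using its names_prefix and names_transform arguments (requires the tidyr package; wide_to_long_int is an illustrative name).

```r
library(tidyr)

# Wide-format input, same values as data_wide above
data_wide <- data.frame(
  City  = c("Ankara", "Istanbul"),
  Y2020 = c(40, 50),
  Y2021 = c(35, 45),
  Y2022 = c(47, 61),
  Y2023 = c(12, 25)
)

wide_to_long_int <- pivot_longer(
  data_wide,
  cols = -City,                               # every column except City
  names_to = "Year",
  names_prefix = "Y",                         # drop the leading "Y"
  names_transform = list(Year = as.integer),  # "2020" -> 2020L
  values_to = "Pollution"
)
print(wide_to_long_int)
```

A numeric Year column sorts and plots correctly, whereas "Y2020"-style strings would be treated as categories.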

Other ways to reshape data

  • Using melt() and dcast() functions from reshape2 package (cast() belongs to the older reshape package)
  • Using gather() and spread() functions from tidyr package (superseded by pivot_longer() and pivot_wider())
  • Using melt() and dcast() functions from data.table package

Data Transformation with dplyr package:

  1. dplyr Package: A part of the tidyverse, known for its easy-to-use syntax and speed.
  2. Selecting Columns: select() for choosing columns.
  3. Filtering Rows: filter() to select rows based on conditions.
  4. Arranging Data: arrange() to sort data.
  5. Mutating: mutate() and transmute() for adding new columns or transforming existing ones.
  6. Summarizing: summarise() to calculate summary statistics.
  7. Group Operations: group_by() for group-wise operations.
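
The verbs listed above can be sketched on the small pollution table used in the reshaping examples. This is a hedged illustration, not a full tour of dplyr; the object names recent, with_change and city_summary, and the derived column Change, are made up for this example (requires the dplyr package).

```r
library(dplyr)

# Same values as data_long above
data_long <- data.frame(
  City      = rep(c("Ankara", "Istanbul"), each = 4),
  Year      = rep(2020:2023, times = 2),
  Pollution = c(40, 35, 47, 12, 50, 45, 61, 25)
)

# select + filter + arrange: keep columns, keep rows from 2021 on,
# sort by pollution in descending order
recent <- data_long %>%
  select(City, Year, Pollution) %>%
  filter(Year >= 2021) %>%
  arrange(desc(Pollution))

# group_by + mutate: change relative to each city's first year
with_change <- data_long %>%
  group_by(City) %>%
  mutate(Change = Pollution - first(Pollution)) %>%
  ungroup()

# group_by + summarise: one row of summary statistics per city
city_summary <- data_long %>%
  group_by(City) %>%
  summarise(mean_pollution = mean(Pollution), n_years = n())

print(recent)
print(with_change)
print(city_summary)
```

Each verb takes a data frame and returns a data frame, which is why the steps chain naturally with the pipe operator.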

Further resources