Thursday, February 15, 2018

What is the difference between WIDE and LONG format in R Programming?

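First, look at the same data in WIDE and LONG formats. In WIDE format each person (Jack, Jane, Tom) has a column of their own, so one row holds several measurements. In LONG format each row holds exactly one measurement: a condition column says whose value it is, a measurement column holds the value, and Day and Segment are repeated as needed.

This is the data in WIDE format: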

Day  Segment  Jack  Jane  Tom
1    AM       2.3   3.51  1.7
2    AM       3.5   4.2   3.9
3    PM       2.5   3.9   2.1
4    PM       3.0   4.0   2.5

The same data in LONG format:
   Day Segment condition measurement
1    1      AM      Jack        2.30
2    2      AM      Jack        3.50
3    3      PM      Jack        2.50
4    4      PM      Jack        3.00
5    1      AM      Jane        3.51
6    2      AM      Jane        4.20
7    3      PM      Jane        3.90
8    4      PM      Jane        4.00
9    1      AM       Tom        1.70
10   2      AM       Tom        3.90
11   3      PM       Tom        2.10
12   4      PM       Tom        2.50
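To see mechanically what the conversion does, here is a minimal base-R sketch (no packages needed) that builds the LONG table from the WIDE one: the measurement columns are stacked on top of each other, and the identifier columns Day and Segment are repeated once per stacked column. It uses base R's stack() rather than the gather() approach shown later, and the name measure_cols is just an illustrative choice.

```r
# Wide data typed in directly (same values as the WIDE table above)
data_wide <- data.frame(
  Day     = 1:4,
  Segment = c("AM", "AM", "PM", "PM"),
  Jack    = c(2.3, 3.5, 2.5, 3.0),
  Jane    = c(3.51, 4.2, 3.9, 4.0),
  Tom     = c(1.7, 3.9, 2.1, 2.5)
)

# Hypothetical helper name: the columns that hold measurements
measure_cols <- c("Jack", "Jane", "Tom")

# stack() puts the measurement columns on top of each other:
# 'values' holds the numbers, 'ind' holds the original column name
stacked <- stack(data_wide[measure_cols])

# The identifier columns are repeated once per measurement column
data_long <- data.frame(
  Day         = rep(data_wide$Day, times = length(measure_cols)),
  Segment     = rep(data_wide$Segment, times = length(measure_cols)),
  condition   = as.character(stacked$ind),
  measurement = stacked$values
)

nrow(data_wide)  # 4
nrow(data_long)  # 12, i.e. 4 rows x 3 measurement columns
```

The row count is the quickest sanity check: LONG always has (WIDE rows) x (number of measurement columns) rows.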

You can convert from WIDE to LONG format as follows.

First, read the WIDE data using read.table():
-----------------------
> data_wide <- read.table(header=TRUE, text='
+ Day Segment  Jack  Jane  Tom
+ 1   AM       2.3   3.51  1.7
+ 2   AM       3.5   4.2   3.9
+ 3   PM       2.5   3.9   2.1
+ 4   PM       3.0   4.0   2.5
+ ')
> data_wide
  Day Segment Jack Jane Tom
1   1      AM  2.3 3.51 1.7
2   2      AM  3.5 4.20 3.9
3   3      PM  2.5 3.90 2.1
4   4      PM  3.0 4.00 2.5
---------------------------------

[Image: R_wideData.png]

Now use the function gather() to do the conversion. gather() is in the package tidyr, not dplyr. I installed tidyr (installing it also installs dplyr, which tidyr depends on) from this CRAN mirror:
https://cran.cnr.berkeley.edu/bin/windows
-----------
Load the library:
> library(tidyr)
--------
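If you prefer to script the installation instead of picking a mirror by hand, here is a minimal sketch. The repository URL https://cloud.r-project.org is an assumption (any CRAN mirror should work), and the exists() call is just a quick check that gather() is available once tidyr is loaded.

```r
# Install tidyr only if it is not already available.
# Assumption: cloud.r-project.org as the mirror; any CRAN mirror works.
if (!requireNamespace("tidyr", quietly = TRUE)) {
  install.packages("tidyr", repos = "https://cloud.r-project.org")
}

# gather() lives in tidyr, so load tidyr (not dplyr) before calling it
library(tidyr)

# TRUE once tidyr is attached
exists("gather")
```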
Now do the conversion:
--------
> data_long <- gather(data_wide, condition, measurement, Jack, Jane, Tom)
> data_long
   Day Segment condition measurement
1    1      AM      Jack        2.30
2    2      AM      Jack        3.50
3    3      PM      Jack        2.50
4    4      PM      Jack        3.00
5    1      AM      Jane        3.51
6    2      AM      Jane        4.20
7    3      PM      Jane        3.90
8    4      PM      Jane        4.00
9    1      AM       Tom        1.70
10   2      AM       Tom        3.90
11   3      PM       Tom        2.10
12   4      PM       Tom        2.50

[Image: R_longData.png]
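Since tidyr 1.0.0, gather() has been superseded by pivot_longer(), with pivot_wider() as its inverse. Below is a minimal sketch of the same conversion with those functions, assuming tidyr 1.0.0 or later is installed; converting back to WIDE is included only to show that both layouts hold the same numbers. Note that pivot_longer() returns a tibble and orders the rows by Day (Jack, Jane, Tom for day 1, then day 2, and so on) instead of grouping them by person as gather() does; the contents are the same.

```r
library(tidyr)

data_wide <- read.table(header = TRUE, text = "
Day Segment Jack Jane Tom
1   AM      2.3  3.51 1.7
2   AM      3.5  4.2  3.9
3   PM      2.5  3.9  2.1
4   PM      3.0  4.0  2.5
")

# WIDE -> LONG with the newer tidyr interface
data_long <- pivot_longer(
  data_wide,
  cols      = c(Jack, Jane, Tom),  # the measurement columns to stack
  names_to  = "condition",         # new column holding the old column names
  values_to = "measurement"        # new column holding the values
)

# LONG -> WIDE again; Day and Segment identify each original row
data_wide_again <- pivot_wider(
  data_long,
  names_from  = condition,
  values_from = measurement
)

data_long
data_wide_again
```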
