Showing posts with label LONG format data. Show all posts

Thursday, February 15, 2018

What is the difference between WIDE and LONG format in R Programming?

First, look at the same data in WIDE and LONG formats:

This is data in WIDE format:

Day  Segment  Jack  Jane  Tom
1    AM       2.3   3.51  1.7
2    AM       3.5   4.2   3.9
3    PM       2.5   3.9   2.1
4    PM       3.0   4.0   2.5

The same data in LONG format:
   Day Segment condition measurement
1    1      AM      Jack        2.30
2    2      AM      Jack        3.50
3    3      PM      Jack        2.50
4    4      PM      Jack        3.00
5    1      AM      Jane        3.51
6    2      AM      Jane        4.20
7    3      PM      Jane        3.90
8    4      PM      Jane        4.00
9    1      AM       Tom        1.70
10   2      AM       Tom        3.90
11   3      PM       Tom        2.10
12   4      PM       Tom        2.50
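
To make the difference concrete: in WIDE format each person has their own column, while in LONG format the person's name is a value in the condition column and every measurement sits on its own row. The sketch below is only an illustration in base R, using the values as they appear in the LONG table; the names wide and long are made up for this example. It computes the mean per person in both layouts.

```r
# WIDE layout: one column per person
wide <- data.frame(
  Day     = 1:4,
  Segment = c("AM", "AM", "PM", "PM"),
  Jack    = c(2.3, 3.5, 2.5, 3.0),
  Jane    = c(3.51, 4.2, 3.9, 4.0),
  Tom     = c(1.7, 3.9, 2.1, 2.5)
)

# LONG layout: one row per (Day, person) measurement
long <- data.frame(
  Day         = rep(wide$Day, times = 3),
  Segment     = rep(wide$Segment, times = 3),
  condition   = rep(c("Jack", "Jane", "Tom"), each = 4),
  measurement = c(wide$Jack, wide$Jane, wide$Tom)
)

# Mean per person in WIDE format: average each person's column
wide_means <- colMeans(wide[, c("Jack", "Jane", "Tom")])

# Mean per person in LONG format: group the single measurement column
long_means <- tapply(long$measurement, long$condition, mean)

print(wide_means)
print(long_means)
```

Both calls give the same three means. In the WIDE layout the columns have to be named explicitly, while in the LONG layout the grouping comes from the condition column.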

You can convert from WIDE to LONG format as follows.

First, read the WIDE data using read.table():
-----------------------
> data_wide <- read.table(header=TRUE, text="
+ Day Segment  Jack  Jane  Tom
+ 1   AM       2.3   3.51  1.7
+ 2   AM       3.5   4.2   3.9
+ 3   PM       2.5   3.9   2.1
+ 4   PM       3.0   4.0   2.5
+ ")
> data_wide
  Day Segment Jack Jane Tom
1   1      AM  2.3 3.51 1.7
2   2      AM  3.5 4.20 3.9
3   3      PM  2.5 3.90 2.1
4   4      PM  3.0 4.00 2.5
---------------------------------


Now use the function gather() to do the conversion. gather() is in the
package tidyr. I installed tidyr (which also installs dplyr as a dependency) from the CRAN mirror here:
https://cran.cnr.berkeley.edu/bin/windows
-----------
Load the library:
> library(tidyr)
--------
Now do the conversion:
--------
> data_long <- gather(data_wide, condition, measurement, Jack, Jane, Tom)
> data_long
   Day Segment condition measurement
1    1      AM      Jack        2.30
2    2      AM      Jack        3.50
3    3      PM      Jack        2.50
4    4      PM      Jack        3.00
5    1      AM      Jane        3.51
6    2      AM      Jane        4.20
7    3      PM      Jane        3.90
8    4      PM      Jane        4.00
9    1      AM       Tom        1.70
10   2      AM       Tom        3.90
11   3      PM       Tom        2.10
12   4      PM       Tom        2.50
>
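
gather() still works, but newer versions of tidyr mark it as superseded by pivot_longer(). The following is a minimal sketch of the same conversion, assuming tidyr 1.0.0 or later is installed; the names data_long2 and data_wide2 are made up for this example. It also converts the result back to WIDE format with pivot_wider() as a quick check.

```r
library(tidyr)

# Same WIDE data as in the read.table() step
data_wide <- read.table(header = TRUE, text = "
Day Segment Jack Jane Tom
1   AM      2.3  3.51 1.7
2   AM      3.5  4.2  3.9
3   PM      2.5  3.9  2.1
4   PM      3.0  4.0  2.5
")

# WIDE to LONG: the person columns become condition/measurement pairs
data_long2 <- pivot_longer(
  data_wide,
  cols      = c(Jack, Jane, Tom),
  names_to  = "condition",
  values_to = "measurement"
)

# LONG back to WIDE: one column per value of condition
data_wide2 <- pivot_wider(
  data_long2,
  names_from  = condition,
  values_from = measurement
)

print(data_long2)
print(data_wide2)
```

Note that pivot_longer() returns a tibble and orders the rows by the original row (Day 1 for Jack, Jane, Tom, then Day 2, and so on), whereas gather() stacks all of Jack's rows first, then Jane's, then Tom's. The data is the same, only the row order differs.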
