
Thursday, February 15, 2018

What is the difference between WIDE and LONG format in R Programming?

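First, look at the same data in WIDE and LONG formats. In WIDE format each person (Jack, Jane, Tom) has a column of their own; in LONG format those three columns are stacked into just two, one naming the person (condition) and one holding the value (measurement).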

This is the data in WIDE format:

Day  Segment  Jack  Jane  Tom
1    AM       2.3   3.51  1.7
2    AM       3.5   4.2   3.9
3    PM       2.5   3.9   2.1
4    PM       3.0   4.0   2.5

The same data in LONG format:
   Day Segment condition measurement
1    1      AM      Jack        2.30
2    2      AM      Jack        3.50
3    3      PM      Jack        2.50
4    4      PM      Jack        3.00
5    1      AM      Jane        3.51
6    2      AM      Jane        4.20
7    3      PM      Jane        3.90
8    4      PM      Jane        4.00
9    1      AM       Tom        1.70
10   2      AM       Tom        3.90
11   3      PM       Tom        2.10
12   4      PM       Tom        2.50

You can convert from WIDE to LONG format as shown here:

Read the WIDE data using read.table() as shown here:
-----------------------
> data_wide <- read.table(header = TRUE, text = "
+ Day Segment  Jack  Jane  Tom
+ 1   AM       2.3   3.51  1.7
+ 2   AM       3.5   4.2   3.9
+ 3   PM       2.5   3.9   2.1
+ 4   PM       3.0   4.0   2.5
+ ")
> data_wide
  Day Segment Jack Jane Tom
1   1      AM  2.3 3.51 1.7
2   2      AM  3.5 4.20 3.9
3   3      PM  2.5 3.90 2.1
4   4      PM  3.0 4.00 2.5
---------------------------------


Now you need the function gather() to do this conversion. gather() is in the
tidyr package, not dplyr. I installed tidyr from this CRAN mirror:
https://cran.cnr.berkeley.edu/bin/windows
-----------
Load the library:
> library(tidyr)
--------
Now do the conversion:
--------
> data_long <- gather(data_wide, condition, measurement, Jack, Jane, Tom)
> data_long
   Day Segment condition measurement
1    1      AM      Jack        2.30
2    2      AM      Jack        3.50
3    3      PM      Jack        2.50
4    4      PM      Jack        3.00
5    1      AM      Jane        3.51
6    2      AM      Jane        4.20
7    3      PM      Jane        3.90
8    4      PM      Jane        4.00
9    1      AM       Tom        1.70
10   2      AM       Tom        3.90
11   3      PM       Tom        2.10
12   4      PM       Tom        2.50
>
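A note for newer installs: tidyr 1.0.0 (released in 2019, after this post) introduced pivot_longer() and marks gather() as superseded. The sketch below, assuming tidyr 1.0.0 or later is installed, does the same WIDE to LONG conversion; cols picks the columns to stack, names_to names the new key column and values_to names the new value column.

```r
library(tidyr)

# Same WIDE data as above
data_wide <- read.table(header = TRUE, text = "
Day Segment Jack Jane Tom
1   AM      2.3  3.51 1.7
2   AM      3.5  4.2  3.9
3   PM      2.5  3.9  2.1
4   PM      3.0  4.0  2.5
")

# Stack Jack, Jane and Tom into condition/measurement pairs
data_long <- pivot_longer(
  data_wide,
  cols = c(Jack, Jane, Tom),
  names_to = "condition",
  values_to = "measurement"
)
print(data_long)
```

Unlike gather(), pivot_longer() keeps rows in the order of the original rows (Day 1 Jack, Jane, Tom, then Day 2, and so on) and returns a tibble, so the printed output looks different even though it holds the same 12 observations.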


Friday, December 22, 2017

What is the tidyverse package?

Tidyverse.org describes the tidyverse as an 'opinionated collection of R packages'. This is a modest statement, as the tidyverse can do a lot of things.

The tidyverse is a specialized group of packages for data science. If you are familiar with ggplot2, the most popular data visualization package in the R universe, you will find it included in the tidyverse along with many other useful packages.

The core tidyverse consists of several useful packages, but you can also bring in other packages from the tidyverse to work alongside it and extend what the core can do by itself.

Here is what you find in the core of the tidyverse:
ggplot2
dplyr
tidyr
readr
purrr
tibble
stringr
forcats

These core packages are attached when you run library(tidyverse). Other packages in the tidyverse have to be loaded separately with library(), for example:
readxl for .xls and .xlsx files
haven for SPSS, Stata and SAS data
jsonlite for JSON
xml2 for XML
httr for web APIs (HTTP)
rvest for web scraping
DBI for databases
and many others.
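A minimal sketch of the difference, assuming both tidyverse and readxl are installed: library(tidyverse) attaches the core packages such as tidyr, but readxl only becomes available after its own library() call.

```r
# Attaches the core tidyverse packages (ggplot2, dplyr, tidyr, readr, ...)
library(tidyverse)

# readxl is not attached by library(tidyverse), so load it explicitly
library(readxl)

# Both should now be on the search path
c("package:tidyr", "package:readxl") %in% search()
```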