Let us first get some data. We will use the IndiaCensus.xlsx file created in an earlier post here, and save the data from that post to an IndiaCensus.csv file.
Use the following code to get the data into R; adjust the path to wherever you saved IndiaCensus.csv.
Importing a CSV file:
> dataset <- read.csv("IndiaCensus.csv", header=TRUE)
> dataset
   ï..State.UT.Code                    State.UT    Total     Male   Female
1                 1           Jammu and Kashmir  2008670  1080662   927982
2                 2            Himachal Pradesh   763864   400681   363183
3                 3                      Punjab  2941570  1593262  1348308
4                 4                  Chandigarh   117953    63187    54766
5                 5                 Uttarakhand  1328844   704769   624075
6                 6                     Haryana  3297724  1802047  1495677
7                 7                       Delhi  1970510  1055735   914775
8                 8                   Rajasthan 10504916  5580212  4924004
9                 9               Uttar Pradesh 29728235 15653175 14075060
10               10                       Bihar 18582229  9615280  8966949
11               11                      Sikkim    61077    31418    29659
12               12           Arunachal Pradesh   202759   103430    99330
13               13                    Nagaland   285981   147111   138870
14               14                     Manipur   353237   182684   170553
15               15                     Mizoram   165536    83965    81571
16               16                     Tripura   444055   227354   216701
17               17                   Meghalaya   555822   282189   273633
18               18                       Assam  4511307  2305088  2206219
19               19                 West Bengal 10112599  5187264  4925335
20               20                   Jharkhand  5237582  2695921  2541661
21               21                      Odisha  5035650  2603208  2432442
22               22                Chhattisgarh  3584028  1824987  1759041
23               23              Madhya Pradesh 10548295  5516957  5031338
24               24                     Gujarat  7564464  3974286  3519890
25               25               Daman and Diu    25880    13556    12314
26               26      Dadra and Nagar Haveli    49196    25575    23621
27               27                 Maharashtra 12848375  6822262  6026113
28               28              Andhra Pradesh  8642686  4448330  4194356
29               29                   Karnataka  6855801  3527844  3327957
30               30                         Goa   139495    72669    66826
31               31                 Lakshadweep     7088     3715     3373
32               32                      Kerala  3322247  1695889  1626358
33               33                  Tamil Nadu  6894821  3542351  3352470
34               34                  Puducherry   127610    64932    62678
35               35 Andaman and Nicobar Islands    39497    20094    19403
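The ï.. in front of the first column name is most likely a byte order mark (BOM), which spreadsheet programs often write at the start of a CSV file. Here is a minimal sketch of one way to avoid it, assuming the file is UTF-8 with a BOM. The small temporary file and the name `fixed` are only stand-ins so that the example runs on its own; they are not part of the original post.

```r
# Write a tiny stand-in CSV that starts with a UTF-8 byte order mark (BOM).
# This is a hypothetical substitute for IndiaCensus.csv.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("State.UT.Code,State.UT,Total\n1,Jammu and Kashmir,2008670\n"), con)
close(con)

# Telling read.csv about the BOM keeps it out of the first column name.
fixed <- read.csv(tmp, header = TRUE, fileEncoding = "UTF-8-BOM")
names(fixed)
```

If the data is already loaded, renaming the column with names(dataset)[1] <- "State.UT.Code" should have the same effect.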
------------
Let us get an overview of the data:
> dim(dataset)
[1] 35 5
--------------
We already looked at the full data, but we could also have looked at just a sample, as shown:
> head(dataset)
  ï..State.UT.Code          State.UT   Total    Male  Female
1                1 Jammu and Kashmir 2008670 1080662  927982
2                2  Himachal Pradesh  763864  400681  363183
3                3            Punjab 2941570 1593262 1348308
4                4        Chandigarh  117953   63187   54766
5                5       Uttarakhand 1328844  704769  624075
6                6           Haryana 3297724 1802047 1495677
>
-----------------------------
Let us put the last three columns of the data into x and get a statistical summary of those three columns:
> x=dataset[,3:5]
> summary(x)
     Total               Male              Female
 Min.   :    7088   Min.   :    3715   Min.   :    3373
 1st Qu.:  184148   1st Qu.:   93698   1st Qu.:   90450
 Median : 2008670   Median : 1080662   Median :  927982
 Mean   : 4538846   Mean   : 2370060   Mean   : 2166757
 3rd Qu.: 6875311   3rd Qu.: 3535098   3rd Qu.: 3340214
 Max.   :29728235   Max.   :15653175   Max.   :14075060
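Selecting columns by position works, but it breaks silently if the column order ever changes. Here is a small sketch of the same step using column names instead, run on the first six rows shown earlier (typed in by hand so that the example is self-contained). The names `census` and `medians` are made up for this illustration.

```r
# First six rows of the table, entered by hand as a stand-in for dataset
census <- data.frame(
  State.UT = c("Jammu and Kashmir", "Himachal Pradesh", "Punjab",
               "Chandigarh", "Uttarakhand", "Haryana"),
  Total  = c(2008670, 763864, 2941570, 117953, 1328844, 3297724),
  Male   = c(1080662, 400681, 1593262, 63187, 704769, 1802047),
  Female = c(927982, 363183, 1348308, 54766, 624075, 1495677)
)

# Select the numeric columns by name rather than by position
x <- census[, c("Total", "Male", "Female")]

# Pull out a single statistic per column instead of the full summary
medians <- sapply(x, median)
medians
```

Note that these medians are for six rows only, so they will not match the summary of all 35 rows.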
----
This is just a sample of the 'Total' column:
> head(x[1])
    Total
1 2008670
2  763864
3 2941570
4  117953
5 1328844
6 3297724
-----------
Let us draw a boxplot of just the "Total" column.
> boxplot(x[1])
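To see the numbers behind the picture, the base R function boxplot.stats() returns the five values the box is drawn from, plus any points lying beyond the whiskers. Here is a sketch using the 35 "Total" values from the table, typed in by hand; the names `total` and `bs` are just for this example.

```r
# The "Total" column of the census table, in row order
total <- c(2008670, 763864, 2941570, 117953, 1328844, 3297724, 1970510,
           10504916, 29728235, 18582229, 61077, 202759, 285981, 353237,
           165536, 444055, 555822, 4511307, 10112599, 5237582, 5035650,
           3584028, 10548295, 7564464, 25880, 49196, 12848375, 8642686,
           6855801, 139495, 7088, 3322247, 6894821, 127610, 39497)

bs <- boxplot.stats(total)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
bs$stats

# Values beyond 1.5 times the box height from the hinges (the default rule)
bs$out
```

With these values, the two largest totals (Uttar Pradesh and Bihar) fall beyond the upper whisker, which is why they appear as separate points on the plot.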
-----------
A boxplot can be created for a group of columns as well.
Here is the boxplot for all three columns, scaled to log10:
> z=log10(x)
> boxplot(z)
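Since the values are now on a log10 scale, it helps to say so on the plot. Here is a minimal sketch with a title and axis label, run on the first six rows only so that it stands on its own. The title text and the name `b` are made up for this illustration.

```r
# First six rows of the three numeric columns, entered by hand
x <- data.frame(
  Total  = c(2008670, 763864, 2941570, 117953, 1328844, 3297724),
  Male   = c(1080662, 400681, 1593262, 63187, 704769, 1802047),
  Female = c(927982, 363183, 1348308, 54766, 624075, 1495677)
)

# log10 works element-wise on a data frame whose columns are all numeric
z <- log10(x)

# boxplot() draws the plot and also returns the statistics it used
b <- boxplot(z,
             main = "India census counts (first six rows)",
             ylab = "log10(count)")
b$names
```

The returned object holds one column of box statistics per plotted column in b$stats.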
For an explanation of BOXPLOT, read here.