Analyzing Varieties of Agricultural Data Using Big Data Tools Pig

Bankim L. Radadiya1 and Parag Shukla2*

1Director IT- Navsari Agricultural University - Navsari - Gujarat India.

2Head-Department of MCA, Atmiya Institute of Technology and Science, Rajkot – 360005, India.

Corresponding author Email: paragshukla007@gmail.com

DOI : http://dx.doi.org/10.13005/ojcst/10.04.16

Article Publishing History
Article Received on : 4-12-2017
Article Accepted on : 11-12-2017
Article Published : 12 Dec 2017
ABSTRACT:

Data is growing rapidly day by day, and analyzing it has become a necessity. According to a recent survey, more data was generated in the last two years than in all of previous human history. Data is created in different forms and in a diversified manner: it can be structured, semi-structured, or unstructured. To analyze diversified agricultural data, we can use Big Data tools such as Pig. Pig is a platform for data analysis that can handle many varieties of data. Its biggest advantages are that it can process diversified data quickly and that it supports user-defined functions. A typical use case for Pig is ETL: data is extracted from its sources, transformed, and then loaded into a data warehouse. In this study we use Pig to analyze the state-wise percentage distribution of the number of operational holdings for all social groups in 2005-06 and 2010-11.

KEYWORDS: Analysis; Agricultural data; Big Data Tools; Pig; Structured; Semi-Structured; Unstructured; Varieties.

Cite this article:

Radadiya B. L, Shukla P. Analyzing Varieties of Agricultural Data Using Big Data Tools Pig. Orient. J. Comp. Sci. and Technol;10(4). Available from: http://www.computerscijournal.org/?p=7232


Introduction

Nowadays, data is growing very quickly, and analyzing it is a necessity for many organizations. According to a recent survey, more data was generated in the last two years than in all of previous human history. Data is created in different forms and in a diversified manner: it can be structured, semi-structured, or unstructured. To analyze diversified agricultural data, we can use Big Data tools such as Pig. Pig is a platform for data analysis that can handle many varieties of data. Its biggest advantages are that it can process diversified data quickly and that it supports user-defined functions. A typical use case for Pig is ETL: data is extracted from its sources, transformed, and then loaded into a data warehouse.

In this study, we analyzed varieties of agricultural data using the Big Data tool Pig.

What is Pig?

Figure 1: What is Pig?

Why Pig? & What Pig Supports?

Figure 2: Why Pig? & What Pig Supports?

Analysis of Structured Agricultural Data Using Pig

To analyze structured data, we must first identify the source of the data. Sources of structured data include RDBMSs such as Oracle, SQL Server, DB2, and MySQL, as well as spreadsheets and OLTP systems. Figure 3 shows these sources.

Figure 3: Sources of Structured Data

Step-1 Load the structured data.

We took the data on the state-wise percentage distribution of the number of operational holdings for all social groups in 2005-06 and 2010-11 from the government website [1].

After retrieving the comma-separated values (CSV) file from the government website, we copied it to a Linux machine and then from there into HDFS. The following command copies the file from the local /root directory to the HDFS directory named PARAG. The copyFromLocal option of hadoop fs copies a file from the local Linux file system into an HDFS directory.

hadoop fs -copyFromLocal /root/state_data.csv /PARAG

Figure 3a
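Once the file is in HDFS, Pig would typically read it with a LOAD statement, using PigStorage(',') as the delimiter and a declared schema. As a rough illustration of what such a load step does, the following Python sketch parses comma-separated rows into typed records. The column names (state, marginal_2005, small_2005, marginal_2010, small_2010) and the two sample rows are hypothetical placeholders; they are not the actual layout or values of state_data.csv.

```python
import csv
import io
from typing import List, NamedTuple


class HoldingRow(NamedTuple):
    """One record; the field names are assumed, not taken from the real file."""
    state: str
    marginal_2005: float
    small_2005: float
    marginal_2010: float
    small_2010: float


def load_rows(text: str) -> List[HoldingRow]:
    """Parse comma-separated text into typed records, skipping blank lines."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue
        state, *numbers = fields
        rows.append(HoldingRow(state.strip(), *(float(n) for n in numbers)))
    return rows


# Made-up sample values for illustration only (not real census figures).
SAMPLE = "STATE_A,10.0,20.0,30.0,40.0\nSTATE_B,55.5,12.5,60.0,15.0\n"

records = load_rows(SAMPLE)
for record in records:
    print(record)
```

Declaring the field types up front plays the same role as a schema in a load statement: later steps can compare numeric columns directly instead of comparing strings.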


Step-2 Display the loaded data

We can use the DUMP statement to display the data in the Grunt shell.
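DUMP prints each record as a parenthesized, comma-separated tuple. The short Python sketch below mimics that display format for readers without a Pig installation at hand. The helper name format_tuple and the sample records are hypothetical and the values are made up.

```python
def format_tuple(fields):
    """Render one record as a parenthesized, comma-separated tuple."""
    return "(" + ",".join(str(field) for field in fields) + ")"


# Made-up placeholder records, not real data from the government file.
records = [("STATE_A", 10.0, 20.0), ("STATE_B", 55.5, 12.5)]

lines = [format_tuple(record) for record in records]
for line in lines:
    print(line)
```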

Figure 4: State-wise percentage distribution of the number of operational holdings

Step-3 Filter Specific Data

To analyze data, we can use filters or aggregate functions. Here, we filter the data for the state of Gujarat.
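In Pig, a row selection like this is normally written with the FILTER operator and a condition on the state column. The Python sketch below shows the same idea on in-memory rows. The field names and numbers are hypothetical placeholders, and the case-insensitive match is a convenience of this sketch (string comparison in Pig is case-sensitive).

```python
from typing import Dict, List

# Hypothetical rows: field names and numbers are placeholders, not real data.
ROWS: List[Dict[str, object]] = [
    {"state": "Gujarat", "marginal_2005": 1.0, "small_2005": 2.0},
    {"state": "STATE_B", "marginal_2005": 3.0, "small_2005": 4.0},
]


def filter_by_state(rows, state_name):
    """Keep only rows whose state matches, ignoring case and outer spaces."""
    wanted = state_name.strip().lower()
    return [row for row in rows if str(row["state"]).strip().lower() == wanted]


gujarat_rows = filter_by_state(ROWS, "gujarat")
print(gujarat_rows)
```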

Figure 5: State-wise data for 2005 where census marginal is more than 50

Finding all state data where census_small of 2005 is more than 30

Figure 6: State-wise data for 2010 where census marginal is more than 80

Finding all state data where census_small of 2010 is more than 30
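The threshold queries in this step all share one pattern: pick a numeric column and keep the rows whose value exceeds a limit. A minimal Python sketch of that pattern follows. The column names small_2005 and small_2010 are assumed stand-ins for the census_small fields, and the rows are made-up values chosen only to exercise the comparison.

```python
# Made-up rows for illustration; they are not real census percentages.
ROWS = [
    {"state": "STATE_A", "small_2005": 35.0, "small_2010": 25.0},
    {"state": "STATE_B", "small_2005": 10.0, "small_2010": 31.5},
    {"state": "STATE_C", "small_2005": 30.0, "small_2010": 30.0},
]


def states_above(rows, column, threshold):
    """Return state names whose value in `column` is strictly greater than threshold."""
    return [row["state"] for row in rows if row[column] > threshold]


above_2005 = states_above(ROWS, "small_2005", 30)
above_2010 = states_above(ROWS, "small_2010", 30)
print(above_2005)
print(above_2010)
```

The comparison is strict, so a state sitting exactly at the threshold (STATE_C here) is excluded, which matches the wording "more than 30".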

Figure 7: State-wise data for 2010 where census small is more than 30

Analysis of Unstructured Agricultural Data Using Pig

Conclusion

Using Pig, we analyzed structured agricultural data: the state-wise percentage distribution of the number of operational holdings for all social groups in 2005-06 and 2010-11. The need for data analysis is increasing rapidly day by day, and we used this government agricultural data to demonstrate how the Big Data tool Pig can be applied to such analysis.

Analyzing data is a necessity for many organizations. Data is created in different forms and in a diversified manner: it can be structured, semi-structured, or unstructured. To analyze diversified agricultural data, we can use Big Data tools such as Pig. Pig is a platform for data analysis that can handle many varieties of data. Its biggest advantages are that it can process diversified data quickly and that it supports user-defined functions. A typical use case for Pig is ETL: data is extracted from its sources, transformed, and then loaded into a data warehouse.

Acknowledgment

We wish to thank the Open Government Data (OGD) Platform for providing the data for analysis, and we sincerely thank our mentor.

References

  1. https://data.gov.in/resources/state-wise-percentage-distribution-number-operational-holdings-all-social-groups-during
  2. Apache Pig, https://pig.apache.org/
  3. Apache Pig architecture and components of Pig, https://www.tutorialspoint.com/apache_pig/apache_pig_architecture.htm
  4. Pig Philosophy, https://pig.apache.org/philosophy.html
  5. Hive vs Pig, http://www.bigdataanalyst.in/hive-vs-pig/
  6. Seema Acharya, Subhashini Chellapan, Big Data and Analytics, Wiley Publication
  7. Birendra Goswami, Pradip Kumar Chandra, "The Evolution Of Big Data As A Research And Development", International Journal of Scientific Research and Engineering Studies (IJSRES), Volume 2, Issue 3, March 2015, ISSN: 2349-8862
  8. Open Government Data (OGD) Platform, https://data.gov.in/

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.