Showing posts with label Unstructured data. Show all posts
Showing posts with label Unstructured data. Show all posts

Saturday, October 10, 2015

What is U-SQL?

While SQL covered the RDBMS landscape U-SQL covers a much larger data landscape.

At the same time as the announcement of Azure Data Lake Services, a new language under development at Microsoft, the U-SQL language was also announced. For the Azure Data Lake Service and what it means to business read here.

With the advent of Big Data and the task of mining all kinds of data, RDBMS suddenly found itself at a disadvantage. Structured Query Language (SQL) could only address what is in a relational data store. U-SQL was born to address this challenge posed by Big Data defined by volume, velocity and variety.

What is U-SQL
U-SQL deep dives into Big Data to extract the most relevant information. It is a powerful language (in the words of Microsoft):
  • Process any type of data. From analyzing BotNet attack patterns from security logs to extracting features from images and videos for machine learning, the language needs to enable you to work on any data.
  • Use custom code easily to express your complex, often proprietary business algorithms. The example scenarios above may all require custom processing that is often not easily expressed in standard query languages, ranging from user defined functions, to custom input and output formats.
  • Scale efficiently to any size of data without you focusing on scale-out topologies, plumbing code, or limitations of a specific distributed infrastructure.
Compared to HIVE, a SQL-Based language, U-SQL is flexible and does not have the limited capability to address the 'variety' in non-structured data requiring schema generation prior to running queries. U-SQL should prove more easy to use than Hive for complex scenarios.

U-SQL has been designed as declarative SQL based language with native extensibility through user code in C#. This approach:
  • Unifies SQL and C#
  • Unifies structured and Unstructured
  • Unifies declarative and custom code
U-SQL is based on SCOPE which is based on  existing prior languages, ANSI-SQL, T-SQL and HIVE. U-SQL should present a less steeper curve for those who are using SQL already.

U-SQL is an important development that developer need to jump on.

Wednesday, October 7, 2015

What is Azure Data Lake?

Recently announced Azure Data Lake addresses the big data  3V challenges; volume, velocity and variety. It is one more storage feature in addition to blobs and SQL Azure database. Azure Data Lake (should have been Azure Data Ocean IMHO) is really omnipotent. Just look at the key capabilities of Azure Data Lake:

Any Data
Native format, distributed data store. No need to pre-define schema information. From unstructured to structured data handling.

Any Size
Kilo bytes to Exa bytes OK. Ready for read/write.

At any scale
Scale to match your needs; high volume data handling of small writes and low latency. Can Aaddress near real-time web analytics scenarios.

HDFS Compatible
Works out-of-the box with Hadoop including services such as HD Insight

Full integration with Azure Active Directory
Supporting identity and access management over all of the data.

Azure Data Lake Store  is therefore a hyper-scale HDFS repositiory designed specifically for big data analytics in the cloud. It is order made for IoT and thorughput-intensive analytics for high volume data.

Read more here.
The graphic is from a  Microsoft Technet site
I checked out the preview portal (https://portal.azure.com/), I do not see it. Possible by the end of the year.

Thursday, June 4, 2015

What is Polybase?

It is a Microsoft Data tool. It simplifies management of relational and non-relational data with the ability to query both.

You want to query non-relational data. Do you modify it and bring it into SQL Server which is relational and then query it? Or do you buy another product to query non-relational data (like data in Hadoop, blobs and files)?

Well Polybase provides the capability to query non-relational data in-situ using the SQL Server using T-SQL. You need not move the data over to SQL Server although SQL Server gives the option to store in SQL Server if you want to do so. Polybase is supported out of the box in SQL Server 2016 CTP2 which means it will be available in SQL Server 2016.

Polybase was not supported out of the box in earlier version. Of course Polybase can process the queries whether it is on the premises or in the cloud.

Here is a rough schematic of what it is about.