Showing posts with label Hadoop. Show all posts

Wednesday, August 3, 2022

How is Bigdata handled, RDBMS or NoSQL?

Here is a reasonable article comparing SQL and NoSQL. It also covers the differences between RDBMS and document databases.

https://phoenixnap.com/kb/sql-vs-nosql

While Amazon has its own DocumentDB, MongoDB is used in a lot of places (Forbes, Toyota, etc.), and Amazon's DocumentDB is compatible with MongoDB.

Of course, Microsoft's SQL Server is a mature product, and it can even handle BigData through Polybase data virtualization. Using external tables, you can query data from SQL Server, Oracle, Teradata, MongoDB, and other data sources.

https://docs.microsoft.com/en-us/sql/big-data-cluster/big-data-options?view=sql-server-ver16

Connectivity to HDFS now uses published REST APIs instead of the Java Hadoop client. All you need to do is configure the connectors while configuring Azure Storage.
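To give a feel for what talking to HDFS over REST looks like, here is a minimal Python sketch that builds request URLs for WebHDFS, the REST API that Hadoop itself publishes. This is only an illustration: the post does not say which endpoints the Microsoft connectors call, and the host name, port, and path below are made-up placeholders.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, params=None):
    """Build a WebHDFS REST URL of the form
    http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>&...
    """
    query = {"op": op}
    if params:
        query.update(params)
    # quote() keeps the slashes in the HDFS path and escapes anything unsafe.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


# Hypothetical NameNode address and directory, for illustration only.
list_url = webhdfs_url("namenode.example.com", 9870, "/data/logs", "LISTSTATUS")
open_url = webhdfs_url(
    "namenode.example.com", 9870, "/data/logs/day1.csv", "OPEN",
    {"user.name": "hdfs"},
)

print(list_url)
print(open_url)
```

An HTTP GET against the first URL would list a directory and the second would read a file, with no Java Hadoop client involved on the calling side.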

Here is a schematic from Microsoft's documentation of BigData storage and processing in the Microsoft platform.


Also, storing data by itself is not sufficient, and Microsoft has Power BI, which supports visualization of data from a huge number of database products. It is hard to beat Microsoft at this game.

I am somewhat slanted towards Microsoft due to my association with Microsoft database products for a long time. I have not received any remuneration from Microsoft for this post.

Friday, August 5, 2016

What is needed to leverage R from Visual Studio?

Microsoft's strategy is to bind all its assets and stay in a continuous state of integration, be it SQL Server, Office, or its communication products such as Exchange, Live, or Skype.

With the addition of R technology for SQL Server in its most recent, highly touted product SQL Server 2016, integration of R with SQL Server and by default with Visual Studio 2015 was a natural extension. Probably integrating R with Excel is not far behind.

As Microsoft moves forward embracing open source concepts, it has made Microsoft R Client free. The details follow:
----------
Microsoft R Client (x64) - (English)
SHA1: B5A7053C7CBC1079091DD1420D04E0489F43AD00
File name: en_microsoft_r_client_x64_8839107.exe
--------------
You can download R Client from here:
http://aka.ms/rclient/download

With Microsoft R Client you are not limited to Microsoft R Open, and you can use any open source R package. ScaleR is a powerful new technology in Microsoft R Client. ScaleR's proprietary functions can be used to great advantage for parallelization and remote computing. With R Client you can work locally using ScaleR, but you are somewhat constrained by local memory and speed. This is improved by pushing the compute context to Microsoft R Server (SQL Server R Services) and R Server for Hadoop.

Microsoft R Client is also part of the R Tools for Visual Studio, which get installed with the Visual Studio Update 3 install.

Here are some screen shots from a Visual Studio 2015 Community Update 3 upgrade from Update 2. You can also install the latest version of Visual Studio 2015 Community.

As a default, you get notified of product updates right within Visual Studio 2015 Community (presently Update 2).

 In the Extensions and Updates window you see more details and you can launch the installation from here by clicking the Update button.


This is an intermediate window during updating. R Tools 0.3 for Visual Studio 2015 are being applied to Visual Studio.
This is the Help screen of Visual Studio 2015 after the update. I am not sure why the window is still showing Update 2! However, notice that R Tools have been added.

R Tools appear to provide a comprehensive set of programming support for using R with Visual Studio 2015.

An R Tools item is added to the menu, as shown.

It looks like there are two independent ways of getting Microsoft R Client: the hyperlink, which downloads the executable to your computer, or installing or upgrading Visual Studio 2015.

Please note that before upgrading to Visual Studio 2015 Community Update 3, I might have started the independent client installation. I will have to check on that.

Here is an image from the R Tools site.



Wednesday, October 7, 2015

What is Azure Data Lake?

The recently announced Azure Data Lake addresses the big data 3V challenges: volume, velocity, and variety. It is one more storage feature in addition to blobs and SQL Azure database. Azure Data Lake (should have been Azure Data Ocean IMHO) is really omnipotent. Just look at the key capabilities of Azure Data Lake:

Any Data
Native format, distributed data store. No need to pre-define schema information. From unstructured to structured data handling.

Any Size
Kilobytes to exabytes are OK. Ready for read/write.

At any scale
Scale to match your needs; high volume data handling of small writes and low latency. Can address near real-time web analytics scenarios.

HDFS Compatible
Works out of the box with Hadoop, including services such as HDInsight.

Full integration with Azure Active Directory
Supporting identity and access management over all of the data.

Azure Data Lake Store is therefore a hyper-scale HDFS repository designed specifically for big data analytics in the cloud. It is tailor-made for IoT and throughput-intensive analytics on high volume data.

Read more here.
The graphic is from a Microsoft TechNet site.
I checked the preview portal (https://portal.azure.com/) and I do not see it there. Possibly it will appear by the end of the year.

Thursday, June 4, 2015

What is Polybase?

It is a Microsoft Data tool. It simplifies management of relational and non-relational data with the ability to query both.

You want to query non-relational data. Do you modify it and bring it into SQL Server which is relational and then query it? Or do you buy another product to query non-relational data (like data in Hadoop, blobs and files)?

Well, Polybase provides the capability to query non-relational data in situ from SQL Server using T-SQL. You need not move the data over to SQL Server, although SQL Server gives you the option to store it there if you want to do so. Polybase is supported out of the box in SQL Server 2016 CTP2, which means it will be available in SQL Server 2016.

Polybase was not supported out of the box in earlier versions. Of course, Polybase can process queries whether the data is on the premises or in the cloud.
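As a rough sketch of what querying in situ involves, the Python snippet below assembles the three T-SQL statements typically used to expose a Hadoop folder as an external table in SQL Server 2016: an external data source, an external file format, and the external table itself. The object names, HDFS address, folder, and columns are all hypothetical, and the snippet only prints the statements; it does not connect to a server.

```python
def external_table_statements(data_source, hdfs_location, file_format,
                              table, columns, folder):
    """Return the T-SQL statements (as strings) that define a Polybase
    external table over comma-delimited text files stored in Hadoop."""
    column_list = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return [
        # 1. Where the non-relational data lives.
        f"CREATE EXTERNAL DATA SOURCE {data_source}\n"
        f"WITH (TYPE = HADOOP, LOCATION = '{hdfs_location}');",
        # 2. How the files are laid out.
        f"CREATE EXTERNAL FILE FORMAT {file_format}\n"
        "WITH (FORMAT_TYPE = DELIMITEDTEXT,\n"
        "      FORMAT_OPTIONS (FIELD_TERMINATOR = ','));",
        # 3. A relational shape over the files; the data stays in Hadoop.
        f"CREATE EXTERNAL TABLE {table} (\n    {column_list}\n)\n"
        f"WITH (LOCATION = '{folder}',\n"
        f"      DATA_SOURCE = {data_source},\n"
        f"      FILE_FORMAT = {file_format});",
    ]


# All names and addresses below are made up for illustration.
statements = external_table_statements(
    data_source="MyHadoopCluster",
    hdfs_location="hdfs://namenode.example.com:8020",
    file_format="CsvFormat",
    table="dbo.SensorReadings",
    columns=[("SensorId", "INT"), ("ReadingTime", "DATETIME2"), ("Value", "FLOAT")],
    folder="/data/sensors/",
)

for statement in statements:
    print(statement)
    print()
```

Once such a table exists, an ordinary SELECT against dbo.SensorReadings can be written in T-SQL and joined with regular SQL Server tables.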

Here is a rough schematic of what it is about.