Python is one of the most popular programming languages. It is a general purpose high level language. It was created by Guido van Rossum, publicly released in 1991. SQL Server 2016 started to support R, and SQL Server 2017 adds support for Python. Now you can select your preferred language for the data science and even other tasks. R has even more statistical, data mining and machine learning libraries, because it is more widely used in the data science community; however, Python has broader purpose than just data science, and is more readable and might thus be simpler to learn. This is the first of the four articles that introduce Python to SQL Server developers and business intelligence (BI) specialists. This means that the articles are more focused on Python basics and data science, and less on general programming with Python.
Starting with Python
Python is an interpreted language. The philosophy of the language is about the code readability. For example, you use white spaces to delimit code blocks instead of special characters like semicolon or curly brackets. Python supports automatic memory management. It has a dynamic type system. You can use multiple program paradigms in Python, including procedural, object-oriented, and functional programming. You can find Python interpreters for all major operating systems. The reference implementation of Python, namely CPython, is open source software, managed by the non-profit Python Software Foundation. Of course, being open source, also means that there is a reach set of libraries available. Even the standard library is impressive and comprehensive.
In order to start working with Python and R, you need to do some installation. I am not covering general SQL Server and Visual Studio installation, I am just explaining what you need to do to start using Python with SQL Server.
Installing ML Services and VS 2017 for Data Science
You just start SQL Server setup, and then from the Feature Selection page select Database Engine Services, and underneath Machine Learning (ML) Services (In-Database), with Python only, or both languages, R and Python, selected. After that, all you need are client tools, and you can start writing the code. The following figure shows the SQL Server setup Feature Selection page with appropriate features selected.
The next step is installing client tools. Of course, you need SQL Server Management Studio (SSMS). In addition, you might want to install Visual Studio (VS) 2017. You can use either Professional or even free Community edition to develop python (and also R) code.
When installing Visual Studio 2017, be sure to select Python development workload, and then Data science and analytical applications, like the following figure shows. This will install Python language templates, including data science templates, and also R Tools for Visual Studio.
Selecting the Appropriate Python Engine
There you go, you are nearly ready. There is a small trick here. VS 2017 installs also its own Python interpreter. In order to use the scalable, the one installed with SQL Server, the one that enables executing code in the Database Engine context and includes Microsoft scalable libraries, you need to setup an additional Python environment, pointing to the scalable version of the interpreter. The path for this scalable interpreter is, if you installed the default instance of SQL Server, C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\python.exe. You can see how to setup this environment in the following figure.
That’s it. You are ready to start Python programming. Just start a new project and select the Python Application template from the Python folder. You can also explore the Python Machine Learning templates, which include Classifier, Clustering, and Regression projects. If you selected the Python Application template, you should have open the first empty Python script with default name the same as the project name and default extension py, waiting for you to write and interactively execute Python code.
Python Language Basics
Python uses the hash mark for a comment. You can execute Python code in VS 2017 by highlighting the code and simultaneously pressing the Ctrl and Enter keys. You can use either single or double apostrophes for delimiting strings. The first command you will learn is the print() command. Write and execute the following code:
# Hash starts a comment print("Hello World!") # This line ignored print('Printing again.') print('O"Hara') # In-line comment print("O'Hara")
You can observe the code you wrote and the results in the Interactive window, which is by default below the script window, at the bottom left side of the screen.
Python supports all basic mathematical and comparison operators, like you would expect. The following code introduces them. Note that you can combine strings and expressions in a single print() statement.
1 + 2 print("The result of 3 + 20 / 4 is:", 3 + 20 / 4) 10 * 2 - 7 10 % 4 print("Is 7 less or equal to 5?", 7 <= 5) print("Is 7 greater than 5?", 7 > 5)
The next step is to introduce the variables. Note that Python is case-sensitive. The following code shows how you can assign values to variables and use them for direct computations and as the arguments of a function.
# Integer a = 2 b = 3 a ** b # Float c = 7.0 d = float(5) print(c, d)
You can define strings inside double or single quotes. This enables you to use single quotes inside a double-quoted string, and vice-versa. You can use the “%?” operator for formatting strings to include variables, where the question mark stands for a single letter denoting the data type of the variable, for example “s” for strings and “d” for numbers. The str.format() method of the string data type allows you to do variable substitutions in a string. Here are some examples.
e = "String 1" f = 10 print("Let's concatenate string %s and number %d." % (e, f)) four_cb = "String {} {} {} {}" print(four_cb.format(1, 2, 3, 4))
The result of the previous code is:
Let’s concatenate string String 1 and number 10.
String 1 2 3 4
You can also create multi-lines strings. Just enclose the strings in a pair of three double quotes. You can also use special characters, like tab and line feed. Escape them with a single backslash character plus a letter, for example letter t for a tab and letter n for a line feed.
You can always get interactive help with the help() command. A Python module is a file with default extension .py containing Python definitions and statements. You can import a module into your current script with the import command, and then use the functions and variables defined in that module. Besides modules provided with the installation, you can, of course, develop your own modules, distribute them, and reuse the code.
Using Functions, Branches, and Loops
Like in any serious programming language, you can encapsulate your code inside a function. You define a function with the def name(): command. Functions can use arguments. Functions can also return values. The following code defines two functions, one that has no arguments, and one that has two arguments and returns a value. Note that there is no special ending mark of a function body – the correct indentation tells the Python interpreter where the body of the first function ends, and the definition of the second function starts.
def p_n(): print("No args...") def add(a, b): return a + b
When you call a function, you can pass parameters as literals, or through variables. You can also do some manipulation with the variables when you pass them as the arguments to a function. The following code shows these possibilities.
p_n() # Call with variables and math a = 10 b = 20 add(a / 5, b / 4)
You can make branches in the flow of your code with the if..elif..else: statement. The following code shows you and example.
a = 10 b = 20 c = 30 if a > b: print("a > b") elif a > c: print("a > c") elif (b < c): print("b < c") if a < c: print("a < c") if b in range(10, 30): print("b is between a and c") else: print("a is less than b and less than c")
The results of the code are:
b < c a < c b is between a and c
The simplest data structure is the list. Python list is a set of comma-separated values (or items) between square brackets. You can use a for or for each loop to iterate over a list. There are many methods supported by a list. For example, you can use the list.append() method to append an element to a list. The following code shows how to create lists and loop over them with the for and foreach loops. Finally, it shows a while loop.
animals = ["cat", "dog", "pig"] nums = [] for animal in animals: print("Animal: ", animal) for i in range(2, 5): nums.append(i) print(nums) i = 1 while i <= 10: print(i) i = i + 1
The last data structure presented in this introduction article is the dictionary. A dictionary is a set of the key – value pairs. You can see an example of a dictionary in the following code.
states = { "Oregon": "OR", "Florida": "FL", "Michigan": "MI"} for state, abbrev in list(states.items()): print("{} is abbreviated {}.".format(state, abbrev))
I mentioned that in Python you can also use object-oriented paradigm. However, going deeper with object-oriented programming with Python is beyond the scope of this and the following articles.
Conclusion
I guess the programming in Python introduced so far was not over exciting. However, you always need to start with basics, and only after you embrace the basics, the exciting part starts. Therefore, don’t miss my next article, when I will introduce the most important data structure for advanced analytics, the data frame structure.
Read the whole series:
Python for SQL Server Specialists Part 2: Working with Data
Python for SQL Server Specialists Part 3: Graphs and Machine Learning
Python for SQL Server Specialists Part 4: Python and SQL Server
This is just an introduction, for more on starting with data science in SQL Server please refer to the book “Data Science with SQL Server Quick Start Guide” (https://www.packtpub.com/big-data-and-business-intelligence/data-science-sql-server-quick-start-guide), where you will learn about tools and methods not just in Python, but also R and T-SQL.