# Basics of programming with Python
This tutorial covers the basics of how a Python program is built.

## 1. comments

In most cases, python syntax is quite intuitive and is similar to "pseudocode".
Comments are important to make your code readable for others, but especially for yourself when you come back to it later in your thesis!
In notebooks and many editors there is a shortcut to comment lines in or out quickly: in Jupyter it is Ctrl+/ (Ctrl+# on a German keyboard layout).

In [None]:
# comments are inserted like this

## 2. variable assignments and simple calculator operations

In [None]:
# assign a variable
a = 1

# multiple assignment
b = c = d = 0

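# common math operators
e = a + b + c + d  # addition: 1
f = 2*a            # multiplication: 2
g = f**2           # power: 4
h = 2/g            # division always returns a float: 0.5

# modulo operator (remainder of the division)
res = 13%5         # 3
# integer division (rounds down)
h = 2//g           # 0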
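Integer division and modulo belong together: for any nonzero divisor, `num == (num // divisor) * divisor + num % divisor`. A short check of this relation, using the built-in `divmod()` to get both results at once (the variable names are just examples):

```python
# integer division and modulo of 13 by 5
num, divisor = 13, 5
quotient = num // divisor   # 2
remainder = num % divisor   # 3

# divmod() returns both results at once as a tuple
print(divmod(num, divisor))  # (2, 3)

# quotient and remainder always recombine to the original number
print(quotient * divisor + remainder == num)  # True

# a typical use of modulo: testing divisibility
print(12 % 4 == 0)  # True
```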

### print(...) - function for variable display
All Python variables (and most other objects) can be displayed during execution using the print(variable) function.

In [None]:
# display variables e.g. using print(...)
print(h)
print(h, g, a)

## 3. python data type handling
Python is dynamically typed: variables do not have a fixed type, and numbers are converted as needed and "on the fly" (e.g. an int becomes a float when you divide it). Variables can be reassigned with a new value of any type at any time, so be careful with your variable names, as reusing them can lead to very confusing errors!
The data type of something can be checked with type(...).

### numbers

In [None]:
# int and float are converted into each other as needed: division with / always returns a float
h = 2
type(h)

In [None]:
h = h/2
type(h)
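The automatic conversion only applies between numeric types. Strings and numbers are not mixed automatically; you convert explicitly with `int()`, `float()` and `str()`. A small sketch:

```python
# explicit conversions between types
x = int('42')       # string -> int
y = float('2.5')    # string -> float
z = int(3.9)        # float -> int truncates towards zero: 3
s = str(7)          # int -> string
print(x, y, z, s)   # 42 2.5 3 7

# no automatic conversion between strings and numbers
try:
    '1' + 1
except TypeError as err:
    print('TypeError:', err)
```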

### strings

In [None]:
# strings 
myText = 'this is a string'
print(type(myText))

# strings themselves cannot be changed, but += builds a new, longer string, so they feel like dynamic strings in C++:
myText += ' for example' # myText += addition is equivalent to myText = myText + addition
print(myText)

# some operations can be performed on strings
go = 'Go! '
print(3*go)

In [None]:
# useful string operations
# join a list of strings into one long string separated by e.g. ','
', '.join(['Dortmund', 'Düsseldorf', 'Köln'])

In [None]:
# the inverse
# by default, it splits at any whitespace (spaces, tabs, newlines)
'Dortmund, Düsseldorf, Köln'.split()

In [None]:
# custom separator
'Dortmund, Düsseldorf, Köln'.split(', ')

In [None]:
# clean up a string from leading or trailing whitespaces (yes, you might need this)
'   Dortmund, Düsseldorf, Köln  '.strip()

### string format
Formatting a variable (or almost anything else) into a string for printing, saving to a text file, etc. is very common in Python. It can be done in one of the following three ways:

In [None]:
# numbers can be inserted into strings in several ways:
# casting and appending:
print('Cast and append:')
print('the answer is ' + str(42))
# print() also accepts several arguments and separates them with a space
print('the answer is', 42)
print('\n')

#### 'new style' string format
Using the 'new style' string format, you insert a {} placeholder into your string and then call .format(var) on it to put the value of var there: '{}'.format(var). Since LaTeX also uses curly braces, this notation can cause trouble when you use LaTeX at the same time (e.g. in plot labels): literal braces then have to be doubled as {{ and }}. There are further ways to format strings, some of which you may already know or will learn about later.
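A typical case where braces clash is a LaTeX label, for example for a plot axis. In a format string, `{{` and `}}` produce literal braces, while `{}` is still replaced. A minimal sketch (the label text is only an example):

```python
# doubled braces {{ and }} produce literal braces in .format strings
exponent = 2
label = r'$\alpha^{{{}}}$'.format(exponent)
print(label)  # $\alpha^{2}$

# the same escaping applies to inline f-strings (rf = raw + formatted)
label_f = rf'$\alpha^{{{exponent}}}$'
print(label_f)  # $\alpha^{2}$
```

The `r` prefix keeps backslashes such as `\alpha` from being interpreted as escape sequences.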

In [None]:
# string format:
print('String format:')
'this is number {}'.format(1)

In [None]:
# You can use positional indices for the string formatting
'{0} {1}'.format('We are', 'learning programming')

In [None]:
'{1} {0}'.format('We are', 'learning programming')

In [None]:
# arguments' indices can be repeated
'{0} {1} {2}{3}{2}'.format('We are', 'learning programming', '!!', '!1!1')   

In [None]:
# after the :, you can specify how a number is supposed to be formatted for printing:
line_1 = 'i want it displayed like this: {:03}'.format(4)
line_2 = 'or maybe like this: {:.4f}'.format(4)
line_3 = 'or with the example above: {} / {}'.format(1, 10)

print(line_1)
print(line_2)
print(line_3)

#### inline formatting
If you don't want to write .format(...) after every string, you can use inline formatting (f-strings, available since Python 3.6) by prepending an 'f' to the string. You may also come across the 'old' %-style formatting in older code.

In [None]:
f'This is an inline formatted string showing line_1: {line_1}'

In [None]:
# this works but it looks weird
integer_number = 15
float_number = 2.12353514
f'Inline formatting can also format numbers like {integer_number:04} or {float_number:.2f}'
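To compare the approaches side by side, the sketch below builds the same string with casting and appending, `.format`, an f-string and the old %-style. Note that `round()` plus `str()` only matches the others here by coincidence; format specifiers like `.2f` always print exactly two decimals.

```python
value = 3.14159

# 1. casting and appending
s1 = 'pi is about ' + str(round(value, 2))
# 2. 'new style' .format
s2 = 'pi is about {:.2f}'.format(value)
# 3. inline f-string
s3 = f'pi is about {value:.2f}'
# the 'old' %-style, which you may still find in older code
s4 = 'pi is about %.2f' % value

print(s1)              # pi is about 3.14
print(s2 == s3 == s4)  # True
```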

### bool

In [None]:
A = True
B = False

# compare values with == or !=
print(A == B)
print('The answer for (A != B) is:', (A != B))

# greater than, less than, and greater/less than or equal
print(2 <= 2)
print(3 > 2)

In [None]:
# logical operators are quite intuitive
print(True and False)
print(True or False)
print(not True)

# quite useful to test data types: isinstance(var, type)
a = False
print('Is this data a bool?', (isinstance(a, bool)))

### list
Lists work much like dynamic arrays in C++ (e.g. std::vector), except that they can contain elements of any data type. Elements are accessed by index in C++ style, starting at 0.

In [None]:
a = [ 1, 'number', True]
print(a)
print(a[1])

# 'slices' can be obtained with ':'. The element at the end index is not included!
# without a specified start or end, the beginning or end of the list is used
print(a[0:1])
print(a[1:])

# indices can be negative, they are counted from the end of the list then
print(a[-1])

# often useful:
print(len(a))

In [None]:
# useful in many occasions:
print(1 in [2,3,4,5])
print(1 in [1,2,3,4,5])

In [None]:
print('part' in 'mywordlistparts')

### Example 3.1: what do these do?
Experiment a bit with these to understand them!

In [None]:
print('mywordlistparts'[::-1])
print('mywordlistparts'[::-2])
print('mywordlistparts'[3::][::-1])
print('mywordlistparts'[3::-1])


### Example 3.2: Manual list sorting
Sort each of the following lists from smallest to largest number. Use only indexing and slicing, including the third *step* parameter (e.g. for reversing):
- [9, 0, 8, 1, 7, 2, 6, 3, 5, 4]
- [4, 5, 6, 0, 1, 3, 2, 7, 9, 8]

### inplace operations
List methods such as append(), extend() and remove() modify the list in place and return None instead of a new list. This can be handy, but it can also lead to strange results: print(a.append(1)), for example, prints None.
When you write your own functions, it is usually clearer to return a new object than to modify the input in place.
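A short sketch of this in-place behaviour, and of the built-in `sorted()` as the non-modifying alternative to the `sort()` method:

```python
numbers = [3, 1, 2]

# list methods like append() change the list and return None
result = numbers.append(0)
print(result)    # None
print(numbers)   # [3, 1, 2, 0]

# sorted() leaves the original alone and returns a new, sorted list
new_list = sorted(numbers)
print(new_list)  # [0, 1, 2, 3]
print(numbers)   # [3, 1, 2, 0]

# sort() sorts in place; a common mistake is numbers = numbers.sort(),
# which would replace the list with None
numbers.sort()
print(numbers)   # [0, 1, 2, 3]
```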

In [None]:

# lists can be extended in place
a.append('New element')
print(a)
#print('the list a has the length:', len(a)) # to see this, first re-run the cell below 'list' to reset a

# appending a list leads to a list in a list...
a.append([2, 3])
print(a)
#print('the list a has the length:', len(a)) # to see this, first re-run the cell below 'list' to reset a

# very useful: remove() deletes the first element equal to its argument
a.remove([2, 3])
print(a)

# lists can be merged easily: + creates a new list ...
print(a+[2,3])
# ... while extend() adds the elements to a in place
a.extend([2,3])
print(a)

In [None]:
# useful boolean operations for lists:
any([True, False, False])

In [None]:
all([True, False, False])

In [None]:
# other useful built-in functions that can apply to lists
a = [1,4,3,2,5,2,3,4,1,2,3]
print("This list has length", len(a) )

print("The max value is", max(a))

print("The min value is", min(a))

print("The sorted list is: ", sorted(a))

print("The sum of the list is: ", sum(a))

print("The unique entries of the list are: ", set(a))

# zip() can be extremely useful if you want to "draw" entries from two lists at the same time; it stops at the end of the shorter list
b = ['category a', 'category b', 'category c', 'category d']
print(list(zip(a, b)))

In [None]:
# just so you know:
print( len('teststring') )
print( max('teststring') )
print( min('teststring') )
print( sorted('teststring') )

### Example 3.3: string sorting
How does the string 'TESTtest' sort? [Why is that?](https://home.unicode.org/)

### dict
dicts are probably one of the greatest things in Python! Like lists, they can contain any Python object. The main difference is that you don't access the items in a dict by a position index but by a key (e.g. a string or a number). This makes storing data in them easy, intuitive and compact.

Dict-like structures are the basis of many modern NoSQL ("not only SQL") databases: key-value and document databases like Amazon's DynamoDB or MongoDB are built on similar principles. Graphs, for example in social media data analysis, are also often stored in dict-like structures.

In [None]:
# dicts are useful to have a single variable holding different alternatives you can call from
myDict = {
    'raw_data': 'fit_linear',
    'processed_data': 'fit_quadratic'
 }

# this can replace a lot of if/else code and makes it easy to pass options around, for example

In [None]:
myDict['raw_data']

In [None]:
# you can literally store any python object in dicts
d = {
    'hello': "Hello there",
    'name': 'TestUser',
    'print': print,
    print: 'This is a print function'
 }

In [None]:
# if a dict value is a function, you can call it like you normally would.
# possible application: fit data sets with different fit functions depending on their filename
# And: Nothing stops you from using functions as dict key
d['print'](d['hello'])
d['print'](d['name'])
d[print]
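Combining both ideas, a dict can map the kind of data set directly to the function that should handle it, replacing an if/elif chain. A minimal sketch; `fit_linear` and `fit_quadratic` here are hypothetical stand-ins that only evaluate a simple model, not real fit routines:

```python
# hypothetical model functions standing in for real fit routines
def fit_linear(x):
    return 2 * x + 1

def fit_quadratic(x):
    return x ** 2

# map each kind of data set to the function that should handle it
fit_functions = {
    'raw_data': fit_linear,
    'processed_data': fit_quadratic,
}

# pick the function by key instead of writing if/elif branches;
# prints "raw_data 7" and "processed_data 9"
for data_kind in ['raw_data', 'processed_data']:
    model = fit_functions[data_kind]
    print(data_kind, model(3))
```

Adding a new kind of data set then only needs a new dict entry, not a new branch.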

In [None]:
# some important methods when working with dicts:
myDict.keys()

In [None]:
myDict.values()

In [None]:
myDict.items()
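Besides these methods, you will often add entries, check for keys and read values with a fallback. A short sketch (the `settings` dict is only an example):

```python
settings = {'bins': 10}

# add a new entry or overwrite an existing one by assigning to a key
settings['color'] = 'blue'
settings['bins'] = 20

# 'in' checks the keys, not the values
print('color' in settings)  # True
print('blue' in settings)   # False

# get() returns a default instead of raising a KeyError for missing keys
print(settings.get('label', 'no label'))  # no label

print(settings)  # {'bins': 20, 'color': 'blue'}
```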

## 4. Functions
Functions keep code short and let you reuse it instead of copying lines. The syntax for a function is shown below:

In [None]:
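# 1. A function definition starts with the keyword 'def'.
# 2. The keyword is followed by the function name.
# 3. The function arguments follow in parentheses, then a colon.
# 4. The indented block below is the function body; 'return' passes the result back to the caller.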
def function(argument1, argument2):
    value = argument1 + argument2
    return value

print(function(2, 3))

# you can return multiple values (they are returned as a tuple) ...
def function2(argument1, argument2):
    value = argument1 + argument2
    return value, argument1, argument2

print(function2(2, 3))

# ... and directly assign them to different variables
a, b, c = function2(2, 3)
print(a, b, c)
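Arguments can also get default values in the definition and be passed by name when calling, which keeps calls with many parameters readable. A short illustration with a hypothetical function `scale`:

```python
# 'factor' has a default value and may be omitted when calling
def scale(value, factor=2):
    return value * factor

print(scale(5))                   # 10, uses the default factor=2
print(scale(5, 3))                # 15, positional arguments
print(scale(5, factor=10))        # 50, keyword argument
print(scale(factor=10, value=1))  # 10, keywords in any order
```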

### Unpacking variable lists: the * operator
Sometimes you want to pass a long list of parameters to a function:

    y = fit_function(x, a, b, c, d, e, f, g, h)

Typing all of them out can be really annoying. If the values for the parameters a to h are stored in a list, you can 'unpack' them into the function call with *:


In [None]:
def function_with_many_arguments(a, b, c, d, e):
    print("I received the arguments a={}, b={}, c={}, d={}, e={}".format(a, b, c, d, e))
    
function_with_many_arguments(1, 2, 3, 4, 5)

In [None]:
params = [1,2,3,4,5]
function_with_many_arguments(*params)

If you didn't know this, it could probably have saved you some pain in the lab exercises: the commonly used curve_fit function (from scipy.optimize) returns the fitted parameters as an array, which you can unpack directly into your fit function.
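To show the pattern without depending on SciPy, the sketch below uses a hypothetical model `parabola` and a hard-coded list `fitted_params` standing in for the parameters a fit would return. The same idea works for dicts: `**` unpacks a dict into keyword arguments, matching keys to parameter names.

```python
# hypothetical fit model with three parameters
def parabola(x, a, b, c):
    return a * x**2 + b * x + c

# stand-in for the parameter array a fit routine would return
fitted_params = [1.0, -2.0, 0.5]

# * unpacks the list into the positional parameters a, b, c
y = parabola(3.0, *fitted_params)
print(y)  # 3.5

# ** unpacks a dict into keyword arguments, matched by name
named_params = {'a': 1.0, 'b': -2.0, 'c': 0.5}
print(parabola(3.0, **named_params))  # 3.5
```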

## 5. Control Structures
Control structures in Python are quite similar to the ones in C++, except that they look like pseudocode: blocks start after a colon and are defined by indentation instead of braces.

### if/ else

In [None]:
v = 2.7
if v <= 1.5:
    print('A')
elif v <= 2.5:
    print('B')
elif v <= 3.5:
    print('C')
elif v <= 4:
    print('D')
else:
    print('not passed')

### Example 5.1: leap year calculator
Write a function that takes a year and tells you whether it is a leap year. A year is a leap year if it is divisible by 4, except when it is also divisible by 100; years divisible by 400, however, are leap years again. The answer should look similar to:

    leap_year_calc(2000)
    --> "the year 2000 is a leap year!"

Test your function with the years: 2000 (True), 2004 (True), 2018 (False), 2100 (False)

Additional task: also tell the caller the reason.


### for

In [None]:
# very useful: range(x)
for i in range(3):
    print(i)
    
print('\n')
# python classic: loop over lists
primes = [2, 3, 5]
for n in primes:
    print(n)

print('\n')
# you can give the loop variable any name
for prime in primes:
    print(prime)

In [None]:
# you can stop a loop with 'break'
for i in range(10):

    if i == 4:
        break 
        
    print(i)

In [None]:
# you can skip one element of the loop with 'continue'
for i in range(10):
    
    if i == 4:
        continue 
        
    print(i)

In [None]:
print('\n')
# you can loop over strings as well:
for letter in 'word':
    print(letter)

print('\n')
# often very useful: enumerate(x)
for i, letter in enumerate('word'):
    print('The {}. letter is {}'.format(i, letter))
for i, entry in enumerate([1,4,5,8,3]):
    print('The {}. entry in the list is {}'.format(i, entry))

# you can choose where enumerate starts
for i, entry in enumerate([1,4,5,8,3], 1):
    print('The {}. entry in the list is {}'.format(i, entry))

print('\n')
# also usable for dicts:
for key, value in myDict.items():
    print(key, value)

print('\n')
for key in myDict:
    print(myDict[key])

#### iterables
Python objects that can be looped over like this are called 'iterables': lists, strings, dicts, range(...) and files are examples. You will encounter the term here and there, and they are very useful.
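Under the hood, a for loop asks the iterable for an iterator with `iter()` and fetches one element after another with `next()`. A small sketch of this mechanism:

```python
# iter() gives an iterator; next() fetches one element after another
letters = iter('abc')
first = next(letters)
second = next(letters)
print(first, second)  # a b

# a for loop keeps calling next() until nothing is left,
# so it continues where the iterator currently stands
remaining = []
for letter in letters:
    remaining.append(letter)
print(remaining)  # ['c']

# an exhausted iterator raises StopIteration, the signal a for loop uses to stop
try:
    next(letters)
except StopIteration:
    print('no elements left')
```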

#### Example 5.2: Processing multiple objects using lists
Write a for loop that processes all years given in the previous example.


#### Example 5.3: Fibonacci numbers
Write a function that returns a list of Fibonacci numbers up to a given threshold.
Fibonacci numbers are generated by starting from [1,1] and repeatedly appending the sum of the last two elements to the list: [1,1] --> [1,1,2] --> [1,1,2,3] --> ...

### while

In [None]:
i = 1
while i < 4:
    i += 1
print(i)

## 6. Reading and writing files
Reading and writing files is a very common task in programming. Although many packages you will use have their own routines for reading and writing data, we will briefly look at how to do it in plain Python.

In [None]:
file_name = 'simple_data_set.csv'
f = open(file_name, 'r')

In [None]:
# read() loads the whole file content into memory as one string
f.read()

In [None]:
# the file is at its end now, you cannot go on reading
f.read()

In [None]:
# close a file when you are done with it, to release it and, for files opened for writing, to make sure all data is actually written
f.close()

In [None]:
# if you want to read a file twice, you need to go back to the beginning:
f = open(file_name, 'r')
print(f.read()) # print shows each \n as a line break
f.seek(0) # go back to position 0
print(f.read())
f.close()

In [None]:
# in many cases more useful: readline() reads a single line, up to and including the next \n (newline)
f = open(file_name, 'r')
f.readline()

In [None]:
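# for data files, readlines() is often very useful: it returns the remaining lines as a list
lines = f.readlines()
f.close()

# let's print it line by line; each line still ends with \n, so end='' avoids blank lines
for line in lines:
    print(line, end='')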

In [None]:
# a nice way to make sure the file is closed properly
with open(file_name, 'r') as f2:
    for line in f2.readlines():
        print(line)
    
# f2 is closed now

In [None]:
# writing to a file is very similar
# note the 'w' instead of the 'r'. 
# 'w' creates a new file if not present and overwrites existing ones. 'a' can append to existing files.
# if you want to create a new line, you need to append \n to the string (newline command)
with open('my_file_to_write.txt', 'w') as f3:
    f3.write("# this is a test data set\n")
    f3.write("# x f(x)\n")
    for x in range(100):
        f3.write(f"{x} {3*x+2}\n")
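Reading such a file back combines the steps above: skip the comment lines starting with `#`, strip the newline, split each line and convert the strings to numbers. A minimal sketch; to stay self-contained it first writes a short file in the same format, under the hypothetical name `my_small_test_file.txt`:

```python
file_name_small = 'my_small_test_file.txt'  # hypothetical file name for this sketch

# write a small data set in the same format as above
with open(file_name_small, 'w') as f_out:
    f_out.write("# x f(x)\n")
    for x in range(3):
        f_out.write(f"{x} {3*x+2}\n")

# read it back: skip comments, strip '\n', split and convert
xs = []
ys = []
with open(file_name_small, 'r') as f_in:
    for line in f_in:  # a file object is iterable line by line
        if line.startswith('#'):
            continue
        x_str, y_str = line.strip().split()
        xs.append(float(x_str))
        ys.append(float(y_str))

print(xs)  # [0.0, 1.0, 2.0]
print(ys)  # [2.0, 5.0, 8.0]
```

Looping over the file object directly reads one line at a time instead of loading everything with readlines().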
    

# Classes (small addition)

### Definition

In [None]:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def greet(self):
        return f"Hello, my name is {self.name} and I'm {self.age} years old."

### Creating an object & accessing variables

In [None]:
person1 = Person("Alice", 25)
print(person1.greet())

In [None]:
print(person1.name)
print(person1.age)
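`self` is simply the object a method is called on: `person1.greet()` is shorthand for `Person.greet(person1)`. To keep the block self-contained, it repeats the `Person` class from above; "Carla" is a made-up example.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def greet(self):
        return f"Hello, my name is {self.name} and I'm {self.age} years old."

person2 = Person("Carla", 30)

# calling a method on an object passes the object as 'self'
print(person2.greet() == Person.greet(person2))  # True

# each object has its own attributes, which can be changed later
person2.age += 1
print(person2.greet())  # Hello, my name is Carla and I'm 31 years old.
```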

### Inheritance

In [None]:
class Student(Person):  # Inherits from Person
    def __init__(self, name, age, topic):
        super().__init__(name, age)  # Call parent constructor
        self.topic = topic

    def study(self):
        return f"{self.name} is studying {self.topic}."

In [None]:
student1 = Student("Bob", 22, "Physics")
print(student1.study())
print(student1.greet())

Classes may not seem that useful if you just want to write, e.g., a script that creates a histogram, but you should definitely consider using them, because they make it much easier to structure your code and manage your variables. I use them in basically every script I write.
For example you might have something like this:

In [None]:
import numpy as np

def load_data(n):
    return np.random.normal(0, 1, n)

def process_data(data, a, b):
    return a*data - b

def create_hist(data, bins):
    return np.histogram(data, bins)

data = load_data(1000)
processed_data = process_data(data, 2, 1)
hist = create_hist(processed_data, 10)

This is of course a fairly simple example, but you can imagine that in your actual use case the separate steps take many more lines of code and input variables. There may also be variables that you need in several steps. With this setup, you have to pass these variables to the different functions again and again. Here is how you could do the same using a class:

In [None]:
import numpy as np

class HistCreator:
    def __init__(self, bins):
        self.bins = bins # it is up to you if you want to pass the number of
                         # bins (and other variables) already in the constructor
                         # or only in the function that needs it. Rule of thumb:
                         # if it is needed in multiple functions, pass it in the constructor

    def load_data(self, n):
        self.data = np.random.normal(0, 1, n)

    def process_data(self, a, b):
        self.data = a*self.data - b

    def create_hist(self):
        self.hist = np.histogram(self.data, self.bins)
        return self.hist
    
def main():
    hist_creator = HistCreator(10)
    hist_creator.load_data(1000)
    hist_creator.process_data(2, 1)
    hist = hist_creator.create_hist()
    print(hist)

In [None]:
main()

# Wrap-up task
You have completed the first part of the Python tutorial, congratulations! The following task will help you to check whether you understood what you learned.

### Simple data inspection from scratch
Write a python script that performs the following tasks:
- read the simple_data_set.csv into a list of lines
- use the first line to create a dictionary with the entries as keys (species, ...)
    - store the first line of the file in a variable
    - clean and split it (watch the separator)
    - create a dict with these names as keys and empty lists as values
- store the data in lists in the dictionary
    - you need to exclude the first line, remember how to index the list for this
    - for each line
        - clean and split it
        - append the entries to the respective keys, remember zip and append for this
        - the files are read as strings, you need to convert elements with float(x)
            - watch out: you cannot convert names to float like the species, you need to exclude this
- for each feature (sepal_length, ...) except "species", print
    - number of entries
    - minimum value
    - maximum value
    - mean value
        - time to get creative: write your own function for this
        - pass the list as input parameter, return the mean value
        - in the print output, limit the mean value to three decimal places