Programming

1. Introduction

Computer programming (coding) is the process of designing and building an executable computer program - a set of instructions that are fetched and executed in sequence.

Contemporary desktop and laptop computers typically have a keyboard and mouse for user input, a visual display screen (monitor), smaller amounts of faster, more volatile data storage (memory), and larger amounts of more persistent data storage (disk) that is organised using a file system. Computers may have inbuilt or peripherally connected devices such as microphones, cameras, and additional data storage. The monitor may also be a touchscreen for user input. Networked computers typically communicate using standard Internet protocols. The basic functioning of a computer is handled by an operating system upon which additional software can be installed, including software that can translate more readable source code into lower level, less readable machine instructions.

Programming can be done visually by arranging and connecting pre-built components into executable workflows. However, this course is about programming using text based command instructions that have a formal syntax known as a programming language. Some programming language syntax and terms are similar, some are very different. This course is mostly based on the Python programming language (Python).

2. Data

2.1. Bits and Bytes

In most modern computers, data is encoded in binary: the smallest unit is a bit which encodes one of two possible states, which - for simplicity and brevity - are denoted '0' and '1'.

Typically computers work with fixed size collections of bits called bytes. The more bits there are in a byte, the more unique combinations of bit values there can be. Each added bit doubles the number of combinations. So, with 2 bits there are 4 possible combinations [00, 01, 10, 11]; with 3 bits there are 8 combinations [000, 001, 010, 011, 100, 101, 110, 111]; with 4 bits, 16 combinations; with 5 bits, 32 combinations; and so on. In general, n bits give 2 to the power n combinations. This exponential growth soon provides many combinations...
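
The doubling can be checked with a short Python sketch. The function name bit_patterns is made up for this illustration; it uses itertools.product from the Python standard library to list every arrangement of a given number of bits.

```python
from itertools import product


def bit_patterns(n_bits):
    """Return every distinct arrangement of n_bits bits as a list of strings."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]


for n in range(1, 6):
    # The number of patterns doubles with each extra bit: 2, 4, 8, 16, 32.
    print(n, len(bit_patterns(n)), 2 ** n)

print(bit_patterns(2))  # ['00', '01', '10', '11']
```

The second and third columns printed by the loop should always agree, as the number of patterns for n bits is 2 to the power n.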

A byte of 7 bits allows for 128 combinations, which is sufficient to represent all the letters in the English alphabet in both lower and upper case (52), the ten numeric digits 0 to 9, and 66 other symbols and control codes.

7 bit bytes are the basis of ASCII - a data encoding which is often used for text, and is the basis of a number of different file formats.
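
As a rough illustration, Python's built in ord and chr functions convert between a character and its numeric code, and the standard library string module lists the ASCII letters and digits. The variable names below are chosen just for this sketch.

```python
import string

letters = string.ascii_letters  # the 52 upper and lower case English letters
digits = string.digits          # '0123456789'

# Codes left over from the 128 available once letters and digits are counted.
other = 128 - len(letters) - len(digits)
print(len(letters), len(digits), other)  # 52 10 66

print(ord("A"))                 # 65
print(chr(97))                  # a
print(format(ord("A"), "07b"))  # 1000001 (the 7 bits that encode 'A')
```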

Unicode is another commonly used standard for encoding text. As of Unicode version 15.0, there are 149,186 characters that are uniquely encoded, including various alphabets, mathematical symbols and emojis. UTF-8, a widely used Unicode encoding, uses between 1 and 4 bytes of length 8 for each character.
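
The following sketch shows how many 8 bit bytes the UTF-8 encoding of Unicode needs for a few example characters. The characters are chosen only for illustration and are written as escape codes so that the example does not depend on how this text is displayed; utf8_length is a made up helper name.

```python
def utf8_length(character):
    """Return the number of 8 bit bytes needed to encode character in UTF-8."""
    return len(character.encode("utf-8"))


examples = {
    "LATIN CAPITAL LETTER A": "\u0041",
    "LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "EURO SIGN": "\u20ac",
    "GRINNING FACE": "\U0001F600",
}

for name, character in examples.items():
    # Prints the character name, its byte count and the bytes in hexadecimal.
    print(name, utf8_length(character), character.encode("utf-8").hex())
```

The byte counts printed should be 1, 2, 3 and 4 respectively.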

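On almost all modern computers a byte has exactly 8 bits, although historically other sizes have been used. In these notes, 'byte' is used more loosely to mean any fixed size collection of bits, commonly a multiple of 8.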

2.2. File Formats

Data stored in a file is often stored in a standard file format - typically based on a versioned specification which details what encodings are used and how the data is organised.

Some file formats use different encodings in different parts, a complication that can make the data more usable and more compact - requiring less storage space.

Often the suffix of a filename indicates the file format, for example a file named "index.html" is expected to be in HTML format. Some file formats contain an identifying code (known as a magic number), typically at the start of the file. If it is not clear from the filename or any external metadata what the format of a file is, sometimes it can be discerned from a magic number.
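
A minimal sketch of checking for a magic number is given below. The three signatures are the well known ones for PNG, PDF and ZIP files; the dictionary and function names are made up for this illustration, and a real tool would know many more formats.

```python
# Well known byte sequences found at the start of some file formats.
MAGIC_NUMBERS = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive",
}


def guess_format(first_bytes):
    """Return a format name if first_bytes starts with a known magic number."""
    for magic, name in MAGIC_NUMBERS.items():
        if first_bytes.startswith(magic):
            return name
    return None  # Unknown: no listed magic number matched.


def guess_file_format(path):
    """Read the first 8 bytes of the file at path and guess its format."""
    with open(path, "rb") as file:
        return guess_format(file.read(8))


print(guess_format(b"%PDF-1.7 example"))  # PDF document
print(guess_format(b"plain text"))        # None
```

Note that the file is opened in binary mode ("rb") so that the bytes are read exactly as stored, without being decoded as text.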

File formats are revisited in IO Section 5.

2.3. Integers and Floating Point

All the integer numbers in a specific range are often represented individually using bytes of a length sufficient for that range. The encoding will detail how this is done. Usually, zero is either in the middle of the range (so that negative numbers can be represented) or at the start of the range. To represent integer numbers in a wider range, either the byte size must increase, or multiple bytes must be used in a more complicated encoding.
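
As a sketch of how the range depends on the number of bits, the made up function below calculates the smallest and largest integers for a given size. It assumes the common two's complement scheme when zero is in the middle of the range; other encodings exist. The int.to_bytes method shows a value that needs more than one 8 bit byte.

```python
def integer_range(n_bits, signed):
    """Return (smallest, largest) integer representable in n_bits bits.

    Assumes two's complement when signed is True.
    """
    if signed:
        return -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    return 0, 2 ** n_bits - 1


print(integer_range(8, signed=False))  # (0, 255)
print(integer_range(8, signed=True))   # (-128, 127)

# 255 fits in one 8 bit byte, but 256 needs two.
print((255).to_bytes(1, "big"))  # b'\xff'
print((256).to_bytes(2, "big"))  # b'\x01\x00'
```

Python's own int type is not limited in this way as it uses as many bytes as are needed, but many file formats and other languages use fixed sizes.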

Floating point numbers are a subset of fractions typically encoded using bytes of length 32 or 64. The density of fractions within any part of the range varies. In general, the density is greater towards the centre of the range, which with standard floating point numbers is zero.
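
The varying density can be explored with math.ulp, which returns the gap between a floating point number and the next one further from zero. This is a sketch that assumes Python 3.9 or later (when math.ulp was added) and that Python floats are 64 bit, which is the case on typical computers.

```python
import math

for value in [1.0, 1000.0, 1e16]:
    # The gap to the next representable number grows as values get larger,
    # so representable numbers are more densely packed nearer to zero.
    print(value, math.ulp(value))

print(math.ulp(1e16))        # 2.0
print(1e16 + 1.0 == 1e16)    # True (1e16 + 1 is not representable)
```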

Floating point arithmetic is standardised (most commonly following the IEEE 754 standard) and the result of a calculation gets rounded to the nearest representable value. Most of the time, the standardisation ensures that calculations on different computers give the same results, but there can be variation. For some floating point calculations the result is exact; for others it is rounded either up or down. It is important to be aware that these rounding errors can accumulate and become significant.
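
A few lines of Python illustrate both cases. The examples are chosen for illustration: 0.5, 0.25 and 0.75 can be represented exactly in binary, whereas 0.1, 0.2 and 0.3 cannot. The standard library functions math.isclose and decimal.Decimal are used to compare and inspect the values.

```python
import math
from decimal import Decimal

print(0.5 + 0.25 == 0.75)  # True (an exact result)
print(0.1 + 0.2)           # 0.30000000000000004
print(0.1 + 0.2 == 0.3)    # False (a rounded result)

# Comparing with a tolerance is usually safer than testing for equality.
print(math.isclose(0.1 + 0.2, 0.3))  # True

# Decimal shows the value actually stored for 0.1, which is slightly larger.
print(Decimal(0.1))
```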

Single precision floating point is a standard encoding that uses bytes of length 32 to represent each number.

Double precision floating point is a standard encoding that uses bytes of length 64 to represent each number.
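
The difference in precision can be seen by storing the same number in both encodings. This sketch uses the standard library struct module, where the format code "f" means single precision and "d" means double precision; round_trip is a made up helper name.

```python
import struct


def round_trip(value, fmt):
    """Pack value into bytes using fmt, then unpack it back into a float."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


single = round_trip(0.1, "<f")
double = round_trip(0.1, "<d")

print(struct.calcsize("<f") * 8, single)  # 32 0.10000000149011612
print(struct.calcsize("<d") * 8, double)  # 64 0.1
```

Python's float type is normally double precision, so the double precision round trip returns the value unchanged, while the single precision round trip loses some accuracy.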

3. Learning to Program

Learning to program takes time and energy. It is highly recommended that you organise to learn new programming concepts when you are well rested and have good concentration. Mistakes and misunderstandings are more likely if you are tired or distracted.

Take breaks as you learn. They do more than help avoid fatigue. They can help conceptually and overall they can save you a lot more time than they take, and they can make the whole process of learning a lot more healthy and enjoyable.

Save your work often and use version control as this avoids losing work and provides a track of progress that you and others might find useful.

Once you have a good grip of programming basics (which you should have after this course), good ways to improve your skills are getting involved in open source software development projects, engaging in code review, writing code, reading documentation and doing other programming courses.

Being familiar with one programming language helps in learning others. Many concepts are shared and the language syntax and workflows are often similar.

Some programming languages are particularly well geared for particular types of application. This can be a consequence of the language fundamentals, but often it is because something similar has already been done with that language.

When given a choice, experienced programmers will often choose a language for an implementation because they know that language well and can envisage what to do, because they know that the language is well suited to the task, or because they want to learn or try something new.

Programming and programming language development are typically community activities. It is normal to ask others for help and to provide others with help and work collaboratively to develop software. There are various online systems that help with this including online forums. You are at liberty to engage in online forums, but please do not post questions about this course, particularly about the assignments. Ask a tutor if you want help.

Asking for help with programming is a skill. Whilst it may be easiest for you to show someone what is happening and talk about it, it is often not so easy to arrange an interactive help session. Often the best way to get help is to document the issue - describe and detail with text and pictures what is happening and why this is confusing or not what you expect or want to happen. Often the act of describing and detailing the issue can help with understanding and prompt you to try different things, which may ultimately resolve the issue before you ask for help. Don't see this as wasted effort: the more you practise preparing to ask for help, the better you should be at it when you do!

Often detailing an issue involves consulting documentation and providing information about your environment. Sometimes the issue is not that you have done something wrong, but that some other code or software is not working as it should, or that the way that things are set up to work is somehow causing the issue.

Sometimes the issue is a result of a 'software bug' - an error, flaw or fault in the design, development, or operation that causes incorrect or unexpected things to happen. Sometimes issues happen in the same way each time something is attempted, other times the fault only sometimes happens. A fault that only sometimes happens is known as a 'glitch' and these can be difficult to troubleshoot.

Often it is worth restarting software or rebooting the operating system to attempt to stop unexpected behaviour happening. If you start having to do this frequently it becomes more of a pain, so that you either want a better work around, or for the bug to get fixed.

Reporting a bug is an important activity in software and language development. Many bug reports are made openly available. A 'known bug' is one that has been reported already. This may already be being worked on or be 'resolved' or 'fixed' or there may be 'workarounds' - ways of coping.

A new version of software is typically released with one or more bug fixes. Sometimes you have to decide when it is worth doing the work to change to this later version or whether you can cope with a workaround.

Whether you are filing a bug report, or just asking for help, you should usually aim to provide a minimal working example - the smallest piece of code that replicates the bug or issue. And as with all data exchanges, you should think carefully before sharing data.

In learning to program, some things you might comprehend instantly, other things might take several attempts to grasp or fully understand. Some things you might understand, but they seem strange. Usually, there is a good reason why something works and is written the way it is, but no language is perfect and there may well be a better way...

4. Language Evolution, Deprecation and Versions

High level computer programming languages like Python evolve and new ones occasionally get developed. Some programming languages are retired or become obsolete, and some older versions of languages become unsupported over time.

Supporting backward compatibility - interoperability with older versions - has both costs and benefits. These costs and benefits are weighed up by those developing languages, who typically have a process for deciding how things change.

Changes that are not backward compatible can create a lot of work! It is also discouraging if old code does not work with newer language interpreters as this results in reliance on old versions which can be problematic and have security implications.

Languages compete for users and developers. Often new features in one language are implemented in other languages soon after. The pace of language evolution is related to the scale of investment in resources, and to the skill and design decisions of the developer community.

As new syntax, new functionality and more efficient ways of doing things evolve in a language, some code either becomes obsolete, or is best changed to use the new ways. Retiring or updating (refactoring) code can require considerable effort.

Fundamental changes in language syntax are associated with major new versions of a language. Minor versions may add new language features. Changes to the third part of a version number (often called the patch or micro version) are usually associated with one or more bug fixes.
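
In Python, the version of the running interpreter is available as sys.version_info, where the three parts are named major, minor and micro. The sketch below also shows why version numbers are best compared part by part as numbers rather than as text; parse_version is a made up helper for this illustration and only handles simple versions such as "3.11.4".

```python
import sys


def parse_version(text):
    """Split a version string like '3.11.4' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


major, minor, micro = sys.version_info[:3]
print(f"Running Python {major}.{minor}.{micro}")

# Tuples compare part by part, so 3.10 is correctly newer than 3.9,
# whereas comparing the text alone gets this wrong.
print(parse_version("3.10.0") > parse_version("3.9.7"))  # True
print("3.10.0" > "3.9.7")                                # False
```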

Deprecation is a common part of modern high level languages and third party software. It is part of a process of phasing things out. Things are first marked as deprecated in a version, then in subsequent versions the things are removed.
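
In Python, something is commonly marked as deprecated by issuing a DeprecationWarning using the standard library warnings module. The following is a minimal sketch in which old_mean and new_mean are hypothetical functions invented for this example.

```python
import warnings


def new_mean(values):
    """Return the arithmetic mean of values."""
    return sum(values) / len(values)


def old_mean(values):
    """Deprecated: use new_mean() instead."""
    warnings.warn(
        "old_mean() is deprecated; use new_mean() instead",
        DeprecationWarning,
        stacklevel=2,  # Report the warning at the line that called old_mean.
    )
    return new_mean(values)


# Capture warnings so that they can be inspected rather than just displayed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_mean([1, 2, 3])

print(result)                       # 2.0
print(caught[0].category.__name__)  # DeprecationWarning
```

The old function still works, which gives users time to change their code before the function is removed in a later version.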

5. Considerata

Reproducibility is important for many reasons, particularly in science and for evidence based policy. To support it, it is important to know what version of a language and of any third party components a program has been tested with and results have been produced with.
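
One way to record this information is to have a program report the versions it is running with. The sketch below uses the standard library platform and importlib.metadata modules (the latter is available from Python 3.8); describe_environment is a made up function name and the package name "pip" is only an example.

```python
import platform
from importlib import metadata


def describe_environment(packages=()):
    """Return a dictionary of the Python, platform and package versions."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


# The output depends on the computer and installation this is run on.
for key, value in describe_environment(["pip"]).items():
    print(key, value)
```

Saving this output alongside results makes it easier for you or others to reproduce them later.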

With any language, there are often several ways to achieve the same or a similar thing. Some ways may work faster, handle larger volumes of data, or be more concise in terms of the amount of source code. There might be no obvious right way to do something - so sometimes, independent programming efforts produce significantly different source code that essentially does the same thing. In other instances, and especially where there are style and naming guidelines, source code produced independently may be identical.

In learning to program, and in programming generally, code review is an important way to transfer skills and knowledge, develop good practice and improve code and software.

Throughout this course, you are encouraged to produce easy to understand, easy to maintain, efficient, reliable, well tested and well documented code/software. Not all code and software in use today is like this!

Remember to take care and think about the trustworthiness of any code you run and if in doubt, consult your tutor.

Please adhere to the terms and conditions of software licences. And keep in mind that it is important to keep track of what you consult and avoid plagiarising (presenting others' work as your own).