Programming Page

1. Introduction

Computer programming is the process of developing and running computer programs. It is often abbreviated to 'programming' and sometimes referred to as 'coding' (although coding is perhaps a more general term). Computer programs are often referred to as 'programs', 'executables' or 'software'. Programs run in an operating system environment on computational hardware.

A contemporary general-purpose computer is sometimes called a workstation, desktop or laptop. These typically have a keyboard and mouse for user input, a visual display screen (monitor), smaller amounts of faster, more volatile data storage (memory), and larger amounts of more persistent data storage (disk) that is organised using a file system. Such computers may have other inbuilt or peripherally connected devices such as microphones, speakers, cameras, and additional data storage. The monitor may also be a touchscreen user interface.

The basic functioning of a computer is handled by the operating system, upon which additional software can be installed for programming, including compilers and interpreters that convert more readable source code into machine code instructions executed by processors. Contemporary general-purpose workstations are typically multi-core, in that they contain multiple processor cores.

Networked computers typically communicate using standard Internet protocols and can be organised to work together. Whilst networked general purpose computers collectively have a lot of power, it is not easy to utilise this power efficiently, so research facilities often make supercomputers available for High Performance Computing.

2. Data

2.1. Bits and Bytes

In most modern computers, data is encoded in binary: the smallest unit is a bit which encodes two possible states, denoted 0 and 1.

Typically computers work with fixed size collections of bits called bytes.

The more bits there are in a byte, the more unique combinations or arrangements of bits there can be. With each added bit, the number of combinations doubles. So, with 2 bits there are 4 possible combinations - [00, 01, 10, 11]; with 3 bits there are 8 possible combinations - [000, 001, 010, 100, 011, 101, 110, 111], and so on. In general, n bits give 2 to the power n combinations, so the number of combinations soon becomes large.
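The doubling described above can be checked with a few lines of Python. This is only a minimal sketch: the function name bit_patterns is made up for illustration, and the patterns are listed in counting order rather than the order used in the text.

```python
from itertools import product

def bit_patterns(n_bits):
    """Return every distinct pattern of n_bits bits as a list of strings."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]

for n in range(1, 9):
    # Each added bit doubles the count, so there are 2 ** n patterns.
    print(n, len(bit_patterns(n)))

print(bit_patterns(2))  # ['00', '01', '10', '11']
```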

7 bits in a byte allow for 128 combinations, which is sufficient to represent all the letters in the English alphabet in both lower and upper case (52 letters), the ten numeric digits 0 to 9, and 66 other symbols and control codes.

Commonly, a byte has 8 bits and larger units are multiples of 8 bits, but in principle a byte can have any number of bits.

2.2. File Formats

Data stored in the file system is often encoded using a standard file format - a specification which details the encodings used and how data are arranged.

Text is commonly encoded as ASCII or Unicode. (ASCII encoding uses 7 bit bytes. Unicode text is commonly stored using the UTF-8 encoding, which uses between 1 and 4 bytes of length 8 for each character. As of Unicode version 15.0, there are 149,186 characters that are uniquely encoded, including various alphabets, mathematical symbols and emojis.)
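As a rough illustration of the 1 to 4 byte range, the sketch below encodes four characters with Python's built-in UTF-8 codec and counts the bytes used. The sample characters are chosen only for illustration and are written as escape sequences so that the example does not depend on how this page itself is encoded.

```python
# Sample characters: A, e-acute, the euro sign and a smiley emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def utf8_length(character):
    """Return the number of 8-bit bytes UTF-8 uses for one character."""
    return len(character.encode("utf-8"))

for character in samples:
    # ord() gives the Unicode code point of the character.
    print(hex(ord(character)), utf8_length(character), character.encode("utf-8"))

# ASCII characters have code points below 128, so they fit in 7 bits.
print("A".isascii(), "\u00e9".isascii())  # True False
```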

Some file formats use different encodings in different parts, a complication that can make the data more usable and more compact.

Often the suffix of a filename indicates the file format, for example a file with a name ending ".html" is expected to be in HTML format. Some file formats contain an identifying code (known as a magic number), typically at the start of the file, which specifies the format. The file format can also be detailed in external metadata.
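One way to check a magic number is to read the first few bytes of a file and compare them with the expected signature. The sketch below uses the eight byte signature that begins every PNG image file. The function names are hypothetical, and a real program would more likely rely on an existing library to detect formats.

```python
# The first eight bytes of a PNG file, as defined by the PNG specification.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_like_png(data):
    """Return True if the bytes start with the PNG magic number."""
    return data.startswith(PNG_SIGNATURE)

def file_looks_like_png(path):
    """Read only the start of the file at path and check the signature."""
    with open(path, "rb") as f:
        return looks_like_png(f.read(len(PNG_SIGNATURE)))

print(looks_like_png(PNG_SIGNATURE + b"rest of file"))  # True
print(looks_like_png(b"<!DOCTYPE html>"))  # False
```

Note that a matching magic number only suggests the format. The rest of the file may still be invalid or truncated.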

File formats are revisited in IO Section 5.

2.3. Integers and Floating Point

All the integer numbers in a specific range are often represented individually using bytes of a length sufficient for that range. An encoding will detail how this is done. Often, zero is either in the middle or at the start of the range. If the byte size is minimal, to represent integer numbers in a wider range, either the byte size must increase, or multiple bytes must be used in a more complicated encoding.
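The two common placements of zero can be sketched as follows. The encoding with zero near the middle of the range is assumed here to be two's complement, which is widely used but is not named in the text above. The function names are made up for illustration.

```python
def unsigned_range(n_bits):
    """Range when zero is at the start: 0 to 2**n - 1."""
    return 0, 2 ** n_bits - 1

def signed_range(n_bits):
    """Range when zero is near the middle (two's complement assumed)."""
    return -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1

for n in (8, 16, 32):
    print(n, unsigned_range(n), signed_range(n))

# 300 is outside the unsigned 8-bit range, so one 8-bit byte is not enough.
try:
    (300).to_bytes(1, byteorder="big")
except OverflowError as error:
    print("one byte:", error)

# Two 8-bit bytes are sufficient: 300 = 1 * 256 + 44.
print(list((300).to_bytes(2, byteorder="big")))  # [1, 44]
```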

Floating point numbers are a subset of rational numbers often encoded using 32 or 64 bits. Single precision is a standard encoding that uses 32 bits. Double precision is a standard encoding that uses 64 bits. The representable numbers are not evenly spread across the range: they are densest close to zero and become more widely spaced as the magnitude increases.
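The uneven spread of representable values can be seen with the standard library. This sketch assumes Python 3.9 or later (for math.ulp) and uses the struct module to round a value to single precision. Python's own float type is normally double precision.

```python
import math
import struct

# struct format codes: "f" is 32-bit single precision, "d" is 64-bit double.
print(struct.calcsize("<f") * 8, struct.calcsize("<d") * 8)  # 32 64

# math.ulp(x) is the gap between x and the next larger representable float.
# The gap grows as the magnitude of x grows.
for x in (1.0, 1000.0, 1e15):
    print(x, math.ulp(x))

# Storing 0.1 in single precision loses digits that double precision keeps.
single = struct.unpack("<f", struct.pack("<f", 0.1))[0]
print(single == 0.1)  # False
```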

Floating point arithmetic is standardised (most commonly by the IEEE 754 standard). The result of a calculation gets rounded to the nearest value that can be represented in the encoding. Most of the time, the standardisation means that the same calculation gives the same result on different computers, but there can be rounding variations. For some arithmetic calculations the result is completely accurate and precise; for others the result cannot be stored exactly in the encoding and is effectively rounded either up or down.
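A short Python sketch of exact and inexact results follows. It assumes the usual double precision float type and is only an illustration of the rounding described above, not a full account of the standard.

```python
import math
from fractions import Fraction

# 0.5, 0.25 and 0.75 are exactly representable in binary, so this is exact.
print(0.5 + 0.25 == 0.75)  # True

# 0.1, 0.2 and 0.3 are not exactly representable, and the rounded sum
# differs from the rounded value stored for 0.3.
print(0.1 + 0.2 == 0.3)  # False
print(0.1 + 0.2)  # 0.30000000000000004

# Fraction shows the exact rational value actually stored for 0.1.
print(Fraction(0.1))

# Comparing with a tolerance is usually safer than testing exact equality.
print(math.isclose(0.1 + 0.2, 0.3))  # True
```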

3. Learning to Program

Learning to program takes time and energy. It is highly recommended that you organise to learn programming concepts when you are well rested, have good concentration and can work without distraction.

Take breaks as you learn. They help you learn, avoid fatigue, be efficient, and should make the learning experience healthier and more enjoyable.

Save your work often and use version control as appropriate to avoid losing work and to help you and others track progress.

Good ways to improve your skills are by getting involved in open source software development projects, reviewing code, writing code, reading documentation and doing other programming courses.

Being familiar with one programming language helps in learning another. Many concepts are shared and language syntax and programming workflows are often similar.

Programming and programming language development are typically community activities. It is normal to ask others for help and to provide others with help and work collaboratively to develop software. There are various online systems that help with this including online forums.

Asking for help with programming is a skill in itself. Whilst it may seem easiest to show someone what is happening and talk to them about it, it is often not so easy to arrange an interactive help session. Often the best way to get help is to document the issue - describe and explain with text and pictures what you are trying to do, what happens, and why this is confusing or not what you expect/want to happen. Often documenting the issue can help you better understand it and come up with ideas of different things to try, which may ultimately resolve the issue before you ask for help. Don't see this as wasted effort: the more you practise preparing to ask for help, the better you should be at it when you do! Such prompting is also the basis of Vibe Programming, where developers use AI assistance to generate, refactor, debug, test and document code.

Often detailing an issue involves consulting documentation and providing information about your environment. Sometimes the issue is not that you have done something wrong, but that some other code or software is not working as it should, or that the way that things are set up to work is somehow causing the issue.

Sometimes the issue is a result of a 'software bug' - an error, flaw or fault in the design, development, or operation that causes incorrect or unexpected things to happen. Sometimes issues happen in the same way each time something is attempted, other times the fault only sometimes happens. A fault that only sometimes happens is known as a 'glitch' and these can be difficult to troubleshoot.

Often it is worth restarting software or rebooting the computer operating system to attempt to stop unexpected behaviour happening. Having to do this frequently becomes annoying, and it can be worth spending time trying to get a better workaround in place, or to report and help fix the bug.

Reporting a bug is an important activity in software and language development. Many bug reports are made openly available. A 'known bug' is one that has been reported already. This may already be being worked on or be 'resolved' or 'fixed' or there may be 'workarounds' - ways of coping.

A new version of software is typically released with one or more bug fixes. Sometimes you have to decide when it is worth doing the work to change to this later version or whether you can cope with a workaround.

Whether you are filing a bug report or just asking for help, you should usually aim to provide a minimal working example that replicates the bug/issue. As with all data exchanges, you should think carefully before sharing data.

In learning to program, some things you might comprehend instantly, other things might take several attempts to grasp or fully understand. Some things you might understand, but they seem strange. Usually, there is a good reason why something works and is written the way it is, but no language is perfect and there may well be a better way...

4. Language Evolution, Deprecation and Versions

High level computer programming languages like Python, Java and C++ tend to evolve more rapidly than lower level languages like C. New languages get developed and released, which is exciting, but keeping up with all developments is challenging! Why develop new languages rather than modify existing ones? Well, it depends on what the goals are and to some extent who is in control. Some programming languages get retired or become obsolete. Most old versions of languages become unsupported in time. How often a new version of a language or some software is released is referred to as the "release cadence". The more rapid the release cadence, the faster a language is evolving. Some changes, though, are breaking changes, and whilst multiple releases might be supported for a time, old versions typically become obsolete after a few years.

Supporting backward compatibility - interoperability with older versions - has both costs and benefits. These costs and benefits are weighed up by those developing languages, who typically have a process for deciding how things change.

Changes that are not backward compatible can create a lot of work! Also, old code that does not work with newer language interpreters can result in reliance on old versions which is problematic and can have cyber security implications.

Languages compete for users and developers. Often new features in one language are implemented in other languages soon after. The pace of language evolution is related to the scale of investment in resources, and to the skill and design decisions of the developer community.

As new syntax, new functionality and more efficient ways of doing things evolve in a language, some code either becomes obsolete, or is best changed to use the new ways. This can require considerable effort although there are often tools that help.

Fundamental changes in language syntax are associated with major new versions of a language. Minor versions may add new language features. Minor-minor version changes (often called patch or micro versions) are usually associated with one or more bug fixes.
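The three levels of version number described above can be compared in code. This sketch assumes version strings of the form major.minor.patch, which many projects use but not all. The function names are hypothetical.

```python
def parse_version(text):
    """Split a 'major.minor.patch' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch

def describe_change(old, new):
    """Return the most significant part of the version number that changed."""
    old_v, new_v = parse_version(old), parse_version(new)
    for name, a, b in zip(("major", "minor", "patch"), old_v, new_v):
        if a != b:
            return name
    return "none"

print(describe_change("3.11.4", "3.12.0"))  # minor

# Tuples of integers compare correctly, whereas plain strings would not:
# as text, "3.9.0" sorts after "3.10.0".
print(parse_version("3.9.0") < parse_version("3.10.0"))  # True
```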

Deprecation is a common part of modern high level languages and third party software. It is part of a process of phasing things out. Things are first marked as deprecated in a version, then in subsequent versions the things are removed.
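In Python, the first step of this process is often signalled with a DeprecationWarning from the standard warnings module. The sketch below is a minimal example of the idea. The functions old_mean and new_mean are invented for illustration and do not belong to any real library.

```python
import warnings

def new_mean(values):
    """Current, supported function."""
    return sum(values) / len(values)

def old_mean(values):
    """Deprecated: kept for now, to be removed in a later version."""
    warnings.warn(
        "old_mean() is deprecated; use new_mean() instead",
        DeprecationWarning,
        stacklevel=2,  # Attribute the warning to the caller's line.
    )
    return new_mean(values)

# Record warnings so that the deprecation is visible when this is run.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_mean([1, 2, 3])

print(result, caught[0].category.__name__)  # 2.0 DeprecationWarning
```

The old function keeps working during the deprecation period, which gives users time to change their code before it is removed.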

5. Considerata

Reproducibility is a key concern in science and for evidence based policy. For this and many other reasons, it is important to know which version of a language and of any third party components a program has been tested with, and which versions were used to produce results.
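One possible way to record this information in Python is to report the interpreter version and the installed versions of named packages. The sketch assumes Python 3.8 or later (for importlib.metadata). The function name environment_report is hypothetical, and "pip" is used only as an example package name.

```python
import platform
import sys
from importlib import metadata

def environment_report(packages):
    """Return the Python version and the installed version of each package."""
    report = {"python": platform.python_version(), "platform": sys.platform}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report

# "pip" is only an example; list the packages your program imports.
for key, value in environment_report(["pip"]).items():
    print(key, value)
```

Saving such a report alongside results makes it easier to rerun a program later in a comparable environment.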

With any language, there are often several ways to achieve the same or a similar thing. Some ways may work faster, can handle larger volumes of data, or might be more concise in terms of the amount of source code. There might be no obvious right way to do something - so sometimes, independent programming efforts produce significantly different code that essentially does the same thing. In other instances, and especially where there are style and naming guidelines, source code produced independently may be almost identical.

In learning to program, and in programming generally, code review is an important way to transfer skills and knowledge, develop good practice and improve code and software.

Scientific research software source code should be easy to understand, easy to maintain, efficient, reliable, well tested, well documented and open source. Not all scientific research software in use today is like this!

Remember to take care and think about the trustworthiness of any code you run.

Please adhere to the terms and conditions of software licences.
