Input and Output

1. Introduction

The Python standard library io module provides the main facilities for dealing with input and output (IO). Numerous other standard library modules have functionality for dealing with specific aspects and types of IO, including:

fileinput, os, pathlib, contextlib, csv, json, pickle, markup, internet, tempfile, and shutil. There are also various third-party libraries with useful functionality for reading and writing data in specific file formats and for processing data on the Web.

2. Standard Input

The input function from the builtins module reads a line from the standard input 'stdin', which is usually the keyboard. Reading continues until the <ENTER> (or <RETURN>) key is pressed. The function returns what was typed as a string, without the trailing newline character. If a string is passed as an argument, it is displayed as a prompt before input is read. Run the following at the Python prompt and enter some keyboard input:

n_agents = input("Enter n_agents (a number between 10 and 100) and press the <ENTER> or <RETURN> key:")
print("The input detected is:", n_agents)
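Because input always returns a string, a value such as n_agents has to be converted before it can be used as a number. The following is a minimal sketch of that conversion. The helper name parse_n_agents is hypothetical, and the range check simply mirrors the "between 10 and 100" wording of the prompt above.

```python
def parse_n_agents(text, low=10, high=100):
    """Convert the string returned by input() to an int and check its range.

    parse_n_agents is a hypothetical helper written for this example.
    """
    # int() raises ValueError if the text is not a whole number
    value = int(text)
    if not low <= value <= high:
        raise ValueError(f"n_agents must be between {low} and {high}, got {value}")
    return value


# Interactively this would be: parse_n_agents(input("Enter n_agents: "))
print(parse_n_agents("50"))
print(parse_n_agents(" 20 "))  # int() ignores surrounding whitespace
```

If the user types something that is not a whole number, or a number outside the range, a ValueError is raised. A program could catch it and ask again.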

3. Streams

The print function from the builtins module writes to the standard output 'stdout', which is usually the terminal/console/screen.

Stdin and stdout are streams: flows of data. Standard error 'stderr' is also a stream, one to which error messages are written. Like stdout, stderr is written to the screen by default.
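In Python the three streams are available as sys.stdin, sys.stdout and sys.stderr. Here is a small sketch showing how print can send a message to stderr instead of stdout. The function name report and its messages are illustrative only.

```python
import sys


def report(result):
    # print writes to sys.stdout by default
    print("Result:", result)
    # file=sys.stderr sends a diagnostic message to the error stream instead
    print("Finished computing result", file=sys.stderr)


report(42)
```

Suppose such a script is run with its output redirected, for example `python a.py > stdout.txt`. Only the "Result" line goes into the file, and the message written to stderr still appears on the screen.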

Stdin can be redirected to come from a file, and stdout can be redirected to go to a file. From the Anaconda Prompt, the following will stream data from 'stdin.txt' into 'a.py' as it runs:

python a.py < stdin.txt

The following will send output from running 'a.py' into 'stdout.txt':

python a.py > stdout.txt

This would overwrite 'stdout.txt' if it already existed. To append to the end of any existing 'stdout.txt', the following could be used:

python a.py >> stdout.txt

To stream data in and out, the following can be used:

python a.py < stdin.txt > stdout.txt

The stdout of one program can be piped to the stdin of another program using the pipe symbol '|'.
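A program works with redirection and pipes if it reads from sys.stdin and writes to sys.stdout. It does not need to open named files itself. The following is a minimal sketch of such a "filter". The function names number_lines and main are hypothetical, and in-memory io.StringIO streams stand in for the real stdin and stdout in the demonstration.

```python
import io
import sys


def number_lines(in_stream, out_stream):
    """Copy each line from in_stream to out_stream, prefixed with its line number."""
    for number, line in enumerate(in_stream, start=1):
        out_stream.write(f"{number}: {line}")


def main():
    # Reading sys.stdin and writing sys.stdout lets the shell decide where data
    # comes from and goes to: the keyboard, a file (< or >), or a pipe (|)
    number_lines(sys.stdin, sys.stdout)


# Demonstration with in-memory text streams standing in for stdin and stdout
fake_stdin = io.StringIO("alpha\nbeta\n")
fake_stdout = io.StringIO()
number_lines(fake_stdin, fake_stdout)
print(fake_stdout.getvalue(), end="")
```

Suppose this is saved as a.py with a call to main() added at the end. It could then be run as `python a.py < stdin.txt > stdout.txt`, or placed at either end of a pipe.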

The print function also has a 'file' parameter that directs output to a file instead of stdout. That file has to be open for writing for this to succeed, and it should be closed afterwards to ensure all the data gets written. Reading and writing files typically uses 'buffers' that read or write a block of bytes in one go, which is more efficient. Closing a file that is being written to flushes any partially filled buffer. The print function also has a 'flush' parameter that forces a flush, which can sometimes be useful.
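The following sketch uses print's 'file' and 'flush' parameters. The file name 'log.txt' and the messages are hypothetical.

```python
# Write some lines to a file using print's file parameter
with open("log.txt", "w") as f:
    print("Model run started", file=f)
    # flush=True pushes buffered text to the file immediately, rather than
    # waiting for the buffer to fill or the file to be closed
    print("Step 1 complete", file=f, flush=True)

# Read the file back to check what was written
with open("log.txt") as f:
    contents = f.read()
print(contents, end="")
```

Flushing is mainly useful when another program, or a person, is watching the file while the program is still running.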

4. Reading and Writing Files Part 1

The following code uses the open function from the builtins module to open a file called 'a.in' in the current directory. It reads the file one line at a time, printing each line to the screen, and then closes the file:

f = open("a.in")
for line in f:
    print(line, end="")  # each line already ends with a newline
f.close()

The file object 'f' returned by the open function call above is best closed once it has been read. This is done by calling its 'close' method, as on the last line of the code snippet. After this line is executed, 'f' can no longer be used, and any further attempt to read from it raises a ValueError. Closing releases system resources and is recommended as good practice, although code will often work without doing so.
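To see in practice that a closed file can no longer be used, the following sketch checks the file object's closed attribute and then tries to read again. It first creates 'a.in' with made-up contents so that it can run on its own.

```python
# Create a small example file so the snippet can run (hypothetical contents)
with open("a.in", "w") as f:
    f.write("line one\nline two\n")

f = open("a.in")
first = f.readline()
f.close()
print(f.closed)  # the closed attribute is now True

try:
    f.readline()  # reading from a closed file fails
except ValueError as error:
    print("Error:", error)
```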

An alternative where a 'close' method call is not necessary uses the keyword 'with'. The following does effectively the same as the previous snippet:

with open("a.in") as f:
    for line in f:
        print(line, end="")  # each line already ends with a newline

This saves having to close the file explicitly: it is closed automatically when the 'with' block ends, even if an error occurs. However, this style can be awkward when reading several files at once.

The standard library fileinput module helps with reading multiple files. It iterates over the lines of several files in turn, as if they were a single stream.
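The following is a minimal sketch of fileinput reading two files in sequence. The file names and contents are made up for the example. fileinput.filename() and fileinput.filelineno() report which file the current line came from and its line number within that file.

```python
import fileinput

# Create two small example files (hypothetical names and contents)
with open("a.in", "w") as f:
    f.write("first\nsecond\n")
with open("b.in", "w") as f:
    f.write("third\n")

records = []
with fileinput.input(files=("a.in", "b.in")) as lines:
    for line in lines:
        # Record the source file, the line number within it, and the line text
        records.append((fileinput.filename(), fileinput.filelineno(), line.rstrip("\n")))

for record in records:
    print(record)
```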

The readlines method can read some or all lines of a file into a list of strings, with each item being one line of the file. This can be convenient, but if a file is large and not all of it is needed, it can use a lot of memory and risks a MemoryError being raised. So it is often better to parse a file in portions, such as line by line. Parsing might involve processing parts or all of each line, or simply storing them in one or more variables.
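The following sketch contrasts the two approaches on a small, made-up file called 'numbers.txt'. readlines builds the whole list in memory. Iterating over the file object handles one line at a time, so only a running total is kept.

```python
# Create a small example file (hypothetical name and contents)
with open("numbers.txt", "w") as f:
    f.write("1\n2\n3\n4\n")

# readlines loads every line into memory at once as a list of strings
with open("numbers.txt") as f:
    all_lines = f.readlines()
print(all_lines)

# Iterating over the file processes one line at a time, keeping memory use small
total = 0
with open("numbers.txt") as f:
    for line in f:
        total += int(line)  # int() ignores the trailing newline
print(total)
```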

Files are opened for writing in a similar way. Writing to a file can be done as follows:

# Create something to write
a = []
for i in range(10):
    a.append("Coding is fun!")
# Open a file for writing
f = open("a.out", 'w')
# Write each item of a to the file as a separate line
for line in a:
    f.write(line + "\n")  # write does not add a newline itself
# Close the file
f.close()

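Note the additional 'w' argument passed to the open function, which opens the file 'a.out' for writing. Mode 'w' creates the file if it does not exist and empties it if it does, so any previous contents are lost. Note also that the write method does not add newline characters, so a '\n' must be included wherever a line should end.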

The open function has other optional arguments, such as 'encoding' and 'newline', and other modes, such as 'a' to append and 'b' for binary. See the Python documentation for the open function for details.
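The following sketch shows the 'a' (append) mode alongside an explicit 'encoding' argument. It reuses the file name 'a.out' with made-up contents.

```python
# Mode 'w' creates (or empties) the file; the encoding is given explicitly
with open("a.out", "w", encoding="utf-8") as f:
    f.write("First line\n")

# Mode 'a' appends to the end of the existing file instead of emptying it
with open("a.out", "a", encoding="utf-8") as f:
    f.write("Appended line\n")

with open("a.out", encoding="utf-8") as f:
    text = f.read()
print(text, end="")
```

Giving the encoding explicitly avoids depending on the platform's default encoding. That default can differ between, for example, Windows and Linux.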

An alternative using the keyword 'with' that does not require closing the file is:

# Create something to write
a = []
for i in range(10):
    a.append("Coding is fun!")
# Open the file for writing; it is closed automatically when the block ends
with open("a.out", 'w') as f:
    # Write each item of a to the file as a separate line
    for line in a:
        f.write(line + "\n")
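As an alternative to the loop, file objects have a writelines method that writes every string from a list or other iterable. Like write, it does not add newlines, so this sketch adds them explicitly.

```python
a = ["Coding is fun!"] * 10

with open("a.out", "w") as f:
    # writelines writes each string as-is, so the newlines are added here
    f.writelines(line + "\n" for line in a)

# Read the file back to check that ten separate lines were written
with open("a.out") as f:
    written = f.readlines()
print(len(written))
```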

5. File formats

File formats were briefly introduced in Programming Section 2.2.

All files are stored as binary data, but some are known as 'text files' because their bytes encode characters using a text encoding such as UTF-8. Text files are typically divided into lines by a 'newline' character, which in Python is written '\n'. File formats are usually defined in a file format specification that details the structure of the file. In this section three types of text file format are described: CSV, JSON and markup.
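To make the binary/text distinction concrete, the following sketch writes a short text file and reads it back twice. Binary mode ('rb') gives the raw bytes, and text mode decodes them into a string. The file name 'example.txt' is hypothetical, and newline="\n" is passed so that no platform-specific line-ending translation happens.

```python
# Write a short text file (hypothetical name) using UTF-8 encoding
with open("example.txt", "w", encoding="utf-8", newline="\n") as f:
    f.write("café\nline 2\n")

# Reading in binary mode ('rb') returns the raw bytes: 'é' takes two bytes in UTF-8
with open("example.txt", "rb") as f:
    raw = f.read()
print(raw)

# Reading in text mode decodes the bytes back into a string
with open("example.txt", encoding="utf-8") as f:
    text = f.read()
print(text.split("\n"))
```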

5.1. CSV

CSV format files are text files of comma separated values. The values are text, but this text might represent numbers. If a value contains a comma, then the value is usually enclosed in quotation marks. If a value also contains quotation marks or newline characters, parsing becomes more difficult. A common convention is to double any quotation marks inside a quoted value, and a quoted value may then span more than one line.
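The standard library csv module, covered later in this section, handles these quoting rules automatically. The following is a minimal sketch with made-up values containing a comma, quotation marks and a newline. An in-memory io.StringIO stream stands in for a file, created with newline="" as the csv documentation recommends for files.

```python
import csv
import io

# Hypothetical rows: one value contains a comma, another quotation marks and a newline
rows = [["name", "note"], ["Leeds, UK", 'said "hi"\nthen left']]

buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)  # quoted values, with the inner quotation marks doubled

# Reading the text back recovers the original values, including the newline
buffer.seek(0)
parsed = list(csv.reader(buffer))
print(parsed)
```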

5.2. JSON

JavaScript Object Notation (JSON) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values).

Here is an example of some GeoJSON data (GeoJSON is a JSON-based format for geographic features):

GeoJSON example
{
    "type": "FeatureCollection",
    "features": [ {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [42.0, 21.0]
        },
        "properties": {
            "prop0": "value0"
        }
    }]
}
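When JSON is parsed in Python, objects become dictionaries and arrays become lists. The following sketch parses the GeoJSON example above from a string with json.loads and picks out values from the nested structure.

```python
import json

# The GeoJSON example above, held in a string
geojson_text = """
{
    "type": "FeatureCollection",
    "features": [ {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [42.0, 21.0]
        },
        "properties": {
            "prop0": "value0"
        }
    }]
}
"""

collection = json.loads(geojson_text)
# JSON objects become dicts and JSON arrays become lists
first = collection["features"][0]
print(first["geometry"]["coordinates"])
print(first["properties"]["prop0"])
```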

5.3. Markup

Markup is essentially tags and content. Tags often note the ontological context of the content, helping to define its meaning. Tags can be nested. Example formats include HTML and XML. Style information about how to portray the data can be embedded, but this is better kept separate. XML is extensible in that new tags can be added to extend the language in what are known as profiles. There are many standard profiles of XML for different kinds of information, including GML, the XML grammar defined by the Open Geospatial Consortium (OGC) to express geographical features.
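The following sketch shows nested tags being read with the standard library xml.etree.ElementTree module. The small XML document, with its 'agents', 'agent', 'x' and 'y' tags, is made up for the example and is not a standard profile.

```python
import xml.etree.ElementTree as ET

# A small, hypothetical XML document with nested tags and an attribute
xml_text = """<agents>
    <agent id="1"><x>3.5</x><y>7.0</y></agent>
    <agent id="2"><x>1.0</x><y>2.5</y></agent>
</agents>"""

root = ET.fromstring(xml_text)
positions = []
for agent in root.findall("agent"):
    # Attributes are read with get(); the text of a nested element with findtext()
    positions.append((agent.get("id"), float(agent.findtext("x")), float(agent.findtext("y"))))
print(positions)
```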

6. Reading and Writing Files Part 2

The following code reads a file line by line. It splits each line at every comma and converts each part into a float, which is appended to a list. That list is in turn appended to another list called 'data':

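with open("data.txt") as f:
    data = []
    for line in f:
        # Split the line at each comma
        parsed_line = line.split(",")
        data_line = []
        for word in parsed_line:
            # float() ignores surrounding whitespace, including the trailing newline
            data_line.append(float(word))
        data.append(data_line)
print(data)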
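In the code above, a blank line in the file (for example, an extra newline at the end) would make float raise a ValueError. The following is a more compact variant that skips blank lines. It first creates 'data.txt' with made-up values so that it can run on its own.

```python
# Create a small comma separated file (hypothetical values) so the sketch can run
with open("data.txt", "w") as f:
    f.write("1.0,2.5,3\n4,5.5,6\n\n")

data = []
with open("data.txt") as f:
    for line in f:
        line = line.strip()
        if not line:
            # Skip blank lines, which float() could not convert
            continue
        # A list comprehension replaces the inner loop
        data.append([float(word) for word in line.split(",")])
print(data)
```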

It is easier to read and write CSV format files using functions from the Python standard library csv module written specifically to do this. The following is an example of reading some numeric data:

import csv
f = open('data.csv', newline='')
reader = csv.reader(f, quoting=csv.QUOTE_NONNUMERIC)
for row in reader: # Each row is a list of values
    for value in row:
        print(value) # Unquoted values have been converted to floats
f.close()

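When reading, the keyword argument 'quoting=csv.QUOTE_NONNUMERIC' tells the reader to convert every unquoted value into a float, raising a ValueError if one cannot be converted; quoted values are left as strings. When the same option is used with a writer, it puts quotation marks around all non-numeric values.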
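The following sketch shows both sides of csv.QUOTE_NONNUMERIC by writing some made-up rows to an in-memory stream and reading them back.

```python
import csv
import io

# Hypothetical rows mixing text and numbers
rows = [["site", 1.5, 2], ["river", 3.0, 4]]

buffer = io.StringIO(newline="")
# With QUOTE_NONNUMERIC, the writer quotes every non-numeric value
csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC).writerows(rows)
print(buffer.getvalue())

buffer.seek(0)
# With QUOTE_NONNUMERIC, the reader converts every unquoted value to a float
parsed = list(csv.reader(buffer, quoting=csv.QUOTE_NONNUMERIC))
print(parsed)
```

Note that the integers 2 and 4 come back as the floats 2.0 and 4.0.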

The following is an example of using the csv module to write data:

import csv
f = open('data.csv', 'w', newline='')
writer = csv.writer(f, delimiter=' ')
for row in data:
    writer.writerow(row) # List of values.
f.close()

The optional delimiter kwarg specified here delimits using a space ' ' instead of the default comma ',', so this would actually not generate CSV format data!

The following code uses the standard library json module to read a JSON file:

import json
f = open('data.json')
data = json.load(f)
f.close()
print(data)

The following code can be used to write a JSON file:

import json
f = open('data.json', 'w')
json.dump(data, f)
f.close()
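The following sketch writes some made-up data with json.dump and reads it back with json.load. It uses the optional indent argument, which makes the file easier for people to read.

```python
import json

# Hypothetical data to save
data = {"n_agents": 10, "positions": [[3.5, 7.0], [1.0, 2.5]]}

with open("data.json", "w") as f:
    # indent=2 spreads the output over several indented lines
    json.dump(data, f, indent=2)

with open("data.json") as f:
    loaded = json.load(f)
print(loaded == data)
```

JSON has no tuple type, so tuples are written as arrays and come back as lists. That is why the positions here are lists.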

The standard library includes modules for processing HTML and XML (for example html.parser and xml.etree.ElementTree), but the third-party Beautiful Soup package is arguably easier to use, and is used later in the course to parse some HTML.
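The following is a minimal sketch of the standard library html.parser module collecting the link targets from some HTML. The class name LinkCollector and the HTML snippet are made up for the example. HTMLParser calls handle_starttag for each opening tag it finds.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical example class)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


parser = LinkCollector()
parser.feed('<p>See <a href="https://www.python.org">Python</a> and <a href="docs.html">docs</a>.</p>')
print(parser.links)
```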

7. Serialisation/Deserialisation

Serialisation is the conversion of program data (objects in memory) into a form that can be stored, typically in a file. Deserialisation is the opposite process, which converts the stored data back into program objects. Most Python objects can be serialised and later deserialised, although some, such as open file objects, cannot. For details on how to do this, see the standard library pickle module.
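The following sketch round-trips some made-up program state through the pickle module. Pickle files are binary, so they are opened with 'wb' and 'rb'. The pickle documentation warns that loading pickled data can execute arbitrary code, so only data from trusted sources should be unpickled.

```python
import pickle

# Hypothetical program state to save
state = {"n_agents": 10, "positions": [(3.5, 7.0), (1.0, 2.5)], "seed": 42}

# Serialise: write the object to a binary file
with open("state.pickle", "wb") as f:
    pickle.dump(state, f)

# Deserialise: read the bytes back and rebuild the object
with open("state.pickle", "rb") as f:
    restored = pickle.load(f)

print(restored == state)
```

Unlike JSON, pickle preserves Python-specific types such as tuples.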

8. OS and File Systems

The standard library os module allows interaction with the underlying computer operating system (OS), including manipulating 'environment variables' and navigating the file system.

Environment variables are variables at the OS level. The mapping object 'os.environ' allows for accessing environment information from a Python program. For example, the following will print the PATH:

import os
print(os.environ["PATH"])
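os.environ behaves like a dictionary of strings. The following sketch uses get to supply a default when a variable may be unset, and then sets a variable. HOME is typical on Linux and macOS and USERPROFILE on Windows. N_AGENTS is a hypothetical variable name used only for this example.

```python
import os

# get() returns a default instead of raising KeyError if a variable is unset
home = os.environ.get("HOME", os.environ.get("USERPROFILE", "unknown"))
print(home)

# Setting a value affects this process (and any child processes it starts),
# not the shell that launched Python
os.environ["N_AGENTS"] = "50"
n_agents = int(os.environ["N_AGENTS"])  # values are always strings
print(n_agents)
```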

And the following will print a list of every file and directory in the current working directory:

print(os.listdir(path='.'))

The standard library pathlib module helps with handling file paths and managing file systems, representing paths as objects rather than plain strings.
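The following sketch shows some common pathlib operations. The directory name 'results' and file name 'summary.txt' are hypothetical. Paths are joined with the / operator, which produces the right separator on each operating system.

```python
from pathlib import Path

# Build a path with the / operator (hypothetical directory and file names)
out_dir = Path("results")
out_dir.mkdir(exist_ok=True)  # no error if the directory already exists
out_file = out_dir / "summary.txt"

# Path objects provide convenience methods for simple reads and writes
out_file.write_text("n_agents,50\n")
print(out_file.read_text(), end="")

# Parts of the name are available as attributes
print(out_file.suffix, out_file.stem, out_file.exists())

# Wildcard matching within a directory
print(sorted(p.name for p in out_dir.glob("*.txt")))
```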

The standard library tempfile module is useful for creating temporary files.
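The following sketch uses tempfile.TemporaryDirectory, which creates a directory that is deleted together with its contents when the 'with' block ends. The file name 'scratch.txt' is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scratch.txt")
    with open(path, "w") as f:
        f.write("intermediate results\n")
    existed_inside = os.path.exists(path)

# After the block the directory and everything in it have been removed
existed_after = os.path.exists(path)
print(existed_inside, existed_after)
```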

The standard library shutil module is useful for copying files and directory structures.
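The following sketch copies a single file with shutil.copy and a whole directory with shutil.copytree. It works inside a temporary directory so that nothing is left behind. The directory and file names are hypothetical.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Hypothetical source directory containing one file
    src = os.path.join(base, "inputs")
    os.mkdir(src)
    with open(os.path.join(src, "a.in"), "w") as f:
        f.write("1,2,3\n")

    # shutil.copy copies a single file
    shutil.copy(os.path.join(src, "a.in"), os.path.join(base, "a_copy.in"))

    # shutil.copytree copies a whole directory structure;
    # by default the destination must not already exist
    shutil.copytree(src, os.path.join(base, "inputs_backup"))

    copied = sorted(os.listdir(base))
    backup_contents = os.listdir(os.path.join(base, "inputs_backup"))
print(copied)
```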

The standard library glob module finds file and directory names that match shell-style wildcard patterns such as '*.csv'.
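The following sketch creates some empty files with made-up names in a temporary directory and uses glob.glob to list only those ending in '.csv'. glob does not promise any particular order, so the results are sorted.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create some hypothetical empty files to match against
    for name in ["run1.csv", "run2.csv", "notes.txt"]:
        open(os.path.join(tmp_dir, name), "w").close()

    # '*' matches any sequence of characters in a file name
    pattern = os.path.join(tmp_dir, "*.csv")
    matches = sorted(os.path.basename(p) for p in glob.glob(pattern))
print(matches)
```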