Why code cannot be written in simple natural language
This is a question I remember asking myself when I was first introduced to computer programming. The basic description of programming is that it is a way of communicating with a computer to get some desired output. Yet when we look at a programming language, it seems too complex to be a means of communication, even with a computer. Just as with mathematics, however, a programming language's technicality is what makes it such a powerful form of communication. In this post, I will cover some basic reasons why a programming language cannot be made to be like natural language.
First of all, it is important to understand what is meant by natural language. Natural language refers to how humans communicate and express themselves through spoken and written words. Examples include languages spoken across the globe such as English, Chinese, Ndebele, Venda, Zulu, Spanish, etc. They are called “natural” languages because, unlike “artificial” languages (languages created for a specific purpose), they evolve organically over time within a particular culture or community. What the title is really asking, then, is: “Why not just write code in English (since it is the dominant language for several reasons) or in any other language?” There are several reasons why that is not possible (well, not strictly impossible, but infeasible).
One of the most important things a freshman computer science student has to learn is digital electronics, the branch of electronics that deals with digital signals and circuits. These circuits are the bedrock of computational devices like calculators, laptops, smartphones, etc. Digital electronics shows us how electronic devices process, store and transmit information in the form of discrete, binary signals, or bits. This simply means that the hardware itself only works with information in that form, which we usually write as a stream of 1s and 0s. At the very lowest level of a machine, there is nothing but 1s and 0s. It is easy to see why this makes instructing a computer directly so difficult for a programmer. To put this into context, consider the different ways in which we can instruct the computer to print out a statement as simple as “Hello World”.
The first code snippet is written in Python, but it spells out, bit by bit, what the computer actually stores for the greeting (here with a comma, an exclamation mark and a trailing newline, i.e. “Hello, World!”). In ASCII, H is 01001000, e is 01100101, and so on. Compare this snippet with the ones that follow to see how infeasible it is to write code in this manner.
print(chr(int('01001000', 2)), end='')
print(chr(int('01100101', 2)), end='')
print(chr(int('01101100', 2)), end='')
print(chr(int('01101100', 2)), end='')
print(chr(int('01101111', 2)), end='')
print(chr(int('00101100', 2)), end='')
print(chr(int('00100000', 2)), end='')
print(chr(int('01010111', 2)), end='')
print(chr(int('01101111', 2)), end='')
print(chr(int('01110010', 2)), end='')
print(chr(int('01101100', 2)), end='')
print(chr(int('01100100', 2)), end='')
print(chr(int('00100001', 2)), end='')
print(chr(int('00001010', 2)), end='')
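The bit patterns used in the binary version do not have to be looked up by hand. As a minimal sketch, assuming plain ASCII text (the helper name `to_bits` is made up for illustration), Python can derive them from a string and turn them back into text:

```python
def to_bits(text):
    """Return the 8-bit binary pattern of each character in an ASCII string."""
    return [format(ord(ch), "08b") for ch in text]


bits = to_bits("Hello")
print(bits)

# Reverse the mapping: read each pattern as a base-2 integer, then as a character.
decoded = "".join(chr(int(pattern, 2)) for pattern in bits)
print(decoded)
```

The round trip shows that the binary form carries exactly the same information as the text; it is just far less convenient for a human to read or write.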
The snippet below is assembly language (x86 architecture) code for printing “Hello World”. Although writing assembly language is easier than hard-coding binary as in the first snippet, it is still difficult. Assembly language lets programmers use mnemonic codes to represent machine instructions, which provides a more human-readable abstraction over the binary instructions understood by the hardware.
section .data
    hello db 'Hello World', 0

section .text
    global _start

_start:
    ; write syscall
    mov eax, 4
    mov ebx, 1
    mov ecx, hello
    mov edx, 11
    int 0x80

    ; exit syscall
    mov eax, 1
    xor ebx, ebx
    int 0x80
The following snippets are written in C++, Java and Python respectively. We can see how much less complex they are than the two snippets above.
#include <iostream>

int main() {
    std::cout << "Hello World" << std::endl;
    return 0;
}

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}

print('Hello World')
It is because of something called abstraction that we have overcome the complexity of machine language: high-level programming languages like C++, Java and Python hide the binary details behind readable instructions. It is quite understandable to feel that even these high-level languages are still complex and to ask: “Why not keep evolving, as we did from lower-level languages, towards languages that are more readable and as close as possible to natural language?”
1. Ambiguity
Natural language is open to interpretation, and words can have different meanings. Context usually has to be established before a word can be understood in the intended sense, and when that context is missing, interpretation leads to errors and confusion. In programming, precise and unambiguous instructions are not just desirable; they form the very basis of programming. They are necessary so that the computer can understand and execute exactly the desired actions.
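As a small illustration (the English request here is a made-up example), the phrase “add 2 and 3 times 4” can be read in two ways. In Python, each reading has to be written differently, so the ambiguity disappears:

```python
# Natural-language request: "add 2 and 3 times 4"

# Reading A: add 2 to the product of 3 and 4.
reading_a = 2 + 3 * 4

# Reading B: add 2 and 3 first, then multiply the sum by 4.
reading_b = (2 + 3) * 4

print(reading_a, reading_b)  # prints: 14 20
```

A human listener would have to guess or ask which reading was meant; the operator precedence rules and parentheses leave the computer nothing to guess.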
2. Lack of Strict Syntax and Semantics
Natural languages do not follow a strict syntax or grammar the way programming languages do. The meaning of a natural-language sentence depends on context, whereas in a programming language a statement means exactly one thing, because the language has rules and structures that pin down both its form (syntax) and its meaning (semantics). If the syntax is not specific, the computer will have a hard time interpreting and executing the code.
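The strictness of the grammar can be observed directly. The sketch below uses Python's standard `ast` module to check whether a piece of text is syntactically valid Python; the helper name `is_valid_python` and the sample strings are only illustrative:

```python
import ast


def is_valid_python(source):
    """Return True if the text parses as Python, False if the grammar rejects it."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


well_formed = "print('Hello World')"
missing_paren = "print('Hello World'"        # one closing parenthesis short
plain_english = "please print Hello World"   # readable to us, not to the parser

for text in (well_formed, missing_paren, plain_english):
    print(repr(text), is_valid_python(text))
```

A human reader would happily accept all three lines as the same request. The parser accepts only the first, because it does not guess at intent.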
3. Machine Interpretation
As already stated, computers only understand machine language. To allow humans and computers to interact, there was a need for a language that a computer can interpret correctly while still being readable by humans. That language is the programming language. It follows that the closer we get to human language, the harder the computer has to work to translate our instructions into something it can execute, and this extra work is called overhead, which can prove detrimental to computational processes. Considering our code snippets once more, the easiest one to reproduce is the Python code. Though this is favourable for the user, it is not so favourable for the computer: Python, being a high-level language, sacrifices some low-level control and optimization opportunities for ease of use and productivity.
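To get a feel for the translation work hidden behind a single readable line, the standard `dis` module can list the bytecode instructions that CPython produces for it. This is only a sketch: the exact instruction names differ between Python versions, and bytecode is still an intermediate form run by the Python interpreter, not machine code.

```python
import dis

# Compile one high-level statement and list the lower-level steps behind it.
code = compile("print('Hello World')", "<sketch>", "exec")
instructions = [instr.opname for instr in dis.get_instructions(code)]

for name in instructions:
    print(name)
```

Even this one-line program expands into several steps (loading the name `print`, loading the string, calling the function, and so on), each of which the interpreter must in turn carry out using many machine instructions.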
4. Consistency and Efficiency
Programming languages were designed to promote the consistency and efficiency of code. They partly achieve this by providing mechanisms for code reuse, modularization and optimization. Natural language lacks the systematic structure and tools required for these purposes, making it difficult to write code that is maintainable, scalable, and efficient.
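Code reuse is perhaps the simplest of these mechanisms to show. In this minimal sketch (the function `greet` and the names passed to it are invented for illustration), a greeting is defined once and then reused instead of being rewritten for every case:

```python
def greet(name):
    """Build a greeting; defined once and reused by every caller."""
    return f"Hello {name}"


greetings = [greet(name) for name in ("World", "Ada", "Linus")]

for line in greetings:
    print(line)
```

If the wording of the greeting ever needs to change, it changes in one place, which is the kind of maintainability that free-form natural language has no built-in mechanism for.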
These are some of the main reasons why coding cannot be done in natural language. There have been efforts to develop simplified programming environments and natural language interfaces, but they often rely on a specific set of predefined commands or restrict the range of expressiveness, which limits their applicability to general-purpose programming tasks.
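That limitation is easy to reproduce. The toy interface below is a hypothetical sketch, not a model of any real product: it accepts two fixed English phrasings (the `COMMANDS` table and the `run` function are invented for this example) and has no answer for anything outside them.

```python
import re

# Hypothetical command set: each pattern maps one fixed phrasing to one action.
COMMANDS = [
    (re.compile(r"^print (?P<text>.+)$", re.IGNORECASE),
     lambda m: m.group("text")),
    (re.compile(r"^add (?P<a>\d+) and (?P<b>\d+)$", re.IGNORECASE),
     lambda m: str(int(m.group("a")) + int(m.group("b")))),
]


def run(sentence):
    """Return the result of the first matching command, or None if none match."""
    for pattern, action in COMMANDS:
        match = pattern.match(sentence.strip())
        if match:
            return action(match)
    return None


results = [
    run("print Hello World"),
    run("add 2 and 3"),
    run("could you sum two and three for me?"),
]
print(results)  # prints: ['Hello World', '5', None]
```

The third request means the same thing as the second to a human reader, yet the interface cannot handle it. Covering every such variation would amount to defining a strict grammar, which is exactly what a programming language already is.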