Duplicate of: Defend zero-based arrays

Why did the inventors of computer programming have the brilliant idea of making arrays start at index zero, leaving us all writing myArray.length()-1, when they knew the logical thing would be to have the index start from 1?

Is it some sort of backwards compatibility that makes all (even modern) languages start arrays from 0, or is there actually some logic behind this?


Excerpt from Wikipedia:

The zero-based array is more natural in the root machine language and was popularized by the C programming language, where the abstraction of array is very weak, and an index n of a one-dimensional array is simply the offset of the element accessed from the address of the first (or "zeroth") element (scaled by the size of the element).

So basically it's because index math is easier given a zero lower bound.

Accepted answer (52 votes):

Specifically to annoy you.


How often do you really have to write myArray.length()-1? I find it very rare that I need to do that.

If you think that we all naturally start everything at 1, ask yourself:

  • What's the first minute of each hour?
  • How old are we during our first year of life?

There are counterexamples to this, of course (day of month, month of year, all kinds of stuff); my point is that it's not as obvious or natural as you seem to think. By contrast, in mathematical and computing terms it's absolutely natural to use 0, for all the reasons given in other answers.


This is nearly a duplicate of this question: http://stackoverflow.com/questions/393462/defend-zero-based-arrays

There are some good answers and links to other articles there; in particular, the one called "Why numbering should start at zero" by Dijkstra.


I'm not an expert, but I believe that in the old days arrays were stored in contiguous memory blocks, and the array index was also the offset used to find a particular element. So if you add 0 bytes to the array's memory location, you get the first element. Adding 1 (byte, word, dword, etc.) gets the memory location of the 2nd element, and so on.


It is because an array is essentially a pointer to a memory location. Therefore, if the array actually contains a pointer to memory_location_1, the array + 0 bytes will point to the first element in the array.

Furthermore, if the array is a 32-bit integer array, array + 1 * 4-bytes ( array[1] ) will point to memory_location_1 + 4 bytes, thus landing you at the second element in the array.

My response is very brief and may be difficult to understand if you're new to programming. The following tutorial is a bit more in-depth, but it may require some general knowledge of C++: http://www.cplusplus.com/doc/tutorial/arrays.html


I disagree that it's not logical...

it makes constructs like

for (int i = 0; i < array.Length; i++)

prettier than the alternative

for (int i = 1; i <= array.Length; i++)

Also, in the old days of C (and C++)

*(array + offset) is equivalent to array[offset] for zero-based arrays.

for one-based arrays, this would be

*(array + offset) -> array[offset + 1]


Because each byte consists of a series of 0s and 1s, and the lowest non-negative number representable is 0. So why waste a perfectly valid number?

But besides that, there are languages that let you start an array at 1, or 13, or 42, whichever you like; it really does not make a difference.

It's just that in programming it is commonly accepted that we start counting at 0.


Obligatory Quote:

Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration.
- Stan Kelly-Bootle


Simple answer: the index value is "how far do I move from the start?"
(See pointer arithmetic.)

The first item is at the start, so you don't need to move...
The second item is one step further on...
and so on.


The reason is that array access (like a[i]) can simply add i*sizeof(element) to the starting address. It's (slightly) faster than having to adjust it by 1 first.


Maybe because 0 is the first number on the non-negative number line. When you count something you actually start at 0; you just don't say it out loud. Despite what you seem to be claiming, the system starts at 0 even if you never use it.


Because if they started at 1, then the variable storing the array index would never be zero. This is wasteful.


I would typically answer this with the "offset from an index in memory" response but I'm pretty sure that's not what you're looking for.

How about this speculation? The natural numbers are the set of non-negative integers (especially in math/CS; some disciplines define them as the positive integers), {0, 1, 2, ...}, so the math geniuses defining the "older" languages stuck with what they knew.


It has nothing to do with performance. If arrays started at index 1 (which they do in some languages), the compiler could just shift the address of the array down by one element. In fact, you could do this yourself in C or C++ (although it's technically undefined behavior, and not recommended):

int zero_array[10];
int *one_array = zero_array - 1; // This is the address a 1-based language would use

assert(&zero_array[0] == &one_array[1]);

I prefer arrays that start with zero, because it's more convenient. Generally, ranges are easier to work with when they are half-open, i.e. start <= index < end.


I don't know the exact answer, but to me it makes sense because unsigned integers start at 0. So why waste a number? It also makes the underlying memory calculations easier.


If this really bothers you, use Pascal. Arrays can start with whatever you want. The Wikipedia link I gave shows an example of an array starting with 1.


The array index is the number of memory addresses that "cell" of the array is offset from the start of the array.

To get an element out of an array, all you need is the start location and the number of "spaces" to move - the index.

So, array[5] is five steps past the start of the array, or element 6, and array[0] is just the start of the array.

And yes, modern languages all really work like this. Start indexes other than zero are just syntactic sugar. (Snazzy syntactic sugar, though.)


Because everything that I know would collapse if indices didn't start at zero.

I've been doing them that way for over 20 years; changing them now would set me back irreparably.


Because arrays in C are pointer arithmetic. myArray[0] indicates the start of the array, as does &myArray.

Also see this answer


It goes way back to the roots. If you think you hate that, take a look at this, because that's how we used to do it in the old days: http://en.wikipedia.org/wiki/Assembly_language#Assembler

It's also where "0-based arrays" come from. It's not all that bad, really. It takes some getting used to, but it is by far not the most irritating of all language features.

And yes, you are right: it could easily be 1-based; the compiler could solve that for you, because a computer language is just a "presentation layer" on top of some machine-code generating process.


I agree with the answers of CodeMonkey1, Gamecat and mwigdahl

However, I'd like to add that not all languages start at zero; Ada starts at 1. And this behaviour is not just for arrays: enums in Ada also start at 1, whereas languages like C start their enums at 0 by default.


Zero-indexing is more economical due to the way numbers are represented in binary digital computers. If you have a byte available for storing an index, you can store 256 indices using the range 0-255 but only 255 indices using 1-255.

As far as high-level programming languages go, I program about equally much in 0-based and 1-based languages and I can't say there's an advantage to either. I'm actually probably somewhat less prone to commit off-by-1 errors when working with 1-based indices, but that's by a small margin.


VB also uses 1-based arrays.

This is a good example of why VB is an even more modern language than C#. The language designers really improved the usability of the language and dumped a lot of the ancient legacy cruft that C# continues to carry.

I think the compiler (and IDE) should work for me rather than me work for the compiler.