The Wide World of Coding: The People and Careers behind the Programs - page 8

Companies such as Amazon, which have hundreds of millions of customers, take
collecting and storing information seriously. Amazon tracks virtually every customer
behavior: searches, items viewed, items purchased, time spent considering each item,
payment methods, shipping decisions, returns, reviews, and even every word highlighted
in a Kindle book.
Amazon could arrange all of that data in one massive spreadsheet, adding a
new line to the table for each item a customer views. This sample table shows just a
few of the columns a spreadsheet would need to track customer behavior. The real
spreadsheet would need thousands of columns and would be trillions of lines long.
First
Name
Last
Name
Birthdate Address
Phone
Date
Item ID Time Viewed Qty. Payment Method
Jane Doe 07/01/2011 123 Oak St. 123-456-7890 10/23/19 125730 12:30 p.m.
1 Visa ending in 1234
Jane Doe 07/01/2011 123 Oak St. 123-456-7890 10/23/19 257103 12:32 p.m.
4 Visa ending in 1234
Jane Doe 07/01/2011 123 Oak St. 123-456-7890 10/24/19 967234 12:40 p.m.
0
Juan Garcia 01/01/1972 321 Elm Rd. 987-654-3210 11/04/18 190375 7:14 a.m.
1 Gift card 5678
Juan Garcia 01/01/1972 321 Elm Rd. 987-654-3210 11/04/18 089264 7:17 a.m.
3 Gift card 5678
Juan Garcia 01/01/1972 321 Elm Rd. 987-654-3210 11/04/18 375103 7:23 a.m.
1 Visa ending in 9999
Over the years, an Amazon customer may view thousands of items. Listing their
name, address, and phone number on each of those thousands of rows would waste a lot
of storage space. Repeating that information also makes updates and corrections difficult.
What if Jane Doe was born in 2001 but accidentally entered 2011 when she made her
first purchase? If Amazon stored all of its data in one giant spreadsheet, it would have to
correct her birthdate in a separate row for every item she had ever viewed. That correction
would be time-consuming and could easily miss some of the thousands of times the wrong
date appeared. Amazon would end up with an enormous, mistake-filled spreadsheet.
To prevent that problem, developers design databases to avoid repeated
information. Instead of creating one giant spreadsheet, they create several separate
tables, such as a customer table, an order table, and a shipment table. Jane Doe’s
identifying information would only be stored once, in a single row in the customer
table, so correcting her birthdate would only require one change. Databases that
minimize repeated information are called normalized databases. They are more
complicated than a single spreadsheet but require less storage space and make
updates easy.
58
THE WIDE WORLD OF CODING
1,2,3,4,5,6,7 9,10,11,12,13,14
Powered by FlippingBook