I am wondering if there is some sort of “standard” for storing US addresses in a database? It seems this is a common task, and there should be some sort of a standard.
What I am looking for is a specific schema of how the database tables should work and interact, already in third normal form, including data types (MySQL). A good UML document would work.
Maybe I’m just being lazy, but this is a very common task, and I am sure someone has published an efficient way to do this somewhere. I just don’t know where to look and Google isn’t helping. Please point me to the resource. Thanks.
Although this is more of a general question, I would like to clarify my specific needs.
Addresses will be used to specify road addresses of locations of events. These addresses will need to be in a format that can be best broken down and searched, and also used by any third-party applications I may end up linking my data source to.
ALSO. Data will be geo-coded (long, lat) on entry and stored separately, so it must fit the (yet undecided) protocol of whatever geocoder / application / library does that.
http://www.upu.int has the format standards for international addresses. Publication 28 at http://usps.com has the U.S. format standards. CASS software like http://semaphorecorp.com validates and standardizes U.S. addresses.
The USPS wants the following unpunctuated address components concatenated on a single line:
* house number * predirectional (N, SE, etc) * street * suffix (AVE, BLVD, etc) * postdirectional (SW, E, etc) * unit (APT, STE, etc) * apartment/suite number
Eg, 102 N MAIN ST SE APT B.
If you keep the entire address line as a single field in your database, input and editing is easy, but searches can be more difficult (eg, in the case SOUTH EAST LANE is the street EAST as in S EAST LN or is it LANE as in SE LANE ST?).
If you keep the address parsed into separate fields, searches for components like street name or apartments become easier, but you have to append everything together for output, you need CASS software to parse correctly, and PO boxes, rural route addresses, and APO/FPO addresses have special parsings.
A physical location with multiple addresses at that location is either a multiunit building, in which case letters/numbers after units like APT and STE designate the address, or it’s a Commercial Mail Receiving Agency (eg, UPS store) and a maildrop/private mailbox number is appended (like 100 MAIN ST STE B PMB 102), or it’s a business with one USPS delivery point and mail is routed after USPS delivery (which usually requires a separate mailstop field which the company might need but the USPS won’t want on the address line).
A contact with more than one physical address is usually a business or person with a street address and a PO box. Note that it’s common for each address to have a different ZIP code.
It’s quite typical that one business transaction might have a shipping address and a billing address (again, with different ZIP codes). The information I keep for EACH address is:
* name prefix (DR, MS, etc) * first name and initial * last name * name suffix (III, PHD, etc) * mail stop * company name * address (one line only per Pub 28 for USA) * city * state/province * ZIP/postal code * country
I typically print mail stops somewhere between the person’s name and company because the country contains the state/ZIP which contains the city which contains the address which contains the company which contains the mail stop which contains the person. I use CASS software to validate and standardize addresses when entered or edited.
First, as a person who spend most of there professional day working with addresses, they are hard to manage from a data perspective.
If you ask 5 people what address they live at; you will find that you get 5 different answers. While you and I can tell that 123 Main Street Apt 1 and Apt 1 123 Main Street
are the same address, the database program will have a challenge.
If you are using United States centric addresses CASS certified software from almost any vendor will standardize your addresses reasonably well. I would recommend a simple format as follows:
- Address 1
- Address 2
- Address 3
- Zip+4 (I would carry this so lookups are easier when checking for duplicates)
However, if you want a universal address I would look at the ADIS standard from IdeaAlliance. This standard can be used to breakdown (parse) addresses from almost any country into the relevant parts. Then they can be put back together using templates/components based on the Universal Postal Union standards (UPU S42 Standard on International Postal Address Components and Templates).
The big plus of this format is that addresses that dont exist in a postal database like CASS can be entered and stored as separate parts.
Addresses are messy – at best.
It partly depends on what you want to do with the addresses. If you’re going to use them to mail thing to people, then you simply need to record the image that will appear on the address label in a convenient form. If you’re going to analyze the address, you have to work a lot harder.
Remember that the first time you have to deal with someone outside the US, all previous rules go astray. You may be strictly US-only, but beware.
I looked into this a while ago, but for international addresses. I didn’t find much in the way of a consensus. However, for the US, I found the succinctly named United States Thoroughfare, Landmark, and Postal Address Data Standard (Draft):
I don’t think that they actually provide any specific database schema ideas, but it might be a good starting point.
First, the “best” means of storing an address depends greatly on how it will be used. Is it just for reference or searches on say city? Do you plan on addressing envelopes? Are you going to integrate with a shipping system like FedEx or UPS? Will you store non-US addresses? Once you get into the realm of integrating with something that ships, you should start looking at CASS. This is a specification for handling the USPS addresses. There are applications out there that are CASS certified which will store and verify addresses. Thus, the second best practice would be to try to avoid reinventing the wheel and see if there is a system out there that will solve your problem especially if you are going to go international. You want to leverage the fact that someone else has worked out all the details about how to properly and efficiently store addresses for many countries around the world instead of having to do that investigation yourself.
I’ve had to try to do this before and I’d found this document that gives you some pointers. I ended up shelving my schema since my application does have to deal with international addresses.