
Glossary of Common Terms

This glossary is adapted from a variety of sources, such as the Open Data Handbook and Data.gov. We will curate it over time.


API

An application programming interface (API) is a set of instructions and standards that an application program uses to communicate with the operating system or with another control program, such as a database management system.

 

Application

An application or app is a software program that is designed to connect to large databases and often provides real-time information on a computer, mobile phone, and other similar platforms. 

 

Attribution

Acknowledging the source of data when consuming or re-publishing it. Data licenses may include this requirement when publishing open data. 

 

Beta

A software platform released in beta is a preview of a website's full functionality. While in beta, development and data curation will continue to optimize usability. 

 

Bulk

If an entire dataset can be downloaded easily and efficiently to a user's own system, then it is considered to be available in bulk.

 

CKAN

The Comprehensive Knowledge Archive Network (CKAN) is an open source data management system for storage and distribution of data across the web, built and maintained by Open Knowledge. It serves as the official data publishing platform for about 20 national governments and powers the data publishing efforts of a variety of local, community, and scientific organizations. For more information click here.

 

Copyright

A right for the creators of creative works to restrict others’ use of those works. A copyright holder is entitled to determine how others may use or restrict use of that work through a license. 

 

CSV

A comma-separated values (CSV) file is a standard open format that is commonly used to publish open data in spreadsheet-like form. CSV files are commonly opened with applications such as Microsoft Excel.
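As an illustration, the sketch below reads a small CSV with Python's standard csv module. The sample text and its column names (neighborhood, total_cars) are made up for this example and do not come from any published dataset.

```python
import csv
import io

# Hypothetical sample data; the column names are invented for illustration.
sample = "neighborhood,total_cars\nDorchester,120\nRoxbury,85\n"

def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(sample)

# Every CSV value arrives as a string, so convert before doing arithmetic.
total = sum(int(row["total_cars"]) for row in rows)
print(total)
```

Each row becomes a dictionary, which is the same "columns are variables, rows are values" layout a spreadsheet application shows.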

 

Data

Data can be thought of as facts, statistics, or other values systematically collected for reference, analysis, and calculation. Data becomes informative when it's structured, often combined with other data, and analyzed to extract meaning. 

 

Data catalog

An authoritative listing of available open data organized in a manner that makes it easy to search and navigate. 

 

Dataset

An organized collection of data commonly presented in a spreadsheet (where columns represent variables and each row contains values for those variables) or in the form of a map.

 

File format

The format of a file is usually indicated by the last part of the file name, called the extension. For example, a CSV file could be called totalcars.csv. Other types of file formats found on the Boston Open Data Hub are KML, JSON, and GeoJSON.
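A minimal sketch of reading the extension from a file name is shown below. The FORMATS mapping and the guess_format helper are hypothetical names introduced for this example, and an extension is only a hint about a file's contents, not a guarantee.

```python
from pathlib import Path

# Assumed mapping for illustration only; it covers the formats named above.
FORMATS = {
    ".csv": "CSV",
    ".kml": "KML",
    ".json": "JSON",
    ".geojson": "GeoJSON",
}

def guess_format(filename):
    """Return a format name based on the file extension, or 'unknown'."""
    extension = Path(filename).suffix.lower()
    return FORMATS.get(extension, "unknown")

print(guess_format("totalcars.csv"))
```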

 

GeoJSON

GeoJSON is an open standard designed to represent geographical features, along with their non-spatial attributes, based on JavaScript Object Notation (JSON). This format was written and is maintained by an Internet working group of developers.
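For illustration, the sketch below builds a single GeoJSON Feature as a Python dictionary and round-trips it through JSON text. The coordinates and the name property are example values, not taken from a published dataset. GeoJSON lists coordinates as longitude first, then latitude.

```python
import json

# A single point feature; the values are examples chosen for this sketch.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [-71.0589, 42.3601],  # [longitude, latitude]
    },
    # Non-spatial attributes live under "properties".
    "properties": {"name": "Example location"},
}

text = json.dumps(feature)   # serialize to GeoJSON text
parsed = json.loads(text)    # parse it back into Python objects
lon, lat = parsed["geometry"]["coordinates"]
print(parsed["properties"]["name"], lon, lat)
```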

 

Geospatial

Geospatial refers to data that has a geographic component to it such as coordinates, address, city, or zip code.

 

Hackathon

A social event that brings together programmers, subject experts, and advocates to share information and work together to build applications, visualizations, or prototypes, often to address an issue or a set of inter-related issues.

 

HTML

HyperText Markup Language (HTML) is a language that describes the skeletal structure of a webpage. Internet browsers read HTML to render the contents of a webpage for a user.

 

JSON

JavaScript Object Notation (JSON) is a lightweight data-interchange format. It is a text format that is completely language independent but uses conventions familiar to programmers of the C family of languages, including C, C++, and C#.
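As a small sketch, the example below parses JSON text with Python's standard json module and writes it back out. The record and its field names are hypothetical.

```python
import json

# Hypothetical record; the field names are invented for illustration.
text = '{"dataset": "totalcars", "rows": 2, "tags": ["transportation"], "public": true}'

record = json.loads(text)  # JSON text -> Python dict

# JSON types map onto native Python types: object -> dict, array -> list,
# number -> int or float, true/false -> bool, null -> None.
row_count = record["rows"]
first_tag = record["tags"][0]

# Serialize back to text; sort_keys makes the output order predictable.
round_trip = json.dumps(record, sort_keys=True)
print(round_trip)
```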

 

KML

Keyhole Markup Language (KML) is an XML-based language for managing the display of three-dimensional data in applications such as Google Earth. KML was developed for use with Google Earth and is accepted as an Open Geospatial Consortium standard.

 

Legacy

A system that will be superseded by another platform that is more up-to-date. 

 

License

A license accompanies the publication of a dataset to convey how a user can use or reference the data. Boston open datasets are published with the Open Data Commons Public Domain Dedication and License (PDDL).

 

Machine Readable 

Information or data that is in a format that can be easily read or processed by a computer without human intervention. Machine-readable data must be structured and is often found in CSV, JSON, and XML file formats.

 

Metadata

Provides descriptive information about data to give it context. Descriptive elements such as title, description, publisher, and license information are important to the discovery and usability of the data that is published. 

 

Open Data

Open data is the proactive release of government-collected data that is made publicly available through an open license so that citizens can freely access, reuse, and redistribute it. From a technical perspective, open data is available in machine-readable file formats and allows users to download data related to government operations or service delivery in bulk.

 

Open Government

Open government is a governing principle that serves to support transparency, collaboration, and engagement with the public by implementing policies and utilizing technology to emphasize the sharing of government information. 

 

Open source

Open source software is freely available to the public. Users are free to inspect it, modify it, and use the code for their own purposes. 

 

Public domain

No copyright exists over the work and users can utilize available data for their own purposes without restrictions. 

 

Public record

The Massachusetts General Laws broadly define public records to include all "books, papers, maps, photographs, recorded tapes, financial statements, statistical tabulations, or other documentary materials or data, regardless of physical form or characteristics, made or received by any officer or employee" of any Massachusetts governmental entity. Click here for more information.

 

Structured data

Structured data refers to data where structural relationships between elements are retained and stored on a computer disk. PDFs and word processing documents are not structured forms of data because their logical structure cannot be extracted automatically, or is nearly impossible to extract.

 

Tags

Keywords that help users discover datasets of interest.

 

Topics

An organizational framework to group like datasets together to provide context and meaning to a user.

 

URL

A uniform resource locator (URL), when used with HTTP, is a character string, or web address, that references a web page.
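To make the parts of a web address concrete, the sketch below splits a URL with Python's standard urllib.parse module. The address is a made-up example on the reserved example.org domain, not a real dataset link.

```python
from urllib.parse import urlsplit, parse_qs

# Made-up address for illustration; example.org is reserved for examples.
url = "https://example.org/dataset/totalcars?format=csv"

parts = urlsplit(url)
scheme = parts.scheme          # the protocol, here "https"
host = parts.netloc            # the server name
path = parts.path              # the resource on that server
query = parse_qs(parts.query)  # query parameters as a dict of lists

print(scheme, host, path, query)
```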

 

XML

Extensible Markup Language (XML) defines rules or standards for encoding content in a format that is easily readable by both humans and machines. XML can be used by any individual or group that wants to share information in a consistent way.
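As a brief sketch, the example below parses a small XML document with Python's standard xml.etree.ElementTree module. The element names (dataset, title, tag) are hypothetical and do not follow any particular published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are invented for illustration.
doc = (
    "<dataset>"
    "<title>Total cars</title>"
    "<tag>transportation</tag>"
    "<tag>vehicles</tag>"
    "</dataset>"
)

root = ET.fromstring(doc)  # parse the text into an element tree

title = root.findtext("title")                    # text of the first <title>
tags = [tag.text for tag in root.findall("tag")]  # text of every <tag>

print(title, tags)
```

The same nested tags that a person can read directly are what the parser walks, which is the sense in which XML is readable by both humans and machines.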

 

ZIP

A computer file whose contents are compressed to facilitate storage or transmission. It often carries the .zip file extension.
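The sketch below illustrates the idea with Python's standard zipfile module: it compresses some repetitive sample data in memory, compares sizes, and restores the original bytes. The file name totalcars.csv and the payload are example values.

```python
import io
import zipfile

# Repetitive sample data compresses well; the contents are made up.
payload = b"neighborhood,total_cars\n" + b"Dorchester,120\n" * 1000

# Write a zip archive into an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("totalcars.csv", payload)

compressed_size = len(buffer.getvalue())

# Read the archive back and extract the original bytes unchanged.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    names = archive.namelist()
    restored = archive.read("totalcars.csv")

print(len(payload), compressed_size)
```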