Creating Datasets and Resources
DKAN’s data publishing model is based on the concept of datasets and resources. A dataset is a container for one or more resources; a resource is the actual “data” being published, such as a CSV table, a GeoJSON data file, or a TIFF aerial image. The dataset and resource content types in DKAN are provided by the DKAN Dataset module.
In our example, we’ll be adding a dataset with Wisconsin polling places to a DKAN site. The data may look familiar; it’s one of the sample datasets provided with DKAN upon installation.
Step 1: Create the Dataset
By default, only authenticated (“logged-in”) users can add new Datasets and Resources to a DKAN website. The default DKAN user permissions allow Site managers, Editors, and Content Creators to access the administration menu. From there, a user can navigate to Content » Add Content » Dataset to open the “Create Dataset” form.
The Dataset is the container for the actual data resource files and contains basic information about the data, such as title, description, category tags, and license. Once we’ve entered information about the data, we can click the “Next: Add data” button to begin adding data.
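To make the container relationship concrete, here is a minimal Python sketch of a dataset that holds basic descriptive information plus a list of resources. The class and field names (`Dataset`, `Resource`, `data_format`, `add_resource`) and the sample values are hypothetical; they do not mirror DKAN's actual Drupal content types or field machine names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """A single published data file or link, e.g. a CSV table."""
    title: str
    data_format: str  # e.g. "csv", "geojson", "tiff"


@dataclass
class Dataset:
    """A container holding descriptive information plus one or more resources."""
    title: str
    description: str = ""
    tags: List[str] = field(default_factory=list)
    license: str = ""
    resources: List[Resource] = field(default_factory=list)

    def add_resource(self, resource: Resource) -> None:
        """Attach a resource to this dataset (roughly the "Next: Add data" step)."""
        self.resources.append(resource)


polling = Dataset(
    title="Wisconsin Polling Places",
    description="Polling places in the US State of Wisconsin.",
    tags=["elections"],
)
polling.add_resource(Resource(title="Polling places", data_format="csv"))
print(len(polling.resources))  # → 1
```

The dataset carries the shared description, while each resource only needs to describe its own file.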
Step 2: Add one or more Resources to the Dataset
After creating a dataset, we’re prompted to add one or more data resources to it. There are three types of Resources that can be added to a Dataset, depending on the type and location of the Resource:
|Upload:||This option allows publishers to upload data files to the DKAN site. As with the Remote file option, the data within the file will be imported into your DKAN site’s Datastore for preview and analysis by your users. See The DKAN Datastore for more information.|
|API or Website URL:||Some data resources aren’t standalone files but queryable online databases; the interface to these databases is known as an API. Adding links to these types of online database interfaces to your DKAN data catalog can be very useful for developers interested in working with your data.|
|Remote file:||This option allows publishers to create a link to a data file published on another Internet website. Although the file itself will remain on the other site, the data within the file can be imported into your DKAN site’s Datastore for preview and analysis by your users. See Datastore for more information.|
To provide previews for your resources, they must contain either a local or a remote file (the Upload or Remote file options). If you use the API or Website URL option, your link will be displayed in an iFrame, but no further previewing will be possible.
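Both file-based options rely on importing the file's rows into the Datastore, which is what makes preview and analysis possible. To illustrate what importing a CSV into a queryable table means, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in. It is not DKAN's Datastore implementation; the sample rows and the `import_csv` helper are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows; not taken from DKAN's real sample dataset.
SAMPLE_CSV = """name,city
Example Town Hall,Madison
Example Library,Madison
Example School,Green Bay
"""


def import_csv(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Load CSV text into a new table, storing every column as TEXT.

    Returns the number of rows imported. Table and column names are
    interpolated directly, so this sketch assumes trusted, simple identifiers.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if row]  # skip any blank lines
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
count = import_csv(conn, "polling_places", SAMPLE_CSV)
madison = conn.execute(
    'SELECT COUNT(*) FROM "polling_places" WHERE city = ?', ("Madison",)
).fetchone()[0]
print(count, madison)  # → 3 2
```

Storing every column as TEXT keeps the sketch short; a real import would likely need to detect numeric and date columns.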
To continue with our Wisconsin Polling Places example, we’ll add one resource file to the Dataset we created in Step 1. Our resource file is a CSV (comma-separated values) file, a popular format for exchanging tabular data. Let’s explore the fields on the Add Resource page:
|Resource / Choose File:||Upload a file from your local hard drive.|
|Resource / Recline Views:||DKAN’s “Data Preview” feature allows visitors to preview published data in three views: grid, graph, and map.|
|Title:||This is the title of the individual data file, not the parent dataset container.|
|Description:||A rich-text editor field is provided so publishers can offer detailed and useful descriptions.|
|Format:||Entering the file format here allows users to search for data by specific format.|
|Dataset:||This is the parent dataset container; this field should already be populated if you’re adding a Resource right after adding a Dataset.|
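Searching by format can only match what publishers typed into the Format field, so consistent values matter. Here is a minimal Python sketch of matching resources by format, assuming a simple in-memory list; `ResourceEntry`, `filter_by_format`, and the sample titles are hypothetical, and this is not DKAN's actual search code.

```python
from typing import Iterable, List, NamedTuple


class ResourceEntry(NamedTuple):
    title: str
    data_format: str  # value typed into the Format field


def filter_by_format(resources: Iterable[ResourceEntry], wanted: str) -> List[ResourceEntry]:
    """Return resources whose format matches `wanted`, ignoring case and surrounding spaces."""
    target = wanted.strip().lower()
    return [r for r in resources if r.data_format.strip().lower() == target]


catalog = [
    ResourceEntry("Polling places", "CSV"),
    ResourceEntry("Ward boundaries", "GeoJSON"),
    ResourceEntry("Aerial image", "TIFF"),
]
print([r.title for r in filter_by_format(catalog, "csv")])  # → ['Polling places']
```

Normalizing case and whitespace means "CSV", "csv", and " csv " all match, but a resource whose Format was left blank will never be found this way.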
At the bottom of the Add Resource page, we can choose:
|Save:||Save progress on this resource and immediately return to it for further editing|
|Save and add another:||Save this resource and add another resource to the same dataset|
|Next: Additional Info:||Save this resource and move to the third stage in adding a complete dataset, entering optional metadata about the dataset|
In our example, we’re only adding a single resource, so we’ll click “Next: Additional Info” to move on to Step 3. If we had more than one resource to add to this dataset, we would choose the “Save and add another” option. Simply clicking “Save” would end the Dataset creation process and save the dataset, for now, with no additional metadata.
Step 3: Adding Metadata to a Dataset
We now come to a third form, which allows us to add additional metadata to the dataset. All of these fields are optional, but they provide valuable information about your dataset to both human visitors to the website and machines discovering your dataset through one of DKAN’s public APIs.
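One form of machine discovery is a client reading a published metadata catalog instead of the web page. As a hedged illustration, assuming your site exposes a Project Open Data style `data.json` catalog (check your own site, since the path and contents may vary), a client might list each dataset's distributions as below. The catalog is a trimmed, hypothetical example; `dataset`, `title`, `distribution`, `format`, and `mediaType` are Project Open Data schema keys, while `list_distributions` is a hypothetical helper.

```python
import json

# A trimmed, hypothetical catalog in the Project Open Data (data.json) shape.
CATALOG_JSON = """
{
  "dataset": [
    {
      "title": "Wisconsin Polling Places",
      "distribution": [
        {"title": "Polling places", "format": "csv", "mediaType": "text/csv"}
      ]
    }
  ]
}
"""


def list_distributions(catalog_text: str) -> list:
    """Return (dataset title, distribution format) pairs from a data.json catalog."""
    catalog = json.loads(catalog_text)
    pairs = []
    for dataset in catalog.get("dataset", []):
        for dist in dataset.get("distribution", []):
            pairs.append((dataset.get("title"), dist.get("format")))
    return pairs


print(list_distributions(CATALOG_JSON))  # → [('Wisconsin Polling Places', 'csv')]
```

Every optional field you fill in on this form is one more value a client like this can read.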
Let’s take a closer look at some of the metadata fields available on this form:
|Author:||The Dataset’s author, in plain text.|
|Spatial / Geographical Coverage Area:||Lets us define the region the data applies to; in this case, the US State of Wisconsin. You can use the map widget to draw an outline around the state’s borders, or click the “Add data manually” button if you already have a GeoJSON string to paste in.|
|Spatial / Geographical Coverage Location:||The region the data applies to, written in plain text. This can be used instead of or in addition to the Coverage Area field.|
|Frequency:||How often is this dataset updated? We might expect our list of polling places to be updated every year, so we could select “annually.” However, often we don’t expect the data to be updated (even in this case, perhaps we plan to post the next version of the data as a separate dataset), in which case we can leave this blank.|
|Temporal Coverage:||Like Geographic Coverage, this field lets us give some context to the data, but for the relevant time period. Here we could enter the year or years for which our polling places data is accurate.|
|Granularity:||This is a somewhat open-ended metadata field that lets you describe the granularity or accuracy of your data. For instance: “Year”. Note that this field is deprecated in DCAT and Project Open Data, and may be removed from DKAN.|
|Data Dictionary:||This should be a URL to a resource that describes the data and helps users understand it. See Project Open Data data dictionary for more info.|
|Additional Info:||Lets us arbitrarily define other metadata fields. See Additional Info field for more information.|
|Resources:||This field is a reference to the resources you have already added.|
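If you choose “Add data manually” for the Coverage Area, you need a GeoJSON string. Here is a minimal Python sketch that builds a GeoJSON Polygon for a longitude/latitude bounding box. The `bbox_polygon` helper is hypothetical, the Wisconsin box is a rough illustrative approximation rather than an official boundary, and it assumes the field accepts a bare GeoJSON geometry object; check what your DKAN version's widget expects.

```python
import json


def bbox_polygon(west: float, south: float, east: float, north: float) -> str:
    """Return a GeoJSON Polygon string for a lon/lat bounding box.

    GeoJSON orders positions as [longitude, latitude], and a polygon's
    outer ring must end on the same position it starts with.
    """
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],  # close the ring
    ]
    return json.dumps({"type": "Polygon", "coordinates": [ring]})


# Rough, illustrative box around Wisconsin; not an official boundary.
geojson = bbox_polygon(-92.9, 42.5, -86.8, 47.3)
print(geojson)
```

The ring runs counterclockwise (south-west, south-east, north-east, north-west), following the right-hand rule that RFC 7946 recommends for exterior rings. A drawn outline of the real state border would be more precise than a box.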
After you click “Save”, the metadata you entered will appear on the page for this Dataset.