Comment: Published for DOCU-2414, 0.26.0.

Files in CSV, TSV (tab-delimited), XLS, JSON, and Parquet formats become datasets in Neebo. Because all data in Neebo is virtualized, every data asset in Neebo is technically a "dataset": a reference to another data asset. A dataset is Dataspace-specific; it can exist inside only one Dataspace.

However, a central principle in Neebo is that when data is first added, either to Neebo or to a Dataspace, it is a non-editable data source. A data source is a specific type of dataset that represents the external data source (like a table in a database) and can be referenced from multiple Dataspaces. Data source icons differ according to their connection type (S3, PostgreSQL, etc.).

A data source becomes editable only when you create a reference to it; that reference is a dataset. Dataset icons are the same regardless of their data type. Any Neebo dataset can in turn be used as a data source in another Dataspace. In that case, a new copy is created that does not reference or reflect the content of any other dataset.

Once a dataset is added to a Dataspace (see the Add Asset-deprecated page), it can be viewed, combined, enriched, and interacted with inside that Dataspace's Workbench.

Creating datasets

To create a dataset, right-click on the desired data source or dataset in the Flow area of the Workbench and choose "Create Reference" from the context menu. The new dataset will be highlighted, and you will see arrows that indicate the upstream and downstream reference connections. You can create an unlimited number of datasets from any given data source or dataset. 

...

By default, Neebo names downstream datasets by appending hierarchical, sequential numbers, where the original counts as the implicit number 1. This numbering helps you track data lineage, as in the following example. Dataset "Blue Eyes 2" is created directly from the data source "Blue Eyes". The datasets created from "Blue Eyes 2" are named "Blue Eyes 2 2," "Blue Eyes 2 3," and "Blue Eyes 2 4." The dataset created from "Blue Eyes 2 4" is in turn named "Blue Eyes 2 4 2".
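The naming rule above can be modeled as a counter kept per parent. The following Python sketch is illustrative only: ReferenceNamer and create_reference are hypothetical names, not part of Neebo. It assumes the appended number depends only on how many references have already been created from that parent. How Neebo numbers references after one is renamed or deleted is not described here, and the sketch does not attempt to model it.

```python
from collections import defaultdict


class ReferenceNamer:
    """Hypothetical model of the default naming pattern for new references.

    This is not Neebo's implementation. It only reproduces the documented
    pattern, where the original asset counts as the implicit number 1.
    """

    def __init__(self) -> None:
        # Number of references created so far from each parent name.
        self._children = defaultdict(int)

    def create_reference(self, parent: str) -> str:
        """Return the default name for a new reference created from `parent`."""
        self._children[parent] += 1
        # The first reference gets 2, the second gets 3, and so on.
        return f"{parent} {self._children[parent] + 1}"


if __name__ == "__main__":
    namer = ReferenceNamer()
    print(namer.create_reference("Blue Eyes"))        # Blue Eyes 2
    for _ in range(3):
        # Prints Blue Eyes 2 2, then Blue Eyes 2 3, then Blue Eyes 2 4
        print(namer.create_reference("Blue Eyes 2"))
    print(namer.create_reference("Blue Eyes 2 4"))    # Blue Eyes 2 4 2
```

Because each name embeds its parent's full name, you can read the lineage directly from the dataset name without opening the Flow area.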

Adding datasets

The Assets and Connect pages describe the various ways to add data to a Dataspace. The simplest method is the "Add" button on the Dataspace details page. Alternatively, you can use the context menu on one of the home page search result cards. You can also use a dataset link in the Activities panel to open a dataset's Details page, where you can use the "Add to Dataspace" button.

Note that when a dataset used in one Dataspace is added to another, it is added as a copy that has no upstream dependencies. As always, you must authenticate that copy against the data source.

See the Dataspaces page for details on adding an asset to a Dataspace.

Deleting and removing datasets

...

When a dataset is deleted, it no longer exists in Neebo. From the dataset's Details page, click the rightmost button and choose "Delete." If the dataset has no downstream dependencies, you will be prompted to confirm the action. If the dataset is referenced in one or more other Dataspaces, it must be removed from those Dataspaces before it can be deleted. Note that what is removed from a Dataspace is a specific dataset reference. A data source cannot be deleted.

...

There is also a five-row, horizontally scrollable preview table. Click the "View in Fullscreen" button to expand it and scroll vertically through all rows. You can add Comments to a dataset at any time by clicking the "plus" button on that panel.

The breadcrumb path in the header shows the Dataspace name, then (separated by a caret, >) the dataset name, so you can identify the specific dataset you are viewing.

The Details page also provides a set of buttons that let you add the dataset to one or more other Dataspaces, open the dataset in the Workbench for the current Dataspace, or delete the dataset.

...