Datasets are the data assets in Neebo. They let you preview your data, describe it, cache it inside Neebo, discuss it with colleagues, and open it in external tools for further analysis. Datasets can be added to Neebo or created inside a Workspace's Workbench.
Added datasets are tables on external data systems and uploaded files that are in one of the supported data file formats (CSV, JSON, XLS, or Parquet). Neebo creates virtual copies of all this added data so that the original is never edited or deleted. Only datasets you have created inside Neebo can have their content edited or deleted.
Created datasets are built in Neebo Workspaces using the Workbench tool to combine, filter, and otherwise transform the data from added datasets (or existing created ones) into the new forms you need for analysis. Created datasets can only have their contents modified inside the original Workbench where they were created.
If you cannot find the data you need in Neebo, you can either add it from outside or create a new dataset by combining, filtering, and transforming data already in Neebo.
Add datasets to Neebo
Add to Neebo and Connect describe the various ways to add data to Neebo. See Workspaces: References for details on how to reference these datasets in a Workspace so you can combine, filter, and otherwise transform that data by creating new datasets.
Create datasets in Neebo
Creating datasets gives you the ability to pull data from Neebo datasets (whether added or created) and apply operations that transform that data to fit your project's needs.
The first step in creating a new dataset is gathering references to all the data and other analytic assets for your project inside a Workspace. You can reference existing Neebo datasets, connect new external data systems, or upload files directly to Neebo (see Workspaces: References).
Once everything you need is in a Workspace, you can use the attached Workbench tool to transform the data you have gathered into the form that your project needs. In your Workspace, click the button then follow the step-by-step instructions in "Workbench: Create new dataset".
Buttons at the top of the dataset details page let you:
- add the dataset to one or more other Workspaces
- open the context menu (includes the Delete option)
- open the dataset in the Workbench where it was created
- configure how this data file should be processed into a dataset. This includes specifying the field delimiter, new line delimiter, quoting character, and whether the data contains a header row.
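The four file-processing options in the last item above map closely onto what a typical delimited-text parser needs. The following is a minimal sketch using Python's standard `csv` module, not Neebo's own implementation; the `settings` keys and the `parse_file` helper are hypothetical names chosen for illustration.

```python
import csv

# Hypothetical settings mirroring the four options described above;
# the key names are illustrative, not Neebo's own.
settings = {
    "field_delimiter": ";",
    "newline_delimiter": "\n",
    "quote_char": "'",
    "has_header": True,
}

def parse_file(raw_text, settings):
    """Split raw text into a header and data rows using the given settings."""
    # Break the text into records on the configured new line delimiter.
    # Simplification: a quoted field containing that delimiter is not supported here.
    lines = [line for line in raw_text.split(settings["newline_delimiter"]) if line]
    reader = csv.reader(
        lines,
        delimiter=settings["field_delimiter"],
        quotechar=settings["quote_char"],
    )
    rows = list(reader)
    if not rows:
        return [], []
    if settings["has_header"]:
        return rows[0], rows[1:]
    # Without a header row, generate placeholder column names.
    header = [f"column_{i + 1}" for i in range(len(rows[0]))]
    return header, rows

raw = "id;name\n1;'Smith; Jane'\n2;'Doe; John'\n"
header, data = parse_file(raw, settings)
print(header)
print(data)
```

Note how the quoting character protects the field delimiter inside `'Smith; Jane'`, so the row still splits into two fields. Choosing the wrong quoting character or delimiter is the usual reason a file previews with too many or too few columns.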
The dataset's owner can add a description to each column in a dataset. This enables you to produce a basic data dictionary directly integrated with your dataset and discoverable through the find tool.
From the dataset detail page, click the button at the top of the Table Preview area.
This will open the Column view of your dataset. Click on the Description field next to any column to enter information about that column. Column descriptions can be up to 400 characters long. Changes are saved automatically as you type.
To switch back to the Table Preview view of your dataset, click the button at the top of the Column view.
Comments appear on the right side of the dataset details page with the most recent comments at the top. Any user can comment on a dataset or reply to existing comments. Users following the dataset will receive a notification in their Activities panel for new comments and replies.
Add a new comment by clicking on the button on the top right of the Comment section.
Delete a comment, including any replies, by clicking the dark conversations menu in the top right of the comment box.
Reply to an existing comment by clicking on it and typing in the new comment entry box that will appear underneath the comment. You can reply to comments from the dataset detail page and also from the notification of a comment in your Activities panel.
Notifications of new comments go to anyone following the dataset. Owners follow datasets by default.
Note: notifications are tied to the individual asset, so comments on a dataset will not be sent to users following Workspaces where that dataset was created or referenced.
Make a copy of a dataset's contents inside Neebo. This can be useful to reduce access time or spread system load on connected data systems (databases, S3, etc.) and to create a snapshot of volatile data. If a dataset in Neebo normally requires authenticating with connected data systems, the cached version will also require authenticating against those systems before you can access it.
The Cache section of your dataset's detail page shows the current status of caching for the dataset. You can control whether the dataset is cached and optionally schedule when to refresh the cached copy. Uploaded files cannot be cached.
The available cache statuses are:
- "Disabled" - (default) - no cache is active for this dataset
- "Queued" - Neebo is in the process of caching this dataset
- "Available" - a cached version of the dataset is available and in use
- "Outdated" - there has been a structural modification to the dataset since the last time it was cached, rendering the cache unusable even though it is available. Follow the "Refresh" steps below to make your cache available again.
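One way to reason about these four statuses is as a small enumeration with two questions attached: is the cached copy currently being served, and does it need a refresh? The sketch below is only an illustration of the descriptions above; `CacheStatus`, `reads_from_cache`, and `needs_refresh` are hypothetical names, not part of any Neebo API.

```python
from enum import Enum

class CacheStatus(Enum):
    # Values match the status labels listed above.
    DISABLED = "Disabled"
    QUEUED = "Queued"
    AVAILABLE = "Available"
    OUTDATED = "Outdated"

def reads_from_cache(status):
    """Only an Available cache is described as being in use."""
    return status is CacheStatus.AVAILABLE

def needs_refresh(status):
    """An Outdated cache exists but is unusable until it is refreshed."""
    return status is CacheStatus.OUTDATED

for status in CacheStatus:
    print(status.value, reads_from_cache(status), needs_refresh(status))
```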
Click on the status to open the cache configuration dialog. Inside the dialog:
- Cache a dataset by clicking the button. This will automatically toggle the "Use Cached Data" switch in that same dialog.
- Refresh an existing cached copy of a dataset by clicking the button. Toggle the "Enable Scheduler" switch in that same dialog to have Neebo refresh the cached copy on a schedule you set.
- Disable caching for a dataset by switching off the "Use Cached Data" switch in the dialog that opens. Make sure to also switch off the scheduler in that dialog if you no longer need the cached copy updated.
For datasets created in a Neebo Workbench, you can access this same dialog by selecting the dataset in your Flow area and clicking the or buttons in the asset toolbox.
Neebo has a default maximum dataset size of 3 million records. The limit applies both when adding new tables and when a dataset grows as a result of an operation.
Datasets that exceed the record count limit will still appear in Neebo search results but cannot be referenced in Workspaces or opened in external tools through Neebo. These datasets will be flagged with an alert on search cards and on their detail pages.
If a dataset grows beyond the record count limit, it will stop feeding data to any downstream datasets that have been built from it in Neebo.
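The effects of the record count limit can be summarized as a simple check. This is a sketch of the rules described above, assuming the default limit of 3 million records; `dataset_capabilities` and its dictionary keys are hypothetical names used only for illustration.

```python
MAX_RECORDS = 3_000_000  # default limit stated above

def dataset_capabilities(record_count, max_records=MAX_RECORDS):
    """Summarize what a dataset can do given its record count."""
    over_limit = record_count > max_records
    return {
        # Oversized datasets still appear in search results.
        "appears_in_search": True,
        "can_reference_in_workspace": not over_limit,
        "can_open_in_external_tools": not over_limit,
        # Oversized datasets stop feeding downstream datasets.
        "feeds_downstream": not over_limit,
        "flagged_with_alert": over_limit,
    }

print(dataset_capabilities(2_500_000))
print(dataset_capabilities(3_000_001))
```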
The dataset detail page displays user and machine-created metadata to help you better understand where a dataset comes from, what it contains, and how it is being used in Neebo.
Displays the dates on which the dataset was created and last modified. A dataset is considered to have been modified whenever the title or description is edited. The last modified date is used to sort some search results.
The ten-row table can be scrolled horizontally or expanded with the button to view the dataset's full 1,000-row data preview.
Names can be up to 100 characters, cannot start with an underscore (_) or a space, and cannot contain a backtick (`).
Up to 400 characters to provide as much description as possible about the dataset. The more a dataset is documented, the more useful it becomes for collaboration.
Tags help quickly identify, search, and categorize assets in Neebo. Add some here to make this dataset easier to find and easier to tell apart from the rest of your related work. A tag can be up to 80 characters (spaces are not allowed). Clicking on a tag launches a search for all assets with that tag in Neebo.
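If you prepare names, descriptions, and tags outside Neebo (for example in a spreadsheet of planned datasets), the limits above are easy to check ahead of time. The following is a minimal sketch of those rules only; the function names are hypothetical, and Neebo may enforce additional constraints not listed here, such as whether an empty name is accepted.

```python
def validate_name(name):
    """Return a list of problems with a dataset name (empty list means OK)."""
    problems = []
    if len(name) > 100:
        problems.append("name is longer than 100 characters")
    if name.startswith(("_", " ")):
        problems.append("name starts with an underscore or a space")
    if "`" in name:
        problems.append("name contains a backtick")
    return problems

def validate_description(text):
    """Dataset descriptions are limited to 400 characters."""
    return [] if len(text) <= 400 else ["description is longer than 400 characters"]

def validate_tag(tag):
    """Tags are limited to 80 characters and cannot contain spaces."""
    problems = []
    if len(tag) > 80:
        problems.append("tag is longer than 80 characters")
    if " " in tag:
        problems.append("tag contains a space")
    return problems

print(validate_name("_draft sales"))
print(validate_tag("q3 revenue"))
print(validate_description("Monthly sales by region."))
```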
Shows all Workspaces that reference the dataset. If the dataset is referenced from multiple Workspaces, a clickable number will appear in this section. Click it to open the full list of Workspaces referencing this dataset.
Indicates how this data got into Neebo. For data that lives on a connected external system, you will see the type of system. For uploaded files, you will see which user originally uploaded the file to Neebo. For datasets created in a Neebo Workspace, you will see a list of all the upstream datasets that feed information into this dataset.
Shows the file type for this dataset, based on the file's media type (previously MIME type). Only shown for file uploads.
Shows the number of users following the dataset and doubles as a toggle button so that you can follow or unfollow the dataset.
Displays the initials of the five users who most frequently open or edit the dataset. Hover over these initials to see each user's full name.
This section uses graphics to indicate the number of times a dataset is opened or edited directly via Neebo or an external tool (JDBC or ODBC queries). Queries by all Neebo users are included in this count. Click the Direct square to toggle a histogram display of dataset queries.
Click the Update square to toggle a histogram display of the number of changes to the dataset operations stack and metadata. You can hover over a histogram bar to see the per-day count. The timeline pulldown lets you set histogram display for the last 7, 30, or 90 days.
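To make the histogram concrete, the sketch below shows how per-day counts over a 7, 30, or 90 day window could be computed from a list of event dates. It illustrates the idea only and is not how Neebo computes its charts; `daily_counts` and its parameters are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

def daily_counts(event_dates, today, window_days=30):
    """Return (day, count) pairs for each day in the window ending today."""
    if window_days not in (7, 30, 90):
        raise ValueError("window_days must be 7, 30, or 90")
    start = today - timedelta(days=window_days - 1)
    # Count only the events that fall inside the window.
    counts = Counter(d for d in event_dates if start <= d <= today)
    days = [start + timedelta(days=i) for i in range(window_days)]
    # Days with no events get an explicit zero so every bar is present.
    return [(day, counts.get(day, 0)) for day in days]

events = [date(2024, 1, 10), date(2024, 1, 10), date(2024, 1, 8), date(2023, 12, 1)]
for day, count in daily_counts(events, today=date(2024, 1, 10), window_days=7):
    print(day.isoformat(), count)
```

Each pair corresponds to one histogram bar, and the count is the value you would see when hovering over that bar.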
Commonly Used Together
Below the preview table is a section showing other datasets that this dataset has been blended with in Neebo.
Delete a dataset
Only datasets created in a Neebo Workbench can be deleted and only if they are not referenced in any other Workspaces. Datasets representing external database tables or uploaded files cannot be edited or deleted in Neebo.
To completely remove a dataset from Neebo:
- Open the dataset's detail page
- Click the button
- Choose the "Delete" option
- Confirm that you want to delete the dataset in the warning message that opens
- Your dataset has been deleted from Neebo
An error will inform you if the dataset you are trying to delete is still referenced in other Workspaces.