Access the Workbench from the Workspace details page by clicking the button or double-clicking any of the dataset nodes in your Workspace's Flow area. To go back to the Workspace details page, click the button or the Workspace's name in the navigation breadcrumbs at the very top of the screen.
Create new dataset#
From the Workbench you can create new datasets from the data in any of the references or other datasets in your Workspace. See "What is a reference" for details.
- Click on an item in the Flow area to make it your active item. This can be a reference to data elsewhere in Spotlight or another dataset you have already created in this Workspace. Just make sure it has data you want to include in your new dataset.
- Your active item will be colored green and have a button next to it. Clicking the green button will feed the data from your active item into a newly created dataset in this Workspace. You can also right-click an item in the Flow and select the "Create Dataset" option from the context menu.
- Your new dataset is ready! It can be edited with operations in the asset toolbox, used elsewhere in Spotlight, or opened in external tools for further analysis.
Your newly created dataset will become your active item and you will see an arrow from the flow item that is feeding data to your dataset. Spotlight names new datasets using the name of this parent item followed by a number starting at 2 and increasing by 1 for each additional dataset created from the same parent (see figure in Flow area below).
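The naming rule above can be sketched in a few lines of Python. This is only an illustration, not Spotlight's implementation: the function name `next_dataset_name`, the space used as a separator, and the choice to reuse the lowest free number are all assumptions.

```python
def next_dataset_name(parent_name, existing_names, separator=" "):
    """Return the parent's name followed by the first unused number, starting at 2.

    Hypothetical helper: the separator and the gap-filling behaviour are
    assumptions made for this sketch, not documented Spotlight behaviour.
    """
    number = 2
    while f"{parent_name}{separator}{number}" in existing_names:
        number += 1
    return f"{parent_name}{separator}{number}"


# Example: the first dataset created from "Sales", then the next one.
first = next_dataset_name("Sales", set())
second = next_dataset_name("Sales", {first})
```

Under these assumptions, `first` is the parent name with 2 appended and `second` is the parent name with 3 appended.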
Add or edit operations#
The Workbench shows how data moves between the different references and datasets in your Workspace. This is the same Flow area as on the Workspace overview page.
Click on any item in the Flow area to make it your active item. Your active item will be highlighted in green and have a green button next to it enabling you to feed the data from it into a newly created dataset in this Workspace. All items in the Flow that feed data into or pull data from your active item are colored dark gray. Right-click on any item to open its context menu.
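One way to picture the dark gray highlighting is as a lookup on a directed graph. The sketch below is illustrative only: the edge list, the item names, and the function `related_items` are hypothetical, and it assumes that only items directly connected to the active item are highlighted.

```python
def related_items(edges, active):
    """Split the direct neighbours of `active` into upstream and downstream sets.

    `edges` is a list of (source, target) pairs, meaning that `source`
    feeds data into `target`. Only direct connections are considered,
    which is an assumption of this sketch.
    """
    upstream = {src for src, dst in edges if dst == active}
    downstream = {dst for src, dst in edges if src == active}
    return upstream, downstream


# Hypothetical Flow: one reference feeds a dataset, which feeds another dataset.
flow_edges = [("ref_a", "ds_1"), ("ds_1", "ds_2"), ("ref_b", "ds_2")]
feeds_into_active, pulls_from_active = related_items(flow_edges, "ds_1")
```

Here `feeds_into_active` holds the items that supply data to the active item and `pulls_from_active` holds the items that consume its data. Together they correspond to the items shown in dark gray.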
Use the buttons in the corner of the Flow to zoom in/out or scale the area to fit. Move your view area by clicking anywhere in the Flow and dragging in any direction.
The data sample includes the first 1000 rows of data as well as some column metrics that Spotlight has calculated for the sample (see "column metrics" below). Scroll the sample up and down or left and right to view all rows and columns. Click the button at the bottom of the Flow area and drag up or down to resize the sample area.
Column metrics provide a graphic summary of the data schema in a dataset. They are shown by default, but can be hidden by clicking the button to the left of the column metrics. If a dataset is close to the system size limit or if system resources are being heavily used, Spotlight may be temporarily unable to generate the column metrics display.
Click on the "more..." link to open a larger window displaying all values for columns with a cardinality ("Uniques") of less than 20. For date and number values, the histogram will show either start and end dates or minimum, average, and maximum values, respectively. Hover over the horizontal indicator on each column to see the total number of records in the dataset ("valid"), the number of "empty" records, and the number of "unique" record values. Note that when more than 90% of a column's values are unique, no chart will be displayed.
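The thresholds described above can be sketched as a small Python function. It is a rough approximation under stated assumptions, not Spotlight's actual calculation: the name `column_metrics`, the treatment of `None` and empty strings as "empty", and the use of the total record count as the denominator for the 90% rule are all assumptions.

```python
def column_metrics(values, max_listed=20, max_unique_ratio=0.9):
    """Summarise one column of sample values.

    Assumptions for this sketch: None and "" count as empty, and the
    unique ratio is measured against the total number of records.
    """
    total = len(values)
    non_empty = [v for v in values if v is not None and v != ""]
    unique = len(set(non_empty))
    unique_ratio = unique / total if total else 0.0
    return {
        "total": total,
        "empty": total - len(non_empty),
        "unique": unique,
        # All values are listed only for columns with fewer than 20 uniques.
        "list_all_values": unique < max_listed,
        # No chart is drawn when more than 90% of the values are unique.
        "show_chart": unique_ratio <= max_unique_ratio,
    }


# A low-cardinality column and an ID-like column where every value is unique.
status_metrics = column_metrics(["open", "closed", "open", "", None])
id_metrics = column_metrics(list(range(100)))
```

With these inputs, the status column keeps its chart and lists all of its values, while the ID-like column has neither, because every value in it is unique.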
The initial display reflects the data sample, but you can right-click in the column metrics area and select "Load Full Metrics" from the context menu to update the display to reflect the full record set. Note that if the dataset is small or the confidence score is 100%, the full and sample graphics will be the same.
If your active item was created in this Workspace then it is a dataset and you can edit and configure it in the asset toolbox. Everything else in the Flow is a reference and cannot be modified here. See "What is a reference" for details.
Rename your dataset by clicking the name in the top of the asset toolbox and entering the new name in the field that appears.
Below the dataset's name you will see information about where the data for this dataset ultimately originated as well as a list of any downstream datasets in this Workspace.
All operations applied to create the dataset will show at the bottom of the toolbox, underneath a button for adding more operations. Click on any operation to view or edit its details. Change the active dataset's name by clicking on it in the toolbox area.
Configure caching for the active dataset underneath its name. See "Datasets: Cache" for details.