Data Has Continuous Attributes Which Will Be Skipped
Loading your Data¶
Orange comes with its own data format, but can also handle native Excel files and comma- or tab-delimited data files. The input dataset is usually a table, with data instances (samples) in rows and data attributes in columns. Attributes can be of different types (numeric, categorical, datetime, and text) and have assigned roles (input features, meta attributes, and class). Attribute type and role can be specified in the data table header. Both can also be changed in the File widget, and the role can additionally be modified with the Select Columns widget.
In a Nutshell¶
- Orange can import any comma- or tab-delimited data file, Excel's native files, or a Google Sheets document. Use the File widget to load the data and, if needed, define the class and meta attributes.
- Types and roles can be set in the File widget.
- Attribute names in the column header can be preceded by a label followed by a hash. Use c for class, m for meta attribute, i to ignore a column, and w for a weights column; use C, D, T, and S for continuous, discrete, time, and string attribute types. Examples: C#mph, mS#name, i#dummy.
- An alternative to the hash notation is Orange's native format with three header rows: the first with attribute names, the second specifying the type (continuous, discrete, time, or string), and the third providing information on the attribute role (class, meta, weight, or ignore).
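To make the relation between the two header styles concrete, here is a minimal sketch that converts hash-notation column names into the three header rows described above. It is an illustration only and not Orange's own parser: the function names `parse_column_header` and `to_three_row_header` are made up for this example, and leaving a cell empty when the header gives no type or role is an assumption of the sketch.

```python
# Illustrative only: converts hash-notation headers to the three-row layout.
# This is not Orange's parser; the function names are hypothetical.

ROLE_FLAGS = {"c": "class", "m": "meta", "i": "ignore", "w": "weight"}
TYPE_FLAGS = {"C": "continuous", "D": "discrete", "T": "time", "S": "string"}


def parse_column_header(header):
    """Split a header such as 'mS#name' into (name, type, role).

    Type or role is an empty string when the header does not specify it.
    """
    if "#" not in header:
        return header, "", ""
    flags, name = header.split("#", 1)
    var_type = ""
    role = ""
    for flag in flags:
        if flag in TYPE_FLAGS:
            var_type = TYPE_FLAGS[flag]
        elif flag in ROLE_FLAGS:
            role = ROLE_FLAGS[flag]
        else:
            raise ValueError(f"unknown flag {flag!r} in {header!r}")
    return name, var_type, role


def to_three_row_header(headers):
    """Return [names, types, roles] rows for a list of hash-style headers."""
    parsed = [parse_column_header(h) for h in headers]
    return [list(row) for row in zip(*parsed)]


rows = to_three_row_header(["C#mph", "mS#name", "i#dummy"])
for row in rows:
    print("\t".join(row))
```

For the three examples from the list, the first row holds the names (mph, name, dummy), the second the types (continuous, string, and an empty cell), and the third the roles (an empty cell, meta, ignore). The sketch treats everything before the first hash as flags, so a plain column name that itself contains a hash would be rejected.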
Data from Excel¶
Here is an example dataset (sample.xlsx) as entered in Excel:
The file contains a header row, eight data instances (rows) and seven data attributes (columns). Empty cells in the table denote missing data entries. Rows represent genes; their function (class) is provided in the first column and their name in the second. The remaining columns store measurements that characterize each gene. With this data, we could, say, develop a classifier that would predict gene function from its characteristic measurements.
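As a rough sketch of what "empty cells denote missing entries" means for a delimited file, the snippet below counts rows, columns, and empty cells per column. The table in `SAMPLE` is made up for illustration and is not the content of sample.xlsx; it uses only column names mentioned in this text, and the helper name `summarize` is hypothetical.

```python
import csv
import io

# Made-up values for illustration; this is NOT the content of sample.xlsx.
SAMPLE = (
    "function\tgene\theat 0\theat 10\theat 20\n"
    "f1\tgene A\t0.1\t\t0.3\n"
    "f2\tgene B\t\t0.5\t0.2\n"
    "\tgene C\t0.4\t0.6\t\n"
)


def summarize(text, delimiter="\t"):
    """Return (row count, column count, missing-cell count per column)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows = list(reader)
    missing = {name: 0 for name in header}
    for row in rows:
        for name, cell in zip(header, row):
            if cell.strip() == "":
                missing[name] += 1
    return len(rows), len(header), missing


n_rows, n_cols, missing = summarize(SAMPLE)
print(n_rows, n_cols, missing)
```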
Let us start with a simple workflow that reads the data and displays it in a table:
To load the data, open the File widget (double click on the icon of the widget), click on the file browser icon ("…") and locate the downloaded file (called sample.xlsx) on your disk:
Select Columns: Setting the Attribute Role¶
Another way to set the data role is to feed the data to the Select Columns widget:
Opening Select Columns reveals Orange's classification of attributes. We would like all of our continuous attributes to be data features, gene function to be our target variable and gene names considered as meta attributes. We can obtain this by dragging the attribute names around the boxes in Select Columns:
To correctly reassign attribute roles, drag the attribute named function to the Class box and the attribute named gene to the Meta Attribute box. The Select Columns widget should now look like this:
Changes of attribute roles in the Select Columns widget should be confirmed by clicking the Apply button. The data from this widget is fed into the Data Table widget, which now renders the data just the way we intended:
We could also define the domain for this dataset in a different way. Say, we could make the dataset ready for regression, and use heat 0 as a continuous class variable, keep gene function and name as meta variables, and remove heat 10 and heat 20 from the dataset:
By setting the attributes as above, the rendering of the data in the Data Table widget gives the following output:
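A similar regression setup could, as a sketch, be written into the file header using the hash notation from the "In a Nutshell" section, instead of being set interactively. The snippet below builds such prefixed column names. The helper `prefixed` is hypothetical, only the five columns named in this text are included, and marking heat 10 and heat 20 with i (ignore on load) is assumed here to be a reasonable stand-in for removing them in Select Columns.

```python
# Illustrative sketch: build hash-notation headers for the regression setup.
# The helper name and the choice of columns are assumptions of this example.

ROLE_FLAG = {"class": "c", "meta": "m", "ignore": "i", "weight": "w"}
TYPE_FLAG = {"continuous": "C", "discrete": "D", "time": "T", "string": "S"}


def prefixed(name, role=None, var_type=None):
    """Return the column name with role/type flags, e.g. 'cC#heat 0'."""
    flags = ROLE_FLAG.get(role, "") + TYPE_FLAG.get(var_type, "")
    return f"{flags}#{name}" if flags else name


columns = ["function", "gene", "heat 0", "heat 10", "heat 20"]
roles = {
    "function": "meta",
    "gene": "meta",
    "heat 0": "class",      # continuous class variable for regression
    "heat 10": "ignore",
    "heat 20": "ignore",
}
types = {"heat 0": "continuous"}

header = [prefixed(c, roles.get(c), types.get(c)) for c in columns]
print(header)
```

Columns without an entry in `roles` or `types` keep their plain names, which leaves the decision about their type and role to Orange.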
Data from Google Sheets¶
Orange can read data from Google Sheets, as long as the sheet conforms to the data presentation rules described above. In Google Sheets, copy the shareable link (Share button, then Get shareable link) and paste it into the Data File / URL box of the File widget. For a taste, here's one such link you can use: http://bit.ly/1J12Tdp, and the way we have entered it in the File widget:
Data from LibreOffice¶
If you are using LibreOffice, simply save your files in Excel (.xlsx) format (available from the drop-down menu under Save As Type).
Datetime Format¶
To avoid ambiguity, Orange supports date and/or time formatted in one of the ISO 8601 formats. For example, the following values are all valid:
2016
2016-12-27
2016-12-27 14:20:51
16:20
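Before loading a file, it can help to check that date and time values follow one of the forms shown above. The following is a minimal sketch using only Python's standard library. It covers just the four example forms, not the full ISO 8601 standard, and it is not the check that Orange itself performs; the name `matching_format` is made up for this example.

```python
from datetime import datetime

# Only the four forms shown in the examples; ISO 8601 allows more.
FORMATS = ("%Y", "%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%H:%M")


def matching_format(value):
    """Return the first format in FORMATS that parses value, else None."""
    for fmt in FORMATS:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            continue
        return fmt
    return None


for value in ["2016", "2016-12-27", "2016-12-27 14:20:51", "16:20", "27/12/2016"]:
    print(repr(value), "->", matching_format(value))
```

Note that `strptime` is somewhat lenient (it accepts single-digit months and days, for instance), so a match here is a hint rather than a guarantee that the value is strictly ISO 8601.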
Source: https://orange3.readthedocs.io/projects/orange-visual-programming/en/latest/loading-your-data/index.html