Modeling CSV data by using DFDL

Learn how to model CSV data by using Data Format Description Language (DFDL).

Import projects

Click Import to import a shared library into your workspace.

The CommaSeparatedValues library contains an example CSV file under 'Other resources'. Open the file and look at the records. The first record is a header, and each record has five fields: last name, first name, middle name, address, and date of birth. Note that the address field is enclosed in double quotation marks so that the comma within the address is not treated as a field delimiter.
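
To illustrate why the quotation marks matter, here is a minimal Python sketch that compares a naive split on every comma with a quote-aware split. The sample data and the helper names (split_naively, split_with_quotes) are hypothetical and are not taken from the example file in the library; the sketch uses Python's standard csv module, not the DFDL parser.

```python
import csv
import io

# Hypothetical sample data; the example file in the library may differ.
SAMPLE = (
    "last_name,first_name,middle_name,address,date_of_birth\n"
    'Smith,John,Allen,"12 High Street, Winchester",1970-01-31\n'
)


def split_naively(line):
    """Split on every comma, ignoring quotation marks."""
    return line.split(",")


def split_with_quotes(line):
    """Split on commas, treating double-quoted text as a single field."""
    return next(csv.reader(io.StringIO(line)))


record_line = SAMPLE.splitlines()[1]
naive_fields = split_naively(record_line)
quoted_fields = split_with_quotes(record_line)

print(len(naive_fields), naive_fields)
print(len(quoted_fields), quoted_fields)
```

The naive split breaks the address into two pieces and yields six fields, whereas the quote-aware split yields the expected five.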

You will now create a DFDL schema to model the example CSV file, by using the New Message Model wizard.

Tip: Ensure that steps 1 to 6 below are all visible in the Tutorial Steps View before proceeding.

  1. Click New... in the Application Development view, and select Message Model.... The New Message Model wizard opens.
  2. Select CSV text and click Next.
  3. Select the option Create a DFDL schema file using this wizard to guide you and then click Next.
  4. Click the Browse button and select the CommaSeparatedValues library.
  5. In the DFDL schema file name field, type CSV, and in the Message name field, type CSV_message, then click Next.
  6. Next, specify the details of the CSV file. In the End of record character list, select Any newline. Select the check box The first record is a header. In the Number of fields field, type 5. Select CSV Escape Scheme, then click Finish.
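
The choices made in step 6 can be pictured as the settings of a simple reader. The following Python sketch is only an analogy, assuming the standard csv module stands in for the DFDL parser; the constants and the function parse_csv_text are hypothetical names chosen for this illustration.

```python
import csv

FIELD_COUNT = 5                # mirrors "Number of fields"
FIRST_RECORD_IS_HEADER = True  # mirrors "The first record is a header"


def parse_csv_text(text):
    """Return (header, records) for CSV text that uses any common newline."""
    # splitlines() recognises \n, \r\n and \r (among other boundaries),
    # which is a rough stand-in for the "Any newline" setting.
    lines = text.splitlines()
    # csv.reader applies double-quote escaping, loosely comparable to
    # selecting a CSV escape scheme.
    rows = [row for row in csv.reader(lines) if row]
    if not rows:
        raise ValueError("no records found")
    for number, row in enumerate(rows, start=1):
        if len(row) != FIELD_COUNT:
            raise ValueError(
                f"record {number} has {len(row)} fields, expected {FIELD_COUNT}"
            )
    if FIRST_RECORD_IS_HEADER:
        return rows[0], rows[1:]
    return None, rows
```

For example, parse_csv_text('a,b,c,d,e\r\n1,2,3,"x, y",5\n') returns the header row and one body record, whereas text with only three fields per record raises ValueError.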

Two DFDL schemas are created in the library. The CSV.xsd schema models the overall CSV message. The CommaSeparatedFormat.xsd schema defines suitable default values for DFDL properties.

Exploring the schemas

The CommaSeparatedValues library is shown in the Application Development view of your workspace.

The CSV.xsd DFDL schema opens in the DFDL editor. If it does not, double-click the DFDL schema to open it in the editor.

CSV_message is highlighted and models a CSV file with a header record and an unbounded number of body records. Each record has five fields. The main editor view shows the logical components of the message, such as elements and sequences. You can explore the CSV_message structure by expanding the elements.

The physical rendering of each logical component is described by the DFDL properties in the Representation Properties tab. DFDL properties can either be specified locally on the component, or can be inherited from pre-defined sets of DFDL properties. Inherited properties have an icon shown next to them, and hovering the cursor over the icon reveals where the property is defined. In this schema, inherited properties are obtained from the CommaSeparatedFormat.xsd file.
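
The relationship between local and inherited properties can be sketched as a two-level lookup. This Python example is a simplified analogy: the property values and the function where_defined are hypothetical, and the scoping rules that DFDL actually applies are richer than a two-level lookup.

```python
from collections import ChainMap

# Hypothetical defaults standing in for a shared set of format properties.
inherited_defaults = {"encoding": "UTF-8", "separator": "", "terminator": ""}

# Hypothetical properties set directly on one component.
local_properties = {"separator": ","}

# Local values take precedence; anything not set locally falls back
# to the inherited defaults.
effective = ChainMap(local_properties, inherited_defaults)


def where_defined(name):
    """Report whether a property is set locally or inherited."""
    if name in local_properties:
        return "local"
    if name in inherited_defaults:
        return "inherited"
    raise KeyError(name)


for name in ("separator", "encoding"):
    print(name, repr(effective[name]), where_defined(name))
```

The where_defined function plays a role similar to hovering over the icon in the editor: it tells you where the effective value comes from.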

You will now test parse the example CSV file by using CSV_message. The test parse takes place entirely within the DFDL editor, so this tutorial has no message flow and no Deploy step. Before test parsing, switch to the DFDL Test perspective by clicking Window > Open Perspective and then selecting DFDL Test. Alternatively, you can use the corresponding toolbar icon.

Tip: If this Tutorial Steps View tab disappears when you switch perspectives, use the Window > Show View menu option to add it back.

Follow these steps to complete the tutorial

The CSV.xsd DFDL schema is open in the DFDL editor. CSV_message should be highlighted in green.

  1. Test parse the example CSV file:
    1. Click the Test Parse Model button in the DFDL editor toolbar. The Test Parse Model window opens.
    2. In the Message section, select CSV_message.
    3. In the Parser Input section, select Content from a data file then click Browse.
    4. Select the simpleCSV.txt file from CommaSeparatedValues then click OK.
    5. Set the Encoding to ASCII.
    6. Click OK. If asked to confirm switching to the DFDL Test perspective, click Yes.
  2. The results of the test parse are displayed. You should see the message Parsing completed successfully. You can dismiss this message.
  3. You can view the parsed data file in the DFDL Test - Parse view. The results of the parse can be viewed in the DFDL Test - Logical Instance view, as a tree or as XML. You can view a log of the parser actions in the DFDL Test - Trace view.
  4. Test serialize the logical instance that resulted from the parse:
    1. Click the Test Serialize Model button in the DFDL editor toolbar. The Test Serialize Model window opens.
    2. In the Serializer Input section, select Content from a DFDL Test - Logical Instance.
    3. Set the Encoding to ASCII.
    4. Click OK.
  5. The results of the test serialize are displayed. You should see the message Serialization completed successfully. You can dismiss this message.
  6. You can view the serialized data file in the DFDL Test - Serialize view. You can view a log of the serializer actions in the DFDL Test - Trace view.
  7. As an alternative to the toolbar buttons, you can also test from the DFDL Test - Parse and DFDL Test - Serialize views, using the Browse button to select a data source and the green play icon to run the DFDL parser or serializer.
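
The parse and serialize steps above form a round trip: data is parsed into a logical instance, and the logical instance is serialized back to data. The following Python sketch shows the same idea under stated assumptions: the sample data is hypothetical, the logical instance is a plain dictionary, and the standard csv module stands in for the DFDL parser and serializer.

```python
import csv
import io

# Hypothetical sample data; the example file in the library may differ.
SAMPLE = (
    "last_name,first_name,middle_name,address,date_of_birth\n"
    'Smith,John,Allen,"12 High Street, Winchester",1970-01-31\n'
)


def parse(text):
    """Parse CSV text into a simple logical instance."""
    rows = list(csv.reader(text.splitlines()))
    return {"header": rows[0], "records": rows[1:]}


def serialize(instance):
    """Serialize the logical instance back to CSV text."""
    out = io.StringIO()
    # Quote only the fields that need it, and end each record with \n.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(instance["header"])
    writer.writerows(instance["records"])
    return out.getvalue()


instance = parse(SAMPLE)
round_tripped = serialize(instance)
print(round_tripped == SAMPLE)
```

In the logical instance the address holds no quotation marks; they are added again on output only because the value contains a comma.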

Optional: You can further refine the DFDL schema that you created.

  1. Switch back to the Integration Development perspective.
  2. Give the fields more descriptive names. Expand header and record, then click into each field name and type the new field name.
  3. Model the 'date of birth' field as xs:date. Expand record, select the fifth field, click its type (string), and select date from the list of types.
  4. Save the updated schema. Check the Problems view; there should be no errors.
  5. Use the Test Parse Model button again to test parse using the updated schema.
  6. Check the DFDL Test - Logical Instance view. The new field names and data types are displayed.
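
The effect of changing a field from string to date can be sketched in Python. This example assumes that the date of birth is written in ISO form (yyyy-mm-dd), which may not match the example file, and to_logical_record is a hypothetical helper name.

```python
from datetime import date


def to_logical_record(fields):
    """Keep the first four fields as text and convert the fifth to a date."""
    *text_fields, date_of_birth = fields
    # Assumption: ISO format. A value that is not a valid date raises
    # ValueError instead of being accepted as arbitrary text.
    return text_fields + [date.fromisoformat(date_of_birth)]


record = to_logical_record(
    ["Smith", "John", "Allen", "12 High Street, Winchester", "1970-01-31"]
)
print(record[4].year, record[4].month, record[4].day)
```

A typed field also gives you validation: a string field accepts any text, whereas a date field rejects a value such as 1970-13-01.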