Scott Matthewman

Creating a CSV Export library for Swift (Part 1)

For my SwiftUI app I needed to export data to CSV. Public open source packages didn't quite fit my needs – so why not write my own?

I'm currently building a large and rather complicated app in SwiftUI that involves quite a lot of data. While the app handles most tasks internally, spreadsheets like Excel or Numbers have features that could do a better job of analysing the data, and there are some tabular views that I'd want to share with people who don't have a copy of the app.

One of the best supported file formats for sharing tabular data is CSV (comma-separated values). This represents tabular data as one line of text per row, with the values in each row separated by commas. For example, the table:

ID | Doctor | Notes | First episode
1 | William Hartnell | Later played by Richard Hurndall and David Bradley | 23 November 1963
2 | Patrick Troughton | | 5 November 1966
3 | Jon Pertwee | | 3 January 1970
4 | Tom Baker | No relation to the Sixth Doctor, Colin Baker | 28 December 1974

would be expressed as

ID,Doctor,Notes,First episode
1,William Hartnell,Later played by Richard Hurndall and David Bradley,1963-11-23
2,Patrick Troughton,,1966-11-05
3,Jon Pertwee,,1970-01-03
4,Tom Baker,"No relation to the Sixth Doctor, Colin Baker",1974-12-28

The Swift Package Index has a fair few CSV parsers available, and there are several that would do the job. But none of them quite works in the way I want, or has the exact feature set that I require. So, why not write my own?

Enter SwiftCSVEncoder.

Basic principles

From the start, I wanted my encoder system to support a number of features:

  1. Strongly typed. A CSV file is a collection of string data, but the information we use to build it may not be. I want to be able to rely on the compiler ensuring that I don't accidentally pass something to the encoder that it doesn't know how to handle.

  2. Easy to understand syntax. Code that is understood by computers is all very well – for it to be maintainable, it absolutely needs to be readable by humans.

  3. Allows for multiple CSV formats from each object. Codable is the gold standard for encoding objects in Swift – but it's defined at the object level: there is one and only one way in which an object marked Codable is going to be structured in output files.

    In contrast, I may need multiple CSV files with different combinations of columns for different needs. So while each CSV output definition needs to know about its source data, it shouldn't be coupled to it as closely as Codable is.

  4. Handles primitive types automatically. CSV has very common ways of encoding numbers and text strings, so we shouldn't need to do anything extra to work with those data types.

  5. Handles other common data types sensibly. Some data types used commonly in Swift applications, including UUID and Date, might need a little bit of massaging to get them into a format that CSV encoding can understand. This should be easy to do, even when needs might change from file to file. For instance, the CSV specification doesn't specify how dates should be formatted, so we may have to tweak those values depending on which application is going to import them.

  6. Doesn't worry about importing CSV files. Parsing CSV and knowing how to convert those into data objects isn't something I need, so I don't want to overcomplicate things by worrying about that.
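To make the point about dates concrete, here is a minimal sketch of two ways the same Date could be written out, depending on what the importing application expects. The helper names isoDateString and dayFirstDateString are hypothetical and are not part of SwiftCSVEncoder; they use only Foundation's standard formatters.

```swift
import Foundation

// Hypothetical helpers for illustration; neither is part of SwiftCSVEncoder.

/// Formats a date as an ISO 8601 calendar date, e.g. "1963-11-23".
func isoDateString(_ date: Date) -> String {
    let formatter = ISO8601DateFormatter()
    formatter.formatOptions = [.withFullDate] // year-month-day only; the formatter defaults to GMT
    return formatter.string(from: date)
}

/// Formats a date as day/month/year, for an importer that expects that order.
func dayFirstDateString(_ date: Date) -> String {
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: "en_US_POSIX") // fixed, locale-independent output
    formatter.timeZone = TimeZone(secondsFromGMT: 0)
    formatter.dateFormat = "dd/MM/yyyy"
    return formatter.string(from: date)
}

// Build 23 November 1963 in UTC so the example does not depend on the local time zone.
var calendar = Calendar(identifier: .gregorian)
calendar.timeZone = TimeZone(secondsFromGMT: 0)!
let firstEpisode = calendar.date(from: DateComponents(year: 1963, month: 11, day: 23))!

print(isoDateString(firstEpisode))      // 1963-11-23
print(dayFirstDateString(firstEpisode)) // 23/11/1963
```

Which of these a given file should use depends entirely on the application reading it, which is why the choice needs to be adjustable per table rather than fixed on the data type.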

Devising a syntax

CSV files represent a table of rows and columns, so let's use those terms in our syntax. Let's start at the end, with an idea about the sort of syntax we want to use:

let table = CSVTable(
              columns: [
                CSVColumn("ID", /* how to get the ID */),
                CSVColumn("Name", /* how to get the name */),
                // etc.
              ]
            )
table.export(rows: myData)

When it comes to "how to get the attribute" code, this needs to be something that can be applied consistently for every record in the collection we're exporting. So let's provide a block that takes the row object, and returns a value:

let table = CSVTable(
               columns: [
                  CSVColumn("ID", attribute: { row in row.id }), // { $0.id } would also work
                  // trailing block syntax also works
                  CSVColumn("Name") { row in row.name } 
                  // etc.
               ]
            )
table.export(rows: myData)

We can instantly see this is flexible: if we want to add a person's full name but only have their separate first and last names, we could calculate this in the block (see footnote 1):

CSVColumn("Full Name") { [$0.firstName, $0.lastName].joined(separator: " ") }

We might also want a shorthand approach: if all the block is doing is retrieving an attribute, we could have the option of using a key path. For example:

CSVColumn("ID", \.id)

For all this to work, though, CSVTable and CSVColumn will need to know the data type of the row we're using as a source. Thankfully Swift's generics will be able to help us here:

let myData: [Person] = /* ... */

let table = CSVTable<Person>(
               columns: [
                  CSVColumn("ID", \.id),
                  CSVColumn("Name", \.name) 
                  // etc.
               ]
            )
table.export(rows: myData)

CSVColumn will also need to know the type of record it's dealing with, but we should be able to construct our definitions in such a way that the compiler will automatically infer that for us. If we're defining a CSVTable for Person, then it stands to reason that the table's CSVColumn records also relate to Person objects.

So that's the basic top level syntax done. Time to get into the code.

Building up the code

Before we define the structure for CSVTable and CSVColumn we need to think about what those definition blocks will actually return. We could define the block as having to return a String, which would work just fine for properties that were strings themselves:

CSVColumn<Person>("Name") { $0.name }

But for a property that was any other type that'd be a bit of a faff:

CSVColumn<Person>("Age") { String($0.age) }

What we can do instead is define a protocol, which I'm going to call CSVEncodable. Data types that conform to CSVEncodable will need to define how to turn themselves into a string. For example:

import Foundation // provides UUID

protocol CSVEncodable {
   func encode() -> String
}

extension String: CSVEncodable {
   func encode() -> String { self }
}
extension Int: CSVEncodable {
   func encode() -> String { String(self) }
}
extension UUID: CSVEncodable {
   func encode() -> String { uuidString }
}
// etc.
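The same pattern can extend to optional values, such as the empty Notes cells in the table at the top of this post. The following is only a sketch, and it assumes that a missing value should be written as an empty field; the protocol and the String and Int conformances are restated so the snippet compiles on its own.

```swift
// Restated from above so that this sketch compiles on its own.
protocol CSVEncodable {
    func encode() -> String
}
extension String: CSVEncodable {
    func encode() -> String { self }
}
extension Int: CSVEncodable {
    func encode() -> String { String(self) }
}

// Assumption: a missing value is written as an empty field.
extension Optional: CSVEncodable where Wrapped: CSVEncodable {
    func encode() -> String {
        switch self {
        case .some(let value): return value.encode() // defer to the wrapped type
        case .none: return ""                        // nothing to write
        }
    }
}

let notes: String? = nil
let age: Int? = 55
print(notes.encode().isEmpty) // true
print(age.encode())           // 55
```

Because the conformance is conditional, an optional only becomes encodable when the type it wraps is already CSVEncodable, so the compiler still rejects types the encoder doesn't know how to handle.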

This means that I can then define my columns to accept a block that takes a record, and returns any data type that conforms to CSVEncodable:

struct CSVColumn<Record> {
   var header: String
   var attribute: (Record) -> CSVEncodable
   
   init(_ header: String, attribute: @escaping (Record) -> CSVEncodable) {
      self.header = header
      self.attribute = attribute
   }
}

And we can add an alternative initializer for the keypath-based approach, telling the compiler that the keypath must point to an attribute of our Record generic type, and the attribute must conform to CSVEncodable:

extension CSVColumn {
   init<T: CSVEncodable>(_ header: String, _ keyPath: KeyPath<Record, T>) {
      self.init(header, attribute: { $0[keyPath: keyPath] })
   }
}

This makes CSVTable's initial definition relatively easy:

struct CSVTable<Record> {
   var columns: [CSVColumn<Record>]
}
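With those pieces in place, the goal of having several CSV layouts for one type can be sketched as below. The Person type and the two table names are hypothetical and exist only for illustration; the other definitions are restated from above so the snippet compiles on its own.

```swift
import Foundation

// Definitions restated from above so that this sketch compiles on its own.
protocol CSVEncodable { func encode() -> String }
extension String: CSVEncodable { func encode() -> String { self } }
extension Int: CSVEncodable { func encode() -> String { String(self) } }
extension UUID: CSVEncodable { func encode() -> String { uuidString } }

struct CSVColumn<Record> {
    var header: String
    var attribute: (Record) -> CSVEncodable

    init(_ header: String, attribute: @escaping (Record) -> CSVEncodable) {
        self.header = header
        self.attribute = attribute
    }

    init<T: CSVEncodable>(_ header: String, _ keyPath: KeyPath<Record, T>) {
        self.init(header, attribute: { $0[keyPath: keyPath] })
    }
}

struct CSVTable<Record> {
    var columns: [CSVColumn<Record>]
}

// Hypothetical model type, used only for illustration.
struct Person {
    var id: UUID
    var firstName: String
    var lastName: String
    var age: Int
}

// Two different column sets for the same Person type. Record is written once,
// on CSVTable; the compiler infers it for every CSVColumn in the array.
let contactTable = CSVTable<Person>(columns: [
    CSVColumn("ID", \.id),
    CSVColumn("Full Name") { [$0.firstName, $0.lastName].joined(separator: " ") }
])
let ageTable = CSVTable<Person>(columns: [
    CSVColumn("First Name", \.firstName),
    CSVColumn("Age", \.age)
])

let person = Person(id: UUID(), firstName: "Tom", lastName: "Baker", age: 40)
print(contactTable.columns[1].attribute(person).encode()) // Tom Baker
print(ageTable.columns[1].attribute(person).encode())     // 40
```

Neither table requires any change to Person itself, which is the looser coupling that Codable doesn't offer.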

Putting it together

So now we have our CSV table defined, it's time to generate the output!

That basically boils down to:

  1. Loop through a collection of items
  2. For each item, loop through all the CSVColumn entries and collect the output of the attribute blocks
  3. Join those items up with commas to form a single line per item
  4. Join those lines with line breaks

So that may look like:

extension CSVTable {
   func export(rows: [Record]) -> String {
      // loop through all rows, and collect the results
      let csvRows = rows.map { record in
         // loop through all columns, encode them to Strings and collect the results 
         let values = columns.map { column in
            column.attribute(record).encode()
         }
         return values.joined(separator: ",")
      }
      return csvRows.joined(separator: "\r\n")
   }
}
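As a quick check of what this produces, here is a self-contained sketch that runs the export over two rows from the table at the top of this post. The Doctor type is hypothetical; the other definitions are restated from above, with export folded into the struct for brevity.

```swift
// Definitions restated from above so that this sketch compiles on its own.
protocol CSVEncodable { func encode() -> String }
extension String: CSVEncodable { func encode() -> String { self } }
extension Int: CSVEncodable { func encode() -> String { String(self) } }

struct CSVColumn<Record> {
    var header: String
    var attribute: (Record) -> CSVEncodable

    init(_ header: String, attribute: @escaping (Record) -> CSVEncodable) {
        self.header = header
        self.attribute = attribute
    }

    init<T: CSVEncodable>(_ header: String, _ keyPath: KeyPath<Record, T>) {
        self.init(header, attribute: { $0[keyPath: keyPath] })
    }
}

struct CSVTable<Record> {
    var columns: [CSVColumn<Record>]

    func export(rows: [Record]) -> String {
        // One comma-joined line per record, lines joined with CRLF.
        let csvRows = rows.map { record in
            columns.map { $0.attribute(record).encode() }.joined(separator: ",")
        }
        return csvRows.joined(separator: "\r\n")
    }
}

// Hypothetical record type based on the table at the top of this post.
struct Doctor {
    var id: Int
    var name: String
    var notes: String
}

let doctors = [
    Doctor(id: 2, name: "Patrick Troughton", notes: ""),
    Doctor(id: 4, name: "Tom Baker", notes: "No relation to the Sixth Doctor, Colin Baker")
]

let table = CSVTable<Doctor>(columns: [
    CSVColumn("ID", \.id),
    CSVColumn("Doctor", \.name),
    CSVColumn("Notes", \.notes)
])

let output = table.export(rows: doctors)
print(output)
// 2,Patrick Troughton,
// 4,Tom Baker,No relation to the Sixth Doctor, Colin Baker
```

The second line shows the limits of this first version: the comma inside the note is written as-is, so a CSV reader would see four fields on that row instead of three.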

And that, at its core, is our CSV encoder pretty much done. We create a definition of how the table of CSV elements is created, we give it a collection of data objects, and get a list of comma-separated results back.

There's slightly more to account for before this encoder is fully working though. Some of the things not covered here:

  • Adding a header row to the start of the CSV file
  • Dealing with differing date format needs and other variations
  • How to ensure that strings including special characters (including commas and quote marks) don't break the CSV format

Those things are already handled by SwiftCSVEncoder, so if you want to see how I've handled them, feel free to browse the source code. Otherwise, I'll be back to cover these and other tune-ups in a second blog post.
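In the meantime, here is a rough sketch of what the header row and special-character handling might look like. It follows the common convention described in RFC 4180 (quote any field containing a comma, a double quote or a line break, and double up embedded quotes). The function names csvEscaped and csvText are hypothetical, and this is not necessarily how SwiftCSVEncoder implements it.

```swift
import Foundation

// Hypothetical helpers for illustration; SwiftCSVEncoder's own implementation may differ.

/// Quotes a field when it contains a comma, a double quote or a line break,
/// doubling any embedded double quotes.
func csvEscaped(_ field: String) -> String {
    let needsQuoting = field.contains { $0 == "," || $0 == "\"" || $0.isNewline }
    guard needsQuoting else { return field }
    let doubled = field.replacingOccurrences(of: "\"", with: "\"\"")
    return "\"" + doubled + "\""
}

/// Joins a header row and data rows into CSV text, escaping every field.
func csvText(headers: [String], rows: [[String]]) -> String {
    let lines = ([headers] + rows).map { fields in
        fields.map(csvEscaped).joined(separator: ",")
    }
    return lines.joined(separator: "\r\n")
}

let text = csvText(
    headers: ["ID", "Doctor", "Notes"],
    rows: [["4", "Tom Baker", "No relation to the Sixth Doctor, Colin Baker"]]
)
print(text)
// ID,Doctor,Notes
// 4,Tom Baker,"No relation to the Sixth Doctor, Colin Baker"
```

Only fields that need it are quoted, which matches the sample CSV at the top of this post, where Tom Baker's note is the only quoted value.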

Footnotes

  1. It's worth pointing out that treating names in this way is poor for internationalisation purposes, as it presumes a lot about how people's names are structured across the globe. A better approach would be to think of "given name" and "family name", and use Apple's PersonNameComponentsFormatter to build up a culturally appropriate construction. Or if you don't really have a reason why first name and last name need to be treated separately, keep everything in a name field.

© 2024 Scott Matthewman. All rights reserved.