Importing Data from Access and Excel into Arca - Part I
Important Update: Version 1.5 of the Arca Database Browser application can now import CSV files created in Excel or Access directly, using the FILE->IMPORT->TABLE FROM CSV FILE menu. The ability to export data in CSV format is now also available. Arca Database Browser is a free download. This technote is maintained for historical reasons, and for developers that want a better understanding of the CSV file format.
In some projects it is necessary to bring data previously available in an MS Access database or MS Excel spreadsheet into Arca. The process explained below can be used to import text data directly using the SQL COPY command, and a text file saved in the CSV (comma separated values) format. Other database engines and applications can also export data as CSV text files, and you will be able to use them as well to bring your records into an Arca table.
This technote explains one specific process you can use to prepare data originally available in Access or Excel. It can be adapted to any other tool that exports data as text files.
Let's start with Access. For this tutorial we will export the CUSTOMERS table in the sample Northwind Access database, included with MS Access 2000. Start Access and load the Northwind sample.
Select the 'Tables' control, and double click the 'Customers' table to see its contents:
As you can see, data is also organized in rows and columns in Access. You can only export one table at a time from Access to Arca, and only text and numeric information. The first step is to export the table from Access. For the purposes of this tutorial we will export it as an XLS (Excel ) Document. Choose the FILE->EXPORT menu in Access:
Now select the Excel 97/2000 format, and save your table.
The resulting document can be loaded in MS Excel or any other compatible program for further editing. The image below shows the table data loaded in MS Excel. If you are trying to export an Excel document you can follow the tutorial starting from here, of course.
In order to get this data into Arca we will need to export it as a CSV file, as we mentioned before. But we also need to make sure the records do not contain characters that may confuse the importer. This is true for Arca or any other tool using the CSV format. In this format the newline character is used to separate each row, while a different character (usually a comma) is used to separate each field. The typical final text file may look like
John,Connor,3455445
Maria,Martins,7656565
Anthony,Margel,98989
However, what happens if one of the records have commas and or paragraphs in its content? This can easily break the importer, as it will receive something like:
John,Connor,his address
Maria,Martins,12, Main Street <<< comma in the address
Anthony,Margel,his address
In this case the importer may get confused, since the second line has actually 4 itens instead of the three expected, due to the comma inside the address field. Or worse, something like:
John,favorite quote here,his age
Mary,favorite quote here but it contains a big text with
a paragraph in it, her age
Anthony,another quote here,his age
The newline character inside Mary's favorite quote will cause the importer to believe a new record is starting, messing up the operation.
That's the reason why it is necessary to prepare the data to be exported before you save it as a CSV file. This is done with 3 simple search and replace operation. However, some versions of MS Excel (the 2000 version for example) do not have a customizable CSV exporter, and the ability to find the newline character inside records. In order to do this we suggest that you download OpenOffice , it is available as a free tool and offers a powerful exporter. This is also a way for you to convert Excel files you may receive from your customers even if you do not have Excel installed. The first step is to simply open the file in Open Office Calc, and save it as an OpenOffice Spreadsheet. Below you can see our spreadsheet opened in Calc:
We will not need the first row, which contains the name of the fields in the database. You can delete it easily by simply right-clicking the mouse button in the row number, and selecting DELECT ROWS in the menu that appears.
The next steps are very simple. Open the FIND AND REPLACE menu item. The first operation you need to do is to substitute all occurrences of the \ character by \\ . This is called "escaping". The \ character is used internally by Arca to mark special characters, and we will be using it to mark our newlines and commas so that these will be preserved inside the records. But first we need to make sure the backslash characters themselves are also preserved, and this is accomplished with this simple replace operation. See the screenshot below, and click REPLACE ALL.
The next step is to escape the commas, so that they will be preserved. Simply FIND AND REPLACE again, this time using the parameters shown in the image below. This will replace , by \, in your records. Click REPLACE ALL to apply.
The final step is similar, but here we take advantage of a special feature in OpenOffice Calc that lets us find and replace the invisible newline (or linebreak) character that may exist inside your records. In order to do this you MUST remember to check the REGULAR EXPRESSIONS option in the FIND AND REPLACE window. We will be replacing \n by \n . This is a special replacement mode that will take care of locating and handling the invisible newline characters properly. Click REPLACE ALL to perform the operation.
We are almost done now. We just need to export the file to CSV format. Select FILE->SAVE AS, and choose the TEXT CSV format as shown in the figure below.
A dialog box will appear asking for the configuration of field and text delimiters. The field delimiter should be a comma. We don't use a text delimiter in Arca, so just clear the value (leave it blank.) Click OK to save the file.
If you want you can check the final exported file using a text editor like Word or Notepad. You will observe that each record will be exported in one line of text, and commas are used to delimit the fields. You can also observe that the backslash character will appear to escape the commas inside your records, and also preserve original backslash characters.
In technote AR004 we will show you how to import this final .csv text file into Director using Arca.
|