Please upload this excel file to the database

An occasional request to a website I maintain at a non-for-profit organization is to upload records to a MySQL database. The data is given to me as an attachment in an Excel file. These are the steps that get the task done in the fastest time.

  1. Make sure that I get the file to my Gmail account (either directly, or I just forward it from my .edu email to my personal Gmail).
  2. Click on the attachment (in Gmail), not the download, but the “Edit with Google Sheets” button.
  3. As needed, “fix” data, such as date values, and remove formatting (if any).
    1. Arrange the columns to match those of the table into which you want to load the records to.
  4. Download (from Google Sheets) as tab-separated values. “File” -> “Download as” -> “Tab-separated values (.tsv, current sheet)”
    • The benefits are:
      • not having to open Excel locally at all (or OpenOffice, LibreOffice)
      • not having to deal with windows/unix line endings upon exporting from your favorite (local) spreadsheet
  5. Open the terminal (ideally OSX, or Linux, but cywin gets the job done too under Windows).
  6. Get rid of the first line, which usually is the header line.
  7. “Copy” the file to the linux box where MySQL is located (such as via scp, or sftp).
  8. Login to the linux box where MySQL is located.
  9. Backup the database (and copy the backup to another linux box before the next step)
  10. Login to MySQL, and use the command LOAD DATA INFILE to load the data
    • Example: Load data infile ‘/tmp/strains.txt’ into table strains;
    • Pay attention to the message indicating how many records were inserted. It should match the number of records that you expected to insert.
    • Pay attention to the other messages such as “Warnings”. When “Warnings” is not a zero, then do a command: “Show warnings;” to help you identify what the problem was. It could be that lines are truncated because the field in the table was too small, or that you don’t have a value in a column that does not accept nulls.
  11. Check your data. Then tell your boss (or yourself) that the task is “done”.
  12. Disadvantages of this approach:
    • Your google-sheets will be cluttered with the opened excel files. (Not easy to get rid of those things, I’ve tried and just decided not to bother; if needed, just rename the file to something that starts with “z_useless_” so that it is listed alphabetically at the end, and you can remember that you don’t need it).
    • You may have to read the additional options of the LOAD DATA command, when you have columns that have values that depend on other fields. However, MySQL is quite powerful, with functions for strings, and simple arithmetic.
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s