Reading a File With Delimeters in Python

Introduction

A tab-delimited file is a well-known and widely used text format for data commutation. By using a structure similar to that of a spreadsheet, information technology also allows users to present data in a way that is like shooting fish in a barrel to empathize and share beyond applications - including relational database management systems.

The IANA standard for tab-separated values requires the first line of the file to contain the field names. Additionally, other lines (which stand for separate records) must take the same number of columns.

Other formats, such equally comma-separated values, often pose the challenge of having to escape commas, which are frequent inside text (every bit opposed to tabs).

Opening Files with Python

Before we swoop into processing tab-separated values, we will review how to read and write files with Python. The following example uses the open() congenital-in function to open a file named players.txt located in the current directory:

                                  1                                                      with                                                      open up                  (                  'players.txt'                  )                                                      equally                                      players_data                  :                                                      2                                      players_data                  .                  read                  (                  )                              

python

The open() function accepts an optional parameter that indicates how the file will be used. If not present, read-only mode is causeless. Other alternatives include, simply are not limited to, 'w' (open for writing in truncate manner) and 'a' (open for writing in append style).

After pressing Enter twice to execute the above suite, we volition see tabs (\t) betwixt fields, and new line breaks (\n) equally tape separators in Fig. one:

Fig. 1

Although we will be primarily concerned with extracting data from files, we can too write to them. Again, notation the use of \n at the beginning to signal a new record and \t to separate fields:

                                  1                                                      with                                                      open                  (                  'players.txt'                  ,                                                      'a'                  )                                                      as                                      players_data                  :                                                      ii                                      players_data                  .                  write                  (                  '\n{}\t{}\t{}\t{}\t{}\t{}\t{}'                  .                  format                  (                  'Trey'                  ,                                                      'Burke'                  ,                                                      '23'                  ,                                                      'one.85'                  ,                                                      '2013'                  ,                                                      '79.iv'                  ,                                                      '23.2'                  )                  )                              

python

Although the format() function helps with readability, there are more efficient methods to handle both reading and writing - all available within the same module in the standard library. This is particularly important if we are dealing with big files.

Introducing the CSV Module

Although it was named afterwards comma-separated values, the CSV module tin can manage parsed files regardless of the field delimiter - be it tabs, vertical bars, or just almost anything else. Additionally, this module provides two classes to read from and write data to Python dictionaries (DictReader and DictWriter, respectively). In this guide we volition focus on the one-time exclusively.

Get-go off, we will import the CSV module:

Next, nosotros will open the file in read-only mode, instantiate a CSV reader object, and use it to read i row at a time:

                                  1                                                      with                                                      open                  (                  'nba_games_november2018_visitor_wins.txt'                  ,                                      newline                                    =                                                      ''                  )                                                      as                                      games                  :                                                      ii                                      game_reader                                    =                                      csv                  .                  reader                  (                  games                  ,                                      delimiter                  =                  '\t'                  )                                                      3                                                      for                                      game                                    in                                      game_reader                  :                                                      4                                                      print                  (                  game                  )                              

python

Although it is not strictly necessary in our case, we volition pass newline = '' every bit an argument to the open up() function as per the module documentation. If our file contains newlines within quoted fields, this ensures that they volition be processed correctly.

Fig. ii shows that each row was read into a list afterwards the above suite was executed:

Fig. 2

Although this undoubtedly looks much improve than our previous version where tabs and new lines were mixed with the actual content, at that place is however room for improvement.

The DictReader Course

To begin, we will create an empty list where nosotros will store each game every bit a separate lexicon:

Finally, we will echo the aforementioned code every bit in a higher place with merely a minor change. Instead of printing each row, we will add information technology to games_list. If you are using Python 3.5 or older, y'all can omit dict() and use games_list.append(game) instead. In Python 3.6 and newer, this function is used to plow the ordered dictionary into a regular i for better readability and easier manipulation.

                                  one                                                      with                                                      open                  (                  'nba_games_november2018_visitor_wins.txt'                  ,                                      newline                                    =                                                      ''                  )                                                      as                                      games                  :                                                      2                                      game_reader                                    =                                      csv                  .                  DictReader                  (                  games                  ,                                      delimiter                  =                  '\t'                  )                                                      3                                                      for                                      game                                    in                                      game_reader                  :                                                      4                                      games_list                  .                  append                  (                  dict                  (                  game                  )                  )                              

python

We can go 1 step further and use list comprehension to return only those games where the visitor score was greater than 130. The following statement creates a new listing called visitor_big_score_games and populates it with each game within games_list where the condition is truthful:

                                  i                                      visitor_big_score_games                                    =                                                      [                  game                                    for                                      game                                    in                                      games_list                                    if                                                      int                  (                  game                  [                  'Visitor score'                  ]                  )                                                      >                                                      130                  ]                              

python

Now that we have a list of dictionaries, we tin can write it to a spreadsheet as explained in Importing Data from Microsoft Excel Files with Python or manipulate information technology otherwise. Some other option consists of writing the list converted to string into a obviously text file named visitor_big_score_games.json for distribution in JSON format:

                                  1                                                      with                                                      open                  (                  'visitor_big_score_games.json'                  ,                                                      'w'                  )                                                      as                                      games                  :                                                      two                                      games                  .                  write                  (                  str                  (                  visitor_big_score_games                  )                  )                              

python

The write() role requires a string as an argument. That is why nosotros had to convert the entire list into a string before performing the write performance.

If you just want to view the list, not turn it into a spreadsheet or a JSON file, you tin alternatively utilise pprint() to display it in a convenient format every bit shown in Fig. iii:

                                  1                                                      import                                      pprint                                    every bit                                      pp                                    2                                      pp                  .                  pprint                  (                  visitor_big_score_games                  )                              

python

Fig. 3

As y'all can see, the possibilities are endless and the merely limit is our imagination!

Summary

In this guide we learned how to import and manipulate information from tab-delimited files with Python. This non only is a highly valuable skill for data scientists, but for web developers and other IT professionals as well.

hernandezrefoll.blogspot.com

Source: https://www.pluralsight.com/guides/importing-data-from-tab-delimited-files-with-python

Related Posts

0 Response to "Reading a File With Delimeters in Python"

Publicar un comentario

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel