2020-02-13

Standard C function to read lines from a file

[update: As I hoped, there is a simple answer, getline(), see comments, thank you Charles Lecklider]

The classic is fgets(), it is simple, and easy to use...
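
A minimal sketch of the usual fgets() loop, not necessarily the exact code from the boat. The 1000-byte buffer is an assumption (it is the figure complained about further down), and the helper name count_chunks is made up for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: reads a stream with fgets() and returns how many
 * chunks it produced. A chunk is one line, unless the line is longer
 * than the buffer, in which case it comes back in several pieces. */
size_t count_chunks(FILE *in)
{
    char line[1000];                 /* the fixed limit */
    size_t count = 0;

    while (fgets(line, sizeof line, in) != NULL)
        count++;                     /* line still ends in '\n' here */
    return count;
}
```

Called as count_chunks(stdin), this just walks the input; real code would do its parsing where the count is incremented.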


Of course, for some reason, fgets() gives you the line endings, so I usually end up adding code to strip them, more like...
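
One common way to do the stripping is strcspn(), which finds the first carriage return or newline so the string can be cut there. This is a sketch, with the helper name fgets_stripped invented for the example:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: like fgets(), but cuts the buffer at the first
 * '\r' or '\n', so both "\n" and "\r\n" endings are dropped. */
char *fgets_stripped(char *buf, int size, FILE *in)
{
    if (fgets(buf, size, in) == NULL)
        return NULL;                     /* end of file or read error */
    buf[strcspn(buf, "\r\n")] = '\0';    /* no-op if there is no ending */
    return buf;
}
```

The fixed buffer size is still there, of course; this only deals with the line ending.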


The problem, of course, is that you have a fixed maximum line length. This is also an advantage, in that you constrain the lines and don't have random memory allocation issues, but computers have so much memory and VM these days. How many times have I seen this code, and seen someone have to change 1000 to 10000 one day?

What I would like is a simple function that reads a line and mallocs space as needed. Indeed, it could return the allocated space or NULL for error (EOF, or malloc fail). You'd have to free it, but no big issue. Would also be nice if (a) it stripped the line ending as I literally NEVER want that in the line, and (b) seamlessly handled bloody DOS style carriage returns...
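
To pin down what is being asked for, here is a rough sketch of such a function in plain standard C, using fgetc() and realloc(). The name read_line and the starting size of 64 are arbitrary choices for illustration, not anything from a library:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical read_line(): returns a malloc'd line without its "\n" or
 * "\r\n" ending, or NULL at end of file or if allocation fails.
 * The caller frees the result. */
char *read_line(FILE *in)
{
    size_t cap = 0, len = 0;
    char *buf = NULL;
    int c;

    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (len + 1 >= cap) {                /* keep room for the '\0' */
            size_t ncap = cap ? cap * 2 : 64;
            char *tmp = realloc(buf, ncap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = ncap;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0)
        return NULL;                         /* nothing left to read */
    if (buf == NULL) {                       /* empty line: nothing stored yet */
        buf = malloc(1);
        if (buf == NULL)
            return NULL;
    }
    if (len > 0 && buf[len - 1] == '\r')     /* DOS style ending */
        len--;
    buf[len] = '\0';
    return buf;
}
```

Leading tabs and spaces are kept as they are, and a last line with no newline is still returned.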


So, whilst at sea in the middle of the Atlantic, I was explaining some basic C to my mates by making a simple CSV file parsing program for them, and this came up. We did some googling, and found that I am not alone in trying to find such a function. It seems that fscanf() may be the answer. [update: clearly I did not google well enough!]

To be honest fscanf() is a function I just don't use enough. It is very powerful, but I always find myself parsing things more directly. However, I had not considered it as a means to just get a line.

The magic incantation is something like...


This reads any characters up to a newline, allocates space (that is what m is), and stores the result in line. Just what we need. A minor variation to handle carriage returns seems to work too...
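
A sketch of how such a call might look, not necessarily the exact format string we used. It assumes a C library that supports the POSIX.1-2008 m modifier (glibc does; it is not ISO C), and the function name scan_line is invented for the example. Note that %[ has to match at least one character, so an empty line needs handling separately:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical scan_line(): one line per call using fscanf() and the
 * "m" allocation modifier. Returns NULL at end of file. Caller frees. */
char *scan_line(FILE *in)
{
    char *line = NULL;
    int n = fscanf(in, "%m[^\r\n]", &line);  /* everything up to CR or LF */

    if (n == EOF)
        return NULL;                 /* end of file or read error */
    if (n == 0)                      /* empty line: %[ matched nothing */
        line = calloc(1, 1);         /* so hand back "" ourselves */

    /* The format has no whitespace directive, so eat the ending by hand. */
    int c = fgetc(in);
    if (c == '\r')
        c = fgetc(in);
    if (c != '\n' && c != EOF)
        ungetc(c, in);               /* lone CR: leave the next char alone */
    return line;
}
```

A carriage return in the middle of a line will split it in two, which is probably fine for CSV off a DOS box but is a design choice worth knowing about.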


Bingo, we have our magic line malloc file reader function. Perfect.

And get this: reading the man page, it is clear that the [ conversion does not skip leading white space, which is perfect... So all good.


Except that is not what happens. We did the CSV stuff, and then went on to TSV (tab separated), and magically leading TABs (i.e. an empty first field) were stripped by fscanf().

Why?!?!?!?!?!
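
One possible explanation, and it is only a guess since it depends on the exact format string: the [ conversion itself really does not skip white space, but any white space character in the format (a space, or a "\n" put there to eat the line ending) is a directive that matches any amount of white space in the input, tabs on the next line included. A small sketch that shows the effect, assuming the format ended in "\n" and using an invented function name:

```c
#include <stdio.h>

/* Hypothetical variant with a "\n" directive after the scan set.
 * Needs the POSIX.1-2008 "m" modifier. Caller frees the result. */
char *scan_line_greedy(FILE *in)
{
    char *line = NULL;

    /* The trailing "\n" is a whitespace directive: it matches any run of
     * white space, including tabs at the start of the following line. */
    if (fscanf(in, "%m[^\n]\n", &line) != 1)
        return NULL;
    return line;
}
```

Fed "a\tb" on one line and a tab followed by "c\td" on the next, the second call returns "c\td": tabs inside the line survive, but the leading one has already been swallowed by the previous call. A leading space in the format, as in " %m[^\n]", would have the same effect.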

Please someone tell me I am being thick and that there is a standard function to do just this. Yes, I could write my own, but this is surely so basic it should be standard C library stuff.
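
As the update at the top says, getline() turned out to be that function (it is POSIX.1-2008 rather than ISO C). A minimal sketch of a wrapper that also strips the line ending, DOS style or not; the name get_stripped_line is invented for the example:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical wrapper: returns a freshly allocated line without its
 * "\n" or "\r\n" ending, or NULL at end of file or on error.
 * The caller frees the result. */
char *get_stripped_line(FILE *in)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t len = getline(&line, &cap, in);  /* allocates as needed */

    if (len < 0) {
        free(line);      /* the buffer may have been allocated anyway */
        return NULL;
    }
    if (len > 0 && line[len - 1] == '\n')
        line[--len] = '\0';
    if (len > 0 && line[len - 1] == '\r')
        line[--len] = '\0';
    return line;
}
```

This allocates a new buffer for every line, which is the simple interface asked for above. Passing the same pointer and size back in on each call lets getline() reuse one buffer instead.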

[code mistakes in examples left in for the reader to find]


8 comments:

  1. Perhaps separate the functions of reading a line & then processing the resulting string? Essentially this is how it works in Python:

    import fileinput
    for line in fileinput.input():
        line = line.strip()
        ...

    or other ways of processing the line. Python has a lot of methods on string objects for munging them - better than the standard ones in C. If you are writing little scripts to do stuff then Python is the way to go unless speed or volume is of the essence. Even then libraries with C under the hood help a lot - I'm just now using FFTs in numpy whereas I would have used fftw3 in C previously.

  2. ssize_t getline(char **lineptr, size_t *n, FILE *stream);

    After the call, the return value is the length of the line including its line ending (n holds the size of the allocated buffer), so stripping the line ending is always O(n) - where n in this case is the number of terminating chars.

    It'll malloc() or even realloc() - I think it ticks all the boxes you're looking for.

    1. Indeed: it's designed so that you can just call it in a loop with the same buffer and n over and over again:

      size_t size = 0;
      char *line = NULL;
      ssize_t len; /* signed: getline() returns -1 at EOF or on error */
      while ((len = getline(&line, &size, stream)) >= 0)
      ...
      /* then check for feof(), ferror() etc. */

    2. It does indeed work perfectly. It looks like it dates from 2008, long after I learned C, which is the sort of thing that sometimes catches me out. Thank you - just what I hoped for.

    3. It was very annoying when it was introduced, because, y'know, getline() was not a reserved identifier before POSIX.1 2008, and it's a pretty obvious name, so a *lot* of programs had used it, as a function name, as a variable name, you name it... and they all broke. Nice forward planning!

    4. (... oh, obviously, you'll probably want to free() the buffer after the loop is over, too. It gets reused repeatedly by getline, so if you stash it somewhere, remember to strdup() it first. Yes, I made that mistake.)

  3. // One of the many reasons I love C++!
    #include <fstream>
    #include <iostream>
    #include <string>

    int main(int argc, char** argv) {
        if (argc < 2) {
            std::cout << "Usage:" << argv[0] << " \n";
            return -1;
        }

        std::string line;
        std::ifstream infile(argv[1]);

        if (infile) {
            while (getline(infile, line)) {
                std::cout << line << '\n';
            }
        }
        infile.close();
        return 0;
    }

    1. And you really think that someone besides you will find this more readable than Nick's example above?

      C++ most often transforms simple C code to inelegant garbage made of concatenation of unlikely operators, without bringing any value, except providing you some extra time to drink your coffee while it compiles of course.

