10

I have a script which calls two commands:

long_running_command | print_progress

The long_running_command prints progress output, but I'm unhappy with it. I'm using print_progress to make it nicer (namely, I print the progress on a single line).

The problem: the pipe activates a 4K buffer, so the nice print program gets nothing ... nothing ... nothing ... a whole lot ... :)

How can I disable the 4K buffer for the long_running_command (no, I don't have the source)?

17 accepted

You can use unbuffer, a command that comes with expect, e.g.

unbuffer long_running_command | print_progress

unbuffer connects to long_running_command via a pseudoterminal (pty), which makes the system treat it as an interactive process, so it does not use the 4 KiB buffering in the pipeline that is the likely cause of the delay.

For longer pipelines, you may have to unbuffer each command (except the final one), e.g.

unbuffer x | unbuffer -p y | z

5

If it is a problem with the libc modifying its buffering / flushing when output does not go to a terminal, you should try socat. You can create a bidirectional stream between almost any kind of I/O mechanism. One of those is a forked program speaking to a pseudo tty.

 socat EXEC:long_running_command,pty,ctty STDIO 

What it does is

  • create a pseudo tty
  • fork long_running_command with the slave side of the pty as stdin/stdout
  • establish a bidirectional stream between the master side of the pty and the second address (here it is STDIO)

If this gives you the same output as long_running_command, then you can continue with a pipe.

Edit: Wow, I did not see the unbuffer answer! Well, socat is a great tool anyway, so I might just leave this answer.

2

I don't think the problem is with the pipe. It sounds like your long-running process is not flushing its own buffer frequently enough. Changing the pipe's buffer size would be a hack to get round it, but I don't think it's possible without rebuilding the kernel, which is something you wouldn't want to do as a hack, as it would probably adversely affect a lot of other processes.

2

It used to be the case, and probably still is the case, that when standard output is written to a terminal, it is line buffered by default - when a newline is written, the line is written to the terminal. When standard output is sent to a pipe, it is fully buffered - so the data is only sent to the next process in the pipeline when the standard I/O buffer is filled.

That's the source of the trouble. I'm not sure whether there is much you can do to fix it without modifying the program writing into the pipe. You could use the setvbuf() function with the _IOLBF flag to unconditionally put stdout into line buffered mode. But I don't see an easy way to enforce that on a program. Or the program can do fflush() at appropriate points (after each line of output), but the same comment applies.

I suppose that if you replaced the pipe with a pseudo-terminal, then the standard I/O library would think the output was a terminal (because it is a type of terminal) and would line buffer automatically. That is a complex way of dealing with things, though.
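
The terminal-versus-pipe distinction described in this answer can be observed directly from the shell. This is a minimal sketch, assuming a POSIX shell; describe_stdout is a hypothetical helper name, and [ -t 1 ] is only the same kind of check a C library might make, not its actual code:

```shell
#!/bin/sh
# Report what kind of stdout the current process sees.
# [ -t 1 ] is true only when file descriptor 1 is a terminal.
describe_stdout() {
  if [ -t 1 ]; then
    echo "terminal"
  else
    echo "not a terminal"
  fi
}

# Here stdout is a pipe, which is the situation in which a stdio-based
# program would typically switch from line buffering to full buffering.
through_pipe=$(describe_stdout | cat)
echo "through a pipe: $through_pipe"
```

Calling describe_stdout on its own in an interactive shell should report a terminal instead, which is the case where line buffering is the usual default.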

1

According to this, the pipe buffer size seems to be set in the kernel, and altering it would require recompiling your kernel.

1

Another way to skin this cat is the stdbuf program which is part of the GNU coreutils.

stdbuf -i0 -o0 -e0 command
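
For line-oriented progress output, line buffering (-oL) is often enough and less drastic than turning buffering off entirely. The following is a sketch rather than a definitive recipe: sample_log is a hypothetical stand-in for real output, and cut stands in for whatever stdio-based command is doing the buffering.

```shell
#!/bin/sh
# Hypothetical stand-in for the output of a real command.
sample_log() { printf 'alice GET /\nbob GET /about\n'; }

# -oL asks stdbuf to make cut's stdout line buffered, so each line is
# passed downstream as soon as it is complete.
if command -v stdbuf >/dev/null 2>&1; then
  users=$(sample_log | stdbuf -oL cut -d ' ' -f1)
else
  # Fallback when stdbuf is not installed: same output, default buffering.
  users=$(sample_log | cut -d ' ' -f1)
fi
printf '%s\n' "$users"

# Applied to the question, the equivalent would be:
#   stdbuf -oL long_running_command | print_progress
```

Note that stdbuf only influences programs that use C stdio and do not set their own buffering, so whether it helps with long_running_command depends on how that program writes its output.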

0

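This might also work, though it only makes grep flush after every line; it cannot change how long_running_command buffers its own output: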

long_running_command | grep --line-buffered "" | print_progress
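
Where --line-buffered does make a difference is when grep itself sits in the middle of a pipeline. Here is a small sketch, assuming a grep that supports the option (GNU and BSD grep do); producer is a hypothetical stand-in for a command that already emits its lines promptly:

```shell
#!/bin/sh
# Hypothetical stand-in for a command that emits progress lines.
producer() { printf 'step 1\nnoise\nstep 2\nstep 3\n'; }

# grep's stdout is a pipe here, so it would normally be block buffered;
# --line-buffered makes grep flush after every matching line.
steps=$(producer | grep --line-buffered '^step' | cut -d ' ' -f2)
printf '%s\n' "$steps"
```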

-7

I'm not sure, but by the sound of things xargs might be what you need. Read the man page.

xargs is a command on Unix and most Unix-like operating systems. It is useful when one wants to pass a large number of arguments to a command. Until Linux kernel 2.6.23, arbitrarily long lists of parameters could not be passed to a command [1], so xargs will break the list of arguments into sublists small enough to be acceptable.

From Wikipedia