Delimited Files
TidierFiles.jl aims to provide consistent syntax for reading and writing files. The functions on this page focus on delimited files and are powered by CSV.jl.
using TidierFiles
read_csv/tsv/delim
read_csv("https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testing_files/csvtest.csv", skip = 2, n_max = 3, col_select = ["ID", "Score"], missing_value = ["4"])
#read_csv(file; delim=',', col_names=true, skip=0, n_max=Inf, comment=nothing, missing_value="", col_select=nothing, escape_double=true, col_types=nothing, num_threads=1)
#read_tsv(file; delim='\t', col_names=true, skip=0, n_max=Inf, comment=nothing, missing_value="", col_select=nothing, escape_double=true, col_types=nothing, num_threads=Threads.nthreads())
#read_delim(file; delim='\t', decimal = '.', groupmark = nothing, col_names=true, skip=0, n_max=Inf, comment=nothing, missing_value="", col_select=nothing, escape_double=true, col_types=nothing, num_threads=Threads.nthreads())
#read_csv2(file; delim=';', decimal = ',', col_names=true, skip=0, n_max=Inf, comment=nothing, missing_value="", col_select=nothing, escape_double=true, col_types=nothing, num_threads=Threads.nthreads())
#These functions read a delimited file (CSV, TSV, or custom delimiter) into a DataFrame. The arguments are:
| Row | ID | Score |
|---|---|---|
| | Int64? | Int64 |
| 1 | 3 | 77 |
| 2 | missing | 85 |
| 3 | 5 | 95 |
- `file`: Path, vector of paths, or URL(s) of the file(s) to read.
- `delim`: Field delimiter. Default is ',' for `read_csv`, '\t' for `read_tsv` and `read_delim`, and ';' for `read_csv2`.
- `col_names`: Use the first row as column names. Can be `true`, `false`, or an array of strings. Default is `true`.
- `skip`: Number of lines to skip before reading data. Default is 0.
- `n_max`: Maximum number of rows to read. Default is `Inf` (read all rows).
- `comment`: Character indicating comment lines to ignore. Default is `nothing`.
- `missing_value`: String(s) representing missing values. Default is `""`.
- `col_select`: Optional vector of symbols or strings selecting the columns to load. Default is `nothing`.
- `groupmark`: A symbol that separates groups of digits. Default is `nothing`.
- `decimal`: An ASCII `Char` used as the decimal separator when parsing float values. Default is '.' (',' for `read_csv2`).
- `escape_double`: Interpret two consecutive quote characters as a single quote. Default is `true`.
- `col_types`: Optional specification of column types using a `Dict`. Default is `nothing` (types are inferred).
- `num_threads`: Number of threads to use for parallel execution. Default is 1 for `read_csv` and the number of available threads for `read_tsv`, `read_delim`, and `read_csv2`.
- `kwarg`: Any other CSV.jl keyword argument can be passed to any of the functions above, provided it uses the syntax CSV.jl expects.
The functions return a DataFrame containing the parsed data from the file.
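As a rough illustration of how the arguments above combine, the sketch below reads a small local file instead of a URL. The file names and contents are made up for the example, and it assumes DataFrames.jl is installed alongside TidierFiles.jl (it is loaded only for `nrow` and `names`).

```julia
using TidierFiles
using DataFrames  # assumption: installed; used here only for nrow/names

# Build a small throwaway file so the example does not depend on the network.
dir = mktempdir()
csv_path = joinpath(dir, "scores.csv")  # hypothetical file name
write(csv_path, "ID,Name,Score\n1,Ann,90\n2,Bob,NA\n3,Cy,75\n")

# Treat "NA" as missing, keep two of the three columns,
# and stop after the first two data rows.
scores = read_csv(csv_path;
                  missing_value = ["NA"],
                  col_select = ["ID", "Score"],
                  n_max = 2)

# A pipe-delimited file goes through read_delim with an explicit delimiter.
pipe_path = joinpath(dir, "scores.txt")  # hypothetical file name
write(pipe_path, "ID|Score\n1|90\n2|85\n")
piped = read_delim(pipe_path; delim = '|')
```

Here `scores` should contain only the `ID` and `Score` columns, with the `NA` entry parsed as `missing`.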
write_csv and write_tsv
write_csv(x, file; missing_value="", append=false, col_names=true, eol="\n", num_threads=Threads.nthreads())
write_tsv(x, file; missing_value="", append=false, col_names=true, eol="\n", num_threads=Threads.nthreads())
These functions write a DataFrame to a CSV or TSV file. The arguments are:
- `x`: The DataFrame to write.
- `file`: The path to the output file.
- `missing_value`: The string to represent missing values. Default is an empty string.
- `append`: Whether to append to an existing file. Default is `false`.
- `col_names`: Whether to write column names as the first line. Default is `true`.
- `eol`: The end-of-line character. Default is `"\n"`.
- `num_threads`: The number of threads to use for writing. Default is the number of available threads.
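A minimal sketch of writing a DataFrame with these functions follows. The data and file names are invented for illustration, and DataFrames.jl is assumed to be installed so the `DataFrame` constructor is available.

```julia
using TidierFiles
using DataFrames  # assumption: installed; provides the DataFrame constructor

# Hypothetical data with one missing score.
results = DataFrame(ID = [1, 2, 3], Score = [90, missing, 75])

dir = mktempdir()
csv_out = joinpath(dir, "results.csv")  # hypothetical file name
tsv_out = joinpath(dir, "results.tsv")  # hypothetical file name

# Write missing values as "NA" instead of the default empty string.
write_csv(results, csv_out; missing_value = "NA")

# Same data, tab-separated, with default options.
write_tsv(results, tsv_out)

# Inspect the raw text that was written.
csv_text = read(csv_out, String)
print(csv_text)
```

Reading the file back as plain text is a quick way to check how `missing_value` and `col_names` affected the output.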
read_table
read_table(file; col_names=true, skip=0, n_max=Inf, comment=nothing, col_select=nothing, missing_value="", num_threads)
This function reads a table from a whitespace-delimited file into a DataFrame. The arguments are:
- `file`: The path to the file to read.
- `col_names`: Whether the first non-skipped line contains column names. Default is `true`.
- `skip`: Number of lines to skip before processing. Default is 0.
- `n_max`: Maximum number of lines to read. Default is `Inf` (read all lines).
- `comment`: Character or string indicating comment lines to ignore. Default is `nothing`.
- `col_select`: Optional vector of symbols or strings selecting the columns to load. Default is `nothing`.
- `missing_value`: The string representing missing values. Default is `""`.
- `num_threads`: The number of threads to use for reading. Default is the number of available threads.
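The following sketch shows one way `read_table` might be used on a column-aligned text file. The file contents are made up, DataFrames.jl is assumed to be installed, and the example deliberately avoids leading or trailing whitespace on each line, since how those cases are handled is not described here.

```julia
using TidierFiles
using DataFrames  # assumption: installed; used here only for nrow/names

# A hypothetical whitespace-aligned table, as a report tool might print it.
table_path = joinpath(mktempdir(), "scores.txt")
write(table_path, "ID   Score\n1    90\n2    85\n3    75\n")

# Read every row, taking column names from the first line.
all_rows = read_table(table_path)

# Read only the first two data rows.
first_two = read_table(table_path; n_max = 2)
```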
write_table
write_table(x, file; delim='\t', missing_value="", append=false, col_names=true, eol="\n", num_threads=Threads.nthreads())
This function writes a DataFrame to a file with customizable delimiter and options. The arguments are:
- `x`: The DataFrame to write.
- `file`: The path to the output file.
- `delim`: The field delimiter. Default is `'\t'` (tab-separated).
- `missing_value`: The string to represent missing values. Default is `""`.
- `append`: Whether to append to an existing file. Default is `false`.
- `col_names`: Whether to write column names as the first line. Default is `true`.
- `eol`: The end-of-line character. Default is `"\n"`.
- `num_threads`: The number of threads to use for writing. Default is the number of available threads.
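As a small sketch of the customizable delimiter, the example below writes a semicolon-separated file. The data and file name are invented, and DataFrames.jl is assumed to be installed for the `DataFrame` constructor.

```julia
using TidierFiles
using DataFrames  # assumption: installed; provides the DataFrame constructor

# Hypothetical data with one missing score.
results = DataFrame(ID = [1, 2, 3], Score = [90, missing, 75])
out_path = joinpath(mktempdir(), "results.txt")  # hypothetical file name

# Use ';' instead of the default tab, and mark missing values as "NA".
write_table(results, out_path; delim = ';', missing_value = "NA")

# Inspect the raw text that was written.
table_text = read(out_path, String)
print(table_text)
```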
This page was generated using Literate.jl.