using DataArrays
using DataFrames
using Base.Dates
Arrays
[ii for ii=1:4]
4-element Array{Int64,1}: 1 2 3 4
Float64[ii for ii=1:4]
4-element Array{Float64,1}: 1.0 2.0 3.0 4.0
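A comprehension infers its element type from the values it produces. Prepending a type, as in Float64[...], converts every value to that type instead. A quick check of this using only Base Julia (eltype):

```julia
# element type inferred from the produced values (Int)
untyped = [ii for ii = 1:4]

# prepending a type converts every value to that element type
typed = Float64[ii for ii = 1:4]

println(eltype(untyped))   # Int64 on a 64-bit machine
println(eltype(typed))     # Float64
```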
Array of type Any:
Any["Hello" 3 4.0 NA]
1×4 Array{Any,2}: "Hello" 3 4.0 NA
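An Array of type Any accepts values of arbitrary type, and each entry keeps its own concrete type. The sketch below uses only Base Julia, so NA from DataArrays is left out. The variable names are illustrative.

```julia
# Any[...] builds an Array that accepts values of arbitrary type
mixed = Any["Hello" 3 4.0]

# each entry still has its own concrete type
entryTypes = [typeof(x) for x in mixed]

# a value of yet another type can be stored as well
mixed[1] = :symbol
```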
A comprehension can also produce Arrays of different sizes:
[rand(nObs, 2) for nObs in [2, 10]]
2-element Array{Array{Float64,2},1}: [0.938897 0.565; 0.220571 0.595751] [0.348172 0.223396; 0.719497 0.744549; … ; 0.561501 0.0500655; 0.914669 0.105726]
Array of Array{T, 2} entries through comprehension:
[[1 2] for ii=1:4]
4-element Array{Array{Int64,2},1}: [1 2] [1 2] [1 2] [1 2]
Creating Arrays from collections: applying [ ] to a collection only captures the whole collection as a single entry of an Array:
kk = (1, 2, 3, 4)
[kk]
1-element Array{NTuple{4,Int64},1}: (1, 2, 3, 4)
To get an Array of the individual entries instead, splice the collection with ...:
[kk...]
4-element Array{Int64,1}: 1 2 3 4
type foo
value
end
fooObj = foo(3)
foo(3)
kk = (fooObj, fooObj, fooObj)
[kk...]
3-element Array{foo,1}: foo(3) foo(3) foo(3)
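Base's collect is an alternative to splicing for turning a tuple into an Array of its entries, while [tup] keeps the whole tuple as one entry. The following is a small comparison using only Base Julia. The name tup is illustrative.

```julia
tup = (1, 2, 3, 4)

wrapped   = [tup]          # 1-element Array whose single entry is the tuple
spliced   = [tup...]       # 4-element Array of the individual entries
collected = collect(tup)   # same entries, without splicing
```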
Array concatenation with vcat allows combining objects in a specified structure:
vcat([1 2], [1 2])
2×2 Array{Int64,2}: [1 2; 1 2]
vcat also works for a variable number of input arguments:
kk = ([1 2], [3 4], [5 6])
([1 2], [3 4], [5 6])
vcat(kk[1], kk[2], kk[3])
3×2 Array{Int64,2}: [1 2; 3 4; 5 6]
Combined with splicing, vcat conveniently transforms a tuple of values into a concise Array:
vcat(kk...)
3×2 Array{Int64,2}: [1 2; 3 4; 5 6]
vcat([[1 2] for ii=1:4]...)
4×2 Array{Int64,2}: [1 2; 1 2; 1 2; 1 2]
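Splicing passes every piece to vcat as a separate argument. An alternative that gives the same result is to fold vcat over the collection with reduce. Here is a sketch using only Base Julia. The variable names are illustrative.

```julia
pieces = [[1 2] for ii = 1:4]

# splicing: vcat receives four separate 1×2 arguments
stacked1 = vcat(pieces...)

# folding: vcat is applied step by step over the collection
stacked2 = reduce(vcat, pieces)

println(size(stacked1))   # (4, 2)
```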
[ ] applied to spliced elements does not implicitly call vcat anymore. Even though the individual entries are of type Array{Int,2}, this does not result in a two-dimensional Array:
kk = [[1 2] for ii=1:4]
[kk...]
4-element Array{Array{Int64,2},1}: [1 2] [1 2] [1 2] [1 2]
A two-dimensional Array can be created directly through a comprehension over two ranges:
[jj for ii=1:4, jj=1:2]
4×2 Array{Int64,2}: [1 2; 1 2; 1 2; 1 2]
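In a comprehension over two ranges, the first range runs over rows and the second over columns. Entry (ii, jj) is the expression evaluated at ii and jj. The following Base-only check shows that this gives the same matrix as stacking row vectors with vcat. The variable names are illustrative.

```julia
# 2D comprehension: ii indexes rows, jj indexes columns
direct = [jj for ii = 1:4, jj = 1:2]

# same matrix obtained by stacking four 1×2 row vectors
stacked = vcat([[1 2] for ii = 1:4]...)

# entry (ii, jj) holds 10ii + jj
grid = [10ii + jj for ii = 1:2, jj = 1:3]
```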
vcat also works for data structures other than Array. Applied to DataFrames, it returns a DataFrame again:
df = DataFrame()
df[:a] = @data([5, 6, NA])
df[:b] = @data([8, NA, NA])
kk = (df, df)
xx = vcat(kk...)
Row | a | b |
---|---|---|
1 | 5 | 8 |
2 | 6 | NA |
3 | NA | NA |
4 | 5 | 8 |
5 | 6 | NA |
6 | NA | NA |
typeof(xx)
DataFrames.DataFrame
Under the hood, comprehensions make use of iterators. For DataFrames, the column iterator eachcol returns a tuple with the column name and the values (given as DataArray) for each column:
df = DataFrame()
df[:a] = @data([5, 6, NA])
df[:b] = @data([8, NA, NA])
[col for col in eachcol(df)]
2-element Array{Tuple{Symbol,DataArrays.DataArray{Int64,1}},1}: (:a, [5, 6, NA]) (:b, [8, NA, NA])
Because the DataFrame column iterator returns a tuple, you get only the values (without the column name) by indexing into it:
[col[2] for col in eachcol(df)]
2-element Array{DataArrays.DataArray{Int64,1},1}: [5, 6, NA] [8, NA, NA]
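Instead of indexing with col[2], you can destructure the tuple directly in the comprehension. The sketch below uses a hypothetical vector of (name, values) tuples as a stand-in for what eachcol returns. This way it needs only Base Julia and no DataFrame.

```julia
# hypothetical stand-in for the (column name, values) tuples of eachcol
cols = [(:a, [5, 6, 7]), (:b, [8, 9, 10])]

# indexing into each tuple, as in the text
vals1 = [col[2] for col in cols]

# destructuring each tuple into name and values
vals2 = [vals for (name, vals) in cols]
colNames = [name for (name, vals) in cols]
```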
Two applications of iterators come to mind immediately:
Example: iteratively manipulating the columns of a DataFrame
for col in eachcol(df)
col[2][1] = 10
end
df
Row | a | b |
---|---|---|
1 | 10 | 10 |
2 | 6 | NA |
3 | NA | NA |
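Replacing a column as a whole fails, however, because the tuple returned by the iterator cannot be modified:
try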
for col in eachcol(df)
col[2] = col[2].*10
end
catch e
show(e)
end
MethodError(setindex!, ((:a, [10, 6, NA]), [100, 60, NA], 2), 0x0000000000005549)
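Assigning to all entries with [:] instead writes into the existing DataArray, so the DataFrame is modified in place:
for col in eachcol(df)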
col[2][:] = col[2].*10
end
df
Row | a | b |
---|---|---|
1 | 100 | 100 |
2 | 60 | NA |
3 | NA | NA |
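The failing loop tries to replace an entry of the tuple itself. The working loop writes into the Array that the tuple refers to. You can see the same distinction with a plain Array of Arrays, using only Base Julia: rebinding the loop variable leaves the collection untouched, while writing into it changes the stored Arrays. The name data is illustrative.

```julia
data = [[1, 2, 3], [4, 5, 6]]

# rebinding: v now names a new Array; data is not affected
for v in data
    v = v .* 10
end
# data is still [[1, 2, 3], [4, 5, 6]]

# in-place assignment: the Arrays stored in data are modified
for v in data
    v[:] = v .* 10
end
# data is now [[10, 20, 30], [40, 50, 60]]
```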
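The same approach fails for an Array{Int, 1}. Here the iteration variable holds a plain integer value, which cannot be modified through indexing:
kk = [1, 2, 3, 4]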
try
for entry in kk
entry[1] = entry[1]*5
end
catch e
show(e)
end
kk
MethodError(setindex!, (1, 5, 1), 0x000000000000554a)
4-element Array{Int64,1}: 1 2 3 4
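To change the entries of an Array{Int, 1} in place, you can loop over its indices instead of its values, for example with Base's eachindex. In-place broadcasting assignment (.=) is a shorter alternative. This is a sketch using only Base Julia. The name nums is illustrative.

```julia
nums = [1, 2, 3, 4]

# loop over the valid indices and write back into the Array
for ii in eachindex(nums)
    nums[ii] = nums[ii] * 5
end
# nums is now [5, 10, 15, 20]

# alternative: in-place broadcasting assignment
nums .= nums .* 2
```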
Example: creating an Array of squared entries:
kk = [1 2 3 4]
kk2 = [ii.^2 for ii in kk]
1×4 Array{Int64,2}: 1 4 9 16
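You can obtain the same Array of squared entries through broadcasting or map. All three variants keep the 1×4 shape of the input. This is a Base-only comparison. The variable names are illustrative.

```julia
row = [1 2 3 4]   # 1×4 Array (row vector)

viaComprehension = [ii.^2 for ii in row]
viaBroadcast     = row .^ 2
viaMap           = map(ii -> ii^2, row)
```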
The Array as we get it from a comprehension is not always in the desired shape:
df = DataFrame()
df[:a] = @data([5, 6, NA])
df[:b] = @data([8, NA, NA])
df
Row | a | b |
---|---|---|
1 | 5 | 8 |
2 | 6 | NA |
3 | NA | NA |
kk = [col[2].*2 for col in eachcol(df)]
2-element Array{DataArrays.DataArray{Int64,1},1}: [10, 12, NA] [16, NA, NA]
To get a single two-dimensional array instead, apply vcat or hcat to the spliced result:
hcat([col[2].*2 for col in eachcol(df)]...)
3×2 DataArrays.DataArray{Int64,2}: [10 16; 12 NA; NA NA]
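Whether vcat or hcat is the right choice depends on the orientation of the pieces. For column vectors, hcat places them side by side, while vcat stacks them into one long vector. Below is a Base-only illustration with hypothetical plain column vectors in place of DataFrame columns (NA values omitted).

```julia
# hypothetical plain column vectors in place of DataFrame columns
cols = [[5, 6, 7], [8, 9, 10]]
doubled = [c .* 2 for c in cols]   # Array of two 3-element Arrays

sideBySide = hcat(doubled...)   # 3×2 matrix: one column per vector
endToEnd   = vcat(doubled...)   # 6-element vector: vectors stacked end to end
```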
map
map applies a function to each entry of a collection, and it can be customized to the iterator type used. For example, you can manipulate each DataFrame column in two different ways. Mapping over an Array (which contains the column names) returns an Array:
df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])
map(nam -> df[nam].*2, names(df))
2-element Array{DataArrays.DataArray{Int64,1},1}: [2, 4, 6] [8, 10, 12]
Using map with the DataFrame column iterator instead returns a DataFrame:
df2 = map(col -> col.*2, eachcol(df))
df2
Row | a | b |
---|---|---|
1 | 2 | 8 |
2 | 4 | 10 |
3 | 6 | 12 |
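The DataFrame example returns a DataFrame rather than an Array, which shows that map can have a specialized method for a particular iterator type. The sketch below illustrates this dispatch mechanism with hypothetical types Table and ColIter, which are not part of any package. It is not necessarily how DataFrames implements it.

```julia
# hypothetical column-oriented table, for illustration only
struct Table
    names::Vector{Symbol}
    cols::Vector{Vector{Int}}
end

# hypothetical iterator type over the columns of a Table
struct ColIter
    t::Table
end

# specialized method: apply f to the values of every column and
# wrap the results in a Table again instead of returning a plain Array
Base.map(f::Function, it::ColIter) = Table(it.t.names, [f(c) for c in it.t.cols])

t = Table([:a, :b], [[1, 2, 3], [4, 5, 6]])
t2 = map(c -> c .* 2, ColIter(t))
```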
map can also be applied to two collections:
vals1 = [10 20]
vals2 = [40 1]
map(+, vals1, vals2)
1×2 Array{Int64,2}: 50 21
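With two collections, map pairs up the entries at the same position. For elementwise arithmetic like this, broadcasting gives the same result. map, however, also accepts arbitrary two-argument functions. This is a Base-only sketch. The variable names are illustrative.

```julia
vals1 = [10 20]
vals2 = [40 1]

summedMap       = map(+, vals1, vals2)   # [50 21]
summedBroadcast = vals1 .+ vals2         # [50 21]

# any two-argument function works, e.g. keeping the larger entry
larger = map((x, y) -> x > y ? x : y, vals1, vals2)   # [40 20]
```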
reduce
With reduce, the individual components of a collection can be aggregated. Like map, reduce can have different implementations for each type. Using map and reduce together, you can manipulate the individual entries of iterable collections and aggregate them into a single result.
Example: calculating row means
df
Row | a | b |
---|---|---|
1 | 1 | 4 |
2 | 2 | 5 |
3 | 3 | 6 |
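# sum up the values of all columns first, then divide by the number of columns
# (this works for any number of columns)
meanDf = reduce((x, y) -> x .+ y, [col[2] for col in eachcol(df)]) ./ size(df, 2)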
3-element DataArrays.DataArray{Float64,1}: 2.5 3.5 4.5
Example: calculating row sums with weighted columns. Use map to calculate the weighted columns, and reduce to sum up the individual weighted columns:
df = DataFrame(a = [1, 2, 3, 4], b = [4, 5, 6, 7], c = [2, 4, 8, 10])
Row | a | b | c |
---|---|---|---|
1 | 1 | 4 | 2 |
2 | 2 | 5 | 4 |
3 | 3 | 6 | 8 |
4 | 4 | 7 | 10 |
wgts = [0.4 0.2 0.4]
kk = map((x, y) -> x.*y[2], wgts, eachcol(df))
3-element Array{DataArrays.DataArray{Float64,1},1}: [0.4, 0.8, 1.2, 1.6] [0.8, 1.0, 1.2, 1.4] [0.8, 1.6, 3.2, 4.0]
Checking the first weighted column by hand:
wgts[1] * [1, 2, 3, 4]
4-element Array{Float64,1}: 0.4 0.8 1.2 1.6
Summing up the weighted columns with reduce:
reduce((x, y) -> (x .+ y), map((x, y) -> x.*y[2], wgts, eachcol(df)))
4-element DataArrays.DataArray{Float64,1}: 2.0 3.4 5.6 7.0
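The same map-then-reduce pattern also works without DataFrames on plain column vectors. The sketch below reuses the weights and column values from the example, stored as hypothetical Base Arrays (a plain weight vector rather than a 1×3 row). The results are floating-point sums, so they are compared approximately. A generator passed to sum is shown as a one-expression alternative.

```julia
wgts = [0.4, 0.2, 0.4]
cols = [[1, 2, 3, 4], [4, 5, 6, 7], [2, 4, 8, 10]]

# map: multiply each column by its weight
weighted = map((w, c) -> w .* c, wgts, cols)

# reduce: add up the weighted columns entry by entry
rowSums = reduce((x, y) -> x .+ y, weighted)

# the same in one expression, using a generator over paired entries
rowSums2 = sum(w .* c for (w, c) in zip(wgts, cols))
```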
versioninfo()
Julia Version 0.6.0
Commit 9036443 (2017-06-19 13:05 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)
Pkg.status()
172 required packages: - AbstractFFTs 0.2.0 - Atom 0.6.1 - AutoGrad 0.0.7 - AutoHashEquals 0.1.1 - AxisAlgorithms 0.1.6 - AxisArrays 0.1.4 - BenchmarkTools 0.0.8 - Blink 0.5.3 - Blosc 0.3.0 - BufferedStreams 0.3.3 - BusinessDays 0.7.1 - CSV 0.1.4 - Calculus 0.2.2 - CatIndices 0.0.2 - CategoricalArrays 0.1.6 - Clustering 0.8.0 - CodeTools 0.4.6 - Codecs 0.3.0 - ColorTypes 0.5.2 - ColorVectorSpace 0.4.4 - Colors 0.7.4 - Combinatorics 0.4.1 - Compat 0.28.0 - Compose 0.5.3 - ComputationalResources 0.0.2 - Conda 0.5.3 - Contour 0.3.0 - Convex 0.5.0 - CoordinateTransformations 0.4.1 - CoupledFields 0.0.1 - CustomUnitRanges 0.0.4 - DBAPI 0.1.0 - DSP 0.3.2 - Dagger 0.2.0 - DataArrays 0.6.2 - DataFrames 0.10.0 - DataStreams 0.1.3 - DataStructures 0.6.0 - DecFP 0.3.0 - DecisionTree 0.6.1 - DiffBase 0.2.0 - Distances 0.4.1 - DistributedArrays 0.4.0 - Distributions 0.14.2 - DualNumbers 0.3.0 - FFTViews 0.0.2 - FFTW 0.0.3 - FileIO 0.5.1 - FixedPointNumbers 0.3.9 - Formatting 0.2.1 - ForwardDiff 0.4.2 - GLM 0.7.0 - GR 0.23.0 - GZip 0.3.0 - Gadfly 0.6.3 - Glob 1.1.1 - Graphics 0.2.0 - HDF5 0.8.2 - HTTPClient 0.2.1 - Hexagons 0.1.0 - Hiccup 0.1.1 - HttpCommon 0.2.7 - HttpParser 0.3.0 - HttpServer 0.2.0 - HypothesisTests 0.5.1 - IJulia 1.5.1 - IdentityRanges 0.0.1 - ImageAxes 0.3.1 - ImageCore 0.4.0 - ImageFiltering 0.1.4 - ImageMetadata 0.2.3 - ImageTransformations 0.3.1 - Images 0.11.0 - IndexedTables 0.2.1 - IndirectArrays 0.1.1 - Interact 0.4.5 - Interpolations 0.6.2 - IntervalSets 0.1.1 - IterTools 0.1.0 - Iterators 0.3.1 - JDBC 0.2.0 - JLD 0.6.11 - JSON 0.13.0 - JavaCall 0.5.1 - JuMP 0.17.1 - JuliaWebAPI 0.3.1 - Juno 0.3.0 - KernelDensity 0.3.2 - Knet 0.8.3 - LNR 0.0.2 - LaTeXStrings 0.2.1 - Lazy 0.11.7 - LegacyStrings 0.2.2 - Libz 0.2.4 - LightGraphs 0.9.4 - LightXML 0.5.0 - LineSearches 0.1.5 - Loess 0.3.0 - Logging 0.3.1 - MLBase 0.7.0 - MNIST 0.0.2 - MacroTools 0.3.7 - MappedArrays 0.0.7 - MathProgBase 0.6.4 - MbedTLS 0.4.5 - Measures 0.1.0 - Media 0.3.0 - Mustache 0.1.4 
- Mux 0.2.3 - NaNMath 0.2.6 - NamedArrays 0.6.1 - NamedTuples 4.0.0 - NearestNeighbors 0.3.0 - Nettle 0.3.0 - NullableArrays 0.1.1 - ODBC 0.5.2 - OffsetArrays 0.3.0 - Optim 0.7.8 - PDMats 0.7.0 - PaddedViews 0.1.0 - Parameters 0.7.2 - ParserCombinator 1.7.11 - PlotRecipes 0.2.0 - PlotlyJS 0.6.4 - Plots 0.12.3+ master - Polynomials 0.1.5 - PooledArrays 0.1.1 - PositiveFactorizations 0.0.4 - Primes 0.1.3 - ProtoBuf 0.4.0 - PyCall 1.14.0 - PyPlot 2.3.2 - QuadGK 0.1.2 - QuantEcon 0.12.1 - Query 0.6.0 - RCall 0.7.3 - RDatasets 0.2.0 - RangeArrays 0.2.0 - Ratios 0.1.0 - Reactive 0.5.2 - Reexport 0.0.3 - Requests 0.5.0 - Requires 0.4.3 - ReverseDiffSparse 0.7.3 - Rmath 0.1.7 - Roots 0.4.0 - Rotations 0.5.0 - Rsvg 0.1.0 - SCS 0.3.3 - SHA 0.3.3 - SIUnits 0.1.0 - ScikitLearnBase 0.3.0 - ShowItLikeYouBuildIt 0.0.1 - Showoff 0.1.1 - SimpleTraits 0.5.0 - SortingAlgorithms 0.1.1 - SpecialFunctions 0.2.0 - StatPlots 0.4.2 - StaticArrays 0.6.1 - StatsBase 0.17.0 - StatsFuns 0.5.0 - TexExtensions 0.1.0 - TextParse 0.1.6 - TiledIteration 0.0.2 - TimeSeries 0.10.0 - Tokenize 0.1.8 - URIParser 0.1.8 - UnicodePlots 0.2.5 - WeakRefStrings 0.2.0 - WebSockets 0.2.3 - WoodburyMatrices 0.2.2 - ZMQ 0.4.3 17 additional packages: - BaseTestNext 0.2.2 - BinDeps 0.6.0 - Cairo 0.3.1 - DataValues 0.2.0 - DocStringExtensions 0.4.0 - Documenter 0.11.2 - DynAssMgmt 0.0.0- master (unregistered) - EconDatasets 0.0.2+ master - GeometryTypes 0.4.2 - Gtk 0.13.0 - IterableTables 0.4.2 - LibCURL 0.2.2 - NetworkLayout 0.1.1 - PlotThemes 0.1.4 - PlotUtils 0.4.3 - RData 0.1.0 - RecipesBase 0.2.2
scriptEndIsReached = true
true