How To Compare Two Rows In A Data Frame
Compare two data.frames to find the rows in data.frame 1 that are not present in data.frame 2
Solution 1: [1]
sqldf provides a nice solution:
a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = 1:3, b = letters[1:3])
require(sqldf)
a1NotIna2 <- sqldf('SELECT * FROM a1 EXCEPT SELECT * FROM a2')
And the rows which are in both data frames:
a1Ina2 <- sqldf('SELECT * FROM a1 INTERSECT SELECT * FROM a2')
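For readers without sqldf, the same two queries can be approximated in base R. This is a minimal sketch, assuming every column can be safely turned into text: each row is pasted into one key string (the "\r" separator is an arbitrary choice that must not occur in the data), and keys are compared with %in%. The calls to unique() mirror the fact that SQL EXCEPT and INTERSECT return distinct rows; row_key is a hypothetical helper name.

```r
a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = 1:3, b = letters[1:3])

# Hypothetical helper: one string per row; "\r" is assumed never to appear in the data
row_key <- function(df) do.call(paste, c(df, sep = "\r"))

k1 <- row_key(a1)
k2 <- row_key(a2)

# Like EXCEPT: distinct rows of a1 whose key does not occur in a2
a1NotIna2 <- unique(a1[!(k1 %in% k2), , drop = FALSE])

# Like INTERSECT: distinct rows of a1 whose key also occurs in a2
a1Ina2 <- unique(a1[k1 %in% k2, , drop = FALSE])

a1NotIna2
```

Because paste() formats doubles to 15 significant digits, two numbers that differ only beyond that point would be treated as equal by this sketch; a merge-based comparison avoids that.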
Newer versions of dplyr have a function, anti_join, for exactly these kinds of comparisons:
require(dplyr)
anti_join(a1, a2)
And semi_join to filter the rows in a1 that are also in a2:
semi_join(a1,a2)
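Unlike EXCEPT and INTERSECT, these filtering joins return rows of a1 as they are (duplicated rows in a1 are not collapsed), and when no by argument is given they match on all columns the two data frames share. The base R sketch below imitates that behaviour under those assumptions; filter_join is a hypothetical helper, not part of dplyr.

```r
# Hypothetical helper: keep = TRUE acts like semi_join, keep = FALSE like anti_join
filter_join <- function(x, y, by = intersect(names(x), names(y)), keep = TRUE) {
  # One key per row built from the join columns only ("\r" assumed absent from the data)
  kx <- do.call(paste, c(x[by], sep = "\r"))
  ky <- do.call(paste, c(y[by], sep = "\r"))
  hit <- kx %in% ky
  # Rows of x come back in their original order; no columns from y are added
  x[if (keep) hit else !hit, , drop = FALSE]
}

a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = 1:3, b = letters[1:3])

filter_join(a1, a2, keep = FALSE)  # rows with a = 4 and 5, like anti_join(a1, a2)
filter_join(a1, a2)                # rows with a = 1 to 3, like semi_join(a1, a2)

# Duplicated rows in x survive, which EXCEPT would not allow
a1_twice <- rbind(a1, a1)
nrow(filter_join(a1_twice, a2, keep = FALSE))
```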
Solution 2: [2]
This doesn't answer your question directly, but it will give you the elements that are in common. This can be done with Paul Murrell's package compare:
library(compare)
a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = 1:3, b = letters[1:3])
comparison <- compare(a1, a2, allowAll = TRUE)
comparison$tM
#   a b
# 1 1 a
# 2 2 b
# 3 3 c
The function compare gives you a lot of flexibility in terms of what kinds of comparisons are allowed (e.g. changing the order of elements of each vector, changing the order and names of variables, shortening variables, changing the case of strings). From this, you should be able to figure out what was missing from one or the other. For example (this is not very elegant):
difference <- data.frame(lapply(1:ncol(a1), function(i) setdiff(a1[, i], comparison$tM[, i])))
colnames(difference) <- colnames(a1)
difference
#   a b
# 1 4 d
# 2 5 e
Solution 3: [3]
Solution 4: [4]
It is certainly not efficient for this particular purpose, but what I often do in these situations is insert indicator variables in each data.frame and then merge:
a1$included_a1 <- TRUE
a2$included_a2 <- TRUE
res <- merge(a1, a2, all = TRUE)
Missing values in included_a1 will note which rows are missing in a1; similarly for a2.
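As a sketch of that last step, the rows can be pulled out of res by testing the indicator columns with is.na(). The code repeats the a1/a2 data so it runs on its own; the names only_in_a1, only_in_a2 and in_both are illustrative.

```r
a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = 1:3, b = letters[1:3])
a1$included_a1 <- TRUE
a2$included_a2 <- TRUE

# Full outer join on the shared columns a and b (the indicator names differ, so they are not keys)
res <- merge(a1, a2, all = TRUE)

# NA in included_a2 means the row came only from a1, and vice versa
only_in_a1 <- res[is.na(res$included_a2), c("a", "b")]
only_in_a2 <- res[is.na(res$included_a1), c("a", "b")]
in_both    <- res[!is.na(res$included_a1) & !is.na(res$included_a2), c("a", "b")]

only_in_a1
```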
One problem with your solution is that the column orders must match. Another problem is that it is easy to imagine situations where rows are coded as the same when in fact they are different. The advantage of using merge is that you get, for free, all the error checking that is necessary for a good solution.
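The column-order point can be seen in a small experiment. SQL's EXCEPT and INTERSECT match columns by position, and so does the pasted row key in this sketch, so once a2's columns are swapped a positional comparison finds no common rows, while merge still matches columns by name. pos_key is a hypothetical helper for illustration.

```r
a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(b = letters[1:3], a = 1:3)  # same rows as before, columns swapped

# Hypothetical positional key: columns are pasted in stored order, names are ignored
pos_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

n_positional <- sum(pos_key(a1) %in% pos_key(a2))  # 0, because "1\ra" never equals "a\r1"
n_by_name    <- nrow(merge(a1, a2))                # 3, merge joins on the names a and b

c(positional = n_positional, by_name = n_by_name)
```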
Solution 5: [5]
I wrote a package (https://github.com/alexsanjoseph/compareDF) since I had the same issue.
> df1 <- data.frame(a = 1:5, b = letters[1:5], row = 1:5)
> df2 <- data.frame(a = 1:3, b = letters[1:3], row = 1:3)
> df_compare = compare_df(df1, df2, "row")
> df_compare$comparison_df
  row chng_type a b
1   4         + 4 d
2   5         + 5 e
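The core idea behind this output can be sketched in base R: rows of the first data frame with no identical row in the second are labelled "+", and rows of the second with no identical row in the first are labelled "-", matching the output above where rows 4 and 5 exist only in df1. This is an approximation under that assumption, not compareDF's implementation; it ignores the grouping column and the package's other options, and diff_rows is a hypothetical name.

```r
# Hypothetical helper (not compareDF's code): label rows found in only one data frame
diff_rows <- function(new_df, old_df) {
  # One string per row; "\r" is assumed never to appear inside the data
  key <- function(df) do.call(paste, c(df, sep = "\r"))
  k_new <- key(new_df)
  k_old <- key(old_df)
  added   <- new_df[!(k_new %in% k_old), , drop = FALSE]  # only in the first data frame
  removed <- old_df[!(k_old %in% k_new), , drop = FALSE]  # only in the second data frame
  out <- rbind(
    cbind(chng_type = rep("+", nrow(added)), added),
    cbind(chng_type = rep("-", nrow(removed)), removed)
  )
  rownames(out) <- NULL
  out
}

df1 <- data.frame(a = 1:5, b = letters[1:5], row = 1:5)
df2 <- data.frame(a = 1:3, b = letters[1:3], row = 1:3)
changes <- diff_rows(df1, df2)
changes  # two rows, a = 4 and 5, both marked "+"
```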
A more complicated example:
library(compareDF)
df1 = data.frame(id1 = c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Duster 360", "Merc 240D"),
                 id2 = c("Maz", "Maz", "Dat", "Hor", "Dus", "Mer"),
                 hp = c(110, 110, 181, 110, 245, 62),
                 cyl = c(6, 6, 4, 6, 8, 4),
                 qsec = c(16.46, 17.02, 33.00, 19.44, 15.84, 20.00))
df2 = data.frame(id1 = c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", " Hornet Sportabout", "Valiant"),
                 id2 = c("Maz", "Maz", "Dat", "Hor", "Dus", "Val"),
                 hp = c(110, 110, 93, 110, 175, 105),
                 cyl = c(6, 6, 4, 6, 8, 6),
                 qsec = c(16.46, 17.02, 18.61, 19.44, 17.02, 20.22))
> df_compare$comparison_df
  grp chng_type               id1 id2  hp cyl  qsec
1   1         - Hornet Sportabout Dus 175   8 17.02
2   2         +        Datsun 710 Dat 181   4 33.00
3   2         -        Datsun 710 Dat  93   4 18.61
4   3         +        Duster 360 Dus 245   8 15.84
5   7         +         Merc 240D Mer  62   4 20.00
6   8         -           Valiant Val 105   6 20.22
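In this output a key present in both data frames with different values (Datsun 710) shows up twice, as "+" with the df1 values and as "-" with the df2 values, while keys found in only one data frame (Duster 360 and Merc 240D in df1, Hornet Sportabout and Valiant in df2) appear once, and unchanged keys are left out. The sketch below reproduces that per-key classification in base R on a cut-down version of the data; classify_keys is a hypothetical helper, not part of compareDF.

```r
# Hypothetical helper: classify each id as added, removed, changed or unchanged
classify_keys <- function(new_df, old_df, id) {
  row_key <- function(df) do.call(paste, c(df, sep = "\r"))  # "\r" assumed absent
  ids <- union(new_df[[id]], old_df[[id]])
  status <- vapply(ids, function(k) {
    in_new <- k %in% new_df[[id]]
    in_old <- k %in% old_df[[id]]
    if (in_new && !in_old) return("added")
    if (!in_new && in_old) return("removed")
    # Present in both: compare the full rows that carry this id
    same <- setequal(row_key(new_df[new_df[[id]] == k, , drop = FALSE]),
                     row_key(old_df[old_df[[id]] == k, , drop = FALSE]))
    if (same) "unchanged" else "changed"
  }, character(1))
  data.frame(id = ids, status = unname(status))
}

new_df <- data.frame(id1 = c("Mazda RX4", "Datsun 710", "Duster 360"),
                     hp = c(110, 181, 245))
old_df <- data.frame(id1 = c("Mazda RX4", "Datsun 710", "Hornet Sportabout"),
                     hp = c(110, 93, 175))
status <- classify_keys(new_df, old_df, "id1")
status
```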
The package also has an html_output command for quick checking:
df_compare$html_output
Source: https://localcoder.org/compare-two-data-frames-to-find-the-rows-in-data-frame-1-that-are-not-present-in
Posted by: fairleyhusith.blogspot.com